AVL Array

Get the latest
version via bzr

Frequently Asked Questions (FAQs)

Page contents

Usage

Is avl_array a substitute for list, vector and deque?

May I simply useavl_arrayeverywhere instead of spending my time deciding between list, vector and deque?

How should I decide what to use in every case?

Why?

Complexity? O(log N)? What's that?

Is avl_array like a map with an integer key?

How should I iterate through an avl_array: with [] with ++, or with +=?

Does it mean I should never use [] or +=?

What is the NPSV stuff?

What would I want it for?

Why do I get a compiler error when I try to use stable_sort()?

Design

What's the performance price of NPSV and stable_sort() when they are not wanted?

Why aren't reverse iterators defined with the std::reverse_iterator template?

Why are tree nodes also in a circular doubly linked list?

Why are iterators tagged as bidirectional instead of random access?

Why are some algorithms implemented inside the avl_array class?

What is that O(min{N, n log N}) complexity?

What's the cost, and how precise is it?

Why don't iterators and/or tree nodes contain a pointer to their container object?

Why does avl_array inherit from avl_array_node_tree_fields instead of having a member of that class, or a simple pointer?

Why AVL trees?

Usage

Is `avl_array` a substitute for`list`, `vector` and `deque`?

Not at all.

May I simply use `avl_array` everywhere instead of spending my time deciding between `list`,`vector`and `deque`?

No! Please, go on reading.

How should I decide what to use in every case?

First, follow the wise, wide spread, official recommendations. Choose list when you are planning to insert/erase everywhere in the sequence and you don't need random access. Choose vector when you are planning to use random access and you only need fast insert/erase at the end of the sequence. Choose deque when you are planning to use random access and you need fast insert/erase at both ends of the sequence (but not in the middle).
When none of these fits, that is, when you are planning to use intensely both random access and insert/erase everywhere in the sequence (not only in the ends), then choose avl_array.

Why?

Because avl_array has logarithmic time complexity (O(log N)) at both random access and insert/erase.

Complexity? O(log N)? What's that?

See Big O notation in the Wikipedia or the introduction to this documentation.

Is `avl_array` like a `map` with an integer key?

Not at all. Unlike map, avl_array is a sequence container. In a map, insert/erase of one element doesn't affect the keys of the other elements. In avl_array, insert/erase of one element alters the indexes (call them keys if you prefer, but they are not ;) of all following elements in the sequence.

How should I iterate through an `avl_array`: with `[]` with `++`, or with `+=`?

It's faster with ++. Do it like this:
for (it=container.begin(); it!=container.end(); it++)  // BTW, use !=, not <
{ /* do something with *it */ }
instead of:
for (i=0; i<container.size(); i++)
{ /* do something with container[i] */ }
Comparing iterators with == or != is also faster than with <, >,<=or >=.

Does it mean I should never use`[]`or `+=`?

No. While every iterator++ only needs a single step, the number of steps needed for iterator+=n oriterator[n]is at least2*log2(|n|)(larger on average, and up to 2*log2(size) in the worst case). For container[n], the average number of steps is about log2(size). Make your numbers.

What is the NPSV stuff?

NPSV stands for "Non-Proportional Sequence View". It is a mechanism that provides one or more additional random access indices. These indices are affected by insert/erase operations in the same way as the ordinary random access index (numbers from 0 to size-1). The difference is that they are not necessarily integers and they don't necessarily grow on a constant basis (by one in the ordinary random access index). Seeavl_array.hppand the NPSV-related examples for more information.

What would I want it for?

For storing the data of text editors, text processors or internet browsers. You are using one right now (or you used it for printing what you have in your hands). In fact, this idea is already in use in GTK:
struct _GtkTextBTreeNode {
   //...
   int num_lines;    /* Total number of lines (leaves) in
                      * the subtree rooted here. */
   int num_chars;    /* Number of chars below here */
   //...
};
Think of e-books, mobile phones...

Why do I get a compiler error when I try to use `stable_sort()`?

Stable sort requires one extra unsigned integer per element, so it is disabled by default. Just specify std::size_t in the fourth parameter of the avl_array template instantiation. You will need to fill the previous parameters too (allocator and NPSV width type). Use the default values if you don't need them. For example:
avl_array<employee,                      // Your class
          std::allocator<employee>,      // Default allocator for your class
          mkr::detail::empty_number,     // No NPSV
          std::size_t>                   // Enable stable sort
  staff;

Design

What's the performance price of NPSV and `stable_sort()` when they are not wanted?

None. All compilers should optimize them away when the corresponding template parameters are set to their default values. See rationale.

Why aren't reverse iterators defined with the `std::reverse_iterator` template?

Because it would spoil, for reverse iterators, a valuable property: they remain valid, referring the same element until it is erased (or the iterator modified, of course). The std::reverse_iterator template is based on embedding a normal iterator referring the next element, thus making the validity of the reverse iterator dependent on what happens to that next element. See rationale.

Why are tree nodes also in a circular doubly linked list?

It has an obvious memory cost, but it slightly simplifies the maintenance of the tree balance, and more important: it lows down the average and worst case time complexity of many operations, like iterators ++ and --, or massive insert/erase/move operations. See rationale.

Why are iterators tagged as bidirectional instead of random access?

Because they provide random access, but not with constant time complexity (O(1)). The specializations of generic algorithms for random access assume O(1) random access. This assumption (wrong in the case of avl_array) would make far slower than their bidirectional counterparts. Not only proportionally slower, but of a higher complexity order. A perfect example of this are the implementations of reverse() shown in the SGI STL iterator tags overview. See rationale.

Why are some algorithms implemented inside the `avl_array` class?

As those in list (see splice,unique, reverse, merge andsortin the SGI STL implementation), they support operations otherwise prohibitive or even impossible. They work in-place, without moving or copying the contained objects, and they benefit from the internal structure of the implementation to achieve a better performance (not only proportionally, but with a lower complexity order). See rationale.

What is that O(min{N, n log N}) complexity?

There are two ways for multiple insert/erase/move operations: 1) repeating single insert/erase/move operations, maintaining the tree(s) integrity and restoring balance every time; and 2) ignoring the tree structure, working with the list, and rebuilding the tree(s) only in the end.
The first way has a cost of O(n log N), where n is the number of elements to insert/erase/move, and N is the total number of elements in the container. The second way has a cost of O(N). A fast estimation is made to decide which way is better in each case. See rationale.

What's the cost, and how precise is it?

It has no loops and it uses integer arithmetic only. Therefore, the cost is almost null compared with both the total time consumed by the operation and the gain achieved in extreme cases. The estimation will be possibly wrong in a small interval near the critical point (where both ways take the same time), but it will work well with the big numbers, thus saving a lot of execution time. See rationale.

Why don't iterators and/or tree nodes contain a pointer to their container object?

Because they don't need to. Maintaining it would be a waste of space and time (in the case of tree nodes), or it would spoil valuable properties (in the case of iterators). See rationale.

Why does `avl_array` inherit from `avl_array_node_tree_fields` instead of having a member of that class, or a simple pointer?

Because that way the container object can be reached from the tree nodes by following the parent pointers until a NULL pointer is found. See rationale.

Why AVL trees?

Just because. Well, I find them easier to understand (and implement, and maintain...) than Red-Black trees, which have more corner cases. AVL trees have better lookup time, because they guarantee better balance. On the other hand, for the same reason, one can expect Red-Black trees to make fewer rotations. This should be considered as a matter of implementation. It affects neither the interface nor the algorithmic complexity of any operation.

Revised 19 May, 2010