AVL Array: Rationale
- Introduction
- End position
- Sentinel node
- Iterators validity
- Rank/count tree
- AVL tree
- Node classes hierarchy
- Not fully random access
- Fully bidirectional
- Friendship gateway
- Group operations
- Algorithms
- Stable sort
- Non-Proportional Sequence View (NPSV)
- Multi NPSV
This rationale explains the main design decisions. The overview should be read first. The design part of the FAQ can be used as a light version of this rationale.
The end position (or rather, the past-the-end position) must be taken into account while designing containers and iterators. It is a special iterator value that a) doesn't refer to any element and therefore can't be dereferenced, and b) is still somehow connected to the container.
Two options can be considered for representing this end position:
- A NULL pointer
  - It requires an additional pointer to the container, so that...
    - ...it can be distinguished from the singular iterator
    - ...the previous element (last) can be found on --
  - It requires special treatment in many operations (like --)
- A pointer to a special 'dummy' node
  - No pointer to the container is required, provided that every container has its own dummy node
  - No special treatment is required, provided that the dummy node is linked as one more node
  - The dummy node should be different: it should not contain any payload (value_type)

The second option has been chosen for its obvious advantages. The no-payload dummy problem is discussed below.
When handling lists and trees, it is usual to have a special node, called a sentinel, that is never deallocated, so that the head or root pointer always points to a non-NULL address. This simplifies the algorithms that manage them. For example, in a circular doubly linked list without a sentinel, the code for appending one element would be:
    if (first==NULL)
    {
        first = newnode;
        newnode->next = newnode->prev = newnode;
    }
    else
    {
        newnode->next = first;
        newnode->prev = first->prev;
        newnode->next->prev = newnode->prev->next = newnode;
    }

With a sentinel before the beginning of the list, it would be:

    newnode->next = sentinel;
    newnode->prev = sentinel->prev;
    newnode->next->prev = newnode->prev->next = newnode;

Note that, in circular doubly linked lists, having a sentinel before the beginning and having it after the end are exactly the same thing. In an avl_array, the sentinel node is the dummy node at the same time. It represents the position for both end() and rend(). It is the root of the tree, and its parent and right child pointers are always NULL.
In linked data structures like lists or trees, there's no reason to invalidate iterators on insertion/removal operations (except, of course, for the iterators referring to the elements being removed). This is a very valuable property for the user.
Reverse iterators could have been defined with the std::reverse_iterator template. This template is usually based on embedding a normal iterator referring to the next element of the intended one. For example: the reverse iterator returned by rbegin() would contain a normal (straight) iterator referring to past-the-end.

With this technique, the validity of every reverse iterator depends not on the intended element, but on its neighbor. In other words, removing an element invalidates the reverse iterators that refer to the previous element. For this reason, reverse iterators have been defined from scratch, just like normal (straight) iterators.

Neither iterators nor reverse iterators contain anything but a pointer to the node of the referred element. Thus, they remain valid until the referred element is erased, even when it is moved within the container, or into a different container (move(), splice(), swap() and other operations).
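The core of this design can be sketched as follows (illustrative code only, not the actual avl_array classes): the iterator's whole state is one node pointer, so it keeps referring to the same element regardless of where that node ends up linked.

    // Sketch only: illustrative names, not the real avl_array iterator.
    struct node_t
    {
        node_t * m_parent;
        node_t * m_next;     // Successor in the sequence
        node_t * m_prev;     // Predecessor in the sequence
        // The payload lives in a derived node class
    };

    class iterator_sketch
    {
        node_t * m_node;     // The only state: survives moves of the element

    public:
        explicit iterator_sketch (node_t * p) : m_node (p) {}

        iterator_sketch & operator++ () { m_node = m_node->m_next; return *this; }
        iterator_sketch & operator-- () { m_node = m_node->m_prev; return *this; }

        bool operator== (const iterator_sketch & other) const
            { return m_node == other.m_node; }
        bool operator!= (const iterator_sketch & other) const
            { return m_node != other.m_node; }
    };

A reverse iterator written this way differs only in exchanging the bodies of ++ and --, so it refers to its own element (not to a neighbor, as std::reverse_iterator would).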
As commented in the overview, a special kind of rank tree has been chosen for avl_array. Normal rank trees store, in every node, the rank of the node in its subtree, that is, the count of nodes in its left subtree:

            dummy,7
             /
           D,3
          /   \
       B,1     F,1
      /   \   /   \
    A,0  C,0 E,0  G,0

The total number of nodes can be calculated by traversing the right-most branch. When a node is added/removed, some ranks must be incremented/decremented along the path to the root (only in nodes that have the modified branch in their left subtree).

In the trees used by avl_array, every node stores the total count of nodes in its subtree (including the node itself):

            dummy,8
             /
           D,7
          /   \
       B,3     F,3
      /   \   /   \
    A,1  C,1 E,1  G,1

The main advantage of this approach is the simplicity of the update algorithm: when a node (or a subtree) is added/removed, all counts must be updated from that point up, on the way to the root. The count of a node is updated by adding the counts of its two children plus one.
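For illustration, here is a minimal sketch (not the library's internal code) of the two operations this layout keeps simple: updating the counts after a structural change, and finding the n-th element in O(log N).

    #include <cstddef>

    // Sketch only: a node of a count tree (counts include the node itself).
    struct cnode
    {
        cnode * parent;
        cnode * left;
        cnode * right;
        std::size_t count;    // Nodes in this subtree, including this one
    };

    inline std::size_t count_of (const cnode * p)
        { return p ? p->count : 0; }

    // After inserting/removing something below 'p', walk up to the root
    // recomputing the counts: children's counts plus one.
    void update_counts_upwards (cnode * p)
    {
        for ( ; p; p = p->parent)
            p->count = count_of (p->left) + count_of (p->right) + 1;
    }

    // Find the element of rank 'pos' (0-based) descending from the root
    // and using the left subtree counts: O(log N).
    cnode * nth (cnode * root, std::size_t pos)
    {
        cnode * p = root;
        while (p)
        {
            std::size_t left_count = count_of (p->left);
            if (pos < left_count)
                p = p->left;
            else if (pos == left_count)
                return p;
            else
            {
                pos -= left_count + 1;
                p = p->right;
            }
        }
        return 0;    // Out of range
    }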
AVL trees have been chosen for simplicity. They are probably easier to understand (and implement, and maintain...) than Red-Black trees, which have more corner cases. AVL trees have better lookup time, because they guarantee better balance. On the other hand, for the same reason, one can expect Red-Black trees to make fewer rotations. This should be considered as a matter of implementation. It affects neither the interface nor the algorithmic complexity of any operation.
The dummy node is not aggregated as a member variable of the container class. Instead, it is inherited:

            avl_array_node_tree_fields
                (without value_type)
                         |
                _________|_________
               |                   |
           avl_array        avl_array_node
      (without value_type)  (with value_type)

All pointers in the base class (m_parent, m_next, m_prev, m_children[L] and m_children[R]) are pointers to the base class, though they will point to objects of the derived classes. Pointer conversions are always done with static casts.

Ordinary nodes have non-NULL values in their m_parent pointer, while the dummy node always has NULL in its m_parent pointer. This fact is exploited for ensuring the correct use of the nodes via assertions and/or exceptions.

The container of any element can be obtained in logarithmic time (with a travel to the root), so neither nodes nor iterators need to have a pointer to the container. Note that having such a pointer in iterators would invalidate them when the referenced element was moved from one container to another, and having it in nodes would require a lot of extra memory and time for maintenance.
The pointer to the container is only needed in some operations that already have logarithmic complexity or above. Some operations with lower complexity can optionally use it for assertions regarding the correct use of the library (see asserts and exceptions below).
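In outline, the hierarchy looks roughly like this (a simplified sketch: the real classes take more template parameters and members, and the payload member name is illustrative):

    // Sketch only: simplified versions of the classes named above.
    class avl_array_node_tree_fields                   // No value_type here
    {
    protected:
        avl_array_node_tree_fields * m_parent;         // NULL only in the dummy node
        avl_array_node_tree_fields * m_next;
        avl_array_node_tree_fields * m_prev;
        avl_array_node_tree_fields * m_children[2];    // m_children[L], m_children[R]
        // ... node count, height, NPSV width ...
    };

    template <class T>
    class avl_array_node : public avl_array_node_tree_fields
    {
        T m_value;            // The payload lives only in ordinary nodes
        // ...
    };

    template <class T>
    class avl_array : public avl_array_node_tree_fields   // The container is the dummy node
    {
        // Ordinary nodes are reached through static casts, e.g.
        // static_cast<avl_array_node<T>*>(base_class_pointer)
    };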
The iterators of avl_array provide random access, but they are not fully qualified random access iterators. The reason is that random access operations are provided with O(log N) complexity, instead of O(1). Algorithms specialized for random access iterators assume O(1) random access in their optimizations. Therefore, avl_array iterators have been tagged as bidirectional instead of random access.
A perfect example of this point is the implementation of reverse() shown in the SGI STL iterator tags overview. The specialization for bidirectional iterators uses a loop like this one:

    while (first != last && first != --last)   // Two comparisons
        iter_swap (first++, last);

The specialization for random access iterators uses a loop like:

    while (first < last)                        // Just one comparison
        iter_swap (first++, --last);

In both cases iter_swap() makes a by-value swap of the referenced elements.

The random access version benefits from the < operator, saving one comparison per step. This is a gain for truly random access iterators: the complexity is still O(n), but a slight proportional time reduction is achieved. With avl_array iterators, though, this would raise the complexity to O(n log N). For example, with an avl_array containing 1000 elements, the < operation would be, on average, 10 times slower (or worse) than the != operation.
On the other hand, the ++ and -- operators are as fast in avl_array iterators as in list: O(1) worst case (not only amortized). For this reason, it's much better to stick to the algorithm specializations for bidirectional iterators, especially if they work by value (copying instead of extracting and inserting).

Friendship gateway
Two additional parameters of the iterator class templates (Ptr and Ref) support the 2-in-1 trick used for instantiating two iterator versions (mutable and const) from a single template. The drawback is that it makes it harder to declare them friends of each other. The workaround for this little inconvenience is using the avl_array class as a friendship gateway between the different kinds of iterators (reverse iterators included).
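The 2-in-1 trick can be sketched like this (illustrative code; the real iterator templates store a node pointer and take more parameters):

    // Sketch only: one template, two instantiations (mutable and const).
    template <class T, class Ref, class Ptr>
    class iterator_tpl
    {
        T * m_ptr;    // The real iterators hold a node pointer instead

    public:
        explicit iterator_tpl (T * p) : m_ptr (p) {}

        Ref operator* () const  { return *m_ptr; }   // T& or const T&
        Ptr operator-> () const { return m_ptr; }    // T* or const T*
    };

    typedef iterator_tpl<int, int &, int *>             iterator_example;
    typedef iterator_tpl<int, const int &, const int *> const_iterator_example;

The two typedefs are distinct, unrelated classes, so it is awkward to make them name each other as friends from inside the shared template; declaring the container a friend of both, and routing the required private access through it, solves the problem with a single friend declaration per iterator class.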
Individual insert/erase/move operations have O(log N) time complexity. Group operations would have O(n log N) time complexity if they were implemented as n individual operations. Indeed, this is how they are performed in cases where n is small compared with N. But when n is not so small, an algorithm with O(N) complexity is used.
The perfect example is the copy constructor: it might be implemented with a loop calling push_back() N times. This would have O(N log N) complexity. A better approach would be copying the tree structure of the source container. An even better approach (and the chosen one) is building the tree in perfect balance. The method build_known_size_tree() builds a tree in perfect balance, without function recursion and with O(N) time complexity. The same method is used in all constructors, except the default constructor, which only needs to initialize an empty avl_array.

But constructors are not the only place where this approach is useful. Any group operation can benefit from it when the number of elements to insert/erase/move is large enough. The method worth_rebuild() detects this by comparing N with n·log2(N), or more exactly: final_N with n·log2(average_N). When n is large enough, the operation is performed ignoring the tree structure (working with the linked list only), and the tree is rebuilt only at the end.

Loops and floating point calculations are avoided in worth_rebuild(). This implies some loss of precision, but a broad estimation suffices here. Note that the comparison assumes that both complexities are translated into time by multiplying by the same constant, which is obviously false. Therefore, the estimation will probably favor one side or the other near the critical point, where both algorithms take the same time. Still, it will work for the big numbers, saving a lot of time.

The algorithms unique() and merge() always use build_known_size_tree().
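The kind of estimate involved can be sketched as follows (illustrative code, not the library's worth_rebuild(); in particular, this sketch uses a small loop over the bits of N, which the real method avoids, and it uses the final size where the real method uses the average size for the logarithm):

    #include <cstddef>

    // Sketch: should a group operation on n elements ignore the tree and
    // rebuild it at the end (O(N)), rather than perform n individual
    // updates (O(n log N))? Integer arithmetic only, broad estimate.
    bool worth_rebuild_sketch (std::size_t n, std::size_t final_N)
    {
        std::size_t log2_N = 0;                  // Rough integer log2
        for (std::size_t s = final_N; s > 1; s >>= 1)
            ++ log2_N;

        // Rebuild when n O(log N) operations would cost more than one
        // O(N) rebuild (both constants assumed equal, knowingly false)
        return n * log2_N > final_N;
    }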
Global algorithms (like reverse() or merge()) that work by value leave iterators referring to the same positions of the sequence. As in list, local versions of some algorithms are provided in avl_array. These algorithms don't work by copying payload values (value_type), but by changing the nodes' links. This is a great advantage when the value_type is big, or its assignment is heavy. Additionally, these algorithms benefit from the internal structure of the implementation in two ways:
- Binary search follows the natural search path of the tree, instead of using random access (O(log N), not O(1), remember) for reaching the center position. This provides O(log N) binary search and O(N log N) sort (a sketch of this kind of search is shown after this list). Otherwise, they would both be multiplied by log2(N) (note that log2(1000) is about 10).
- Algorithms like merge(), and massive move/insert/erase operations, treat the container as a list and rebuild the tree only at the end (perfectly balanced, and in record O(N) time), thus achieving O(N) complexity instead of O(N log N).
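Assuming the sequence is kept sorted (the precondition for any binary search), the tree can be descended directly, comparing at each node and choosing a branch, which yields O(log N) without any random access. A rough sketch with illustrative names (value() stands for whatever accessor exposes the node's payload):

    // Sketch only: lower-bound-style search following the tree structure.
    template <class Node, class Key, class Compare>
    Node * lower_bound_in_tree (Node * root, const Key & key, Compare less)
    {
        Node * result = 0;                  // First node not less than 'key'

        for (Node * p = root; p; )
        {
            if (less (p->value (), key))
                p = p->right;               // Everything here is too small
            else
            {
                result = p;                 // Candidate; look for an earlier one
                p = p->left;
            }
        }

        return result;                      // One root-to-leaf path: O(log N)
    }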
The search tree sort algorithm is not stable. A stable version can be provided, but it requires O(N) extra memory: one extra size_t field per element. The extra field of every element stores its old position, so that the previous order between equal elements can be reproduced in the new tree. For this reason, stable_sort() is disabled by default. The type of the extra field is the fourth parameter of the avl_array template instantiation. Its default value is empty_number, a class intended to be optimized away by the compiler. Specifying std::size_t instead as the fourth parameter enables stable_sort().
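For example (a hedged sketch of the instantiation: the header path, the namespace qualification, and the assumption that the second and third parameters are the allocator and the NPSV width type, left at their defaults here, should all be checked against the reference):

    #include <cstddef>
    #include <memory>
    // #include "avl_array.hpp"    // Header name/path is an assumption

    // Defaults: the fourth parameter is empty_number, stable_sort() is
    // disabled, and no extra memory is spent per element.
    typedef avl_array<int> plain_array;

    // Fourth parameter std::size_t: one extra size_t per element,
    // stable_sort() becomes available.
    typedef avl_array<int, std::allocator<int>, empty_number, std::size_t>
            sortable_array;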
See Frequently Asked Questions for a brief description of this feature.
This feature is expected to be required only in very particular situations. The class empty_number is used again as the default value for the NPSV width type parameter (the third one in the avl_array template).

The NPSV is based on the same core idea as avl_array: in every node, a field stores the total amount of space occupied by its subtree in the sequence. The main difference is that in the normal sequence view (or rank view) every node occupies one unit of space while, in the NPSV, every node occupies its own width (stored in another field).

The NPSV widths are added bottom-up, like the node counts. The operations that modify the tree must update them. Therefore, this feature must be implemented inside the avl_array class.

As explained in the overview, every node stores the number of nodes in its subtree (including the node itself), instead of its rank in the subtree. The same scheme is used in the NPSV, where the rank approach would be prone to cumulative errors if a floating point type were used as the width.
Changing the NPSV width of a node requires updating the width sums of every node in the path to the root. The time complexity of this operation is O(log N). Changing every width in an avl_array is, therefore, O(N log N). The lazy sums update mode provides a better option: NPSV widths can be changed postponing the sums update, achieving O(N) time complexity instead of O(N log N).
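Conceptually (a sketch of the underlying idea, not the library's API, with double chosen arbitrarily as the width type):

    // Sketch: an NPSV node stores its own width and the total width of its
    // subtree, just as counts are stored for the ordinary sequence view.
    struct npsv_node
    {
        npsv_node * parent, * left, * right;
        double own_width;      // Width of this element in the NPSV
        double total_width;    // own_width plus the widths of both subtrees
    };

    inline double total (const npsv_node * p)
        { return p ? p->total_width : 0.0; }

    // Immediate update: O(log N) per width change (walk up to the root).
    void set_width (npsv_node * p, double w)
    {
        p->own_width = w;
        for ( ; p; p = p->parent)
            p->total_width = total (p->left) + total (p->right) + p->own_width;
    }

    // Lazy mode: change many own_width values without touching the sums,
    // then recompute every total_width in one O(N) post-order pass.
    void recompute_totals (npsv_node * p)
    {
        if (!p) return;
        recompute_totals (p->left);
        recompute_totals (p->right);
        p->total_width = total (p->left) + total (p->right) + p->own_width;
    }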
A composite width can be used for the NPSV, providing more than one view at the same time. An overloaded version of npsv_at_pos() uses a comparison functor for finding a node using the corresponding view.
Revised 19 May, 2010
Copyright © Universidad de Alcalá 2006. See accompanying license.