|
|
|
|
|
|
Continue Discussing Trees |
|
Examine more advanced trees |
|
2-3 (evaluate what we learned) |
|
B-Trees |
|
AVL |
|
2-3-4 |
|
red-black trees |
|
|
|
|
A 2-3 tree is always balanced |
|
Therefore, a search always runs with the same
logarithmic efficiency as a binary search |
|
You might be concerned about the extra work in
the insertion/deletion algorithms to split and merge the nodes... |
|
|
|
|
But, rigorous mathematical analysis has proved
that this extra work to maintain structure is not significant |
|
It is sufficient to consider only the time
required to locate an item (or a position to insert) |
|
|
|
|
|
So, if 2-3 trees are so good, why not have nodes
that can have more data items and more than 3 children? |
|
Well, remember why 2-3 trees are great? |
|
because they are balanced and that balanced
structure is pretty easy to maintain |
|
|
|
|
|
The advantage is not that the tree is shorter
than a balanced binary search tree |
|
the reduction in height is actually offset by
the extra comparisons that have to be made to find out which branch to take |
|
actually a binary search tree that is balanced
minimizes the amount of work required to support ADT table operations |
|
|
|
|
|
But, with binary search trees balance is hard to
maintain |
|
A 2-3 tree is really a compromise |
|
Searching may not be quite as efficient as a
binary tree of minimum height |
|
but, it is relatively simple to maintain |
|
|
|
|
|
Allowing nodes to have more than 3 children
would require more comparisons and would therefore be counterproductive |
|
unless you are working with external storage, where
each node requires a disk access; there we use B-trees, which have the
minimum height possible |
|
|
|
|
|
Tables stored externally can be searched with
B-Trees. |
|
B-Trees are a more generalized approach than the
2-3 Tree |
|
With externally stored tables, we want to keep
the search tree as short as possible; it is much faster to do extra
comparisons at a particular node than try to find the next node. |
|
|
|
|
|
Every time we want to get another node, |
|
we have to access the external file and read in
the appropriate information. |
|
It takes far less time to operate on a
particular node (i.e., doing comparisons) once it has been read in. |
|
This means that for externally stored tables we
should try to reduce the height of the tree...even if it means doing more
comparisons at every node. |
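To get a feel for this trade-off, the sketch below computes the smallest number of levels a search tree needs when each node may have up to m children. The name `minHeight` and the sample sizes are illustrative assumptions, not part of the course materials.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of levels needed to hold n keys when every node has
// at most m children (and therefore at most m-1 keys). With an external
// file, each level visited costs roughly one disk access.
// Hypothetical helper, for illustration only.
int minHeight(std::uint64_t n, std::uint64_t m) {
    assert(m >= 2);                   // a node needs at least two children
    int levels = 0;
    std::uint64_t capacity = 0;       // keys that fit in `levels` full levels
    std::uint64_t nodesAtLevel = 1;   // nodes on the next level to be added
    while (capacity < n) {
        capacity += nodesAtLevel * (m - 1);
        nodesAtLevel *= m;
        ++levels;
    }
    return levels;
}
```

For one million keys, `minHeight(1000000, 2)` gives 20 levels for a binary tree, while `minHeight(1000000, 200)` gives 3 levels when a node can have 200 children.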
|
|
|
|
|
Therefore, with an external search tree, |
|
we allow each node to have as many children as
possible. |
|
If a node is to have m children, then you must
be able to allocate enough memory for that node to contain the data and m
pointers to its children. |
|
The data in such a node must be m-1 key
values. |
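One possible memory layout for such a node is sketched below. The type name `MWayNode`, its fields, and the `separator` helper are assumptions made for illustration; an external implementation would store file positions rather than in-memory pointers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One node of an m-way search tree: room for m-1 keys and m child pointers.
// All names here are hypothetical.
template <typename Key, std::size_t M>
struct MWayNode {
    static_assert(M >= 2, "a node needs at least two children");

    std::size_t keyCount = 0;              // keys currently in use (<= M-1)
    std::array<Key, M - 1> keys{};         // key values, kept sorted
    std::array<MWayNode*, M> children{};   // all nullptr in a leaf

    // Key i separates children[i] (smaller keys) from children[i+1].
    const Key& separator(std::size_t i) const {
        assert(i < keyCount);
        return keys[i];
    }
};
```

With `MWayNode<int, 5>`, each node has space for 4 keys and 5 child pointers.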
|
|
|
|
|
Remember in a binary search tree, |
|
if a node has 2 children then it contains one
data value (i.e., one key). |
|
You can think of the data value at a node as
separating the data values in the two child subtrees. |
|
All keys to the left are less than the node's
data value and all key values to the right are greater than or equal. |
|
The value of the data at a particular node tells
you which branch to take. |
|
|
|
|
|
In a 2-3 tree, |
|
if a node has 3 children then it must contain
two key values. |
|
These two values separate the key values in the
node's three child subtrees. |
|
All of the key values in the left subtree are
less than the node's smaller key value; |
|
all of the key values in the middle subtree are
between the node's two key values; |
|
all of the key values in the right subtree are
greater than or equal to the node's larger key. |
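The branching rule above can be sketched as a search routine. The `Node23` layout (a 2-node uses only `left` and `right`) and the helper names are assumptions for this example, not the course's implementation.

```cpp
#include <cassert>
#include <memory>

// A 2-3 tree node: one or two keys and, if internal, two or three children.
// Hypothetical layout: a node with one key leaves `middle` empty.
struct Node23 {
    int keyCount = 1;            // 1 or 2
    int small = 0, large = 0;    // `large` is meaningful only when keyCount == 2
    std::unique_ptr<Node23> left, middle, right;
};

std::unique_ptr<Node23> makeLeaf(int a) {
    auto n = std::make_unique<Node23>();
    n->small = a;
    return n;
}

std::unique_ptr<Node23> makeLeaf(int a, int b) {
    assert(a < b);
    auto n = makeLeaf(a);
    n->keyCount = 2;
    n->large = b;
    return n;
}

// Returns the node holding `key`, or nullptr if the key is absent.
const Node23* search(const Node23* node, int key) {
    while (node != nullptr) {
        if (key == node->small || (node->keyCount == 2 && key == node->large))
            return node;
        if (key < node->small)
            node = node->left.get();      // less than the smaller key
        else if (node->keyCount == 2 && key < node->large)
            node = node->middle.get();    // between the two keys
        else
            node = node->right.get();     // beyond the largest key in the node
    }
    return nullptr;
}
```

Each step does at most two comparisons against the node's keys before choosing a branch, which is the extra work that offsets the reduced height.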
|
|
|
|
|
Ideally, you should structure these types of
trees such that every internal node has m children and all leaves are at
the same level. |
|
For example, if m is 5 -- then every node should
have 5 children and 4 data values. |
|
But, this is too difficult to maintain when you
are doing a variety of insertions and deletions. |
|
|
|
|
|
So, we can require that B-trees be balanced (as
we saw with 2-3 trees)... |
|
but the number of children for any internal node
should be able to be somewhere between m and (m div 2)+1. |
|
We call this a B-Tree of degree m |
|
This requires that all leaves be at the same
level (balanced). |
|
|
|
|
|
Each node contains between m-1 and (m div 2)
values. |
|
Each internal node has one more child than it
has values. |
|
There is one exception; |
|
the root of the tree can contain as few as 1
value and can have as few as two children (or none -- if the tree consists
of only a root!). |
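These limits are easy to tabulate. The sketch below follows the rule as stated on these slides (between (m div 2)+1 and m children, one fewer value than children); the struct and function names are hypothetical.

```cpp
#include <cassert>

// Child and value limits for a non-root node of a B-tree of degree m,
// using the definition given in these slides.
struct BTreeLimits {
    int minChildren, maxChildren, minValues, maxValues;
};

BTreeLimits limitsForDegree(int m) {
    assert(m >= 3);
    return {m / 2 + 1, m, m / 2, m - 1};   // integer division is "div"
}

// True if a node holding `values` values is legal.
// The root is the exception: it may hold as few as one value.
bool validValueCount(int m, int values, bool isRoot) {
    const BTreeLimits lim = limitsForDegree(m);
    const int lowest = isRoot ? 1 : lim.minValues;
    return values >= lowest && values <= lim.maxValues;
}
```

For degree 5 this gives 3 to 5 children and 2 to 4 values per node; for degree 3 it gives 2 to 3 children and 1 to 2 values, which is exactly a 2-3 tree.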
|
|
|
|
Notice, a 2-3 tree is a B-tree of degree 3. |
|
Data can be inserted into a B-tree using the
same strategy of splitting and merging nodes that we discussed for 2-3 trees |
|
Here is a B-tree of degree 5: |
|
|
|
|
|
Then, insert 55. |
|
The first step is to locate the leaf of the tree
in which this index belongs by determining where the search for 55 would
terminate. |
|
We would find that we would want to insert 55 in
the node containing 50, 56, 57, 58. |
|
But, that would cause this node to contain 5
records. Since a node can contain only 4 records, you must split this node
into two...the new left node gets the two smaller values and the new right
node gets the two larger values. |
|
|
|
|
The record with the middle key value (56) is
moved up to the parent: |
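The split step on its own can be sketched as follows, using the keys from this example. `SplitResult` and `splitOverfull` are hypothetical names, and the sketch handles only the keys, not the child pointers of an internal node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of splitting an overfull node.
struct SplitResult {
    std::vector<int> left;    // smaller keys, stay in the left node
    int middle;               // moves up to the parent
    std::vector<int> right;   // larger keys, go to the new right node
};

// Splits a sorted, overfull key list (an odd number of keys, such as the
// 5 keys of an overfull degree-5 node) around its middle key.
SplitResult splitOverfull(const std::vector<int>& keys) {
    assert(keys.size() >= 3 && keys.size() % 2 == 1);
    const std::size_t mid = keys.size() / 2;
    return {
        std::vector<int>(keys.begin(), keys.begin() + mid),
        keys[mid],
        std::vector<int>(keys.begin() + mid + 1, keys.end())
    };
}
```

Calling `splitOverfull({50, 55, 56, 57, 58})` yields a left node of 50 and 55, a right node of 57 and 58, and 56 as the key to hand to the parent.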
|
|
|
|
|
This causes two problems, |
|
the parent now has six children and five
records!! |
|
So, we must split the parent into two nodes and
move the middle data value up to its parent. |
|
Remember, when we split an internal node, we
also need to move that node's children |
|
Since the root only has 2 data items, we can
simply add 56 there. |
|
The solution is on the next slide... |
|
|
|
|
|
|
|
Notice that if the root had needed to be split, |
|
the new root would contain only one value and
have only 2 children (that is why we have the exception to the B-Tree
definition stated earlier). |
|
To visit the search keys of a B-Tree in sorted
order, all we need to do is use an inorder traversal: at each node, alternate
between traversing a child subtree and visiting the next key. |
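A minimal sketch of that traversal, assuming an in-memory node with a vector of keys and a vector of children (the `BNode` layout is an assumption for this example, not the externally stored form):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// A B-tree node with k keys and, if internal, k+1 children.
struct BNode {
    std::vector<int> keys;                          // sorted
    std::vector<std::unique_ptr<BNode>> children;   // empty in a leaf
};

// Appends every key in the subtree to `out` in ascending order:
// child 0, key 0, child 1, key 1, ..., last child.
void inorder(const BNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    const bool leaf = node->children.empty();
    assert(leaf || node->children.size() == node->keys.size() + 1);
    for (std::size_t i = 0; i < node->keys.size(); ++i) {
        if (!leaf) inorder(node->children[i].get(), out);
        out.push_back(node->keys[i]);
    }
    if (!leaf) inorder(node->children.back().get(), out);
}
```

This is the same idea as the inorder traversal of a binary search tree, generalized from "left, node, right" to any number of keys per node.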
|
|
|
|
|
But, are there other alternatives? |
|
Remember the advantage of trees is that they are
well suited for problems that are hierarchical in nature and they are much
faster than linked lists |
|
but, this is not valid if the tree is not
balanced |
|
luckily, there are a number of techniques to
balance a binary tree |
|
|
|
|
|
Some of the balancing techniques require
constant restructuring of the tree as data is inserted |
|
the AVL algorithm uses this approach |
|
Some algorithms consist of building an unbalanced
tree and then reordering the data once the tree is generated |
|
this can be simple but depending on the
frequency of data being inserted, it may not be realistic |
|
|
|
|
|
The “brute force” technique is to create an
array of pointers to your data by traversing an unbalanced BST using
“inorder” traversal |
|
then re-build the tree by splitting the array in
the middle for each subarray (much like what we have seen with the binary
search algorithm used with arrays) |
|
the middle data item should be the root, as it
splits what is less than it, and what is greater! |
|
|
|
|
|
|
The algorithm for the “brute force” approach is: |
|
void balance(data_type data[], int first, int last) { |
|
    // data[] holds the items in sorted (inorder) order |
|
    if (first <= last) { |
|
        int middle = (first + last)/2; |
|
        insert(data[middle]);  // the middle item roots this subrange |
|
        balance(data, first, middle-1); |
|
        balance(data, middle+1, last); |
|
    } |
|
} |
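A self-contained variant of this idea is sketched below. Instead of calling `insert` on an existing tree, it links the nodes directly as the recursion returns; that change, and the names `TreeNode`, `buildBalanced`, and `height`, are assumptions made so the example can stand alone.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct TreeNode {
    int data;
    std::unique_ptr<TreeNode> left, right;
    explicit TreeNode(int d) : data(d) {}
};

// Builds a minimum-height BST from sorted[first..last] by making the
// middle item the root of each subrange.
std::unique_ptr<TreeNode> buildBalanced(const std::vector<int>& sorted,
                                        int first, int last) {
    if (first > last) return nullptr;
    assert(first >= 0 && last < static_cast<int>(sorted.size()));
    const int middle = first + (last - first) / 2;
    auto node = std::make_unique<TreeNode>(sorted[middle]);
    node->left = buildBalanced(sorted, first, middle - 1);
    node->right = buildBalanced(sorted, middle + 1, last);
    return node;
}

// Number of levels in the tree (0 for an empty tree).
int height(const TreeNode* node) {
    if (node == nullptr) return 0;
    return 1 + std::max(height(node->left.get()), height(node->right.get()));
}
```

For the sorted items 10 through 70 in steps of 10, the root is 40, its children are 20 and 60, and the tree has 3 levels.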
|
|
|
|
|
The “brute force” technique has a serious
drawback |
|
all of the data must be put in an array before a
balanced tree can be created |
|
what would happen if you weren’t using pointers
to the data but instances of the data? |
|
if an unbalanced tree is not used (i.e., the
data is directly inserted into the array from the client), then a sorting
algorithm must be used and fixed size issues arise |
|
|
|
|
|
The AVL tree is a classical method proposed by
Adelson-Velskii and Landis |
|
creates an “admissible tree” (its original
name!) |
|
focuses on rebalancing the tree locally to the
portion of the tree affected by insertion and deletions |
|
it allows the height of the left and right
subtrees of every node to differ by at most one |
|
|
|
|
|
With AVL trees |
|
each node must now keep track of its “balance
factor”, which records the difference between the heights of its left and
right subtrees |
|
the balance factor is the height of the right
subtree minus the height of the left subtree |
|
all balance factors must be +1, 0, or -1 |
|
notice, this does meet the definition we learned
about for a balanced tree |
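The definition can be checked directly, as in the sketch below. It recomputes heights on every call for clarity; a practical AVL tree would store the balance factor (or height) in each node instead. All names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

struct AvlNode {
    int data;
    AvlNode* left = nullptr;
    AvlNode* right = nullptr;
};

// Number of levels in the subtree (0 for an empty subtree).
int height(const AvlNode* n) {
    if (n == nullptr) return 0;
    return 1 + std::max(height(n->left), height(n->right));
}

// Height of the right subtree minus height of the left subtree.
int balanceFactor(const AvlNode* n) {
    assert(n != nullptr);
    return height(n->right) - height(n->left);
}

// True if every node's balance factor is +1, 0, or -1.
bool isAvl(const AvlNode* n) {
    if (n == nullptr) return true;
    return std::abs(balanceFactor(n)) <= 1 && isAvl(n->left) && isAvl(n->right);
}
```

A root 20 with left child 10 and right child 40 (whose children are 30 and 50) has balance factor +1; hanging 60 under 50 raises it to +2, so the tree is no longer an AVL tree.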
|
|
|
|
|
However, the concept of AVL trees always
includes implicitly the techniques for balancing trees |
|
and does not guarantee that the resulting tree
is perfectly balanced (unlike all of the other techniques we have seen so
far) |
|
but, an AVL tree can be searched almost as
efficiently as a minimum height binary search tree |
|
but insert and removal are not as efficient |
|
|
|
|
|
AVL trees actually maintain the height close to
minimum by monitoring the shape of the tree as you insert and delete |
|
After you insert/delete |
|
the tree is checked to see if the subtrees of any node
differ in height by more than 1 |
|
if it does, you rearrange the nodes to restore
balance |
|
But, as you can guess, we can’t arbitrarily
rearrange nodes....we must keep proper order |
|
|
|
|
|
What we do is rotate the tree to make it
balanced |
|
Rotations are not necessary after every
insertion & deletion (they are only needed when subtree heights differ by more
than 1) |
|
experiments indicate that deletions in 78% of
the cases require no rebalancing |
|
and only 53% of the insertions do not bring the
tree out of balance |
|
|
|
|
|
Single rotation is one type of rotation: |
|
In the following, the tree was fine after
inserting 20, 10, 40, 30, 50...but when 60 is inserted... |
|
|
|
|
|
Start at the node inserted...move up the tree
(recursively return) |
|
examining the balancing factor |
|
stop when it is not +1, 0, -1 and rotate from
the “heavy” side to the “light” |
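A single left rotation (right side heavy) can be sketched as below, using the 20, 10, 40, 30, 50, 60 example. The node type and function name are assumptions for illustration; the mirror-image right rotation swaps every `left` and `right`.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct AvlNode {
    int data;
    std::unique_ptr<AvlNode> left, right;
    explicit AvlNode(int d) : data(d) {}
};

// Left rotation: the right child (the "heavy" side) becomes the new root
// of this subtree and the old root becomes its left child.
void rotateLeft(std::unique_ptr<AvlNode>& root) {
    assert(root && root->right);
    std::unique_ptr<AvlNode> pivot = std::move(root->right);
    // The pivot's left subtree holds keys between the old root and the
    // pivot, so it becomes the old root's right subtree.
    root->right = std::move(pivot->left);
    pivot->left = std::move(root);
    root = std::move(pivot);
}
```

After inserting 60, the root 20 is out of balance; `rotateLeft` at the root makes 40 the new root, with 20 (children 10 and 30) on the left and 50 (right child 60) on the right. The ordering property is preserved throughout.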
|
|
|
|
|
If a single rotation does not create a balanced
tree |
|
then a double rotation is required |
|
first rotate the subtree rooted at the child on the
“heavy” side |
|
and then rotate at the node where the
problem occurred |
|
there is, however, one special case: |
|
|
|
|
|
In class, walk through a few examples on your
own (and then on the board) building AVL trees |
|
so you can understand the process of rotations |
|
insert: 50,60,30,70,55,20,52,65,40 |
|
or, insert: 10, 20, 30, 40, 50, 60, 70, 80 |
|
what would the corresponding BST and 2-3 tree
look like? |
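To check your hand-built trees, here is a compact sketch of AVL insertion with both single and double rotations. It stores a height in each node rather than a balance factor, and sends equal keys to the right; those choices and all names are assumptions, not the course's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

struct AvlNode {
    int data;
    int height = 1;
    std::unique_ptr<AvlNode> left, right;
    explicit AvlNode(int d) : data(d) {}
};
using Link = std::unique_ptr<AvlNode>;

int heightOf(const Link& n) { return n ? n->height : 0; }
void update(Link& n) {
    n->height = 1 + std::max(heightOf(n->left), heightOf(n->right));
}
// Right height minus left height, as defined on the slides.
int balance(const Link& n) { return heightOf(n->right) - heightOf(n->left); }

void rotateLeft(Link& root) {
    assert(root && root->right);
    Link pivot = std::move(root->right);
    root->right = std::move(pivot->left);
    update(root);
    pivot->left = std::move(root);
    root = std::move(pivot);
    update(root);
}

void rotateRight(Link& root) {
    assert(root && root->left);
    Link pivot = std::move(root->left);
    root->left = std::move(pivot->right);
    update(root);
    pivot->right = std::move(root);
    root = std::move(pivot);
    update(root);
}

void insert(Link& root, int value) {
    if (!root) {
        root = std::make_unique<AvlNode>(value);
        return;
    }
    insert(value < root->data ? root->left : root->right, value);
    update(root);
    if (balance(root) > 1) {                     // right side is heavy
        if (balance(root->right) < 0)            // inner grandchild is heavy:
            rotateRight(root->right);            //   first half of a double rotation
        rotateLeft(root);
    } else if (balance(root) < -1) {             // left side is heavy
        if (balance(root->left) > 0)
            rotateLeft(root->left);
        rotateRight(root);
    }
}
```

Inserting 10 through 80 in order ends with 40 at the root and 4 levels. Inserting 50, 60, 30, 70, 55, 20, 52, 65, 40 never triggers a rotation, so 50 stays at the root.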
|
|
|
|
|
The main question you should be facing with an
AVL tree is |
|
whether or not such restructuring is always
necessary |
|
binary search trees are used to insert,
retrieve, and delete elements quickly and the speed of performing these
operations is the issue, not the shape of the tree |
|
performance can be improved by balancing the
tree but luckily this is not the only method available |
|
|
|
|
|
Now let’s go back to rethinking about how we
organize our nodes |
|
maybe instead of trying to balance the tree we
keep the tree balanced at all times (perfectly balanced) |
|
but the 2-3 tree had a flaw in that there may be
situations where each node is “full” requiring a rippling effect of nodes
being split as you recursively return back to the root |
|
|
|
|
|
A 2-3-4 tree solves this problem |
|
it allows 4-nodes, which are nodes that have 3
pieces of data and 4 children |
|
each insertion and deletion can have fewer steps
than are required by a 2-3 tree (when looking at the insertions/deletions
in isolation) |
|
but does this by using more memory |
|
essentially, each node can have 1,2, or 3 pieces
of data, and 4 child pointers!!!!! |
|
|
|
|
|
In a 2-3-4 tree |
|
a node can either be a leaf or, |
|
if it has 1 data item there are 2 children, |
|
2 data items has 3 children, and |
|
3 data items has 4 children |
|
A 2-3-4 tree remains perfectly balanced |
|
but its insertion algorithm splits the nodes as
it traverses down the tree toward a leaf, rather than upon the return to
the root |
|
|
|
|
|
As you travel down the tree to insert a data
item, |
|
if you encounter a node with 3 pieces of data
you immediately split the node at that time (just as we did with a 2-3
tree...but now we don’t use the new data we are trying to insert...because
we haven’t inserted it yet!) |
|
then, you continue traveling towards a leaf to
insert the data |
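The one-pass insertion can be sketched as below. The vector-based `Node234` layout and the function names are assumptions for this example; equal keys are sent to the right.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Node234 {
    std::vector<int> keys;                            // 1 to 3 sorted keys
    std::vector<std::unique_ptr<Node234>> children;   // empty in a leaf
    bool isLeaf() const { return children.empty(); }
};
using Link = std::unique_ptr<Node234>;

// Splits parent.children[i], which holds 3 keys. Its middle key moves up
// into the parent, which is guaranteed to have room.
void splitChild(Node234& parent, std::size_t i) {
    Link& child = parent.children[i];
    assert(child->keys.size() == 3 && parent.keys.size() < 3);
    auto right = std::make_unique<Node234>();
    right->keys = {child->keys[2]};
    const int middle = child->keys[1];
    if (!child->isLeaf()) {                  // hand over the two right children
        right->children.push_back(std::move(child->children[2]));
        right->children.push_back(std::move(child->children[3]));
        child->children.resize(2);
    }
    child->keys.resize(1);
    parent.keys.insert(parent.keys.begin() + i, middle);
    parent.children.insert(parent.children.begin() + i + 1, std::move(right));
}

void insert(Link& root, int value) {
    if (!root) {
        root = std::make_unique<Node234>();
        root->keys = {value};
        return;
    }
    if (root->keys.size() == 3) {            // full root: the tree grows at the top
        auto newRoot = std::make_unique<Node234>();
        newRoot->children.push_back(std::move(root));
        splitChild(*newRoot, 0);
        root = std::move(newRoot);
    }
    Node234* node = root.get();
    while (!node->isLeaf()) {
        auto i = static_cast<std::size_t>(
            std::upper_bound(node->keys.begin(), node->keys.end(), value) -
            node->keys.begin());
        if (node->children[i]->keys.size() == 3) {
            splitChild(*node, i);                // split before stepping down
            if (value >= node->keys[i]) ++i;     // choose the correct half
        }
        node = node->children[i].get();
    }
    node->keys.insert(
        std::upper_bound(node->keys.begin(), node->keys.end(), value), value);
}
```

Inserting 10 through 80 in order leaves a root holding 20, 40, 60 with four leaves; inserting 90 then splits that full root on the way down, so 40 becomes the only key in the new root.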
|
|
|
|
|
What this means is that a split can never ripple
back up the tree. |
|
In fact, whenever you split a node on the way down, it is
guaranteed that the node's parent does not have 3 pieces of
data... |
|
because if it did, it would have been split on the
way to find the leaf! |
|
|
|
|
|
The advantage of both the 2-3 and 2-3-4 trees |
|
is that balance is easy to maintain (not
that they are shorter, since the reduced height is offset by the extra comparisons required) |
|
where the 2-3-4 tree has an advantage is that
the insertion/deletion algorithms require only one pass through the tree so they
are simpler than those for a 2-3 tree |
|
this decrease in effort makes them
attractive |
|
|
|
|
|
On the other hand, 2-3-4 trees require more
storage than a binary search tree |
|
and more storage (and less efficiently used
storage) than a 2-3 tree |
|
But, a binary search tree may be inappropriate |
|
because it may not be balanced |
|
so we use a red-black tree which is a special
binary search tree |
|
|
|
|
|
A red-black tree is a BST representation of a
2-3-4 tree with 2 extra fields in the node to represent whether each child
pointer is a connection within the same 2-3-4 node (red) or to a child node (black) |
|
it retains the advantages of a 2-3-4 tree
without the storage overhead! |
|
with all of the benefits of a binary search tree
and none of the drawbacks! |
|
|
|
|
The idea is to represent a node with 2 pieces of
data and 3 children as a binary search tree with red and black child
pointers |
|
|
|
|
And, we represent a node with 3 pieces of data
and 4 children as a binary search tree with red and black child pointers |
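Both representations can be sketched with a binary node that carries one color per child pointer. The names `RBNode`, `fromThreeNode`, and `fromFourNode` are assumptions for this example, and only the items of a single 2-3-4 node are converted; its children would hang from black pointers.

```cpp
#include <cassert>
#include <memory>

enum class Color { Red, Black };

// Binary node with a color for each child pointer. A red pointer joins
// items that belong to the same 2-3-4 node; a black pointer leads to a
// true child node.
struct RBNode {
    int data;
    Color leftColor = Color::Black;
    Color rightColor = Color::Black;
    std::unique_ptr<RBNode> left, right;
    explicit RBNode(int d) : data(d) {}
};

// A node with 2 pieces of data (a < b). One of two valid forms: b on top
// with a as its red left child. The mirror image (a on top, b as a red
// right child) is equally valid.
std::unique_ptr<RBNode> fromThreeNode(int a, int b) {
    assert(a < b);
    auto top = std::make_unique<RBNode>(b);
    top->left = std::make_unique<RBNode>(a);
    top->leftColor = Color::Red;
    return top;
}

// A node with 3 pieces of data (a < b < c): b on top with a and c as
// red children.
std::unique_ptr<RBNode> fromFourNode(int a, int b, int c) {
    assert(a < b && b < c);
    auto top = std::make_unique<RBNode>(b);
    top->left = std::make_unique<RBNode>(a);
    top->right = std::make_unique<RBNode>(c);
    top->leftColor = Color::Red;
    top->rightColor = Color::Red;
    return top;
}
```

Because a node with 2 pieces of data has two valid forms, one 2-3-4 tree can correspond to several red-black trees.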
|
|
|
|
|
In class, walk through examples of |
|
2-3 |
|
2-3-4 |
|
AVL |
|
BST |
|
and see how you can take a 2-3-4 and turn it
into a red black tree (make sure to read the chapter on advanced trees!!!) |
|
|
|
|
|
For next time, |
|
practice creating each of these trees on your
own so that you understand the insertion algorithms |
|
think about what would be needed to remove nodes
from these trees |
|
try deleting a leaf and an internal node from
your 2-3, AVL, and 2-3-4 trees |
|