Data Structures
Topic #10

Today’s Agenda
Continue Discussing Trees
Examine more advanced trees
2-3 (evaluate what we learned)
B-Trees
AVL
2-3-4
red-black trees

Discuss 2-3 Trees
A 2-3 tree is always balanced
Therefore, you can search it in all situations with logarithmic efficiency of the binary search
You might be concerned about the extra work in the insertion/deletion algorithms to split and merge the nodes...

Discuss 2-3 Trees
But, rigorous mathematical analysis has proved that this extra work to maintain structure is not significant
It is sufficient to consider only the time required to locate an item (or a position to insert)

Discuss 2-3 Trees
So, if 2-3 trees are so good, why not have nodes that can have more data items and more than 3 children?
Well, remember why 2-3 trees are great?
because they are balanced and that balanced structure is pretty easy to maintain

Discuss 2-3 Trees
The advantage is not that the tree is shorter than a balanced binary search tree
the reduction in height is actually offset by the extra comparisons that have to be made to find out which branch to take
actually a binary search tree that is balanced minimizes the amount of work required to support ADT table operations

Discuss 2-3 Trees
But, with binary search trees balance is hard to maintain
A 2-3 tree is really a compromise
Searching may not be quite as efficient as a binary tree of minimum height
but, it is relatively simple to maintain

Discuss 2-3 Trees
Allowing nodes to have more than 3 children would require more comparisons and would therefore be counter productive
unless you are working with external storage and each node requires a disk access, then we use b-trees which have the minimum height possible

Discuss B-Trees
Tables stored externally can be searched with B-Trees.
B-Trees are a more generalized approach than the 2-3 Tree
With externally stored tables, we want to keep the search tree as short as possible; it is much faster to do extra comparisons at a particular node than try to find the next node.

Discuss B-Trees
Every time we want to get another node,
we have to access the external file and read in the appropriate information.
It takes far less time to operate on a particular node (i.e., doing comparisons) once it has been read in.
This means that for externally stored tables we should try to reduce the height of the tree...even if it means doing more comparisons at every node.

Discuss B-Trees
Therefore, with an external search tree,
we allow each node to have as many children as possible.
If a node is to have m children, then you must be able to allocate enough memory for that node to contain the data and m pointers to the node.
The data such a node must have must be m-1 key values.

Discuss B-Trees
Remember in a binary search tree,
if a node has 2 children then it contains one data value (i.e., one value).
You can think of the data value at a node as separating the data values in the two child subtrees.
All keys to the left are less than the node's data value and all key values to the right are greater than or equal.
The value of the data at a particular node tells you which branch to take.

Discuss B-Trees
In a 2-3 tree,
if a node has 3 children then it must contain two key values.
These two values separate the key values in the node's three child subtrees.
All of the key values in the left subtree are less than the node's smaller key value;
all of the key values in the middle subtree are between the node's two key values;
all of the key values in the right subtree are greater than or equal to the node's larger key

Discuss B-Trees
Ideally, you should structure these types of trees such that every internal node has m children and all leaves are at the same level.
For example, if m is 5 -- then every node should have 5 children and 4 data values.
But, this is too difficult to maintain when you are doing a variety of insertions and deletions.

Discuss B-Trees
So, we can require that B-trees be balanced (as we saw with 2-3 trees)...
but the number of children for any internal node should be able to be somewhere between m and (m div 2)+1.
We call this a B-Tree of degree m
This requires that all leaves be at the same level (balanced).

Discuss B-Trees
Each node contains between m-1 and (m div 2) values.
Each internal node has one more child than it has values.
There is one exception;
the root of the tree can contain as few as 1 value and can have as few as two children (or none -- if the tree consists of only a root!).

Discuss B-Trees
Notice, a 2-3 tree is a B-tree of degree 3.
Data can be inserted into a B-tree using the same strategy
of splitting and
merging nodes
that we discussed
Here is a B-tree
of degree 5:

Discuss B-Trees
Then, insert 55.
The first step is to locate the leaf of the tree in which this index belongs by determining where the search for 55 would terminate.
We would find that we would want to insert 55 in the node containing 50,56,57, 58.
But, that would cause this node to contain 5 records. Since a node can contain only 4 records, you must split this node into two...the new left node gets the two smaller values and the new right node gets the two larger values.

Discuss B-Trees
The record with the middle key value (56) is moved up to the parent:

Discuss B-Trees
This causes two problems,
the parent now has six children and five records!!
So, we must split the parent into two nodes and move the middle data value up to its parent.
Remember, when we split an internal node, we need to also move that node's children too
Since the root only has 2 data items, we can simply add 56 there.
The solution is on the next slide...

Discuss B-Trees

Discuss B-Trees
Notice, that if the root had needed to be spit,
the new root will contain only one value and have only 2 children (that is why we have the exception to the B-Tree definition stated earlier).
To traverse a B-Tree in sorted order, all we need to do is visit the search keys in sorted order by using an inorder traversal of the B-Tree.

Balancing Algorithms
But, are there other alternatives?
Remember the advantage of trees is that they are well suited for problems that are hierarchical in nature and they are much faster than linked lists
but, this is not valid if the tree in not balanced
luckily, there are a number of techniques to balance a binary tree

Balancing Algorithms
Some of the balancing techniques require constant restructuring of the tree as data is inserted
the AVL algorithm uses this approach
Some algorithms consist of build an unbalanced tree and then reordering the data once the tree is generated
this can be simple but depending on the frequency of data being inserted, it may not be realistic

Balancing Algorithms
The “brute force” technique is to create an array of pointers to your data by traversing an unbalanced BST using “inorder” traversal
then re-build the tree by splitting the array in the middle for each subarray (much like what we have seen with the binary search algorithm used with arrays)
the middle data item should be the root, as it splits what is less than it, and what is greater!

Balancing Algorithms
The algorithm for the “brute force” approach is:
balance(data_type data [], int first, int last)
if (first <= last) {
   int middle = (first + last)/2;
  insert(data[middle]);
  balance(data, first, middle-1);
  balance(data, middle+1, last);

Balancing Algorithms
The “brute force” technique has a serious drawback
all of the data must be put in an array before a balanced tree can be created
what would happen if you weren’t using pointers to the data but instances of the data?
if an unbalanced tree is not used (i.e., the data is directly inserted into the array from the client), then a sorting algorithm must be used and fixed size issues arise

AVL Trees
The AVL tree is a classical method proposed by Adelson-Velskii and Landis
creates an “admissible tree” (its original name!)
focuses on rebalancing the tree locally to the portion of the tree affected by insertion and deletions
it allows the height of the left and right subtrees of every node to differ by at most one

AVL Trees
With AVL trees
each node must now keep track of the “balance factors” which records the differences between the heights of the left and right subtrees
the balance factor is the height of the right subtree minus the height of the left subtree
all balance factors must be +1, 0, or -1
notice, this does meet the definition we learned about for a balanced tree

AVL Trees
However, the concept of AVL trees always includes implicitly the techniques for balancing trees
and does not guarantee that the resulting tree is perfectly balanced (unlike all of the other techniques we have seen so far)
but, an AVL tree can be searched almost as efficiently as a minimum height binary search tree
but insert and removal are not as efficient

AVL Trees
AVL trees actually maintains the height close to minimum by monitoring the shape of the tree as you insert and delete
After you insert/delete
the tree is checked to see if any node differs by more than 1 in height
if it does, you rearrange the nodes to restore balance
But, as you can guess, we can’t arbitrarily rearrange nodes....we must keep proper order

AVL Trees
What we do is rotate the tree to make it balanced
Rotations are not necessary after every insertion & deletion (it is only needed when the height differs by more than 1)
experiments indicate that deletions in 78% of the cases require no rebalancing
and only 53% of the insertions do not bring the tree out of balance

AVL Trees
Single rotation is one type of rotation:
In the following, the tree was fine after inserting 20, 10, 40, 30, 50...but when 60 is inserted...

AVL Trees
Start at the node inserted...move up the tree (recursively return)
examining the balancing factor
stop when it is not +1, 0, -1 and rotate from the “heavy” side to the “light”

AVL Trees
If a single rotation does not create a balanced tree
then a double rotation is required
first rotate the subtree at the root where the problem occurred
and then rotate the tree’s root
there is, however, on special case:

AVL Trees
In class, walk through a few examples on your own (and then on the board) building AVL trees
so you can understand the process of rotations
insert: 50,60,30,70,55,20,52,65,40
or, insert: 10, 20, 30, 40, 50, 60, 70, 80
what would the corresponding BST and 2-3 tree looked like?

AVL Trees
The main question you should be facing with an AVL tree is
whether or not such restructuring is always necessary
binary search trees are used to insert, retrieve, and delete elements quickly and the speed of performing these operations i the issue, not the shape of the tree
performance can be improved by balancing the tree but luckily this is not the only method available

2-3-4 and red-black Trees
Now let’s go back to rethinking about how we organize our nodes
maybe instead of trying to balance the tree we keep the tree balancing at all times (perfectly balanced)
but the 2-3 tree had a flaw in that there may be situations where each node is “full” requiring a rippling effect of nodes being split as you recursively return back to the root

2-3-4 and red-black Trees
A 2-3-4 tree solves this problem
which allows 4-nodes which are nodes that have 4 pieces of data and 3 children
each insertion and deletion can have fewer steps than are required by a 2-3 tree (when looking at the insertions/deletions in isolation)
but does this by using more memory
essentially, each node can have 1,2, or 3 pieces of data, and 4 child pointers!!!!!

2-3-4 and red-black Trees
A 2-3-4 tree solves this problem
a node can either be a leaf or,
if it has 1 data item there are 2 children,
2 data items has 3 children, and
3 data items has 4 children
A 2-3-4 tree remains perfectly balanced
but its insertion algorithm splits the nodes as it traverses down the tree toward a leaf, rather than upon the return to the root

2-3-4 and red-black Trees
As you travel down the tree to insert a data item,
if you encounter a node with 3 pieces of data you immediate split the node at that time (just as we did with a 2-3 tree...but now we don’t use the new data we are trying to insert...because we haven’t inserted it yet!)
then, you continue traveling towards a leaf to insert the data

2-3-4 and red-black Trees
What this means is that the tree cannot contain all nodes with 3 pieces of data. Impossible.
In fact, on insert, once you insert data at a leaf it is guaranteed that the leaf’s parent will not have 3 pieces of data...
because if it did, it would have split on the way to find the leaf!

2-3-4 and red-black Trees
The advantage of both the 2-3 and 2-3-4 trees
is that they are easy to maintain balance (not that their height is shorter due to the extra comparisons required)
where the 2-3-4 tree has an advantage is that the insertion/deletion algs require only one pass through the tree so they are simpler than those for a 2-3 tree
decrease in effort makes them attractive..........

2-3-4 and red-black Trees
On the other hand, 2-3-4 trees require more storage than a binary search tree
and more storage (and less efficiently used storage) than a 2-3 tree
But, a binary search tree may be inappropriate
because it may not be balanced
so we use a red-black tree which is a special binary search tree

2-3-4 and red-black Trees
A red-black tree is a BST representation of a 2-3-4 tree with 2 extra fields in the node to represent whether the connection is within the current node or a child
it retains the advantages of a 2-3-4 tree without the storage overhead!
with all of the benefits of a binary search tree and none of the drawbacks!

2-3-4 and red-black Trees
The idea is to represent a node with 2 pieces of data and 3 children as a binary search tree with red and black child pointers

2-3-4 and red-black Trees
And, we represent a node with 3 pieces of data and 4 children as a binary search tree with red and black child pointers

2-3-4 and red-black Trees
In class, walk through examples of
2-3
2-3-4
AVL
BST
and see how you can take a 2-3-4 and turn it into a red black tree (make sure to read the chapter on advanced trees!!!)

2-3-4 and red-black Trees
For next time,
practice creating each of these trees on your own so that you understand the insertion algorithms
think about what would be needed to remove nodes from these trees
try deleting a leaf and an internal node from you 2-3, AVL, and 2-3-4 trees