Notes
Outline
Data Structures
Topic #8
Today’s Agenda
Continue Discussing Table Abstractions
But, this time, let’s talk about them in terms of new non-linear data structures
trees
which require that our data be organized in a hierarchical fashion
Tree Introduction
Remember when we learned about tables?
We found that none of the methods for implementing tables was really adequate.
With many applications, table operations end up not being as efficient as necessary.
We found that hashing is good for retrieval, but doesn't help if our goal is also to obtain a sorted list of information.
Tree Introduction
We found that the binary search also allows for fast retrieval,
but is limited to array implementations versus linked list.
Because of this, we need to move to more sophisticated implementations of tables, using binary search trees!
These are "nonlinear" implementations of the ADT table.
Tree Terminology
Trees are used to represent the relationship between data items.
All trees are hierarchical in nature which means there is a parent-child relationship between "nodes" in a tree.
The lines between nodes are called directed edges.
If there is a directed edge from node A to node B -- then A is the parent of B and B is a child of A.
Tree Terminology
Children of the same parent are called siblings.
Each node in a tree has at most one parent, starting at the top with the root node (which has no parent).
Parent of n The node directly above node n in the tree
Child of n The node directly below the node n in the tree
Tree Terminology
Root The only node in the tree with no parent
Leaf A node with no children
Siblings Nodes with a common parent
Ancestor of n A node on the path from the root to n
Tree Terminology
Descendant of n
A node on a path from n to a leaf
Empty tree
A tree with no nodes
Subtree of n
A tree that consists of a child of n and the child's descendants
Height
The number of nodes on the longest path from root to a leaf
Tree Terminology
Binary Tree
A tree in which each node has at most two children
Full Binary Tree
A binary tree of height h whose leaves are all at the level h and whose nodes all have two children; this is considered to be completely balanced
Binary Trees
A binary tree is a tree where each node has no more than 2 children.
If we traverse down a binary tree -- for every node -- there are either no children (making this node a leaf) or there are two children called the left and right subtrees
(A subtree is a subset of a tree including some node in the tree along with all of its descendants).
Binary Search Trees
The nodes of a binary tree contain values.
For a binary search tree, it is really sorted according to the key values in the nodes.
It allows us to traverse a binary tree and get our data in sorted order!
For example, for each node n, all values greater than n are located in the right subtree...all values less than n are located in the left subtree. Both subtrees are considered to be binary trees themselves.
Binary Search Trees
NOT a Binary Search Tree
Binary Search Trees
Notice that a binary tree organizes data in a way that facilitates searching the tree for a particular data item.
It ends up solving the problems of sorted-traversal with the linear implementations of the ADT table.
And, if reasonably balanced, it can provide a logarithmic retrieval, removal, and insertion performance!
Binary Trees
Before we go on, let's make sure we understand some concepts about trees.
Trees can come in many different shapes. Some trees are taller than others.
To find the height of a tree, we need to find the distance from the root to the farthest leaf. Or....you could think of it as the number of nodes on the longest path from the root to a leaf.
Binary Trees
Each of these trees has the same number of nodes -- but different heights:
Binary Trees
You will find that experts define heights differently.
For example, just by intuition you would think that the trees shown previously have a height of 2 and 4.
But, for the cleanest algorithms, we are going to define the height of a tree as the following (next slide)
Binary Trees
If a node is a root, the level is 1. If a node is not the root,
then it has a level 1 greater than its parent.
If the tree is entirely empty,
then it has a height of zero.
Otherwise, its height is equal to the maximum level of its nodes.
Using this definition,
the trees shown previously have the height of 3, 5, and 5.
Full Binary Trees
Now let's talk about full, complete, and balanced binary trees.
A full binary tree has all of its leaves at level h.
In the previous diagram, only the left hand tree is a full binary tree!
All nodes that are at a level less than the height of the tree have 2 children.
Complete Binary Trees
A complete binary tree is one which is a full binary tree to a level of its height-1 ...
then at the last level, it is filled from left to right. For example:
Binary Search Trees
This has a height of 4 and is a full binary tree at level 3.
But, at level 4, the leaves are filled in from left to right!
From this definition, we realize that a full binary tree is also considered to be a complete binary tree.
However, a complete binary tree does not necessarily mean it is full!
Implementing Binary Trees
Just like other ADTs,
we can implement a binary tree using pointers or arrays.
A pointer based implementation example:
struct node {
data value;
node * left_child;
node * right_child;
};
Binary Search Trees
In what situations would the data being “stored” in the node...
be represented by a pointer to the data?
struct node {
data * ptr_value;
when more than a single data structure needs to reference the same tree (e.g., two binary search trees referencing the same data but organized on two different keys!)
Binary Search Trees
In what situations would the data being “stored” in the node...
be represented by a pointer to a LLL node?
struct tree_node {
LLL_node * head;
when each node’s data is actually a list of items (a general purpose list, stack, queue, or other ordered list representation)
Binary Search Trees
In what situations would the data being “stored” in the node...
be represented by an array of data?
struct tree_node {
data ** array;
when each node’s data is actually a list of items (a general purpose list, stack, queue, or other ordered list representation), but where the size and efficiency of this data structure is preferred over a LLL
Implementing Binary Trees
class binary_tree {
public:
binary_tree();
~binary_tree();
int insert(const data &);
int remove(const key &);
int retrieve (const key &, data [], int & num_matches);
void display();
Implementing Binary Trees
//continued....class interface
private:
node * root;
};
Notice that instead of using “head” we use “root” to establish the “starting point” in the tree
If the tree is empty, root is NULL.
Implementing Binary Trees
Implementing Binary Trees
When we implement binary tree algorithms
we have a choice of using iteration or recursion and still have reasonably efficient results
remember why we didn’t use recursion for traversing through a standard linear linked list?
now, if the tree is reasonably balanced, we can traverse through a tree with a minimal number of recursive calls
Traversal through BSTs
Remember that a binary tree is either empty or it is in the form of a Root with two subtrees.
If the Root is empty, then the traversal algorithm should take no action (i.e., this is an empty tree -- a "degenerate" case).
If the Root is not empty, then we need to print the information in the root node and start traversing the left and right subtrees.
When a subtree is empty, then we know to stop traversing it.
Traversal through BSTs
Given all of this, the recursive traversal algorithm is:
Traverse (Root)
If the Tree is not empty then
Visit the node at the Root (maybe display)
Traverse(Left subtree)
Traverse(Right subtree)
Traversal through BSTs
But, this algorithm is not really complete.
When traversing any binary tree, the algorithm should have 3 choices of when to process the root:
before it traverses both subtrees (like this algorithm),
after it traverses the left subtree,
or after it traverses both subtrees.
Each of these traversal methods has a name: preorder, inorder, postorder.
Traversal through BSTs
You've already seen what the preorder traversal algorithm looks like...
it would traverse the following tree as: 60,20,10,5,15,40,30,70,65,85
but what would it be using inorder traversal?
or, post order traversal?
Traversal through BSTs
The inorder traversal algorithm would be:
Traverse (Root)
If the Tree is not empty then
Traverse(Left subtree)
Visit the node at the Root (display)
Traverse(Right subtree)
Traversal through BSTs
It would traverse the same tree as: 5,10,15,20,30,40,60,65,70,85;
Notice that this type of traversal produces the numbers in order.
Search trees can be set up so that
  all of the nodes in the left subtree
  are less than the nodes in the
  right subtree.
Traversal through BSTs
The postorder traversal is:
If the Tree is not empty then
Traverse(Left subtree)
Traverse(Right subtree)
Visit the node at the Root (maybe display)
It would traverse the same tree as:
5, 15, 10,30,40,20,65,85,70,60
Traversal through BSTs
Think about the code to traverse a tree inorder using a pointer based implementation:
void inorder_print(tree root) {
if (root) {
inorder_print(root->left_child);
cout <<root->value.name);
inorder_print(root->right_child);
}
Traversal through BSTs
Why do we pass root by value vs. by reference?
void inorder_print(tree root) {
Why don’t we say??
root = root->left_child;
As an exercise, try to write a nonrecursive version of this!
Using BSTs for Table ADTs
We can implement our ADT Table operations using a nonlinear approach of a binary search tree.
This provides the best features of a linear implementation that we previously talked about plus you can insert and delete items without having to shift data.
With a binary search tree we are able to take advantage of dynamic memory allocation.
Using BSTs for Table ADTs
Linear implementations of ADT table operations are still useful.
Remember when we talked about efficiency, it isn't good to overanalyze our problems.
If the size of the problem is small, it is unlikely that there will be enough efficiency gain to implement more difficult approaches.
In fact, if the size of the table is small using a linear implementation makes sense because the code is simple to write and read!
Using BSTs for Table ADTs
For test operations, we must define a binary search tree where for each node -- the search key is greater than all search keys in the left subtree and less than all search keys in the right subtree.
Since this is implicitly a sorted tree when we traverse it inorder, we can write efficient algorithms for retrieval, insertion, deletion, and traversal.
Remember, traversal of linear ADT tables was not a straightforward process!
Using BSTs for Table ADTs
Let's quickly look at a search algorithm for a binary search tree implemented using pointers (i.e., implementing our Retrieve ADT Table Operation):
The following is pseudo code:
int retrieve (tree *root, key &k, data & value){
if (!root) //we have an empty tree
return 0;
Using BSTs for Table ADTs
else if (root->value == k) {
value = root->value;
return 1;
}
else if (k < root->value) return retrieve(root->left_child, k,data);
else
return retrieve(root->right_child, k, data);
}
For Next Time...
To prepare for next class
write C++ code to insert a new data item at a leaf in the appropriate sub-tree using the binary search tree concept
think about what you might need to do to then remove an item?
what special cases will we need to consider?
how might we make a copy of a binary search tree?