Data Structures

Topic #8

Today’s Agenda

Continue Discussing Table Abstractions

But, this time, let’s talk about them in terms of new non-linear data structures

trees

which require that our data be organized in a hierarchical fashion

Tree Introduction

Remember when we learned about tables?

We found that none of the methods for implementing tables was really adequate.

With many applications, table operations end up not being as efficient as necessary.

We found that hashing is good for retrieval, but doesn't help if our goal is also to obtain a sorted list of information.

Tree Introduction

We found that the binary search also allows for fast retrieval,

but is limited to array implementations; it does not work well with linked lists.

Because of this, we need to move to more sophisticated implementations of tables, using binary search trees!

These are "nonlinear" implementations of the ADT table.

Tree Terminology

Trees are used to represent the relationship between data items.

All trees are hierarchical in nature which means there is a parent-child relationship between "nodes" in a tree.

The lines between nodes are called directed edges.

If there is a directed edge from node A to node B -- then A is the parent of B and B is a child of A.

Tree Terminology

Children of the same parent are called siblings.

Each node in a tree has at most one parent, starting at the top with the root node (which has no parent).

Parent of n The node directly above node n in the tree

Child of n The node directly below the node n in the tree

Tree Terminology

Root The only node in the tree with no parent

Leaf A node with no children

Siblings Nodes with a common parent

Ancestor of n A node on the path from the root to n

Tree Terminology

Descendant of n

A node on a path from n to a leaf

Empty tree

A tree with no nodes

Subtree of n

A tree that consists of a child of n and the child's descendants

Height

The number of nodes on the longest path from root to a leaf

Tree Terminology

Binary Tree

A tree in which each node has at most two children

Full Binary Tree

A binary tree of height h whose leaves are all at level h and whose non-leaf nodes all have two children; this is considered to be completely balanced

Binary Trees

A binary tree is a tree where each node has no more than 2 children.

If we traverse down a binary tree -- for every node -- there are either no children (making this node a leaf), one child, or two children. The children are the roots of the node's left and right subtrees, either of which may be empty

(A subtree is a subset of a tree including some node in the tree along with all of its descendants).

Binary Search Trees

The nodes of a binary tree contain values.

In a binary search tree, the nodes are organized according to their key values.

This allows us to traverse the tree and get our data in sorted order!

For each node n, all values greater than n's value are located in its right subtree and all values less than n's value are located in its left subtree. Both subtrees are themselves binary search trees.
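The ordering rule above can be checked mechanically. The following is a minimal sketch, assuming a node that stores a plain int key and that duplicate keys are not allowed (both are assumptions for illustration, and is_bst is a hypothetical helper name):

```cpp
#include <cassert>
#include <climits>

// Hypothetical node type for illustration; keys are plain ints here.
struct node {
    int value;
    node * left_child;
    node * right_child;
};

// Returns true if every key in the tree rooted at `root` lies strictly
// between `low` and `high` and the ordering rule holds at every node.
bool is_bst(const node * root, long long low, long long high) {
    if (!root)
        return true;                        // an empty tree is a valid BST
    if (root->value <= low || root->value >= high)
        return false;                       // key is outside the allowed range
    return is_bst(root->left_child, low, root->value) &&
           is_bst(root->right_child, root->value, high);
}

// Convenience overload: the root may hold any int key.
bool is_bst(const node * root) {
    return is_bst(root, LLONG_MIN, LLONG_MAX);
}
```

Notice that it is not enough to compare each node with its two children; a key deep in the left subtree must still be smaller than every ancestor it sits to the left of, which is why the allowed range is passed down.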

Binary Search Trees

NOT a Binary Search Tree

Binary Search Trees

Notice that a binary search tree organizes data in a way that facilitates searching the tree for a particular data item.

It ends up solving the sorted-traversal problem we had with the linear implementations of the ADT table.

And, if reasonably balanced, it can provide a logarithmic retrieval, removal, and insertion performance!
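To see why balance matters, here is a rough calculation. It assumes height is counted in nodes (root at level 1) and uses the fact that a binary tree of height h can hold at most 2^h - 1 nodes; the one-million figure is only an illustrative number:

```latex
n \le 2^{h} - 1
\quad\Longrightarrow\quad
h \ge \lceil \log_2 (n + 1) \rceil ,
\qquad
n = 1{,}000{,}000 \;\Rightarrow\; h \ge 20
```

So a well balanced tree of a million items can be searched by looking at about 20 nodes, while a tree that has degenerated into one long chain could require looking at all one million.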

Binary Trees

Before we go on, let's make sure we understand some concepts about trees.

Trees can come in many different shapes. Some trees are taller than others.

To find the height of a tree, we look at the path from the root to the farthest leaf. We will think of the height as the number of nodes on the longest path from the root to a leaf.

Binary Trees

Each of these trees has the same number of nodes -- but different heights:

Binary Trees

You will find that experts define heights differently.

For example, just by intuition you would think that the trees shown previously have heights of 2, 4, and 4.

But, for the cleanest algorithms, we are going to define the height of a tree as the following (next slide)

Binary Trees

If a node is the root, its level is 1. If a node is not the root, then its level is 1 greater than its parent's level.

If the tree is entirely empty, then it has a height of zero.

Otherwise, its height is equal to the maximum level of its nodes.

Using this definition, the trees shown previously have heights of 3, 5, and 5.
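This definition translates almost word for word into a recursive function. The following is a minimal sketch, assuming a node with an int value (the node layout and the name height are chosen here for illustration):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical node type for illustration.
struct node {
    int value;
    node * left_child;
    node * right_child;
};

// Height counted in nodes: an empty tree has height 0, and any other
// tree is one level taller than its taller subtree.
int height(const node * root) {
    if (!root)
        return 0;
    return 1 + std::max(height(root->left_child),
                        height(root->right_child));
}
```

Defining the empty tree's height as zero is what keeps the function this short: there is no special case for leaves.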

Full Binary Trees

Now let's talk about full, complete, and balanced binary trees.

A full binary tree of height h has all of its leaves at level h.

In the previous diagram, only the left hand tree is a full binary tree!

All nodes that are at a level less than the height of the tree have 2 children.

Complete Binary Trees

A complete binary tree of height h is a full binary tree down to level h-1, with the last level filled in from left to right. For example:

Binary Search Trees

This has a height of 4 and is a full binary tree down to level 3.

But, at level 4, the leaves are filled in from left to right!

From this definition, we realize that a full binary tree is also considered to be a complete binary tree.

However, a complete binary tree is not necessarily full!
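Both definitions can be tested in code. The sketch below is one possible approach, not the only one: is_full relies on the fact that a full tree of height h has exactly 2^h - 1 nodes, and is_complete walks the tree level by level and rejects any node that appears after a gap. The function names and the int-valued node are assumptions for illustration, and is_full assumes a tree shorter than 31 levels so the shift does not overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>

// Hypothetical node type for illustration.
struct node {
    int value;
    node * left_child;
    node * right_child;
};

int height(const node * root) {
    if (!root) return 0;
    return 1 + std::max(height(root->left_child), height(root->right_child));
}

int count_nodes(const node * root) {
    if (!root) return 0;
    return 1 + count_nodes(root->left_child) + count_nodes(root->right_child);
}

// A full binary tree of height h holds exactly 2^h - 1 nodes.
bool is_full(const node * root) {
    return count_nodes(root) == (1 << height(root)) - 1;
}

// Level-order walk: once an empty position has been seen, a complete
// tree may not contain any further nodes.
bool is_complete(const node * root) {
    std::queue<const node *> pending;
    pending.push(root);
    bool seen_gap = false;
    while (!pending.empty()) {
        const node * current = pending.front();
        pending.pop();
        if (!current) {
            seen_gap = true;
            continue;
        }
        if (seen_gap)
            return false;               // a node appears after a gap
        pending.push(current->left_child);
        pending.push(current->right_child);
    }
    return true;
}
```

A root with only a left child is complete but not full; a root with only a right child is neither.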

Implementing Binary Trees

Just like other ADTs, we can implement a binary tree using pointers or arrays.

A pointer based implementation example:

struct node {

data value;

node * left_child;

node * right_child;

};
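For comparison, here is a sketch of the index arithmetic behind one common array based layout. It assumes the nodes are stored level by level with the root at index 0, which works best for complete trees; this layout and the helper names are assumptions for illustration, and other array layouts (for example, storing child indices in each element) are possible.

```cpp
#include <cassert>
#include <cstddef>

// Index arithmetic for a binary tree stored level by level in an array,
// with the root at index 0 (an assumed layout, not taken from the slides).
std::size_t left_index(std::size_t i)  { return 2 * i + 1; }
std::size_t right_index(std::size_t i) { return 2 * i + 2; }

std::size_t parent_index(std::size_t i) {
    assert(i > 0);                      // the root has no parent
    return (i - 1) / 2;
}

// True when slot i holds a node in a complete tree of `count` nodes.
bool in_tree(std::size_t i, std::size_t count) {
    return i < count;
}
```

With the keys 60, 20, 70, 10, 40, 65, 85 stored in that order, the children of 20 (index 1) are found at indices 3 and 4, with no pointers stored at all.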

Binary Search Trees

In what situations would the data being “stored” in the node...

be represented by a pointer to the data?

struct node {

data * ptr_value;

when more than one data structure needs to reference the same data (e.g., two binary search trees referencing the same data but organized on two different keys!)

Binary Search Trees

In what situations would the data being “stored” in the node...

be represented by a pointer to a LLL node?

struct tree_node {

LLL_node * head;

when each node’s data is actually a list of items (a general purpose list, stack, queue, or other ordered list representation)

Binary Search Trees

In what situations would the data being “stored” in the node...

be represented by an array of data?

struct tree_node {

data ** array;

when each node’s data is actually a list of items (a general purpose list, stack, queue, or other ordered list representation), but where the size and efficiency of this data structure is preferred over a LLL

Implementing Binary Trees

class binary_tree {

public:

binary_tree();

~binary_tree();

int insert(const data &);

int remove(const key &);

int retrieve (const key &, data [], int & num_matches);

void display();

Implementing Binary Trees

//continued....class interface

private:

node * root;

};

Notice that instead of using “head” we use “root” to establish the “starting point” in the tree

If the tree is empty, root is NULL.
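As a small illustration of the role of root, here is a sketch of just the constructor and destructor. It assumes an int-valued node, and the is_empty member and the private destroy helper are hypothetical additions that are not part of the interface above; insert, remove, retrieve, and display are left out on purpose.

```cpp
#include <cassert>

// Hypothetical node type for illustration.
struct node {
    int value;
    node * left_child;
    node * right_child;
};

class binary_tree {
public:
    binary_tree() : root(nullptr) {}        // an empty tree: root is NULL
    ~binary_tree() { destroy(root); }

    // Copying is disabled in this sketch so two trees never share nodes.
    binary_tree(const binary_tree &) = delete;
    binary_tree & operator=(const binary_tree &) = delete;

    bool is_empty() const { return root == nullptr; }

private:
    // Release both subtrees before the node that points to them.
    static void destroy(node * current) {
        if (!current)
            return;
        destroy(current->left_child);
        destroy(current->right_child);
        delete current;
    }

    node * root;
};
```

The destructor has to delete the children before their parent; otherwise the pointers to the subtrees would be lost.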

Implementing Binary Trees

When we implement binary tree algorithms

we have a choice of using iteration or recursion and still have reasonably efficient results

remember why we didn’t use recursion for traversing through a standard linear linked list?

now, if the tree is reasonably balanced, the depth of the recursion is only about the height of the tree (roughly log2 of the number of nodes), even though every node is still visited

Traversal through BSTs

Remember that a binary tree is either empty or it is in the form of a Root with two subtrees.

If the Root is empty, then the traversal algorithm should take no action (i.e., this is an empty tree -- a "degenerate" case).

If the Root is not empty, then we need to print the information in the root node and start traversing the left and right subtrees.

When a subtree is empty, then we know to stop traversing it.

Traversal through BSTs

Given all of this, the recursive traversal algorithm is:

Traverse (Root)

If the Tree is not empty then

Visit the node at the Root (maybe display)

Traverse(Left subtree)

Traverse(Right subtree)

Traversal through BSTs

But, this algorithm is not really complete.

When traversing any binary tree, the algorithm should have 3 choices of when to process the root:

before it traverses both subtrees (like this algorithm),

after it traverses the left subtree,

or after it traverses both subtrees.

Each of these traversal methods has a name: preorder, inorder, postorder.

Traversal through BSTs

You've already seen what the preorder traversal algorithm looks like...

it would traverse the following tree as: 60,20,10,5,15,40,30,70,65,85

but what would it be using inorder traversal?

or, postorder traversal?

Traversal through BSTs

The inorder traversal algorithm would be:

Traverse (Root)

If the Tree is not empty then

Traverse(Left subtree)

Visit the node at the Root (display)

Traverse(Right subtree)

Traversal through BSTs

It would traverse the same tree as: 5, 10, 15, 20, 30, 40, 60, 65, 70, 85

Notice that this type of traversal produces the numbers in order.

Search trees can be set up so that all of the nodes in the left subtree are less than the nodes in the right subtree.

Traversal through BSTs

The postorder traversal is:

If the Tree is not empty then

Traverse(Left subtree)

Traverse(Right subtree)

Visit the node at the Root (maybe display)

It would traverse the same tree as:

5, 15, 10,30,40,20,65,85,70,60
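The three traversals can be compared side by side in code. This is a sketch under stated assumptions: the nodes hold plain ints, the visit step appends to a vector instead of displaying, and the tree shape in build_example_tree is reconstructed from the traversal sequences quoted on these slides. All function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node type for illustration.
struct node {
    int value;
    node * left_child;
    node * right_child;
};

void preorder(const node * root, std::vector<int> & out) {
    if (!root) return;
    out.push_back(root->value);             // visit before both subtrees
    preorder(root->left_child, out);
    preorder(root->right_child, out);
}

void inorder(const node * root, std::vector<int> & out) {
    if (!root) return;
    inorder(root->left_child, out);
    out.push_back(root->value);             // visit between the subtrees
    inorder(root->right_child, out);
}

void postorder(const node * root, std::vector<int> & out) {
    if (!root) return;
    postorder(root->left_child, out);
    postorder(root->right_child, out);
    out.push_back(root->value);             // visit after both subtrees
}

node * make_node(int value, node * left = nullptr, node * right = nullptr) {
    return new node{value, left, right};
}

// Tree shape inferred from the preorder and inorder sequences.
node * build_example_tree() {
    return make_node(60,
        make_node(20,
            make_node(10, make_node(5), make_node(15)),
            make_node(40, make_node(30))),
        make_node(70, make_node(65), make_node(85)));
}

// Freeing the tree is itself a postorder traversal.
void destroy(node * root) {
    if (!root) return;
    destroy(root->left_child);
    destroy(root->right_child);
    delete root;
}
```

Calling each function on the tree from build_example_tree fills the vector with the same three sequences listed on the slides. The only difference between the functions is where the visit line sits relative to the two recursive calls.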

Traversal through BSTs

Think about the code to traverse a tree inorder using a pointer based implementation:

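void inorder_print(node * root) {
    if (root) {                              // an empty subtree ends the recursion
        inorder_print(root->left_child);     // everything smaller first
        cout << root->value.name << endl;    // then this node
        inorder_print(root->right_child);    // then everything larger
    }
}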

Traversal through BSTs

Why do we pass root by value rather than by reference?

void inorder_print(node * root) {

Why don’t we just write the following?

root = root->left_child;

As an exercise, try to write a nonrecursive version of this!
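If you want to check your answer, here is one possible nonrecursive version. It is only a sketch: it assumes an int-valued node, collects the values in a vector rather than printing them, and uses std::stack to remember the nodes that the recursive version would have kept on the call stack. The name inorder_iterative is hypothetical.

```cpp
#include <cassert>
#include <stack>
#include <vector>

// Hypothetical node type for illustration.
struct node {
    int value;
    node * left_child;
    node * right_child;
};

// Inorder traversal without recursion.
std::vector<int> inorder_iterative(const node * root) {
    std::vector<int> visited;
    std::stack<const node *> pending;       // nodes whose left side is in progress
    const node * current = root;            // local copy; the caller's pointer is untouched

    while (current || !pending.empty()) {
        while (current) {                   // go as far left as possible
            pending.push(current);
            current = current->left_child;
        }
        current = pending.top();            // leftmost node not yet visited
        pending.pop();
        visited.push_back(current->value);
        current = current->right_child;     // then handle its right subtree
    }
    return visited;
}
```

Notice that current is a local copy of the pointer that was passed in. Reassigning it is safe here, whereas reassigning a root passed by reference would change the caller's tree.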

Using BSTs for Table ADTs

We can implement our ADT Table operations using a nonlinear approach of a binary search tree.

This provides the best features of a linear implementation that we previously talked about plus you can insert and delete items without having to shift data.

With a binary search tree we are able to take advantage of dynamic memory allocation.

Using BSTs for Table ADTs

Linear implementations of ADT table operations are still useful.

Remember from our discussion of efficiency that it isn't good to overanalyze our problems.

If the size of the problem is small, it is unlikely that there will be enough efficiency gain to implement more difficult approaches.

In fact, if the size of the table is small, using a linear implementation makes sense because the code is simple to write and read!

Using BSTs for Table ADTs

For these operations, we must define a binary search tree where for each node -- the search key is greater than all search keys in the left subtree and less than all search keys in the right subtree.

Since this is implicitly a sorted tree when we traverse it inorder, we can write efficient algorithms for retrieval, insertion, deletion, and traversal.

Remember, traversal of linear ADT tables was not a straightforward process!

Using BSTs for Table ADTs

Let's quickly look at a search algorithm for a binary search tree implemented using pointers (i.e., implementing our Retrieve ADT Table Operation):

The following is pseudo code:

int retrieve (node * root, const key & k, data & value) {
    if (!root) // we have an empty tree: no match
        return 0;

Using BSTs for Table ADTs

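    else if (root->value == k) { // assumes data can be compared with a key
        value = root->value;     // copy the match back to the caller
        return 1;
    }
    else if (k < root->value)    // smaller keys are in the left subtree
        return retrieve(root->left_child, k, value);
    else                         // larger keys are in the right subtree
        return retrieve(root->right_child, k, value);
}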
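To make the pseudo code concrete, here is a compilable sketch. It assumes the key is an int stored inside each item, and it uses a hypothetical record type with key and name fields as a stand-in for the slides' data type; those choices are for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the slides' `data` type.
struct record {
    int key;
    std::string name;
};

struct node {
    record value;
    node * left_child;
    node * right_child;
};

// Returns 1 and copies the match into `found` if key k is in the tree,
// otherwise returns 0 and leaves `found` unchanged.
int retrieve(const node * root, int k, record & found) {
    if (!root)                              // empty tree: no match
        return 0;
    if (k == root->value.key) {
        found = root->value;
        return 1;
    }
    if (k < root->value.key)                // smaller keys live on the left
        return retrieve(root->left_child, k, found);
    return retrieve(root->right_child, k, found);
}
```

Each recursive call discards one whole subtree, so in a reasonably balanced tree the number of comparisons grows with the height of the tree rather than with the number of items.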

For Next Time...

To prepare for next class

write C++ code to insert a new data item at a leaf in the appropriate sub-tree using the binary search tree concept

think about what you might need to do to then remove an item?

what special cases will we need to consider?

how might we make a copy of a binary search tree?

