Week #10: Abstract Data Types and Classes

CS 161: Introduction to Computer Science 1

Abstract Data Types

• Up to this point, we have used only simple data types: float, integer, and character. Each of these types is useful because of the unique set of operations it allows: integers and floats support arithmetic, and characters support text processing. Each of these data types is also stored in the computer in a different way.

• Therefore, even with these simple data types there are two important aspects of a data type: the set of operations it allows and its storage representation.

• These two aspects together comprise what is called an abstract data type (ADT).

• So what is the point?

• Well, the reason for pointing this out is that you have already encountered data types that you define yourself, such as arrays or structures. Such types are used to improve the self-documentation of programs and to help guarantee that they run correctly.

• As we create new data types, it is important to think about the two aspects of the data type: its operations and its storage representation. Also, think about how it differs from and is related to other data types.
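To make the two aspects concrete, here is a minimal sketch of how the operations and the storage of the simple types differ. The function names are made up for illustration, and next_letter assumes a character set in which letters are stored consecutively (true of ASCII).

```cpp
#include <cassert>
#include <cstddef>

// Operation on integers: division discards the fractional part.
int int_quotient(int a, int b) {
    assert(b != 0);  // integer division by zero is undefined
    return a / b;
}

// Operation on floats: division keeps the fractional part.
float float_quotient(float a, float b) {
    return a / b;
}

// Operation on characters: step to the following character code.
char next_letter(char c) {
    return static_cast<char>(c + 1);
}

// Storage: each type occupies its own number of bytes.
// sizeof(char) is always 1; the size of int depends on the machine.
std::size_t bytes_in_char() { return sizeof(char); }
std::size_t bytes_in_int()  { return sizeof(int); }
```

The same symbol (/) behaves differently for int and float, which is one sign that the set of operations belongs to the type itself.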

The Notion of a Data Structure

• We are now going to find out what a data structure is. Data structures are similar to data types.

• A data structure is a way to structure data.

• It organizes data into composite items and provides operations for manipulating those items.

• A data structure is a way of organizing data so that it can be stored, retrieved, and operated on in a particular way, preferably one that is natural to the problem being solved.
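As a rough sketch of this idea, the structure below groups simple values into composite items and supplies a few operations for them. Item, Inventory, MAX_ITEMS, and the function names are all hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <string>

const int MAX_ITEMS = 100;  // hypothetical capacity

// A composite item built from simple types.
struct Item {
    std::string name;
    double price;
};

// The data structure: an array of items plus a count of how many are in use.
struct Inventory {
    Item items[MAX_ITEMS];
    int count = 0;
};

// Store a new item at the end; returns false if there is no room.
bool add_item(Inventory& inv, const std::string& name, double price) {
    if (inv.count >= MAX_ITEMS) return false;
    inv.items[inv.count].name = name;
    inv.items[inv.count].price = price;
    ++inv.count;
    return true;
}

// Retrieve the position of an item by name, or -1 if it is absent.
int find_item(const Inventory& inv, const std::string& name) {
    for (int i = 0; i < inv.count; ++i) {
        if (inv.items[i].name == name) return i;
    }
    return -1;
}

// Retrieve the price stored at a valid position.
double price_at(const Inventory& inv, int index) {
    assert(index >= 0 && index < inv.count);
    return inv.items[index].price;
}
```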

Data Abstraction

• Once we have designed a data structure, we can forget its inessential details and think of it in more general terms (like we do when we use procedure abstraction!).

Adapting a Known Algorithm

• If possible, try not to rewrite procedures that you have already written; instead, adapt those procedures to new applications!

Design by Concrete Example

• While designing a program or procedure, it may be helpful to think of the task in some concrete example and then desk check your algorithm using that example. Using abstract thoughts won't necessarily get you where you want to go.

Time Efficiency

• Using arrays, our programs are going to start to take longer to run. As they get more complicated and use more storage and more complicated structures, more time is needed by the computer to get the job done.

• So, we should always try to make a program efficient; however, this may end up complicating the programming task.

• Tie this into arrays and data structures ... look at the sort subtask and compare the time required for the following two nested loops:
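The loops themselves are not included in these notes. As a stand-in, and purely as an assumption about what a suitable comparison might look like, the sketch below shows two versions of a simple exchange sort that differ only in the bounds of the inner loop. Each function returns the number of comparisons it makes, which serves as a rough measure of running time. The function names are hypothetical.

```cpp
#include <cassert>

// Version 1: every pass scans the whole array.
long sort_full_scan(int a[], int n) {
    assert(n >= 0);
    long comparisons = 0;
    for (int pass = 0; pass < n - 1; ++pass) {
        for (int i = 0; i < n - 1; ++i) {
            ++comparisons;
            if (a[i] > a[i + 1]) {  // out of order: exchange neighbors
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
            }
        }
    }
    return comparisons;
}

// Version 2: each pass stops before the tail that is already in place.
long sort_shrinking_scan(int a[], int n) {
    assert(n >= 0);
    long comparisons = 0;
    for (int pass = 0; pass < n - 1; ++pass) {
        for (int i = 0; i < n - 1 - pass; ++i) {
            ++comparisons;
            if (a[i] > a[i + 1]) {
                int temp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = temp;
            }
        }
    }
    return comparisons;
}
```

For an array of n items, the first version makes (n-1)*(n-1) comparisons and the second makes n*(n-1)/2, roughly half as many. Both produce the same sorted array.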

Introduction to...Object Models

• The conceptual framework around which object-oriented programming is built is the object model. The major elements of the object model are abstraction, encapsulation, modularity, hierarchy, and overloading.

• Abstraction is the way we can capture the behavior of an object. Deciding upon the right set of abstractions (i.e., classes and objects) is the main concern when using the object-oriented approach. Choosing the system components and determining how to relate them to data & operations is dependent on the designer & their familiarity with the application. One designer might have a much higher level of abstraction than another. Basically, abstraction helps us to think about what we are doing & helps us structure how we are going to do it. It focuses on the "outside view" of an object.

The behavior of an object must be characterized as part of its abstraction. This behavior can be determined by looking at the operations that its clients (whatever uses this object) perform upon it as well as the operations that this object performs on others. These operations are also called methods or member functions; all three terms mean the same thing and have simply evolved from different programming languages.

The "tool" for creating new data types is the class. At its simplest, an abstract data type consists of a new class assembled from built-in data types, such as numbers and characters along with their operations. In more complex systems a new class can contain other objects.

• Encapsulation allows us to hide our implementation. This allows program changes to be made with minimal impact. Encapsulation is also referred to as "information hiding"...which allows the implementor to selectively prevent clients from seeing the inside of the abstraction. As you can guess, for abstraction to work we MUST have encapsulation. It allows us to hide the representation of the object as well as the implementation of its methods. For example, if I had a stack ADT... but allowed the user to manipulate the data structure outside of the push, pop, create routines, I would not be able to guarantee the integrity of the stack nor easily modify the program/application without a significant impact. I certainly wouldn't be able to change the manner in which the stack was implemented without impacting the routines that mucked around with the data structure itself. Encapsulation allows us to design our abstractions to prohibit this type of abuse.

By using encapsulation intelligently, we can localize design decisions that are likely to change. Then, as systems evolve and we become more familiar with a system's actual use, we may find that some operations take too long or take too much space. By originally encapsulating our objects, we can change the representation of the object to implement a more efficient algorithm without disturbing any of the clients that use the object. This is a primary benefit of encapsulation.
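The stack example above might be sketched as follows. This is one possible version, assuming a fixed-size array of integers as the hidden representation; the constructor plays the role of the "create" routine, and the capacity of 100 is arbitrary.

```cpp
#include <cassert>

class Stack {
public:
    Stack() : top_(0) {}                       // "create": start out empty

    bool empty() const { return top_ == 0; }
    bool full() const { return top_ == CAPACITY; }

    // Place a value on top of the stack. The stack must not be full.
    void push(int value) {
        assert(!full());
        items_[top_] = value;
        ++top_;
    }

    // Remove and return the top value. The stack must not be empty.
    int pop() {
        assert(!empty());
        --top_;
        return items_[top_];
    }

private:
    static const int CAPACITY = 100;  // arbitrary size chosen for this sketch
    int items_[CAPACITY];             // hidden representation
    int top_;                         // number of values currently stored
};
```

Because items_ and top_ are private, clients can reach the data only through push and pop. The array could later be swapped for a linked list without changing any client code.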

When we implement a class in C++, we specify the data, the methods (operations), and whether or not the data & methods can be seen by other objects. There can be public, private, and protected data and methods. C++ gives the designer fine-grained control over how much visibility or access clients have to information & methods. Data & methods that are public are visible to all clients. Data & methods that are private are fully encapsulated. Data & methods that are protected are visible only to the class itself and to its subclasses through inheritance.
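A small sketch of the three access levels follows. The Account and SavingsAccount classes are hypothetical and exist only to show where each kind of member can be used.

```cpp
#include <cassert>

class Account {
public:
    // Public: any client may call these.
    explicit Account(int opening) : balance_(opening) {
        assert(opening >= 0);
    }
    int balance() const { return balance_; }

protected:
    // Protected: usable by Account and by its subclasses, but not by clients.
    void adjust(int amount) { balance_ += amount; }

private:
    // Private: usable only inside Account itself.
    int balance_;
};

class SavingsAccount : public Account {
public:
    explicit SavingsAccount(int opening) : Account(opening) {}

    void add_interest(int amount) {
        adjust(amount);         // allowed: adjust is protected
        // balance_ += amount;  // would not compile: balance_ is private
    }
};
```

A client holding a SavingsAccount can call balance() and add_interest(), but a call to adjust() or a direct use of balance_ from client code is rejected by the compiler.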

• Modularity allows us to partition our program into individual components. Since C++ classes and objects form the logical structure of the system, we can place these abstractions into modules to produce an overall system architecture. As you can see, modularity and encapsulation go hand in hand. We should group logically related classes and objects into the same module and expose only those features that other modules must see. Arbitrary modularization is sometimes worse than no modularization at all!

With traditional structured programming, we are familiar with creating modules that contain meaningful groups of subprograms (e.g., all I/O routines in one module). With object-oriented technology, we decide based on the logical structure of the objects.

But, this may not be as easy as it seems. You have at least two competing forces. One desire is to encapsulate your abstractions and keep logical classes and their subclasses in separate modules. On the other hand, you also will need to make some of the abstractions visible to other modules & classes. To help give guidance, you might want to think about every data structure being private to its own module...and accessible only by the routines within the module but not by routines within other modules. Any other routine that requires information stored in a module's data structures should obtain it by calling the routines contained in that module. Modularity helps us to group logically related abstractions!

The good news is that with the principles of abstraction, encapsulation, and modularity, we can create objects that have well defined boundaries around single abstractions.

• Hierarchy allows us to simplify our understanding of the problem by creating abstractions in the form of a hierarchy. It allows us to order our abstractions, following the natural structure of the application's environment.

Most object-oriented systems are designed with a "kind of" hierarchy: inheritance. This defines a relationship among classes, where one class shares the structure and/or behavior defined in one or more other classes. For example, a base class of a flower may have subclasses of rose, carnation, and daisy; each of these is a "kind of" flower. Here, our "superclasses" represent generalized abstractions and our subclasses represent specialized versions.
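The "kind of" relationship can be sketched in C++ with public inheritance. The members shown here (color, needs_water, has_thorns, petal_count) are invented for illustration; only the class names come from the example above.

```cpp
#include <cassert>
#include <string>

// Generalized abstraction: what every flower has in common.
class Flower {
public:
    explicit Flower(const std::string& color) : color_(color) {}

    std::string color() const { return color_; }

    // Hypothetical rule: water after more than two dry days.
    bool needs_water(int days_since_watering) const {
        assert(days_since_watering >= 0);
        return days_since_watering > 2;
    }

private:
    std::string color_;
};

// Specialized version: a rose is a "kind of" flower.
class Rose : public Flower {
public:
    explicit Rose(const std::string& color) : Flower(color) {}
    bool has_thorns() const { return true; }
};

// Another specialization with its own extra data.
class Daisy : public Flower {
public:
    Daisy(const std::string& color, int petals)
        : Flower(color), petals_(petals) {}
    int petal_count() const { return petals_; }

private:
    int petals_;
};
```

Rose and Daisy inherit color() and needs_water() from Flower without restating them, and each adds only what makes it special.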

Another object-oriented structure is a "part of" hierarchy. For example, a flower might again be the starting point, but now its components are the blossom, the stem, and the root system. Each of these is a "part of" the flower. Notice that with this type of hierarchy, the parts are not subclasses: they do not inherit structure or behavior from the flower. Instead, the flower contains them. These types of relationships are called "aggregation relationships". Another example of this type of relationship would be a flowering plant that is "part of" the plant family, which in turn is "part of" the plants in the yard.
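In C++, a "part of" relationship is usually expressed by giving one class data members of other class types, with no inheritance involved. The sketch below assumes some made-up measurements (petals, lengths in centimeters) just to give each part something to hold.

```cpp
#include <cassert>

class Blossom {
public:
    explicit Blossom(int petals) : petals_(petals) { assert(petals >= 0); }
    int petals() const { return petals_; }
private:
    int petals_;
};

class Stem {
public:
    explicit Stem(double length_cm) : length_cm_(length_cm) {}
    double length_cm() const { return length_cm_; }
private:
    double length_cm_;
};

class RootSystem {
public:
    explicit RootSystem(double depth_cm) : depth_cm_(depth_cm) {}
    double depth_cm() const { return depth_cm_; }
private:
    double depth_cm_;
};

// A flower is made of its parts; it does not inherit from them,
// and they do not inherit from it.
class Flower {
public:
    Flower(int petals, double stem_cm, double root_cm)
        : blossom_(petals), stem_(stem_cm), roots_(root_cm) {}

    const Blossom& blossom() const { return blossom_; }
    const Stem& stem() const { return stem_; }
    const RootSystem& roots() const { return roots_; }

private:
    Blossom blossom_;
    Stem stem_;
    RootSystem roots_;
};
```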

• Overloading allows us to assign multiple meanings to the same method name. Let's look at an example using a Computer Graphics application. Let's say we have a 2D drawing engine (i.e., a graphics device) that can draw lines, polygons, circles, and ellipses. A conventional structured programming language would implement this problem by having a separate DrawLine, DrawPolygon, DrawCircle, and DrawEllipse routine. For each shape there is a different routine. With an object-oriented system, we could think of each kind of shape as representing a different class. We might have a Line class, a Filled Area class, and a Wireframe class. Through a technique called overloading, we can use the same name for the drawing method of every class -- even though the actual code to perform the drawing would be different from one class to another. This means that instead of having a DrawLine and a DrawPolygon...we could have a method called draw available for each class. Because each class knows only about its own version of draw and how to draw itself, there is no confusion!

Overloading simplifies your program because it allows you to use the same name for the same type of operation everywhere in a program. This also simplifies the way in which a client would request a method to be done...it doesn't have to have a big switch to access the right drawing method for the corresponding object. Instead, you just invoke "draw" for all classes! The neat thing is that if all classes use the same name for their drawing operation, none of the other classes has to keep track of the various names for this operation! This means that overloading eliminates the need to figure out which drawing message to send. This significantly simplifies the problem.

In addition, overloading is even more useful when you add a new shape. The new shape can create its own draw method, using the same name as the existing shapes.

With this process, each object may be of a different class, with each responding differently to a draw method. This means that the draw method to invoke is chosen either at compile time, when the object's class is known there, or not until runtime, when it is not. Hiding alternative procedures behind a common interface is called polymorphism (from the Greek for "many forms"). Polymorphism is one of the most powerful features of OOP. With polymorphism, objects become more independent of each other and new objects can be added with minimal changes to existing objects; it allows for simpler systems which are designed to evolve over time.

All of this from the simple ability to use the same method name in more than one class!
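Here is one way the draw example might look in C++. Two assumptions are worth stating. First, the methods return a short text description instead of drawing on a real graphics device. Second, C++ achieves the runtime choice of draw through virtual functions in a common base class (here a hypothetical Shape). In C++ terminology, redefining a virtual function in a subclass is called overriding, while overloading refers more narrowly to functions that share a name but differ in their parameter lists; these notes use "overloading" in the broader sense.

```cpp
#include <cassert>
#include <string>

// Common interface: every shape knows how to draw itself.
class Shape {
public:
    virtual ~Shape() {}
    virtual std::string draw() const = 0;
};

class Line : public Shape {
public:
    std::string draw() const override { return "line"; }
};

class Circle : public Shape {
public:
    std::string draw() const override { return "circle"; }
};

class Polygon : public Shape {
public:
    std::string draw() const override { return "polygon"; }
};

// The client needs no switch statement: it sends the same draw
// message to every shape, and each object responds in its own way.
std::string draw_all(const Shape* const shapes[], int count) {
    assert(count >= 0);
    std::string result;
    for (int i = 0; i < count; ++i) {
        if (i > 0) result += ' ';
        result += shapes[i]->draw();
    }
    return result;
}
```

Adding a new shape, such as an Ellipse, means writing one new class with its own draw; draw_all does not change.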

Introduction to...Data Abstraction

• Data abstraction is used as a tool to increase the modularity of a program. It is used to build "walls" between a program and its data structures. During the design of a solution, you will find that you need to support various operations on the data and therefore need to define abstract data types. The actual data structures should only be specified after you have clearly defined the operations on the abstract data types to be used.

• Data abstraction makes us think about what we can do to collections of data and not how to do it. Data abstraction is a tool that allows us to develop each data structure in isolation from the rest of the solution. The other modules of the solution will know what operations they can perform on the data...but they should not depend on how the data is stored or how the operations are performed. Think of WHAT and not HOW!

• ADTs are collections of data values with corresponding operations. The specification of ADT operations indicates WHAT they do but not how to implement them. Don't confuse data structures with ADTs. Data abstraction allows programs to be oblivious to any change in the implementation of their data structures.

• Now let's look at some examples using abstract data types...separating the operations on the data from the implementation of these operations:

Let's take a list of grocery store items. Let's say we have milk, eggs, and butter on the list. These items are not in sorted order; instead, they are in the order in which they came into your mind. Operations on such a list could be: find the length of the list, insert an item into the list, delete an item in the list, search for an item in the list. The specifications of these operations contain no information on HOW to perform the operations...just that these things can be done on an ordered list. It is important that you not include implementation issues in the specification of an ADT's operations. The program should depend only on the behavior of the ADT.

• Once an ADT has been specified, you can design applications to access and manipulate the ADT's data solely in terms of its operations without ever knowing about its implementation. There is a wall between the implementation of an ADT ordered list and the rest of the program preventing you from knowing how the ordered list is stored.
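The grocery list might be sketched as a class whose public section is the specification (WHAT) and whose private section is the wall-hidden implementation (HOW). This version assumes positions are numbered from 0 and stores the items in a fixed-size array; both choices, the capacity of 50, and the retrieve operation are assumptions made for the sketch.

```cpp
#include <cassert>
#include <string>

class GroceryList {
public:
    GroceryList() : count_(0) {}

    // Find the length of the list.
    int length() const { return count_; }

    // Insert an item so that it ends up at the given position.
    // Returns false if the position is invalid or the list is full.
    bool insert(int position, const std::string& item) {
        if (position < 0 || position > count_ || count_ == CAPACITY) return false;
        for (int i = count_; i > position; --i) {
            items_[i] = items_[i - 1];   // shift later items back one slot
        }
        items_[position] = item;
        ++count_;
        return true;
    }

    // Delete the item at the given position. Returns false if invalid.
    bool remove(int position) {
        if (position < 0 || position >= count_) return false;
        for (int i = position; i < count_ - 1; ++i) {
            items_[i] = items_[i + 1];   // shift later items forward one slot
        }
        --count_;
        return true;
    }

    // Search for an item; returns its position, or -1 if it is not present.
    int search(const std::string& item) const {
        for (int i = 0; i < count_; ++i) {
            if (items_[i] == item) return i;
        }
        return -1;
    }

    // Look at the item stored at a valid position.
    std::string retrieve(int position) const {
        assert(position >= 0 && position < count_);
        return items_[position];
    }

private:
    static const int CAPACITY = 50;   // behind the wall: clients never see this
    std::string items_[CAPACITY];
    int count_;
};
```

A program that uses only length, insert, remove, search, and retrieve keeps working if the array is later replaced by some other storage scheme.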

• ADTs should be designed during the problem solving process. Let's say we want to determine the dates of all holidays. What is the data used? Well, dates consist of a month, day, and year. Therefore, we should create a new data type called DATE. This is a good start...but just having a user-defined data type doesn't specify and restrict the legal operations on the dates. In addition to the user-defined type, we need to specify the operations that are valid for the data. For this example, the valid operations are:

- Determine the date of the first day of a year

- Determine whether a date is the last day of a year

- Determine if a date is a holiday

- Determine the date of the day that follows a given day

• In C++ abstract data types can be implemented directly as part of the language! For this reason they are sometimes called "user created data types".
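As a sketch of how the DATE type and its four operations could be written as a C++ class, consider the version below. The notes do not say which days count as holidays, so is_holiday assumes just two fixed dates (January 1 and December 25) as placeholders. The member names and the leap year helper are also choices made for this sketch.

```cpp
#include <cassert>

class Date {
public:
    Date(int month, int day, int year)
        : month_(month), day_(day), year_(year) {
        assert(month >= 1 && month <= 12);
        assert(day >= 1 && day <= days_in_month(month, year));
    }

    int month() const { return month_; }
    int day() const { return day_; }
    int year() const { return year_; }

    // Determine the date of the first day of a year.
    static Date first_day_of_year(int year) { return Date(1, 1, year); }

    // Determine whether a date is the last day of a year.
    bool is_last_day_of_year() const { return month_ == 12 && day_ == 31; }

    // Determine if a date is a holiday.
    // Assumption: only January 1 and December 25 are treated as holidays.
    bool is_holiday() const {
        return (month_ == 1 && day_ == 1) || (month_ == 12 && day_ == 25);
    }

    // Determine the date of the day that follows this one.
    Date next_day() const {
        if (day_ < days_in_month(month_, year_)) return Date(month_, day_ + 1, year_);
        if (month_ < 12) return Date(month_ + 1, 1, year_);
        return Date(1, 1, year_ + 1);
    }

private:
    static bool is_leap_year(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    static int days_in_month(int month, int year) {
        static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
        if (month == 2 && is_leap_year(year)) return 29;
        return days[month - 1];
    }

    int month_;
    int day_;
    int year_;
};
```

Clients can work with dates only through these operations, so the three integers could later be replaced by, say, a single day count without affecting any program that uses Date.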