Database Modeling and Design: Logical Design, Fifth Edition. Toby Teorey (University of Michigan) and Sam Lightstone.
About the authors: Tom Nadeau is the founder of Aladdin Software. His technical interests include data warehousing, OLAP, data mining, and machine learning. Jagadish is a professor in EE and CS at the University of Michigan, Ann Arbor, where he is part of the database group affiliated with the bioinformatics program and the Center for Computational Medicine and Bioinformatics. He also taught at the University of Illinois and currently leads research in databases in the context of the Internet and in biomedicine.
Publisher: Morgan Kaufmann.
Relational database design has evolved from an art to a science that is partially implementable as a set of software design aids. Many of these design aids have appeared as the database component of computer-aided software engineering (CASE) tools, and many of them offer interactive modeling capability using a simplified data modeling approach.
Logical design—that is, the structure of basic data relationships and their definition in a particular database system—is largely the domain of application designers. This book is devoted to the logical design methodologies and tools most popular for relational databases today.
Physical design methodologies and tools are covered in a separate book. In this chapter, we review the basic concepts of database management and introduce the role of data modeling and database design in the database life cycle.
Data and Database Management
The basic component of a file in a file system is a data item, which is the smallest named unit of data that has meaning in the real world: for example, last name, first name, street address, ID number, and political party.
A group of related data items treated as a unit by an application is called a record. Examples of types of records are order, salesperson, customer, product, and department. A file is a collection of records of a single type. Database systems have built upon and expanded these definitions: In a relational database, a data item is called a column or attribute, a record is called a row or tuple, and a file is called a table.
A database is a more complex object; it is a collection of interrelated stored data that serves the needs of multiple users within one or more organizations—that is, an interrelated collection of many different types of tables.
The motivation for using databases rather than files has been greater availability to a diverse set of users, integration of data for easier access and update for complex transactions, and less redundancy of data. A database management system (DBMS) is a generalized software system for manipulating databases. A DBMS supports a logical view (schema, subschema); a physical view (access methods, data clustering); a data definition language; a data manipulation language; and important utilities such as transaction management and concurrency control, data integrity, crash recovery, and security.
Data independence is the ability to make changes in either the logical or physical structure of the database without requiring reprogramming of application programs. It also makes database conversion and reorganization much easier.
Relational DBMSs provide a much higher degree of data independence than previous systems; they are the focus of our discussion on data modeling.
Database Life Cycle
The database life cycle incorporates the basic steps involved in designing a global schema of the logical database, allocating data across a computer network, and defining local DBMS-specific schemas.
Once the design is completed, the life cycle continues with database implementation and maintenance. This chapter contains an overview of the database life cycle, as shown in Figure 1. In succeeding chapters we will focus on the database design process from the modeling of requirements through logical design Steps I and II below. We illustrate the result of each step of the life cycle with a series of diagrams in Figure 1. Each diagram shows a possible form of the output of each step so the reader can see the progression of the design process from an idea to an actual database implementation.
These forms are discussed in much more detail in Chapters 2-6.
Requirements analysis. The database requirements are determined by interviewing both the producers and users of data and using the information to produce a formal requirements specification. That specification includes the data required for processing, the natural data relationships, and the software platform for the database implementation.
As an example, Figure 1.
Logical design. The global schema, a conceptual data model diagram that shows all the data and their relationships, is developed using techniques such as entity-relationship (ER) or UML. The data model constructs must ultimately be transformed into tables.
Conceptual data modeling. The data requirements are analyzed and modeled by using an ER or UML diagram that includes many features we will study in Chapters 2 and 3, for example, semantics for optional relationships, ternary relationships, supertypes, and subtypes (categories).
Processing requirements are typically specified using natural language expressions or SQL commands along with the frequency of occurrence.
Figure 1.
View integration. Usually, when the design is large and more than one person is involved in requirements analysis, multiple views of data and relationships occur, resulting in inconsistencies due to variance in taxonomy, context, or perception. To eliminate redundancy and inconsistency from the model, these views must
[Figure 1. (Step II): candidate tables Salesperson (sales-name, addr, dept, job-level, vacation-days), Order (order-no, sales-name, cust-no), and Order-product (order-no, prod-no)]
View integration requires the use of ER semantic tools such as identification of synonyms, aggregation, and generalization. In Figure 1. View integration is also important when applications have to be integrated, and each may be written with its own view of the database.
Transformation of the conceptual data model to SQL tables. Based on a categorization of data modeling constructs and a set of mapping rules, each relationship and its associated entities are transformed into a set of DBMS-specific candidate relational tables.
We will show these transformations in standard SQL in Chapter 5. Redundant tables are eliminated as part of this process. In our example, the tables in Step II.
Normalization of tables. Given a table R, a set of attributes B is functionally dependent on another set of attributes A if, at each instant of time, each A value is associated with exactly one B value. Functional dependencies (FDs) are derived from the conceptual data model diagram and the semantics of data relationships in the requirements analysis.
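The definition of a functional dependency can be checked mechanically against a sample of rows. The following is a minimal sketch, not a method from the book: the helper name `holds`, the sample rows, and the column names (written with underscores) are illustrative assumptions. A sample can only refute a dependency; whether it holds "at each instant of time" is a design decision taken from the requirements.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are tuples of column names.
    """
    seen = {}
    for row in rows:
        a = tuple(row[c] for c in lhs)
        b = tuple(row[c] for c in rhs)
        # setdefault stores b the first time a is seen, then returns the stored value
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical sample data for a Salesperson table.
salesperson = [
    {"sales_name": "Ann", "job_level": 3, "vacation_days": 15},
    {"sales_name": "Bob", "job_level": 3, "vacation_days": 15},
    {"sales_name": "Cy", "job_level": 5, "vacation_days": 20},
]

fd_ok = holds(salesperson, ("job_level",), ("vacation_days",))
fd_bad = holds(salesperson, ("vacation_days",), ("sales_name",))
```

Here each job_level value maps to exactly one vacation_days value, so the first check succeeds, while two salespeople share the same vacation_days value, so the second fails.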
They represent the dependencies among data elements that are unique identifiers (keys) of entities. Additional FDs, which represent the dependencies between key and nonkey attributes within entities, can be derived from the requirements specification.
Candidate relational tables associated with all derived FDs are normalized. Finally, redundancies in the data that occur in normalized candidate tables are analyzed further for possible elimination, with the constraint that data integrity must be preserved. An example of normalization of the Salesperson table into the new Salesperson and SalesVacations tables is shown in Figure 1.
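The Salesperson decomposition mentioned above can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the book's figure: it assumes the dependency job_level -> vacation_days motivates the split, uses underscores in place of the hyphenated names, and invents the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesperson_unnormalized (
    sales_name TEXT PRIMARY KEY,
    addr TEXT, dept TEXT, job_level INTEGER, vacation_days INTEGER
);
INSERT INTO salesperson_unnormalized VALUES
    ('Ann', '1 Elm St', 'East', 3, 15),
    ('Bob', '2 Oak St', 'West', 3, 15),
    ('Cy',  '3 Ash St', 'East', 5, 20);

-- Assumed dependency: job_level -> vacation_days, moved to its own table.
CREATE TABLE sales_vacations (
    job_level INTEGER PRIMARY KEY,
    vacation_days INTEGER NOT NULL
);
CREATE TABLE salesperson (
    sales_name TEXT PRIMARY KEY,
    addr TEXT, dept TEXT,
    job_level INTEGER REFERENCES sales_vacations(job_level)
);
INSERT INTO sales_vacations
    SELECT DISTINCT job_level, vacation_days FROM salesperson_unnormalized;
INSERT INTO salesperson
    SELECT sales_name, addr, dept, job_level FROM salesperson_unnormalized;
""")

original = conn.execute(
    "SELECT * FROM salesperson_unnormalized ORDER BY sales_name").fetchall()

# Joining the two normalized tables should reproduce the original rows.
rejoined = conn.execute("""
    SELECT s.sales_name, s.addr, s.dept, s.job_level, v.vacation_days
    FROM salesperson AS s
    JOIN sales_vacations AS v ON v.job_level = s.job_level
    ORDER BY s.sales_name
""").fetchall()
```

After the split, the vacation allowance for a job level is stored once rather than once per salesperson, and the join recovers the original table without loss.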
We note here that database tool vendors tend to use the term logical model to refer to the conceptual data model, and they use the term physical model to refer to the DBMS-specific implementation model. We also note that many conceptual data models are obtained not from scratch, but by reverse engineering an existing DBMS-specific schema (Silberschatz et al.).
Physical design. The physical design step involves the selection of indexes (access methods), partitioning, and clustering of data.
The logical design methodology in Step II simplifies the approach to designing large relational databases by reducing the number of data dependencies that need to be analyzed. This is accomplished by inserting the conceptual data modeling and integration steps Steps II. The objective of these steps is an accurate representation of reality. Data integrity is preserved through normalization of the candidate tables created when the conceptual data model is transformed into a relational model.
The purpose of physical design is then to optimize performance. As part of the physical design, the global schema can sometimes be refined in limited ways to reflect processing (query and transaction) requirements if there are obvious large gains to be made in efficiency.
This is called denormalization. It consists of selecting dominant processes on the basis of high frequency, high volume, or explicit priority; defining simple extensions to tables that will improve query performance; evaluating total cost for query, update, and storage; and considering the side effects, such as possible loss of integrity.
This is particularly important for online analytical processing (OLAP) applications.
Database implementation, monitoring, and modification. Once the design is completed, the database can be created through implementation of the formal schema using the data definition language (DDL) of a DBMS.
Then the data manipulation language (DML) can be used to query and update the database, as well as to set up indexes and establish constraints, such as referential integrity. As the database begins operation, monitoring indicates whether performance requirements are being met. If they are not being satisfied, modifications should be made to improve performance. Thus, the life cycle continues with monitoring, redesign, and modifications. In the next two chapters we look first at the basic data modeling concepts; then, starting in Chapter 4, we apply these concepts to the database design process.
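The implementation step can be illustrated with a small sketch using Python's built-in `sqlite3` module. The table, index, and data are hypothetical. In this sketch the index is created together with the schema, since most SQL references classify `CREATE INDEX` as a data definition statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create the schema and an index on a lookup column.
conn.execute(
    "CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT NOT NULL)")
conn.execute("CREATE INDEX customer_name_idx ON customer (cust_name)")

# Data manipulation: populate, update, and query the table.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Zenith")])
conn.execute(
    "UPDATE customer SET cust_name = ? WHERE cust_no = ?", ("Acme Corp", 1))
names = [row[0] for row in conn.execute(
    "SELECT cust_name FROM customer ORDER BY cust_no")]
```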
Conceptual Data Modeling
Conceptual data modeling is the driving component of logical database design. Let us take a look at how this important component came about and why it is important. Schema diagrams were formalized by Charles Bachman. He used rectangles to denote record types and directed arrows from one record type to another to denote a one-to-many relationship among instances of records of the two types.
The entity-relationship (ER) approach for conceptual data modeling, one of the two approaches emphasized in this book and described in detail in Chapter 2, was first presented by Peter Chen. The Chen form of ER models uses rectangles to specify entities, which are somewhat analogous to records. It also uses diamond-shaped objects to represent the various types of relationships, which are differentiated by numbers or letters placed on the lines connecting the diamonds to the rectangles.
The Unified Modeling Language (UML) was introduced by Grady Booch and James Rumbaugh and has become a standard graphical language for specifying and documenting large-scale software systems. We will use both the ER model and UML to illustrate the data modeling and logical database design examples throughout this book. In conceptual data modeling, the overriding emphasis is on simplicity and readability. The goal of conceptual schema design, where the ER and UML approaches are most useful, is to capture real-world data requirements in a simple and meaningful way that is understandable by both the database designer and the end user.
Summary
Knowledge of data modeling and database design techniques is important for database practitioners and application developers. The database life cycle shows what steps are needed in a methodical approach to designing a database, from logical design, which is independent of the system environment, to physical design, which is based on the details of the database management system chosen to implement the database. Among the variety of data modeling approaches, the ER and UML data models are arguably the most popular in use today because of their simplicity and readability.
Tips and Insights for Database Professionals
Tip 1. Work methodically through the steps of the life cycle. Each step is clearly defined and produces a result that can serve as a valid input to the next step.
Tip 2. Correct design errors as soon as possible by going back to the previous step and trying new alternatives. The longer you wait, the more costly the errors and the longer the fixes.
Tip 3. Separate the logical and physical design completely because you are trying to satisfy completely different objectives. In logical design, the objective is to obtain a feasible solution that satisfies all known and potential queries and updates.
Save the optimization effort for physical design, where the objective is to optimize performance for known and projected queries and updates.
Literature Summary
Database design textbooks that adhere to a significant portion of the relational database life cycle described in this chapter are Teorey and Fry, Muller, Stephens and Plew, Silverston, Harrington, Bagui, Hernandez and Getz, Simsion and Witt, Powell, Ambler and Sadalage, Scamell and Umanath, Halpin and Morgan, Mannino, Stephens, Churcher, and Hoberman. Temporal (time-varying) databases are defined and discussed in Jenson and Snodgrass and in Snodgrass. Schema evolution during development, a frequently occurring problem, is addressed in Harriman, Hodgetts, and Leo.
This chapter defines all the major entity-relationship (ER) concepts that can be applied to the conceptual data modeling phase of the database life cycle.
The ER model has two levels of definition—one that is quite simple and another that is considerably more complex. The simple level is the one used by most current design tools. It is quite helpful to the database designer who must communicate with end users about their data requirements. An example of a simple form of ER model using the Chen notation is shown in Figure 2.
In this example we want to keep track of videotapes and customers in a video store. The simple form is easy to learn and applicable to a wide variety of design problems that might be encountered in industry and small businesses. As we will demonstrate, the simple form is easily translatable into SQL data definitions, and thus it has an immediate use as an aid for database implementation.
The complex level of ER model definition includes concepts that go well beyond the simple model. It includes concepts from the semantic models of artificial intelligence and from competing conceptual data models. Data modeling at this level helps the database designer capture more semantics without having to resort to narrative explanations.
[Figure 2.: a simple form of the ER model (entity Customer, attribute cust-id)]
However, such detail in very large data model diagrams actually detracts from end user understanding.
Therefore, the simple level is recommended as the basic communication tool for database design verification. In the next section, we will look at the simple level of ER modeling described in the original work by Chen and extended by others. The following section presents the more advanced concepts that are less generally accepted but useful to describe certain semantics that cannot be constructed with the simple model.
Entities
Entities are the principal data objects about which information is to be collected; they usually denote a person, place, thing, or event of informational interest.
A particular occurrence of an entity is called an entity instance, or sometimes an entity occurrence. In our example, Employee, Department, Division, Project, Skill, and Location are all examples of entities (for easy reference, entity names are capitalized throughout this text). The entity construct is a rectangle, as depicted in Figure 2. The entity name is written inside the rectangle.
Relationships
Relationships represent real-world associations among one or more entities and, as such, have no physical or conceptual existence other than that which depends upon their entity associations. Relationships are described in terms of degree, connectivity, and existence. These terms are defined in the sections that follow. The relationship construct is a diamond that connects the associated entities, as shown in Figure 2. The relationship name can be written inside or just outside the diamond.
A role is the name of one end of a relationship when each end needs a distinct name for clarity of the relationship. In most of the examples given in Figure 2. However, in some cases role names should be used to clarify ambiguities. For example, in the first case in Figure 2. Role names are typically nouns. A particular instance or occurrence of an attribute within an entity or relationship is called an attribute value.
Attributes of an entity such as Employee may include emp-id, emp-name, emp-address, phone-no, fax-no, job-title, and so on. The attribute construct is an ellipse (or oblong) with the attribute name inside, as shown in Figure 2.
The attribute is connected to the entity it characterizes. There are two types of attributes: identifiers and descriptors. An identifier (or key) is used to uniquely determine an instance of an entity. For example, an identifier or key of Employee is emp-id; each instance of Employee has a different value for emp-id, and thus there are no duplicates of emp-id in the set of Employees.
Key attributes are underlined in the ER diagram, as shown in Figure 2. A descriptor (or nonkey attribute) is used to specify a nonunique characteristic of a particular entity instance. For example, a descriptor of Employee might be emp-name or job-title; different instances of Employee may have the same value for emp-name (two John Smiths) or job-title (many Senior Programmers). Both identifiers and descriptors may consist of either a single attribute or some composite of attributes.
Some attributes, such as specialty-area, may be multivalued. The notation for multivalued attributes is shown with a double attachment line, as shown in Figure 2. Other attributes may be complex, such as an address that further subdivides into street, city, state, and zip code. Keys may also be categorized as either primary or secondary. A primary key fits the definition of an identifier given in this section in that it uniquely determines an instance of an entity. A secondary key fits the definition of a descriptor in that it is not necessarily unique to each entity instance.
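The distinction between identifiers and descriptors, and between primary and secondary keys, can be sketched in SQL through Python's built-in `sqlite3` module. This is an illustrative mapping under assumptions: the attribute names follow the Employee example with underscores substituted for hyphens, and the index name and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,   -- identifier: unique for each instance
    emp_name TEXT NOT NULL,       -- descriptor: duplicates are allowed
    job_title TEXT                -- descriptor
);
-- A secondary key: an index on a column that need not be unique.
CREATE INDEX employee_job_title_idx ON employee (job_title);
INSERT INTO employee VALUES (1, 'John Smith', 'Senior Programmer');
INSERT INTO employee VALUES (2, 'John Smith', 'Senior Programmer');
""")

# A second row with an existing emp_id violates the primary key.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Jane Doe', 'Analyst')")
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

same_name_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_name = 'John Smith'").fetchone()[0]
```

Two employees named John Smith coexist because emp_name is a descriptor, while a repeated emp_id is rejected.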
These definitions are useful when entities are translated into SQL tables and indexes are built based on either primary or secondary keys. Weak entities are often depicted with a double-bordered rectangle see Figure 2. For example, in Figure 2.
The Employee-jobhistory for a particular employee can exist only if there exists an Employee entity for that employee.
Degree of a Relationship
The degree of a relationship is the number of entities associated in the relationship. Binary and ternary relationships are special cases where the degree is 2 and 3, respectively. An n-ary relationship is the general form for any degree n.
The notation for degree is illustrated in Figure 2. The binary relationship, an association between two entities, is by far the most common type in the natural world.
In fact, many modeling systems use only this type. In Figure 2. A binary recursive relationship is called recursive because the entity relates only to another instance of its own type. The binary recursive relationship construct is a diamond with both connections to the same entity. A ternary relationship is an association among three entities. This type of relationship is required when binary relationships are not sufficient to accurately describe the semantics of the association.
The ternary relationship construct is a single diamond connected to three entities, as shown in Figure 2. Sometimes a relationship is mistakenly modeled as ternary when it could be decomposed into two or three equivalent binary relationships. An entity may be involved in any number of relationships, and each relationship may be of any degree.
Connectivity of a Relationship
The connectivity of a relationship describes a constraint on the connection of the associated entity occurrences in the relationship.
The actual count of elements associated with the connectivity is called the cardinality of the relationship connectivity; it is used much less frequently than the connectivity constraint because the actual values are usually variable across instances of relationships.
Note that there are no standard terms for the connectivity concept, so the reader is admonished to look at the definition of these terms carefully when using a particular database design methodology. Figure 2. In the one-to-one case, the entity Department is managed by exactly one Employee, and each Employee manages exactly one Department.
On the Department side the minimum and maximum connectivities are both one; that is, each Employee works within exactly one Department. In the many-to-many case, a particular Employee may work on many Projects and each Project may have many Employees. We see that the maximum connectivity for Employee and Project is N in both directions, and the minimum connectivities are each defined (implied) as one.
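One common way to represent the many-to-many Employee and Project case in SQL is a separate table holding one row per pairing. The sketch below uses Python's built-in `sqlite3` module; the table name `works_on`, the column names, and the data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
CREATE TABLE project (project_name TEXT PRIMARY KEY);
-- One row per (employee, project) pairing; the composite key prevents duplicates.
CREATE TABLE works_on (
    emp_id INTEGER NOT NULL REFERENCES employee(emp_id),
    project_name TEXT NOT NULL REFERENCES project(project_name),
    PRIMARY KEY (emp_id, project_name)
);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO project VALUES ('Alpha'), ('Beta');
INSERT INTO works_on VALUES (1, 'Alpha'), (1, 'Beta'), (2, 'Alpha');
""")

projects_of_ann = [row[0] for row in conn.execute(
    "SELECT project_name FROM works_on WHERE emp_id = 1 ORDER BY project_name")]
staff_of_alpha = [row[0] for row in conn.execute(
    "SELECT emp_id FROM works_on WHERE project_name = 'Alpha' ORDER BY emp_id")]
```

Reading the pairing table in either direction gives the "many" on each side: one employee with several projects, and one project with several employees.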
Some situations, though rare, are such that the actual maximum connectivity is known. For example, a professional basketball team may be limited by conference rules to 12 players. In such a case, the number 12 could be placed next to an entity called Team Members on the many side of a relationship with an entity Team.
Most situations, however, have variable connectivity on the many side, as shown in all the examples of Figure 2.
Attributes of a Relationship
Attributes can be assigned to certain types of relationships as well as to entities. They are not normally assigned to one-to-one or one-to-many relationships because of potential ambiguities.
Existence of an Entity in a Relationship
Existence of an entity occurrence in a relationship is defined as either mandatory or optional.
When an occurrence of that entity need not always exist, it is considered optional. Optional existence, defined by a 0 on the connection line between an entity and a relationship, defines a minimum connectivity of zero.
Mandatory existence defines a minimum connectivity of one. When existence is unknown, we assume the minimum connectivity is one (that is, mandatory). Maximum connectivities are defined explicitly on the ER diagram as a constant (if a number is shown on the ER diagram next to an entity) or a variable (by default, if no number is shown on the ER diagram next to an entity).
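In SQL tables, mandatory and optional existence are often expressed through whether a foreign key column may be null. The following sketch, using Python's built-in `sqlite3` module, assumes a hypothetical Employee that must belong to a Department but may or may not have an Office; none of these table definitions come from the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY);
CREATE TABLE office (office_no INTEGER PRIMARY KEY);
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    -- mandatory: minimum connectivity of one
    dept_no INTEGER NOT NULL REFERENCES department(dept_no),
    -- optional: minimum connectivity of zero
    office_no INTEGER REFERENCES office(office_no)
);
INSERT INTO department VALUES (10);
INSERT INTO employee VALUES (1, 10, NULL);  -- no office: allowed
""")

# An employee without a department violates the mandatory side.
try:
    conn.execute("INSERT INTO employee VALUES (2, NULL, NULL)")
    missing_department_rejected = False
except sqlite3.IntegrityError:
    missing_department_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```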
Existence is often implicit in the real world. For example, an entity Employee associated with a dependent weak entity, Dependent, cannot be optional, but the weak entity is usually optional. Using the concept of optional existence, an entity instance may be able to exist in other relationships even though it is not participating in this particular relationship. Relationships have no explicit construct but are implied by the connection line between entities and a relationship name on the connection line.
Minimum connectivity is specified by either a 0 (for zero) or a perpendicular line (for one) on the connection lines between entities. The term intersection entity is used to designate a weak entity, especially an entity that is equivalent to a many-to-many relationship.
Brown Bruce, The similarities with the Chen notation are obvious from Figure 2. Fortunately, any of these forms is reasonably easy to learn and read, and their equivalence for the basic ER concepts is obvious from the diagrams. Without a clear standard for the ER model, however, many other constructs are being used today in addition to the three types shown here.
Advanced ER Constructs
Generalization: Supertypes and Subtypes
The original ER model has been effectively used for communicating fundamental data and relationship definitions with the end user for a long time. However, using it to develop and integrate conceptual models with different end user views was severely limited until it could be extended to include database abstraction concepts such as generalization. The generalization relationship specifies that several types of entities with certain common attributes can be generalized into a higher-level entity type: a generic or superclass entity, which is more commonly known as a supertype entity.
IDEF1X notation. As an example, in Figure 2. The ER model construct for the generalization abstraction is the connection of a supertype entity with its subtypes, using a circle and the subset symbol on the connecting lines from the circle to the subtype entities. The circle contains a letter specifying a disjointness constraint (see the following discussion).
Specialization, the reverse of generalization, is an inversion of the same concept; it indicates that subtypes specialize the supertype.
A supertype entity in one relationship may be a subtype entity in another relationship. Generalization can also be described in terms of inheritance, which specifies that all the attributes of a supertype are propagated down the hierarchy to entities of a lower type. Generalization can be further classified by two important constraints on the subtype entities: disjointness and completeness. The disjointness constraint requires the subtype entities to be mutually exclusive.
Subtypes that are not disjoint are overlapping. As an example, the supertype entity Individual has two subtype entities, Employee and Customer; these subtypes could be described as overlapping, or not mutually exclusive (Figure 2.).
Regardless of whether the subtypes are disjoint or overlapping, they may have additional special attributes in addition to the generic inherited attributes from the supertype. The completeness constraint requires the subtypes to be all-inclusive of the supertype.
Thus, subtypes can be defined as either total or partial coverage of the supertype. For example, in a generalization hierarchy with supertype Individual and subtypes Employee and Customer, the subtypes may be described as all-inclusive or total.
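The Individual, Employee, and Customer hierarchy can be sketched as SQL tables in which each subtype shares the key of the supertype, so that inherited attributes are obtained by a join. This is one possible mapping shown with Python's built-in `sqlite3` module; the column names and sample rows are assumptions, not the book's transformation rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE individual (            -- supertype: attributes common to all
    indiv_id INTEGER PRIMARY KEY,
    indiv_name TEXT NOT NULL
);
CREATE TABLE employee (              -- subtype: shares the supertype key
    indiv_id INTEGER PRIMARY KEY REFERENCES individual(indiv_id),
    job_title TEXT
);
CREATE TABLE customer (              -- subtype: shares the supertype key
    indiv_id INTEGER PRIMARY KEY REFERENCES individual(indiv_id),
    cust_credit INTEGER
);
INSERT INTO individual VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO employee VALUES (1, 'Engineer');
INSERT INTO customer VALUES (1, 500), (2, 100);  -- Ann is both: overlapping subtypes
""")

# Inherited attributes come from the supertype through the shared key.
employee_view = conn.execute("""
    SELECT i.indiv_id, i.indiv_name, e.job_title
    FROM employee AS e JOIN individual AS i ON i.indiv_id = e.indiv_id
""").fetchall()

# Individuals that appear in both subtype tables.
overlapping = [row[0] for row in conn.execute("""
    SELECT e.indiv_id
    FROM employee AS e JOIN customer AS c ON c.indiv_id = e.indiv_id
""")]
```

Nothing in this schema forces disjointness or total coverage; those constraints would need additional checks beyond the shared key.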
Total coverage is denoted by a double line between the supertype entity and the circle. This is indicated in Figure 2.
Aggregation
Aggregation is a form of abstraction between a supertype and subtype entity that is significantly different from the generalization abstraction. The construct for aggregation is similar to generalization in that the supertype entity is connected with the subtype entities with a circle; in this case, the letter A is shown in the circle. Furthermore, there are no inherited attributes in aggregation; each entity has its own unique set of attributes.
Ternary Relationships
Ternary relationships are required when binary relationships are not sufficient to accurately describe the semantics of an association among three entities. Ternary relationships are somewhat more complex than binary relationships, however. The ER notation for a ternary relationship is shown in Figure 2.
In either case, it is assumed that one instance of each of the other entities is given.
Assertion 1: One engineer, working under one manager, could be working on many projects.
Assertion 2: One project, under the direction of one manager, could have many engineers.
Assertion 3: One engineer, working on one project, must have only a single manager.
Each notebook belongs to one technician for each project. Note that a technician may still work on many projects and maintain different notebooks for different projects.
At a particular location, an employee works on only one project. At a particular location, there can be many employees assigned to a given project.
[Figure 2.: forms of ternary relationships, each with its assertions and functional dependencies; form (d) has no functional dependencies]
All four forms of ternary relationships are illustrated in Figure 2. Ternary relationships can have attributes in the same way as many-to-many binary relationships can. The values of these attributes are uniquely determined by some combination of the keys of the entities associated with the relationship.
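The three engineer, project, and manager assertions can be read as a single functional dependency: engineer and project together determine the manager. A minimal sketch with Python's built-in `sqlite3` module enforces this with a composite key; the table name `manages` and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manages (
    engineer TEXT NOT NULL,
    project TEXT NOT NULL,
    manager TEXT NOT NULL,
    PRIMARY KEY (engineer, project)   -- engineer, project -> manager
);
-- Assertion 1: one engineer under one manager, on many projects.
INSERT INTO manages VALUES ('Eve', 'Alpha', 'Max');
INSERT INTO manages VALUES ('Eve', 'Beta', 'Max');
-- Assertion 2: one project under one manager, with many engineers.
INSERT INTO manages VALUES ('Dan', 'Alpha', 'Max');
""")

# Assertion 3: a second manager for the same engineer and project is rejected.
try:
    conn.execute("INSERT INTO manages VALUES ('Eve', 'Alpha', 'Mia')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM manages").fetchone()[0]
```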
The meaning of this form can best be described in terms of the functional dependencies among the keys of the n associated entities. The collection of FDs that describe an n-ary relationship must each have n components. In a more complex database, other types of FDs may also exist within an n-ary relationship.
When this occurs, the ER model does not provide enough semantics by itself, and it must be supplemented with a narrative description of these dependencies.
Exclusion Constraint
The normal, or default, treatment of multiple relationships is the inclusive OR, which allows any or all of the entities to participate.
In some situations, however, multiple relationships may be affected by the exclusive OR (exclusion) constraint, which allows at most one entity instance among several entity types to participate in the relationship with a single root entity. At most one of the associated entity instances could apply to an instance of Work-task.
Foreign Keys and Referential Integrity
A foreign key is an attribute of an entity or an equivalent SQL table, which may be either an identifier or a descriptor. A foreign key in one entity or table is taken from the same domain of values as the primary key in another (parent) table in order for the two tables to be connected to satisfy certain queries on the database.
Referential integrity requires that for every foreign key instance that exists in a table, the row and thus the key instance of the parent table associated with that foreign key instance must also exist. The referential integrity constraint has become integral to relational database design and is usually implied as a requirement for the resulting relational database implementation.
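As a small illustration of the referential integrity rule just stated, the sketch below uses Python's built-in sqlite3 module. The `department` and `employee` tables, their columns, and the sample values are assumptions made up for this example. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_no INTEGER REFERENCES department (dept_no)
    )
    """
)
conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (100, 1)")  # parent row exists, so this succeeds


def violates_integrity(sql, params=()):
    """Run a statement and report whether it was rejected by a constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


# A foreign key value with no matching parent row is rejected.
orphan_rejected = violates_integrity("INSERT INTO employee VALUES (?, ?)", (101, 99))
# Deleting a parent row that is still referenced is rejected as well.
delete_rejected = violates_integrity("DELETE FROM department WHERE dept_no = ?", (1,))
print(orphan_rejected, delete_rejected)
```

Both statements fail, so every foreign key instance left in `employee` still points at an existing `department` row.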
Chapter 5 illustrates the SQL implementation of referential integrity constraints.
Summary
The basic concepts of the ER model and their constructs are described in this chapter. An entity is a person, place, thing, or event of informational interest. Attributes are objects that provide descriptive information about entities.
Attributes may be unique identifiers or nonunique descriptors. Relationships describe the connectivity between entity instances: one-to-one, one-to-many, or many-to-many. The degree of a relationship is the number of associated entities: two (binary), three (ternary), or any n (n-ary). The role name, or relationship name, defines the function of an entity in a relationship.
The concept of existence in a relationship determines whether an entity instance must exist (mandatory) or not (optional). So, for example, the minimum connectivity of a binary relationship (that is, the number of entity instances on one side that are associated with one instance on the other side) can either be zero, if optional, or one, if mandatory. This simple form of ER models is used in most design tools and is easy to learn and apply to a variety of industrial and business applications.
It is also a very useful tool for communicating with the end user about the conceptual model and for verifying the assumptions made in the modeling process. A more complex form, a superset of the simple form, is useful for the more experienced designer who wants to capture greater semantic detail in diagram form, while avoiding having to write long and tedious narrative to explain certain requirements and constraints. The more advanced constructs in ER diagrams are sporadically used and have no generally accepted form as yet.
They include ternary relationships, which we define in terms of the FD concept of relational databases; constraints on exclusion; and the implicit constraints from the relational model such as referential integrity. ER is a much better level of abstraction than specifying individual data items or functional dependencies, and it is easier to use to develop a conceptual model for large databases.
The main advantages of ER modeling are that it is easy to learn, easy to use, and very easy to transform to SQL table definitions. Identify entities first, then relationships, and finally the attributes of entities. Identify binary relationships first whenever possible. Only use ternary relationships as a last resort.
Tip 4. ER model notations are all very similar. Pick the notation that works best for you unless your client or boss prefers a specific notation for their purposes. Remember that ER notation is the primary tool for communicating data concepts with your client.
Tip 5. Keep the ER model simple. Too much detail wastes time and is harder to communicate to your client.
The application of the semantic network model to conceptual schema design was shown by Bachman, McLeod and King, Hull and King, and Peckham and Maryanski.
The object-oriented software development community created UML to meet the special needs of describing object-oriented software design.
UML has grown into a standard for the design of digital systems in general. There are a number of different types of UML diagrams serving various purposes (Rumbaugh et al.).
The class and the activity diagram types are particularly useful for discussing database design issues. UML class diagrams capture the structural aspects found in database schemas. UML activity diagrams facilitate discussion on the dynamic processes involved in database design. This chapter is an overview of the syntax and semantics of the UML class and activity diagram constructs used in this book. We are using UML 2.
The influence of UML has in turn affected the database community. Class diagrams now appear frequently in the database literature to describe database schemas. UML activity diagrams are similar in purpose to flow charts. Processes are partitioned into constituent activities along with control flow specifications. This chapter is organized into three main sections. The first section presents class diagram notation, along with examples.
The next section covers activity diagram notation, along with illustrative examples. Finally, the last section concludes with a few tips for UML usage.
We conceptualize classes of objects in our everyday lives. For example, a car has attributes, such as a vehicle identification number (VIN) and mileage. A car also has operations, such as accelerate and brake.
All cars have these attributes and operations. Individual cars differ in the details. A given car has a value for the VIN and mileage. Individual cars are objects that are instances of the Car class. Classes and objects are a natural way of conceptualizing the world around us. The concepts of classes and objects are also the paradigms that form the foundation of object-oriented programming. The development of object-oriented programming led to the need for a language to describe object-oriented design, giving rise to UML.
Classes are analogous to entities. It is possible to conceptualize a database table as a class. The columns in the table are the attributes, and the rows are objects of that class. Each row in the table would have values for these columns, representing an individual car.
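The table-as-class analogy can be sketched in a few lines of Python. This example assumes a hypothetical `car` table with `vin` and `mileage` columns and made-up sample values. It uses the built-in sqlite3 and dataclasses modules to turn each row into an object of a `Car` class.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Car:
    """Each field corresponds to a column; each instance corresponds to a row."""
    vin: str
    mileage: int


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (vin TEXT PRIMARY KEY, mileage INTEGER)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [("VIN-A", 42000), ("VIN-B", 7500)],  # made-up sample values
)

# Build one Car object per row of the table.
rows = conn.execute("SELECT vin, mileage FROM car ORDER BY vin")
cars = [Car(vin, mileage) for vin, mileage in rows]
print(cars)
```

The class here carries attributes only, which mirrors the way classes are typically drawn for database schemas.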
The major difference between classes and entities is the lack of operations in entities. Note that the term operation is used here in the UML sense of the word. Stored procedures, functions, triggers, and constraints are forms of named behavior that can be defined in databases; however, these are not associated with the behavior of individual rows. The term operations in UML refers to the methods inherent in classes of objects.
These behaviors are not stored in the definition of rows within the database. Classes can be shown with attributes and no operations in UML, which is the typical usage for database schemas. The UML icon for a class is a rectangle. When the class is shown with attributes and operations, the rectangle is subdivided into three horizontal compartments. The top compartment contains the class name, centered in boldface, beginning with a capital letter. Typically, class names are nouns.
The middle compartment contains attribute names, left justified in regular face, beginning with a lowercase letter. The bottom compartment contains operation names, left justified in regular face, beginning with a lowercase letter, ending with parentheses.
The parentheses may contain arguments for the operation. The class notation has some variations, reflecting emphasis. Operations are important in software.
If the software designer wishes to focus on the operations, the class can be shown with only the class name and operations compartments. Showing operations and hiding attributes is a very common syntax used by software designers. Database designers, on the other hand, do not generally deal with class operations; however, the attributes are of paramount importance. The needs of the database designer can be met by writing the class with only the class name and attribute compartments showing.
Lastly, in high-level diagrams, it is often desirable to illustrate the relationships of the classes without becoming entangled in the details of the attributes and operations. Classes can be written with just the class name compartment when simplicity is desired. Various types of relationships may exist between classes. Associations are one type of relationship. The most generic form of association is drawn with a line connecting two classes. For example, in Figure 3.
A few types of associations, such as aggregation and composition, are very common. UML has designated symbols for these associations. For example, a Car may be part of a Car Pool.
The Car also exists on its own, independent of any Car Pool. Another distinguishing feature of aggregation is that the part may be shared among multiple objects. For example, a Car may belong to more than one Car Pool. The aggregation association is indicated with a hollow diamond attached to the class that holds the parts. Figure 3. For example, a Frame is part of a single Car. The notation for composition is an association adorned with a solid black diamond attached to the class that owns the parts.
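The difference between the two kinds of part-whole association can be approximated in code. In this minimal sketch, a `CarPool` merely refers to `Car` objects that exist on their own (aggregation), while each `Car` creates and owns its single `Frame` (composition). The attribute names and sample values are assumptions for illustration. Python does not enforce ownership or lifetime rules, so the composition is a convention here, not a guarantee.

```python
class Frame:
    """A part that belongs to exactly one Car (composition)."""

    def __init__(self, serial):
        self.serial = serial


class Car:
    def __init__(self, vin, frame_serial):
        self.vin = vin
        # Composition: the Car builds and owns its Frame; no other Car shares it.
        self.frame = Frame(frame_serial)


class CarPool:
    def __init__(self, name):
        self.name = name
        # Aggregation: the pool only refers to Cars that exist independently.
        self.cars = []

    def add(self, car):
        self.cars.append(car)


car = Car("VIN-A", "F-1")
commute = CarPool("commute")
weekend = CarPool("weekend")
commute.add(car)
weekend.add(car)  # the same Car is shared by two pools

del commute  # discarding one pool leaves the Car untouched
print(car.vin, weekend.cars[0] is car)
```

In a relational schema the same distinction usually shows up in the keys and delete rules instead of in object ownership.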
Generalization is another common relationship. For example, Sedan is a type of car. The Car class is more general than the Sedan class. Generalization is indicated by a solid line adorned with a hollow arrowhead pointing to the more general class. Figures 3. We conclude this section with an example database schema of the music industry, illustrated by Figures 3.
These examples are parallel to the ER models shown in Figure 2. You may refer back to Figure 2. Associations between classes may be reflexive, binary, or n-ary. Reflexive association is a term we are carrying over from ER modeling. It is not a term defined in UML, although it is worth discussing. The reflexive association in Figure 3. The roles of classes in a relationship may be indicated at the ends of the relationship.
The number of objects involved in the relationship, referred to as multiplicity, may also be specified at the ends of the relationship. An asterisk indicates that many objects take part in the association at that end of the relationship.
The multiplicities of the reflexive association example in Figure 3. A binary association is a relationship between two classes. For example, one Division has many Departments. Notice the solid black diamond at the Division end of the relationship. The solid diamond is an adornment to the association that indicates composition. The Division is composed of Departments. The ternary relationship in Figure 3.
All classes partaking in the association are connected to a hollow diamond. Each end of the ternary association example in Figure 3. The meaning of each multiplicity is isolated from the other multiplicities. Given a class, if you have exactly one object from every other class in the association, the multiplicity is the number of associated objects for the given class.
One Employee working on one Project assignment uses many Skills. One Employee uses one Skill on many Project assignments. One Skill used on one Project is fulfilled by many Employees. The next three class diagrams in Figure 3. The illustrated one-to-one association specifies that each Department is associated with exactly one Employee acting in the role of manager, and each manager is associated with exactly one Department. The diagram with the one-to-many association means that each Department has many Employees, and each Employee belongs to exactly one Department.
The many-to-many example in Figure 3. This example also illustrates the use of an association class. If an association has attributes, these are written in a class that is attached to the association with a dashed line. The association class named WorkAssignment in Figure 3.
The association and the class together form an association class. Multiplicity can be a range of integers, written with the minimum and maximum values separated by two periods. The asterisk by itself carries the same meaning as the range [ Also, if the minimum and maximum values are the same number, then the multiplicity can be written as a single number.
For example, [ Optional existence can be specified using a zero. The [ Mandatory existence is specified whenever a multiplicity begins with a positive integer.
The example of mandatory existence in Figure 3. One end of an association can indicate mandatory Employee existence, while the other end may use optional existence. This is the case in the example, where an Office may have any number of occupants, including zero.
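The multiplicity rules above lend themselves to a small helper. The sketch below assumes multiplicities are written as plain strings in the usual UML style, such as `*`, `0..1`, `1`, or `1..*`. The function names `parse_multiplicity` and `is_optional` are hypothetical and not part of any UML tool.

```python
from typing import Optional, Tuple


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum); a maximum of None means unbounded."""
    text = text.strip()
    if text == "*":
        return (0, None)  # a lone asterisk is shorthand for 0..*
    if ".." in text:
        low, high = (part.strip() for part in text.split("..", 1))
    else:
        low = high = text  # a single number means minimum == maximum
    maximum = None if high == "*" else int(high)
    return (int(low), maximum)


def is_optional(text: str) -> bool:
    """Existence is optional when the minimum is zero, mandatory otherwise."""
    return parse_multiplicity(text)[0] == 0


for multiplicity in ["*", "0..1", "1", "1..*"]:
    print(multiplicity, parse_multiplicity(multiplicity), is_optional(multiplicity))
```

A multiplicity that begins with a positive integer is reported as mandatory, and one that begins with zero as optional, matching the rule of thumb in the text.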
Generalization is another type of relationship. A superclass is a generalization of a subclass. Specialization is the opposite relationship of generalization. A subclass is a specialization of the superclass.
The top example in Figure 3. shows four subclasses: Manager, Engineer, Technician, and Secretary. Notice the four relationships share a common arrowhead.
Semantically, these are still four separate relationships. The sharing of the arrowhead is permissible in UML, to improve the clarity of the diagrams. The bottom example in Figure 3. The class named Individual is a generalization of the Employee and Customer classes. The Employee and Customer classes are in turn superclasses of the EmpCust class. A class can be a subclass in more than one generalization relationship.
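A class that is a subclass in more than one generalization relationship maps naturally onto multiple inheritance. The following sketch uses the class names from the text (Individual, Employee, Customer, EmpCust). The attributes `name`, `emp_id`, and `cust_id` and the sample values are assumptions added for illustration.

```python
class Individual:
    def __init__(self, name, **kwargs):
        super().__init__(**kwargs)
        self.name = name


class Employee(Individual):
    def __init__(self, emp_id=None, **kwargs):
        super().__init__(**kwargs)
        self.emp_id = emp_id


class Customer(Individual):
    def __init__(self, cust_id=None, **kwargs):
        super().__init__(**kwargs)
        self.cust_id = cust_id


class EmpCust(Employee, Customer):
    """A subclass in two generalization relationships at once."""


# Cooperative super() calls pass each keyword to the class that declares it.
person = EmpCust(name="Pat", emp_id=7, cust_id=42)
print(isinstance(person, Employee), isinstance(person, Customer))
print(person.name, person.emp_id, person.cust_id)
```

The object inherits the attributes of both superclasses and, through them, the attribute of Individual exactly once.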
The meaning in the example is that an EmpCust object is both an Employee and a Customer. UML incorporates some extensibility to accommodate user needs, such as a note. A note in UML is written as a rectangle with a dog-eared upper-right corner. The note can attach to the pertinent element(s) with a dashed line(s). Write briefly in the note what you wish to convey. The bottom diagram in Figure 3. The top diagram means that a Program and Electronic Documentation both contribute to the composition of a Software Product.
The composition signifies that the parts do not exist without the Software Product (there is no software pirating in our ideal world). The bottom diagram specifies that a Teacher and a Textbook are aggregated by a Course.
The aggregation signifies that the Teacher and the Textbook are part of the Course, but they also exist separately. If a course is canceled, the Teacher and the Textbook continue to exist. The n-ary relationship may be clarified by specifying roles next to the participating classes.
The concept of a primary key arises in the context of database design. Objects in software are not typically identified in this fashion. Stereotypes are depicted with a short natural language word or phrase, enclosed in guillemets (« and »). We take advantage of stereotypes to designate primary keys. The vin attribute is specified as the primary key for Cars.
This means that a given VIN identifies a specific Car. A noteworthy rule of thumb for primary keys: When a composition relationship exists, the primary key of the part includes the primary key of the owning object. The second diagram in Figure 3.
Example from the Music Industry
Large database schemas may be introduced with high-level diagrams.
Details can be broken out in additional diagrams. The overall goal is to present ideas in a clear, organized fashion. You will sometimes find there are multiple ways of representing the same material in UML. The decisions you make with regard to your representation depend in part on your purpose for a given diagram. Packages may be used to organize classes into groups. Packages may themselves also be grouped into packages. The goal of using packages is to make the overall design of a system more comprehensible.
One use for packages is to represent a schema. You can then show multiple schemas concisely. Another use for packages is to group related classes together within a schema, and present the schema clearly.
Given a set of classes, different people may conceptualize different groupings. The division is a design decision, with no right or wrong answer. Whatever decisions are made, the result should enhance readability. The notation for a package is a folder icon, and the contents of a package can be optionally shown in the body of the folder. If the contents are shown, then the name of the package is placed in the tab.
If the contents are elided, then the name of the package is placed in the body of the icon. If the purpose is to illustrate the relationships of the packages, and the classes are not important at the moment, then it is better to illustrate with the contents elided.
Music is created and placed on Media. The Media is then distributed. There is an association between the Music and the Media, and between the Media and Distribution. Let us look at the organization of the classes.
The music industry is illustrated in Figure 3. The Music package contains classes that are responsible for creating the music. Examples of Groups are the Beatles and the Bangles. Sarah McLachlan and Sting are Artists. Groups and Artists are involved in creating the music.
We will look shortly at the other classes and how they are related. The Media package contains classes that physically hold the recordings of the music.
The Distribution package contains classes that bring the media to you. The contents of a package can be expanded into greater detail. The relationships of the classes within the Music package are illustrated in Figure 3. A Group is an aggregation of two or more Artists.
As indicated by the multiplicity between Artist and Group [ Composers, Lyricists, and Musicians are different types of Artists. A Song is associated with one or more Composers. A Song may not have any Lyricist, or any number of Lyricists. A Song may have any number of Renditions.
A Rendition is associated with exactly one Song. A Rendition is associated with Musicians and Instruments. A given Musician-Instrument combination is associated with any number of Renditions. A specific Rendition-Musician combination may be associated with any number of Instruments. A given Rendition-Instrument combination is associated with any number of Musicians.
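Some of the multiplicities described for the Music package can be checked in code. The sketch below is a simplification under stated assumptions: Composers, Lyricists, and Musicians are all represented by one `Artist` class, the ternary association is stored as a set of (musician, instrument) pairs per Rendition, and the helper `rejects` and all sample names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Artist:
    name: str


@dataclass
class Group:
    name: str
    members: List[Artist]

    def __post_init__(self):
        if len(self.members) < 2:
            raise ValueError("a Group aggregates two or more Artists")


@dataclass
class Song:
    title: str
    composers: List[Artist]
    lyricists: List[Artist] = field(default_factory=list)  # zero or more

    def __post_init__(self):
        if not self.composers:
            raise ValueError("a Song needs one or more Composers")


@dataclass
class Rendition:
    song: Song  # exactly one Song per Rendition
    # Ternary association: (musician name, instrument) pairs for this Rendition.
    performances: Set[Tuple[str, str]] = field(default_factory=set)


def rejects(factory) -> bool:
    """Return True if building the object raises ValueError."""
    try:
        factory()
        return False
    except ValueError:
        return True


ann, bo = Artist("Ann"), Artist("Bo")
duo = Group("Example Duo", [ann, bo])
song = Song("Example Song", composers=[ann])
take = Rendition(song, {("Ann", "guitar"), ("Ann", "piano"), ("Bo", "guitar")})

print(rejects(lambda: Group("Solo", [ann])))       # a one-member Group is refused
print(rejects(lambda: Song("Untitled", composers=[])))  # a Song without Composers is refused
```

Within one Rendition a Musician may appear with several Instruments and an Instrument with several Musicians, which is what the set of pairs allows.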
A system may be understood more easily by shifting focus to each package in turn. We turn our attention now to the classes and relationships in the Media package, shown in Figure 3. The associated classes from the Music and Distribution packages are also shown, detailing how the Media package is related to the other two packages. The Music Media is associated with the Group and Artist classes, which are contained in the Music package shown in Figure 3.
The Music Media is also associated with the Publisher, Studio, and Producer classes, which are contained in the Distribution package shown in Figure 3. Albums and CDs are types of Music Media.
Albums and CDs are both composed of Tracks. Tracks are associated with Renditions.
Activity Diagrams
UML has a full suite of diagram types, each of which fulfills a need for describing a view of the design. UML activity diagrams are used to specify the activities and the flow of control in a process.
The process may be a workflow followed by people, organizations, or other physical things. Alternatively, the process may be an algorithm.
The syntax and the semantics of UML constructs are the same, regardless of the process described. Our examples draw from workflows that are followed by people and organizations, since these are more useful for the logical design of databases.
Activity Diagram Notation Description
Activity diagrams include notation for nodes, control flow, and organization.
The icons we are describing here are outlined in Figure 3.
[Figure 3.: activity diagram notation. Nodes: initial node, final node, activity node. Control flow: decision (branch) with [guard] and [alternative guard], fork, join. Organization: partition (swim lanes).]
Any process begins with control residing in the initial node, represented as a solid black circle.