Database Systems - Chapter 7: Data Storage, Indexing & Physical Design - Trương Quỳnh Chi
Each relation schema consists of a number of
attributes and the relational database schema consists
of a number of relation schemas
Attributes are grouped to form a relation schema
Need some formal measure of why one grouping of
attributes into a relation schema may be better than
another
attributes and the relational database schema consists
of a number of relation schemas
Attributes are grouped to form a relation schema
Need some formal measure of why one grouping of
attributes into a relation schema may be better than
another
Bạn đang xem 20 trang mẫu của tài liệu "Database Systems - Chapter 7: Data Storage, Indexing & Physical Design - Trương Quỳnh Chi", để tải tài liệu gốc về máy hãy click vào nút Download ở trên.
File đính kèm:
- database_systems_chapter_7_data_storage_indexing_physical_de.pdf
Nội dung text: Database Systems - Chapter 7: Data Storage, Indexing & Physical Design - Trương Quỳnh Chi
- Contents 1 Introduction 2 Functional dependencies 3 Normalization 2
- Overview of Database Miniworld Design Process REQUIREMENTS - COLLECTION & ANALYSIS Functional requirements Data requirements FUNCTIONAL ANALYSIS CONCEPTUAL DESIGN independent Conceptual schema – High-level transaction specification DBMS LOGICAL DESIGN (DATA MODEL MAPPING) APPLICATION PROGRAM Database schema specific DESIGN – PHYSICAL DESIGN DBMS TRANSACTION Internal schema IMPLEMENTATION Application program 4 Application Design Database Design
- Introduction “Goodness” measures: Redundant information in tuples Update anomalies: modification, deletion, insertion Reducing the NULL values in tuples Disallowing the possibility of generating spurious tuples 6
- Introduction Update anomalies: modification, deletion, insertion Modification As the manager of a dept. changes we have to update many values according to employees working for that dept. Easy to make the DB inconsistent 8
- Introduction Insertion: How can we create a department before any employees are assigned to it ??
- Introduction Disallowing the possibility of generating spurious tuples EMP_PROJ (SSN, PNumber, Hours, EName, PName, PLocation) EMP_LOCS (EName, PLocation) EMP_PROJ1 (SSN, PNumber, Hours, PName, PLocation) Generation of invalid and spurious data during JOINS: PLocation is the attribute that relates EMP_LOCS and EMP_PROJ1, and PLocation is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1 12
- Introduction Disallowing the possibility of generating spurious tuples 14
- Introduction Normalization helps DB designers determine the best relation schemas A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree It is based on the concept of normal form 1NF, 2NF, 3NF, BCNF, 4NF, 5NF It is a process which ensures that the data is structured in such a way that attributes are grouped with the PK. Attributes that do not directly depend on PK may be extracted to form a new relation 16
- Contents 1 Introduction 2 Functional dependencies 3 Normalization 18
- Functional Dependencies (FDs) Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs FDs and keys are used to define normal forms for relations FDs are constraints that are derived from the meaning and interrelationships of the data attributes A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y X Y 20
- Functional Dependencies (FDs) If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K]) 22
- Functional Dependencies (FDs) Direct dependency (fully functional dependency): All attributes in a R must be fully functionally dependent on the primary key (or the PK is a determinant of all attributes in R) SSN {Name, BDate, Address, DNO} EMPLOYEE SSN Name BDate Address DNO 24
- Functional Dependencies (FDs) Partial dependency Composite determinant - more than one value is required to determine the value of another attribute, the combination of values is called a composite determinant {SSN, PNumber} in EMP_PROJ Partial dependency - if the value of an attribute does not depend on an entire composite determinant, but only part of it, the relationship is known as the partial dependency SSN EName , Pnumber {PName, PLocation} EMP_PROJ SSN PNumber Hours EName PName PLocation 26
- Functional Dependencies (FDs) Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold Armstrong's inference rules: IR1. (Reflexive) If Y ⊆ X, then X Y IR2. (Augmentation) If X Y, then XZ YZ (Notation: XZ stands for X ⋃ Z) IR3. (Transitive) If X Y and Y Z, then X Z
- Functional Dependencies (FDs) Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
- Functional Dependencies (FDs) Definition of FD Direct, indirect, partial dependencies Inference Rules for FDs Equivalence of Sets of FDs Minimal Sets of FDs 32
- Functional Dependencies (FDs) A set of FDs is minimal if it satisfies the following conditions: Every dependency in F has a single attribute for its RHS. We cannot remove any dependency from F and have a set of dependencies that is equivalent to F. We cannot replace any dependency X A in F with a dependency Y A, where Y proper-subset-of X ( Y subset-of X) and still have a set of dependencies that is equivalent to F
- Contents 1 Introduction 2 Functional dependencies 3 Normalization 36
- Normalization Two new concepts: A Prime attribute must be a member of some candidate key A Nonprime attribute is not a prime attribute: it is not a member of any candidate key 38
- Normalization First normal form (1NF): there is only one value at the intersection of each row and column of a relation - no set valued attributes in 1NF Disallows composite attributes, multivalued attributes, and nested relations To be part of the formal definition of a relation in the basic (flat) relational model 40
- 1NF 42
- 1NF EMP_PROJ (SSN, PNumber, Hours, EName, PName, PLocation) 1. SSN, PNumber Hours 2. SSN EName 3. PNumber PName, PLocation 44
- Normalization 1NF and dependency problems 2NF – solves partial dependency 3NF – solves indirect dependency BCNF – well-normalized relations 46
- 2NF 48
- Normalization 1NF and dependency problems 2NF – solves partial dependency 3NF – solves indirect dependency BCNF – well-normalized relations 50
- Normalization 3NF solves indirect (transitive) dependencies problem in 1NF and 2NF Method: identify all transitive dependencies and each transitive dependency will form a new relation, with non-prime attributes participating in the transitive dependency and the attribute which determines others as the attributes for the new relation 52
- Exercise Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies: 1. A, B C 2. A D, E 3. B F 4. F G, H 5. D I, J What is the key for R? Decompose R into 2NF, then 3NF relations. 54
- SUMMARY OF NORMAL FORMS based on Primary Keys 56
- General Normal Form Definitions A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R A relation schema R is in third normal form (3NF) if whenever a FD X -> A holds in R, then either: (a) X is a superkey of R, or (b) A is a prime attribute of R 58
- Normalization A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R 60
- BCNF TEACH (Student, Course, Instructor) 62
- Nonadditive Join Decomposition into BCNF Schemas Algorithm 16.5. Relational Decomposition into BCNF with Nonadditive Join Property Input: A universal relation R and a set of functional dependencies F on the attributes of R. 1. Set D := {R} ; 2. While there is a relation schema Q in D that is not in BCNF do { choose a relation schema Q in D that is not in BCNF; find a functional dependency X→Y in Q that violates BCNF; replace Q in D by two relation schemas (Q – Y) and (X ∪ Y); } ; 64
- Exercise Consider the relation: BOOK (Book_Name, Author, Edition, Year) Based on a common-sense understanding of the data, what are the possible candidate keys of this relation? 68
- Key finding algorithms Extended part
- Key-finding algorithm (1) By Elmasri and Navathe Input: A relation R and a set of functional dependencies F on the attributes of R. Output: a key K of R 1. Set K to contain all attributes in R 2. For each attribute A in K { compute (K – A)+ with respect to F; if (K – A)+ contains all attributes in R, then set K := K – {A} };
- Key-finding algorithm (2)By Hossein Saiedian & Thomas Spencer Input: A relation R and a set of functional dependencies F on the attributes of R. Output: all candidate keys of R Let: U contain all attributes of R Ul contain attributes of R that occur only on the left-hand side of FDs in F Ur contain attributes of R that occur only on the right-hand side of FDs in F Ub contain attributes of R that occur on both sides of FDs in F Note: Ul ∩ Ur = ф, Ul ∩ Ub = ф and Ur ∩ Ub = ф Ul ∪ Ur ∪ Ub = U For every attribute A ∈ U, if A ∈ Ul, then A must be part of every candidate key of R. For every attribute A ∈ U, if A ∈ Ur, then A will not be part of any candidate key of R.
- Key-finding algorithm (2)By Hossein Saiedian & Thomas Spencer A simple categorization of attributes into the sets Ul, Ur and Ub allows to distinguish between those attributes that will participate in the candidate keys of a relational database schema and those that do not. The algorithm (2) finds all candidate keys.
- Exercise 2 Consider the universal relation R = {A, B, C, D, E, F} and the set of functional dependencies: 1) A, D B 2) A, B E 3) C D 4) B C 5) A, C F What is the key for R? Decompose R into 2NF, then 3NF relations.