Database Systems - Chapter 7: Data Storage, Indexing & Physical Design - Trương Quỳnh Chi

Each relation schema consists of a number of
attributes and the relational database schema consists
of a number of relation schemas
 Attributes are grouped to form a relation schema
 Need some formal measure of why one grouping of
attributes into a relation schema may be better than
another

79 trang xuanthi 02/01/2023 3780

Download

Bạn đang xem 20 trang mẫu của tài liệu "Database Systems - Chapter 7: Data Storage, Indexing & Physical Design - Trương Quỳnh Chi", để tải tài liệu gốc về máy hãy click vào nút Download ở trên.

File đính kèm:

database_systems_chapter_7_data_storage_indexing_physical_de.pdf

Nội dung text: Database Systems - Chapter 7: Data Storage, Indexing & Physical Design - Trương Quỳnh Chi

Contents 1 Introduction 2 Functional dependencies 3 Normalization 2
Overview of Database Miniworld Design Process REQUIREMENTS - COLLECTION & ANALYSIS Functional requirements Data requirements FUNCTIONAL ANALYSIS CONCEPTUAL DESIGN independent Conceptual schema – High-level transaction specification DBMS LOGICAL DESIGN (DATA MODEL MAPPING) APPLICATION PROGRAM Database schema specific DESIGN – PHYSICAL DESIGN DBMS TRANSACTION Internal schema IMPLEMENTATION Application program 4 Application Design Database Design
Introduction  “Goodness” measures:  Redundant information in tuples  Update anomalies: modification, deletion, insertion  Reducing the NULL values in tuples  Disallowing the possibility of generating spurious tuples 6
Introduction  Update anomalies: modification, deletion, insertion  Modification  As the manager of a dept. changes we have to update many values according to employees working for that dept.  Easy to make the DB inconsistent 8
Introduction  Insertion:  How can we create a department before any employees are assigned to it ??
Introduction  Disallowing the possibility of generating spurious tuples EMP_PROJ (SSN, PNumber, Hours, EName, PName, PLocation) EMP_LOCS (EName, PLocation) EMP_PROJ1 (SSN, PNumber, Hours, PName, PLocation)  Generation of invalid and spurious data during JOINS: PLocation is the attribute that relates EMP_LOCS and EMP_PROJ1, and PLocation is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1 12
Introduction  Disallowing the possibility of generating spurious tuples 14
Introduction  Normalization helps DB designers determine the best relation schemas  A formal framework for analyzing relation schemas based on their keys and on the functional dependencies among their attributes  A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree  It is based on the concept of normal form 1NF, 2NF, 3NF, BCNF, 4NF, 5NF  It is a process which ensures that the data is structured in such a way that attributes are grouped with the PK. Attributes that do not directly depend on PK may be extracted to form a new relation 16
Contents 1 Introduction 2 Functional dependencies 3 Normalization 18
Functional Dependencies (FDs)  Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs  FDs and keys are used to define normal forms for relations  FDs are constraints that are derived from the meaning and interrelationships of the data attributes  A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y X Y 20
Functional Dependencies (FDs)  If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K]) 22
Functional Dependencies (FDs)  Direct dependency (fully functional dependency): All attributes in a R must be fully functionally dependent on the primary key (or the PK is a determinant of all attributes in R) SSN {Name, BDate, Address, DNO} EMPLOYEE SSN Name BDate Address DNO 24
Functional Dependencies (FDs)  Partial dependency  Composite determinant - more than one value is required to determine the value of another attribute, the combination of values is called a composite determinant {SSN, PNumber} in EMP_PROJ  Partial dependency - if the value of an attribute does not depend on an entire composite determinant, but only part of it, the relationship is known as the partial dependency SSN EName , Pnumber {PName, PLocation} EMP_PROJ SSN PNumber Hours EName PName PLocation 26
Functional Dependencies (FDs)  Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold  Armstrong's inference rules:  IR1. (Reflexive) If Y ⊆ X, then X Y  IR2. (Augmentation) If X Y, then XZ YZ (Notation: XZ stands for X ⋃ Z)  IR3. (Transitive) If X Y and Y Z, then X Z
Functional Dependencies (FDs)  Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F  Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X  X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
Functional Dependencies (FDs)  Definition of FD  Direct, indirect, partial dependencies  Inference Rules for FDs  Equivalence of Sets of FDs  Minimal Sets of FDs 32
Functional Dependencies (FDs)  A set of FDs is minimal if it satisfies the following conditions:  Every dependency in F has a single attribute for its RHS.  We cannot remove any dependency from F and have a set of dependencies that is equivalent to F.  We cannot replace any dependency X A in F with a dependency Y A, where Y proper-subset-of X ( Y subset-of X) and still have a set of dependencies that is equivalent to F
Contents 1 Introduction 2 Functional dependencies 3 Normalization 36
Normalization  Two new concepts:  A Prime attribute must be a member of some candidate key  A Nonprime attribute is not a prime attribute: it is not a member of any candidate key 38
Normalization  First normal form (1NF): there is only one value at the intersection of each row and column of a relation - no set valued attributes in 1NF Disallows composite attributes, multivalued attributes, and nested relations  To be part of the formal definition of a relation in the basic (flat) relational model 40
1NF 42
1NF EMP_PROJ (SSN, PNumber, Hours, EName, PName, PLocation) 1. SSN, PNumber Hours 2. SSN EName 3. PNumber PName, PLocation 44
Normalization  1NF and dependency problems  2NF – solves partial dependency  3NF – solves indirect dependency  BCNF – well-normalized relations 46
2NF 48
Normalization  1NF and dependency problems  2NF – solves partial dependency  3NF – solves indirect dependency  BCNF – well-normalized relations 50
Normalization  3NF solves indirect (transitive) dependencies problem in 1NF and 2NF  Method: identify all transitive dependencies and each transitive dependency will form a new relation, with non-prime attributes participating in the transitive dependency and the attribute which determines others as the attributes for the new relation 52
Exercise Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies: 1. A, B C 2. A D, E 3. B F 4. F G, H 5. D I, J What is the key for R? Decompose R into 2NF, then 3NF relations. 54
SUMMARY OF NORMAL FORMS based on Primary Keys 56
General Normal Form Definitions  A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R  A relation schema R is in third normal form (3NF) if whenever a FD X -> A holds in R, then either: (a) X is a superkey of R, or (b) A is a prime attribute of R 58
Normalization  A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R 60
BCNF  TEACH (Student, Course, Instructor) 62
Nonadditive Join Decomposition into BCNF Schemas Algorithm 16.5. Relational Decomposition into BCNF with Nonadditive Join Property Input: A universal relation R and a set of functional dependencies F on the attributes of R. 1. Set D := {R} ; 2. While there is a relation schema Q in D that is not in BCNF do { choose a relation schema Q in D that is not in BCNF; find a functional dependency X→Y in Q that violates BCNF; replace Q in D by two relation schemas (Q – Y) and (X ∪ Y); } ; 64
Exercise  Consider the relation: BOOK (Book_Name, Author, Edition, Year)  Based on a common-sense understanding of the data, what are the possible candidate keys of this relation? 68
Key finding algorithms Extended part
Key-finding algorithm (1) By Elmasri and Navathe Input: A relation R and a set of functional dependencies F on the attributes of R. Output: a key K of R 1. Set K to contain all attributes in R 2. For each attribute A in K { compute (K – A)+ with respect to F; if (K – A)+ contains all attributes in R, then set K := K – {A} };
Key-finding algorithm (2)By Hossein Saiedian & Thomas Spencer Input: A relation R and a set of functional dependencies F on the attributes of R. Output: all candidate keys of R Let:  U contain all attributes of R  Ul contain attributes of R that occur only on the left-hand side of FDs in F  Ur contain attributes of R that occur only on the right-hand side of FDs in F  Ub contain attributes of R that occur on both sides of FDs in F Note:  Ul ∩ Ur = ф, Ul ∩ Ub = ф and Ur ∩ Ub = ф  Ul ∪ Ur ∪ Ub = U  For every attribute A ∈ U, if A ∈ Ul, then A must be part of every candidate key of R.  For every attribute A ∈ U, if A ∈ Ur, then A will not be part of any candidate key of R.
Key-finding algorithm (2)By Hossein Saiedian & Thomas Spencer  A simple categorization of attributes into the sets Ul, Ur and Ub allows to distinguish between those attributes that will participate in the candidate keys of a relational database schema and those that do not.  The algorithm (2) finds all candidate keys.
Exercise 2 Consider the universal relation R = {A, B, C, D, E, F} and the set of functional dependencies: 1) A, D B 2) A, B E 3) C D 4) B C 5) A, C F What is the key for R? Decompose R into 2NF, then 3NF relations.