4541.564; Spring 2008
P f S L
ADVANCED DATABASES
Prof. Sang-goo Lee
(14:30pm: Mon & Wed: Room 302-208)
ADVANCED DATABASES
Syllabus
Text Books Exams (tentative dates)
y
Database System Concepts, 5th Edition, A.
Silberschatz, H. F. Korth, and S. Sudarshan, McGraw Hill 2006
( )
Exam 1: 4/7 (Mon)
Exam 2: 5/19 (Mon)
Exam 3: 6/14 (Sat 14:30) McGraw Hill, 2006.
Relational Database Theory, Atzeni & De Antonellis, Benjamin/Cummings, 1993.
Exam 3: 6/14 (Sat, 14:30)
Term Project/Paper
2~3 programming assignments (Chap 2)
Database Systems, Atzeni, et al, McGraw Hill, 2000. (Chap 6~7)
p g g g
To be announced later
Grades
2000. (Chap 6 7)
Lecture Notes
à will be posted before class at
Exams 1, 2, & 3: 20% each
Term Project/Paper: 30% total
Quizzes, Assignments, http://europa.snu.ac.kr
à username & password required
à Please use only for personal use
Q , g ,
and others: 10%
** A score of 0 in f th
y p
• any one of the exams,
• any one of your term project, term paper, or
• more than 50% of your
Advanced DB (2008-1)
y
assignments/quizzes
will result in F.
REVIEWS
REVIEWS
Data, Database ,
Data
à A formal description of
an entity, event, phenomena, or idea
that is worth recording
Database
à An integrated collection of
à persistent data
à representing the information of interest
à representing the information of interest
à for various programs that compose the computerized information system of an organization.
à Data are separated from the programs that use them
Advanced DB (2008-1)
DBMS
Database Management System g y
à Collection of interrelated data and
à a set of programs to access those data
Information System
à DB + DBMS + Application programs + utilities pp p g
File System
à Part of OS Part of OS
à Stores programs, data, documents, or anything
à (in disk) (in disk)
Data Independence p
ability to modify a schema in one level without affecting a y y g schema definition in the next higher level
physical data independence: physical data independence:
à physical level - conceptual level
logical data independence: logical data independence:
à conceptual level - view level
View 1 ‥‥ View n
Customer Account
Conceptual Level
Customer Account
i t t bl ?
Physical Level
pointer or table?
Advanced DB (2008-1)
Instances and Schemas
Similar to types and variables in programming languages yp p g g g g
Schema – the logical structure of the database
à e.g., the database consists of information about a set of customers and accounts g , and the relationship between them)
à Analogous to type information of a variable in a program
à Physical schema: database design at the physical level
à Logical schema: database design at the logical level
I h l f h d b i l
Instance – the actual content of the database at a particular point in time
A l h l f i bl
à Analogous to the value of a variable
Data Models
The underlying structure of a database y g
Collection of conceptual tools for describing
à data data
à data relationships
à data semantics
à consistency constraints
Entity Relationship Model Entity Relationship Model
Relational Model
Obj O i d M d l
Object Oriented Model
Advanced DB (2008-1)
Database Languages
Data Definition Language (DDL)
g g
g g ( )
à Specifies the DB Schema
create table
drop column
Data Manipulation Language (DML)
à Operate on the contents of the DB
retrieve, insert, delete, change, etc.
Query
à a statement requesting the retrieval of information
à query language: part of DML
à data model dependent
Storage Management g g
DBMS must effectively and efficiently manage storage (disk) space
Storage manager Storage manager
à a program module that
à provides the interface between
à the low-level data stored in the database and
à th ppli ti n pr r m nd q ri bmitt d t th t m
à the application programs and queries submitted to the system
Physical Storage
à Physical storage media hierarchy
à Physical storage media hierarchy
à RAID
à Storage Access g
à File Organization
à Storage Structures for Object-Oriented Databases
Advanced DB (2008-1)
DB users
DBA (DB Administrator)
à schema definition
à storage structure, access method definition h & h i l i i
à schema & physical organization
à security & authorization
à backup and recovery backup and recovery
Application programmers
Sophisticated users Sophisticated users
à use DML
Naïve users Naïve users
à Use interfaces provided by application programs
Overall Application interface
Application programs
query Database
scheme
Overall
System DML
il
DDL i t t
Structure
Application programs
Embedded DML precompiler
compiler interpreter
Query object code
Query evaluation
engine
processor
Transaction Buffer
Transaction manager
Buffer manager
Storage File
manager
manager
Indices Statistical data
Advanced DB (2008-1)
Data files
Data dictionary
Entity Relationship Model
entity
y p
y
à entity set vs. entity (instance)
à weak entity sets
relationship
à relationship cardinality p y
à binary, ternary, n-ary
attribute attribute
à multivalued attributes, derived attributes
generalization/specialization
generalization/specialization
à total-partial, exclusive-overlap
i
aggregation
ER Diagram g
Advanced DB (2008-1)
Relational Model
E. F. Codd, "a Relational Model of Data for Large Shared Data g Banks," Communications of the ACM, June 1970, pp.377-387.
Uses a single structure called relation Uses a single structure called relation
Set (& math) oriented model
Ph i l d i d d
Physical data independence
Definition of a relation R:
Let D 1 , ..., D n be domains, then R ⊂ D 1 , X ….. X D n
R = { <d 1 , ..., d n > | d 1 ∈ D 1 , ..., d n ∈ D n }
Relations and Tables
a tuple in a relation represents relationship among set of values p p p g
Implemented as tables
Relation R = { <a 1 , b 1 , c 1 > <a 2 , b 2 , c 2 > <a , b , c > } Relation R { <a 1 , b 1 , c 1 >, <a 2 , b 2 , c 2 >, , <a n , b n , c n > }
=> Table R
Name Address Telephone
column (field, attribute)
row (record, tuple)
HS Kim Suwon 323-3232
KS Lee Busan 323-5454
Name Address Telephone
ow ( eco d, p e)
KS Lee Busan 323 5454
MH Choi Seoul 553-3235
KH Na Yongin 545-5488
…
Advanced DB (2008-1)
Relational Database
A relational database
à a set of relations
à a collection of tables
Keys y
à superkey, candidate key, and primary key
à keys are constraints on allowable relation instances for a given schema y g
Relational Algebra
Query language
g
Q y g g
à Network, Hierarchical : navigational language
à Relational
relational algebra
relational calculus
SQL
QUEL
à Algebra : operators and operands
à Relational algebra
operands : relations
operators : fundamental operators + additional operators
Advanced DB (2008-1)
Formal Definition
A basic expression in the relational algebra consists of either one of the following:
à A relation in the database
A l
à A constant relation
Let E 1 and E 2 be relational-algebra expressions; the following are ll l ti l l b p i
all relational-algebra expressions:
à E 1 ∪ E 2
à E 1 – E 2
à E 1 x E 2
à σ p (E 1 ), P is a predicate on attributes in E 1
à ∏ s (E 1 ), S is a list consisting of some of the attributes in E 1
ρ (E ), N is the new name for the result of E
Additional Operations p
We define additional operations that do not add any power to the p y p relational algebra, but that simplify common queries.
Set intersection
Natural join Natural join
Division A i
Assignment
Advanced DB (2008-1)
SQL Q
Structured Query Language Structured Query Language
IBM's System R project : Sequel
RC-based: SQL is declarative
D DD
DML & DDL
à SELECT / UNION / INTERSECT / EXCEPT
à INSERT / DELETE / UPDATE
à CREATE / DROP / ADD
à CREATE / DROP / ADD
Integrity Constraints g y
Integrity Constraints (IC) are rules that the data in the DB must abide by
IC defines the semantics of the DB
Domain Constraints
à restricts the values of a column
Referential Integrity
à Foreign Key Constraint
à Let r1(R1), r2(R2) be relations with primary keys k1 & k2, respectively. α⊆R2 is a foreign key referencing k1 if it is required that for every t2∈r2 there must be a foreign key referencing k1 if it is required that for every t2∈r2, there must be a tuple t1∈r1 such that t1[k1] = t2[α]
à Π α (r2) ⊆ Π k1 (r2)
Advanced DB (2008-1)
Integrity Constraints g y
Assertion
à a general constraint expressed as ∀x P(x)
but in SQL as ∃x (¬P(x)) Q ( ( ))
CREATE ASSERTION sum-constraint CHECK (NOT EXISTS (SELECT * FROM branch
WHERE (SELECT SUM( t) FROM l WHERE (SELECT SUM(amount) FROM loan
WHERE loan.b_name=branch.b_name)
>= (SELECT SUM(amount) FROM account WHERE account.b_name=branch.b_name)))
Trigger
à Action tied to a DB event (insert/delete/update)
DEFINE TRIGGER overdraft ON UPDATE OF account T b
(IF NEW T.balance < 0
Functional Dependencies p
Basic Concept
R: relation scheme. Let α⊆R, β⊆ R Functional dependency α→β holds on R,
if i l l l i (R) f ll i f l 1 d 2 i
if in any legal relation r(R), for all pairs of tuples t1 and t2 in r, if t1[α]=t2[α] then t1[β]=t2[β]
Keys
Keys
Trivial FD
I f R l
Inference Rules
à Reflexivity, Transitivity, Augmentation
Closure & Cover
Closure & Cover
Advanced DB (2008-1)
Enforcing IC g
DB states
à DB state = DB instance
à DB is in a valid state if it satisfies all Integrity constraints
à All assertions are tested when created
à Modification to db is only allowed if it does not cause violation
DB Update DB
state 1 state 2
t1
Object-Oriented Data Model j
Object Structure Object Structure
à an object corresponds to an entity in ER model
à an object encapsulates data and behavior j p
à all interactions to an object are made via messages.
An object has
à variables : contain data for object (attribute in ER model)
à messages : interaction with rest of the world
à methods : procedures that implement the messages
Advanced DB (2008-1)
Classes & Inheritance
Object Class
à a group of similar objects
à same messages, methods, variables
⇒ l bj (i ) i i (i )
⇒ class : object (instance) = entity set : entity (instance)
Inheritance
Person
Employee Customer
Officer Teller Secretary
Complex (Composite) Objects
à Objects that contain other objects
à Aggregation (containment) hierarch
à Aggregation (containment) hierarchy
Object Identity & Persistence
Object Identity
j y
j y
à identity in relational model - key (by value)
à identity in file system - file name (by name)
à identity in OO system - file generated id (built-in)
An object retains its identity even if some or all variables or definitions of
h d h i
methods change over time
Storage and access of persistent objects
à How do you store methods?
à How do you find objects in a database?
H d i bj ?
à How do you store a composite object?
Advanced DB (2008-1)
Object-Relational Database j
Extended relational model to support pp
à Nested relations
à Complex types
à Object orientation
Most commercial DBMS claim to be OR
à Oracle, DB2, Informix, …
Nested Relation
Relational model
à First Normal Form: all attributes have atomic domains
Nested relational model Nested relational model
à Domains may be atomic or relation valued
tuple (complex structure) p ( p )
set (multiset)
list (special set)
Advanced DB (2008-1)
Querying with Complex Types
Q y g p yp
Relation valued attributes
à An expression evaluating to a relation
à can appear anywhere that a relation name my appear
Select B.name, Y.name
from docs as B B author list as Y from docs as B, B.author_list as Y
Path expression (dot notation) is used
à composite attributes / references
create table phd_students
( d i f( l )) d l
(advisor ref(people)) under people
select phd_students.advisor.name
from phd_students
OO vs OR
OR
à declarative and limited power of (extended) SQL (compared to PL)
à data protection and good optimization
d h R l d l k d li d i i
à extends the Rel. model to make modeling and querying easier
OO
à ffi i nt in mpl m in m m p r ti n f p r i t nt d t
à efficient in complex main mem. operations of persistant data
à susceptible to data corruption
Relational Relational
à simple data types, good query language, high protection
Advanced DB (2008-1)
Indexingg
Ordered Indices
B+-Tree Index Files
B Tree Index Files
B-Tree Index Files
Static Hashing
Dynamic Hashing
Comparison of Ordered Indexing and Hashing p g g
Index Definition in SQL
Multiple Key Access
Multiple-Key Access
Query Processing & Optimization
Q y g p
Measures of Query Cost Q y
Evaluation of individual operations
à Select sort join Select, sort, join
Evaluation of Expressions
E i l f i
Equivalence of expressions
Cost based optimization
Advanced DB (2008-1)
Transactions
Transaction
à a collection of operations that performs a single logical function in a database application
à programmer is responsible for writing “correct” transactions
DBMS must ensure the atomicity and durability of each transaction (at least)
à Atomicity: all-or-nothing
à Consistency: should not introduce inconsistencies
à Isolation: should not interfere with each other
à Durability: effect should be persistent
Concurrency Control & Recovery y y
Serializabilityy
Lock-based protocols
Time stamp based protocols
Time-stamp based protocols
Multiple granularity
Log-Based Recovery
Shadow Paging g g
Advanced DB (2008-1)