| Introduction
Databases
and database technology are having a major impact on the growing use of
computers.
A database
system is basically just a computerized record-keeping system.
The database itself can be regarded as a kind of electronic filing cabinet;
i.e., it is a repository or container for a collection of computerized
data files.
Users of
the system can perform a variety of operations on such files—for example:
• Adding
new, empty files to the database;
• Inserting
data into existing files;
• Retrieving
data from existing files;
• Changing
data in existing files;
• Deleting
data from existing files;
• Removing
existing files from the database.
It is fair
to say that databases play a critical role in almost all areas where computers
are used, including business, engineering, medicine, law, education, and
library science, to name a few.
Figure 1: Example of Database
Figure 2: Retrieval example
Figure 3: Insert/ Change/ Delete examples
A database
has the following implicit properties:
• A database
is a logically coherent collection of data with some inherent meaning.
A random assortment of data cannot correctly be referred to as a database.
• A database
has some source from which data are derived, some degree of interaction
with events in the real world, and an audience that is actively interested
in the contents of the database.
• A database
can be of any size and of varying complexity. For example, the list of
names and addresses referred to earlier may consist of only a few hundred
records, each with a simple structure.
• On the
other hand, the card catalog of a large library may contain half a million
cards stored under different categories—by primary author’s last name,
by subject, by book title—with each category organized in alphabetic order.
• A database
may be generated and maintained manually or it may be computerized.
The library card catalog is an example of a database that may be created
and maintained manually.
• A computerized
database may be created and maintained either by a group of application
programs written specifically for that task or by a database management
system.
• A database
management system (DBMS) is a collection of programs that enables users
to create and maintain a database.
• The DBMS
is hence a general-purpose software system that facilitates the processes
of defining, constructing, and manipulating databases for various applications.
• Defining
a database involves specifying the data types, structures, and constraints
for the data to be stored in the database.
• Constructing
the database is the process of storing the data itself on some storage
medium that is controlled by the DBMS.
• Manipulating
a database includes such functions as querying the database to retrieve
specific data, updating the database to reflect changes in the miniworld,
and generating reports from the data.
• It is not
necessary to use general-purpose DBMS software to implement a computerized
database.
• We could
write our own set of programs to create and maintain the database, in
effect creating our own special-purpose DBMS software.
• In either
case—whether we use a general-purpose DBMS or not—we usually have to employ
a considerable amount of software to manipulate the database.
• We will
call the database and DBMS software together a database system.
Figure 4: A simplified database system environment.
Example
of database:
• A UNIVERSITY
database for maintaining information concerning students, courses, and
grades in a university environment.
• Figure
5 shows the database structure and a few sample data for such a database.
Figure 5: An example of the students database
• The database
is organized as five files, each of which stores data records of the same
type.
• To define
this database, we must specify the structure of the records of each file
by specifying the different types of data elements to be stored in each
record.
• We must
also specify a data type for each data element within a record.
• For example,
we can specify that Name of STUDENT is a string of alphabetic characters,
StudentNumber of STUDENT is an integer, and Grade of GRADE_REPORT is a
single character from the set {A, B, C, D, F, I}.
• We may
also use a coding scheme to represent a data item. For example, we represent
the Class of a STUDENT as 1 for freshman, 2 for sophomore, 3 for junior,
4 for senior, and 5 for graduate student.
• To construct
the UNIVERSITY database, we store data to represent each student, course,
section, grade report, and prerequisite as a record in the appropriate
file.
• Notice
that records in the various files may be related. For example, the record
for "Smith" in the STUDENT file is related to two records in the GRADE_REPORT
file that specify Smith’s grades in two sections.
• Similarly,
each record in the PREREQUISITE file relates two course records: one representing
the course and the other representing the prerequisite.
• Most medium-size
and large databases include many types of records and have many relationships
among the records.
• Database
manipulation involves querying and updating. Examples of queries are "retrieve
the transcript—a list of all courses and grades—of Smith"; "list the names
of students who took the section of the Database course offered in fall
1999 and their grades in that section"; and "what are the prerequisites
of the Database course?"
• Examples
of updates are "change the class of Smith to Sophomore"; "create a new
section for the Database course for this semester"; and "enter a grade
of A for Smith in the Database section of last semester."
Note: These
informal queries and updates must be specified precisely in the database
system language before they can be processed.
Characteristics
of the Database Approach
• A number
of characteristics distinguish the database approach from the traditional
approach of programming with files.
• In traditional
file processing, each user defines and implements the files needed for
a specific application as part of programming the application.
• The redundancy
in defining and storing data results in wasted storage space and in redundant
efforts to maintain common data up-to-date.
• In the
database approach, a single repository of data is maintained that is defined
once and then is accessed by various users.
• The main
characteristics of the database approach versus the file-processing approach
are the following.
Self-Describing
Nature of a Database System:
• A fundamental
characteristic of the database approach is that the database system contains
not only the database itself but also a complete definition or description
of the database structure and constraints.
• This definition
is stored in the system catalog, which contains information such as the
structure of each file, the type and storage format of each data item,
and various constraints on the data.
• The information
stored in the catalog is called meta-data, and it describes the structure
of the primary database (Figure 1).
• The catalog
is used by the DBMS software and also by database users who need information
about the database structure.
• A general
purpose DBMS software package is not written for a specific database application,
and hence it must refer to the catalog to know the structure of the files
in a specific database, such as the type and format of data it will access.
• The DBMS
software must work equally well with any number of database applications—for
example, a university database, a banking database, or a company database—as
long as the database definition is stored in the catalog.
• In traditional
file processing, data definition is typically part of the application
programs themselves. Hence, these programs are constrained to work with
only one specific database, whose structure is declared in the application
programs.
• For example,
a PASCAL program may have record structures declared in it; a C++ program
may have "struct" or "class" declarations; and a COBOL program has Data
Division statements to define its files.
• Whereas
file-processing software can access only specific databases, DBMS software
can access diverse databases by extracting the database definitions from
the catalog and then using these definitions.
• The DBMS
stores in the catalog the definitions of all the files shown. Whenever
a request is made to access, say, the Name of a STUDENT record, the DBMS
software refers to the catalog to determine the structure of the STUDENT
file and the position and size of the Name data item within a STUDENT
record.
• By contrast,
in a typical file-processing application, the file structure and, in the
extreme case, the exact location of Name within a STUDENT record are already
coded within each program that accesses this data item.
Benefits
of the Database Approach
• The data
can be shared
• Redundancy
can be reduced
• Inconsistency
can be avoided (to some extent)
• Transaction
support can be provided
• Integrity
can be maintained
• Security
can be enforced
• Conflicting
requirements can be balanced
• Standards
can be enforced
Support
of Multiple Views of the Data:
• A database
typically has many users, each of whom may require a different perspective
or view of the database.
• A view
may be a subset of the database or it may contain virtual data that is
derived from the database files but is not explicitly stored.
• Some users
may not need to be aware of whether the data they refer to is stored or
derived.
• A multiuser
DBMS whose users have a variety of applications must provide facilities
for defining multiple views.
• For example,
one user of the database of Figure 2 may be interested only in the transcript
of each student; the view for this user is shown in Figure 4(a).
• A second
user, who is interested only in checking that students have taken all
the prerequisites of each course they register for, may require the view
shown in Figure 4(b).
Figure 6
Sharing
of Data and Multiuser Transaction Processing:
• A multiuser
DBMS, as its name implies, must allow multiple users to access the database
at the same time.
• This is
essential if data for multiple applications is to be integrated and maintained
in a single database.
• The DBMS
must include concurrency control software to ensure that several users
trying to update the same data do so in a controlled manner so that the
result of the updates is correct.
• For example,
when several reservation clerks try to assign a seat on an airline flight,
the DBMS should ensure that each seat can be accessed by only one clerk
at a time for assignment to a passenger.
• These types
of applications are generally called on-line transaction processing (OLTP)
applications. A fundamental role of multiuser DBMS software is to ensure
that concurrent transactions operate correctly.
|