Chapter 1. SQL

Table of Contents
1.1. The Relational Data Model
1.2. Relational Data Model Formalities
1.3. Operations in the Relational Data Model
1.3.1. Relational Algebra
1.3.2. Relational Calculus
1.3.3. Tuple Relational Calculus
1.3.4. Relational Algebra vs. Relational Calculus
1.4. The SQL Language
1.4.1. Select
1.4.2. Data Definition
1.4.3. Data Manipulation
1.4.4. System Catalogs
1.4.5. Embedded SQL

This chapter introduces the mathematical concepts behind relational databases. It is not required reading, so if you bog down or want to get straight to some simple examples feel free to jump ahead to the next chapter and come back when you have more time and patience. This stuff is supposed to be fun!

This material originally appeared as a part of Stefan Simkovics' Master's Thesis (Simkovics, 1998).

SQL has become the most popular relational query language. The name "SQL" is an abbreviation for Structured Query Language. In 1974 Donald Chamberlin and others defined the language SEQUEL (Structured English Query Language) at IBM Research. This language was first implemented in an IBM prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version of SEQUEL called SEQUEL/2 was defined and the name was changed to SQL subsequently.

A new prototype called System R was developed by IBM in 1977. System R implemented a large subset of SEQUEL/2 (now SQL) and a number of changes were made to SQL during the project. System R was installed in a number of user sites, both internal IBM sites and also some selected customer sites. Thanks to the success and acceptance of System R at those user sites IBM started to develop commercial products that implemented the SQL language based on the System R technology.

Over the next years IBM and also a number of other vendors announced SQL products such as SQL/DS (IBM), DB2 (IBM), ORACLE (Oracle Corp.), DG/SQL (Data General Corp.), and SYBASE (Sybase Inc.).

SQL is also an official standard now. In 1982 the American National Standards Institute (ANSI) chartered its Database Committee X3H2 to develop a proposal for a standard relational language. This proposal was ratified in 1986 and consisted essentially of the IBM dialect of SQL. In 1987 this ANSI standard was also accepted as an international standard by the International Organization for Standardization (ISO). This original standard version of SQL is often referred to, informally, as "SQL/86". In 1989 the original standard was extended and this new standard is often, again informally, referred to as "SQL/89". Also in 1989, a related standard called Database Language Embedded SQL (ESQL) was developed.

The ISO and ANSI committees have been working for many years on the definition of a greatly expanded version of the original standard, referred to informally as SQL2 or SQL/92. This version became a ratified standard - "International Standard ISO/IEC 9075:1992, Database Language SQL" - in late 1992. SQL/92 is the version normally meant when people refer to "the SQL standard". A detailed description of SQL/92 is given in Date and Darwen, 1997. At the time of writing this document a new standard informally referred to as SQL3 is under development. It is planned to make SQL a Turing-complete language, i.e. all computable queries (e.g. recursive queries) will be possible. This is a very complex task and therefore the completion of the new standard can not be expected before 1999.

1.1. The Relational Data Model

As mentioned before, SQL is a relational language. That means it is based on the relational data model first published by E.F. Codd in 1970. We will give a formal description of the relational model later (in Relational Data Model Formalities) but first we want to have a look at it from a more intuitive point of view.

A relational database is a database that is perceived by its users as a collection of tables (and nothing else but tables). A table consists of rows and columns where each row represents a record and each column represents an attribute of the records contained in the table. The Suppliers and Parts Database shows an example of a database consisting of three tables:

Example 1-1. The Suppliers and Parts Database

SUPPLIER:                   SELLS:
 SNO |  SNAME  |  CITY       SNO | PNO
----+---------+--------     -----+-----
 1  |  Smith  | London        1  |  1
 2  |  Jones  | Paris         1  |  2
 3  |  Adams  | Vienna        2  |  4
 4  |  Blake  | Rome          3  |  1
                              3  |  3
                              4  |  2
PART:                         4  |  3
 PNO |  PNAME  |  PRICE       4  |  4
----+---------+---------
 1  |  Screw  |   10
 2  |  Nut    |    8
 3  |  Bolt   |   15
 4  |  Cam    |   25
     

The tables PART and SUPPLIER may be regarded as entities and SELLS may be regarded as a relationship between a particular part and a particular supplier.

As we will see later, SQL operates on tables like the ones just defined but before that we will study the theory of the relational model.