Physical Data Organization

Tom Kelliher, CS 318

Apr. 12, 2002

Administrivia

Announcements

Assignment

Read 11.4--5.

From Last Time

PL/pgSQL lab for some, an embarrassment for others.

Outline

  1. Disk Organization.

  2. Heap file organization.

  3. Sorted file organization.

Coming Up

Indexed organizations.

Disk Organization

  1. The organization of tables in memory won't affect results, but will greatly affect performance.

  2. Definitions:
    1. Storage structure: A particular organization of the rows of a table in a file.

    2. Index: An auxiliary data structure, possible stored in another file, permitting fast access to the rows of a table.

    3. Access Path: The particular technique used for accessing rows. The DBMS optimizer chooses the access path.

      Example: Consider a table indexed on its key. The access path for a SELECT on the key would utilize the index. The access path for an aggregate computation on the entire table might just access the pages of the table sequentially, eliminating the index overhead.

  3. Ideally, the entire database would be in main memory. Economic realities and the ACID properties dictate otherwise.

  4. Disk structure:
    1. Platter, Read/write heads. Data stored in concentric circles as magnetic flux changes. CRC codes.

    2. Sectors, blocks, pages.

      Contiguous storage. Fragmentation.

    3. Cylinder/Head/Sector (CHS) addressing. Logical Block Address (LBA) addressing.

    4. Access delay components: seek delay, rotational delay, transfer delay.

  5. Assumptions:
    1. All records of same length. Not really practical --- consider a VARCHAR field.

    2. Table occupies a file containing F pages, R records per page.

    3. Rows addressed logically by row Id (rid): page number, slot number.

Heap File Organization

  1. Features:
    1. Rows stored one after another, in no particular order.

    2. Inserts occur at the end of the file.

    3. Delete by marking slot free.

    4. Update in place. Common to all storage structures?

    5. ``Holes'' develop with deletes --- expensive compaction becomes necessary over time. Why?

  2. Operation analysis:
    1. Insert: F + 1 page transfers.

      F to perform duplicate checking, 1 to write new record.

    2. Delete: , on average if a key is involved.

      Otherwise, 2F. Why not F as in the book?

    3. Selects:
      1. Entire table, simple:
        SELECT *
        FROM TRANSCRIPT T;
        
        F transfers.

      2. Entire table, not so simple:
        SELECT *
        FROM TRANSCRIPT T
        ORDER BY T.StudId;
        
        F + N transfers.

      3. Entire table, simple:
        SELECT AVG(T.Grade)
        FROM TRANSCRIPT T;
        
        F accesses.

      4. Equality search for one record:
        SELECT T.Grade
        FROM TRANSCRIPT T
        WHERE T.StudId = '123456' AND T.CrsCode = 'CS318'
        AND T.Semester = 'S2002';
        
        transfers on average.

      5. Range search for several records:
        SELECT T.Grade
        FROM TRANSCRIPT T
        WHERE T.StudId > '100000' AND T.StudId < '200000';
        
        F transfers.

Sorted File Organization

  1. Features:
    1. Rows stored one after another, sorted on a key field or fields.

    2. Inserts must maintain sorted order.
      1. Expensive --- on average, F page transfers. Why F?

      2. Optimizations:
        1. Fillfactor --- leave some empty slots in each page.

        2. Utilize a pointer in each page to a chain of overflow pages.

          Disadvantage: pages of file are no longer contiguous.

    3. Delete by marking slot free.

  2. Operation analysis:
    1. Insert: Assuming contiguous allocation and fillfactor, page transfers.

      Derivation?

    2. Delete: Assuming contiguous allocation and the file is sorted on the relevant key, .

      Derivation?

      If it's not sorted on the relevant key?

    3. Selects:
      1. Entire table, simple:
        SELECT *
        FROM TRANSCRIPT T;
        
        Transfers?

      2. Entire table, not so simple:
        SELECT *
        FROM TRANSCRIPT T
        ORDER BY T.StudId;
        
        Transfers?

      3. Entire table, simple:
        SELECT AVG(T.Grade)
        FROM TRANSCRIPT T;
        
        Transfers?

      4. Equality seach for one record:
        SELECT T.Grade
        FROM TRANSCRIPT T
        WHERE T.StudId = '123456' AND T.CrsCode = 'CS318'
        AND T.Semester = 'S2002';
        
        Transfers?

      5. Range search for several records:
        SELECT T.Grade
        FROM TRANSCRIPT T
        WHERE T.StudId > '100000' AND T.StudId < '200000';
        
        Transfers?



Thomas P. Kelliher
Thu Apr 11 12:02:51 EDT 2002
Tom Kelliher