**Tom Kelliher, CS 318**

**Apr. 19, 2002**

Read 13.1--3.

PL/pgSQL lab.

- What is indexing?
- Terminology.
- Multilevel indexing.

Query processing.

Consider the Transcript schema: StudId, CrsCode, Semester, Grade. How should this be sorted to find:

- A particular student's transcript.
- A roster for a course.
- Number of students enrolled in a semester.

Questions:

- How do we decide how much performance is necessary?
- How do we achieve increased performance?
- What is the cost of such performance?

General idea:

- Table is sorted on one (or a set) of attributes.
- Indices sorted on other attributes may be maintained --- increasing
performance on queries referencing those attributes.

Consider Big State University's (BSU) Transcript table:

- 30,000 students, five courses per semester, 10 year's history
maintained.
- 10 records stored per disk block.
- Transcript sorted on StudId.
- How many page accesses needed to determine
- Number of courses student 123456789 has taken.
- Number of students enrolled in CS 436, spring 2002.
Repeat, assuming an index on (CrsCode, Semester). Costs of this index?

- Number of courses student 123456789 has taken.

- Search key.
- Index integrated with data; separate index file (key values and
rids).
- Clustered vs. unclustered indices.
Is an integrated index clustered on unclustered? What about a separate index? How many clustered indices may be maintained on a table?

- Inverted, fully inverted tables.
- Sparse, dense indices.
Sparse indices must be clustered.

Illustration of a sparse index:

Advantages? Disadvantages? Anomalies?

- Consider the following index on Transcript: (Semester, CrsCode).
- How will it help with these queries:
- Ids of all students who took CS 318 S2000.
- Ids of all students who took CS 318 any semester.
- Courses taken in F2001.
How will it

*really*help with this query?

- Ids of all students who took CS 318 S2000.

- What do we do if the index file is large?
- If indexing is good, why not index the index?
- Two level indexing: Use a sparse index on the index.
- Extend to
**n**levels. - Terminology:
- Leaf level: If integrated, contains row data. Otherwise,
contains key values, rids.
- Separator levels: contain pointers to next level index entries.
- Index level: separator levels
*or*leaf level. - Fan-out: number of separator entries in a page.

- Leaf level: If integrated, contains row data. Otherwise,
contains key values, rids.

- Multilevel indexing schemes.
- ISAM: Static separator structure, use of overflow buckets. Index may
become unbalanced.
- trees: Dynamic separator structure. Always balanced.
How much work is involved in maintaining balance?

Tue Apr 16 17:14:40 EDT 2002