Indexing Schemes
Tom Kelliher, CS 318
Apr. 19, 2002
Read 13.1--3.
PL/pgSQL lab.
- What is indexing?
- Terminology.
- Multilevel indexing.
Query processing.
Consider the Transcript schema: StudId, CrsCode, Semester, Grade. How
should this be sorted to find:
- A particular student's transcript.
- A roster for a course.
- Number of students enrolled in a semester.
Individually, collectively?
Questions:
- How do we decide how much performance is necessary?
- How do we achieve increased performance?
- What is the cost of such performance?
General idea:
- Table is sorted on one (or a set) of attributes.
- Indices sorted on other attributes may be maintained --- increasing
performance on queries referencing those attributes.
Consider Big State University's (BSU) Transcript table:
- 30,000 students, five courses per semester, 10 year's history
maintained.
- 10 records stored per disk block.
- Transcript sorted on StudId.
- How many page accesses needed to determine
- Number of courses student 123456789 has taken.
- Number of students enrolled in CS 436, spring 2002.
Repeat, assuming an index on (CrsCode, Semester). Costs of this index?
- Search key.
- Index integrated with data; separate index file (key values and
rids).
- Clustered vs. unclustered indices.
Is an integrated index clustered on unclustered? What about a separate
index? How many clustered indices may be maintained on a table?
- Inverted, fully inverted tables.
- Sparse, dense indices.
Sparse indices must be clustered.
Illustration of a sparse index:
Advantages? Disadvantages? Anomalies?
- Consider the following index on Transcript: (Semester, CrsCode).
- How will it help with these queries:
- Ids of all students who took CS 318 S2000.
- Ids of all students who took CS 318 any semester.
- Courses taken in F2001.
How will it really help with this query?
- What do we do if the index file is large?
- If indexing is good, why not index the index?
- Two level indexing: Use a sparse index on the index.
- Extend to n levels.
- Terminology:
- Leaf level: If integrated, contains row data. Otherwise,
contains key values, rids.
- Separator levels: contain pointers to next level index entries.
- Index level: separator levels or leaf level.
- Fan-out: number of separator entries in a page.
- Multilevel indexing schemes.
- ISAM: Static separator structure, use of overflow buckets. Index may
become unbalanced.
- trees: Dynamic separator structure. Always balanced.
How much work is involved in maintaining balance?
Thomas P. Kelliher
Tue Apr 16 17:14:40 EDT 2002
Tom Kelliher