**Tom Kelliher, CS23**

**May 8, 1996**

- Quiz.

The general idea of hashing:

- Data is categorized by a search key.
- Data is kept in a table (open or closed).
- Number of search keys
`>>`

table size. - Amount of active data
`<`

table size. - Hash function evenly distributes data over table.
- Insert, retrieve, delete require O(1) time.
- Ordered operations usually slow.

A hash function, h(), takes an arbitrary integer, j, and maps into an integer in a specified range, usually 0 to p-1 where p is prime.

h() should have these properties:

- Easy and fast to compute.
- Place data evenly throughout the table.

- How well does h() scatter random keys?
- How well does h() scatter non-random key?
- Perfect hash function --- possible if you know all keys in advance.
- Otherwise, we have collisions. How can we prove this?
- Collision resolution.

Sampler of h()'s
(* Always possible to convert a key to an integer/integers*):

- Selecting digits --- which digits?
- Folding --- add digits.
- Modulo Arithmetic.

- Hash table is an array of data holders.
- h() gives first possible position for key (no collision).
- Probe sequence used on collisions.
- Table element states:
- Never used.
- In use.
- Previously used (``plug'' holes in probe sequence left by deletes).

- Clustering.

Some probe sequences:

- Linear probing.
- Quadratic probing.
- Double hashing --- initial probe position, increment. Table size,
increment should be
*relatively prime*.

(Separate chaining.)

Hash table is array of pointers (array of linked lists.)

- Hash tables have excellent performance
*on average*. - Use a balanced search tree when performance must be guaranteed.
- Time wise, separate chaining superior to linear probing, quadratic probing, or double hashing.
- Pointer overhead involved with chaining.

Hashing randomizes the keys.

Poor performance for retrieving:

- keys in sorted order.
- min, max key.
- keys over a range.

Tue May 7 13:13:00 EDT 1996