Hashing

Tom Kelliher, CS23

May 8, 1996

The general idea of hashing:

Hash Functions

A hash function, h(), takes an arbitrary integer, j, and maps it into an integer in a specified range, usually 0 to p-1, where p is prime.
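
A minimal sketch of one such function, the division method (remainder mod p). The table size p = 13 is an assumption chosen only for illustration:

```python
P = 13  # table size; a prime, assumed here only for illustration


def h(j: int, p: int = P) -> int:
    """Map an arbitrary integer j into the range 0..p-1."""
    # Python's % returns a non-negative result for a positive modulus,
    # so negative keys also land in 0..p-1.
    return j % p


print([h(j) for j in (0, 13, 27, -1, 1996)])  # [0, 0, 1, 12, 7]
```

Note that 0 and 13 map to the same position: a collision.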

h() should have these properties:

Sampler of h()'s (always possible to convert a key to an integer or integers):
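
As one illustration of converting a non-integer key to an integer, a string can be folded character by character with Horner's rule and then reduced mod p. This is an assumed example, not necessarily one of the functions from the lecture; the names `key_to_int` and `hash_string`, the base 31, and p = 101 are all hypothetical choices:

```python
def key_to_int(key: str, base: int = 31) -> int:
    """Fold the characters of key into a single integer (Horner's rule)."""
    value = 0
    for ch in key:
        value = value * base + ord(ch)
    return value


def hash_string(key: str, p: int = 101) -> int:
    """Hash a string into 0..p-1 by converting to an integer, then taking mod p."""
    return key_to_int(key) % p


print(key_to_int("ab"))  # 97 * 31 + 98 = 3105
print(hash_string("hashing"))
```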

Hash table is an array of data holders.
h() gives first possible position for key (no collision).
Probe sequence used on collisions.
Table element states:
1. Never used.
2. In use.
3. Previously used ("plug" holes in probe sequence left by deletes).
Clustering.
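
The three element states can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: integer keys, h(key) = key mod table size, linear probing as the probe sequence, and hypothetical names `State` and `ProbedTable`.

```python
from enum import Enum


class State(Enum):
    NEVER_USED = 1
    IN_USE = 2
    PREVIOUSLY_USED = 3  # left behind by a delete


class ProbedTable:
    def __init__(self, size: int = 7):
        self.size = size
        self.state = [State.NEVER_USED] * size
        self.keys = [None] * size

    def _probe(self, key: int):
        """Yield the probe sequence for key: h(key), h(key)+1, ... (mod size)."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def find(self, key: int):
        """Return the position of key, or None if it is absent."""
        for pos in self._probe(key):
            if self.state[pos] is State.NEVER_USED:
                return None  # the probe sequence ends here
            if self.state[pos] is State.IN_USE and self.keys[pos] == key:
                return pos
            # PREVIOUSLY_USED: keep probing past the plugged hole
        return None

    def insert(self, key: int) -> bool:
        if self.find(key) is not None:
            return True  # already present
        for pos in self._probe(key):
            if self.state[pos] is not State.IN_USE:
                self.state[pos] = State.IN_USE
                self.keys[pos] = key
                return True
        return False  # table is full

    def delete(self, key: int) -> bool:
        pos = self.find(key)
        if pos is None:
            return False
        self.state[pos] = State.PREVIOUSLY_USED
        self.keys[pos] = None
        return True


t = ProbedTable(7)
for k in (3, 10, 17):  # all three hash to position 3
    t.insert(k)
t.delete(10)           # position 4 becomes PREVIOUSLY_USED
print(t.find(17))      # 5: the search continues past position 4
```

If delete reset the element to "never used" instead, the search for 17 would stop at position 4 and wrongly report the key as missing.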

Some probe sequences:

Linear probing.
Quadratic probing.
Double hashing: one hash function gives the initial probe position, a second gives the increment. The table size and the increment should be relatively prime.
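
A sketch comparing the three probe sequences for a single key. Assumptions for illustration only: table size m = 7, first hash key mod m, and a second hash 1 + key mod (m-1), which is one common choice and not necessarily the one used in lecture.

```python
def linear(h0: int, i: int, m: int) -> int:
    """i-th probe position: step by 1 from the initial position."""
    return (h0 + i) % m


def quadratic(h0: int, i: int, m: int) -> int:
    """i-th probe position: offset by i squared from the initial position."""
    return (h0 + i * i) % m


def double_hash(key: int, i: int, m: int) -> int:
    """i-th probe position: step by a key-dependent increment."""
    h0 = key % m
    step = 1 + key % (m - 1)  # in 1..m-1, so never zero
    return (h0 + i * step) % m


m, key = 7, 10
h0 = key % m  # 3
print([linear(h0, i, m) for i in range(m)])        # [3, 4, 5, 6, 0, 1, 2]
print([quadratic(h0, i, m) for i in range(m)])     # [3, 4, 0, 5, 5, 0, 4]
print([double_hash(key, i, m) for i in range(m)])  # [3, 1, 6, 4, 2, 0, 5]
```

With m prime, every increment from 1 to m-1 is relatively prime to m, so the double hashing sequence visits every position. With m = 8 and an increment of 4, the sequence from position 3 would only alternate between 3 and 7. The quadratic sequence above also repeats positions before covering the table.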

(Separate chaining.)

Hash table is an array of pointers (an array of linked lists).
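
A minimal sketch of separate chaining, assuming integer keys and h(key) = key mod table size. The names `Node` and `ChainedTable` are hypothetical; each array element holds the head of a singly linked list.

```python
class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedTable:
    def __init__(self, size: int = 7):
        self.size = size
        self.heads = [None] * size  # one list head per table position

    def contains(self, key: int) -> bool:
        node = self.heads[key % self.size]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def insert(self, key: int) -> None:
        if not self.contains(key):
            slot = key % self.size
            # Push onto the front of the chain for this position.
            self.heads[slot] = Node(key, self.heads[slot])

    def delete(self, key: int) -> bool:
        slot = key % self.size
        prev, node = None, self.heads[slot]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.heads[slot] = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False


t = ChainedTable(7)
for k in (3, 10, 17):  # all three collide at position 3
    t.insert(k)
t.delete(10)
print(t.contains(17), t.contains(10))  # True False
```

Colliding keys simply share a chain, so no probe sequence and no "previously used" state are needed.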

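Hash tables have excellent performance on average: expected constant time for insert, delete, and retrieve when the table is not too full.
Use a balanced search tree when performance must be guaranteed (O(log n) in the worst case).
Time-wise, separate chaining is superior to linear probing, quadratic probing, or double hashing.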
Pointer overhead involved with chaining.

Hashing randomizes the keys.
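
A small illustration of what randomizing means in practice: keys inserted in sorted order end up scattered across the table, so walking the table does not return them in key order. The keys and the table size 7 are assumptions chosen so that no collisions occur.

```python
P = 7  # table size, assumed for illustration
keys = [10, 20, 30, 40, 50]

table = [None] * P
for k in keys:
    table[k % P] = k  # no collisions for these particular keys

in_table_order = [k for k in table if k is not None]
print(in_table_order)          # [50, 30, 10, 40, 20]
print(sorted(in_table_order))  # [10, 20, 30, 40, 50], requires an explicit sort
```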

Poor performance for retrieving:

Thomas P. Kelliher
Tue May 7 13:13:00 EDT 1996