Sorting on a Linear Array, Algorithm Assessment, Simulating Large Arrays

Tom Kelliher, CS 315

Jan. 27, 1999

Administrivia

Loop invariants, linear arrays, sorting on linear arrays.

Bit model, lower bounds.

A linear array:

Cell properties:

Communication properties:

Phase 1: sorting program.
1. Accept input from left neighbor.
2. Compare to stored value.
3. Output larger value to right neighbor.
4. Store smaller value.
Phase 2: output. A cell begins passing its stored values to the left once it receives no more input from its left neighbor.

Example: sort 12, 19, 6, 15.
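A minimal sketch of phase 1, for tracing the example by hand. Assumptions not in the notes: one cell per input value, one input fed to the leftmost cell per step, a value sent right arrives on the next step, and an empty cell simply stores what it receives. The name linear_array_sort is illustrative. The step count it reports holds only under this timing convention; phase 2 (output) is not modeled.

```python
def linear_array_sort(values):
    """Simulate phase 1 on a linear array with one cell per input value.

    Returns (stored, steps): the value left in each cell, left to right,
    and the number of synchronous steps until nothing is in transit.
    """
    n = len(values)
    stored = [None] * n        # value held by each cell (None = empty)
    arriving = [None] * n      # arriving[i]: value reaching cell i this step
    pending = list(values)     # inputs not yet fed to the leftmost cell
    steps = 0
    while pending or any(v is not None for v in arriving):
        if pending:
            arriving[0] = pending.pop(0)   # feed one input per step
        sent = [None] * (n + 1)            # sent[i + 1]: output of cell i
        for i, v in enumerate(arriving):
            if v is None:
                continue
            if stored[i] is None:
                stored[i] = v              # assumption: empty cell just stores
            else:
                # keep the smaller value, pass the larger one to the right
                stored[i], sent[i + 1] = min(stored[i], v), max(stored[i], v)
        arriving = sent[:n]
        steps += 1
    return stored, steps


print(linear_array_sort([12, 19, 6, 15]))   # ([6, 12, 15, 19], 7)
```

After the run the cells hold the values in ascending order from left to right, so passing them left in phase 2 emits the smallest value first.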

How many steps to sort? How many steps to output?

How can we measure a parallel algorithm? Apply the following measures to sorting on the linear array.

Run time: T.
Number of processors: N.
Speedup: Let T_s be the running time of the fastest sequential algorithm. Then: S = T_s / T.
What is linear speedup?
Say you have P processors. Speedup is at most P. Why? (Hint: a parallel algorithm that runs in T steps on P processors can be simulated on a sequential machine in how many steps?)
Work: W = TP.
Measures algorithm inefficiencies due to idle processors.
Efficiency of processor utilization: E = T_s / W = T_s / (TP), with T_s the fastest sequential running time.
Is it always possible to both minimize T and maximize E?
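The measures above can be sketched as small functions, using the usual definitions: speedup is sequential time over parallel time, and efficiency is sequential time over work. The names t_seq, speedup, work and efficiency are illustrative. The numbers in the usage lines are assumptions for the linear-array sort (about N log2 N steps sequentially, about 2N steps on N cells), not figures from the notes.

```python
def work(t, p):
    """Work W = T * P: total processor-steps, idle ones included."""
    return t * p


def speedup(t_seq, t):
    """Speedup: fastest sequential running time over parallel running time."""
    return t_seq / t


def efficiency(t_seq, t, p):
    """Efficiency: fraction of the work that a sequential run would need."""
    return t_seq / work(t, p)


# Assumed figures for sorting N = 1024 numbers on N cells.
n = 1024
t_seq = n * 10          # N * log2(N), with log2(1024) = 10
t_par = 2 * n           # assumed parallel running time
print(speedup(t_seq, t_par))         # 5.0
print(efficiency(t_seq, t_par, n))   # 0.0048828125
```

Under these assumed figures the speedup is far below N, which is the kind of inefficiency W is meant to expose.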
Example: consider two algorithms for solving a problem of size M:
1. M steps on an M processor machine.
2. steps on an processor machine.
If we have an M^2 processor machine and need to run the algorithm X times, which algorithm should we use? Assume the M^2 processor machine may be split into M machines of M processors each.
(Recall: W = TP.)
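A rough way to compare the two algorithms on repeated runs, assuming the large machine has M * M processors (as the split into M machines of M processors suggests). Algorithm 2's step count is left as a parameter, t2; the value used in the usage lines, sqrt(M), is purely an assumption for illustration. Function names are hypothetical.

```python
import math


def total_steps_alg1(m, x):
    """X runs of the M-step algorithm, M instances at a time.

    The M*M-processor machine is split into M independent M-processor
    machines, so each batch of up to M runs costs M steps.
    """
    batches = math.ceil(x / m)
    return batches * m


def total_steps_alg2(t2, x):
    """X runs of algorithm 2, one after another on the whole machine."""
    return x * t2


m, x = 16, 64
t2 = math.isqrt(m)                 # assumed step count for algorithm 2
print(total_steps_alg1(m, x))      # 64
print(total_steps_alg2(t2, x))     # 256
```

With these assumed numbers the slower but more work-efficient algorithm finishes all X runs sooner, because no processors are wasted.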

Let's say we want to solve a problem of size N with P processors.

We generally assume that P = N. Is this valid? More later.
Suppose we have P < N fine-grain processors in a linear array. Can we sort N numbers?
Fine-grain vs. coarse-grain processing.
Assuming appropriate granularity, here's how we simulate 12 processors on 3 processors:

What about simulating N processors on P processors, where P < N?
What slowdown do we encounter?
Is the scaled-down algorithm less efficient?
So, why do we assume, in general, that P = N?
Implications:
1. Given an algorithm efficient for large P, we can construct one efficient for small P.
2. What about the converse?
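A sketch of the scaling-down idea, under stated assumptions: each of the P physical processors hosts a contiguous block of ceil(N/P) virtual cells and executes their steps one after another, so one step of the N-cell algorithm costs ceil(N/P) physical steps. The contiguous assignment and the names assign_cells and simulated_cost are assumptions made here, and local memory is assumed large enough to hold a block (coarse-grain processors).

```python
import math


def assign_cells(n, p):
    """Map each of n virtual cells to one of p physical processors."""
    block = math.ceil(n / p)               # virtual cells per processor
    return [cell // block for cell in range(n)]


def simulated_cost(t, n, p):
    """Running time and work when a T-step, N-cell algorithm runs on P cells."""
    block = math.ceil(n / p)
    t_scaled = t * block                   # slowdown factor is ceil(N / P)
    return t_scaled, t_scaled * p


print(assign_cells(12, 3))        # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(simulated_cost(23, 12, 3))  # (92, 276)
```

In the usage line the work on 3 processors, 276, equals 23 * 12, the work of the original 12-cell run: the slowdown is N/P but the work does not grow when P divides N.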
