Sorting on a Linear Array, Algorithm Assessment, Simulating Large Arrays on Small Arrays
Tom Kelliher, CS 315
Jan. 27, 1999
Read 1.1.2, 1.1.3.
From last time: loop invariants, linear arrays, sorting on linear arrays.
Outline:
- Sorting on linear arrays, continued.
- Parallel algorithm assessment.
- Simulating large arrays on small arrays.
Coming up: bit model, lower bounds.
A linear array: processors connected in a line, each linked only to its left and right neighbors.
- Operation sequence (every processor, at each step):
- Receive inputs from neighbors.
- Read local storage.
- Perform computation indicated by local program.
- Generate output for neighbors.
- Update local storage.
- Data pulses through the network in lockstep: systolic computation.
- Phase 1: sorting program.
- Accept input from left neighbor.
- Compare to stored value.
- Output larger value to right neighbor.
- Store smaller value.
- Phase 2: output. When a processor no longer receives input from its left
neighbor, it begins passing values to the left, so the sorted values leave
the array at its left end.
Example: sort 12, 19, 6, 15.
How many steps to sort? How many steps to output?
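The two phases can be checked with a small sequential simulation of the array. This is a sketch under simplifying assumptions: one processor per input value, one input entering processor 0 per step, a value output at one step arriving at the neighbor on the next step, and a single global switch to Phase 2 once no value is in flight (a simplification of the local "no more input" trigger). The function name `systolic_sort` is hypothetical. It returns the output sequence and the number of steps each phase took under this model; note that the inner loop does one unit of sequential work per processor per parallel step.

```python
from typing import List, Optional, Tuple


def systolic_sort(values: List[int]) -> Tuple[List[int], int, int]:
    """Simulate the two-phase linear-array sort with one processor per value.

    Returns (output sequence, Phase 1 steps, Phase 2 steps) under a
    simplified timing model (global switch to Phase 2).
    """
    n = len(values)
    stored: List[Optional[int]] = [None] * n  # local storage of each processor
    wires: List[Optional[int]] = [None] * n   # wires[i]: value arriving at processor i
    pending = list(values)                    # inputs not yet fed into processor 0

    # Phase 1: feed one input per step; each processor keeps the smaller
    # value and passes the larger one to its right neighbor.
    phase1_steps = 0
    while pending or any(w is not None for w in wires):
        phase1_steps += 1
        wires[0] = pending.pop(0) if pending else None
        next_wires: List[Optional[int]] = [None] * n
        for i in range(n):
            x = wires[i]
            if x is None:
                continue                      # nothing received this step
            if stored[i] is None:
                stored[i] = x                 # first value seen: just store it
            else:
                small, large = min(stored[i], x), max(stored[i], x)
                stored[i] = small             # keep the smaller value
                # Processor i receives n - i values in total, so the last
                # processor never reaches this branch and i + 1 < n here.
                next_wires[i + 1] = large     # send the larger value right
        wires = next_wires                    # outputs arrive on the next step

    # Phase 2 (simplified): every step, processor 0 emits its value off the
    # left end and every other processor shifts its value one place left.
    output: List[int] = []
    phase2_steps = 0
    while stored and stored[0] is not None:
        phase2_steps += 1
        output.append(stored[0])
        stored = stored[1:] + [None]
    return output, phase1_steps, phase2_steps


result, t1, t2 = systolic_sort([12, 19, 6, 15])
print(result, t1, t2)  # prints [6, 12, 15, 19] 7 4
```

The step counts printed are those of this simplified model; a per-processor trigger for Phase 2 would change the Phase 2 timing, not the sorted result.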
How can we measure a parallel algorithm? Apply the following measures to sorting on the linear array.
- Run time: T.
- Number of processors: N.
- Speedup: Let $T_S$ be the running time of the fastest sequential
algorithm for the problem. The speedup is $S = T_S / T$.
What is linear speedup?
Suppose you have P processors. Speedup is at most P. Why? (Hint: a
parallel algorithm that runs in T steps on P processors can be
simulated on a sequential machine in how many steps?)
- Work: W = TP.
Work counts every processor-step, busy or idle, so comparing it with $T_S$
exposes inefficiency due to idle processors.
- Efficiency of processor utilization: $E = T_S / W = T_S / (TP) = S / P$.
- Is it always possible to minimize T and maximize E at the same time?
Example: consider two algorithms for solving a problem of size M:
- Algorithm A: M steps on an M-processor machine.
- Algorithm B: $T_B$ steps on an $M^2$-processor machine.
If we have an $M^2$-processor machine and need to run the algorithm X
times, which algorithm should we use? Assume the $M^2$-processor machine
may be split into M separate M-processor machines.
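The measures above are simple enough to compute directly. The sketch below assumes the definitions $S = T_S / T$, $W = TP$ and $E = T_S / W$; all function names are hypothetical, and the inputs in the usage lines are made-up numbers, not measurements. `time_for_runs` models the question of running a size-M problem X times on an $M^2$-processor machine, where the first algorithm takes M steps on M processors and `t_b` is a hypothetical parameter standing for the running time of the algorithm that uses all $M^2$ processors.

```python
from typing import Tuple


def speedup(t_seq: float, t_par: float) -> float:
    """S = T_S / T, with T_S the fastest sequential running time."""
    return t_seq / t_par


def work(t_par: float, p: int) -> float:
    """W = T * P: every processor-step is counted, idle or busy."""
    return t_par * p


def efficiency(t_seq: float, t_par: float, p: int) -> float:
    """E = T_S / W, which equals S / P."""
    return t_seq / work(t_par, p)


def time_for_runs(x: int, m: int, t_b: int) -> Tuple[int, int]:
    """Steps to solve X size-M problems on an M^2-processor machine.

    First algorithm: M steps on M processors, so the machine is split into
    M independent M-processor machines working on separate runs.
    Second algorithm: t_b steps using all M^2 processors, one run at a time.
    """
    rounds = (x + m - 1) // m  # ceil(X / M) rounds of M simultaneous runs
    return rounds * m, x * t_b


print(speedup(100, 10), efficiency(100, 10, 20))  # prints 10.0 0.5
print(time_for_runs(1, 4, 2), time_for_runs(8, 4, 2))  # prints (4, 2) (8, 16)
```

With these made-up numbers the machine-splitting strategy loses for a single run but wins once X is large enough to keep all M submachines busy, so the faster algorithm is not automatically the better choice.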
Let's say we want to solve a problem of size N with P processors.
- We generally assume that P = N. Is this assumption valid? More on this below.
- Suppose we have only P < N fine-grain processors in a
linear array. Can we still sort N numbers?
Fine-grain (many simple processors) vs. coarse-grain (fewer, more powerful processors) processing.
- Assuming appropriate granularity, we can simulate 12 processors on 3
processors by having each of the 3 processors simulate a block of 4
consecutive processors of the larger array.
What about simulating N processors on P processors, where P < N?
What slowdown do we encounter?
- Is the scaled-down algorithm less efficient?
- So, why do we assume, in general, that P = N?
- Given an algorithm efficient for large P, we can construct one
efficient for small P.
- What about the converse?
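The block mapping can be sketched as follows, assuming each processor of the small array simulates a contiguous block of $\lceil N/P \rceil$ processors of the large array and spends one step per simulated processor per simulated step (communication inside a block is treated as free). All function names are hypothetical. `stays_linear` checks that every link of the large array lands inside one small-array processor or between two adjacent ones, so the small array still needs only nearest-neighbor links; `scaled_down_cost` gives the time and work of the slowed-down algorithm.

```python
from typing import List, Tuple


def blocks(n: int, p: int) -> List[List[int]]:
    """Assign n virtual processors to p physical ones in contiguous blocks."""
    k = (n + p - 1) // p  # ceil(n / p) virtual processors per physical one
    return [list(range(j * k, min((j + 1) * k, n))) for j in range(p)]


def owner(i: int, n: int, p: int) -> int:
    """Index of the physical processor simulating virtual processor i."""
    return i // ((n + p - 1) // p)


def stays_linear(n: int, p: int) -> bool:
    """True if every virtual link (i, i + 1) stays within one physical
    processor or joins two adjacent physical processors."""
    return all(owner(i + 1, n, p) - owner(i, n, p) in (0, 1) for i in range(n - 1))


def scaled_down_cost(t: int, n: int, p: int) -> Tuple[int, int]:
    """(time, work) on p processors for an algorithm taking t steps on n."""
    slowdown = (n + p - 1) // p  # each simulated step costs ceil(n / p) steps
    t_small = slowdown * t
    return t_small, t_small * p


print(blocks(12, 3))  # prints [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(stays_linear(12, 3))  # prints True
print(scaled_down_cost(10, 12, 3), scaled_down_cost(10, 10, 4))  # prints (40, 120) (30, 120)
```

Under this model the work is unchanged when P divides N (40 × 3 = 10 × 12) and grows only through the rounding in $\lceil N/P \rceil$ otherwise (30 × 4 = 120 versus 10 × 10 = 100).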
Thomas P. Kelliher
Wed Jan 27 09:39:51 EST 1999