CS26
Part a. A non-overlapped execution needs 19 time units. For a long sequence of programs, each overlapped execution needs 16 time units (it gets three units for ``free'' because of the overlap). The ratio is 16/19.
Part b. Let the non-overlapped time for a single execution be t. For a long sequence of programs, each overlapped execution can overlap it input with the previous program's computing and can overlap its computing with the previous program's output. So, it gets 2 t/3 for ``free.'' The ratio is 1/3.
Part a. Let the cache access time be t. The execution time with a cache is proportional to
The first time is the contribution from cache hits, while the second is the contribution from cache misses. The time is 9 t because the memory access loads the cache and then the cache is used to load the instruction register.
The execution time without a cache is proportional to
The ratio is 4.44.
Part b. Let the cache access time be t. The execution time with a cache is proportional to
The execution time without a cache is still proportional to
The ratio is 5.71. What's important to note here is that doubling the size of the cache didn't halve the execution time.
Following the text's example, which implies that the accumulator is the only register:
load A mult B store TEMP load C mult D add TEMP store RESULT
2.7.
load N,r0 // 2 load #A,r1 // 2 pointer into A load #b,r2 // 2 pointer into B load #0,r3 // 1 accumulated sum of products loop: branch r0==0,eloop // 1 load (r1)+,r4 // 2 get A[i] load (r2)+,r5 // 2 get B[i] mult r4,r5 // 1 add r5,r3 // 1 add #-1,r0 // 1 branch loop // 1 eloop: store r3,C // 2The number in each comment is the number of memory accesses for the instruction. Note that the branch at the top of the loop is executed one extra time. The equation is 10 + 9n.
2.8
load N,r0 // 2 load #A,r1 // 2 load #B,r2 // 2 load #0,r3 // 1 loop: branch r0==0,eloop // 1 load (r1)+,r4 // 2 mult (r2)+,r4 // 2 add r4,r3 // 1 add #-1,r0 // 1 branch loop // 1 eloop: store r3,C // 2The equation is 10 + 8n. We save one memory access in the loop by being able to combine a load and multiply. I continue to use several registers because of the speed disparity between a CPU and memory.
1220, 3000, 5830, 4599 (pre-decrement), 1200 (post-increment).
b and c support subroutine nesting. Only c supports recursion. Because the calls are all from the same place, b would scribble over the previous return address with the next return address.
Store the link register's value on a stack at the start of a subroutine. This would permit recursion.