Synchronization in Distributed Systems

Tom Kelliher, CS43

Apr. 9, 1996

Questions on the homework?

Clock Synchronization

Why bother?

The differences between non-distributed systems:

What is ``clock skew?''

Why not have a central time server?

Logical Clocks

Lamport.

Internally consistent set of clocks, no relationship to real time.

  1. Total order vs. partial order.
  2. Time doesn't matter --- order of events does.
  3. Non-interacting processes don't need to synch.

Happens-before ( -->)

Physical Clocks

Externally consistent clock.

Physical Clock Synchronization Algorithms

A centralized algorithm:

  1. Query time server.
  2. Set clock, adjusting for message delay.
How often to send queries?

A decentralized algorithm:

  1. Once an epoch, broadcast your current time
  2. Collect times from other broadcasts.
  3. Discard outliers, set time to average.
Suppose the network gets partitioned, etc.?

Clock Based Cache Consistency

Read, write caching in a distributed file system.

Leases.

Mutual Exclusion

Centralized Approach

Mutual exclusion coordinator process.

Network saturation, fault tolerance.

Number of messages required in entering critical region.

Distributed Approach

Fault tolerance.

Number of messages required in entering critical region.

Election Algorithms

Coordinator should be ``highest numbered'' process.

No knowledge of who's crashed, who's working.

Bully Algorithm

When a process notices a coordinator is down:

  1. Send election message to higher numbered processes.
  2. If no response, you're the coordinator.
  3. If a ``higher up'' responds, you're finished.

On receipt of an election message:

  1. If sender is lower numbered, send OK and start an election.

Whoever wins sends a coordinator message to all lower numbered processes.

Number of messages?

Ring Algorithm

Processes connected in a logical ring.

When a process notices a coordinator is down:

  1. Send election message an ``up'' list to next live process.
  2. Each process adds its number to list and passes list.
  3. Originator, upon receipt, converts list to coordinator message.

Number of messages?

How are processes added, removed from ring?

Atomic Transactions

Higher-level abstraction.

May consist of several events, either:

Can't allow for some occurring.

Examples:

Stable Storage

Mirrored disks.

Algorithm:

  1. Update master disk.
  2. Update slave disk.

Block mismatch after a crash?

Bad Block?

Fault tolerance?

Transaction Primitives

Properties:

Nested transactions --- May have to back out of child if parent aborts. How?

Implementation

Private Workspace

How are files handled?

Writeahead Log

Log event, before performing it.

Rollback for aborts.

Use log to recover from crashes.

Two-Phase Commit Protocol

Extremely useful for distributed transactions.

  1. Coordinator writes, sends prepare message.
  2. Subordinates write, send ready message.
  3. Coordinator collects replies, decides what to do.
  4. Coordinator writes, sends commit/abort message.
  5. Subordinates write, perform commit/abort, send finished message.

Two-Phase Locking

  1. Growing phase --- Acquire.
  2. Shrinking phase --- Release.

Strict two-phase locking: shrinking only occurs after commit/abort. Prevents cascaded transaction aborts.

Deadlocks

Not much work done here. Opportunities to become famous!!


Thomas P. Kelliher
Wed Apr 3 14:01:40 EST 1996
Tom Kelliher