COMP SCI 3305 - Parallel and Distributed Computing

PDC

Week 1

1.1 Why we need ever-increasing performance

  • Climate modeling

    • Need better computer models to better understand climate change
    • Models that can simulate the interactions between the atmosphere, oceans, solid land, and ice caps.
    • How various interventions might affect the global climate
  • Protein Folding

    • Misfolded proteins involved in diseases
    • Current computational power isn’t sufficient to study complex molecules such as proteins
  • Drug Discovery

    • Research into new medical treatments
  • Energy Research

    • Program much more detailed models of technologies such as wind turbines and solar cells
  • Data Analysis

    • Analyze a tremendous amount of data

1.2 Why we’re building parallel systems

Single-processor performance has historically been increased by increasing the density of transistors

Problem

  • Smaller transistor = Faster processor
  • Faster processor = Increased power consumption
  • Increased power consumption = Increased heat
  • Increased heat = Unreliable processor

We can’t keep increasing the speed of a single processor, but we can still increase the density of transistors. To exploit this, use parallelism

parallelism: put multiple, relatively simple processors on a single chip

Such circuits are called multicore processors.

core: synonymous with CPU

single-core system: processor with one CPU


1.3 Why we need parallel programs

  • Running multiple instances of a serial program isn’t useful (we don’t want to run multiple instances of a game; we want the game to run faster)

    serial program: a program written to run on a single processor. Its performance on a system with multiple processors is essentially the same as its performance on a single processor of that system

Approaches to the serial problem

  • Rewrite serial programs so that they’re parallel

    parallel: can make use of multiple cores

  • Use translation programs

    translation program: program that will automatically convert serial programs into parallel programs

    • Very difficult, limited success, requires complex program analysis
    • Output likely inefficient
    • Sometimes best to devise a new algorithm

Example (devising new algorithm)

Compute n values and add them together

  • Serial solution

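A minimal sketch of the serial solution. Compute_next_value is a placeholder for whatever per-value computation the problem requires; passing it the index i is an assumption for illustration.

```c
/* Serial sum: a single core computes and adds all n values. */
double sum = 0.0;
for (int i = 0; i < n; i++) {
    double x = Compute_next_value(i);   /* placeholder computation */
    sum += x;
}
```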

  • Parallel solution
    • Suppose we have p cores, p much smaller than n.
    • Each core forms a partial sum of roughly n/p values, and the partial sums are then combined (see the sketch below)

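A hedged sketch of this parallel solution in C-like pseudocode. The names my_rank, my_first_i, my_last_i, p, and the Send/Receive operations are illustrative placeholders, not a real API; Compute_next_value is the same placeholder as in the serial version.

```c
/* Each core forms a partial sum over its own block of roughly n/p values. */
double my_sum = 0.0;
for (int my_i = my_first_i; my_i < my_last_i; my_i++) {
    double my_x = Compute_next_value(my_i);
    my_sum += my_x;
}

/* First method: the master core (rank 0) receives and adds every other
   core's partial sum; each of the other cores just sends its partial sum. */
if (my_rank == 0) {
    double sum = my_sum;
    for (int core = 1; core < p; core++) {
        double value;
        Receive(&value, core);   /* hypothetical: receive from 'core' */
        sum += value;
    }
    /* 'sum' now holds the total of all n values. */
} else {
    Send(&my_sum, 0);            /* hypothetical: send to the master core */
}
```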

Reduction Pattern - Better parallel algorithm to add up the partial sums

  • Don’t let master core do all the work, share among other cores
    1. Odd and even pairs (add 1 to 0, 3 to 2, …)
    2. Cores divisible by 2 (add 2 to 0, 6 to 4, …)
    3. Cores divisible by 4 (add 4 to 0, 12 to 8, …)
    4. So on and so forth

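A hedged sketch of the tree-structured reduction, using the same hypothetical Send/Receive placeholders. The variable half is the current distance between paired cores; it doubles every step, which is where the roughly log₂(p) step count below comes from.

```c
/* Reduction pattern: pair the cores so the receives and adds are shared.
   Step 1 (half = 1): core 1 sends to 0, 3 to 2, 5 to 4, ...
   Step 2 (half = 2): core 2 sends to 0, 6 to 4, ...
   Step 3 (half = 4): core 4 sends to 0, 12 to 8, ... and so on. */
double sum = my_sum;
for (int half = 1; half < p; half *= 2) {
    if (my_rank % (2 * half) == 0) {            /* receiver this step */
        if (my_rank + half < p) {
            double value;
            Receive(&value, my_rank + half);    /* hypothetical receive */
            sum += value;
        }
    } else if (my_rank % (2 * half) == half) {  /* sender this step */
        Send(&sum, my_rank - half);             /* hypothetical send */
        break;                                  /* this core is done */
    }
}
/* When the loop finishes, core 0's 'sum' holds the grand total. */
```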

With 8 cores (receives and adds done by the master core)

  • First method: 7 receives and adds
  • Second method: 3 receives and adds

With 1000 cores (receives and adds done by the master core)

  • First method: 999 receives and adds
  • Second method: 10 receives and adds

In general, the first method costs the master core p - 1 receives and adds, while the second costs only about log₂(p) (for 1000 cores, ⌈log₂ 1000⌉ = 10), so the advantage grows with the number of cores.

Sending mechanism

  • How to send a number from one core to another
  • Each core must calculate which core it pairs with; in the tree-structured sum, the distance between paired cores doubles at every step (1, 2, 4, …)

Takeaway

  • Highly unlikely that a translation program would discover the second method

  • Cannot simply write programs as large serial constructs and expect to parallelize them efficiently afterward

    parallelize: Modifying a program so that it can use multiple cores

  • Must write parallel programs

    parallel program: program that exploits the power of multiple processors


1.4 How to write parallel programs

  • Task-parallelism

    • partition the various tasks carried out in solving the problem among the cores
  • Data-parallelism

    • partition the data used in solving the problem among the cores
    • each core carries out similar operations on its part of the data

Example

Prof P, Teaching assistants A, B, C, D

100 students, 5 questions each

Task-parallelism

  • Each of them grades 1 of the 5 questions on all 100 papers
  • P grades Q1, A grades Q2, B grades Q3, …
  • Each person grades a different question (different task)

Data-parallelism

  • Divide 100 papers into 5 piles of 20 papers each
  • Each of them grades all the papers in 1 of the piles
  • Each person grades all 5 questions (same/similar task)

Comparison

Task-parallelism


  • Master core: its task is receiving and adding the other cores’ partial sums
  • Other cores: their task is sending their partial sums to the master core

Data-parallelism


  • data: the values computed by Compute_next_value
  • each core carries out roughly the same operations on its part of the data: it computes the required values by calling Compute_next_value and adds them together

Coordination

Cores usually need to coordinate their work

  • Communication

    • 1 or more cores send info to another core
  • Load balancing

    • All cores assigned roughly the same number of values to compute
  • Synchronization

    • Each core works at its own pace
    • Make sure a core doesn’t get too far ahead
    • Example
      • master core reads values from stdin and stores them in an array x

      • Don’t want other cores to start computing partial sums before master is done initializing x

      • synchronization between initialization of x and computation of partial sums

        Use a function Synchronize_cores: each core waits in the function until all cores have entered it, in particular the master core (see the sketch below)
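A hedged sketch of this synchronization, written for a shared-memory setting where all cores can see the array x. Synchronize_cores is the hypothetical barrier named above; my_rank, my_first_i, and my_last_i are illustrative names.

```c
/* Master core (rank 0) initializes x; no core may start its partial sum
   until the initialization is finished. Requires <stdio.h> for scanf. */
if (my_rank == 0) {
    for (int i = 0; i < n; i++)
        scanf("%lf", &x[i]);      /* read the n values from stdin into x */
}

Synchronize_cores();              /* each core waits here until all cores,
                                     in particular the master, have arrived */

double my_sum = 0.0;
for (int i = my_first_i; i < my_last_i; i++)
    my_sum += x[i];               /* safe: x is fully initialized */
```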

Currently, the most powerful programs are written using explicit parallel constructs

explicit parallel constructs: the program includes explicit instructions for parallelism: core 0 executes task 0, core 1 executes task 1, …, all cores synchronize, …

Other options for writing parallel programs - higher-level languages - tend to sacrifice performance in order to make program development easier.


1.5 What we’ll be doing

Write programs that are explicitly parallel, using C language and 3 different extensions:

  • Message-Passing Interface (MPI)
  • POSIX threads (Pthreads)
  • OpenMP

2 main types of parallel systems

  • Shared-memory system (Pthreads, OpenMP)

    • Cores can share access to the computer’s memory
    • In theory, each core can read and write each memory location
    • Coordinate the cores by having them examine and update shared-memory locations
  • Distributed-memory system (MPI)

    • Each core has its own private memory
    • Cores must communicate explicitly (sending msg across a network)

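As a preview of the two styles (a hedged sketch, not the course’s own code): the global sum written in a shared-memory flavor with OpenMP and in a distributed-memory flavor with MPI. Compute_next_value is still the placeholder from 1.3, and the function names here are made up for illustration.

```c
#include <omp.h>
#include <mpi.h>

double Compute_next_value(int i);   /* placeholder per-value computation */

/* Shared-memory style (OpenMP): the cores coordinate through the shared
   variable 'sum'; the reduction clause makes the concurrent updates safe. */
double shared_memory_sum(int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += Compute_next_value(i);
    return sum;
}

/* Distributed-memory style (MPI): each process holds a private partial sum
   and the totals are combined by explicit communication over the network. */
double distributed_memory_sum(double my_sum) {
    double total = 0.0;
    MPI_Reduce(&my_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    return total;   /* meaningful only on process 0 */
}
```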


1.6 Concurrent, parallel, distributed

Concurrent Computing

  • A program is one in which multiple tasks can be in progress at any instant

Parallel Computing

  • A program is one in which multiple tasks cooperate closely to solve a problem
  • Runs multiple tasks simultaneously on cores that are physically close to each other and that either share the same memory or are connected by a very high-speed network
  • Eg: The 2 concurrent addition programs

Distributed Computing

  • A program may need to cooperate with other programs to solve a problem
  • Tends to be more “loosely coupled”: the tasks may be executed by multiple computers separated by large distances, and the tasks are often executed by programs that were created independently.
  • Eg: Web search program

Concluding Remarks

  • The laws of physics have brought us to the doorstep of multicore technology
  • Serial programs typically don’t benefit from multiple cores
    • Automatic parallel program generation from serial program code isn’t the most efficient approach to get high performance from multicore computers
  • Learning to write parallel programs involves learning how to coordinate the cores
  • Parallel programs are usually very complex and therefore require sound programming techniques and careful development