COMP SCI 3305 - Parallel and Distributed Computing
PDC
Week 1
1.1 Why we need ever-increasing performance
- Climate modeling
- Need better computer models to better understand climate change
- Models that could simulate interaction between atmosphere, ocean, solid land, ice cap.
- How various intervention might affect global climate
- Protein Folding
- Misfolded proteins involved in diseases
- Current computational power is insufficient to study the folding of complex molecules such as proteins
- Drug Discovery
- Research into new medical treatments
- Energy Research
- Build more detailed models of technologies such as wind turbines and solar cells
- Data Analysis
- Analyze a tremendous amount of data
1.2 Why we’re building parallel systems
Single-processor performance was increased by increasing the density of transistors on the chip
Problem
- Smaller transistor = Faster processor
- Faster processor = Increased power consumption
- Increased power consumption = Increased heat
- Increased heat = Unreliable processor
Can't keep increasing the speed of the processor, but can still increase the density of transistors. To exploit this, use parallelism
parallelism: put multiple, relatively simple processors on a single chip
Such integrated circuits are called multicore processors.
core: synonymous with CPU
single-core system: processor with one CPU
1.3 Why we need parallel programs
- Running multiple instances of a serial program isn't useful (instead of running multiple instances of a game, we want it to run faster)
serial program: a program written to run on a single processor; its performance on a system with multiple processors is essentially the same as its performance on a single processor of the multiprocessor system
Approaches to the serial problem
- Rewrite serial programs so that they're parallel
parallel: can make use of multiple cores
- Use translation programs
translation program: program that will automatically convert serial programs into parallel programs
- Very difficult, limited success, requires complex program analysis
- Output likely inefficient
- Sometimes best to devise a new algorithm
Example (devising a new algorithm)
Compute n values and add them together
- Serial solution
- Parallel solution
- Suppose we have p cores, p much smaller than n.
- Each core forms a partial sum of roughly n/p values (see the sketch below)
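A minimal, self-contained C sketch of the two approaches. Hedged assumptions: Compute_next_value here just returns i so the example runs on its own, and the p cores are simulated by an ordinary loop rather than real parallel hardware.

```c
#include <stdio.h>

/* Hypothetical stand-in for the notes' Compute_next_value(); it simply
   returns i so this sketch is self-contained. */
static double Compute_next_value(int i) { return (double)i; }

/* Serial solution: one processor computes and adds all n values. */
double serial_sum(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += Compute_next_value(i);
    return sum;
}

/* Parallel idea: each of the p cores adds up its own block of roughly n/p
   values.  my_rank identifies the core; the partial sums still have to be
   combined afterwards. */
double partial_sum(int my_rank, int p, int n) {
    int my_first = my_rank * (n / p);
    int my_last  = (my_rank == p - 1) ? n : my_first + n / p;
    double my_sum = 0.0;
    for (int i = my_first; i < my_last; i++)
        my_sum += Compute_next_value(i);
    return my_sum;
}

int main(void) {
    int n = 24, p = 4;
    double total = 0.0;
    /* simulate the p cores one after another; the loop below plays the role
       of the master core receiving and adding every partial sum */
    for (int rank = 0; rank < p; rank++)
        total += partial_sum(rank, p, n);
    printf("serial = %g, combined partial sums = %g\n", serial_sum(n), total);
    return 0;
}
```

In this first method the master core adds all p partial sums itself; the reduction pattern described next avoids that bottleneck.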
Reduction Pattern - Better parallel algorithm to add up the partial sums
- Don’t let master core do all the work, share among other cores
- Round 1 (odd/even pairs): add core 1's sum to core 0, core 3's to core 2, …
- Round 2 (cores divisible by 2): add core 2's sum to core 0, core 6's to core 4, …
- Round 3 (cores divisible by 4): add core 4's sum to core 0, core 12's to core 8, …
- And so on, with the distance between sender and receiver doubling each round
With 8 cores, the master core performs
- First method: 7 receives and adds
- Second method: 3 receives and adds
With 1000 cores, the master core performs
- First method: 999 receives and adds
- Second method: 10 receives and adds
Sending mechanism
- How to send a number (a partial sum) from one core to another
- Each core calculates the "distance" to its partner core from its rank and the current round (see the sketch below)
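A minimal sketch of the reduction pattern. This is a simulation under stated assumptions: the p cores' partial sums sit in a single array and the loop mimics the communication rounds; real code would use actual sends and receives (e.g. MPI calls).

```c
#include <stdio.h>

/* Simulate the tree reduction on an array of partial sums, one entry per
   core.  In a real program each pass of the outer loop would be one
   communication round: the core `half` above each receiver sends its sum,
   and the receiver adds it in. */
double tree_reduce(double partial[], int p) {
    for (int half = 1; half < p; half *= 2) {          /* round 1, 2, 3, ... */
        for (int rank = 0; rank < p; rank += 2 * half) {
            int partner = rank + half;                  /* the "sender" */
            if (partner < p)
                partial[rank] += partial[partner];      /* receive and add */
        }
    }
    return partial[0];                                  /* master core's total */
}

int main(void) {
    double partial[8] = {1, 2, 3, 4, 5, 6, 7, 8};       /* example partial sums */
    printf("total = %g\n", tree_reduce(partial, 8));    /* prints 36 */
    return 0;
}
```

The distance `half` between a sender and its receiver doubles every round, so only about log2(p) rounds are needed, which is why the master core does just 3 receives and adds with 8 cores and 10 with 1000.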
Takeaway
- Highly unlikely that a translation program would discover the second method
- Cannot simply write large serial constructs and parallelize them
parallelize: Modifying a program so that it can use multiple cores
- Must write parallel programs
parallel program: program that exploits the power of multiple processors
1.4 How to write parallel programs
- Task-parallelism
- partition the various tasks carried out in solving the problem among the cores
- Data-parallelism
- partition the data used in solving the problem among the cores
- each core carries out similar operations on its part of the data
Example
Prof P, Teaching assistants A, B, C, D
100 students, 5 questions each
Task-parallelism
- Each person grades 1 of the questions on all 100 papers
- P grades Q1, A grades Q2, B grades Q3, …
- Each person grades a different question (different task)
Data-parallelism
- Divide 100 papers into 5 piles of 20 papers each
- Each person grades all the papers in 1 of the piles
- Each person grades all 5 questions (same/similar task)
Comparison
Task-parallelism
- Master core: receiving and adding the cores’ partial sums
- Other cores: giving the partial sum to the master core
Data-parallelism
- data: value computed by Compute_next_value
- each core carries out roughly the same operations: it computes its required values by calling Compute_next_value and adds them together
Coordination
Cores usually need to coordinate their work
- Communication
- 1 or more cores send info to another core
- Load balancing
- All cores assigned roughly the same number of values to compute
- Synchronization
- Each core works at its own pace
- Make sure a core doesn’t get too far ahead
- Example
- master core reads values from stdin and stores them in an array x
- Don't want other cores to start computing partial sums before master is done initializing x
- Need synchronization between the initialization of x and the computation of the partial sums
Use function Synchronize_cores. Each core will wait in the function until all cores have entered the function - in particular the master core
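A hedged OpenMP sketch of this synchronization (OpenMP is one of the course's extensions; the array size and the use of threads as "cores" are illustrative assumptions, and the barrier plays the role of Synchronize_cores).

```c
#include <stdio.h>
#include <omp.h>

#define N 16

int main(void) {
    double x[N];
    double sum = 0.0;

    #pragma omp parallel
    {
        int my_rank = omp_get_thread_num();

        /* Only the master "core" initializes x (in the notes it reads the
           values from stdin; here it just fills in sample values). */
        if (my_rank == 0)
            for (int i = 0; i < N; i++)
                x[i] = i + 1;

        /* Plays the role of Synchronize_cores: no thread continues until
           every thread, in particular the master, has reached this point,
           so x is fully initialized before the partial sums start. */
        #pragma omp barrier

        /* Now every thread can safely compute its share of the sum. */
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += x[i];
    }

    printf("sum = %f\n", sum);   /* 1 + 2 + ... + 16 = 136 */
    return 0;
}
```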
- Currently, the most powerful programs are written using explicit parallel constructs
explicit parallel construct: an explicit instruction for parallelism in the program, e.g. core 0 executes task 0, core 1 executes task 1, …, all cores synchronize, …
Other options for writing parallel programs - higher-level languages - tend to sacrifice performance in order to make program development easier.
1.5 What we’ll be doing
Write programs that are explicitly parallel, using C language and 3 different extensions:
- Message-Passing Interface (MPI)
- POSIX threads (Pthreads)
- OpenMP
2 main parallel systems
- Shared-memory system (Pthreads, OpenMP)
- Cores can share access to the computer’s memory
- In theory, each core can read and write each memory location
- Coordinate the cores by having them examine and update shared-memory locations
- Distributed-memory system (MPI)
- Each core has its own private memory
- Cores must communicate explicitly (e.g., by sending messages across a network)
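For contrast with the shared-memory sketches above, a minimal distributed-memory sketch using MPI. Assumptions for illustration: each process holds only its own partial sum (here just its rank + 1 as a stand-in for a real block of data), and the totals are combined by explicit communication with MPI_Reduce.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int my_rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* Each process has its own private memory: it computes a partial sum
       from its own data (here just my_rank + 1 to keep the sketch small). */
    double my_sum = my_rank + 1.0;

    /* No process can read another's memory, so the partial sums are
       combined by explicit communication; MPI_Reduce gathers them onto
       process 0 and adds them up along the way. */
    double total = 0.0;
    MPI_Reduce(&my_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("total = %f\n", total);   /* 1 + 2 + ... + p */

    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and launch with mpiexec -n <number of processes>; only process 0 prints the total.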
1.6 Concurrent, parallel, distributed
Concurrent Computing
- A concurrent program is one in which multiple tasks can be in progress at any instant
Parallel Computing
- A parallel program is one in which multiple tasks cooperate closely to solve a problem
- Runs multiple tasks simultaneously on cores that are physically close to each other and that either share the same memory or are connected by a very high-speed network
- Eg: The 2 concurrent addition programs
Distributed Computing
- A distributed program may need to cooperate with other programs to solve a problem
- Tend to be more “loosely coupled”. Tasks may be executed by multiple computers separated by large distances, tasks executed by programs created independently.
- Eg: Web search program
Concluding Remarks
- The laws of physics have brought us to the doorstep of multicore technology
- Serial programs typically don’t benefit from multiple cores
- Automatic parallel program generation from serial program code isn’t the most efficient approach to get high performance from multicore computers
- Learning to write parallel programs involves learning how to coordinate the cores
- Parallel programs are usually very complex and therefore require sound programming techniques and development