Parallelism in Computer Science at Five Levels

Level 1: Child

Think of it like having a team of friends helping you clean your room. Each friend is responsible for a different part of the room, and they can all work at the same time. This way, you can clean up faster than if you were doing it alone. That’s what we call parallelism in computer science - many parts of a computer work on different tasks at the same time.

Level 2: Teenager

Imagine you have a big project due for school and you decide to split the work among your group members. Each of you works on a different part of the project at the same time. This is similar to how parallelism works in computers. If a computer has multiple processors, each processor can work on a different part of the task at the same time, making the overall process faster.

Level 3: College Student

In computer science, parallelism is the concept of doing multiple things at the same time to solve a problem faster. For example, if a task can be broken down into smaller independent tasks, each of those tasks can be given to different processors to be solved simultaneously. It’s like using more workers to complete a large task more quickly. This is an essential concept in high-performance computing.

Level 4: Grad Student

Parallelism in computing refers to architectures in which computations are genuinely performed simultaneously. Systems such as multi-core processors, distributed systems, or graphics processing units (GPUs) can execute multiple instructions at the same time. This simultaneous execution allows for faster processing of complex or large-scale tasks. Concepts like concurrent execution, synchronization, and inter-process communication are key to implementing parallelism efficiently.

Level 5: Colleague

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously, exploiting the physical parallelism present in hardware, from multiple cores in a single machine to multiple machines in a cluster or grid. In the era of big data and machine learning, the demand for parallel computing is greater than ever, given its ability to dramatically improve computation speed and handle massive datasets. The complexity, however, lies in parallelizing the tasks, managing their execution, and coordinating communication between them. Programming languages, libraries, and paradigms have been developed to manage this complexity: OpenMP, MPI, and CUDA for GPU computing, along with paradigms like data parallelism, task parallelism, and speculative parallelism.

Time Sharing of the CPU

In true parallelism, there is no need for time sharing of the CPU: each processor (or core in a multi-core processor) executes its own stream of instructions simultaneously with the others. (If there are more runnable tasks than cores, the operating system still time-shares each core among them.)

However, it’s important to note that this kind of true parallelism requires hardware that supports it, such as multi-core or multi-processor systems. On a single-core processor, you can’t achieve true parallelism because there’s only one processing unit.

In contrast, time sharing is a technique used to give the illusion of parallelism on systems that do not have multiple processors or cores. It involves rapidly switching the CPU between different tasks so each task gets a turn with the processor. This happens so fast that it gives the illusion of multiple tasks being executed at the same time, but in reality, the processor is only working on one task at a time.
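Time sharing can be illustrated with a toy round-robin scheduler, sketched below using Python generators to stand in for tasks (the names are made up for illustration, and a real OS scheduler is preemptive and far more sophisticated). Each task runs for one "time slice," then the scheduler moves on to the next:

```python
from collections import deque

def task(name, steps):
    # A task that yields control back to the scheduler after each step.
    for i in range(steps):
        yield f"{name} step {i}"

def round_robin(tasks):
    # A toy scheduler: give each task one time slice, then rotate.
    queue = deque(tasks)
    log = []
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))   # run one time slice
            queue.append(current)       # not finished: back of the queue
        except StopIteration:
            pass                        # task finished: drop it
    return log

log = round_robin([task("A", 2), task("B", 2)])
print(log)  # ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```

Only one task's step executes at any instant, yet both tasks make progress in an interleaved fashion, which is exactly the illusion time sharing creates.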

In short: with true hardware-supported parallelism, each processor or core works on a different task at the same time, and no time sharing of the CPU is needed. In systems without hardware support for parallelism, time sharing is used to give the illusion of simultaneous execution.

Process, Threads and Parallelism

In parallelism, each task can execute in its own thread or in its own process, depending on the context.

A process is an instance of a program being executed. It has its own memory space and contains one or more threads that can run concurrently.

A thread is a unit of execution within a process. Threads within the same process share that process’s memory space and can therefore exchange information more easily than separate processes can.

So in a parallel execution context:

  • If the tasks are running in parallel in the same process, they are likely running in different threads within that process. These threads can usually share memory and communicate with each other more easily, but because they’re in the same process, an unhandled exception in one thread can potentially bring down all other threads.

  • If the tasks are running in parallel across different processes, then each is running in its own process. These processes do not share memory space, and communication between them can be more complex, typically requiring inter-process communication (IPC) mechanisms. However, they are more isolated from each other; an unhandled exception in one process will not affect the execution of another process.

In modern computers with multi-core or multi-processor architectures, both threads and processes can be executed in parallel, with each core executing a separate thread or process.

Code Example

Parallelism in computer science refers to the simultaneous execution of multiple tasks, which can dramatically reduce a program’s running time, especially for large-scale data processing. Python’s threading module is a simple way to run tasks concurrently in your code.

Here’s a very basic example using Python’s threading module:

import threading

# Here's a simple function that we'll execute in parallel.
def print_numbers():
    for i in range(10):
        print(i)

def print_letters():
    for letter in "abcdefghij":
        print(letter)

# Create threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

# Start threads
thread1.start()
thread2.start()

# Wait for both threads to finish
thread1.join()
thread2.join()

print("Threads finished execution")

In this code, we have two functions, print_numbers() and print_letters(). Normally, if we ran these functions one after another, the program would first print numbers 0-9, then print letters a-j.

However, using Python’s threading module, we can run these two functions concurrently. The start() method initiates a thread and begins running the function associated with it, and the join() method ensures that the main program waits for all threads to complete before it continues execution.

Note: The above example is very simplified; threading in real-world applications can become quite complex due to issues such as shared state, race conditions, and deadlocks. Also, Python’s Global Interpreter Lock (GIL) prevents true parallelism for Python bytecode, so for CPU-bound tasks the multiprocessing or concurrent.futures modules may be a better choice.

Real Parallelism in Code

Python’s threading module does not truly distribute computation across multiple CPUs due to the Global Interpreter Lock (GIL). The GIL is a mechanism that allows only one thread at a time to execute Python bytecode within a single process, even on a multi-core system. This means that Python threads are best used for I/O-bound tasks (like downloading files from the internet or reading from disk), but they won’t necessarily speed up CPU-bound tasks.

If you want to truly leverage multiple CPUs/cores for computational tasks, you’d want to use multiprocessing in Python. Here’s an example similar to the one before, but using Python’s multiprocessing module:

import multiprocessing

def print_numbers():
    for i in range(10):
        print(i)

def print_letters():
    for letter in "abcdefghij":
        print(letter)

if __name__ == "__main__":
    # This guard is required on platforms that start processes by
    # spawning (Windows, macOS), where each child re-imports this module.

    # Create processes
    process1 = multiprocessing.Process(target=print_numbers)
    process2 = multiprocessing.Process(target=print_letters)

    # Start processes
    process1.start()
    process2.start()

    # Wait for both processes to finish
    process1.join()
    process2.join()

    print("Processes finished execution")

With this code, Python creates two separate processes, and each process can run on a separate CPU core. This can lead to true parallel computation, which can speed up CPU-bound tasks. However, keep in mind that inter-process communication can be more complex than inter-thread communication because processes do not share memory space.