{"cells":[{"cell_type":"markdown","metadata":{},"source":["

How to use this document

"]},{"cell_type":"markdown","metadata":{},"source":["

What is org-mode

"]},{"cell_type":"markdown","metadata":{},"source":["

This is an org-mode document. Org-mode is a markup language that allows you to write text documents containing executable code blocks.

"]},{"cell_type":"markdown","metadata":{},"source":["

Org-mode can do much more than this; you can find out more on the org-mode website.

"]},{"cell_type":"markdown","metadata":{},"source":["

Org-mode works in emacs (the best editor in the world).

"]},{"cell_type":"markdown","metadata":{},"source":["

Code blocks

"]},{"cell_type":"markdown","metadata":{},"source":["

You can do two things with code blocks (besides simply reading\n","them):

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  1. execute: this can be done using the C-c C-c (Ctrl-c\n","Ctrl-c) keybinding. The stdout of the code will be appended below the\n","code block itself

  2. \n","
  3. tangle: tangling a code block means creating a separate file that\n","contains the code which can thus be compiled and run as usual. If you\n","hit C-c C-v C-t (this call the emacs function\n","org-babel-tangle) all the code blocks in this file will be\n","tangled. If, instead, you only want to tangle a single code blocks, go\n","to that block and hit C-u C-c C-v C-t

  4. \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

Compiling and\n","running a tangled code block

"]},{"cell_type":"markdown","metadata":{},"source":["

Once you have tangled a code block, you can compile and run it like\n","any other code. For the code blocks of this document to work, you have\n","to use the following compile command:

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["gcc -fopenmp -I. aux.c code_block.c -o code_block"]},{"cell_type":"markdown","metadata":{},"source":["

And then you can run it by launching the code_block\n","executable

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["./code_block"]},{"cell_type":"markdown","metadata":{},"source":["

Some other useful tricks

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

Introduction

"]},{"cell_type":"markdown","metadata":{},"source":["

Parallel computer\n","architectures

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

Roughly speaking, parallel computers can be classified into two\n","types:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  1. Shared memory: all the CPUs share one (logical)\n","memory, i.e., all processes can access the same addressing space\n","regardless of the CPU they are running on. This makes it simple to\n","communicate data from one process to another

  2. \n","
  3. Distributed memory: the computer is actually\n","formed of multiple node, each having one or more CPUs and its own\n","memory. Nodes are connected through a network. A process running on one\n","node can only access data on the local memory; therefore, if it needs\n","data that is on another node, a message must be exchanged through the\n","network

  4. \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

Shared memory: SMP vs NUMA

"]},{"cell_type":"markdown","metadata":{},"source":["

One \"logical memory\" does not necessarily mean that only one physical memory exists. If multiple memory modules exist, access to data may be non-uniform.

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

Both types of shared-memory architectures can be programmed using the same tools and technologies. When performance is a concern, though, special care must be taken on NUMA machines (which we will not cover in this course).

"]},{"cell_type":"markdown","metadata":{},"source":["

Multicore processors: why?

"]},{"cell_type":"markdown","metadata":{},"source":["

Until the early 2000s all processors had only one core (in fact the word \"core\" was not used at all). So why have multicore processors become ubiquitous? Energy consumption is the reason:

"]},{"cell_type":"markdown","metadata":{},"source":["

$$P = C V^2 f$$

where $P$ is the dissipated power, $C$ the capacitance, $V$ the supply voltage and $f$ the clock frequency.

"]},{"cell_type":"markdown","metadata":{},"source":["

However, there is a (roughly) linear dependence between $f$ and $V$; therefore $P$ grows as the cube of $f$!

"]},{"cell_type":"markdown","metadata":{},"source":["

Because the performance of single-core processors could only be\n","improved by increasing the frequency, this trend became\n","unsustainable.

"]},{"cell_type":"markdown","metadata":{},"source":["

Multicore processor design relies on Thread Level Parallelism to improve performance. This means that more transistors are used to assemble multiple execution units (i.e., cores) on a single chip. This improves performance with only a linear increase in energy consumption: the capacitance $C$ grows because of the increased number of transistors.

"]},{"cell_type":"markdown","metadata":{},"source":["

In fact, it is even possible to produce faster processors that consume less energy! Consider a quad-core processor running at frequency $0.6f$: it will be up to 2.4 times faster and consume roughly 15% less energy.
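
As a quick sanity check of this claim (assuming that the supply voltage can be scaled down linearly with the frequency, so that running at $0.6f$ also allows $0.6V$):

$$P_{quad} = 4 C (0.6 V)^2 (0.6 f) = 0.864 C V^2 f$$

that is, roughly 14-15% less energy than the original single-core processor, while the aggregate peak throughput is 4 × 0.6f = 2.4f.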

"]},{"cell_type":"markdown","metadata":{},"source":["

Multicore processors: why?

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

Multicore architecture

"]},{"cell_type":"markdown","metadata":{},"source":["

Multicore computer: what does it look like?

"]},{"cell_type":"markdown","metadata":{},"source":["

The hwloc library is designed to retrieve all the\n","details of the architecture. For example, on my computer, I can run the\n","lstopo program from hwloc to retrieve the\n","architecture:

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["ssh plafrim lstopo --of png"]},{"cell_type":"markdown","metadata":{},"source":["![./figures/arch.png](./figures/arch.png)"]},{"cell_type":"markdown","metadata":{},"source":["

How to program multicore\n","computers?

"]},{"cell_type":"markdown","metadata":{},"source":["

Many options exist, but not all of them are simple, portable, and efficient at the same time.

"]},{"cell_type":"markdown","metadata":{},"source":["

Examples:

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

The OpenMP standard

"]},{"cell_type":"markdown","metadata":{},"source":["

Basic ideas and components

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

OpenMP (Open specifications for MultiProcessing) is\n","an Application Program Interface (API) to explicitly direct\n","multi-threaded, shared memory parallelism.

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

The OpenMP standard is developed by the OpenMP Architecture Review Board (ARB), which includes many members from academia (UTK, LBNL, ANL, NASA, …) and industry (Intel, AMD, NEC, Fujitsu, NVIDIA, …).

"]},{"cell_type":"markdown","metadata":{},"source":["

Basic ideas and components

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

Disclaimer

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

This course does not cover the whole OpenMP standard. The OpenMP\n","manual is over 600 pages as of today (v5.2)

"]},{"cell_type":"markdown","metadata":{},"source":["

Only a subset of constructs and clauses will be presented.

"]},{"cell_type":"markdown","metadata":{},"source":["

Tons of tutorials can be found online, but they are best used with moderation.

"]},{"cell_type":"markdown","metadata":{},"source":["

Fork-join execution model

"]},{"cell_type":"markdown","metadata":{},"source":["

OpenMP is based on a fork-join execution model:

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

Parallel region

"]},{"cell_type":"markdown","metadata":{},"source":["

Parallel region directive\n","syntax

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["#pragma omp parallel [clause]\n"," if (scalar or logical expression)\n"," private(list)\n"," firstprivate(list)\n"," shared(list)\n"," default(private | shared | none)\n"," reduction(operator:list)\n"," num_threads(scalar integer expression)\n","{\n"," /* Structured code block */\n","}"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

A simple hello world\n","example in OpenMP

"]},{"cell_type":"markdown","metadata":{},"source":["

Just a simple hello world with multiple threads:

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\t#pragma omp parallel\n","\t{\n","\t printf(\"Hello world!\\n\");\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

A\n","slightly more complex hello world example in OpenMP

"]},{"cell_type":"markdown","metadata":{},"source":["

Just a simple hello world with multiple threads:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • start with serial execution

  • \n","
  • open a parallel region where:

    \n","
      \n","
    • each thread reads its identifier and the total number of threads\n","using, respectively, the omp_get_thread_num() and\n","omp_get_num_threads()
    • \n","
    • prints a message
    • \n","
  • \n","
"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\t#pragma omp parallel\n","\t{\n","\t printf(\"Hello world from thread %2d in a pool of %2d.\\n\", omp_get_thread_num(),\n","\t omp_get_num_threads());\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Parallel region: how many\n","threads?

"]},{"cell_type":"markdown","metadata":{},"source":["

How many threads do we have in the parallel regions of a code? The\n","number of threads depends on:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • Evaluation of the if clause (one or many)

  • \n","
  • Setting of the num_threads clause

  • \n","
  • Use of the omp_set_num_threads() library\n","function

  • \n","
  • Setting of the OMP_NUM_THREADS environment\n","variable

  • \n","
  • Implementation default - usually the number of CPUs on a node,\n","though it could be dynamic

  • \n","
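
For example, using the code_block executable compiled earlier in this document, the number of threads can be set directly on the command line:

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["OMP_NUM_THREADS=4 ./code_block"]},{"cell_type":"markdown","metadata":{},"source":["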
"]},{"cell_type":"markdown","metadata":{},"source":["

Parallel region: how many\n","threads?

"]},{"cell_type":"markdown","metadata":{},"source":["

Complete example

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint iam, nth, n=5;\n","\t\n","\t#pragma omp parallel\n","\t{\n","\t printf(\"Region 1 thread %2d / %2d.\\n\", omp_get_thread_num(), omp_get_num_threads());\n","\t}\n","\t\n","\tomp_set_num_threads(n);\n","\t\n","\t#pragma omp parallel\n","\t{\n","\t printf(\"Region 2 thread %2d / %2d.\\n\", omp_get_thread_num(), omp_get_num_threads());\n","\t}\n","\t\n","\t#pragma omp parallel num_threads(2)\n","\t{\n","\t printf(\"Region 3 thread %2d / %2d.\\n\", omp_get_thread_num(), omp_get_num_threads());\n","\t}\n","\t\n","\t#pragma omp parallel if(n<5)\n","\t{\n","\t printf(\"Region 4 thread %2d / %2d.\\n\", omp_get_thread_num(), omp_get_num_threads());\n","\t}\n","\t\n","\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Hello world with a bug

"]},{"cell_type":"markdown","metadata":{},"source":["

Here is a minor variant of the hello world program…with a bug

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint iam, nth;\n","\t\n","\t#pragma omp parallel\n","\t{\n","\t iam = omp_get_thread_num();\n","\t nth = omp_get_num_threads();\n","\t do_stuff(1);\n","\t printf(\"Hello world from thread %d in a pool of %2d.\\n\", iam, nth);\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Data sharing 1/2

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • Most variables are shared by default

  • \n","
  • Global variables include:

    \n","
      \n","
    • Fortran: COMMON blocks, SAVE and MODULE variables
    • \n","
    • C: File scope variables, static
    • \n","
  • \n","
  • Private variables include:

    \n","
      \n","
    • Loop index variables (in !$OMP DO) constructs
    • \n","
    • Stack variables in subroutines called from parallel regions
    • \n","
  • \n","
  • Fortran: Automatic variables within a statement block

  • \n","
  • The OpenMP Data Scope Attribute Clauses are used to explicitly\n","define how variables should be scoped. They include:

    \n","
      \n","
    • private
    • \n","
    • firstprivate
    • \n","
    • shared
    • \n","
    • default
    • \n","
    • reduction
    • \n","
  • \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

Data sharing 2/2

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • private(list): a new object of the same type is\n","created for each thread (uninitialized!)

  • \n","
  • firstprivate(list): Listed variables are initialized\n","according to the value of their original objects prior to entry into the\n","parallel or work-sharing construct.

  • \n","
  • lastprivate(list): The value copied back into the\n","original variable object is obtained from the last (sequentially)\n","iteration or section of the enclosing construct.

  • \n","
  • shared(list): only one object exists in memory and\n","all the threads access it

  • \n","
  • default(shared|private|none): sets the default\n","scoping

  • \n","
  • reduction(operator:list): performs a reduction on\n","the variables that appear in its list.

  • \n","
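
As a minimal illustrative sketch of these clauses (this example is ours, not part of the course material; it only uses standard OpenMP constructs): x keeps its original value outside the loop because each thread works on a firstprivate copy, while last receives the value written by the sequentially last iteration.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, x = 10, last = -1;\n","\t\n","\t/* each thread gets a private copy of x initialized to 10 (firstprivate); */\n","\t/* last receives the value written by the sequentially last iteration (lastprivate) */\n","\t#pragma omp parallel for firstprivate(x) lastprivate(last)\n","\tfor(i=0; i<8; i++){\n","\t x = x + i;\n","\t last = i;\n","\t}\n","\t\n","\tprintf(\"x: %d (unchanged), last: %d\\n\", x, last);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["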
"]},{"cell_type":"markdown","metadata":{},"source":["

Hello world bugfix

"]},{"cell_type":"markdown","metadata":{},"source":["

Let's fix the bug: by declaring iam private, each thread\n","will have its own copy of this variable

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint iam, nth;\n","\t\n","\t#pragma omp parallel private(iam)\n","\t{\n","\t iam = omp_get_thread_num();\n","\t nth = omp_get_num_threads();\n","\t do_stuff(1);\n","\t printf(\"Hello world from thread %d in a pool of %2d.\\n\", iam, nth);\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Work distribution and\n","sharing

"]},{"cell_type":"markdown","metadata":{},"source":["

Dependencies

"]},{"cell_type":"markdown","metadata":{},"source":["

Dependencies

"]},{"cell_type":"markdown","metadata":{},"source":["

The interest of parallel programming is not to execute the same\n","workload multiple times but to distribute the workload to the available\n","processes so that execution time can be reduced. This implies that\n","multiple instructions will be executed concurrently\n","(or, equivalently, in parallel).

"]},{"cell_type":"markdown","metadata":{},"source":["

Two successive statements S1 and S2 can be executed concurrently if\n","they are independent. According to the\n","Bernstein conditions there exist three types of\n","dependencies:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • Read-After-Write or true\n","dependency or flow dependency: if\n","Input(S2) overlaps with Output(S1)

  • \n","
  • Write-After-Read or\n","anti-dependency: if Output(S2) overlaps\n","with Input(S1)

  • \n","
  • Write-After-Write or output\n","dependency: if Output(S2) overlaps with\n","Output(S1)

  • \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

Dependencies

"]},{"cell_type":"markdown","metadata":{},"source":["

Example. Are these two statements independent?

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint a, b=1, c=2, d=3, e;\n","\ta = b+c;\n","\te = d+a;\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

What kind of dependency is there? RAW. Here is a more convoluted\n","example

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, n=10;\n","\tint x[10]={0}, y[10]={0};\n","\t/* iteration i reads x[i-1], which iteration i-1 writes: RAW across iterations */\n","\tfor(i=1; i<n; i++)\n","\t x[i] = x[i-1] + y[i];\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Dependencies

"]},{"cell_type":"markdown","metadata":{},"source":["

Example. Are these two statements independent?

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint a, b=1, c=2;\n","\ta = b+c;\n","\tb = c*2;\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

What kind of dependency is there? WAR. Note that WAR dependencies can sometimes be removed!

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint a, b=1, c=2, d;\n","\td = b;\n","\ta = d+c;\n","\tb = c*2;\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Now the second and third statements have become independent. Here is a more convoluted example

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, n=10;\n","\tint x[10]={0}, y[10]={0};\n","\t/* iteration i reads x[i+1], which iteration i+1 overwrites: WAR across iterations */\n","\tfor(i=0; i<n-1; i++)\n","\t x[i] = x[i+1] + y[i];\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Dependencies

"]},{"cell_type":"markdown","metadata":{},"source":["

Example. Are these two statements independent?

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint a=1, b=2, c;\n","\tc = a+b;\n","\tc = 2;\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

What kind of dependency is there? WAW. Here is a more convoluted\n","example

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, n=10, x;\n","\tint y[10]={0};\n","\t/* every iteration writes x: a WAW (output) dependency across iterations */\n","\tfor(i=0; i<n; i++)\n","\t x = y[i];\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Master

"]},{"cell_type":"markdown","metadata":{},"source":["

The master directive identifies a code block which is\n","only executed by the master thread

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint iam;\n","\t\n","\t#pragma omp parallel private(iam)\n","\t {\n","\t iam = omp_get_thread_num();\n","\t\n","\t#pragma omp master\n","\t {\n","\t do_stuff(0.1);\n","\t printf(\" ---> This is only done by: %2d\\n\",iam);\n","\t }\n","\t printf(\" This is also done by: %2d.\\n\",iam);\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Single

"]},{"cell_type":"markdown","metadata":{},"source":["

The single directive identifies a code block which is\n","only executed by one (any) thread

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint iam;\n","\t\n","\t#pragma omp parallel private(iam)\n","\t {\n","\t iam = omp_get_thread_num();\n","\t\n","\t#pragma omp single\n","\t {\n","\t do_stuff(0.1);\n","\t printf(\" ---> This is only done by: %2d\\n\",iam);\n","\t }\n","\t printf(\" This is also done by: %2d.\\n\",iam);\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Single vs master

"]},{"cell_type":"markdown","metadata":{},"source":["

One obvious difference between single and master is that with master only the thread with id 0 can execute the code block. This carries a risk: you have to make sure that the master thread actually passes through that code block, otherwise it will never be executed.

"]},{"cell_type":"markdown","metadata":{},"source":["

Can you spot any other difference from executing the two code blocks\n","above? There is an implied barrier at the end of the\n","single block. It can be removed using the\n","nowait clause

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint iam;\n","\t\n","\t#pragma omp parallel private(iam)\n","\t {\n","\t iam = omp_get_thread_num();\n","\t\n","\t#pragma omp single nowait\n","\t {\n","\t do_stuff(0.1);\n","\t printf(\" ---> This is only done by: %2d\\n\",iam);\n","\t }\n","\t printf(\" This is also done by: %2d.\\n\",iam);\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Parallel loops

"]},{"cell_type":"markdown","metadata":{},"source":["

Parallel

"]},{"cell_type":"markdown","metadata":{},"source":["

In the code below, all the iterations of the loop are independent. This means that they can be executed concurrently. However, the code below is wrong: the parallel region makes every thread execute the whole loop, so the work is replicated rather than distributed and the program does not behave like the sequential version.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, n=4;\n","\tint a[n], b[n], c[n];\n","\t\n","\t#pragma omp parallel private(i)\n","\t{\n","\t /* every thread executes all n iterations: the loop is replicated, not distributed */\n","\t for (i=0; i<n; i++)\n","\t c[i] = a[i]+b[i];\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Parallel

"]},{"cell_type":"markdown","metadata":{},"source":["

OpenMP provides a construct that automatically parallelizes loops by\n","executing chunks of iterations concurrently. Note that the loop index\n","i is implicitly private.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, n=4;\n","\tint a[n], b[n], c[n];\n","\t\n","\t#pragma omp parallel\n","\t{\n","\t#pragma omp for\n","\t for (i=0; i<n; i++)\n","\t c[i] = a[i]+b[i]; /* the iterations are now distributed among the threads */\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Schedule

"]},{"cell_type":"markdown","metadata":{},"source":["

The schedule clause in the for construct\n","specifies how the iterations of the loop are assigned to threads:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • static: loop iterations are divided into pieces of\n","size chunk and then statically assigned to threads in a round-robin\n","fashion

  • \n","
  • dynamic: loop iterations are divided into pieces of\n","size chunk, and dynamically scheduled among the threads; when a thread\n","finishes one chunk, it is dynamically assigned another

  • \n","
  • guided: for a chunk size of 1, the size of each\n","chunk is proportional to the number of unassigned iterations divided by\n","the number of threads, decreasing to 1. For a chunk size with value k\n","(greater than 1), the size of each chunk is determined in the same way\n","with the restriction that the chunks do not contain fewer than k\n","iterations

  • \n","
  • runtime: The scheduling decision is deferred until\n","runtime by the environment variable OMP SCHEDULE

  • \n","
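
As a small sketch of our own (not one of the course examples), here is the same loop scheduled first statically in chunks of 4 and then dynamically:

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i;\n","\t\n","\t/* chunks of 4 iterations, assigned to threads round-robin at loop entry */\n","\t#pragma omp parallel for schedule(static,4)\n","\tfor(i=0; i<16; i++)\n","\t printf(\"static  iteration %2d -> thread %d\\n\", i, omp_get_thread_num());\n","\t\n","\t/* chunks of 4 iterations, handed out dynamically as threads become idle */\n","\t#pragma omp parallel for schedule(dynamic,4)\n","\tfor(i=0; i<16; i++)\n","\t printf(\"dynamic iteration %2d -> thread %d\\n\", i, omp_get_thread_num());\n","}"]},{"cell_type":"markdown","metadata":{},"source":["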
"]},{"cell_type":"markdown","metadata":{},"source":["

Schedule

"]},{"cell_type":"markdown","metadata":{},"source":["

Let's see how schedule works:

"]},{"cell_type":"code","execution_count":null,"metadata":{"vscode":{"languageId":"C"}},"outputs":[],"source":["int i;\n","#pragma omp parallel for num_threads(4) schedule(guided,25)\n","for (i=0; i<400; i++)\n"," printf(\"%3d %2d\\n\",i,omp_get_thread_num());"]},{"cell_type":"markdown","metadata":{},"source":["![res.data](res.data)"]},{"cell_type":"markdown","metadata":{},"source":["![./figures/scheds.png](./figures/scheds.png)"]},{"cell_type":"markdown","metadata":{},"source":["

Collapse

"]},{"cell_type":"markdown","metadata":{},"source":["

The collapse clause allows for combining multiple loops\n","into a single one and for parallelizing its iterations. The number of\n","loops to collapse can be provided as an option

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, j;\n","\t\n","\t#pragma omp parallel for private(i,j) collapse(2)\n","\tfor (i=0; i<2; i++) {\n","\t for (j=0; j<4; j++) {\n","\tprintf(\"Thread %2d does iteration i:%2d j:%2d\\n\",omp_get_thread_num(),i,j);\n","\t }\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Threads synchronization

"]},{"cell_type":"markdown","metadata":{},"source":["

Barriers

"]},{"cell_type":"markdown","metadata":{},"source":["

Barrier

"]},{"cell_type":"markdown","metadata":{},"source":["

A barrier is simply a waiting point: all threads must wait for all\n","the others to reach a barrier point before moving on. Example

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint iam;\n","\t double t=secs();\n","\t#pragma omp parallel private(iam)\n","\t {\n","\t iam = omp_get_thread_num();\n","\t\n","\t if(iam==0){\n","\t do_stuff(0.5); // 0.5 seconds\n","\t } else {\n","\t do_stuff(0.3); // 0.3 seconds\n","\t }\n","\t#pragma omp barrier\n","\t printf(\"Thread %2d reached this point at time %f.\\n\",iam,secs()-t);\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Barrier

"]},{"cell_type":"markdown","metadata":{},"source":["

Improper use of barriers can cause deadlocks: if not\n","all threads pass by the barrier, those who do will be waiting\n","forever…

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint iam;\n","\t double t=secs();\n","\t#pragma omp parallel private(iam)\n","\t {\n","\t iam = omp_get_thread_num();\n","\t\n","\t if(iam==0){\n","\t do_stuff(0.5);\n","\t } else {\n","\t do_stuff(0.3);\n","\t #pragma omp barrier\n","\t }\n","\t\n","\t printf(\"Thread %2d reached this point at time %f.\\n\",iam,secs()-t);\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

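
A minimal way to fix the deadlock above (our sketch, reusing the secs() and do_stuff() helpers from the aux code): make sure that every thread reaches the barrier, for instance by placing it outside the conditional.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint iam;\n","\t double t=secs();\n","\t#pragma omp parallel private(iam)\n","\t {\n","\t iam = omp_get_thread_num();\n","\t\n","\t if(iam==0){\n","\t do_stuff(0.5);\n","\t } else {\n","\t do_stuff(0.3);\n","\t }\n","\t /* the barrier is now outside the conditional: every thread reaches it */\n","\t#pragma omp barrier\n","\t printf(\"Thread %2d reached this point at time %f.\\n\",iam,secs()-t);\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["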
Critical sections

"]},{"cell_type":"markdown","metadata":{},"source":["

Critical

"]},{"cell_type":"markdown","metadata":{},"source":["

The critical directive identifies a code block which is\n","executed in mutual exclusion by all threads, i.e., one\n","at a time.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint iam;\n","\t double t=secs();\n","\t\n","\t#pragma omp parallel private(iam)\n","\t {\n","\t iam = omp_get_thread_num();\n","\t\n","\t#pragma omp critical\n","\t {\n","\t do_stuff(0.1);\n","\t printf(\"This is done by %2d at time %f\\n\",iam, secs()-t);\n","\t }\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Critical scope

"]},{"cell_type":"markdown","metadata":{},"source":["

Critical sections can have names. The name argument is used to identify the critical construct. Any critical construct for which no name is specified behaves as if it had the same, unspecified, name. Two or more threads can never be simultaneously inside critical regions that have the same name!

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint iam;\n","\t double t=secs();\n","\t\n","\t#pragma omp parallel private(iam)\n","\t {\n","\t iam = omp_get_thread_num();\n","\t\n","\t#pragma omp critical (toto)\n","\t {\n","\t do_stuff(0.1);\n","\t printf(\"First is done by %2d at time %f\\n\",iam, secs()-t);\n","\t }\n","\t\n","\t#pragma omp critical (titi)\n","\t {\n","\t do_stuff(0.1);\n","\t printf(\"Second is done by %2d at time %f\\n\",iam, secs()-t);\n","\t }\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Atomic instructions

"]},{"cell_type":"markdown","metadata":{},"source":["

Atomic

"]},{"cell_type":"markdown","metadata":{},"source":["

The atomic construct ensures that a specific storage location is\n","accessed atomically so that possible simultaneous reads and writes by\n","multiple threads do not result in indeterminate values. Five types of\n","atomic constructs exist: read, write,\n","update, capture and compare

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • read: atomically read a memory location, i.e.,\n","x can not change while being read
  • \n","
"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint x, v;\n","\t\n","\t#pragma omp parallel\n","\t {\n","\t #pragma atomic read\n","\t v = x;\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Atomic

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • write: atomically write a memory location

  • \n","
  • update: atomically update (i.e. read-modify-write) a\n","memory location

  • \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

So what is the point of atomic? Take this example: we could certainly use critical to protect the update of x[], but this would prevent calls to compute_one from being executed concurrently. With atomic, only the update of x[] is serialized.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tdouble t_start=secs(), t_end;\n"," int i, n=100, m=5, tot=0, x[5]={0,0,0,0,0};\n","\t\n","\t#pragma omp parallel for\n"," for(i=0; i<n; i++){\n"," int v = compute_one(0.01); /* the expensive part runs concurrently */\n","\t#pragma omp atomic update\n"," x[i%m] += v; /* only this update is serialized */\n"," }\n","\t\n"," t_end = secs();\n"," for(i=0; i<m; i++) tot += x[i];\n"," printf(\"tot: %d, time: %f\\n\", tot, t_end-t_start);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Atomic

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • capture: atomically update a memory location and\n","capture its initial or final value
  • \n","
"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint x, v, y, w;\n","\t\n","\t#pragma omp parallel\n","\t {\n","\t /* Capture initial value */\n","\t #pragma atomic capture\n","\t v = x++;\n","\t\n","\t /* Capture final value */\n","\t #pragma atomic capture\n","\t w = ++y;\n","\t\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Atomic

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • compare: atomically and conditionally update a memory\n","location
  • \n","
"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint i, n=1000, min=99999999;\n","\tint x[n]; \n","\t\n","\trand_fill(x, n);\n","\t\n","\t#pragma omp parallel for\n","\tfor(i=0; iReductions"]},{"cell_type":"markdown","metadata":{},"source":["

Reductions

"]},{"cell_type":"markdown","metadata":{},"source":["

Assume this simple code that computes the sum of all the elements of\n","an array

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, sum, n=1000;\n","\tint x[n]; \n","\t\n","\trand_fill(x, n); sum=0;\n","\t\n","\tfor(i=0; i<n; i++)\n","\t sum += x[i];\n","\t\n","\tprintf(\"sum: %d\\n\", sum);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

The iterations of this loop are clearly dependent because of the updates on sum. We could actually use a critical section or an atomic update, but we would lose all the performance.

"]},{"cell_type":"markdown","metadata":{},"source":["

Reductions

"]},{"cell_type":"markdown","metadata":{},"source":["

Reductions allow us to take advantage of\n","associativity and commutativity of some operators (+ in this case):

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, sum, n=1000;\n"," int x[n]; \n","\t\n"," rand_fill(x, n); sum=0;\n","\t\n","\t#pragma omp parallel reduction(+:sum)\n"," {\n","#pragma omp for \n"," for(i=0; i<n; i++)\n"," sum += x[i];\n"," }\n","\t\n"," printf(\"sum: %d\\n\", sum);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

The reduction clause specifies an operator and one or more list items. For each list item, a private copy is created in each implicit task and initialized appropriately for the operator. After the end of the region, the original list item is updated with the values of the private copies using the specified operator.

"]},{"cell_type":"markdown","metadata":{},"source":["

Reductions

"]},{"cell_type":"markdown","metadata":{},"source":["

For the C language, the predefined reduction operators are:

"]},{"cell_type":"markdown","metadata":{},"source":["
| Operator | Initializer | Combiner |
|---|---|---|
| + | omppriv = 0 | ompout += ompin |
| * | omppriv = 1 | ompout *= ompin |
| & | omppriv = ~0 | ompout &= ompin |
| \| | omppriv = 0 | ompout \|= ompin |
| ^ | omppriv = 0 | ompout ^= ompin |
| && | omppriv = 1 | ompout = ompin && ompout |
| \|\| | omppriv = 0 | ompout = ompin \|\| ompout |
| max | omppriv = minval | ompout = max(ompin, ompout) |
| min | omppriv = maxval | ompout = min(ompin, ompout) |
"]},{"cell_type":"markdown","metadata":{},"source":["

Tasks

"]},{"cell_type":"markdown","metadata":{},"source":["

Task

"]},{"cell_type":"markdown","metadata":{},"source":["

The OpenMP task construct simply identifies a block of code which is ready to be executed and whose execution is deferred. Once the task is created, it can be executed by any thread, at any time. This means that we cannot make any assumptions about when a task is executed, by which thread, or in which order the created tasks are executed.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\t#pragma omp parallel\n","\t {\n","\t#pragma omp single\n","\t {\n","\t #pragma omp task\n","\t printf(\"Thread %2d does task 1\\n\",omp_get_thread_num());\n","\t\n","\t #pragma omp task\n","\t printf(\"Thread %2d does task 2\\n\",omp_get_thread_num());\n","\t\n","\t #pragma omp task\n","\t printf(\"Thread %2d does task 3\\n\",omp_get_thread_num());\n","\t\n","\t #pragma omp task\n","\t printf(\"Thread %2d does task 4\\n\",omp_get_thread_num());\n","\t }\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Why do we need the single construct in the code above?

"]},{"cell_type":"markdown","metadata":{},"source":["

Task data

"]},{"cell_type":"markdown","metadata":{},"source":["

A slightly more complex example, with a bug:

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i;\n","\tprintf(\"Hello %p\\n\",&i);\n","\t#pragma omp parallel private(i)\n","\t{\n","\t#pragma omp master\n","\t {\n","\t for(i=0; i<6; i++)\n","\t {\n","\t#pragma omp task\n","\t printf(\"Thread %d iteration: %d\\n\", omp_get_thread_num(), i);\n","\t }\n","\t }\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

What went wrong?

"]},{"cell_type":"markdown","metadata":{},"source":["

Task data

"]},{"cell_type":"markdown","metadata":{},"source":["

The value of shared variables accessed within a task might change\n","between the creation of the task and its actual execution. Some clauses\n","can be used to define the scope of variables within tasks:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • shared(x) means that when the task is executed x is\n","the same variable (the same memory location) as when the task was\n","created

  • \n","
  • firstprivate(x) means that x is private to the task,\n","i.e., when the task is created, a brand new variable x is created as\n","well and its value is set to be the same as the value of x in the\n","enclosing context at the moment when the task is created. This new copy\n","is destroyed when the task is finished

  • \n","
  • private(x) means that x is private to the task,\n","i.e., when the task is created, a brand new variable x is created as\n","well. This new copy is destroyed when the task is finished

  • \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

If a variable is private in the parallel region it is\n","implicitly firstprivate in the included tasks
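
A tiny sketch of this rule (ours, using only standard OpenMP calls): i is private in the region, so each task implicitly captures the value it had when the task was created.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i;\n","\t#pragma omp parallel private(i)\n","\t{\n","\t i = omp_get_thread_num();\n","\t /* i is private in the region, hence implicitly firstprivate in the task: */\n","\t /* the task sees the value i had at creation time, i.e., the id of its creator */\n","\t#pragma omp task\n","\t printf(\"task created by thread %d, executed by thread %d\\n\", i, omp_get_thread_num());\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["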

"]},{"cell_type":"markdown","metadata":{},"source":["

Task data

"]},{"cell_type":"markdown","metadata":{},"source":["

A slightly more complex example, with a bugfix:

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i;\n","\tprintf(\"Hello %p\\n\",&i);\n","\t#pragma omp parallel\n","\t{\n","\t#pragma omp master\n","\t {\n","\t for(i=0; i<6; i++)\n","\t {\n","\t#pragma omp task firstprivate(i)\n","\t printf(\"Thread %d iteration: %d\\n\", omp_get_thread_num(), i);\n","\t }\n","\t }\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Task if

"]},{"cell_type":"markdown","metadata":{},"source":["

Creating and handling tasks has a cost. Therefore, it is not always worth creating a task, for example when the task has only a little work to do. The if clause can be used to choose whether to create a task or to run the code block immediately.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tdouble w=0.5;\n","\t\n","\t#pragma omp parallel\n","\t{\n","\t#pragma omp master\n","\t {\n","\t#pragma omp task\n","\t printf(\"Thread %d executes this first task\\n\", omp_get_thread_num());\n","\t\n","\t#pragma omp task if(w>0.4)\n","\t {\n","\t do_stuff(w);\n","\t printf(\"Thread %d executes this second task\\n\", omp_get_thread_num());\n","\t }\n","\t\n","\t }\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Taskwait

"]},{"cell_type":"markdown","metadata":{},"source":["

So how can we be sure that some tasks are actually executed? The\n","taskwait directive ensures that all the previously\n","submitted tasks have been executed. Note that this does not include\n","descendants, i.e., tasks that have been generated by other tasks.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint x, y, z;\n","\t\n","\t#pragma omp parallel\n","\t{\n","\t#pragma omp master\n","\t {\n","\t#pragma omp task\n","\t x = compute_one(0.2);\n","\t\n","\t#pragma omp task\n","\t y = compute_one(0.2);\n","\t\n","\t#pragma omp taskwait\n","\t z = x+y;\n","\t printf(\"z is %d\\n\", z);\n","\t }\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Task dependencies

"]},{"cell_type":"markdown","metadata":{},"source":["

It is possible to define an execution order by specifying task\n","dependencies. This is done through the\n","depend clause and the Bernstein conditions:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • The in dependence-type. The generated task will be a\n","dependent task of all previously generated sibling tasks that reference\n","at least one of the list items in an out or\n","inout dependence-type list.

  • \n","
  • The out and inout dependence-types. The\n","generated task will be a dependent task of all previously generated\n","sibling tasks that reference at least one of the list items in an\n","in, out, or inout dependence-type\n","list.

  • \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

Task dependencies

"]},{"cell_type":"markdown","metadata":{},"source":["

Example:

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint a, b, c, x, y;\n","\tdouble t=secs();\n","\t#pragma omp parallel\n","\t{\n","\t#pragma omp master\n","\t {\n","\t#pragma omp task depend(out:a)\n","\t a = f_a();\n","\t\n","\t#pragma omp task depend(out:b)\n","\t b = f_b();\n","\t\n","\t#pragma omp task depend(out:c)\n","\t c = f_c();\n","\t\n","\t#pragma omp task depend(in:b,c) depend(out:x)\n","\t x = f_x(b, c);\n","\t\n","\t#pragma omp task depend(in:a,x) depend(out:y)\n","\t y = f_y(a, x);\n","\t\n","\t#pragma omp taskwait\n","\t printf(\"y: %d (correct value is 9) and time is %f\\n\",y,secs()-t);\n","\t }\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Can you draw the dependency graph?

"]},{"cell_type":"markdown","metadata":{},"source":["

Task priorities

"]},{"cell_type":"markdown","metadata":{},"source":["

Assuming only two threads are available and all functions take one\n","second, the following two schedulings are possible.

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

Task priorities

"]},{"cell_type":"markdown","metadata":{},"source":["

The priority clause can be used to give the OpenMP\n","scheduler a hint on the importance of a task

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint a, b, c, x, y;\n","\tdouble t=secs();\n","\t#pragma omp parallel\n","\t{\n","\t#pragma omp master\n","\t {\n","\t#pragma omp task depend(out:b) priority(2)\n","\t b = f_b();\n","\t\n","\t#pragma omp task depend(out:c) priority(2)\n","\t c = f_c();\n","\t\n","\t#pragma omp task depend(out:a)\n","\t a = f_a();\n","\t\n","\t#pragma omp task depend(in:b,c) depend(out:x)\n","\t x = f_x(b, c);\n","\t\n","\t#pragma omp task depend(in:a,x) depend(out:y)\n","\t y = f_y(a, x);\n","\t\n","\t#pragma omp taskwait\n","\t printf(\"y: %d (correct value is 9) and time is %f\\n\",y,secs()-t);\n","\t }\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Task dependencies and\n","pointers

"]},{"cell_type":"markdown","metadata":{},"source":["

When using pointers to specify dependencies, you should dereference them so that the dependence is inferred from the pointed-to data rather than from the pointer variable itself.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint x[2]={0,0};\n","\tint *p=x;\n","\tdouble t=secs();\n","\t#pragma omp parallel\n","\t{\n","\t#pragma omp master\n","\t {\n","\t#pragma omp task firstprivate(p) depend(out:*p)\n","\t *p = compute_one(1.0);\n","\t\n","\t p+=1;\n","\t\n","\t#pragma omp task firstprivate(p) depend(out:*p)\n","\t *p = compute_one(1.0);\n","\t\n","\t#pragma omp taskwait\n","\t printf(\"x: {%d,%d} (correct value is {1,1}) and time is %f\\n\",x[0],x[1],secs()-t);\n","\t }\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Locks

"]},{"cell_type":"markdown","metadata":{},"source":["

Locks

"]},{"cell_type":"markdown","metadata":{},"source":["

A lock is a variable of type omp_lock_t which can be used to prevent simultaneous access to shared resources according to the following schema:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • acquire (or set or lock) the lock

  • \n","
  • access data

  • \n","
  • release (on unset or unlock) the lock

  • \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

Acquisition of the lock is exclusive in the sense that only one thread can hold the lock at a given time. A lock can be in one of the following states:

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • uninitialized: the lock is not active and cannot\n","be acquired/released by any thread;

  • \n","
  • unlocked: the lock has been initialized and can\n","be acquired by any thread;

  • \n","
  • locked: the lock has been acquired by one thread\n","and cannot be acquired by any other thread until the owner releases\n","it.

  • \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

Locks

"]},{"cell_type":"markdown","metadata":{},"source":["

Transitions through states can be achieved with the following\n","routines

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • omp_init_lock: initializes a lock

  • \n","
  • omp_destroy_lock: uninitializes a lock

  • \n","
  • omp_set_lock: waits until a lock is available, and\n","then sets it

  • \n","
  • omp_unset_lock: unsets a lock

  • \n","
  • omp_test_lock: tests a lock, and sets it if it is\n","available

  • \n","
"]},{"cell_type":"markdown","metadata":{},"source":["

Locks

"]},{"cell_type":"markdown","metadata":{},"source":["

Example

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tomp_lock_t lock;\n","\tomp_init_lock(&lock);\n","\t\n","\t#pragma omp parallel\n","\t{\n","\t omp_set_lock(&lock);\n","\t printf(\"%d: It's my turn to use the resource\\n\",omp_get_thread_num());\n","\t use_resource();\n","\t omp_unset_lock(&lock);\n","\t}\n","\t\n","\tomp_destroy_lock(&lock);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Locks

"]},{"cell_type":"markdown","metadata":{},"source":["

Example with test lock

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tomp_lock_t lock;\n","\tomp_init_lock(&lock);\n","\t\n","\t#pragma omp parallel\n","\t{\n","\t\n","\t while(!omp_test_lock(&lock)){\n","\t /* if lock is already locked, I do some other useful stuff */\n","\t printf(\"%d: lock is busy, I do some stuff\\n\",omp_get_thread_num());\n","\t do_stuff(0.5);\n","\t }\n","\t\n","\t printf(\"%d: It's my turn to use the resource\\n\",omp_get_thread_num());\n","\t use_resource();\n","\t omp_unset_lock(&lock);\n","\t}\n","\t\n","\tomp_destroy_lock(&lock);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Mistakes to avoid

"]},{"cell_type":"markdown","metadata":{},"source":["

Add useless loops

"]},{"cell_type":"markdown","metadata":{},"source":["

Do not create a new loop just for the purpose of parallelizing it

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, nth;\n","\t#pragma omp parallel\n","\t{\n","\t nth = omp_get_num_threads();\n","\t#pragma omp for\n","\t for(i=0; i<nth; i++)\n","\t do_stuff(1.0); /* one iteration per thread: the loop adds nothing */\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

This is exactly the same as

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\t#pragma omp parallel\n","\t{\n","\t do_stuff(1.0);\n","\t}\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

Use tasks with caution

"]},{"cell_type":"markdown","metadata":{},"source":[""]},{"cell_type":"markdown","metadata":{},"source":["

Tasks have a relatively high overhead and should be\n","used with caution.

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • creating a task for a few operations is probably a very bad\n","idea
  • \n","
  • try to combine as many operations as possible into a single task as\n","long as this does not reduce parallelism
  • \n","
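
For instance, instead of creating one task per iteration, tasks can be created over blocks of iterations (a sketch of our own, with an arbitrary block size):

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <omp.h>\n","\n","#define N  10000\n","#define BS 1000 /* block size: one task per block instead of one task per iteration */\n","\n","int main(int argc, char* argv[]){\n","\tint i, j;\n","\tstatic double x[N];\n","\t\n","\t#pragma omp parallel\n","\t#pragma omp single\n","\tfor(i=0; i<N; i+=BS){\n","\t#pragma omp task firstprivate(i) private(j)\n","\t for(j=i; j<i+BS && j<N; j++)\n","\t x[j] = 2.0*j; /* a few cheap operations, grouped into one task */\n","\t}\n","\t\n","\tprintf(\"x[N-1] = %f\\n\", x[N-1]);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["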
"]},{"cell_type":"markdown","metadata":{},"source":["

Minimize critical sections

"]},{"cell_type":"markdown","metadata":{},"source":["

Critical sections serialize the work of the threads.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <stdlib.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i, n=10;\n","\t#pragma omp parallel for\n","\t for(i=0; i<n; i++){\n","\t#pragma omp critical\n","\t {\n","\t do_stuff(0.1); /* the whole loop body sits inside the critical section */\n","\t }\n","\t }\n","}"]},{"cell_type":"markdown","metadata":{},"source":["

The code above is completely sequential: the execution time will be the same as without the parallelization, but the iterations will be executed in a different order.

"]},{"cell_type":"markdown","metadata":{},"source":["
    \n","
  • separate what really needs to be in the critical section from what\n","can be done in parallel
  • \n","
  • sometimes atomic instructions can be used instead of critical
  • \n","
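
As a sketch of the first point (our own, self-contained example): the expensive part of each iteration runs fully in parallel and only the update of the shared variable is protected, here with an atomic update.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i;\n","\tdouble sum = 0.0;\n","\t\n","\t#pragma omp parallel for\n","\tfor(i=0; i<1000; i++){\n","\t double v = 0.0;\n","\t int k;\n","\t for(k=0; k<1000; k++) /* the \"expensive\" part: executed fully in parallel */\n","\t v += (double)(i+k);\n","\t#pragma omp atomic update\n","\t sum += v; /* only this update of the shared variable is serialized */\n","\t}\n","\t\n","\tprintf(\"sum = %f\\n\", sum);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["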
"]},{"cell_type":"markdown","metadata":{},"source":["

Don't forget the parallel\n","section

"]},{"cell_type":"markdown","metadata":{},"source":["

It may sound obvious, but there is no parallelism if you do not put a parallel section somewhere. Just don't forget it.

"]},{"cell_type":"markdown","metadata":{},"source":["

Conversely, if you can merge multiple parallel sections into a single\n","one, it might be good to do so in order to reduce overhead
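
A minimal sketch of our own to illustrate: the first version pays the region startup and the implicit barrier twice, while the second opens a single region containing two work-sharing loops.

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include <stdio.h>\n","#include <omp.h>\n","\n","int main(int argc, char* argv[]){\n","\tint i;\n","\tdouble a[100], b[100];\n","\t\n","\t/* two separate parallel regions: two fork/join overheads */\n","\t#pragma omp parallel for\n","\tfor(i=0; i<100; i++) a[i] = i;\n","\t#pragma omp parallel for\n","\tfor(i=0; i<100; i++) b[i] = 2*i;\n","\t\n","\t/* one merged region: a single fork/join, two work-sharing loops inside */\n","\t#pragma omp parallel\n","\t{\n","\t#pragma omp for\n","\t for(i=0; i<100; i++) a[i] = i;\n","\t#pragma omp for\n","\t for(i=0; i<100; i++) b[i] = 2*i;\n","\t}\n","\t\n","\tprintf(\"a[99]=%f b[99]=%f\\n\", a[99], b[99]);\n","}"]},{"cell_type":"markdown","metadata":{},"source":["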

"]},{"cell_type":"markdown","metadata":{},"source":["

Aux code

"]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["//%cflags:-fopenmp\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint seed=-1;\n","\t #pragma omp threadprivate(seed)\n","\t\n","\t int rnd_int() {\n","\t // & 0x7fffffff is equivalent to modulo with RNG_MOD = 2^31\n","\t#if defined(_OPENMP)\n","\t if(seed==-1) seed = omp_get_thread_num()+1;\n","\t#else\n","\t if(seed==-1) seed = 1;\n","\t#endif\n","\t return (seed = (seed * 1103515245 + 12345) & 0x7fffffff);\n","\t }\n","\t\n","\t void rand_fill(int *x, int n){\n","\t int i;\n","\t for(i=0; i\n","#include \n","#include \n","#include \n","#include \n","\n","#include \n","#include \n","#include \n","\n","int main(int argc, char* argv[]){\n","\tint rnd_int();\n","\tvoid rand_fill(int *x, int n);\n","\tlong usecs ();\n","\tdouble secs ();\n","\tvoid do_stuff(double sec);\n","\tint compute_one(double sec);\n","\tint f_a();\n","\tint f_b();\n","\tint f_c();\n","\tint f_x(int b, int c);\n","\tint f_y(int a, int x);\n","\tvoid use_resource();\n","}"]}],"metadata":{"kernelspec":{"display_name":"C","language":"c","name":"c"},"language_info":{"file_extension":".c","mimetype":"text/plain","name":"c"}},"nbformat":4,"nbformat_minor":4}