Using Compiled Languages#

First steps with python using numba#

import numpy as np
import numba

Python is a nice object-oriented scripting language, but it can run into performance issues. We will see a few examples below. To get better performance, one uses compiled languages, such as C, C++ and Fortran. We will also use the numba python library, which performs “just in time” compilation. In this lecture, however, we will mostly explore how to compile C and Fortran codes directly. In a later lecture, we will see how to interface these compiled functions directly with python.

Let’s start with a simple example to see how badly python performs when not used properly. We define a simple function that uses a python loop, which is generally a very bad idea with python.

def f_simple(X):
    Y = np.empty_like(X)
    for i in range(len(X)):
        x = X[i]
        Y[i] = x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8
    return Y

We create a random numpy array of moderate size.

x = np.random.normal(size=1_000_000)

We finally call the function and time it using %%timeit, a cell magic provided by IPython (the engine behind the jupyter notebook).

%%timeit
f_simple(x)
1.08 s ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

We see that it called the timing routine 7 times, each timing using only 1 call to the function. We can change this with the -r (number of runs) and -n (number of calls per run) options to see how robust the time measurements are. The standard deviation seems indeed a bit large.

%%timeit -r 4 -n 4
f_simple(x)
1.08 s ± 1.12 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)

We see that the measurement seems now more consistent with a smaller standard deviation.
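Outside a notebook, the same kind of measurement can be sketched with the timeit module of the python standard library. In the sketch below, the repeat argument plays the role of -r and number the role of -n. The function poly_loop and the small input list are hypothetical stand-ins chosen so that the example runs quickly; they are not the function timed above.

```python
import statistics
import timeit


def poly_loop(values):
    """Pure-python stand-in for f_simple (hypothetical name), working on a list."""
    out = []
    for x in values:
        out.append(x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8)
    return out


data = [0.001 * i for i in range(1000)]  # small input so the sketch runs quickly

runs, loops = 4, 4  # same roles as "-r 4 -n 4"
totals = timeit.repeat(lambda: poly_loop(data), repeat=runs, number=loops)

# Each entry of totals is the time for `loops` calls, so divide to get a per-call time.
per_call = [t / loops for t in totals]
mean = statistics.mean(per_call)
std = statistics.pstdev(per_call)
print(f"{mean * 1e6:.1f} us +- {std * 1e6:.1f} us per loop "
      f"(mean +- std. dev. of {runs} runs, {loops} loops each)")
```

Increasing number averages out fluctuations within a run, while increasing repeat gives a better estimate of the run-to-run spread.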

Let’s now try to write more idiomatic python, avoiding explicit loops and using direct numpy array notation instead.

def f_numpy(X):
    return X + X**2 + X**3 + X**4 + X**5 + X**6 + X**7 + X**8

%%timeit -r 4 -n 4
f_numpy(x)
108 ms ± 950 μs per loop (mean ± std. dev. of 4 runs, 4 loops each)

Wow! It is indeed much faster. Too fast even… Let’s use a bigger array. For this, on your own jupyter notebook, please uncomment the second line.

x = np.random.normal(size=1_000_000)
# x = np.random.normal(size=10_000_000)

%%timeit -r 4 -n 4
f_numpy(x)
107 ms ± 153 μs per loop (mean ± std. dev. of 4 runs, 4 loops each)

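These multiple powers are probably slow to evaluate. Let’s use a nice trick to avoid these expensive operations: we factor the polynomial into a nested form (known as Horner’s method), so that only multiplications and additions remain.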

def f_numpy_2(X):
    return X * (1 + X * (1 + X * (1 + X * (1 + X * (1 + X * (1 + X * (1 + X)))))))

%%timeit -r 4 -n 4
f_numpy_2(x)
5.96 ms ± 263 μs per loop (mean ± std. dev. of 4 runs, 4 loops each)

Wow! Another dramatic improvement!
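It is worth checking that the nested form really computes the same polynomial. The sketch below does this on a few scalar values, using plain python floats rather than numpy arrays to stay self-contained. The function names poly_powers and poly_horner are hypothetical.

```python
import math


def poly_powers(x):
    """Sum of x**k for k = 1..8, written with explicit powers."""
    return x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8


def poly_horner(x, degree=8):
    """Same polynomial in nested form: x*(1 + x*(1 + ... (1 + x)))."""
    acc = x
    for _ in range(degree - 1):
        # Each pass adds one level of nesting: 1 multiplication and 1 addition.
        acc = x * (1 + acc)
    return acc


samples = [-1.5, -0.3, 0.0, 0.7, 2.0]
for x in samples:
    assert math.isclose(poly_powers(x), poly_horner(x), rel_tol=1e-9, abs_tol=1e-12)
print("both forms agree on", samples)
```

For degree 8, the nested form needs only 7 multiplications and 7 additions per element, and no call to a power function.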

We have more or less reached the maximum we can do using python and numpy alone. We will now try a nice python package called numba that performs just in time compilation. What numba does is to translate the python function into machine code on the fly, the first time the function is called, using the LLVM compiler infrastructure. The performance of the resulting function is usually much higher. Since the function is now compiled, you don’t need to worry about using loops directly anymore. In fact, to help numba translate the python instructions into compiled instructions, it is recommended to use explicit loops.

Let’s see how we can optimize our function using numba.

@numba.jit(nopython=True)
def f_numba(X):
    Y = np.empty_like(X)
    for i in range(len(X)):
        x = X[i]
        Y[i] = x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8
    #        Y[i] = x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x)))))))
    return Y

Note that we have used the decorator @numba.jit that tells numba to translate the function into machine code. numba tries to compile everything. If it cannot, it may (depending on the numba version) keep the python code as is. Using the option nopython=True forces numba to compile the whole function. If numba fails to do it, an error will follow.

Let’s now time the resulting compiled function.

%%timeit -r 4 -n 4
f_numba(x)
The slowest run took 275.43 times longer than the fastest. This could mean that an intermediate result is being cached.
26.1 ms ± 44.5 ms per loop (mean ± std. dev. of 4 runs, 4 loops each)

This is now really fast! This is the main advantage of using a compiled language. The standard deviation is quite large compared to the mean. This is because the timer also counts the extra time numba needs to compile the function during the very first call. To avoid this, we can time the function again now that it is compiled, or use an even bigger array (again please uncomment the second line in the next cell). Note that we could also have used the cache=True option of numba but this is beyond the scope of this lecture.

x = np.random.normal(size=1_000_000)
# x = np.random.normal(size=100_000_000)

%%timeit -r 4 -n 4
f_numba(x)
459 μs ± 132 μs per loop (mean ± std. dev. of 4 runs, 4 loops each)

Using a compiler also allows us to use parallel computing. We will see in future lectures how to program in parallel. For the time being, we just trust numba to do it for us. To parallelize a numba function, just add the parallel=True option and replace the range function defining the loop by the parallel function numba.prange which defines the method to divide up the loop into parallel tasks.

@numba.jit(nopython=True, parallel=True)
def f_numba_para(X):
    Y = np.empty_like(X)
    for i in numba.prange(len(X)):
        x = X[i]
        Y[i] = x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8
    return Y

%%timeit -r 4 -n 40
f_numba_para(x)
The slowest run took 58.52 times longer than the fastest. This could mean that an intermediate result is being cached.
2.7 ms ± 4.36 ms per loop (mean ± std. dev. of 4 runs, 40 loops each)

This ends our journey towards better and better performance using python. We started with an explicit loop within a pure python function and ended with compiled, parallel code generated by numba. For the 1,000,000-element array timed above, we went from about 1 s to about 0.5 ms with numba, a speedup of roughly 2000x. On larger arrays and with more cores the gain can be even larger: going from roughly 200 s to roughly 20 ms, for instance, is an amazing 10000x (yes, ten thousand!) speedup.
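The speedups along the way can be checked with a few lines of arithmetic. The sketch below only reuses the mean timings printed by the %%timeit cells above for the 1,000,000-element array. The parallel timing is left out because its large spread suggests it still includes compilation time. The dictionary keys are just labels chosen for this illustration.

```python
# Mean timings copied from the %%timeit outputs above, in seconds
# (1,000,000-element array).
timings = {
    "f_simple (python loop)": 1.08,
    "f_numpy (array powers)": 108e-3,
    "f_numpy_2 (nested form)": 5.96e-3,
    "f_numba (second timing)": 459e-6,
}

baseline = timings["f_simple (python loop)"]

# Speedup of each variant relative to the pure python loop.
speedups = {name: baseline / t for name, t in timings.items()}

for name, s in speedups.items():
    print(f"{name:26s} {s:8.0f}x")
```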

Linux and the Terminal window#

To understand better what numba is doing under the hood, we now switch to using the Terminal window. We will edit our C or Fortran codes using an editor (could be vim or emacs). We will then compile our C or Fortran code using a compiler. Before we get there, let’s first get some practice in the Terminal window.

In your jupyter notebook Home page, hit the New button, this time choosing the Terminal option. You should see a prompt like $ and a cursor. Just type: $ ls. You should see the content of the course directory in the Terminal window.

In this jupyter notebook, you can execute the same command line instructions by prefixing them with a !.

!ls
Compiled_Languages.ipynb  cmake_example  compiled.md  rust.md

Here is a list of very useful Linux commands that you have to know by heart.

| Command | Example | Description |
|---|---|---|
| ls | ls | List files in current directory |
| ls | ls -als | List in long format including hidden files and file sizes |
| cd | cd .. | Change to parent directory |
| cd | cd week09 | Change to directory week09 |
| cd | cd ~bob/se-for-sci/content | Change to target directory inside Bob’s SE course directory |
| mkdir | mkdir test | Create a new directory called test |
| rmdir | rmdir test | Remove the directory called test |
| cp | cp file1.txt file2.txt | Copy file1.txt into a new file called file2.txt |
| cp | cp ~bob/file1.txt . | Copy the file called file1.txt in Bob’s home directory into a new local file, keeping the same name |
| cp | cp ~bob/* . | Copy all the files in Bob’s home directory locally, keeping the same names |
| cp | cp -r ~bob/se-for-sci . | Copy recursively the entire content of Bob’s SE course directory locally, keeping the same names |
| rm | rm file1.txt | Remove only the file called file1.txt |
| rm | rm -rf * | Remove recursively all files and directories without asking permission (very dangerous) |
| mv | mv ~bob/file1.txt file2.txt | Move one file to another location and give it a new name |
| more | more file1.txt | Look at the file content one page at a time |
| man | man more | Look at the manual for a given Linux command |
| grep | grep Hello file1.txt | Search for the string Hello inside the file file1.txt |

Try now to play with these different commands in the Terminal window. In the remainder of the lecture, we will have to use the Terminal window again so get used to it!

Compiling a C code#

Now let’s move to the core of the lecture, namely learning how to compile actual code. We will start simple with the famous Hello world example. The cell below will write a C file called hello.c.

%%writefile hello.c
#include <stdio.h>
int main() {
   // This is a comment 
   printf("Hello, World!\n");
   return 0;
}
Writing hello.c

You can check in parallel in the Terminal window that this file was properly created using the $ more hello.c command.

The first line is an include statement. It tells the compiler to include at the beginning of the file another file called stdio.h, which is one of the header files of the C standard library. As the name indicates, this file declares the standard Input/Output C functions. The function we use here is printf, which outputs to the screen the character string Hello, World!. Comments are defined in C using the // marker.

In this lecture, we will not teach the basics of the C language. We only focus on the compilers. If you need more details on the C syntax, please use the web as a never ending source of information.

To compile the code, we now need to use a compiler. On most Linux systems, you will find by default the GNU C compiler called gcc. More resources on the GNU C compiler can be found here. The command to compile our simple hello.c code is as follows:

!gcc hello.c

This command creates a new file called a.out which is the executable of your code. You can check that it is indeed here by typing:

!ls
Compiled_Languages.ipynb  a.out  cmake_example	compiled.md  hello.c  rust.md

You can now run this executable by typing:

!./a.out
Hello, World!

Congratulations! You succeeded in running your first compiled code!

In the previous cell, the symbol ! is used to execute from the jupyter notebook a Linux command. In the Terminal window, you can try and execute $ ./a.out where $ is the prompt (don’t type the $ symbol, it should be already there!). The dot-slash ./ means execute the code that sits here, in this directory.

Note that if you type only $ a.out it won’t work.

!a.out
/usr/bin/sh: 1: a.out: not found

Indeed, the operating system wasn’t able to find the executable in any of the directories it searches for commands.

For that to work, you need to add the directory that contains your executable to the PATH variable, which lists the directories searched for executables.

Try now to type in the Terminal window:

export PATH=~/se-for-sci/content/week09:$PATH

echo $PATH

You should see a long list of directories containing all the executables accessible to you, including:

<yourhomedir>/se-for-sci/content/week09

You can now type a.out in the Terminal window and it will work like a charm.
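The way a command name is looked up in PATH can be illustrated from python with shutil.which, which searches a colon-separated list of directories in order. This is a sketch under stated assumptions: the temporary directory and the tiny shell script named a.out are stand-ins for the course directory and the compiled program, and the system path /usr/bin:/bin is assumed not to contain a command called a.out.

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a tiny executable script named a.out (stand-in for the compiled program).
    exe = os.path.join(workdir, "a.out")
    with open(exe, "w") as fh:
        fh.write("#!/bin/sh\necho 'Hello, World!'\n")
    os.chmod(exe, os.stat(exe).st_mode | stat.S_IXUSR)

    # Assumed search path that does not contain our directory.
    system_path = os.pathsep.join(["/usr/bin", "/bin"])
    # Same idea as: export PATH=<workdir>:$PATH
    extended_path = workdir + os.pathsep + system_path

    found_before = shutil.which("a.out", path=system_path)
    found_after = shutil.which("a.out", path=extended_path)

print("without the directory:", found_before)
print("with the directory:   ", found_after)
```

Because directories are searched from left to right, putting your directory in front of $PATH means your executable wins over any other command with the same name.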

Note that this jupyter notebook inherits the PATH the system had when you launched it with the command jupyter notebook. Each ! command runs in its own short-lived shell, so an export typed in a notebook cell does not carry over to the next cell. This is why you need to use the Terminal window for the little exercise we just did.

As you have probably guessed, a.out is the default name for executables. If you want to give it a proper name, use the -o option.

!gcc hello.c -o hello
!ls
Compiled_Languages.ipynb  cmake_example  hello	  rust.md
a.out			  compiled.md	 hello.c

We have now a new file called hello which is our new executable.

!./hello
Hello, World!

Let’s now move to a more complicated task. We would like to reproduce the exercise we did using python and numba but this time directly ourselves using C.

Here is the C code that implements the power function we used before.

To make a proper comparison with our previous attempts, please uncomment the line in the following cell.

%%writefile power.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main()
{
    int i,n=1000000;
//    int i,n=100000000;
    float x,y;
    
    printf("%i\n",n);
    for (i=0;i<n;i++){
        x=rand();
        y=x+pow(x,2)+pow(x,3)+pow(x,4)+pow(x,5)+pow(x,6)+pow(x,7)+pow(x,8);
    }
    return 0;
}
Writing power.c

We will not dwell on the new C syntax introduced here: declaring integer and floating point variables, a for loop and the external functions rand() and pow().

The key point is that we now need more include statements to allow the use of these external functions. The functions declared in stdlib.h are part of the standard C library, which gcc links by default. The functions declared in math.h are not: they live in a separate math library. We need to tell the compiler to link with this external library to find the pow() function.

This is done with the compiler using the -l option that tells the compiler to link your code with an external library of already compiled functions. In our case, the name of the math library is simply m, so we have to type the command:

!gcc power.c -o power -lm

Note that the outcome of this compilation might depend on your system. Some GNU C compiler versions need the -lm linking option, some others don’t.

To see what version of the compiler you use, just type:

!gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Let’s now try to execute our new code using the Linux command time:

!time ./power
1000000
0.11user 0.00system 0:00.11elapsed 100%CPU (0avgtext+0avgdata 2048maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps

This is very disappointing! The reason for this poor performance is the function pow(), which works for any floating point power, not just the integer powers we need here. Let’s use the same trick we used above for our python code and rewrite our C code as follows:

%%writefile mult.c
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i,n=1000000;
//    int i,n=100000000;
    float x,y;
    
    printf("%i\n",n);
    for (i=0;i<n;i++){
        x=rand();
        y=x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x)))))));
    }
    return 0;
}
Writing mult.c

Note that we don’t use the math library anymore. We can compile now using:

!gcc mult.c -o mult
!time ./mult
1000000
0.01user 0.00system 0:00.01elapsed 100%CPU (0avgtext+0avgdata 1408maxresident)k
0inputs+0outputs (0major+71minor)pagefaults 0swaps

Much better! We can also compile the code using the optimization option -O (capital O), which offers various degrees of optimization, from -O0, which corresponds to basically no optimization, to -O3, which allows the compiler to aggressively rewrite parts of your code to make it faster.

Let’s try to optimize our executable using:

!gcc -O3 mult.c -o mult
!time ./mult
1000000
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 1408maxresident)k
0inputs+0outputs (0major+71minor)pagefaults 0swaps

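Indeed, much better! We have now reached the same level of performance as numba, but we did it ourselves. Keep in mind, however, that y is never used in this program, so an optimizing compiler is free to remove the computation altogether; timings this small should be read with caution.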

Compiling a Fortran code#

Let’s continue our exploration of compilers and use now a very popular language in scientific computing, namely Fortran.

We start with the Hello world code. The syntax is quite different from C.

%%writefile hello.f90
program hello

write(*,*)"Hello world!"

end program hello
Writing hello.f90

The most widely distributed Fortran compiler on Linux machines is again a GNU compiler: gfortran, the GNU Fortran compiler, part of the GNU Compiler Collection (GCC). You can find more details here.

The compilation options are very similar to those of the C compiler.

!gfortran hello.f90 -o hello
!./hello
 Hello world!

Let’s now explore the capabilities of the Fortran compiler for our computational example. We have the following code in Fortran:

%%writefile power.f90
program power

    real(kind=8)::x, y
    integer::i,n=1000000
!    integer::i,n=100000000

    write(*,*)n
    do i=1,n
        x=rand()
        y=x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x)))))))
    enddo
    
end program power
Writing power.f90

Let’s compile this code with aggressive optimization turned on.

!gfortran -O3 power.f90 -o power
!time ./power
     1000000
0.01user 0.00system 0:00.01elapsed 100%CPU (0avgtext+0avgdata 2560maxresident)k
56inputs+0outputs (1major+110minor)pagefaults 0swaps

The performance is not significantly different from that of the C code. This means that choosing between C and Fortran depends more on your personal history and taste, as well as on minor syntax preferences. There are nevertheless many differences between C and Fortran: multi-dimensional arrays are stored contiguously in memory using column-major (Fortran) versus row-major (C) ordering, array indices start at 0 (C) or by default at 1 (Fortran), etc. But overall, Fortran and C++ have similar language capabilities, including object-oriented programming.
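As a reminder of what the two storage orders mean: C stores the elements of a row next to each other (row-major), while Fortran stores the elements of a column next to each other (column-major). The sketch below computes, for a small 2 x 3 array, the position in memory of element (i, j) under both conventions. The function names are hypothetical and the indices are 0-based for simplicity.

```python
def row_major_offset(i, j, nrows, ncols):
    """0-based memory offset of element (i, j) when rows are contiguous (C convention)."""
    return i * ncols + j


def column_major_offset(i, j, nrows, ncols):
    """0-based memory offset of element (i, j) when columns are contiguous (Fortran convention)."""
    return i + j * nrows


nrows, ncols = 2, 3
print("(i, j)  row-major  column-major")
for i in range(nrows):
    for j in range(ncols):
        print((i, j), row_major_offset(i, j, nrows, ncols),
              column_major_offset(i, j, nrows, ncols))
```

In practice this means that the fastest-varying loop index should be the last one in C and the first one in Fortran, so that consecutive iterations touch neighbouring memory locations.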

Compiling a C++ code#

Finally, let’s quickly explore the C++ compiler. We will first look at the Hello world code and then at the power calculations. We will use here again the GCC C++ compiler. Note that the file name should end with a suffix such as .cpp to be recognized as C++ code.

%%writefile hello.cpp
#include <iostream>

int main() {
// This is a comment
    std::cout << "Hello World!";
    return 0;
}
Writing hello.cpp

!gcc hello.cpp -o hello -lstdc++
!./hello
Hello World!

The syntax is again quite different. The I/O library has a different name.

The code for the computational example now looks like:

%%writefile mult.cpp
#include <iostream>
#include <cstdlib>  // declares rand()

int main()
{
    int i,n=1000000;
//    int i,n=100000000;
    float x,y;
    
    std::cout << n;
    for (i=0;i<n;i++){
        x=rand();
        y=x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x*(1+x)))))));
    }
    return 0;
}
Writing mult.cpp

In order for this code to compile properly with the gcc command, we have to link it with the standard C++ library (the g++ command does this automatically).

!gcc -O3 mult.cpp -o mult -lstdc++
!time ./mult
10000000.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 3328maxresident)k
0inputs+0outputs (0major+136minor)pagefaults 0swaps

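We get the same performance as with C and Fortran. Note that the first line of the output reads 1000000 immediately followed by the timing 0.00user, because the program prints n without a newline.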

Building a code with more than one file#

It is not recommended to develop complex software as one giant file with millions of lines. Good practice in software engineering favors using multiple files, for instance one per class of objects or group of functions.

Compiling software with multiple files is slightly more complex than what we did so far.

Let’s decompose our Hello world code into a main routine and a subroutine, each of which is coded in a separate file.

%%writefile hello.f90
program main
    
    integer :: i=1

    call greetings(i)

end program main
Overwriting hello.f90

%%writefile greet.f90
subroutine greetings(i)

    integer, intent(in) :: i

    write(*,*)"Hello world!",i

end subroutine greetings
Writing greet.f90

We will first compile each individual file using the -c option of the compiler. This tells the compiler to turn the file and all the functions it contains into an object file, without trying to build an executable.

!gfortran -c hello.f90
!gfortran -c greet.f90

Using the ls command in the cell below, we now see that we have 2 new files, greet.o and hello.o. These object files contain independent functions that are not meant to work together just yet.

!ls
Compiled_Languages.ipynb  greet.f90  hello.cpp	mult.c	  power.f90
a.out			  greet.o    hello.f90	mult.cpp  rust.md
cmake_example		  hello      hello.o	power
compiled.md		  hello.c    mult	power.c

We now need to link these functions together to get the final executable. The compiler will perform this linking operation using all the required .o files as arguments. The final executable will be the result of this linking operation and will be given the name hello using the -o option.

!gfortran hello.o greet.o -o hello
!./hello
 Hello world!           1

Preprocessor directives#

A very popular and convenient way of programming is to use preprocessor directives. Preprocessor directives have their own syntax and can be seen as yet another programming language. As the name indicates, the preprocessor runs before the compiler actually compiles your code; with gfortran, this happens if the -cpp option has been used. The preprocessor will go through your code and look for directives, which are lines starting with # in the very first column. Be careful with this strict rule.

The goal of the preprocessor is to really edit your file before sending it to the compiler. Only the parts of the code that the preprocessor has kept will be compiled. This is useful to get efficient code because the tests are performed once at compilation time rather than at run time.

The example below is quite self-explanatory. More details on preprocessor directives can be found here.

%%writefile greet.f90
subroutine greetings(i)

    integer, intent(in) :: i
#ifdef FRENCH
    write(*,*)"Bonjour tout le monde !",i
#else
    write(*,*)"Hello world!",i
#endif
    
end subroutine greetings
Overwriting greet.f90

!gfortran -cpp -DFRENCH -c greet.f90
!gfortran hello.o greet.o -o hello
!./hello
 Bonjour tout le monde !           1
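To make the idea of editing the file before compiling more concrete, here is a toy sketch in python of what happens to the #ifdef / #else / #endif block above. It is not the real preprocessor: the function toy_preprocess is hypothetical, and it only handles non-nested #ifdef blocks.

```python
def toy_preprocess(source, defined):
    """Keep only the lines selected by #ifdef/#else/#endif (no nesting, toy model)."""
    kept = []
    keep = True  # are we currently inside a block that must be kept?
    for line in source.splitlines():
        if line.startswith("#ifdef"):
            macro = line.split()[1]
            keep = macro in defined
        elif line.startswith("#else"):
            keep = not keep
        elif line.startswith("#endif"):
            keep = True
        elif keep:
            kept.append(line)
    return "\n".join(kept)


source = """subroutine greetings(i)
    integer, intent(in) :: i
#ifdef FRENCH
    write(*,*)"Bonjour tout le monde !",i
#else
    write(*,*)"Hello world!",i
#endif
end subroutine greetings"""

french = toy_preprocess(source, defined={"FRENCH"})  # similar to compiling with -DFRENCH
english = toy_preprocess(source, defined=set())      # similar to compiling without -D
print(french)
print(english)
```

In both cases the compiler only ever sees one of the two write statements, so no test on the language is left in the compiled code.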

Libraries#

We have already used several libraries provided by the system, such as the standard C and C++ libraries or the C math library. We can also create our own libraries with all our functions. For this, we have to use specific command line instructions. The sequence of commands below will create a library archive or .a file.

!ar r libgreet.a greet.o
ar: creating libgreet.a
!ranlib libgreet.a
!ls
Compiled_Languages.ipynb  greet.f90  hello.cpp	 mult	   power.c
a.out			  greet.o    hello.f90	 mult.c    power.f90
cmake_example		  hello      hello.o	 mult.cpp  rust.md
compiled.md		  hello.c    libgreet.a  power

This library contains a bunch of compiled objects that we can link with our codes later. To do so, just use the -L and -l options.

The first option with a capital L tells the compiler in what directory it will find the library and the second option with a small l tells the compiler the name of the library, so that -lname tells the compiler to look for the file libname.a.

Let’s check that it works.

!gfortran hello.f90 -o hello -L. -lgreet
!./hello
 Bonjour tout le monde !           1
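The naming rule behind -L and -l can be sketched in a few lines of python. This is only a toy model of the lookup: the function resolve_library is hypothetical, and the real linker has more rules (default system directories, versioned file names, options such as -static). It does reflect the fact that, in each directory, the GNU linker looks for libname.so before libname.a.

```python
import os
import tempfile


def resolve_library(name, search_dirs):
    """Return the first lib<name>.so or lib<name>.a found in search_dirs, else None."""
    for directory in search_dirs:
        for suffix in (".so", ".a"):  # shared library is tried before the archive
            candidate = os.path.join(directory, f"lib{name}{suffix}")
            if os.path.isfile(candidate):
                return candidate
    return None


with tempfile.TemporaryDirectory() as libdir:
    # Empty stand-in file for the archive created with ar above.
    open(os.path.join(libdir, "libgreet.a"), "w").close()

    found = resolve_library("greet", [libdir])     # similar to -L<libdir> -lgreet
    missing = resolve_library("greet2", [libdir])  # no libgreet2.* in this directory

print(os.path.basename(found), missing)
```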

Using PATH, LIBRARY_PATH and LD_LIBRARY_PATH#

We have seen now most of the basics of compiled languages. We have seen in particular how to add to the PATH environment variable one or more directories that contain our executables.

There is a similar environment variable that tells the compiler where to look for our libraries when linking. The corresponding environment variable is called LIBRARY_PATH.

In this jupyter notebook, we can see the values these environment variables had when the notebook was started.

!echo $PATH
/home/runner/work/se-for-sci/se-for-sci/.pixi/envs/default/bin:/home/runner/.pixi/bin:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
!echo $LIBRARY_PATH

We see for example that the LIBRARY_PATH has not been set. In a Terminal window, we can set this variable to our local directory so that we could now access the library libgreet.a from everywhere, without the need to specify -L. or -L~/se-for-sci/content/week09.

For this, just type in the Terminal:

export LIBRARY_PATH=~/se-for-sci/content/week09

echo $LIBRARY_PATH

You can now try to compile the code in the Terminal window using:

gfortran hello.f90 -o hello -lgreet

We don’t need the -L option anymore because the compiler knows where to find libgreet.a.

In most cases, you will use this strategy to link objects from a library into your executable. Note that the resulting executable can be quite big, because it contains all the object .o files that the linker has extracted from the library. In this case, we say the executable used a static library.

Another strategy consists in loading the library objects dynamically at run time. The system will load the content of the so-called shared libraries. A shared library can be built using the -fPIC and -shared options as:

gfortran -fPIC -cpp -DFRENCH -c greet.f90

gfortran -shared -o libgreet2.so greet.o

Finally, the executable can be generated using:

gfortran hello.f90 -o hello -L. -lgreet2

Be careful: at run time, the shared libraries must be accessible by the system. You can add the directory that contains your shared libraries to the environment variable LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=~/se-for-sci/content/week09

echo $LD_LIBRARY_PATH

You can then execute ./hello with the properly loaded libraries at run time.

If the LD_LIBRARY_PATH variable is not set properly, you will get the error:

./hello: error while loading shared libraries: libgreet2.so: cannot open shared object file: No such file or directory

Dealing with complex libraries and compiler versions using module#

In more complex cases, such as large scientific libraries, setting by hand the different PATH variables can become tedious and error prone. There is a nice tool on most modern Linux servers called module. It allows you to load (or unload) dynamically the libraries installed on your system by setting up for you all these environment variables.

Since module will change the environment variables, we cannot use the jupyter notebook. We have to work directly in the Terminal window.

In the Terminal window, type:

$ module list

You should see:

Currently Loaded Modulefiles:
 1) anaconda3/2021.11

Type now:

$ module avail

You should see a long list of available modules. If you type

$ module avail fftw

you will get all the modules available for the fftw fast Fourier transform library.

!module avail fftw
/usr/bin/sh: 1: module: not found

We will now load version 3.3.9 of the fftw library, built for the gcc compiler.

For this, type in the Terminal window:

$ module load fftw/gcc/3.3.9
$ module list

You should see:

Currently Loaded Modulefiles:
 1) anaconda3/2021.11   2) fftw/gcc/3.3.9

It is worth now inspecting the different environment variables:

$ echo $LIBRARY_PATH
/usr/local/fftw/gcc/3.3.9/lib64
$ echo $LD_LIBRARY_PATH
/usr/local/fftw/gcc/3.3.9/lib64

We see that the Linux environment has been properly set to use the fftw library.

Inspecting what is in this directory, we get:

$ ls /usr/local/fftw/gcc/3.3.9/lib64
cmake                   libfftw3f_threads.so.3.6.9  libfftw3_omp.a          libfftw3q_threads.so
libfftw3.a              libfftw3l.a                 libfftw3_omp.so         libfftw3q_threads.so.3
libfftw3f.a             libfftw3l_omp.a             libfftw3_omp.so.3       libfftw3q_threads.so.3.6.9
libfftw3f_omp.a         libfftw3l_omp.so            libfftw3_omp.so.3.6.9   libfftw3.so
libfftw3f_omp.so        libfftw3l_omp.so.3          libfftw3q.a             libfftw3.so.3
libfftw3f_omp.so.3      libfftw3l_omp.so.3.6.9      libfftw3q_omp.a         libfftw3.so.3.6.9
libfftw3f_omp.so.3.6.9  libfftw3l.so                libfftw3q_omp.so        libfftw3_threads.a
libfftw3f.so            libfftw3l.so.3              libfftw3q_omp.so.3      libfftw3_threads.so
libfftw3f.so.3          libfftw3l.so.3.6.9          libfftw3q_omp.so.3.6.9  libfftw3_threads.so.3
libfftw3f.so.3.6.9      libfftw3l_threads.a         libfftw3q.so            libfftw3_threads.so.3.6.9
libfftw3f_threads.a     libfftw3l_threads.so        libfftw3q.so.3          pkgconfig
libfftw3f_threads.so    libfftw3l_threads.so.3      libfftw3q.so.3.6.9
libfftw3f_threads.so.3  libfftw3l_threads.so.3.6.9  libfftw3q_threads.a

That is quite a few useful libraries we could play with. Note the .a and .so file name suffixes, for static and shared libraries respectively.

To remove this library, just type:

$ module unload fftw/gcc/3.3.9

You can check yourself in the Terminal window that the environment variables are now back to their original values.
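What module load and module unload do to these variables can be pictured with a toy model in python. The helper functions prepend_path and remove_path are hypothetical and only mimic the effect seen above on LIBRARY_PATH and LD_LIBRARY_PATH; the real module tool reads its instructions from modulefiles and can do much more.

```python
def prepend_path(env, name, directory):
    """Return a copy of env with directory put in front of a colon-separated variable."""
    updated = dict(env)
    current = updated.get(name, "")
    updated[name] = directory if not current else directory + ":" + current
    return updated


def remove_path(env, name, directory):
    """Return a copy of env with directory removed from the variable (toy 'unload')."""
    updated = dict(env)
    parts = [p for p in updated.get(name, "").split(":") if p and p != directory]
    if parts:
        updated[name] = ":".join(parts)
    else:
        updated.pop(name, None)  # variable becomes unset again
    return updated


env = {}  # both variables start unset in this sketch
libdir = "/usr/local/fftw/gcc/3.3.9/lib64"

# Toy equivalent of: module load fftw/gcc/3.3.9
loaded = prepend_path(prepend_path(env, "LIBRARY_PATH", libdir), "LD_LIBRARY_PATH", libdir)

# Toy equivalent of: module unload fftw/gcc/3.3.9
unloaded = remove_path(remove_path(loaded, "LIBRARY_PATH", libdir), "LD_LIBRARY_PATH", libdir)

print(loaded)
print(unloaded)
```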

Compiling more complex codes: Makefile and CMake#

Compiling codes can be quite complex when the code base contains thousands of files with many dependencies. A dependency is when a code depends on another library or object to function properly.

When you work on your code, compiling these files can be very time consuming, especially if you ask the compiler to optimize your code. The trick is to only compile the files that have changed since you last compiled them. Of course, you have to recompile the corresponding object, but also all the objects that depend on it. This is the reason why dependencies are so important.

We now have modern tools that can compile complex codes while dealing properly with dependencies.

Historically, the first tool to deliver such a service was make. We will describe it briefly in this course, as you will unfortunately still have to deal with Makefiles. The message here is that hand-written Makefiles should be replaced as much as possible with cmake, a more modern tool that can generate them for you.

Compiling code using make and the Makefile#

A Makefile is a file containing specific instructions to compile your code. In a sense, this is yet another programming language, but one designed only for code compilation. What is nice with make is the possibility to only recompile the files that have changed since the last time they were compiled.

The syntax is quite simple with one major annoying caveat. The general format is as follows:

target : prerequisite
        instructions

A target is usually an object or an executable.

A prerequisite is an object or a file that the target depends on.

The lines containing instructions MUST start with a TAB. Spaces are not allowed. This is the most annoying aspect of Makefile. There is however a workaround as shown below using the semi-colon ;.

%%writefile Makefile
hello : hello.o greet.o Makefile ; gfortran hello.o greet.o -o hello

%.o : %.f90 ; gfortran -cpp -DFRENCH -c $<

clean : ; rm *.o
Writing Makefile

Make sure that the 2 files hello.f90 and greet.f90 are here.

!ls *.f90
greet.f90  hello.f90  power.f90

To execute this Makefile, just type:

$ make

or

$ make hello

in the Terminal window.

In the jupyter notebook, we can also compile the code using:

!make
gfortran hello.o greet.o -o hello
!./hello
 Bonjour tout le monde !           1

Now let’s modify only the file greet.f90.

!touch greet.f90
!make
gfortran -cpp -DFRENCH -c greet.f90
gfortran hello.o greet.o -o hello

We see that only this file is recompiled. The new executable is also generated by linking the new object.
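The decision make takes for each target can be sketched as a comparison of file modification times: a target is rebuilt if it does not exist or if it is older than one of its prerequisites. The python function needs_rebuild below is a hypothetical illustration of this rule, not the actual implementation of make. The empty files and the time stamps are made up for the example.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """True if target is missing or older than any of its prerequisites."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "greet.f90")
    obj = os.path.join(d, "greet.o")

    open(src, "w").close()
    before_build = needs_rebuild(obj, [src])  # the object file does not exist yet

    open(obj, "w").close()
    os.utime(src, (1000, 1000))               # source older than the object
    os.utime(obj, (2000, 2000))
    up_to_date = needs_rebuild(obj, [src])

    os.utime(src, (3000, 3000))               # similar to "touch greet.f90"
    after_touch = needs_rebuild(obj, [src])

print(before_build, up_to_date, after_touch)
```

The same test applied to the executable explains why hello is linked again: greet.o has just been regenerated, so it is newer than hello.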

We can remove all the objects using:

!make clean
rm *.o

and recompile everything using:

!make
gfortran -cpp -DFRENCH -c hello.f90
gfortran -cpp -DFRENCH -c greet.f90
gfortran hello.o greet.o -o hello

We can see a more complex example of a Makefile by cloning the ramses repository. You can clone the repository using the following command:

!git clone https://rteyssie@bitbucket.org/rteyssie/ramses.git --quiet

If your Terminal window does not have access to the Internet, connect directly to the corresponding BitBucket web page here using your browser.

Using the Terminal window, try and compile this code using the Makefile in the ramses/bin directory.

Compiling code using cmake#

Makefiles are a system-specific build system: they just run commands for you when things need “making”. cmake was designed to be cross-platform, and is a “build system generator”: it makes your Makefiles for you (or Ninja files, or input for other build systems). It is based on a very high level syntax and can explore your system, looking automatically for libraries and configuring them properly for you. The added value of cmake really shows for complex projects (or on a different system like Windows, or if you want to use an IDE, or if you want to pack up and distribute your code), but we will start with a simple example.

Let’s assume you have your 2 original Fortran files hello.f90 and greet.f90.

Let’s create a new file with name CMakeLists.txt. It must have this name.

%%writefile CMakeLists.txt
cmake_minimum_required(VERSION 3.18...3.24)

project(hello LANGUAGES Fortran)

add_executable(hello hello.f90 greet.f90)

target_compile_definitions(hello PRIVATE FRENCH)

set_source_files_properties(greet.f90 PROPERTIES Fortran_PREPROCESS TRUE)
Writing CMakeLists.txt

In a Terminal window, type the following to create a directory named build and configure your project:

$ cmake -S . -B build

This will generate a Makefile (or Ninja, Xcode, MSVC solution, or whatever build system you want). To build, type:

$ cmake --build build

The Makefile will run and your executable is ready! To run it, just type:

$ ./build/hello

We will now walk you through a more complex cmake example available in the directory cmake_example.

Additional material can be found in this book:

And this useful workshop: