Dividing the Pie – High Performance Computing

The Concept

Pi can be written as the integral

$$\pi = \int_0^1 \frac{4}{1 + x^2}\,dx$$

which means that we can approximate pi by approximating the integral numerically.
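
As a sketch of what the numerical approximation looks like: split the interval [0, 1] into n pieces of width h = 1/n and evaluate the integrand f(x) = 4 / (1 + x^2) at the midpoint of each piece. The symbols n, h and f are chosen to match the variable names in the code of the next section.

```latex
\int_0^1 f(x)\,dx \;\approx\; h \sum_{i=1}^{n} f\bigl(h\,(i - \tfrac{1}{2})\bigr),
\qquad f(x) = \frac{4}{1 + x^2}, \quad h = \frac{1}{n}
```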

The Code

The following code approximates pi by numerical integration. The integration uses the rectangle (midpoint) method with 10 000 intervals. The program prints the approximation, its error relative to a 25-digit reference value of pi, and the wall clock time.

$ cd mpi_basic/03_dividing_the_pie
$ cat src/cpi.c
/******************************************************************************
 *                                                                            *
 *  Basic MPI Example - Dividing the Pie                                      *
 *                                                                            *
 *  Find an approximation to pi using numerical integration                   *
 *                                                                            *
 ******************************************************************************
 *                                                                            *
 *  The original code was written by Gustav at University of Indiana in 2003. *
 *                                                                            *
 *  The current version has been tested/updated by the HPC department at      *
 *  the Norwegian University of Science and Technology in 2011.               *
 *                                                                            *
 ******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <math.h>
 
double f(double);
 
double f(double a)
{
    return (4.0 / (1.0 + a*a));
}
 
int main(int argc,char *argv[])
{
    int done = 0, n, myid, numprocs, i;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x;
    double startwtime = 0.0, endwtime;
    int  namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
 
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Get_processor_name(processor_name,&namelen);
 
    fprintf(stdout,"Process %d of %d is on %s\n",
            myid, numprocs, processor_name);
    fflush(stdout);
 
    n = 10000;                  /* default # of rectangles */
    if (myid == 0)
        startwtime = MPI_Wtime();
 
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
 
    h   = 1.0 / (double) n;
    sum = 0.0;
    /* A slightly better approach starts from large i and works back */
    for (i = myid + 1; i <= n; i += numprocs)
    {
        x = h * ((double)i - 0.5);
        sum += f(x);
    }
    mypi = h * sum;
 
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
 
    if (myid == 0) {
        endwtime = MPI_Wtime();
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", endwtime-startwtime);         
        fflush(stdout);
    }
 
    MPI_Finalize();
    return 0;
}

Previous instructions

MPI_Init() and MPI_Finalize(); Initialize and finalize the MPI environment.

MPI_Comm_rank(); Returns the rank of the calling process, which is used here to give process 0 a special role (timekeeping and printing the result) and to partition the rectangles between the processes.

MPI_Bcast(); Broadcasts the total number of rectangles (n) from process 0 to all other processes.

MPI_Get_processor_name(); Returns the name of the node a process runs on, which is printed to show the location of each process.

New instructions

MPI_Reduce( &mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD ); Sums (MPI_SUM) the mypi values of all processes in the MPI_COMM_WORLD communicator and stores the result in pi on process 0. Each send buffer (mypi) holds 1 element of type MPI_DOUBLE. Other operations that can be used instead of MPI_SUM include MPI_MAX (maximum), MPI_PROD (product) and MPI_LAND (logical and).

MPI_Wtime(); Returns a floating point number representing the number of seconds elapsed since some time in the past. The difference between two calls gives the elapsed wall clock time, which is how the computation is timed in this example.

Compile & Run

If you have not already done so, obtain all the example code here.

Switch to the Intel compiler (optional, only necessary once in each terminal session)

$ module load intel

Compile the program using

$ make

Submit the job to the queue

$ make submit

The output from the program execution is placed in the output folder

$ cat output/*
Process 0 of 16 is on compute-2-18.local
Process 4 of 16 is on compute-2-18.local
Process 6 of 16 is on compute-2-18.local
Process 7 of 16 is on compute-2-18.local
Process 9 of 16 is on compute-2-8.local
Process 14 of 16 is on compute-2-8.local
Process 8 of 16 is on compute-2-8.local
Process 11 of 16 is on compute-2-8.local
Process 12 of 16 is on compute-2-8.local
Process 10 of 16 is on compute-2-8.local
Process 13 of 16 is on compute-2-8.local
Process 15 of 16 is on compute-2-8.local
Process 2 of 16 is on compute-2-18.local
Process 5 of 16 is on compute-2-18.local
Process 1 of 16 is on compute-2-18.local
Process 3 of 16 is on compute-2-18.local
pi is approximately 3.1415926544231274, Error is 0.0000000008333343
wall clock time = 0.003900
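
As a rough plausibility check of the reported error (an added observation, not part of the original tutorial): the leading error term of the midpoint rule, taken from the Euler-Maclaurin expansion, depends only on the derivative of the integrand at the two endpoints. For f(x) = 4 / (1 + x^2) the derivative is f'(x) = -8x / (1 + x^2)^2, so f'(0) = 0 and f'(1) = -2. With h = 1/10000 this gives

```latex
h \sum_{i=1}^{n} f\bigl(h\,(i - \tfrac{1}{2})\bigr) - \int_0^1 f(x)\,dx
\;\approx\; -\frac{h^2}{24}\,\bigl[f'(1) - f'(0)\bigr]
\;=\; \frac{h^2}{12}
\;=\; \frac{10^{-8}}{12}
\;\approx\; 8.33 \times 10^{-10}
```

which agrees with the printed error of 0.0000000008333343. The same estimate suggests that doubling the number of rectangles should reduce the error by about a factor of four.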