%$Id$
\documentclass[a4paper,12pt]{article}
\usepackage[utf8]{inputenc}
\usepackage{pslatex}
\usepackage{eurosym}
\usepackage{amssymb}
\usepackage{latexsym}
\usepackage[dvips]{graphicx}
\usepackage{delarray}
\usepackage{amsmath}
%\usepackage{bbm}
%\usepackage{bbold}
%\usepackage{accents}
\usepackage{subfigure}
\usepackage{multirow}
\usepackage{fancyhdr}
%\usepackage{tocbibind}
%\usepackage{bibtex}
%\usepackage{wrapfig}
\usepackage{color}
\RequirePackage{hyperref}
\hypersetup{
	colorlinks = true,
	linkcolor = blue,
	citecolor = green,
	filecolor = green,
	urlcolor = red,
	pdfborder = {0 0 0}
}
\usepackage[hyphenbreaks]{breakurl}
%\usepackage{fmtcount}
\usepackage{parskip}

\newcommand{\command}[1]{\texttt{#1}}
\newcommand{\file}[1]{``\texttt{#1}''}
\newcommand{\directory}[1]{`\texttt{#1}'}
\newcommand{\name}[1]{\textsc{#1}}
\newcommand{\code}[1]{\texttt{#1}}

\frenchspacing

\graphicspath{{./fig/}{./png/}}

\setlength{\hoffset}{-1in}
\setlength{\textwidth}{7.5in}
\setlength{\voffset}{-0.5in}
\setlength{\textheight}{9.5in}

\title{Pencil Code: Quick Start guide}

\author{Illa R. Losada, Michiel Lambrechts, Elizabeth Cole, Philippe Bourdin}


\begin{document}
\maketitle

\tableofcontents

\newpage

\section{Required software}

\subsection{\name{Linux}}
If you plan to run serious simulations, you have to familiarize yourself with the Linux shell (or terminal), which is covered in many available online tutorials.
A \name{Fortran} and a \name{C} compiler are needed to compile the code.
Both compilers should belong to the same distribution package and version (e.g. GNU GCC or Intel).
For HDF5 and MPI support, you need to install the package \directory{libhdf5-openmpi-dev}, which automatically pulls in the GNU compilers and OpenMPI. On a recent Ubuntu system this is done with:
\begin{verbatim}
  sudo apt install libhdf5-openmpi-dev
\end{verbatim}
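
To confirm that these tools are visible on your \code{PATH} before continuing, a small check along the following lines may help. This is only a sketch in \name{Python}; the helper name \code{missing\_tools} and the list of executables are illustrative assumptions and not part of the Pencil Code:
\begin{verbatim}
import shutil

# Executables used in this guide (assumed list, adapt as needed).
# The MPI wrappers matter only for multi-processor runs.
TOOLS = ["gcc", "gfortran", "mpicc", "mpif90", "mpiexec"]

def missing_tools(tools=TOOLS):
    """Return the names in `tools` that are not found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]

missing = missing_tools()
if missing:
    print("not found:", ", ".join(missing))
else:
    print("all tools found")
\end{verbatim}
An empty result means that all listed executables were found.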

\subsection{\name{MacOS X}}
Part of the community runs the Pencil Code directly under MacOS, but this is not officially supported. We strongly recommend using Linux, e.g. on a virtual machine or via dual boot, because all serious scientific computing infrastructures use Unix variants like Linux.

WARNING: This section is not up-to-date and not tested! Someone needs to fully rewrite this section based on a current MacOS.

Some 5--10 years ago, this was the procedure:
You first need to install \name{Xcode} from the
website \url{http://developer.apple.com/}, where you have to register as a member.
In addition, it might be necessary to install a \name{Fortran} compiler with \name{open-mpi}, which can be done by first installing \name{homebrew} (available at \url{https://brew.sh}) and then running the terminal command:
\begin{verbatim}
  brew install open-mpi
\end{verbatim}
A \command{gfortran} binary package can also be found at the website:

\url{http://gcc.gnu.org/wiki/GFortranBinaries}

Just download the file and use the installer contained therein.
It installs into \directory{/usr/local/gfortran} with
a symbolic link in \directory{/usr/local/bin/gfortran}. It might be necessary to add
the following line to the file \file{.cshrc} in your home directory:
\begin{verbatim}
  setenv PATH /usr/local/bin:$PATH
\end{verbatim}
Moreover, for \name{Python} users, new versions of the operating system often cause conflicts with \name{anaconda} when compiling, so it might be necessary to run the terminal command
\begin{verbatim}
  conda deactivate
\end{verbatim}
before compiling and running the Pencil Code.
If one needs to reactivate \name{anaconda}, it is done using
\begin{verbatim}
  conda activate
\end{verbatim}

\section{Download the Pencil Code}
\label{sec:download}
The Pencil Code is an open source code written mainly in \name{Fortran} and available under GPL.
General information can be found at our official homepage:

\url{http://pencil-code.org/}.

The latest version of the code can be downloaded with \command{svn}. In the
directory where you want to put the code, type:
\begin{verbatim}
  svn checkout https://github.com/pencil-code/pencil-code/trunk/ pencil-code
\end{verbatim}

Alternatively, you may also use \command{git}:
\begin{verbatim}
  git clone https://github.com/pencil-code/pencil-code.git
\end{verbatim}

More details on download options can be found here: \url{http://pencil-code.org/download.php}

The downloaded \directory{pencil-code} directory contains several sub-directories:
\begin{enumerate}
  \item \directory{doc}: you may build the latest manual as PDF by issuing the command \command{make} inside this directory
  \item \directory{samples}: contains many sample problems
  \item \directory{config}: has all the configuration files
  \item \directory{src}: the actual source code
  \item \directory{bin} and \directory{lib}: supplemental scripts
  \item \directory{idl}, \directory{python}, \directory{julia}, etc.: data processing for diverse languages
\end{enumerate}


\section{Configure the shell environment}

You need to export some environment variables into your shell.
Please change to the freshly downloaded directory:
\begin{verbatim}
  cd pencil-code
\end{verbatim}

If you use a \command{sh}-compatible shell (like the \name{Linux} default shell \command{bash}), which is the most likely case, just type:\\[2mm]
\command{ . ./sourceme.sh  }  or  \command{  source sourceme.sh}

(In a \command{csh}-compatible shell, like \command{tcsh}, use this alternative instead: \command{source sourceme.csh}.)
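
Whether the environment was set up correctly can be checked from any program that reads environment variables. The following \name{Python} sketch assumes that the sourced script sets \code{PENCIL\_HOME} and puts \directory{\$PENCIL\_HOME/bin} on the \code{PATH}; the function name \code{check\_pencil\_env} is hypothetical:
\begin{verbatim}
import os

def check_pencil_env(environ=os.environ):
    """Return a list of problems found in the given environment."""
    problems = []
    home = environ.get("PENCIL_HOME")
    if not home:
        problems.append("PENCIL_HOME is not set; source sourceme.sh")
        return problems
    # Assumption: the scripts (pc_build, pc_run, ...) live in bin/.
    bin_dir = os.path.join(home, "bin")
    path_entries = environ.get("PATH", "").split(os.pathsep)
    if bin_dir not in path_entries:
        problems.append(bin_dir + " is not on the PATH")
    return problems

for problem in check_pencil_env():
    print(problem)
\end{verbatim}
If nothing is printed, both assumptions hold in the current shell.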


\section{Your first simulation run}

\subsection{Create a new run-directory}

Now create a run-directory and clone the input and configuration files from one of the samples that fits you best to get started quickly (here from \directory{pencil-code/samples/1d-tests/jeans-x}):
\begin{verbatim}
  mkdir -p /data/myuser/myrun/src
  cd /data/myuser/myrun
  cp $PENCIL_HOME/samples/1d-tests/jeans-x/*.in ./
  cp $PENCIL_HOME/samples/1d-tests/jeans-x/src/*.local src/
\end{verbatim}
Your run should be put outside of your \directory{/home} directory, if you expect to generate a lot of data and you have a tight storage quota in your \directory{/home}.

\subsection{Linking to the sources}

One command sets up all needed symbolic links to the original Pencil Code directory:
\begin{verbatim}
  pc_setupsrc
\end{verbatim}

\subsection{Makefile and parameters}

Two basic configuration files define a simulation setup: \file{src/Makefile.local} contains a list of modules that are being used, and \file{src/cparam.local} defines the grid size and the number of processors to be used.
Take a quick look at these files...


\subsubsection{Single-processor}
An example \file{src/Makefile.local} using the module for only one processor would look like:
\begin{verbatim}
  MPICOMM=nompicomm
\end{verbatim}

For most modules there is also a \file{no}-variant which switches that functionality off.

In \file{src/cparam.local} the number of processors needs to be set to \code{1} accordingly:
\begin{verbatim}
  integer, parameter :: ncpus=1,nprocx=1,nprocy=1,nprocz=ncpus/(nprocx*nprocy)
  integer, parameter :: nxgrid=128,nygrid=1,nzgrid=128
\end{verbatim}

\subsubsection{Multi-processor}
If you would like to use \name{MPI} for multi-processor simulations, make sure that you have an \name{MPI} library installed and change \file{src/Makefile.local} to use \name{MPI}:
\begin{verbatim}
  MPICOMM=mpicomm
\end{verbatim}

Change the \code{ncpus} setting in \file{src/cparam.local}.
Think about how you want to distribute the volume over the processors. Usually, you should have 128 grid points in the x-direction to take advantage of the SIMD processor unit.
For compilation, you have to use a configuration file that includes the \file{\_MPI} suffix, see below.
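
The relation between \code{ncpus}, the \code{nproc*} settings and the grid can be illustrated with a short \name{Python} sketch. It assumes that every direction is split into equally sized blocks, so that each processor holds \code{nxgrid/nprocx} points in $x$ (and likewise in $y$ and $z$); the function \code{local\_grid} is hypothetical and not part of the Pencil Code:
\begin{verbatim}
def local_grid(nxgrid, nygrid, nzgrid, nprocx, nprocy, nprocz):
    """Return (ncpus, (nx, ny, nz)) for an even block decomposition.

    Raises ValueError if a direction cannot be split evenly.
    """
    sizes = []
    for axis, ngrid, nproc in (("x", nxgrid, nprocx),
                               ("y", nygrid, nprocy),
                               ("z", nzgrid, nprocz)):
        if ngrid % nproc != 0:
            raise ValueError(
                f"n{axis}grid={ngrid} not divisible by "
                f"nproc{axis}={nproc}")
        sizes.append(ngrid // nproc)
    return nprocx * nprocy * nprocz, tuple(sizes)

# The 128 x 1 x 128 grid from above, split over 4 processors in z:
print(local_grid(128, 1, 128, 1, 1, 4))   # (4, (128, 1, 32))
\end{verbatim}
In this example the x-direction is left undivided, so each processor keeps all 128 points in $x$.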

\subsection{Compiling}

To compile the code, you can use a pre-defined configuration file corresponding to your compiler package.
The default compilers are \command{gfortran} together with \command{gcc}; the code is built with default options (not using \name{MPI}) by issuing the command:

\begin{verbatim}
  pc_build
\end{verbatim}

Alternatively, for multi-processor runs (still using the default \name{GNU-GCC} compilers):
\begin{verbatim}
  pc_build -f GNU-GCC_MPI
\end{verbatim}


\subsubsection{Debugging and testing options}

During the code development and testing phase, there is a common set of debug options that one should use for compiling.
These debug options are available as predefined sets for all major compilers.
E.g., if you use the \name{GNU-GCC} compilers, you would use this extension:
\begin{verbatim}
  pc_build -f GNU-GCC_MPI,GNU-GCC_debug
\end{verbatim}
Alternatively, if you know that you do not need MPI:
\begin{verbatim}
  pc_build -f GNU-GCC,GNU-GCC_debug
\end{verbatim}

\subsubsection{Using a different compiler (optional)}

If you prefer to use a different compiler package (e.g. with \name{MPI} support or using \command{ifort}), you may try:

\begin{verbatim}
  pc_build -f Intel
  pc_build -f Intel_MPI
  pc_build -f Cray
  pc_build -f Cray_MPI
\end{verbatim}

More pre-defined configurations are found in the directory \file{pencil-code/config/compilers/*.conf}.

\subsubsection{Individual compiler setup (optional)}

Of course you can also create a configuration file in any subdirectory of \directory{pencil-code/config/hosts/}.
By default, \command{pc\_build} looks for a config file that is based on your \code{host-ID}, which you may see with the command:
\begin{verbatim}
  pc_build -i
\end{verbatim}
You may add your modified configuration with the filename \file{host-ID.conf}, where you can change compiler options according to the Pencil Code manual.
A good host configuration example, that you may clone and adapt according to your needs, is \file{pencil-code/config/hosts/IWF/host-andromeda-GNU\_Linux-Linux.conf}.

\subsection{Running}

The initial conditions are set in \file{start.in} and the parameters for the main simulation run can be found in \file{run.in}.
In \file{print.in} you can choose which quantities are written to the file \file{data/time\_series.dat}.

Make sure that you have created an empty \directory{data} directory:
\begin{verbatim}
  mkdir data
\end{verbatim}

It is now time to run the code:
\begin{verbatim}
  pc_run
\end{verbatim}
If everything worked well, your output should contain the line
\begin{verbatim}
  start.x has completed successfully
\end{verbatim}
after initializing everything successfully. It then starts running and
prints the quantities specified in \file{print.in} to the console, for
instance,
\begin{verbatim}
---it--------t-------dt------rhom------urms------uxpt-----uypt-----uzpt-----
       0      0.00 4.9E-03 1.000E+00  1.414E+00 2.00E+00 0.00E+00 0.00E+00
      10      0.05 4.9E-03 1.000E+00  1.401E+00 1.98E+00 0.00E+00 0.00E+00
      20      0.10 4.9E-03 1.000E+00  1.361E+00 1.88E+00 0.00E+00 0.00E+00 
      .......
\end{verbatim}
ending with
\begin{verbatim}
  Simulation finished after        xxxx  time-steps
  .....
  Wall clock time/timestep/meshpoint [microsec] = ...
\end{verbatim}
An empty file called \file{COMPLETED} will appear in your run directory once 
the run is finished.


If you work with one of the samples or an identical setup in a new working directory, you can verify the correctness of the results
by checking against reference data, delivered with each sample:
\begin{verbatim}
  diff reference.out data/time_series.dat
\end{verbatim}
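
A plain \command{diff} also reports differences in the last printed digit, which may occur between compilers or machines. A tolerant comparison can be sketched in \name{Python} as follows; it assumes that header lines start with \code{\#} and that all other lines contain only numbers, and the names \code{read\_rows} and \code{series\_match} as well as the tolerances are illustrative choices:
\begin{verbatim}
import math

def read_rows(text):
    """Parse whitespace-separated numeric rows, skipping '#' lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append([float(field) for field in line.split()])
    return rows

def series_match(reference, result, rel_tol=1e-5, abs_tol=1e-12):
    """Return True if both tables have equal shape and close values."""
    ref_rows, res_rows = read_rows(reference), read_rows(result)
    if len(ref_rows) != len(res_rows):
        return False
    for ref, res in zip(ref_rows, res_rows):
        if len(ref) != len(res):
            return False
        for a, b in zip(ref, res):
            if not math.isclose(a, b, rel_tol=rel_tol,
                                abs_tol=abs_tol):
                return False
    return True
\end{verbatim}
Called with the contents of \file{reference.out} and \file{data/time\_series.dat}, \code{series\_match} returns \code{True} if all values agree within the tolerances.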

Welcome to the world of Pencil Code!

\subsection{Troubleshooting}

If compiling fails, please try the following (with or without the optional \command{\_MPI} suffix for \name{MPI} runs):
\begin{verbatim}
  pc_build --cleanall
  pc_build -f GNU-GCC_MPI
\end{verbatim}

If some step still fails, you may report this to our mailing list: \url{http://pencil-code.org/contact.php}.
In your report, please state the exact point in this quick start guide that fails for you (including the full error message), and make sure that you have precisely followed all non-optional instructions from the beginning.

In addition to that, please report your operating system (if not \name{Linux}-based) and the shell you use (if not \command{bash}).
Also please give the full output of these commands:
\begin{verbatim}
  bash
  cd path/to/your/pencil-code/
  source sourceme.sh
  echo $PENCIL_HOME
  ls -la $PENCIL_HOME/bin
  cd samples/1d-tests/jeans-x/
  gcc --version
  gfortran --version
  pc_build --cleanall
  pc_build -d
\end{verbatim}

If you plan to use \name{MPI}, please also provide the full output of:
\begin{verbatim}
  mpicc --version
  mpif90 --version
  mpiexec --version
\end{verbatim}

\section{Data post-processing}

\subsection{IDL visualization (optional, recommended)}
% The goal of this section is to demonstrate the general work flow with a very simple example.

\subsubsection{GUI-based visualization (recommended for quick inspection)}
The simplest approach to visualizing a Cartesian grid setup is to run the Pencil Code GUI and to select the files and physical quantities you want to see:
\begin{verbatim}
IDL> .r pc_gui
\end{verbatim}
If some physical quantities are missing, you might want to extend the two IDL routines \command{pc\_get\_quantity} and \command{pc\_check\_quantities}. Anything implemented there will be available in the GUI, too.

\subsubsection{Command-line based processing of ``big data''}
Please check the documentation inside these files:
\begin{center}
\begin{tabular}{|l|l|}\hline
  \command{pencil-code/idl/read/pc\_read\_var\_raw.pro} & efficient reading of raw data\\\hline
  \command{pencil-code/idl/read/pc\_read\_subvol\_raw.pro} & reading of sub-volumes\\\hline
  \command{pencil-code/idl/read/pc\_read\_slice\_raw.pro} & reading of any 2D slice from 3D snapshots\\\hline
  \command{pencil-code/idl/pc\_get\_quantity.pro} & compute physical quantities out of raw data\\\hline
  \command{pencil-code/idl/pc\_check\_quantities.pro} & dependency checking of physical quantities\\\hline
\end{tabular}
\end{center}
in order to read data efficiently and compute quantities in physical units.

\subsubsection{Command-line based data analysis (may be inefficient)}
Several \name{IDL} procedures have been written
(see \directory{pencil-code/idl}) to facilitate inspecting the data,
which can be found in raw format in \directory{jeans-x/data}.
For example, let us inspect the time series data:
\begin{verbatim}
IDL> pc_read_ts, obj=ts
\end{verbatim}
The structure \code{ts} contains several variables that can be inspected by
\begin{verbatim}
IDL> help, ts, /structure
** Structure <911fa8>, 4 tags, length=320, data length=320, refs=1:
   IT              LONG      Array[20]
   T               FLOAT     Array[20]
   UMAX            FLOAT     Array[20]
   RHOMAX          FLOAT     Array[20]
\end{verbatim}
The diagnostic \code{UMAX}, the maximal velocity, is available since it was set
in \file{jeans-x/print.in}. Please check the manual for more information about the input files.

We now plot the evolution of \code{UMAX} after the initial perturbation that is defined in \file{start.in}:
\begin{verbatim}
IDL> plot, ts.t, alog(ts.umax)
\end{verbatim}
% TODO Include screen shot

The complete state of the simulation is saved as snapshot files in
\file{jeans-x/data/proc0/VAR*} every \code{dsnap} time units,
as defined in \file{jeans-x/run.in}.
These snapshots, for example \file{VAR5}, can be loaded with:
\begin{verbatim}
IDL> pc_read_var, obj=ff, varfile="VAR5", /trimall
\end{verbatim}

Similarly, \code{tag\_names} will provide us with the available variables:
\begin{verbatim}
IDL> print, tag_names(ff)
T X Y Z DX DY DZ UU LNRHO POTSELF
\end{verbatim}

The logarithm of the density can be inspected by using a GUI:
\begin{verbatim}
IDL> cslice, ff.lnrho
\end{verbatim}

Of course, for scripting one might use any quantity from the \code{ff} structure, like calculating the average density:
\begin{verbatim}
IDL> print, mean(exp(ff.lnrho))
\end{verbatim}


\subsection{Python visualization (optional)}
Be advised that the \name{Python} support is not yet complete and not as feature-rich as for \name{IDL}.
Furthermore, we moved to \name{Python3} in 2020, and not all the routines have
been updated yet.

\subsubsection{Python module requirements}
 

In this example we use the modules \code{numpy} and \code{matplotlib}.
A complete list of required modules is included in
\file{pencil-code/python/pencil/README}.


\subsubsection{Using the 'pencil' module}
After sourcing the \file{sourceme.sh} script (see above), you should be able to import the \code{pencil} module:

\begin{verbatim}
import pencil as pc
\end{verbatim}

Some useful functions:
\begin{center}
\begin{tabular}{|l|l|}\hline
\command{pc.read.ts} & read \file{time\_series.dat}. Parameters are added as members of the class\\\hline
\command{pc.read.slices} & read 2D slice files and return two arrays: (nslices,vsize,hsize) and (time)\\\hline
\command{pc.visu.animate\_interactive} & assemble a 2D animation from a 3D 
array\\\hline
%× & ×\\\hline
%× & ×\\\hline
%× & ×\\\hline
\end{tabular}
\end{center}
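
If the \code{pencil} module is not at hand, the time series can also be inspected with plain \name{Python}. The sketch below assumes that the header line of \file{data/time\_series.dat} starts with \code{\#} and separates the column names by dashes, similar to the console output shown above; the function \code{parse\_time\_series} is hypothetical and not part of the \code{pencil} module:
\begin{verbatim}
import re

def parse_time_series(text):
    """Return {column name: list of floats} from time-series text.

    Assumes header lines start with '#' and separate the column
    names by runs of '-'.
    """
    names, columns = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#")
            names = [n for n in re.split(r"-+", header) if n]
            for name in names:
                columns.setdefault(name, [])
            continue
        for name, field in zip(names, line.split()):
            columns[name].append(float(field))
    return columns
\end{verbatim}
Applied to the contents of \file{data/time\_series.dat}, the result can be indexed by the diagnostic names chosen in \file{print.in}, e.g. \code{columns["t"]}.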

Some examples of postprocessing with \name{Python} can be found in the 
\name{Pencil} wiki:


\url{https://github.com/pencil-code/pencil-code/wiki/PythonForPencil}.

% This is out of the scope of a "quick start" guide.
% One might better implement this as an "highlight" on the website:
% \section{Another example: helically forced turbulence}
% \input{example2}
\section{GPU acceleration}
Significant parts of the code can be accelerated by running them on GPUs, employing the framework Astaroth (\url{https://bitbucket.org/jpekkila/astaroth/}).
This requires installing or loading additional libraries/modules as well as downloading additional repositories.
\subsection{Prerequisites}
Of course, your hardware must provide at least one GPU. For NVIDIA devices, you have to install/load the CUDA toolkit, for AMD devices a corresponding HIP package.
Additionally, \code{flex}, \code{bison}, \code{python}, and \code{cmake} (version 3.21 or newer) are needed.
\subsection{Downloading}
For a fresh checkout of all needed repositories, instead of the \name{git} cloning described in Sec.~\ref{sec:download}, do
\begin{verbatim}
git clone -b master --recurse-submodules --remote \
    https://github.com/pencil-code/pencil-code.git pencil-code
\end{verbatim}
or
\begin{verbatim}
git clone -b master --recurse-submodules --remote https://pencil-code.org/git/ pencil-code
\end{verbatim}
of course with additional username specification as shown in Sec.~\ref{sec:download} for write access to the latter repo.
Installation by SVN is not supported.

The new directory
\code{pencil-code/src/astaroth} contains the GPU-related software.
To add this part to an already existing installation of Pencil Code, do
\begin{verbatim}
git submodule update --init --remote
cd src/astaroth/submodule
git checkout develop
\end{verbatim}
in the Pencil Code installation directory.

\subsection{Compiling}
In \file{src/Makefile.local}, an additional setting is needed:
\begin{verbatim}
GPU = gpu_astaroth
\end{verbatim}
%TRANSPILATION = on
%RUNTIME_COMPILATION = on

Then, execute \command{pc\_build} as usual.
If you have activated GPU acceleration for an already existing setup, repeat \command{pc\_setupsrc} before the build.

\subsection{Running}
Important: the parameter \command{ncpus} in \command{cparam.local} must now be understood as ``number of MPI ranks = number of UNIX processes = number of GPUs''. The number of CPUs employed is always larger, as
the accelerated code makes use of CPU multithreading. Hence, it is mandatory to set in the environment (\code{bash}: by \command{export})
\begin{verbatim}
 OMP_NUM_THREADS=<ncores>
\end{verbatim}
with \command{ncores}$\,\geq 2$ before running.
Choose \command{ncores} as large as sensible in view of the hardware; on clusters, consult the node specification.
The additional settings
\begin{verbatim}
 OMP_MAX_ACTIVE_LEVELS=2
 OMP_PROC_BIND=close,spread
 OMP_WAIT_POLICY=PASSIVE
\end{verbatim}
are recommended. Note that when running under a job system like SLURM, the number of CPUs assigned to an MPI process (\command{srun} option: \command{--cpus-per-task}) must agree with \command{ncores}, and all exports to the environment need to be included in the \code{sbatch} script.

If you are using MPICH, add the following in your job script before calling \command{pc\_run} but after \command{pc\_start}.
\begin{verbatim}
export MPICH_GPU_SUPPORT_ENABLED=1
\end{verbatim}

Normal usage of GPUs implies runtime compilation.
In module environments it is therefore imperative that the same modules are loaded for the run as for the build.
To ensure this, add at the beginning of your \code{sbatch} script
\begin{verbatim}
source src/.moduleinfo
\end{verbatim}
Then, you run the code as usual with \command{pc\_start}/\command{pc\_run}. 

Example SLURM batch script (CRAY machines):
\begin{verbatim}
#!/bin/bash
#SBATCH --account=...
#SBATCH --partition=...
#SBATCH --nodes=...
#SBATCH --ntasks-per-node=8
#SBATCH --gpus-per-node=8        # allocates one GPU per MPI rank
#SBATCH --cpus-per-task=7        # seven CPUs per task for multithreading
#SBATCH --exclusive              # keeps other jobs off the node(s)
#SBATCH --mem=0                  # requests all available node memory

source src/.moduleinfo

export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:$LD_LIBRARY_PATH
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_MAX_ACTIVE_LEVELS=2
export OMP_PROC_BIND=close,spread
export OMP_WAIT_POLICY=PASSIVE

pc_start
export MPICH_GPU_SUPPORT_ENABLED=1   # MPICH only
pc_run

\end{verbatim}
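
The constraints described above can be collected in a small consistency check. This \name{Python} sketch is not part of the Pencil Code; the function name \code{check\_gpu\_job} and its arguments are hypothetical, and it assumes that the total number of MPI ranks equals the number of nodes times the number of tasks per node:
\begin{verbatim}
def check_gpu_job(ncpus, nodes, ntasks_per_node, gpus_per_node,
                  cpus_per_task, omp_num_threads):
    """Return a list of inconsistencies between job and setup."""
    problems = []
    # ncpus in cparam.local counts MPI ranks (= GPUs), not cores.
    if nodes * ntasks_per_node != ncpus:
        problems.append("nodes * ntasks_per_node must equal ncpus")
    if gpus_per_node < ntasks_per_node:
        problems.append("fewer GPUs than MPI ranks on a node")
    if omp_num_threads < 2:
        problems.append("OMP_NUM_THREADS must be at least 2")
    if omp_num_threads != cpus_per_task:
        problems.append("OMP_NUM_THREADS must equal cpus_per_task")
    return problems

# Two nodes with the settings of the batch script above, ncpus=16:
print(check_gpu_job(16, 2, 8, 8, 7, 7))   # []
\end{verbatim}
An empty list indicates that the request is consistent under these assumptions; it does not replace the node specification of your cluster.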

Running with GPU acceleration for the first time in a work directory takes significant time ($\lesssim$ 10 minutes) for several preliminary steps before the simulation proper starts. At later (re)starts, this time is much shorter as long as no input parameters or \command{*.local} settings are changed.

\subsection{Limitations}
\begin{itemize}
\item For modules that are not yet accelerated, see the exclusion list in \command{gpu\_astaroth.f90, subroutine initialize\_GPU}.\\
         You may derive an (incomplete) \emph{inclusion} list from the auto-tested samples (see \url{https://norlx51.nordita.org/tests/GPU/}).
\item The \command{*before|after\_boundary} subroutines are incompletely accelerated, so always test a setup against the CPU version!
\item Diagnostics that are implemented in a non-standard way in Pencil Code will not be computed and appear in the output as zeros.
\end{itemize}

\subsection{Troubleshooting}
The Astaroth-related part of the build process is logged separately into \code{src/astaroth/ac\_build.log}, while runtime-compilation is logged into \code{ac\_runtime\_compilation.log}.
Due to the inclusion of Astaroth, the process is somewhat more fragile. 
Most trouble is caused by an inappropriate combination of libraries/modules.
Administrators should be able to help here.

As Astaroth by default assumes that communication is done with CUDA-aware MPI (direct data transfer between GPUs),
employing an MPI library/module that lacks such support will cause the code to crash (segmentation fault).
In this case, try to turn off CUDA-aware MPI by
\begin{verbatim}
  lcuda_aware_mpi=F
\end{verbatim}
in \code{gpu\_run\_pars}.

\end{document}
