README

A Vectorized Implementation of the Tersoff Potential 
    for the LAMMPS Molecular Dynamics Software
====================================================

Author: Markus Höhnerbach <hoehnerbach@aices.rwth-aachen.de>
Date:  4 Aug 2016

This project provides the source code of a vectorized
implementation of the Tersoff potential.
We target a variety of processors with conventional vector
instruction sets such as NEON, SSE, AVX, and AVX2, the first 
and second generation of the Xeon Phi accelerator, as well 
as NVIDIA GPUs.
There is experimental support for platform-agnostic 
vectorization through the Cilk array notation.

Supported compilers: ICC 14.0, 15.0 or 16.0, GCC (ARM)
Supported MPI: Intel MPI

The code builds upon the existing Xeon Phi support and
vectorization capabilities of the USER-INTEL LAMMPS
package as well as the GPU support from the KOKKOS package.

Overview
--------

benchmarks/
  vect/
    very simple benchmark to measure vectorization efficiency.
  lammps/
    input files, parameter files and scripts to conduct
    benchmarking and accuracy tests. Subfolders contain
    results from real-world systems.
machines/
  lammps-10Mar16/
    complete LAMMPS source code that is known to work
    with the provided source code.
  <a>-<b>_<c>/
    folder to build lammps on a specific system. Names:
    a = organization, b = CPU arch, c = accelerator.
    These folders contain a build.sh script that shows
    how to build binaries to experiment with on a given
    system.
src/
  The core source code that contains the vectorized
  Tersoff potential. Can be dropped into an existing
  LAMMPS install with USER-INTEL package installed,
  and should just work.
test/
  Contains a script to test the code against both the
  benchmark and randomly generated systems of multiple
  species. Invoke the python script with the binary
  that you would like to test. For now, it only works
  with the USER-INTEL package.

Installation (simple)
---------------------

To try this code out, download LAMMPS from lammps.sandia.gov,
and extract the files to some directory $LAMMPS_DIR.
In the following, $THIS denotes the directory where this
README is located.
You need to enable the packages MANYBODY, USER-OMP and USER-INTEL:

$ cd $LAMMPS_DIR/src
$ make yes-MANYBODY yes-USER-OMP yes-USER-INTEL

Copy the files pair_tersoff_intel.h, pair_tersoff_intel.cpp
and intel_intrinsics.h from $THIS/src/ to $LAMMPS_DIR/src.
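
The copy step can be sketched as a small shell function. This is
only an illustration: the function name copy_tersoff_sources is
hypothetical, and it assumes $THIS and $LAMMPS_DIR are set as
described above.

```shell
# Copy the three source files into a LAMMPS source tree.
# $1 = directory containing this README, $2 = LAMMPS root directory.
# (copy_tersoff_sources is a hypothetical helper name.)
copy_tersoff_sources() {
    for f in pair_tersoff_intel.h pair_tersoff_intel.cpp intel_intrinsics.h; do
        cp "$1/src/$f" "$2/src/" || return 1
    done
}

# Only act when both variables are set, so sourcing this file is harmless.
if [ -n "${THIS:-}" ] && [ -n "${LAMMPS_DIR:-}" ]; then
    copy_tersoff_sources "$THIS" "$LAMMPS_DIR"
fi
```

The function stops at the first failed copy, so a missing file or a
wrong target directory is noticed before the build is started.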

Build LAMMPS (make sure to have ICC with offloading support
and Intel MPI loaded):

$ make intel_phi

This creates a binary $LAMMPS_DIR/src/lmp_intel_phi.

Testing (simple)
----------------

To test this binary, use the provided test script:

$ cd $THIS/test
$ python test.py $LAMMPS_DIR/src/lmp_intel_phi

All the tests should turn green.

Usage
-----

For further usage instructions, please refer to the
documentation of the USER-INTEL package.
The code plugs into that framework; all you need to do is:
1. Specify the appropriate "package intel" command, as
   described in the USER-INTEL docs, to select the usage
   mode.
2. Use the Tersoff potential and set the suffix to "intel".
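
As an illustration of the two steps, the following sketch writes a
small input script for bulk silicon. It is loosely modeled on the
LAMMPS Tersoff benchmark and is not part of this project: the file
name in.tersoff_intel is hypothetical, the box size and the values
"mode mixed balance -1" are example choices, and the potential file
Si.tersoff is assumed to have been copied from the LAMMPS potentials
directory into the working directory.

```shell
# Write a small, self-contained LAMMPS input script.
# (The file name in.tersoff_intel is a hypothetical choice.)
cat > in.tersoff_intel <<'EOF'
# Step 1: must appear before the simulation box is created.
package         intel 1 mode mixed balance -1
# Step 2: request the "intel" variants of all supported styles.
suffix          intel

units           metal
atom_style      atomic
lattice         diamond 5.431
region          box block 0 4 0 4 0 4
create_box      1 box
create_atoms    1 box
mass            1 28.06

# With "suffix intel" active, this selects tersoff/intel if available.
pair_style      tersoff
pair_coeff      * * Si.tersoff Si

velocity        all create 1000.0 376847
fix             1 all nve
timestep        0.001
run             100
EOF
```

The resulting file can then be passed to the binary with
"-in in.tersoff_intel". The same settings can alternatively be
given on the command line via "-pk intel" and "-sf intel".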

Getting Started
---------------

If you just want to try out the code and make some
observations on its performance, the easiest way to do so
is to download the LAMMPS-provided benchmark for the Tersoff
potential, and pass the correct options via the command line.

$ wget http://lammps.sandia.gov/bench/bench_tersoff.tar.gz
$ tar xfz bench_tersoff.tar.gz
$ cd tersoff
$ $LAMMPS_DIR/src/lmp_intel_phi -in in.tersoff -pk omp 0 \
-pk intel 1 balance $BALANCE mode $MODE -sf intel

1. Choose $MODE as either single, double or mixed depending 
   on the precision you want the run to use.
2. Choose $BALANCE according to where you want to run:
   0 runs everything on the host, 1 everything on the Phi,
   values in between split the computation. -1 will perform
   automatic load balancing.
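
To compare the three precision modes, the command line above can be
generated in a loop. The following is a dry-run sketch that only
prints the commands: the helper name tersoff_cmd is hypothetical,
and the default of -1 for $BALANCE is an arbitrary choice.

```shell
# Dry run: print one benchmark command per precision mode.
# (tersoff_cmd is a hypothetical helper; nothing is executed here.)
LAMMPS_BIN="${LAMMPS_DIR:-.}/src/lmp_intel_phi"
BALANCE="${BALANCE:--1}"   # default: automatic load balancing

tersoff_cmd() {
    # $1 = precision mode (single, double or mixed)
    echo "$LAMMPS_BIN -in in.tersoff -pk omp 0 -pk intel 1 balance $BALANCE mode $1 -sf intel"
}

for MODE in single double mixed; do
    tersoff_cmd "$MODE"
done
```

Run it from the tersoff directory and pipe the output to sh once
the printed commands look right.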

In-Depth Benchmarking
---------------------

For in-depth benchmarking, build all the binaries that you
would like to investigate (machines/*/build.sh show how to
build a variety of targets).
For single-node benchmarking, benchmarks/lammps contains
shell scripts to conduct a number of experiments.
For multi-node benchmarking, machines/lrz-ib_phi contains
a python script to showcase how to create job-scripts to
be submitted to a batch system.
If you can't run the code on suitable machines, check out
the result folders, i.e. benchmarks/lammps/results* and
machines/lrz-ib_phi/run*, as they contain real-world data
from a selection of machines.

Limitations
-----------

The code inherits all the limitations of the USER-INTEL or
KOKKOS package, whichever is used; please refer to the
corresponding documentation for details.

Reference
---------

There is a preprint describing this work on arXiv.org:
https://arxiv.org/abs/1607.02904

License
-------

The code is licensed in accordance with the LAMMPS copyright 
under the GNU General Public License Version 2 onwards.
The vector math functions in vector_math_neon.h are copyrighted
by Julien Pommier under the zlib license.