A Vectorized Implementation of the Tersoff Potential
for the LAMMPS Molecular Dynamics Software
====================================================
Author: Markus Höhnerbach <[email protected]>
Date: 4 Aug 2016
This project provides the source code of a vectorized
implementation of the Tersoff potential.
We target a variety of processors with conventional vector
instruction sets such as NEON, SSE, AVX, and AVX2, the first
and second generation of the Xeon Phi accelerator, as well
as NVIDIA GPUs.
There is experimental support for platform-agnostic
vectorization through the Cilk array notation.
Supported compilers: ICC 14.0, 15.0 or 16.0, GCC (ARM)
Supported MPI: Intel MPI
The code builds upon the existing Xeon Phi support and
vectorization capabilities of the USER-INTEL LAMMPS
package as well as the GPU support from the KOKKOS package.
Overview
--------
benchmarks/
  vect/
    Very simple benchmark to measure vectorization efficiency.
  lammps/
    Input files, parameter files and scripts to conduct
    benchmarking and accuracy tests. Subfolders contain
    results from real-world systems.
machines/
  lammps-10Mar16/
    Complete LAMMPS source code that is certain to work
    with the provided source code.
  <a>-<b>_<c>/
    Folder to build LAMMPS on a specific system. Names:
    a = organization, b = CPU arch, c = accelerator.
    These folders contain a build.sh script that shows
    how to build binaries to experiment with on a given
    system.
src/
  The core source code that contains the vectorized
  Tersoff potential. Can be dropped into an existing
  LAMMPS install with the USER-INTEL package installed,
  and should just work.
test/
  Contains a script to test the code against both the
  benchmark and randomly generated systems of multiple
  species. Invoke the Python script with the binary
  that you would like to test. For now it only works with
  the USER-INTEL package.
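
To illustrate the naming scheme of the build folders, here is a
minimal Python sketch that splits a folder name of the form
<a>-<b>_<c> into its three parts. The helper parse_machine_dir is
hypothetical and not part of this repository; it assumes that the
organization contains no "-" and the CPU arch contains no "_".

```python
from typing import NamedTuple


class MachineDir(NamedTuple):
    organization: str
    cpu_arch: str
    accelerator: str


def parse_machine_dir(name: str) -> MachineDir:
    """Split '<a>-<b>_<c>' into (organization, CPU arch, accelerator).

    Hypothetical helper: splits at the first '-' and then at the
    first '_' of the remainder.
    """
    organization, dash, rest = name.partition("-")
    cpu_arch, underscore, accelerator = rest.partition("_")
    if not (dash and underscore and organization and cpu_arch and accelerator):
        raise ValueError(f"not of the form <a>-<b>_<c>: {name!r}")
    return MachineDir(organization, cpu_arch, accelerator)


# machines/lrz-ib_phi is one of the folders mentioned in this README.
print(parse_machine_dir("lrz-ib_phi"))
```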
Installation (simple)
---------------------
To try this code out, download LAMMPS from lammps.sandia.gov,
and extract the files to some directory $LAMMPS_DIR.
In the following, $THIS denotes the directory where this
README is located.
You need to enable the packages MANYBODY, USER-OMP and USER-INTEL:
$ cd $LAMMPS_DIR/src
$ make yes-MANYBODY yes-USER-OMP yes-USER-INTEL
Copy the files pair_tersoff_intel.h, pair_tersoff_intel.cpp
and intel_intrinsics.h from $THIS/src/ to $LAMMPS_DIR/src.
Build LAMMPS (make sure to have ICC with offloading support
and Intel MPI loaded):
$ make intel_phi
This creates a binary $LAMMPS_DIR/src/lmp_intel_phi.
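
The copy step above can also be scripted. The following is a
minimal Python sketch, not part of this repository; the function
name copy_sources is hypothetical, and the two arguments stand for
$THIS and $LAMMPS_DIR. It only copies the three files named above
and refuses to run if any of them is missing.

```python
import shutil
from pathlib import Path

# The three files named in the installation instructions.
FILES = (
    "pair_tersoff_intel.h",
    "pair_tersoff_intel.cpp",
    "intel_intrinsics.h",
)


def copy_sources(this_dir, lammps_dir):
    """Copy the vectorized Tersoff sources into an existing LAMMPS tree.

    this_dir   -- directory containing this README ($THIS)
    lammps_dir -- extracted LAMMPS directory ($LAMMPS_DIR)
    Returns the list of destination paths.
    """
    src = Path(this_dir) / "src"
    dst = Path(lammps_dir) / "src"
    if not dst.is_dir():
        raise FileNotFoundError(f"no LAMMPS src directory at {dst}")
    missing = [name for name in FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {src}: {', '.join(missing)}")
    copied = []
    for name in FILES:
        shutil.copy2(src / name, dst / name)  # keeps timestamps
        copied.append(dst / name)
    return copied
```

Checking for missing files before copying avoids leaving a
half-updated LAMMPS source tree behind.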
Testing (simple)
----------------
To test this binary, use the provided test-script:
$ cd $THIS/test
$ python test.py $LAMMPS_DIR/src/lmp_intel_phi
All the tests should turn green.
Usage
-----
For further usage instructions, please have a look at
the documentation of the USER-INTEL package.
The code plugs into that framework; all you need to do is:
1. Specify the correct "package intel" command according
to the USER-INTEL docs to initialize the correct usage
mode.
2. Use the Tersoff potential and set the suffix to "intel".
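
As a rough illustration of the two steps above, the Python sketch
below prints the corresponding lines of a LAMMPS input script. The
helper intel_header is hypothetical; the keyword order mirrors the
"-pk intel 1 balance ... mode ..." switches used under Getting
Started, and the concrete values are assumptions that should be
checked against the USER-INTEL docs for your machine.

```python
def intel_header(mode="mixed", balance=-1, n_coprocessors=1):
    """Return input-script lines that select the intel variant.

    Hypothetical helper. mode, balance and n_coprocessors are
    example values, not recommendations.
    """
    lines = [
        # Step 1: initialize the usage mode of the USER-INTEL package.
        f"package intel {n_coprocessors} balance {balance:g} mode {mode}",
        # Step 2: set the suffix, then use the Tersoff potential.
        "suffix intel",
        "pair_style tersoff",
    ]
    return "\n".join(lines)


print(intel_header())
```

The pair_coeff line and the rest of the input script are omitted
here, since they depend on the system being simulated.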
Getting Started
---------------
If you just want to try out the code and make some
observations on its performance, the easiest way to do so
is to download the LAMMPS-provided benchmark for the Tersoff
potential, and pass the correct options via the command line.
$ wget http://lammps.sandia.gov/bench/bench_tersoff.tar.gz
$ tar xfz bench_tersoff.tar.gz
$ cd tersoff
$ $LAMMPS_DIR/src/lmp_intel_phi -in in.tersoff -pk omp 0 \
-pk intel 1 balance $BALANCE mode $MODE -sf intel
1. Choose $MODE as either single, double or mixed depending
on the precision you want the run to use.
2. Choose $BALANCE according to where you want to run:
0 runs everything on the host, 1 everything on the Phi,
values in between split the computation. -1 will perform
automatic load balancing.
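
The choices for $MODE and $BALANCE described above can be checked
before launching a run. The following Python sketch assembles the
same command line as shown above and rejects values outside the
documented ranges. The function name build_command is hypothetical
and not part of this repository or of LAMMPS.

```python
VALID_MODES = ("single", "double", "mixed")


def build_command(binary, mode, balance, input_file="in.tersoff"):
    """Assemble the benchmark command line as a list of arguments.

    binary  -- path to the LAMMPS binary, e.g. lmp_intel_phi
    mode    -- precision: single, double or mixed
    balance -- 0 (host only) to 1 (Phi only), or -1 for automatic
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if balance != -1 and not 0 <= balance <= 1:
        raise ValueError(f"balance must be -1 or within [0, 1], got {balance!r}")
    return [
        binary, "-in", input_file,
        "-pk", "omp", "0",
        "-pk", "intel", "1", "balance", format(balance, "g"), "mode", mode,
        "-sf", "intel",
    ]


# Example: mixed precision with automatic load balancing.
print(" ".join(build_command("./lmp_intel_phi", "mixed", -1)))
```

The returned list can be passed directly to subprocess.run from
within the extracted tersoff directory.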
In-Depth Benchmarking
---------------------
For in-depth benchmarking, build all the binaries that you
would like to investigate (the machines/*/build.sh scripts
show how to build a variety of targets).
For single-node benchmarking, benchmarks/lammps contains
shell scripts to conduct a number of experiments.
For multi-node benchmarking, machines/lrz-ib_phi contains
a python script to showcase how to create job-scripts to
be submitted to a batch system.
If you can't run the code on suitable machines, check out
the result folders, i.e. benchmarks/lammps/results* and
machines/lrz-ib_phi/run*, as they contain real-world data
from a selection of machines.
Limitations
-----------
The code inherits all the limitations of the USER-INTEL
package or the KOKKOS package; please refer to their
documentation for details.
Reference
---------
There is a preprint describing this work on arXiv.org:
https://arxiv.org/abs/1607.02904
License
-------
The code is licensed in accordance with the LAMMPS copyright
under the GNU General Public License Version 2 onwards.
The vector math functions in vector_math_neon.h are copyrighted
by Julien Pommier under the zlib license.