.. # --------------------------------------------------------------------------
   #
   # Copyright
   # Markus Wittmann, 2016-2017
   # RRZE, University of Erlangen-Nuremberg, Germany
   # markus.wittmann -at- fau.de or hpc -at- rrze.fau.de
   #
   # Viktor Haag, 2016
   # LSS, University of Erlangen-Nuremberg, Germany
   #
   # This file is part of the Lattice Boltzmann Benchmark Kernels (LbmBenchKernels).
   #
   # LbmBenchKernels is free software: you can redistribute it and/or modify
   # it under the terms of the GNU General Public License as published by
   # the Free Software Foundation, either version 3 of the License, or
   # (at your option) any later version.
   #
   # LbmBenchKernels is distributed in the hope that it will be useful,
   # but WITHOUT ANY WARRANTY; without even the implied warranty of
   # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   # GNU General Public License for more details.
   #
   # You should have received a copy of the GNU General Public License
   # along with LbmBenchKernels. If not, see <http://www.gnu.org/licenses/>.
   #
   # --------------------------------------------------------------------------

.. title:: LBM Benchmark Kernels Documentation


===================================
LBM Benchmark Kernels Documentation
===================================

.. sectnum::
.. contents::

Compilation
===========

The benchmark framework currently supports only Linux systems and the GCC and
Intel compilers. Any other configuration will probably require adjustments to
the code and the makefiles. Furthermore, some code might be platform specific
or at least POSIX specific.

The benchmark can be built via ``make`` from the ``src`` subdirectory. This
generates one binary which hosts all implemented benchmark kernels.

Binaries are located under the ``bin`` subdirectory and have different names
depending on compiler and build configuration.

Debug and Verification
----------------------

::

  make

Running ``make`` without any arguments builds the debug version
(``BUILD=debug``) of the benchmark kernels: no optimizations are performed,
line numbers and debug symbols are included, and ``DEBUG`` is defined. The
resulting binary is placed in the ``bin`` subdirectory and named
``lbmbenchk-linux-<compiler>-debug``.

Without any further specification the binary is built with verification
(``VERIFICATION=on``), statistics (``STATISTICS=on``), and VTK output
(``VTK_OUTPUT=on``) enabled.

Please note that the generated binary will therefore exhibit poor performance.

Benchmarking
------------

To generate a binary for benchmarking run make with ::

  make BENCHMARK=on BUILD=release

Here ``BUILD=release`` turns optimizations on and ``BENCHMARK=on`` disables
verification, statistics, and VTK output.

Release and Verification
------------------------

Verification with debug builds can be extremely slow. Hence verification
capabilities can also be built into release builds: ::

  make BUILD=release

Compilers
---------

Currently only the GCC and Intel compilers under Linux are supported. The
configuration is chosen via ``CONFIG=linux-gcc`` or ``CONFIG=linux-intel``.

Options Summary
---------------

Options that can be specified when building the framework with make:

============= ======================= ============ ==========================================================
name          values                  default      description
============= ======================= ============ ==========================================================
TARCH         --                      --           Overrides the architecture the compiler generates code for. The value depends on the chosen compiler.
BENCHMARK     on, off                 off          If enabled, disables VERIFICATION, STATISTICS, VTK_OUTPUT.
BUILD         debug, release          debug        debug: no optimization, debug symbols, DEBUG defined. release: optimizations turned on.
CONFIG        linux-gcc, linux-intel  linux-intel  Select GCC or Intel compiler.
ISA           avx, sse                avx          Determines which ISA extension is used for macro definitions. This is *not* the architecture the compiler generates code for.
OPENMP        on, off                 on           OpenMP, i.e. threading support.
STATISTICS    on, off                 off          View statistics, like density etc., during simulation.
VERIFICATION  on, off                 off          Turn verification on/off.
VTK_OUTPUT    on, off                 off          Enable/disable VTK file output.
============= ======================= ============ ==========================================================

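As an illustration, several of the options above can be combined on one command
line. The following shell sketch only assembles and prints such a command
together with the binary name it is expected to produce. The name pattern
``lbmbenchk-<CONFIG>-<BUILD>`` is inferred from the binary names used in this
document, and the variable names are chosen for this example only: ::

  #!/bin/sh
  # Hypothetical helper variables, adjust them to your setup.
  CONFIG=linux-gcc      # or linux-intel
  BUILD=release         # or debug
  ISA=avx               # or sse

  # Assemble the make command line (printed here, not executed).
  make_cmd="make CONFIG=${CONFIG} BUILD=${BUILD} BENCHMARK=on ISA=${ISA}"
  echo "${make_cmd}"

  # Binary name inferred from the names shown in this document (assumption).
  binary="bin/lbmbenchk-${CONFIG}-${BUILD}"
  echo "expected binary: ${binary}"

``TARCH`` is left out on purpose because its value depends on the chosen
compiler.
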
Invocation
==========

Running the binary prints, in addition to the GPL license header, a line like
the following: ::

  LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification

if verification was enabled during compilation, or ::

  LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: benchmark

if verification was disabled during compilation.

Command Line Parameters
-----------------------

Running the binary with ``-h`` lists all available parameters: ::

  Usage:
  ./lbmbenchk -list
  ./lbmbenchk
      [-dims XxYyZ] [-geometry box|channel|pipe|blocks[-<block size>]] [-iterations <iterations>] [-lattice-dump-ascii]
      [-rho-in <density>] [-rho-out <density] [-omega <omega>] [-kernel <kernel>]
      [-periodic-x]
      [-t <number of threads>]
      [-pin core{,core}*]
      [-verify]
      -- <kernel specific parameters>

  -list           List available kernels.

  -dims XxYxZ     Specify geometry dimensions.

  -geometry blocks-<block size>
                  Geometetry with blocks of size <block size> regularily layout out.


If an option is specified multiple times, the last occurrence overrides the
previous ones. This also holds for ``-verify``, which sets geometry dimensions,
iterations, etc.; these can be overridden afterwards, e.g.: ::

  $ bin/lbmbenchk-linux-intel-release -verify -dims 32x32x32

Kernel specific parameters can be obtained by selecting the specific kernel
and passing ``-h`` as parameter: ::

  $ bin/lbmbenchk-linux-intel-release -kernel <kernel> -- -h
  ...
  Kernel parameters:
  [-blk <n>] [-blk-[xyz] <n>]


A list of all available kernels can be obtained via ``-list``: ::

  $ ../bin/lbmbenchk-linux-gcc-debug -list
  Lattice Boltzmann Benchmark Kernels (LbmBenchKernels) Copyright (C) 2016, 2017 LSS, RRZE
  This program comes with ABSOLUTELY NO WARRANTY; for details see LICENSE.
  This is free software, and you are welcome to redistribute it under certain conditions.

  LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification
  Available kernels to benchmark:
     list-aa-pv-soa
     list-aa-ria-soa
     list-aa-soa
     list-aa-aos
     list-pull-split-nt-1s-soa
     list-pull-split-nt-2s-soa
     list-push-soa
     list-push-aos
     list-pull-soa
     list-pull-aos
     push-soa
     push-aos
     pull-soa
     pull-aos
     blk-push-soa
     blk-push-aos
     blk-pull-soa
     blk-pull-aos


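The kernel names can also be extracted from this output in a script, for
instance to run every kernel in turn. The following sketch parses a shortened
copy of the output above with ``awk``. The variable names are made up for this
example, and in practice the text would come from running the binary with
``-list`` instead of a fixed string: ::

  #!/bin/sh
  # Shortened sample of the -list output shown above.
  list_output='LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification
  Available kernels to benchmark:
     list-aa-pv-soa
     push-soa
     blk-pull-aos'

  # Print the first field of every non-empty line after the marker line.
  kernels=$(printf '%s\n' "$list_output" |
      awk 'found && NF { print $1 } /Available kernels to benchmark:/ { found = 1 }')

  # Show the command that would be run per kernel (printed, not executed).
  for kernel in $kernels; do
      echo "bin/lbmbenchk-linux-gcc-debug -kernel ${kernel}"
  done
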
Benchmarking
============

Correct benchmarking is a nontrivial task. Whenever benchmark results are to be
created, make sure that

- the binary was compiled with ``BENCHMARK=on``,
- the binary was compiled with ``BUILD=release``,
- the correct ISA for macros is selected via ``ISA``, and
- ``TARCH`` specifies the architecture the compiler generates code for.

During benchmarking, pinning should be used via the ``-pin`` parameter. Running
a benchmark with 10 threads and pinning them to the first 10 cores works like ::

  $ bin/lbmbenchk-linux-intel-release ... -t 10 -pin $(seq -s , 0 9)

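The ``$(seq -s , 0 9)`` part expands to a comma-separated core list as expected
by ``-pin``. A small sketch, assuming GNU ``seq`` and using variable names
chosen for this example, derives the list from the thread count: ::

  #!/bin/sh
  threads=10

  # Core IDs 0 .. threads-1, separated by commas.
  cores=$(seq -s , 0 $((threads - 1)))

  # Prints: -t 10 -pin 0,1,2,3,4,5,6,7,8,9
  echo "-t ${threads} -pin ${cores}"

How core IDs map to physical cores and hardware threads depends on the system,
so the first 10 IDs are not necessarily 10 distinct physical cores.
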
Things the binary does not check or control:

- Transparent huge pages: when allocating memory, small 4 KiB pages might be
  replaced with larger ones. This is in general a good thing, but whether it
  actually happens depends on the system settings.

- CPU/core frequency: for reproducible results the frequency of all cores
  should be fixed.

- NUMA placement policy: the benchmark assumes a first touch policy, which
  means the memory is placed in the NUMA domain the touching core is
  associated with. If a different policy is in place or the NUMA domain to be
  used is already full, memory might be allocated in a remote domain. Accesses
  to remote domains typically have a higher latency and lower bandwidth.

- System load: interference with other applications, especially on desktop
  systems, should be avoided.

- Padding: most kernels do not care about padding against cache or TLB
  thrashing. Even if the number of (fluid) nodes suggests everything is fine,
  problems might still occur through parallelization.

- CPU dispatcher function: the compiler might add different versions of a
  function for different ISA extensions. Make sure the code you think is
  executed is actually the code which is executed.

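Some of these settings can be inspected before a run. The following sketch is
not part of the benchmark. It assumes a Linux system that exposes the usual
sysfs and procfs files, and ``show_setting`` is a helper made up for this
example: ::

  #!/bin/sh
  # Print the content of a file, or a note if it is not readable.
  show_setting() {
      if [ -r "$1" ]; then
          printf '%s: %s\n' "$1" "$(cat "$1")"
      else
          printf '%s: not available\n' "$1"
      fi
  }

  # Transparent huge pages: the active mode is shown in brackets.
  show_setting /sys/kernel/mm/transparent_hugepage/enabled
  # Frequency scaling governor of core 0 (other cores: cpu1, cpu2, ...).
  show_setting /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  # Current system load.
  show_setting /proc/loadavg
  # NUMA policy of the current shell, only if numactl is installed.
  if command -v numactl >/dev/null 2>&1; then
      numactl --show
  fi
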

Acknowledgements
================

This work was funded by BMBF, grant no. 01IH15003A (project SKAMPY).

This work was funded by KONWHIR project OMI4PAPS.



.. |datetime| date:: %Y-%m-%d %H:%M

Document was generated at |datetime|.
