Commit | Line | Data |
---|---|---|
10988083 MW |
1 | .. # -------------------------------------------------------------------------- |
2 | # | |
3 | # Copyright | |
4 | # Markus Wittmann, 2016-2017 | |
5 | # RRZE, University of Erlangen-Nuremberg, Germany | |
6 | # markus.wittmann -at- fau.de or hpc -at- rrze.fau.de | |
7 | # | |
8 | # Viktor Haag, 2016 | |
9 | # LSS, University of Erlangen-Nuremberg, Germany | |
10 | # | |
11 | # This file is part of the Lattice Boltzmann Benchmark Kernels (LbmBenchKernels). | |
12 | # | |
13 | # LbmBenchKernels is free software: you can redistribute it and/or modify | |
14 | # it under the terms of the GNU General Public License as published by | |
15 | # the Free Software Foundation, either version 3 of the License, or | |
16 | # (at your option) any later version. | |
17 | # | |
18 | # LbmBenchKernels is distributed in the hope that it will be useful, | |
19 | # but WITHOUT ANY WARRANTY; without even the implied warranty of | |
20 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
21 | # GNU General Public License for more details. | |
22 | # | |
23 | # You should have received a copy of the GNU General Public License | |
24 | # along with LbmBenchKernels. If not, see <http://www.gnu.org/licenses/>. | |
25 | # | |
26 | # -------------------------------------------------------------------------- | |
27 | ||
28 | .. title:: LBM Benchmark Kernels Documentation | |
29 | ||
30 | ||
31 | =================================== | |
32 | LBM Benchmark Kernels Documentation | |
33 | =================================== | |
34 | ||
35 | .. sectnum:: | |
36 | .. contents:: | |
37 | ||
38 | Compilation | |
39 | =========== | |
40 | ||
41 | The benchmark framework currently supports only Linux systems and the GCC and | |
42 | Intel compilers. Every other configuration probably requires adjustment inside | |
43 | the code and the makefiles. Furthermore, some code might be platform specific or at | |
44 | least POSIX specific. | |
45 | ||
46 | The benchmark can be built via ``make`` from the ``src`` subdirectory. This will | |
47 | generate one binary which hosts all implemented benchmark kernels. | |
48 | ||
49 | Binaries are located under the ``bin`` subdirectory and will have different names | |
50 | depending on compiler and build configuration. | |
51 | ||
52 | Debug and Verification | |
53 | ---------------------- | |
54 | ||
55 | :: | |
56 | ||
57 | make | |
58 | ||
59 | Running ``make`` without any arguments builds the debug version (``BUILD=debug``) of | |
60 | the benchmark kernels: no optimizations are performed, line numbers and debug | |
61 | symbols are included, and ``DEBUG`` is defined. The resulting | |
62 | binary will be found in the ``bin`` subdirectory and named | |
63 | ``lbmbenchk-linux-<compiler>-debug``. | |
64 | ||
65 | Without any further specification the binary is built with verification | |
66 | (``VERIFICATION=on``), statistics (``STATISTICS=on``), and VTK output | |
67 | (``VTK_OUTPUT=on``) enabled. | |
68 | ||
69 | Please note that the generated binary will therefore | |
70 | exhibit poor performance. | |
71 | ||
72 | Benchmarking | |
73 | ------------ | |
74 | ||
75 | To generate a binary for benchmarking, run ``make`` with :: | |
76 | ||
77 | make BENCHMARK=on BUILD=release | |
78 | ||
79 | Here ``BUILD=release`` turns optimizations on and ``BENCHMARK=on`` disables | |
80 | verification, statistics, and VTK output. | |
81 | ||
82 | Release and Verification | |
83 | ------------------------ | |
84 | ||
85 | Verification with the debug builds can be extremely slow. Hence, verification | |
86 | capabilities can also be built into release builds: :: | |
87 | ||
88 | make BUILD=release | |
89 | ||
90 | Compilers | |
91 | --------- | |
92 | ||
93 | Currently only the GCC and Intel compilers under Linux are supported. The | |
94 | configuration can be chosen via ``CONFIG=linux-gcc`` or | |
95 | ``CONFIG=linux-intel``. | |
96 | ||
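As a minimal sketch, the binary name can be derived from the chosen options. The naming scheme ``lbmbenchk-<CONFIG>-<BUILD>`` is an assumption inferred from the binary names that appear in this document; the variable names are illustrative only, and the script merely prints the commands instead of running them.

```shell
#!/bin/sh
# Assumption: binaries are named lbmbenchk-<CONFIG>-<BUILD> and placed in bin/,
# as suggested by the names shown elsewhere in this document.
CONFIG="linux-gcc"   # or linux-intel
BUILD="release"      # or debug

binary="bin/lbmbenchk-${CONFIG}-${BUILD}"

# Print the build command and the expected binary path (nothing is executed).
echo "make CONFIG=${CONFIG} BUILD=${BUILD}"
echo "expected binary: ${binary}"
```

Adjust the relative path to ``bin`` depending on the directory the commands are run from.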
97 | Options Summary | |
98 | --------------- | |
99 | ||
100 | Options that can be specified when building the framework with make: | |
101 | ||
102 | ============= ======================= ============ ========================================================== | |
103 | name          values                  default      description | |
104 | ============= ======================= ============ ========================================================== | |
105 | TARCH         --                      --           Via TARCH the architecture the compiler generates code for can be overridden. The value depends on the chosen compiler. | |
106 | BENCHMARK     on, off                 off          If enabled, disables VERIFICATION, STATISTICS, VTK_OUTPUT. | |
107 | BUILD         debug, release          debug        debug: no optimization, debug symbols, DEBUG defined. release: optimizations enabled. | |
108 | CONFIG        linux-gcc, linux-intel  linux-intel  Select GCC or Intel compiler. | |
109 | ISA           avx, sse                avx          Determines which ISA extension is used for macro definitions. This is *not* the architecture the compiler generates code for. | |
110 | OPENMP        on, off                 on           OpenMP, i.e. threading support. | |
111 | STATISTICS    on, off                 off          View statistics, like density etc., during simulation. | |
112 | VERIFICATION  on, off                 off          Turn verification on/off. | |
113 | VTK_OUTPUT    on, off                 off          Enable/disable VTK file output. | |
114 | ============= ======================= ============ ========================================================== | |
115 | ||
116 | Invocation | |
117 | ========== | |
118 | ||
119 | Running the binary prints, along with the GPL license header, a line like the following: | |
120 | ||
121 | LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification | |
122 | ||
123 | if verification was enabled during compilation or | |
124 | ||
125 | LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: benchmark | |
126 | ||
127 | if verification was disabled during compilation. | |
128 | ||
129 | Command Line Parameters | |
130 | ----------------------- | |
131 | ||
132 | Running the binary with ``-h`` lists all available parameters: :: | |
133 | ||
134 | Usage: | |
135 | ./lbmbenchk -list | |
136 | ./lbmbenchk | |
137 | [-dims XxYyZ] [-geometry box|channel|pipe|blocks[-<block size>]] [-iterations <iterations>] [-lattice-dump-ascii] | |
138 | [-rho-in <density>] [-rho-out <density] [-omega <omega>] [-kernel <kernel>] | |
139 | [-periodic-x] | |
140 | [-t <number of threads>] | |
141 | [-pin core{,core}*] | |
142 | [-verify] | |
143 | -- <kernel specific parameters> | |
144 | ||
145 | -list List available kernels. | |
146 | ||
147 | -dims XxYxZ Specify geometry dimensions. | |
148 | ||
149 | -geometry blocks-<block size> | |
150 | Geometetry with blocks of size <block size> regularily layout out. | |
151 | ||
152 | ||
153 | If an option is specified multiple times, the last one overrides previous ones. | |
154 | This also holds true for ``-verify``, which sets geometry dimensions, | |
155 | iterations, etc.; these can afterwards be overridden, e.g.: :: | |
156 | ||
157 | $ bin/lbmbenchk-linux-intel-release -verify -dims 32x32x32 | |
158 | ||
159 | Kernel specific parameters can be obtained by selecting the specific kernel | |
160 | and passing ``-h`` as parameter: :: | |
161 | ||
162 | $ bin/lbmbenchk-linux-intel-release -kernel <kernel> -- -h | |
163 | ... | |
164 | Kernel parameters: | |
165 | [-blk <n>] [-blk-[xyz] <n>] | |
166 | ||
167 | ||
168 | A list of all available kernels can be obtained via ``-list``: :: | |
169 | ||
170 | $ ../bin/lbmbenchk-linux-gcc-debug -list | |
171 | Lattice Boltzmann Benchmark Kernels (LbmBenchKernels) Copyright (C) 2016, 2017 LSS, RRZE | |
172 | This program comes with ABSOLUTELY NO WARRANTY; for details see LICENSE. | |
173 | This is free software, and you are welcome to redistribute it under certain conditions. | |
174 | ||
175 | LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification | |
176 | Available kernels to benchmark: | |
177 | list-aa-pv-soa | |
178 | list-aa-ria-soa | |
179 | list-aa-soa | |
180 | list-aa-aos | |
181 | list-pull-split-nt-1s-soa | |
182 | list-pull-split-nt-2s-soa | |
183 | list-push-soa | |
184 | list-push-aos | |
185 | list-pull-soa | |
186 | list-pull-aos | |
187 | push-soa | |
188 | push-aos | |
189 | pull-soa | |
190 | pull-aos | |
191 | blk-push-soa | |
192 | blk-push-aos | |
193 | blk-pull-soa | |
194 | blk-pull-aos | |
195 | ||
196 | ||
197 | Benchmarking | |
198 | ============ | |
199 | ||
200 | Correct benchmarking is a nontrivial task. Whenever benchmark results are to be | |
201 | produced, make sure the binary was compiled with: | |
202 | ||
203 | - ``BENCHMARK=on``, | |
204 | - ``BUILD=release``, | |
205 | - the correct ISA for macros, selected via ``ISA``, and | |
206 | - ``TARCH`` set to the architecture the compiler should generate code for. | |
207 | ||
208 | During benchmarking, threads should be pinned via the ``-pin`` parameter. Running | |
209 | a benchmark with 10 threads and pinning them to the first 10 cores works like this :: | |
210 | ||
211 | $ bin/lbmbenchk-linux-intel-release ... -t 10 -pin $(seq -s , 0 9) | |
212 | ||
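The pin list can also be generated for an arbitrary thread count. The following is a minimal sketch, assuming GNU ``seq`` and that threads are to be pinned to consecutive cores starting at core 0; the variable names are illustrative, and the script only prints the resulting arguments.

```shell
#!/bin/sh
# Number of threads to run with (illustrative value).
threads=10

# Build a comma separated list of core IDs 0..threads-1 (GNU seq assumed).
pin_list=$(seq -s , 0 $((threads - 1)))

# Print the arguments that would be passed to the benchmark binary.
echo "-t ${threads} -pin ${pin_list}"
```

Whether consecutive core IDs map to physical cores of one socket depends on the machine's core numbering, so the list may need to be adapted.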
213 | Things the binary does not check or control: | |
214 | ||
215 | - Transparent huge pages: when allocating memory, small 4 KiB pages might be | |
216 | replaced with larger ones. This is in general a good thing, but whether it | |
217 | actually happens depends on the system settings. | |
218 | ||
219 | - CPU/core frequency: For reproducible results the frequency of all cores | |
220 | should be fixed. | |
221 | ||
222 | - NUMA placement policy: The benchmark assumes a first touch policy, which | |
223 | means the memory will be placed at the NUMA domain the touching core is | |
224 | associated with. If a different policy is in place or the NUMA domain to be | |
225 | used is already full, memory might be allocated in a remote domain. Accesses | |
226 | to remote domains typically have a higher latency and lower bandwidth. | |
227 | ||
228 | - System load: interference with other applications, especially on desktop | |
229 | systems, should be avoided. | |
230 | ||
231 | - Padding: most kernels do not apply padding against cache or TLB | |
232 | thrashing. Even if the number of (fluid) nodes suggests everything is fine, | |
233 | problems might still occur through parallelization. | |
234 | ||
235 | - CPU dispatcher function: the compiler might add different versions of a | |
236 | function for different ISA extensions. Make sure the code you think is | |
237 | executed is actually the code that is executed. | |
238 | ||
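Some of the settings listed above can be inspected before a run. The following is a minimal sketch, assuming a Linux system that exposes the usual sysfs and procfs files; the helper ``show_setting`` is hypothetical and not part of the benchmark, and files that do not exist on a given system are simply reported as not available.

```shell
#!/bin/sh
# Hypothetical helper: print the content of a settings file, or a note
# if the file is not readable on this system.
show_setting() {
    label=$1
    file=$2
    if [ -r "$file" ]; then
        printf '%s: %s\n' "$label" "$(cat "$file")"
    else
        printf '%s: not available (%s)\n' "$label" "$file"
    fi
}

# Transparent huge pages mode; the active value is shown in brackets.
show_setting "transparent huge pages" /sys/kernel/mm/transparent_hugepage/enabled
# Frequency governor of core 0; other cores have analogous files.
show_setting "cpu0 frequency governor" /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Current system load.
show_setting "load average" /proc/loadavg

# NUMA policy of the current process, if numactl is installed.
if command -v numactl >/dev/null 2>&1; then
    numactl --show
fi
```

The script only reports values; changing them typically requires administrator privileges and is system specific.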
4e91c4b6 MW |
239 | |
240 | Acknowledgements | |
241 | ================ | |
242 | ||
243 | This work was funded by BMBF, grant no. 01IH15003A (project SKAMPY). | |
244 | ||
245 | This work was funded by KONWHIR project OMI4PAPS. | |
246 | ||
247 | ||
248 | ||
10988083 MW |
249 | .. |datetime| date:: %Y-%m-%d %H:%M |
250 | ||
251 | Document was generated at |datetime|. | |
252 |