.. # --------------------------------------------------------------------------
   #
   # Copyright
   # Markus Wittmann, 2016-2017
   # RRZE, University of Erlangen-Nuremberg, Germany
   # markus.wittmann -at- fau.de or hpc -at- rrze.fau.de
   #
   # Viktor Haag, 2016
   # LSS, University of Erlangen-Nuremberg, Germany
   #
   # This file is part of the Lattice Boltzmann Benchmark Kernels (LbmBenchKernels).
   #
   # LbmBenchKernels is free software: you can redistribute it and/or modify
   # it under the terms of the GNU General Public License as published by
   # the Free Software Foundation, either version 3 of the License, or
   # (at your option) any later version.
   #
   # LbmBenchKernels is distributed in the hope that it will be useful,
   # but WITHOUT ANY WARRANTY; without even the implied warranty of
   # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   # GNU General Public License for more details.
   #
   # You should have received a copy of the GNU General Public License
   # along with LbmBenchKernels. If not, see <http://www.gnu.org/licenses/>.
   #
   # --------------------------------------------------------------------------

.. title:: LBM Benchmark Kernels Documentation


===================================
LBM Benchmark Kernels Documentation
===================================

.. sectnum::
.. contents::

Compilation
===========

The benchmark framework currently supports only Linux systems and the GCC and
Intel compilers. Every other configuration probably requires adjustments inside
the code and the makefiles. Furthermore, some code might be platform-specific
or at least POSIX-specific.

The benchmark can be built via ``make`` from the ``src`` subdirectory. This
generates one binary that hosts all implemented benchmark kernels.

Binaries are located under the ``bin`` subdirectory and have different names
depending on the compiler and build configuration.
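
The examples in this document suggest that binary names follow the pattern
``lbmbenchk-<CONFIG>-<BUILD>`` (e.g. ``lbmbenchk-linux-gcc-debug`` and
``lbmbenchk-linux-intel-release``). The POSIX shell sketch below assumes
exactly that pattern. The function ``lbm_binary`` is hypothetical and not
part of the framework::

    # Hypothetical helper: print the expected binary name for a given
    # CONFIG (linux-gcc, linux-intel) and BUILD (debug, release).
    # ASSUMPTION: names follow lbmbenchk-<CONFIG>-<BUILD>, as in the
    # examples of this document; other make options are not reflected.
    lbm_binary() {
        config=${1:-linux-intel}   # default CONFIG from the options table
        build=${2:-debug}          # default BUILD from the options table
        case $config in
            linux-gcc|linux-intel) ;;
            *) echo "unsupported CONFIG: $config" >&2; return 1 ;;
        esac
        case $build in
            debug|release) ;;
            *) echo "unsupported BUILD: $build" >&2; return 1 ;;
        esac
        echo "lbmbenchk-$config-$build"
    }

For instance, ``lbm_binary linux-gcc release`` prints
``lbmbenchk-linux-gcc-release``. Under the stated assumption, that is the
name to look for under ``bin`` after ``make CONFIG=linux-gcc BUILD=release``.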

Debug and Verification
----------------------

::

    make

Running ``make`` without any arguments builds the debug version
(``BUILD=debug``) of the benchmark kernels. In this version no optimizations
are performed, line numbers and debug symbols are included, and ``DEBUG`` is
defined. The resulting binary is placed in the ``bin`` subdirectory and named
``lbmbenchk-linux-<compiler>-debug``.

Without any further specification the binary has verification
(``VERIFICATION=on``), statistics (``STATISTICS=on``), and VTK output
(``VTK_OUTPUT=on``) enabled.

Note that the resulting binary will therefore exhibit poor performance.

Benchmarking
------------

To generate a binary for benchmarking, run make with::

    make BENCHMARK=on BUILD=release

Here ``BUILD=release`` turns optimizations on and ``BENCHMARK=on`` disables
verification, statistics, and VTK output.

Release and Verification
------------------------

Verification with debug builds can be extremely slow. Hence verification
capabilities can also be built into release builds::

    make BUILD=release

Compilers
---------

Currently only the GCC and Intel compilers under Linux are supported. The
compiler is selected via ``CONFIG=linux-gcc`` or ``CONFIG=linux-intel``.

Options Summary
---------------

The following options can be specified when building the framework with make:

============= ======================= ============ ==========================================================
name          values                  default      description
============= ======================= ============ ==========================================================
TARCH         --                      --           Via TARCH the architecture the compiler generates code for can be overridden. The value depends on the chosen compiler.
BENCHMARK     on, off                 off          If enabled, disables VERIFICATION, STATISTICS, VTK_OUTPUT.
BUILD         debug, release          debug        ``debug``: no optimization, debug symbols, ``DEBUG`` defined; ``release``: optimizations on.
CONFIG        linux-gcc, linux-intel  linux-intel  Select GCC or Intel compiler.
ISA           avx, sse                avx          Determines which ISA extension is used for macro definitions. This is *not* the architecture the compiler generates code for.
OPENMP        on, off                 on           OpenMP, i.e., threading support.
STATISTICS    on, off                 off          Show statistics, e.g. density, during the simulation.
VERIFICATION  on, off                 off          Turn verification on/off.
VTK_OUTPUT    on, off                 off          Enable/disable VTK file output.
============= ======================= ============ ==========================================================

Invocation
==========

Besides the GPL license header, running the binary prints a line like the
following if verification was enabled during compilation::

    LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification

or the following if verification was disabled during compilation::

    LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: benchmark
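
Because the two build types are easy to mix up, a job script can check the
banner before starting a benchmark run. The POSIX shell sketch below makes
three assumptions. First, the banner line has exactly the format shown above.
Second, ``-list`` prints it (as in the ``-list`` example further below).
Third, both function names are hypothetical::

    # Hypothetical helper: read program output on stdin and print the
    # build type ("benchmark" or "verification") from the line
    # "LBM Benchmark Kernels <version>, compiled <date>, type: <type>".
    lbm_build_type() {
        sed -n 's/^LBM Benchmark Kernels .*, type: *\([a-z]*\).*$/\1/p' | head -n 1
    }

    # Return non-zero unless binary $1 reports a benchmark build.
    require_benchmark_build() {
        btype=$("$1" -list 2>&1 | lbm_build_type)
        if [ "$btype" != benchmark ]; then
            echo "$1: '${btype:-unknown}' build, rebuild with BENCHMARK=on BUILD=release" >&2
            return 1
        fi
    }

A job script could call ``require_benchmark_build`` with the path of the
binary before the actual run and abort if it fails.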

Command Line Parameters
-----------------------

Running the binary with ``-h`` lists all available parameters::

    Usage:
    ./lbmbenchk -list
    ./lbmbenchk
        [-dims XxYyZ] [-geometry box|channel|pipe|blocks[-<block size>]] [-iterations <iterations>] [-lattice-dump-ascii]
        [-rho-in <density>] [-rho-out <density] [-omega <omega>] [-kernel <kernel>]
        [-periodic-x]
        [-t <number of threads>]
        [-pin core{,core}*]
        [-verify]
        -- <kernel specific parameters>

    -list           List available kernels.

    -dims XxYxZ     Specify geometry dimensions.

    -geometry blocks-<block size>
                    Geometetry with blocks of size <block size> regularily layout out.

If an option is specified multiple times, the last occurrence overrides the
previous ones. This also holds for ``-verify``, which sets the geometry
dimensions, iterations, etc. These settings can be overridden afterwards,
e.g.::

    $ bin/lbmbenchk-linux-intel-release -verify -dims 32x32x32

Kernel-specific parameters can be obtained by selecting the specific kernel
and passing ``-h`` as a parameter::

    $ bin/lbmbenchk-linux-intel-release -kernel <kernel> -- -h
    ...
    Kernel parameters:
    [-blk <n>] [-blk-[xyz] <n>]

A list of all available kernels can be obtained via ``-list``::

    $ ../bin/lbmbenchk-linux-gcc-debug -list
    Lattice Boltzmann Benchmark Kernels (LbmBenchKernels) Copyright (C) 2016, 2017 LSS, RRZE
    This program comes with ABSOLUTELY NO WARRANTY; for details see LICENSE.
    This is free software, and you are welcome to redistribute it under certain conditions.

    LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification
    Available kernels to benchmark:
       list-aa-pv-soa
       list-aa-ria-soa
       list-aa-soa
       list-aa-aos
       list-pull-split-nt-1s-soa
       list-pull-split-nt-2s-soa
       list-push-soa
       list-push-aos
       list-pull-soa
       list-pull-aos
       push-soa
       push-aos
       pull-soa
       pull-aos
       blk-push-soa
       blk-push-aos
       blk-pull-soa
       blk-pull-aos
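
To run every kernel in turn, you can extract the kernel names from the
``-list`` output. The POSIX shell sketch below assumes that the names are
exactly the non-empty lines after ``Available kernels to benchmark:``, as in
the output above. The function names are hypothetical. The binary must be a
build with verification enabled (see the Release and Verification section)::

    # Hypothetical helper: read "-list" output on stdin and print one
    # kernel name per line (every non-empty line after the header line).
    lbm_kernels() {
        awk 'found && NF { print $1 } /^Available kernels to benchmark:/ { found = 1 }'
    }

    # Run the verification case for each kernel of binary $1.
    # Stops at the first non-zero exit status; whether a failed
    # verification produces one is not documented here.
    verify_all_kernels() {
        for k in $("$1" -list | lbm_kernels); do
            echo "== $k"
            "$1" -verify -kernel "$k" || return 1
        done
    }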


Benchmarking
============

Correct benchmarking is a nontrivial task. Whenever you produce benchmark
results, make sure the binary was compiled with:

- ``BENCHMARK=on``,
- ``BUILD=release``,
- the correct ISA for the macros, selected via ``ISA``, and
- the architecture the compiler generates code for, specified via ``TARCH``.
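
For the ``ISA`` value, the CPU flags reported by Linux can serve as a
starting point. The POSIX shell sketch below is hypothetical. It assumes that
``avx`` and ``sse`` (the values from the options table) are the only choices,
and that a CPU listing the ``avx`` flag can use ``ISA=avx``. It deliberately
does not guess ``TARCH``, whose value is compiler dependent::

    # Hypothetical helper: suggest ISA=avx or ISA=sse from a CPU flags string.
    # ASSUMPTION: the "avx" flag is sufficient for the AVX macro variants.
    suggest_isa() {
        case " $1 " in
            *" avx "*) echo avx ;;
            *) echo sse ;;
        esac
    }

    # Flags of the first CPU in /proc/cpuinfo (Linux/x86 only).
    cpu_flags() {
        sed -n 's/^flags[[:space:]]*: //p' /proc/cpuinfo | head -n 1
    }

Run this on the machine the benchmark will execute on, e.g.
``make BENCHMARK=on BUILD=release ISA=$(suggest_isa "$(cpu_flags)")``.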

During benchmarking, threads should be pinned via the ``-pin`` parameter.
Running a benchmark with 10 threads pinned to the first 10 cores works like
this::

    $ bin/lbmbenchk-linux-intel-release ... -t 10 -pin $(seq -s , 0 9)
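
The same idea generalizes to other thread counts and core offsets. The POSIX
shell helper below is a hypothetical sketch that uses ``seq`` as in the
command above. How core ids map to physical cores, sockets, and SMT siblings
is system dependent (``lscpu -e`` shows the mapping)::

    # Hypothetical helper: print a comma-separated core list for -pin.
    #   pin_list NTHREADS [FIRST [STRIDE]]
    # FIRST is the first core id (default 0), STRIDE the distance between
    # consecutive core ids (default 1).
    pin_list() {
        n=$1 first=${2:-0} stride=${3:-1}
        seq -s , "$first" "$stride" $((first + (n - 1) * stride))
    }

``pin_list 10`` yields ``0,1,2,3,4,5,6,7,8,9``, the same list as
``seq -s , 0 9``, so the command above becomes
``... -t 10 -pin "$(pin_list 10)"``. ``pin_list 4 0 2`` yields ``0,2,4,6``.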

Things the binary does not check or control:

- Transparent huge pages: when memory is allocated, small 4 KiB pages might be
  replaced with larger ones. This is generally beneficial, but whether it
  actually happens depends on the system settings.

- CPU/core frequency: for reproducible results the frequency of all cores
  should be fixed.

- NUMA placement policy: the benchmark assumes a first-touch policy, which
  means memory is placed in the NUMA domain of the core that first touches it.
  If a different policy is in place, or if the NUMA domain to be used is
  already full, memory might be allocated in a remote domain. Accesses to
  remote domains typically have higher latency and lower bandwidth.

- System load: avoid interference with other applications, especially on
  desktop systems.

- Padding: most kernels do not pad their data against cache or TLB thrashing.
  Even if the number of (fluid) nodes suggests everything is fine, problems
  might still occur due to parallelization.

- CPU dispatcher functions: the compiler might generate different versions of
  a function for different ISA extensions. Make sure the code you think is
  executed is actually the code that is executed.
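
You can at least inspect some of these settings before a run. The POSIX
shell sketch below only reports values and changes nothing. It assumes the
standard Linux sysfs/procfs locations and, for the NUMA policy, an installed
``numactl``. All function names are hypothetical::

    # Print the active entry of a sysfs choice list such as
    # "always [madvise] never" (the entry in brackets).
    active_choice() {
        sed -n 's/.*\[\([^]]*\)].*/\1/p'
    }

    show() {
        printf '%-24s %s\n' "$1" "$2"
    }

    preflight() {
        thp=/sys/kernel/mm/transparent_hugepage/enabled
        if [ -r "$thp" ]; then
            show "transparent huge pages:" "$(active_choice < "$thp")"
        else
            show "transparent huge pages:" "n/a"
        fi

        # One governor per core; several different values hint at varying
        # clocks. A governor alone does not pin the frequency (e.g. turbo).
        govs=$(cat /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor \
                   2>/dev/null | sort -u | tr '\n' ' ')
        show "cpufreq governor(s):" "${govs:-n/a}"

        # The benchmark assumes first touch, i.e. the "default" policy.
        if command -v numactl >/dev/null 2>&1; then
            show "NUMA policy:" "$(numactl --show | sed -n 's/^policy: //p')"
        else
            show "NUMA policy:" "n/a (numactl not installed)"
        fi

        # Other running applications show up in the load average.
        if [ -r /proc/loadavg ]; then
            show "load average (1 min):" "$(cut -d ' ' -f 1 /proc/loadavg)"
        fi
    }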


Acknowledgements
================

This work was funded by BMBF, grant no. 01IH15003A (project SKAMPY).

This work was funded by KONWHIR project OMI4PAPS.



.. |datetime| date:: %Y-%m-%d %H:%M

Document was generated at |datetime|.
