X-Git-Url: http://git.rrze.uni-erlangen.de/gitweb/?p=LbmBenchmarkKernelsPublic.git;a=blobdiff_plain;f=doc%2Fmain.rst;fp=doc%2Fmain.rst;h=921ea844bb994f0742dddb46b9d3e335c0ec8777;hp=0000000000000000000000000000000000000000;hb=109880839321408644c94a34eb31208460b9f46d;hpb=42cf91486fb5c1ad178b3d21935a1be563e5fa39
diff --git a/doc/main.rst b/doc/main.rst
new file mode 100644
index 0000000..921ea84
--- /dev/null
+++ b/doc/main.rst
@@ -0,0 +1,242 @@
+.. # --------------------------------------------------------------------------
+   #
+   # Copyright
+   # Markus Wittmann, 2016-2017
+   # RRZE, University of Erlangen-Nuremberg, Germany
+   # markus.wittmann -at- fau.de or hpc -at- rrze.fau.de
+   #
+   # Viktor Haag, 2016
+   # LSS, University of Erlangen-Nuremberg, Germany
+   #
+   # This file is part of the Lattice Boltzmann Benchmark Kernels (LbmBenchKernels).
+   #
+   # LbmBenchKernels is free software: you can redistribute it and/or modify
+   # it under the terms of the GNU General Public License as published by
+   # the Free Software Foundation, either version 3 of the License, or
+   # (at your option) any later version.
+   #
+   # LbmBenchKernels is distributed in the hope that it will be useful,
+   # but WITHOUT ANY WARRANTY; without even the implied warranty of
+   # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+   # GNU General Public License for more details.
+   #
+   # You should have received a copy of the GNU General Public License
+   # along with LbmBenchKernels. If not, see .
+   #
+   # --------------------------------------------------------------------------
+
+.. title:: LBM Benchmark Kernels Documentation
+
+
+===================================
+LBM Benchmark Kernels Documentation
+===================================
+
+.. sectnum::
+.. contents::
+
+Compilation
+===========
+
+The benchmark framework currently supports only Linux systems and the GCC and
+Intel compilers. Every other configuration probably requires adjustment inside
+the code and the makefiles. Furthermore, some code might be platform-specific or
+at least POSIX-specific.
+
+The benchmark can be built via ``make`` from the ``src`` subdirectory. This will
+generate one binary which hosts all implemented benchmark kernels.
+
+Binaries are located under the ``bin`` subdirectory and will have different names
+depending on compiler and build configuration.
+
+Debug and Verification
+----------------------
+
+::
+
+  make
+
+Running ``make`` without any arguments builds the debug version (``BUILD=debug``) of
+the benchmark kernels, where no optimizations are performed, line numbers and
+debug symbols are included, and ``DEBUG`` is defined. The resulting
+binary will be found in the ``bin`` subdirectory and named
+``lbmbenchk-linux--debug``.
+
+Without any further specification the binary has verification
+(``VERIFICATION=on``), statistics (``STATISTICS=on``), and VTK output
+(``VTK_OUTPUT=on``) enabled.
+
+Please note that the generated binary will therefore
+exhibit poor performance.
+
+Benchmarking
+------------
+
+To generate a binary for benchmarking run make with ::
+
+  make BENCHMARK=on BUILD=release
+
+Here ``BUILD=release`` turns optimizations on and ``BENCHMARK=on`` disables
+verification, statistics, and VTK output.
+
+Release and Verification
+------------------------
+
+Verification with debug builds can be extremely slow. Hence verification
+capabilities can also be built into release builds: ::
+
+  make BUILD=release
+
+Compilers
+---------
+
+Currently only the GCC and Intel compilers under Linux are supported. Choose
+between the two configurations via ``CONFIG=linux-gcc`` or
+``CONFIG=linux-intel``.
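
The build options described above can be combined in one ``make`` call. The following is a minimal sketch, not part of the framework: ``lbm_binary_name`` is a hypothetical helper, and the naming scheme ``lbmbenchk-<CONFIG>-<BUILD>`` is an assumption inferred from the binary names used in this document (``lbmbenchk-linux-intel-release``, ``lbmbenchk-linux-gcc-debug``).

```shell
#!/bin/sh
# Hypothetical helper (not part of LbmBenchKernels): derive the expected
# binary path from CONFIG and BUILD. The naming scheme is an assumption
# inferred from the binary names that appear in this document.
lbm_binary_name() {
    config="${1:-linux-intel}"   # documented default for CONFIG
    build="${2:-debug}"          # documented default for BUILD
    printf 'bin/lbmbenchk-%s-%s\n' "$config" "$build"
}

# Print the make invocation for a GCC benchmark build; the printed command
# is meant to be run from the src subdirectory.
echo "make CONFIG=linux-gcc BUILD=release BENCHMARK=on"

# Print the binary this build is expected to produce.
lbm_binary_name linux-gcc release   # prints bin/lbmbenchk-linux-gcc-release
```

Called without arguments, the helper falls back to the documented defaults ``CONFIG=linux-intel`` and ``BUILD=debug``.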
+
+Options Summary
+---------------
+
+Options that can be specified when building the framework with make:
+
+============= ======================= ============ ==========================================================
+name          values                  default      description
+------------- ----------------------- ------------ ----------------------------------------------------------
+TARCH         --                      --           Overrides the architecture the compiler generates code for. The value depends on the chosen compiler.
+BENCHMARK     on, off                 off          If enabled, disables VERIFICATION, STATISTICS, VTK_OUTPUT.
+BUILD         debug, release          debug        debug: no optimization, debug symbols, DEBUG defined. release: optimizations enabled.
+CONFIG        linux-gcc, linux-intel  linux-intel  Select GCC or Intel compiler.
+ISA           avx, sse                avx          Determines which ISA extension is used for macro definitions. This is *not* the architecture the compiler generates code for.
+OPENMP        on, off                 on           OpenMP, i.e., threading support.
+STATISTICS    on, off                 off          View statistics, like density etc., during simulation.
+VERIFICATION  on, off                 off          Turn verification on/off.
+VTK_OUTPUT    on, off                 off          Enable/disable VTK file output.
+============= ======================= ============ ==========================================================
+
+Invocation
+==========
+
+Running the binary prints, along with the GPL licence header, a line like the following:
+
+  LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification
+
+if verification was enabled during compilation or
+
+  LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: benchmark
+
+if verification was disabled during compilation.
+
+Command Line Parameters
+-----------------------
+
+Running the binary with ``-h`` lists all available parameters: ::
+
+  Usage:
+  ./lbmbenchk -list
+  ./lbmbenchk
+      [-dims XxYyZ] [-geometry box|channel|pipe|blocks[-]] [-iterations ] [-lattice-dump-ascii]
+      [-rho-in ] [-rho-out ] [-kernel ]
+      [-periodic-x]
+      [-t ]
+      [-pin core{,core}*]
+      [-verify]
+      --
+
+  -list           List available kernels.
+
+  -dims XxYxZ     Specify geometry dimensions.
+
+  -geometry blocks-
+                  Geometetry with blocks of size regularily layout out.
+
+
+If an option is specified multiple times, the last one overrides the previous ones.
+This also holds true for ``-verify``, which sets geometry dimensions,
+iterations, etc.; these can afterwards be overridden, e.g.: ::
+
+  $ bin/lbmbenchk-linux-intel-release -verify -dims 32x32x32
+
+Kernel-specific parameters can be obtained by selecting the specific kernel
+and passing ``-h`` as parameter: ::
+
+  $ bin/lbmbenchk-linux-intel-release -kernel -- -h
+  ...
+  Kernel parameters:
+  [-blk ] [-blk-[xyz] ]
+
+
+A list of all available kernels can be obtained via ``-list``: ::
+
+  $ ../bin/lbmbenchk-linux-gcc-debug -list
+  Lattice Boltzmann Benchmark Kernels (LbmBenchKernels) Copyright (C) 2016, 2017 LSS, RRZE
+  This program comes with ABSOLUTELY NO WARRANTY; for details see LICENSE.
+  This is free software, and you are welcome to redistribute it under certain conditions.
+
+  LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification
+  Available kernels to benchmark:
+  list-aa-pv-soa
+  list-aa-ria-soa
+  list-aa-soa
+  list-aa-aos
+  list-pull-split-nt-1s-soa
+  list-pull-split-nt-2s-soa
+  list-push-soa
+  list-push-aos
+  list-pull-soa
+  list-pull-aos
+  push-soa
+  push-aos
+  pull-soa
+  pull-aos
+  blk-push-soa
+  blk-push-aos
+  blk-pull-soa
+  blk-pull-aos
+
+
+Benchmarking
+============
+
+Correct benchmarking is a nontrivial task. Whenever benchmark results are to be
+created, make sure the binary was compiled with:
+
+- ``BENCHMARK=on`` and
+- ``BUILD=release`` and
+- the correct ISA for macros, selected via ``ISA``, and
+- ``TARCH`` set to the architecture the compiler generates code for.
+
+During benchmarking, pinning should be used via the ``-pin`` parameter. Running
+a benchmark with 10 threads and pinning them to the first 10 cores works like ::
+
+  $ bin/lbmbenchk-linux-intel-release ... -t 10 -pin $(seq -s , 0 9)
+
+Things the binary does not check or control:
+
+- Transparent huge pages: when allocating memory, small 4 KiB pages might be
+  replaced with larger ones. This is in general a good thing, but whether it
+  actually happens depends on the system settings.
+
+- CPU/core frequency: for reproducible results the frequency of all cores
+  should be fixed.
+
+- NUMA placement policy: the benchmark assumes a first-touch policy, which
+  means the memory will be placed in the NUMA domain the touching core is
+  associated with. If a different policy is in place or the NUMA domain to be
+  used is already full, memory might be allocated in a remote domain. Accesses
+  to remote domains typically have a higher latency and lower bandwidth.
+
+- System load: interference from other applications, especially on desktop
+  systems, should be avoided.
+
+- Padding: most kernels do not care about padding against cache or TLB
+  thrashing. Even if the number of (fluid) nodes suggests everything is fine,
+  problems might still occur through parallelization.
+
+- CPU dispatcher function: the compiler might add different versions of a
+  function for different ISA extensions. Make sure the code you think is
+  executed is actually the code that is executed.
+
+.. |datetime| date:: %Y-%m-%d %H:%M
+
+Document was generated at |datetime|.
+
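
Some of the settings listed under Benchmarking as not checked by the binary can be inspected by hand before a run. The following is a minimal sketch, assuming a Linux system that exposes the usual sysfs and procfs files. ``show_setting`` is a hypothetical helper and not part of the benchmark framework; on systems without a given file it only reports that the value is not available.

```shell
#!/bin/sh
# Hypothetical helper: print one system setting, or a note when the
# corresponding file does not exist or is not readable on this system.
show_setting() {
    label="$1"
    file="$2"
    if [ -r "$file" ]; then
        printf '%s: %s\n' "$label" "$(cat "$file")"
    else
        printf '%s: not available\n' "$label"
    fi
}

# Transparent huge pages: the active mode is the one shown in [brackets].
show_setting "transparent huge pages" /sys/kernel/mm/transparent_hugepage/enabled

# Frequency governor of core 0 (requires the cpufreq subsystem).
show_setting "cpu0 governor" /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# System load: the first three fields are the 1, 5, and 15 minute averages.
show_setting "load average" /proc/loadavg
```

The sketch only reads values; fixing the core frequency or changing the huge page mode needs administrator rights and depends on the system.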