--- /dev/null
+<?xml version="1.0" encoding="utf-8" ?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="Docutils 0.12: http://docutils.sourceforge.net/" />
+<title>LBM Benchmark Kernels Documentation</title>
+<style type="text/css">
+
+/*
+:Author: David Goodger (goodger@python.org)
+:Id: $Id: html4css1.css 7614 2013-02-21 15:55:51Z milde $
+:Copyright: This stylesheet has been placed in the public domain.
+
+Default cascading style sheet for the HTML output of Docutils.
+
+See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to
+customize this style sheet.
+*/
+
+/* used to remove borders from tables and images */
+.borderless, table.borderless td, table.borderless th {
+ border: 0 }
+
+table.borderless td, table.borderless th {
+ /* Override padding for "table.docutils td" with "! important".
+ The right padding separates the table cells. */
+ padding: 0 0.5em 0 0 ! important }
+
+.first {
+ /* Override more specific margin styles with "! important". */
+ margin-top: 0 ! important }
+
+.last, .with-subtitle {
+ margin-bottom: 0 ! important }
+
+.hidden {
+ display: none }
+
+a.toc-backref {
+ text-decoration: none ;
+ color: black }
+
+blockquote.epigraph {
+ margin: 2em 5em ; }
+
+dl.docutils dd {
+ margin-bottom: 0.5em }
+
+object[type="image/svg+xml"], object[type="application/x-shockwave-flash"] {
+ overflow: hidden;
+}
+
+/* Uncomment (and remove this text!) to get bold-faced definition list terms
+dl.docutils dt {
+ font-weight: bold }
+*/
+
+div.abstract {
+ margin: 2em 5em }
+
+div.abstract p.topic-title {
+ font-weight: bold ;
+ text-align: center }
+
+div.admonition, div.attention, div.caution, div.danger, div.error,
+div.hint, div.important, div.note, div.tip, div.warning {
+ margin: 2em ;
+ border: medium outset ;
+ padding: 1em }
+
+div.admonition p.admonition-title, div.hint p.admonition-title,
+div.important p.admonition-title, div.note p.admonition-title,
+div.tip p.admonition-title {
+ font-weight: bold ;
+ font-family: sans-serif }
+
+div.attention p.admonition-title, div.caution p.admonition-title,
+div.danger p.admonition-title, div.error p.admonition-title,
+div.warning p.admonition-title, .code .error {
+ color: red ;
+ font-weight: bold ;
+ font-family: sans-serif }
+
+/* Uncomment (and remove this text!) to get reduced vertical space in
+ compound paragraphs.
+div.compound .compound-first, div.compound .compound-middle {
+ margin-bottom: 0.5em }
+
+div.compound .compound-last, div.compound .compound-middle {
+ margin-top: 0.5em }
+*/
+
+div.dedication {
+ margin: 2em 5em ;
+ text-align: center ;
+ font-style: italic }
+
+div.dedication p.topic-title {
+ font-weight: bold ;
+ font-style: normal }
+
+div.figure {
+ margin-left: 2em ;
+ margin-right: 2em }
+
+div.footer, div.header {
+ clear: both;
+ font-size: smaller }
+
+div.line-block {
+ display: block ;
+ margin-top: 1em ;
+ margin-bottom: 1em }
+
+div.line-block div.line-block {
+ margin-top: 0 ;
+ margin-bottom: 0 ;
+ margin-left: 1.5em }
+
+div.sidebar {
+ margin: 0 0 0.5em 1em ;
+ border: medium outset ;
+ padding: 1em ;
+ background-color: #ffffee ;
+ width: 40% ;
+ float: right ;
+ clear: right }
+
+div.sidebar p.rubric {
+ font-family: sans-serif ;
+ font-size: medium }
+
+div.system-messages {
+ margin: 5em }
+
+div.system-messages h1 {
+ color: red }
+
+div.system-message {
+ border: medium outset ;
+ padding: 1em }
+
+div.system-message p.system-message-title {
+ color: red ;
+ font-weight: bold }
+
+div.topic {
+ margin: 2em }
+
+h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
+h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
+ margin-top: 0.4em }
+
+h1.title {
+ text-align: center }
+
+h2.subtitle {
+ text-align: center }
+
+hr.docutils {
+ width: 75% }
+
+img.align-left, .figure.align-left, object.align-left {
+ clear: left ;
+ float: left ;
+ margin-right: 1em }
+
+img.align-right, .figure.align-right, object.align-right {
+ clear: right ;
+ float: right ;
+ margin-left: 1em }
+
+img.align-center, .figure.align-center, object.align-center {
+ display: block;
+ margin-left: auto;
+ margin-right: auto;
+}
+
+.align-left {
+ text-align: left }
+
+.align-center {
+ clear: both ;
+ text-align: center }
+
+.align-right {
+ text-align: right }
+
+/* reset inner alignment in figures */
+div.align-right {
+ text-align: inherit }
+
+/* div.align-center * { */
+/* text-align: left } */
+
+ol.simple, ul.simple {
+ margin-bottom: 1em }
+
+ol.arabic {
+ list-style: decimal }
+
+ol.loweralpha {
+ list-style: lower-alpha }
+
+ol.upperalpha {
+ list-style: upper-alpha }
+
+ol.lowerroman {
+ list-style: lower-roman }
+
+ol.upperroman {
+ list-style: upper-roman }
+
+p.attribution {
+ text-align: right ;
+ margin-left: 50% }
+
+p.caption {
+ font-style: italic }
+
+p.credits {
+ font-style: italic ;
+ font-size: smaller }
+
+p.label {
+ white-space: nowrap }
+
+p.rubric {
+ font-weight: bold ;
+ font-size: larger ;
+ color: maroon ;
+ text-align: center }
+
+p.sidebar-title {
+ font-family: sans-serif ;
+ font-weight: bold ;
+ font-size: larger }
+
+p.sidebar-subtitle {
+ font-family: sans-serif ;
+ font-weight: bold }
+
+p.topic-title {
+ font-weight: bold }
+
+pre.address {
+ margin-bottom: 0 ;
+ margin-top: 0 ;
+ font: inherit }
+
+pre.literal-block, pre.doctest-block, pre.math, pre.code {
+ margin-left: 2em ;
+ margin-right: 2em }
+
+pre.code .ln { color: grey; } /* line numbers */
+pre.code, code { background-color: #eeeeee }
+pre.code .comment, code .comment { color: #5C6576 }
+pre.code .keyword, code .keyword { color: #3B0D06; font-weight: bold }
+pre.code .literal.string, code .literal.string { color: #0C5404 }
+pre.code .name.builtin, code .name.builtin { color: #352B84 }
+pre.code .deleted, code .deleted { background-color: #DEB0A1}
+pre.code .inserted, code .inserted { background-color: #A3D289}
+
+span.classifier {
+ font-family: sans-serif ;
+ font-style: oblique }
+
+span.classifier-delimiter {
+ font-family: sans-serif ;
+ font-weight: bold }
+
+span.interpreted {
+ font-family: sans-serif }
+
+span.option {
+ white-space: nowrap }
+
+span.pre {
+ white-space: pre }
+
+span.problematic {
+ color: red }
+
+span.section-subtitle {
+ /* font-size relative to parent (h1..h6 element) */
+ font-size: 80% }
+
+table.citation {
+ border-left: solid 1px gray;
+ margin-left: 1px }
+
+table.docinfo {
+ margin: 2em 4em }
+
+table.docutils {
+ margin-top: 0.5em ;
+ margin-bottom: 0.5em }
+
+table.footnote {
+ border-left: solid 1px black;
+ margin-left: 1px }
+
+table.docutils td, table.docutils th,
+table.docinfo td, table.docinfo th {
+ padding-left: 0.5em ;
+ padding-right: 0.5em ;
+ vertical-align: top }
+
+table.docutils th.field-name, table.docinfo th.docinfo-name {
+ font-weight: bold ;
+ text-align: left ;
+ white-space: nowrap ;
+ padding-left: 0 }
+
+/* "booktabs" style (no vertical lines) */
+table.docutils.booktabs {
+ border: 0px;
+ border-top: 2px solid;
+ border-bottom: 2px solid;
+ border-collapse: collapse;
+}
+table.docutils.booktabs * {
+ border: 0px;
+}
+table.docutils.booktabs th {
+ border-bottom: thin solid;
+ text-align: left;
+}
+
+h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
+h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
+ font-size: 100% }
+
+ul.auto-toc {
+ list-style-type: none }
+
+</style>
+</head>
+<body>
+<div class="document" id="lbm-benchmark-kernels-documentation">
+<h1 class="title">LBM Benchmark Kernels Documentation</h1>
+
+<!-- # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+#
+# Copyright
+# Markus Wittmann, 2016-2017
+# RRZE, University of Erlangen-Nuremberg, Germany
+# markus.wittmann -at- fau.de or hpc -at- rrze.fau.de
+#
+# Viktor Haag, 2016
+# LSS, University of Erlangen-Nuremberg, Germany
+#
+# This file is part of the Lattice Boltzmann Benchmark Kernels (LbmBenchKernels).
+#
+# LbmBenchKernels is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# LbmBenchKernels is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with LbmBenchKernels. If not, see <http://www.gnu.org/licenses/>.
+#
+# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -->
+<div class="contents topic" id="contents">
+<p class="topic-title first">Contents</p>
+<ul class="auto-toc simple">
+<li><a class="reference internal" href="#compilation" id="id2">1 Compilation</a><ul class="auto-toc">
+<li><a class="reference internal" href="#debug-and-verification" id="id3">1.1 Debug and Verification</a></li>
+<li><a class="reference internal" href="#benchmarking" id="id4">1.2 Benchmarking</a></li>
+<li><a class="reference internal" href="#release-and-verification" id="id5">1.3 Release and Verification</a></li>
+<li><a class="reference internal" href="#compilers" id="id6">1.4 Compilers</a></li>
+<li><a class="reference internal" href="#options-summary" id="id7">1.5 Options Summary</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#invocation" id="id8">2 Invocation</a><ul class="auto-toc">
+<li><a class="reference internal" href="#command-line-parameters" id="id9">2.1 Command Line Parameters</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#id1" id="id10">3 Benchmarking</a></li>
+<li><a class="reference internal" href="#acknowledgements" id="id11">4 Acknowledgements</a></li>
+</ul>
+</div>
+<div class="section" id="compilation">
+<h1><a class="toc-backref" href="#id2">1 Compilation</a></h1>
+<p>The benchmark framework currently supports only Linux systems with the GCC or
+Intel compilers. Any other configuration will probably require adjustments to
+the code and the makefiles. Furthermore, some code might be platform specific or
+at least POSIX specific.</p>
+<p>The benchmark can be built via <tt class="docutils literal">make</tt> from the <tt class="docutils literal">src</tt> subdirectory. This
+generates one binary which hosts all implemented benchmark kernels.</p>
+<p>Binaries are located under the <tt class="docutils literal">bin</tt> subdirectory and will have different names
+depending on compiler and build configuration.</p>
+<div class="section" id="debug-and-verification">
+<h2><a class="toc-backref" href="#id3">1.1 Debug and Verification</a></h2>
+<pre class="literal-block">
+make
+</pre>
+<p>Running <tt class="docutils literal">make</tt> without any arguments builds the debug version (BUILD=debug) of
+the benchmark kernels: no optimizations are performed, line numbers and
+debug symbols are included, and <tt class="docutils literal">DEBUG</tt> is defined. The resulting
+binary is placed in the <tt class="docutils literal">bin</tt> subdirectory and named
+<tt class="docutils literal"><span class="pre">lbmbenchk-linux-<compiler>-debug</span></tt>.</p>
+<p>Without any further specification the binary is built with verification
+(<tt class="docutils literal">VERIFICATION=on</tt>), statistics (<tt class="docutils literal">STATISTICS=on</tt>), and VTK output
+(<tt class="docutils literal">VTK_OUTPUT=on</tt>) enabled.</p>
+<p>Please note that the generated binary will therefore
+exhibit poor performance.</p>
+</div>
+<div class="section" id="benchmarking">
+<h2><a class="toc-backref" href="#id4">1.2 Benchmarking</a></h2>
+<p>To generate a binary for benchmarking run make with</p>
+<pre class="literal-block">
+make BENCHMARK=on BUILD=release
+</pre>
+<p>Here BUILD=release turns optimizations on and BENCHMARK=on disables
+verification, statistics, and VTK output.</p>
+</div>
+<div class="section" id="release-and-verification">
+<h2><a class="toc-backref" href="#id5">1.3 Release and Verification</a></h2>
+<p>Verification with debug builds can be extremely slow. Hence verification
+capabilities can also be built into release builds:</p>
+<pre class="literal-block">
+make BUILD=release
+</pre>
+</div>
+<div class="section" id="compilers">
+<h2><a class="toc-backref" href="#id6">1.4 Compilers</a></h2>
+<p>Currently only the GCC and Intel compilers under Linux are supported. The
+compiler is selected via <tt class="docutils literal"><span class="pre">CONFIG=linux-gcc</span></tt> or
+<tt class="docutils literal"><span class="pre">CONFIG=linux-intel</span></tt>.</p>
+</div>
+<div class="section" id="options-summary">
+<h2><a class="toc-backref" href="#id7">1.5 Options Summary</a></h2>
+<p>Options that can be specified when building the framework with make:</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="8%" />
+<col width="13%" />
+<col width="7%" />
+<col width="72%" />
+</colgroup>
+<thead valign="bottom">
+<tr><th class="head">name</th>
+<th class="head">values</th>
+<th class="head">default</th>
+<th class="head">description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr><td>TARCH</td>
+<td>--</td>
+<td>--</td>
+<td>Overrides the architecture the compiler generates code for. The value depends on the chosen compiler.</td>
+</tr>
+<tr><td>BENCHMARK</td>
+<td>on, off</td>
+<td>off</td>
+<td>If enabled, disables VERIFICATION, STATISTICS, VTK_OUTPUT.</td>
+</tr>
+<tr><td>BUILD</td>
+<td>debug, release</td>
+<td>debug</td>
+<td>debug: no optimization, debug symbols, DEBUG defined. release: optimizations turned on.</td>
+</tr>
+<tr><td>CONFIG</td>
+<td>linux-gcc, linux-intel</td>
+<td>linux-intel</td>
+<td>Select GCC or Intel compiler.</td>
+</tr>
+<tr><td>ISA</td>
+<td>avx, sse</td>
+<td>avx</td>
+<td>Determines which ISA extension is used for macro definitions. This is <em>not</em> the architecture the compiler generates code for.</td>
+</tr>
+<tr><td>OPENMP</td>
+<td>on, off</td>
+<td>on</td>
+<td>OpenMP, i.e. threading support.</td>
+</tr>
+<tr><td>STATISTICS</td>
+<td>on, off</td>
+<td>off</td>
+<td>View statistics, such as density, during simulation.</td>
+</tr>
+<tr><td>VERIFICATION</td>
+<td>on, off</td>
+<td>off</td>
+<td>Turn verification on/off.</td>
+</tr>
+<tr><td>VTK_OUTPUT</td>
+<td>on, off</td>
+<td>off</td>
+<td>Enable/Disable VTK file output.</td>
+</tr>
+</tbody>
+</table>
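+<p>As a rough sketch of how these options combine, the following shell helpers
+print a make command line and the binary path it is expected to produce. The
+helper names <tt class="docutils literal">build_cmd</tt> and <tt class="docutils literal">binary_path</tt> are hypothetical, and the
+naming scheme is only inferred from the binary names used in this document.</p>

```shell
# Print (do not run) a make command line built from the documented options.
build_cmd() {
    config="$1"   # linux-gcc or linux-intel
    build="$2"    # debug or release
    bench="$3"    # on or off
    echo "make CONFIG=$config BUILD=$build BENCHMARK=$bench"
}

# Assumption: binaries follow the pattern bin/lbmbenchk-CONFIG-BUILD,
# as suggested by the example names in this document.
binary_path() {
    echo "bin/lbmbenchk-$1-$2"
}

build_cmd linux-gcc release on
binary_path linux-gcc release
```

+<p>Printing the command first makes it easy to check the option combination
+before starting a longer build.</p>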
+</div>
+</div>
+<div class="section" id="invocation">
+<h1><a class="toc-backref" href="#id8">2 Invocation</a></h1>
+<p>Running the binary prints, along with the GPL license header, a line like the following:</p>
+<blockquote>
+LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification</blockquote>
+<p>if verification was enabled during compilation or</p>
+<blockquote>
+LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: benchmark</blockquote>
+<p>if verification was disabled during compilation.</p>
+<div class="section" id="command-line-parameters">
+<h2><a class="toc-backref" href="#id9">2.1 Command Line Parameters</a></h2>
+<p>Running the binary with <tt class="docutils literal"><span class="pre">-h</span></tt> lists all available parameters:</p>
+<pre class="literal-block">
+Usage:
+./lbmbenchk -list
+./lbmbenchk
+ [-dims XxYxZ] [-geometry box|channel|pipe|blocks[-<block size>]] [-iterations <iterations>] [-lattice-dump-ascii]
+ [-rho-in <density>] [-rho-out <density>] [-omega <omega>] [-kernel <kernel>]
+ [-periodic-x]
+ [-t <number of threads>]
+ [-pin core{,core}*]
+ [-verify]
+ -- <kernel specific parameters>
+
+-list List available kernels.
+
+-dims XxYxZ Specify geometry dimensions.
+
+-geometry blocks-<block size>
+ Geometry with blocks of size <block size> regularly laid out.
+</pre>
+<p>If an option is specified multiple times, the last one overrides the previous ones.
+This also holds for <tt class="docutils literal"><span class="pre">-verify</span></tt>, which sets geometry dimensions,
+iterations, etc., which can be overridden afterwards, e.g.:</p>
+<pre class="literal-block">
+$ bin/lbmbenchk-linux-intel-release -verify -dims 32x32x32
+</pre>
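+<p>The override rule can be illustrated with a small option loop. This is only a
+sketch of the "last one wins" behaviour for <tt class="docutils literal"><span class="pre">-dims</span></tt>, not the benchmark's actual
+parser; the function name <tt class="docutils literal">last_dims</tt> is hypothetical.</p>

```shell
# Walk over all arguments and remember the most recent -dims value.
# Later occurrences overwrite earlier ones, so the last one wins.
last_dims() {
    dims="unset"
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -dims)
                # Only consume a value if one is actually present.
                if [ "$#" -ge 2 ]; then
                    dims="$2"
                    shift
                fi
                ;;
        esac
        shift
    done
    echo "$dims"
}

last_dims -dims 16x16x16 -dims 32x32x32
```

+<p>The call above prints <tt class="docutils literal">32x32x32</tt>, the value given last.</p>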
+<p>Kernel specific parameters can be obtained by selecting the specific kernel
+and passing <tt class="docutils literal"><span class="pre">-h</span></tt> as parameter:</p>
+<pre class="literal-block">
+$ bin/lbmbenchk-linux-intel-release -kernel <kernel> -- -h
+...
+Kernel parameters:
+[-blk <n>] [-blk-[xyz] <n>]
+</pre>
+<p>A list of all available kernels can be obtained via <tt class="docutils literal"><span class="pre">-list</span></tt>:</p>
+<pre class="literal-block">
+$ ../bin/lbmbenchk-linux-gcc-debug -list
+Lattice Boltzmann Benchmark Kernels (LbmBenchKernels) Copyright (C) 2016, 2017 LSS, RRZE
+This program comes with ABSOLUTELY NO WARRANTY; for details see LICENSE.
+This is free software, and you are welcome to redistribute it under certain conditions.
+
+LBM Benchmark Kernels 0.1, compiled Jul 5 2017 21:59:22, type: verification
+Available kernels to benchmark:
+ list-aa-pv-soa
+ list-aa-ria-soa
+ list-aa-soa
+ list-aa-aos
+ list-pull-split-nt-1s-soa
+ list-pull-split-nt-2s-soa
+ list-push-soa
+ list-push-aos
+ list-pull-soa
+ list-pull-aos
+ push-soa
+ push-aos
+ pull-soa
+ pull-aos
+ blk-push-soa
+ blk-push-aos
+ blk-pull-soa
+ blk-pull-aos
+</pre>
+</div>
+</div>
+<div class="section" id="id1">
+<h1><a class="toc-backref" href="#id10">3 Benchmarking</a></h1>
+<p>Correct benchmarking is a nontrivial task. Whenever benchmark results are to be
+produced, make sure the binary was compiled with:</p>
+<ul class="simple">
+<li><tt class="docutils literal">BENCHMARK=on</tt>,</li>
+<li><tt class="docutils literal">BUILD=release</tt>,</li>
+<li>the correct ISA for macros, selected via <tt class="docutils literal">ISA</tt>, and</li>
+<li>the architecture the compiler generates code for, specified via <tt class="docutils literal">TARCH</tt>.</li>
+</ul>
+<p>During benchmarking, pinning should be used via the <tt class="docutils literal"><span class="pre">-pin</span></tt> parameter. Running
+a benchmark with 10 threads and pinning them to the first 10 cores works like</p>
+<pre class="literal-block">
+$ bin/lbmbenchk-linux-intel-release ... -t 10 -pin $(seq -s , 0 9)
+</pre>
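+<p>The comma separated core list can also be built without relying on the
+behaviour of <tt class="docutils literal">seq <span class="pre">-s</span></tt>, which differs between implementations. The sketch
+below uses a hypothetical helper <tt class="docutils literal">pin_list</tt> and assumes that the first N cores
+are numbered 0 to N-1.</p>

```shell
# Build a comma separated list of the core IDs 0..n-1.
pin_list() {
    n="$1"
    list=""
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ -z "$list" ]; then
            list="$i"
        else
            list="$list,$i"
        fi
        i=$((i + 1))
    done
    echo "$list"
}

# Print the thread and pinning arguments for 10 threads.
threads=10
echo "-t $threads -pin $(pin_list "$threads")"
```

+<p>The printed arguments can then be appended to the benchmark command line.</p>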
+<p>Things the binary does not check or control:</p>
+<ul class="simple">
+<li>Transparent huge pages: when allocating memory, small 4 KiB pages might be
+replaced with larger ones. This is in general a good thing, but whether it
+actually happens depends on the system settings.</li>
+<li>CPU/core frequency: for reproducible results the frequency of all cores
+should be fixed.</li>
+<li>NUMA placement policy: the benchmark assumes a first touch policy, which
+means the memory will be placed in the NUMA domain the touching core is
+associated with. If a different policy is in place or the NUMA domain to be
+used is already full, memory might be allocated in a remote domain. Accesses
+to remote domains typically have a higher latency and lower bandwidth.</li>
+<li>System load: interference from other applications, especially on desktop
+systems, should be avoided.</li>
+<li>Padding: most kernels do not apply padding against cache or TLB
+thrashing. Even if the number of (fluid) nodes suggests everything is fine,
+problems might still occur through parallelization.</li>
+<li>CPU dispatcher function: the compiler might add different versions of a
+function for different ISA extensions. Make sure the code you expect to be
+executed is actually the code which is executed.</li>
+</ul>
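+<p>Some of these settings can be inspected before a run. The sketch below reads
+a few standard Linux sysfs and procfs files and reports when one is not
+available. The helper name <tt class="docutils literal">show_setting</tt> is hypothetical, and which files
+exist depends on the kernel configuration and the frequency driver in use.</p>

```shell
# Print a label and the content of a file, or a note if it is unreadable.
show_setting() {
    label="$1"
    path="$2"
    if [ -r "$path" ]; then
        echo "$label: $(cat "$path")"
    else
        echo "$label: not available"
    fi
}

# Transparent huge page mode; the active value is shown in brackets.
show_setting "transparent huge pages" /sys/kernel/mm/transparent_hugepage/enabled
# Frequency governor of core 0 only; other cores may differ.
show_setting "cpu0 frequency governor" /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Load averages as a rough indicator of interfering processes.
show_setting "load average" /proc/loadavg
```

+<p>The script only reports values and changes nothing; adjusting these settings
+usually requires root privileges.</p>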
+</div>
+<div class="section" id="acknowledgements">
+<h1><a class="toc-backref" href="#id11">4 Acknowledgements</a></h1>
+<p>This work was funded by BMBF, grant no. 01IH15003A (project SKAMPY).</p>
+<p>This work was funded by KONWIHR project OMI4PAPS.</p>
+<p>Document was generated at 2017-10-26 09:43.</p>
+</div>
+</div>
+</body>
+</html>