X-Git-Url: http://git.rrze.uni-erlangen.de/gitweb/?p=LbmBenchmarkKernelsPublic.git;a=blobdiff_plain;f=doc%2Fhtml%2Fmain.html;fp=doc%2Fhtml%2Fmain.html;h=99f4cb847eb50e75fbbfdbbf68c260645dd031ef;hp=511f6b2dcf0bfd7cd2be208fc20e3d3f83c391e3;hb=0095f461c30075a883df0265a7b831061ee7ebee;hpb=e3f82424829ebb623343ce0092238f83b4a1b8c2 diff --git a/doc/html/main.html b/doc/html/main.html index 511f6b2..99f4cb8 100644 --- a/doc/html/main.html +++ b/doc/html/main.html @@ -420,42 +420,66 @@ tr:nth-child(odd) {
The lattice Boltzmann (LBM) benchmark kernels are a collection of LBM kernel implementations.
AS SUCH THE LBM BENCHMARK KERNELS ARE NOT A FULLY EQUIPPED CFD SOLVER AND SOLELY SERVE THE PURPOSE OF STUDYING POSSIBLE PERFORMANCE OPTIMIZATIONS AND/OR EXPERIMENTS.
Currently all kernels use a D3Q19 discretization and the two-relaxation-time (TRT) collision operator [ginzburg-2008]. All operations are carried out in double precision arithmetic.
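The TRT operator relaxes the even (symmetric) and odd (antisymmetric) parts of the particle distribution functions (PDFs) with two separate rates. The Python sketch below shows one collision step for a single D3Q19 node. It assumes the standard second-order equilibrium, treats the two rates `omega_plus` and `omega_minus` as plain parameters, and generates a direction ordering on the fly. All names and the ordering are hypothetical and do not reflect how the benchmark kernels store or order their PDFs.

```python
from itertools import product

# D3Q19 velocity set: every c in {-1, 0, 1}^3 with |c|^2 <= 2
# (1 rest, 6 axis, 12 diagonal directions). The ordering is arbitrary here.
VELOCITIES = [c for c in product((-1, 0, 1), repeat=3) if sum(x * x for x in c) <= 2]
# Lattice weights by squared speed: rest 1/3, axis 1/18, diagonal 1/36.
WEIGHTS = [{0: 1 / 3, 1: 1 / 18, 2: 1 / 36}[sum(x * x for x in c)] for c in VELOCITIES]
# OPPOSITE[i] is the index of the direction -c_i.
OPPOSITE = [VELOCITIES.index(tuple(-x for x in c)) for c in VELOCITIES]


def moments(f):
    """Density rho and velocity u of the PDFs f of one node."""
    rho = sum(f)
    u = [sum(fi * c[d] for fi, c in zip(f, VELOCITIES)) / rho for d in range(3)]
    return rho, u


def trt_collide(f, omega_plus, omega_minus):
    """One TRT collision: relax even and odd parts with separate rates."""
    rho, u = moments(f)
    usq = sum(x * x for x in u)
    post = []
    for i, c in enumerate(VELOCITIES):
        j = OPPOSITE[i]
        cu = sum(c[d] * u[d] for d in range(3))
        # Even (symmetric) and odd (antisymmetric) parts of the PDFs.
        f_even = 0.5 * (f[i] + f[j])
        f_odd = 0.5 * (f[i] - f[j])
        # Even and odd parts of the second-order equilibrium
        # w_i * rho * (1 + 3 c.u + 4.5 (c.u)^2 - 1.5 u.u).
        feq_even = WEIGHTS[i] * rho * (1.0 + 4.5 * cu * cu - 1.5 * usq)
        feq_odd = WEIGHTS[i] * rho * 3.0 * cu
        post.append(f[i] - omega_plus * (f_even - feq_even)
                    - omega_minus * (f_odd - feq_odd))
    return post


if __name__ == "__main__":
    # Perturbed rest state; density and momentum must survive the collision.
    f = [w * (1.0 + 0.01 * k) for k, w in enumerate(WEIGHTS)]
    g = trt_collide(f, omega_plus=1.6, omega_minus=1.0)
    print(moments(f))
    print(moments(g))
```

The equilibrium's even part carries the density and its odd part carries the momentum. A collision therefore leaves both unchanged, which makes a convenient sanity check for any TRT implementation.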
The benchmark framework currently supports only Linux systems and the GCC and Intel compilers. Every other configuration probably requires adjustments inside the code and the makefiles. Furthermore, some code might be platform-specific or at least POSIX-specific.
The benchmark can be built via make from the src subdirectory. This generates one binary which hosts all implemented benchmark kernels.
Binaries are located under the bin subdirectory and will have different names depending on compiler and build configuration.
Compilation can target debug or release builds. With both build types, verification can be enabled; this increases the runtime and hence is not suited for benchmarking.
make BUILD=debug BENCHMARK=off
Please note that the generated binary will therefore exhibit poor performance.
Verification with debug builds can be extremely slow. Hence verification capabilities can also be built into release builds:
make BENCHMARK=off
To generate a binary for benchmarking, run make with
make
By default BENCHMARK=on and BUILD=release are set. BUILD=release turns optimizations on, and BENCHMARK=on disables verification, statistics, and VTK output.
See Options Summary below for a further description of the options which can be applied (e.g. TARCH), as well as the Benchmarking section.
Currently only the GCC and Intel compilers under Linux are supported. The configuration is selected via CONFIG=linux-gcc or CONFIG=linux-intel.
For each configuration and build (debug/release), a subdirectory is created under the src/obj directory where the dependency and object files are stored. With
make clean-all
all object and dependency files are deleted.
Options that can be specified when building the suite with make:
name | values | default | description |
---|---|---|---|
BENCHMARK | on, off | on | on: disables verification, statistics, and VTK output. |
BUILD | debug, release | release | debug: no optimization, debug symbols, DEBUG defined. release: optimizations enabled. |
CONFIG | linux-gcc, linux-intel | … | Selects the compiler configuration (GCC or Intel compiler under Linux). |
ISA | avx, sse | avx | Determines which ISA extension is used for the macro definitions of the intrinsics. This is not the architecture the compiler generates code for. |
OPENMP | on, off | … | … |
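The enumerated options in the table above can be checked before make is invoked, for instance in a build script. The Python sketch below is hypothetical: the helper `make_command` is not part of the suite. It only assembles the argument list and never runs make. Values not listed in the table are rejected, while free-form options such as TARCH are passed through unchecked.

```python
# Allowed values for the enumerated options, as listed in the table above.
ALLOWED = {
    "BENCHMARK": {"on", "off"},
    "BUILD": {"debug", "release"},
    "CONFIG": {"linux-gcc", "linux-intel"},
    "ISA": {"avx", "sse"},
    "OPENMP": {"on", "off"},
}


def make_command(**options):
    """Return the make argument list for the given options (hypothetical helper)."""
    args = ["make"]
    for name, value in options.items():
        allowed = ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
        # Options missing from ALLOWED (e.g. TARCH) are passed through unchecked.
        args.append(f"{name}={value}")
    return args


if __name__ == "__main__":
    # Release build with verification enabled (not suited for benchmarking).
    print(" ".join(make_command(BENCHMARK="off")))
```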
Correct benchmarking is a nontrivial task. Whenever benchmark results are to be produced, make sure the binary was compiled with:
For the Intel compiler, the following can be specified depending on the target ISA extension:
Compiling for an architecture supporting AVX (Sandy Bridge, Ivy Bridge):
make ISA=avx TARCH=-xAVX
Compiling for an architecture supporting AVX2 (Haswell, Broadwell):
make ISA=avx TARCH=-xCORE-AVX2,-fma
WARNING: ISA is still set to avx here, as the FMA intrinsics are currently not implemented. This might change in the future.
Compiling for an architecture supporting AVX-512 (Skylake):
make ISA=avx TARCH=-xCORE-AVX512
WARNING: ISA is still set to avx here, as there is currently no implementation of the AVX-512 intrinsics. This might change in the future.
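The Intel-compiler recipes above can be collected in a small lookup table, for instance inside a build script. The Python sketch below is hypothetical: the function `intel_make_args` and the lowercase microarchitecture keys are illustrative. The ISA and TARCH values are copied from the recipes, with ISA kept at avx for AVX2 and AVX-512 as the warnings explain. CONFIG is left out, as it is in the recipes.

```python
# ISA/TARCH pairs for the Intel compiler, copied from the recipes above.
# ISA stays "avx" for AVX2 and AVX-512 because FMA and AVX-512 intrinsics
# are not implemented (see the warnings above).
INTEL_TARGETS = {
    "sandybridge": ("avx", "-xAVX"),
    "ivybridge": ("avx", "-xAVX"),
    "haswell": ("avx", "-xCORE-AVX2,-fma"),
    "broadwell": ("avx", "-xCORE-AVX2,-fma"),
    "skylake": ("avx", "-xCORE-AVX512"),
}


def intel_make_args(microarch):
    """Return the make arguments for a microarchitecture (hypothetical helper)."""
    try:
        isa, tarch = INTEL_TARGETS[microarch.lower()]
    except KeyError:
        raise ValueError(f"no recipe for {microarch!r}") from None
    return ["make", f"ISA={isa}", f"TARCH={tarch}"]


if __name__ == "__main__":
    print(" ".join(intel_make_args("Haswell")))  # make ISA=avx TARCH=-xCORE-AVX2,-fma
```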
During benchmarking, pinning should be used via the -pin parameter. Running a benchmark with 10 threads and pinning them to the first 10 cores works like
$ bin/lbmbenchk-linux-intel-release ... -t 10 -pin $(seq -s , 0 9)
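The -pin argument in the example is a comma-separated list of core IDs, produced here by seq -s , 0 9. The Python sketch below builds the same list. The `first` and `stride` parameters are hypothetical conveniences, not options of the benchmark. Which core IDs are appropriate depends on the machine's numbering (e.g. as reported by lscpu), so check it before relying on a stride.

```python
def pin_list(threads, first=0, stride=1):
    """Comma-separated core IDs for -pin, like `seq -s , first stride last`."""
    if threads < 1 or stride < 1:
        raise ValueError("threads and stride must be positive")
    return ",".join(str(first + i * stride) for i in range(threads))


if __name__ == "__main__":
    # Same list as $(seq -s , 0 9): 10 threads on cores 0 to 9.
    print(pin_list(10))  # 0,1,2,3,4,5,6,7,8,9
```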
Things the binary does not check or control:
With correct padding, cache and TLB thrashing can be avoided. To this end, the number of (fluid) nodes used in the data layout is artificially increased.
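Why padding helps can be illustrated with a simplified cache model. In a structure-of-arrays layout, each of the 19 PDF arrays is read as a separate stream. If the array size in bytes is a multiple of the cache's set stride, all streams start in the same cache set. Because they advance in lockstep, they keep competing for that set's few ways during the whole sweep. The Python sketch below assumes a hypothetical 32 KiB, 8-way cache with 64-byte lines (64 sets). It counts how many array starts share a set, with and without 16 padding nodes. It does not model TLBs or the suite's actual padding logic.

```python
# Hypothetical cache parameters (assumptions, not taken from the suite).
LINE_BYTES = 64
WAYS = 8
CACHE_BYTES = 32 * 1024
SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 64 sets

Q = 19          # D3Q19: one array per discrete velocity in an SoA layout
PDF_BYTES = 8   # double precision


def start_sets(nodes, pad_nodes=0):
    """Cache set index of the first element of each of the Q PDF arrays."""
    stride = (nodes + pad_nodes) * PDF_BYTES  # bytes between consecutive arrays
    return [(q * stride // LINE_BYTES) % SETS for q in range(Q)]


def max_streams_per_set(nodes, pad_nodes=0):
    """Largest number of array starts that fall into the same cache set."""
    sets = start_sets(nodes, pad_nodes)
    return max(sets.count(s) for s in set(sets))


if __name__ == "__main__":
    nodes = 64 * 64 * 64  # power-of-two node count: the worst case
    for pad in (0, 16):
        worst = max_streams_per_set(nodes, pad)
        verdict = "exceeds" if worst > WAYS else "fits in"
        print(f"pad={pad}: at most {worst} stream(s) per set, {verdict} {WAYS} ways")
```

Without padding, all 19 streams map to one set and exceed its 8 ways. With 16 extra nodes, each array is shifted by two cache lines relative to the previous one, so the starts spread over distinct sets.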
Currently automatic padding is active for kernels which support it. It can be … like -pad 0+16 would …
TODO: supported geometries: channel, pipe, blocks, fluid
This section lists performance values measured on several machines for different kernels and geometries. The RFM column denotes the expected performance as predicted by the Roofline performance model [williams-2008]. To predict the performance of each kernel, a memory bandwidth benchmark is used which mimics the kernel's memory access pattern and loop balance (see [kernels] for details).
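For a memory-bound kernel, the Roofline prediction reduces to the attainable memory bandwidth divided by the number of bytes transferred per fluid lattice update. The Python sketch below illustrates this arithmetic under several assumptions. Performance is given in MFLUP/s (million fluid lattice updates per second), and the lattice is D3Q19 in double precision. Each update is assumed to load and store 19 × 8 B (304 B), plus another 152 B when stores trigger write-allocate transfers. The bandwidth value is made up. The RFM column instead uses a bandwidth benchmark that mimics each kernel's access pattern. List-based kernels also move index data that is not counted here.

```python
Q = 19          # discrete velocities (D3Q19)
PDF_BYTES = 8   # double precision


def bytes_per_update(write_allocate):
    """Bytes moved per fluid lattice update: load and store all Q PDFs."""
    streams = 3 if write_allocate else 2  # write-allocate adds one read stream
    return streams * Q * PDF_BYTES


def roofline_mflups(bandwidth_bytes_per_s, bytes_per_flup):
    """Memory-bound Roofline estimate in MFLUP/s."""
    return bandwidth_bytes_per_s / bytes_per_flup / 1e6


if __name__ == "__main__":
    bw = 50e9  # hypothetical sustained bandwidth of 50 GB/s
    for wa in (False, True):
        b = bytes_per_update(wa)
        print(f"write-allocate={wa}: {b} B/FLUP -> {roofline_mflups(bw, b):.1f} MFLUP/s")
```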
memory bandwidth:
geometry dimensions: 500x100x100
kernel | pipe | blocks-2 | blocks-4 | blocks-6 | blocks-8 | blocks-10 | blocks-15 | blocks-16 | blocks-20 | blocks-25 | blocks-32 | RFM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
blk-push-aos | 58.82 | 49.85 | 57.34 | 59.90 | 61.37 | 62.17 | 65.30 | 64.00 | 67.54 | 64.46 | 69.69 | 104 |
blk-push-soa | 32.32 | 33.46 | 34.02 | 34.64 | 35.06 | 35.04 | 36.31 | 35.44 | 37.20 | 35.14 | 37.95 | 104 |
blk-pull-aos | 56.97 | 51.41 | 56.09 | 57.92 | 59.98 | 59.83 | 63.37 | 61.55 | 65.50 | 63.11 | 67.02 | 104 |
blk-pull-soa | 49.29 | 46.23 | 47.50 | 51.97 | 51.27 | 49.52 | 55.23 | 53.13 | 54.50 | 49.79 | 57.90 | 104 |
aa-aos | 91.35 | 66.14 | 76.80 | 84.76 | 83.63 | 91.36 | 93.46 | 92.62 | 93.91 | 92.25 | 92.93 | 145 |
aa-soa | 75.51 | 65.68 | 70.94 | 71.36 | 73.83 | 75.46 | 74.84 | 79.48 | 83.28 | 77.70 | 82.72 | 145 |
aa-vec-soa | 93.85 | 83.44 | 91.58 | 93.96 | 94.35 | 96.62 | 101.76 | 96.72 | 106.37 | 102.60 | 110.28 | 145 |
list-push-aos | 80.29 | 80.97 | 80.95 | 81.10 | 81.37 | 82.44 | 81.77 | 81.49 | 80.72 | 81.93 | 80.93 | 83 |
list-push-soa | 47.52 | 42.65 | 45.28 | 46.64 | 43.46 | 40.59 | 44.94 | 46.55 | 41.53 | 45.98 | 44.86 | 83 |
list-pull-aos | 85.30 | 82.97 | 86.43 | 83.42 | 86.33 | 83.70 | 86.43 | 83.77 | 83.10 | 85.89 | 84.44 | 83 |
list-pull-soa | 62.12 | 63.61 | 63.28 | 61.32 | 66.72 | 62.65 | 64.82 | 60.49 | 58.01 | 64.46 | 62.52 | 83 |
list-pull-split-nt-1s-soa | 121.35 | 113.77 | 115.29 | 113.54 | 117.00 | 116.46 | 114.78 | 114.54 | 110.83 | 112.67 | 117.85 | 125 |
list-pull-split-nt-2s-soa | 118.09 | 110.48 | 112.55 | 113.18 | 113.44 | 111.85 | 109.27 | 114.41 | 110.28 | 111.78 | 113.74 | 125 |
list-aa-aos | 121.28 | 118.63 | 119.00 | 118.50 | 121.99 | 119.11 | 118.83 | 121.47 | 121.62 | 126.18 | 120.12 | 129 |
list-aa-soa | 126.34 | 116.90 | 129.45 | 127.12 | 129.41 | 121.42 | 126.19 | 126.76 | 126.70 | 124.40 | 125.22 | 129 |
list-aa-ria-soa | 133.68 | 121.82 | 126.04 | 128.46 | 131.15 | 132.25 | 128.78 | 133.50 | 126.69 | 124.40 | 130.37 | 145 |
list-aa-pv-soa | 146.22 | 124.39 | 130.73 | 136.29 | 137.61 | 131.21 | 138.65 | 138.78 | 127.02 | 132.40 | 138.37 | 145 |
memory bandwidth:
geometry dimensions: 500x100x100
kernel | pipe | blocks-2 | blocks-4 | blocks-6 | blocks-8 | blocks-10 | blocks-15 | blocks-16 | blocks-20 | blocks-25 | blocks-32 | RFM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
blk-push-aos | 55.75 | 47.62 | 54.57 | 57.10 | 58.49 | 59.00 | 61.72 | 60.56 | 64.05 | 61.10 | 66.03 | 105 |
blk-push-soa | 30.06 | 31.09 | 32.13 | 32.54 | 32.74 | 32.72 | 33.81 | 33.19 | 34.90 | 33.21 | 35.75 | 105 |
blk-pull-aos | 53.80 | 48.61 | 53.08 | 54.99 | 56.08 | 56.68 | 59.20 | 58.12 | 61.49 | 58.71 | 63.45 | 105 |
blk-pull-soa | 46.96 | 46.61 | 48.84 | 49.70 | 50.33 | 50.46 | 52.36 | 51.39 | 54.20 | 51.61 | 55.71 | 105 |
aa-aos | 91.40 | 66.99 | 78.47 | 83.38 | 86.62 | 88.62 | 92.98 | 91.54 | 97.08 | 94.93 | 98.90 | 168 |
aa-soa | 83.01 | 69.96 | 75.85 | 77.72 | 79.01 | 79.29 | 82.38 | 80.11 | 85.70 | 83.91 | 87.69 | 168 |
aa-vec-soa | 112.03 | 96.52 | 105.32 | 109.76 | 112.55 | 113.82 | 120.55 | 118.37 | 126.30 | 121.37 | 131.94 | 168 |
list-push-aos | 75.13 | 74.18 | 75.20 | 75.42 | 75.24 | 75.99 | 75.80 | 75.80 | 75.54 | 76.22 | 76.21 | 97 |
list-push-soa | 40.99 | 38.14 | 39.00 | 38.89 | 38.89 | 39.67 | 39.87 | 39.28 | 39.35 | 40.08 | 40.13 | 97 |
list-pull-aos | 82.07 | 82.88 | 83.29 | 83.09 | 83.32 | 83.49 | 82.82 | 82.88 | 83.32 | 82.60 | 82.93 | 97 |
list-pull-soa | 62.07 | 60.40 | 61.89 | 61.39 | 62.43 | 60.90 | 60.48 | 62.80 | 62.50 | 61.10 | 60.38 | 97 |
list-pull-split-nt-1s-soa | 125.81 | 120.60 | 121.96 | 122.34 | 122.86 | 123.53 | 123.64 | 123.67 | 125.94 | 124.09 | 123.69 | 128 |
list-pull-split-nt-2s-soa | 122.79 | 117.16 | 118.86 | 119.16 | 119.56 | 119.99 | 120.01 | 120.03 | 122.64 | 120.57 | 120.39 | 128 |
list-aa-aos | 128.13 | 127.41 | 129.31 | 129.07 | 129.79 | 129.63 | 129.67 | 129.94 | 129.12 | 128.41 | 129.72 | 150 |
list-aa-soa | 141.60 | 139.78 | 141.58 | 142.16 | 141.94 | 141.31 | 142.37 | 142.25 | 142.43 | 141.40 | 142.26 | 150 |
list-aa-ria-soa | 141.82 | 134.88 | 140.15 | 140.72 | 141.67 | 140.51 | 141.18 | 141.29 | 142.97 | 141.94 | 143.25 | 168 |
list-aa-pv-soa | 164.79 | 140.95 | 159.24 | 161.78 | 162.40 | 163.04 | 164.69 | 164.38 | 165.11 | 165.75 | 166.09 | 168 |
memory bandwidth:
geometry dimensions: 500x100x100
kernel | pipe | blocks-2 | blocks-4 | blocks-6 | blocks-8 | blocks-10 | blocks-15 | blocks-16 | blocks-20 | blocks-25 | blocks-32 | RFM |
---|---|---|---|---|---|---|---|---|---|---|---|---|
blk-push-aos | 113.01 | 93.99 | 108.98 | 114.65 | 117.87 | 119.47 | 124.95 | 122.46 | 129.29 | 123.87 | 133.01 | 197 |
blk-push-soa | 100.21 | 98.87 | 103.63 | 105.56 | 107.02 | 107.27 | 111.61 | 109.83 | 116.16 | 110.51 | 110.29 | 197 |
blk-pull-aos | 118.45 | 102.54 | 114.12 | 117.82 | 122.69 | 124.31 | 130.58 | 127.85 | 135.72 | 129.65 | 139.94 | 197 |
blk-pull-soa | 82.60 | 83.36 | 87.13 | 88.39 | 88.84 | 88.96 | 92.48 | 90.93 | 95.79 | 91.92 | 98.64 | 197 |
aa-aos | 171.32 | 125.43 | 147.73 | 157.70 | 163.35 | 167.25 | 175.39 | 174.20 | 182.54 | 173.67 | 187.76 | 308 |
aa-soa | 180.85 | 152.39 | 165.84 | 152.59 | 171.90 | 175.76 | 184.94 | 182.34 | 189.43 | 180.30 | 193.54 | 308 |
aa-vec-soa | 208.03 | 181.51 | 195.86 | 203.41 | 209.08 | 212.34 | 224.05 | 219.49 | 234.31 | 225.92 | 245.22 | 308 |
list-push-aos | 158.81 | 164.67 | 162.93 | 163.05 | 165.22 | 164.31 | 164.66 | 160.78 | 164.07 | 165.19 | 164.06 | 177 |
list-push-soa | 134.60 | 110.44 | 110.17 | 132.01 | 132.95 | 133.46 | 134.37 | 134.33 | 135.12 | 134.91 | 137.87 | 177 |
list-pull-aos | 169.61 | 170.03 | 170.89 | 170.90 | 171.20 | 171.60 | 172.09 | 171.95 | 169.48 | 172.08 | 171.02 | 177 |
list-pull-soa | 120.50 | 116.73 | 118.62 | 118.00 | 120.99 | 118.15 | 117.17 | 121.41 | 120.83 | 120.00 | 118.74 | 177 |
list-pull-split-nt-1s-soa | 225.59 | 224.18 | 225.10 | 226.34 | 226.01 | 230.37 | 227.50 | 228.42 | 227.39 | 231.65 | 227.35 | 246 |
list-pull-split-nt-2s-soa | 219.20 | 214.63 | 217.61 | 218.13 | 219.07 | 221.01 | 219.88 | 220.09 | 220.62 | 221.68 | 220.58 | 246 |
list-aa-aos | 241.39 | 239.27 | 239.53 | 242.56 | 242.46 | 243.00 | 242.91 | 242.46 | 241.24 | 242.96 | 241.52 | 275 |
list-aa-soa | 273.73 | 268.49 | 268.48 | 271.79 | 275.29 | 274.56 | 277.18 | 272.67 | 274.21 | 275.24 | 278.21 | 275 |
list-aa-ria-soa | 288.42 | 261.89 | 273.26 | 284.84 | 283.88 | 288.29 | 290.72 | 289.81 | 293.36 | 290.75 | 292.93 | 308 |
list-aa-pv-soa | 303.35 | 267.21 | 289.18 | 294.96 | 294.36 | 298.16 | 300.45 | 301.71 | 302.37 | 302.88 | 304.46 | 308 |
TODO
This work was funded by BMBF, grant no. 01IH15003A (project SKAMPY).
This work was funded by KONWHIR project OMI4PAPS.
[ginzburg-2008] | I. Ginzburg, F. Verhaeghe, and D. d'Humières. Two-relaxation-time lattice Boltzmann scheme: About parametrization, velocity, pressure and mixed boundary conditions. Commun. Comput. Phys., 3(2):427-478, 2008. |
[williams-2008] | S. Williams, A. Waterman, and D. Patterson. Roofline: an insightful visual performance model for multicore architectures. Commun. ACM, 52(4):65-76, Apr 2009. doi:10.1145/1498765.1498785 |
Document was generated at 2017-11-21 15:43.