TUTCRIS - Tampere University of Technology

High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor

Research output: peer-reviewed

Standard

High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor. / Robertsén, Fredrik; Mattila, Keijo; Westerholm, Jan.

In: Concurrency Computation, Vol. 31, No. 13, e5072, 10.07.2019.

Harvard

Robertsén, F, Mattila, K & Westerholm, J 2019, 'High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor', Concurrency Computation, vol. 31, no. 13, e5072. https://doi.org/10.1002/cpe.5072

APA

Robertsén, F., Mattila, K., & Westerholm, J. (2019). High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor. Concurrency Computation, 31(13), [e5072]. https://doi.org/10.1002/cpe.5072

Vancouver

Robertsén F, Mattila K, Westerholm J. High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor. Concurrency Computation. 2019 Jul 10;31(13). e5072. https://doi.org/10.1002/cpe.5072

Author

Robertsén, Fredrik ; Mattila, Keijo ; Westerholm, Jan. / High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor. In: Concurrency Computation. 2019 ; Vol. 31, No. 13.

BibTeX

@article{0780d0ff519246728af1b82f381aa1f3,
title = "High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor",
abstract = "We present a high-performance implementation of the lattice-Boltzmann method (LBM) on the Knights Landing generation of Xeon Phi. The Knights Landing architecture includes 16GB of high-speed memory (MCDRAM) with a reported bandwidth of over 400 GB/s, and a subset of the AVX-512 single instruction multiple data (SIMD) instruction set. We explain five critical implementation aspects for high performance on this architecture: (1) the choice of appropriate LBM algorithm, (2) suitable data layout, (3) vectorization of the computation, (4) data prefetching, and (5) running our LBM simulations exclusively from the MCDRAM. The effects of these implementation aspects on the computational performance are demonstrated with the lattice-Boltzmann scheme involving the D3Q19 discrete velocity set and the TRT collision operator. In our benchmark simulations of fluid flow through porous media, using double-precision floating-point arithmetic, the observed performance exceeds 960 million fluid lattice site updates per second.",
keywords = "Lattice Boltzmann, prefetching, SIMD, Xeon Phi",
author = "Fredrik Roberts{\'e}n and Keijo Mattila and Jan Westerholm",
year = "2019",
month = "7",
day = "10",
doi = "10.1002/cpe.5072",
language = "English",
volume = "31",
journal = "Concurrency and Computation: Practice and Experience",
issn = "1532-0626",
publisher = "Wiley",
number = "13",
}

RIS (suitable for import to EndNote)

TY  - JOUR
T1  - High-performance SIMD implementation of the lattice-Boltzmann method on the Xeon Phi processor
AU  - Robertsén, Fredrik
AU  - Mattila, Keijo
AU  - Westerholm, Jan
PY  - 2019/7/10
Y1  - 2019/7/10
N2  - We present a high-performance implementation of the lattice-Boltzmann method (LBM) on the Knights Landing generation of Xeon Phi. The Knights Landing architecture includes 16GB of high-speed memory (MCDRAM) with a reported bandwidth of over 400 GB/s, and a subset of the AVX-512 single instruction multiple data (SIMD) instruction set. We explain five critical implementation aspects for high performance on this architecture: (1) the choice of appropriate LBM algorithm, (2) suitable data layout, (3) vectorization of the computation, (4) data prefetching, and (5) running our LBM simulations exclusively from the MCDRAM. The effects of these implementation aspects on the computational performance are demonstrated with the lattice-Boltzmann scheme involving the D3Q19 discrete velocity set and the TRT collision operator. In our benchmark simulations of fluid flow through porous media, using double-precision floating-point arithmetic, the observed performance exceeds 960 million fluid lattice site updates per second.
AB  - We present a high-performance implementation of the lattice-Boltzmann method (LBM) on the Knights Landing generation of Xeon Phi. The Knights Landing architecture includes 16GB of high-speed memory (MCDRAM) with a reported bandwidth of over 400 GB/s, and a subset of the AVX-512 single instruction multiple data (SIMD) instruction set. We explain five critical implementation aspects for high performance on this architecture: (1) the choice of appropriate LBM algorithm, (2) suitable data layout, (3) vectorization of the computation, (4) data prefetching, and (5) running our LBM simulations exclusively from the MCDRAM. The effects of these implementation aspects on the computational performance are demonstrated with the lattice-Boltzmann scheme involving the D3Q19 discrete velocity set and the TRT collision operator. In our benchmark simulations of fluid flow through porous media, using double-precision floating-point arithmetic, the observed performance exceeds 960 million fluid lattice site updates per second.
KW  - Lattice Boltzmann
KW  - prefetching
KW  - SIMD
KW  - Xeon Phi
U2  - 10.1002/cpe.5072
DO  - 10.1002/cpe.5072
M3  - Article
VL  - 31
JO  - Concurrency and Computation: Practice and Experience
JF  - Concurrency and Computation: Practice and Experience
SN  - 1532-0626
IS  - 13
M1  - e5072
ER  -