Code name |
FALL3D |
Developer(s) |
Arnau Folch, Leonardo Mingari, Natalia Gutierrez, Antonio Costa, Giovanni Macedonio and Eduardo Cabrera
Contact person: Arnau Folch
|
Link |
https://gitlab.com/fall3d-distribution
|
Short description |
Open-source, off-line Eulerian model for atmospheric passive transport and deposition, based on the Advection-Diffusion-Sedimentation (ADS) equation. In FALL3D-8.0 (Folch et al., 2020; Prata et al., 2020), the ADS equation has been extended to handle passive transport of substances other than tephra (aerosols, particles and radionuclides). |
Original code level |
3 |
Current code level |
2 |
Pilot(s) involved |
PD3, PD6, PD12
|
Co-design |
Indirectly. The mini-app “MiniAero” is an explicit (RK4) unstructured finite-volume code that solves the compressible Navier-Stokes equations and is applicable to FALL3D. It includes both inviscid and viscous terms; the viscous terms can optionally be excluded.
A FALL3D mini-app was developed for both CPU and GPU, using OpenACC directives for the GPU version. The mini-app runs only the main FALL3D task on a fixed problem, i.e., a helicoidal velocity field. This task fully solves the main computational kernel: a set of advection-diffusion-sedimentation (ADS) equations on a structured grid, using a second-order explicit finite-volume scheme (Euler or fourth-order Runge-Kutta in time). As stated, the ADS kernel is computed entirely (100%) on GPU cards. |
Main results and References |
MAIN RESULTS:
- New code release (8.0); ensemble-based forecasts with parallel data assimilation implemented in the 8.1 release.
- Better communication/computation ratio obtained by refactoring the original code version (7.3). The speedup increases with the number of cores, up to a factor of 4.3x.
- GPU porting: solver kernels migrated to GPUs using OpenACC directives; the GPU version improves on the CPU version by almost 60%.
- Vectorization: directives added to enforce vectorization of loops that the compiler does not auto-vectorize. Speedup of up to 1.2x achieved.
- Parallel I/O: a parallel I/O strategy for NetCDF files implemented, improving performance by up to a factor of 2x.
- GPU porting is almost complete: approximately 90% of the main FALL3D task is ported to the accelerators.
REFERENCES:
- Mingari, L., Folch, A., Prata, A. T., Pardini, F., Macedonio, G., and Costa, A.: Data assimilation of volcanic aerosol observations using FALL3D+PDAF, Atmos. Chem. Phys., 22, 1773–1792, https://doi.org/10.5194/acp-22-1773-2022, 2022.
- Folch, A., Mingari, L., and Prata, A. T.: Ensemble-Based Forecast of Volcanic Clouds Using FALL3D-8.1, Front. Earth Sci., 9, 741841, https://doi.org/10.3389/feart.2021.741841, 2022.
- Prata, A. T., Mingari, L., Folch, A., Macedonio, G., and Costa, A.: FALL3D-8.0: a computational model for atmospheric transport and deposition of particles, aerosols and radionuclides – Part 2: Model validation, Geosci. Model Dev., 14, 409–436, https://doi.org/10.5194/gmd-14-409-2021, 2021.
- Folch, A., Mingari, L., Gutierrez, N., Hanzich, M., Costa, A., and Macedonio, G.: FALL3D-8.0: a computational model for atmospheric transport and deposition of particles and aerosols. Part I: model physics and numerics, Geosci. Model Dev., https://doi.org/10.5194/gmd-13-1431-2020, 2020.
|
Performance results |
Strong scalability tests on up to 10000 cores (irene-rome and MN-4) show parallel efficiency above 80%.
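For readers unfamiliar with the metric, strong-scaling parallel efficiency is usually taken as the measured speedup relative to a reference run divided by the ideal speedup (the ratio of core counts). The sketch below illustrates that definition only; the function name and the timings are hypothetical and are not FALL3D measurements.

```python
def strong_scaling_efficiency(t_ref, cores_ref, t_run, cores_run):
    """Parallel efficiency of a strong-scaling run against a reference run.

    t_ref, t_run: wall-clock times (same problem size) in seconds.
    cores_ref, cores_run: core counts of the reference and scaled runs.
    """
    measured_speedup = t_ref / t_run
    ideal_speedup = cores_run / cores_ref
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    # Hypothetical timings chosen only to illustrate an 80% efficiency.
    eff = strong_scaling_efficiency(t_ref=1000.0, cores_ref=100,
                                    t_run=12.5, cores_run=10000)
    print(f"parallel efficiency: {eff:.0%}")
```

An efficiency of 80% at 100 times more cores means the run is 80 times faster than the reference, not 100 times.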
FALL3D Mini-app
The Mini-fall3d GPU version (MPI + OpenACC) is almost 7x faster than the previous version (MPI + OpenACC + OpenMP) on a medium-size problem (400x400x240 grid cells).
A speedup of circa 29x has been achieved compared to the plain MPI version.
Full FALL3D
The full FALL3D GPU version (MPI + OpenACC) is almost 6x faster than the original version (MPI) on a large problem (1000x760x64 grid cells). |
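As an illustration of the kind of kernel the mini-app exercises (an explicit finite-volume update of an ADS equation, advanced with RK4), here is a minimal 1D sketch in Python. It is not FALL3D code: the periodic domain, the central face fluxes, the constant velocity `u` and diffusivity `k`, and all function names are assumptions made for brevity. In one dimension the sedimentation term acts as an extra advective velocity, so it is assumed here to be folded into `u`.

```python
import math


def ads_tendency(c, u, k, dx):
    """dc/dt from face fluxes on a periodic 1D grid (central, second order)."""
    n = len(c)
    flux = []  # flux[i] is the flux through the face between cell i and i+1
    for i in range(n):
        right = c[(i + 1) % n]
        advective = u * 0.5 * (c[i] + right)
        diffusive = -k * (right - c[i]) / dx
        flux.append(advective + diffusive)
    # flux[i - 1] is the left face of cell i (flux[-1] wraps periodically)
    return [-(flux[i] - flux[i - 1]) / dx for i in range(n)]


def rk4_step(c, dt, u, k, dx):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(state, slope, a):
        return [s + a * r for s, r in zip(state, slope)]

    k1 = ads_tendency(c, u, k, dx)
    k2 = ads_tendency(shifted(c, k1, 0.5 * dt), u, k, dx)
    k3 = ads_tendency(shifted(c, k2, 0.5 * dt), u, k, dx)
    k4 = ads_tendency(shifted(c, k3, dt), u, k, dx)
    return [ci + dt / 6.0 * (a + 2.0 * b + 2.0 * g + d)
            for ci, a, b, g, d in zip(c, k1, k2, k3, k4)]


def gaussian(n, center, sigma):
    """Initial concentration: a Gaussian puff sampled at cell indices."""
    return [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]


def integrate(c, steps, dt, u, k, dx):
    """Advance the concentration field by `steps` RK4 steps."""
    for _ in range(steps):
        c = rk4_step(c, dt, u, k, dx)
    return c


if __name__ == "__main__":
    c0 = gaussian(100, center=30.0, sigma=5.0)
    c1 = integrate(c0, steps=40, dt=0.5, u=1.0, k=0.1, dx=1.0)
    print("mass before:", sum(c0), "mass after:", sum(c1))
    print("peak cell after transport:", c1.index(max(c1)))
```

Because each face flux is added to one cell and subtracted from its neighbour, the total mass is conserved up to round-off, which is the main reason for writing the update in flux form. Central fluxes are used only to keep the sketch short; they do not guarantee positivity near sharp gradients.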