HIP and MPI+HIP updates #361

ohearnk · 2024-04-15T03:06:23Z

NOTE: do not merge this until the issues below are resolved.

This MR ports the updated CUDA and MPI+CUDA codes (including recently merged f-function optimizations) to HIP and MPI+HIP versions, respectively. Also, this MR begins to unify the CUDA and HIP versions to simplify future GPU code maintenance.

Closes #344.

ohearnk · 2024-04-15T03:23:18Z

As noted above, all tests (full test suite) are passing for the CUDA and MPI+CUDA (1 GPU) versions. However, some tests are failing for the HIP and MPI+HIP versions. See the logs below from tests on the MI210s on the AMD AAC. Interestingly, the test failures are slightly different between the HIP and MPI+HIP versions.

This is somewhat difficult to debug, at least when comparing against the working HIP / MPI+HIP versions from the 23.08b release, because that release also shows a number of test failures. It may be better to pick a commit from before the f-function optimizations and run the tests there for comparison.

Test configuration on the AMD AAC:

RHEL9 partition (1CN128C8G2H_2IB_MI210_RHEL9)
ROCm v5.7.1, UCX v1.15.0, OpenMPI v4.1.6, GCC v11.3.1 (gfortran)
f-function support disabled
CMake configuration (HIP version):

cmake .. -DCOMPILER=MANUAL -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_Fortran_COMPILER=gfortran -DMPI= -DHIP=TRUE -DQUICK_USER_ARCH=gfx90a -DENABLEF= -DCMAKE_INSTALL_PREFIX=${PWD}/../install_rhel9_hip_gfx90a_rocm5.7.1_ucx1.15.0_ompi4.1.6 -DHIP_TOOLKIT_ROOT_DIR=/shared/apps/rhel9/opt/rocm-5.7.1

HIP test summary and diffs:
runtest_hip.log
hip_test_diffs.log

MPI+HIP test summary and diffs:
runtest_mpi_hip_1gpu.log
mpi_hip_test_diffs.log

…regarding STORE_OPERATOR). Fix segfault in debug builds of GPU code when ERI f-function support is not enabled but the basis contains f functions. Remove unneeded DGEMM operation in CUDA codes in SCF/USCF methods. Other code clean-up.

Port updated CUDA and MPI+CUDA codes (f function support) to HIP and …

c388fde

…MPI+HIP codes. Unify CUDA and HIP code paths (CUDA / HIP => GPU, CUDA_MPIV / HIP_MPIV => MPIV_GPU, etc.).
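
The commit message does not show how the unified names are introduced. As a minimal sketch of one possible approach, assuming the backend-specific macros (CUDA, HIP, CUDA_MPIV, HIP_MPIV) arrive as compile definitions, the backend-neutral GPU and MPIV_GPU macros could be derived once so that shared code tests only the unified names. The helper functions below are hypothetical and exist only to make the result observable:

```cpp
#include <cassert>

// Assumption for this sketch: stands in for a -DHIP compile definition.
#define HIP

// Derive backend-neutral macros from the backend-specific ones.
#if defined(CUDA) || defined(HIP)
#define GPU
#endif
#if defined(CUDA_MPIV) || defined(HIP_MPIV)
#define MPIV_GPU
#endif

// Hypothetical helper: shared code checks GPU instead of CUDA or HIP.
inline bool is_gpu_build() {
#if defined(GPU)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: shared code checks MPIV_GPU instead of CUDA_MPIV or HIP_MPIV.
inline bool is_mpi_gpu_build() {
#if defined(MPIV_GPU)
    return true;
#else
    return false;
#endif
}

// Sanity check for the configuration simulated above (HIP only, no MPI).
inline void check_unified_macros() {
    assert(is_gpu_build());
    assert(!is_mpi_gpu_build());
}
```

With this layout, code common to both backends needs a single conditional per feature, and only the genuinely backend-specific parts still test CUDA or HIP directly.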

ohearnk requested review from agoetz and Madu86 April 15, 2024 03:06

ohearnk self-assigned this Apr 15, 2024

ohearnk added 11 commits April 17, 2024 21:47

Merge branch 'master' into hip-f-func-porting.

534ea10

Merge branch 'master' into hip-f-func-porting.

43e5ad9

Fix uninitialized variable usage.

a77c08c

Add missed file during CUDA source conversion via hipify-perl (*.cuh).

5e3f244

Fix source file permissions. Remove unused code.

6510281

Merge branch 'master' into hip-f-func-porting.

4815e0b

Deduplicate GPU codes (CUDA/HIP). Change several static constants to …

779130d

…preprocessor definitions for performance and storage considerations. Refactor preprocessor definitions to avoid unnecessary arithmetic.

Fix include path for HIP builds. Match preprocessor controlled code p…

c0a1a64

…aths for older HIP builds.

Conditionally compile ROCsolver code (for SCF diagonalizations) if to…

0bf89e1

…ggled on in CMake build.

Further GPU code deduplication. Use faster math functions for simple …

f937da6

…power functions (inlined device functions calling pow to preprocessor definitions using multiplication operations). Other code clean-up.
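
To illustrate the power-function change described in the last commit above, here is a minimal host-side sketch of replacing a pow-based helper with multiplication-based preprocessor definitions. The names cube_with_pow, POW_2, POW_3, and POW_4 are hypothetical; the actual definitions in the PR are not shown in this thread, and the real helpers are device functions rather than plain host functions.

```cpp
#include <cassert>
#include <cmath>

// Before (hypothetical): an inlined helper that calls the general-purpose pow().
inline double cube_with_pow(double x) { return std::pow(x, 3.0); }

// After (hypothetical): preprocessor definitions that expand to multiplications.
// Each argument is parenthesized so expressions such as (a + b) expand correctly.
// The argument is evaluated more than once, so it must be free of side effects.
#define POW_2(x) ((x) * (x))
#define POW_3(x) ((x) * (x) * (x))
#define POW_4(x) (POW_2(x) * POW_2(x))

// Sanity check that the macros agree with the pow-based helper on small inputs.
inline void check_power_macros() {
    assert(POW_2(3.0) == 9.0);
    assert(POW_3(2.0) == cube_with_pow(2.0));
    assert(POW_4(1.0 + 1.0) == 16.0);
}
```

For small fixed integer exponents, the multiplications avoid a call into the general pow routine, which has to handle arbitrary real exponents.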

ohearnk force-pushed the hip-f-func-porting branch from fbd9602 to f937da6 Compare June 26, 2024 18:33

ohearnk added 2 commits July 1, 2024 11:09

Remove unnecessary DGEMM in SCF for CUDA GPU codepaths.

815b8c9

Ensure QUICK GPU architectures are always set correctly for HIP build…

72782c8

…s. Add CMake option to enable LLVM-based address sanitizer (ASAN) for debugging with HIP builds.
