1gpu test fail with severely constrained memory #35

Open
abouteiller opened this issue Mar 27, 2021 · 1 comment
Labels: bug, high priority

Comments

@abouteiller
Contributor

Original report by me.


The following command launches a GPU test with severe memory constraints (only half the matrix fits in the block allocation).

The test passes about half the time and crashes the other half.

ASAN_OPTIONS=suppressions=$HOME/parsec/master/parsec/contrib/asan.supp LSAN_OPTIONS=exitcode=0:detect_leaks=0:suppressions=$HOME/parsec/master/parsec/contrib/lsan.supp /usr/bin/srun --pty  "-Ccauchy" "-N" "1" "-Cgtx1060" gdb --args "./tests/testing_dpotrf" "-N" "3200" "-t" "320" "-x" "-v=5" "-g" "1" "-P" "1"  "--" "--mca" "device_cuda_memory_number_of_blocks" "80"

It turns out that in some cases the stage_out function pointer is NULL, possibly when the task is bounced because there is not enough memory to hold all three tiles.

Thread 16 "testing_dpotrf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff77fff700 (LWP 129089)]
0x0000000000000000 in ?? ()
(gdb) bt
#0    0x0000000000000000 in No symbol matches 0x0000000000000000. () at None
#1    0x00007ffff2c10397 in parsec_gpu_kernel_pop (gpu_device=0x616000005a00, gpu_task=0x615000190000, gpu_stream=0x6190000104c8)
                         at /home/bouteill/parsec/master/parsec/parsec/mca/device/cuda/device_cuda_module.c:2523
#2    0x00007ffff2c0f2aa in progress_stream (gpu_device=0x616000005a00, stream=0x6190000104c8, upstream_progress_fct=0x7ffff2c102ba <parsec_gpu_kernel_pop>, task=0x615000190000, out_task=0x7fff77fe0cb0)
                         at /home/bouteill/parsec/master/parsec/parsec/mca/device/cuda/device_cuda_module.c:2295
#3    0x00007ffff2c1219e in parsec_gpu_kernel_scheduler (es=0x6140000a0000, gpu_task=0x615000190000, which_gpu=2)
                         at /home/bouteill/parsec/master/parsec/parsec/mca/device/cuda/device_cuda_module.c:2975
#4    0x00007ffff42d0cbe in hook_of_dpotrf_U_potrf_dtrsm_CUDA (es=0x6140000a0000, this_task=0x61800019c400)
                         at src/dpotrf_U.c:7519
#5    0x00007ffff2be6f36 in __parsec_execute (es=0x6140000a0000, task=0x61800019c400)
                         at /home/bouteill/parsec/master/parsec/parsec/scheduling.c:172
#6    0x00007ffff2be7a1a in __parsec_task_progress (es=0x6140000a0000, task=0x61800019c400, distance=0)
                         at /home/bouteill/parsec/master/parsec/parsec/scheduling.c:431
#7    0x00007ffff2be7e6e in __parsec_context_wait (es=0x6140000a0000)
                         at /home/bouteill/parsec/master/parsec/parsec/scheduling.c:560
#8    0x00007ffff2bcabc2 in __parsec_thread_init (startup=0x6150000021c0)
                         at /home/bouteill/parsec/master/parsec/parsec/parsec.c:291
#9    0x00007fffeae54dd5 in start_thread + 0xc5 () at /usr/lib64/libpthread.so.0
#10   0x00007fffb4c8aead in clone + 0x6d () at /usr/lib64/libc.so.6
(gdb) up
#1  0x00007ffff2c10397 in parsec_gpu_kernel_pop (gpu_device=0x616000005a00, gpu_task=0x615000190000, gpu_stream=0x6190000104c8)
    at /home/bouteill/parsec/master/parsec/parsec/mca/device/cuda/device_cuda_module.c:2523
2523                if(PARSEC_SUCCESS != gpu_task->stage_out(gpu_task, (1U << i), gpu_stream)){
(gdb) list
2518            for( i = 0; i < this_task->locals[0].value; i++ ) {
2519                gpu_copy = this_task->data[i].data_out;
2520                /* If the gpu copy is not owned by parsec, we don't manage it at all */
2521                if( 0 == (gpu_copy->flags & PARSEC_DATA_FLAG_PARSEC_OWNED) ) continue;
2522                original = gpu_copy->original;
2523                if(PARSEC_SUCCESS != gpu_task->stage_out(gpu_task, (1U << i), gpu_stream)){
2524                    parsec_warning( "%s:%d %s", __FILE__, __LINE__,
2525                                    "gpu_task->stage_out from device ");
2526                    parsec_warning("data %s <<%p>> -> <<%p>>\n", this_task->task_class->out[i]->name,
2527                                    gpu_copy->device_private, original->device_copies[0]->device_private);
(gdb) p gpu_task->stage_out
$1 = (parsec_stage_out_function_t *) 0x0
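
Until the root cause is understood, a guard before the call would at least turn the segfault into a reportable error. Below is a minimal sketch of that idea, assuming simplified stand-in types: everything prefixed `sketch_` is hypothetical and is not part of PaRSEC. It does not explain why the pointer is unset in the first place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; these are NOT the PaRSEC types. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

typedef struct sketch_gpu_task_s sketch_gpu_task_t;
typedef int (*sketch_stage_out_fn_t)(sketch_gpu_task_t *task, unsigned flow_mask);

struct sketch_gpu_task_s {
    sketch_stage_out_fn_t stage_out;   /* may be NULL, as seen in the gdb session */
    int                   nb_flows;    /* number of flows to stage out */
    unsigned              staged_mask; /* records which flows were staged out */
};

/* Example callback: marks the requested flows as staged out. */
int sketch_stage_out(sketch_gpu_task_t *task, unsigned flow_mask)
{
    task->staged_mask |= flow_mask;
    return SKETCH_SUCCESS;
}

/* Stage out every flow, but refuse to call through a NULL function pointer. */
int sketch_kernel_pop(sketch_gpu_task_t *task)
{
    assert(NULL != task);
    if (NULL == task->stage_out) {
        fprintf(stderr, "%s:%d stage_out is NULL, cannot stage out task %p\n",
                __FILE__, __LINE__, (void *)task);
        return SKETCH_ERROR;
    }
    for (int i = 0; i < task->nb_flows; i++) {
        if (SKETCH_SUCCESS != task->stage_out(task, 1U << i)) {
            return SKETCH_ERROR;
        }
    }
    return SKETCH_SUCCESS;
}
```

With a task whose `stage_out` is set, `sketch_kernel_pop` stages out all flows. With a NULL pointer it returns `SKETCH_ERROR` and logs the location instead of jumping to address 0.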

@abouteiller added the high priority label and removed the critical label on Feb 28, 2022
@abouteiller
Contributor Author

Still present; it also manifests in ctest.

srun: job 453777 queued and waiting for resources
srun: job 453777 has been allocated resources
W@00000 /!\ DEBUG LEVEL WILL PROBABLY REDUCE THE PERFORMANCE OF THIS RUN /!\.
i@00001 GPU Device 0 (capability 6.1): NVIDIA GeForce GTX 1060 6GB: cuda(0)
        Location (PCI Bus/Device/Domain): 2:0.0
        SM                 : 10
        clockRate (GHz)    : 1.71
        concurrency        : yes
        computeMode        : 0
        Peak Memory Bandwidth (GB/s): 192.19 [Clock Rate (Khz) 4004000 | Bus Width (bits) 192]
        peak Gflops         : double 136.680, single 4373.760 tensor 0.000 half 68.340
i@00002 GPU Device 0 (capability 6.1): NVIDIA GeForce GTX 1060 6GB: cuda(0)
        Location (PCI Bus/Device/Domain): 2:0.0
        SM                 : 10
        clockRate (GHz)    : 1.71
        concurrency        : yes
        computeMode        : 0
        Peak Memory Bandwidth (GB/s): 192.19 [Clock Rate (Khz) 4004000 | Bus Width (bits) 192]
        peak Gflops         : double 136.680, single 4373.760 tensor 0.000 half 68.340
i@00003 GPU Device 0 (capability 6.1): NVIDIA GeForce GTX 1060 6GB: cuda(0)
        Location (PCI Bus/Device/Domain): 2:0.0
        SM                 : 10
        clockRate (GHz)    : 1.71
        concurrency        : yes
        computeMode        : 0
        Peak Memory Bandwidth (GB/s): 192.19 [Clock Rate (Khz) 4004000 | Bus Width (bits) 192]
        peak Gflops         : double 136.680, single 4373.760 tensor 0.000 half 68.340
i@00000 GPU Device 0 (capability 6.1): NVIDIA GeForce GTX 1060 6GB: cuda(0)
        Location (PCI Bus/Device/Domain): 2:0.0
        SM                 : 10
        clockRate (GHz)    : 1.71
        concurrency        : yes
        computeMode        : 0
        Peak Memory Bandwidth (GB/s): 192.19 [Clock Rate (Khz) 4004000 | Bus Width (bits) 192]
        peak Gflops         : double 136.680, single 4373.760 tensor 0.000 half 68.340
i@00001 GPU Device 1 (capability 6.1): NVIDIA GeForce GTX 1060 6GB: cuda(1)
        Location (PCI Bus/Device/Domain): 4:0.0
        SM                 : 10
        clockRate (GHz)    : 1.71
        concurrency        : yes
        computeMode        : 0
        Peak Memory Bandwidth (GB/s): 192.19 [Clock Rate (Khz) 4004000 | Bus Width (bits) 192]
        peak Gflops         : double 136.680, single 4373.760 tensor 0.000 half 68.340
i@00002 GPU Device 1 (capability 6.1): NVIDIA GeForce GTX 1060 6GB: cuda(1)
        Location (PCI Bus/Device/Domain): 4:0.0
        SM                 : 10
        clockRate (GHz)    : 1.71
        concurrency        : yes
        computeMode        : 0
        Peak Memory Bandwidth (GB/s): 192.19 [Clock Rate (Khz) 4004000 | Bus Width (bits) 192]
        peak Gflops         : double 136.680, single 4373.760 tensor 0.000 half 68.340
i@00001 CPU Device: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
        Parsec Streams     : 20
        clockRate (GHz)    : 2.30
        peak Gflops        : double 184.0000, single 368.0000
i@00003 GPU Device 1 (capability 6.1): NVIDIA GeForce GTX 1060 6GB: cuda(1)
        Location (PCI Bus/Device/Domain): 4:0.0
        SM                 : 10
        clockRate (GHz)    : 1.71
        concurrency        : yes
        computeMode        : 0
        Peak Memory Bandwidth (GB/s): 192.19 [Clock Rate (Khz) 4004000 | Bus Width (bits) 192]
        peak Gflops         : double 136.680, single 4373.760 tensor 0.000 half 68.340
i@00002 CPU Device: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
        Parsec Streams     : 20
        clockRate (GHz)    : 2.30
        peak Gflops        : double 184.0000, single 368.0000
i@00003 CPU Device: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
        Parsec Streams     : 20
        clockRate (GHz)    : 2.30
        peak Gflops        : double 184.0000, single 368.0000
i@00000 GPU Device 1 (capability 6.1): NVIDIA GeForce GTX 1060 6GB: cuda(1)
        Location (PCI Bus/Device/Domain): 4:0.0
        SM                 : 10
        clockRate (GHz)    : 1.71
        concurrency        : yes
        computeMode        : 0
        Peak Memory Bandwidth (GB/s): 192.19 [Clock Rate (Khz) 4004000 | Bus Width (bits) 192]
        peak Gflops         : double 136.680, single 4373.760 tensor 0.000 half 68.340
i@00000 CPU Device: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
        Parsec Streams     : 20
        clockRate (GHz)    : 2.30
        peak Gflops        : double 184.0000, single 368.0000
#+++++ cores detected       : 20
#+++++ nodes x cores + gpu  : 4 x 20 + 2 (80+8)
#+++++ thread mode          : THREAD_SERIALIZED
#+++++ P x Q                : 2 x 2 (4/4)
#+++++ M x N x K|NRHS       : 1940 x 1940 x 1940
#+++++ LDA , LDB , LDC      : 1940 , 1940 , 1940
#+++++ MB x NB              : 320 x 320
[   1] TIME(s)      0.54203 : PaRSEC initialized
Generate matrices ... Done
[   2] TIME(s)      0.53899 : PaRSEC initialized
Generate matrices ... Done
[   3] TIME(s)      0.53739 : PaRSEC initialized
Generate matrices ... Done
[b03:155037:0:155037] Caught signal 8 (Floating point exception: integer divide by zero)
[   0] TIME(s)      0.54732 : PaRSEC initialized
***************************************************
 ----- TESTING DGEMM (N, N) --------
Generate matrices ... Done
[b04:154188:0:154188] Caught signal 8 (Floating point exception: integer divide by zero)
[b01:1877 :0:1877] Caught signal 8 (Floating point exception: integer divide by zero)
[b02:155626:0:155626] Caught signal 8 (Floating point exception: integer divide by zero)
==== backtrace (tid: 154188) ====
 0  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(ucs_handle_error+0xe4) [0x7fffab2dce14]
==== backtrace (tid:   1877) ====
==== backtrace (tid: 155037) ====
 0  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(ucs_handle_error+0xe4) [0x7fffab2dce14]
 1  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(+0x2b13c) [0x7fffab2dd13c]
 2  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(+0x2b4ba) [0x7fffab2dd4ba]
 3  /usr/lib64/libpthread.so.0(+0xf630) [0x7fffebc28630]
 4  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(+0x4d6b85) [0x7ffff41b2b85]
 5  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(dplasma_dgemm_New_ex+0x318) [0x7ffff41b2edb]
 6  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(dplasma_dgemm_New+0x63) [0x7ffff41b2ff7]
 7  /home/bouteill/parsec/dplasma.debug/tests/./testing_dgemm() [0x405d42]
 8  /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fffc2735555]
 9  /home/bouteill/parsec/dplasma.debug/tests/./testing_dgemm() [0x402b09]
=================================
==== backtrace (tid: 155626) ====
 1  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(+0x2b13c) [0x7fffab2dd13c]
 2  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(+0x2b4ba) [0x7fffab2dd4ba]
 3  /usr/lib64/libpthread.so.0(+0xf630) [0x7fffebc28630]
 4  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(+0x4d6b85) [0x7ffff41b2b85]
 5  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(dplasma_dgemm_New_ex+0x318) [0x7ffff41b2edb]
 6  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(dplasma_dgemm_New+0x63) [0x7ffff41b2ff7]
 7  /home/bouteill/parsec/dplasma.debug/tests/./testing_dgemm() [0x405d42]
 8  /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fffc2735555]
 9  /home/bouteill/parsec/dplasma.debug/tests/./testing_dgemm() [0x402b09]
=================================
 0  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(ucs_handle_error+0xe4) [0x7fffab2dce14]
 1  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(+0x2b13c) [0x7fffab2dd13c]
 2  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(+0x2b4ba) [0x7fffab2dd4ba]
 3  /usr/lib64/libpthread.so.0(+0xf630) [0x7fffebc28630]
 4  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(+0x4d6b85) [0x7ffff41b2b85]
 5  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(dplasma_dgemm_New_ex+0x318) [0x7ffff41b2edb]
 6  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(dplasma_dgemm_New+0x63) [0x7ffff41b2ff7]
 7  /home/bouteill/parsec/dplasma.debug/tests/./testing_dgemm() [0x405d42]
 0  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(ucs_handle_error+0xe4) [0x7fffab2dce14]
 1  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(+0x2b13c) [0x7fffab2dd13c]
 2  /sw/spack/2022-02-10/opt/spack/linux-scientific7-x86_64/gcc-7.3.0/ucx-1.11.2-eu33bafslc3brefgjapdl6ilwpnhhyh4/lib/libucs.so.0(+0x2b4ba) [0x7fffab2dd4ba]
 3  /usr/lib64/libpthread.so.0(+0xf630) [0x7fffebc28630]
 4  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(+0x4d6b85) [0x7ffff41b2b85]
 5  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(dplasma_dgemm_New_ex+0x318) [0x7ffff41b2edb]
 8  /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fffc2735555]
 9  /home/bouteill/parsec/dplasma.debug/tests/./testing_dgemm() [0x402b09]
=================================
 6  /home/bouteill/parsec/dplasma.debug/src/libdplasma.so.2(dplasma_dgemm_New+0x63) [0x7ffff41b2ff7]
 7  /home/bouteill/parsec/dplasma.debug/tests/./testing_dgemm() [0x405d42]
 8  /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x7fffc2735555]
 9  /home/bouteill/parsec/dplasma.debug/tests/./testing_dgemm() [0x402b09]
=================================
Compute ... ... Compute ... ... Compute ... ... Compute ... ... srun: error: b02: task 1: Floating point exception
srun: error: b04: task 3: Floating point exception
srun: error: b01: task 0: Floating point exception
srun: error: b03: task 2: Floating point exception
