GPU compilation failing on current dev #276

Open
jpmorgan98 opened this issue Jan 21, 2025 · 3 comments
Labels: bug (Something isn't working as intended), gpu, numba (Numba-related issues)

Comments

@jpmorgan98
Collaborator

jpmorgan98 commented Jan 21, 2025

I am seeing this issue specifically on Lassen, but @alexandermote has also reported it on Tioga. In the current version of dev, the minimum Numba version is moving to 0.60.0, which means we no longer need to patch Numba for the vector atomic issue. It seems that Numba might have changed the syntax for generating IR code.
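A minimal sketch for checking an environment against the new 0.60.0 minimum. It reads the installed version with the standard library's `importlib.metadata`; `parse_version` and `meets_minimum` are hypothetical helpers, not part of MC/DC. The simplified parser ignores pre-release tags, so `0.60.0rc1` would count as `0.60.0`; `packaging.version.Version` would be stricter.

```python
import re
from importlib import metadata

# Minimum Numba version discussed in this issue.
MINIMUM_NUMBA = (0, 60, 0)


def parse_version(text: str) -> tuple:
    """Leading numeric parts of a version string, padded to three,
    e.g. '0.61.0rc1' -> (0, 61, 0). Pre-release tags are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version: str, minimum: tuple = MINIMUM_NUMBA) -> bool:
    """True if `version` is at least `minimum` (hypothetical helper)."""
    return parse_version(version) >= minimum


try:
    installed = metadata.version("numba")
    print(f"numba {installed}: meets 0.60.0 minimum -> {meets_minimum(installed)}")
except metadata.PackageNotFoundError:
    print("numba is not installed in this environment")
```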

To replicate, install the current dev branch of mcdc and the main branch of harmonize. Then, on a CUDA or AMD GPU, run any problem with:

python input.py --mode=numba --target=gpu

The pertinent part of the Python traceback is the first chunk, which reports:

Traceback (most recent call last):
  File "/usr/WS1/morgan83/pp-mcdc/kobyashi/mcdc_gpu/input.py", line 78, in <module>
    mcdc.run()
  File "/usr/WS1/morgan83/MCDC/mcdc/main.py", line 67, in run
    data_arr, mcdc_arr = prepare()
                         ^^^^^^^^^
  File "/usr/WS1/morgan83/MCDC/mcdc/main.py", line 1132, in prepare
    build_gpu_progs(input_deck, config.args)
  File "/usr/WS1/morgan83/MCDC/mcdc/loop.py", line 843, in build_gpu_progs
    src_spec = gpu_sources_spec()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/WS1/morgan83/MCDC/mcdc/loop.py", line 422, in gpu_sources_spec
    return adapt.harm.RuntimeSpec("mcdc_source", adapt.state_spec, base_fns, async_fns)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/WS1/morgan83/harmonize/harmonize/python/runtime.py", line 277, in __init__
    self.generate_code(gpu_platform)
  File "/usr/WS1/morgan83/harmonize/harmonize/python/runtime.py", line 826, in generate_code
    self.generate_async_ptx(RuntimeSpec.cache_path,suffix,gpu_platform)
  File "/usr/WS1/morgan83/harmonize/harmonize/python/runtime.py", line 753, in generate_async_ptx
    ir_text  = extern_device_ir(fn,self.type_map,suffix,platform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/WS1/morgan83/harmonize/harmonize/python/codegen.py", line 99, in extern_device_ir
    ir_text, res_type = device_ir(func,platform)
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/WS1/morgan83/harmonize/harmonize/python/codegen.py", line 86, in device_ir
    ptx, res_type = config.cuda.compile_ptx_for_current_device(func,fn_arg_ano(func),device=True,debug=config.DEBUG,opt=(not config.DEBUG))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/workspace/morgan83/conda_lassen/envs/mcdc_env/lib/python3.11/site-packages/numba/cuda/compiler.py", line 391, in compile_ptx_for_current_device
    return compile_ptx(pyfunc, sig, debug=debug, lineinfo=lineinfo,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/workspace/morgan83/conda_lassen/envs/mcdc_env/lib/python3.11/site-packages/numba/cuda/compiler.py", line 380, in compile_ptx
    return compile(pyfunc, sig, debug=debug, lineinfo=lineinfo, device=device,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

and the very end:

File "/usr/workspace/morgan83/conda_lassen/envs/mcdc_env/lib/python3.11/site-packages/numba/core/untyped_passes.py", line 125, in run_pass
    raise TypeError("Signature mismatch: %d argument types given, "
TypeError: Signature mismatch: 4 argument types given, but function takes 5 arguments
^CERROR:  One or more process (first noticed rank 0) terminated with signal 2
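The final TypeError, raised in Numba's untyped passes, means the signature handed to the compiler lists fewer argument types (4) than the Python function declares parameters (5). In the frames above that signature comes from harmonize's `fn_arg_ano(func)`, which this sketch does not reproduce. Below is a minimal sketch of a pre-flight arity check that would surface the mismatch before compilation, using only the standard library's `inspect`; `check_arity`, `example_device_fn`, and the placeholder type names are hypothetical and not part of harmonize or Numba.

```python
import inspect
from typing import Callable, Sequence


def check_arity(func: Callable, arg_types: Sequence[object]) -> None:
    """Raise a TypeError if len(arg_types) differs from the number of
    parameters declared by func (hypothetical helper, not part of
    harmonize or Numba)."""
    n_params = len(inspect.signature(func).parameters)
    if len(arg_types) != n_params:
        raise TypeError(
            f"{func.__name__}: {len(arg_types)} argument types given, "
            f"but function takes {n_params} arguments"
        )


# Hypothetical five-parameter function standing in for a device function
# handed to the compiler.
def example_device_fn(prog, state, particle, data, extra):
    return None


# Placeholder type names; a real signature would hold Numba types.
four_types = ["prog_t", "state_t", "particle_t", "data_t"]

try:
    check_arity(example_device_fn, four_types)
except TypeError as err:
    print(err)  # example_device_fn: 4 argument types given, but function takes 5 arguments
```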

The full traceback is attached (trace_back.txt), but it is probably not important.

The Numba Side

Here is the documented signature of compile_ptx_for_current_device() in Numba 0.58.0:

numba.cuda.compile_ptx_for_current_device(pyfunc, sig, debug=False, lineinfo=False, device=False, fastmath=False, opt=True)

and here is the same signature in Numba 0.60.0:

numba.cuda.compile_ptx_for_current_device(pyfunc, sig, debug=False, lineinfo=False, device=False, fastmath=False, opt=True, abi='numba', abi_info=None)

The options definitely changed (abi and abi_info were added), but this shouldn't be an issue because the new options have defaults and are not required. @alexandermote, can you confirm you are using numba>=0.60.0?
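One way to confirm that the added options are optional is to list which parameters lack defaults. The sketch below applies `inspect.signature` to two stubs that mirror the documented 0.58.0 and 0.60.0 signatures quoted above; the stubs (`compile_ptx_058`, `compile_ptx_060`) are hypothetical and compile nothing. With Numba installed, `required_params` could presumably be pointed at `numba.cuda.compile_ptx_for_current_device` itself.

```python
import inspect


def required_params(func):
    """Names of parameters that must be supplied (no default, not *args/**kwargs)."""
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        name
        for name, p in inspect.signature(func).parameters.items()
        if p.default is inspect.Parameter.empty and p.kind not in variadic
    ]


# Stubs mirroring the two documented signatures; they do not compile anything.
def compile_ptx_058(pyfunc, sig, debug=False, lineinfo=False, device=False,
                    fastmath=False, opt=True):
    raise NotImplementedError("signature stub only")


def compile_ptx_060(pyfunc, sig, debug=False, lineinfo=False, device=False,
                    fastmath=False, opt=True, abi="numba", abi_info=None):
    raise NotImplementedError("signature stub only")


old_params = inspect.signature(compile_ptx_058).parameters
new_params = inspect.signature(compile_ptx_060).parameters
added = [name for name in new_params if name not in old_params]

print(required_params(compile_ptx_058))  # ['pyfunc', 'sig']
print(required_params(compile_ptx_060))  # ['pyfunc', 'sig']
print(added)                             # ['abi', 'abi_info']
```

The harmonize call in the traceback passes `func` and `fn_arg_ano(func)` positionally and `device`, `debug`, and `opt` by keyword, so trailing keyword-only additions with defaults should not change how that call binds.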

jpmorgan98 added the bug, gpu, and numba labels Jan 21, 2025
@alexandermote
Contributor

Correct; I am on numba==0.61.0. This is the exact error I was getting.

@jpmorgan98
Collaborator Author

OK, actually I think this is an issue with the current dev branch, not Numba. I reverted to main (release) and that seems to have fixed it, so some PR introduced this. Good lord, I need to get GPU CI working again.

@ilhamv, any thoughts?

@ilhamv
Member

ilhamv commented Jan 22, 2025

@jpmorgan98 Can you try tracing back the commits and see at which point it starts to break the GPU mode?
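One way to do this tracing is `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. A sketch of a helper under those conventions; the file name `bisect_gpu.py`, keying on the "Signature mismatch" message, and skipping every other kind of failure are assumptions, not part of the MC/DC tooling.

```python
import subprocess
import sys

# Exit codes interpreted by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Text of the failure reported in this issue (assumed to identify the regression).
REGRESSION_MARKER = "Signature mismatch"


def classify(returncode: int, output: str) -> int:
    """Map one test run's outcome to a git-bisect exit code."""
    if returncode == 0:
        return GOOD
    if REGRESSION_MARKER in output:
        return BAD
    # Unrelated failure (e.g. a commit that does not import): skip it.
    return SKIP


def check_commit(cmd: list) -> int:
    """Run the test command and classify its combined stdout/stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return classify(proc.returncode, proc.stdout + proc.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the command to test.
    sys.exit(check_commit(sys.argv[1:]))
```

Assuming harmonize stays on main, a possible session is `git bisect start`, `git bisect bad dev`, `git bisect good main`, then `git bisect run python /abs/path/bisect_gpu.py python /abs/path/input.py --mode=numba --target=gpu`. On Lassen or Tioga the launch command may need the site's job launcher in front of it.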

jpmorgan98 changed the title from "Numba v0.60.0 not compiling on GPUs" to "GPU compilation failing on current dev" Jan 24, 2025

4 participants