Setting up TensorFlow plugin 'fused_bias_act.cu': Loading... Failed! #32

Closed
kwhuang88228 opened this issue Feb 10, 2022 · 3 comments

kwhuang88228 commented Feb 10, 2022

Hi Drew, I'm getting the following error both when I train a GANformer model on the CLEVR dataset from scratch and when I fine-tune a pretrained model. I didn't have this issue before the repo was updated with the PyTorch implementation. I've also tried this and this without luck. Do you have any ideas?

Environment:
Python 3.6.13
tensorflow-gpu 1.14.0
CUDA 9.1
cudnn 7

Start model training from scratch
Local submit - run_dir: results/clevr-scratch-000
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset datasets...
Dataset shape:  [3, 256, 256]
Dynamic range:  [0, 255]
Constructing networks...
Setting up TensorFlow plugin 'fused_bias_act.cu': Loading... Failed!
Traceback (most recent call last):
  File "run_network.py", line 556, in <module>
    main()
  File "run_network.py", line 553, in main
    run(**vars(args))
  File "run_network.py", line 368, in run
    dnnlib.submit_run(**kwargs)
  File "/datadrive/kwhuang/gansformer/dnnlib/submission/submit.py", line 346, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/datadrive/kwhuang/gansformer/dnnlib/submission/internal/local.py", line 16, in submit
    return run_wrapper(submit_config)
  File "/datadrive/kwhuang/gansformer/dnnlib/submission/submit.py", line 254, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/datadrive/kwhuang/gansformer/training/training_loop.py", line 194, in training_loop
    label_size = dataset.label_size, **cG.args)
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/network.py", line 100, in __init__
    self._init_graph()
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/network.py", line 159, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 868, in Generator
    components.synthesis = tflib.Network("G_synthesis", func_name = globals()[synthesis_func], **kwargs)
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/network.py", line 100, in __init__
    self._init_graph()
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/network.py", line 159, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 1423, in G_synthesis
    kernel = 3, att_vars = att_vars)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 1267, in layer
    resample_kernel = resample_kernel, fused_modconv = _fused_modconv, modulate = style, noconv = noconv)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 390, in modulated_conv2d_layer
    s = dense_layer(y, dim = get_shape(x)[1], weight_var = mod_weight_var, bias_var = mod_bias_var) + 1 # [BI]
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 77, in dense_layer
    x = apply_bias_act(x, act, lrmul, bias_var, name)
  File "/datadrive/kwhuang/gansformer/training/networks.py", line 85, in apply_bias_act
    return fused_bias_act(x, b = b, act = act)
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/ops/fused_bias_act.py", line 62, in fused_bias_act
    return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/ops/fused_bias_act.py", line 116, in _fused_bias_act_cuda
    cuda_kernel = _get_plugin().fused_bias_act
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/ops/fused_bias_act.py", line 10, in _get_plugin
    return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
  File "/datadrive/kwhuang/gansformer/dnnlib/tflib/custom_ops.py", line 156, in get_plugin
    plugin = tf.load_op_library(bin_file)
  File "/anaconda/envs/gansformer/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /datadrive/kwhuang/gansformer/dnnlib/tflib/_cudacache/fused_bias_act_1.14_.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
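
A note on reading the last line of the traceback: the undefined symbol demangles to tensorflow::OpDefBuilder::Attr(std::__cxx11::basic_string<char, ...>), so the compiled plugin is asking for a function that takes the new-ABI std::string. That usually points to the plugin and the TensorFlow binary being built with different _GLIBCXX_USE_CXX11_ABI settings, though that is an inference from the symbol name, not something the log states. A minimal sketch of the check (symbol_uses_cxx11_abi is a hypothetical helper, not part of the repo):

```python
# Marker used by the Itanium C++ name mangling for the std::__cxx11 namespace
# (a length-prefixed identifier: 7 characters, "__cxx11").
CXX11_MARKER = "7__cxx11"


def symbol_uses_cxx11_abi(mangled: str) -> bool:
    """Return True if a mangled symbol refers to a std::__cxx11 type."""
    return CXX11_MARKER in mangled


# Symbol copied from the NotFoundError in the traceback.
missing = ("_ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_string"
           "IcSt11char_traitsIcESaIcEEE")

if symbol_uses_cxx11_abi(missing):
    # The plugin was compiled against the new libstdc++ string type.
    print("symbol references std::__cxx11 (new libstdc++ ABI)")
else:
    # Absence of the marker is inconclusive: the symbol may simply not
    # involve std::string at all.
    print("no std::__cxx11 type in this symbol")
```

If the marker is present but the installed TensorFlow was built with the old ABI, the loader cannot resolve the symbol, which matches the NotFoundError seen here.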

dorarad (Owner) commented Feb 10, 2022

Hi, Thanks for reaching out!

On the following line, I recommend changing int(tf_ver < 1.15) to 0:
https://github.com/dorarad/gansformer/blob/main/dnnlib/tflib/custom_ops.py#L130
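
For anyone checking which value applies to their own setup: the __cxx11 marker in the undefined symbol from the traceback suggests a libstdc++ ABI mismatch between the compiled plugin and the installed TensorFlow. Assuming the expression on that line sets the -D_GLIBCXX_USE_CXX11_ABI define used when building the plugin (an assumption; check the line itself), the value TensorFlow was built with can be read from tf.sysconfig.get_compile_flags(). A minimal sketch, where cxx11_abi_from_flags is a hypothetical helper name:

```python
import re
from typing import Iterable, Optional


def cxx11_abi_from_flags(flags: Iterable[str]) -> Optional[int]:
    """Extract the value of -D_GLIBCXX_USE_CXX11_ABI from compiler flags.

    Returns None when the define is not present in the flag list.
    """
    for flag in flags:
        match = re.fullmatch(r"-D_GLIBCXX_USE_CXX11_ABI=(\d)", flag)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # get_compile_flags() returns the flags TensorFlow recommends for
        # building custom ops, including the ABI define it was built with.
        print(cxx11_abi_from_flags(tf.sysconfig.get_compile_flags()))
```

If the reported value differs from the one the plugin was compiled with, the custom op will fail to load until it is rebuilt with the matching value.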

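Then delete the cached custom-op builds so they are recompiled on the next run (adjust the path to your checkout; the traceback above shows the cache directory as _cudacache):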

rm -rf /external_code/gan/gansformer/dnnlib/tflib/cudacache/

and then try to run the code again. See the following issues (#7, #8) for further discussion and let me know if the solution works!
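
The cache-clearing step can also be done from Python, which avoids typing the path by hand. This is only a sketch: clear_cuda_cache is a hypothetical helper, and the dnnlib/tflib/_cudacache location is taken from the traceback in this issue, so it is an assumption for other checkouts.

```python
import shutil
from pathlib import Path


def clear_cuda_cache(repo_root: str, cache_name: str = "_cudacache") -> bool:
    """Delete the compiled custom-op cache under <repo_root>/dnnlib/tflib/.

    Returns True if a cache directory was found and removed, False if there
    was nothing to delete.
    """
    cache_dir = Path(repo_root) / "dnnlib" / "tflib" / cache_name
    if not cache_dir.is_dir():
        return False
    # Remove the directory and every compiled .so / intermediate file in it,
    # so the plugins are rebuilt on the next run.
    shutil.rmtree(str(cache_dir))
    return True


# Example call, using the checkout path that appears in the traceback:
# clear_cuda_cache("/datadrive/kwhuang/gansformer")
```

Run it once after editing custom_ops.py, then start training again so the plugins are recompiled with the new setting.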

kwhuang88228 (Author) commented

Thanks Drew! Deleting the CUDA cache did the trick.

dorarad (Owner) commented Feb 10, 2022

Awesome! :-)

dorarad closed this as completed Feb 10, 2022