Segmentation fault with emformer #335

dirkstark · 2024-05-02T23:59:59Z

Hello, I'm currently testing the emformer model and have copied the parameters from "sherpa-ncnn-conv-emformer-transducer-small-2023-01-09" into my own model. The released emformer model (2023-01-09) runs fine, but my own model crashes with "Segmentation fault":

icefall:~/bin$ ./sherpa-ncnn ./test/tokens.txt ./test/encoder_jit_trace-pnnx.ncnn.param ./test/encoder_jit_trace-pnnx.ncnn.bin ./test/decoder_jit_trace-pnnx.ncnn.param ./test/decoder_jit_trace-pnnx.ncnn.bin ./test/joiner_jit_trace-pnnx.ncnn.param ./test/joiner_jit_trace-pnnx.ncnn.bin test.wav
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./test/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./test/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./test/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./test/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./test/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./test/joiner_jit_trace-pnnx.ncnn.bin", tokens="./test/tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: test.wav
wav duration (s): 28.89
Started!
Segmentation fault (core dumped)

I trained two epochs for testing, then exported with "https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/conv_emformer_transducer_stateless2/export-for-ncnn.py" (streaming-ncnn-decode works) and converted with pnnx. Here is the pnnx log:

./pnnx ~/icefall/egs/test/ASR/conv_emformer_transducer_stateless2/exp/decoder_jit_trace-pnnx.pt
pnnxparam = ~/icefall/egs/test/ASR/conv_emformer_transducer_stateless2/exp/decoder_jit_trace-pnnx.pnnx.param
pnnxbin = ~/icefall/egs/test/ASR/conv_emformer_transducer_stateless2/exp/decoder_jit_trace-pnnx.pnnx.bin
pnnxpy = ~/icefall/egs/test/ASR/conv_emformer_transducer_stateless2/exp/decoder_jit_trace-pnnx_pnnx.py
pnnxonnx = ~/icefall/egs/test/ASR/conv_emformer_transducer_stateless2/exp/decoder_jit_trace-pnnx.pnnx.onnx
ncnnparam = ~/icefall/egs/test/ASR/conv_emformer_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param
ncnnbin = ~/icefall/egs/test/ASR/conv_emformer_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin
ncnnpy = ~/icefall/egs/test/ASR/conv_emformer_transducer_stateless2/exp/decoder_jit_trace-pnnx_ncnn.py
fp16 = 1
optlevel = 2
device = cpu
inputshape =
inputshape2 =
customop =
moduleop = scaling_converter.PoolingModuleNoProj,zipformer.AttentionDownsampleUnsqueeze,zipformer_for_ncnn_export_only.AttentionDownsampleUnsqueeze
############# pass_level0
############# pass_level1
############# pass_level2
############# pass_level3
open failed
############# pass_level4
############# pass_level5
pnnx build without onnx-zero support, skip saving onnx
############# pass_ncnn

The encoder, decoder and joiner conversions all produce similar output. Even though I use an emformer, the log always mentions zipformer, and I also see "open failed" with every conversion.

It would be nice if someone here had a tip.

csukuangfj · 2024-05-03T00:37:15Z

Did you train the model yourself?
If yes, have you followed our doc exactly to export it to ncnn?

Does our provided model work?

dirkstark · 2024-05-03T09:40:12Z

Thank you for the fast answer. Yes, I trained the model myself. I wrote "streaming-ncnn-decode works", but that was wrong: I meant that decode.py and streaming_decode.py work. I followed this doc: https://icefall.readthedocs.io/en/latest/model-export/export-ncnn-conv-emformer.html

built ncnn from https://github.com/csukuangfj/ncnn
verified with decode.py that the model works
exported with export-for-ncnn.py (I checked the parameters beforehand)
ran pnnx for the joiner, decoder and encoder

I don't know if the log is okay. The zipformer entries and 'open failed' look wrong, but I do get the ncnn .bin and .param files.

added SherpaMetaData
tested with ./streaming-ncnn-decode.py:

2024-05-03 11:33:11,783 INFO [streaming-ncnn-decode.py:349] Constructing Fbank computer
2024-05-03 11:33:11,783 INFO [streaming-ncnn-decode.py:352] Reading sound files: ./exp/test.wav
2024-05-03 11:33:11,789 INFO [streaming-ncnn-decode.py:357] torch.Size([106560])
Segmentation fault

The crash happens in: encoder_out, states = model.run_encoder(frames, states) => ret, ncnn_out0 = ex.extract("out0")
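
One way to get more detail on a crash like this is Python's built-in faulthandler module, which prints the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is a minimal, hypothetical wrapper (run_with_faulthandler and describe_exit are illustrative names, not part of icefall or sherpa-ncnn). It starts a script in a child process with faulthandler enabled and reports how the child exited. It only shows Python frames, so a native stack trace from inside the ncnn extension would still need a debugger such as gdb.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(script_args):
    """Run a Python script in a child process with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code -N means
    the child was killed by signal N.
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


if __name__ == "__main__":
    # Pass the script and its usual arguments on the command line.
    code, err = run_with_faulthandler(sys.argv[1:])
    print(describe_exit(code))
    print(err)
```

If the child dies from a segmentation fault, the captured stderr should contain a "Fatal Python error" section listing the Python call stack at the time of the crash.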

csukuangfj · 2024-05-03T09:43:16Z

> exported with export-for-ncnn.py (I checked the parameters beforehand)

Please describe how you checked that.

Also, please answer whether you have followed exactly the following doc
https://icefall.readthedocs.io/en/latest/model-export/export-ncnn-conv-emformer.html

Hint:

You don't need to modify the exported files when running streaming-ncnn-decode.py.
You must modify it according to the doc if you want to run it with sherpa-ncnn.
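
Since the exported .param file is edited by hand for sherpa-ncnn, a quick structural check can catch editing slips. The sketch below assumes only the general ncnn text format: the first line is the magic number 7767517, the second line holds the layer count and blob count, and each following line describes one layer. check_param_text is a hypothetical helper, and it does not validate the SherpaMetaData values themselves, which have to be taken from the doc.

```python
NCNN_MAGIC = "7767517"


def check_param_text(text):
    """Check the header of an ncnn .param file against its layer lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2 or lines[0].strip() != NCNN_MAGIC:
        raise ValueError("not an ncnn text .param file")

    # Second line: "<layer_count> <blob_count>"
    declared_layers, declared_blobs = (int(x) for x in lines[1].split()[:2])
    layer_lines = lines[2:]

    return {
        "declared_layers": declared_layers,
        "declared_blobs": declared_blobs,
        "actual_layers": len(layer_lines),
        "count_matches": declared_layers == len(layer_lines),
        # The first token of a layer line is the layer type.
        "has_sherpa_meta_data": any(
            ln.split()[0] == "SherpaMetaData" for ln in layer_lines
        ),
    }


# Toy file for illustration only; this is not a real export.
SAMPLE = """7767517
3 2
SherpaMetaData meta0 0 0
Input in0 0 1 in0
ReLU relu0 1 1 in0 out0
"""

if __name__ == "__main__":
    print(check_param_text(SAMPLE))
```

For a real file, read it with open(path).read() and pass the text in. A mismatch between the declared and actual layer counts suggests the header was not updated after a line was added.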

dirkstark · 2024-05-03T23:20:47Z

> Please describe how you checked that.

I trained with these parameters:

./conv_emformer_transducer_stateless2/train.py \
  --world-size 1 \
  --num-epochs 30 \
  --start-epoch 1 \
  --exp-dir conv_emformer_transducer_stateless2/exp \
  --max-duration 420 \
  --master-port 12321 \
  --num-encoder-layers 16 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32 \
  --encoder-dim 144 \
  --dim-feedforward 576 \
  --nhead 4

I tried to use the params from the small model: https://huggingface.co/marcoyang/sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/blob/main/export-ncnn.sh

There seems to be no "--bpe-model" option, so I used "tokens.txt" as described in the documentation:

./conv_emformer_transducer_stateless2/export-for-ncnn.py \
  --exp-dir conv_emformer_transducer_stateless2/exp \
  --tokens data/lang_bpe_500/tokens.txt \
  --epoch 2 \
  --avg 1 \
  --use-averaged-model 0 \
  --num-encoder-layers 16 \
  --chunk-length 32 \
  --cnn-module-kernel 31 \
  --left-context-length 32 \
  --right-context-length 8 \
  --memory-size 32 \
  --encoder-dim 144 \
  --dim-feedforward 576 \
  --nhead 4
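
Checking by eye that the export flags match the training flags is error-prone, so the comparison can be scripted. The sketch below is a minimal, hypothetical helper (parse_flags and diff_model_flags are illustrative names). It parses the two commands from this comment and reports any architecture flag whose value differs. The list of flags to compare is an assumption taken from the commands shown here.

```python
# Flags that describe the model architecture in the two commands above.
MODEL_FLAGS = (
    "--num-encoder-layers", "--chunk-length", "--cnn-module-kernel",
    "--left-context-length", "--right-context-length", "--memory-size",
    "--encoder-dim", "--dim-feedforward", "--nhead",
)


def parse_flags(command_text):
    """Parse '--flag value' pairs from a command string into a dict."""
    tokens = command_text.split()
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            flags[tok] = tokens[i + 1] if has_value else ""
            i += 2 if has_value else 1
        else:
            i += 1
    return flags


def diff_model_flags(train_cmd, export_cmd, keys=MODEL_FLAGS):
    """Return {flag: (train_value, export_value)} for every mismatch."""
    train, export = parse_flags(train_cmd), parse_flags(export_cmd)
    return {k: (train.get(k), export.get(k))
            for k in keys if train.get(k) != export.get(k)}


TRAIN_CMD = """./conv_emformer_transducer_stateless2/train.py
  --world-size 1 --num-epochs 30 --start-epoch 1
  --exp-dir conv_emformer_transducer_stateless2/exp
  --max-duration 420 --master-port 12321
  --num-encoder-layers 16 --chunk-length 32 --cnn-module-kernel 31
  --left-context-length 32 --right-context-length 8 --memory-size 32
  --encoder-dim 144 --dim-feedforward 576 --nhead 4"""

EXPORT_CMD = """./conv_emformer_transducer_stateless2/export-for-ncnn.py
  --exp-dir conv_emformer_transducer_stateless2/exp
  --tokens data/lang_bpe_500/tokens.txt
  --epoch 2 --avg 1 --use-averaged-model 0
  --num-encoder-layers 16 --chunk-length 32 --cnn-module-kernel 31
  --left-context-length 32 --right-context-length 8 --memory-size 32
  --encoder-dim 144 --dim-feedforward 576 --nhead 4"""

if __name__ == "__main__":
    print(diff_model_flags(TRAIN_CMD, EXPORT_CMD) or "no mismatches")
```

For the two commands in this comment the architecture flags agree, so a mismatch between training and export flags is not visible from the commands alone.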

The output was similar to the documentation, except for these warnings:

emformer2.py:614: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert attention.shape == (B * self.nhead, Q, self.head_dim)
emformer2.py:405: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert cache.shape == (B, D, self.cache_size), cache.shape
_trace.py:1065: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for list, use a tuple instead. for dict, use a NamedTuple instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
module._c._create_method_from_trace(

The files appear to have been created. Running pnnx on the encoder gives this output:

fp16 = 1
optlevel = 2
device = cpu
inputshape =
inputshape2 =
customop =
moduleop = scaling_converter.PoolingModuleNoProj,zipformer.AttentionDownsampleUnsqueeze,zipformer_for_ncnn_export_only.AttentionDownsampleUnsqueeze
############# pass_level0
inline module = emformer2.Conv2dSubsampling
inline module = scaling.DoubleSwish
inline module = scaling_converter.NonScaledNorm
inline module = torch.nn.modules.linear.Identity
############# pass_level1
############# pass_level2
############# pass_level3
open failed
############# pass_level4
############# pass_level5
[...]
make_slice_expression input 1157
pnnx build without onnx-zero support, skip saving onnx
############# pass_ncnn
[...]
fallback batch axis 233 for operand pnnx_expr_126_mul(1117,1.666667e-01)
[...]
reshape tensor with batch index 1 is not supported yet!
[...]
unsqueeze batch dim 1 is not supported yet!
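
Because the conversion log is long and parts of it are elided above, it can help to pull out just the lines worth comparing against a known-good conversion. The sketch below is a minimal, hypothetical filter (find_suspicious_lines is an illustrative name). The patterns are copied from the messages in this log. Whether any of them indicates a real problem is not established here, and some may also appear in a working conversion.

```python
# Message fragments taken from the pnnx output quoted in this comment.
SUSPICIOUS = ("not supported yet", "fallback batch axis", "open failed")


def find_suspicious_lines(log_text, patterns=SUSPICIOUS):
    """Return (line_number, pattern, line) for each line matching a pattern."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for pattern in patterns:
            if pattern in line:
                hits.append((lineno, pattern, line.strip()))
                break  # report each line once
    return hits


SAMPLE_LOG = """############# pass_level3
open failed
############# pass_ncnn
fallback batch axis 233 for operand pnnx_expr_126_mul(1117,1.666667e-01)
reshape tensor with batch index 1 is not supported yet!
unsqueeze batch dim 1 is not supported yet!
"""

if __name__ == "__main__":
    for lineno, pattern, line in find_suspicious_lines(SAMPLE_LOG):
        print(f"{lineno}: [{pattern}] {line}")
```

Running the same filter on the log from converting the provided checkpoint and comparing the two results would show which messages are specific to the self-trained model.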

Is the missing "--bpe-model" (which your sherpa-ncnn-conv-emformer-transducer-small-2023-01-09 uses) a problem?
Is the output "moduleop = scaling_converter.PoolingModuleNoProj,zipformer.AttentionDownsampleUnsqueeze,zipformer_for_ncnn_export_only.AttentionDownsampleUnsqueeze" okay as well?
Is there a verbose mode to check what's wrong?

> Also, please answer whether you have followed exactly the following doc

I tried to, but I can't guarantee it. I'll retry in a few days.

Thank you for the hint and your help. In case it is of interest: I tested with both sherpa-ncnn and streaming-ncnn-decode.py on different systems and get the same error with my model, while your "sherpa-ncnn-conv-emformer-transducer-small-2023-01-09" works well.

csukuangfj · 2024-05-04T00:40:13Z

If you follow the doc exactly, there should not be any issues.

Please try to export with our provided pytorch checkpoint and make sure you can reproduce it.

dirkstark · 2024-05-05T19:27:36Z

Pytorch checkpoint? The documentation states: "We are using Ubuntu 18.04, Torch 1.13 and Python 3.8 for testing" and "Please use a newer version of PyTorch". I am using “2.1.1+cu121”.

I'm not sure if pnnx uses the same cuda version, but if I rebuild everything on a clean system, it's no problem to use 2.1.1, right?

csukuangfj · 2024-05-05T23:05:45Z

A PyTorch checkpoint is a .pt file.

I suggest you follow the doc step by step using our provided checkpoint.
