
pandas.errors.ParserError: Error tokenizing data. C error: Expected 44 fields in line 3169, saw 56 #2

Open
logicvanlyf opened this issue Apr 5, 2024 · 9 comments


@logicvanlyf

Hello, when reproducing the code and running the preprocessing.py file, I get the following error:

Preprocessing: 33%|███▎ | 185/565 [00:52<01:48, 3.49it/s]
Traceback (most recent call last):
File "D:\Codes\MHyEEG-main\data\preprocessing.py", line 252, in
preprocess(sessions_dir, args.save_path, args.verbose)
File "D:\Codes\MHyEEG-main\data\preprocessing.py", line 125, in preprocess
gaze_df = pd.read_csv(gaze_file, sep='\t', skiprows=23)
File "D:\anaconda3\envs\pytorch_thesis\lib\site-packages\pandas\io\parsers\readers.py", line 948, in read_csv
return _read(filepath_or_buffer, kwds)
File "D:\anaconda3\envs\pytorch_thesis\lib\site-packages\pandas\io\parsers\readers.py", line 617, in _read
return parser.read(nrows)
File "D:\anaconda3\envs\pytorch_thesis\lib\site-packages\pandas\io\parsers\readers.py", line 1748, in read
) = self._engine.read( # type: ignore[attr-defined]
File "D:\anaconda3\envs\pytorch_thesis\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "parsers.pyx", line 843, in pandas._libs.parsers.TextReader.read_low_memory
File "parsers.pyx", line 904, in pandas._libs.parsers.TextReader._read_rows
File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2058, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 44 fields in line 3169, saw 56

This error indicates that Pandas failed while tokenizing the data: it encountered a row with 56 fields where 44 were expected. This is most likely caused by lines in the data file that do not match the format the program expects.

Is there a corresponding solution? If so, please let me know; I would be very grateful.
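A quick way to locate the offending rows before attempting a fix is to count the fields on each line with the standard csv module. This is only a diagnostic sketch, not part of the repo; `find_bad_rows` and the sample data are hypothetical:

```python
import csv
import io

# Hypothetical sample mimicking the gaze TSV: a 3-field header,
# one corrupted row with 5 fields.
sample = "a\tb\tc\n1\t2\t3\n4\t5\t6\t7\t8\n9\t10\t11\n"

def find_bad_rows(text, sep="\t"):
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    expected = len(header)
    # Report 1-based line numbers whose field count differs from the header.
    return [(i, len(row)) for i, row in enumerate(reader, start=2)
            if len(row) != expected]

print(find_bad_rows(sample))  # -> [(3, 5)]
```

Running the same check over the real TSV (after the 23 skipped header lines) should point straight at the line pandas complains about.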

@Xiaochi111


Hello, I have also encountered this problem. Have you solved it? If so, could you tell me the solution?

@KONE544174974

That's because the file P10-Rec1-All-Date-New-Section_30.tsv is missing 3 lines of data, so it needs some manual fixing to correct~


@balancedzq

Hello, I have also encountered this problem. Have you solved it? If so, could you tell me the solution?

@balancedzq

Hello, could you tell me more about how to deal with this problem? Thank you.

@KONE544174974

KONE544174974 commented Apr 21, 2024

I used a simple way to correct this problem: find the file named P10-Rec1-All-Date-New-Section_30.tsv and check around line 3159 or 3169 (I can't remember exactly which). You will find 3 lines that are different; follow the format of the preceding or following lines to re-add those 3 lines. You will need to spend a little time getting familiar with the structure of the data files so you can imitate them. Good luck!
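The manual check above can be made repeatable by printing the field count of each line around the suspect region, so the malformed lines stand out against their neighbours. A sketch only; the helper and the stand-in data are hypothetical, with the 44/56 field counts and line numbers taken from this thread:

```python
def field_counts(lines, center, radius=5, sep="\t"):
    """Return (1-based line number, field count) for lines around `center`."""
    lo = max(0, center - 1 - radius)
    hi = min(len(lines), center - 1 + radius + 1)
    return [(i + 1, len(lines[i].split(sep))) for i in range(lo, hi)]

# Hypothetical stand-in for P10-Rec1-All-Date-New-Section_30.tsv:
# most lines have 44 fields, one merged line has 56.
lines = ["\t".join(["x"] * 44)] * 10
lines[5] = "\t".join(["x"] * 56)

print(field_counts(lines, center=6, radius=2))
# -> [(4, 44), (5, 44), (6, 56), (7, 44), (8, 44)]
```

On the real file you would read its lines, pass center=3169, and then re-add the missing rows by copying the structure of a neighbouring row, as described above.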

@balancedzq

Hello, I know very little about the original data file. I would be very grateful if you could share your corrected P10-Rec1-All-Date-New-Section_30.tsv file.

@elelo22
Collaborator

elelo22 commented Apr 22, 2024

Hi everyone, sorry for the late reply. I tried running the code again but I don't get this error, so maybe the dataset changed? In fact, I looked and it seems I don't have the file P10-Rec1-All-Data-New-Section_30.tsv; I have P10-Rec1-All-Data-New_Section_28.tsv and then P10-Rec1-All-Data-New_Section_32.tsv. I think the fastest solution would be to just skip this file, and hopefully others don't have the same problem.

Otherwise, try @KONE544174974's solution; maybe they can give a bit more detail on how they solved the problem, or there's a way to do it programmatically which could be shared. I would try to do it myself, but since I can't reproduce/see the problem, I can't come up with a solution.
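If skipping the malformed rows is acceptable, pandas can do it directly: since pandas 1.3, read_csv accepts an on_bad_lines argument ('skip' drops rows with the wrong field count, 'warn' also reports them). A minimal sketch on a made-up in-memory TSV:

```python
import io
import pandas as pd

# Hypothetical TSV with one malformed row (5 fields instead of 3).
bad_tsv = "a\tb\tc\n1\t2\t3\n4\t5\t6\t7\t8\n9\t10\t11\n"

# on_bad_lines='skip' (pandas >= 1.3) silently drops malformed rows
# instead of raising ParserError.
df = pd.read_csv(io.StringIO(bad_tsv), sep="\t", on_bad_lines="skip")
print(df.shape)  # -> (2, 3)
```

Applied to the call in the traceback this would be `pd.read_csv(gaze_file, sep='\t', skiprows=23, on_bad_lines='skip')`; note it discards the malformed samples rather than recovering them, so the manual fix above is still the cleaner option.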

@aishanii

aishanii commented Jul 8, 2024

Hi, while running the main file, I am getting this error:
RuntimeError: weight tensor should be defined either for all 3 classes or no classes but got weight tensor of shape: [2]

----Loading dataset----
Dataset: MAHNOB-HCI
#Traning samples: 360
#Validation samples: 3
#Training distribution: [210 150]
wandb: Currently logged in as: vanshuagarwal11-03 (vanshuagarwal11-03-SRM Institute of Science and Technology). Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.17.4
wandb: Run data is saved locally in /content/wandb/run-20240708_111041-0am4ciyg
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run proud-leaf-49
wandb: ⭐️ View project at https://wandb.ai/vanshuagarwal11-03-SRM%20Institute%20of%20Science%20and%20Technology/MHyEEG
wandb: 🚀 View run at https://wandb.ai/vanshuagarwal11-03-SRM%20Institute%20of%20Science%20and%20Technology/MHyEEG/runs/0am4ciyg
Number of parameters: 19663747

Running on GPU? True - gpu_num: 0
Train round: 0% 0/45 [00:00<?, ?batch/s]tensor([[ 0.0504, -0.0277, -0.0015],
[-0.1052, -0.0144, 0.0084],
[-0.1097, 0.0798, 0.0005],
[-0.0701, -0.0238, -0.0405],
[-0.0034, 0.0291, 0.0478],
[-0.0992, 0.0290, -0.0440],
[-0.1483, -0.0215, -0.0354],
[-0.0660, 0.0021, -0.0198]], device='cuda:0',
grad_fn=)
tensor([2, 1, 2, 2, 1, 1, 2, 1], device='cuda:0')
Traceback (most recent call last):
File "/content/drive/MyDrive/MHyEEG-main-share/MHyEEG-main/main.py", line 88, in
main(args, n_workers)
File "/content/drive/MyDrive/MHyEEG-main-share/MHyEEG-main/main.py", line 47, in main
trainer.train(train_loader, eval_loader)
File "/content/drive/MyDrive/MHyEEG-main-share/MHyEEG-main/training.py", line 92, in train
loss = self.criterion(outputs, labels)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/loss.py", line 1185, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 3086, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: weight tensor should be defined either for all 3 classes or no classes but got weight tensor of shape: [2]
wandb: 🚀 View run proud-leaf-49 at: https://wandb.ai/vanshuagarwal11-03-SRM%20Institute%20of%20Science%20and%20Technology/MHyEEG/runs/0am4ciyg
wandb: ⭐️ View project at: https://wandb.ai/vanshuagarwal11-03-SRM%20Institute%20of%20Science%20and%20Technology/MHyEEG
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240708_111041-0am4ciyg/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with wandb.require("core")! See https://wandb.me/wandb-core for more information.
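On this last error: CrossEntropyLoss expects its `weight` argument to have one entry per model output class. The log above shows a 2-class training distribution ([210 150]) while the model emits 3 logits, so a weight tensor of shape [2] cannot match. A hedged sketch (plain Python, not the repo's actual code) of building inverse-frequency weights over all model classes, including any absent from the split:

```python
from collections import Counter

def class_weights(labels, n_classes):
    # Inverse-frequency weights with one entry per model class, which is
    # the length CrossEntropyLoss's `weight` argument requires.
    counts = Counter(labels)
    total = len(labels)
    # Classes absent from this split get count 1 to avoid division by zero.
    return [total / (n_classes * counts.get(c, 1)) for c in range(n_classes)]

# 210 + 150 samples over 2 observed classes, but a 3-class model:
w = class_weights([0] * 210 + [1] * 150, n_classes=3)
print(len(w))  # -> 3
```

In torch this list would become `torch.tensor(w)` passed as `weight=` to the loss; alternatively, make the model's number of output classes match the dataset's.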
