ValueError: vcf is not a valid file or directory. Please provide a valid file or directory. #71

RosaDeSa · 2023-05-09T13:54:51Z

Hi Kevin, I'm trying this script, but I'm running into this error during prediction
(the VCF file was annotated with VEP):

DEBUG | ezancestry.process:process_user_input:214 - list index out of range
Traceback (most recent call last):
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/ezancestry/process.py", line 217, in process_user_input
snpsdf = pd.read_csv(
File "/usr/local/lib/python3.9/dist-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/readers.py", line 678, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/readers.py", line 581, in _read
return parser.read(nrows)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/readers.py", line 1253, in read
index, columns, col_dict = self._engine.read(nrows)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/python_parser.py", line 270, in read
alldata = self._rows_to_cols(content)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/python_parser.py", line 1013, in _rows_to_cols
self._alert_malformed(msg, row_num + 1)
File "/usr/local/lib/python3.9/dist-packages/pandas/io/parsers/python_parser.py", line 739, in _alert_malformed
raise ParserError(msg)
pandas.errors.ParserError: Expected 3 fields in line 7, saw 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/tigem/r.desantis/.local/bin/ezancestry", line 8, in
sys.exit(app())
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/typer/main.py", line 214, in call
return get_command(self)(*args, **kwargs)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 829, in call
return self.main(*args, **kwargs)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/typer/main.py", line 532, in wrapper
return callback(**use_params) # type: ignore
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/ezancestry/commands.py", line 286, in predict
snpsdf = process_user_input(input_data, aisnps_directory, aisnps_set)
File "/home/tigem/r.desantis/.local/lib/python3.9/site-packages/ezancestry/process.py", line 232, in process_user_input
raise ValueError(
ValueError: a1.VEP.ann.vcf is not a valid file or directory. Please provide a valid file or directory.

arvkevi · 2023-05-10T00:26:28Z

Hi @RosaDeSa 👋🏼 were you able to figure out what the issue was? If so, it could be helpful for others if you share your solution. I'm unsure how ezancestry handles VEP annotations, though the parser from snps might be robust enough to handle them.

RosaDeSa · 2023-05-10T10:56:22Z

Hi @arvkevi, I obtained the prediction.csv file and plotted it. The problem was probably due to a malformed file; I regenerated the VCF file with some additional parameters in VEP.
Despite this, I'm still unsure about the results: I used two different VCFs (from two different samples), but the prediction results are exactly the same, which is a little weird. I'll try snps, as you suggested. If I find consistent results, I'll gladly share the solution here!
Thanks

arvkevi · 2023-05-11T08:38:01Z

Ezancestry uses snps to read vcfs in process.py. Are the two samples related? Do they have the exact same set of AISNPs?

RosaDeSa · 2023-05-11T10:33:23Z

I noticed that; using snps directly I also get the same results. The samples are not related; they belong to two different people.
And yes, they have the same AISNPs. It's weird, isn't it?

In a while I'll analyze WGS data from 2 other samples, and I'll test the script on those as well.

#pca,kidd,/home/r.desantis/.ezancestry/data/models,/home/r.desantis/.ezancestry/data/aisnps
,component1,component2,component3,predicted_population_population,ACB,ASW,BEB,CDX,CEU,CHB,CHS,CLM,ESN,FIN,GBR,GIH,GWD,IBS,ITU,JPT,KHV,LWK,MSL,MXL,PEL,PJL,PUR,STU,TSI,YRI,predicted_population_superpopulation,AFR,AMR,EAS,EUR,SAS,population_description,superpopulation_name
LV_vep.vcf,0.11874386857468588,0.15300045809781831,0.3265148978535419,ITU,0.0,0.0,0.08919748915377203,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09703463769218275,0.0,0.0,0.29927578644262454,0.0,0.0,0.0,0.0,0.0,0.0,0.08710151819096609,0.22274821473011025,0.20464235379034443,0.0,0.0,SAS,0.0,0.17202243612400409,0.0,0.0,0.827977563875996,Indian Telugu in the UK,South Asian Ancestry


#pca,kidd,/home/r.desantis/.ezancestry/data/models,/home/r.desantis/.ezancestry/data/aisnps
,component1,component2,component3,predicted_population_population,ACB,ASW,BEB,CDX,CEU,CHB,CHS,CLM,ESN,FIN,GBR,GIH,GWD,IBS,ITU,JPT,KHV,LWK,MSL,MXL,PEL,PJL,PUR,STU,TSI,YRI,predicted_population_superpopulation,AFR,AMR,EAS,EUR,SAS,population_description,superpopulation_name
out.vcf,0.11874386857468588,0.15300045809781831,0.3265148978535419,ITU,0.0,0.0,0.08919748915377203,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09703463769218275,0.0,0.0,0.29927578644262454,0.0,0.0,0.0,0.0,0.0,0.0,0.08710151819096609,0.22274821473011025,0.20464235379034443,0.0,0.0,SAS,0.0,0.17202243612400409,0.0,0.0,0.827977563875996,Indian Telugu in the UK,South Asian Ancestry

RosaDeSa · 2023-05-24T10:23:40Z

Hi @arvkevi, I have the same problem with the other 2 samples as well.

Below is the head of the VCF with the SNPs that I give as input. Is this correct for ezancestry?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  a2
chr1    13813   .       T       G       67.64   MQ_filter       AC=1;AF=0.500;AN=2;BaseQRankSum=-1.645;DP=5;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=24.33;MQRankSum=-1.282;QD=13.53;ReadPosRankSum=1.036;SOR=1.609     GT:AD:DP:FT:GQ:PL       0/1:3,2:5:DP_filter:75:75,0,120
chr1    13838   rs200683566     C       T       64.64   MQ_filter       AC=1;AF=0.500;AN=2;BaseQRankSum=0.000;DB;DP=6;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=25.17;MQRankSum=-1.501;QD=10.77;ReadPosRankSum=0.431;SOR=1.179   GT:AD:DP:FT:GQ:PL       0/1:4,2:6:DP_filter:72:72,0,142
chr1    13868   .       A       G       32.65   MQ_filter       AC=1;AF=0.500;AN=2;BaseQRankSum=-0.967;DP=3;ExcessHet=0.0000;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=26.87;MQRankSum=0.967;QD=10.88;ReadPosRankSum=0.967;SOR=0.223      GT:AD:DP:FT:GQ:PL       0/1:1,2:3:DP_filter:18:40,0,18
chr1    16288   rs200736374     C       G       42.64   QD_filter       AC=1;AF=0.500;AN=2;BaseQRankSum=1.889;DB;DP=36;ExcessHet=0.0000;FS=1.817;MLEAC=1;MLEAF=0.500;MQ=42.58;MQRankSum=-2.014;QD=1.22;ReadPosRankSum=1.022;SOR=0.939   GT:AD:DP:GQ:PL  0/1:30,5:35:50:50,0,968
chr1    16298   rs200451305     C       T       311.64  PASS    AC=1;AF=0.500;AN=2;BaseQRankSum=1.497;DB;DP=30;ExcessHet=0.0000;FS=3.682;MLEAC=1;MLEAF=0.500;MQ=42.47;MQRankSum=-4.337;QD=12.47;ReadPosRankSum=2.029;SOR=1.388  GT:AD:DP:GQ:PL  0/1:13,12:25:99:319,0,385
chr1    16378   rs148220436     T       C       293.64  MQ_filter       AC=1;AF=0.500;AN=2;BaseQRankSum=-2.461;DB;DP=38;ExcessHet=0.0000;FS=5.153;MLEAC=1;MLEAF=0.500;MQ=36.39;MQRankSum=-3.036;QD=8.16;ReadPosRankSum=-0.747;SOR=1.190 GT:AD:DP:GQ:PL  0/1:22,14:36:99:301,0,599

arvkevi · 2023-05-24T23:56:07Z

Hey @RosaDeSa, one other thing that could be contributing to this is having too many missing AISNPs in the vcf. When you call predict, it should log a message indicating how many AISNPs were present in your vcf for a sample. It looks like this (from cell 23 of this notebook).

2021-09-20 06:25:34.289 | INFO     | ezancestry.process:_input_to_dataframe:276 - Sample has a valid genotype for 44 out of a possible 55 (80.0%)

Do you know how many AISNPs were in your input samples?
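As a rough cross-check independent of the ezancestry log, one could count how many rsIDs from an AISNP list appear in the ID column of a VCF. The sketch below is only an illustration: `count_aisnps_in_vcf` is a hypothetical helper (not part of ezancestry or snps), `rs0000001` is a made-up rsID, and the check looks at rsIDs only, so it ignores positions and does not verify that the genotype is valid.

```python
import io


def count_aisnps_in_vcf(vcf_lines, aisnp_rsids):
    """Return (found, total): how many of the given rsIDs occur in the VCF ID column."""
    wanted = set(aisnp_rsids)
    seen = set()
    for line in vcf_lines:
        # Skip meta/header lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        # The ID column (third field) may hold several IDs separated by ';'
        for rsid in fields[2].split(";"):
            if rsid in wanted:
                seen.add(rsid)
    return len(seen), len(wanted)


# Tiny in-memory VCF modelled on the records posted above (INFO shortened to '.')
example_vcf = io.StringIO(
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ta2\n"
    "chr1\t13813\t.\tT\tG\t67.64\tMQ_filter\t.\tGT\t0/1\n"
    "chr1\t13838\trs200683566\tC\tT\t64.64\tMQ_filter\t.\tGT\t0/1\n"
)

# 'rs0000001' is a placeholder rsID used only for this illustration
found, total = count_aisnps_in_vcf(example_vcf, ["rs200683566", "rs0000001"])
print(f"{found} of {total} rsIDs found")
```

If this count is much higher than the number ezancestry reports, the rsIDs are present but something else (for example, the positions) is preventing a match.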

RosaDeSa · 2023-05-25T09:16:28Z

Yes, you're right! I have 0 out of a possible 55 using the Kidd set and 1 of 127 using the Seldin set.
Do you think the problem is the reference I used to align the data (hg38)? Prediction matches the AISNPs by rsID and not by position, right?

arvkevi · 2023-05-25T17:33:38Z

Hmm, the merge is on both rsid AND position. Unfortunately, this requires a VCF annotated with rsids, with positions that match the hg19 positions from the .aisnps files.

You could try commenting out "chr" and "position_hg19" in this line, but I haven't looked at the hg19->hg38 liftover in about a year. So if you do this, you should see if any alleles changed.

I'll have to think about how ezancestry could support hg38. The easiest would probably be a --hg38 flag that uses new versions of the aisnps files. But I won't have time to get to this work for a little while.
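To illustrate why an hg38-aligned VCF can end up with 0 matching AISNPs, here is a minimal sketch in plain Python of joining on rsid, chromosome, and position versus joining on rsid alone. It is not ezancestry's actual merge code: the function `matched_rsids`, the rsIDs, and all coordinates are invented for the example, and the assumption is that the reference rows carry hg19 positions with unprefixed chromosome names.

```python
# Hypothetical AISNP reference rows: (rsid, chromosome, hg19 position)
aisnps = [
    ("rs0000001", "1", 1000),
    ("rs0000002", "2", 2000),
]

# Hypothetical sample rows from an hg38-aligned VCF: (rsid, chromosome, position, genotype).
# Positions differ from hg19 and the contigs carry a 'chr' prefix.
sample = [
    ("rs0000001", "chr1", 1050, "0/1"),
    ("rs0000002", "chr2", 2075, "1/1"),
]


def matched_rsids(aisnps, sample, use_position):
    """Return the AISNP rsIDs found in the sample under the chosen join key."""
    if use_position:
        # Strict key: rsid, chromosome, and position must all agree
        sample_keys = {(rsid, chrom, pos) for rsid, chrom, pos, _gt in sample}
        return [r for r, c, p in aisnps if (r, c, p) in sample_keys]
    # Relaxed key: rsid only
    sample_ids = {rsid for rsid, _chrom, _pos, _gt in sample}
    return [r for r, _c, _p in aisnps if r in sample_ids]


strict = matched_rsids(aisnps, sample, use_position=True)
relaxed = matched_rsids(aisnps, sample, use_position=False)
print(len(strict), "matches on rsid + chr + position")
print(len(relaxed), "matches on rsid only")
```

With the strict key nothing matches, because both the coordinates and the chromosome naming differ. Matching on rsid alone recovers both SNPs, but it does not confirm that the reference and alternate alleles are the same between builds, so those would still need to be checked.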

RosaDeSa closed this as completed May 9, 2023

RosaDeSa reopened this May 10, 2023
