Improve statistics output #1619

Merged


nuno-agostinho (Contributor) commented Feb 21, 2024

This PR fixes two issues with statistics output:

  1. When VEP is run with only a FASTA and a GTF file and with stats enabled, it prints the following error:
Can't use an undefined value as a HASH reference at /hps/software/users/ensembl/repositories/nuno/ensembl-vep/modules/Bio/EnsEMBL/VEP/Stats.pm line 687.

This PR fixes the error by printing "No data version information available." instead of the data version table when no such information exists.

  2. When using --input_data, the statistics (in both the HTML and TXT files) show:
Input file	*Bio::EnsEMBL::VEP::Runner::IN

This PR reports the --input_data value instead when --input_file is *Bio::EnsEMBL::VEP::Runner::IN.
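The two fixes can be sketched roughly as follows. This is only a Python illustration of the logic described above, not the actual Perl code in Stats.pm: the function names, the dict-based config, and the `version_data` mapping are hypothetical.

```python
# Hypothetical sketch of the two Stats fixes; not VEP's actual Perl code.

# Glob name that, per this PR, shows up as --input_file when --input_data is used
RUNNER_INPUT_GLOB = "*Bio::EnsEMBL::VEP::Runner::IN"


def data_version_section(version_data):
    """Render the [Data version] block of the TXT report.

    version_data is a hypothetical mapping of source -> version, or None
    when VEP runs with only FASTA and GTF files.
    """
    lines = ["[Data version]"]
    if not version_data:
        # Fix 1: print a message instead of dereferencing missing data
        lines.append("No data version information available.")
    else:
        for source, version in sorted(version_data.items()):
            lines.append(f"{source}\t{version}")
    return "\n".join(lines)


def input_line(config):
    """Render the input row of the [VEP run statistics] block."""
    input_file = config.get("input_file")
    if input_file == RUNNER_INPUT_GLOB and config.get("input_data") is not None:
        # Fix 2: report the raw --input_data string instead of the glob name
        return f"Input data\t{config['input_data']}"
    return f"Input file\t{input_file}"


print(data_version_section(None))
print(input_line({"input_file": RUNNER_INPUT_GLOB,
                  "input_data": "21 25587758 rs116645811 G A"}))
```

The key design point in both cases is checking the input before formatting it: an absent version table becomes an explicit message, and the internal filehandle name is swapped for the user-facing input string.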

Testing

To test, use the FASTA and GTF files in the t/testdata folder. The following command should no longer print the error above:

vep --id "21 25587758 rs116645811 G A" \
    --fasta t/testdata/fasta/Homo_sapiens.GRCh38.toplevel.test.fa \
    --gtf t/testdata/custom/test.gtf.gz --stats_html --stats_text

This produces two summary files (HTML and TXT). When using --input_file, the TXT file should report the following:

[VEP run statistics]
VEP version (API)        112.0 (112)
Annotation sources      Custom: t/testdata/custom/test.gtf.gz (overlap)
Species homo_sapiens
Command line options    --fasta t/testdata/fasta/Homo_sapiens.GRCh38.toplevel.test.fa --force_ove>
Start time      2024-02-21 15:17:28
End time        2024-02-21 15:17:28
Run time        0 seconds
Input file	    i.txt
Output file     variant_effect_output.txt

[Data version]
No data version information available.

[General statistics]
Lines of input read     1
Variants processed      1

And the same TXT file when using --input_data:

[VEP run statistics]
VEP version (API)	 112.0 (112)
Annotation sources	Custom: t/testdata/custom/test.gtf.gz (overlap)
Species	homo_sapiens
Command line options	--fasta t/testdata/fasta/Homo_sapiens.GRCh38.toplevel.test.fa --force_overwrite --gtf t/testdata/custom/test.gtf.gz --input_data 21 25587758 rs116645811 G A --stats_html --stats_text
Start time	2024-02-21 15:29:01
End time	2024-02-21 15:29:02
Run time	1 seconds
Input data	21 25587758 rs116645811 G A
Output file	variant_effect_output.txt

[Data version]
No data version information available.

[General statistics]
Lines of input read	1
Variants processed	1
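To check a TXT report programmatically rather than by eye, a small parser can help. This is a minimal Python sketch, assuming the report consists of "[Section]" header lines followed by tab-separated rows, as in the --input_data example above (the --input_file example appears space-aligned, likely from a terminal paste). `parse_stats_txt` is a hypothetical helper, not part of VEP.

```python
def parse_stats_txt(text):
    """Split a TXT stats report into {section: {key: value}}.

    Assumes "[Section]" header lines and tab-separated key/value rows.
    Rows without a tab (e.g. a message line) are stored with an empty value.
    Lines before the first header are ignored.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
        elif current is not None:
            key, _, value = line.partition("\t")
            sections[current][key.strip()] = value.strip()
    return sections


# Sample built from the --input_data output shown above
sample = (
    "[VEP run statistics]\n"
    "Input data\t21 25587758 rs116645811 G A\n"
    "Output file\tvariant_effect_output.txt\n"
    "\n"
    "[Data version]\n"
    "No data version information available.\n"
    "\n"
    "[General statistics]\n"
    "Lines of input read\t1\n"
    "Variants processed\t1\n"
)

stats = parse_stats_txt(sample)
# The input row should name the raw input data, not the Runner filehandle
assert "Input data" in stats["VEP run statistics"]
# The data version section should carry the fallback message
assert "No data version information available." in stats["Data version"]
```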

The HTML file should look like this if using --input_file:

[Screenshot: HTML summary using --input_file (2024-02-21 15:27:52)]

And the HTML file should look like this if using --input_data:

[Screenshot: HTML summary using --input_data (2024-02-21 15:29:13)]

nuno-agostinho changed the title from "Fix bug when running Stats without version data" to "Fix Stats message when there is no version data table" Feb 21, 2024
nuno-agostinho changed the title from "Fix Stats message when there is no version data table" to "Improve statistics output" Feb 21, 2024
dglemos self-requested a review February 21, 2024 15:58
dglemos self-assigned this Feb 21, 2024
dglemos (Contributor) left a comment:

The updates look good:

  • the stats file displays the input data instead of *Bio::EnsEMBL::VEP::Runner::IN
  • the message is clear if no data version is available

All tests passed when I ran them locally; however, Travis is failing in t/AnnotationSource_File_GTF.t.

dglemos merged commit b06c889 into Ensembl:postreleasefix/112 Apr 8, 2024
1 check passed
dglemos (Contributor) commented Apr 8, 2024

Merged into release/112 and main
