So many Error messages: please help #8

Open

RenaeAtkinson opened this issue Apr 14, 2023 · 13 comments

@RenaeAtkinson

Directory ./ already exists. Writing into existing directory..
mkdir: cannot create directory ‘.//SCASA_testscasaHNVC02_20230414001259/’: File exists

Preparing for alignment..
Indexing reference..
Directory .//SCASA_testscasaHNVC02_20230414001259/0PRESETS//REF_INDEX/ already exists. Writing into existing directory..
Version Info: ### PLEASE UPGRADE SALMON ###

A newer version of salmon with important bug fixes and improvements is available.

The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.

Sign up for the salmon mailing list to hear about new versions, features and updates at:
https://oceangenomics.com/subscribe
[2023-04-14 00:12:59.520] [jLog] [warning] The salmon index is being built without any decoy sequences. It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.
[2023-04-14 00:12:59.520] [jLog] [info] building index
out : .//SCASA_testscasaHNVC02_20230414001259/0PRESETS//REF_INDEX/
[2023-04-14 00:12:59.527] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers

[2023-04-14 00:13:07.009] [puff::index::jointLog] [warning] Removed 236 transcripts that were sequence duplicates of indexed transcripts.
[2023-04-14 00:13:07.010] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the --keepDuplicates flag
[2023-04-14 00:13:07.012] [puff::index::jointLog] [info] Replaced 4 non-ATCG nucleotides
[2023-04-14 00:13:07.012] [puff::index::jointLog] [info] Clipped poly-A tails from 11,186 transcripts
wrote 76267 cleaned references
[2023-04-14 00:13:07.789] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2023-04-14 00:13:10.356] [puff::index::jointLog] [info] ntHll estimated 85097693 distinct k-mers, setting filter size to 2^31
Threads = 2
Vertex length = 31
Hash functions = 5
Filter size = 2147483648
Capacity = 2
Files:
.//SCASA_testscasaHNVC02_20230414001259/0PRESETS//REF_INDEX/ref_k31_fixed.fa

Round 0, 0:2147483648
Pass Filling Filtering
1 36 77
2 5 0
True junctions count = 277411
False junctions count = 422333
Hash table size = 699744
Candidate marks count = 4646414

Reallocating bifurcations time: 0
True marks count: 3337299
Edges construction time: 6

Distinct junctions = 277411

TwoPaCo::buildGraphMain:: allocated with scalable_malloc; freeing.
TwoPaCo::buildGraphMain:: Calling scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, 0);
allowedIn: 12
Max Junction ID: 318881
seen.size():2551057 kmerInfo.size():318882
approximateContigTotalLength: 66002535
counters for complex kmers:
(prec>1 & succ>1)=26025 | (succ>1 & isStart)=63 | (prec>1 & isEnd)=73 | (isStart & isEnd)=10
contig count: 433949 element count: 98078572 complex nodes: 26171

# of ones in rank vector: 433948

[2023-04-14 00:15:32.167] [puff::index::jointLog] [info] Starting the Pufferfish indexing by reading the GFA binary file.
[2023-04-14 00:15:32.167] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory .//SCASA_testscasaHNVC02_20230414001259/0PRESETS//REF_INDEX
size = 98078572

| Loading contigs | Time = 47.228 ms

size = 98078572

| Loading contig boundaries | Time = 25.94 ms

Number of ones: 433948
Number of ones per inventory item: 512
Inventory entries filled: 848
433948
[2023-04-14 00:15:32.408] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.
[2023-04-14 00:15:32.412] [puff::index::jointLog] [info] contig count for validation: 433,948
[2023-04-14 00:15:32.736] [puff::index::jointLog] [info] Total # of Contigs : 433,948
[2023-04-14 00:15:32.736] [puff::index::jointLog] [info] Total # of numerical Contigs : 433,948
[2023-04-14 00:15:32.756] [puff::index::jointLog] [info] Total # of contig vec entries: 3,427,302
[2023-04-14 00:15:32.756] [puff::index::jointLog] [info] bits per offset entry 22
[2023-04-14 00:15:32.870] [puff::index::jointLog] [info] Done constructing the contig vector. 433949
[2023-04-14 00:15:33.302] [puff::index::jointLog] [info] # segments = 433,948
[2023-04-14 00:15:33.303] [puff::index::jointLog] [info] total length = 98,078,572
[2023-04-14 00:15:33.331] [puff::index::jointLog] [info] Reading the reference files ...
[2023-04-14 00:15:34.093] [puff::index::jointLog] [info] positional integer width = 27
[2023-04-14 00:15:34.093] [puff::index::jointLog] [info] seqSize = 98,078,572
[2023-04-14 00:15:34.093] [puff::index::jointLog] [info] rankSize = 98,078,572
[2023-04-14 00:15:34.093] [puff::index::jointLog] [info] edgeVecSize = 0
[2023-04-14 00:15:34.093] [puff::index::jointLog] [info] num keys = 85,060,132
for info, total work write each : 2.331 total work inram from level 3 : 4.322 total work raw : 25.000
[Building BooPHF] 100 % elapsed: 0 min 8 sec remaining: 0 min 0 sec
Bitarray 445693632 bits (100.00 %) (array + ranks )
final hash 0 bits (0.00 %) (nb in final hash 0)
[2023-04-14 00:15:41.958] [puff::index::jointLog] [info] mphf size = 53.1308 MB
[2023-04-14 00:15:42.025] [puff::index::jointLog] [info] chunk size = 49,039,286
[2023-04-14 00:15:42.025] [puff::index::jointLog] [info] chunk 0 = [0, 49,039,286)
[2023-04-14 00:15:42.025] [puff::index::jointLog] [info] chunk 1 = [49,039,286, 98,078,542)
[2023-04-14 00:15:53.934] [puff::index::jointLog] [info] finished populating pos vector
[2023-04-14 00:15:53.934] [puff::index::jointLog] [info] writing index components
[2023-04-14 00:15:54.455] [puff::index::jointLog] [info] finished writing dense pufferfish index
[2023-04-14 00:15:54.494] [jLog] [info] done building index
Finnished indexing reference..
Begins pseudo-alignment..
nohup: redirecting stderr to stdout
Congratulations! Pseudo-alignment has completed in 30 seconds!
Scasa quantification has started..
Begin Scasa quantification for sample SRR10340946..
Error in file(con, "r") : cannot open the connection
Calls: readLines -> file
In addition: Warning message:
In file(con, "r") :
cannot open file './/SCASA_testscasaHNVC02_20230414001259/1ALIGN//SRR10340946_alignout/alevin/bfh.txt': No such file or directory
Execution halted
Loading required package: iterators
Loading required package: parallel
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file '/network/rit/lab/conklinlab/Renae/HNVC/HNVC02/SRR10340946/SCASA_testscasaHNVC02_20230414001259/2QUANT/SRR10340946_quant/Sample_eqClass.RData', probable reason 'No such file or directory'
Execution halted
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file './/SCASA_testscasaHNVC02_20230414001259/2QUANT//SRR10340946_quant//scasa_isoform_expression.RData', probable reason 'No such file or directory'
Execution halted
Congratulations! Scasa single cell RNA-Seq transcript quantification has completed in 30 seconds!
All done!

@nghiavtr (Collaborator)

Hi @RenaeAtkinson,

Thank you for using Scasa in your research.

The error occurs at the alevin mapping step; most likely it cannot find the input fastq files. Please check that the file names are in the right format. Note that the names of the fastq files must contain "R1" and "R2"; please see the details here: https://github.com/eudoraleer/scasa/wiki#6-input-fastq-files
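
A quick way to check is to list the files by those patterns (a minimal sketch; INPUT_DIR is a placeholder for your --in directory):

INPUT_DIR=/path/to/your/fastq_dir   # placeholder: replace with your --in directory
ls "$INPUT_DIR"/*R1*.fastq* "$INPUT_DIR"/*R2*.fastq*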

Best,
Nghia

@RenaeAtkinson (Author)

RenaeAtkinson commented Apr 14, 2023 via email

@nghiavtr (Collaborator)

Hi @RenaeAtkinson,

I don't see a clear issue in your command, except that "–project" (an en dash) should be written as "--project" (two hyphens). Most of the parameters in your command are left at their default values, so could you try again with the shorter version below:

scasa --in /network/rit/lab/conklinlab/Renae/HNVC/HNVC02/SRR10340946/ \
    --fastq SRR10340946_R1.fastq,SRR10340946_R2.fastq \
    --out /network/rit/lab/conklinlab/Renae/SCASA/HNVC02/ \
    --ref /network/rit/lab/conklinlab/Renae/SCASA/refMrna.fa \
    --whitelist /network/rit/lab/conklinlab/Renae/HNVC/V2/737K-august-2016.txt \
    --tech 10xv2 \
    --nthreads 32

Best,
Nghia

@RenaeAtkinson (Author)

RenaeAtkinson commented Apr 18, 2023 via email

@nghiavtr (Collaborator)

Hi,
The error indicates that the alignment by Alevin was not performed.
My first thought was that the input filenames are incorrect, but that is strange because they appear to be fine.

Can you test the issue by renaming SRR10340946_R1.fastq to Sample_01_S1_L001_R1_001.fastq and SRR10340946_R2.fastq to Sample_01_S1_L001_R2_001.fastq, as in the sample files of Scasa? (A sketch of the rename commands is below.)
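
For example, something like this (a minimal sketch; run it in the directory that holds the fastq files):

mv SRR10340946_R1.fastq Sample_01_S1_L001_R1_001.fastq
mv SRR10340946_R2.fastq Sample_01_S1_L001_R2_001.fastq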

Another possibility is that the R1 and R2 files do not contain the expected content (one should hold the barcode+UMI and the other the sequence reads); in that case, we just swap the file names.

Please try these and let me know if either of them works, thanks!

Nghia

@RenaeAtkinson (Author)

RenaeAtkinson commented Apr 18, 2023 via email

@nghiavtr (Collaborator)

Hi,

It is really strange. Can you paste the first few lines of R1 and R2 here?
And if possible, can you send me the files, or a subset of reads from them? I will try to reproduce the error by running Scasa on them.

Nghia

@RenaeAtkinson (Author)

RenaeAtkinson commented Apr 19, 2023 via email

@nghiavtr (Collaborator)

nghiavtr commented Apr 21, 2023

Hi @RenaeAtkinson,

Well, I cannot reproduce your error; please see the commands I tried below. So it seems clear that the issue is not with the input data format.

I guess you might have missed a step, for example forgetting to add the paths for scasa or salmon alevin (export PATH and export LD_LIBRARY_PATH).

Nghia


##################################################################
# 1. Download scasa:
##################################################################
wget https://github.com/eudoraleer/scasa/releases/download/scasa.v1.0.0/scasa_v1.0.0.tar.gz
tar -xzvf scasa_v1.0.0.tar.gz
export PATH=$PWD/scasa:$PATH

##################################################################
# 2. Download salmon alevin:
##################################################################
wget https://github.com/COMBINE-lab/salmon/releases/download/v1.4.0/salmon-1.4.0_linux_x86_64.tar.gz
tar -xzvf salmon-1.4.0_linux_x86_64.tar.gz
export PATH=$PWD/salmon-latest_linux_x86_64/bin:$PATH
export LD_LIBRARY_PATH=$PWD/salmon-latest_linux_x86_64/lib:$LD_LIBRARY_PATH

##################################################################
# 3. Download UCSC hg38 cDNA fasta reference:
##################################################################
mkdir Annotation
cd Annotation
wget https://www.dropbox.com/s/xoa6yl562a5lv35/refMrna.fa.gz
refPath=$PWD/refMrna.fa.gz

wget https://github.com/10XGenomics/cellranger/blob/master/lib/python/cellranger/barcodes/737K-august-2016.txt
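# note: this blob URL returns the GitHub HTML page rather than the raw whitelist;
# appending "?raw=true" to the URL (or using the raw.githubusercontent.com form)
# downloads the actual 737K-august-2016.txt content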
whitelistFile=$PWD/737K-august-2016.txt

cd ..

##################################################################
# 4. Download the CITE-seq RNA samples:
##################################################################

mkdir CiteSeqData
cd CiteSeqData

### use sratools to download the sample
# module load sratools/3.0.0
prefetch SRR10340946
cd SRR10340946
fastq-dump --gzip --split-3 SRR10340946.sra

#change the name
mv SRR10340946_1.fastq.gz SRR10340946_L001_R1_001.fastq.gz
mv SRR10340946_2.fastq.gz SRR10340946_L001_R2_001.fastq.gz

InputDir=$PWD
cd ..

#number of threads
threadNum=$(nproc)

#run scasa
scasa --in $InputDir --fastq SRR10340946_L001_R1_001.fastq.gz,SRR10340946_L001_R2_001.fastq.gz --ref $refPath  --tech 10xv2 --nthreads $threadNum --whitelist $whitelistFile --out ScasaOut_SRR10340946

@RenaeAtkinson (Author)

RenaeAtkinson commented May 18, 2023 via email

@nghiavtr (Collaborator)

Hi @RenaeAtkinson,

If you see the message 'Error in file(con, "r") : cannot open the connection', it definitely means the program cannot find the file, so it is not an issue within Scasa itself.
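
A quick way to confirm this is to check whether alevin actually produced the bfh.txt file (a minimal sketch; the path is taken from your log, so adjust the timestamped directory name to match your run):

ls .//SCASA_testscasaHNVC02_20230414001259/1ALIGN//SRR10340946_alignout/alevin/bfh.txt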

I have tried to run Scasa with your sample SRR10340946 on my Linux computer, and it worked without error. I provided the commands previously (but forgot to put them in code format, sorry about that), so I am posting them again below. I use sra-tools to download the SRR10340946 data. You just need to copy and paste the command lines and it should work.

Nghia

##################################################################
# 1. Download scasa:
##################################################################
wget https://github.com/eudoraleer/scasa/releases/download/scasa.v1.0.0/scasa_v1.0.0.tar.gz
tar -xzvf scasa_v1.0.0.tar.gz
export PATH=$PWD/scasa:$PATH

##################################################################
# 2. Download salmon alevin:
##################################################################
wget https://github.com/COMBINE-lab/salmon/releases/download/v1.4.0/salmon-1.4.0_linux_x86_64.tar.gz
tar -xzvf salmon-1.4.0_linux_x86_64.tar.gz
export PATH=$PWD/salmon-latest_linux_x86_64/bin:$PATH
export LD_LIBRARY_PATH=$PWD/salmon-latest_linux_x86_64/lib:$LD_LIBRARY_PATH

##################################################################
# 3. Download UCSC hg38 cDNA fasta reference:
##################################################################
mkdir Annotation
cd Annotation
wget https://www.dropbox.com/s/xoa6yl562a5lv35/refMrna.fa.gz
refPath=$PWD/refMrna.fa.gz

wget https://github.com/10XGenomics/cellranger/blob/master/lib/python/cellranger/barcodes/737K-august-2016.txt
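# note: this blob URL returns the GitHub HTML page rather than the raw whitelist;
# appending "?raw=true" to the URL (or using the raw.githubusercontent.com form)
# downloads the actual 737K-august-2016.txt content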
whitelistFile=$PWD/737K-august-2016.txt

cd ..

##################################################################
# 4. Download the CITE-seq RNA samples:
##################################################################

mkdir CiteSeqData
cd CiteSeqData

### use sratools to download the sample
# module load sratools/3.0.0
prefetch SRR10340946
cd SRR10340946
fastq-dump --gzip --split-3 SRR10340946.sra

#change the name
mv SRR10340946_1.fastq.gz SRR10340946_L001_R1_001.fastq.gz
mv SRR10340946_2.fastq.gz SRR10340946_L001_R2_001.fastq.gz

InputDir=$PWD
cd ..

#number of threads
threadNum=$(nproc)

#run scasa
scasa --in $InputDir --fastq SRR10340946_L001_R1_001.fastq.gz,SRR10340946_L001_R2_001.fastq.gz --ref $refPath  --tech 10xv2 --nthreads $threadNum --whitelist $whitelistFile --out ScasaOut_SRR10340946

@RenaeAtkinson (Author)

RenaeAtkinson commented May 18, 2023 via email

@nghiavtr (Collaborator)

Hi @RenaeAtkinson,

The first error, from mkdir, can be ignored; it is harmless.
The second error indicates that salmon alevin did not run properly, because no bfh.txt file exists, so yes, this is the main issue.
I have no experience with running Salmon via conda, but usually conda is not needed to run Salmon. I also have not tried salmon version 1.10.1, so I am not sure whether it changes any settings. I suggest you use the same salmon version that I tested (see the sketch below).
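
For reference, a minimal sketch of installing the tested salmon v1.4.0 binary and confirming the version (the same download commands as in the script above; salmon --version simply prints the version string):

wget https://github.com/COMBINE-lab/salmon/releases/download/v1.4.0/salmon-1.4.0_linux_x86_64.tar.gz
tar -xzvf salmon-1.4.0_linux_x86_64.tar.gz
export PATH=$PWD/salmon-latest_linux_x86_64/bin:$PATH
export LD_LIBRARY_PATH=$PWD/salmon-latest_linux_x86_64/lib:$LD_LIBRARY_PATH
salmon --version   # should report salmon 1.4.0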

Nghia
