Implement uniwig #1

Open
nleroy917 opened this issue Nov 7, 2023 · 10 comments

@nleroy917
Member

To enable universe creation, we will need to port uniwig to this package and expose it as a CLI, a library crate, and ideally a Python interface.

@nleroy917 nleroy917 added the new tool Request to implement a new tool label Nov 7, 2023
@nleroy917
Member Author

@edward9065 is going to try to implement this. @nsheff he will need help with the algorithm if at all possible. Is there pseudocode anywhere? Or an algorithm figure?

@donaldcampbelljr
Member

Per discussion, much of the key code that needs to be ported is here: https://github.com/databio/uniwig/blob/master/src/uniwig.cpp

I noticed that uniwig relies on a C library, libBigWig, but there appears to be a Rust-based tool that is available (and in preprint!) that may help with this port:

https://github.com/jackh726/bigtools
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10871241/

donaldcampbelljr added a commit that referenced this issue Mar 15, 2024
donaldcampbelljr added a commit that referenced this issue Mar 16, 2024
donaldcampbelljr added a commit that referenced this issue Mar 16, 2024
donaldcampbelljr added a commit that referenced this issue Mar 18, 2024
donaldcampbelljr added a commit that referenced this issue Mar 18, 2024
@donaldcampbelljr
Member

Opened a PR to begin reviewing WIP.

Where this is currently 'stuck':

  • I attempted to use bigtools as a replacement for the C library libBigWig. However, when I try to write to a bigWig file after using the built-in BedParser, I get a type mismatch (Value vs BedEntry). It appears that I must parse a bedGraph file to get the appropriate type before writing to a bigWig file.

Example code:

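        // Chromosome name -> chromosome length.
        let mut chrom_map = HashMap::new();
        chrom_map.insert("chr17".to_string(), 83257441);

        // Parsing a BED file with the built-in BedParser yields BedEntry items.
        let vals_iter = BedParser::from_bed_file(file);
        let vals = BedParserStreamingIterator::new(vals_iter, true);

        let mut out = BigWigWrite::create_file(file_names[0].clone());

        // The type mismatch (Value vs BedEntry) is reported on this write.
        out.write(chrom_map, vals, runtime).unwrap();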

The original code bins regions using smoothFixedStartEndBW before calling the libBigWig function bwAddIntervalSpanSteps to write to a bigWig file. I had hoped to replicate that here. However, I may need to create a new struct that implements the required traits and yields Value items so that the BigWigWrite functions accept it.

@nleroy917
Member Author

> However, when attempting to write to a bigWig file after using the built in BedParser, I get a type mismatch (Value vs BedEntry)

I need to really look into it, but this kind of sounds like an error in their library? Or should we implement the Write trait for the BedEntry structs? I'm probably not understanding fully, though.

@donaldcampbelljr
Member

Per discussion, we should rethink writing to a bigWig file, as we do not need to use these files in the genome browser. Instead, this implementation should focus on taking either a combined BED file or a directory of BED files and creating something similar to a wiggle file, i.e. do not worry about capturing the libBigWig functionality or implementing items from bigtools. We should investigate using our own gtok file format or potentially a zarr format.

For inspiration of basic algorithm in Rust: https://github.com/databio/rustwig/blob/master/src/exact.rs
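
As a rough illustration of the "combined bed file or a directory of bed files" input described above, here is a minimal sketch using only the Rust standard library. The function name `collect_bed_files` and the choice to filter on a `.bed` extension are assumptions made for this example, not something taken from uniwig or genimtools.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve the input into a list of BED files (hypothetical helper).
/// A directory yields every regular file ending in `.bed`, sorted by path;
/// any other path is returned unchanged as a single-element list.
pub fn collect_bed_files(input: &Path) -> io::Result<Vec<PathBuf>> {
    if input.is_dir() {
        let mut files = Vec::new();
        for entry in fs::read_dir(input)? {
            let path = entry?.path();
            let is_bed = path.extension().and_then(|e| e.to_str()) == Some("bed");
            if path.is_file() && is_bed {
                files.push(path);
            }
        }
        // read_dir gives no ordering guarantee, so sort for reproducibility.
        files.sort();
        Ok(files)
    } else {
        Ok(vec![input.to_path_buf()])
    }
}
```

Sorting the paths keeps the processing order stable across platforms, which may matter if per-file results are later merged.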

@donaldcampbelljr
Member

I've ported the core functionality from the above rustwig repository. genimtools::uniwig can now count starts and/or ends when given a single, sorted BED file.
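
For readers following along, counting starts and ends can be sketched roughly as below. This is not the genimtools or rustwig code; the `Counts` struct and `count_starts_ends` function are hypothetical names, and the sketch tallies into ordered maps, so unlike the ported code it does not depend on the input being sorted.

```rust
use std::collections::BTreeMap;

/// Per-chromosome tallies: position -> number of regions starting/ending there.
#[derive(Debug, Default, PartialEq)]
pub struct Counts {
    pub starts: BTreeMap<u32, u32>,
    pub ends: BTreeMap<u32, u32>,
}

/// Count region starts and ends per chromosome from BED-formatted text.
/// Empty lines, `#` comments, and lines without three parsable columns
/// (chrom, start, end) are skipped.
pub fn count_starts_ends(bed: &str) -> BTreeMap<String, Counts> {
    let mut by_chrom: BTreeMap<String, Counts> = BTreeMap::new();
    for line in bed.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut cols = line.split_whitespace();
        let (Some(chrom), Some(start), Some(end)) = (cols.next(), cols.next(), cols.next())
        else {
            continue;
        };
        let (Ok(start), Ok(end)) = (start.parse::<u32>(), end.parse::<u32>()) else {
            continue;
        };
        let counts = by_chrom.entry(chrom.to_string()).or_default();
        *counts.starts.entry(start).or_insert(0) += 1;
        *counts.ends.entry(end).or_insert(0) += 1;
    }
    by_chrom
}
```

A dense per-base array would be closer to a wiggle track; the sparse map is used here only to keep the example short.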

We should determine what output file format we want. I believe this should take priority over covering the other gaps (sorted vs. unsorted input, reading a list of BED files instead of a single one, etc.).

We've discussed implementing zarr, though I haven't yet looked at the various Rust implementations to check their maturity.
https://zarr.dev/implementations/

This project had a release as recently as March 2024: https://github.com/LDeakin/zarrs
However, their GitHub page warns that the repository is not production-ready.

A simple, short-term option could be to make the output BED-like, similar to a bedGraph file, e.g.

file1.unibed

chromA  chromStartA  countValue
chromA  chromStartB  countValue

Open to other suggestions, especially if there is already an existing file format that makes more sense.
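
To make the BED-like layout proposed above concrete, here is a minimal sketch of a writer for it. The function name `write_unibed`, the tab separator, and the position-to-count map as input are all assumptions for illustration; the thread does not fix any of these details.

```rust
use std::collections::BTreeMap;
use std::io::{self, Write};

/// Write one `chrom<TAB>position<TAB>count` line per entry (hypothetical format).
/// A BTreeMap iterates in key order, so lines come out sorted by position.
pub fn write_unibed<W: Write>(
    out: &mut W,
    chrom: &str,
    counts: &BTreeMap<u32, u32>,
) -> io::Result<()> {
    for (pos, count) in counts {
        writeln!(out, "{chrom}\t{pos}\t{count}")?;
    }
    Ok(())
}
```

Taking any `Write` implementor means the same function can target a file, standard output, or an in-memory buffer in tests.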

@nleroy917
Member Author

This is great! Are you ready for a review of the code?

@donaldcampbelljr
Member

Not quite yet. Earlier today, we discussed simply converting these to wig files in the short term, so I'll first look into writing the count arrays to a file. Once that works, this should be ready to merge into dev as a first pass.
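
Writing a count array as a wig file could look roughly like the sketch below, which emits the fixedStep flavor of the wiggle format: one declaration line followed by one value per line. The function name `write_fixed_step_wig` and its parameters are hypothetical. Note that wiggle positions are 1-based while BED starts are 0-based, so the caller is assumed to pass an already converted `start`.

```rust
use std::io::{self, Write};

/// Write `values` as a fixedStep wiggle block (hypothetical helper).
/// `start` is written as given; wiggle coordinates are 1-based.
pub fn write_fixed_step_wig<W: Write>(
    out: &mut W,
    chrom: &str,
    start: u32,
    step: u32,
    values: &[u32],
) -> io::Result<()> {
    // Declaration line, then one data value per line.
    writeln!(out, "fixedStep chrom={chrom} start={start} step={step}")?;
    for v in values {
        writeln!(out, "{v}")?;
    }
    Ok(())
}
```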

@nleroy917
Member Author

Gotcha 👍🏼 Just lmk

@nleroy917 nleroy917 mentioned this issue Apr 22, 2024
3 tasks