Big data homework solutions
-
Updated
Jun 21, 2017 - Python
Big data homework solutions
Tool to analyze news from a dataset.
Aurora karton for calculating minhash from input dataset.
An implementation of the minhash algorithm in golang
Rust implementation of alignment-free similarity estimation methods.
University work. Approximate aligner for long DNA sequences. Estimates Jaccard similarity from k-mers via minimizers and MinHash, then uses it as a sequence identity proxy.
Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitve Hashing
Software to identify known plasmid sequence from metagenomic assembly using Minhash
Aurora karton for similiarity matching.
This repository contains code and analysis for a homework assignment on recommendation systems and clustering algorithms in Python. Implements techniques like minhash, LSH, feature engineering, dimensionality reduction, K-means and DBSCAN clustering.
Probability Methods for Informatics Engineering | UA 2018/2019
similarity of the texts (Jaccard Similarity, Minhash, LSH)
Finding Similar Items: Textually Similar Documents
An implementation of the MinHashing algorithm in C using POSIX threads.
Network Anomaly Detection Using Probabilistic Data Structures
Add a description, image, and links to the minhash topic page so that developers can more easily learn about it.
To associate your repository with the minhash topic, visit your repo's landing page and select "manage topics."