Big data homework solutions
Updated Jun 21, 2017 - Python
Aurora karton for calculating minhash from input dataset.
Rust implementation of alignment-free similarity estimation methods.
University work. Approximate aligner for long DNA sequences. Estimates Jaccard similarity from k-mers via minimizers and MinHash, then uses it as a sequence identity proxy.
MPEI Project - Searches for repeated news (using a Bloom filter) and similar news (using MinHash) from a news API.
This repository contains code and analysis for a homework assignment on recommendation systems and clustering algorithms in Python. Implements techniques like minhash, LSH, feature engineering, dimensionality reduction, K-means and DBSCAN clustering.
Finding Similar Items: Textually Similar Documents
SpellChecker: an application to check for spelling errors.
Probability Methods for Informatics Engineering | UA 2018/2019
Similarity of texts (Jaccard similarity, MinHash, LSH)
Assignment-2 for CS F469 Information Retrieval Course
(WIP) HTTP server that deploys and distributes Vokter (https://github.com/vokter/vokter) through a REST API.
Python package for fast MinHash calculation and operations
An implementation of the MinHashing algorithm in C using POSIX threads.
Implementation of the paper "Finding Highly Correlated Pairs with Powerful Pruning" in Java.
Practical assignment for the Probabilistic Methods for Informatics Engineering course (Métodos Probabilísticos para Engenharia Informática), UA 2019/2020
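Several of the projects listed here estimate Jaccard similarity with MinHash. As a rough illustration of the idea, and not code taken from any of these repositories, the sketch below builds a signature from character shingles and compares two signatures. The function names (shingles, minhash_signature, estimate_jaccard), the choice of BLAKE2b as a seeded hash, and the parameter defaults are all assumptions made for this example.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-grams (shingles) found in text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(item, seed):
    # Hypothetical helper: a 64-bit hash of `item`, varied by `seed`.
    # BLAKE2b is used here only because it accepts a salt; any family of
    # independent hash functions would serve the same purpose.
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    """For each of num_hashes seeded hash functions, keep the minimum hash."""
    if not items:
        raise ValueError("cannot build a MinHash signature of an empty set")
    return [min(_seeded_hash(x, seed) for x in items)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B|, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = shingles("the quick brown fox jumps over the lazy dog")
    b = shingles("the quick brown fox jumped over a lazy dog")
    sig_a = minhash_signature(a, num_hashes=256)
    sig_b = minhash_signature(b, num_hashes=256)
    print("exact:   ", exact_jaccard(a, b))
    print("estimate:", estimate_jaccard(sig_a, sig_b))
```

The estimate works because, for a single random hash function, the probability that two sets share the same minimum hash equals their Jaccard similarity; averaging over more hash functions reduces the variance at the cost of a longer signature.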