An implementation of the LSH (Locality-Sensitive Hashing) and Finesse algorithms, designed to find similar data based on their hashes
Updated Mar 22, 2024 (C++)
Lab solutions for Analysis of Massive Datasets ("Analiza velikih skupova podataka") course at FER 2020/21
The extended version of simhash supports fingerprint extraction of documents and images.
🐾 Create a behavioral fingerprint based on your zsh command line history
Analysis of Massive Datasets FER labs
Knowledge extraction through Data Analysis, including Locality Sensitive Hashing (LSH).
Event coding using Spark and Stanford CoreNLP
Implementation of the algorithms presented in the course Analysis of Massive Datasets ("Analiza velikih skupova podataka", AVSP)
Proof-of-concept for measuring similarity of phoneme sequences using locality sensitive hashing (LSH).
A deduplication library built on [simhash](https://github.com/yanyiwu/simhash).
Documents my master's thesis work on building a continuous, topical web crawler based on Mercator (1999)
Implements the simhash technique to estimate duplicate pages in a given dataset. University project for Information Retrieval (Spring 2015)
A barebones implementation of the simhash data sketching algorithm.
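For readers unfamiliar with the technique these repositories share, here is a minimal sketch of a simhash fingerprint in Python. It is not taken from any of the listed projects: the function names `simhash` and `hamming_distance`, the use of MD5 as the per-token hash, and the 64-bit width are all assumptions chosen for illustration.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a simhash fingerprint (an int) for an iterable of string tokens.

    Assumption: each token is hashed with MD5 and truncated to `bits` bits
    (bits must be a multiple of 8 and at most 128); real projects may use
    other hash functions and may weight tokens.
    """
    counts = [0] * bits
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit of the fingerprint is set when its votes are positive.
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog".split()
    doc_b = "the quick brown fox jumped over the lazy dog".split()
    print(hamming_distance(simhash(doc_a), simhash(doc_b)))
```

Documents that share most of their tokens tend to produce fingerprints with a small Hamming distance, which is what makes the sketch usable for near-duplicate detection.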