deduplication
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Entity resolution is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
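As a rough illustration of the task described above, the following minimal sketch links records that lack a shared identifier by comparing normalized names. The record ids, the names, the `normalize` and `find_matches` helpers, and the 0.85 threshold are all invented for this example; real entity-resolution tools typically add blocking, several comparison fields, and trained weights.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(records: dict, threshold: float = 0.85) -> list:
    """Return pairs of record ids whose names meet the (assumed) threshold."""
    matches = []
    # Naive all-pairs comparison: fine for a demo, quadratic on real data.
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2):
        if similarity(name_a, name_b) >= threshold:
            matches.append((id_a, id_b))
    return matches


# Hypothetical records from three sources with no common key.
records = {
    "crm-1": "Jane A. Doe",
    "web-7": "jane a doe",
    "db-42": "John Smith",
}
matches = find_matches(records)
print(matches)  # [('crm-1', 'web-7')]
```

The all-pairs loop keeps the sketch short; on larger data sets a blocking step (comparing only records that share, say, a postcode or name prefix) is the usual way to avoid the quadratic cost.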
Here are 412 public repositories matching this topic...
A UI application for File Deduplication using Hashing
Updated Jan 18, 2018 - Java
Blazing Fast Petabyte Scale Static Web Server + Tools. Serve Billion Files from an Indexed, Compressed and Deduplicated Archive.
Updated Feb 19, 2019 - PHP
Efficiently import pictures while handling duplicates gracefully
Updated Sep 20, 2022 - Haskell
A workflow template for deduplication and record linkage using the Dedupe library
Updated Jun 30, 2020 - Jupyter Notebook
Deduplicate Google Calendar events that were created by Fastmail import
Updated Feb 1, 2022 - Python
ATBU Cloud/Local Backup & File Integrity/Duplication Management Utility
Updated Nov 10, 2023 - Python
📈 The benchmarking and graph generation code for my blog post!
Updated Jun 20, 2024 - JavaScript
A general purpose deduplication framework
Updated Mar 14, 2019 - Java
Super simple list-based password guesser.
Updated May 15, 2018 - C#
Project for helping brother in finding duplicates in his photos directory.
Updated Apr 26, 2021 - Java
A collection of image deduplication repositories
Updated Jan 15, 2019 - Python
A simple tool for cataloging/deduplication/other backup preparation tasks.
Updated Aug 21, 2019 - C
Big Data Analysis
Updated Feb 23, 2020 - Python
A tool to enrich any OCDM compliant Knowledge Graph, finding new identifiers and deduplicating entities.
Updated Apr 12, 2021 - Python
Implementation of text classification, duplicate question recognition and text deduplication in Python.
Updated Jan 23, 2021 - Python
DeDuplicationKit: Advanced File Storage Deduplication
Updated Mar 31, 2023 - C++
Research project on image deduplication
Updated Nov 7, 2023 - Python
A pure Rust library for parsing and generating zchunk files
Updated Dec 28, 2023 - Rust
Created by Halbert L. Dunn
Released 1946
- Followers: 37
- Organization: entity-resolution