data-profiling

Here are 75 public repositories matching this topic...

great-expectations / great_expectations

Always know what to expect from your data.

Updated Jul 16, 2024
Python

open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

Updated Jul 16, 2024
TypeScript

ydataai / ydata-profiling

One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.

Updated Jul 16, 2024
Python

dqops / dqo

Data quality and observability platform for the whole data lifecycle, from profiling new data sources to full automation with data observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.

monitoring data-quality-checks data-quality data-profiling data-ops data-quality-measurement data-quality-monitoring data-quality-report data-observability

Updated Jul 15, 2024
Java

DataKitchen / data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.

Updated Jul 15, 2024
Python

Desbordante / desbordante-core

Desbordante is a high-performance data profiler that discovers many different patterns in data using various algorithms. It can also run data cleaning scenarios built on these algorithms. Desbordante has a console version and an easy-to-use web application.

data-science data-mining exploratory-data-analysis tabular-data feature-selection data-engineering feature-extraction data-analytics knowledge-discovery data-wrangling data-preprocessing feature-engineering spreadsheets data-exploration data-mining-algorithms data-cleaning data-profiling anomaly-detection data-cleansing correlations

Updated Jul 16, 2024
C++

opendatadiscovery / odd-platform

First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Updated Jul 15, 2024
Java

ray310 / Panda-Helper

Panda-Helper is a simple, open-source Python data-profiling utility for Pandas DataFrames and Series.

python pandas data-analysis data-cleaning python-package data-profiling data-profiler data-profiling-utility

Updated Jul 16, 2024
Python

cleanlab / cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Updated Jul 12, 2024
Python

sodadata / soda-core

⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

Updated Jul 16, 2024
Python

open-metadata / openmetadata-site

Open standard for metadata. A single place to discover, collaborate, and get your data right.

Updated Jul 12, 2024
CSS

cleanlab / cleanlab-studio

Client interface for all things Cleanlab Studio.

Updated Jul 15, 2024
Python

tsegall / fta

Metadata and data identification Java library. Identifies semantic type information (e.g. Gender, Age, Color, Country, ...). Extensive country and language support. Extensible via user-defined plugins. Comprehensive profiling support.

java metadata date data-discovery data-profiling semantic-types semantic-typechecking semantic-type-detection data-profiler

Updated Jul 15, 2024
Java

apicrafter / metacrafter

Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Rules are fully customizable and flexible.

metadata data-profiling pii entity-recognition pii-detection datadiscovery