Always know what to expect from your data.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Panda-Helper is a simple, open-source Python data-profiling utility for Pandas DataFrames and Series.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
Client interface for all things Cleanlab Studio
Metadata/data identification Java library. Identifies Semantic Type information (e.g. Gender, Age, Color, Country,...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support.
Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.
Papers about training data quality management for ML models.
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Monitor the stability of a Pandas or Spark dataframe ⚙︎
A simple widget for interactive EDA / QA. Works on top of Pandas [in Jupyter Notebook] using IPyWidgets with a sprinkle of Regex.
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Linked Data Science powered by Knowledge Graphs