The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Updated Jul 16, 2024 (Python)
An orchestration platform for the development, production, and observation of data assets.
A portable Datamart and Business Intelligence suite built with Docker, Mage, dbt, DuckDB and Superset
Let's pipe some data!
My personal project for the Data Engineering Zoomcamp
OpenSnowcat Relational Database Loader (Apache 2.0 License)
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
The Open Source Feature Store for Machine Learning
SQL stream processing, analytics, and management. We decouple storage and compute to offer efficient joins, instant failover, dynamic scaling, speedy bootstrapping, and concurrent query serving.
Distributed DataFrame for Python designed for the cloud, powered by Rust
Apache Superset is a Data Visualization and Data Exploration Platform
Open Source Feature Flagging and A/B Testing Platform
Dagster Labs' open-source data platform, built with Dagster.
Codebase for CCAO data infrastructure construction and management
Snowflake Snowpark Python API
This starter project for AWS Managed Workflows for Apache Airflow (MWAA) is designed to streamline the setup and deployment process. It also offers functionality to test MWAA workflows locally, ensuring a smooth transition before deploying to a production environment.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.