Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real time.
An orchestration platform for the development, production, and observation of data assets.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as a self-hosted or cloud service with premium features.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
One framework to develop, deploy and operate data workflows with Python and SQL.
Apache DolphinScheduler is the modern data orchestration platform, built for agile, low-code creation of high-performance workflows.
A high-performance, extremely flexible, and easily extensible universal workflow engine.
Smart automation tool for building modern data lakes and data pipelines
USC DSCI 560 - Data Science Professional Practicum - Spring 2024 - Prof. Young Cho
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Multi-hop declarative data pipelines
Relational data pipelines for the science lab