the portable Python dataframe library
ezpz PySpark dev environment with Docker
Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs. It reduces boilerplate code, increases type safety, and improves IDE auto-completion, making Glue development easier and more efficient.
Open Targets Python framework for post-GWAS analysis
PySpark methods to enhance developer productivity 📣 👯 🎉
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
This capstone project includes an end-to-end data engineering pipeline, from ingesting data from an HTTPS server, to cleaning and transforming it in Azure Databricks, to finally reporting on it in Power BI Desktop
A searchable collection of useful little pieces of code
Simple and Distributed Machine Learning
Data engineering examples covering Airflow and Mage for workflows; dbt for BigQuery, Redshift, and ClickHouse; and Spark and Kafka for batch/streaming processing
Possibly the fastest DataFrame-agnostic quality check library in town.
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can be used to generate large simulated/synthetic datasets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines
This project aims to enhance the accuracy and efficiency of stock market predictions through machine learning. It leverages PySpark, a robust framework for distributed data processing, to handle large datasets and perform complex computations.
A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models
State of the Art Natural Language Processing
This project demonstrates how to use PySpark to predict customer churn. The dataset contains various attributes of a telecom company's customers; the objective is to build a machine learning model that predicts whether a customer will churn.
Play around with Databricks and PySpark