Server for the ListenBrainz project, including the front-end (JavaScript/React) code that it serves and all of the data processing components that LB uses.
Updated Jul 16, 2024 - Python
Apache Spark is an open-source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
YTsaurus is a scalable and fault-tolerant open-source big data platform.
A tool to help you test and develop PySpark code with sampled and local data
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
A toy weather predictor that forecasts weather conditions based on location and time
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Index Common Crawl archives in tabular format
🧙 Build, run, and manage data pipelines for integrating and transforming data.
ezpz PySpark dev environment with Docker
Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs. It reduces boilerplate code, increases type safety, and improves IDE auto-completion, making Glue development easier and more efficient.
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Formal specification and generation of verifiable binary parsers, message generators and protocol state machines
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Created by Matei Zaharia
Released May 26, 2014