Pandas is a fast and powerful open-source data analysis and manipulation library for Python.
Apache Spark is an open-source unified analytics engine for large-scale data processing.
Both are widely adopted in the Data Engineering and Data Science communities.
Combining them offers great value in terms of productivity, scalability, and performance, yet this is often overlooked.
The notebooks in this repository demonstrate how you can leverage recent developments in Apache Spark together with Pandas to enjoy the best of both worlds!
Date | Time | Event | Title | Session Type | Speaker(s) | Slides | Recording | Location | Language |
---|---|---|---|---|---|---|---|---|---|
2022-04-20 | 10:40 AM (EDT) | Open Data Science Conference East 2022 | A Bamboo of Pandas: Crossing Pandas' Single-machine Barrier with Apache Spark | Talk | Itai Yaffe, Daniel Haviv | | | 🇺🇸 | English |
2022-04-27 | 4:30 PM (CEST) | Big Data Technology Warsaw Summit | Super-charge your Pandas code with Apache Spark | Roundtable discussion | Itai Yaffe, Daniel Haviv | | | 🇵🇱 | English |
2022-06-29 | 4:00 PM (PDT) | Data+AI Summit North America 2022 | Pandas API on Spark: What's New in the Upcoming Apache Spark 3.3 | Talk | Hyukjin Kwon | | | 🇺🇸 | English |
- This table structure was inspired by [@lirantal's public-speaking README](https://github.com/lirantal/public-speaking/blob/main/README.md).