Pandas-and-Spark

Pandas is a fast and powerful open-source data analysis and manipulation library for Python.
Apache Spark is an open-source unified analytics engine for large-scale data processing.
Both are widely adopted in the Data Engineering and Data Science communities.

Combining them offers great value in terms of productivity, scalability and performance, yet this is often overlooked.

The notebooks in this repository demonstrate how you can leverage recent developments in Apache Spark together with Pandas to enjoy the best of both worlds!
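As a rough illustration of the idea (not taken from the notebooks themselves), the sketch below assumes the pandas API on Spark, available as `pyspark.pandas` since Apache Spark 3.2. The function `talks_per_speaker` and the sample rows are hypothetical, made up only for this example. The same function accepts either plain `pandas` or `pyspark.pandas`, which is the point: the code stays the same while the engine underneath changes.

```python
import pandas as pd


def talks_per_speaker(lib, rows):
    """Count talks per speaker using whichever pandas-like module is passed in.

    `lib` is assumed to be either `pandas` or `pyspark.pandas`.
    """
    df = lib.DataFrame(rows)
    counts = df.groupby("speaker")["title"].count()
    # Convert to a plain dict so the result looks the same for either backend.
    return counts.to_dict()


# Hypothetical sample data, made up for illustration only.
rows = {
    "speaker": ["alice", "bob", "alice"],
    "title": ["Talk A", "Talk B", "Talk C"],
}

# Runs on a single machine with plain pandas.
result = talks_per_speaker(pd, rows)
print(result)

# To run the same function on Spark instead (assumes pyspark >= 3.2
# and a working Java runtime are installed):
#   import pyspark.pandas as ps
#   talks_per_speaker(ps, rows)
```

Passing the module in as a parameter is just one way to show the API overlap; in practice, switching the import line is usually enough, though not every pandas operation is guaranteed to behave identically on Spark.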

Feel free to contact us

Itai Yaffe - LinkedIn, Twitter, Medium
Daniel Haviv - LinkedIn, Twitter

Recent and upcoming events

Date	Time	Event	Title	Session Type	Speaker(s)	Location	Language
2022-04-20	10:40 AM (EDT)	Open Data Science Conference East 2022	A Bamboo of Pandas: Crossing Pandas' Single-machine Barrier with Apache Spark	Talk	Itai Yaffe, Daniel Haviv	🇺🇸	English
2022-04-27	4:30 PM (CEST)	Big Data Technology Warsaw Summit	Super-charge your Pandas code with Apache Spark	Roundtable discussion	Itai Yaffe, Daniel Haviv	🇵🇱	English
2022-06-29	4:00 PM (PDT)	Data+AI Summit North America 2022	Pandas API on Spark: What's New in the Upcoming Apache Spark 3.3	Talk	Hyukjin Kwon	🇺🇸	English

This table structure was inspired by @lirantal's public-speaking README: https://github.com/lirantal/public-speaking/blob/main/README.md
