itaiy/Pandas-and-Spark

Pandas-and-Spark

Pandas is a fast and powerful open-source data analysis and manipulation library for Python.
Apache Spark is an open-source unified analytics engine for large-scale data processing.
Both are widely adopted in the Data Engineering and Data Science communities.

Combining them offers great value in terms of productivity, scalability and performance, yet this combination is often overlooked.

The notebooks in this repository demonstrate how you can leverage recent developments in Apache Spark together with Pandas to enjoy the best of both worlds!
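
As a rough illustration of the idea (not taken from the notebooks themselves), the sketch below runs the same pandas-style code against either plain pandas or the pandas API on Spark (`pyspark.pandas`, shipped with Apache Spark 3.2 and later). The helper names `pick_backend` and `talks_per_speaker` and the sample data are hypothetical, and the example assumes pandas is installed; it falls back to plain pandas when PySpark is not available.

```python
import pandas as pd


def pick_backend(prefer_spark: bool = False):
    """Return pyspark.pandas when requested and installed, else plain pandas."""
    if prefer_spark:
        try:
            import pyspark.pandas as ps  # pandas API on Spark (Spark 3.2+)
            return ps
        except ImportError:
            pass  # PySpark not installed: fall back to single-machine pandas
    return pd


def talks_per_speaker(backend) -> dict:
    """Count sessions per speaker using only pandas-style calls."""
    df = backend.DataFrame(
        {
            # Made-up sample data, for illustration only.
            "speaker": ["alice", "bob", "alice", "carol"],
            "session": ["talk", "talk", "roundtable", "talk"],
        }
    )
    counts = df.groupby("speaker")["session"].count()
    # A pandas-on-Spark result is distributed; collect the small result locally.
    if hasattr(counts, "to_pandas"):
        counts = counts.to_pandas()
    return counts.sort_index().to_dict()


# Runs on plain pandas by default; pass prefer_spark=True to try Spark.
print(talks_per_speaker(pick_backend(prefer_spark=False)))
```

The only line that changes between the two backends is the import, which is what makes it practical to prototype on a single machine with pandas and later scale the same logic out with Spark.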

Feel free to contact us

Recent and upcoming events

| Date | Time | Event | Title | Session Type | Speaker(s) | Slides | Recording | Location | Language |
|---|---|---|---|---|---|---|---|---|---|
| 2022-04-20 | 10:40 AM (EDT) | Open Data Science Conference East 2022 | A Bamboo of Pandas: Crossing Pandas' Single-machine Barrier with Apache Spark | Talk | Itai Yaffe, Daniel Haviv | | | 🇺🇸 | English |
| 2022-04-27 | 4:30 PM (CEST) | Big Data Technology Warsaw Summit | Super-charge your Pandas code with Apache Spark | Roundtable discussion | Itai Yaffe, Daniel Haviv | | | 🇵🇱 | English |
| 2022-06-29 | 4:00 PM (PDT) | Data+AI Summit North America 2022 | Pandas API on Spark: What's New in the Upcoming Apache Spark 3.3 | Talk | Hyukjin Kwon | | | 🇺🇸 | English |
