This workshop is a collaboration between the Southern California R Users Group (SoCal RUG) and the UCI Paul Merage School of Business, Master of Science in Business Analytics (MSBA) program.
It offers a basic introduction to Python, covering the fundamentals of Python programming and practical data science skills using the pandas Python library.
- 2023-08-03 5 PM to 8 PM, Pacific Time
- 2023-08-04 5 PM to 8 PM, Pacific Time
Online - Zoom link here
- Before the workshop starts, you will need to have Python and VS Code (including some extensions) installed on your computer
- A tutorial for installing Python, VS Code, and the extensions can be found here
- 3 hours each day, 2 days total
- Divided into 40-minute sessions:
  - 15 min instruction
  - 15 min practice in breakout rooms
    - Teaching assistants will be assigned to breakout rooms
  - 10 min review & questions
- 4 sessions per day (160 min of sessions + two 10-min breaks)
- Eight sessions total across both days
Bryan Mariscal, Engineering Specialist in the Structural Dynamics Department at The Aerospace Corporation.
Bryan Mariscal is an Engineering Specialist in the Structural Dynamics Department at The Aerospace Corporation, where he performs computational simulations, data analysis, and software development in support of National Security Space programs. He previously worked at Tecolote Research as a Data Scientist supporting Space Systems Command cost research efforts, and he earned BS and MS degrees in Structural Engineering from UC San Diego.
The first day focuses on Python as a language, drawing on the official Python documentation.
- Session 1: Using Python, VS Code, and Jupyter Notebooks
  - What is a notebook, and why is it useful?
  - The JupyterLab interface
  - Working with cells (creating, executing, cell types, etc.)
  - Tips and best practices
- Session 2: Review of Python Fundamentals
  - Importance of indentation (whitespace)
  - Expressions and variables
  - Math operations
  - Data types (numbers, strings, booleans)
  - Lists
- Session 3: Control Flow
  - Conditional statements
  - Loops
- Session 4: Functions
  - What they are and why they are important
  - Function syntax
  - How to write your own functions
  - Tips and best practices
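As a rough preview of the Day 1 material (Sessions 2 to 4), the sketch below combines variables, numeric and boolean values, a list, a `for` loop with an `if` statement, and a user-defined function. The function name `summarize`, its `threshold` parameter, and the temperature values are hypothetical, chosen only for illustration; they are not taken from the workshop materials.

```python
def summarize(readings, threshold=25.0):
    """Return (average, count_above_threshold) for a list of numbers.

    Hypothetical helper for illustration only.
    """
    if not readings:            # an empty list is "falsy", so this branch runs for []
        return None, 0

    count_above = 0
    for value in readings:      # a for-loop visits each list item in order
        if value > threshold:   # indentation marks the body of each block
            count_above += 1

    average = sum(readings) / len(readings)  # "/" always returns a float
    return average, count_above


temps = [22.5, 27.0, 30.5, 19.0]            # a list of floats (made-up data)
avg, hot = summarize(temps)
print(f"average={avg:.2f}, above 25: {hot}")  # average=24.75, above 25: 2
```

Returning a tuple instead of printing inside the function keeps it reusable: the caller decides how to format or further process the results.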
The second day focuses on pandas as the entry point into data-science-specific tasks, drawing on the pandas "Getting started" tutorials.
- Session 5: Introduction to pandas
  - Why tabular data is useful for data science (compared to Excel)
  - Series and DataFrames
  - How to create Series and DataFrames
  - How to read Series and DataFrames from files
- Session 6: Subsetting DataFrames
  - Selecting columns
  - Filtering rows
  - The various ways of indexing DataFrames (by labels, slices, and conditional expressions): `loc` and `iloc`
- Session 7: Reshaping and Merging DataFrames
  - Wide vs. long formats and converting between the two: `pivot` and `melt`
  - Grouped summaries: `groupby`
  - Concatenating tables by column and row: `concat`
  - Joining data tables: `merge`
- Session 8: Data Visualization with pandas
  - Basic plotting from pandas: `plot`, `scatter`, `box`, `hist`, etc.
  - Examples of more complex plots, including coloring and grouping by variables
  - Tuning plot parameters (sizes, colors, layouts)
  - Saving plots (e.g., for use in presentations)
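A minimal sketch of several Day 2 operations (Sessions 5 to 7) on a small table might look like the following. It assumes pandas is installed and imported as `pd`; the `sales` and `stores` tables and their values are invented for illustration. Plotting (Session 8) is left out because it additionally requires matplotlib.

```python
import pandas as pd

# Hypothetical example data, invented for illustration.
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "C"],
        "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
        "revenue": [100, 120, 90, 95, 70],
    }
)
stores = pd.DataFrame({"store": ["A", "B", "C"], "city": ["Irvine", "Tustin", "Orange"]})

# Session 5: selecting a single column returns a Series
revenue = sales["revenue"]

# Session 6: subsetting rows and columns
high = sales[sales["revenue"] > 90]                                 # boolean filter on rows
by_label = sales.loc[sales["store"] == "A", ["month", "revenue"]]   # label-based selection
by_position = sales.iloc[0:2, 0:2]                                  # position-based; end excluded

# Session 7: grouped summaries, reshaping, concatenating, joining
totals = sales.groupby("store")["revenue"].sum()                    # A: 220, B: 185, C: 70
wide = sales.pivot(index="store", columns="month", values="revenue")  # long -> wide (C/Feb is NaN)
long = wide.reset_index().melt(id_vars="store", var_name="month", value_name="revenue")  # wide -> long
extra = pd.DataFrame({"store": ["C"], "month": ["Feb"], "revenue": [80]})
combined = pd.concat([sales, extra], ignore_index=True)             # stack rows, renumber index
joined = sales.merge(stores, on="store", how="left")                # add each store's city

print(totals)
print(wide)
```

Note that `loc` selects by labels (here a boolean mask plus column names), while `iloc` selects by integer position, so `iloc[0:2, 0:2]` returns the first two rows and first two columns regardless of their labels.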