The OCRUG Hackathon, hosted by the Orange County R Users Group and the UCI Paul Merage School of Business, is a two-day event where we will "hack" a data set for fun, education, and prizes. The focus of the event is on education and teamwork, with the main goal of taking a data set from its "raw" form all the way through to a final "product" (e.g. visualization, insight, presentation). At the end of the event, teams will present their work in short presentations. Fellow attendees will help judge the work, with prizes awarded in several categories (more below).
The event will start with a series of practical educational tutorials to get you started with fundamental data analysis using the R programming language, followed by working sessions where teams will explore and analyze a data set in preparation for the team presentations. Participants will work in small teams of about 5 people each.
This event is open to data scientists, enthusiasts and hackers of all levels, from the beginner to the highly experienced. If you are a beginner, it may be helpful to do some preparatory learning before the event — see the suggested resources below. If you are an experienced user, we look forward to you sharing your expertise with others. Assisting others, even across teams, is highly encouraged.
- The hackathon is primarily an educational event, not a competition. We hope that the event will allow participants to practice their skills, as well as teach and learn from others.
- Novice Users: provide an opportunity to work with a real-world data set from start (acquire the data) to finish (produce a final presentation on the findings from their work).
- Experienced Users: provide an opportunity to practice data analysis skills in a structured environment, interact with others, and assist new users.
When: Saturday, April 10 – Sunday, April 11, 2021
All times are Pacific Daylight Time (PDT)
- Saturday: 10:00 AM - 10:00 PM
- Sunday: 8:00 AM - 4:30 PM
Where
This is a 100% Online/Virtual event, using Zoom and Slack. Event links will be provided to registered participants prior to the event.
Registration
- Cost: $15
- Register through Eventbrite
Saturday
- Saturday morning tutorials are optional
- All participants are required to attend the 1:00 PM Introduction session
Time | Event | Location |
---|---|---|
10:00 AM - 11:00 AM | Tutorial - Basic Data Analysis in R with the tidyverse | Zoom Main Room |
11:00 AM - 12:00 PM | Tutorial - Basic Data Visualization in R with ggplot2 | Zoom Main Room |
12:00 PM - 1:00 PM | Break - Lunch | On Your Own |
1:00 PM - 2:00 PM | Introduction, Data Set Review, Teams Assemble | Zoom Main Room |
2:00 PM - 7:00 PM | Individual Teams Working Session | Zoom Breakout Rooms, Slack |
7:00 PM - 8:00 PM | Evening Data Challenge | Alternate Zoom Room |
8:00 PM - 10:00 PM | Individual Teams Working Session | Zoom Breakout Rooms, Slack |
Sunday
Time | Event | Location |
---|---|---|
8:00 AM - 1:00 PM | Individual Teams Working Session | Zoom Breakout Rooms, Slack |
1:00 PM | Presentations Due | |
1:00 PM - 2:00 PM | Break - Lunch | On Your Own |
2:00 PM - 3:00 PM | Team Presentations | Zoom Main Room |
3:00 PM - 3:30 PM | Award Voting, Break | On Your Own |
3:30 PM - 4:00 PM | Award Presentation & Wrap-up | Zoom Main Room |
- All participants must register for the event and have a valid ticket to attend.
- All participants must abide by the OCRUG Code of Conduct, including the R Consortium and the R Community Code of Conduct.
- Please immediately report any Code of Conduct violations to the event organizers.
- All participants must check in (log in to the Zoom meeting) by 1:00 PM on Saturday and attend the Introduction session to be eligible to participate and for award consideration.
- Though this is an R-focused event, participants are free to use any programming language or tool for their work.
- Participants are free to work on their projects as they'd like on their own schedules, though we highly encourage participants to attend all working sessions to maximize team and group interactions.
- We ask that the final submissions from the teams are a result of work performed during the event. Please do not use any previous work you or others may have produced as part of team submissions.
OCRUG GitHub Repo: https://github.com/ocrug/
Please install git and clone the following repo before the event, then pull the latest changes just before the event starts.
Command:
git clone git@github.com:ocrug/hackathon-2021-04.git
Hackathon Repo: https://github.com/ocrug/hackathon-2021-04
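The clone-before, pull-at-the-start routine described above can be sketched as one small shell helper that is safe to run repeatedly. This is only a sketch under a few assumptions: `sync_repo` is a hypothetical helper name (not a git command), and the usage line uses the HTTPS form of the repo URL, which should work without SSH keys set up on GitHub.

```shell
#!/bin/sh
# sync_repo URL [DIR]
# Hypothetical helper (not part of git): clone URL into DIR if DIR does not
# exist as a git checkout yet; otherwise pull the latest changes into it.
sync_repo() {
    url=$1
    # Default directory name: last path component of the URL without ".git"
    dir=${2:-$(basename "$url" .git)}
    if [ -d "$dir/.git" ]; then
        # Already cloned: fast-forward to the latest commits only
        git -C "$dir" pull --ff-only
    else
        # First run: make a fresh clone
        git clone "$url" "$dir"
    fi
}

# Usage (run once before the event, then again just before it starts):
# sync_repo https://github.com/ocrug/hackathon-2021-04.git
```

The `--ff-only` flag makes the pull stop with an error rather than create a merge commit if you have local commits that diverge from the repo.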
A Slack channel has been set up for the hackathon. It will be used for general announcements, and it is also a great place to ask other participants questions.
If you have not created an account on our Slack group, create one using the following link:
Slack Group Sign-up: https://tinyurl.com/socalrug-slack-signup
Once you have an account, sign in (you can do it on a web browser or download an app on your phone or desktop).
Slack channel: https://socalrug.slack.com
The channel for the hackathon is hackathon-2021-04
Each team will have a Slack channel so that they can communicate and share files. You will also be able to communicate on Zoom. The team rooms are:
- 01_team-hackathon
- 02_team-hackathon
- 03_team-hackathon
- 04_team-hackathon
- 05_team-hackathon
- 06_team-hackathon
- 07_team-hackathon
Please follow us on Twitter at oc_rug, and tweet about the event with the hashtag #OCRUG
- All participants will work on teams of approximately 5 people each.
- Participants will be randomly assigned into teams during the Introduction session.
- Teams and individuals are free to work when and how they'd like. However, we recommend using the provided Zoom break-out rooms throughout the event so teams can work together more easily.
- Teams will select a team name to use throughout the hackathon.
- Assisting others within and between teams is highly encouraged.
See the presentation guidelines for the presentation requirements. Awards and prizes will be determined by fellow hackathon participants.
Below is a list of the awards and prizes:
- Best Insight (Amazon Gift Card)
- Best Visualization (Amazon Gift Card)
- Best Presentation (Amazon Gift Card)
Information on the data set used for the hackathon can be found here:
Overview slides on the event can be found here (PDF)
Two short tutorials will be given on Saturday morning covering essential data analysis topics using R. The tutorials are aimed at R users who already have some familiarity with the language, but could also be a useful reference for R beginners or users of other programming languages.
- Tutorial 1: Introduction to Tidy Data & Data Manipulation with the tidyverse (Slides)
- Tutorial 2: Introduction to Data Visualization with ggplot2 (Slides)
On Saturday evening, we will have an hour-long data challenge event as a break from the main hackathon work. This will be an opportunity to interact with participants outside of your team, practice your data hacking skills on a new data set, and win prizes.
Event Info Deck: PDF
Data Set
Palmer Penguins data set prepared by Allison Horst.
Questions and Solution Set
Each team will prepare a short presentation and slide deck describing their work during the hackathon. The presentation and slides should focus on the main findings/insights the group found during their work, and will be used for determining the hackathon awards.
Rules
- All presentations must be 5 slides total:
  - Slide 1: Title slide
  - Slide 2: Concise summary slide - what you did, what was your main finding/output?
  - Slides 3 & 4: Supporting slides with more details on your main finding/output
  - Slide 5: Wrap-up, conclusions, acknowledgements (be sure to acknowledge the data source)
- Slides can be prepared using any programs, but must be submitted in PDF format.
- Presentations will be submitted through Dropbox -- more information will be provided during the event.
- Presentations are due by 1:00 PM on Sunday.
- Each team will choose 1 person to present during the Presentation session on Sunday afternoon.
- Presentations must be 5 minutes or less. Presentations will be stopped after 5 minutes to keep on schedule.
-
- A booklet of cheat sheets that you can print out and bind. It is a handy reference guide.
-
- 1-page note sheets covering data science fundamentals and useful R packages.
-
- Comprehensive book on the complete data science workflow, including data importing/cleaning, visualization, and data analysis
- Focus on tidyverse packages
- Accessible for beginners who have a basic grasp of R
-
- This is the hub website for the core tidyverse packages
- Check out the Packages section and associated links for helpful information on using the packages.
-
- This book digs into the details of R.
- A great resource for more advanced users wanting to learn more about R under the hood.
- There is also a 1st Edition of the book.
-
- Useful when you need to look up more info on specific geoms, stats, scales, etc.
- Check out the examples in the details pages for each function.
-
- Gallery of various chart types and the code needed to create them.
-
- A practical guide that provides more than 150 recipes
-
Mistakes, we’ve drawn a few: Learning from our errors in data visualisation
- From the Economist about mistakes they've made with published data visualizations, and how they'd fix the problems.
- Note: even professionals make mistakes!
-
- Good overview of caret with code examples
- In particular, check out the table of available models
-
DALEX R Package -- Descriptive mAchine Learning EXplanations
- Provides a set of tools that help you to understand how complex models are working
- Helps you visualize what's going on
- Check out the cheatsheet