- Motivation
- Installation
- File Descriptions
- Results
- Licensing, Authors, and Acknowledgements
Using the Seattle Airbnb dataset, this project aims to answer three business questions of interest through exploratory data analysis and machine learning:
Q1: How are properties distributed among neighborhood groups?
Q2: How many times is each room type reviewed? What is the average review score rating? Is there a relationship between the price of a room type and its number of reviews and review score ratings? What is the highest average price for each room type across neighborhoods?
Q3: Can a linear regression model (an ML algorithm) forecast price from a selected set of variables?
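The three questions above could be approached roughly as in the sketch below. It is not the notebook's actual code: it runs on a small synthetic table rather than the real listings file, and the column names (`neighbourhood_group`, `room_type`, `accommodates`, `bedrooms`, `price`) and the choice of predictors are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the listings data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
listings = pd.DataFrame({
    "neighbourhood_group": rng.choice(["Downtown", "Ballard", "Capitol Hill"], size=n),
    "room_type": rng.choice(["Entire home/apt", "Private room", "Shared room"], size=n),
    "accommodates": rng.integers(1, 9, size=n),
    "bedrooms": rng.integers(0, 5, size=n),
})
# Hypothetical price: linear in the numeric features plus random noise.
listings["price"] = (
    30
    + 20 * listings["accommodates"]
    + 15 * listings["bedrooms"]
    + rng.normal(0, 10, size=n)
)

# Q1: number of properties per neighborhood group.
counts = listings["neighbourhood_group"].value_counts()

# Q2: average price per room type within each neighborhood group.
avg_price = listings.groupby(["neighbourhood_group", "room_type"])["price"].mean()

# Q3: linear regression on the selected variables.
# The categorical column is one-hot encoded; one level is dropped to avoid redundancy.
X = pd.get_dummies(
    listings[["accommodates", "bedrooms", "room_type"]],
    columns=["room_type"],
    drop_first=True,
)
y = listings["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

print(counts)
print(avg_price)
print(f"Test R^2: {r2:.3f}")
```

With the real data, the synthetic table would be replaced by the listings file loaded with `pd.read_csv`, and the price column may need cleaning (for example, stripping currency symbols) before modeling.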
This project requires Python 3.x and the following Python libraries installed:
- NumPy
- Pandas
- matplotlib
- seaborn
- scikit-learn
- statsmodels
You will also need software that can run and execute an iPython Notebook. One option is to install Anaconda, a pre-packaged Python distribution that contains all of the necessary libraries and software for this project.
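To confirm that the environment has everything listed above, a quick check along these lines may help. This is a minimal sketch, not part of the project: the helper name `missing_libraries` is hypothetical, and the only mapping that differs from the package name is scikit-learn, which is imported as `sklearn`.

```python
import importlib.util
import sys

# Display name -> import name for the libraries this project lists.
REQUIRED = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
}


def missing_libraries(required=REQUIRED):
    """Return the display names of libraries that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if sys.version_info < (3,):
    print("Python 3.x is required.")

missing = missing_libraries()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```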
A Jupyter Notebook containing all the code, which follows the steps of CRISP-DM to analyze Airbnb data in Seattle.
The listings data of Seattle, downloaded from here
The main findings of the code can be found at the post available here
Credit to Airbnb for the open data. All data are downloaded from here. You can find the licensing for the data and other descriptive information at the [Kaggle link](https://www.kaggle.com/airbnb/seattle).