- In this project, we build a simple ML model (multiple linear regression) on the Kaggle dataset 50_startups.csv.
- The model predicts the profit a startup can generate, given its investments in R&D, administration, and marketing, along with the state it is located in.
- Our model makes accurate predictions, reaching an R-squared value of up to 98%.
- We are on a mission to make investors and founders smarter; give it a try!
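For readers new to multiple linear regression, here is a minimal NumPy-only sketch of the kind of model described above. It assumes the Kaggle CSV's usual layout (three spend columns, a categorical `State`, and a `Profit` target). It one-hot encodes `State` and drops one category so the dummies are not perfectly collinear with the intercept (the "dummy-variable trap"). The helper names are hypothetical, and the project's own code in `fiftyStartUps.py` may be organized differently.

```python
import numpy as np

# Hypothetical helpers for illustration; not the project's own API.

def one_hot_drop_first(labels):
    """One-hot encode category labels, dropping the first (alphabetically
    sorted) category, which becomes the baseline absorbed by the intercept."""
    categories = sorted(set(labels))
    kept = categories[1:]
    dummies = np.array([[1.0 if lab == c else 0.0 for c in kept] for lab in labels])
    return dummies, kept

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

def predict(intercept, coefs, X):
    """Apply the fitted linear equation to new rows."""
    return intercept + np.asarray(X, dtype=float) @ np.asarray(coefs, dtype=float)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

To use it on the real data, stack the three spend columns with the dummy columns into `X` and pass the `Profit` column as `y`. Then compute `r_squared` on rows held out from fitting, so the score reflects unseen data.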
This project is based on the 50 startups public dataset available on Kaggle.
The frontend template was borrowed from here.
Install using pip
$ pip install -r requirements
To access the live demo, click here.
To deploy this project, run
$ python fiftyStartUps.py
After identifying the dependent variable (profit) and the independent variables, we check the correlation between profit and each independent variable. Notice that both R&D and marketing spend show a strong positive correlation with profit. It makes sense, right? You need a great product that is perfected over time, as well as a killer marketing strategy to get it out to users and customers.
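The correlation check can be sketched as a Pearson coefficient between each independent variable and profit. This is a NumPy-only sketch with hypothetical helper names. The feature names you pass in (for example `"R&D Spend"` or `"Marketing Spend"`, assumed from the Kaggle CSV) are up to you.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences
    (assumes neither sequence is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def correlations_with_target(features, target):
    """Map each feature name to its Pearson r with the target,
    ordered from strongest to weakest absolute correlation."""
    scores = {name: pearson_r(values, target) for name, values in features.items()}
    return dict(sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True))
```

pandas' `DataFrame.corr()` computes the same Pearson coefficients for every pair of numeric columns at once, which is handy when you also want the feature-to-feature values.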
It is also important to check for multicollinearity, i.e. correlation among the independent variables themselves. Hmm... there's some correlation between R&D and marketing. Maybe the more a startup invests in R&D, the more it has to show off through its marketing channels?
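Pairwise correlations only catch overlap between two variables at a time. A common complementary check for multicollinearity is the variance inflation factor (VIF). It is swapped in here as an illustration rather than taken from the project. You regress each independent variable on all the others and compute 1 / (1 - R²). A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2), where R^2 comes from regressing
    that column on all other columns plus an intercept.
    Assumes no column is constant."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        ss_res = np.sum((target - design @ beta) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        # A perfect fit means the column is an exact linear combination of the others.
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign that collinearity is inflating a coefficient's variance. For this dataset you would pass the three spend columns and leave `Profit` out.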
After building the model, we tested it on test data and plotted the predictions along with the test labels. We did this for each of the three numerical features. The big green line represents the model's regression equation.
- R&D against profits
- Administration against profits
- Marketing against profits
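With several inputs, a 2-D plot against a single feature must fix the other features at some value before the model can be drawn as a line. The sketch below holds the other columns at their training-set means. That is an assumption, and the original plots may have been produced another way. It takes a matplotlib `Axes` object as an argument so the NumPy part stays independent of any plotting setup. `intercept` and `coefs` are assumed to come from an ordinary least-squares fit, and the column order (for example R&D, administration, marketing as indices 0, 1, 2) is hypothetical.

```python
import numpy as np

def feature_line(intercept, coefs, X_train, feature_idx, num=50):
    """Model predictions as one feature sweeps its observed training range
    while every other column is held at its training-set mean."""
    X_train = np.asarray(X_train, dtype=float)
    col = X_train[:, feature_idx]
    grid = np.linspace(col.min(), col.max(), num)
    rows = np.tile(X_train.mean(axis=0), (num, 1))
    rows[:, feature_idx] = grid
    return grid, intercept + rows @ np.asarray(coefs, dtype=float)

def plot_feature(ax, intercept, coefs, X_train, X_test, y_test, feature_idx, label):
    """Draw test labels, test-set predictions, and the fitted line for one feature."""
    X_test = np.asarray(X_test, dtype=float)
    y_pred = intercept + X_test @ np.asarray(coefs, dtype=float)
    ax.scatter(X_test[:, feature_idx], y_test, label="test labels")
    ax.scatter(X_test[:, feature_idx], y_pred, marker="x", label="predictions")
    grid, line = feature_line(intercept, coefs, X_train, feature_idx)
    ax.plot(grid, line, color="green", linewidth=3, label="model")
    ax.set_xlabel(label)
    ax.set_ylabel("Profit")
    ax.legend()
```

For example, `fig, axes = plt.subplots(1, 3)` followed by one `plot_feature` call per axis would produce panels like the three listed above.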