This is the data engine for the Fortunato Wheels project. It manages the data and machine learning models that power the Fortunato Wheels website.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
To get started, first set up the environment.
Create the environment with Conda from the environment.yml file:
conda env create -f environment.yml
Or install the dependencies with pip from requirements.txt:
pip install -r requirements.txt
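After either install command finishes, a quick import check can confirm that the environment is usable. The snippet below is a minimal sketch, not part of the project: `missing_packages` is a hypothetical helper, and the package list is an assumption (pandas and pytest are referenced in this README, while pymongo is only a guess at the MongoDB driver in use).

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed package list: adjust it to match environment.yml / requirements.txt.
    expected = ["pandas", "pymongo", "pytest"]
    missing = missing_packages(expected)
    if missing:
        print(f"Missing packages: {missing}")
    else:
        print("All expected packages were found.")
```

Run it with the newly created environment active, so that the check inspects the right interpreter.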
There are two primary data sources. One is the Fortunato Wheels database, hosted in a MongoDB instance on Azure. The other is an open-source dataset of cargurus.com ads, originally obtained from Kaggle.
- Fortunato Wheels Database
- Get the connection string from the project owner.
- Put the connection string into the .env file in the root of the project.
- To get all car ads from the database use:
import pandas as pd
from src.data.upload_to_db import connect_to_db
client, db, collection = connect_to_db()
# Load every ad document from the collection into a DataFrame
all_ads_raw = pd.DataFrame(collection.find())
- Cargurus.com Ads: Download Processed Data
- To download an already processed version of the dataset, download the parquet file from: Link to Processed Cargurus Data Download (~3GB)
- Move the downloaded file to data/processed/
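Before loading any data, it can help to confirm that both sources are in place: the connection string in .env and the parquet file in data/processed/. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and `MONGO_CONNECTION_STRING` is an assumed variable name; use whatever key `connect_to_db` actually reads.

```python
from pathlib import Path
from typing import List, Optional


def read_env_value(env_path: Path, key: str) -> Optional[str]:
    """Return the value for `key` from a simple KEY=VALUE .env file, or None."""
    if not env_path.is_file():
        return None
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since connection strings may contain "=".
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip("'\"")
    return None


def find_processed_parquet(project_root: Path) -> List[Path]:
    """List the parquet files found directly under data/processed/."""
    processed_dir = project_root / "data" / "processed"
    if not processed_dir.is_dir():
        return []
    return sorted(processed_dir.glob("*.parquet"))


if __name__ == "__main__":
    root = Path(".")
    # MONGO_CONNECTION_STRING is a hypothetical key name, not taken from the project.
    has_conn = read_env_value(root / ".env", "MONGO_CONNECTION_STRING") is not None
    print("Connection string found:", has_conn)
    print("Processed parquet files:", [p.name for p in find_processed_parquet(root)])
```

Run it from the root of the project so that the relative paths resolve correctly.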
To start working with the data, make a copy of the notebooks/00-getting-started.ipynb notebook and work from the copy. The notebook imports all the necessary libraries, loads the data into a single dataframe held in a CarAds object, and shows how to access that dataframe.
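Copying the starter notebook can also be scripted. This is a minimal sketch: `copy_notebook` is a hypothetical helper, and the destination file name is only an example. It refuses to overwrite an existing file so that earlier work is not lost.

```python
import shutil
from pathlib import Path


def copy_notebook(source: Path, destination: Path) -> Path:
    """Copy the starter notebook without overwriting an existing working copy."""
    if not source.is_file():
        raise FileNotFoundError(f"Starter notebook not found: {source}")
    if destination.exists():
        raise FileExistsError(f"Refusing to overwrite: {destination}")
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    return destination


if __name__ == "__main__":
    starter = Path("notebooks/00-getting-started.ipynb")
    # The destination name below is a hypothetical example.
    working_copy = Path("notebooks/01-my-analysis.ipynb")
    if starter.is_file() and not working_copy.exists():
        print("Created", copy_notebook(starter, working_copy))
```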
Here is what the sample data looks like for Subaru Outback from 2008-2012:
Pytest is used to manage the tests. To run the tests in the tests folder, use the following command:
pytest tests
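By default, pytest collects files named `test_*.py` and runs the functions in them whose names start with `test`. A new test could look like the sketch below. The file name, `normalize_make`, and the behaviour it checks are all hypothetical and defined inline for illustration; a real test would import the function under test from the project's code instead.

```python
# tests/test_example.py (hypothetical file name)


def normalize_make(make: str) -> str:
    """Hypothetical helper: trim whitespace and lowercase a car make."""
    return make.strip().lower()


def test_normalize_make_strips_and_lowercases():
    # pytest reports a failure if this assertion does not hold.
    assert normalize_make("  Subaru ") == "subaru"


def test_normalize_make_leaves_clean_input_unchanged():
    assert normalize_make("subaru") == "subaru"
```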
TODO: Add additional notes about how to deploy this on a live system
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
We use SemVer for versioning. For the versions available, see the tags on this repository.
- Ty Andrews - Initial work - LinkedIn
This project is licensed under the MIT License. See the LICENSE.md file for details.