Management of the data and machine learning used to power the Fortunato Wheels website

tieandrews/fortunato-wheels-engine

Fortunato Wheels Engine

This is the data engine for the Fortunato Wheels project. It manages the data and machine learning used to power the Fortunato Wheels website.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Setup

To get started, first set up the environment.

Create the environment with Conda from the environment.yml file:

conda env create -f environment.yml

Or install the dependencies with pip from requirements.txt:

pip install -r requirements.txt
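
After either install route, a quick import check can confirm the environment is usable. The sketch below is illustrative only: the package list is an assumption (pandas and pytest are mentioned in this README, and pymongo is a guess based on the MongoDB usage), so adjust it to match requirements.txt.

```python
from importlib.util import find_spec

# Assumed package list: pandas and pytest appear in this README, and pymongo
# is a guess based on the MongoDB usage; adjust to match requirements.txt.
EXPECTED_PACKAGES = ("pandas", "pymongo", "pytest")


def missing_packages(names=EXPECTED_PACKAGES):
    """Return the top-level packages from `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

Run it from inside the activated environment; an empty result means every listed package was found.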

Downloading & Preprocessing Data

There are two primary data sources. The first is the Fortunato Wheels database, hosted in a MongoDB instance on Azure. The second is an open-source dataset of cargurus.com ads, originally obtained from Kaggle.

  1. Fortunato Wheels Database
    1. Get the connection string from the project owner.
    2. Put the connection string into the .env file in the root of the project.
    3. To get all car ads from the database use:
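      import pandas as pd
      from src.data.upload_to_db import connect_to_db

      # Connect to the database (expects the connection string in .env)
      client, db, collection = connect_to_db()
      all_ads_raw = pd.DataFrame(collection.find())  # one row per ad document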
      
  2. Cargurus.com Ads: Download Processed Data
    1. To use an already processed version of the dataset, download the parquet file from:
      Link to Processed Cargurus Data Download (~3GB)
    2. Move the downloaded file to data/processed/
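
Before opening a notebook, it can help to confirm that both prerequisites from the steps above are in place. The following is a minimal sketch using only the standard library, not part of the project itself. The key name `MONGO_CONNECTION_STRING` is hypothetical (use whatever name the project owner specifies), and the helper functions are illustrative.

```python
from pathlib import Path


def read_env_value(env_path, key):
    """Return the value for `key` in a KEY=VALUE style .env file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since connection strings may contain "="
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("\"'")
    return None


def find_parquet_files(processed_dir="data/processed"):
    """List the parquet files in the processed-data folder, sorted by name."""
    return sorted(Path(processed_dir).glob("*.parquet"))


if __name__ == "__main__":
    # Hypothetical key name; replace with the one used in your .env file
    has_conn = read_env_value(".env", "MONGO_CONNECTION_STRING") is not None
    print("Connection string found in .env:", has_conn)
    print("Parquet files in data/processed/:", [p.name for p in find_parquet_files()])
```

Once a parquet file is found, `pandas.read_parquet(path)` can load it, provided a parquet engine such as pyarrow is installed.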

Start Working with the Data

To start working with the data, make a copy of the notebooks/00-getting-started.ipynb notebook and work from there. The notebook imports all the necessary libraries, combines the data into a single dataframe held in a CarAds object, and shows how to load data into a dataframe.
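
Once the ads are in a dataframe, selecting a slice such as one make and model over a range of years is a common first step. The CarAds interface is not described in this README, so the sketch below uses a plain pandas DataFrame with toy rows; the column names (`make`, `model`, `year`, `price`) and the `select_ads` helper are assumptions for illustration only.

```python
import pandas as pd

# Toy rows standing in for real ads; the column names are assumptions.
ads = pd.DataFrame(
    {
        "make": ["Subaru", "Subaru", "Toyota", "Subaru"],
        "model": ["Outback", "Outback", "Camry", "Forester"],
        "year": [2007, 2010, 2011, 2012],
        "price": [6500, 11200, 9800, 13400],
    }
)


def select_ads(df, make, model, year_min, year_max):
    """Return ads for one make/model within an inclusive year range."""
    mask = (
        (df["make"] == make)
        & (df["model"] == model)
        & df["year"].between(year_min, year_max)  # inclusive on both ends
    )
    return df.loc[mask].reset_index(drop=True)


outbacks = select_ads(ads, "Subaru", "Outback", 2008, 2012)
print(outbacks)
```

Building the boolean mask once and applying it with `.loc` keeps the filter readable and avoids chained indexing.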

[Figure: sample data for the Subaru Outback from 2008-2012]

Running the tests

Pytest is used to manage the tests. To run the tests in the tests folder use the following command:

pytest tests
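
For contributors adding tests, here is a minimal sketch of what a file in the tests folder might look like. Pytest collects functions whose names start with `test_` and treats a failing `assert` as a test failure. The file name and the `normalize_make` helper are hypothetical and do not exist in this repository.

```python
# tests/test_example.py (hypothetical file name)


def normalize_make(raw):
    """Hypothetical helper: trim whitespace and title-case a make name."""
    return raw.strip().title()


def test_normalize_make_strips_whitespace():
    assert normalize_make("  subaru ") == "Subaru"


def test_normalize_make_handles_mixed_case():
    assert normalize_make("tOYOTA") == "Toyota"
```

To run only a subset, pytest's `-k` option filters by name, for example `pytest tests -k normalize_make`.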

Deployment

TODO: Add additional notes about how to deploy this on a live system

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
