Probatus aims to provide a set of tools that can speed up common workflows around validating regressors & classifiers and the data used to train them.
We're very much open to contributions but there are some things to keep in mind:
- Discuss the feature and implementation you want to add on GitHub before you write a PR for it. On disagreements, maintainer(s) will have the final word.
- Features need a somewhat general use case. If the use case is very niche it will be hard for us to consider maintaining it.
- If you’re going to add a feature, consider if you could help out in the maintenance of it.
- When issues or pull requests are not going to be resolved or merged, they should be closed as soon as possible. This is kinder than deciding this after a long period. Our issue tracker should reflect work to be done.
That said, there are many ways to contribute to Probatus, including:
- Contribution to code
- Improving the documentation
- Reviewing pull requests
- Investigating bugs
- Reporting issues
Starting out with open source? See the guide How to Contribute to Open Source and have a look at our issues labelled good first issue.
Development install:
pip install -e '.[all]'
Unit testing:
pytest
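As a rough illustration of what pytest picks up: by default it collects functions whose names start with test_ from files named test_*.py, and reports every failing assert. The sketch below assumes a hypothetical helper, share_of_missing, and a hypothetical file name. Neither is part of probatus.

```python
# tests/test_example.py (hypothetical file name)


def share_of_missing(values):
    """Hypothetical helper, not part of probatus: fraction of None entries."""
    if not values:
        return 0.0
    return sum(value is None for value in values) / len(values)


def test_share_of_missing_counts_none_values():
    # Two of the four entries are None.
    assert share_of_missing([1, None, 3, None]) == 0.5


def test_share_of_missing_handles_empty_input():
    # An empty input should not raise a ZeroDivisionError.
    assert share_of_missing([]) == 0.0
```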
We use pre-commit hooks to ensure code styling. Install with:
pre-commit install
Once the hooks are installed (which we encourage), run the following command before committing your work:
pre-commit run --all-files
This lets you quickly see whether your work still needs any adaptations before a pull request can be accepted.
- Python 3.8+
- Follow PEP8 as closely as possible (except line length)
- Use the Google docstring format
- Git: Include a short description of what was done and why; how it was done can be seen in the code. Use present tense, imperative mood
- Git: limit the length of the first line to 72 chars. You can use multiple messages to specify a second (longer) line:
git commit -m "Patch load function" -m "This is a much longer explanation of what was done"
- Model validation modules assume that trained models passed for validation are developed in a scikit-learn framework (i.e. have predict_proba and other standard functions), or follow a scikit-learn API e.g. XGBoost.
- Every Python file used for model validation needs to be in /probatus/
- Class structure for a given module should have a base class and specific functionality classes that inherit from base. If a given module implements only a single way of computing the output, the base class is not required.
- Functions should be as short as possible in terms of lines of code. If a lot of code is needed, try to split it into snippets that live in other functions. This makes the code more readable and easier to test.
- Classes follow the probatus API structure:
  - Each class implements fit(), compute() and fit_compute() methods. fit() is used to fit an object with provided data (unless no fit is required), and compute() calculates the output e.g. DataFrame with a report for the user. Lastly, fit_compute() applies one after the other.
  - If applicable, the plot() method presents the user with the appropriate graphs.
  - For compute() and plot(), check if the object is fitted first.
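The class guidelines above can be sketched as follows. This is only an illustration under stated assumptions, not code from probatus: the names BaseReport, MissingValuesReport, the fitted flag and the _check_if_fitted() helper are hypothetical, and the report is returned as a plain dict to keep the example dependency-free (a real class might return a DataFrame instead). plot() is left out because it is optional. The docstrings use the Google format.

```python
from abc import ABC, abstractmethod


class BaseReport(ABC):
    """Hypothetical base class for the report classes of one module."""

    def __init__(self):
        self.fitted = False

    def _check_if_fitted(self):
        """Raise an error if fit() has not been called yet."""
        if not self.fitted:
            raise RuntimeError(f"{type(self).__name__} is not fitted, call fit() first.")

    @abstractmethod
    def fit(self, rows):
        """Store whatever compute() needs and mark the object as fitted."""

    @abstractmethod
    def compute(self):
        """Return the report for the user."""

    def fit_compute(self, rows):
        """Apply fit() and then compute()."""
        self.fit(rows)
        return self.compute()


class MissingValuesReport(BaseReport):
    """Report the share of missing (None) values per column.

    Hypothetical example class, not part of probatus.
    """

    def fit(self, rows):
        """Count missing values per column.

        Args:
            rows (list of dict): Records that all share the same keys.

        Returns:
            MissingValuesReport: The fitted object.
        """
        self.n_rows_ = len(rows)
        self.missing_counts_ = {}
        for row in rows:
            for column, value in row.items():
                previous = self.missing_counts_.get(column, 0)
                self.missing_counts_[column] = previous + (value is None)
        self.fitted = True
        return self

    def compute(self):
        """Return the share of missing values per column.

        Returns:
            dict: Column name mapped to the fraction of missing values.
        """
        self._check_if_fitted()
        return {column: count / self.n_rows_ for column, count in self.missing_counts_.items()}


rows = [{"age": 31, "income": None}, {"age": None, "income": None}]
report = MissingValuesReport().fit_compute(rows)
print(report)  # {'age': 0.5, 'income': 1.0}
```

Keeping the fitted check in one helper on the base class means every subclass gets the same error message, and compute() or plot() only need a single call at the top.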
Documentation is a crucial part of the project because it ensures usability of the package. We develop the docs in the following way:
- We use mkdocs with the mkdocs-material theme. The docs/ folder contains all the relevant documentation.
- We use mkdocs serve to view the documentation locally. Use it to test the documentation every time you make any changes.
- Maintainers can deploy the docs using mkdocs gh-deploy. The documentation is deployed to https://ing-bank.github.io/probatus/.