Skip to content

Latest commit

 

History

History
211 lines (161 loc) · 10.4 KB

CONTRIBUTING.md

File metadata and controls

211 lines (161 loc) · 10.4 KB

Contributing to cuML

If you are interested in contributing to cuML, your contributions will fall into three categories:

  1. You want to report a bug, feature request, or documentation issue
    • File an issue describing what you encountered or what you want to see changed.
    • Please run and paste the output of the cuml/print_env.sh script while reporting a bug to gather and report relevant environment details.
    • The RAPIDS team will evaluate the issues and triage them, scheduling them for a release. If you believe the issue needs priority attention comment on the issue to notify the team.
  2. You want to propose a new Feature and implement it
    • Post about your intended feature, and we shall discuss the design and implementation.
    • Once we agree that the plan looks good, go ahead and implement it, using the code contributions guide below.
  3. You want to implement a feature or bug-fix for an outstanding issue
    • Follow the code contributions guide below.
    • If you need more context on a particular issue, please ask and we shall provide.

Code contributions

Your first issue

  1. Read the project's README.md to learn how to setup the development environment.
  2. Find an issue to work on. The best way is to look for the good first issue or help wanted labels
  3. Comment on the issue saying you are going to work on it.
  4. Get familiar with the developer guide relevant for you:
  5. Code! Make sure to update unit tests!
  6. When done, create your pull request.
  7. Verify that CI passes all status checks, or fix if needed.
  8. Wait for other developers to review your code and update code as needed.
  9. Once reviewed and approved, a RAPIDS developer will merge your pull request.

Remember, if you are unsure about anything, don't hesitate to comment on issues and ask for clarifications!

Code Formatting

Consistent code formatting is important in the cuML project to ensure readability, maintainability, and thus simplifies collaboration.

Using pre-commit hooks

cuML uses pre-commit to execute code linters and formatters that check the code for common issues, such as syntax errors, code style violations, and help to detect bugs. Using pre-commit ensures that linter versions and options are aligned for all developers. The same hooks are executed as part of the CI checks. This means running pre-commit checks locally avoids unnecessary CI iterations.

To use pre-commit, install the tool via conda or pip into your development environment:

conda install -c conda-forge pre-commit

Alternatively:

pip install pre-commit

After installing pre-commit, it is recommended to install pre-commit hooks to run automatically before creating a git commit. In this way, it is less likely that style checks will fail as part of CI checks. To install pre-commit hooks, simply run the following command within the repository root directory:

pre-commit install

By default, pre-commit runs on staged files only, meaning only on changes that are about to be committed. To run pre-commit checks on all files, execute:

pre-commit run --all-files

To skip the checks temporarily, use git commit --no-verify or its short form -n.

Note: If the auto-formatters' changes affect each other, you may need to go through multiple iterations of git commit and git add -u.

cuML also uses codespell to find spelling mistakes, and this check is run as part of the pre-commit hook. To apply the suggested spelling fixes, you can run codespell -i 3 -w . from the command-line in the cuML root directory. This will bring up an interactive prompt to select which spelling fixes to apply.

If you want to ignore errors highlighted by codespell you can:

  • Add the word to the ignore-words-list in pyproject.toml, to exclude for all of cuML
  • Exclude the entire file from spellchecking, by adding to the exclude regex in .pre-commit-config.yaml
  • Ignore only specific lines as shown in codespell-project/codespell#1212 (comment)

Summary of pre-commit hooks

The pre-commit hooks configured for this repository consist of a number of linters and auto-formatters that we summarize here. For a full and current list, please see the .pre-commit-config.yaml file.

  • clang-format: Formats C++ and CUDA code for consistency and readability.
  • black: Auto-formats Python code to conform to the PEP 8 style guide.
  • flake8: Lints Python code for syntax errors and common code style issues.
  • cython-lint: Lints Cython code for syntax errors and common code style issues.
  • DeprecationWarning checker: Checks for new DeprecationWarning being introduced in Python code, and instead FutureWarning should be used.
  • #include syntax checker: Ensures consistent syntax for C++ #include statements.
  • Copyright header checker and auto-formatter: Ensures the copyright headers of files are up-to-date and in the correct format.
  • codespell: Checks for spelling mistakes

Clang-tidy

In order to maintain high-quality code, cuML uses not only pre-commit hooks featuring various formatters and linters but also the clang-tidy tool. Clang-tidy is designed to detect potential issues within the C and C++ code. It is typically run as part of our continuous integration (CI) process.

While it's generally unnecessary for contributors to run clang-tidy locally, there might be cases where you would want to do so. There are two primary methods to run clang-tidy on your local machine: using Docker or Conda.

  • Docker

    1. Navigate to the repository root directory.

    2. Run the following Docker command:

      docker run --rm --pull always \
          --mount type=bind,source="$(pwd)",target=/opt/repo --workdir /opt/repo \
          -e SCCACHE_S3_NO_CREDENTIALS=1 \
          rapidsai/ci-conda:latest /opt/repo/ci/run_clang_tidy.sh
  • Conda

    1. Navigate to the repository root directory.
    2. Create and activate the needed conda environment:
      conda env create --yes -n cuml-clang-tidy -f conda/environments/clang_tidy_cuda-118_arch-x86_64.yaml
      conda activate cuml-clang-tidy
    3. Generate the compile command database with
      ./build.sh --configure-only libcuml
    4. Run clang-tidy with the following command:
      python cpp/scripts/run-clang-tidy.py --config pyproject.toml

Managing PR labels

Each PR must be labeled according to whether it is a "breaking" or "non-breaking" change (using Github labels). This is used to highlight changes that users should know about when upgrading.

For cuML, a "breaking" change is one that modifies the public, non-experimental, Python API in a non-backward-compatible way. The C++ API does not have an expectation of backward compatibility at this time, so changes to it are not typically considered breaking. Backward-compatible API changes to the Python API (such as adding a new keyword argument to a function) do not need to be labeled.

Additional labels must be applied to indicate whether the change is a feature, improvement, bugfix, or documentation change. See the shared RAPIDS documentation for these labels: https://github.com/rapidsai/kb/issues/42.

Seasoned developers

Once you have gotten your feet wet and are more comfortable with the code, you can look at the prioritized issues of our next release in our project boards.

Pro Tip: Always look at the release board with the highest number for issues to work on. This is where RAPIDS developers also focus their efforts.

Look at the unassigned issues, and find an issue you are comfortable with contributing to. Start with Step 3 from above, commenting on the issue to let others know you are working on it. If you have any questions related to the implementation of the issue, ask them in the issue instead of the PR.

Branches and Versions

The cuML repository has two main branches:

  1. main branch: it contains the last released version. Only hotfixes are targeted and merged into it.
  2. branch-x.y: it is the development branch which contains the upcoming release. All the new features should be based on this branch and Merge/Pull request should target this branch (with the exception of hotfixes).

Additional details

For every new version x.y of cuML there is a corresponding branch called branch-x.y, from where new feature development starts and PRs will be targeted and merged before its release. The exceptions to this are the 'hotfixes' that target the main branch, which target critical issues raised by Github users and are directly merged to main branch, and create a new subversion of the project. While trying to patch an issue which requires a 'hotfix', please state the intent in the PR.

For all development, your changes should be pushed into a branch (created using the naming instructions below) in your own fork of cuML and then create a pull request when the code is ready.

A few days before releasing version x.y the code of the current development branch (branch-x.y) will be frozen and a new branch, 'branch-x+1.y' will be created to continue development.

Branch naming

Branches used to create PRs should have a name of the form <type>-<name> which conforms to the following conventions:

  • Type:
    • fea - For if the branch is for a new feature(s)
    • enh - For if the branch is an enhancement of an existing feature(s)
    • bug - For if the branch is for fixing a bug(s) or regression(s)
  • Name:
    • A name to convey what is being worked on
    • Please use dashes or underscores between words as opposed to spaces.

Attribution

Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md