Code comment filter

A script that filters the comments in a git repository's source code against a predefined set of patterns.

Dependencies

This script relies on the following packages:

  • GitPython==2.1.5
  • comment-parser==1.0.3

To check and install the dependencies, run the command pip install -r requirements.txt

Usage

From the root directory execute: python parse.py

Input

The script takes as input the file patterns.txt, in which the patterns to be matched are specified.
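The layout of patterns.txt is not described here, so the following is only a minimal sketch. It assumes one pattern per line and case-insensitive substring matching. The helper names load_patterns and matching_keywords are hypothetical and are not taken from parse.py.

```python
from pathlib import Path


def load_patterns(path="patterns.txt"):
    """Read one pattern per line, skipping blank lines (assumed file layout)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def matching_keywords(comment, patterns):
    """Return the patterns found in the comment, ignoring case (assumed rule)."""
    lowered = comment.lower()
    return [p for p in patterns if p.lower() in lowered]
```

Under these assumptions, a comment is kept only when matching_keywords returns a non-empty list; the script itself may use a different matching rule.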

Output

The output of the script is stored in the file output_parsing.tsv, which contains the source code comments matching the predefined patterns. The three columns of the output file are:

  • File name: Location of the source code file in which the matched comment appears
  • Keyword: Pattern keyword(s) contained in the matched comment
  • Comment: Content of the matched source code comment
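As an illustration of the three-column layout, rows like these could be written with the standard csv module. This is a sketch rather than the script's actual code: the function name write_matches, the header row, the comma-joined keywords, and the whitespace collapsing are all assumptions.

```python
import csv


def write_matches(rows, path="output_parsing.tsv"):
    """Write (file name, keywords, comment) tuples as tab-separated rows."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        # Header row is an assumption; the real output may omit it.
        writer.writerow(["File name", "Keyword", "Comment"])
        for file_name, keywords, comment in rows:
            # Collapse whitespace so a multi-line comment stays on one row.
            flat_comment = " ".join(comment.split())
            writer.writerow([file_name, ", ".join(keywords), flat_comment])
```

Collapsing whitespace keeps each match on a single line, which makes the file easier to load in spreadsheet tools.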

Notes

  • The git repository to be analyzed is currently hardcoded in the script. Change the variable git_repository_url to analyze a different repository.

  • The language of the repository has to be specified in the MIME type variable MIME. For the mapping of languages to MIME types refer to the documentation of the comment_parser package.

  • The extension(s) of the files to be parsed have to be specified in the variable extensions.

  • Currently supported languages:

    • C
    • C++
    • Go
    • Java
    • Javascript
    • Bash/Sh
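The three settings named in the notes above might look roughly like this for a C repository. The variable names come from the notes, but every value is a placeholder. The MIME string in particular is an assumption to verify against the comment_parser documentation, and source_files is a hypothetical helper showing one way the extension filter could work.

```python
import os

# Names taken from the notes; the values are placeholders.
git_repository_url = "https://example.com/some/repository.git"  # hypothetical URL
MIME = "text/x-c"  # assumed value for C; check the comment_parser documentation
extensions = (".c", ".h")  # assumed to be matched against file name suffixes


def source_files(root, extensions):
    """Yield paths under root whose names end with one of the extensions."""
    for directory, subdirs, files in os.walk(root):
        # Do not descend into git's own metadata directory.
        subdirs[:] = [d for d in subdirs if d != ".git"]
        for name in sorted(files):
            if name.endswith(tuple(extensions)):
                yield os.path.join(directory, name)
```

For a mixed-language repository, this sketch implies one run per language, since a single MIME value applies to all selected files.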

Credits and license

Author:

Sample patterns were taken from the dataset of the study "An Exploratory Study on Self-Admitted Technical Debt" by Potdar et al., available here.

This project is licensed under the MIT License - see the file license.txt