Exclude linguist-vendored and linguist-generated in .gitattributes when using --no-autogen #722

Closed
BrianL-STCU opened this issue Apr 18, 2023 · 13 comments

@BrianL-STCU

GitHub supports marking vendored and generated code in a repository with Linguist attributes in the .gitattributes file, and it uses them to collapse those files in diff views. It would be nice if the --no-autogen option excluded these files too, since many repositories already maintain these attributes.

Example

src/Project/OpenAPIs/* linguist-generated=true
src/Project/Models/*.cs linguist-generated=true
**/packages/** linguist-vendored
**/lib/** linguist-vendored
"**/Service References/**" linguist-generated=true
"**/Web References/**" linguist-generated=true
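A tool honoring these attributes first has to split each .gitattributes line into a path pattern and its attribute tokens. Below is a minimal Python sketch of that step, not cloc's implementation (cloc is written in Perl). `parse_gitattributes` is a hypothetical helper, and using `shlex.split` is an assumed stand-in for git's own quoting rules, chosen so that quoted patterns with spaces, like the "Service References" entry above, stay in one piece.

```python
import shlex


def parse_gitattributes(text):
    """Split .gitattributes text into (pattern, [attribute tokens]) pairs.

    Sketch only: blank lines and '#' comment lines are skipped, and
    shlex.split() approximates git's quoting so a quoted pattern containing
    spaces stays a single token.
    """
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *tokens = shlex.split(line)
        rules.append((pattern, tokens))
    return rules


example = """\
src/Project/OpenAPIs/* linguist-generated=true
**/packages/** linguist-vendored
"**/Service References/**" linguist-generated=true
"""

for pattern, tokens in parse_gitattributes(example):
    print(pattern, tokens)
```

Each rule keeps its tokens as raw strings, so deciding what `linguist-generated` or `-linguist-vendored` means is left to a separate step.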
@AlDanial
Owner

Can you recommend a repo to clone that has such entries?

@BrianL-STCU
Author

I've got a couple at brianary/webcoder or brianary/scripts, but there are a lot of others.

@BrianL-STCU
Author

It looks like `linguist-generated=true` can now be written as just `linguist-generated`.
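In gitattributes syntax, a bare attribute name sets it, a `-` prefix unsets it, a `!` prefix makes it unspecified, and `name=value` assigns a string value. That is why both forms can work. The hypothetical helpers below sketch that resolution in Python; treating the string value `"true"` the same as the bare form is an assumption about how Linguist reads these attributes, not something taken from cloc.

```python
def attribute_state(tokens, name):
    """Resolve one attribute from a line's tokens (gitattributes syntax).

    'name' -> True (set), '-name' -> False (unset),
    '!name' -> None (unspecified), 'name=value' -> the string value.
    Later tokens override earlier ones; None also means "not mentioned".
    """
    state = None
    for tok in tokens:
        if tok == name:
            state = True
        elif tok == "-" + name:
            state = False
        elif tok == "!" + name:
            state = None
        elif tok.startswith(name + "="):
            state = tok[len(name) + 1:]
    return state


def linguist_flag_on(tokens, name):
    # Assumption: Linguist treats the bare form and "=true" the same way.
    return attribute_state(tokens, name) in (True, "true")


print(linguist_flag_on(["linguist-generated"], "linguist-generated"))
print(linguist_flag_on(["linguist-generated=true"], "linguist-generated"))
```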

AlDanial added a commit that referenced this issue Apr 22, 2023
added Text::Glob by Richard Clamp to handle glob -> regex
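The commit above brings in Text::Glob to translate glob patterns into regular expressions. As a rough Python analogue of that step (not Text::Glob's API or exact behavior), the hypothetical `glob_to_regex` below follows the gitignore-style rules that .gitattributes uses: a pattern containing a slash is matched relative to the directory holding the .gitattributes file, a slash-free pattern matches a file name at any depth, `*` and `?` stay within one path component, and `**` spans directories.

```python
import re


def glob_to_regex(pattern):
    """Translate a gitattributes-style glob into a compiled regex (sketch).

    Handles '**', '*', '?' and literal characters only. Any '**' not
    followed by '/' is treated as "match anything", which is looser than
    git; bracket expressions and backslash escapes are not supported.
    """
    anchored = "/" in pattern        # slash => relative to .gitattributes dir
    pattern = pattern.lstrip("/")    # a leading '/' only marks anchoring
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # everything below, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # stays inside one path component
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    body = "".join(out)
    if not anchored:
        body = "(?:.*/)?" + body     # slash-free patterns match at any depth
    return re.compile(body + r"\Z")


rx = glob_to_regex("**/packages/**")
print(bool(rx.match("src/packages/Newtonsoft.Json.dll")))  # prints True
```

A `--no-autogen`-style filter could then drop any path whose matching rule carries a Linguist attribute; rule precedence (later lines override earlier ones) is left out of this sketch.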
@AlDanial
Owner

Take a shot at the update I just pushed. I've only tested it on Linux so far.

@brianary

I pulled down the repo and ran `./cloc`...
     789 text files.
     700 unique files.
      96 files ignored.

github.com/AlDanial/cloc v 1.97  T=0.19 s (3692.9 files/s, 367610.3 lines/s)
----------------------------------------------------------------------------------------
Language                              files          blank        comment           code
----------------------------------------------------------------------------------------
Perl                                      8           2327           5109          26465
YAML                                    362             12            364           8770
Markdown                                  3            305             40           2860
TableGen                                  1            241            128           1124
ANTLR Grammar                             2            200             59           1012
R                                         3             95            312            698
C/C++ Header                              1            191            780            617
C++                                      11            132            183            603
Forth                                     2             17             84            529
TypeScript                                4             53             39            416
Logtalk                                   1             59             57            368
C                                         8            111             72            359
Windows Message File                      2             89              9            348
TeX                                       2             36             64            265
CMake                                     1             36             40            261
Racket                                    1             32            159            247
make                                      4             85            159            247
SVG                                       1             19              4            242
Glade                                     1              0             22            232
DIET                                      1             10              4            230
Windows Resource File                     1             42             45            218
Assembly                                  4             40            142            205
Linker Script                             1              3             60            197
CSV                                       1              0              0            158
ReScript                                  1             31             43            157
Juniper Junos                             1              0              8            129
Zig                                       1              2             10            128
Idris                                     2             38             90            117
ECPP                                      1             26             34            116
Prolog                                    2             43              8            114
Text                                     17             14              0            113
Ruby                                      1             11             30            111
Hoon                                      1              0             10            110
Imba                                      1             71             30            108
Dockerfile                                3             18             13            106
P4                                        1             28             33            102
Thrift                                    1             57            134             97
Bourne Shell                              5             14             10             96
Bourne Again Shell                        1             11             19             92
Xtend                                     1             17             52             91
BizTalk Orchestration                     1              1              3             90
Lean                                      1             36             20             90
Odin                                      1             32             56             90
kvlang                                    1             13              2             86
Smalltalk                                 2             19              5             85
Vuejs Component                           1             10              2             85
Java                                      5             13             28             81
Circom                                    1             34             26             80
Scheme                                    1             10             18             78
Constraint Grammar                        1             12             11             77
WGSL                                      1              5              8             76
Cairo                                     1             17              9             75
MXML                                      1             23              5             74
MATLAB                                    3              3             11             68
Oracle PL/SQL                             1              0             15             67
Haml                                      1              5             16             66
Pony                                      1             23             43             66
Visual Basic                              2             44             55             66
Swift                                     1             23             13             65
Fish Shell                                1             14             47             62
NetLogo                                   1             17             14             62
RAML                                      1              5              3             62
Verilog-SystemVerilog                     1              4             20             62
SCSS                                      2             16              8             59
Clean                                     1             10             30             58
Qt Linguist                               1              0              4             57
SaltStack                                 1              6              1             55
Containerfile                             1              5              2             53
tspeg                                     2             26             31             53
Pest                                      1             16              9             51
Meson                                     1             13              9             48
JSON                                      3              0              0             46
Fennel                                    1              6              3             44
JCL                                       1              0             18             44
HCL                                       1             14             36             43
Nim                                       1              5             13             43
Nix                                       1             15             15             43
OpenSCAD                                  1             18              3             42
Go                                        3             14             41             40
HolyC                                     1              4             14             40
Metal                                     1             13             10             40
ASP.NET                                   2             16             21             39
Raku                                      1             19             12             39
SQL                                       3             24             36             39
Agda                                      1             10              3             38
Ring                                      1             11             11             38
Web Services Description                  1              4              0             36
COBOL                                     3              5              8             35
Haskell                                   4             23             26             35
RobotFramework                            1              9              5             35
X++                                       1              8             16             35
AsciiDoc                                  1             17             27             34
EJS                                       1              0             11             34
Godot Scene                               1              4              8             34
Puppet                                    4              2              8             34
IPL                                       1              6             15             33
PO File                                   1              9             18             33
GLSL                                      1             10             14             32
WebAssembly                               1              8             20             32
Mustache                                  2              5              7             31
Specman e                                 2              4             12             31
Squirrel                                  1              6              4             31
Python                                    7             16             54             30
Apex Class                                1              3              6             28
C# Designer                               1              8             22             28
Cake Build Script                         1              6              6             28
Cucumber                                  1              3              2             28
Drools                                    1              7             16             28
Freemarker Template                       1              0              2             27
Bazel                                     1              7              1             26
PHP                                       2             11             13             26
Umka                                      1              7              5             26
LFE                                       1             15             21             25
Objective-C                               1             11             11             25
Scala                                     1              8              8             25
Visual Studio Solution                    1              0              1             25
Brainfuck                                 1              1              3             24
Fortran 90                                6              1             18             24
Haxe                                      1             26             99             24
Lisp                                      1              5             26             24
C#                                        4              9              7             23
peg.js                                    1             18              9             23
Blade                                     1             10              5             22
GraphQL                                   2              3              6             22
JSON5                                     1              0              4             22
Mathematica                               2             24             17             22
PEG                                       1             24              9             22
Stata                                     1              7              7             22
TOML                                      1              8              4             22
Gleam                                     1              6             41             21
Jupyter Notebook                          1              0            126             21
Smarty                                    1              1              1             21
Godot Resource                            1              2              8             20
BrightScript                              1              0              3             19
Igor Pro                                  1              4              6             19
PL/M                                      1              1              5             19
Solidity                                  1              0              2             19
TTCN                                      1             11             16             19
XSLT                                      2              0              4             19
peggy                                     1             25              7             19
Jai                                       1              4              7             18
Pascal                                    4              4             15             18
Windows Module Definition                 1              1              1             18
Gradle                                    1              0              2             17
Mojo                                      1              6              4             17
Razor                                     2              6              7             17
TEAL                                      1             16             37             17
Futhark                                   1              7             35             16
Logos                                     2              6              3             16
Carbon                                    1             11              6             15
DenizenScript                             1              0              6             15
Gencat NLS                                1              1              4             15
JavaScript                                5              3              0             15
Lem                                       1             11             24             15
Pig Latin                                 1             19             40             15
SWIG                                      1              4              4             15
TNSDL                                     1              5              3             15
Embedded Crystal                          1              4              4             14
F#                                        1              3              6             14
Finite State Language                     1              7              3             14
IDL                                       2             25              7             14
Derw                                      1              2              5             13
SugarSS                                   1              5              4             13
Velocity Template Language                1              0             20             13
Starlark                                  1              3              4             11
Nunjucks                                  1              0              6             10
Slim                                      1              0              3             10
reStructuredText                          1              6              4             10
Godot Shaders                             1              3              3              9
Kotlin                                    1              0              3              9
Mako                                      1              3              8              9
Properties                                1              0             15              9
Svelte                                    1              2              2              9
Vala                                      1              0              5              9
Visual Studio Module                      1              3              5              9
XML                                       3              0              5              9
F# Script                                 1              1              2              8
FXML                                      1              2              3              8
SparForte                                 1              6              8              8
WXML                                      1              3              2              8
C# Generated                              1              2             16              7
Elixir                                    1              3             10              7
Fortran 77                                2              1              8              7
INI                                       1              2              3              7
Lua                                       3              9             33              7
Chapel                                    1              7             35              6
VB for Applications                       1              4              2              6
HTML EEx                                  1              1              4              5
Julia                                     2              4             15              5
PL/I                                      1              0              7              5
PlantUML                                  1              2              5              5
APL                                       1              3              6              4
Arduino Sketch                            1              1              5              4
ReasonML                                  1              2              8              4
Rmd                                       1             10             19              4
WXSS                                      1              0              0              4
Elm                                       2              0              5              3
Flatbuffers                               1              1              2              3
Groovy                                    1              0              3              3
LLVM IR                                   1              2              6              3
Literate Idris                            1              2              2              3
NAnt script                               1              1              0              3
OCaml                                     1              0              5              3
ProGuard                                  1              7             14              3
Tcl/Tk                                    1              1              2              3
dhall                                     1              6             17              3
ColdFusion                                1              1              2              2
DOS Batch                                 1              1              2              2
Focus                                     1              1              2              1
MUMPS                                     1              0              2              1
XQuery                                    1              0              1              1
xBase                                     1              0              9              1
----------------------------------------------------------------------------------------
SUM:                                    700           5804          10594          53284
----------------------------------------------------------------------------------------

I added this .gitattributes file:

./cloc linguist-generated=true
./Unix/cloc linguist-vendored
Then I ran `./cloc` again, which seemed to add one ignored file, but didn't exclude any lines of Perl code...
     790 text files.
     700 unique files.
      97 files ignored.

github.com/AlDanial/cloc v 1.97  T=0.20 s (3578.0 files/s, 356176.6 lines/s)
----------------------------------------------------------------------------------------
Language                              files          blank        comment           code
----------------------------------------------------------------------------------------
Perl                                      8           2327           5109          26465
YAML                                    362             12            364           8770
Markdown                                  3            305             40           2860
TableGen                                  1            241            128           1124
ANTLR Grammar                             2            200             59           1012
R                                         3             95            312            698
C/C++ Header                              1            191            780            617
C++                                      11            132            183            603
Forth                                     2             17             84            529
TypeScript                                4             53             39            416
Logtalk                                   1             59             57            368
C                                         8            111             72            359
Windows Message File                      2             89              9            348
TeX                                       2             36             64            265
CMake                                     1             36             40            261
Racket                                    1             32            159            247
make                                      4             85            159            247
SVG                                       1             19              4            242
Glade                                     1              0             22            232
DIET                                      1             10              4            230
Windows Resource File                     1             42             45            218
Assembly                                  4             40            142            205
Linker Script                             1              3             60            197
CSV                                       1              0              0            158
ReScript                                  1             31             43            157
Juniper Junos                             1              0              8            129
Zig                                       1              2             10            128
Idris                                     2             38             90            117
ECPP                                      1             26             34            116
Prolog                                    2             43              8            114
Text                                     17             14              0            113
Ruby                                      1             11             30            111
Hoon                                      1              0             10            110
Imba                                      1             71             30            108
Dockerfile                                3             18             13            106
P4                                        1             28             33            102
Thrift                                    1             57            134             97
Bourne Shell                              5             14             10             96
Bourne Again Shell                        1             11             19             92
Xtend                                     1             17             52             91
BizTalk Orchestration                     1              1              3             90
Lean                                      1             36             20             90
Odin                                      1             32             56             90
kvlang                                    1             13              2             86
Smalltalk                                 2             19              5             85
Vuejs Component                           1             10              2             85
Java                                      5             13             28             81
Circom                                    1             34             26             80
Scheme                                    1             10             18             78
Constraint Grammar                        1             12             11             77
WGSL                                      1              5              8             76
Cairo                                     1             17              9             75
MXML                                      1             23              5             74
MATLAB                                    3              3             11             68
Oracle PL/SQL                             1              0             15             67
Haml                                      1              5             16             66
Pony                                      1             23             43             66
Visual Basic                              2             44             55             66
Swift                                     1             23             13             65
Fish Shell                                1             14             47             62
NetLogo                                   1             17             14             62
RAML                                      1              5              3             62
Verilog-SystemVerilog                     1              4             20             62
SCSS                                      2             16              8             59
Clean                                     1             10             30             58
Qt Linguist                               1              0              4             57
SaltStack                                 1              6              1             55
Containerfile                             1              5              2             53
tspeg                                     2             26             31             53
Pest                                      1             16              9             51
Meson                                     1             13              9             48
JSON                                      3              0              0             46
Fennel                                    1              6              3             44
JCL                                       1              0             18             44
HCL                                       1             14             36             43
Nim                                       1              5             13             43
Nix                                       1             15             15             43
OpenSCAD                                  1             18              3             42
Go                                        3             14             41             40
HolyC                                     1              4             14             40
Metal                                     1             13             10             40
ASP.NET                                   2             16             21             39
Raku                                      1             19             12             39
SQL                                       3             24             36             39
Agda                                      1             10              3             38
Ring                                      1             11             11             38
Web Services Description                  1              4              0             36
COBOL                                     3              5              8             35
Haskell                                   4             23             26             35
RobotFramework                            1              9              5             35
X++                                       1              8             16             35
AsciiDoc                                  1             17             27             34
EJS                                       1              0             11             34
Godot Scene                               1              4              8             34
Puppet                                    4              2              8             34
IPL                                       1              6             15             33
PO File                                   1              9             18             33
GLSL                                      1             10             14             32
WebAssembly                               1              8             20             32
Mustache                                  2              5              7             31
Specman e                                 2              4             12             31
Squirrel                                  1              6              4             31
Python                                    7             16             54             30
Apex Class                                1              3              6             28
C# Designer                               1              8             22             28
Cake Build Script                         1              6              6             28
Cucumber                                  1              3              2             28
Drools                                    1              7             16             28
Freemarker Template                       1              0              2             27
Bazel                                     1              7              1             26
PHP                                       2             11             13             26
Umka                                      1              7              5             26
LFE                                       1             15             21             25
Objective-C                               1             11             11             25
Scala                                     1              8              8             25
Visual Studio Solution                    1              0              1             25
Brainfuck                                 1              1              3             24
Fortran 90                                6              1             18             24
Haxe                                      1             26             99             24
Lisp                                      1              5             26             24
C#                                        4              9              7             23
peg.js                                    1             18              9             23
Blade                                     1             10              5             22
GraphQL                                   2              3              6             22
JSON5                                     1              0              4             22
Mathematica                               2             24             17             22
PEG                                       1             24              9             22
Stata                                     1              7              7             22
TOML                                      1              8              4             22
Gleam                                     1              6             41             21
Jupyter Notebook                          1              0            126             21
Smarty                                    1              1              1             21
Godot Resource                            1              2              8             20
BrightScript                              1              0              3             19
Igor Pro                                  1              4              6             19
PL/M                                      1              1              5             19
Solidity                                  1              0              2             19
TTCN                                      1             11             16             19
XSLT                                      2              0              4             19
peggy                                     1             25              7             19
Jai                                       1              4              7             18
Pascal                                    4              4             15             18
Windows Module Definition                 1              1              1             18
Gradle                                    1              0              2             17
Mojo                                      1              6              4             17
Razor                                     2              6              7             17
TEAL                                      1             16             37             17
Futhark                                   1              7             35             16
Logos                                     2              6              3             16
Carbon                                    1             11              6             15
DenizenScript                             1              0              6             15
Gencat NLS                                1              1              4             15
JavaScript                                5              3              0             15
Lem                                       1             11             24             15
Pig Latin                                 1             19             40             15
SWIG                                      1              4              4             15
TNSDL                                     1              5              3             15
Embedded Crystal                          1              4              4             14
F#                                        1              3              6             14
Finite State Language                     1              7              3             14
IDL                                       2             25              7             14
Derw                                      1              2              5             13
SugarSS                                   1              5              4             13
Velocity Template Language                1              0             20             13
Starlark                                  1              3              4             11
Nunjucks                                  1              0              6             10
Slim                                      1              0              3             10
reStructuredText                          1              6              4             10
Godot Shaders                             1              3              3              9
Kotlin                                    1              0              3              9
Mako                                      1              3              8              9
Properties                                1              0             15              9
Svelte                                    1              2              2              9
Vala                                      1              0              5              9
Visual Studio Module                      1              3              5              9
XML                                       3              0              5              9
F# Script                                 1              1              2              8
FXML                                      1              2              3              8
SparForte                                 1              6              8              8
WXML                                      1              3              2              8
C# Generated                              1              2             16              7
Elixir                                    1              3             10              7
Fortran 77                                2              1              8              7
INI                                       1              2              3              7
Lua                                       3              9             33              7
Chapel                                    1              7             35              6
VB for Applications                       1              4              2              6
HTML EEx                                  1              1              4              5
Julia                                     2              4             15              5
PL/I                                      1              0              7              5
PlantUML                                  1              2              5              5
APL                                       1              3              6              4
Arduino Sketch                            1              1              5              4
ReasonML                                  1              2              8              4
Rmd                                       1             10             19              4
WXSS                                      1              0              0              4
Elm                                       2              0              5              3
Flatbuffers                               1              1              2              3
Groovy                                    1              0              3              3
LLVM IR                                   1              2              6              3
Literate Idris                            1              2              2              3
NAnt script                               1              1              0              3
OCaml                                     1              0              5              3
ProGuard                                  1              7             14              3
Tcl/Tk                                    1              1              2              3
dhall                                     1              6             17              3
ColdFusion                                1              1              2              2
DOS Batch                                 1              1              2              2
Focus                                     1              1              2              1
MUMPS                                     1              0              2              1
XQuery                                    1              0              1              1
xBase                                     1              0              9              1
----------------------------------------------------------------------------------------
SUM:                                    700           5804          10594          53284
----------------------------------------------------------------------------------------

@AlDanial
Owner

The failure was caused by the spurious ./ in the path defined by the pattern. Once I expanded it to a canonical path I got the expected behavior.
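To illustrate the canonicalization step, here is a minimal Python sketch. It is not cloc's actual Perl code, and it assumes patterns are compared as POSIX-style relative paths. `canonical_pattern` is a hypothetical helper name.

```python
import posixpath


def canonical_pattern(pattern: str) -> str:
    """Canonicalize a .gitattributes path pattern (hypothetical helper).

    "./src/./gen//x.pl" and "src/gen/x.pl" should select the same files,
    so redundant "./" segments and doubled slashes are collapsed.
    """
    # posixpath.normpath always works with "/" separators, whatever the OS.
    # A leading "/" (which anchors a pattern to the repo root) is preserved.
    # Caveat: normpath also folds "a/../b" into "b".
    return posixpath.normpath(pattern)


print(canonical_pattern("./cloc"))             # cloc
print(canonical_pattern("./src/./gen//x.pl"))  # src/gen/x.pl
```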

However...

I took a closer look at the Text::Glob module I vendored in for this issue and found that it is rudimentary: it won't handle ** recursive matches, and it is also confused by quoted paths. I'll need to come up with my own way to deal with these, which means it will take a while.
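For a sense of what handling `**` involves, here is a minimal Python sketch that translates a glob into a regular expression. It follows the `**` rules in git's gitignore pattern documentation, which .gitattributes patterns share. This is not cloc's implementation. `glob_to_regex` is a hypothetical name, and bracket expressions such as `[abc]` are deliberately left out.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a gitignore-style glob into a regex (hypothetical helper)."""
    # A slash at the start or in the middle anchors the pattern to the
    # directory holding .gitattributes; otherwise it matches at any depth.
    anchored = "/" in pattern.rstrip("/")
    pattern = pattern.lstrip("/")
    out, i, n = [], 0, len(pattern)
    while i < n:
        if pattern.startswith("**/", i) and (i == 0 or pattern[i - 1] == "/"):
            # Leading "**/" or middle "/**/": zero or more directories.
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("/**", i) and i + 3 == n:
            out.append("/.*")  # trailing "/**": everything inside
            i += 3
        elif pattern[i] == "*":
            # Any other "*" (or run of "*") stays inside one path segment.
            while i < n and pattern[i] == "*":
                i += 1
            out.append("[^/]*")
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            # Literal character; bracket expressions like [abc] are not supported.
            out.append(re.escape(pattern[i]))
            i += 1
    prefix = "" if anchored else "(?:.*/)?"
    return re.compile(prefix + "".join(out))


rx = glob_to_regex("**/lib/**")
print(bool(rx.fullmatch("src/lib/a/b.dll")))  # True
print(bool(rx.fullmatch("src/mylib/b.dll")))  # False
```

The regex is meant to be applied with `fullmatch` against a "/"-separated path relative to the .gitattributes directory.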

@AlDanial
Owner

ea192f1 is my next attempt at this; please give it a try.

AlDanial added a commit that referenced this issue May 27, 2023
this is a partial implementation, still need --match-f, --match-d
AlDanial added a commit that referenced this issue May 28, 2023
@brianary

Unfortunately, I still don't see a difference after adding that .gitattributes file.

@AlDanial
Owner

Are you using --git --no-autogen? Without these switches, the first few lines of output I see are:

github.com/AlDanial/cloc v 1.97  T=2.33 s (335.9 files/s, 169801.5 lines/s)
----------------------------------------------------------------------------------------
Language                              files          blank        comment           code
----------------------------------------------------------------------------------------
Perl                                     36          23797          49078         265666
Text                                     47           2330              0          17361
YAML                                    370             12            372           9075
Markdown                                  4            305             40           2862

However, with --git --no-autogen, two of the Perl files are omitted (36 files become 34):

github.com/AlDanial/cloc v 1.97  T=1.74 s (446.6 files/s, 208591.7 lines/s)
----------------------------------------------------------------------------------------
Language                              files          blank        comment           code
----------------------------------------------------------------------------------------
Perl                                     34          21624          44084         241342
Text                                     47           2330              0          17361
YAML                                    370             12            372           9075
Markdown                                  4            305             40           2862
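Which files --no-autogen drops depends on which .gitattributes lines set the Linguist attributes. Here is a rough Python sketch of that selection step, assuming both `linguist-generated` and `linguist-vendored` are honored, as the issue requests. The sketch reads .gitattributes text, unquotes C-style quoted patterns such as `"**/Service References/**"`, and keeps patterns whose attribute is set either bare or with `=true`. The function names are hypothetical, and only the `\"` and `\\` escapes are unquoted. The sketch also ignores the rule that later lines override earlier ones, so cloc's real logic may differ.

```python
from __future__ import annotations

LINGUIST_ATTRS = ("linguist-generated", "linguist-vendored")


def split_pattern(line: str) -> tuple[str, str]:
    """Split one .gitattributes line into (pattern, attribute text)."""
    if line.startswith('"'):
        # C-style quoted pattern; this sketch only unescapes \" and \\.
        chars, i = [], 1
        while i < len(line) and line[i] != '"':
            if line[i] == "\\" and i + 1 < len(line):
                i += 1  # skip the backslash, keep the escaped character
            chars.append(line[i])
            i += 1
        return "".join(chars), line[i + 1:]
    parts = line.split(None, 1)
    return parts[0], (parts[1] if len(parts) > 1 else "")


def autogen_patterns(text: str) -> list[str]:
    """Patterns that set linguist-generated or linguist-vendored."""
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no attributes
        pattern, attrs = split_pattern(line)
        for attr in attrs.split():
            name, _, value = attr.partition("=")
            # Bare "name" or "name=true" sets the attribute; "-name",
            # "!name" and "name=false" do not.
            if name in LINGUIST_ATTRS and value in ("", "true"):
                found.append(pattern)
                break
    return found


sample = """\
src/Project/OpenAPIs/* linguist-generated=true
**/lib/** linguist-vendored
"**/Service References/**" linguist-generated
docs/** -linguist-generated
"""
print(autogen_patterns(sample))
# ['src/Project/OpenAPIs/*', '**/lib/**', '**/Service References/**']
```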

@brianary

I did have the wrong options, but after retrying on Windows with the right ones, both runs (with and without --no-autogen) still start with:

github.com/AlDanial/cloc v 1.97  T=1.68 s (420.2 files/s, 41535.8 lines/s)
----------------------------------------------------------------------------------------
Language                              files          blank        comment           code
----------------------------------------------------------------------------------------
Perl                                      8           2326           5113          26552
YAML                                    365             12            367           8830
Markdown                                  3            305             40           2860
TableGen                                  1            241            128           1124

Something seems way off for us to be getting such different results in either case.

@AlDanial
Owner

AlDanial commented Jul 8, 2023

A couple of things: I had a bunch of extra files in my directory that inflated my count. Also, the glitch had to do with file path separators; I wasn't treating \ and / consistently in this code branch. The latest push should fix it. Here are my results on Windows, using extra filters to narrow the output down to just the 8 or 10 Perl files:

C:>perl cloc --git --no-autogen --by-file --include-lang=Perl .
     834 text files.
     721 unique files.
     839 files ignored.

github.com/AlDanial/cloc v 1.97  T=2.00 s (5.0 files/s, 26986.6 lines/s)
-----------------------------------------------------------------------------------
File                                            blank        comment           code
-----------------------------------------------------------------------------------
.\cloc-1.96.pl                                   1485           3656          12103
.\cloc-1.92.pl                                   1481           3628          11998
.\cloc.ok                                        1484           3635          11950
.\Unix\t\00_C.t                                     3              3           1282
.\Unix\t\01_opts.t                                 86             28            697
.\Unix\t\02_git.t                                  15              1            134
.\tests\inputs\issues\380\wrapper.pl               44             72             71
.\sqlite_formatter                                  5             15             42
.\tests\inputs\issues\420\mixed_case_ext.Pl         5              1             14
.\tests\inputs\diff\B\extra_file.pl                 0              0              2
-----------------------------------------------------------------------------------
SUM:                                             4608          11039          38293
-----------------------------------------------------------------------------------

If I don't include --no-autogen, cloc won't apply the rules in .gitattributes, and I see two extra Perl files (.\cloc and .\Unix\cloc):

C:>perl cloc --git --by-file --include-lang=Perl .
     834 text files.
     721 unique files.
     837 files ignored.

github.com/AlDanial/cloc v 1.97  T=2.02 s (5.9 files/s, 42343.3 lines/s)
-----------------------------------------------------------------------------------
File                                            blank        comment           code
-----------------------------------------------------------------------------------
.\cloc                                           1488           3666          12290
.\cloc-1.96.pl                                   1485           3656          12103
.\Unix\cloc                                       685           1328          12041
.\cloc-1.92.pl                                   1481           3628          11998
.\cloc.ok                                        1484           3635          11950
.\Unix\t\00_C.t                                     3              3           1282
.\Unix\t\01_opts.t                                 86             28            697
.\Unix\t\02_git.t                                  15              1            134
.\tests\inputs\issues\380\wrapper.pl               44             72             71
.\sqlite_formatter                                  5             15             42
.\tests\inputs\issues\420\mixed_case_ext.Pl         5              1             14
.\tests\inputs\diff\B\extra_file.pl                 0              0              2
-----------------------------------------------------------------------------------
SUM:                                             6781          16033          62624
-----------------------------------------------------------------------------------

Your file lists may be smaller, but if you do the two runs, one should produce two fewer files than the other.
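The separator fix can be pictured with a short Python sketch, which is not cloc's actual code. Before being compared against .gitattributes patterns, both the Windows-style `.\Unix\cloc` and the POSIX-style `./Unix/cloc` should reduce to the same `Unix/cloc`. `to_match_path` and its `windows` flag are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_match_path(path: str, windows: bool) -> str:
    """Rewrite a file path into the "/"-separated form git patterns use."""
    # On Windows both "\" and "/" separate directories, so parse with
    # PureWindowsPath; elsewhere "\" is an ordinary filename character.
    pure = PureWindowsPath(path) if windows else PurePosixPath(path)
    # pathlib also collapses a leading "." segment, so ".\cloc" -> "cloc".
    return pure.as_posix()


print(to_match_path(r".\Unix\cloc", windows=True))  # Unix/cloc
print(to_match_path("./Unix/cloc", windows=False))  # Unix/cloc
```

The `windows` flag is explicit here because blindly rewriting `\` to `/` would corrupt legitimate file names on Linux.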

@BrianL-STCU
Author

I think that has done it! :)

@AlDanial
Copy link
Owner

Glad to hear it; thanks for your patience with testing.
