Add lint backend to run semgrep #18593

Merged Aug 28, 2023 (54 commits)
Showing changes from 1 commit
Commits
c041769
Basics working
huonw Mar 5, 2023
f4bde7f
Switch to more explicit names
huonw Mar 15, 2023
8955c50
Refine env vars, refine settings
huonw Mar 26, 2023
a2fa07b
require glob matches
huonw Mar 26, 2023
56dccf9
rm unused imports
huonw Mar 26, 2023
955538e
Glob better
huonw Mar 26, 2023
68fc026
issue link
huonw Mar 26, 2023
8be9b21
partitioning by config
huonw Mar 26, 2023
594d1ea
revert spurious changes
huonw Mar 26, 2023
790fe51
stop using semgrep in pants itself
huonw Mar 26, 2023
4f33a5d
Add target types
huonw Apr 1, 2023
89c5b4d
mypy fixes
huonw Apr 1, 2023
25733f3
tailor semgrep_rule_source(s)
huonw Apr 1, 2023
2bb0672
register target types
huonw Apr 2, 2023
323f693
implement dependency inference: NB. full dependency
huonw Apr 2, 2023
d08611f
Run Semgrep using FieldSet/targets, partitioning by config
huonw Apr 2, 2023
798ad6c
avoid generating partitions with no configs
huonw Apr 2, 2023
a381711
ignore files, tweaks
huonw Apr 2, 2023
987b484
allow ignore files to work by not passing all args
huonw Apr 2, 2023
613fc83
remove now-unused glob files
huonw Apr 2, 2023
d0f5d4e
Add --force option
huonw Apr 14, 2023
b86ba09
register tailor rules
huonw Apr 14, 2023
3efeac6
Start sketching tests
huonw Apr 14, 2023
a405f7b
minor clean-up
huonw Apr 14, 2023
1fad45e
basic test passing
huonw Apr 19, 2023
f71db25
a bunch more tests, xfail --force test
huonw Apr 20, 2023
42f9683
Note --semgrep-force test behaviour
huonw Apr 21, 2023
ec9c1fd
Minor fixes
huonw Apr 21, 2023
ec79db0
Test semgrep PEX exclusion
huonw Apr 21, 2023
d7ff26e
default to --quiet
huonw Apr 21, 2023
b390be2
replace explicit dependency inference with implicit
huonw Apr 21, 2023
00d4a97
Merge remote-tracking branch 'upstream/main' into feature/semgrep
huonw Apr 22, 2023
a208b90
New lockfile, update version
huonw Apr 22, 2023
e2ca370
black, types, timeout
huonw Apr 22, 2023
a198538
doc tweaks, years
huonw Apr 22, 2023
af1101f
Merge remote-tracking branch 'upstream/main' into feature/semgrep
huonw Apr 22, 2023
71091e6
Merge remote-tracking branch 'upstream/main' into feature/semgrep
huonw Apr 29, 2023
e2902ac
Rename
huonw Apr 29, 2023
cee2487
Outdated features
huonw Apr 29, 2023
05c49c4
Update to semgrep 1.20
huonw Apr 29, 2023
a3c7e07
loop to method
huonw Apr 29, 2023
5f7fdaa
Merge remote-tracking branch 'upstream/main' into feature/semgrep
huonw May 5, 2023
195d090
pathlib for ancestor_targets
huonw May 5, 2023
9c68431
Remove unnecessary lockfile juggling
huonw May 5, 2023
ace84a6
Remove fancy ignore file handling for now
huonw May 5, 2023
409e73c
Merge remote-tracking branch 'upstream/main' into feature/semgrep
huonw Aug 27, 2023
25c4f48
Tweaks for upstream changes
huonw Aug 28, 2023
385666a
Rewrite to just use pathglobs, no new targets
huonw Aug 28, 2023
5db7e97
Remove now-dead code
huonw Aug 28, 2023
16f22c4
Test some pure code
huonw Aug 28, 2023
ed0687a
Tweak integration tests
huonw Aug 28, 2023
4593fb2
Restore accidental deletion
huonw Aug 28, 2023
db6bc2e
Update semgrep lockfile
huonw Aug 28, 2023
7e7135f
Tweak subsystem
huonw Aug 28, 2023
Run Semgrep using FieldSet/targets, partitioning by config
huonw committed Apr 2, 2023

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
commit d08611f6deb7c26fa9c3435d5582ad15b6eeef3e
85 changes: 39 additions & 46 deletions src/python/pants/backend/tools/semgrep/rules.py
@@ -2,74 +2,67 @@
 # Licensed under the Apache License, Version 2.0 (see LICENSE).
 from __future__ import annotations

+from collections import defaultdict
 from dataclasses import dataclass
-from typing import Any, Iterable
+from typing import Iterable

 from pants.backend.python.util_rules.pex import PexRequest, VenvPex, VenvPexProcess
-from pants.core.goals.lint import LintFilesRequest, LintResult
-from pants.core.util_rules.partitions import PartitionerType, Partitions
-from pants.engine.fs import (
-    CreateDigest,
-    Digest,
-    FileContent,
-    GlobMatchErrorBehavior,
-    MergeDigests,
-    PathGlobs,
-)
-from pants.engine.internals.native_engine import FilespecMatcher, Snapshot
+from pants.core.goals.lint import LintResult, LintTargetsRequest
+from pants.core.util_rules.partitions import Partition, Partitions
+from pants.core.util_rules.source_files import SourceFiles, SourceFilesRequest
+from pants.engine.fs import CreateDigest, Digest, FileContent, MergeDigests
 from pants.engine.process import FallibleProcessResult
 from pants.engine.rules import Get, MultiGet, Rule, collect_rules, rule
+from pants.engine.target import DependenciesRequest, Targets
 from pants.engine.unions import UnionRule
 from pants.option.global_options import GlobalOptions
 from pants.util.logging import LogLevel
 from pants.util.strutil import pluralize

-from .subsystem import Semgrep
+from .subsystem import Semgrep, SemgrepFieldSet
+from .target_types import SemgrepRuleSourceField


-class SemgrepRequest(LintFilesRequest):
+class SemgrepRequest(LintTargetsRequest):
+    field_set_type = SemgrepFieldSet
     tool_subsystem = Semgrep

-    partitioner_type = PartitionerType.CUSTOM


 @dataclass(frozen=True)
-class SemgrepConfigFilesRequest:
-    pass
-
+class PartitionMetadata:
+    config_files: frozenset[SemgrepRuleSourceField]

-@dataclass(frozen=True)
-class SemgrepConfigFiles:
-    snapshot: Snapshot
+    @property
+    def description(self) -> str:
+        return ", ".join(sorted(field.value for field in self.config_files))


 @rule
-async def gather_config_files(
-    request: SemgrepConfigFilesRequest, semgrep: Semgrep
-) -> SemgrepConfigFiles:
-    globs = [f"**/{glob}" for glob in semgrep.config_globs]
-    config_files_snapshot = await Get(
-        Snapshot,
-        PathGlobs(
-            globs=globs,
-            glob_match_error_behavior=GlobMatchErrorBehavior.error,
-            description_of_origin="the option `--semgrep-config-globs`",
-        ),
+async def partition(
+    request: SemgrepRequest.PartitionRequest[SemgrepFieldSet], semgrep: Semgrep
+) -> Partitions:
+    if semgrep.skip:
+        return Partitions()
+
+    dependencies = await MultiGet(
+        Get(Targets, DependenciesRequest(field_set.dependencies))
+        for field_set in request.field_sets
     )
-    return SemgrepConfigFiles(snapshot=config_files_snapshot)

+    by_config = defaultdict(list)

-@rule
-async def partition(request: SemgrepRequest.PartitionRequest, semgrep: Semgrep) -> Partitions:
-    if semgrep.skip:
-        return Partitions()
+    for field_set, deps in zip(request.field_sets, dependencies):
+        semgrep_configs = frozenset(
+            d[SemgrepRuleSourceField] for d in deps if d.has_field(SemgrepRuleSourceField)
+        )

-    matching_files = FilespecMatcher(
-        includes=semgrep.file_glob_include, excludes=semgrep.file_glob_exclude
-    ).matches(request.files)
+        by_config[semgrep_configs].append(field_set)

-    # TODO: partition by config
-    return Partitions.single_partition(matching_files)
+    return Partitions(
+        Partition(tuple(field_sets), PartitionMetadata(configs))
+        for configs, field_sets in by_config.items()
+    )


 # We have a hard-coded settings file to side-step
@@ -82,19 +75,19 @@ async def partition(request: SemgrepRequest.PartitionRequest, semgrep: Semgrep)

 @rule(desc="Lint with Semgrep", level=LogLevel.DEBUG)
 async def lint(
-    request: SemgrepRequest.Batch[str, Any],
+    request: SemgrepRequest.Batch[SemgrepFieldSet, PartitionMetadata],
     semgrep: Semgrep,
     global_options: GlobalOptions,
Member

@kaos kaos Apr 25, 2023

Side observation (not suggesting anything actionable in this PR):

I think it is unfortunate to depend on the entire GlobalOptions when it is only a single bool value you care about.

I see this is the case in quite a few other places as well, which makes me think we might want to consider having a generic IsColorEnabled API type along with a rule that extracts the GlobalOptions.color value.

Member

I've been meaning to generate a rule per option from subsystems...

Let me file a ticket...
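
As an illustration of the idea floated in this thread, here is a minimal plain-Python sketch of a narrow `IsColorEnabled` value derived from a larger options object. It is not Pants code: `AllOptions`, `extract_color`, and `render` are hypothetical names, and the real mechanism would presumably be an engine rule rather than a plain function.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AllOptions:
    """Hypothetical stand-in for a large options object such as GlobalOptions."""

    color: bool
    level: str


@dataclass(frozen=True)
class IsColorEnabled:
    """Narrow value type carrying only the single setting a consumer needs."""

    value: bool


def extract_color(options: AllOptions) -> IsColorEnabled:
    # Assumption: in Pants this would be a rule; a plain function stands in here.
    return IsColorEnabled(options.color)


def render(message: str, color: IsColorEnabled) -> str:
    # The consumer sees only IsColorEnabled, never the full options object.
    return f"\x1b[31m{message}\x1b[0m" if color.value else message


info_run = AllOptions(color=True, level="info")
debug_run = AllOptions(color=True, level="debug")
```

Both `info_run` and `debug_run` yield equal `IsColorEnabled` values even though the options objects differ, so a consumer keyed on the narrow type would not see unrelated option changes.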

 ) -> LintResult:
     config_files, semgrep_pex, input_files, settings = await MultiGet(
-        Get(SemgrepConfigFiles, SemgrepConfigFilesRequest()),
+        Get(SourceFiles, SourceFilesRequest(request.partition_metadata.config_files)),
         Get(VenvPex, PexRequest, semgrep.to_pex_request()),
-        Get(Snapshot, PathGlobs(globs=request.elements)),
+        Get(SourceFiles, SourceFilesRequest(field_set.source for field_set in request.elements)),
         Get(Digest, CreateDigest([_DEFAULT_SETTINGS])),
     )

     input_digest = await Get(
-        Digest, MergeDigests((input_files.digest, config_files.snapshot.digest, settings))
+        Digest, MergeDigests((input_files.snapshot.digest, config_files.snapshot.digest, settings))
     )

     # TODO: https://github.com/pantsbuild/pants/issues/18430 support running this with --autofix
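
The grouping performed by the new `partition` rule can be sketched in plain Python, outside the Pants engine. This is only an illustration under stated assumptions: `partition_by_config` and all file names are hypothetical, and plain strings stand in for the real field sets and `SemgrepRuleSourceField` values.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def partition_by_config(
    files_with_configs: Iterable[tuple[str, Iterable[str]]],
) -> dict[frozenset[str], tuple[str, ...]]:
    """Group inputs that share exactly the same set of config files."""
    by_config: defaultdict[frozenset[str], list[str]] = defaultdict(list)
    for path, configs in files_with_configs:
        # A frozenset is hashable and ignores ordering, so it works as a dict key.
        by_config[frozenset(configs)].append(path)
    return {configs: tuple(paths) for configs, paths in by_config.items()}


# Hypothetical inputs: each file paired with the rule files that apply to it.
partitions = partition_by_config(
    [
        ("src/a.py", ["rules/python.yml"]),
        ("src/b.py", ["rules/python.yml"]),
        ("src/c.js", ["rules/js.yml", "rules/common.yml"]),
        ("src/d.js", ["rules/common.yml", "rules/js.yml"]),
    ]
)
```

Here `src/c.js` and `src/d.js` land in the same partition despite listing their configs in a different order. An input with no configs would fall into a partition keyed by the empty set, which is the case the later commit "avoid generating partitions with no configs" deals with.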
14 changes: 14 additions & 0 deletions src/python/pants/backend/tools/semgrep/subsystem.py
@@ -3,6 +3,7 @@

 from __future__ import annotations

+from dataclasses import dataclass
 from typing import Iterable

 from pants.backend.python.goals import lockfile
@@ -13,11 +14,24 @@
 from pants.backend.python.util_rules.pex_requirements import GeneratePythonToolLockfileSentinel
 from pants.core.goals.generate_lockfiles import GenerateToolLockfileSentinel
 from pants.engine.rules import Rule, collect_rules, rule
+from pants.engine.target import Dependencies, FieldSet, SingleSourceField, Target
 from pants.engine.unions import UnionRule
 from pants.option.option_types import ArgsListOption, BoolOption, SkipOption, StrListOption
 from pants.util.docutil import git_url


+@dataclass(frozen=True)
+class SemgrepFieldSet(FieldSet):
+    required_fields = (SingleSourceField, Dependencies)
Member

Oh this is interesting. I have long-standing thoughts on how a hybrid plugin would work with targets.

Anything with a source seems like a good way of doing that

🙌

Contributor Author

Cool 👍

(Coincidentally, your comment makes me realise that the Dependencies field isn't used any more, so I should get rid of it. Note to self.)

Member

Hmm I feel like there's an example of targets that operate on a level that isn't single-file. Maybe this should be any source field?

You can still run it on individual files, but you'll match more targets.

Member

Were you still getting rid of the dependencies field?
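
To make the trade-off in this thread concrete, here is a toy model of how `required_fields` decides which targets a field set applies to. The class names mirror Pants's, but these are simplified stand-ins written for this sketch, not the real Pants classes; `is_applicable` and the three example targets are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


class Field: ...
class SourcesField(Field): ...
class SingleSourceField(SourcesField): ...
class MultipleSourcesField(SourcesField): ...
class Dependencies(Field): ...


@dataclass(frozen=True)
class Target:
    name: str
    fields: tuple[type[Field], ...]

    def has_field(self, field: type[Field]) -> bool:
        # A target "has" a field if it carries that field type or a subclass of it.
        return any(issubclass(f, field) for f in self.fields)


def is_applicable(required_fields: tuple[type[Field], ...], target: Target) -> bool:
    """A field set applies only when every required field is present."""
    return all(target.has_field(f) for f in required_fields)


single = Target("single_file", (SingleSourceField, Dependencies))
multi = Target("multi_file", (MultipleSourcesField, Dependencies))
no_deps = Target("no_deps", (SingleSourceField,))

# Requirement as written in this commit versus the broader one suggested above.
narrow = (SingleSourceField, Dependencies)
broad = (SourcesField,)
```

Under this model, `narrow` matches only `single`, while `broad` matches all three targets, which is the "you'll match more targets" effect described in the comment. Dropping `Dependencies` from the requirement also brings `no_deps` into scope.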

+    source: SingleSourceField
+    dependencies: Dependencies
+
+    @classmethod
+    def opt_out(cls, tgt: Target) -> bool:
+        # FIXME: global skip_semgrep field?
+        return False
+
+
 class Semgrep(PythonToolBase):
     name = "Semgrep"
     options_scope = "semgrep"