Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

clang-tidy: added the ability to shard jobs #37265

Merged
merged 17 commits into from
Nov 7, 2022
136 changes: 115 additions & 21 deletions tools/clang_tidy/lib/clang_tidy.dart
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,17 @@ class _ComputeJobsResult {
final bool sawMalformed;
}

enum _SetStatus {
Intersection,
Difference,
}

class _SetStatusCommand {
_SetStatusCommand(this.setStatus, this.command);
final _SetStatus setStatus;
final Command command;
}

/// A class that runs clang-tidy on all or only the changed files in a git
/// repo.
class ClangTidy {
Expand Down Expand Up @@ -92,14 +103,14 @@ class ClangTidy {

_outSink.writeln(_linterOutputHeader);

final List<io.File> changedFiles = await computeChangedFiles();
final List<io.File> filesOfInterest = await computeFilesOfInterest();

if (options.verbose) {
_outSink.writeln('Checking lint in repo at ${options.repoPath.path}.');
if (options.checksArg.isNotEmpty) {
_outSink.writeln('Checking for specific checks: ${options.checks}.');
}
final int changedFilesCount = changedFiles.length;
final int changedFilesCount = filesOfInterest.length;
if (options.lintAll) {
_outSink.writeln('Checking all $changedFilesCount files the repo dir.');
} else {
Expand All @@ -109,12 +120,19 @@ class ClangTidy {
}
}

final List<dynamic> buildCommandsData = jsonDecode(
final List<Object?> buildCommandsData = jsonDecode(
options.buildCommandsPath.readAsStringSync(),
) as List<dynamic>;
final List<Command> changedFileBuildCommands = await getLintCommandsForChangedFiles(
) as List<Object?>;
final List<List<Object?>> shardBuildCommandsData = options
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"collection-for" is more idiomatic:

final List<List<Object?>> shardBuildCommandsData = <List<Object?>>[
  for (io.File file in options.shardCommandsPaths)
    jsonDecode(file.readAsStringSync()) as List<Object?>,
];

.shardCommandsPaths
.map((io.File file) =>
jsonDecode(file.readAsStringSync()) as List<Object?>)
.toList();
final List<Command> changedFileBuildCommands = await getLintCommandsForFiles(
buildCommandsData,
changedFiles,
filesOfInterest,
shardBuildCommandsData,
options.shardId,
);

if (changedFileBuildCommands.isEmpty) {
Expand Down Expand Up @@ -153,7 +171,7 @@ class ClangTidy {
/// The files with local modifications or all the files if `lintAll` was
/// specified.
@visibleForTesting
Future<List<io.File>> computeChangedFiles() async {
Future<List<io.File>> computeFilesOfInterest() async {
if (options.lintAll) {
return options.repoPath
.listSync(recursive: true)
Expand All @@ -171,23 +189,99 @@ class ClangTidy {
return repo.changedFiles;
}

/// Given a build commands json file, and the files with local changes,
/// compute the lint commands to run.
/// Returns f(n) = value(n * [shardCount] + [id]).
Iterable<T> _takeShard<T>(Iterable<T> values, int id, int shardCount) sync* {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code from here down that is reasoning about sets of commands could be made a bit more idiomatic, and maybe more readable as well, by encapsulating it in a new class (Commands, CommandSet, or CommandShard, or something) that can live in lib/src/command.dart.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Respectfully, I don't think that will help make the code easier to read or maintain. The small codebase already has poor oop design already and it actually made my job harder. In objective terms adding more classes will result in more lines of code, more segmentation of the code, and more concepts you need to know to modify the code.

As an example, there is no reason for CommandSet to exist when Set<Command> (or List<Command>) already encapsulates the idea perfectly. People can come into the codebase and know what that is immediately, not so with CommandSet. I also can't understand from what the difference between a CommandSet and a CommandShard is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, there is no reason for CommandSet to exist when Set<Command> (or List<Command>) already encapsulates the idea perfectly.

The code in getLintCommandsForFiles is not operating on the Set, List, etc. in its full generality. Instead it's doing something very specific: trying to filter unwanted items out of a big set. The specific internal data structures that are used to accomplish that goal are an implementation detail. This code would be easier to read and maintain if the the implementation details were encapsulated somehow. It doesn't necessarily have to be a class. It could be more helper methods, for example. I don't have a strong opinion there. However, I think even with comments, the code in getLintCommandsForFiles could be improved on, and that is overall what I'm asking for.

I also can't understand from what the difference between a CommandSet and a CommandShard is.

These were just naming suggestions for a single new class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think the documentation could be better and will help with that. I've updated the docstring in getLintCommandsForFiles and added one for _takeShard.

int count = 0;
for (final T val in values) {
if (count % shardCount == id) {
yield val;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of the sync* function:

return Iterable.generate(
  values.length / shardCount,
  (int index) => values.elementAt(index * shardCount + id),
);

but without the likely off-by-one errors.

}
count++;
}
}

/// This returns a `_SetStatusCommand` for each [Command] in [items].
/// `Intersection` if the Command shows up in [items] and its filePath in all
/// [filePathSets], otherwise `Difference`.
Iterable<_SetStatusCommand> _calcIntersection(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following is more idiomatic. You can use Iterable.every in allSetsContain, and you can use a "collection-for" instead of a sync*:

Iterable<_SetStatusCommand> _calcIntersection(
  Iterable<Commands> items,
  Iterable<Set> filePathSets,
) {
  bool allSetsContain(Command command) {
    return filePathSets.every(
      (Set<String> filePathSet) => filePathSet.contains(command.filePath),
    );
  }
  return <_SetStatusCommand>[
    for (final Command command in items)
      if (allSetsContain(command))
        _SetStatusCommand(_SetStatus.Intersection, command)
      else
        _SetStatusCommand(_SetStatus.Difference, command),
  ];
}

Iterable<Command> items, Iterable<Set<String>> filePathSets) sync* {
bool allSetsContain(Command command) {
for (final Set<String> filePathSet in filePathSets) {
if (!filePathSet.contains(command.filePath)) {
return false;
}
}
return true;
}
for (final Command command in items) {
if (allSetsContain(command)) {
yield _SetStatusCommand(_SetStatus.Intersection, command);
} else {
yield _SetStatusCommand(_SetStatus.Difference, command);
}
}
}

/// Given a build commands json file's contents in [buildCommandsData], and
/// the [files] with local changes, compute the lint commands to run. If
/// build commands are supplied in [sharedBuildCommandsData] the intersection
/// of those build commands will be calculated and distributed across
/// instances via the [shardId].
@visibleForTesting
Future<List<Command>> getLintCommandsForChangedFiles(
List<dynamic> buildCommandsData,
List<io.File> changedFiles,
) async {
final List<Command> buildCommands = <Command>[];
for (final dynamic data in buildCommandsData) {
final Command command = Command.fromMap(data as Map<String, dynamic>);
final LintAction lintAction = await command.lintAction;
// Short-circuit the expensive containsAny call for the many third_party files.
if (lintAction != LintAction.skipThirdParty && command.containsAny(changedFiles)) {
buildCommands.add(command);
Future<List<Command>> getLintCommandsForFiles(
List<Object?> buildCommandsData,
List<io.File> files,
List<List<Object?>> sharedBuildCommandsData,
int? shardId,
) {
final List<Command> totalCommands = <Command>[];
if (sharedBuildCommandsData.isNotEmpty) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-writing with collection-for I think is an improvement.

if (sharedBuildCommandsData.isNotEmpty) {
  final Iterable<Command> buildCommands = <Command>[
    for (Object? data in buildCommandsData)
      Command.fromMap(data as Map<String, Object?>),
  ];
  final Iterable<Set<String>> shardFilePaths = <Set<String>>[
    for (List<Object?> list in  sharedBuildCommandsData) {
      for (Object? data in list)
        Command.fromMap((data as Map<String, Object?>?)!).filePath,
    },
  ];
  final Iterable<_SetStatusCommand> intersectionResults = _calcIntersection(
    buildCommands, shardFilePaths,
  );
  totalCommands.addAll(<Command>[
    for (_SetStatusCommand element in intersectionResults)
      if (element.setStatus == _SetStatus.Difference)
        element.command,
  ]);
  final List<Command> intersection = <Command>[
    for (_SetStatusCommand element in intersectionResults)
      if (element.setStatus == _SetStatus.Intersection)
        element.command,
  ];
  totalCommands.addAll(_takeShard(
    intersection,
    shardId!,
    1 + sharedBuildCommandsData.length,
  ));
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

final List<Command> buildCommands = <Command>[
for (Object? data in buildCommandsData)
Command.fromMap((data as Map<String, Object?>?)!)
];
final List<Set<String>> shardFilePaths = <Set<String>>[
for (List<Object?> list in sharedBuildCommandsData)
<String>{
for (Object? data in list)
Command.fromMap((data as Map<String, Object?>?)!).filePath
}
];
final Iterable<_SetStatusCommand> intersectionResults =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method could use some comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

_calcIntersection(buildCommands, shardFilePaths);
for (final _SetStatusCommand result in intersectionResults) {
if (result.setStatus == _SetStatus.Difference) {
totalCommands.add(result.command);
}
}
final List<Command> intersection = <Command>[
for (_SetStatusCommand result in intersectionResults)
if (result.setStatus == _SetStatus.Intersection) result.command
];
// Make sure to sort results so the sharding scheme is guaranteed to work
// since we are not sure if there is a defined order in the json file.
intersection
.sort((Command x, Command y) => x.filePath.compareTo(y.filePath));
totalCommands.addAll(
_takeShard(intersection, shardId!, 1 + sharedBuildCommandsData.length));
} else {
totalCommands.addAll(<Command>[
for (Object? data in buildCommandsData)
Command.fromMap((data as Map<String, Object?>?)!)
]);
}
return buildCommands;
return () async {
final List<Command> result = <Command>[];
for (final Command command in totalCommands) {
final LintAction lintAction = await command.lintAction;
// Short-circuit the expensive containsAny call for the many third_party files.
if (lintAction != LintAction.skipThirdParty &&
command.containsAny(files)) {
result.add(command);
}
}
return result;
}();
}

Future<_ComputeJobsResult> _computeJobs(
Expand Down
56 changes: 50 additions & 6 deletions tools/clang_tidy/lib/src/options.dart
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ class Options {
this.fix = false,
this.errorMessage,
this.warningsAsErrors,
this.shardId,
this.shardCommandsPaths = const <io.File>[],
StringSink? errSink,
}) : checks = checksArg.isNotEmpty ? '--checks=$checksArg' : null,
_errSink = errSink ?? io.stderr;
Expand Down Expand Up @@ -64,6 +66,8 @@ class Options {
ArgResults options, {
required io.File buildCommandsPath,
StringSink? errSink,
required List<io.File> shardCommandsPaths,
int? shardId,
}) {
return Options(
help: options['help'] as bool,
Expand All @@ -76,6 +80,8 @@ class Options {
fix: options['fix'] as bool,
errSink: errSink,
warningsAsErrors: _platformSpecificWarningsAsErrors(options),
shardCommandsPaths: shardCommandsPaths,
shardId: shardId,
);
}

Expand All @@ -87,25 +93,42 @@ class Options {
final ArgResults argResults = _argParser.parse(arguments);

String? buildCommandsPath = argResults['compile-commands'] as String?;

String variantToBuildCommandsFilePath(String variant) =>
path.join(
argResults['src-dir'] as String,
'out',
variant,
'compile_commands.json',
);
// path/to/engine/src/out/variant/compile_commands.json
buildCommandsPath ??= path.join(
argResults['src-dir'] as String,
'out',
argResults['target-variant'] as String,
'compile_commands.json',
);
buildCommandsPath ??= variantToBuildCommandsFilePath(argResults['target-variant'] as String);
final io.File buildCommands = io.File(buildCommandsPath);
final List<io.File> shardCommands =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final String shardVariants = argResults['shard-variants'] as String? ?? '';
final List<io.File> shardCommands = <io.File>[
  for (String element in buildCommands.split(','))
    if (element.isNotEmpty)
      io.FIle(variantToBuildCommandsFilePath(element)),
];

(argResults['shard-variants'] as String? ?? '')
.split(',')
.where((String element) => element.isNotEmpty)
.map((String variant) =>
io.File(variantToBuildCommandsFilePath(variant)))
.toList();
final String? message = _checkArguments(argResults, buildCommands);
if (message != null) {
return Options._error(message, errSink: errSink);
}
if (argResults['help'] as bool) {
return Options._help(errSink: errSink);
}
final String? shardIdString = argResults['shard-id'] as String?;
final int? shardId = shardIdString == null ? null : int.parse(shardIdString);
if (shardId != null && (shardId >= shardCommands.length || shardId < 0)) {
return Options._error('Invalid shard-id value: $shardId.', errSink: errSink);
}
return Options._fromArgResults(
argResults,
buildCommandsPath: buildCommands,
errSink: errSink,
shardCommandsPaths: shardCommands,
shardId: shardId,
);
}

Expand All @@ -132,6 +155,17 @@ class Options {
'verbose',
help: 'Print verbose output.',
)
..addOption(
'shard-id',
help: 'When used with the shard-commands option this identifies which shard will execute.',
valueHelp: 'A number less than 1 + the number of shard-commands arguments.',
)
..addOption(
'shard-variants',
help: 'Comma separated list of other targets, this invocation '
'will only execute a subset of the intersection and the difference of the '
'compile commands. Use with `shard-id`.'
)
..addOption(
'compile-commands',
help: 'Use the given path as the source of compile_commands.json. This '
Expand Down Expand Up @@ -171,6 +205,12 @@ class Options {
/// The location of the compile_commands.json file.
final io.File buildCommandsPath;

/// The location of shard compile_commands.json files.
final List<io.File> shardCommandsPaths;

/// The identifier of the shard.
final int? shardId;

/// The root of the flutter/engine repository.
final io.Directory repoPath = io.Directory(_engineRoot);

Expand Down Expand Up @@ -233,6 +273,10 @@ class Options {
return "ERROR: Build commands path ${buildCommandsPath.absolute.path} doesn't exist.";
}

if (argResults.wasParsed('shard-variants') && !argResults.wasParsed('shard-id')) {
return 'ERROR: a `shard-id` must be specified with `shard-variants`.';
}

return null;
}
}
Loading