Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up function naming, documentation, and to some degree code without changing behaviour #8

Merged

Conversation

smowton
Copy link

@smowton smowton commented May 9, 2022

Changes:

  • Renamed predicates, arguments to try to be more explicit about their meaning
  • Removed predicate results or parameters that are never used
  • Improved documentation
  • Added TODOs and FIXMEs where there's reasonably clear trouble
  • Added binding-order directives where there are small integer argument indices and similar tempting the join orderer to do something silly (though these also use localFlow etc so probably warrant further join-order engineering)
  • Unpicked some redundant logic in isUnsafeUseUnconstrainedByIfCheck, but this function still needs significant work due to perhaps-accidental use of global and local flow tests against the e of e.mkdir() and the e of mkdirWrapper(e) respectively
  • Removed the non-functional BarrierGuard DirectoryCreationBarrierGuard (this never did anything due to not implementing checks as expected (this should indicate what expressions are cleaned when this test passes/fails)). Removing it didn't alter the test results.

Copy link
Owner

@JLLeitschuh JLLeitschuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, these changes make sense. I appreciate the addition of a ton of comments and better naming. I always struggle with naming things.

isNonThrowingDirectoryCreationExpression(any(Expr e2 | DataFlow::localExprFlow(arg, e2)),
creationCall)
exists(Method m, int argumentIndex, Expr fileInstanceParam |
DataFlow::localExprFlow(m.getParameter(pragma[only_bind_into](argumentIndex)).getAnAccess(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pragma[only_bind_into](argumentIndex) What does this do, and why use it in this case?

Copy link
Author

@smowton smowton May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://codeql.github.com/docs/ql-language-reference/annotations/#pragma-only-bind-into

Basically I am trying to discourage a join on just argumentIndex, because this is an integer column with small cardinality which invites a very high cost join.

To give a simplified example, consider argument <-> parameter matching:

predicate argParamPair(Parameter p, Expr arg) {
  exists(MethodAccess ma, Method m, int i | m = ma.getMethod() and arg = ma.getArgument(i) and p = m.getParameter(i))
}

A bad solution ordering might look like:

Scan `MethodAccess.getArgument`, result `[methodAccess, i, argExpr]`
Join against `Method.getParameter` on matching `i`, result `[method, methodAccess, i, argExpr, parameter]`
Join against `MethodAccess.getMethod` on matching `method, methodAccess`, result `[argExpr, parameter]`

This is a bad choice because there are few values of i: in the worst possible case all our methods and therefore all our method accesses have a single parameter/argument and "join on matching i" is a no-op, equivalent to a cartesian product of MethodAccess.getArgument and Method.getParmeter.

Generally CodeQL will not try to use a column as a join key until it has been bound (constrained) -- for primitive types this avoids a potentially infinite set of values, and for others it is simply good practice to ensure we don't join on a finite but large column and produce a very large intermediate result. By wrapping uses of i in pragma[only_bind_into] I tell CodeQL that evaluating either getArgument or getParameter, which uses i, shouldn't be regarded as usefully constraining the other use of i.

Precisely what it will do with that information I don't know exactly, but I'm hoping for an ordering like

Scan `MethodAccess.getArgument`, result `[methodAccess, i, argExpr]`
Join against `MethodAccess.getMethod` on matching `methodAccess`, result `[method, i, argExpr]`
Join against `Method.getParameter` on matching `method, i`, result `[argExpr, parameter]`

Crucially i was matched at the same time as another higher-cardinality column method, avoiding a large intermediate result.

@JLLeitschuh JLLeitschuh merged commit b412c7f into JLLeitschuh:feat/JLL/java/CWE-378 Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants