The Rectangle Rule
Imagine we’re writing a mechanical formatter for Java, like google-java-format. We’d like the formatted output to follow a few strict rules (e.g., no lines longer than some limit, with specified exceptions) but otherwise we’d just like it to be as “readable” as possible. A poorly formatted statement, like:
currentEstimate = (currentEstimate + x
    / currentEstimate) / 2.0f;
(imagining it won’t all fit on one line) might not be as "readable" as, say, the better formatted:
currentEstimate =
    (currentEstimate + x / currentEstimate)
        / 2.0f;
Formatting for this sort of abstract readability is not easily mechanized; it’s loose, not strict. Without more precise "big rules" for readability, we can end up with lots of "little rules"—don’t ever break a line here; don’t break there unless you absolutely have to—and all these little rules can easily interact in complex and confusing ways.
Alternatively, if we can find a few big rules that produce readability, or that at the very least promote readability, we can simplify writing a formatter, by using the big rules to reduce the number of special little rules that we must add.
Here’s the Rectangle Rule, one such big rule for promoting readability:
When a source file is formatted, each subtree gets its own bounding rectangle, containing all of that subtree’s text and none of any other subtree’s.
What does this mean? Take the well formatted example above, and draw a rectangle around just the subexpression x / currentEstimate:
currentEstimate =
    (currentEstimate + x / currentEstimate)
        / 2.0f;
This is possible—good! But in the badly formatted example, there is no rectangle containing just that subexpression and nothing more—bad!
currentEstimate = (currentEstimate + x
    / currentEstimate) / 2.0f;
In the well formatted example, every subtree has its own rectangle; for instance, the right-hand side ("RHS") of the assignment has its own rectangle in the well formatted example, but not in the other. This promotes readability by exposing program structure in the physical layout; the RHS is in just one place, not partly in one place and partly another.
There are some complexities and exceptions. We must ignore some “random” punctuation like ;. Perhaps we should just reduce overlap, not eliminate it altogether. Even so, the Rectangle Rule is a simple big rule for promoting readability by exposing code structure. It forces a number of formatting choices, simplifying the formatter’s job in deciding the remainder.
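To make the rule concrete, here is a minimal sketch of a checker for it. It is not how google-java-format works internally; the class name RectangleRuleCheck, the method hasOwnRectangle, and the choice to identify a subtree by character offsets into the source text are all assumptions made for illustration. It computes the bounding rectangle of a subtree's text and reports whether any other non-whitespace text falls inside, ignoring ; as described above.

```java
// Hypothetical helper, not part of google-java-format.
final class RectangleRuleCheck {
    // Punctuation ignored when testing for overlap (assumption: only ';').
    private static final String IGNORED = ";";

    /**
     * Returns true if the text in source[start, end) has a bounding rectangle
     * that contains no non-whitespace, non-ignored character from outside it.
     */
    static boolean hasOwnRectangle(String source, int start, int end) {
        int minRow = Integer.MAX_VALUE, maxRow = -1;
        int minCol = Integer.MAX_VALUE, maxCol = -1;

        // Pass 1: bounding rectangle of the subtree's non-whitespace characters.
        int row = 0, col = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n') {
                row++;
                col = 0;
                continue;
            }
            if (i >= start && i < end && !Character.isWhitespace(c)) {
                minRow = Math.min(minRow, row);
                maxRow = Math.max(maxRow, row);
                minCol = Math.min(minCol, col);
                maxCol = Math.max(maxCol, col);
            }
            col++;
        }

        // Pass 2: look for foreign text inside that rectangle.
        row = 0;
        col = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n') {
                row++;
                col = 0;
                continue;
            }
            boolean outsideSubtree = i < start || i >= end;
            boolean insideRect =
                row >= minRow && row <= maxRow && col >= minCol && col <= maxCol;
            if (outsideSubtree && insideRect
                    && !Character.isWhitespace(c) && IGNORED.indexOf(c) < 0) {
                return false;
            }
            col++;
        }
        return true;
    }

    public static void main(String[] args) {
        String good =
            "currentEstimate =\n"
                + "    (currentEstimate + x / currentEstimate)\n"
                + "        / 2.0f;";
        String bad =
            "currentEstimate = (currentEstimate + x\n"
                + "    / currentEstimate) / 2.0f;";

        // Subexpression x / currentEstimate in each layout.
        int g = good.indexOf("x / currentEstimate");
        System.out.println(
            hasOwnRectangle(good, g, g + "x / currentEstimate".length())); // true
        System.out.println(
            hasOwnRectangle(bad, bad.indexOf("x"), bad.indexOf(")"))); // false

        // Right-hand side of the assignment, from "(" up to (not including) ";".
        System.out.println(
            hasOwnRectangle(good, good.indexOf("("), good.indexOf(";"))); // true
        System.out.println(
            hasOwnRectangle(bad, bad.indexOf("("), bad.indexOf(";"))); // false
    }
}
```

Without the IGNORED set, the right-hand side of the well formatted statement would fail the check, because the trailing ; sits inside its rectangle. A real formatter would work on syntax-tree nodes rather than raw offsets, and would avoid such layouts up front rather than checking for them afterwards.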
Here’s a more complicated sequence of statements:
PCollection<List<Integer>> recomputedMean =
    p.apply(Create.of(Arrays.asList(assigned)).withCoder(KvCoder.of(
        ListCoder.of(BigEndianIntegerCoder.of()), ListCoder.of(BigEndianIntegerCoder.of()))))
    .apply(
        Combine.<List<Integer>, List<Integer>, List<Integer>>perKey(
            new RecomputeMeanCombineFn()))
    .apply(Values.<List<Integer>>create());
DirectPipelineRunner.EvaluationResults results = p.run();
Assert.assertThat(results.getPCollection(recomputedMean),
    containsInAnyOrder(Arrays.asList(20, 2), Arrays.asList(15, 55)));
Here are the same statements, reformatted to follow the Rectangle Rule:
PCollection<List<Integer>> recomputedMean =
    p.apply(
            Create.of(Arrays.asList(assigned))
                .withCoder(
                    KvCoder.of(
                        ListCoder.of(BigEndianIntegerCoder.of()),
                        ListCoder.of(BigEndianIntegerCoder.of()))))
        .apply(
            Combine.<List<Integer>, List<Integer>, List<Integer>>perKey(
                new RecomputeMeanCombineFn()))
        .apply(Values.<List<Integer>>create());
DirectPipelineRunner.EvaluationResults results = p.run();
Assert.assertThat(
    results.getPCollection(recomputedMean),
    containsInAnyOrder(Arrays.asList(20, 2), Arrays.asList(15, 55)));
Each method call has its own rectangle. (As before, we ignore random punctuation like ) and {.) The two arguments to Assert.assertThat even have a rectangle together, as well as separately, further exposing program structure.
This is not the only way to format according to the Rectangle Rule, but it is the layout that google-java-format produces. Here, the reformatted statements are radically unfolded from the original.
Here’s another bit of code:
mCropView.postDelayed(new Runnable() {
    @Override
    public void run() {
        if(!visible) {
            changeWallpaperFlags(visible);
        } else {
            mCropView.setVisibility(View.INVISIBLE);
        }
    }
}, FLAG_POST_DELAY_MILLIS);
Here’s the same code, reformatted to follow the Rectangle Rule:
mCropView.postDelayed(
    new Runnable() {
      @Override
      public void run() {
        if (!visible) {
          changeWallpaperFlags(visible);
        } else {
          mCropView.setVisibility(View.INVISIBLE);
        }
      }
    },
    FLAG_POST_DELAY_MILLIS);
The anonymous inner class is in one place and nothing overlaps it visually, again helping to expose the program structure.
So far we’ve shown the Rectangle Rule at work with expressions, where the idea of a subtree is straightforward. It can also be useful in more complex situations, such as declarations. Consider this code:
public static <W extends BoundedWindow> StateTag<Object, WatermarkHoldState<W>>
    watermarkStateInternal(String id, OutputTimeFn<? super W> outputTimeFn) {
  return new WatermarkStateTagInternal<W>(new StructuredId(id), outputTimeFn);
}
Perhaps we find it distracting that the method’s return type (StateTag<Object, WatermarkHoldState<W>>) and name (watermarkStateInternal) are so far apart. If we arbitrarily define them together to be treated as a subtree of the declaration, we force a different formatting, perhaps google-java-format's:
public static <W extends BoundedWindow>
    StateTag<Object, WatermarkHoldState<W>> watermarkStateInternal(
        String id, OutputTimeFn<? super W> outputTimeFn) {
  return new WatermarkStateTagInternal<W>(new StructuredId(id), outputTimeFn);
}
The real world is complicated and rules sometimes need exceptions. (The highest-level rule is probably Don’t needlessly confuse or annoy users.)
The Rectangle Rule is not strictly applied to right parens ). Trailing right parens are always rendered with no preceding whitespace and this may cause the right edge of inner bounding rectangles to be poorly defined:
outerMethod(
    methodWithExcessivelyLongName(
        deeplyNestedArgument));
In the above example, there is no proper rectangle which exactly encloses the argument of outerMethod(...). The right edge cannot both include Name( and exclude );. This departure from a pure interpretation of the Rectangle Rule is similar to the treatment of semicolons and follows the more typical convention of never breaking before ).
Consider the expression in this statement:
return annotationStrategy.equals(other.annotationStrategy)
    && typeLiteral.equals(other.typeLiteral);
Oops! To make it strictly follow the Rectangle Rule, we’d have to reformat it to break before the expression being returned:
return
    annotationStrategy.equals(other.annotationStrategy)
        && typeLiteral.equals(other.typeLiteral);
We've seen almost no existing Java code that breaks after the return, suggesting we make an exception here. We can rationalize it (and others like it) by saying that not much of the enclosing subtree overlaps. If we change the formatter’s indentation rules to follow the Rectangle Rule more closely, we risk surprising or annoying a lot of people.
Consider this statement:
int fifteen =
    0 + 1 + 2 + 3
        + 4 + 5;
Since addition in Java is left-associative, 0 + 1 + 2 + 3 + 4 is a subtree, and yet it doesn’t have its own rectangle here. We must redefine the shape of the tree to avoid surprising users with unexpected layouts like:
int fifteen =
    0 + 1 + 2 + 3
            + 4
        + 5;
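One way to read "redefine the shape of the tree" is to treat a chain of the same operator as a single node with many operands, so that no intermediate sum needs a rectangle of its own. The sketch below illustrates that idea only; the Expr, Literal, and Plus classes and the operands method are hypothetical stand-ins, not google-java-format's actual types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature AST, for illustration only.
final class FlattenSums {
    interface Expr {}

    static final class Literal implements Expr {
        final int value;

        Literal(int value) {
            this.value = value;
        }
    }

    static final class Plus implements Expr {
        final Expr left;
        final Expr right;

        Plus(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Returns the operands of a left-nested chain of Plus nodes, in source order. */
    static List<Expr> operands(Expr e) {
        List<Expr> out = new ArrayList<>();
        collect(e, out);
        return out;
    }

    private static void collect(Expr e, List<Expr> out) {
        if (e instanceof Plus) {
            Plus p = (Plus) e;
            // Follow only the left spine: a Plus on the right can only come
            // from explicit parentheses, so it stays a single operand.
            collect(p.left, out);
            out.add(p.right);
        } else {
            out.add(e);
        }
    }

    public static void main(String[] args) {
        // Build ((((0 + 1) + 2) + 3) + 4) + 5, as a left-associative parse would.
        Expr sum = new Literal(0);
        for (int i = 1; i <= 5; i++) {
            sum = new Plus(sum, new Literal(i));
        }
        System.out.println(operands(sum).size()); // 6
    }
}
```

With the six operands as siblings, a formatter is free to fill lines with them, and the partial sum 0 + 1 + 2 + 3 + 4 is no longer a subtree that the rule applies to.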
google-java-format implements a number of exceptions to the Rectangle Rule, but it seems certain that even more might be worthwhile. For example, it currently generates the somewhat annoying formatting:
method1(
    method2(
        method3(
            method4(
                method5(
                    "Long, long expression"
                        + "that won't fit on one line.")))));
which might (or might not) be improved by violating the Rectangle Rule:
method1(method2(method3(method4(method5(
    "Long, long expression"
        + "that won't fit on one line.")))));
Creating new exceptions, and doing so precisely, is an ongoing challenge.
Finally, google-java-format is limited because it is not a compiler. It can make formatting choices based only on syntax (and initial layout), not on semantics. For example, it might make sense to lay out fluent chains of method calls differently from other chains, but google-java-format cannot (for example) look at the methods’ type signatures to determine which are which.
The Rectangle Rule is a big rule that helps to promote readability. Many other possible rules promote readability too, but the Rectangle Rule is simple and broad in its implications.
Because the Rectangle Rule limits how code can be folded together, it forces more white space into the formatted output, increasing the number of lines required for some code.
The Rectangle Rule is compatible with existing Java Style Guide rules, such as the indentation rules. It is largely compatible with existing practice, although there are exceptions like return statements, and although much existing code is heavily folded to reduce the number of lines required.
While the Rectangle Rule is shown here in use with Java, experience shows that it is also usable with other programming languages.
There is a rich history of rules and algorithms for the formatting of programs or other structured text, also called “pretty-printing” or "grinding" (Goldstein, Moon). google-java-format implements a variant of the linear-time Oppen algorithm invented by Derek Oppen, Greg Nelson, and Eric Roberts at Harvard University in the 1970s; this algorithm has inspired a fascinating series of interesting variants (Wadler, Swierstra & Chitil). The Oppen algorithm makes it easy to implement the Rectangle Rule (and its exceptions), although it does not mandate it.