-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mechanism for verifying that source code in documentation is valid #7956
Changes from all commits
928c415
1b220a1
653bace
4db03cc
feedaf7
3de5a04
fb35b4a
8d77197
9594b30
ca62c08
cb55689
0137395
edd3bf6
fa4f8f4
44434d7
ec05922
c867cf9
7763328
074876a
c6a8c77
fa96ed3
73de2e2
f684f13
b148ac5
df37516
8d26611
712eb09
38f6b79
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,8 +21,14 @@ | |
set -e | ||
rm -rf build 2> /dev/null | ||
rm -rf temp 2> /dev/null | ||
|
||
# copy source to temp dir | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is also a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, I had missed that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have a convenient way to test on Windows, unfortunately |
||
mkdir temp | ||
cp -rf source/* temp/ | ||
|
||
# update markdown files with latest example source code from tests | ||
python preprocess.py | ||
|
||
# replace relative URLs with absolute URLs | ||
sed -i -e 's/\.\.\/\.\.\/\.\.\//https:\/\/github.com\/apache\/arrow-datafusion\/blob\/main\//g' temp/contributor-guide/index.md | ||
make SOURCEDIR=`pwd`/temp html |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
#!/usr/bin/python | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one or more | ||
# contributor license agreements. See the NOTICE file distributed with | ||
# this work for additional information regarding copyright ownership. | ||
# The ASF licenses this file to You under the Apache License, Version 2.0 | ||
# (the "License"); you may not use this file except in compliance with | ||
# the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import glob | ||
import os | ||
import re | ||
|
||
|
||
def read_source(test_filename, test_method): | ||
lines = [] | ||
with open(test_filename) as test: | ||
in_example_code = False | ||
for test_line in test.readlines(): | ||
if test_line.strip() == "//begin:{}".format(test_method): | ||
in_example_code = True | ||
continue | ||
if test_line.strip() == "//end:{}".format(test_method): | ||
break | ||
if in_example_code: | ||
lines.append(test_line) | ||
|
||
# remove leading indent when possible | ||
consistent_indent = True | ||
for line in lines: | ||
if len(line.strip()) > 0 and not ( | ||
line.startswith(" ") or line.startswith("\t") | ||
): | ||
consistent_indent = False | ||
break | ||
if consistent_indent: | ||
old_lines = lines | ||
lines = [] | ||
for line in old_lines: | ||
if len(line) >= 4: | ||
lines.append(line[4:]) | ||
else: | ||
lines.append(line) | ||
|
||
return lines | ||
|
||
|
||
def update_examples(source_file): | ||
print("Updating code samples in ", source_file) | ||
lines = [] | ||
# finite state machine to track state | ||
state_scan = "scan" | ||
state_before_code = "before" | ||
state_in_code = "in" | ||
# <!-- include: library_logical_plan::plan_builder_1 --> | ||
include_pattern = "<!-- include: ([a-z0-9_]*)::([a-z0-9_]*) -->" | ||
with open(source_file, "r") as input: | ||
state = state_scan | ||
for line in input.readlines(): | ||
if state == state_scan: | ||
lines.append(line) | ||
matches = re.search(include_pattern, line) | ||
if matches is not None: | ||
state = state_before_code | ||
test_file = matches.group(1) | ||
test_method = matches.group(2) | ||
test_filename = "src/{}.rs".format(test_file) | ||
lines.append("\n```rust\n") | ||
source = read_source(test_filename, test_method) | ||
if len(source) == 0: | ||
raise "failed to read source code from unit tests" | ||
for x in source: | ||
lines.append(x) | ||
lines.append("```\n") | ||
elif state == state_before_code: | ||
# there can be blank lines between the include directive and the start of the code | ||
if len(line.strip()) > 0: | ||
if line.startswith("```rust"): | ||
state = state_in_code | ||
else: | ||
raise "expected Rust code to immediately follow include directive but found other content" | ||
elif state == state_in_code: | ||
if line.strip() == "```": | ||
state = state_scan | ||
|
||
if state == state_scan: | ||
with open(source_file, "w") as output: | ||
for line in lines: | ||
output.write(line) | ||
else: | ||
raise "failed to rewrite example source code" | ||
|
||
|
||
def main(): | ||
for file in glob.glob("source/**/*.md"): | ||
update_examples(file) | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,7 +36,7 @@ much easier to use the [LogicalPlanBuilder], which is described in the next sect | |
|
||
Here is an example of building a logical plan directly: | ||
|
||
<!-- source for this example is in datafusion_docs::library_logical_plan::plan_1 --> | ||
<!-- include: library_logical_plan::create_plan --> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tested what happens if there is a bad link to an example here: $ git diff
diff --git a/docs/source/library-user-guide/building-logical-plans.md b/docs/source/library-user-guide/building-logical-plans.md
index b75a788e83..6ae376d445 100644
--- a/docs/source/library-user-guide/building-logical-plans.md
+++ b/docs/source/library-user-guide/building-logical-plans.md
@@ -36,7 +36,7 @@ much easier to use the [LogicalPlanBuilder], which is described in the next sect
Here is an example of building a logical plan directly:
-<!-- include: library_logical_plan::create_plan -->
+<!-- include: library_logical_plan::create_planXXXXX -->
This example produces the following plan: It seems to just leave a blank in the docs Is there any way to generate an error in this case instead? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will add some error checks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried this again and now the script simply doesn't change the section, which I think is fine |
||
|
||
```rust | ||
// create a logical table source | ||
|
@@ -58,7 +58,7 @@ let table_scan = LogicalPlan::TableScan(TableScan::try_new( | |
fetch, | ||
)?); | ||
|
||
// create a Filter plan that evaluates `id > 500` that wraps the TableScan | ||
// create a Filter plan that evaluates `id > 500` and wraps the TableScan | ||
let filter_expr = col("id").gt(lit(500)); | ||
let plan = LogicalPlan::Filter(Filter::try_new(filter_expr, Arc::new(table_scan))?); | ||
|
||
|
@@ -99,7 +99,7 @@ Here are some examples of transformation methods, but for a full list, refer to | |
|
||
The following example demonstrates building the same simple query plan as the previous example, with a table scan followed by a filter. | ||
|
||
<!-- source for this example is in datafusion_docs::library_logical_plan::plan_builder_1 --> | ||
<!-- include: library_logical_plan::build_plan --> | ||
|
||
```rust | ||
// create a logical table source | ||
|
@@ -115,10 +115,8 @@ let projection = None; | |
// create a LogicalPlanBuilder for a table scan | ||
let builder = LogicalPlanBuilder::scan("person", Arc::new(table_source), projection)?; | ||
|
||
// perform a filter operation and build the plan | ||
let plan = builder | ||
.filter(col("id").gt(lit(500)))? // WHERE id > 500 | ||
.build()?; | ||
// perform a filter that evaluates `id > 500`, and build the plan | ||
let plan = builder.filter(col("id").gt(lit(500)))?.build()?; | ||
|
||
// print the plan | ||
println!("{}", plan.display_indent_schema()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you considered the opposite:
Leaving the code example inlined in the documentation, and extracting it out to
datafusion-docs-test
with a script.This would have the benefit that the markdown would be closer to the actual output documentation, and it might be easier to work on the examples inline.
The downside is that then we would need to somehow run a script to copy the examples into datafusion-docs-tests as part of CI to make sure they were up to date
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach is more challenging because sometimes we are copying a whole function, and sometimes just a few lines of code, and some snippets depend on other snippets. It is also hard to write and test code with this approach because there is no IDE support.