Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify bytecomp for BUILD_PATH_PREFIX_MAP #12139

Open
wants to merge 2 commits into
base: trunk
Choose a base branch
from

Conversation

richardlford
Copy link
Contributor

@richardlford richardlford commented Mar 27, 2023

  1. bytecomp/emitcode.ml

Sanitize the paths in debug events using BUILD_PATH_PREFIX_MAP. However, if the mapping has no effect, then do nothing.

  1. bytecomp/bytelink.ml

Do not do BUILD_PATH_PREFIX_MAP mapping of the path supplied by the user with the -use-runtime option. This is used to fill in the shebang part of the
executable, and an abstract path is unlikely to work there.

@richardlford
Copy link
Contributor Author

The failures are all the same test, testsuite/tests/lib-dynlink-initializers/test10_main.ml. It produced a back-trace which is sensitive to the file name in the debug events. Since this current change discards the directory part of debug event paths, the results change.

This did not show up in my local testing because it depends on how modules in the standard library were compiled. Now I can get a local reproduction by making a switch from my compiler sources. Presumably I could also have reproduced it by bootstrapping my compiler (not sure about that).

I have pushed a commit that promotes the output of that test so that the directory part of the name is not expected.

@richardlford
Copy link
Contributor Author

Having thought further about these changes, I have pushed (but not yet squashed) two commits.

  1. For bytecomp (the only part that is part of this PR):

1a. I decided that the path given by the -with-runtime option should not be subject to BUILD_PATH_PREFIX_MAP rewriting, since it is used to construct the shebang line and that is not likely to work with an abstract path. Also, when that option is not given the header is copied the ocaml installation and is not "reproducible", but it must match the ocaml installation.

1b. In my introduction to this PR I offered two options for sanitizing events, either using BUILD_PATH_PREFIX_MAP, or just taking the basename of the path. Having heard no opinions either way, I've decided to just use the basename as it is simpler, and also in the debugger it provides more efficient lookup.

Another reason for this choice is that the debug event path shows up in backtrace messages (as the failure of a test mentioned above showed). If we used BUILD_PATH_PREFIX_MAP mapping, then abstract paths (e.g. /workspace_root/mod.ml) would show up in backtraces unless we changed the backtrace logic to avoid them.

  1. In the debugger (not part of this PR, but provided for reference), I made the change corresponding to 1b, i.e. use the basename to lookup the source location.

@gasche
Copy link
Member

gasche commented Mar 31, 2023

Not using BUILD_PATH_PREFIX_MAP for the shebang sounds wise.

I am not sure about taking the basename in debug events. This is an observable change for users who look at the backtrace, and it goes in the direction of losing information. There is an updated testsuite reference file in your patchset that looks like this:

 Called from Test10_plugin.g in file "test10_plugin.ml", line 3, characters 2-21
 Called from Test10_plugin.f in file "test10_plugin.ml", line 6, characters 2-6
 Called from Test10_plugin in file "test10_plugin.ml", line 10, characters 2-6
-Called from Dynlink.Bytecode.run in file "otherlibs/dynlink/dynlink.ml", line 148, characters 16-25
-Re-raised at Dynlink.Bytecode.run in file "otherlibs/dynlink/dynlink.ml", line 150, characters 6-137
-Called from Dynlink_common.Make.load.(fun) in file "otherlibs/dynlink/dynlink_common.ml", line 363, characters 13-56
+Called from Dynlink.Bytecode.run in file "dynlink.ml", line 148, characters 16-25
+Re-raised at Dynlink.Bytecode.run in file "dynlink.ml", line 150, characters 6-137
+Called from Dynlink_common.Make.load.(fun) in file "dynlink_common.ml", line 363, characters 13-56
 Called from Stdlib__List.iter in file "list.ml", line 112, characters 12-15
-Called from Dynlink_common.Make.load in file "otherlibs/dynlink/dynlink_common.ml", line 359, characters 8-392
-Re-raised at Dynlink_common.Make.load in file "otherlibs/dynlink/dynlink_common.ml", line 372, characters 8-17
+Called from Dynlink_common.Make.load in file "dynlink_common.ml", line 359, characters 8-392
+Re-raised at Dynlink_common.Make.load in file "dynlink_common.ml", line 372, characters 8-17
 Called from Test10_main in file "test10_main.ml", line 51, characters 13-69

Personally I have the impression that the new backtrace is less convenient than the previous one -- now as the user I have to do the work of figuring out where to find the module. Maybe this is a corner case (I notice that stdlib module do not get prefixed with stdlib/, so presumably this depends on how the build system is setup, and it may be the case that most users do not benefit from informative paths in their local project in practice). Maybe this isn't. I'm not sure.

Maybe @lpw25, who is surprisingly useful in this discussion so far, has an opinion about this. I feel that I don't have all the relevant details in my head so I am willing to be convinced by other people's opinions.

@richardlford
Copy link
Contributor Author

I have squashed and removed all of the non-PR files, so this is ready to merge when approved.

@richardlford
Copy link
Contributor Author

richardlford commented Mar 31, 2023

Personally I have the impression that the new backtrace is less convenient than the previous one -- now as the user I have to do the work of figuring out where to find the module. Maybe this is a corner case (I notice that stdlib module do not get prefixed with stdlib/, so presumably this depends on how the build system is setup, and it may be the case that most users do not benefit from informative paths in their local project in practice). Maybe this isn't. I'm not sure.

Currently, the debug events have as a path whatever was passed to the compiler, which might be an absolute path. So this is definitely a point of leakage of build-time absolute paths.

Here are some possibilities:

  1. Only take the basename if it is an absolute path. That will leave the full relative path so as to be more informative. Note that a relative path may also have some build-time artifacts in it (compared to the path when it is installed, because of where Dune or other build systems decide to install things).
  2. Do the BUILD_PATH_PREFIX_MAP mapping, only for absolute paths, leaving relative paths alone.
  3. Do the BUILD_PATH_PREFIX_MAP mapping always.
  4. We could just check for the existence of a BUILD_PATH_PREFIX_MAP. If not set, we could leave the paths as is, since it would then be assumed that the user did not care about absolute path leakage.

All of the alternates that do BUILD_PATH_PREFIX_MAP will result in changes to the backtrace output. They will include the abstract path. That will not be less informative than the current output, though the users may wonder what "/workspace_root" is, or where it came from.

The runtime also has access to the debug directories. It could use that to find the absolute path and provide even more information for the user. However, if mapping was used, that would only work if an inverse BUILD_PATH_PREFIX_MAP mapping from abstract to runtime paths were available.

I'm open to any of these possibilities, or others you may suggest. In any case, the debugger only needs the basename and can ignore the directory part, since the debug directories are sufficient for it to find the sources.

@gasche
Copy link
Member

gasche commented Mar 31, 2023

Personally I find (3) to be the option most consistent with the rest of what you are doing (I think?), given that if I understand correctly you do rewrite both absolute and relative paths in other parts of the compiler.

I don't find it abnormal that users would see rewritten paths; I think that the prefix map should be designed to be human-readable, and in general we should assume that users may observe the rewritten path. (If there was no way to get the rewritten paths back from the build artifact, we would not store those paths in the first place.)

@richardlford
Copy link
Contributor Author

I've pushed again.

At first I tried no. 3, calling Location.absolute_path. But since no mapping was defined, this just had the effect of turning the paths into absolute paths, which resulted in 26 failed tests whose backtraces had absolute paths pointing to my development environment. So either we need a way to sanitize the test output, or else we need a different approach.

For now I've gone to checking to see if the mapping had an effect. If it did not, then I just leave the event unchanged.

I've rebased and pushed that result. We now no longer have a change in the testbase results.

@richardlford
Copy link
Contributor Author

So, is this approach acceptable, or is there something else I should try?

@richardlford
Copy link
Contributor Author

Any additional comments? Is this ready for merge?

…te shebang

1. bytecomp/emitcode.ml

Sanitize the paths in debug events using
BUILD_PATH_PREFIX_MAP. However if the mapping has no effect, then do
nothing.

2. bytecomp/bytelink.ml
Do not do BUILD_PATH_PREFIX_MAP mapping of the path
supplied by the user with the `-use-runtime` option.
This is used to fill in the shebang part of the
executable, and an abstract path is unlikely to
work there.
@richardlford
Copy link
Contributor Author

richardlford commented May 17, 2023

These changes are needed to proceed with #12126 and ocaml/dune#7750. Can we merge this now, or are there any further suggestions? I changing jobs at the end of the month and would like to get my OCaml and Dune changes merged by then if possible, as I won't have much time to devote to them after that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants