Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findExe and getAppFilename are full of bugs #14292

Open
16 tasks
timotheecour opened this issue May 10, 2020 · 5 comments
Open
16 tasks

findExe and getAppFilename are full of bugs #14292

timotheecour opened this issue May 10, 2020 · 5 comments
Labels
OS (stdlib) Related to OS modules in standard library

Comments

@timotheecour
Copy link
Member

timotheecour commented May 10, 2020

TLDR: Please don't copy paste code. Refactor and keep code DRY. Nim makes it really easy thanks to templates + friends.

findExe and getAppFilename are full of bugs, that keep popping up in lots of unexpected ways, eg in #14285 using import std/browsers instead of import browsers caused CI failures because of that (I'm fixing that aspect in #14291)

findExe is full of bugs

  • bug 1
  • inside checkCurrentDir, it doesn't resolve symlinks regardless of followSymlinks
  • bug 2
  • the doc is misleading: Searches for exe in the current directory and then ... is not true on posix, where $PWD is not searched if / notin exe
  • bug 3
  • findExe tries to account for platform specific behavior but does something surprising and unique; eg on posix, I'd expect this:
# correct:
if '/' in exe:
  if existsFile(exe): return exe else: return ""
else:
  resolve in PATH

# findExe behavior:
if '/' in exe:
  if existsFile(exe): return exe
resolve in PATH

to match the behavior of the shell when you run program args (or likewise, the behavior of which)

  • bug 4
  • broken behavior with files containing multiple extensions, eg:
    findExe("src/main", extensions = [".out"]) # ok, finds src/main.out
    findExe("src/main.nim", extensions = [".out"]) # bug, doesn't find src/main.nim.out
    or on windows (IIRC):
    findExe("nim") # ok, finds nim.exe
    findExe("nim.temp") # bug, doesn't find nim.temp.exe
  • bug 5
    on windows:
    PATH=\baz
    findExe(r"\foo\bar")
    => would return \baz\foo\bar
    I don't think that's correct since \foo\bar is absolute, it shouldn't prepend it with PATH prefixes

  • bug 6
    followSymlinks is not observed on windows even if using a version of windows that supports symlinks (the docs say: If the system supports symlinks)

  • bug 7
    findExe duplicates the code for expanding symlinks, instead of calling expandFilename (or recursively expandSymlink), or fixing whatever needs fixing in expandSymlink if there's anything to fix for it to use it; also it does so with small differences... and with un-necessary allocations

  • bug 8
    findExe (on posix at least) returns the 1st existing file in PATH regardless whether it's an executable; this is not what the shell does (nor what which does); these only return a file if it's an executable

  • bug 9
    on posix at least, empty entries in PATH are treated as current directory (which is inconsistent with other shell tools and silly as it makes PATH=$PATH:$DIR error prone in case PATH wasn't already set); however that's the correct expected behavior (eg https://www.zsh.org/mla/users/2006/msg00519.html); findExe however incorrectly skips over empty PATH entries, which is a bug (other code is correct in that respect, eg getApplOpenBsd and getApplHeuristic)

The real bug here though is that this should be abstracted over via an iterator pathItems(): string and that code should be reused, not duplicated

  • bug 10
    expandTilde should not be called inside findExe when iterating over PATH candidates; eg PATH='~/temp/' foo doesn't expand the ~ (it only expands if you pass PATH=~/temp/ foo but that's done by the shell, before it even reaches the binary)

There are probably more bugs in that function, I haven't investigated

getAppFilename is full of bugs

  • bug 2.1 with getApplHeuristic (called by getAppFilename)
mkdir bar
touch bar/baz
# export getApplHeuristic and enable it for your OS (eg osx)
echo 'import os; echo getApplHeuristic()' | nim c -o:baz -
PATH=bar ./baz # prints `bar/baz` instead of `./baz`
  • bug 2.2 with getApplHeuristic (called by getAppFilename)
  • getApplHeuristic doesn't resolve symlinks, violating the contract in getAppFilename
  • bug 2.3 with getApplOpenBsd (called by getAppFilename)
  • implementation should be improved; it should reuse code (eg absolutePath) and refactor its code with other code doing something very similar to avoid duplication; chances are the differences should be either none or very small, and a when defined(x) would handle that
  • bug 2.4 implementation of getAppFilename is bad
  • expandFilename is called many times in some code paths (but not others, which is a bug); instead it should be called a single time in a way that captures all code paths. For example, this makes it really hard to update getAppFilename to make resolving symlinks optional
  • bug 2.4 doc of getApplHeuristic is wrong

POSIX guaranties that this contains the executable as it has been executed by the calling process

there is no such guarantee, it's merely a convention that doesn't have to be respected.

  • bug 2.5 getApplOpenBsd finds 1st existing file regardless whether it's an executable
    similar to bug 8 for findExe

Possible Solution

Please don't copy paste code, refactor instead. It's IMO the number 1 reason of bugs and hard to maintain/evolve code.

  • there's a lot of logic being inlined instead of reused. So you end up with N slightly different versions of copy pasted code instead of a single one that can be fixed in 1 place. Code reuse would reduce the number of bugs and make it easier to fix them.

  • bugs that can be fixed in findExe and getAppFilename without introducing problematic breaking changes should be fixed in those

  • a new proc which should be introduced, as a replacement for findExe (to be deprecated); without inheriting their bugs nor the cruft; eg: it shouldn't resolve symlinks. That's just standard separation of concerns; the user can just resolve them by 1 function call if they want.

Additional Information

example past bugs caused by findExe and getAppFilename

links

https://github.com/gpakosz/whereami seems relevant (but I'm not sure it's needed to solve those bugs); wai_getModulePath seems more interesting, for another use case

@mratsim
Copy link
Collaborator

mratsim commented May 10, 2020

You're missing that findExe cannot find nimble if the path was set with bash windows instead of Powershell: https://github.com/mratsim/weave/blob/46cf3232/azure-pipelines.yml#L173-L186

I've easily lost about 20 hours trying to deal with both Nimble (i.e. findExe and Azure) on various repos. Note that Azure also is the source of many problems with their flaky path breakages from the past 6 months but nimble handling of path should ideally be based on the shell not the OS.

@euantorano
Copy link
Contributor

Note that the OpenBSD implementation used to check if a file was executable, but doesn't do so now - this is because NimScript refused to work with isExecutable. See here for the blame.

@timotheecour
Copy link
Member Author

timotheecour commented May 11, 2020

this is because NimScript refused to work with isExecutable

nimscript can be fixed :-) it's simply a matter of adding a vmops for it

@Araq
Copy link
Member

Araq commented May 11, 2020

Well I'm all for better code but the issues you linked are all BSD specific and BSD follows Unix's tradition in making getAppFilename effectively impossible to do reliably. Instead the prefix should be compiled into the binary (!) or something like that.

@timotheecour
Copy link
Member Author

the issues you linked are all BSD specific

that's not true, several of these issues affect windows, several affect posix, and yes, several affect BSD too

making getAppFilename effectively impossible to do reliably

most of these bugs are fixable (just a matter of doing the work)

Instead the prefix should be compiled into the binary (!) or something like that.

I don't see how that could work (eg for a binary not under nim's control); or maybe I'm misunderstanding this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS (stdlib) Related to OS modules in standard library
Projects
None yet
Development

No branches or pull requests

4 participants