Tab-completion in REPL hangs #537

Closed
daniel-thom opened this issue Nov 22, 2019 · 21 comments

Comments

@daniel-thom

daniel-thom commented Nov 22, 2019

I narrowed this problem down to v0.5.14 of the CSV package. More specifically, commit 745bb25. v0.5.13 does not exhibit the issue.

Test:

  1. Enter REPL
  2. import CSV
  3. CSV.Fi
  4. Press tab

Result:
Auto-completing to CSV.File takes more than 20 seconds. I ran top while this occurs and observed the CPUs spiking to 100% utilization.

I saw that commit 745bb25 added the FilePathsBase package. I commented out references to the structs you're using from that package (AbstractPath and SystemPath) and the problem went away. Interestingly, the issue did not occur if I repeated the test by only importing the FilePathsBase package.

Config:
Julia 1.2
Arch Linux version 2019.11.01
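
For anyone who wants to measure the delay rather than time it by hand, here is a minimal sketch that asks the REPL's completion machinery for candidates directly. It assumes the `REPL` standard library's `REPLCompletions.completions(string, pos)` entry point; `time_completion` is a hypothetical helper name, and the numbers will differ between machines and Julia versions.

```julia
using REPL  # standard library; provides REPL.REPLCompletions

# Hypothetical helper: time a single completion request for `prefix`,
# roughly what pressing TAB at the end of that text would trigger.
function time_completion(prefix::AbstractString)
    t0 = time()
    result = REPL.REPLCompletions.completions(prefix, lastindex(prefix))
    elapsed = time() - t0
    # result[1] holds the completion candidates; convert them to plain strings.
    candidates = REPL.REPLCompletions.completion_text.(result[1])
    return elapsed, candidates
end

# Baseline in a fresh session, before any package is loaded.
t_base, names = time_completion("spl")
println("baseline: ", round(t_base; digits=2), " s, ", length(names), " candidates")

# After `import CSV`, calling time_completion("CSV.Fi") in the same session
# gives a number to compare against the baseline.
```

Only the first call after loading a package is interesting here, since later calls reuse whatever was compiled.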

@nalimilan
Member

I cannot reproduce with 0.5.16. Are you using the standard Julia REPL? Do you have many files in your current folder by chance?

@quinnj
Member

quinnj commented Nov 28, 2019

I think this is an issue with Julia 1.2; please try with the newly released Julia 1.3 and I believe it should be resolved.

@nalimilan
Member

I should have said I tried on 1.2.

@daniel-thom
Author

The behavior seems about the same in Julia 1.3. The problem also goes away when I remove imports from FilePathsBase.

@nalimilan I tried in an empty directory as well as a local clone of CSV.jl. I'm using the standard Julia REPL.

Let me know if I can try anything else to help you reproduce. FWIW, a few of my colleagues were seeing the same problem until we downgraded to v0.5.13. At least one person said the hang was longer than two minutes.

@daniel-thom
Author

Here is a clue that could help. I restarted with a clean install of Julia 1.3. With just CSV installed in the global package space the test with CSV.Fi and then tab took ~6 seconds. I then added DataFrames and repeated the test. It took ~28 seconds. I tried this a few times and it was very repeatable. If other people have more packages installed then that could explain the even longer times.

@nalimilan
Member

That's really weird since DataFrames is already a dependency of CSV, so it should be loaded (in the background) regardless of whether you install it directly or not.

@EricForgy

From Mason Potter on Julia Slack...

(Link will be invalid soon, but for now it is here https://julialang.slack.com/archives/C6A044SQH/p1575567273095800?thread_ts=1575565771.092300&cid=C6A044SQH.)

The REPL is written in Julia code, so it would normally be JIT compiled, but to get rid of the compilation time, we actually statically compile a bunch of its methods into the Julia sysimage. CSV.jl commits some weird type piracy that invalidates a bunch of those statically compiled methods. It causes all sorts of REPL slowdowns as soon as it's loaded because now REPL functions need to go through the JIT.
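
As a rough illustration of the invalidation mechanism described in that quote, here is a minimal sketch using toy functions. The names `describe` and `caller` are hypothetical stand-ins and have nothing to do with the actual REPL or CSV.jl code; the point is only that defining a more specific method forces already compiled callers to be recompiled.

```julia
# Toy stand-ins (hypothetical names), not REPL or CSV.jl code.
describe(x) = "generic"      # fallback method
caller() = describe(1)       # compiled code for caller() relies on describe(::Any)

first_result = caller()      # forces compilation of caller()

# A later, more specific method (as a newly loaded package might add)
# invalidates the compiled caller(); it is recompiled on its next call.
describe(x::Int) = "specialized"

second_result = caller()
println(first_result, " -> ", second_result)   # generic -> specialized
```

In the real case the invalidated code lives in the sysimage, so the recompilation cost shows up the first time the REPL needs it, for example on the first TAB press.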

@nalimilan
Member

I don't think that's related: 20s is really too much for recompilation of some methods. And why wouldn't it be reproducible on other machines?

@tk3369
Contributor

tk3369 commented Dec 8, 2019

I have the same problem with julia 1.3.0 on Mac.

  1. Start REPL
  2. Type cd("~/Down" then hit TAB key. It expands to my full path to Downloads folder right away.
  3. Type using DataFrames. Do the same thing. It expands within 1-2 seconds.
  4. Type using CSV. Do the same thing. It expands in about 30 seconds.
  5. Do that again. It expands instantly.

@nalimilan
Member

Then it really looks like loading CSV invalidates some compiled methods. That shouldn't take 30s, though. Can you try loading CSV's dependencies progressively to see what dependency or combination of dependencies triggers the problem? For example, it could be an interaction between FilePathsBase and CategoricalArrays.

@tk3369
Contributor

tk3369 commented Dec 8, 2019

I've tried many combinations. It looks like the culprit is the combination of DataFrames and FilePathsBase. The TAB key for this test took a little less time, ~24 seconds.

julia> using DataFrames, FilePathsBase

julia> cd("~/Down

My versions:

CSV v0.5.18
DataFrames v0.20.0
FilePathsBase v0.7.0

@davidanthoff

Maybe join is the culprit?

@quinnj
Member

quinnj commented Dec 22, 2019

Reports from Base Julia are that this issue is fixed on Julia 1.3.1 and master; please comment here if you still see issues on those versions. Reference: JuliaLang/julia#34098 (comment)

@quinnj quinnj closed this as completed Dec 22, 2019
@evanfields

@quinnj As requested, commenting here because the issue seems to persist on the 1.3.1 release: JuliaLang/julia#34098 (comment)

@KristofferC
Contributor

Yep still happens

@KristofferC
Contributor

KristofferC commented Jan 25, 2020

Can anyone confirm that commenting out this join method:

https://github.com/JuliaData/DataFrames.jl/blob/e8e5cb5f40fbafa46927c19c00270f6ea3caff74/src/abstractdataframe/join.jl#L309-L431

makes the problem pretty much go away?

In addition, the same is true for the join in FilePathsBase (commenting out the AbstractString argument type in the splatted argument also seems to fix it):

https://github.com/rofinn/FilePathsBase.jl/blob/37ce645ca40de031ac7fd7f05274bfdd1f1edd69/src/path.jl#L230-L242

So it seems that the join methods are interacting with each other?

@tk3369
Contributor

tk3369 commented Jan 26, 2020

@KristofferC I have replicated it again with the latest version of everything. Then, I tried your suggestions.

  1. Commenting out the join function from DataFrames has no effect. The 2-3 second delay is still there when only DataFrames is loaded. The 35-second delay is also still there when both DataFrames and CSV are loaded.

  2. Commenting out the AbstractString part of the join function from FilePathsBase, together with the above change, almost eliminated the problem. It went from a 35-second pause to 3 seconds when both DataFrames and CSV are loaded.

  3. Commenting out the AbstractString part of the join function from FilePathsBase by itself reduces the pause from 35 seconds to 15 seconds when both DataFrames and CSV are loaded.

@KristofferC
Contributor

KristofferC commented Jan 27, 2020

Thanks for trying it out. I have seen something similar happen before in Base, see this discussion:

JuliaLang/julia#27874 (comment).

The workaround was in JuliaLang/julia@198e452.

Here it is probably a similar inference cycle within join.

@KristofferC
Contributor

I don't see why either DataFrames or FilePathsBase overloads join. Neither of them does what the function docstring says.
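
For reference, a small sketch of what Base's `join` is documented to do: it concatenates the string form of the items of an iterator, with an optional delimiter and an optional separate last delimiter. The variable names are only illustrative.

```julia
# Base.join builds one string from an iterator of items.
plain = join(["a", "b", "c"], ", ")
with_last = join(["a", "b", "c"], ", ", " and ")

println(plain)       # a, b, c
println(with_last)   # a, b and c

# The number of methods currently attached to join; this grows when
# loaded packages add their own overloads.
println(length(methods(join)))
```

Running the last line before and after loading a package is a quick way to see whether that package extends `join`.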

@tk3369
Contributor

tk3369 commented Jan 28, 2020

Should we reopen this issue? Or perhaps we should just fix DataFrames and FilePathsBase?

@KristofferC
Contributor

It is not really a CSV issue since it is reproducible with just DataFrames and FilePathsBase.
