Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secor is leaking memory through DeleteOnExit #785

Closed
giner opened this issue Jun 18, 2019 · 10 comments · Fixed by #1449
Closed

Secor is leaking memory through DeleteOnExit #785

giner opened this issue Jun 18, 2019 · 10 comments · Fixed by #1449

Comments

@giner
Copy link
Contributor

giner commented Jun 18, 2019

Secor is exhausting the while heap over time and either crashes or starts spinning due to GC trying to free memory very frequently. Memory is accumulated in the DeleteOnExitHook by DeleteOnExit method. We are running Secor v0.25 and the related code seems to be the same in the latest version (which is v0.27 at the moment of writing).

secor - memory usage over time
secor - memory dump analysis 1
secor - memory dump analysis 2

@dovka
Copy link
Contributor

dovka commented Oct 2, 2019

We are getting the same problem - every few days secor crashes with identical Memory and CPu patterns. Filed under #964

@yehoshoua
Copy link

any updates ?

@HenryCaiHaiying
Copy link
Contributor

This is interesting. If you can reproduce the problem, can you turn on the debug logging category for FileRegistry.java which will prints out all the files added for deleteOnExit.

It would be interesting to see why there will be so many files added, we have never seen the memory problem on DeleteOnExitHook before and DeleteOnExitHook doesn't have a method to clear the map either. One suspicion I have is the file size is set to too small (secor.max.file.size.bytes and secor.max.file.age.seconds) so it ends up creating too many files.

@pdambrauskas
Copy link
Contributor

pdambrauskas commented Jul 7, 2020

I think I have the same problem.
In my case, I have custom MessageParser implementation, which adds multi-level partitions, something like: /owner=1/workspace=2/event=click/dt=2020-07-07 in that case, we can get really big number of files and storing them in compressed parquet ads up to even bigger memory consumption.

Any ideas on how to approach this problem?
Maybe we should pause Kafka consumers, when we reach X number of files, upload those files and resume? But in that case we'll potentially have many small files 🤔. We could also partition kafka topic by same values as we partition our data on destination system, in that case consumed data would appear on same consumer thread and that would help us to have smaller number of files?

@HenryCaiHaiying
Copy link
Contributor

Looks good to me, I left a small comment in the code for backward compatible behavior. And I don't think we should sort the files since that would change the file uploading order, some people might not like that.

@HenryCaiHaiying
Copy link
Contributor

@giner @dovka @yehoshoua Do you guys want to try #1443?

@dovka
Copy link
Contributor

dovka commented Jul 9, 2020

Thank you Henry.
is #1443 parameter going to limit the throughout?

Which release it will be available?

@HenryCaiHaiying
Copy link
Contributor

I haven't cut any release yet since this PR is just landed. I think you can just pull that PR's classes into your local build to try it out, it's a very simple PR. I don't think it will affect the upload throughput since most of your files are probably small files and you can tune the upload threshold parameter.

@giner
Copy link
Contributor Author

giner commented Jul 10, 2020

I'm wondering why using deleteOnExit at all? It could be fine if there was a limited number of files created during the process lifetime hover it's not the case with Secor and deleteOnExit doesn't release memory until the process is restarted.

@pdambrauskas
Copy link
Contributor

pdambrauskas commented Jul 10, 2020

Yes, my change solves only one part of the problem, it helps to keep FileRegistry HashMaps sizes under controll:

private HashMap<TopicPartitionGroup, HashSet<LogFilePath>> mFiles;
private HashMap<LogFilePath, FileWriter> mWriters;
private HashMap<LogFilePath, Long> mCreationTimes;

that was enough for me, to keep secor running without OOM for now.

It does not affect DeleteFileOnExit though.
It would be nice to use FileRegistry on the hook, so we have single list of active files, but since the hook is static, we cannot access that list :(.

What is the idea behind DeleteOnExit tracking list of files? Wouldn't it be sufficient to delete all staged files on exit?

Edit:*
I just realized, that DeleteOnExitHook is part of java.io. I guess, it shouldn't be used for long-running apps. I'll try to replace it with simple cleanup of staging folder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants