Possible memory leak when reloading a model after disposing of it. #8459

Open
stevexbritton opened this issue Nov 25, 2024 · 7 comments
Labels
type:bug Something isn't working

@stevexbritton

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Modified version of the @tensorflow/tfjs npm page (https://www.npmjs.com/package/@tensorflow/tfjs) example
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 16.6.1
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow.js installed from (npm or script link): script link
  • TensorFlow.js version (use command below): https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.js
  • Browser version: Chrome 131
  • Tensorflow.js Converter Version:

Describe the current behavior
After loading and using a LayersModel, I call model.dispose() and tf.disposeVariables() to release the TensorFlow.js memory. However, if I then reload the model to use it again, memory is leaked: at least 16K of JavaScript Array objects are retained. This happens on every iteration of the load, use, dispose loop.

Describe the expected behavior
I would not expect a memory leak. I would expect disposing of and reloading the model to behave the same as simply reusing the model.

Standalone code to reproduce the issue
The url "https://vykingsneakerkitnative.s3.eu-central-1.amazonaws.com/SteveTest/tmp/tf-leak-test.html" demonstrates the problem.
Steps to demonstrate:

  1. Load the page "https://vykingsneakerkitnative.s3.eu-central-1.amazonaws.com/SteveTest/tmp/tf-leak-test.html" in Chrome and open Developer Tools.
  2. Click "Run Test1" button.
  3. Garbage collect and take a memory sample.
  4. Click "Run Test1" button again.
  5. Garbage collect and take another memory sample.
  6. Comparing sample 2 with sample 1 shows that the number of "Array" objects has increased by about 16K.

Steps to demonstrate model reuse with minimal memory growth:

  1. Load the page in Chrome and open Developer Tools (or reload the page).
  2. Click "Run Test2" button.
  3. Garbage collect and take a memory sample.
  4. Click "Run Test2" button again.
  5. Garbage collect and take another memory sample.
  6. Comparing sample 2 with sample 1 shows that the number of "Array" objects has increased by only about 96.
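
For readers who cannot open the test page, the two procedures can be sketched roughly as follows. This is a minimal sketch, not the page's actual code: the function names, the `modelUrl` placeholder, and the dummy all-zeros input are assumptions made for illustration. Only `tf.loadLayersModel`, `model.predict`, `model.dispose`, `tf.disposeVariables`, `tf.tidy` and `tf.zeros` are TensorFlow.js calls.

```javascript
// Hypothetical sketch: the test page's real code is not reproduced in this issue.
// `tf` is the TensorFlow.js namespace (for example the global created by the
// script link); `modelUrl` is a placeholder for wherever the model is hosted.

// Run one dummy prediction so the model is actually used.
function predictOnce(tf, model) {
  // The batch dimension of a LayersModel input is null; substitute 1.
  const shape = model.inputs[0].shape.map((d) => (d == null ? 1 : d));
  tf.tidy(() => {
    // predict() may return a single tensor or an array of tensors.
    const outputs = [].concat(model.predict(tf.zeros(shape)));
    outputs.forEach((t) => t.dataSync());
  });
}

// Test1 pattern: load, use, and dispose of the model on every iteration.
async function runReloadEachTime(tf, modelUrl, iterations) {
  for (let i = 0; i < iterations; i++) {
    const model = await tf.loadLayersModel(modelUrl);
    predictOnce(tf, model);
    model.dispose();
    tf.disposeVariables();
  }
}

// Test2 pattern: keep using a model that has already been loaded.
function runReuse(tf, model, iterations) {
  for (let i = 0; i < iterations; i++) {
    predictOnce(tf, model);
  }
}
```

The only intended difference between the two functions is whether the model is disposed of and reloaded inside the loop, which is the difference the heap snapshots are meant to isolate.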

@stevexbritton stevexbritton added the type:bug Something isn't working label Nov 25, 2024
@shmishra99 shmishra99 self-assigned this Nov 27, 2024
@stevexbritton
Author

Hi @shmishra99,
Thank you for assigning this to yourself. Is there anything more you need from me to progress this?
Steve

@stevexbritton
Author

Hi @shmishra99, I'm confused. Are you investigating this issue or have you assigned it to yourself for some other reason?

@shmishra99
Contributor

Hi @stevexbritton ,

Apologies for the late response.

I was testing the link you shared. For me, both runs show the same response without increasing the size of the array in Test 1.

I'm not sure about Test 2. It's giving the same number of tensors with increased tensor values.

For Test1 output:

image

For Test2 output:

image

Can you please confirm if I'm getting the same output as you are, and how this could be a case of a memory leak?

Please let me know if I'm missing anything. Thank you!

@stevexbritton
Author

Hi,
It's not the number of tensors left in the GPU that's the issue. It's the number of JavaScript Array objects that are not garbage collected.
This image shows the procedure for Test1 holding onto 16K Array objects.
Screenshot 2024-12-13 at 13 22 03
This image shows the procedure for Test2 holding onto only 96 Array objects. The only difference is that we do not dispose of and reload the model on each iteration.
Screenshot 2024-12-13 at 13 28 08
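
To illustrate why the tensor counts look fine while the heap still grows: `tf.memory()` only reports tensors and buffers tracked by the TensorFlow.js engine, so a difference of zero between two runs says nothing about plain JavaScript objects retained on the heap. A minimal sketch, where the helper name `tfMemoryDelta` is an assumption made for illustration:

```javascript
// Hypothetical helper: compares two snapshots returned by tf.memory().
// It covers engine-tracked tensors only, not ordinary JavaScript objects.
function tfMemoryDelta(before, after) {
  return {
    numTensors: after.numTensors - before.numTensors,
    numDataBuffers: after.numDataBuffers - before.numDataBuffers,
    numBytes: after.numBytes - before.numBytes,
  };
}

// Usage, assuming `tf` is loaded and `runTest` stands in for one button click:
//   const before = tf.memory();
//   await runTest();
//   console.log(tfMemoryDelta(before, tf.memory()));
```

Even when every field of the delta is zero, the retained Array objects only show up when comparing heap snapshots in Developer Tools, as in the screenshots above.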

@stevexbritton
Author

Any further updates?

@stevexbritton
Author

Hi @shmishra99, would you please respond to this, even if it's to say you can't look at it at the moment? I don't think just ignoring it is acceptable once you've assigned it to yourself.

@shmishra99
Contributor

Hi @stevexbritton ,

Thank you for reaching out. I apologize for the delay in responding. I've tested the code snippet you provided, and I've noticed that the array size is increasing with each run, even after disposing of the model and tensors. Your code flow seems correct to me, but I am not sure why this is happening. I will discuss this issue internally next week and provide an update.

Here is the console snapshot after each run:

Snapshot1:

image

Snapshot2:

image

Snapshot3:

image

Thank You!!
