Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileBox.fromStream() should only be used once, and it must throw if be used again. #50

Closed
windmemory opened this issue Dec 4, 2020 · 13 comments
Labels
enhancement New feature or request

Comments

@windmemory
Copy link
Contributor

Seems like the FileBox in Stream type does not behave the same as all the other types, since the data for this type of FileBox is lazy loaded and can only be transferred once. So how might we support multiple operations on the Stream FileBox?

I created a unit test to test this case and currently it is failed. Let's make it pass.

@windmemory
Copy link
Contributor Author

Solution 1

We can add a flag to Stream type FileBox when doing operations, like persist, which means this will save the stream data locally somewhere(TBD, memory or disk), then the FileBox can support more operations. If this flag is not set, the data can only be access once, and future call on this FileBox related to data, will throw an exception.

@huan
Copy link
Owner

huan commented Dec 4, 2020

I believe that this is the expected behavior of the node stream design.

Why do we need to use a stream twice, and could we save it to reuse next time so that we can just let the stream passed by?

@windmemory
Copy link
Contributor Author

I agree this is the expected behavior of the node stream, but this should not be the expected behavior for FileBox. Since FileBox does not include only Stream files. Sometimes when user get a FileBox, they don't know what type is it, and they just use it in their code as a normal FileBox. If the Stream type FileBox behaves differently from all the other types, this will cause confusion, also this will be very hard for developer to debug. Since I just debugged this bug in wechaty-puppet-hostie, the behavior of this bug is that I got an empty file.

I believe that the design for FileBox is aim to provide a class for developers to manipulate different kind of files with ease, so we might want to align the Stream FileBox's behavior to all the other types.

@huan
Copy link
Owner

huan commented Dec 4, 2020

Yes, you are right, the file box behavior should be consist.

The best solution I could think right now, is that all the typed of the file box can be only used once.

If the file box has been used once, the next time it should throw an error.

The reason for doing this, is that we have to save the stream to memory (or tmp file) if we want to support reusing of the file box.

Just image that we have a stream with 1TB data to transfer from the network (e.g. gRPC), and the file box use fromStream and then use toFile to save it. It is obviously we can not call the toFile twice because all the data were gone.

@windmemory
Copy link
Contributor Author

I agree that if we transfer large data with FileBox, problem will be introduced with multiple operations. But for the current use case, Wechaty is heavily depends on FileBox, and inside Wechaty, the file that we transferred are usually images, videos or files that under 10M, 99% files are under 100M. So for the current use case, allow the FileBox to be used multiple times makes more sense than just 1 time.

If we change the behavior here, the update will definitely break the current code of the developers using FileBox.

@huan
Copy link
Owner

huan commented Dec 4, 2020

I think the streaming type file-box will behave differently from the other types.

We might not be able to reuse a stream file box because of the following thoughts:

  1. When we use FileBox.from(stream) and the stream contains 100MB data, where can we get the data again after the first stream has been consumed?
    1. Should we store all the data in the memory for later use?
    2. If we stored the data in the memory, and no one consumes this data again, then we will cost lots of memory before it can be garbage collected, and if the bot receives lots of files (e.g. 10 x 100MB videos), then it will easily meet an OOM for a 1GB ram container.
  2. What if we received a 1TB stream data? I agree with what you said is true: most of the time, 99% of files are under 100M. However:
    1. what can we do when we received the 1% probability 1TB stream data?
    2. If we have no good to deal with it, then this is not an acceptable solution.

So for now, the solution that I can imagine is to:

  1. For the non-stream type of the file-box, it can be reused
  2. for the stream type of the file-box, it can be only used once. If it is used the second time, it will through exception.

But the above workaround will cause non-consists behavior for the FileBox system, so I'd like to always choose an "always can be able to use once" straitegy for the FileBox.

The above is just my draft after start thinking about this problem, please feel free to let me know if you have a better solution, thanks.

@windmemory
Copy link
Contributor Author

  • When we use FileBox.from(stream) and the stream contains 100MB data, where can we get the data again after the first stream has been consumed?

    1. Should we store all the data in the memory for later use?
    2. If we stored the data in the memory, and no one consumes this data again, then we will cost lots of memory before it can be garbage collected, and if the bot receives lots of files (e.g. 10 x 100MB videos), then it will easily meet an OOM for a 1GB ram container.
  • What if we received a 1TB stream data? I agree with what you said is true: most of the time, 99% of files are under 100M. However:

    1. what can we do when we received the 1% probability 1TB stream data?
    2. If we have no good to deal with it, then this is not an acceptable solution.

Yes, you are right, seems like it is pretty hard to support reusable Stream FileBox.

We should at least throw an exception when the toFile() or toBuffer() etc is called the second times for Stream FileBox.

@huan
Copy link
Owner

huan commented Dec 4, 2020

I'm glad that you agree with me, and the exception has already added by the last commit yesterday, please upgrade to the latest version of the file box to test it.

@huan
Copy link
Owner

huan commented Dec 4, 2020

One more thing: do you think we should always throw error when the file box has been used twice, no matter than what type it is? Because as you said, the box might be any type, and the user should get a consist behavior for it.

@huan huan added the enhancement New feature or request label Dec 5, 2020
@huan huan changed the title Stream FileBox does not support multiple operations on it FileBox.fromStream() should only be used once, and it must throw if be used again. Dec 5, 2020
@windmemory
Copy link
Contributor Author

One more thing: do you think we should always throw error when the file box has been used twice, no matter than what type it is? Because as you said, the box might be any type, and the user should get a consist behavior for it.

Seems like this is a workaround. What do you think about this design below:

We add another method into FileBox called isConsumable(), for Stream type, this will only return true before the first consume, but for all the other types, this will always return true. Thus developers could use this to check the status of the FileBox before they consume it.

@huan
Copy link
Owner

huan commented Dec 9, 2020

I believe the user who is using the FileBox instance should take the responsibility to remember whether this instance has been consumed, and they should not consume the FileBox again because the FileBox has already finished its mission.

If anyone wants to get a reusable filehandle, they should save the FileBox to a local temp file so that it can be reused later.

@helperAI
Copy link

haha, I just encounter this problem: UnhandledPromiseRejectionWarning: Error: The stream has already been consumed once, and now it was destroyed. See: #50
Thanks for the clear explanation.
It seems I have to save as a temp and read locally by myself for later use.

@sargeto
Copy link

sargeto commented Jan 15, 2024

Yy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants