How to read large files? Is there a way to read them in a streaming? #237

Open
Zzaniu opened this issue Nov 1, 2024 · 12 comments

Comments

@Zzaniu

Zzaniu commented Nov 1, 2024

Reading a large file exhausts memory:

let mut book = umya_spreadsheet::reader::xlsx::lazy_read(self.file_path.as_ref())?;
let cell = book
    .get_lazy_read_sheet_cells(&sheet_index)
    .map_err(|e| anyhow!("{e}"))?;
let v = cell.get_cell_value((1, index)).get_value();
...

memory allocation of 17867735056 bytes failed
@Zzaniu
Author

Zzaniu commented Nov 1, 2024

I solved this problem with calamine, but calamine can only read files, not write them. I really hope that umya_spreadsheet will solve this problem.

@MathNya
Owner

MathNya commented Nov 5, 2024

@Zzaniu
Thank you for contacting us.
We are sorry, but we may not be able to meet your expectations.
We are aware that umya-spreadsheet consumes more memory than other libraries.
However, we have not found a solution at this time.
I think a solution will be quite a while away.

@BharathIO

BharathIO commented Nov 6, 2024

@MathNya I am also looking for the same thing. Are there any workarounds at the moment to support streaming? I would appreciate your response.

I specifically need streaming while writing rows to an Excel workbook sheet.

@MathNya
Owner

MathNya commented Nov 8, 2024

@BharathIO
When umya-spreadsheet updates a cell, it deserializes all cells in the sheet.
Because of this implementation, it is not currently possible to achieve the expected behavior.
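
For readers wondering what streaming would mean on the write side, here is a minimal sketch that contrasts with the deserialize-everything model described above. It is not umya-spreadsheet's API and does not come from any library: `write_row`, `write_sheet_data`, `column_name`, and `escape_xml` are hypothetical helpers built only on the Rust standard library. The sketch emits just the `<sheetData>` fragment of a worksheet, using inline strings. A real .xlsx file also needs the rest of the worksheet XML, the workbook parts, and ZIP packaging, all of which are omitted here.

```rust
use std::io::{self, Write};

/// Convert a 1-based column index to its letter name (1 -> "A", 27 -> "AA").
fn column_name(mut col: u32) -> String {
    let mut name = Vec::new();
    while col > 0 {
        let rem = ((col - 1) % 26) as u8;
        name.push(b'A' + rem);
        col = (col - 1) / 26;
    }
    name.reverse();
    String::from_utf8(name).expect("ASCII letters only")
}

/// Escape the characters that are special in XML text content.
fn escape_xml(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            other => out.push(other),
        }
    }
    out
}

/// Write one row to `out`. Nothing about the row is kept afterwards,
/// so memory use does not grow with the number of rows written.
fn write_row<W: Write>(out: &mut W, row: u32, values: &[String]) -> io::Result<()> {
    write!(out, "<row r=\"{}\">", row)?;
    for (i, value) in values.iter().enumerate() {
        let cell_ref = format!("{}{}", column_name(i as u32 + 1), row);
        write!(
            out,
            "<c r=\"{}\" t=\"inlineStr\"><is><t>{}</t></is></c>",
            cell_ref,
            escape_xml(value)
        )?;
    }
    write!(out, "</row>")
}

/// Stream rows from any iterator into a <sheetData> element.
/// Rows are numbered from 1 in iteration order.
fn write_sheet_data<W, I>(out: &mut W, rows: I) -> io::Result<()>
where
    W: Write,
    I: IntoIterator<Item = Vec<String>>,
{
    out.write_all(b"<sheetData>")?;
    for (i, row) in rows.into_iter().enumerate() {
        write_row(out, i as u32 + 1, &row)?;
    }
    out.write_all(b"</sheetData>")
}
```

Passing a `std::io::BufWriter` around a file as `out`, and a lazy iterator as `rows`, would keep only one row in memory at a time. The trade-off is that cells cannot be revisited or restyled after they are written, which is the kind of feature an in-memory model makes possible.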

@BharathIO

OK. Thanks for the update, @MathNya.

When I write around 17k records into a sheet, I observe high memory consumption. Is there anything I can do to make it use less memory? Please share your thoughts.

I observed that it consumed 800 to 900 MB of RAM while writing 17k records.
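
For scale, the figures reported above can be turned into a rough per-record cost. The sketch below only does that arithmetic. It assumes "MB" means MiB and that the whole process footprint is attributable to the records, neither of which the thread confirms. The 20-column, 32-bytes-per-cell record shape is purely hypothetical, chosen to give a point of comparison.

```rust
/// Average bytes of process memory per record, given a total in MiB.
fn bytes_per_record(total_mib: u64, records: u64) -> u64 {
    total_mib * 1024 * 1024 / records
}

/// Raw payload of one record under an assumed (hypothetical) shape:
/// `columns` cells of `bytes_per_cell` bytes each, with no overhead.
fn raw_payload_bytes(columns: u64, bytes_per_cell: u64) -> u64 {
    columns * bytes_per_cell
}

/// Print the comparison for the numbers reported in the thread.
fn print_estimate() {
    let records = 17_000;
    let low = bytes_per_record(800, records);
    let high = bytes_per_record(900, records);
    let raw = raw_payload_bytes(20, 32); // hypothetical record shape
    println!("reported: {low} to {high} bytes per record");
    println!("raw payload (assumed shape): {raw} bytes per record");
}
```

Under these assumptions the reported usage comes to roughly 50 KB per record, compared with well under 1 KB of raw data for the assumed record shape.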

@BharathIO

BharathIO commented Nov 11, 2024

Is there any way to write a custom serializer/deserializer for my use case to process 17k records?

@schungx
Contributor

schungx commented Nov 12, 2024

17k records in memory are probably going to take a large amount of RAM by themselves.

@BharathIO

But when I tried other libraries such as xlsxwriter, they did not consume a large amount of RAM.

The umya-spreadsheet library has many more features than other libraries; the only issue is the large RAM consumption and high CPU utilization. Are there any workarounds at the moment?

@schungx
Contributor

schungx commented Nov 12, 2024

> The umya-spreadsheet library has many more features than other libraries; the only issue is the large RAM consumption and high CPU utilization. Are there any workarounds at the moment?

I believe there are avenues to reduce the memory footprint of many data types, but essentially more features = more data types to keep track of. Therefore it is not always avoidable.

@schungx
Contributor

schungx commented Nov 16, 2024

Try my PR to see how much it reduces...

#242

@BharathIO

BharathIO commented Nov 19, 2024

> Try my PR to see how much it reduces...
>
> #242

Great, I can see a big improvement now in terms of memory and CPU utilization. I will validate a few more use cases and post my observations here.

@MathNya added this to the Version3.0.0 milestone on Nov 22, 2024
@schungx
Contributor

schungx commented Nov 27, 2024

@BharathIO I would be interested to know the memory usage in the new version.
