Issue 629 - Use a generator for Cells::getAllCacheKeys to improve performance #822

Closed
wants to merge 1 commit into from

@matt-allan commented Dec 18, 2018

Using a generator reduces memory usage and improves performance
when loading large spreadsheets.

This is:

- [x] a bugfix
- [ ] a new feature

Checklist:

Why is this change needed?

PhpSpreadsheet currently uses a lot of memory when loading large spreadsheets (see #648, #629). When Cells::getAllCacheKeys is called, every coordinate in the spreadsheet is copied into a new array. Because each coordinate is concatenated with another string to form its cache key, every key is a newly allocated string, and all of them stay in memory until the array is released.
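To make the problem concrete, here is a minimal sketch of the eager pattern described above. It is not PhpSpreadsheet's code: the function name buildAllCacheKeys, the plain $coordinates array, and the 'sheet1.' prefix are hypothetical stand-ins for however Cells actually stores coordinates and forms keys.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical eager version: builds every cache key up front.
 * Each concatenation allocates a new string, and the returned array
 * keeps all of those strings alive at the same time.
 *
 * @param string[] $coordinates
 * @return string[]
 */
function buildAllCacheKeys(array $coordinates, string $prefix): array
{
    $keys = [];
    foreach ($coordinates as $coordinate) {
        $keys[] = $prefix . $coordinate;
    }

    return $keys;
}

// Hypothetical coordinates; a real sheet may have hundreds of thousands.
$coordinates = ['A1', 'B1', 'A2'];
$keys = buildAllCacheKeys($coordinates, 'sheet1.');
```

Memory for $keys grows linearly with the number of cells, even if the caller only needs to look at one key at a time.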

The result of Cells::getAllCacheKeys is only ever passed to Psr\SimpleCache\CacheInterface::getMultiple and Psr\SimpleCache\CacheInterface::setMultiple. Since both of these methods accept an iterable, we can return a generator instead. A generator does not build the entire array in memory; it produces one key at a time, as the consumer asks for it.
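A sketch of the generator-based alternative, under the same kind of hypothetical assumptions. generateAllCacheKeys yields keys lazily, and getMultipleFromEmptyCache is an illustrative stub (not a real PSR-16 implementation) whose iterable parameter accepts either an array or a generator, which is the property this change relies on.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical lazy version: yields one cache key at a time, so only the
 * key currently being consumed has to exist as a new string.
 *
 * @param string[] $coordinates
 * @return Generator<int, string>
 */
function generateAllCacheKeys(array $coordinates, string $prefix): Generator
{
    foreach ($coordinates as $coordinate) {
        yield $prefix . $coordinate;
    }
}

/**
 * Illustrative stand-in for a PSR-16 getMultiple(): it only requires an
 * iterable, so arrays and generators are both accepted.
 *
 * @param iterable<string> $keys
 * @return array<string, mixed>
 */
function getMultipleFromEmptyCache(iterable $keys, $default = null): array
{
    $results = [];
    foreach ($keys as $key) {
        // A real cache would look $key up here; this stub treats every
        // key as a miss and returns the default value.
        $results[$key] = $default;
    }

    return $results;
}

$coordinates = ['A1', 'B1', 'A2'];
$fromGenerator = getMultipleFromEmptyCache(generateAllCacheKeys($coordinates, 'sheet1.'));
$fromArray = getMultipleFromEmptyCache(['sheet1.A1', 'sheet1.B1', 'sheet1.A2']);
```

One caveat of this design: a PHP generator can be traversed only once and cannot be rewound after iteration has started, so the swap is safe only where each call's keys are consumed a single time, as with a value passed straight into one getMultiple or setMultiple call.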

I benchmarked this with a spreadsheet of roughly 100K rows and saw a 16% improvement in run time and a 20% improvement in memory usage. You can view a comparison here.

You can also view the individual profiles here:

This is the benchmark script I used:

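```php
<?php

require __DIR__ . '/vendor/autoload.php';

// Load a workbook of roughly 100K rows with the Xlsx reader.
// This run was profiled before and after the change to compare time and memory.
$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx();

$spreadsheet = $reader->load(__DIR__ . '/my_export_100k.xlsx');
```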

I also attached the .xlsx file if you would like to run the benchmark yourself.
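The benchmark above measures a full workbook load. To isolate the mechanism behind the memory saving, the following synthetic sketch compares how much memory each approach retains for 100,000 hypothetical keys in a single column. The helpers buildAllCacheKeys and generateAllCacheKeys and the 'sheet1.' prefix are hypothetical, and the byte counts it prints depend on the PHP version and build, so they will not match the profile figures reported in this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical helpers mirroring the two approaches (not PhpSpreadsheet code).
function buildAllCacheKeys(array $coordinates, string $prefix): array
{
    $keys = [];
    foreach ($coordinates as $coordinate) {
        $keys[] = $prefix . $coordinate;
    }

    return $keys;
}

function generateAllCacheKeys(array $coordinates, string $prefix): Generator
{
    foreach ($coordinates as $coordinate) {
        yield $prefix . $coordinate;
    }
}

// Synthetic coordinates for one column of 100,000 rows.
$coordinates = [];
for ($row = 1; $row <= 100000; $row++) {
    $coordinates[] = 'A' . $row;
}

// Memory retained while every key is held in an array at once.
$before = memory_get_usage();
$keys = buildAllCacheKeys($coordinates, 'sheet1.');
$arrayBytes = memory_get_usage() - $before;
$arrayCount = count($keys);
unset($keys);

// Memory retained while iterating the generator: only the current key is
// alive at any point, so track the largest increase observed.
$before = memory_get_usage();
$generatorCount = 0;
$generatorPeakBytes = 0;
foreach (generateAllCacheKeys($coordinates, 'sheet1.') as $key) {
    $generatorCount++;
    $generatorPeakBytes = max($generatorPeakBytes, memory_get_usage() - $before);
}

printf("array: %d keys, %d bytes retained\n", $arrayCount, $arrayBytes);
printf("generator: %d keys, peak %d bytes retained\n", $generatorCount, $generatorPeakBytes);
```

Both approaches produce the same number of keys; the difference is that the array version holds all of them simultaneously, while the generator's footprint stays roughly constant regardless of sheet size.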

@PowerKiKi (Member)

Excellent PR, thanks!

guillaume-ro-fr pushed a commit to guillaume-ro-fr/PhpSpreadsheet that referenced this pull request Jun 12, 2019

Closes PHPOffice#822