-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter Not working? #251
Comments
Your MCVE did not run at all. But once adapted like so: <?php
require __DIR__ . '/vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;
class MyReadFilter implements \PhpOffice\PhpSpreadsheet\Reader\IReadFilter
{
private $startRow = 0;
private $endRow = 0;
private $columns = [];
/** Get the list of rows and columns to read */
public function __construct($startRow, $endRow, $columns)
{
$this->startRow = $startRow;
$this->endRow = $endRow;
$this->columns = $columns;
}
public function readCell($column, $row, $worksheetName = '')
{
// Only read the rows and columns that were configured
if ($row >= $this->startRow && $row <= $this->endRow) {
if (in_array($column, $this->columns)) {
return true;
}
}
return false;
}
}
// Create new Spreadsheet on disk
$spreadsheet = new Spreadsheet();
foreach (range('A', 'C') as $column) {
foreach (range(1, 3) as $row) {
$coord = $column . $row;
$spreadsheet->getActiveSheet()->setCellValue($coord, $coord);
}
}
$originalFilename = '/tmp/original.xlsx';
$writer = new Xlsx($spreadsheet);
$writer->save($originalFilename);
// Reload spreadsheet from disk with filter
$reader = new \PhpOffice\PhpSpreadsheet\Reader\Xlsx();
$filterSubset = new MyReadFilter(2, 2, range('A', 'B'));
$reader->setReadFilter($filterSubset);
$spreadsheet = $reader->load($originalFilename);
foreach ($spreadsheet->getActiveSheet()->getRowIterator() as $row) {
$cellIterator = $row->getCellIterator();
$cellIterator->setIterateOnlyExistingCells(false);
foreach ($cellIterator as $cell) {
var_dump([$cell->getCoordinate(), $cell->getValue()]);
}
} It will then correctly ouput the following, effectively ignoring anything higher than row 2 and column B, and showing NULL values for rows lower than 2:
So the filter is working as intended as far as I can tell. |
Well, that's what was exactly in the docs. Besides that, it seems counter intuitive that ignoring the rows just makes them null. Anyway, thanks for answering at least. I will have to just loop and not use the bad rows. |
The problem is that if you want to make chunks of a large excel for the memory issue, and you are filling all the object inside for any row that you are filtering, at the end I have the issue again In my case, I divided the excel in parts of 5000 rows, and each it's a complete new petition to the server. At the end of the first petition it consumes 208MB, at the end of the second (5000 nulls + 5000 useful rows) 412MB, at the end of the third 658MB... and each starts from 0. I think I should have a way to ask only the rows of the filter. |
That memory shouldn't be consumed if you use a filter, cells outside the filter range are not loaded into memory, so they don't consume memory, unless you then try to access any of those cells.... they don't contain nulls, they do not exist in memory at all. Note that using What you're suggesting should happen is basically asking the loader to change the addresses of the cells that are actually loaded, so if rows 1-10 are filtered, then all cells in row 11 actually becomes cells in rows 1; i.e. cell A11 is physically re-addresses to become cell A1.... that can create a lot of additional loader overhead. |
First, thanks for the option of the I get your point regarding formulas, but if I'm using a Filter and there are formulas that include cells in the area outside the filter, the formula will give a wrong number anyway. So, it will be my problem to solve that (changing the Filter or whatever). What I mean is that you are giving a wonderful utility to minimize the memory use and at the same time using some of that space because the Cells should continue existing. In fact, it's interesting that if my filter is for the first 100 rows, the row 101 doesn't exist and you don't care if a formula in the cell A100 asks for the A101, but if I'm reading the second 100 rows, then the first group of 100 should exist because of any formula... Sorry if I'm not expressing right |
That would always be a problem; but in the example I cited for a formula in a cell inside the filtered range that references other cells that are also inside the filtered range, it would need to be updated otherwise I'd get 10x the number of complaints for wrong values; and with the differences between relative and absolute references, it might not be possible to update it; and any update would be a processing overhead, slowing the load time. The general principle behind filtering was that if the filter said to load rows 101-200, then it would load only those rows, and they would still be loaded as rows addressable as 101-200... that seemed intuitive at the time, and I still believe that is how it should work. My biggest mistake was that |
This is:
What is the expected behavior?
Using the code in the docs, this should only display rows 3 - 5.
What is the current behavior?
This outputs all rows up to 5.
What are the steps to reproduce?
Please provide a Minimal, Complete, and Verifiable example of code that exhibits the issue without relying on an external Excel file or a web server:
Output displays 1st 5 rows. The first 2 are garbage and that is what I'm trying to get rid of.
Which versions of PhpSpreadsheet and PHP are affected?
1.0.0 beta
The text was updated successfully, but these errors were encountered: