Skip to content

Commit

Permalink
Xlsx Reader Namespace Aware Handling of Drawings, RowAndColumnAttribu…
Browse files Browse the repository at this point in the history
…tes, MergeCells

Fix PHPOffice#1482 (actually fix a problem recently attached to that ticket long after it closed). There were problems processing a spreadsheet generated by third party software. That spreadsheet used unexpected namespacing, and absolute paths within the zip file where relative paths were expected.

Xlsx Reader handles most, but not all, of its processing in a namespace-aware manner. Two versions of a worksheet's xml are available - `$xmlSheet` is not namespace aware and `$xmlSheetNS` is aware. This was necessary in order to add namespace support in an incremental manner. The primary reason to continue to use the unaware version is the absence of test cases. In particular, drawings, row and column attributes, and merge cells continue to use the unaware version; this PR changes those to use the aware version.

As noted in the summary above, a couple of new places in the handling of the those items were expecting file locations to be specified as relative paths in the zip file, but the file used absolute paths instead. Those unexpected usages are now addressed.

The user reporting the new problem tried a change which effectively made all uses of `$xmlSheet` namespace aware, and that seemed helpful. It may be time eliminate its usage altogether, whether or not we have appropriate examples of unexpected namespaces to test with. I will not do that with this change, but I may add a new PR to do so after this one is merged. Remaining areas which still use the unaware version include conditional formatting (internal or external), sheet view options, sheet protection, auto filters, unparsed loaded data, data validation (internal or external), alternate content, and header/footer images.

There is an interesting anomaly with the new test file. When I load it and save it, the appearance of the output file does not quite match the input. Oddly, the output file seems much better than the input - the picture no longer covers any data, for example. This is because, in particular, the output file row heights and column widths seem to match the xml, but the input file does not. For example, the xml in both files seems to indicate that row 5 should have a height of 234, which it does in the output file, but the height of that row when the input file is opened is 156. It appears that all row heights and column widths when the input file is opened are very close to 2/3 of what is expected. I will continue to research that anomaly for a few days, but I will not let it prevent me from moving forward with this PR if I don't find the explanation. Whatever that problem is, it seems distinct from the namespacing/pathing problems which the PR addresses.
  • Loading branch information
oleibman committed Oct 23, 2022
1 parent f1d73d8 commit d0ffd82
Show file tree
Hide file tree
Showing 6 changed files with 241 additions and 46 deletions.
13 changes: 13 additions & 0 deletions samples/Basic/27_Images_Xlsx.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
<?php

use PhpOffice\PhpSpreadsheet\IOFactory;

require __DIR__ . '/../Header.php';

// Read from Xlsx (.xls) template
$helper->log('Load Xlsx template file');
$reader = IOFactory::createReader('Xlsx');
$spreadsheet = $reader->load(__DIR__ . '/../templates/27template.xlsx');

// Save
$helper->write($spreadsheet, __FILE__);
Binary file added samples/templates/27template.xlsx
Binary file not shown.
26 changes: 18 additions & 8 deletions src/PhpSpreadsheet/Reader/Xlsx.php
Original file line number Diff line number Diff line change
Expand Up @@ -763,7 +763,7 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
$sheetViewOptions = new SheetViewOptions($docSheet, $xmlSheet);
$sheetViewOptions->load($this->getReadDataOnly(), $this->styleReader);

(new ColumnAndRowAttributes($docSheet, $xmlSheet))
(new ColumnAndRowAttributes($docSheet, $xmlSheetNS))
->load($this->getReadFilter(), $this->getReadDataOnly());
}

Expand Down Expand Up @@ -913,11 +913,12 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
$this->readTables($xmlSheet, $docSheet, $dir, $fileWorksheet, $zip);
}

if ($xmlSheet && $xmlSheet->mergeCells && $xmlSheet->mergeCells->mergeCell && !$this->readDataOnly) {
foreach ($xmlSheet->mergeCells->mergeCell as $mergeCell) {
$mergeRef = (string) $mergeCell['ref'];
if ($xmlSheetNS && $xmlSheetNS->mergeCells && $xmlSheetNS->mergeCells->mergeCell && !$this->readDataOnly) {
foreach ($xmlSheetNS->mergeCells->mergeCell as $mergeCellx) {
$mergeCell = $mergeCellx->attributes();
$mergeRef = (string) ($mergeCell['ref'] ?? '');
if (strpos($mergeRef, ':') !== false) {
$docSheet->mergeCells((string) $mergeCell['ref'], Worksheet::MERGE_CELL_CONTENT_HIDE);
$docSheet->mergeCells($mergeRef, Worksheet::MERGE_CELL_CONTENT_HIDE);
}
}
}
Expand Down Expand Up @@ -1229,6 +1230,9 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
if (substr($drawingFilename, 0, 7) === 'xl//xl/') {
$drawingFilename = substr($drawingFilename, 4);
}
if (substr($drawingFilename, 0, 8) === '/xl//xl/') {
$drawingFilename = substr($drawingFilename, 5);
}
if ($zip->locateName($drawingFilename)) {
$relsWorksheet = $this->loadZipNoNamespace($drawingFilename, Namespaces::RELATIONSHIPS);
$drawings = [];
Expand All @@ -1243,10 +1247,10 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
}
}

if ($xmlSheet->drawing && !$this->readDataOnly) {
if ($xmlSheetNS->drawing && !$this->readDataOnly) {
$unparsedDrawings = [];
$fileDrawing = null;
foreach ($xmlSheet->drawing as $drawing) {
foreach ($xmlSheetNS->drawing as $drawing) {
$drawingRelId = (string) self::getArrayItem(self::getAttributes($drawing, $xmlNamespaceBase), 'id');
$fileDrawing = $drawings[$drawingRelId];
$drawingFilename = dirname($fileDrawing) . '/_rels/' . basename($fileDrawing) . '.rels';
Expand All @@ -1261,7 +1265,13 @@ protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
$hyperlinks[(string) $ele['Id']] = (string) $ele['Target'];
}
if ($eleType === "$xmlNamespaceBase/image") {
$images[(string) $ele['Id']] = self::dirAdd($fileDrawing, $ele['Target']);
$eleTarget = (string) $ele['Target'];
if (substr($eleTarget, 0, 4) === '/xl/') {
$eleTarget = substr($eleTarget, 1);
$images[(string) $ele['Id']] = $eleTarget;
} else {
$images[(string) $ele['Id']] = self::dirAdd($fileDrawing, $eleTarget);
}
} elseif ($eleType === "$xmlNamespaceBase/chart") {
if ($this->includeCharts) {
$eleTarget = (string) $ele['Target'];
Expand Down
85 changes: 47 additions & 38 deletions src/PhpSpreadsheet/Reader/Xlsx/ColumnAndRowAttributes.php
Original file line number Diff line number Diff line change
Expand Up @@ -137,39 +137,45 @@ private function readColumnAttributes(SimpleXMLElement $worksheetCols, bool $rea
{
$columnAttributes = [];

foreach ($worksheetCols->col as $column) {
$startColumn = Coordinate::stringFromColumnIndex((int) $column['min']);
$endColumn = Coordinate::stringFromColumnIndex((int) $column['max']);
++$endColumn;
for ($columnAddress = $startColumn; $columnAddress !== $endColumn; ++$columnAddress) {
$columnAttributes[$columnAddress] = $this->readColumnRangeAttributes($column, $readDataOnly);

if ((int) ($column['max']) == 16384) {
break;
foreach ($worksheetCols->col as $columnx) {
$column = $columnx->attributes();
if ($column !== null) {
$startColumn = Coordinate::stringFromColumnIndex((int) $column['min']);
$endColumn = Coordinate::stringFromColumnIndex((int) $column['max']);
++$endColumn;
for ($columnAddress = $startColumn; $columnAddress !== $endColumn; ++$columnAddress) {
$columnAttributes[$columnAddress] = $this->readColumnRangeAttributes($column, $readDataOnly);

if ((int) ($column['max']) == 16384) {
break;
}
}
}
}

return $columnAttributes;
}

private function readColumnRangeAttributes(SimpleXMLElement $column, bool $readDataOnly): array
private function readColumnRangeAttributes(?SimpleXMLElement $column, bool $readDataOnly): array
{
$columnAttributes = [];

if ($column['style'] && !$readDataOnly) {
$columnAttributes['xfIndex'] = (int) $column['style'];
}
if (self::boolean($column['hidden'])) {
$columnAttributes['visible'] = false;
}
if (self::boolean($column['collapsed'])) {
$columnAttributes['collapsed'] = true;
}
if (((int) $column['outlineLevel']) > 0) {
$columnAttributes['outlineLevel'] = (int) $column['outlineLevel'];
if ($column !== null) {
if (isset($column['style']) && !$readDataOnly) {
$columnAttributes['xfIndex'] = (int) $column['style'];
}
if (isset($column['hidden']) && self::boolean($column['hidden'])) {
$columnAttributes['visible'] = false;
}
if (isset($column['collapsed']) && self::boolean($column['collapsed'])) {
$columnAttributes['collapsed'] = true;
}
if (isset($column['outlineLevel']) && ((int) $column['outlineLevel']) > 0) {
$columnAttributes['outlineLevel'] = (int) $column['outlineLevel'];
}
if (isset($column['width'])) {
$columnAttributes['width'] = (float) $column['width'];
}
}
$columnAttributes['width'] = (float) $column['width'];

return $columnAttributes;
}
Expand All @@ -189,21 +195,24 @@ private function readRowAttributes(SimpleXMLElement $worksheetRow, bool $readDat
{
$rowAttributes = [];

foreach ($worksheetRow as $row) {
if ($row['ht'] && !$readDataOnly) {
$rowAttributes[(int) $row['r']]['rowHeight'] = (float) $row['ht'];
}
if (self::boolean($row['hidden'])) {
$rowAttributes[(int) $row['r']]['visible'] = false;
}
if (self::boolean($row['collapsed'])) {
$rowAttributes[(int) $row['r']]['collapsed'] = true;
}
if ((int) $row['outlineLevel'] > 0) {
$rowAttributes[(int) $row['r']]['outlineLevel'] = (int) $row['outlineLevel'];
}
if ($row['s'] && !$readDataOnly) {
$rowAttributes[(int) $row['r']]['xfIndex'] = (int) $row['s'];
foreach ($worksheetRow as $rowx) {
$row = $rowx->attributes();
if ($row !== null) {
if (isset($row['ht']) && !$readDataOnly) {
$rowAttributes[(int) $row['r']]['rowHeight'] = (float) $row['ht'];
}
if (isset($row['hidden']) && self::boolean($row['hidden'])) {
$rowAttributes[(int) $row['r']]['visible'] = false;
}
if (isset($row['collapsed']) && self::boolean($row['collapsed'])) {
$rowAttributes[(int) $row['r']]['collapsed'] = true;
}
if (isset($row['outlineLevel']) && (int) $row['outlineLevel'] > 0) {
$rowAttributes[(int) $row['r']]['outlineLevel'] = (int) $row['outlineLevel'];
}
if (isset($row['s']) && !$readDataOnly) {
$rowAttributes[(int) $row['r']]['xfIndex'] = (int) $row['s'];
}
}
}

Expand Down
163 changes: 163 additions & 0 deletions tests/PhpSpreadsheetTests/Reader/Xlsx/Issue1482Test.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,163 @@
<?php

namespace PhpOffice\PhpSpreadsheetTests\Reader\Xlsx;

use PhpOffice\PhpSpreadsheet\Reader\Xlsx;

class Issue1482Test extends \PHPUnit\Framework\TestCase
{
/**
* @var string
*/
private static $testbook = 'tests/data/Reader/XLSX/issue.1482.xlsx';

public function testPreliminaries(): void
{
$file = 'zip://';
$file .= self::$testbook;
$file .= '#xl/worksheets/sheet.xml';
$data = file_get_contents($file);
// confirm that file contains expected namespaced xml tag
if ($data === false) {
self::fail('Unable to read file');
} else {
self::assertStringContainsString('<x:drawing r:id="rId1" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" />', $data);
}
}

public function testLoadXlsx(): void
{
$reader = new Xlsx();
$spreadsheet = $reader->load(self::$testbook);
$sheet = $spreadsheet->getSheet(0);
$expectedColWidths = [
0.95,
7.45,
0.6,
4.4,
5.91,
3.92,
7.71,
2.12,
2.56,
6.94,
9.6,
1.82,
3.01,
8.63,
10.37,
2.09,
];
$columnDimensions = $sheet->getColumnDimensions();
$actualColWidths = [];
foreach ($columnDimensions as $columnDimension) {
$actualColWidths[] = $columnDimension->getWidth();
}
self::assertSame($expectedColWidths, $actualColWidths);
$expectedRowHeights = [
14.0,
7.0,
14.0,
80.0,
234.0,
63.0,
13.0,
19.0,
18.0,
18.0,
18.0,
18.0,
18.0,
18.0,
18.0,
18.0,
];
$rowHeights = $sheet->getRowDimensions();
$actualRowHeights = [];
foreach ($rowHeights as $rowDimension) {
$actualRowHeights[] = $rowDimension->getRowHeight();
}
self::assertSame($expectedRowHeights, $actualRowHeights);
$expectedMergeCells = [
'B1:E1',
'F1:G1',
'J1:L1',
'M1:N1',
'B3:E3',
'F3:G3',
'B5:O5',
'A7:C7',
'D7:M7',
'A8:B8',
'C8:D8',
'E8:F8',
'G8:H8',
'I8:J8',
'L8:P8',
'A9:B9',
'C9:D9',
'E9:F9',
'G9:H9',
'I9:J9',
'L9:P9',
'A10:B10',
'C10:D10',
'E10:F10',
'G10:H10',
'I10:J10',
'L10:P10',
'A11:B11',
'C11:D11',
'E11:F11',
'G11:H11',
'I11:J11',
'L11:P11',
'A12:B12',
'C12:D12',
'E12:F12',
'G12:H12',
'I12:J12',
'L12:P12',
'A13:B13',
'C13:D13',
'E13:F13',
'G13:H13',
'I13:J13',
'L13:P13',
'A14:B14',
'C14:D14',
'E14:F14',
'G14:H14',
'I14:J14',
'L14:P14',
'A15:B15',
'C15:D15',
'E15:F15',
'G15:H15',
'I15:J15',
'L15:P15',
'A16:B16',
'C16:D16',
'E16:F16',
'G16:H16',
'I16:J16',
'L16:P16',
];
self::assertSame($expectedMergeCells, array_keys($sheet->getMergeCells()));

$drawingCollection = $sheet->getDrawingCollection();
self::assertCount(1, $drawingCollection);
$drawing = $drawingCollection[0];
if ($drawing === null) {
self::fail('Unexpected null instead of drawing');
} else {
self::assertSame('B5', $drawing->getCoordinates());
self::assertSame('', $drawing->getCoordinates2());
self::assertFalse($drawing->getResizeProportional());
self::assertSame(600, $drawing->getWidth());
self::assertSame(312, $drawing->getHeight());
}
// no name, no description, no type
$spreadsheet->disconnectWorksheets();
}
}
Binary file added tests/data/Reader/XLSX/issue.1482.xlsx
Binary file not shown.

0 comments on commit d0ffd82

Please sign in to comment.