-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Failed to load Worksheets on some .xlsx-files with namespaces #1482
Comments
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Just in case smebody else is stumbling over this issue as I did: in my case it helped to change Line 740 of Reader/Xlsx.php to: $xmlSheet = $this->loadZip("$dir/$fileWorksheet", $mainNS); Note the change from |
Can you upload a file which PhpSpreadsheet handles incorrectly without your suggested change, but correctly with it? |
Sure. The appended file is exported from a thrid party software. The textual content itself is parsed correctly but you cannot access e.g. the image inside of this file (which I urgently needed). An example file: 2022-09-05 15-02-45+02-00-02_1.xlsx Btw, this might be related to #274. Edit: one additional remark: when you open the file with excel, save it (without changes) and try again, PhpSpreadsheet will recognize the image since Excel removed all the namespaces. |
Thank you for the sample. Problem confirmed. There are several problems with how the code handles this particular spreadsheet. Namespacing doesn't appear to be one of the problems, but the use of absolute paths in the zip file where relative paths are expected turns up in several places. I will work on it. In the meantime, can I ask you to open a new issue to which I can ultimately tie the solution, rather than using this old and closed issue. |
I have now identified some namespacing problems as well (which is why your workaround sort-of worked), affecting things like row height as well as reading the image on your spreadsheet. |
…tes, MergeCells Fix PHPOffice#1482 (actually fix a problem recently attached to that ticket long after it closed). There were problems processing a spreadsheet generated by third party software. That spreadsheet used unexpected namespacing, and absolute paths within the zip file where relative paths were expected. Xlsx Reader handles most, but not all, of its processing in a namespace-aware manner. Two versions of a worksheet's xml are available - `$xmlSheet` is not namespace aware and `$xmlSheetNS` is aware. This was necessary in order to add namespace support in an incremental manner. The primary reason to continue to use the unaware version is the absence of test cases. In particular, drawings, row and column attributes, and merge cells continue to use the unaware version; this PR changes those to use the aware version. As noted in the summary above, a couple of new places in the handling of the those items were expecting file locations to be specified as relative paths in the zip file, but the file used absolute paths instead. Those unexpected usages are now addressed. The user reporting the new problem tried a change which effectively made all uses of `$xmlSheet` namespace aware, and that seemed helpful. It may be time eliminate its usage altogether, whether or not we have appropriate examples of unexpected namespaces to test with. I will not do that with this change, but I may add a new PR to do so after this one is merged. Remaining areas which still use the unaware version include conditional formatting (internal or external), sheet view options, sheet protection, auto filters, unparsed loaded data, data validation (internal or external), alternate content, and header/footer images. There is an interesting anomaly with the new test file. When I load it and save it, the appearance of the output file does not quite match the input. Oddly, the output file seems much better than the input - the picture no longer covers any data, for example. This is because, in particular, the output file row heights and column widths seem to match the xml, but the input file does not. For example, the xml in both files seems to indicate that row 5 should have a height of 234, which it does in the output file, but the height of that row when the input file is opened is 156. It appears that all row heights and column widths when the input file is opened are very close to 2/3 of what is expected. I will continue to research that anomaly for a few days, but I will not let it prevent me from moving forward with this PR if I don't find the explanation. Whatever that problem is, it seems distinct from the namespacing/pathing problems which the PR addresses.
…ls (#3137) * Xlsx Reader Namespace Aware Handling of Drawings, RowAndColumnAttributes, MergeCells Fix #1482 (actually fix a problem recently attached to that ticket long after it closed). There were problems processing a spreadsheet generated by third party software. That spreadsheet used unexpected namespacing, and absolute paths within the zip file where relative paths were expected. Xlsx Reader handles most, but not all, of its processing in a namespace-aware manner. Two versions of a worksheet's xml are available - `$xmlSheet` is not namespace aware and `$xmlSheetNS` is aware. This was necessary in order to add namespace support in an incremental manner. The primary reason to continue to use the unaware version is the absence of test cases. In particular, drawings, row and column attributes, and merge cells continue to use the unaware version; this PR changes those to use the aware version. As noted in the summary above, a couple of new places in the handling of the those items were expecting file locations to be specified as relative paths in the zip file, but the file used absolute paths instead. Those unexpected usages are now addressed. The user reporting the new problem tried a change which effectively made all uses of `$xmlSheet` namespace aware, and that seemed helpful. It may be time eliminate its usage altogether, whether or not we have appropriate examples of unexpected namespaces to test with. I will not do that with this change, but I may add a new PR to do so after this one is merged. Remaining areas which still use the unaware version include conditional formatting (internal or external), sheet view options, sheet protection, auto filters, unparsed loaded data, data validation (internal or external), alternate content, and header/footer images. There is an interesting anomaly with the new test file. When I load it and save it, the appearance of the output file does not quite match the input. Oddly, the output file seems much better than the input - the picture no longer covers any data, for example. This is because, in particular, the output file row heights and column widths seem to match the xml, but the input file does not. For example, the xml in both files seems to indicate that row 5 should have a height of 234, which it does in the output file, but the height of that row when the input file is opened is 156. It appears that all row heights and column widths when the input file is opened are very close to 2/3 of what is expected. I will continue to research that anomaly for a few days, but I will not let it prevent me from moving forward with this PR if I don't find the explanation. Whatever that problem is, it seems distinct from the namespacing/pathing problems which the PR addresses. * Scrutinizer New False Positives Eliminate them with annotations.
This is:
What is the expected behavior?
List all the worksheets / Load them
What is the current behavior?
Worksheets are empty (printing the following in the below example:
What are the steps to reproduce?
This, however, relies on an .xlsx-file with namespaces. I know you asked for code without that - but that is not possible in this case.
I know this file was generated by a system - not Excel itself. It should however be valid.
I can't post the original file, but the content of xl\workbook.xml in the .xlsx-Archive is (originally without line breaks:
If I open the file in Excel and save it again it becomes (again originally without formatting)
This again is readable by PhpSpreadsheet. However, it does not make sense to do this a 1000 times ;)
What I found is, that in the first namespaced example the following code produces a false and so the worksheets are empty:
if ($xmlWorkbook->sheets) {
in vendor\phpoffice\phpspreadsheet\src\PhpSpreadsheet\Reader\Xlsx.php on line 600). The corresponding loading procedure can be found in line 574.I think one needs to just swap the
simplexml_load_string
with a namespace-aware method - but I don't know how.Which versions of PhpSpreadsheet and PHP are affected?
I am using the current PhpSpreadsheet 1.12.0 and PHP 7.4.5.
The text was updated successfully, but these errors were encountered: