Formula Misidentifying Text as Cell After Insertion/Deletion #3915

Merged: 3 commits, Mar 1, 2024

oleibman (Collaborator)

Fix #3907. After a row or column is inserted or deleted, PhpSpreadsheet updates any formula that refers to cells which have moved. However, it can misidentify other text within the formula as a cell address. Examples:

  • =SUM(A2,'F1 (SETTINGS)'!A1:B1) It identifies F1 as a cell address.
  • =SUM(A2,'x F1 (SETTINGS)'!A1:B1) It identifies F1 as a cell address. (This looks the same as the above, but, for technical reasons, it's different.)
  • =SUM(A2,definedname1A1) It identifies A1 as a cell address.
  • Sheet names in formulas are compared case-sensitively, but should be compared case-insensitively. This matters when a formula includes its own sheet name: on sheet Data, the formula =SUM(DATA!A1:A2) might have to change, but the existing logic leaves it untouched.

The defined name part is fairly straightforward. The regular expressions that identify a cell address just have to be a bit more robust. They used a negative look-behind that rejected a preceding alphabetic character or dollar sign; underscore, period, and digits, all of which can be part of a defined name, need to be added to that list.
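
To illustrate the look-behind change, here is a minimal sketch. The two patterns and the helper `findCellLikeTokens` are hypothetical simplifications written for this example; they are not the regular expressions PhpSpreadsheet actually uses.

```php
<?php
// Hypothetical, simplified patterns for illustration only.
// Narrow guard: a cell address may not be preceded by a letter or a dollar sign.
$narrow = '/(?<![A-Za-z$])\$?[A-Z]{1,3}\$?[0-9]+/';
// Wider guard: also reject a preceding digit, underscore, or period,
// since all of those can appear inside a defined name.
$wide = '/(?<![A-Za-z0-9_.$])\$?[A-Z]{1,3}\$?[0-9]+/';

// Return every substring of $formula that the pattern treats as a cell address.
function findCellLikeTokens(string $pattern, string $formula): array
{
    preg_match_all($pattern, $formula, $matches);
    return $matches[0];
}

$formula = '=SUM(A2,definedname1A1)';
$narrowHits = findCellLikeTokens($narrow, $formula); // ['A2', 'A1']: the tail of the defined name is picked up
$wideHits = findCellLikeTokens($wide, $formula);     // ['A2']: the "1" before "A1" now blocks the match
```

With the narrow guard, the digit in `definedname1` does not stop the match, so `A1` is treated as a cell. Adding digits, underscore, and period to the look-behind class is enough to exclude it in this simplified setting.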

The other situations need a bit of a kludge, but not one so bad that I'm ashamed of it. Before analysis, each sheet name in the formula is replaced with a Unicode placeholder: U+FFFD (sheet name does not match the current sheet), U+FFFC (sheet name, enclosed in apostrophes, matches the current sheet), or U+FFFB (sheet name, not enclosed in apostrophes, matches the current sheet). This prevents the existing regular expressions from finding a cell address within a sheet name, and makes it easy to restore the original, with or without apostrophes, when the sheet name matches the current sheet and the cell(s) which it qualifies have to be changed.
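
The placeholder idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the names `maskSheetNames` and `unmaskSheetNames`, the sheet-name pattern, and the `$saved` list used to restore non-matching names are all hypothetical, and `strcasecmp` only handles ASCII case differences.

```php
<?php
// Illustrative sketch only; names and pattern are hypothetical.
const OTHER_SHEET = "\u{FFFD}";      // sheet name differs from the current sheet
const CURRENT_QUOTED = "\u{FFFC}";   // current sheet, written in apostrophes
const CURRENT_UNQUOTED = "\u{FFFB}"; // current sheet, written without apostrophes

// Replace each "Sheet!" prefix with a placeholder; non-matching names go into $saved.
function maskSheetNames(string $formula, string $currentSheet, array &$saved): string
{
    $saved = [];
    // Quoted name ('' is an escaped apostrophe) or a simple unquoted name, each followed by "!".
    $pattern = "/'((?:[^']|'')+)'!|([A-Za-z_][A-Za-z0-9_.]*)!/";
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($currentSheet, &$saved): string {
            $quoted = $m[1] !== '';
            $name = $quoted ? str_replace("''", "'", $m[1]) : $m[2];
            // Case-insensitive comparison (ASCII only in this sketch).
            if (strcasecmp($name, $currentSheet) === 0) {
                return ($quoted ? CURRENT_QUOTED : CURRENT_UNQUOTED) . '!';
            }
            $saved[] = substr($m[0], 0, -1); // keep the original text without the "!"
            return OTHER_SHEET . '!';
        },
        $formula
    );
}

// Put sheet names back: current-sheet placeholders from $currentSheet, others from $saved in order.
function unmaskSheetNames(string $masked, string $currentSheet, array $saved): string
{
    $quoted = "'" . str_replace("'", "''", $currentSheet) . "'";
    $restored = str_replace([CURRENT_QUOTED, CURRENT_UNQUOTED], [$quoted, $currentSheet], $masked);
    $next = 0;
    return preg_replace_callback(
        '/' . OTHER_SHEET . '/',
        function (array $m) use ($saved, &$next): string {
            return $saved[$next++];
        },
        $restored
    );
}

$saved = [];
$formula = "=SUM(A2,'F1 (SETTINGS)'!A1:B1)+Data!B3+'Data'!C4";
$masked = maskSheetNames($formula, 'Data', $saved);
// $masked is "=SUM(A2,\u{FFFD}!A1:B1)+\u{FFFB}!B3+\u{FFFC}!C4": no "F1" is left for a cell pattern to find.
$restored = unmaskSheetNames($masked, 'Data', $saved);
```

One consequence of this particular sketch is that a matching name is restored with the current sheet's own spelling, so `DATA!A1` would come back as `Data!A1`.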

Tests are added for all the situations mentioned above. No existing tests required changes.

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

oleibman added this pull request to the merge queue on Mar 1, 2024
Merged via the queue into PHPOffice:master with commit 76b2dc7 on Mar 1, 2024
13 of 14 checks passed
oleibman deleted the issue3907 branch on March 1, 2024 01:28
Successfully merging this pull request may close these issues.

Incorrect Handling of Sheet Names Resembling Cell Addresses in Formulas