-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change Hash Code for Worksheet #4207
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fix PHPOffice#4192. Although that issue can be dealt with by changing user code, it would be better to fix it within PhpSpreadsheet. A cloned worksheet may have a pointer to a spreadsheet to which it is not attached. Code can assume it does belong to the spreadsheet, and throw an exception when the spreadsheet cannot find the worksheet in question. It may also not throw an exception when it should. In my comments to the issue, I was concerned that adding in the needed protection would add overhead to an extremely common situation (setting a cell's value) in order to avoid a pretty rare problem. However, there are problems with both the accuracy and efficiency of the existing code, and I think any performance losses caused by the additional checks will be offset by the performance gains and accuracy of the new code. Spreadsheet `getIndex` attempts to find the index of a worksheet within its spreadsheet collection. It does so by comparing the hash codes of each sheet in its collection with the hash code of the sheet it is looking for. Its major problem problem is performance-related, namely that it recomputes the hash code of the target sheet with each iteration. A more severe problem is the accuracy of the hash code. It generates this by hashing together the sheet title, the string range of its auto-filter, and a character representation of whether sheet protection is enabled. Title should definitely be part of the calculation (it must be unique for all sheets attached to a spreadsheet), but it is not clear why this subset of the other properties of Worksheet is used. It tries to save some cycles by using a `dirty` property to indicate whether re-hashing is necessary. It sets that property whenever the title changes, or when `setProtection` is called. So, it doesn't set it when auto-filter changes, and you can easily bypass `setProtection` when changing any of the `Protection` properties. Not to mention the many other properties of worksheet that can be changed. Additionally, if you clone a worksheet, the clone and the original will have the same hash code, which can lead to problems: ```php $clone = clone $original; $spreadsheet->getSheet($spreadsheet->getIndex($clone)) ->setCellValue('A1', 100); ``` That code will change the value of A1 in the original, not the clone. The `hash` property in Worksheet will now be calculated immediately when the object is constructed or cloned or unserialized. It will not be recalculated, and there is no longer a need for the `dirty` property, which is removed. Hash will be generated by spl_object_id, which was designed for this purpose. (So was spl_object_hash, but many online references suggest that \_id performs much better than \_hash.) Our problem example above will now throw an Exception, as it should, rather than changing the wrong cell. `setValueExplicit`, the problem in the original issue, will now test that the worksheet is attached to the spreadsheet before doing any style manipulation. In order that this not be a breaking change, `getHashCode` will continue to return string, but it is deprecated in favor of `getHashInt`, and Worksheet will no longer implement IComparable to facilitate the deprecation. I had a vague hope that this change might help with issue PHPOffice#641. It doesn't.
See if typehint helps.
oleibman
added a commit
to oleibman/PhpSpreadsheet
that referenced
this pull request
Jan 7, 2025
Backport of [PR PHPOffice#4207](PHPOffice#4207)
11 tasks
oleibman
added a commit
to oleibman/PhpSpreadsheet
that referenced
this pull request
Jan 7, 2025
11 tasks
oleibman
added a commit
to oleibman/PhpSpreadsheet
that referenced
this pull request
Jan 7, 2025
Backport of PR PHPOffice#4207.
11 tasks
oleibman
added a commit
that referenced
this pull request
Jan 8, 2025
oleibman
added a commit
that referenced
this pull request
Jan 8, 2025
oleibman
added a commit
that referenced
this pull request
Jan 8, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fix #4192. Although that issue can be dealt with by changing user code, it would be better to fix it within PhpSpreadsheet. A cloned worksheet may have a pointer to a spreadsheet to which it is not attached. Code can assume it does belong to the spreadsheet, and throw an exception when the spreadsheet cannot find the worksheet in question. It may also not throw an exception when it should.
In my comments to the issue, I was concerned that adding in the needed protection would add overhead to an extremely common situation (setting a cell's value) in order to avoid a pretty rare problem. However, there are problems with both the accuracy and efficiency of the existing code, and I think any performance losses caused by the additional checks will be offset by the performance gains and accuracy of the new code.
Spreadsheet
getIndex
attempts to find the index of a worksheet within its spreadsheet collection. It does so by comparing the hash codes of each sheet in its collection with the hash code of the sheet it is looking for. Its major problem problem is performance-related, namely that it recomputes the hash code of the target sheet with each iteration.A more severe problem is the accuracy of the hash code. It generates this by hashing together the sheet title, the string range of its auto-filter, and a character representation of whether sheet protection is enabled. Title should definitely be part of the calculation (it must be unique for all sheets attached to a spreadsheet), but it is not clear why this subset of the other properties of Worksheet is used. It tries to save some cycles by using a
dirty
property to indicate whether re-hashing is necessary. It sets that property whenever the title changes, or whensetProtection
is called. So, it doesn't set it when auto-filter changes, and you can easily bypasssetProtection
when changing any of theProtection
properties. Not to mention the many other properties of worksheet that can be changed. Additionally, if you clone a worksheet, the clone and the original will have the same hash code, which can lead to problems:That code will change the value of A1 in the original, not the clone.
The
hash
property in Worksheet will now be calculated immediately when the object is constructed or cloned or unserialized. It will not be recalculated, and there is no longer a need for thedirty
property, which is removed. Hash will be generated by spl_object_id, which was designed for this purpose. (So was spl_object_hash, but many online references suggest that _id performs much better than _hash.) Our problem example above will now throw an Exception, as it should, rather than changing the wrong cell.setValueExplicit
, the problem in the original issue, will now test that the worksheet is attached to the spreadsheet before doing any style manipulation. In order that this not be a breaking change,getHashCode
will continue to return string, but it is deprecated in favor ofgetHashInt
, and Worksheet will no longer implement IComparable to facilitate the deprecation.I had a vague hope that this change might help with issue #641. It doesn't.
This is:
Checklist:
Why this change is needed?
Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.