MATCH() match type smallest value has inferior handling of numeric types than greatest or equal matches #3141

Closed
Natrak opened this issue Oct 25, 2022 · 2 comments · Fixed by #3142

Natrak commented Oct 25, 2022

`matchSmallestValue` has no extra handling to make sure double and integer types are treated as matchable. It uses:
`$typeMatch = gettype($lookupValue) === gettype($lookupArrayValue);`

Both `matchFirstValue` and `matchLargestValue` use:
`$typeMatch = ((gettype($lookupValue) === gettype($lookupArrayValue)) || (is_numeric($lookupValue) && is_numeric($lookupArrayValue)));`

As a result, `matchSmallestValue` skips valid comparisons when one of the values is an integer and the other a double.

I had no problems just reusing the logic from the other matching functions.
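
For anyone following along, here is a minimal standalone sketch of the difference between the two checks quoted above. The function names `strictTypeMatch` and `numericAwareTypeMatch` are made up for illustration and are not part of PhpSpreadsheet; only the two expressions inside them come from the issue text.

```php
<?php

// Hypothetical wrapper around the check quoted for matchSmallestValue.
function strictTypeMatch($lookupValue, $lookupArrayValue): bool
{
    return gettype($lookupValue) === gettype($lookupArrayValue);
}

// Hypothetical wrapper around the check quoted for the other two functions.
function numericAwareTypeMatch($lookupValue, $lookupArrayValue): bool
{
    return (gettype($lookupValue) === gettype($lookupArrayValue))
        || (is_numeric($lookupValue) && is_numeric($lookupArrayValue));
}

// gettype(2) is "integer" and gettype(2.0) is "double", so the strict check fails.
var_dump(strictTypeMatch(2, 2.0));       // bool(false)
var_dump(numericAwareTypeMatch(2, 2.0)); // bool(true)
```

With the strict check, an integer lookup value is never compared against a double in the lookup array, which is the skipped comparison described above.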

@MarkBaker
Member

It isn't quite so straightforward, because once integers and doubles are allowed to match we can't use the same comparison for numerics and non-numeric strings; but I'll take a look and see how best we can do that check.
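
One way to read this concern, sketched under assumptions: numbers and strings order differently, so a comparison that works for one gives the wrong answer for the other. The helper `compareForMatch` below is hypothetical and is not the fix that was merged; it only shows why the comparison has to branch on type. It uses ASCII-only case folding for brevity.

```php
<?php

// Hypothetical helper: returns -1, 0 or 1.
// Numbers are compared numerically; everything else is compared
// as a case-insensitive string.
function compareForMatch($a, $b): int
{
    $aIsNumber = is_int($a) || is_float($a);
    $bIsNumber = is_int($b) || is_float($b);
    if ($aIsNumber && $bIsNumber) {
        return $a <=> $b;
    }

    // strcmp() may return any negative or positive integer, so normalise it.
    return strcmp(strtolower((string) $a), strtolower((string) $b)) <=> 0;
}

var_dump(compareForMatch(10, 9.5));         // int(1): numeric order
var_dump(compareForMatch('10', '9'));       // int(-1): string order, '1' sorts before '9'
var_dump(compareForMatch('Apple', 'apple')); // int(0): case-insensitive
```

The same pair of digits orders one way as numbers and the other way as text, which is why relaxing the type check alone is not enough.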

@oleibman
Collaborator

@MarkBaker I've looked and I think I have something that will work. I believe the logic in matchFirstValue is also not correct.

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue Oct 26, 2022
Fix PHPOffice#3141. Function matchSmallestValue did not recognize that an integer could match a float. Adding test cases, it seems that matchFirstValue had the same problem. However, matchLargestValue seemed to handle things correctly - but see below.

In addition, the wildcard logic in matchFirstValue is faulty. It ignored tilde as a wildcard character. Although it would have been easy to just add that, I think it was wrong to determine on its own if a wildcard was in use. Just using the already available wildcard functions whenever comparing two strings is sufficient.
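
To illustrate the tilde point, here is a rough sketch of turning an Excel-style wildcard pattern into a regular expression. It does not use PhpSpreadsheet's own wildcard functions; `excelWildcardToRegex` is a hypothetical name, and treating a tilde as escaping whatever character follows it is a simplifying assumption.

```php
<?php

// Hypothetical converter: '*' matches any run of characters, '?' matches one
// character, and '~' makes the next character literal.
function excelWildcardToRegex(string $pattern): string
{
    // Split into UTF-8 characters so '?' lines up with whole characters.
    $chars = preg_split('//u', $pattern, -1, PREG_SPLIT_NO_EMPTY);
    $regex = '';
    $count = count($chars);
    for ($i = 0; $i < $count; ++$i) {
        $char = $chars[$i];
        if ($char === '~' && $i + 1 < $count) {
            // Escaped character: consume it and match it literally.
            $regex .= preg_quote($chars[++$i], '/');
        } elseif ($char === '*') {
            $regex .= '.*';
        } elseif ($char === '?') {
            $regex .= '.';
        } else {
            $regex .= preg_quote($char, '/');
        }
    }

    return '/^' . $regex . '$/iu';
}

var_dump(preg_match(excelWildcardToRegex('a*c'), 'abbc'));  // int(1)
var_dump(preg_match(excelWildcardToRegex('a~*c'), 'abbc')); // int(0)
var_dump(preg_match(excelWildcardToRegex('a~*c'), 'a*c'));  // int(1)
```

A check that only looks for `*` and `?` would never treat `a~*c` specially, which is the gap described above.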

I note that Excel doesn't seem to follow its own rules for MATCH (https://support.microsoft.com/en-us/office/match-function-e8dffd45-c762-47d6-bf89-533f4a37673a?ns=excel&version=90&syslcid=1033&uilcid=1033&appver=zxl900&helpid=xlmain11.chm60112&ui=en-us&rs=en-us&ad=us). PhpSpreadsheet's results match Excel's, so no problem. However, when match_type is not zero, the match array is supposed to be sorted, so I would expect `#N/A` when it isn't; but that's not how Excel operates. I have no idea what Excel is doing. If `MATCH(2,{2,0,4,3},1)` isn't `#N/A` because of the unsorted array, then surely it should be `1` (item 1 of the array is the largest number less than or equal to the lookup value); but Excel and PhpSpreadsheet (before and after changes) return `2`. I have moved this example to be the first of the test cases.
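
One hypothesis that would produce `2` here, offered purely as an assumption (neither this thread nor the linked MATCH page confirms it), is that a binary search is used when match_type is 1. A binary search trusts the array to be sorted and never looks at item 1 in this example. The function `binarySearchMatch` below is a hypothetical sketch, not PhpSpreadsheet's implementation.

```php
<?php

// Hypothetical sketch: find the 1-based position of the last probed value
// that is <= $lookup, assuming (without checking) that $values is ascending.
function binarySearchMatch($lookup, array $values): ?int
{
    $low = 0;
    $high = count($values) - 1;
    $found = null;
    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        if ($values[$mid] <= $lookup) {
            $found = $mid;   // candidate; keep looking to the right
            $low = $mid + 1;
        } else {
            $high = $mid - 1;
        }
    }

    return $found === null ? null : $found + 1;
}

// Probes index 1 (value 0, accepted), then index 2 (value 4, rejected).
var_dump(binarySearchMatch(2, [2, 0, 4, 3])); // int(2)
```

Other binary search variants (for example, rounding the midpoint up) can give different answers on unsorted input, so this only shows that such a result is plausible, not that it is the actual mechanism.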

One would think strings would behave similarly. But, no - see the second test case. This time Excel does look for an exact match. But the existing logic doesn't get the matching result in PhpSpreadsheet. It requires a whole new block of code, one which doesn't work correctly for numeric lookup value. Ugh.

LibreOffice doesn't always agree with Excel. It seems that it will use wildcard matching even when the match type is not zero (Excel documentation says wildcards are only for type zero, which is just as well because I don't really know what greater/less mean when wildcards are involved). I have not attempted to duplicate this behavior. For the record, Gnumeric agrees with Excel here.
oleibman added a commit that referenced this issue Nov 4, 2022
* MATCH Problems with Int/Float Compare and Wildcards

* More Changes - LibreOffice

Add support for LibreOffice matching wildcard strings when type is not zero. Add support for type to be specified as an integer other than 0/1/-1, or as a float, or as a numeric string; a non-numeric string should cause a `#VALUE!` error.
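
A minimal sketch of how such a match type could be normalised, assuming that only the sign of a numeric value matters (the text above does not spell out the rule, so that part is a guess). `normalizeMatchType` is a hypothetical name, and the error is returned as a plain string here rather than through PhpSpreadsheet's error classes.

```php
<?php

// Hypothetical normaliser: returns 1, 0 or -1, or the string '#VALUE!'
// when the argument is not numeric.
function normalizeMatchType($matchType)
{
    if (!is_numeric($matchType)) {
        return '#VALUE!';
    }

    // Assumption: positive behaves like 1, negative like -1, zero like 0.
    return (float) $matchType <=> 0;
}

var_dump(normalizeMatchType(2));     // int(1)
var_dump(normalizeMatchType(-0.5));  // int(-1)
var_dump(normalizeMatchType('0'));   // int(0)
var_dump(normalizeMatchType('abc')); // string(7) "#VALUE!"
```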

I have found an example of undefined behavior (unsorted array where type is non-zero) where PhpSpreadsheet does not produce the same result as Excel. It is present as a new `incomplete` test case. I can fix it, but not without breaking other tests where the proper behavior is undefined. IMO, this is not a problem we should be concerned about.

Many test cases are added. Chances are I will add some more before merging this change.
MarkBaker added a commit that referenced this issue Dec 21, 2022
### Added

- Extended flag options for the Reader `load()` and Writer `save()` methods
- Apply Row/Column limits (1048576 and XFD) in ReferenceHelper [PR #3213](#3213)
- Allow the creation of In-Memory Drawings from a string of binary image data, or from a stream. [PR #3157](#3157)
- Xlsx Reader support for Pivot Tables [PR #2829](#2829)
- Permit Date/Time Entered on Spreadsheet to be calculated as Float [Issue #1416](#1416) [PR #3121](#3121)

### Changed

- Nothing

### Deprecated

- Direct update of Calculation::suppressFormulaErrors is replaced with setter.
- Font public static variable defaultColumnWidths replaced with constant DEFAULT_COLUMN_WIDTHS.
- ExcelError public static variable errorCodes replaced with constant ERROR_CODES.
- NumberFormat constant FORMAT_DATE_YYYYMMDD2 replaced with existing identical FORMAT_DATE_YYYYMMDD.

### Removed

- Nothing

### Fixed

- Fixed handling for `_xlws` prefixed functions from Office365 [Issue #3245](#3245) [PR #3247](#3247)
- Conditional formatting rules applied to rows/columns are removed [Issue #3184](#3184) [PR #3213](#3213)
- Treat strings containing currency or accounting values as floats in Calculation Engine operations [Issue #3165](#3165) [PR #3189](#3189)
- Treat strings containing percentage values as floats in Calculation Engine operations [Issue #3155](#3155) [PR #3156](#3156) and [PR #3164](#3164)
- Xlsx Reader Accept Palette of Fewer than 64 Colors [Issue #3093](#3093) [PR #3096](#3096)
- Use Locale-Independent Float Conversion for Xlsx Writer Custom Property [Issue #3095](#3095) [PR #3099](#3099)
- Allow setting AutoFilter range on a single cell or row [Issue #3102](#3102) [PR #3111](#3111)
- Xlsx Reader External Data Validations Flag Missing [Issue #2677](#2677) [PR #3078](#3078)
- Reduces extra memory usage on `__destruct()` calls [PR #3092](#3092)
- Additional properties for Trendlines [Issue #3011](#3011) [PR #3028](#3028)
- Calculation suppressFormulaErrors fix [Issue #1531](#1531) [PR #3092](#3092)
- Permit Date/Time Entered on Spreadsheet to be Calculated as Float [Issue #1416](#1416) [PR #3121](#3121)
- Incorrect Handling of Data Validation Formula Containing Ampersand [Issue #3145](#3145) [PR #3146](#3146)
- Generation3 Copy With Image in Footer [Issue #3126](#3126) [PR #3140](#3140)
- MATCH Function Problems with Int/Float Compare and Wildcards [Issue #3141](#3141) [PR #3142](#3142)
- Fix ODS Read Filter on number-columns-repeated cell [Issue #3148](#3148) [PR #3149](#3149)
- Problems Formatting Very Small and Very Large Numbers [Issue #3128](#3128) [PR #3152](#3152)
- XlsxWrite preserve line styles for y-axis, not just x-axis [PR #3163](#3163)
- Xlsx Namespace Handling of Drawings, RowAndColumnAttributes, MergeCells [Issue #3138](#3138) [PR #3137](#3137)
- More Detail for Cyclic Error Messages [Issue #3169](#3169) [PR #3170](#3170)
- Improved Documentation for Deprecations - many PRs [Issue #3162](#3162)