Problems Formatting Very Small and Very Large Numbers

Fix PHPOffice#3128 (author found a workaround but the problem remains). For some complex masks, when a cast of the cell value from float to string results in the use of scientific notation, the result of the formatting is unusable. I believe this PR solves the problem for numbers close to zero (Php cast from float to string uses scientific notation starting with 1E-5), and for a range of large numbers which are not handled correctly now. However, I have not found a way to ensure that the results match Excel for very large numbers (1E18 or larger); this change at least ensures that the resulting string is an accurate rendition of the number (which is not the case now) even if doesn't match Excel. As an example, if you use the mask reported in the original issue `0 000.0` and enter a value of 1E90 into the cell, Excel will show it as 1 followed by 87 zeros, a space, 3 more zeros, decimal point and zero. I have not figured out how to get PhpSpreadsheet to do that; for now, it will just return the formatted value as 1 followed by 90 zeroes instead (I might have chosen to go with scientific notation instead). I will continue to think about those, but do not feel it is worth delaying the improvements in this ticket while I do so. The affected section of code also truncated to the appropriate precision. It now rounds, as Excel does. This seemed to be an area of code where problems might arise on 32-bit systems, and, indeed, I found something in the formatting code which had to be changed for 32-bit to work correctly. As long as I was doing that anyhow, I ran the full test suite, and found that Php8.1 had introduced some new stringencies which caused problems in a handful of places. All were found in Xls Reader, and all are corrected now.
oleibman · Nov 1, 2022 · ea80511 · ea80511
1 parent f1d73d8
commit ea80511
Show file tree

Hide file tree

Showing 6 changed files with 116 additions and 12 deletions.
diff --git a/samples/templates/47_xlsfill.xls b/samples/templates/47_xlsfill.xls
diff --git a/src/PhpSpreadsheet/Reader/Xls.php b/src/PhpSpreadsheet/Reader/Xls.php
@@ -2278,7 +2278,7 @@ private function readXf(): void
                 $diagonalDown = (0x40000000 & self::getInt4d($recordData, 10)) >> 30 ? true : false;
 
                 // bit: 31; mask: 0x80000000; 1 = diagonal line from bottom left to top right
-                $diagonalUp = (0x80000000 & self::getInt4d($recordData, 10)) >> 31 ? true : false;
+                $diagonalUp = ((int) 0x80000000 & self::getInt4d($recordData, 10)) >> 31 ? true : false;
 
                 if ($diagonalUp === false) {
                     if ($diagonalDown == false) {
@@ -2308,7 +2308,7 @@ private function readXf(): void
                 }
 
                 // bit: 31-26; mask: 0xFC000000 fill pattern
-                if ($fillType = Xls\Style\FillPattern::lookup((0xFC000000 & self::getInt4d($recordData, 14)) >> 26)) {
+                if ($fillType = Xls\Style\FillPattern::lookup(((int) 0xFC000000 & self::getInt4d($recordData, 14)) >> 26)) {
                     $objStyle->getFill()->setFillType($fillType);
                 }
                 // offset: 18; size: 2; pattern and background colour
@@ -2360,7 +2360,7 @@ private function readXf(): void
                 $objStyle->getBorders()->getBottom()->setBorderStyle(Xls\Style\Border::lookup((0x01C00000 & $borderAndBackground) >> 22));
 
                 // bit: 31-25; mask: 0xFE000000; bottom line color
-                $objStyle->getBorders()->getBottom()->colorIndex = (0xFE000000 & $borderAndBackground) >> 25;
+                $objStyle->getBorders()->getBottom()->colorIndex = ((int) 0xFE000000 & $borderAndBackground) >> 25;
 
                 // offset: 12; size: 4; cell border lines
                 $borderLines = self::getInt4d($recordData, 12);
@@ -7699,10 +7699,10 @@ private static function extractNumber($data)
     {
         $rknumhigh = self::getInt4d($data, 4);
         $rknumlow = self::getInt4d($data, 0);
-        $sign = ($rknumhigh & 0x80000000) >> 31;
+        $sign = ($rknumhigh & (int) 0x80000000) >> 31;
         $exp = (($rknumhigh & 0x7ff00000) >> 20) - 1023;
         $mantissa = (0x100000 | ($rknumhigh & 0x000fffff));
-        $mantissalow1 = ($rknumlow & 0x80000000) >> 31;
+        $mantissalow1 = ($rknumlow & (int) 0x80000000) >> 31;
         $mantissalow2 = ($rknumlow & 0x7fffffff);
         $value = $mantissa / 2 ** (20 - $exp);
 
@@ -7733,7 +7733,7 @@ private static function getIEEE754($rknum)
             // The RK format calls for using only the most significant 30 bits
             // of the 64 bit floating point value. The other 34 bits are assumed
             // to be 0 so we use the upper 30 bits of $rknum as follows...
-            $sign = ($rknum & 0x80000000) >> 31;
+            $sign = ($rknum & (int) 0x80000000) >> 31;
             $exp = ($rknum & 0x7ff00000) >> 20;
             $mantissa = (0x100000 | ($rknum & 0x000ffffc));
             $value = $mantissa / 2 ** (20 - ($exp - 1023));

diff --git a/src/PhpSpreadsheet/Reader/Xls/MD5.php b/src/PhpSpreadsheet/Reader/Xls/MD5.php
@@ -190,9 +190,9 @@ private static function i(int $X, int $Y, int $Z): int
     private static function step(callable $func, int &$A, int $B, int $C, int $D, int $M, int $s, $t): void
     {
         $t = self::signedInt($t);
-        $A = ($A + call_user_func($func, $B, $C, $D) + $M + $t) & self::$allOneBits;
+        $A = (int) ($A + call_user_func($func, $B, $C, $D) + $M + $t) & self::$allOneBits;
         $A = self::rotate($A, $s);
-        $A = ($B + $A) & self::$allOneBits;
+        $A = (int) ($B + $A) & self::$allOneBits;
     }
 
     /** @param float|int $result may be float on 32-bit system */

diff --git a/src/PhpSpreadsheet/Style/NumberFormat/NumberFormatter.php b/src/PhpSpreadsheet/Style/NumberFormat/NumberFormatter.php
@@ -68,10 +68,18 @@ private static function processComplexNumberFormatMask($number, string $mask): s
      */
     private static function complexNumberFormatMask($number, string $mask, bool $splitOnPoint = true): string
     {
-        $sign = ($number < 0.0) ? '-' : '';
         /** @var float */
         $numberFloat = $number;
-        $number = (string) abs($numberFloat);
+        if ($splitOnPoint) {
+            $masks = explode('.', $mask);
+            if (count($masks) <= 2) {
+                $decmask = $masks[1] ?? '';
+                $decpos = substr_count($decmask, '0');
+                $numberFloat = round($numberFloat, $decpos);
+            }
+        }
+        $sign = ($numberFloat < 0.0) ? '-' : '';
+        $number = self::f2s(abs($numberFloat));
 
         if ($splitOnPoint && strpos($mask, '.') !== false && strpos($number, '.') !== false) {
             $numbers = explode('.', $number);
@@ -80,16 +88,57 @@ private static function complexNumberFormatMask($number, string $mask, bool $spl
                 $masks = self::mergeComplexNumberFormatMasks($numbers, $masks);
             }
             $integerPart = self::complexNumberFormatMask($numbers[0], $masks[0], false);
+            $numlen = strlen($numbers[1]);
+            $msklen = strlen($masks[1]);
+            if ($numlen < $msklen) {
+                $numbers[1] .= str_repeat('0', $msklen - $numlen);
+            }
             $decimalPart = strrev(self::complexNumberFormatMask(strrev($numbers[1]), strrev($masks[1]), false));
+            $decimalPart = substr($decimalPart, 0, $msklen);
 
             return "{$sign}{$integerPart}.{$decimalPart}";
         }
 
+        if (strlen($number) < strlen($mask)) {
+            $number = str_repeat('0', strlen($mask) - strlen($number)) . $number;
+        }
         $result = self::processComplexNumberFormatMask($number, $mask);
 
         return "{$sign}{$result}";
     }
 
+    public static function f2s(float $f): string
+    {
+        return self::floatStringConvertScientific((string) $f);
+    }
+
+    public static function floatStringConvertScientific(string $s): string
+    {
+        // convert only normalized form of scientific notation:
+        //  optional sign, single digit 1-9,
+        //    decimal point and digits (allowed to be omitted),
+        //    E (e permitted), optional sign, one or more digits
+        if (preg_match('/^([+-])?([1-9])([.]([0-9]+))?[eE]([+-]?[0-9]+)$/', $s, $matches) === 1) {
+            $sign = '';
+            $exponent = (int) $matches[5];
+            $sign = ($matches[1] === '-') ? '-' : '';
+            if ($exponent >= 0) {
+                $exponentPlus1 = $exponent + 1;
+                $out = $matches[2] . $matches[4];
+                $len = strlen($out);
+                if ($len < $exponentPlus1) {
+                    $out .= str_repeat('0', $exponentPlus1 - $len);
+                }
+                $out = substr($out, 0, $exponentPlus1) . ((strlen($out) === $exponentPlus1) ? '' : ('.' . substr($out, $exponentPlus1)));
+                $s = "$sign$out";
+            } else {
+                $s = $sign . '0.' . str_repeat('0', -$exponent - 1) . $matches[2] . $matches[4];
+            }
+        }
+
+        return $s;
+    }
+
     /**
      * @param mixed $value
      */
@@ -118,11 +167,20 @@ private static function formatStraightNumericValue($value, string $format, array
             //    Scientific format
             return sprintf('%5.2E', $valueFloat);
         } elseif (preg_match('/0([^\d\.]+)0/', $format) || substr_count($format, '.') > 1) {
-            if ($value == (int) $valueFloat && substr_count($format, '.') === 1) {
+            if ($valueFloat == floor($valueFloat) && substr_count($format, '.') === 1) {
                 $value *= 10 ** strlen(explode('.', $format)[1]);
             }
 
-            return self::complexNumberFormatMask($value, $format);
+            $result = self::complexNumberFormatMask($value, $format);
+            if (strpos($result, 'E') !== false) {
+                // This is a hack and doesn't match Excel.
+                // It will, at least, be an accurate representation,
+                //  even if formatted incorrectly.
+                // This is needed for absolute values >=1E18.
+                $result = self::f2s($valueFloat);
+            }
+
+            return $result;
         }
 
         $sprintf_pattern = "%0$minWidth." . strlen($right) . 'f';

diff --git a/tests/PhpSpreadsheetTests/Style/NumberFormatTest.php b/tests/PhpSpreadsheetTests/Style/NumberFormatTest.php
@@ -4,6 +4,7 @@
 
 use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
 use PhpOffice\PhpSpreadsheet\Style\NumberFormat;
+use PhpOffice\PhpSpreadsheet\Style\NumberFormat\NumberFormatter;
 use PHPUnit\Framework\TestCase;
 
 class NumberFormatTest extends TestCase
@@ -59,6 +60,7 @@ public function providerNumberFormat(): array
      * @dataProvider providerNumberFormatDates
      *
      * @param mixed $expectedResult
+     * @param mixed $args
      */
     public function testFormatValueWithMaskDate($expectedResult, ...$args): void
     {
@@ -84,4 +86,34 @@ public function testCurrencyCode(): void
         self::assertEquals($rslt, '$ 12,345.679');
         StringHelper::setCurrencyCode($cur);
     }
+
+    /**
+     * @dataProvider providerNoScientific
+     */
+    public function testNoScientific(string $expectedResult, string $numericString): void
+    {
+        $result = NumberFormatter::floatStringConvertScientific($numericString);
+        self::assertSame($expectedResult, $result);
+    }
+
+    public function providerNoScientific(): array
+    {
+        return [
+            'large number' => ['92' . str_repeat('0', 16), '9.2E+17'],
+            'no decimal portion' => ['16', '1.6E1'],
+            'retain decimal 0 if supplied in string' => ['16.0', '1.60E1'],
+            'exponent 0' => ['2.3', '2.3E0'],
+            'whole and decimal' => ['16.5', '1.65E1'],
+            'plus signs' => ['165000', '+1.65E+5'],
+            'e2 one decimal' => ['489.7', '4.897E2'],
+            'e2 no decimal' => ['-489', '-4.89E2'],
+            'e2 fill units position' => ['480', '4.8E+2'],
+            'no scientific notation' => ['3.14159', '3.14159'],
+            'non-zero in first decimal' => ['0.165', '1.65E-1'],
+            'one leading zero in decimal' => ['0.0165', '1.65E-2'],
+            'four leading zeros in decimal' => ['-0.0000165', '-1.65E-5'],
+            'small number' => ['0.' . str_repeat('0', 16) . '1', '1E-17'],
+            'very small number' => ['0.' . str_repeat('0', 69) . '1', '1E-70'],
+        ];
+    }
 }
diff --git a/tests/data/Style/NumberFormat.php b/tests/data/Style/NumberFormat.php
@@ -1491,4 +1491,18 @@
     'percent with leading 0' => ['06.2%', 0.062, '00.0%'],
     'percent lead0 no decimal' => ['06%', 0.062, '00%'],
     'percent nolead0 no decimal' => ['6%', 0.062, '##%'],
+    'scientific small complex mask discard all decimals' => ['0 000.0', 1e-17, '0 000.0'],
+    'scientific small complex mask keep some decimals' => ['-0 000.000027', -2.7e-5, '0 000.000000'],
+    'scientific small complex mask keep some decimals trailing zero' => ['-0 000.000040', -4e-5, '0 000.000000'],
+    'scientific large complex mask' => ['92' . str_repeat('0', 13) . ' 000.0', 9.2e17, '0 000.0'],
+    'scientific very large complex mask PhpSpreadsheet does not match Excel' => ['1' . str_repeat('0', 18), 1e18, '0 000.0'],
+    'scientific even larger complex mask PhpSpreadsheet does not match Excel' => ['43' . str_repeat('0', 89), 4.3e90, '0 000.0'],
+    'scientific many decimal positions' => ['000 0.000 01', 1e-5, '000 0.000 00'],
+    'round with scientific notation' => ['000 0.000 02', 1.6e-5, '000 0.000 00'],
+    'round with no decimals' => ['009 8', 97.7, '000 0'],
+    'round to 1 decimal' => ['009 7.2', 97.15, '000 0.0'],
+    'truncate with no decimals' => ['009 7', 97.3, '000 0'],
+    'truncate to 1 decimal' => ['009 7.1', 97.13, '000 0.0'],
+    'scientific many decimal positions truncated' => ['000 0.000 00', 1e-7, '000 0.000 00'],
+    'scientific very many decimal positions truncated' => ['000 0.000 00', 1e-17, '000 0.000 00'],
 ];