-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix undefined behavior of MT_RAND_PHP if range exceeds ZEND_LONG_MAX #9197
Conversation
The implementation appears adequate. However, given the current situation where Nevertheless, we must also carefully consider whether we should align our results with Intel or others. Maybe I should conform to Intel's behavior considering the actual workload of PHP, but I can't determine that anyway... |
This is undefined behavior. By definition there is no previous correct behavior. It can depend on compiler, version, flags, operating system, and architecture. In fact with my simple test script I get different results for both clang and gcc and for both -O0 and -O2: #include<stdint.h>
#include<stdio.h>
typedef int64_t zend_long;
typedef uint64_t zend_ulong;
# define PHP_MT_RAND_MAX ((zend_long) (0x7FFFFFFF))
# define RAND_RANGE_BADSCALING(__n, __min, __max, __tmax) \
(__n) = (__min) + (zend_long) ((double) ( (double) (__max) - (__min) + 1.0) * ((__n) / ((__tmax) + 1.0)))
int
main(void)
{
zend_long min, max;
zend_ulong r;
min = INT64_MIN;
max = INT64_MAX;
r = PHP_MT_RAND_MAX - 1;
RAND_RANGE_BADSCALING(r, min, max, PHP_MT_RAND_MAX);
printf("%ld\n", (zend_long) r);
}
… and all of the results are crap when taking into account that With the
which is also in the rough neighborhood of This scaling function is also unable to generate all possible results in the general case, because doubles only have 53 bits of precision. So once the range is larger than 53 bits, some numbers will be impossible to generate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
I wonder whether we should use zend_dval_to_lval()
to convert the double
to zend_long
. This function is build exactly for this purpose.
And instead of inlining the macro, maybe we should introduce an inline function.
I'd rather not. The change I made, changing the cast from In other words: Compatibility trumps cleanliness in this case.
If anything I would remove the macro entirely, as it is broken (as the name implies). I've left the macro in place as it is, to avoid breakage for third-party consumers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is undefined behavior. By definition there is no previous correct behavior. It can depend on compiler, version, flags, operating system, and architecture.
That is certainly true. Although most production environments probably use binaries compiled with gcc due to performance issues, it may not be necessary to maintain an implementation whose behavior changes with optimization options.
I confirm that compatibility is maintained with proper usage. I approve this change.
<?php
mt_srand(1234, MT_RAND_PHP);
for ($i = 0; $i < 65535; $i++) {
if (($i % 1024) === 0) {
mt_srand(mt_rand(), MT_RAND_PHP);
}
echo mt_rand(1024, 2048) . PHP_EOL;
}
PS: Should this change be merge as a patch for older versions in support? I personally think it should be merge.
I don't have a strong opinion on this one, but I'd lean to "no". Paging @cmb69 for his opinion and to have a look at my answers to his previous remarks. |
As I understand it, this is only for the legacy mode. If that is correct, I wouldn't backport the fix. |
@cmb69 Yes, you understand that correctly. This fix applies only to As the backporting question is resolved now (no backporting): Does the PR look good to you? See also my answers in: #9197 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
RAND_RANGE_BADSCALING() invokes undefined behavior when (max - min) > ZEND_LONG_MAX, because the intermediate `double` might not fit into `zend_long`. Fix this by inlining a fixed version of the macro into Mt19937's range() function. Fixing the macro itself cannot be done in the general case, because the types of the inputs are not known. Instead of replacing one possibly broken version with another possibly broken version, the macro is simply left as is and should be removed in a future version. The fix itself is simple: Instead of storing the "offset" in a `zend_long`, we use a `zend_ulong` which is capable of storing the resulting double by construction. With this fix the implementation of this broken scaling is effectively identical to the implementation of php_random_range from a data type perspective, making it easy to verify the correctness. It was further empirically verified that the broken macro and the fix return the same results for all possible values of `r` for several distinct pairs of (min, max). Fixes phpGH-9190 Fixes phpGH-9191
f89900f
to
baccea0
Compare
RAND_RANGE_BADSCALING() invokes undefined behavior when (max - min) >
ZEND_LONG_MAX, because the intermediate
double
might not fit intozend_long
.Fix this by inlining a fixed version of the macro into Mt19937's range()
function. Fixing the macro itself cannot be done in the general case, because
the types of the inputs are not known. Instead of replacing one possibly broken
version with another possibly broken version, the macro is simply left as is
and should be removed in a future version.
The fix itself is simple: Instead of storing the "offset" in a
zend_long
, weuse a
zend_ulong
which is capable of storing the resulting double byconstruction. With this fix the implementation of this broken scaling is
effectively identical to the implementation of php_random_range from a data
type perspective, making it easy to verify the correctness.
It was further empirically verified that the broken macro and the fix return
the same results for all possible values of
r
for several distinct pairs of(min, max).
Fixes GH-9190
Fixes GH-9191