-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guard members of MonoType union & fix related bugs #111645
base: main
Are you sure you want to change the base?
Conversation
tracing\eventpipe\gcdump\gcdump\gcdump.cmd test failure is a real bug in mono. |
@@ -1296,6 +1296,92 @@ m_type_is_byref (const MonoType *type) | |||
return type->byref__; | |||
} | |||
|
|||
MONO_NEVER_INLINE void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to have this in production than I believe we should adopt a similar pattern as we have when using MONO_CLASS_DEF_PRIVATE, either extend that or make a similar thing for MonoType, that way we would run all these additional checks when running ENABLE_CHECKED_BUILD_PRIVATE_TYPES (we should have CI legs doing that), but use inline wrappers with zero overhead on release builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to do actual full checks in Release unless benchmarks show that it meaningfully regresses performance, at least during the preview period, in order to flush out any remaining issues of this type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still add the same support for types as we have for class so we could do a MONO_TYPE_DEF_PRIVATE under ENABLED_CHECKED_BUILD_PRIVATE_TYPES to catch errors bypassing the getters on CI. We could also add support for the full check vs zero overhead wrappers as part of that and then just decide to initially opt into full checks during preview (and monitor perf hit during that time) and then have the option to move back to zero wrappers closer to release, but keep the full checks on CI builds running with the ENABLED_CHECKED_BUILD_PRIVATE_TYPES.
case MONO_TYPE_SZARRAY: | ||
{ | ||
// FIXME: Previously was handled by below ARRAY block but that is incorrect; this implementation is speculative -kg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you should be able to handle both cases similar, except that you can't assume mono_type is MonoArrayType it could also be MonoClass, so you would just need to handle the case where you get the type, rank and mono_type_parameter differently.
Add comments above some apparent bugs
I went through and audited all the union accesses I could find and changed the ones that were blatantly and correctly guarded to I found some more bugs (mostly around SZARRAY handling) and added FIXME comments if I couldn't trivially fix them. I will probably try to fix all the FIXMEs I've found so far. |
The AOT compiler failure isn't generating usable stacks for some reason so I'll have to try and reproduce it locally in Debug configuration and hopefully get stacks there. I'm not sure if the aot compiler lacking stack traces when it crashes is intentional, I'm surprised by it. EDIT: Lesson learned: If you fix a bug in |
Build linux-x64 Release AllSubsets_Mono_LLVMFULLAOT_RuntimeTests llvmfullaot and its sibling look hung, but maybe they're just normally slow, or they're very slow because of the changes in this PR? I'm not familiar with these lanes and I can't tell whether the stdout on both of them stops at 1999 by coincidence or because azdo is truncating the actual output. |
Prune a dead goto to make unchecked safe
I took a middle ground between @lateralusX 's suggestion and my personal preference, by renaming the union but not having a configuration where direct union access is allowed. Having two modes would have complicated the code a lot so I decided not to do it yet. If we decide we need direct access for some reason in a wider set of code I can do the work to enable it, but I think it's better to enforce use of the getters and setters everywhere. Most uses of the getters ended up being unchecked because they were obviously guarded by a switch case or an if, so I expect the performance impact to be minor in practice. If this leads to unacceptable performance regressions my plan is to just disable the checks by default in release builds as we approach the later previews. My audit revealed a handful of places where it wasn't obvious whether the union was being used safely beforehand, but also wasn't immediately obvious that it was being used incorrectly. As a result I couldn't "fix" them easily and left them guarded, so customers may hit failures in their applications if those locations turn out to be bugs. i.e. a function with a name indicating it's used to manipulate generic parameters that isn't doing a type check before accessing the union as if it's a generic parameter. There's one bare use of the raw union still left in the code because the union member was being passed to some sort of macro/function as an lvalue, and I couldn't come up with a straightforward fix for that. Something I'd love to do is have even the _unchecked accessors do checks in debug builds, just in case my audits were incorrect and I applied _unchecked somewhere I shouldn't have. But I don't know if it's worth the trouble, so I didn't bother doing it. Would love to hear people's thoughts on that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone through all changes, mainly commenting on more places that could potentially use unchecked instead of checked gets since we already have checks of type before making calls to the types get function.
At first, I was a little reluctant to split into checked/unchecked, because it could introduce bugs by using the unchecked in wrong places or refactoring that might falsify the use of unchecked in a piece of code. Just having one implementation with full checks enabled in debug/checked build running on all our internal testing and then having a zero cost implementation in release would have mitigated this issue, but if we believe we would like to keep some of the checks in release builds, then we would need to split into two functions like you have in this PR, but then I believe it will be crucial to make the unchecked version working as the checked version in debug/checked builds so we can capture errors using the unchecked version udner false assumptions.
@@ -5693,8 +5693,8 @@ decode_value_compute_size (MonoType *t, int type, MonoDomain *domain, guint8 *bu | |||
goto end; | |||
} | |||
} else if ((t->type == MONO_TYPE_GENERICINST) && | |||
mono_metadata_generic_class_is_valuetype (t->data.generic_class) && | |||
m_class_is_enumtype (t->data.generic_class->container_class)){ | |||
mono_metadata_generic_class_is_valuetype (m_type_data_get_generic_class (t)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we checked type to be MONO_TYPE_GENERICINST could these be unchecked calls?
@@ -5916,8 +5916,8 @@ decode_value_internal (MonoType *t, int type, MonoDomain *domain, guint8 *addr, | |||
return ERR_INVALID_ARGUMENT; | |||
} | |||
} else if ((t->type == MONO_TYPE_GENERICINST) && | |||
mono_metadata_generic_class_is_valuetype (t->data.generic_class) && | |||
m_class_is_enumtype (t->data.generic_class->container_class)){ | |||
mono_metadata_generic_class_is_valuetype (m_type_data_get_generic_class (t)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we checked type to be MONO_TYPE_GENERICINST could these be unchecked calls?
@@ -5944,7 +5944,7 @@ decode_value (MonoType *t, MonoDomain *domain, gpointer void_addr, gpointer void | |||
int type = decode_byte (buf, &buf, limit); | |||
|
|||
if (t->type == MONO_TYPE_GENERICINST && mono_class_is_nullable (mono_class_from_mono_type_internal (t))) { | |||
MonoType *targ = t->data.generic_class->context.class_inst->type_argv [0]; | |||
MonoType *targ = m_type_data_get_generic_class (t)->context.class_inst->type_argv [0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we checked type to be MONO_TYPE_GENERICINST could these be unchecked calls?
buffer_add_typeid (buf, domain, mono_class_from_mono_type_internal (type->data.method->ret)); | ||
for (int j = 0; j < type->data.method->param_count; ++j) { | ||
buffer_add_typeid (buf, domain, mono_class_from_mono_type_internal (type->data.method->params[j])); | ||
buffer_add_int (buf, 1 + m_type_data_get_method (type)->param_count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we checked type to be MONO_TYPE_FNPTR could these be unchecked calls?
@@ -9683,9 +9683,9 @@ method_commands_internal (int command, MonoMethod *method, MonoDomain *domain, g | |||
MonoGenericContext *context; | |||
GString *res; | |||
res = g_string_new (""); | |||
mono_type_get_desc (res, m_class_get_byval_arg (type->data.generic_class->container_class), TRUE); | |||
mono_type_get_desc (res, m_class_get_byval_arg (m_type_data_get_generic_class (type)->container_class), TRUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we checked type to be MONO_TYPE_GENERICINST could these be unchecked calls?
char *str = type_get_qualified_name (type, NULL); | ||
len = strlen (str); | ||
g_free (str); | ||
} else if (type->type == MONO_TYPE_SZARRAY && type->data.klass->enumtype) { | ||
char *str = type_get_qualified_name (m_class_get_byval_arg (type->data.klass), NULL); | ||
} else if (type->type == MONO_TYPE_SZARRAY && m_type_data_get_klass (type)->enumtype) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unchecked?
@@ -3019,10 +3019,10 @@ reflection_methodbuilder_to_mono_method (MonoClass *klass, | |||
for (gint32 i = 0; i < m->signature->param_count; ++i) { | |||
MonoType *t = m->signature->params [i]; | |||
if (t->type == MONO_TYPE_MVAR) { | |||
MonoGenericParam *gparam = t->data.generic_param; | |||
MonoGenericParam *gparam = m_type_data_get_generic_param (t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unchecked?
@@ -3722,7 +3722,7 @@ remove_instantiations_of_and_ensure_contents (gpointer key, | |||
ERROR_DECL (lerror); | |||
MonoError *error = already_failed ? lerror : data->error; | |||
|
|||
if ((type->type == MONO_TYPE_GENERICINST) && (type->data.generic_class->container_class == klass)) { | |||
if ((type->type == MONO_TYPE_GENERICINST) && (m_type_data_get_generic_class (type)->container_class == klass)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unchecked?
@@ -10542,8 +10542,8 @@ mono_method_to_ir (MonoCompile *cfg, MonoMethod *method, MonoBasicBlock *start_b | |||
int ro_type = ftype->type; | |||
if (!addr) | |||
addr = mono_static_field_get_addr (vtable, field); | |||
if (ro_type == MONO_TYPE_VALUETYPE && m_class_is_enumtype (ftype->data.klass)) { | |||
ro_type = mono_class_enum_basetype_internal (ftype->data.klass)->type; | |||
if (ro_type == MONO_TYPE_VALUETYPE && m_class_is_enumtype (m_type_data_get_klass (ftype))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unchecked?
@@ -3618,8 +3619,8 @@ is_async_method (MonoMethod *method) | |||
/* Do less expensive checks first */ | |||
sig = mono_method_signature_internal (method); | |||
if (attr_class && sig && ((sig->ret->type == MONO_TYPE_VOID) || | |||
(sig->ret->type == MONO_TYPE_CLASS && !strcmp (m_class_get_name (sig->ret->data.generic_class->container_class), "Task")) || | |||
(sig->ret->type == MONO_TYPE_GENERICINST && !strcmp (m_class_get_name (sig->ret->data.generic_class->container_class), "Task`1")))) { | |||
(sig->ret->type == MONO_TYPE_CLASS && !strcmp (m_class_get_name (m_type_data_get_generic_class (sig->ret)->container_class), "Task")) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unchecked?
Thanks for the detailed review! I forgot to mention that for a few modules I decided not to optimize them with unchecked - sre, metadata, debugger-agent. It doesn't seem like they are performance critical enough to be worth the effort/risk. But since you already audited them (sorry!) I'll apply the changes there. |
There is a class of issue where we fail to properly check the type of a MonoType before accessing the members of its data union, which can lead to type confusion or null pointer dereference bugs.
This PR adds dedicated access helpers for every member of the union that do a type check, renames the union to make direct accesses obvious, and removes almost all direct uses of the union members.
The process of adding the access helpers and auditing related code revealed multiple bugs; I've either fixed those bugs in this PR or added FIXME comments describing the bug.
For accesses that were obviously correct (I could see a type check right above the accessor call) I used the
_unchecked
variant to optimize out the type check.As a follow-up later on in the .NET 10 cycle we can remove the type check from these accessors to make them more efficient if we need to. I don't think we'll need to, though.