Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard members of MonoType union & fix related bugs #111645

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

kg
Copy link
Member

@kg kg commented Jan 20, 2025

There is a class of issue where we fail to properly check the type of a MonoType before accessing the members of its data union, which can lead to type confusion or null pointer dereference bugs.

This PR adds dedicated access helpers for every member of the union that do a type check, renames the union to make direct accesses obvious, and removes almost all direct uses of the union members.

The process of adding the access helpers and auditing related code revealed multiple bugs; I've either fixed those bugs in this PR or added FIXME comments describing the bug.

For accesses that were obviously correct (I could see a type check right above the accessor call) I used the _unchecked variant to optimize out the type check.

As a follow-up later on in the .NET 10 cycle we can remove the type check from these accessors to make them more efficient if we need to. I don't think we'll need to, though.

@kg
Copy link
Member Author

kg commented Jan 21, 2025

tracing\eventpipe\gcdump\gcdump\gcdump.cmd test failure is a real bug in mono.

@@ -1296,6 +1296,92 @@ m_type_is_byref (const MonoType *type)
return type->byref__;
}

MONO_NEVER_INLINE void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to have this in production than I believe we should adopt a similar pattern as we have when using MONO_CLASS_DEF_PRIVATE, either extend that or make a similar thing for MonoType, that way we would run all these additional checks when running ENABLE_CHECKED_BUILD_PRIVATE_TYPES (we should have CI legs doing that), but use inline wrappers with zero overhead on release builds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to do actual full checks in Release unless benchmarks show that it meaningfully regresses performance, at least during the preview period, in order to flush out any remaining issues of this type

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still add the same support for types as we have for class so we could do a MONO_TYPE_DEF_PRIVATE under ENABLED_CHECKED_BUILD_PRIVATE_TYPES to catch errors bypassing the getters on CI. We could also add support for the full check vs zero overhead wrappers as part of that and then just decide to initially opt into full checks during preview (and monitor perf hit during that time) and then have the option to move back to zero wrappers closer to release, but keep the full checks on CI builds running with the ENABLED_CHECKED_BUILD_PRIVATE_TYPES.

case MONO_TYPE_SZARRAY:
{
// FIXME: Previously was handled by below ARRAY block but that is incorrect; this implementation is speculative -kg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you should be able to handle both cases similar, except that you can't assume mono_type is MonoArrayType it could also be MonoClass, so you would just need to handle the case where you get the type, rank and mono_type_parameter differently.

@kg
Copy link
Member Author

kg commented Jan 22, 2025

I went through and audited all the union accesses I could find and changed the ones that were blatantly and correctly guarded to _unchecked which should remove a lot of overhead while also giving us a way to add debug/checked guards if we want them for some reason later on.

I found some more bugs (mostly around SZARRAY handling) and added FIXME comments if I couldn't trivially fix them. I will probably try to fix all the FIXMEs I've found so far.

@kg kg changed the title Guard members of MonoType union behind type check helpers Guard members of MonoType union & fix related bugs Jan 23, 2025
@kg
Copy link
Member Author

kg commented Jan 24, 2025

The AOT compiler failure isn't generating usable stacks for some reason so I'll have to try and reproduce it locally in Debug configuration and hopefully get stacks there. I'm not sure if the aot compiler lacking stack traces when it crashes is intentional, I'm surprised by it.

EDIT: Lesson learned: If you fix a bug in mono_allocate_stack_slots, the function mono_allocate_stack_slots2 probably has the same bug and you should fix it there too

@kg
Copy link
Member Author

kg commented Jan 25, 2025

Build linux-x64 Release AllSubsets_Mono_LLVMFULLAOT_RuntimeTests llvmfullaot and its sibling look hung, but maybe they're just normally slow, or they're very slow because of the changes in this PR? I'm not familiar with these lanes and I can't tell whether the stdout on both of them stops at 1999 by coincidence or because azdo is truncating the actual output.

@kg kg marked this pull request as ready for review January 28, 2025 02:06
@kg
Copy link
Member Author

kg commented Jan 28, 2025

I took a middle ground between @lateralusX 's suggestion and my personal preference, by renaming the union but not having a configuration where direct union access is allowed. Having two modes would have complicated the code a lot so I decided not to do it yet. If we decide we need direct access for some reason in a wider set of code I can do the work to enable it, but I think it's better to enforce use of the getters and setters everywhere.

Most uses of the getters ended up being unchecked because they were obviously guarded by a switch case or an if, so I expect the performance impact to be minor in practice. If this leads to unacceptable performance regressions my plan is to just disable the checks by default in release builds as we approach the later previews.

My audit revealed a handful of places where it wasn't obvious whether the union was being used safely beforehand, but also wasn't immediately obvious that it was being used incorrectly. As a result I couldn't "fix" them easily and left them guarded, so customers may hit failures in their applications if those locations turn out to be bugs. i.e. a function with a name indicating it's used to manipulate generic parameters that isn't doing a type check before accessing the union as if it's a generic parameter.

There's one bare use of the raw union still left in the code because the union member was being passed to some sort of macro/function as an lvalue, and I couldn't come up with a straightforward fix for that.

Something I'd love to do is have even the _unchecked accessors do checks in debug builds, just in case my audits were incorrect and I applied _unchecked somewhere I shouldn't have. But I don't know if it's worth the trouble, so I didn't bother doing it. Would love to hear people's thoughts on that.

Copy link
Member

@lateralusX lateralusX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gone through all changes, mainly commenting on more places that could potentially use unchecked instead of checked gets since we already have checks of type before making calls to the types get function.

At first, I was a little reluctant to split into checked/unchecked, because it could introduce bugs by using the unchecked in wrong places or refactoring that might falsify the use of unchecked in a piece of code. Just having one implementation with full checks enabled in debug/checked build running on all our internal testing and then having a zero cost implementation in release would have mitigated this issue, but if we believe we would like to keep some of the checks in release builds, then we would need to split into two functions like you have in this PR, but then I believe it will be crucial to make the unchecked version working as the checked version in debug/checked builds so we can capture errors using the unchecked version udner false assumptions.

@@ -5693,8 +5693,8 @@ decode_value_compute_size (MonoType *t, int type, MonoDomain *domain, guint8 *bu
goto end;
}
} else if ((t->type == MONO_TYPE_GENERICINST) &&
mono_metadata_generic_class_is_valuetype (t->data.generic_class) &&
m_class_is_enumtype (t->data.generic_class->container_class)){
mono_metadata_generic_class_is_valuetype (m_type_data_get_generic_class (t)) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we checked type to be MONO_TYPE_GENERICINST could these be unchecked calls?

@@ -5916,8 +5916,8 @@ decode_value_internal (MonoType *t, int type, MonoDomain *domain, guint8 *addr,
return ERR_INVALID_ARGUMENT;
}
} else if ((t->type == MONO_TYPE_GENERICINST) &&
mono_metadata_generic_class_is_valuetype (t->data.generic_class) &&
m_class_is_enumtype (t->data.generic_class->container_class)){
mono_metadata_generic_class_is_valuetype (m_type_data_get_generic_class (t)) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we checked type to be MONO_TYPE_GENERICINST could these be unchecked calls?

@@ -5944,7 +5944,7 @@ decode_value (MonoType *t, MonoDomain *domain, gpointer void_addr, gpointer void
int type = decode_byte (buf, &buf, limit);

if (t->type == MONO_TYPE_GENERICINST && mono_class_is_nullable (mono_class_from_mono_type_internal (t))) {
MonoType *targ = t->data.generic_class->context.class_inst->type_argv [0];
MonoType *targ = m_type_data_get_generic_class (t)->context.class_inst->type_argv [0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we checked type to be MONO_TYPE_GENERICINST could these be unchecked calls?

buffer_add_typeid (buf, domain, mono_class_from_mono_type_internal (type->data.method->ret));
for (int j = 0; j < type->data.method->param_count; ++j) {
buffer_add_typeid (buf, domain, mono_class_from_mono_type_internal (type->data.method->params[j]));
buffer_add_int (buf, 1 + m_type_data_get_method (type)->param_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we checked type to be MONO_TYPE_FNPTR could these be unchecked calls?

@@ -9683,9 +9683,9 @@ method_commands_internal (int command, MonoMethod *method, MonoDomain *domain, g
MonoGenericContext *context;
GString *res;
res = g_string_new ("");
mono_type_get_desc (res, m_class_get_byval_arg (type->data.generic_class->container_class), TRUE);
mono_type_get_desc (res, m_class_get_byval_arg (m_type_data_get_generic_class (type)->container_class), TRUE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we checked type to be MONO_TYPE_GENERICINST could these be unchecked calls?

char *str = type_get_qualified_name (type, NULL);
len = strlen (str);
g_free (str);
} else if (type->type == MONO_TYPE_SZARRAY && type->data.klass->enumtype) {
char *str = type_get_qualified_name (m_class_get_byval_arg (type->data.klass), NULL);
} else if (type->type == MONO_TYPE_SZARRAY && m_type_data_get_klass (type)->enumtype) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unchecked?

@@ -3019,10 +3019,10 @@ reflection_methodbuilder_to_mono_method (MonoClass *klass,
for (gint32 i = 0; i < m->signature->param_count; ++i) {
MonoType *t = m->signature->params [i];
if (t->type == MONO_TYPE_MVAR) {
MonoGenericParam *gparam = t->data.generic_param;
MonoGenericParam *gparam = m_type_data_get_generic_param (t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unchecked?

@@ -3722,7 +3722,7 @@ remove_instantiations_of_and_ensure_contents (gpointer key,
ERROR_DECL (lerror);
MonoError *error = already_failed ? lerror : data->error;

if ((type->type == MONO_TYPE_GENERICINST) && (type->data.generic_class->container_class == klass)) {
if ((type->type == MONO_TYPE_GENERICINST) && (m_type_data_get_generic_class (type)->container_class == klass)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unchecked?

@@ -10542,8 +10542,8 @@ mono_method_to_ir (MonoCompile *cfg, MonoMethod *method, MonoBasicBlock *start_b
int ro_type = ftype->type;
if (!addr)
addr = mono_static_field_get_addr (vtable, field);
if (ro_type == MONO_TYPE_VALUETYPE && m_class_is_enumtype (ftype->data.klass)) {
ro_type = mono_class_enum_basetype_internal (ftype->data.klass)->type;
if (ro_type == MONO_TYPE_VALUETYPE && m_class_is_enumtype (m_type_data_get_klass (ftype))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unchecked?

@@ -3618,8 +3619,8 @@ is_async_method (MonoMethod *method)
/* Do less expensive checks first */
sig = mono_method_signature_internal (method);
if (attr_class && sig && ((sig->ret->type == MONO_TYPE_VOID) ||
(sig->ret->type == MONO_TYPE_CLASS && !strcmp (m_class_get_name (sig->ret->data.generic_class->container_class), "Task")) ||
(sig->ret->type == MONO_TYPE_GENERICINST && !strcmp (m_class_get_name (sig->ret->data.generic_class->container_class), "Task`1")))) {
(sig->ret->type == MONO_TYPE_CLASS && !strcmp (m_class_get_name (m_type_data_get_generic_class (sig->ret)->container_class), "Task")) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unchecked?

@kg
Copy link
Member Author

kg commented Jan 31, 2025

Gone through all changes, mainly commenting on more places that could potentially use unchecked instead of checked gets since we already have checks of type before making calls to the types get function.

At first, I was a little reluctant to split into checked/unchecked, because it could introduce bugs by using the unchecked in wrong places or refactoring that might falsify the use of unchecked in a piece of code. Just having one implementation with full checks enabled in debug/checked build running on all our internal testing and then having a zero cost implementation in release would have mitigated this issue, but if we believe we would like to keep some of the checks in release builds, then we would need to split into two functions like you have in this PR, but then I believe it will be crucial to make the unchecked version working as the checked version in debug/checked builds so we can capture errors using the unchecked version udner false assumptions.

Thanks for the detailed review!
I will make the change so that the unchecked versions are still checked in debug/checked builds.

I forgot to mention that for a few modules I decided not to optimize them with unchecked - sre, metadata, debugger-agent. It doesn't seem like they are performance critical enough to be worth the effort/risk. But since you already audited them (sorry!) I'll apply the changes there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants