Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve authority_checker's performance: don't use exceptions as flow control for non-existing permissions #2117

Merged
merged 2 commits into from
Jan 23, 2024

Conversation

spoonincode
Copy link
Member

Periodically I've seen significant overhead handling exceptions in nodeos while it's live on EOS. It can even approach nearly 10% of main thread time in some of the worst cases. And these exceptions come about even when just applying blocks -- not processing speculative transactions.

What appears to be happening is contracts performing a send_inline causing the authority_checker to in some situations generate a fury of exceptions for missing eosio.code as it counts up weights.

This refactors the authority_checker to not lean on an exception to indicate a non-existing permission, a nullptr is returned indicating a non-existing permission instead.

@@ -225,19 +225,18 @@ namespace detail {
bool r = false;
typename permission_cache_type::iterator itr = cached_permissions.end();

bool propagate_error = false;
std::invoke_result_t<decltype(checker.permission_to_authority), const permission_level> auth = nullptr;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty weird; it comes about because permission_to_authority could return either a shared_authority or an authority (well, now either a shared_authority* or authority*). I want the scope to be outside the try to make handling the null check more straightforward so I couldn't use auto in this particular pattern.


auto res = cached_permissions.emplace( permission.permission, being_evaluated );
itr = res.first;
r = checker.satisfied( std::forward<decltype(*auth)>(*auth), cached_permissions, recursion_depth + 1 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the std::forward does anything with auth declared as it is now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 indeed

Comment on lines +234 to +235
if(!auth)
return total_weight;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you just have added these two lines after line 230 in the original version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original 230, permission_to_authority calls get_permission() which throws when it can't find the permission. So, there is no execution after 230 until the catch (which then does the return total_weight if propagate_error is false).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then, how does this PR reduce the overhead of handling exceptions, if they still occur and we catch them as before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code was using get_permission(),

const permission_object& authorization_manager::get_permission( const permission_level& level )const
{ try {
EOS_ASSERT( !level.actor.empty() && !level.permission.empty(), invalid_permission, "Invalid permission" );
return _db.get<permission_object, by_owner>( boost::make_tuple(level.actor,level.permission) );
} EOS_RETHROW_EXCEPTIONS( chain::permission_query_exception, "Failed to retrieve permission: ${level}", ("level", level) ) }

This throws on either an invalid permission or when chainbase's get() throws due to it not existing.

The new code uses find_permission(),

const permission_object* authorization_manager::find_permission( const permission_level& level )const
{ try {
EOS_ASSERT( !level.actor.empty() && !level.permission.empty(), invalid_permission, "Invalid permission" );
return _db.find<permission_object, by_owner>( boost::make_tuple(level.actor,level.permission) );
} EOS_RETHROW_EXCEPTIONS( chain::permission_query_exception, "Failed to retrieve permission: ${level}", ("level", level) ) }

This only throws on an invalid permission -- chainbase's find() just returns a nullptr if it can't find the permission.

That's why the new code still handles a permission_query_exception,

std::invoke_result_t<decltype(checker.permission_to_authority), const permission_level> auth = nullptr;
try {
auth = checker.permission_to_authority( permission.permission );
}
catch( const permission_query_exception& ) {}

permission_to_authority() can still throw via find_permission()'s "Invalid permission" check. I'm not actually sure under what conditions that comes in to play (I've not seen it), but of course any refactor here needs to be exact 1:1 behavior unless we do a significant audit and the previous behavior was to treat that identically to a failed lookup (return total_weight).

It does appear like find_permission() maybe could be refactored to be nothrow by also returning a nullptr when an invalid permission is used too. But again, we'd have to investigate all uses of find_permission() very closely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering about the benefit of moving lines 234 to 239 out of the try block, which requires the awkward declaration of auth on line 228. My apologies if I'm still missing something obvious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering about the benefit of moving lines 234 to 239 out of the try block, which requires the awkward declaration of auth on line 228. My apologies if I'm still missing something obvious.

Never mind, got it, thanks Matt for the thorough explanation!

@ericpassmore
Copy link
Contributor

Note:start
group: PERFORMANCE
category: CONTRACTS
summary: Improve overall performance of authority_checker by removing exceptions. At times up to 10% of time on main thread spent dealing with exceptions. Most frequent exceptions are from authority_checker.
Note: end

@spoonincode spoonincode merged commit cf09e01 into main Jan 23, 2024
26 checks passed
@spoonincode spoonincode deleted the no_except_missing_perm branch January 23, 2024 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants