Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

winapisupp.cpp: Dramatically shrink the function pointer table #2841

Merged
merged 15 commits into from
Jul 12, 2022

Conversation

StephanTLavavej
Copy link
Member

Now that we've merged #2317, there are only a couple of functions we're dynamically loading.

  • Dramatically shrink the __KERNEL32Functions table.
    • The only direct uses of __KERNEL32Functions are in STOREFUNCTIONPOINTER and DYNAMICGETCACHEDFUNCTION.
    • The only use of DYNAMICGETCACHEDFUNCTION is in IFDYNAMICGETCACHEDFUNCTION.
    • There are only two uses of IFDYNAMICGETCACHEDFUNCTION, loading GetCurrentPackageId and GetSystemTimePreciseAsFileTime.
    • Therefore, all other table entries are unused.
    • The comment // This enum should not change was introduced by Remove XP (and Server 2003) support from STL #1194, but we were being overly cautious. The enum and table aren't mentioned outside of stl/src. Also, while the table was extern "C", it wasn't exported. Therefore, they're purely internal to the separately compiled STL.
  • De Morgan the guards around GetCurrentPackageId.
  • Move __KERNEL32Functions machinery to winapisupp.cpp.
    • It was within _CRT_BEGIN_C_HEADER / _CRT_END_C_HEADER.
  • Guard __KERNEL32Functions machinery with #if !defined(_ONECORE). Fuse declaration and definition.
    • The definition and initialization of __KERNEL32Functions were effectively guarded by #if !defined(_ONECORE), so we should extend that guard to all of the machinery.
    • When fusing the declaration and definition, we're within _CRT_BEGIN_C_HEADER so we don't need extern "C". We also don't need extern (the definition already has external linkage, and other TUs don't need this anyways).
    • For style, initialize with {} instead of = {nullptr}. They have the same effect here, but {} emphasizes that we want to value-initialize the entire table.
  • Use an unnamed namespace instead of extern "C" for the __KERNEL32Functions machinery.
    • We don't need this in other TUs. The unnamed namespace emphasizes that it's purely internal.
  • Cache GetSystemTimePreciseAsFileTime only when we need it.
    • Now we need to guard against eMaxKernel32Function being 0.
  • Fix return type in PFNGETCURRENTPACKAGEID typedef.
  • Fuse DYNAMICGETCACHEDFUNCTION into IFDYNAMICGETCACHEDFUNCTION.
    • Update comment.
    • Style: Directly test the function pointer.
  • Always use decltype in IFDYNAMICGETCACHEDFUNCTION.
    • We no longer need PFNGETSYSTEMTIMEPRECISEASFILETIME.
    • Also, always emit pf##function_name.
  • Use C++17 if-statement with initializer to improve scoping.
  • Avoid emitting empty statements after STOREFUNCTIONPOINTER.
  • winapisupp.cpp isn't using <cstdint> at all.
  • Simplify __crt_IsPackagedAppHelper() with range-for and auto.
  • awint.hpp doesn't need to declare __crtIsPackagedApp().
    • We still need to export it for bincompat.
  • We can always use decltype for GetCurrentPackageId.
    • Verified with an internal build.

This shrinks the STL's DLL by a couple of KB on x64:

File Size (bytes) Before After Difference
msvcp140_oss.dll 551,936 549,888 2,048
msvcp140d_oss.dll 914,432 911,872 2,560
libcpmt.lib 35,669,646 35,622,078 47,568
libcpmtd.lib 36,966,642 36,926,400 40,242

The only direct uses of `__KERNEL32Functions` are in `STOREFUNCTIONPOINTER` and `DYNAMICGETCACHEDFUNCTION`.

The only use of `DYNAMICGETCACHEDFUNCTION` is in `IFDYNAMICGETCACHEDFUNCTION`.

There are only two uses of `IFDYNAMICGETCACHEDFUNCTION`, loading `GetCurrentPackageId` and `GetSystemTimePreciseAsFileTime`.

Therefore, all other table entries are unused.

The comment `// This enum should not change` was introduced by GH 1194, but we were being overly cautious. The enum and table aren't mentioned outside of `stl/src`. Also, while the table was `extern "C"`, it wasn't exported. Therefore, they're purely internal to the separately compiled STL.
It was within `_CRT_BEGIN_C_HEADER` / `_CRT_END_C_HEADER`.
…Fuse declaration and definition.

The definition and initialization of `__KERNEL32Functions` were effectively guarded by `#if !defined(_ONECORE)`, so we should extend that guard to all of the machinery.

When fusing the declaration and definition, we're within `_CRT_BEGIN_C_HEADER` so we don't need `extern "C"`. We also don't need `extern` (the definition already has external linkage, and other TUs don't need this anyways).

For style, initialize with `{}` instead of ` = {nullptr}`. They have the same effect here, but `{}` emphasizes that we want to value-initialize the entire table.
…unctions` machinery.

We don't need this in other TUs. The unnamed namespace emphasizes that it's purely internal.
Now we need to guard against `eMaxKernel32Function` being `0`.
Update comment.

Style: Directly test the function pointer.
We no longer need `PFNGETSYSTEMTIMEPRECISEASFILETIME`.

Also, always emit `pf##function_name`.
We still need to export it for bincompat.
@StephanTLavavej StephanTLavavej added the performance Must go faster label Jul 2, 2022
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner July 2, 2022 05:19
@StephanTLavavej StephanTLavavej added the affects redist Results in changes to separately compiled bits label Jul 2, 2022
@AlexGuteniev
Copy link
Contributor

AlexGuteniev commented Jul 2, 2022

In the past there were calls from other TUs, that's why macros and extern "C". I'm not sure ifvthis won't be neeeded if we'll expand the table again.

@StephanTLavavej
Copy link
Member Author

Going forward:

  • The table and its associated machinery can remain in winapisupp.cpp's unnamed namespace.
  • Functions using the table via IFDYNAMICGETCACHEDFUNCTION can be centrally defined in winapisupp.cpp.
  • Such functions can be declared in awint.hpp for other TUs to use.
    • This is what we did for __crtGetSystemTimePreciseAsFileTime(), except that we unnecessarily marked it _CRTIMP2 (i.e. exported), even though only stl/src/xtime.cpp needs it.

I expect that this will be uncommon, since it's rare for us to need new WinAPIs in the STL's DLL (since we can't add new exports to it); #2302 may be an example that needs it.

Note that because the table wasn't exported, it is useless to anything outside of the STL's DLL (in particular, it cannot be accessed by the import lib), so isolating it to one TU in stl/src isn't restricting its power. This is why stacktrace.cpp in the import lib needed to call GetProcAddress() directly, and so did tzdb.cpp in the atomic_wait satellite DLL.

@StephanTLavavej
Copy link
Member Author

I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

Copy link
Member

@barcharcraz barcharcraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with suggestions.

Comment on lines +21 to +30
enum wrapKERNEL32Functions {
#if !defined(_CRT_WINDOWS) && !defined(UNDOCKED_WINDOWS_UCRT)
eGetCurrentPackageId,
#endif // !defined(_CRT_WINDOWS) && !defined(UNDOCKED_WINDOWS_UCRT)

#if _STL_WIN32_WINNT < _WIN32_WINNT_WIN8
eGetSystemTimePreciseAsFileTime,
#endif // _STL_WIN32_WINNT < _WIN32_WINNT_WIN8

eMaxKernel32Function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it strikes me as not the best idea to preprocess out enumerators like this without explicitly assigning values. If this is ever passed across ABI boundaries (it does not appear to be now) it might blow up pretty immediately

Pre-existing though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, pre-existing. I believe that the risk is low (as you mentioned, this table and its enumerators aren't part of the ABI surface area - also, the conditions are inherently of the form "how is the STL being built"), and it's further mitigated by this PR, as the table and its enumerators are now confined to an unnamed namespace in a single TU.

That said, it now seems to me that the case for having a table is weaker than ever. We never iterate over the table, so we could simply use separate function pointers (which the compiler could still store densely). That would avoid any concerns about varying enumerator values, and actually make debugging potentially easier. I may explore this in a followup PR.

Comment on lines +36 to +42
#define STOREFUNCTIONPOINTER(instance, function_name) \
__KERNEL32Functions[e##function_name] = reinterpret_cast<PVOID>(GetProcAddress(instance, #function_name))

// Use this macro for retrieving a cached function pointer from a DLL
#define IFDYNAMICGETCACHEDFUNCTION(name) \
if (const auto pf##name = reinterpret_cast<decltype(&name)>(__KERNEL32Functions[e##name]); pf##name)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we are only doing this a few times it might be easier to just expand these macros.

If we're going to use macros for this we could also use xmacros to define the above enumeration and the below initialization function at the same time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - the primary value of the macros is now enforcing the consistent use of decltype, so I would hesitate to eliminate them completely.

X-macros are definitely a potential technique, although they are quite powerful/opaque. As there are a small number of locations (definition, initialization, usage) and they're slowly-changing, I tend to think that we don't need such machinery at this time - especially now that the number of enumerators is so much smaller.

I'll think about refactoring this in a followup PR. I think it might be feasible to have one macro for definition (using decltype for individual variables) and one for initialization (ensuring that the GetProcAddress uses the correct spelling), and then directly refer to the variable for usage.

Thanks for bringing this up!

@StephanTLavavej StephanTLavavej merged commit 658a04b into microsoft:main Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects redist Results in changes to separately compiled bits performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants