-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SqlServer: Massive performance degradation in PerformanceCounters Query #7236
Comments
@denzilribeiro Can you take a look? Is it possible statistics need rebuilt? |
What version are you seeing the degradation against? A recompile each time does eat CPU - seems like you got stuck with a bad plan. |
From what I read you cannot have statistics for table variables (which this script uses). Therefore the query optimizer always assumes an estimated row count of "1" for building the plan. I am not sure why adding the recompile option changes things then, but the difference is massive as described above. This is happening with the most recent version and can be reproduced over different SQL servers. Impact seems to increase significantly when having many databases/resource groups, maybe therefore it was not discovered earlier. |
In compat 150 some of that is changed ( later/latest versions and compat) part of reason also wasn't caught - and yes realized there was a table variable now. All queries had option recompile and was overkill, when removing it an masse, didn't remember this one had an Table variable sorry. Table variable deferred compilation I will submit a PR tomorrow to fix. |
If you could actual send me the Plan ( run the query and get an actual plan) would be great |
Thanks for the explanation, makes sense. I think for now I'll be sticking with my custom patched telegraf binary as I dont have SQL 2019 running yet. I've just invited you to a private repo that contains an execution plan from an affected system. That particular system has ~150 databases, ~20 resource pools, ~50 workload groups. The PerformanceCounter query returns around 2300 rows and takes 10 seconds query time without option(recompile). |
I can confirm this is now working as intended with the most recent release (1.14.2) |
Relevant telegraf.conf:
System info:
Steps to reproduce:
Expected behavior:
Similar performance to the telegraf version we used previously (from around mid 2019): in the ballpark on 500-800ms query duration for a busy server
Actual behavior:
Query durations between 60-90 seconds on the same busy server.
Additional info:
Running the old/new versions of the PerformanceCounters query manually shows the same behaviour: old query sub-second duration, new query 60+ seconds.
I've done some troubleshooting and it seems that the issue is due to a removed
OPTION (RECOMPILE)
query hint in the following commit: 1df88dd#diff-5359fe28fcc15fe8b504f2d8033ce59dL662Adding it back leads to similar execution times than before. Some quick research leads me to believe that the execution plan is optimized very poorly without that option as the row estimates are vastly off.
I wonder why it was removed and if there is any reason to not add it back.
The text was updated successfully, but these errors were encountered: