Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance logging causes extra unnecessary queries to the database after a request has finished #550

Closed
bubaflub opened this issue Feb 9, 2020 · 7 comments

Comments

@bubaflub
Copy link

bubaflub commented Feb 9, 2020

At work we started noticing a huge increase in our Rails app's memory usage in production. The kernel would OOM-kill our ruby processes as they seemed to use more and more memory over time without bound.

Short version: I believe 6cb2bc9 is the cause and I believe there are at least two bugs here:

  1. The call to stringify the Hash payload is causing instance variables in Controllers that are ActiveRecord relations that haven't actually "instantiated" or "resolved" to query the database. This behavior is the root cause of the increase in memory usage for us. This behavior isn't present when I revert 6cb2bc9.
  2. I don't have any logging configuration and my understanding was that the default logging level was INFO and that the block passed to logger.debug would not execute. It appears that the block is eagerly executing regardless of the log level.

Long version:

After putting in some mitigations, configuration changes, code changes, and analyzing the slow query log for inefficient queries, we discovered that some of these expensive DB requests were being made after a response had already been rendered and served to the client. Furthermore, the expensive queries coming from the slow query log didn't appear to correspond to an incoming request. We were also seeing queries that didn't correspond to any direct uses in our app. But there would be the queries generated from ActiveRecord relations stored in Controller or Pundit policy instance variables. Our app would LIMIT, paginate, filter, or otherwise restrict the base query but our code would never call the expensive ActiveRecord query itself.

Further investigation demonstrated that reverting the Gems airbrake 10.0.1 && airbrake-ruby 4.13 to airbrake 10.0.0 && airbrake-ruby 4.12 solved the extra expensive queries and solved our memory issue problems. We've since deployed that revert everywhere in production and haven't seen any more memory issues.

I was able to reproduce this in my local development environment by enabling Airbrake and performance stats gathering. I'd hit a simple app endpoint that would render a JSON response and a few seconds later would see additional queries from both the Rails server logs and from the MySQL general log. The MySQL general log would show these queries coming seconds later on a completely separate connection than the queries to generate the original response. For example:

200209 14:26:50   265 Connect   root@localhost on redacted
...
                  265 Query     SELECT `redacted`.`id` FROM `redacted` WHERE `redacted`.`redacted` = 5595
                  265 Query     SHOW FULL FIELDS FROM `redacted`
                  265 Query     SELECT  `redacted`.* FROM `redacted` INNER JOIN `redacted` ON `redacted`.`redacted` = `redacted`.`id` WHERE `redacted`.`type` IN ('redacted') AND `redacted`.`redacted` = 5595 AND `redacted`.`redacted` = 5595 AND `redacted`.`redacted` IN ('redacted', 'redacted', 'redacted') LIMIT 1
200209 14:27:06   263 Query     SELECT `redacted`.* FROM `redacted` INNER JOIN `redacted` ON `redacted`.`redacted` = `redacted`.`id` WHERE `redacted`.`type` IN ('redacted') AND `redacted`.`redacted` = 5595 AND `redacted`.`redacted` = 5595

The rails server log showed the same thing -- queries coming after the response had rendered:

[85e857fc-5efa-48e0-84a1-ec8b9b1d3847] [u:5108] Started GET "/redacted/redacted/redacted/redacted.json" for ::1 at 2020-02-09 15:02:10 -0600
[85e857fc-5efa-48e0-84a1-ec8b9b1d3847] [u:5108] Processing by Redacted::RedactedController#redacted as JSON
[85e857fc-5efa-48e0-84a1-ec8b9b1d3847] [u:5108]   Parameters: {"redacted"=>"redacted"}
[85e857fc-5efa-48e0-84a1-ec8b9b1d3847] [u:5108]     Redacted Load (0.3ms) SELECT  `redacted`.* FROM `redacted` WHERE `redacted`.`active` = 1 AND `redacted`.`id` = 5108 LIMIT 1
...
[85e857fc-5efa-48e0-84a1-ec8b9b1d3847] [u:5108] Completed 200 OK in 220ms (Views: 0.2ms | ActiveRecord: 34.2ms)
  Redacted Load (0.8ms) SELECT `redacted`.* FROM `redacted` INNER JOIN `redacted` ON `redacted`.`redacted` = `redacted`.`id` WHERE `redacted`.`redacted` IN ('redacted') AND `redacted`.`redacted` = 5595 AND `redacted`.`redacted` = 5595
  Redacted Load (0.9ms)  SELECT `redacted`.* FROM `redacted` INNER JOIN `redacted` ON `redacted`.`redacted` = `redacted`.`id` WHERE `redacted`.`redacted` IN ('redacted') AND `redacted`.`redacted` = 5595
  Redacted Load (0.8ms) SELECT `redacted`.* FROM `redacted` INNER JOIN `redacted` ON `redacted`.`redacted` = `redacted`.`id` WHERE `redacted`.`redacted` IN ('redacted') AND `redacted`.`redacted` = 5595 AND `redacted`.`redacted` = 5595

We have configuration to attach a UUID to the Rails logger for each request. What made debugging this very difficult is that there is no UUID attached to these logs, the queries trigger after a request-response pair, and Airbrake does not report the queries. In my development environment these queries are fast since I've emptied my tables but in production these were killing performance and eating tons of memory.

Walking through the airbrake and airbrake-ruby code I determined this was coming from Airbrake's ActionControllerPerformanceBreakdownSubscriber. Since performance stats are enqueued after a response is complete and sent asynchronously this explains the behavior we were seeing -- the DB connections would sometimes come from another Ruby thread and they were not directly associated with any incoming request, so no UUID.

Delving a bit further, I found that this was coming from airbrake-ruby and eventually from lib/airbrake-ruby/performance_notifier.rb's private method send.

    def send(sender, payload, promise)
      signature = "#{self.class.name}##{__method__}"
      raise "#{signature}: payload (#{payload}) cannot be empty. Race?" if payload.none?

      logger.debug { "#{LOG_LABEL} #{signature}: #{payload}" }
      ...
    end

By the time we're in this method payload is a Hash that contains references to the Controller objects that handled the request which themselves contain references to the instance variables that were set during the request.

I confirmed that by commenting out the logger.debug line I no longer saw the extra DB queries trigger. I also don't see the extra DB queries by completely reverting 6cb2bc9 we only log out resource_hash. That explains why reverting to airbrake 10.0.0 / airbrake-ruby 4.12 fixed our memory issues.

While I was debugging here I believe it is the act of string-ifying payload that causes the ActiveRecord relation objects to "resolve" and query the database.

I was surprised that the block given to logger.debug was triggering because the whole purpose of the block is to defer the expensive computation or avoid it entirely if our debug log level wouldn't print it out.

So I think there are at least two bugs here --

  1. The logger.debug call is still executing the code in the block.
  2. Printing the payload hash is incidentally causing other ActiveRecord relation objects to resolve.

(Note: we've also seen for this first time a can't add a new key into hash during iteration reported in #545. I suspect this might be related if in a separate thread the performance stats notifier is trying to iterate over objects in a Controller that might be handling other requests simultaneously.)

More notes:

In my example payload has 11 keys, the first 9 of which are Airbrake::Query, the tenth is an Airbrake::Request, and the last is an Airbrake::PerformanceBreakdown. I can verify that it's string-ifying this last one that is causing the queries to trigger.

None of the fields marked attr_accessor on Airbrake::PerformanceBreakdown appear to be problematic. It's the @stash instance variable that holds a reference to both the request and the user. The user hash is only a single level and not a problem. So it's coming from request which is an ActionDispatch::Request. The request has an @env which has a key action_controller.instance which has references to every instance variable on the controller. Trying to turn these instance variables in to strings causes the ActiveRecord relation to "resolve" and query the DB.

In my opinion I don't think we should be logging this at all -- the request has references to the top-level Rails.application, every piece of middleware, and all configuration settings. I don't think this will be too useful during debugging and could potentially trigger other ActiveRecord queries to resolve.

@bubaflub
Copy link
Author

bubaflub commented Feb 9, 2020

I believe the error where DEBUG logs are still being evaluated is coming from

@instance ||= ::Logger.new(File::NULL)

Since I don't have an Airbrake logger configured I'm getting this default logger that should throw away all logs. The README says that the default log level is WARN but that's not configured anywhere that I can find.

Created a new standard library Logger with just one argument defaults to log-level DEBUG:

=> #<Logger:0x00007faac42cdca8
 @default_formatter=#<Logger::Formatter:0x00007faac42cdc08 @datetime_format=nil>,
 @formatter=nil,
 @level=0,
 @logdev=
  #<Logger::LogDevice:0x00007faac42cdac8
   @dev=#<File:/dev/null>,
   @filename="/dev/null",
   @mon_count=0,
   @mon_mutex=#<Thread::Mutex:0x00007faac42cda00>,
   @mon_owner=nil,
   @shift_age=0,
   @shift_period_suffix="%Y%m%d",
   @shift_size=1048576>,
 @progname=nil>

Note: @level=0 is DEBUG:

[4] pry(main)> Logger::DEBUG
=> 0

This matches the docs for Ruby 2.4, 2.5, 2.6, and 2.7:

level

Logging severity threshold. Default values is Logger::DEBUG.

To get a Logger that defaults to WARN we really should pass a second parameter:

[6] pry(main)> l = ::Logger.new(File::NULL, level: Logger::WARN)
=> #<Logger:0x00007faacf923c00
 @default_formatter=#<Logger::Formatter:0x00007faacf9239f8 @datetime_format=nil>,
 @formatter=nil,
 @level=2,
 @logdev=
  #<Logger::LogDevice:0x00007faacf9239a8
   @dev=#<File:/dev/null>,
   @filename="/dev/null",
   @mon_count=0,
   @mon_mutex=#<Thread::Mutex:0x00007faacf923958>,
   @mon_owner=nil,
   @shift_age=0,
   @shift_period_suffix="%Y%m%d",
   @shift_size=1048576>,
 @progname=nil>

@kyrylo
Copy link
Contributor

kyrylo commented Feb 10, 2020

Quick response: are you running airbrake-ruby master? The commit you mentioned (6cb2bc9) hasn't been released yet.

kyrylo added a commit that referenced this issue Feb 10, 2020
Issue #550 mentions that README states that our default logger is `Logger::WARN`,
however the code never sets that level. The default value is `Logger::DEBUG`,
which is unwanted, since it generates a lot of clutter.
kyrylo added a commit that referenced this issue Feb 10, 2020
Issue #550 mentions that README states that our default logger is `Logger::WARN`,
however the code never sets that level. The default value is `Logger::DEBUG`,
which is unwanted, since it generates a lot of clutter.
kyrylo added a commit that referenced this issue Feb 10, 2020
Issue #550 mentions that README states that our default logger is `Logger::WARN`,
however the code never sets that level. The default value is `Logger::DEBUG`,
which is unwanted, since it generates a lot of clutter.
@kyrylo
Copy link
Contributor

kyrylo commented Feb 10, 2020

[...] the default logging level was INFO and that the block passed to logger.debug would not execute. It appears that the block is eagerly executing regardless of the log level.

Yeah, that was a bug/mishap. I dealt with it in #551, thanks!

As for the other issue, something doesn't sound right to me. As mentioned in #550 (comment) commit 6cb2bc9 is not a part of any release yet. My assumption is that you are running airbrake-ruby from master and I don't recommend that (although thank you for catching a bug early on 😁).

Next, you say that we print payload.

1

This is not true. On master We print resource_hash, which is free of any references to controller objects or whatnot:

def send(sender, payload, promise)
signature = "#{self.class.name}##{__method__}"
raise "#{signature}: payload (#{payload}) cannot be empty. Race?" if payload.none?
with_grouped_payload(payload) do |resource_hash, destination|
url = URI.join(
@config.host,
"api/v5/projects/#{@config.project_id}/#{destination}",
)
logger.debug { "#{LOG_LABEL} #{signature}: #{resource_hash}" }
sender.send(resource_hash, promise, url)
end

In airbrake-ruby v4.13.0 (and v4.12.0 as well) we do print payload, which is probably causing these memory issues for your app (but why only now, that line has been there since the birth of PerformanceNotifier?):

def send(sender, payload, promise)
signature = "#{self.class.name}##{__method__}"
raise "#{signature}: payload (#{payload}) cannot be empty. Race?" if payload.none?
logger.debug { "#{LOG_LABEL} #{signature}: #{payload}" }

Please clarify this bug report. As of now I see a few major inconsistencies, which complicate understanding of the issue. My current understanding is that there's a bug present in the current release, which was resolved on master (accidentally), but you seem to use master airbrake-ruby, so you shouldn't see this issue. 🤯

@yitznewton
Copy link

We also noticed a catastrophic increase in resource consumption recently when updating the airbrake gem (and by extension this one). We have not yet delved into the root cause.

@bubaflub
Copy link
Author

bubaflub commented Feb 10, 2020

Thank you for the quick response!

And sorry for the confusion on my part -- I got myself turned around in git history. I can see that the git tag for v4.13.0 is at 628c77e which won't include the later commit 6cb2bc9. I was wrong about that.

What I should have said was that in my local testing applying the later commit 6cb2bc9 solves the issue because it only logs resource_hash instead of the entire payload. And I believe the changes in #551 would also solve the issue we were seeing since we wouldn't execute the block passed to the debug log.

Reverting from airbrake 10.0.1 -> 10.0.0 (and thus from airbrake-ruby 4.13 -> 4.12) appears to have fixed our resource consumption in production. Though I am now unsure why this fixes our problem since, as you said, that debug-level log message is present in both 4.12 and 4.13.

@bubaflub
Copy link
Author

Ok, I think I understand now why we weren't seeing the extra queries and resource usage in 4.12 but we were in 4.13 even though the debug logging was always there. In f2726a6 all of the Structs (including PerformanceBreakdown) were changed to Classes with attr_accessor.

In 4.12 with a Struct an Airbrake::PerformanceBreakdown string-ifies as:

(byebug) payload.keys[-1].inspect
"#<struct Airbrake::PerformanceBreakdown method=\"GET\", route=\"/redacted(.:format)\", response_type=:json, groups={:db=>30.64400000000001, :view=>0.1960000954568386}, start_time=2020-02-10 10:27:45 -0600, end_time=2020-02-10 10:27:46 -0600, timing=222.91, time=2020-02-10 10:27:45 -0600>"

In 4.13 with a Class an Airbrake::PerformanceBreakdown string-ifies as:

#<Airbrake::PerformanceBreakdown:0x00007f9270665e90 @time_utc="2020-02-10T16:35:00+00:00", @method="GET", @route="/redacted(.:format)", @response_type=:json, @groups={:db=>32.33, :view=>0.2079999540001154}, @start_time=2020-02-10 10:35:02 -0600, @end_time=2020-02-10 10:35:03 -0600, @timing=242.673, @time=2020-02-10 10:35:02 -0600, @stash={:request=>#<ActionDispatch::Request:0x00007f9279b4a740 @env={...}

And from @env we'll recursively print everything including action_controller.instance which is a reference to the Controller that serviced the request and contains references to all instance variables. Trying to print out the instance variable values as a string will cause the ActiveRecord relations to trigger and query the DB.

I don't think f2726a6 is wrong by any means. It just had an unintended side-effect when combined with the default log level being DEBUG.

@kyrylo
Copy link
Contributor

kyrylo commented Feb 11, 2020

Thanks for the extra details and pinning down this bug. It makes sense that the class conversion is the culprit. It was hard to predict, though, as it seemed to be a harmless change.

In theory we could define a custom inspect on model classes such as PerformanceBreakdown but since we've already applied 6cb2bc9, there's no real need to do that anymore.

I am going to close this issue and release a new version of the library.

@kyrylo kyrylo closed this as completed Feb 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants