Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rack Integration Raises an Error During File Upload #62

Closed
mstruve opened this issue Jan 13, 2020 · 6 comments · Fixed by #73
Closed

Rack Integration Raises an Error During File Upload #62

mstruve opened this issue Jan 13, 2020 · 6 comments · Fixed by #73

Comments

@mstruve
Copy link

mstruve commented Jan 13, 2020

I recently noticed that we have had a few errors raised in our production environment from the beeline gem when a user tries to upload a file. They seem pretty deep in the rack implementation so I figured I would share here to see if you all had any suggestions for a fix.

Here is the error and backtrace:

NoMethodError: undefined method `empty?' for #<ActionDispatch::Http::UploadedFile:0x00000000085b38a0>
 rack.rb  31 block (2 levels) in call(...)
[GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rack.rb:31:in `block (2 levels) in call'
29       client.start_span(name: "http_request", serialized_trace: hny) do |span|
30         add_field = lambda do |key, value|
31           next unless value && !value.empty?
32 
33           span.add_field(key, value)

 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rails.rb:14 :in `tap`
 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rails.rb:14 :in `add_package_information`
 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rack.rb:40 :in `block in call`
 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/client.rb:54 :in `start_span`
 [GEM_ROOT]/gems/honeycomb-beeline-1.3.0/lib/honeycomb/integrations/rack.rb:29 :in `call`
 [GEM_ROOT]/gems/actionpack-5.2.4.1/lib/action_dispatch/middleware/remote_ip.rb:81 :in `call`
 [GEM_ROOT]/gems/request_store-1.4.1/lib/request_store/middleware.rb:19 :in `call`
 [GEM_ROOT]/gems/rack-timeout-0.6.0/lib/rack/timeout/core.rb:151 :in `block in call`
 [GEM_ROOT]/gems/rack-timeout-0.6.0/lib/rack/timeout/support/timeout.rb:19 :in `timeout`
 [GEM_ROOT]/gems/rack-timeout-0.6.0/lib/rack/timeout/core.rb:150 :in `call`
 [GEM_ROOT]/gems/actionpack-5.2.4.1/lib/action_dispatch/middleware/request_id.rb:27 :in `call`
 [GEM_ROOT]/gems/rack-2.0.8/lib/rack/method_override.rb:22 :in `call`
 [GEM_ROOT]/gems/rack-2.0.8/lib/rack/runtime.rb:22 :in `call`
 [GEM_ROOT]/gems/activesupport-5.2.4.1/lib/active_support/cache/strategy/local_cache_middleware.rb:29 :in `call`
 [GEM_ROOT]/gems/actionpack-5.2.4.1/lib/action_dispatch/middleware/executor.rb:14 :in `call`
 [GEM_ROOT]/gems/actionpack-5.2.4.1/lib/action_dispatch/middleware/static.rb:127 :in `call`
 [GEM_ROOT]/gems/rack-2.0.8/lib/rack/sendfile.rb:111 :in `call`
 [GEM_ROOT]/gems/ddtrace-0.30.1/lib/ddtrace/contrib/rack/middlewares.rb:85 :in `call`
 [GEM_ROOT]/gems/railties-5.2.4.1/lib/rails/engine.rb:524 :in `call`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/configuration.rb:228 :in `call`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/server.rb:681 :in `handle_request`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/server.rb:472 :in `process_client`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/server.rb:328 :in `block in run`
[GEM_ROOT]/gems/puma-4.3.1/lib/puma/thread_pool.rb:134 :in `block in spawn_thread`

Our Honeycomb setup is very basic and we just use the default suggested tracking for Rails at this point.

Thanks in advance for the help and if there is anything else you need from me let me know!

@DanielHeath
Copy link

You should use blank? instead of empty? - blank? is defined on everything in rails.

@ajvondrak
Copy link
Contributor

While the surface cause of the error is Honeycomb invoking #empty? on an ActionDispatch::Http::UploadedFile instance, I was confused about how this error could be triggered in the first place.

Honeycomb::Rails#add_package_information only adds (up to) 3 fields request.controller, request.action, and request.route. And of those, only 2 are taken from the request parameters where you might find the UploadedFile. Specifically, Honeycomb uses the :action and :controller routing parameters:

yield "request.controller", request.params[:controller]
yield "request.action", request.params[:action]

What I'm guessing is that you have a form field somewhere named action or controller, which clobbers the Rails routing parameters, causing the file upload to blow up the Honeycomb instrumentation.

Does this sound plausible? Because otherwise I can't figure out how your parameter is getting into the beeline's span in the first place. A cursory search of the dev.to source shows this partial, which looks like it might be generating the parameter named based on dynamic data (?):

  <%= f.file_field field.attribute %>

But that's just me taking a shot in the dark. The term "action" seems to crop up in your source code a lot, too; I don't know if there are any associated file uploads with that parameter name.


As for a solution, the beeline could technically handle unexpected types better. Something simple like

diff --git a/lib/honeycomb/integrations/rack.rb b/lib/honeycomb/integrations/rack.rb
index f3808d4..19b5188 100644
--- a/lib/honeycomb/integrations/rack.rb
+++ b/lib/honeycomb/integrations/rack.rb
@@ -28,9 +28,7 @@ module Honeycomb
       hny = env["HTTP_X_HONEYCOMB_TRACE"]
       client.start_span(name: "http_request", serialized_trace: hny) do |span|
         add_field = lambda do |key, value|
-          next unless value && !value.empty?
-
-          span.add_field(key, value)
+          span.add_field(key, value) if valuable?(value)
         end
 
         extract_fields(env, RACK_FIELDS, &add_field)
@@ -58,6 +56,14 @@ module Honeycomb
       end
     end
 
+    def valuable?(value)
+      if value.respond_to?(:empty?)
+        !value.empty?
+      else
+        !value.nil?
+      end
+    end
+
     # Rack middleware
     class Middleware
       include Rack

But if this is the error you're getting, I'm afraid you'd just get nonsensical strings like #<ActionDispatch::Http::UploadedFile:0x00000000085b38a0> in your request.action field, which doesn't truly solve the problem. Stops hard errors from happening, though, so still potentially worthwhile.

Alternatively, I wonder if there's a way to get at Rails routing information without having to look through the fraught request parameters. 🤔 If so, the beeline should almost certainly prefer to use that method (if my theory is right that action/controller can be clobbered).

@mstruve
Copy link
Author

mstruve commented Feb 19, 2020

Good catch! That information about how you parse the controller actions is SUPER HELPFUL!

I went back and looked at the Honeybadger involving the error and sure enough, the action param is getting overridden. I am going to dig further into this and figure out if we can prevent that overwritten action from happening.

{
  "action" => "#<ActionDispatch::Http::UploadedFile>",
  "submitter" => "#<ActionDispatch::Http::UploadedFile>",
  "overwriteexistingtheme" => "#<ActionDispatch::Http::UploadedFile>",
  "page" => "xyz"
}

I do think preventing the hard errors from rising up and better handling the unexpected types would probably be a good idea. Then people aren't being thrown 500s but rather seeing the issue in unexpected data rather than unexpected errors. At least that is how I would prefer it. It might also be useful to log a warning or something if you run into an action that is unexpected to help people debug.

@ajvondrak
Copy link
Contributor

The plot thickens... 🤔

I think a way we could fix this would be to use ActionDispatch::Request#path_parameters instead of #params, since that method seems to contain the subset of information we're looking for in the beeline (without being touched by the get/post parameters).

However, trying to write a failing unit test to the effect, I can't actually reproduce the issue! 😱 Just trying something simple like this passes no problem:

diff --git a/spec/honeycomb/integrations/rails_spec.rb b/spec/honeycomb/integrations/rails_spec.rb
index 9a0ffc0..52a3913 100644
--- a/spec/honeycomb/integrations/rails_spec.rb
+++ b/spec/honeycomb/integrations/rails_spec.rb
@@ -37,6 +37,7 @@ if defined?(Honeycomb::Rails)
 
         app.routes.draw do
           get "/hello/:name" => "test#hello"
+          post "/goodbye" => "test#goodbye"
         end
       end
     end
@@ -45,6 +46,10 @@ if defined?(Honeycomb::Rails)
       def hello
         render plain: "Hello World!"
       end
+
+      def goodbye
+        render plain: "Goodbye World!"
+      end
     end
 
     describe "a standard request" do
@@ -68,6 +73,15 @@ if defined?(Honeycomb::Rails)
       it_behaves_like "event data"
     end
 
+    describe "a request that clobbers routing parameters" do
+      before { post "/goodbye", action: "wave" }
+
+      it "still sets the request.action field correctly" do
+        event = libhoney_client.events.last
+        expect(event.data["request.action"]).to eq "goodbye" # not "wave"
+      end
+    end
+
     if VERSION >= Gem::Version.new("5")
       describe "a bad request" do
         before do

It seems that ActionDispatch::Request#params should always be overwriting the :action + :controller values with the routing params, specifically due to

        params = begin
                   request_parameters.merge(query_parameters)
                 rescue EOFError
                   query_parameters.dup
                 end
        params.merge!(path_parameters)

The dev.to code is complicated enough that I'm not really sure how to track down who's overriding the action parameter - and how such a thing is possible! Maybe some middleware from some gem is mangling the action_dispatch.request.parameters variable in the Rack env? Because then Rails would just return whatever value is cached in there.

I'm not sure if this would happen any way other than a rogue middleware, because the beeline is even constructing a brand-spanking-new ActionDispatch::Request to work with, so any mangling that your controller does (e.g., with a before_action) would be performed on a separate Request instance. Though I guess maybe they'd still share the params via the hash stashed in the Rack env...blargh. 😵

I look forward to you tracking down how this is happening, since it doesn't seem like Rails should be letting request.params[:action] be clobbered in the first place. 😅 If we can get a test that reproduces this reliably, then we would be able to see if various fixes work (e.g., using #path_parameters). Or if this is due to some gem mangling the routing parameters, maybe a better idea would be to fix the behavior with that gem (while of course preventing hard errors inside the beeline).

@mstruve
Copy link
Author

mstruve commented Feb 20, 2020

After some investigation, I am pretty sure this request is not originating from our Rails app. From looking at the Honeybadger, there is no referrer for the request, the url the user is making a POST to is http://dev.to/wp-admin/admin-post.php?page=wysija_campaigns and the browser is python-requests/2.22.0. All of that leads me to believe this is a pentester which I totally should have figured out sooner but I have been trying to triage these Honeybadgers quickly and completely missed those hints.

Given that conclusion, I don't think there is really anything here to fix. My solution at this point is going to be to mark the Honeybadger as ignored. We can definitely close this issue since I really don't want to waste any more of your time. Once again, I am so sorry I didn't see those details sooner!

@ajvondrak
Copy link
Contributor

Eureka! That was the clue we needed! 🎉

Rails can only set the action/controller parameters if the route was recognized in the first place, since #path_parameters is set by the Rails router. So ultimately, unrecognized requests result in #params being merged with an empty hash, meaning the get/post parameters from the HTTP request remain untouched.

We can definitely close this issue since I really don't want to waste any more of your time. Once again, I am so sorry I didn't see those details sooner!

Absolutely not a waste of time. I'm so happy to finally figure out the problem, and it took your sleuthing to narrow everything down. Thanks so much for satisfying my sense of order, lol. Writing a PR to fix this issue now. ❤️

P.S.: I think this shift away from get/post parameter parsing will be a vast improvement over the workarounds of #39 & #50, too. 😍

ajvondrak added a commit to ajvondrak/beeline-ruby that referenced this issue Feb 22, 2020
* Remove extraneous spans that were being generated due to the
  ActiveSupport::Notifications subscriptions. There's no need to
  actually configure these from the tests, since we already have the
  active_support_spec.rb, and it's not like the config in the
  rails_spec.rb even matched the one from the Rails generator or
  anything. These just get in the way of testing the Rails integration,
  since the custom fields we'd want to make assertions about only wind
  up on the root span anyway.

* Factor out assertions into a set of shared examples, and actually
  start making assertions about the `request.{action,controller,route}`
  fields that the integration adds (which weren't even being tested for
  previously 😱).

* Rework the "bad request" tests that reproduce honeycombio#49. These tests will
  now break against the old implementation, since they assert that
  routing information is still present (whereas honeycombio#50 wipes the fields out
  with error messages). There are ways to get the routing information
  without ever trying to parse the GET/POST parameters, so we can still
  save this use case. Furthermore, the only Rails 5+ specific changes
  are to whether or not Rails responds with an HTTP 400, so we needn't
  wipe out the entire `describe` block for Rails < 5.

* Add tests to reproduce the issue described in honeycombio#62, where unrecognized
  routes may cause the old implementation to erroneously fall back to
  the GET/POST parameters for routing information.

* Add tests that reproduce honeycombio#31. The bug is actually in the twirp gem,
  whose middleware fails to rewind the `rack.input` after reading it.
  I've verified these regression tests against the old version of the
  Rails integration that didn't check `if request.raw_post`, and they
  fail appropriately. I wanted to have this test in here to have
  confidence in changing the implementation of the Rails integration.
  The `raw_post` workaround of honeycombio#39 is unnecessary if we actually never
  look at the GET/POST parameters in the first place (no parsing, no
  problem).
ajvondrak added a commit to ajvondrak/beeline-ruby that referenced this issue Feb 22, 2020
This refactoring avoids ever parsing GET/POST parameters to obtain the
Rails routing information required for the `request.action`,
`request.controller`, and `request.route` fields.

This is accomplished in two ways:

1. Use `ActionDispatch::Request#path_parameters` instead of
   `ActionDispatch::Request#params`. The former is the subset of data we
   need for `request.action` + `request.controller`, and in fact are the
   values merged into the latter hash after `#params` parses the normal
   GET/POST parameters. So we can avoid the extraneous parsing by just
   getting the path parameters straight from the horse's mouth.

2. Call `ActionDispatch::Journey::Router#recognize` on a simplified
   request that *only* contains the necessary routing information. The
   `#recognize` method will yield the `ActionDispatch::Request#params`
   when it matches a route, which again triggers the GET/POST parsing.
   But we don't care about the HTTP parameters. So instead I copied the
   approach of `ActionDispatch::Routing::RouteSet#recognize_path`, which
   reconstructs a simplified request with just the information that it
   needs. (Unfortunately we can't use `#recognize_path` directly because
   instead of returning the route it returns the path parameters, which
   we can already get directly from `#path_parameters`.)

This accomplishes several things:

* Fixes honeycombio#62, since we don't get a chance to erroneously fall back to
  GET/POST parameters in the event that the routing information is
  missing (due to requests to unrecognized routes).

* Fixes honeycombio#31, since the bug ultimately comes from the twirp gem causing
  POST parameter parsing to fail. But if we never parse the HTTP
  parameters, no problem. This is a more direct workaround than honeycombio#39, but
  ultimately is a problem that should be fixed upstream.

* Fixes honeycombio#49, since the parameter encoding won't matter if we never try
  to parse them. Renders honeycombio#50 unnecessary, since we won't have a chance
  to trigger the exceptions in question.

* Gives us more data in more situations. The new implementation of the
  `request.route` field now comes up non-nil in the event of errors
  because we use the `ActionDispatch::Request#original_fullpath` and not
  the `#path_info` that gets [rewritten by
  `ActionDispatch::ShowExceptions`](https://github.com/rails/rails/blob/758e4f8406e680a6cbf21b170749202c537a2576/actionpack/lib/action_dispatch/middleware/show_exceptions.rb#L49).

* Gives us more data in more Rails versions. With the addition of actual
  tests for the field values, it became clear that `request.route` was
  missing in Rails 4, simply because of an interface change introduced
  in Rails 5.
martin308 pushed a commit that referenced this issue Mar 9, 2020
This refactoring avoids ever parsing GET/POST parameters to obtain the
Rails routing information required for the `request.action`,
`request.controller`, and `request.route` fields.

This is accomplished in two ways:

1. Use `ActionDispatch::Request#path_parameters` instead of
   `ActionDispatch::Request#params`. The former is the subset of data we
   need for `request.action` + `request.controller`, and in fact are the
   values merged into the latter hash after `#params` parses the normal
   GET/POST parameters. So we can avoid the extraneous parsing by just
   getting the path parameters straight from the horse's mouth.

2. Call `ActionDispatch::Journey::Router#recognize` on a simplified
   request that *only* contains the necessary routing information. The
   `#recognize` method will yield the `ActionDispatch::Request#params`
   when it matches a route, which again triggers the GET/POST parsing.
   But we don't care about the HTTP parameters. So instead I copied the
   approach of `ActionDispatch::Routing::RouteSet#recognize_path`, which
   reconstructs a simplified request with just the information that it
   needs. (Unfortunately we can't use `#recognize_path` directly because
   instead of returning the route it returns the path parameters, which
   we can already get directly from `#path_parameters`.)

This accomplishes several things:

* Fixes #62, since we don't get a chance to erroneously fall back to
  GET/POST parameters in the event that the routing information is
  missing (due to requests to unrecognized routes).

* Fixes #31, since the bug ultimately comes from the twirp gem causing
  POST parameter parsing to fail. But if we never parse the HTTP
  parameters, no problem. This is a more direct workaround than #39, but
  ultimately is a problem that should be fixed upstream.

* Fixes #49, since the parameter encoding won't matter if we never try
  to parse them. Renders #50 unnecessary, since we won't have a chance
  to trigger the exceptions in question.

* Gives us more data in more situations. The new implementation of the
  `request.route` field now comes up non-nil in the event of errors
  because we use the `ActionDispatch::Request#original_fullpath` and not
  the `#path_info` that gets [rewritten by
  `ActionDispatch::ShowExceptions`](https://github.com/rails/rails/blob/758e4f8406e680a6cbf21b170749202c537a2576/actionpack/lib/action_dispatch/middleware/show_exceptions.rb#L49).

* Gives us more data in more Rails versions. With the addition of actual
  tests for the field values, it became clear that `request.route` was
  missing in Rails 4, simply because of an interface change introduced
  in Rails 5.

---

Revamp the Rails integration's tests



* Remove extraneous spans that were being generated due to the
  ActiveSupport::Notifications subscriptions. There's no need to
  actually configure these from the tests, since we already have the
  active_support_spec.rb, and it's not like the config in the
  rails_spec.rb even matched the one from the Rails generator or
  anything. These just get in the way of testing the Rails integration,
  since the custom fields we'd want to make assertions about only wind
  up on the root span anyway.

* Factor out assertions into a set of shared examples, and actually
  start making assertions about the `request.{action,controller,route}`
  fields that the integration adds (which weren't even being tested for
  previously 😱).

* Rework the "bad request" tests that reproduce #49. These tests will
  now break against the old implementation, since they assert that
  routing information is still present (whereas #50 wipes the fields out
  with error messages). There are ways to get the routing information
  without ever trying to parse the GET/POST parameters, so we can still
  save this use case. Furthermore, the only Rails 5+ specific changes
  are to whether or not Rails responds with an HTTP 400, so we needn't
  wipe out the entire `describe` block for Rails < 5.

* Add tests to reproduce the issue described in #62, where unrecognized
  routes may cause the old implementation to erroneously fall back to
  the GET/POST parameters for routing information.

* Add tests that reproduce #31. The bug is actually in the twirp gem,
  whose middleware fails to rewind the `rack.input` after reading it.
  I've verified these regression tests against the old version of the
  Rails integration that didn't check `if request.raw_post`, and they
  fail appropriately. I wanted to have this test in here to have
  confidence in changing the implementation of the Rails integration.
  The `raw_post` workaround of #39 is unnecessary if we actually never
  look at the GET/POST parameters in the first place (no parsing, no
  problem).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants