
[CacheResponsesPlugin] Support serving out of cache #319

Closed
abhinavsingh opened this issue Mar 24, 2020 · 19 comments
Labels: Plugin (Related to plugin code, not core) · Proposal (Proposals for futuristic feature requests)

Comments

@abhinavsingh
Owner

Currently, the cache_responses.py plugin only caches responses; it doesn't serve out of the cached data.

Per @trianta2's request here, it would be a good idea to support this in a future release.

However, for production-grade usage, this feature will require significant work. If someone is interested in taking a stab at it, please feel free to reach out. Happy to discuss it further.

@abhinavsingh abhinavsingh self-assigned this Mar 24, 2020
@abhinavsingh abhinavsingh added the Proposal label and removed the Enhancement label Mar 25, 2020
@ja8zyjits

At the outset it appears to be a straightforward approach: map the cached content to the saved URL.

Is that the only thing that needs to be done? Can you elaborate on the production-grade requirements?

@abhinavsingh
Owner Author

Cache invalidation is the key to production-grade caching solutions. E.g. upstream servers will serve a URL with cache-control/expiry headers; an ideal cache server must invalidate the cached copy and fetch/serve a fresh response after it expires. Over the course of HTTP specification upgrades, there are just too many cache headers to take care of, e.g. Vary, ETag, etc.
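
For illustration, here is a minimal sketch of the kind of freshness check this implies, assuming each cached entry is stored together with its (lowercased) response headers and a store timestamp; all names are illustrative, not the plugin's actual API:

    import time
    from email.utils import parsedate_to_datetime

    def is_fresh(headers: dict, stored_at: float) -> bool:
        # Rough RFC 7234-style freshness check (illustrative sketch only).
        directives = {}
        for part in headers.get('cache-control', '').split(','):
            name, _, value = part.strip().partition('=')
            directives[name.lower()] = value
        if 'no-store' in directives or 'no-cache' in directives:
            return False  # must not serve without revalidating upstream
        if directives.get('max-age', '').isdigit():
            return (time.time() - stored_at) < int(directives['max-age'])
        if 'expires' in headers:  # fall back to the Expires header
            try:
                return time.time() < parsedate_to_datetime(headers['expires']).timestamp()
            except (TypeError, ValueError):
                return False
        return False  # no explicit freshness information: revalidate upstream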

@ja8zyjits

That would mean replicating the entire HTTP caching protocol, with active support for specification changes in every new release?

@abhinavsingh
Owner Author

Not necessarily. We can start by adding a subset of features (enabled explicitly via flags) as needed by the community.

It's probably also a good idea for the documentation to reference the section(s) of the RFC that the cache plugin implements.

Happy to discuss if you have any particular scenario in mind.

@ja8zyjits

If we start small, where should we begin?
By RFC, did you mean https://tools.ietf.org/html/rfc7234 ?

@abhinavsingh
Owner Author

@ja8zyjits Yep, that's the RFC we can follow. Ideally, we should pick a use case that your application currently needs. If you would like to start working towards it generically, I'd recommend the following steps:

  1. Start reading the RFC as much as you can :) Especially go through sections 2, 3, 4 and 5. Sections 2 and 3 are rather short introductions. Section 4 describes various scenarios and strategies. Section 5 provides details about the various cache headers and how to handle them.

  2. For implementation, pick a use case from section 5, e.g. handling header fields like Age, Cache-Control, max-age, no-store, no-cache, must-revalidate, etc. (a rough sketch follows after this list).
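
For example, a rough sketch of how those directives might map onto store/serve decisions for a shared cache such as a proxy (hypothetical helper, not existing plugin code):

    def cache_decisions(cache_control: str) -> dict:
        # Map a Cache-Control header value onto caching decisions (illustrative sketch).
        names = {d.strip().lower().partition('=')[0] for d in cache_control.split(',')}
        return {
            # no-store / private: a shared cache must not store the response at all
            'may_store': 'no-store' not in names and 'private' not in names,
            # no-cache: a stored response must be revalidated before every reuse
            'revalidate_always': 'no-cache' in names,
            # must-revalidate: a stale stored response must never be served without revalidation
            'never_serve_stale': 'must-revalidate' in names,
        }

    # e.g. cache_decisions('max-age=60, must-revalidate')
    # -> {'may_store': True, 'revalidate_always': False, 'never_serve_stale': True}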

Let me know where you would like to start. Thank you!!!

@ja8zyjits

@abhinavsingh Sure, I'll look into them.
Meanwhile, to test these things, I need to get TLS interception working.
I'll keep you posted.

@anthonykgross

What's the point of storing chunks and never delivering them for the same request? I guess it's more of a logging plugin than a caching one :)

@abhinavsingh
Owner Author

abhinavsingh commented May 20, 2020

What's the point of storing chunks and never delivering them for the same request? I guess it's more of a logging plugin than a caching one :)

Correct: unless we start dispatching responses back to the client, it's a logging plugin, so yes, CacheResponsesPlugin is effectively LogResponsesPlugin :)

However, serving responses back out of the cache is precisely what @ja8zyjits is working towards. We have discussed above in some detail why CacheResponsesPlugin cannot simply start serving responses out of cache. Yes, we can serve cached content blindly for test cases, but that is not yet ideal for real-life traffic.

@iorlas

iorlas commented Jun 22, 2020

Just my 2 cents...
By definition, caching should actually cache the results. But it feels like instead of caching, the plugin just dumps the data like tcpdump (correct me if it actually does something more), which has nothing to do with actual caching.

Even though it feels logical to respect the caching policies and headers of the HTTP protocol when handling caching in a proxy server, for my use case it would be wonderful to have KV-based caching. This is why I started to use this product in the first place. Attempted to use... and failed.
Feels like I'll need to write my own plugin then.
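
For reference, a minimal sketch of the kind of KV-based caching meant here, keyed purely on method and URL and ignoring HTTP cache semantics entirely (all names are illustrative, not tied to proxy.py's plugin API):

    import hashlib
    from typing import Dict, Optional

    class NaiveKVCache:
        # Raw response bytes keyed by request method + URL (illustrative sketch only).

        def __init__(self) -> None:
            self._store: Dict[str, bytes] = {}

        @staticmethod
        def _key(method: bytes, url: bytes) -> str:
            return hashlib.sha256(method + b' ' + url).hexdigest()

        def put(self, method: bytes, url: bytes, raw_response: bytes) -> None:
            self._store[self._key(method, url)] = raw_response

        def get(self, method: bytes, url: bytes) -> Optional[bytes]:
            return self._store.get(self._key(method, url))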

@abhinavsingh
Owner Author

abhinavsingh commented Jun 23, 2020 via email

@ja8zyjits

ja8zyjits commented Jun 23, 2020 via email

@pasccom
Contributor

pasccom commented Aug 14, 2020

Hello,
For automatic testing purposes, I modified CacheResponsesPlugin so that it can serve responses from the cache. However, I did not implement any of the requirements of the RFC, since I want a fixed response: I do not want my tests to break when upstream modifies the response.
The implementation is able to answer GET and POST requests over HTTP and HTTPS. It uses a very basic mechanism for request storage (I wanted the requests to be stored as text so that they are simple to read), but POST request bodies are hashed (for compactness).
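
A rough sketch of the kind of cache key I mean, keeping the name readable while hashing POST bodies for compactness (the helper name and the choice of sha256 are illustrative, not my exact implementation):

    import hashlib

    def cache_file_name(method: bytes, host: bytes, path: bytes, body: bytes = b'') -> str:
        # Build a human-readable cache key; POST bodies are reduced to a short digest.
        name = b'-'.join([method, host, path.replace(b'/', b'_')]).decode('utf-8', 'replace')
        if method == b'POST' and body:
            name += '-' + hashlib.sha256(body).hexdigest()[:16]
        return name

    # e.g. cache_file_name(b'POST', b'example.com', b'/api/items', b'{"k":"v"}')
    # -> 'POST-example.com-_api_items-<first 16 hex chars of the body digest>'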
Would you like me to propose my implementation? Note that I think it requires a few modifications to the code in proxy/http/proxy/server.py (at least on the master branch; I have not tested develop yet), since I want to be able to work without an upstream server.
Best regards,
Pascal

@abhinavsingh
Owner Author

@pasccom For test cases, I previously took a stab at a similar feature. The idea was to record and replay responses by enabling CacheResponsesPlugin on the fly during test execution. We can probably revisit it and build on top of it, or start a fresh implementation.

See https://github.com/abhinavsingh/proxy.py/blob/develop/proxy/testing/test_case.py#L80 and https://github.com/abhinavsingh/proxy.py/blob/develop/tests/testing/test_embed.py#L56. It's pretty incomplete and will require more work for e2e correctness. You have probably already implemented some of those feature sets.

Internally I envisioned something like this:

  1. Users can enable VCR functionality during test execution.
  2. When enabled, VCR also enables CacheResponsesPlugin.
  3. Instead of simply caching, CacheResponsesPlugin now also serves cached responses back during test execution (a rough sketch follows after this list).
  4. Optionally, when enabling VCR, the user can configure it, e.g. path to the cache directory, cache expiry, etc.
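
A conceptual sketch of that record/replay flow (all names are hypothetical, not the plugin's actual API):

    import os
    from typing import Callable

    def serve_with_vcr(cache_dir: str, key: str,
                       fetch_upstream: Callable[[], bytes],
                       replay: bool) -> bytes:
        # Record a response on first use, replay it from disk on later runs (sketch only).
        cache_file = os.path.join(cache_dir, key)
        if replay and os.path.exists(cache_file):
            with open(cache_file, 'rb') as f:  # serve out of cache
                return f.read()
        response = fetch_upstream()  # record: fetch from upstream and persist
        os.makedirs(cache_dir, exist_ok=True)
        with open(cache_file, 'wb') as f:
            f.write(response)
        return response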

Please propose your implementation in a PR. I would like to better understand the reasons behind the modifications within server.py. I feel that, with the above-mentioned strategy, a standalone plugin (a clone of CacheResponsesPlugin) can serve all of the functionality.

Best

@abhinavsingh
Owner Author

Also see the related issue tracking VCR functionality: #184

@abhinavsingh
Owner Author

Thinking about it, serving out of cache for test cases could also be a clone of MockRestApiPlugin. Whether the plugin serves from cached responses or from a developer-provided API template, it serves the same purpose for assertions in test cases. See https://github.com/abhinavsingh/proxy.py#mockrestapiplugin

Just throwing out a few options that might inspire the implementation of this feature set 👍

@pasccom
Contributor

pasccom commented Aug 14, 2020

Please propose your implementation in a PR. I would like to better understand the reasons behind the modifications within server.py. I feel that, with the above-mentioned strategy, a standalone plugin (a clone of CacheResponsesPlugin) can serve all of the functionality.

For HTTP requests, yes, it does. But I needed to intercept HTTPS requests, which did not work. I will test with the develop version instead of v2.2.0; maybe the bugs have been fixed. I will also take a deeper look at MockRestApiPlugin as you suggest.

I will test my implementation on the develop branch and then send a few PRs. Then we should be able to implement other cache-management features.

@pasccom
Contributor

pasccom commented Aug 17, 2020

Hello,
I think I have a first version of the full cache. I pushed the changes to my fork. Any comments are welcome.

I think there is still some work to be done for it to work correctly (even for HTTP requests):

  • I did not implement the ENABLED feature.
  • I noticed that sometimes the host of a request is not set, which will break the code. I have to check when this occurs.

I will write some more thorough tests tomorrow, implement the missing features and create a PR.
By the way, are there examples of tests with pipelined HTTP requests? I would like to reuse the code.

@abhinavsingh
Owner Author

@pasccom Cool.

  • We've got a few pipelined request test cases, but they were written mostly for HttpParser:

    def test_pipelined_response_parse(self) -> None:
        response = build_http_response(
            httpStatusCodes.OK, reason=b'OK',
            headers={
                b'Content-Length': b'15'
            },
            body=b'{"key":"value"}',
        )
        self.assert_pipeline_response(response)

    def test_pipelined_chunked_response_parse(self) -> None:
        response = build_http_response(
            httpStatusCodes.OK, reason=b'OK',
            headers={
                b'Transfer-Encoding': b'chunked',
                b'Content-Type': b'application/json',
            },
            body=b'f\r\n{"key":"value"}\r\n0\r\n\r\n'
        )
        self.assert_pipeline_response(response)

    def assert_pipeline_response(self, response: bytes) -> None:
        self.parser = HttpParser(httpParserTypes.RESPONSE_PARSER)
        self.parser.parse(response + response)
        self.assertEqual(self.parser.state, httpParserStates.COMPLETE)
        self.assertEqual(self.parser.body, b'{"key":"value"}')
        self.assertEqual(self.parser.buffer, response)
        # parse buffer
        parser = HttpParser(httpParserTypes.RESPONSE_PARSER)
        parser.parse(self.parser.buffer)
        self.assertEqual(parser.state, httpParserStates.COMPLETE)
        self.assertEqual(parser.body, b'{"key":"value"}')
        self.assertEqual(parser.buffer, b'')
  • ENABLED is OK to drop if we have an alternative.
  • Regarding missing host headers: per the spec, dropping such requests is the right thing to do.

@abhinavsingh abhinavsingh removed their assignment Nov 10, 2021
@abhinavsingh abhinavsingh added the Plugin label Nov 10, 2021
Repository owner locked and limited conversation to collaborators Nov 24, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
