Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache accepts Instance in Request Methods? #5906

Open
IamLizu opened this issue Sep 4, 2024 · 4 comments
Open

Cache accepts Instance in Request Methods? #5906

IamLizu opened this issue Sep 4, 2024 · 4 comments
Assignees

Comments

@IamLizu
Copy link
Member

IamLizu commented Sep 4, 2024

The current implementation of the req.accepts, req.acceptsEncodings, and req.acceptsEncoding methods in lib/request.js creates a new accepts instance for each call.

Am I correct to think that this can lead to redundant object creation and parsing overhead, especially in scenarios where these methods are called multiple times within the same request lifecycle?

If caching the accepts instance seems correct to reduce overhead and help with faster execution, I am willing to create PR and benchmark comparisons for this. However, since I have very little idea about the overall scenario I need a confirmation of go / no go.

@IamLizu IamLizu self-assigned this Sep 4, 2024
@IamLizu
Copy link
Member Author

IamLizu commented Sep 5, 2024

I conducted a performance comparison to evaluate the impact of caching the accepts instance in the req methods. The benchmark code used for this comparison can be found in this Gist.

Benchmark Results

Benchmark without caching:
req.accepts: 854.0032 ms
req.acceptsEncodings: 114.8911 ms
req.acceptsCharsets: 185.8458 ms
req.acceptsLanguages: 222.8439 ms

Benchmark with caching:
req.accepts: 758.2582 ms
req.acceptsEncodings: 55.7752 ms
req.acceptsCharsets: 68.0283 ms
req.acceptsLanguages: 93.0939 ms

Benchmark without caching:
req.accepts: 921.3947 ms
req.acceptsEncodings: 127.5885 ms
req.acceptsCharsets: 193.3242 ms
req.acceptsLanguages: 206.9786 ms

Benchmark with caching:
req.accepts: 808.1724 ms
req.acceptsEncodings: 73.8829 ms
req.acceptsCharsets: 77.4844 ms
req.acceptsLanguages: 134.1124 ms

Benchmark without caching:
req.accepts: 856.8224 ms
req.acceptsEncodings: 136.1057 ms
req.acceptsCharsets: 129.9873 ms
req.acceptsLanguages: 219.2094 ms

Benchmark with caching:
req.accepts: 829.7202 ms
req.acceptsEncodings: 99.3029 ms
req.acceptsCharsets: 91.8122 ms
req.acceptsLanguages: 113.6522 ms

Summary of Results

  • req.accepts: Improvement of approximately 10-15%.
  • req.acceptsEncodings: Improvement of approximately 50-60%.
  • req.acceptsCharsets: Improvement of approximately 50-60%.
  • req.acceptsLanguages: Improvement of approximately 40-50%.

Trade-offs

  1. Performance Impact: The improvements are non-trivial, especially for methods other than req.accepts. This suggests that caching can significantly reduce the overhead of creating new accepts instances.
  2. Code Complexity: The change introduces a small amount of additional complexity by adding a caching mechanism. However, this complexity is minimal and can be well-contained.

Given these performance improvements, especially for methods other than req.accepts it is reasonable to consider moving forward with this change.

Should we move forward with adding this caching optimization?

cc: @wesleytodd @UlisesGascon @ctcpip

@ifty64bit
Copy link

From my perspective what I think is that, In a typical backend application, the accepts methods are generally called during the initial stages of request handling, especially for content negotiation. Once the content type, encoding, charset, or language has been determined, these methods are rarely invoked again within the same request lifecycle.

While the proposed caching mechanism undoubtedly improves response time, could this lead to potential memory overhead? Since the cached accepts instance would persist for the entire duration of the request, even if it is not needed after the initial invocation, this may result in inefficient memory usage.

Could we provide developers with the option to decide whether or not they need caching as it has some performance benefit?

@IamLizu
Copy link
Member Author

IamLizu commented Sep 6, 2024

Hey @ifty64bit 👋

Thank you for sharing your perspective; I appreciate the thoughtful insight. Full disclosure, I myself have very surface level knowledge, most of my points are my understanding and they can be wrong. I hope to learn more from you and others.

So, you're absolutely right that in most backend applications:

"The accepts methods are generally called during the initial stages of request handling, especially for content negotiation. Once the content type, encoding, charset, or language has been determined, these methods are rarely invoked again within the same request lifecycle."

The caching mechanism indeed provides a performance boost by preventing redundant computation when these methods are invoked multiple times in a request. While you correctly point out that they are typically called early in the request lifecycle, there are real-world scenarios where these methods might be invoked multiple times:

  1. Middleware Chains: In large Express-based applications, there can be a chain of middleware that process requests before they hit the final route handler. For instance:

    • Compression Middleware: Tools like compression check req.acceptsEncodings('gzip') to see if the client supports gzip compression.
    • Static File Serving Middleware: Middleware like serve-static might call req.accepts to decide whether to serve HTML, JSON, or other file types.

    Example:

    const compression = require('compression');
    const serveStatic = require('serve-static');
    
    // Compression middleware checks for supported encodings
    app.use(compression());
    
    // Static file server checks accepted content types
    app.use(
        serveStatic('public', {
            setHeaders: (res, path) => {
                if (res.req.accepts('html')) {
                    res.setHeader('Content-Type', 'text/html');
                }
            },
        })
    );

    Here, req.acceptsEncodings and req.accepts are called separately in different middleware. Without caching, each invocation would reparse the headers, but with caching, the parsing happens only once.

  2. Route Handlers and Conditionals: Consider a REST API that handles both API clients and browser-based clients, where the response content type needs to be adjusted based on the Accept header. This pattern is common in applications that serve both human-readable formats like HTML and machine-readable formats like JSON.

    Example:

    app.get('/profile', (req, res) => {
        if (req.accepts('json')) {
            res.json({ user: 'John Doe' });
        } else if (req.accepts('html')) {
            res.render('profile', { user: 'John Doe' });
        } else {
            res.status(406).send('Not Acceptable');
        }
    });
    
    app.get('/notifications', (req, res) => {
        if (req.accepts('json')) {
            res.json({ notifications: [] });
        } else if (req.accepts('xml')) {
            res.type('xml').send('<notifications></notifications>');
        }
    });

    In this case, multiple route handlers may need to check the Accept header, and caching the result would eliminate redundant parsing across handlers.

  3. Third-Party Libraries: Many popular Express middleware packages internally use the accepts library. For example:

    • i18next-express-middleware: This middleware handles internationalization and may use req.acceptsLanguages to detect the user's preferred language based on the Accept-Language header.
    • express-validator: This library can validate incoming request data based on content types, possibly making calls to req.accepts to check the format of the incoming data (e.g., JSON or form data).

    Example:

    const i18next = require('i18next');
    const i18nextMiddleware = require('i18next-express-middleware');
    
    app.use(i18nextMiddleware.handle(i18next));
    
    app.post('/submit', (req, res) => {
        if (req.accepts('json')) {
            res.json({ message: 'Data received' });
        } else if (req.accepts('html')) {
            res.send('<p>Data received</p>');
        }
    });

    Here, i18nextMiddleware internally checks the Accept-Language header to determine the user's language preference. Additionally, in the /submit route, req.accepts is checked again to determine the appropriate response format. Without caching, this would lead to multiple header parsing operations within the same request.

In these real-world scenarios, without caching, the accepts object would be recreated multiple times, leading to redundant computation. By caching the result of the accepts function, we can improve performance by reducing unnecessary work.

However, as you mentioned:

"Since the cached accepts instance would persist for the entire duration of the request, even if it is not needed after the initial invocation, this may result in inefficient memory usage."

This is a valid concern, especially in environments with limited resources or highly optimized systems. While the cached object is generally lightweight, there are cases where developers might prefer to avoid caching if they don't anticipate repeated use within the same request.

One potential solution, as you suggested, would be:

"Could we provide developers with the option to decide whether or not they need caching as it has some performance benefit?"

I think this is a great idea. We could implement a toggle-able option for caching, ensuring that developers have control over whether to enable it. This way, caching can be used when necessary but avoided when developers want to minimize memory usage.

Thanks again for raising this point 👍

@wesleytodd
Copy link
Member

Just linking for reference, will circle back later: expressjs/discussions#306

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants