Infer and cache date field format instead of re-parsing it for every document #4558
Comments
I wonder whether we can leverage the fact that most of the time all documents have the same date format. Maybe the date parser code can cache the date format and attempt to reuse it, only falling back to re-computing the date format when that fails?
Yes, we should cache it.
I've started working on this issue and will share baseline benchmark numbers soon to highlight the differences in CPU usage across different datetime field formats.
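The cache-and-fall-back idea could look roughly like the Java sketch below. It is only an illustration under stated assumptions: `CachingDateParser` and its methods are hypothetical names, not OpenSearch classes, and plain `java.time` formatters stand in for whatever the date field mapper actually uses.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers the formatter that parsed the previous value
// and tries it first, re-scanning all candidates only when it fails.
class CachingDateParser {
    private final List<DateTimeFormatter> candidates;
    private volatile DateTimeFormatter cached; // last formatter that succeeded, or null

    CachingDateParser(List<DateTimeFormatter> candidates) {
        this.candidates = new ArrayList<>(candidates);
    }

    LocalDateTime parse(String text) {
        DateTimeFormatter fast = cached;
        if (fast != null) {
            try {
                return LocalDateTime.parse(text, fast); // fast path: reuse cached format
            } catch (DateTimeParseException e) {
                // cached format no longer matches; fall through to the slow path
            }
        }
        for (DateTimeFormatter candidate : candidates) {
            try {
                LocalDateTime result = LocalDateTime.parse(text, candidate);
                cached = candidate; // remember the winner for the next document
                return result;
            } catch (DateTimeParseException e) {
                // try the next candidate
            }
        }
        throw new DateTimeParseException("no candidate format matched", text, 0);
    }

    DateTimeFormatter cachedFormatter() {
        return cached;
    }
}
```

When every document uses the same format, only the first call pays for scanning the candidates. A value in a different format still parses correctly, at the cost of one failed attempt with the cached formatter.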
Microbenchmarks
CPU % Diff Matrix:
Grab workload benchmark (Grab is synthetically generated data; osb-benchmark didn't have a workload for testing different datetime mappings except
We found no significant differences in search performance between the two formats.
With respect to the implementation for this issue, we have a couple of approaches for caching the datetime field
Once we implement caching, a quick win will be to add a stricter format like
@Prabs @tharejas @mgodwan please share your thoughts on this.
Parsing date fields with the default format uses a lot of CPU. A large portion of date formatting time (close to 7.12% of CPU time in profiles) goes into parsing, which generally happens when certain segments of the date format are optional. Our customers rarely set a date format themselves and instead rely on the unoptimized default one. When I changed the date format to a strict one for the same data set, indexing throughput increased by 8%.
For logs, the date format does not change across log lines, so it is inefficient to compute the date format for every single document. For such users, we could infer and set a stricter date format after parsing a few documents.
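One way to picture the inference step is the Java sketch below, which assumes a small, fixed list of strict candidate formats. `StrictFormatInference` and `infer` are hypothetical names, and the candidate list is an assumption for illustration, not the set of formats OpenSearch supports.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: picks a strict format only if it parses every sampled value.
class StrictFormatInference {

    // Returns the first candidate that parses all samples, or empty if none does
    // (or if there are no samples to learn from).
    static Optional<DateTimeFormatter> infer(List<String> samples, List<DateTimeFormatter> strictCandidates) {
        if (samples.isEmpty()) {
            return Optional.empty();
        }
        for (DateTimeFormatter candidate : strictCandidates) {
            if (parsesAll(samples, candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    private static boolean parsesAll(List<String> samples, DateTimeFormatter candidate) {
        for (String sample : samples) {
            try {
                candidate.parse(sample); // throws if the text does not fully match
            } catch (DateTimeParseException e) {
                return false;
            }
        }
        return true;
    }
}
```

If the samples mix formats, `infer` returns empty and the caller would keep using the lenient default. A format inferred this way would still need a fallback for later documents that do not match it.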
Additionally, 7% CPU seems too high just for date parsing. Maybe the Java formatter has improved since I ran these tests. The CPU profile shows that most of the time goes into parsing the optional segments of the date.
Solutions?