-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lift expensive Regex construction from DateFormat method body. #43647
Conversation
Actually, does |
Good catch! It doesn't seem to be mutated anywhere in Dates, but this comment (Dates/src/io.jl:291) indicates that packages may mutate it:
Hm 🤔 |
It looks like the failing test captures precisely that possibility:
|
I'm sketching an implementation now, but along similar lines: It seems that updating |
428e676
to
bc84cdb
Compare
Reworked to check for changes to |
Looks good. I wonder if it would be enough just to check the length? Hopefully nobody deletes format characters...? 😄 |
I sure hope not 😄
Perhaps so 🤔. The cost of checking the hash of the keys seems negligible relative to the overall cost of the operation, though, so perhaps let's defer that (potentially risky) optimization till the cost of the check becomes relevant? :) CI seems happy, Kristoffer seems happy (modulo the now-addressed concern), and Jeff seems happy. So absent objections or requests for time, I plan to merge this patch later today :). |
Marked for backport to 1.6 as purely a performance optimization, and for backport to 1.7 as a partial mitigation of #43610. |
stdlib/Dates/src/io.jl
Outdated
# To understand this block, please see the comments attached to the | ||
# definitions of DATEFORMAT_REGEX_CACHE and CONVERSION_SPECIFIERS_KEYS_HASH above. | ||
conversion_specifiers_keys_hash = hash(keys(CONVERSION_SPECIFIERS)) | ||
if conversion_specifiers_keys_hash != CONVERSION_SPECIFIERS_KEYS_HASH[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any concurrency concerns here? Maybe a lock is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hypothetically all of these objects need locks I think, i.e. CONVERSION_SPECIFIERS
, CONVERSION_DEFAULTS
, CONVERSION_TRANSLATIONS
, DATEFORMAT_REGEX_CACHE
, and CONVERSION_SPECIFIERS_KEY_HASH
. But addressing that strikes me as beyond the scope of this PR: To do so properly, we would need to introduce an interface that maintains consistency between all of these objects, and introduce appropriate reader/writer locks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that it used to be thread safe to create date formats but now it no longer is. I would have thought slapping a lock around the part that reads and updates the cache would make this version as safe as the original one. The cost of the lock should be insignificant compared to the work of the function. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. It also occurs to me that this would do the wrong thing in case of a hash collision, though of course that is very unlikely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To check my understanding, your concern is specifically about consistency of and between DATEFORMAT_REGEX_CACHE
and CONVERSION_SPECIFIERS_KEYS_HASH
across threads constructing DateFormat
s, when other threads are simultaneously modifying CONVERSION_SPECIFIERS
? And not about the existing potential for races when some threads modify CONVERSION_SPECIFIERS
(and friends) while others construct DateFormat
s (or perform other operations that consume CONVERSION_SPECIFIERS
and friends)?
Sure, the former we could mitigate pretty simply via an exclusive lock on CONVERSION_SPECIFIERS_HASH
/ DATEFORMAT_REGEX_CACHE
(at risk of introducing unnecessary contention) or less simply via a combination of shared and exclusive locks (to avoid introducing unnecessary contention). The latter seems like a broader, preexisting issue.
It also occurs to me that this would do the wrong thing in case of a hash collision, though of course that is very unlikely.
Both of these concerns seem pretty unlikely to bite in practice :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a new commit that introduces an exclusive lock (DATEFORMAT_REGEX_LOCK
) guarding access to DATEFORMAT_REGEX_CACHE
and DATEFORMAT_REGEX_HASH
(née CONVERSION_SPECIFIERS_KEYS_HASH
, which seemed a bit wordy if more descriptive); hopefully the potential contention isn't so bad. Does that address your concern, or are you also concerned about the preexisting issues?
Constructing the Regex touched in this commit can represent a significant fraction (e.g. half or better) of the runtime of the DateFormat method touched in this commit. To make this DateFormat method more efficient, let's lift that Regex construction out of that method body.
bc84cdb
to
7fb5bb8
Compare
CI seems happy. Any remaining concerns? :) Absent other concerns or requests for time, I plan to merge this later today or tomorrow morning. |
Thanks gentlemen! :) |
Data:
On master
and on this branch
Best!