vsscanf() based date string parser #334
Conversation
Thanks a lot, @sobri909. I'll answer as briefly as I can!
Thanks @groue 😄 Another couple of implementation notes:
I've cleaned up all the obvious test fails. But travis is failing out on what looks like unrelated stuff now (testCopyTransfersChanges, etc). I'll have a poke around in those tests tomorrow morning.
Hello @sobri909,
💯 I fully agree. We can avoid the static methods, shared allocated pointers, and the mutex with:

```swift
struct SQLDateComponentsParser {
    struct SQLDateComponents {
        var year: Int32 = 0
        var month: Int32 = 0
        var day: Int32 = 0
    }

    func parse(string: String) -> SQLDateComponents {
        var components = SQLDateComponents()
        let parseCount = withUnsafeMutablePointer(to: &components.year) { yearP in
            withUnsafeMutablePointer(to: &components.month) { monthP in
                withUnsafeMutablePointer(to: &components.day) { dayP in
                    withVaList([yearP, monthP, dayP]) { vaList in
                        vsscanf(string, "%d-%d-%d", vaList)
                    }
                }
            }
        }
        return components
    }
}

let sqlComponents = SQLDateComponentsParser().parse(string: "2018-04-16")
sqlComponents.year  // 2018
sqlComponents.month // 4
sqlComponents.day   // 16
```
This is interesting, especially considering that GRDB3 is on the way: we can break things, fix design issues, and why not question DatabaseDateComponents. We'll see that it's not that simple (but I'm still quite open to suggestions).

DatabaseDateComponents, today, groups a plain DateComponents and a format (YMD, etc.), so that one knows the format of the parsed date string. It is important to distinguish those formats: a missing component is not equivalent to a component that is explicitly set to 0: "2018-04-16" != "2018-04-16 00:00:00". This format is exposed to the end user, for whatever useful purpose. It is also used internally, when GRDB decodes Date from String, via DatabaseDateComponents. For example, "12:00:00" can be converted to DatabaseDateComponents, with format HMS.

Now, there is information redundancy here, since components in DateComponents are optional. For example, to know if an SQLite string contains time information, one could just check the hour component:

```swift
// Why not?
let dateComponents: DateComponents = row["date"]
if dateComponents.hour == nil {
    // parsed "YYYY-MM-DD"
} else {
    // parsed... something... else (???)
}
```

But I think that this is easier to do with a plain and boring switch on a format enum, as today:

```swift
// Current API
let dbComponents: DatabaseDateComponents = row["date"]
switch dbComponents.format {
case .YMD:
    // parsed "YYYY-MM-DD"
case .YMD_HM:
    // parsed "YYYY-MM-DD HH:MM"
...
}
```

Finally, removing DatabaseDateComponents would expose us to a writing problem: not all DateComponents can be converted to valid SQLite date strings. When you store a DateComponents with nil components except year, what are you supposed to store in the database?

```swift
var components = DateComponents()
components.year = 2018
// What here?
// - "2018" is not a valid SQL date
// - "2018-01-01" is valid, but...
// - ...why not "2018-01-01 00:00:00.000" then?
// - fatal error, in order to complain about ambiguous input?
let string = String.fetchOne(db, "SELECT ?", arguments: [components])
```

DatabaseDateComponents does not suffer from this ambiguity. The user never doubts which format is used. All nil components in the requested format are given default values when needed:

```swift
var components = DateComponents()
components.year = 2018
// "2018-01-01 00:00"
let dbComponents = DatabaseDateComponents(components, format: .YMD_HM)
let string = String.fetchOne(db, "SELECT ?", arguments: [dbComponents])
```
Those tests check that some record information is preserved by some transformations (copy, roundtrip through the database, etc). And it happens that the involved records have a date property :-) Those tests are not flaky, and have not been modified in a while: it's very likely that they spot an actual regression.
I'd look at testRecordIsNotEditedAfterFullFetch first: it should be the easiest to track in the debugger.
```diff
@@ -179,7 +179,7 @@ class FoundationDateTests : GRDBTestCase {
         XCTAssertEqual(calendar.component(.hour, from: date), 1)
         XCTAssertEqual(calendar.component(.minute, from: date), 2)
         XCTAssertEqual(calendar.component(.second, from: date), 3)
-        XCTAssertTrue(abs(calendar.component(.nanosecond, from: date) - 4_000_000) < 10) // We actually get 4_000_008. Some precision is lost during the DateComponents -> Date conversion. Not a big deal.
+        XCTAssertTrue(abs(calendar.component(.nanosecond, from: date) - 4_560_000) < 10)
```
This commit needs to be reverted.
Those tests make sure that GRDB handles fractional seconds in a way that is compatible with SQLite. Extra precision has to be ignored:

```swift
let seconds = try Double.fetchOne(db, "SELECT strftime('%f','2015-07-22 01:02:03.00456')")!
print(seconds) // 3.005, not 3.00456
```
@groue I'm already on it 😉 Another commit coming in a sec.
Although SQLite is maintaining the precision for me locally. But we can't depend on that.
> Although SQLite is maintaining the precision for me locally. But we can't depend on that.
Interesting. What do you mean?
SQLiteDateParser is preserving precision to 9 sig figs. So that test is passing for me locally. (And on travis too, so far. Yay!)
Although sim tests != real device. And having the tests demand that level of precision would be brittle against any changes in the SQLite engine. I don't think SQLite makes any promises about how much precision it maintains.
So I'm about to commit a change to the tests that permits precision loss down to 3 sig figs. (The old tests were demanding that precision loss, instead of just permitting it).
> And having the tests demand that level of precision would be brittle against any changes in the SQLite engine. I don't think SQLite makes any promises about how much precision it maintains.
Beware this reasoning: SQLite values backward compatibility just like Linux does (although D. Richard Hipp does not yell like Linus Torvalds). When in doubt, assume SQLite stability.
> So the only way to pass the test would've been to discard the precision?
Yes, yes: we'll have to discard precision as long as SQLite does.
Except that SQLite is only discarding precision when manipulating dates, not when storing them. (Or at least, it's not discarding the precision when they're stored as strings).
So there's no reason to discard the precision when doing storage and retrieval? There's only a need to be aware of potential precision loss, when manipulating the dates SQL-side.
(our messages have overlapped, and it may look like I dismiss your answers - I don't)
> Their docs say they accept as many sig figs as given, and just don't promise to make use of any beyond three, when using strftime() etc.
OK, I'll look at that.
(But this does not mean that I'll let GRDB drift from actual, observed SQLite behavior)
Hah. No worries 😄
> this does not mean that I'll let GRDB drift from actual, observed SQLite behavior
Likewise! The docs do say that it's fine to store more precision, and that the date/time functions will simply ignore it. But it will be worth testing to make sure that's definitely true.
> The docs do say that it's fine to store more precision, and that the date/time functions will simply ignore it. But it will be worth testing to make sure that's definitely true.
Yes, I'm with you on that. As I said, "an ideal test would assert identical parsing between GRDB and SQLite". I'll think about how to write the tests in this way.
Now I have to go fill my purse 😉 See you later!
@sobri909, I owe you an apology, because I didn't guide you well. Our conversation above has me thinking, and I'm not sure yet how I will conclude. I have been fuzzy, and am actually unable right now to precisely define what I mean by "compatibility with SQLite". Besides, I feel uncomfortable that GRDB currently does not allow one to get the extra digits via date components. This does not fit well with the "experts welcome" design goal of this lib. So maybe I'll rally to your view, eventually.

What do you think?
@groue Sounds fine to me! I'll change the string length truncations in the last commit to truncate down to a max of 3 sig figs. Then later, if you want to allow preservation of more sig figs, it's an easy two-line change.
Hm. Travis keeps timing out. Seems unrelated, and possibly just travis being moody, as it often is. But it seems suspicious that it's always in the same place, even though that place doesn't appear to be related to the changes at all. Not sure what to make of that...
Ok, now that the tests are passing, I can finally read and reply to your refactor thoughts!

Aside: except for those … 😅

For `withUnsafeMutablePointer`: it seems like it's for when you want to use Swift vars with C / Objective-C functions that expect pointers? So you can use … But it has an awkward API that would require 7 levels of nesting, because you can only hand in one var per call 😞 And it seems like its automatic memory management wouldn't be useful in this case, because we currently don't need to ever dealloc the vars. So there's nothing to be gained?
Haha. I consider those the main feature of the parser 😄 Because it only ever allocates its working memory once, it doesn't need to alloc/dealloc anything on each run, and isn't at risk of leaving any allocations behind like …

My only thought was to change … But that's just rearranging the furniture without any real gain. Until there's something useful to be gained from a reshuffle, it doesn't seem worth the time?
The job ends with "No output has been received in the last 10m0s"... I agree: don't bother about it.
Yes, it frees us from any heap allocation:

```swift
var i = 1 // stack allocated
withUnsafeMutablePointer(to: &i) { p in
    // use pointer to i
}
```
It's ugly and awkward, true. But never mind: the compiler optimizes out all calls to `withUnsafeMutablePointer`.
The static version has seven heap allocations (that's not much), but we have zero. And we no longer have any global constants, which means that we can remove the mutex: we can parse concurrently from various threads. We'll get some regular-looking parsing code, which starts by initializing a formatter, and then has it parse a string:

```swift
let formatter = MyFormatter()
let value = formatter.value(from: string)
```
I understand. But we need to know the parsed format, and the parser is quite competent at that.
Really? I expected them to be stack, due to being static lets that request explicit allocation size at compile time... Although having just done some googling and reading, some people are suggesting that Swift's …

I can't see a good reason why the compiler wouldn't optimise these specific declarations to the stack. But I also can't find anyone saying that the compiler will do that. Sigh.

Although even if they are heap, they're still only allocated once, so it's still only 7 allocations for the entire app session. There's no real performance cost there. It's just academic. And given the uglier code required for … I think the more promising gain of using …

If I get a chance, I'll try that later this week. I've had to jump over to some other tasks now, so don't have time to play with it again just yet 😞
Here's the mailing list thread that discusses stack/heap for … And now I remember that Swift's static lets are instantiated lazily. So yeah, pretty much guaranteed to be heap. Sigh.

It's still only academic in terms of performance cost, but I'm disappointed anyway 😂
UnsafeMutablePointer is a pointer. Whether this pointer points to heap-allocated memory or stack memory is not its business. And it will even sometimes be optimized out by the compiler (making our stack/heap questions irrelevant). Consider how it compiles the two functions below (both return 2):

```swift
// test.swift
func foo() -> Int {
    var i = 1
    withUnsafeMutablePointer(to: &i) {
        $0.pointee = 2
    }
    return i
}

func bar() -> Int {
    return 2
}
```

They are identical :-) However, explicit allocations are not optimized out:

```swift
func baz() -> Int {
    let p = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    defer { p.deallocate(capacity: 1) }
    p.pointee = 2
    return p.pointee
}
```
I added DateParsingTests, which parses 50000 date strings:
That's a great speed improvement :-) I don't know how to test for memory consumption, though.
Since 316d287:
OK, @sobri909, I'm done.
Hah. I see you're going back and forth with the different ways to parse the nanoseconds, like I did 😉 Ironically I think my first attempt at it was the cleanest and most dependable (by just using …).

Also I'm guessing you're not a Vim user? Adding trailing whitespace to empty lines breaks Vim's ability to navigate between code blocks with the { and } motions.
For this, I've got some super super old code sitting around somewhere, from when I had to be aware of memory consumption when doing some large image processing stuff during the iOS 4 days. I'll see if I can find it, and if it still works.
Ok, I've found this method, from some code back in 2012:

```objc
- (void)calcMemory {
    mach_port_t host_port;
    mach_msg_type_number_t host_size;
    vm_size_t pagesize;

    host_port = mach_host_self();
    host_size = sizeof(vm_statistics_data_t) / sizeof(integer_t);
    host_page_size(host_port, &pagesize);

    vm_statistics_data_t vm_stat;
    if (host_statistics(host_port, HOST_VM_INFO, (host_info_t)&vm_stat, &host_size)
        != KERN_SUCCESS) {
        NSLog(@"failed to fetch vm statistics");
    }

    // stats in bytes
    natural_t mem_used = (vm_stat.active_count + vm_stat.inactive_count
                          + vm_stat.wire_count) * pagesize;
    natural_t mem_free = vm_stat.free_count * pagesize;
    self.totalMemory = mem_used + mem_free;

    //NSLog(@"used: %umb free: %umb total: %dmb", mem_used / 1024 / 1024,
    //      mem_free / 1024 / 1024, self.totalMemory / 1024 / 1024);
}
```

Just trying to make sense of it now...
Yeah! I understand you! I had to add tests because I was sweating! I just wanted to dig a little deeper into the raw parsing to gain a few cycles (more on that below).
No, I use Xcode, and the GRDB project uses Xcode conventions throughout. I guess a configurable professional editor like Vim can handle that with a few mods 😅
You know, we were supposed to fix the original issue, #333. If it looks fixed to you, we'll be able to merge this PR! I'm very happy with it, because we've learned and improved many things:
Me too, but with XVim2, otherwise I'd go insane. Xcode has a setting for "Automatically trim trailing whitespace" and "Including whitespace-only lines".
I looked for a workaround some years back, before I found that Xcode setting, then I stopped looking.
Yep! I just ran my unscientific tests again on the latest commits, and memory use still looks great! It also looks like it's now faster than my original implementation, though that's hard to tell without more careful comparison. But either way, it has easily achieved the original memory use goal, and is almost three times as fast 😄
Neither. That's why that blog post got me excited. Fun new (old) toy!
We're ready for a merge, then :-)
This PR will land in GRDB 3.0 #313
Great! 😄 Semi related: apologies for not having tried 3.0 yet. I've been doing some db level refactors in LocoKit that I wanted to get stable first, before throwing any new variables into the equation. But it looks like those parts of my refactor are settled now, so I'll probably switch to 3.0 today / tomorrow. (More likely tomorrow, as I've got a flight to catch today.)
I've temporarily left in the `useScanfStrategy` debug bool, to make it easy for you to do your own comparison testing, if you want.