-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling floating point precision in sqlite #150
Comments
I think this would make sense! A quick search online seems to confirm that this is a reasonable approach (albeit annoying)
Agreed! |
I'm thinking maybe 7 decimal places for lat/lon is good. This gives precision of at least 0.011112m (1.1cm) - with greater precision for longitude away from the equator. This is way more accurate than most GPS, which have a max precision of about 5m, however survey GPS and even RTK can get down to millimeter precision (although the most common standard for surveys is Trimble devices, which have about 2-3cm precision I think). The advantage of 7 decimal places is that we can fit them in a 32-bit signed integers (decoding 64-bit integers to Javascript can be complicated). Min, max of sint32 is -2,147,483,647 to 2,147,483,647. 7 decimal precision lat/lon is going to be -1,800,000,000 to 1,800,000,000 We could go 8 decimal places for millimeter precision, but I think this is overkill for Mapeo, given the overhead of then needing 64-bit integers. |
In what ways will this impact queries? |
Using strings in some ways is easiest - ensures round-trip accuracy. It would mean we can't query lat lon though (except via json - see below), but we don't right now anyway. Using integers I think would just work due to the way drizzle transforms values. One option with strings is that SQLite json queries could probably parse it into a number and run queries on it. It wouldn't be indexed but it would work. |
ts-proto also introduces precison errors on the roundtrip. If we're using ints, I agree using int32 would be the best and recommended by ts-proto (see https://github.com/stephenh/ts-proto section |
Can you try |
yeap, already tried here and it worked! Don't know if we should keep this conversation open related SQL handling of floats |
Will publish the mapeo/schema changes tomorrow and see how that affects the tests here and if it's still an issue. I guess it will become an issue at a certain level of precision, but not sure what that is? |
Switching protobuf storage to
Going to leave this open to potentially address in the future, although migration might be tricky. |
If we use the ["integer" JSON Schema type][0], we should save that as an INTEGER column, not a REAL column. Both will work for many numbers, but larger integers (like `2**63`) can only be safely stored in INTEGERs. This should be a bit more reliable. As of this patch, we don't have any integer types in our schemas, so this has no effect. However, that may change as we address [upcoming issues][1]. [0]: https://json-schema.org/understanding-json-schema/reference/numeric#integer [1]: #150
tl;dr: I think we're all good here. I anticipate two types of bug here:
I've don't think either of these are problems for us. Bug 1: data serializationWe should be okay as long as we use standard floats of the same type, which we do:
We could have problems going between, say, More detailsI don't always trust docs, so here's a script that verifies that we don't lose data going between JS and SQLite: import Database from 'better-sqlite3'
import assert from 'node:assert/strict'
const db = Database(':memory:')
db.exec('CREATE TABLE numbers (value REAL NOT NULL) STRICT')
const insert = db.prepare('INSERT INTO numbers (value) VALUES (?)')
const select = db.prepare('SELECT value FROM numbers LIMIT 1').pluck()
const clear = db.prepare('DELETE FROM numbers')
;[
123,
0.1,
0.1 + 0.2,
Number.MIN_SAFE_INTEGER,
Number.MAX_SAFE_INTEGER,
Number.EPSILON,
Number.MIN_VALUE,
].forEach((number) => {
clear.run()
insert.run(number)
assert.equal(select.get(), number)
}) Bug 2: precision64-bit floats are accurate to within a single digit number of nanometers.1 This is probably fine for our purposes. How I came to thisIEEE floats have limited precision. This imprecision isn't constant. Numbers lose precision as their magnitude gets larger.2 Therefore, if we want to test the worst case precision, we should test the largest number we represent. In our case, that number is 180. For us, the largest possible imprecision would be between (90°, 180°) and (89.9999999999999857891°, 179.999999999999971578°).3 I wrote this short Python program to compute the Haversine distance between these points, in meters: from sympy import asin, sqrt, sin, cos, pi, sympify
r = 6378.137 * 1000
def degrees_to_radians(degrees):
return degrees * pi / 180
def haversine(p1, p2):
lat_1 = degrees_to_radians(p1[0])
lng_1 = degrees_to_radians(p1[1])
lat_2 = degrees_to_radians(p2[0])
lng_2 = degrees_to_radians(p2[1])
dlat = lat_1 - lat_2
dlng = lng_1 - lng_2
return sympify(2 * r) * asin(
sqrt(sin(dlat / 2) ** 2 + cos(lat_1) * cos(lat_2) * sin(dlng / 2) ** 2)
)
def main():
p1 = sympify((90, 180))
p2 = sympify(("89.9999999999999857891", "179.999999999999971578"))
distance_in_meters = haversine(p1, p2)
print("meters:", distance_in_meters.evalf())
print("centimeters:", (distance_in_meters * 100).evalf())
print("nanometers:", (distance_in_meters * 1e9).evalf())
if __name__ == "__main__":
main() This returned about 1.58 nanometers. The Haversine distance isn't perfectly accurate, but I think it's good enough for us. (I also tried the same thing with float32 values and it's about 85 centimeters. Might also be okay for us, but I wouldn't want to risk that.) Footnotes
|
If we use the ["integer" JSON Schema type][0], we should save that as an INTEGER column, not a REAL column. Both will work for many numbers, but larger integers (like `2**63`) can only be safely stored in INTEGERs. This should be a bit more reliable. As of this patch, we don't have any integer types in our schemas, so this has no effect. However, that may change as we address [upcoming issues][1]. [0]: https://json-schema.org/understanding-json-schema/reference/numeric#integer [1]: #150
I think we're all good here. Closing. |
Description
Writing a floating point number (e.g. non-decimal) to sqlite and reading it again results in a different value due to floating point conversion errors. This is noticeable when storing lat/lon coordinates. We are currently using the SQLite REAL data type for storing lat and lon. More about how SQLite handles floating point here: https://www.sqlite.org/floatingpoint.html
This might also be related to encoding of floating point numbers in Protobuf, which is also involved in the round-trip of writing-reading data in Mapeo Core.
I think the issue is compounded by Javascript and Sqlite using different ways to store floating points.
It's not ideal that when storing data with lat/lon coordinates, the values that come back are different to those that went in. I wonder that maybe we switch to using an integer field for lat and lon, multiplying as needed for a fixed precision. e.g. we could support 6 decimal places of precision by storing lat / lon as
value * Math.pow(10,6)
.I think this is probably preferable to getting different values back than what you write, which is not what people would expect.
Suggestions about the best way forwards?
The text was updated successfully, but these errors were encountered: