-
-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[performance] convert enum strings to ints #3558
[performance] convert enum strings to ints #3558
Conversation
…ns from type string -> int for performance / space savings
internal/db/bundb/migrations/20240620074530_interaction_policy.go
Outdated
Show resolved
Hide resolved
internal/db/bundb/migrations/20241121121623_enum_strings_to_ints.go
Outdated
Show resolved
Hide resolved
internal/gtsmodel/notification.go
Outdated
type NotificationType string | ||
// NotificationType describes the | ||
// reason/type of this notification. | ||
type NotificationType int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make any difference in the DB if we store these as uint8 or uint16 vs int? Just wondering if we could save even more space by using smallint
instead of integer
as the database storage type (assuming smallint
is supported by both PG and SQLite) -- https://www.postgresql.org/docs/current/datatype-numeric.html#DATATYPE-NUMERIC-TABLE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For SQLite this doesn't matter. Any column type with INT
in it becomes an column with integer affinity. For storage, SQLite picks the smallest representation for the row value (so the column isn't "fixed width" based on the largest value).
The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.
It'll get converted to 8-byte / int64 when reading it into memory.
So you can use SMALLINT
for Postgres and SQLite. The only caveat is that SQLite would accept a number bigger than Pg. But if we run over int16 notification types then we got bigger problems 😛.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed: 336db3d
Looking good! |
@tsmethurst re: conversation we had the other day here: #3547 (comment) we definitely need to move models in at the time we create them, so we have a snapshot of the model when it was created. i just spent ages tracking down an issue with why postgres was angry at my visibility column type, but it's because the sinbinstatus type was creating the table from the current model type instead of a snapshot. it's much easier to just copy in a model at creation-time for a snapshot than it is hunting down the exact commit it was added at to snapshot it later. if you could update your previous migration to do that with the new domain permission types either in a PR or in your next domain permissions PR that would be great please :) |
// Add new column to database. | ||
if _, err := tx.NewAddColumn(). | ||
Table(table). | ||
ColumnExpr("? SMALLINT NOT NULL DEFAULT ?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why SMALLINT
and not an actual Enum type? (or does SQLite not support them?)
That being said, enum types consume double the space per row :/
From a referential integrity PoV and for making analytics queries easier, it would be interesting to have a table with just id SMALLINT
and label TEXT
(e.g. visibility_types
and then have statuses.visibility
have an FK reference to visibility_types.id
). I understand that GtS does not consider working directly on the DB supported, though, so it’s okay if nothing comes from this.
(Just, from experience, in 10+-year-running projects, things, including later modelling that starts from the DB, work better with these things clearly structured in the SQL.)
for old, new := range mapping { | ||
|
||
// Update old to new values. | ||
res, err := tx.NewUpdate(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I wrote on Fedi, on the phone so possible typos and not tested, this could probably have better been something like UPDATE statuses SET visibility_new = CASE visibility WHEN 'public' THEN 1 WHeN 'next one' THEN 2 […] CASE ELSE THEN NULL (or some other default value) END CASE;
Not sure if SQLite could do that, though.
Description
This helps improve performance by reducing the amount that these enums take-up in disk-space in their respective tables, and improves performance (however minorly) by reducing memory impact of models requiring only int enums instead of strings and also simplifying the comparisons between values of these types.
However, on larger databases this will likely be a monstrously large migration given the nature of it and it acting on the largest table we have, statuses :')
Checklist
go fmt ./...
andgolangci-lint run
.