Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controlled casting for dates, times, and timestamps #128

Closed
mathemancer opened this issue May 18, 2021 · 11 comments
Closed

Controlled casting for dates, times, and timestamps #128

mathemancer opened this issue May 18, 2021 · 11 comments
Labels
duplicate This issue or pull request already exists

Comments

@mathemancer
Copy link
Contributor

Problem

Related to #126 . We want to support dates, times, and timestamps (i.e., datetimes) with our controlled casting (and eventually inference) logic.

Proposed solution

We should add functions to db.types.alteration that will handle casting appropriate types to each of DATE, TIME, or TIMESTAMP.

Additional context

See this discussion for some thoughts about Timezones: #119 .

@mathemancer
Copy link
Contributor Author

I don't think this is quite ready for work, since we'll need to first implement a custom timestamp-based type to track TZ information as per the linked discussion.

@kgodey
Copy link
Contributor

kgodey commented May 18, 2021

@mathemancer I put a comment in the linked discussion, to keep things simple/portable I think we should start with the native Postgres datetime type and worry about storing TZ information later.

@mathemancer
Copy link
Contributor Author

mathemancer commented May 18, 2021

The problem with this (to me) is that it's lossy. That is, if we use the default Postgres behavior on timestamp data that does have time zone info, we'll lose that info, and it won't be recoverable from the remaining data. For example, consider the situation where we ask PostgreSQL to interpret a string with a UTC-08 offset as a timestamp. In the example, we start with the server configured to use UTC:

mathesar=# SELECT '1997-12-17 07:37:16-08'::TIMESTAMP;
      timestamp
---------------------
 1997-12-17 07:37:16
(1 row)

mathesar=# SELECT '1997-12-17 07:37:16-08'::TIMESTAMP WITH TIME ZONE;
      timestamptz
------------------------
 1997-12-17 15:37:16+00
(1 row)

mathesar=# SET TIME ZONE 'Asia/Hong_Kong';
SET
mathesar=# SELECT '1997-12-17 07:37:16-08'::TIMESTAMP;
      timestamp
---------------------
 1997-12-17 07:37:16
(1 row)

mathesar=# SELECT '1997-12-17 07:37:16-08'::TIMESTAMP WITH TIME ZONE;
      timestamptz
------------------------
 1997-12-17 23:37:16+08
(1 row)

In the first case, PostgreSQL simply strips off the TZ info as a string, and interprets the remaining string as a timestamp in UTC. In the second case, PostgreSQL correctly interprets the string and coerces the underlying timestamp to UTC. The problem is that then the display of the timestamp is dependent on the TZ setting of the client's session (not the DB). This is shown in the fourth case above.

In none of those cases is the result displayed as it was input (i.e., as 07:37:16-08), and it's actually not possible to recover enough information from what's stored to display that result since the TZ info is lost in the coercion to UTC.

If I were a user, I'd rather have the tool leave my timestamps as strings than lose information in that way.

@mathemancer
Copy link
Contributor Author

It's less bad if the input string doesn't have TZ info with it. In that case, we can at least assume UTC (or timestamp without time zone) and not worry that we've lost any data. If we want to proceed with the default Postgres timestamps, I suggest:

  1. We refuse to change a column to a TIMESTAMP (or TIMESTAMP WITH TIME ZONE) type if it has any entries with TZ information.
  2. We store any column which we do want to store as either of the types in question as a TIMESTAMP (without TZ info). This way, we avoid making a transformation on the data that can't be reversed (so we actually avoid letting PostgreSQL coerce the data to UTC from whatever TZ it's configured in at that moment).

@kgodey
Copy link
Contributor

kgodey commented May 18, 2021

@mathemancer I understand that it's lossy, but I think it's worth the risk because we need to work with dates and times without having to install custom types. We talked previously about disabling working with types if the user is unable to install our custom types schema in their database. I'm willing to disable email types but not datetime types, they're too fundamental to too many use cases.

I'm fine with being conservative while coercing to the type, or asking the user what timezone the type is in before storing it, but we can't have a custom type be a prerequisite for working with datetimes. Your two suggestions above seem reasonable.

@mathemancer
Copy link
Contributor Author

I thought the disabling types discussion was related to things that would need to be installed as extensions. A composite TIMESTAMP + TZ type doesn't fall into that category. To install that, we'd basically just need any user with the privileges necessary to create a type via SQL commands, i.e., we should be able to get everything we need from the CREATE TYPE command. While that does require some privileges, we need a user with similar privileges to create (for example) all the functions that we use for casting types between each other. Without such a user, I don't think installing Mathesar is feasible in the first place. Certainly, installing the bits necessary for type inference is infeasible without a privileged user (doesn't need to be root, just some CREATE roles granted).

Maybe we need to spin off a separate discussion about precisely what we do/don't assume the installing user is privileged to do on the DB.

@mathemancer
Copy link
Contributor Author

Okay, here's the spun-off discussion: #130

@kgodey
Copy link
Contributor

kgodey commented May 18, 2021

@mathemancer I don't think I understand fully which types require extensions / what our current custom type implementation is. It does require making a new schema, right?

I do think it makes sense to start a discussion or at least document what privileges a user will need to have on the DB (I see you've already done so, thanks).

My other concern is the portability of the database. We've talked about Mathesar creating a database that people could use easily to power other applications. If we're making a custom type, will that be easy to work with? Let me know what you think.

I'm also concerned about us working well with existing Postgres databases, we should be able to offer filter, sort, group by, calendar view, etc. functionality for existing datetime fields without requiring a composite type. It seems easiest to ensure that by starting with using Postgres' datetime type first and then extending to a composite type later.

@mathemancer
Copy link
Contributor Author

So far, nothing we have requires installing an extension. If we want to use PostGIS, that'll require installing of an extension (of course), but it's available on pretty much any PostgreSQL installation.

For the rest, do you mind if I copy your comment into the spun-off discussion to respond? (or you can copy it there)

@kgodey
Copy link
Contributor

kgodey commented May 18, 2021

@mathemancer I do not mind.

@kgodey
Copy link
Contributor

kgodey commented Jun 14, 2021

I'm closing this in favor of #250.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists
Projects
No open projects
Development

No branches or pull requests

2 participants