Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow filtering with date/time properties #6619

Closed
clarkus opened this issue Oct 22, 2021 · 20 comments · Fixed by #9252
Closed

Allow filtering with date/time properties #6619

clarkus opened this issue Oct 22, 2021 · 20 comments · Fixed by #9252
Assignees
Labels
design Issues that need a designer's attention enhancement New feature or request P1 Urgent, non-breaking (no crash but low usability)

Comments

@clarkus
Copy link
Contributor

clarkus commented Oct 22, 2021

Is your feature request related to a problem?

While unix timestamps work, they're not very approachable for most users. This requires users to manually convert timestamps outside of PostHog or to use a plugin to convert this for them.

Screen Shot 2021-10-22 at 10 53 07 AM

Describe the solution you'd like

The ability to filter in a format that's most approachable to the user composing the filter. Unix timestamps might be best for some users, but other users might prefer UTC or their local time.

Describe alternatives you've considered

The alternative is leaving this as it is.

Additional context

Reported by a user via our community slack today.

Thank you for your feature request – we love each and every one!

@clarkus clarkus added enhancement New feature or request design Issues that need a designer's attention team-core-experience labels Oct 22, 2021
@paolodamico paolodamico added the P1 Urgent, non-breaking (no crash but low usability) label Oct 22, 2021
@paolodamico
Copy link
Contributor

@mariusandra fwiw I think we should also prioritize this as part of the effort in improving the UX around filtering we talked about today

@timgl
Copy link
Collaborator

timgl commented Oct 24, 2021

There's a plugin as a stopgap: https://posthog.com/plugins/timestamp-parser

@paolodamico
Copy link
Contributor

Some context from a user on Slack, this is becoming increasingly important to support natively. It's quite unintuitive that you can't handle dates properly in the app.

@clarkus
Copy link
Contributor Author

clarkus commented Nov 10, 2021

Agree that we should prioritize this. It's kind of a dead end for less-technical users. I already have a partial design done, but I can put that back into my rotation if we think we have capacity for this in the near future.

@clarkus
Copy link
Contributor Author

clarkus commented Nov 12, 2021

Bumping again as this was brought up in the users slack today.

@pauldambra
Copy link
Member

pauldambra commented Nov 15, 2021

There seem to be two relatively quick concrete things in here

  1. make the timestamp-parser plugin native
  2. show a date and time picker when a field is a timestamp (@clarkus have we got any designs on what a date and time picker should look/behave like?)

(there are other things but these seem to have good bang per buck)

@pauldambra
Copy link
Member

pauldambra commented Nov 15, 2021

Marius's suggestions

  1. allow a property schema so we can say that event X property Y is a DateTime (and allows other solutions)
  2. all graphs are in UTC - can this allow us to fix that so graphs are in the users' timezones

(store date formats against properties so they can be passed to clickhouse?)

@mariusandra
Copy link
Collaborator

There are few points that we discussed:

  1. Searching for times with a datepicker when the property is a string in the format of a timestamp (event.properties.start_time >= "one day ago"). This is what the current issue is about. My suggestion: we can store in propertydefinition the type of the date (YYYY-MM-DDTHH:MM vs YYYY-MM-DD HH:MM ZZZ vs YYYY-MM-DD, etc), and perhaps there's a clickhouse function that can do the parsing and comparisons on the fly. We can detect what format the timestamps in the table are quite easily (and update when we're wrong), making the experience rather seamless for the end user.
  2. On the subject of timestamps, if we'd all live on the US west coast, we would have probably fixed by now the issue that all times in the app are UTC. There has to be a way to group by date in your local timezone.
  3. Breakdown by day of week / day of month / etc, based on either the event's timestamp or some other custom property.
  4. Automatically exploding every timestamp into properties "just in case" seems excessive. It seems like much already for the current timestamp, but doing that for all custom start_time timestamps leads to modifying users' data in ways I wouldn't expect. So I'd skip doing this.

@clarkus
Copy link
Contributor Author

clarkus commented Nov 15, 2021

Searching for times with a datepicker when the property is a string in the format of a timestamp (event.properties.start_time >= "one day ago"). This is what the current issue is about. My suggestion: we can store in propertydefinition the type of the date (YYYY-MM-DDTHH:MM vs YYYY-MM-DD HH:MM ZZZ vs YYYY-MM-DD, etc), and perhaps there's a clickhouse function that can do the parsing and comparisons on the fly. We can detect what format the timestamps in the table are quite easily (and update when we're wrong), making the experience rather seamless for the end user.

Agree with this direction - the core change is presenting dates in a format that is meaningful to the user. Maybe this could be a project-level setting for setting a data format preference, then we can default all dates to that format.

@paolodamico
Copy link
Contributor

The UTC thing is definitely important but not as important IMO as the filter support. If I can filter users who signed up before/after X time, if I have some timezone wonkiness it'll be a minor setback vs. a complete blocker. From a UX standpoint, I would suggest if we could automatically recognize either UNIX timestamps or ISO-8601 date strings (whether date or date/time) and automatically query them on the fly.

@paolodamico paolodamico changed the title More approachable time units in filters Allow filtering with date/time properties Nov 25, 2021
@pauldambra
Copy link
Member

Braindumping some thoughts

  1. I'm making an assumption that we don't want to detect dates and times and store them differently on write so that we don't have to reprocess old data. ClickHouse has settings that appear to apply, but only apply if we are storing in DateTime fields so we can ignore them

  2. we write timestamp, created at, and _timestamp when we store an event. They are ClickHouse DateTime types in UTC. If we want to treat them as a DateTime in a query we already can

  3. if we want to include event properties (and wouldn't we?!)....

Searching for times with a datepicker when the property is a string in the format of a timestamp (event.properties.start_time >= "one day ago"). This is what the current issue is about. My suggestion: we can store in propertydefinition the type of the date (YYYY-MM-DDTHH:MM vs YYYY-MM-DD HH:MM ZZZ vs YYYY-MM-DD, etc), and perhaps there's a clickhouse function that can do the parsing and comparisons on the fly. We can detect what format the timestamps in the table are quite easily (and update when we're wrong), making the experience rather seamless for the end user.

ClickHouse can do some parsing when selecting strings that we know are dates

CREATE TABLE dates_and_times
(
    `timestamp` String,
    `format` String
)
ENGINE = TinyLog;

-- a selection of valid but differently formatted versions of 2020-02-01T10:18:56Z
insert into dates_and_times VALUES
                      ('2020-02-01', 'YYYY-MM-DD'),
                      ('20200201', 'YYYYMMDD'),
                      ('2020-02-01 10:18:56', 'YYYY-MM-DD hh:mm:ss'),
                      ('2020-02-01T10:18:56Z', 'ISO 8601 UTC'),
                      ('2020-02-01T11:18:56+01:00', 'ISO 8601 with timezone'),
                      ('200201', 'YYMMDD'),
                      ('20-02-01', 'YY-MM-DD'),
                      ('0201', '--MMDD'),
                      ('02-01', '--MM-DD'),
                      ('1580552336', 'unix timestamp');

-- results when loading those values as a Date or DateTime
select toDateTime(timestamp) from dates_and_times where format = 'YYYY-MM-DD';
/* ✅ 2020-02-01 00:00:00 */

select toDate(timestamp) from dates_and_times where format = 'YYYY-MM-DD';
/* ✅ 2020-02-01 */

select toDateTime(timestamp) from dates_and_times where format = 'YYYYMMDD';
/* ❌ 1970-08-22 19:10:01 */

select toDate(timestamp) from dates_and_times where format = 'YYYYMMDD';
/* ❌ error */

select toDateTime(timestamp) from dates_and_times where format = 'ISO 8601 UTC';
/* ❌ error */

select toDate(timestamp) from dates_and_times where format = 'ISO 8601 UTC';
/* ❌ error */

select toDateTime(timestamp) from dates_and_times where format = 'ISO 8601 with timezone';
/* ❌ error */

select toDateTime(timestamp) from dates_and_times where format = 'unix timestamp';
/* ✅ 2020-02-01 10:18:56 */

select toDateTime(timestamp) from dates_and_times where format = 'YYMMDD';
/* ❌ 1970-01-03 07:36:41 */

select toDate(timestamp) from dates_and_times where format = 'YYMMDD';
/* ❌ error */

select toDateTime(timestamp) from dates_and_times where format = '--MMDD';
/* ❌ error */

select toDateTime(timestamp) from dates_and_times where format = '--MM-DD';
/* ❌ error */

So if we constrain the input string formats we can query the strings as dates without any processing and take advantage of the built-in date and time functions

Where we can't constrain the input (and we won't always be able to) we can convert some values with simple processing

e.g. ISO8601 ending in Z we know is in UTC so

select toDateTime(substr(timestamp, 1, 19)) from dates_and_times where format = 'ISO 8601 UTC';
/* ✅  2020-02-01 10:18:56 */

Since the user has provided a format we can exclude items that don't match that format at query time

e.g.

insert into dates_and_times VALUES
                      ('202002-01 10:18:56', 'mismatched YYYY-MM-DD hh:mm:ss');

-- now there are two rows that are almost the same

-- with a very naive REGEX
select * from dates_and_times where match(timestamp, '^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d$')
/* ✅  one row returned `2020-02-01 10:18:56` */

Since we can detect incorrect formatting on event properties in order to exclude them, we could also warn the user about them... although that might be frustrating if there's no onward journey for the user to edit or remove bad data

So we could allow someone to say that property X is a DateTime with one of a list of allowed formats. We can look that up from Postgres before querying in ClickHouse, and then convert the item to DateTime as we query

@pauldambra
Copy link
Member

pauldambra commented Dec 2, 2021

On the subject of timestamps, if we'd all live on the US west coast, we would have probably fixed by now the issue that all times in the app are UTC. There has to be a way to group by date in your local timezone.

"All functions for working with the date and time that have a logical use for the time zone can accept a second optional time zone argument."

So...

select toDateTime(timestamp), toDateTime(timestamp, 'US/Samoa') from dates_and_times where format = 'YYYY-MM-DD hh:mm:ss';

returns 2020-02-01 10:18:56 and 2020-02-01 21:18:56

So, if we can detect, or allow the user to provide, a timezone we can use that when querying and generating results

@pauldambra
Copy link
Member

Breakdown by day of week / day of month / etc, based on either the event's timestamp or some other custom property.

There are built-in ClickHouse functions e.g.

select
       toDateTime(timestamp),
       toDayOfMonth(toDateTime(timestamp)),
       toDayOfWeek(toDateTime(timestamp)),
       toDayOfYear(toDateTime(timestamp))
from dates_and_times where format = 'YYYY-MM-DD hh:mm:ss';

returns

2020-02-01 10:18:56, 1, 6, 32

Once we know a particular property is a date we can add queries that take advantage of that

@pauldambra
Copy link
Member

pauldambra commented Dec 2, 2021

It seems like a reasonable first step is letting (paid?) customers specify that particular properties are DateTime with a choice from a constrained format list of YYYY-MM-DD, YYYY-MM-DD hh:mm:ss, ISO 8601 UTC, and UNIX timestamp.

And picking a single use-case to provide on top of that... Maybe querying from the Taxonomic Breakdown Filter for same time, same date, before, and after

That feels like a good step to get feedback on and a starting point for more complex capabilities

@Twixes @mariusandra @clarkus ??

@kristiehoward
Copy link

One more request / +1 for this feature, with our use case below if that helps - Slack thread here. We are Pro users :)

The AWS outage caused issues for our users, and we were trying to filter events by time (ex. “find all the create workflow events that happened after 12/7/2021 3:30 PM GMT”) to see the number of users impacted. @maxime Heckel and I were surprised that we couldn’t filter events by a datetime or date. We discovered that there was a unix epoc timestamp (seconds), but it was really quite challenging for us to answer that specific query. Are there any plans in the works to improve the way searching for time works in the UI? The UI displays formatted times but doesn’t allow us to filter by them. It’s really non-intuitive for us to go to an online converter to find a unix timestamp for a given datetime and then go back to the UI and hope that it was right.

@pauldambra
Copy link
Member

ClickHouse parseDateTimeBestEffort is much more flexible. It succeeds with more of the example test cases from above

CREATE TABLE dates_and_times
(
    `timestamp` String,
    `format` String
)
ENGINE = TinyLog;

-- a selection of valid but differently formatted versions of 2020-02-01T10:18:56Z
insert into dates_and_times VALUES
                      ('2020-02-01', 'YYYY-MM-DD'),
                      ('20200201', 'YYYYMMDD'),
                      ('2020-02-01 10:18:56', 'YYYY-MM-DD hh:mm:ss'),
                      ('2020-02-01T10:18:56Z', 'ISO 8601 UTC'),
                      ('2020-02-01T11:18:56+01:00', 'ISO 8601 with timezone'),
                      ('200201', 'YYMMDD'),
                      ('20-02-01', 'YY-MM-DD'),
                      ('0201', '--MMDD'),
                      ('02-01', '--MM-DD'),
                      ('1580552336', 'unix timestamp');

-- results when loading those values as a Date or DateTime
select parseDateTimeBestEffort(timestamp) from dates_and_times where format = 'YYYY-MM-DD';
/* ✅ 2020-02-01 00:00:00 */

select toDate(parseDateTimeBestEffort(timestamp)) from dates_and_times where format = 'YYYY-MM-DD';
/* ✅ 2020-02-01 */

select parseDateTimeBestEffort(timestamp) from dates_and_times where format = 'YYYYMMDD';
/* ✅ 2020-02-01 00:00:00 */

select toDate(parseDateTimeBestEffort(timestamp)) from dates_and_times where format = 'YYYYMMDD';
/* ✅ 2020-02-01 */

select parseDateTimeBestEffort(timestamp) from dates_and_times where format = 'ISO 8601 UTC';
/* ✅ 2020-02-01 10:18:56 */

select parseDateTimeBestEffort(timestamp) from dates_and_times where format = 'ISO 8601 UTC';
/* ✅ 2020-02-01 10:18:56 */

select parseDateTimeBestEffort(timestamp) from dates_and_times where format = 'ISO 8601 with timezone';
/* ✅ 2020-02-01 10:18:56 */

select parseDateTimeBestEffort(timestamp) from dates_and_times where format = 'unix timestamp';
/* ✅ 2020-02-01 10:18:56 */

select toDate(parseDateTimeBestEffort(timestamp)) from dates_and_times where format = 'YYMMDD';
/* ✅ 2002-01-01 */

select toDate(parseDateTimeBestEffort(timestamp)) from dates_and_times where format = 'YYMMDD';
/* ✅ 2002-01-01 */

select parseDateTimeBestEffort(timestamp) from dates_and_times where format = '--MMDD';
/* ❌ 2061-02-06 06:28:16 */

select toMonth(parseDateTimeBestEffort(timestamp)), toDayOfMonth(parseDateTimeBestEffort(timestamp))  from dates_and_times where format = '--MM-DD';
/* ❌ 01, 02 (should be 2, 1 */

@posthog-contributions-bot
Copy link
Contributor

This issue has 2002 words at 17 comments. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:

  1. Write some code and submit a pull request! Code wins arguments
  2. Have a sync meeting to reach a conclusion
  3. Create a Request for Comments and submit a PR with it to the meta repo or product internal repo

Is this issue intended to be sprawling? Consider adding label epic or sprint to indicate this.

@pauldambra
Copy link
Member

TODO

@pauldambra
Copy link
Member

switched to 100% of cloud users

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Issues that need a designer's attention enhancement New feature or request P1 Urgent, non-breaking (no crash but low usability)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants