feat(persons-on-events): add required person and group columns to events table #9251

yakkomajuri · 2022-03-25T10:01:31Z

Problem

Changes

I was thinking about this for a bit and think this might be cleaner than updating the sharding migration, while also unlocking this schema change earlier in the process.

The idea here is to alter events to add the columns and, if the setup is sharded, alter the sharded tables too.

Consider the scenarios:

User is on old schema, CLICKHOUSE_REPLICATION is off (for whatever reason)

We'll alter events, they'll get the new columns, and whenever they run the sharding migration the new tables will have the columns too.

User is on old schema, CLICKHOUSE_REPLICATION is on (but 0004_replicated_schema hasn't run)

We'll alter events, see that writable_events doesn't exist, and call it a day. When they run the migration the tables will have all the columns.

User is on new schema

We alter the sharded tables to get the columns.

Fresh install

They have the new columns from 0001, so 0025 will complete successfully because of IF NOT EXISTS when adding the columns.
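The four scenarios above can be sketched roughly as follows. This is a minimal illustration and not the actual migration code: the column list and the `sharded_events` table name are assumptions made for the example, and `table_exists` is a hypothetical callback standing in for a real check against ClickHouse. Only `events`, `writable_events`, and the use of IF NOT EXISTS come from the description.

```python
from typing import Callable, List, Tuple

# Hypothetical column names and types, for illustration only.
# The real migration may use different ones.
NEW_COLUMNS: List[Tuple[str, str]] = [
    ("person_id", "UUID"),
    ("person_properties", "VARCHAR"),
    ("group0_properties", "VARCHAR"),
]


def add_columns_sql(table: str) -> str:
    """Build one ALTER statement that is safe to re-run thanks to IF NOT EXISTS."""
    clauses = ", ".join(
        f"ADD COLUMN IF NOT EXISTS {name} {type_}" for name, type_ in NEW_COLUMNS
    )
    return f"ALTER TABLE {table} {clauses}"


def plan_migration(table_exists: Callable[[str], bool]) -> List[str]:
    """Return the statements to run for the current schema state."""
    # events is always altered, whatever the schema state.
    statements = [add_columns_sql("events")]
    # Only touch the sharded setup if the replicated schema has already been created.
    if table_exists("writable_events"):
        statements.append(add_columns_sql("writable_events"))
        # "sharded_events" is an assumed name for the underlying sharded table.
        statements.append(add_columns_sql("sharded_events"))
    return statements
```

On an old schema `plan_migration` yields a single statement for `events`; on the new schema it also covers the sharded tables. On a fresh install every ADD COLUMN IF NOT EXISTS clause is a no-op because the columns already exist.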

How did you test this code?

Manually ran the migrations


macobo · 2022-03-25T10:18:19Z

Approach looks great - had some code-level issues!


macobo · 2022-03-25T11:42:34Z

Holding off on approve due to test failures (probably due to conflicting column names) but this lgtm :)

yakkomajuri · 2022-03-25T12:27:16Z

oof yeah the new column name conflicts on joins

yakkomajuri · 2022-03-25T12:32:36Z

I haven't worked on the queries side so help me out here:

The lazy thing to do is name this something else like personid and move on. Else I can go in and change all queries joining the pdi table to SELECT pdi.person_id as person_id for now.

I'm inclined to just be lazy here, but wondering if that'll lead to some level of confusion for people writing queries given the mismatch in column names + if I'd be breaking best practices for highlighting relations in a non-relational db

@macobo
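To make the conflict concrete, here is a small sketch of the ambiguity and of the aliasing fix mentioned above. It uses SQLite from the Python standard library purely as a stand-in, so the exact error and resolution rules in ClickHouse may differ. The two-column table layouts are assumptions for the example, not the real schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the events and pdi tables (assumed layout).
conn.executescript(
    """
    CREATE TABLE events (distinct_id TEXT, person_id TEXT);
    CREATE TABLE pdi (distinct_id TEXT, person_id TEXT);
    INSERT INTO events VALUES ('d1', 'p-from-event');
    INSERT INTO pdi VALUES ('d1', 'p-from-pdi');
    """
)

JOIN = "FROM events e JOIN pdi ON e.distinct_id = pdi.distinct_id"


def run(select_list: str):
    """Run the join with the given select list; return rows or the error text."""
    try:
        return conn.execute(f"SELECT {select_list} {JOIN}").fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


# Unqualified: both tables now expose person_id, so the reference is ambiguous.
ambiguous = run("person_id")
# Qualified and aliased: the join side is chosen explicitly.
qualified = run("pdi.person_id AS person_id")
```

The alternative of naming the new column something like personid avoids the ambiguity without touching any query, at the cost of the naming mismatch discussed above.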

yakkomajuri · 2022-03-25T12:35:51Z

There's so much formatting, dynamic generation, etc. of SQL that it would be hard for me to track down all instances of the join as someone who hasn't touched these. This PR would also suddenly get a bit more dangerous as it would touch multiple areas.

Definitely won't make decisions because of "laziness" but just wondering if it would really harm anyone to have that column be called e.g. personid :D

macobo · 2022-03-25T12:37:14Z

> The lazy thing to do is name this something else like personid and move on. Else I can go in and change all queries joining the pdi table to SELECT pdi.person_id as person_id for now.

I think the other stakeholder here is @EDsCODE rather than me right now. Bring this up in the sync with him!

EDsCODE · 2022-04-04T21:15:26Z

I believe everything on the query end is ok! @tiina303 @yakkomajuri

yakkomajuri · 2022-04-05T08:32:21Z

Thanks a lot - changes lgtm @EDsCODE. Care to stamp this if the rest looks ok?

yakkomajuri · 2022-04-05T10:23:00Z

@tiina303 maybe give this a quick look as well?

tiina303

Looks good to me

tiina303 · 2022-04-05T15:21:30Z

I'm not that worried about the querying side: if we mess up there, some query won't work and we can hotfix it (on that note, we should keep an eye on support requests and failures in Sentry). If we mess up something on the table/migration side, that could be trickier to resolve.

yakkomajuri · 2022-04-11T09:43:42Z

I'm holding off on merging this until #9207 is fully sorted. #9368 should be the missing piece

yakkomajuri · 2022-04-12T15:07:11Z

Actually this migration also needs to update the kafka and MV tables
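One way this could look, sketched under assumptions: drop the materialized view and the Kafka table if they exist, then recreate both from definitions that include the new columns. The table names `events_mv` and `kafka_events` in the usage below and the CREATE statements passed in are hypothetical placeholders. The thread only states that the Kafka and MV tables need updating and that the tables are dropped if they exist.

```python
from typing import List


def refresh_kafka_and_mv_sql(
    mv_table: str,
    kafka_table: str,
    create_kafka_sql: str,
    create_mv_sql: str,
) -> List[str]:
    """Return statements that rebuild the Kafka table and MV with a new schema."""
    return [
        # Drop the MV first so nothing reads from the Kafka table mid-change.
        f"DROP TABLE IF EXISTS {mv_table}",
        f"DROP TABLE IF EXISTS {kafka_table}",
        # Recreate in the opposite order: the MV selects from the Kafka table.
        create_kafka_sql,
        create_mv_sql,
    ]


# Example usage with placeholder names and elided definitions.
statements = refresh_kafka_and_mv_sql(
    mv_table="events_mv",
    kafka_table="kafka_events",
    create_kafka_sql="CREATE TABLE kafka_events (...) ENGINE = Kafka(...)",
    create_mv_sql="CREATE MATERIALIZED VIEW events_mv TO writable_events AS SELECT ...",
)
```

Using IF EXISTS on the drops keeps the step re-runnable if the migration is interrupted partway through.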


yakkomajuri and others added 7 commits March 23, 2022 14:32

refactor(ingestion): establish setup for json consumption from kafka into clickhouse [nuke protobuf pt. 1]

9e722df

address review

4726e74

fix kafka table name across the board

90bc6a1

Update posthog/async_migrations/test/test_0004_replicated_schema.py

f6791a5

run checks

42a8fa2

feat(persons-on-events): add required person and group columns to events table

d3a8698

rename

4b525df

yakkomajuri changed the title ~~Events table new schema~~ feat(persons-on-events): add required person and group columns to events table Mar 25, 2022

yakkomajuri requested a review from macobo March 25, 2022 10:02

update snapshots

63d7126

macobo reviewed Mar 25, 2022

ee/clickhouse/migrations/0026_persons_and_groups_on_events.py (outdated)

macobo reviewed Mar 25, 2022

ee/clickhouse/migrations/0026_persons_and_groups_on_events.py (outdated)

macobo reviewed Mar 25, 2022

posthog/async_migrations/test/__snapshots__/test_0004_replicated_schema.ambr (outdated)

macobo reviewed Mar 25, 2022

ee/clickhouse/migrations/0026_persons_and_groups_on_events.py (outdated)

yakkomajuri added 5 commits March 25, 2022 10:46

address review

6a43874

Revert "update snapshots"

e15c9b0

This reverts commit 63d7126.

address review

6f44a59

update snapshots

3da77ef

Merge branch 'nuke-protobuf-pt1' into events-table-new-schema

6e931dc

macobo reviewed Mar 25, 2022

ee/clickhouse/migrations/0026_persons_and_groups_on_events.py (outdated)

macobo reviewed Mar 25, 2022

ee/clickhouse/migrations/0026_persons_and_groups_on_events.py (outdated)

yakkomajuri added 2 commits March 25, 2022 12:22

update more snapshots

8954406

Merge branch 'nuke-protobuf-pt1' into events-table-new-schema

2bf3fb4

use runpython

3fb7a66

EDsCODE added 2 commits April 4, 2022 12:51

fix ambiguous test

673b7f6

fix queries'

1c35f77

Base automatically changed from nuke-protobuf-pt1 to master April 4, 2022 17:29

EDsCODE added 2 commits April 4, 2022 16:15

last bits

8dccb82

merge master

b298462

yakkomajuri requested a review from tiina303 April 5, 2022 10:23

tiina303 approved these changes Apr 5, 2022

EDsCODE approved these changes Apr 5, 2022

fix typo to retrigger tests

0fe0fd0

yakkomajuri marked this pull request as draft April 12, 2022 15:07

also handle kafka and mv tables in migration

d970cf8

yakkomajuri marked this pull request as ready for review April 12, 2022 18:12

update snapshots

1a37d3b

tiina303 reviewed Apr 12, 2022

ee/clickhouse/migrations/0026_persons_and_groups_on_events.py (outdated)

yakkomajuri commented Apr 13, 2022

ee/clickhouse/migrations/0026_persons_and_groups_on_events.py (outdated)

drop tables if exists

dd8e435

yakkomajuri merged commit 3d71ad0 into master Apr 13, 2022

yakkomajuri deleted the events-table-new-schema branch April 13, 2022 10:48

yakkomajuri added a commit that referenced this pull request Apr 13, 2022

Revert "feat(persons-on-events): add required person and group columns to events table (#9251)"

43bd70b

This reverts commit 3d71ad0.

yakkomajuri mentioned this pull request Apr 13, 2022

fix(persons-on-events): revert add required person and group columns to events table #9406

Merged

yakkomajuri added a commit that referenced this pull request Apr 13, 2022

Revert "feat(persons-on-events): add required person and group columns to events table (#9251)" (#9406)

c7d3733

This reverts commit 3d71ad0.

tiina303 restored the events-table-new-schema branch April 13, 2022 23:26

tiina303 mentioned this pull request Apr 13, 2022

feat: Add person info to events #9404

Merged

yakkomajuri mentioned this pull request Apr 25, 2022

feat(persons-on-events): add groups and persons columns to events schema #9510

Merged
