This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Separated Statistics (Target) #5879

Closed
wants to merge 91 commits into from
8374bcb
Newsfile
reivilibre Aug 19, 2019
8de9ebe
Tear out current room & user statistics (#5880)
reivilibre Aug 20, 2019
d7675e7
Add schema for Separated Statistics
reivilibre Aug 20, 2019
80a1c6e
Add storage function for storing stats deltas
reivilibre Aug 20, 2019
e4cbea6
Handle state deltas and turn them into stats deltas
reivilibre Aug 20, 2019
1819563
Ack, isort!
reivilibre Aug 20, 2019
b5573c0
Update synapse/storage/stats.py
reivilibre Aug 20, 2019
4a97eef
Update synapse/storage/stats.py
reivilibre Aug 20, 2019
6a19f7e
Add room and user statistics documentation.
reivilibre Aug 20, 2019
981c6cf
Sanitise accepted fields in `_update_stats_delta_txn`
reivilibre Aug 20, 2019
977310e
Clarify `_update_stats_delta_txn`
reivilibre Aug 20, 2019
eafa8d3
Unify name of 'stats regenerator' in schema comments.
reivilibre Aug 20, 2019
18a4c03
Remove needless defaults.
reivilibre Aug 20, 2019
7b657f1
Simplify table structure
reivilibre Aug 22, 2019
e8fc180
Fix up SQL schema delta
reivilibre Aug 22, 2019
79252d1
Fix up historical stats support.
reivilibre Aug 22, 2019
c3d2bf2
Allow schema deltas to be engine-specific
reivilibre Aug 27, 2019
1ecd1a6
Use engine-specific delta SQL files rather than delta written in Python.
reivilibre Aug 27, 2019
baeaf00
Merge branch 'develop' into rei/rss_target
reivilibre Aug 27, 2019
5043ef8
Merge branch 'rei/rss_target' into rei/rss_inc2
reivilibre Aug 27, 2019
4b7bf2e
Apply suggestions from code review
reivilibre Aug 27, 2019
81c5289
Clarify `_update_stats_delta_txn` by adding code comments and kwargs.
reivilibre Aug 27, 2019
544ba2c
Apply minor suggestions from review
reivilibre Aug 27, 2019
a6c1020
Lock tables in upsert fall-backs.
reivilibre Aug 27, 2019
736ac58
Code formatting (Black)
reivilibre Aug 27, 2019
09cbc3a
Switch to milliseconds in room/user stats for consistency.
reivilibre Aug 27, 2019
c775f31
Don't include the room & user stats docs in this PR.
reivilibre Aug 27, 2019
bc754cd
Merge branch 'rei/rss_inc2' into rei/rss_inc3
reivilibre Aug 27, 2019
3b09a37
Adapt to stats now working in milliseconds
reivilibre Aug 27, 2019
99c88ac
No-op if no membership change and thus simplify verbose dict updates.
reivilibre Aug 27, 2019
dd8e602
For user stats, handle other membership transitions properly.
reivilibre Aug 27, 2019
491eaf0
Remove obsolete `OldCollectionRequired` as old collection is obsolete.
reivilibre Aug 27, 2019
11c4e50
Rename `room_state` table to `room_stats_state`
reivilibre Aug 27, 2019
62b1250
Update `_purge_room_txn` to take account of separated stats tables
reivilibre Aug 27, 2019
07c267c
For user stats, handle other membership transitions properly.
reivilibre Aug 27, 2019
44d3c2e
Invalidate `get_earliest_token_for_stats` cache as required.
reivilibre Aug 27, 2019
10c1a23
Fix logic error.
reivilibre Aug 27, 2019
324f21b
Fix logic error.
reivilibre Aug 27, 2019
064143c
Use `DeferredLock` instead of `threading.Lock`
reivilibre Aug 27, 2019
1af7866
Clean up code with improved naming and hoist around functions.
reivilibre Aug 27, 2019
b9f1adc
Update synapse/storage/stats.py
reivilibre Aug 28, 2019
a344ad3
Code formatting (Black)
reivilibre Aug 28, 2019
cc66cf1
Merge pull request #5889 from matrix-org/rei/rss_inc2
reivilibre Aug 28, 2019
dfb22fe
Merge branch 'rei/rss_target' into rei/rss_inc3
reivilibre Aug 28, 2019
81aa6d5
Address code review comments
reivilibre Aug 28, 2019
3cdce28
Merge pull request #5890 from matrix-org/rei/rss_inc3
reivilibre Aug 28, 2019
bc2c284
Add `total_event_bytes` to room statistics schema.
reivilibre Aug 28, 2019
a13ad21
Add incremental counting for rooms' total events and total event bytes.
reivilibre Aug 28, 2019
d7a692f
Update total_events and total_event_bytes on new events.
reivilibre Aug 28, 2019
b06f294
Track new users in user statistics.
reivilibre Aug 28, 2019
73d552a
Hoist up None check to prevent trying to iterate over NoneType.keys()
reivilibre Aug 28, 2019
3b69bf3
Upsert fixes
reivilibre Aug 28, 2019
4444b9a
Code formatting (Black)
reivilibre Aug 29, 2019
39dbee2
Count total_events and total_event_bytes within the loop.
reivilibre Aug 29, 2019
f7ececb
Merge branch 'develop' into rei/rss_target
reivilibre Aug 29, 2019
7c0224d
Merge branch 'rei/rss_target' into rei/rss_inc6
reivilibre Aug 29, 2019
6048103
Merge branch 'rei/rss_target' into rei/rss_inc5
reivilibre Aug 29, 2019
9dbf42a
Merge pull request #5923 from matrix-org/rei/rss_inc5
reivilibre Aug 30, 2019
4c13f2b
Merge branch 'develop' into rei/rss_target
reivilibre Aug 30, 2019
6c582d7
Merge branch 'rei/rss_target' into rei/rss_inc6
reivilibre Aug 30, 2019
757205d
Convert `chain` to `list` as `chain` is only once iterable.
reivilibre Aug 30, 2019
44b0367
Add stats regenerator
reivilibre Aug 30, 2019
893729a
Code formatting
reivilibre Aug 30, 2019
8c02602
Merge pull request #5924 from matrix-org/rei/rss_inc6
reivilibre Aug 30, 2019
1d6cf15
Merge branch 'rei/rss_inc7' into rei/rss_inc8
reivilibre Aug 30, 2019
440c60e
Some fixes that have become necessary due to changes in other PRs
reivilibre Aug 30, 2019
065042c
Code formatting and typo pointed out by Erik.
reivilibre Aug 30, 2019
7dc387e
Merge branch 'rei/rss_inc7' into rei/rss_inc8
reivilibre Aug 30, 2019
0f2e59f
Fix that became apparent after unit testing
reivilibre Aug 30, 2019
bf6d45f
Merge branch 'rei/rss_inc7' into rei/rss_inc8
reivilibre Aug 30, 2019
b379a11
`users` table's ID field is actually called `name`.
reivilibre Aug 30, 2019
425d445
Merge branch 'rei/rss_inc7' into rei/rss_inc8
reivilibre Aug 30, 2019
4ecc62b
Whoops, took out a line there...
reivilibre Aug 30, 2019
97b2035
Merge branch 'rei/rss_inc7' into rei/rss_inc8
reivilibre Aug 30, 2019
d39c09c
Ambiguous `room_id`
reivilibre Aug 30, 2019
eba432e
Merge branch 'rei/rss_inc7' into rei/rss_inc8
reivilibre Aug 30, 2019
50c321d
Adapt to use renamed `room_state`
reivilibre Aug 30, 2019
98a8928
Merge branch 'rei/rss_inc7' into rei/rss_inc8
reivilibre Aug 30, 2019
b928909
Fix incremental processor when there are no deltas.
reivilibre Aug 30, 2019
ab11c0a
Whoopsies; these things come in order…
reivilibre Aug 30, 2019
7b977bd
Fixes to counting and stats deltas
reivilibre Aug 30, 2019
6f5e543
Various fixes
reivilibre Aug 30, 2019
d49457b
Add stats tests
reivilibre Aug 30, 2019
fc5d118
Add stats docs
reivilibre Aug 30, 2019
21593fe
Linting
reivilibre Aug 30, 2019
fca3a9c
Fix to use milliseconds
reivilibre Aug 30, 2019
ffc30b8
Merge branch 'develop' into rei/rss_target
reivilibre Aug 30, 2019
e893214
Merge pull request #5941 from matrix-org/rei/rss_inc7
reivilibre Aug 30, 2019
84532a4
Merge branch 'rei/rss_target' of github.com:matrix-org/synapse into r…
erikjohnston Sep 2, 2019
02f759e
Renamve get_room_state
erikjohnston Sep 2, 2019
745f2da
Merge pull request #5946 from matrix-org/rei/rss_inc8
erikjohnston Sep 2, 2019
1 change: 1 addition & 0 deletions changelog.d/5879.misc
@@ -0,0 +1 @@
Rework room and user statistics to separate current & historical rows, as well as track stats correctly.
136 changes: 136 additions & 0 deletions docs/room_and_user_statistics.md
@@ -0,0 +1,136 @@
Room and User Statistics
========================

Synapse maintains room and user statistics (as well as a cache of room state)
in various tables.

These can be used for administrative purposes but are also used when generating
the public room directory. If these tables get stale or out of sync (possibly
after database corruption), you may wish to regenerate them.


# Synapse Administrator Documentation

## Various SQL scripts that you may find useful

### Delete stats, including historical stats

```sql
DELETE FROM room_stats_current;
DELETE FROM room_stats_historical;
DELETE FROM user_stats_current;
DELETE FROM user_stats_historical;
```

### Regenerate stats (all subjects)

```sql
BEGIN;
DELETE FROM stats_incremental_position;
INSERT INTO stats_incremental_position (
    state_delta_stream_id,
    total_events_min_stream_ordering,
    total_events_max_stream_ordering,
    is_background_contract
) VALUES (NULL, NULL, NULL, FALSE), (NULL, NULL, NULL, TRUE);
COMMIT;

DELETE FROM room_stats_current;
DELETE FROM user_stats_current;
```

Then follow the steps below for **'Regenerate stats (missing subjects only)'**.

### Regenerate stats (missing subjects only)

```sql
-- Set up staging tables.
-- We depend on current_state_events_membership because this is used
-- in our counting.
INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES
    ('populate_stats_prepare', '{}', 'current_state_events_membership');

-- Run through each room and update stats.
INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES
    ('populate_stats_process_rooms', '{}', 'populate_stats_prepare');

-- Run through each user and update stats.
INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES
    ('populate_stats_process_users', '{}', 'populate_stats_process_rooms');

-- Clean up staging tables.
INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES
    ('populate_stats_cleanup', '{}', 'populate_stats_process_users');
```

then **restart Synapse**.
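
The `depends_on` values above chain the four updates so that each runs only
after the one before it. As a rough illustration of that ordering (not of how
Synapse's background updater is actually implemented), the sketch below loads
the same rows into an in-memory SQLite table standing in for
`background_updates` and resolves the order. The helper `run_order` is
hypothetical, and it assumes that a dependency which is not queued (here
`current_state_events_membership`) has already completed.

```python
import sqlite3


def run_order(conn):
    """Return update names ordered so each follows the update it depends on."""
    rows = conn.execute(
        "SELECT update_name, depends_on FROM background_updates"
    ).fetchall()
    pending = dict(rows)
    done = []
    while pending:
        # An update is runnable once its dependency is no longer pending.
        # Assumption: dependencies that are not queued have already finished.
        ready = sorted(n for n, dep in pending.items() if dep not in pending)
        if not ready:
            raise ValueError("dependency cycle in background_updates")
        for name in ready:
            done.append(name)
            del pending[name]
    return done


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE background_updates "
    "(update_name TEXT, progress_json TEXT, depends_on TEXT)"
)
conn.executemany(
    "INSERT INTO background_updates (update_name, progress_json, depends_on) "
    "VALUES (?, '{}', ?)",
    [
        ("populate_stats_cleanup", "populate_stats_process_users"),
        ("populate_stats_process_users", "populate_stats_process_rooms"),
        ("populate_stats_process_rooms", "populate_stats_prepare"),
        ("populate_stats_prepare", "current_state_events_membership"),
    ],
)
ORDER = run_order(conn)
```

The rows are inserted in reverse on purpose: the resulting order depends only
on `depends_on`, not on insertion order.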


# Synapse Developer Documentation

## High-Level Concepts

### Definitions

* **subject**: Something we are tracking stats about – currently a room or user.
* **current row**: An entry for a subject in the appropriate current statistics
table. Each subject can have only one.
* **historical row**: An entry for a subject in the appropriate historical
statistics table. Each subject can have any number of these.

### Overview

Stats are maintained as time series. There are two kinds of column:

* absolute columns – where the value is correct for the time given by `end_ts`
  in the stats row. (Imagine a line graph for these values.)
    * They can also be thought of as 'gauges' in Prometheus, if you are familiar.
* per-slice columns – where the value corresponds to how many occurrences
  fell within the time slice given by `(end_ts − bucket_size)…end_ts`
  or `start_ts…end_ts`. (Imagine a histogram for these values.)

Currently, only absolute columns are in use.
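
To make the distinction between the two kinds of column concrete, here is a
minimal sketch that derives both from a list of event timestamps. The function
`summarise` and its arguments are hypothetical and exist only for illustration.
It also assumes a slice covers `start_ts < ts <= end_ts`, a boundary convention
that this document does not specify.

```python
def summarise(event_timestamps, bucket_size, end_ts_list):
    """For each end_ts, return (end_ts, absolute_total, per_slice_count)."""
    rows = []
    for end_ts in end_ts_list:
        start_ts = end_ts - bucket_size
        # Absolute: everything that has happened up to end_ts (a gauge).
        absolute = sum(1 for ts in event_timestamps if ts <= end_ts)
        # Per-slice: only what happened inside this slice (a histogram bar).
        per_slice = sum(1 for ts in event_timestamps if start_ts < ts <= end_ts)
        rows.append((end_ts, absolute, per_slice))
    return rows


ROWS = summarise([5, 15, 18, 35], bucket_size=10, end_ts_list=[10, 20, 30, 40])
# ROWS == [(10, 1, 1), (20, 3, 2), (30, 3, 0), (40, 4, 1)]
```

The absolute value never resets between slices, whereas the per-slice value
drops to zero in a slice where nothing happened.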

Stats are maintained in two tables (for each type): current and historical.

Current stats correspond to the present values. Each subject can only have one
entry.

Historical stats correspond to values in the past. Subjects may have multiple
entries.

## Concepts around the management of stats

### current rows

Current rows contain the most up-to-date statistics for a subject.
They only contain absolute columns.

#### incomplete current rows

There are also **incomplete** current rows, which are current rows that do not
contain a full count yet – this is because they are waiting for the regeneration
process to give them an initial count. Incomplete current rows DO NOT contain
correct and up-to-date values. As such, *incomplete rows are not old-collected*.
Instead, old incomplete rows will be extended so they are no longer old.

### historical rows

Historical rows can always be considered to be valid for the time slice and
end time specified. (This, of course, assumes a lack of defects in the code
to track the statistics, and assumes integrity of the database).

Even so, there are two considerations that we may need to bear in mind:

* historical rows will not exist for every time slice – they will be omitted
  if there were no changes. In this case, the following assumptions can be
  made to interpolate/recreate missing rows:
    - absolute fields have the same values as in the preceding row
    - per-slice fields are zero (`0`)
* historical rows will not be retained forever – rows older than a configurable
  time will be purged.
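
The interpolation rule for omitted slices can be sketched as follows. This is
only an illustration under stated assumptions: the function
`fill_missing_slices` and the field names `members` and `events_in_slice` are
hypothetical, rows are plain dicts sorted by `end_ts`, and every `end_ts` is
aligned to a multiple of `bucket_size`.

```python
def fill_missing_slices(rows, bucket_size, absolute_fields, per_slice_fields):
    """Recreate omitted historical rows between the rows that do exist."""
    if not rows:
        return []
    filled = [dict(rows[0])]
    for row in rows[1:]:
        prev = filled[-1]
        next_ts = prev["end_ts"] + bucket_size
        while next_ts < row["end_ts"]:
            gap = {"end_ts": next_ts}
            for field in absolute_fields:
                # Absolute fields carry over from the preceding row.
                gap[field] = prev[field]
            for field in per_slice_fields:
                # Nothing changed in an omitted slice, so per-slice is zero.
                gap[field] = 0
            filled.append(gap)
            prev = gap
            next_ts += bucket_size
        filled.append(dict(row))
    return filled


FILLED = fill_missing_slices(
    [
        {"end_ts": 10, "members": 3, "events_in_slice": 2},
        {"end_ts": 40, "members": 5, "events_in_slice": 4},
    ],
    bucket_size=10,
    absolute_fields=["members"],
    per_slice_fields=["events_in_slice"],
)
```

Here the slices ending at 20 and 30 are recreated with `members` equal to 3
and `events_in_slice` equal to 0.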

#### purge

The purging of historical rows is not yet implemented.

13 changes: 5 additions & 8 deletions synapse/config/stats.py
@@ -27,19 +27,16 @@ class StatsConfig(Config):

     def read_config(self, config, **kwargs):
         self.stats_enabled = True
-        self.stats_bucket_size = 86400
+        self.stats_bucket_size = 86400 * 1000
         self.stats_retention = sys.maxsize
         stats_config = config.get("stats", None)
         if stats_config:
             self.stats_enabled = stats_config.get("enabled", self.stats_enabled)
-            self.stats_bucket_size = (
-                self.parse_duration(stats_config.get("bucket_size", "1d")) / 1000
+            self.stats_bucket_size = self.parse_duration(
+                stats_config.get("bucket_size", "1d")
             )
-            self.stats_retention = (
-                self.parse_duration(
-                    stats_config.get("retention", "%ds" % (sys.maxsize,))
-                )
-                / 1000
+            self.stats_retention = self.parse_duration(
+                stats_config.get("retention", "%ds" % (sys.maxsize,))
             )

     def generate_config_section(self, config_dir_path, server_name, **kwargs):
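
The change above drops the `/ 1000` so that `stats_bucket_size` and
`stats_retention` are held in milliseconds, matching the new `86400 * 1000`
default. The sketch below shows the intended arithmetic with a simplified,
hypothetical stand-in named `parse_duration_ms`. It is not Synapse's
`parse_duration`, and the set of suffixes it accepts is an assumption made for
the example.

```python
def parse_duration_ms(value):
    """Parse a duration such as "1d" or "30s" into milliseconds.

    Hypothetical stand-in for illustration; bare integers are taken to be
    milliseconds already.
    """
    if isinstance(value, int):
        return value
    units = {"s": 1000, "m": 60 * 1000, "h": 3600 * 1000, "d": 86400 * 1000}
    suffix = value[-1]
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


DEFAULT_BUCKET_MS = parse_duration_ms("1d")
# The parsed default agrees with the hard-coded default of 86400 * 1000.
```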