
Using nanosecond precision instead of sequence_tag #87

Open
candlerb opened this issue Jun 2, 2018 · 1 comment

candlerb commented Jun 2, 2018

This is just a suggestion for discussion/consideration.

Currently the plugin increments sequence_tag when multiple entries with identical timestamps occur, and resets it to zero when an event with a different timestamp is seen. There are two problems with this.

  1. You could end up generating a lot of time series. Suppose, for example, that in one particular second you have an unusual burst of 1000 messages. You'll end up creating 1000 time series, and in the future all queries will have to span all 1000 of those series, even though most of the time they will be empty.

  2. If the records are being written with timestamps that are not monotonically increasing - e.g. there is some "backfill" going on - then there is a high risk of overwriting previous data, because the sequence tag is reset to zero every time the source timestamp changes. This is mainly a problem when the source timestamp has only 1-second resolution.

I would like to suggest another approach, which is to make use of the nanosecond timestamp precision of influxdb.

There are several approaches to this, but I propose the following one:

  • Send all timestamps to influxdb in nanoseconds - which is the influxdb default anyway [1]
  • Keep a counter which starts at a random value between 0 and 999999, and increments for every message. Never reset to zero, except to wrap around from 999999 to 0.
  • Add this value to the nanosecond timestamp sent to influxdb

This means that the stored timestamps are at worst in error by one millisecond. The chances of conflict are extremely low. If your input records have 1-second resolution then either you would have to have one million events in a single second, or you would have to be backfilling to a previous second and be unlucky enough to hit the same range of timestamps.

Even if your input records have millisecond resolution, as long as they arrive with monotonically increasing timestamps there should not be a problem, although some reordering is possible for records in adjacent milliseconds when the counter wraps. Maybe when the time precision is millisecond or better, the counter should only run between 0 and 999, so the error is no worse than one microsecond. (This upper limit can be made configurable anyway)
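
A minimal sketch of this counter-based scheme, in Python purely for illustration (the class and method names here are invented, not taken from the plugin):

```python
import random

class WrappingCounterOffset:
    """Illustrative only: add a wrapping per-message counter to each
    nanosecond timestamp before it is written to influxdb."""

    def __init__(self, limit=1_000_000):
        # limit could be made configurable, e.g. 1_000 when the input
        # already has millisecond resolution (error then <= 1 microsecond)
        self.limit = limit
        self.counter = random.randrange(limit)  # random start in 0..limit-1

    def adjust(self, event_time_ns):
        ts = event_time_ns + self.counter
        # never reset to zero; just wrap around from limit-1 back to 0
        self.counter = (self.counter + 1) % self.limit
        return ts
```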

I did consider some other options - e.g. generating a random offset between 0 and 999999 for each record, or using a hash of the record. The former has a higher probability of collision (a birthday-problem effect: around a 50% chance of at least one collision when 1000 records are stored in a single second). The latter would not record repeated records at all, which is undesirable for logs where you may actually want to count multiple instances of the same event.


Footnotes

  1. If time_precision rounding is required then do it in the plugin and convert back to nanoseconds; but I don't really see why anyone would want to reduce the resolution of their existing timestamps.


candlerb commented Jun 4, 2018

There were some limitations to that algorithm. I think this one is much better, inspired by the current sequence_tag logic.

  • Initialise: set era to a random value between 0 and 999, and seq to 0
  • Compare the timestamp of the current event (t1) with the previous event (t0), where t0 and t1 are in nanoseconds
    • If t1 < t0: reinitialise (i.e. set era to a new random value between 0 and 999, and seq to 0)
    • If t1 = t0: increment seq by 1 (unbounded, i.e. not modulo arithmetic)
    • If t1 > t0: decrement seq by min(seq, t1-t0-1). That is, we'd like to reset it to zero, but not by so much that we would reuse offsets already emitted for earlier timestamps
  • Emit event with timestamp t1 + era + seq

Let's say that era starts at 123, and we get a series of events with timestamps 00:00:00, 00:00:03, 00:00:08 (three events), 00:00:10, 00:00:13. These would be recorded as:

  • 00:00:00.000000123
  • 00:00:03.000000123
  • 00:00:08.000000123
  • 00:00:08.000000124
  • 00:00:08.000000125
  • 00:00:10.000000123
  • 00:00:13.000000123
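
Here is a minimal Python sketch of the era/seq logic above (illustrative only; none of these names come from the plugin):

```python
import random

class EraSeqOffset:
    """Illustrative sketch of the proposed era/seq scheme."""

    def __init__(self, era_range=1000):
        self.era_range = era_range
        self.prev = None
        self._reinit()

    def _reinit(self):
        # new random era in 0..era_range-1, and seq back to 0
        self.era = random.randrange(self.era_range)
        self.seq = 0

    def adjust(self, t1):
        """t1 is the event timestamp in nanoseconds; returns t1 + era + seq."""
        t0 = self.prev
        if t0 is not None:
            if t1 < t0:
                self._reinit()   # time went backwards: choose a fresh era
            elif t1 == t0:
                self.seq += 1    # same timestamp as before: bump seq (unbounded)
            else:
                # pull seq back towards zero, but never so far that an offset
                # already used for an earlier timestamp could be reused
                self.seq -= min(self.seq, t1 - t0 - 1)
        self.prev = t1
        return t1 + self.era + self.seq
```

Forcing era to 123 (e.g. o = EraSeqOffset(); o.era = 123) and feeding in the second-resolution events above reproduces the +123/+124/+125 offsets listed, and the same code produces the nanosecond-resolution example further down.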

Should time ever go backwards (i.e. backfilling takes place) then a new random era will be chosen; the probability of collision is low. The larger the era range you choose, the lower the collision probability but the higher the timestamp error. Even with era values up to 10^6 the error is only up to one millisecond.

This algorithm also works when the source events have nanosecond resolution, and it allows an unlimited number of events with the same timestamp. Example:

  • 00:00:00.100005042 +123 => 00:00:00.100005165
  • 00:00:00.100005744 +123 => 00:00:00.100005867
  • 00:00:00.100005931 +123 => 00:00:00.100006054
  • 00:00:00.100005931 +124 => 00:00:00.100006055
  • 00:00:00.100005931 +125 => 00:00:00.100006056
  • 00:00:00.100005931 +126 => 00:00:00.100006057
  • 00:00:00.100005933 +125 => 00:00:00.100006058
  • 00:00:00.100006007 +123 => 00:00:00.100006130

Finally: the offset value (123 to 126 above) could be emitted as a field, not a tag. This would allow the true original timestamp to be calculated, without creating a new time series for each distinct value.
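
For example, a stored point in influxdb line protocol might then look something like this (the measurement, tag and field names are made up for illustration; offset is era + seq, so the original timestamp can be recovered as time - offset):

```
syslog,host=web1 message="connection refused",offset=124i 1527897600000000124
```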
