Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support either nanosecond timestamp precision or composite primary keys #1597

Open
grenade opened this issue Nov 20, 2024 · 1 comment
Open
Labels
question Further information is requested

Comments

@grenade
Copy link

grenade commented Nov 20, 2024

Describe This Problem

my use case is trade data analytics for signal or strategy indicators. a trade has:

  • timestamp: the time (in nanosecond precision) of the execution of the trade
  • base: the tag/symbol of the commodity or security that was bought or sold
  • quote: the tag/symbol of the currency, commodity or security that the price of base is given in
  • exchange: the tag/symbol of the exchange where the trade was executed
  • id: the (uint64) id used by the exchange to uniquely identify the trade. due to the volume of trades, most exchanges use a composite key of (id, base, quote), so there are always collisions if base, quote, exchange and id are not considered together
  • price: the price (in quote) for 1 unit of base
  • size: the quantity of base in the trade. a positive size indicates a buy. a negative size indicates a sell

i am struggling to document a table definition in horaedb that does not lose trades. since there are many trades which occur within the same millisecond. nanosecond precision appears to be unavailable.

i have attempted to work around the millisecond precision limitation by using a composite primary key (time, exchange, quote, id). each base gets it's own table which allows for multi-exchange series and combining quotes where their underlying values are equivalent (ie: usdt == usdc). however it appears that horaedb accepts the definition but ignores any value in the composite key that is not the timestamp. this results in all but the first trade in a given millisecond being discarded as a duplicate.

Proposal

i would appreciate ideas about how to utilise nanosecond timestamp precision or how to correctly define a composite key that will actually work.

Additional Context

here's the table definition i have tried:

CREATE TABLE IF NOT EXISTS {base} (
    time timestamp NOT NULL,
    id uint64,
    exchange string,
    quote string,
    price double,
    size double,
    TIMESTAMP KEY(time),
    PRIMARY KEY(time, id, exchange, quote)
) ENGINE=Analytic with (enable_ttl='false')

here's an example of source data where base is btc, quote is usd and exchange is coinbase:
https://api.exchange.coinbase.com/products/BTC-USD/trades?limit=10&after=1000

@jiacai2050
Copy link
Contributor

Hi, nanoseconds is not supported in horaedb now.

For your case, I think you could add another column to bypass this limit,

CREATE TABLE IF NOT EXISTS {base} (
    time timestamp NOT NULL,
    time2 uint64 NOT NULL,
    id uint64,
    exchange string,
    quote string,
    price double,
    size double,
    TIMESTAMP KEY(time),
    PRIMARY KEY(time, time2, id, exchange, quote)
) ENGINE=Analytic with (enable_ttl='false')

Here we add another column time2 to primary keys, and it's defined as time(nanoseconds) % 1e6, so rows with same time value won't be overwritten.

@jiacai2050 jiacai2050 added the question Further information is requested label Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants