Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inheritance support #8

Closed
kmatt opened this issue Apr 8, 2014 · 8 comments
Closed

Inheritance support #8

kmatt opened this issue Apr 8, 2014 · 8 comments

Comments

@kmatt
Copy link

kmatt commented Apr 8, 2014

For time series data, one approach in Postgres to manage data warehouse space is to create a set of inherited tables constrained by a date/time interval, and drop or move old partitions when no longer needed in the active working set.

This would also be a way to manage data size in cstore_fdw given its append-only approach.

Is inheritance on the road map?

@kmatt kmatt changed the title Inheritance support (?) Inheritance support Apr 8, 2014
@jasonmp85
Copy link
Contributor

Inheritance support for foreign tables has been under serious discussion for some time but is not yet supported by PostgreSQL. The most recent CommitFest even had a patch to add support but it was not committed.

When supported by PostgreSQL, it's possible cstore_fdw could be modified to support the new functionality, but we're unlikely to put together our own inheritance solution before that point.

The main purpose of inheritance in partitioning is to enable PostgreSQL to use constraint exclusion when planning queries. Since foreign tables also do not support CHECK constraints that's not of any use here, so there's really no reason you can't do your own partitioning manually using a view defined with UNION ALL.

@kmatt
Copy link
Author

kmatt commented Apr 8, 2014

Is there any other federation approach that can be used with FDWs, cstore_fdw specifically, to manage disk usage? The lack of DELETE support is at odds with the typical approach of avoiding partitioning with column stores.

I assume a UNION ALL view could be used, but would that introduce unwanted scans and impact SELECT performance?

@pykello
Copy link
Contributor

pykello commented Apr 9, 2014

Another option is using CitusDB. In CitusDB you can create a distributed cstore_fdw table, and once in a while use \stage command to add more shards to the table.

CitusDB will store the min/max values for each shard in its metadata catalogs, so for a query the cstore_fdw shards that aren't related are never scanned.

Once you don't need a shard anymore, you can drop it and clean up it's resources.

@kmatt
Copy link
Author

kmatt commented Apr 9, 2014

Noting from jasonmp85 that inherited tables are not currently by the Postgres FDW mechanism, closing this issue, and looking at CitusDB as a potential solution.

@kmatt kmatt closed this as completed Apr 9, 2014
@ozgune
Copy link

ozgune commented Apr 11, 2014

We'd love for you to give CitusDB a try and see if it fits your needs.

For completeness, cstore_fdw has built-in skip indexes. If your data inherently has a time dimension to it, and if your queries also filter on time, cstore_fdw will automatically skip over unrelated "blocks."

The number of rows skipped (in each comparison) is defined by block_row_count. This setting defaults to 10,000 and increasing it may help with your use-case.

@kmatt
Copy link
Author

kmatt commented Apr 11, 2014

CitusDB is on my short list, and initial performance tests with cstore_fdw are promising (especially on ZFS/lz4).

In this issue however, diskspace management is the interest. cstore_fdw without DELETE support means my time-series data grows rapidly (500GB/day) without a "convenient" way to trim older data. COPY only support (no INSERT/UPDATE) is not a significant concern, as append-only works for time-series metrics (provided all buckets are already aggregated), but DELETE would make things much easier.

I will be looking into the suggestion @pykello made about about CitusDB shards.

@kmatt
Copy link
Author

kmatt commented Apr 11, 2014

@pykello "In CitusDB you can create a distributed cstore_fdw table, and once in a while use \stage command to add more shards to the table."

Perhaps a FAQ or post on how to do this for CitusDB newbies (like me)? Specifically, how are shards managed and dropped?

Creation of shards is touched on in the examples with \stage, and http://www.citusdata.com/docs/before-production states "If disk space becomes an issue, users need to manually clean up the old shards in the database", but I cannot find documentation as to the process.

@ozgune
Copy link

ozgune commented May 5, 2014

For completeness, it looks like inheritance support for foreign tables has been part of 9.4's commit fest.

https://commitfest.postgresql.org/action/patch_view?id=1386

If this change goes in, cstore_fdw will automatically pick up inheritance support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants