Temperature-aware data partitioning #348

Merged
merged 178 commits into from
Oct 18, 2021

@hennlo hennlo commented Jul 22, 2021

This PR extends horizontal partitioning with a new partition function, temperature-aware partitioning, and changes the underlying representation of partitions in general.
All changes are accompanied by several new test cases to verify correctness.

Partition Representation

Summary

This internal change to the structural representation is a prerequisite for temperature-aware partitioning. It also removes two existing constraints of horizontal partitioning: worst-case routing and enforced full placements as a fallback. Each partition of a table is now represented as an individual physical table linked to the original table.
For simplicity, each "unpartitioned" table now contains exactly one partition.

Changes

  • Partition concept

    • renamed Partitions to PartitionGroups, which serve as logical groupings that the user can place as desired
    • introduced new Partitions, each of which belongs to exactly one PartitionGroup
    • each PartitionGroup, in turn, belongs to exactly one table
  • Partitions are now created as independent physical tables on the underlying store and logically mapped to the table

  • Statements can now logically involve several physical partitions, which makes it easier to combine vertical and horizontal partitioning

  • With the introduction of PhysicalTables for each partition, every table is now considered partitioned
    and has at least one PartitionGroup, Partition and PartitionPlacement. This unifies code handling and eases current limitations.

  • The routing has been adjusted to work with PartitionPlacements

    • Batch Inserts are now possible
    • SELECT statements identify all accessed partitions and combine vertical and horizontal partitioning by gathering every necessary ColumnPlacement for each requested partition
    • DataMigrator is now able to copy from and to an arbitrary number of sub partitions of a table.
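
The hierarchy described in this list can be pictured with a small in-memory model. This is only a sketch under assumptions, not the catalog code of this PR: `PartitionModelSketch`, its nested classes and the methods `addGroup`, `addPartition` and `unpartitioned` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; not the actual Polypheny catalog classes.
final class PartitionModelSketch {

    static final class Partition {
        final long id;
        final long groupId; // every partition belongs to exactly one group

        Partition(long id, long groupId) {
            this.id = id;
            this.groupId = groupId;
        }
    }

    static final class PartitionGroup {
        final long id;
        final String tableName; // every group belongs to exactly one table
        final List<Partition> partitions = new ArrayList<>();

        PartitionGroup(long id, String tableName) {
            this.id = id;
            this.tableName = tableName;
        }

        Partition addPartition(long partitionId) {
            Partition partition = new Partition(partitionId, id);
            partitions.add(partition);
            return partition;
        }
    }

    static final class Table {
        final String name;
        final List<PartitionGroup> groups = new ArrayList<>();

        Table(String name) {
            this.name = name;
        }

        PartitionGroup addGroup(long groupId) {
            PartitionGroup group = new PartitionGroup(groupId, name);
            groups.add(group);
            return group;
        }

        // An "unpartitioned" table still gets one group holding one partition.
        static Table unpartitioned(String name) {
            Table table = new Table(name);
            table.addGroup(0).addPartition(0);
            return table;
        }

        int partitionCount() {
            int count = 0;
            for (PartitionGroup group : groups) {
                count += group.partitions.size();
            }
            return count;
        }
    }

    private PartitionModelSketch() {
    }
}
```

Because every table goes through the same group and partition structure, code that iterates over partitions does not need a special case for tables the user never partitioned.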

New Features

  • New Catalog Item: CatalogPartitionGroup
  • New Catalog Item: CatalogPartition
  • New Catalog Item: CatalogPartitionPlacement to uniquely identify each partition per adapter
    • this entity now holds the actual table name
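
A placement that is unique per adapter and partition could be modelled roughly as follows. This is a hedged sketch, not the `CatalogPartitionPlacement` class itself: `PlacementSketch`, `placeTable` and in particular the physical naming scheme `tab<tableId>_part<partitionId>` are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-adapter partition placements.
final class PlacementSketch {

    // One entry per (adapter, partition); it carries the physical table name.
    static final class PartitionPlacement {
        final int adapterId;
        final long partitionId;
        final String physicalTableName;

        PartitionPlacement(int adapterId, long partitionId, String physicalTableName) {
            this.adapterId = adapterId;
            this.partitionId = partitionId;
            this.physicalTableName = physicalTableName;
        }
    }

    private final Map<String, PartitionPlacement> placements = new LinkedHashMap<>();

    private static String key(int adapterId, long partitionId) {
        return adapterId + ":" + partitionId;
    }

    // Places every given partition of a table on one adapter.
    void placeTable(int adapterId, long tableId, List<Long> partitionIds) {
        for (long partitionId : partitionIds) {
            // The naming scheme is an assumption made for this sketch.
            String physicalName = "tab" + tableId + "_part" + partitionId;
            placements.put(key(adapterId, partitionId),
                    new PartitionPlacement(adapterId, partitionId, physicalName));
        }
    }

    // Returns the placement, or null if the partition is not on that adapter.
    PartitionPlacement get(int adapterId, long partitionId) {
        return placements.get(key(adapterId, partitionId));
    }

    int size() {
        return placements.size();
    }
}
```

In this model, placing a table with three partitions on one adapter yields three placements, and placing it on a second adapter yields three more.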

Tests

  • Checks if the correct number of CatalogPartitionPlacements is created

    • For one table
    • For an additional store
    • If placements are correctly altered when partitions are modified
    • If merging partitions is handled correctly
  • Checks if physical tables are correctly created per partition

  • Checks if Batch Inserts are correctly executed and distributed to the right partitions

Temperature-Aware Partitioning

Summary

Temperature-aware partitioning is a new functionality, provided as its own partition manager, that is built on top of the existing partition managers HASH, LIST and RANGE and uses them to identify and retrieve the partitions that have been accessed. Instead of a 1:1 binding between PartitionGroup and Partition, TEMPERATURE has exactly two predefined groups: HOT and COLD.
Additionally, TEMPERATURE creates a number of internal partitions which serve as data fragments, so that access frequencies can be analyzed based on READ, WRITE or TOTAL accesses to these internal partitions.
After a centrally predefined interval (configurable via ConfigurationManager), these access frequencies are analyzed and partitions are moved between PartitionGroups and consequently between the stores on which these groups have been placed. This triggers the creation of tables for the new partitions on a store, including a partition-wise data copy, and the removal of the swapped-out partitions.
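
The two-level lookup described here (an internal partition function picks the partition, a separate mapping says whether that partition is currently HOT or COLD) can be sketched as below. All names are hypothetical, and both the use of `hashCode` with `Math.floorMod` and the rule that the first `initiallyHot` partitions start as HOT are assumptions for illustration, not the behaviour of `TemperatureAwarePartitionManager`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: hash-based internal partitions grouped into HOT and COLD.
final class TemperatureRoutingSketch {

    enum Temperature { HOT, COLD }

    private final int internalPartitions;
    private final Map<Integer, Temperature> groupOf = new HashMap<>();

    TemperatureRoutingSketch(int internalPartitions, int initiallyHot) {
        if (internalPartitions <= 0 || initiallyHot < 0 || initiallyHot > internalPartitions) {
            throw new IllegalArgumentException("invalid partition counts");
        }
        this.internalPartitions = internalPartitions;
        // Assumption: the first initiallyHot partitions start in the HOT group.
        for (int p = 0; p < internalPartitions; p++) {
            groupOf.put(p, p < initiallyHot ? Temperature.HOT : Temperature.COLD);
        }
    }

    // Internal partition function: plain hash partitioning on the key.
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), internalPartitions);
    }

    // The group is looked up separately, so it can change without rehashing.
    Temperature temperatureOf(Object key) {
        return groupOf.get(partitionOf(key));
    }

    // Logical reassignment only; copying data between stores is out of scope.
    void reassign(int partition, Temperature target) {
        if (!groupOf.containsKey(partition)) {
            throw new IllegalArgumentException("unknown partition " + partition);
        }
        groupOf.put(partition, target);
    }
}
```

Keeping the group membership separate from the partition function means that a key always maps to the same internal partition, while that partition can move between HOT and COLD over time.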

New Features

  • New class PartitionProperty to centrally attach essential partition information to each table.
  • New class PartitionInformation that bundles parsed information in a single object, so that the constructors of the classes using it do not have to change for every new extension
  • A new partition manager TemperatureAwarePartitionManager
    • which uses the capabilities of the built-in PartitionManagers: HASH, RANGE, LIST
    • Currently only HASH is supported (a restriction in parsing; the functionality is already implemented)
  • A Frequency Map based on Workload Monitoring which tracks WRITE | READ | TOTAL accesses to each underlying partition
    • consequently moves newly hot partitions from cold to hot
    • and newly cold partitions from hot to cold, including a copy of the data when partitions are swapped
    • then internally reassigns the partitions to their new PartitionGroups
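
The frequency-based swap in the list above can be illustrated with a small counter structure. This is a minimal sketch under assumptions: the class and method names are hypothetical, and selecting the `hotSlots` most frequently accessed partitions is one plausible policy, not necessarily the one implemented in this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a per-partition access frequency map.
final class FrequencyMapSketch {

    enum AccessType { READ, WRITE }

    enum CostModel { READ, WRITE, TOTAL }

    // partition id -> [reads, writes]
    private final Map<Long, long[]> counts = new HashMap<>();

    void record(long partitionId, AccessType type) {
        long[] c = counts.computeIfAbsent(partitionId, id -> new long[2]);
        c[type == AccessType.READ ? 0 : 1]++;
    }

    long frequency(long partitionId, CostModel model) {
        long[] c = counts.getOrDefault(partitionId, new long[2]);
        switch (model) {
            case READ:
                return c[0];
            case WRITE:
                return c[1];
            default:
                return c[0] + c[1];
        }
    }

    // Assumed policy: the hotSlots (>= 0) most frequently accessed partitions are hot.
    Set<Long> selectHot(List<Long> allPartitions, int hotSlots, CostModel model) {
        List<Long> sorted = new ArrayList<>(allPartitions);
        sorted.sort(Comparator.comparingLong((Long id) -> frequency(id, model)).reversed());
        return new HashSet<>(sorted.subList(0, Math.min(hotSlots, sorted.size())));
    }

    // Partitions that are newly hot and would be copied from cold to hot.
    static Set<Long> toPromote(Set<Long> currentlyHot, Set<Long> newHot) {
        Set<Long> result = new HashSet<>(newHot);
        result.removeAll(currentlyHot);
        return result;
    }

    // Partitions that are no longer hot and would be copied from hot to cold.
    static Set<Long> toDemote(Set<Long> currentlyHot, Set<Long> newHot) {
        Set<Long> result = new HashSet<>(currentlyHot);
        result.removeAll(newHot);
        return result;
    }
}
```

Computing the promote and demote sets as differences keeps partitions that stay in their group untouched, so only partitions whose temperature actually changed need a data copy.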

Tests

  • If a temperature-partitioned table is correctly created
  • If the custom PartitionProperty is correctly created and filled with all necessary information
  • If the correct internal partition function is used
  • If all internal partitions are correctly created and distributed between HOT and COLD storage
  • If Batch Insert is correctly executed
  • If the most frequently accessed partitions are indeed moved from cold to hot after some statements

Additional Improvements

hennlo and others added 30 commits March 5, 2021 06:19
- Initialize Monitoring in Polypheny startup
- Add event monitors in query processing
@vogti vogti left a comment

Thx, @hennlo, for this PR!

@vogti vogti merged commit 3ee4dcf into master Oct 18, 2021
@vogti vogti deleted the temp-aware_partitioning branch October 18, 2021 14:11
Labels
S-in-progress Still working on this pull request
Successfully merging this pull request may close these issues.

PostgreSQL-Adapter (remote): Problems when there is no connection to DB
2 participants