[DNM] rowexec, rowcontainer: add an invertedJoiner, for
joining two tables where one has an inverted index

There are currently no tests for this code. There are also some
questions, marked as bare todos ("TODO:") in the code, relating to
descriptors and to using the encoded inverted column to construct
spans to retrieve from the index. I'd like a sanity check on the
approach, and answers to those questions, before proceeding to
add tests.

- InvertedJoinerSpec is the spec for the invertedJoiner and
  consists of two expressions:
  - an expression involving the inverted column and the
    corresponding column in the input that will be used for lookup.
    For geospatial, these would be either both geometry columns or
    both geography columns.
  - an ON expression that can involve the other columns on the
    two sides.
  The join condition is the conjunction of both expressions.
- RowToInvertedIndexExpr is an interface that uses an input row
  to produce a set expression, in reverse Polish notation, over
  spans of the inverted column. For geospatial, this will be
  implemented by the functionality in GeographyIndex and
  GeometryIndex.
- invertedJoiner is given an implementation of RowToInvertedIndexExpr
  for the join it is executing, which abstracts it from the details
  of how each input row is converted into an expression.
  invertedJoiner operates analogously to a "lookup join": it
  consumes a batch of input rows, computes the expression for
  each row, unions the spans needed by this batch of expressions,
  and fetches those spans from the inverted index to evaluate the
  expressions. invertedJoiner will be used for geospatial joins and
  could be used for JSON and array joins (it is not clear to me
  whether we currently use inverted indexes for JSON and arrays).
- batchedInvertedExprEvaluator is used by invertedJoiner to
  evaluate the join on a batch of input rows. It is also intended
  for the non-join case, where one is selecting from a table
  using an expression that involves literals and the inverted column.
- InvertedIndexRowContainer is used by invertedJoiner to deduplicate
  the rows retrieved from the inverted index (minus the inverted
  column). This allows the expression evaluators to work with
  integers as set members. There is only a memory-backed
  implementation for now, but adding a disk-backed implementation
  will be straightforward.
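As a rough illustration of the InvertedJoinerSpec bullet, the sketch below models the two expressions as Go predicates and evaluates the join as their conjunction. `row`, `joinSpec`, and `matches` are hypothetical names for this example only: the real spec carries serialized SQL expressions, and string equality here merely stands in for a real geospatial predicate.

```go
package main

import "fmt"

// row is a hypothetical stand-in for a decoded row: column name -> value.
type row map[string]string

// joinSpec is a hypothetical sketch of a spec holding the two expressions.
// Here they are plain Go predicates rather than serialized SQL expressions.
type joinSpec struct {
	// invertedExpr relates the inverted column to the input's lookup column.
	invertedExpr func(input, indexed row) bool
	// onExpr involves the other columns; nil means "no ON condition".
	onExpr func(input, indexed row) bool
}

// matches reports whether a pair of rows satisfies the join, i.e. the
// conjunction of both expressions.
func (s joinSpec) matches(input, indexed row) bool {
	if !s.invertedExpr(input, indexed) {
		return false
	}
	return s.onExpr == nil || s.onExpr(input, indexed)
}

func main() {
	spec := joinSpec{
		// Equality is only a stand-in for a real geospatial predicate.
		invertedExpr: func(in, ix row) bool { return in["geom"] == ix["geom"] },
		onExpr:       func(in, ix row) bool { return in["region"] == ix["region"] },
	}
	left := row{"geom": "g1", "region": "east"}
	fmt.Println(spec.matches(left, row{"geom": "g1", "region": "east"})) // true
	fmt.Println(spec.matches(left, row{"geom": "g1", "region": "west"})) // false
}
```

Evaluating the inverted expression first mirrors the lookup-join shape: only candidates fetched through the inverted index are ever checked against the ON expression.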
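The dedup role of InvertedIndexRowContainer can be sketched as a map from an encoded row to a dense integer ID. This assumes rows have already been stripped of the inverted column and encoded as byte strings. `rowDeduper` and its methods are hypothetical and memory-only, matching the memory-backed scope described above; they are not the container's actual interface.

```go
package main

import "fmt"

// rowDeduper is a hypothetical, memory-only sketch of a row container that
// assigns each distinct row (already stripped of the inverted column and
// encoded to bytes) a small integer ID.
type rowDeduper struct {
	ids  map[string]int // encoded row -> ID
	rows []string       // ID -> encoded row
}

func newRowDeduper() *rowDeduper {
	return &rowDeduper{ids: make(map[string]int)}
}

// add returns the ID for encodedRow, assigning the next ID if it is new.
func (d *rowDeduper) add(encodedRow string) int {
	if id, ok := d.ids[encodedRow]; ok {
		return id
	}
	id := len(d.rows)
	d.ids[encodedRow] = id
	d.rows = append(d.rows, encodedRow)
	return id
}

// get returns the encoded row for a previously assigned ID.
func (d *rowDeduper) get(id int) string {
	return d.rows[id]
}

func main() {
	// The same primary key can appear under several inverted-index keys
	// (for example, a shape covering two cells).
	entries := []struct{ invertedKey, rest string }{
		{"cell1", "pk=1"},
		{"cell2", "pk=1"},
		{"cell2", "pk=2"},
	}
	d := newRowDeduper()
	for _, e := range entries {
		fmt.Println(e.invertedKey, e.rest, "->", d.add(e.rest))
	}
	// Prints, one per line: cell1 pk=1 -> 0, cell2 pk=1 -> 0, cell2 pk=2 -> 1
}
```

A disk-backed variant would need to replace both the map and the slice with spillable structures, but the add/get contract could stay the same.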

Release note: None
sumeerbhola committed Apr 29, 2020
1 parent 56a836c commit 6eea279
Showing 5 changed files with 1,967 additions and 226 deletions.