
Materialized view implementation #5556

Merged
9 commits merged into apache:master on Jun 9, 2018

Conversation


@zhangxinyu1 zhangxinyu1 commented Mar 30, 2018

Target

To optimize queries.

Implementation

There are two extensions: materialized-view-maintenance and materialized-view-selection.

In materialized-view-maintenance, MaterializedViewSupervisor is used to generate or drop derived-datasource segments and to keep the timelines of the base datasource and the derived datasource consistent.

In materialized-view-selection, MaterializedViewQuery is implemented to perform materialized-view selection for topN/groupBy/timeseries queries.

The detailed design and discussion are in issue #5304.

Usage

  1. Load the materialized-view-maintenance and materialized-view-selection extensions. Note: materialized-view-selection can only be loaded when starting a broker.
  2. Submit a MaterializedViewSupervisor spec, e.g.:
{
  "type" : "derivativeDataSource",
  "baseDataSource": "wikiticker",
  "dimensionsSpec": {
    "dimensions" : [
      "isUnpatrolled",
      "metroCode",
      "namespace",
      "page",
      "regionIsoCode",
      "regionName",
      "user"
    ]
  },
  "metricsSpec" : [
    {
      "name" : "count",
      "type" : "count"
    },
    {
      "name" : "added",
      "type" : "longSum",
      "fieldName" : "added"
    }
  ],
  "tuningConfig": {
    "type" : "hadoop"
  }
}
  3. Send a MaterializedViewQuery, e.g. (a submission sketch for both steps follows the example below):
{
    "queryType": "view",
    "query": {
        "queryType": "groupBy",
        "dataSource": "wikiticker",
        "granularity": "all",
        "dimensions": [
            "user"
        ],
        "limitSpec": {
            "type": "default",
            "limit": 1,
            "columns": [
                {
                    "dimension": "added",
                    "direction": "descending",
                    "dimensionOrder": "numeric"
                }
            ]
        },
        "aggregations": [
            {
                "type": "longSum",
                "name": "added",
                "fieldName": "added"
            }
        ],
        "intervals": [
            "2015-09-12/2015-09-13"
        ]
    }
}
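As referenced in step 3 above, a rough submission sketch: host names, ports, and file names are placeholders, while the endpoints are Druid's standard supervisor and query APIs.

curl -X POST -H 'Content-Type: application/json' \
  -d @supervisor-spec.json \
  http://OVERLORD_HOST:8090/druid/indexer/v1/supervisor

curl -X POST -H 'Content-Type: application/json' \
  -d @view-query.json \
  http://BROKER_HOST:8082/druid/v2/?pretty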

@jihoonson
Contributor

@zhangxinyu1 thanks for raising this PR! Would you add a link to the proposal here?

@jihoonson
Contributor

Oh, never mind. It's already here.

@jihoonson
Contributor

I restarted Travis. @zhangxinyu1 would you check the TeamCity inspection failure?

@zhangxinyu1 zhangxinyu1 force-pushed the feature-materialized-view-1.0 branch from 8d328ac to fdf16a9 on March 31, 2018 10:49
@jihoonson
Contributor

@zhangxinyu1 thanks for the fix. I'll start my review. BTW, did you have a chance to test this feature in some real clusters?

@zhangxinyu1
Contributor Author

zhangxinyu1 commented Apr 3, 2018

@jihoonson Thanks!
Yes, we have real clusters running with this feature, but those clusters are on version 0.10.0 and the feature there is implemented based on 0.10.0. However, I have tested some functions of this implementation based on 0.13.0-SNAPSHOT in our test cluster. Do you have any suggestions about testing this feature?

@jihoonson (Contributor) left a comment:

Reviewed up to MaterializedViewMetadataCoordinator.

import java.util.Objects;
import java.util.Set;

@JsonTypeName("view")
Contributor

I believe we will have more types of views in the future. Please use a more specific name like derivativeDataSource.

Contributor

BTW, this annotation is not needed since you added a NamedType here.

Preconditions.checkNotNull(baseDataSource, "baseDataSource cannot be null. This is not a valid DerivativeDataSourceMetadata.");
Preconditions.checkNotNull(dimensions, "dimensions cannot be null. This is not a valid DerivativeDataSourceMetadata.");
Preconditions.checkNotNull(metrics, "metrics cannot be null. This is not a valid DerivativeDataSourceMetadata.");
this.baseDataSource = baseDataSource;
Contributor

nit: Can be simplified to this.baseDataSource = Preconditions.checkNotNull(baseDataSource, "baseDataSource cannot be null. This is not a valid DerivativeDataSourceMetadata.");

}

@Override
public boolean matches(DataSourceMetadata other)
Contributor

Looks like the logic is almost the same as equals(), so it would be better to call equals() here.
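A minimal sketch of the suggestion, assuming equals() already compares baseDataSource, dimensions, and metrics as in the quoted constructor:

@Override
public boolean matches(DataSourceMetadata other)
{
  // delegate to equals() instead of duplicating the field comparisons
  return equals(other);
}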

@Override
public DataSourceMetadata plus(DataSourceMetadata other)
{
// DerivedDataSourceMetadata is not allowed to change
Contributor

Then, this should throw UnsupportedOperationException. If this causes a problem, you might need to add some methods like isMergeable() and isSubtractable() to the DataSourceMetadata interface.
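For illustration, a sketch of the suggested behavior (not necessarily the merged code):

@Override
public DataSourceMetadata plus(DataSourceMetadata other)
{
  // DerivativeDataSourceMetadata is immutable, so merging must fail loudly
  // rather than silently returning unchanged metadata
  throw new UnsupportedOperationException("Derivative dataSource metadata is not allowed to be modified");
}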

@Override
public DataSourceMetadata minus(DataSourceMetadata other)
{
// DerivedDataSourceMetadata is not allowed to change
Contributor

Same here. This should throw UnsupportedOperationException.

new HandleCallback<List<Pair<DataSegment, String>>>()
{
@Override
public List<Pair<DataSegment, String>> withHandle(Handle handle) throws Exception
Contributor

This method doesn't throw Exception.

public Pair<DataSegment, String> map(int index, ResultSet r, StatementContext ctx) throws SQLException
{
try {
return new Pair<DataSegment, String>(
Contributor

nit: unnecessary type arguments.

this.connector = connector;
}

public void insertDataSourceMetadata(String dataSource, DataSourceMetadata metadata)
Contributor

Probably this method should be merged into IndexerSQLMetadataStorageCoordinator.resetDataSourceMetadata(), and that method should check whether an entry already exists in the metastore, inserting a new entry if it doesn't and updating the existing entry otherwise.

Contributor Author

Yes, this method can be merged into IndexerSQLMetadataStorageCoordinator.resetDataSourceMetadata(). However, maybe we can do it in another PR, because we should consider the logic of the code that uses this method.
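For reference, a rough sketch of the merged behavior being suggested; getDataSourceMetadata and insertDataSourceMetadata exist in the coordinator and this PR, while updateDataSourceMetadata is a hypothetical helper:

public boolean resetDataSourceMetadata(String dataSource, DataSourceMetadata metadata) throws IOException
{
  if (getDataSourceMetadata(dataSource) == null) {
    // no existing entry: insert a new row instead of failing the update
    insertDataSourceMetadata(dataSource, metadata);
    return true;
  }
  // existing entry: overwrite it with the new metadata
  return updateDataSourceMetadata(dataSource, metadata); // hypothetical helper
}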

);
}

public Map<DataSegment, String> getSegmentAndCreatedDate(String dataSource, Interval interval)
Contributor

This method should return only used segments. Please add a method like getUsedSegmentsForInterval() which returns List<Pair<DataSegment, String>> to IndexerMetadataStorageCoordinator.


public Map<DataSegment, String> getSegmentAndCreatedDate(String dataSource, Interval interval)
{
List<Pair<DataSegment, String>> maxCreatedDate = connector.retryWithHandle(
Contributor

maxCreatedDate is a less-intuitive name.

@jihoonson
Contributor

Yes, we have real clusters running with this feature, but those clusters are on version 0.10.0 and the feature there is implemented based on 0.10.0. However, I have tested some functions of this implementation based on 0.13.0-SNAPSHOT in our test cluster. Do you have any suggestions about testing this feature?

@zhangxinyu1 that is great! I think it would be enough. I'll test this PR in our cluster as well.

@zhangxinyu1 zhangxinyu1 force-pushed the feature-materialized-view-1.0 branch 2 times, most recently from 6627334 to ee6fec7 on April 9, 2018 07:53
@zhangxinyu1
Contributor Author

@jihoonson I have modified the code according to your comments. Could you please continue the review?

@jihoonson
Contributor

@zhangxinyu1 sure. I'll review tomorrow.

@jihoonson (Contributor) left a comment:

@zhangxinyu1 still reviewing. Reviewed up to DataSourceOptimizer.

@@ -243,6 +243,10 @@
<argument>io.druid.extensions.contrib:druid-time-min-max</argument>
<argument>-c</argument>
<argument>io.druid.extensions.contrib:druid-virtual-columns</argument>
<argument>-c</argument>
<argument>io.druid.extensions.contrib:materialized-view-maintenance</argument>
Contributor

Would you elaborate more on why this feature is split into two extensions? If we need to always load both extensions to use this feature, it would be better to make a single extension.

Contributor Author

I can't agree with you more. However, DataSourceOptimizer needs BrokerServerView to get the timelines of different dataSources for optimizing, and only the broker has this information. So the materialized-view-selection module has to be loaded only in brokers, which is why I split the feature into two extensions. I thought about this for a long time but cannot figure out how to solve the problem. Do you have any suggestions?

Contributor

Do you mean that materialized-view-maintenance should be loaded only in overlords while materialized-view-selection should be loaded only in brokers?

Contributor Author

materialized-view-selection should be loaded only in brokers, but materialized-view-maintenance can be loaded anywhere.

Contributor

Ah, OK. We don't have a nice way to do this currently. I think it's fine to go with it as it is. Would you please add some comments about this, especially that materialized-view-selection should be loaded only in brokers?

Contributor Author

Sure, I'm working on your comments these days. Thanks very much!
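For example, the deployment just described could look like this in runtime.properties; this is a sketch using the standard druid.extensions.loadList property, and the exact lists depend on your deployment:

# broker: materialized-view-selection must be loaded here
druid.extensions.loadList=["materialized-view-selection", "materialized-view-maintenance"]

# overlord: only materialized-view-maintenance is needed
druid.extensions.loadList=["materialized-view-maintenance"]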

public class MaterializedViewSupervisor implements Supervisor
{
private static final EmittingLogger log = new EmittingLogger(MaterializedViewSupervisor.class);
private static final Interval ALL_INTERVAL = Intervals.of("0000-01-01/3000-01-01");
Contributor

Please use Intervals.ETERNITY instead.

Contributor Author

Intervals.ETERNITY doesn't work well when compared to a varchar in the metastore.

Contributor

Would you let me know which error you saw?

Contributor Author

Intervals.ETERNITY = "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z".

When we use it to compare against the start and end of segments to get all segments from the metastore, such as:
select * from druid_segments where start > '-146136543-09-08T08:23:32.096Z' and end < '146140482-04-24T15:36:27.903Z';
an empty set is returned, because no end is less than '146140482-04-24T15:36:27.903Z' in a varchar comparison.

{
private static final EmittingLogger log = new EmittingLogger(MaterializedViewSupervisor.class);
private static final Interval ALL_INTERVAL = Intervals.of("0000-01-01/3000-01-01");
private static final int MAX_TASK_COUNT = 1;
Contributor

Looks like DEFAULT_MAX_TASK_COUNT.

this.tuningConfig = Preconditions.checkNotNull(tuningConfig, "tuningConfig cannot be null. Please provide tuningConfig");

this.dataSourceName = dataSourceName == null ?
StringUtils.format("%s-%s", baseDataSource, DigestUtils.sha1Hex(dimensionsSpec.toString()).substring(0, 8)) :
Contributor

The line indentation is not correct. Please adjust it.

@JacksonInject MaterializedViewTaskConfig config
)
{
this.baseDataSource = Preconditions.checkNotNull(baseDataSource, "baseDataSource cannot be null. Please provide a baseDataSource.");
Contributor

Please break the line like

Preconditions.checkNotNull(
  baseDataSource,
  "baseDataSource cannot be null. Please provide a baseDataSource."
);

Same for the following 3 lines.

}

@VisibleForTesting
Pair<SortedMap<Interval, String>, Map<Interval, List<DataSegment>>> checkSegments()
Contributor

Please add some javadoc.

import java.util.Map;
import java.util.Set;

public class DatasourceOptimizerMonitor extends AbstractMonitor
Contributor

Thanks for adding this!

private static ConcurrentHashMap<String, AtomicLong> hitCount = new ConcurrentHashMap<>();
private static ConcurrentHashMap<String, AtomicLong> costTime = new ConcurrentHashMap<>();
private static ConcurrentHashMap<String, ConcurrentHashMap<Set<String>, AtomicLong>> missFields = new ConcurrentHashMap<>();
private static TimelineServerView serverView = null;
Contributor

This should be a final non-static variable.

Contributor Author

serverView is used in the optimize method, and that method is static.

Contributor

I mean, this should be a final non-static variable because the current arrangement is quite dangerous. As you said, serverView is used in a static method (optimize()) but is initialized in the constructor. As you know, static methods can be called without creating an instance, which means serverView might not be initialized when optimize() is called. This currently works because Guice initializes DataSourceOptimizer when DataSourceOptimizerMonitor is initialized, and this happens to be before optimize() is called. However, it might break in the future if something changes, e.g. if someone decides to make DataSourceOptimizerMonitor configurable and disables it.

Contributor Author

Thanks, you're right. I'll modify it.
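A sketch of the fix under discussion (illustrative, not the merged code): inject the server view as a final instance field and make optimize() an instance method.

public class DataSourceOptimizer
{
  private final TimelineServerView serverView;

  @Inject
  public DataSourceOptimizer(TimelineServerView serverView)
  {
    // Guice initializes this before any instance method can be called
    this.serverView = serverView;
  }

  public List<Query> optimize(Query query)
  {
    // ... consult this.serverView (instead of a static field) to pick derivatives ...
    return Collections.singletonList(query); // placeholder
  }
}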

import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

public class DatasourceOptimizer
Contributor

Please rename to DataSourceOptimizer.

public class DatasourceOptimizer
{
private static final ReadWriteLock lock = new ReentrantReadWriteLock();
private static ConcurrentHashMap<Derivative, AtomicLong> derivativesHitCount = new ConcurrentHashMap<>();
Contributor

These variables represent the metrics of dataSourceOptimizer, which means dataSourceOptimizer needs to keep some state. Why don't we simply make a singleton instance of this?

@zhangxinyu1 (Contributor Author) commented Apr 11, 2018

DataSourceOptimizer is a singleton instance, and I use static because the optimize method is static.

Why don't we simply make a singleton instance of this?

Do you mean I should write another class (e.g. DataSourceOptimizerMetrics) to record this state?

Contributor

Oh, you're right. It's a singleton. Then I wonder why you made the optimize() method static. Usually static methods are useful when a class doesn't have to keep any state (like util classes). But DataSourceOptimizer does keep state (that is, metrics).

@jihoonson (Contributor) left a comment:

@zhangxinyu1 left more comments. It looks like a nice start for supporting this kind of cool feature!

Also please add some documentation. I would love to test this in my cluster!

}

try {
lock.readLock().lock();
Contributor

Also, probably this should be writeLock().

{
private static final ReadWriteLock lock = new ReentrantReadWriteLock();
private static ConcurrentHashMap<Derivative, AtomicLong> derivativesHitCount = new ConcurrentHashMap<>();
private static ConcurrentHashMap<String, AtomicLong> totalCount = new ConcurrentHashMap<>();
Contributor

It looks like these maps are synchronized with lock. If so, they don't have to be ConcurrentHashMaps.

Contributor

Also please leave some comments about what these maps mean.

@zhangxinyu1 (Contributor Author) commented Apr 16, 2018

lock is mainly used to synchronize all stats in the getAndResetStats() method. In getAndResetStats(), we take snapshots of the stats one by one and then clear them all. I use lock to ensure the stats don't change between these steps.
I use ConcurrentHashMap because, in optimize(), each stat is incremented concurrently.

Contributor

I'm not sure I understood correctly, but if my new comments are correct, readLock() and writeLock() should be used in getAndResetStats() and optimize(), respectively. If so, concurrentMap is not needed because only one thread can write at a time in optimize(), and all threads can read without contention in getAndResetStats().

Contributor Author

In my design, many threads are allowed to call optimize() simultaneously, because MaterializedViewQuery needs to be optimized concurrently, so I use readLock in optimize(). This means the stats can be read and updated concurrently by those threads.
However, when a thread calls getAndResetStats() to take a whole snapshot of the stats, the stats are not allowed to change. Therefore, I use writeLock() to block calls to optimize().

Contributor

Ok. Please add some comments about this.
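A sketch of the locking scheme just described (illustrative): many threads update the ConcurrentHashMap-backed stats under the read lock, while getAndResetStats() takes the write lock so its snapshot-and-clear sequence sees no concurrent updates.

// in optimize(): many threads may hold the read lock at once
lock.readLock().lock();
try {
  totalCount.computeIfAbsent(dataSourceName, k -> new AtomicLong(0)).incrementAndGet();
  // ... choose the best derivative datasource for the query ...
}
finally {
  lock.readLock().unlock();
}

// in getAndResetStats(): the write lock excludes all optimize() callers
lock.writeLock().lock();
try {
  ImmutableMap<String, AtomicLong> totalSnapshot = ImmutableMap.copyOf(totalCount);
  totalCount.clear();
  // ... snapshot and clear the other stats maps the same way ...
}
finally {
  lock.writeLock().unlock();
}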

// only TableDataSource can be optimized
if (!(query instanceof TopNQuery || query instanceof TimeseriesQuery || query instanceof GroupByQuery)
|| !(query.getDataSource() instanceof TableDataSource)) {
return Lists.newArrayList(query);
Contributor

nit: can be Collections.singletonList(query).

return Lists.newArrayList(query);
}

//
Contributor

Unnecessary.

import java.util.Objects;
import java.util.Set;

public class Derivative implements Comparable<Derivative>
Contributor

Please rename to more intuitive name. Looks like DerivativeDataSource?

@Override
public int hashCode()
{
return Objects.hash(VIEW) + query.hashCode();
Contributor

Should be Objects.hash(VIEW, query).

import java.util.concurrent.ExecutorService;

/**
* Created by zhangxinyu on 2018/2/5.
Contributor

Please remove this.

/**
* Created by zhangxinyu on 2018/2/5.
*/
public class MaterializedViewQueryRunnerFactory implements QueryRunnerFactory
Contributor

Is this class needed in the current implementation?

Contributor Author

No, it's useless. Should I remove it?

Contributor

Yes please.

String dim = spec.getDimension();
dimensions.add(dim);
}
}
Contributor

Please throw an exception if the query type is unknown.

return ret;
}

private static Set<String> getDimensionsInFilter(DimFilter dimFilter)
Contributor

I think it's better to add a method to DimFilter which returns all required column names.

Contributor Author

Yes, it would be better to add this method, because the current approach will miss cases when there is a new implementation of DimFilter. But do you think I should add this method in this PR?

Contributor

I think it's up to you. If you don't want to make this PR bigger, please raise an issue for this.

ImmutableMap<String, AtomicLong> costTimeSnapshot;
ImmutableMap<String, ConcurrentHashMap<Set<String>, AtomicLong>> missFieldsSnapshot;
try {
lock.writeLock().lock();
Contributor

Probably this should be readLock().


@zhangxinyu1
Contributor Author

zhangxinyu1 commented Apr 28, 2018

@jihoonson

Also please add some documentation. I would love to test this in my cluster!

The rough documentation about how to use this feature is at the front of this PR. Should I add some documentation to the docs?

@jihoonson
Contributor

@zhangxinyu1 yes, you can add docs to the directory under $DRUID/docs/content/development/extensions-contrib like other extensions.

@zhangxinyu1 zhangxinyu1 force-pushed the feature-materialized-view-1.0 branch from 6887cb0 to 4a6a372 on May 3, 2018 09:42
@jihoonson
Contributor

@zhangxinyu1 thanks for the update! I didn't realize that. I'll take another look and do some tests in our cluster.

BTW, a recent change (#5583) merged into master includes a change to the signature of HadoopTuningConfig, which makes this PR fail to merge. Would you update this PR?

@zhangxinyu1 zhangxinyu1 force-pushed the feature-materialized-view-1.0 branch from 4a6a372 to 0ab2bcd on May 8, 2018 14:58
@zhangxinyu1
Contributor Author

@jihoonson Thanks for the reminder. I have updated it.

@zhangxinyu1 zhangxinyu1 force-pushed the feature-materialized-view-1.0 branch 2 times, most recently from b879328 to 1b452d6 on May 10, 2018 03:00
@jihoonson (Contributor) left a comment:

@zhangxinyu1 thanks for the update. I left my last comments. I also tested this PR in my local machine. It works nicely!


# Materialized View

To use this feature, make sure to only load materialized-view-selection on broker and load materialized-view-maintenance on overlord.
Contributor

Please add that this feature currently requires a Hadoop cluster.

{
try {
DataSourceMetadata metadata = metadataStorageCoordinator.getDataSourceMetadata(dataSource);
if (metadata != null
Contributor

Would you check this comment?

if (dataSourceMetadata == null) {
// if oldMetadata is different from spec, tasks and segments will be removed when reset.
DataSourceMetadata oldMetadata = metadataStorageCoordinator.getDataSourceMetadata(dataSource);
if (oldMetadata != null && oldMetadata instanceof DerivativeDataSourceMetadata) {
Contributor

Same here. Null check is unnecessary.
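(In Java, x instanceof T already evaluates to false when x is null, so the check reduces to:)

if (oldMetadata instanceof DerivativeDataSourceMetadata) {
  // ...
}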

* @param interval The interval for which all applicable and used datasources are requested. Start is inclusive, end is exclusive
* @return The DataSegments and the related created_date of segments which include data in the requested interval
*/
List<Pair<DataSegment, String>> getUsedSegmentAndCreatedDateForInterval(String dataSource, Interval interval);
Contributor

I suggest to modify List<DataSegment> getUsedSegmentsForInterval(String dataSource, Interval interval); to return List<Pair<DataSegment, String>> rather than adding a new method.

Contributor Author

I don't know. I just think that when someone calls getUsedSegmentsForInterval, they may not want the created-date information.

Contributor Author

Maybe created_date should be a part of DataSegment. In this way, we only need the method List<DataSegment> getUsedSegmentsForInterval(String dataSource, Interval interval);. What do you think?

Contributor

The only usage of getUsedSegmentsForInterval() is SegmentAllocateAction. It checks any segments are already allocated for the given interval to allocate a new segment id. I think it can just ignore the createdDate part.

Contributor

Maybe created_date should be a part of DataSegment. In this way, we only need the method List<DataSegment> getUsedSegmentsForInterval(String dataSource, Interval interval);. What do you think?

Hmm, that's a good point. It sounds good, but I'm not sure why created_date is not a part of DataSegment itself. @gianm any idea?

Contributor Author

Alright, let me raise an issue for this and merge these two methods in another PR, because it affects about 16 classes.

Contributor

Sounds good. Please go for it.

@zhangxinyu1 zhangxinyu1 force-pushed the feature-materialized-view-1.0 branch from 1b452d6 to 1938ce5 on May 14, 2018 02:55
@dylwylie (Contributor) left a comment:

This looks like a really useful change, thanks!

*/
private boolean hasEnoughLag(Interval target, Interval maxInterval)
{
if ((target.getStartMillis() + minDataLagMs) > maxInterval.getStartMillis()) {
Contributor

Could just return the boolean expression

Contributor Author

Yes, thanks!
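The simplified form would look roughly like this (a sketch; it assumes the original method returns false when the guarded condition holds, i.e. when the target interval is too recent):

private boolean hasEnoughLag(Interval target, Interval maxInterval)
{
  // enough lag iff target starts at least minDataLagMs before maxInterval
  return target.getStartMillis() + minDataLagMs <= maxInterval.getStartMillis();
}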

parser.put("parseSpec", parseSpec);

//generate HadoopTuningConfig
HadoopTuningConfig tuningConfigForTask = new HadoopTuningConfig(
Contributor

Can we do tuningConfigForTask.withVersion instead?

Contributor Author

I'm afraid not: though the withVersion function can set a new version, it cannot set useExplicitVersion = true.

Contributor

Cool thanks!

}

@Override
public Sequence<T> run(QueryPlus<T> queryPlus, Map<String, Object> responseContext)
Contributor

It'd be kinda nice to make the UnionDataSource support QueryDataSources and reuse it to run a list of queries.

Contributor Author

I don't understand. Could you please describe it in more detail? Thanks!

Contributor

Sure - not an important suggestion so please ignore if it seems irrelevant or too much work :)

In order to execute a materialised view query we have to issue multiple queries on different intervals and merge their results. That might be a more generally useful component, where users can union multiple queries rather than just multiple datasources.

|| !(query.getDataSource() instanceof TableDataSource)) {
return Collections.singletonList(query);
}
String datasourceName = ((TableDataSource) query.getDataSource()).getName();
Contributor

If it's easy to do, I think it'd be worth supporting UnionDataSources as well. Would it just be a matter of iterating over a list of datasource names, running the rest of this method, and flattening the resulting list of queries?

@zhangxinyu1 (Contributor Author) commented May 25, 2018

Thanks for your suggestion. The current implementation supports UnionDataSource in this way: in UnionQueryRunner, a UnionDataSource is transformed into several TableDataSources, and these TableDataSources are then optimized in DataSourceOptimizer.java. Is this OK?

Contributor

Ah got you, thanks for the explanation!

@zhangxinyu1 zhangxinyu1 force-pushed the feature-materialized-view-1.0 branch from 1938ce5 to 3623f9b on May 28, 2018 03:11
@zhangxinyu1
Contributor Author

@Dylan1312 Could you please trigger the travis CI building?

@dylwylie
Contributor

Afraid I don't have the appropriate permission; a committer should be able to help you out.

@b-slim
Contributor

b-slim commented May 30, 2018

you can always close and reopen the PR to restart the build ...

@zhangxinyu1 zhangxinyu1 reopened this May 30, 2018
@zhangxinyu1
Contributor Author

@Dylan1312 Thanks!

@zhangxinyu1
Contributor Author

@b-slim It works, thanks!

import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

public class DatasourceOptimizer


Please forgive me for posting this here - I'm not a committer/reviewer, so my feedback does not count, but there is one thing that looks incorrect to me:
Class DatasourceOptimizer states that
"Derived dataSource with smallest average size of segments have highest priority to replace the datasource in user query"
and accordingly the following lines produce this prioritized collection of derivatives:

// get all derivatives for datasource in query. The derivatives set is sorted by average size of per segment granularity.
ImmutableSortedSet<Derivative> derivatives = DerivativesManager.getDerivatives(datasourceName);

However, a few lines below, items from the collection named "derivatives", which is sorted by priority, are selected and put into the following collection, which is a plain HashSet; it is not sorted and, according to the javadoc, does not guarantee insertion order either:
Set<Derivative> derivativesWithRequiredFields = Sets.newHashSet();

To my understanding, the "derivativesWithRequiredFields" should be a list or a LinkedHashSet such that it is guaranteed that later on the best derivative gets consulted first.

thanks

Contributor Author

@sascha-coenen thanks for your attention and suggestion.
Please see the latest version of DataSourceOptimizer here: https://github.com/druid-io/druid/pull/5556/files#diff-250d80eb8afc10c49ee91e41d8f9d91c .
The derivativesWithRequiredFields set will be sorted when it is used, as follows:
for (DerivativeDataSource derivativeDataSource : ImmutableSortedSet.copyOf(derivativesWithRequiredFields))

@jihoonson
Contributor

I'm going to remove the Design Review tag and merge this PR unless any other committers start reviewing by tonight, because

@jihoonson
Contributor

All right. I'm going to merge this PR shortly.

@jihoonson jihoonson merged commit e43e5eb into apache:master Jun 9, 2018
@jihoonson
Contributor

Merged. @zhangxinyu1 thank you for the contribution!

@zhangxinyu1
Contributor Author

@jihoonson Thanks! I will work on the related issues #5710 and #5775 in the coming days.

@dclim dclim added this to the 0.13.0 milestone Oct 8, 2018
List<Pair<DataSegment, String>> snapshot
)
{
Interval maxAllowedToBuildInterval = snapshot.parallelStream()
Member

@zhangxinyu1 why did you use a parallel stream?

.list()
);

List<DerivativeDataSource> derivativeDataSources = derivativesInDatabase.parallelStream()
Member

And here
