
Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression #4762

Merged

Conversation


@leventov leventov commented Sep 8, 2017

Important, for release notes: downgrade from a Druid version that includes this PR is possible only to a version that also includes #4824, i.e. Druid 0.11.0, but not earlier.


This PR consists of three sets of fairly independent changes:

  • Replace IOPeon with SegmentWriteOutMedium
  • Refactoring of buffer compression, remove unnecessary data copy
  • Replace some boxing collections with fastutil, and similar things in the serialization part of the codebase

Unfortunately, the way these changes were developed doesn't allow splitting them into independent PRs at this point while keeping the tests passing (I tried). I apologize for the size of this PR and will try to avoid this situation in the future.

Replace IOPeon with SegmentWriteOutMedium

The IOPeon interface was used in many serializers and writers to temporarily store the "main volume" of the data:

interface IOPeon {
  OutputStream makeOutputStream(String filename);
  InputStream makeInputStream(String filename);
}

There was a single implementation, TmpFileIOPeon, which created a temporary file for each stream of data in the task working directory.

This interface is replaced with SegmentWriteOutMedium:

interface SegmentWriteOutMedium {
  WriteOutBytes makeWriteOutBytes();
}

abstract class WriteOutBytes extends OutputStream implements WritableByteChannel {
  abstract long size();
  abstract void writeTo(WritableByteChannel channel);
  abstract InputStream asInputStream();
  abstract void readFully(long pos, ByteBuffer buffer);
}

WriteOutBytes is an abstraction of an appendable byte stream that is readable and writable at any time.
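To make that contract concrete, here is a hypothetical, heap-backed sketch of a WriteOutBytes-like class (JDK-only; Druid's actual implementations are backed by 64K ByteBuffer chunks or a temporary file, and their exact signatures may differ):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.Arrays;

public class HeapWriteOutBytes extends OutputStream
{
  private byte[] bytes = new byte[64];
  private int size = 0;

  @Override
  public void write(int b)
  {
    ensureCapacity(size + 1);
    bytes[size++] = (byte) b;
  }

  /** Number of bytes written so far. */
  public long size()
  {
    return size;
  }

  /** Streams everything written so far into the given channel. */
  public void writeTo(WritableByteChannel channel) throws IOException
  {
    channel.write(ByteBuffer.wrap(bytes, 0, size));
  }

  public InputStream asInputStream()
  {
    return new ByteArrayInputStream(bytes, 0, size);
  }

  /** Random-access read; valid at any time, even while writing continues. */
  public void readFully(long pos, ByteBuffer buffer)
  {
    if (pos + buffer.remaining() > size) {
      throw new BufferUnderflowException();
    }
    buffer.put(bytes, (int) pos, buffer.remaining());
  }

  private void ensureCapacity(int capacity)
  {
    if (capacity > bytes.length) {
      bytes = Arrays.copyOf(bytes, Math.max(capacity, bytes.length * 2));
    }
  }
}
```

Unlike IOPeon's write-then-reopen model, the same object supports appending and random-access reads at the same time, which is what lets serializers drop their Closeable lifecycle.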

This interface change allowed many serializers and writers to be simplified; they no longer need to be Closeable.

An off-heap-memory-based implementation of SegmentWriteOutMedium is also added, alongside the tmpFile-based one. Anyone who has enough memory on their servers should use this type of SegmentWriteOutMedium. Note that it may require changing the -XX:MaxDirectMemorySize JVM parameter.

The type of SegmentWriteOutMedium to use can be configured per task (in this PR this is implemented for the most relevant task types, except Hadoop tasks) via a parameter called segmentWriteOutMediumFactory, with options tmpFile and offHeapMemory, somewhere in the "tuningConfig".

A generic druid.defaultSegmentWriteOutMediumFactory configuration is also added, to select the SegmentWriteOutMedium for tasks in which segmentWriteOutMediumFactory is not specified. It should work in Hadoop as well. The default value is tmpFile, for backward compatibility.
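For illustration only (property and field names follow the description above; the exact JSON shape of segmentWriteOutMediumFactory is an assumption and may differ in the final docs), a task tuningConfig selecting the off-heap medium might look like:

```json
{
  "tuningConfig": {
    "type": "index",
    "segmentWriteOutMediumFactory": { "type": "offHeapMemory" }
  }
}
```

while the cluster-wide fallback would be set in the runtime properties, e.g. `druid.defaultSegmentWriteOutMediumFactory=tmpFile`.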

Refactoring of buffer compression, remove unnecessary data copy

Buffer compression is simplified. Unnecessary data copies are removed in at least two places:

  • CompressedObjectStrategy.toBytes() (this method is removed, because the code that used it was optimized and no longer needs it)
  • LZ4 compression now uses only direct buffers on both the input and output sides, which allows the native implementation to use the provided buffers instead of allocating another buffer internally.
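The direct-buffer point can be illustrated with the JDK's Deflater standing in for LZ4 (a sketch only; Druid's actual LZ4 path uses lz4-java, not java.util.zip, and the ByteBuffer overloads of setInput/deflate require Java 11+). With allocateDirect buffers on both sides, the native code reads and writes the off-heap memory in place instead of copying through an internal scratch array:

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DirectBufferCompression
{
  /** Compresses then decompresses 64K of repeated bytes using only direct buffers. */
  public static boolean roundTrips()
  {
    ByteBuffer input = ByteBuffer.allocateDirect(64 * 1024);
    while (input.hasRemaining()) {
      input.put((byte) 'a'); // highly compressible payload
    }
    input.flip();

    ByteBuffer compressed = ByteBuffer.allocateDirect(64 * 1024);
    Deflater deflater = new Deflater();
    deflater.setInput(input); // native code reads the direct buffer in place
    deflater.finish();
    deflater.deflate(compressed);
    deflater.end();
    compressed.flip();

    ByteBuffer decompressed = ByteBuffer.allocateDirect(64 * 1024);
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    try {
      inflater.inflate(decompressed);
    }
    catch (DataFormatException e) {
      return false;
    }
    inflater.end();
    decompressed.flip();

    // The block compressed well and decompressed back to the original size.
    return compressed.limit() < 1024 && decompressed.limit() == 64 * 1024;
  }

  public static void main(String[] args)
  {
    System.out.println(roundTrips());
  }
}
```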

@leventov leventov removed the WIP label Sep 19, 2017
@leventov

Also, this PR no longer always compresses blocks of the full block size (which is 64K): if the leftover, or the whole column (e.g. if the segment is small), is smaller than that, a smaller block is compressed. This doesn't contradict the spec, but it is not supported by the current implementation; that is fixed in #4824. I think that if #4824 appears in Druid 0.11, it will be possible to apply this PR without bumping the segment version.

Removed the WIP tag; this PR is ready for review.

@leventov

@jihoonson could you please review this PR?

@jihoonson

Sure, I'll review soon.

@jihoonson left a comment:

Reviewed until OutputMedium.


|Property|Description|Default|
|--------|-----------|-------|
|`druid.defaultOutputMediumFactory`|`tmpFile` or `offHeapMemory`, see explanation above|`tmpFile`|
Contributor:
Maybe druid.peon.defaultOutputMediumFactory is better, because this configuration is used by peons.

Member Author:
This config could be used not only on peons, e.g. in Hadoop/Spark tasks.

Contributor:
From the perspective of Druid, Hadoop/Spark tasks are just some of the task types that use external systems for indexing, but they are executed by peons. I think it doesn't matter how this configuration is used outside of Druid.

Contributor:
I forgot this comment. @leventov any thoughts?

Member Author:
Changed to druid.peon.defaultOutputMediumFactory

@@ -120,9 +120,10 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
|indexSpec|defines segment storage format options to be used at indexing time, see [IndexSpec](#indexspec)|null|no|
|maxPendingPersists|Maximum number of persists that can be pending but not started. If this limit would be exceeded by a new intermediate persist, ingestion will block until the currently-running persist finishes. Maximum heap memory usage for indexing scales with maxRowsInMemory * (2 + maxPendingPersists).|0 (meaning one persist can be running concurrently with ingestion, and none can be queued up)|no|
|forceExtendableShardSpecs|Forces use of extendable shardSpecs. Experimental feature intended for use with the [Kafka indexing service extension](../development/extensions-core/kafka-ingestion.html).|false|no|
|forceGuaranteedRollup|Forces guaranteeing the [perfect rollup](./design/index.html). The perfect rollup optimizes the total size of generated segments and querying time while indexing time will be increased. This flag cannot be used with either `appendToExisting` of IOConfig or `forceExtendableShardSpecs`. For more details, see the below __Segment publishing modes__ section.|false|no|
|forceGuaranteedRollup|Forces guaranteeing the [perfect rollup](../design/index.html). The perfect rollup optimizes the total size of generated segments and querying time while indexing time will be increased. This flag cannot be used with either `appendToExisting` of IOConfig or `forceExtendableShardSpecs`. For more details, see the below __Segment publishing modes__ section.|false|no|
Contributor:
Thanks for fixing.

import java.util.function.Function;
import java.util.stream.Collectors;

public abstract class ByteBufferOutputBytes extends OutputBytes
Contributor:
Would you add unit tests for the HeapByteBufferOutputBytes and DirectByteBufferOutputBytes?

Member Author:
Added tests


public abstract class ByteBufferOutputBytes extends OutputBytes
{
static final int BUFFER_SIZE = 64 * 1024;
Contributor:
Out of curiosity, is there any reason for 64K buffer size?

Contributor:
If you have one, maybe it's better to add a comment.

Member Author:
Added comment (there is no reason)

import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

final class FileOutputBytes extends OutputBytes
Contributor:
Please add some unit tests.

Member Author:
Added

@jihoonson left a comment:
Still reviewing. Reviewed up to StringDimensionMergerV9.

import java.io.IOException;
import java.nio.channels.WritableByteChannel;

public interface Serializer
Contributor:
Would you add some java docs?

Member Author:
Added

* @param capabilities The ColumnCapabilities of the dimension represented by this DimensionHandler
* @param progress ProgressIndicator used by the merging process

* @return A new DimensionMergerV9 object.
*/
DimensionMergerV9<EncodedKeyComponentType> makeMerger(
IndexSpec indexSpec,
File outDir,
IOPeon ioPeon,
OutputMedium outputMedium,
Contributor:
Please update javadoc.

Member Author:
Updated

}
catch (IOException ioe) {
throw new RuntimeException(ioe);
}
}

protected void setupEncodedValueWriter() throws IOException
protected void setupEncodedValueWriter(OutputMedium outputMedium) throws IOException
Contributor:
Can be private.

Member Author:
Changed


@ExtensionPoint
public interface GenericColumnSerializer extends Closeable
public interface GenericColumnSerializer extends Serializer
{
public void open() throws IOException;
Contributor:
Please remove the public modifier.

Member Author:
@jihoonson could you please add an IntelliJ/Checkstyle/PMD rule that prohibits unnecessary qualifiers in interfaces?

Contributor:
Ok. I'll make a PR soon.

@@ -80,17 +73,19 @@

protected String dimensionName;
protected GenericIndexedWriter<String> dictionaryWriter;
protected List<String> dictionary;
protected String firstDictionaryValue;
protected int dictionarySize;
Contributor:
These variables can be private.

Contributor:
Also, all variables in this class can be private as well. Would you please fix it too?

Member Author:
Changed all

@@ -80,17 +73,19 @@

protected String dimensionName;
protected GenericIndexedWriter<String> dictionaryWriter;
protected List<String> dictionary;
Contributor:
This variable looks like it's used only for spatial indexes. Please leave a comment about that.

Member Author:
Added comment

if (hasSpatial) {
dictionary.add(value);
}
dictionarySize++;
Contributor:
Lines 191 to 198 duplicate lines 170 to 179. Please extract a method.

Member Author:
Extracted

dictionaryWriter = new GenericIndexedWriter<>(outputMedium, dictFilename, GenericIndexed.STRING_STRATEGY);
boolean hasSpatial = capabilities.hasSpatialIndexes();
if (hasSpatial) {
dictionary = new ArrayList<>();
Contributor:
This was an off-heap mmapped buffer previously, but now it's an on-heap buffer. There will be some issues with memory, and I'm not sure this is better even though a disk write/read is removed.

Member Author:
Implemented an optimization; the spatial index now reuses dictionaryWriter.

)
private int metaSize()
{
return 1 + 4 + 4 + 1;
Contributor:
Please use Integer.BYTES.

Member Author (Oct 4, 2017):
Refactored, as well as in other places

}

@Override
public void writeToChannel(WritableByteChannel channel, FileSmoosher smoosher) throws IOException
private void writeLastOffset() throws IOException
Contributor:
It looks like once this method is called, subsequent calls to allValues() are no longer valid. Please add a comment about this somewhere.

Contributor:
Is this the same for all other interfaces or classes implementing Serializer? If so, all methods modifying data should check that they are not called after this or getSerializedSize() is called.

Member Author:
Added checks

out.flip();
}
catch (IOException e) {
log.error(e, "Error decompressing data");
Contributor:
Is it fine to log instead of rethrowing?

Member Author:
Changed to rethrowing

++numWritten;
SerializerUtils.writeBigEndianIntToOutputStream(valuesOut, bytesToWrite.length, sizeHelperBuffer);
valuesOut.write(bytesToWrite);
valuesOut.writeInt(0);
Contributor:
What does writing a 0 mean?

Member Author:
Added comment

}

private void closeMultiFiles() throws IOException
private void closeMultiFiles(WritableByteChannel channel, FileSmoosher smoosher) throws IOException
Contributor:
Please rename properly.

Member Author:
Renamed

long previousValuePosition = 0;
int bagSize = 1 << bagSizePower;

int numberOfFilesRequired = GenericIndexed.getNumberOfFilesRequired(bagSize, numWritten);
Contributor:
Please add some comments explaining what's going on here.

Member Author:
Not sure what I should comment here

@jihoonson

I finished my first-pass review. I'll do another pass in a couple of days.

@leventov

@jihoonson did you have a chance to look at this?

* Reads bytes from the byte sequences, represented by this OutputBytes, at the random position, into the given
* buffer.
*
* @throws RuntimeException if the byte sequences from the given pos ends before all bytes are read
Contributor:
BufferUnderflowException?

Contributor:
Maybe "before the given buffer is filled"?

Contributor:
Would you please add a test for the case when BufferUnderflowException occurs as well?

Member Author:
Fixed, added tests

@leventov leventov changed the title Replace IOPeon with OutputMedium; Improve buffer compression Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression Oct 19, 2017
@leventov

@b-slim do you have more comments here?

@b-slim

b-slim commented Oct 31, 2017

@leventov I did not finish the review and I am out of office on personal vacation; please do not block this PR on my review.


public ByteBufferInputStream(ByteBuffer buffer)
{
this.buffer = buffer;
Contributor:
Should this be a read-only copy?

Member Author:
It's more flexible, because it allows making (or not making) a copy before the constructor call, depending on the needs. Added a javadoc comment.

@leventov

leventov commented Nov 8, 2017

@b-slim could you please take another look?

@leventov

@b-slim if you are ok about the design of this PR, maybe we could merge it?

@b-slim

b-slim commented Nov 26, 2017

Reviewed 116 of 234 files at r1, 41 of 81 files at r2.
Review status: 134 of 236 files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@gianm

gianm commented Nov 28, 2017

@leventov @b-slim does this patch need any further review or is it ready to commit?

@b-slim

b-slim commented Nov 30, 2017

Reviewed 56 of 234 files at r1, 32 of 81 files at r2.
Review status: 219 of 236 files reviewed at latest revision, 3 unresolved discussions.


processing/src/main/java/io/druid/segment/CompressedVSizeIndexedV3Supplier.java, line 92 at r2 (raw file):

  {
    Iterator<IndexedInts> objects = objectsIterable.iterator();
    IntArrayList offsetList = new IntArrayList();

Wondering what the benefits of IntArrayList vs. ArrayList are?


processing/src/main/java/io/druid/segment/serde/ComplexColumnSerializer.java, line 50 at r2 (raw file):

  }

  @PublicApi

This needs to be called out since it is public.


processing/src/main/java/io/druid/segment/writeout/FileWriteOutBytes.java, line 35 at r2 (raw file):

import java.nio.channels.WritableByteChannel;

final class FileWriteOutBytes extends WriteOutBytes

Wondering if we can reuse code from other projects for ByteBuffer I/O; this pattern is fairly common, and this kind of code usually has subtle bugs that are hard to see.


@b-slim

b-slim commented Dec 5, 2017

Reviewed 8 of 234 files at r1, 7 of 81 files at r2, 2 of 2 files at r3.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


@b-slim b-slim merged commit a7a6a04 into apache:master Dec 5, 2017
@b-slim

b-slim commented Dec 5, 2017

@leventov sorry this took so long; it was a very good PR and thus very hard to review!

@leventov leventov deleted the output-medium-and-compression-improvements branch December 5, 2017 12:02
@leventov

leventov commented Dec 5, 2017

@b-slim thanks for taking time to review this PR!

Wondering what the benefits of IntArrayList vs. ArrayList are?

Memory savings: IntArrayList uses an int[] internally instead of an Object[] of boxed Integers.
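As a rough back-of-the-envelope illustration of that saving (JDK-only sketch; the per-object numbers are JVM-dependent assumptions, taking compressed 4-byte references and 16-byte Integer objects):

```java
public class BoxingFootprint
{
  /** Approximate bytes used by the backing storage of an ArrayList<Integer> of n elements. */
  static long boxedBytes(int n)
  {
    return n * (4L /* reference in Object[] */ + 16L /* Integer object */);
  }

  /** Approximate bytes used by the int[] inside an IntArrayList of n elements. */
  static long primitiveBytes(int n)
  {
    return n * 4L;
  }

  public static void main(String[] args)
  {
    int n = 1_000_000;
    System.out.println(boxedBytes(n));     // 20000000: ~20 MB of storage
    System.out.println(primitiveBytes(n)); // 4000000: ~4 MB, a 5x saving
  }
}
```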

This needs to be called out since it is public.

Called out where? I think I made this factory method @PublicApi to suppress the warning about it being unnecessarily public instead of package-private. As far as I remember, it was mentioned somewhere that this serde API should be considered public.

Wondering if we can reuse code from other projects for ByteBuffer I/O; this pattern is fairly common, and this kind of code usually has subtle bugs that are hard to see.

As I mentioned here: #4762 (comment), I don't think this functionality could be used anywhere else in Druid. Something similar may be needed only when implementing spilling in groupBy.

@leventov

leventov commented Dec 5, 2017

@gianm @b-slim @drcrallen FYI, I included an important note about the downgrade baseline at the top of the first comment in this PR.
