Removing Integer.MAX column size limit. #3743
Conversation
@@ -82,6 +132,11 @@ public void write(T objectToWrite) throws IOException
valuesOut.write(bytesToWrite);

headerOut.write(Ints.toByteArray((int) valuesOut.getCount()));
headerOutLong.write(Longs.toByteArray(valuesOut.getCount()));
Can we avoid writing the headerOutLong file in cases where it's not needed? Maybe we can write headerOutLong on demand at the point when fileSizeLimit is crossed (and stop writing headerOut afterwards)? In most use cases, headerOutLong would not need to be written at all.
For that I would need to build headerOutLong from the headerOut values during the switch. I decided to keep the same info in headerOutLong to avoid rebuilding it from headerOut, and to delete it if not required.
But I can avoid writing into headerOut after the switch entirely; will update the PR not to write into headerOut after the switch.
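A minimal in-memory sketch of the on-demand switch being discussed: the long header is created lazily the first time an offset no longer fits in an int, rebuilt by replaying the int header written so far, and headerOut is no longer written afterwards. The field and constant names mirror this conversation and are assumptions, not the actual Druid code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Sketch only: lazily create headerOutLong when the limit is crossed,
// rebuild it from the 4-byte offsets written so far, then stop writing headerOut.
class HeaderWriterSketch {
    static final long FILE_SIZE_LIMIT = Integer.MAX_VALUE; // assumed limit
    final ByteArrayOutputStream headerOut = new ByteArrayOutputStream();
    ByteArrayOutputStream headerOutLong; // null until the limit is crossed

    void writeOffset(long offset) {
        if (headerOutLong == null && offset > FILE_SIZE_LIMIT) {
            // Switch point: replay the existing 4-byte offsets as 8-byte offsets.
            headerOutLong = new ByteArrayOutputStream();
            ByteBuffer ints = ByteBuffer.wrap(headerOut.toByteArray());
            while (ints.hasRemaining()) {
                headerOutLong.writeBytes(ByteBuffer.allocate(Long.BYTES).putLong(ints.getInt()).array());
            }
        }
        if (headerOutLong != null) {
            // After the switch, only the long header is appended to.
            headerOutLong.writeBytes(ByteBuffer.allocate(Long.BYTES).putLong(offset).array());
        } else {
            headerOut.writeBytes(ByteBuffer.allocate(Integer.BYTES).putInt((int) offset).array());
        }
    }
}
```

In the common case where no offset ever exceeds the limit, headerOutLong is never created at all, which is the point of the suggestion.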
{
headerOut.close();
headerOutLong.close();
valuesOut.close();
These 3 lines are common to the closeXXX() methods and can be moved to close().
headerOutLong.close();
valuesOut.close();

// revisit this check.
Why this comment?
Removed it.
}
finally {
metaOut.close();
throw new ISE(
Say a Druid user got this error during batch/realtime ingestion; it would continue to happen on retries. What would be the resolution?
This can happen if the column values total more than 2^31 bytes and each value split has only one element. But yes, for this kind of extreme case it is a design limitation, and retries will continue to throw this exception.
);
}
throw new IAE("Unknown version[%s]", versionFromBuffer);
}
Can't this just be read(buffer, strategy, null), with the error message in the other read updated if necessary?
The other read (the one with the smoosher) would again read the version from the buffer, and in that case it would error out.
Force-pushed from 5c2b3b8 to bd90ff3
@himanshug Updated the PR to create headerLong only after hitting the limit; also changed the version number to use
👍
Before going on with the review I want to ask you (also @himanshug and @cheddar): the reason to introduce this complexity is the 2 GB limitation of ByteBuffers. What about overcoming that limitation by replacing ByteBuffers with Memory, which is beneficial for other reasons as well? #3716 (comment)
@@ -58,18 +59,23 @@
this.valueSupplier = valueSupplier;
}

public static CompressedVSizeIndexedV3Supplier fromByteBuffer(ByteBuffer buffer, ByteOrder order)
public static CompressedVSizeIndexedV3Supplier fromByteBuffer(
ByteBuffer buffer, ByteOrder order,
Each param should be on a separate line
@@ -52,7 +51,7 @@ public static int skip(IntIterator it, int n)
* It isn't checked if the given source iterators are actually ascending, if they are not, the order of values in the
* returned iterator is undefined.
* <p>
* This is similar to what {@link MergeIterator} does with simple {@link java.util.Iterator}s.
* This is similar to what {@link io.druid.java.util.common.guava.MergeIterator} does with simple {@link java.util.Iterator}s.
Line longer than 120 columns
@@ -68,7 +67,7 @@ public static IntIterator mergeAscending(List<IntIterator> iterators)
}

/**
* This class is designed mostly after {@link MergeIterator}. {@code MergeIterator} uses a priority queue of wrapper
* This class is designed mostly after {@link io.druid.java.util.common.guava.MergeIterator}. {@code MergeIterator} uses a priority queue of wrapper
Line longer than 120 columns
@@ -36,12 +37,13 @@

public BlockLayoutIndexedFloatSupplier(
int totalSize, int sizePer, ByteBuffer fromBuffer, ByteOrder order,
Each param should be on a separate line
{
byte versionFromBuffer = buffer.get();

if (versionFromBuffer == VERSION) {
final int totalSize = buffer.getInt();
final int sizePer = buffer.getInt();
final CompressedObjectStrategy.CompressionStrategy compression = CompressedObjectStrategy.CompressionStrategy.forId(buffer.get());
final CompressedObjectStrategy.CompressionStrategy compression = CompressedObjectStrategy.CompressionStrategy.forId(
buffer.get());
The ); should be on a separate line.
public byte[] toBytes(String val)
{
if (val == null) {
return new byte[]{};
ByteArrays.EMPTY_ARRAY
if (val == null) {
return new byte[]{};
}
return io.druid.java.util.common.StringUtils.toUtf8(val);
Why fully qualified?
return ordering.compare(o1, o2);
}
};
private static final byte version_one = 0x1;
I suggest keeping the versions at the very top of the file.
Either CAP_CASE or camelCase
}
};
private static final byte version_one = 0x1;
private static final byte version_two = 0X2;
Small x
this.theBuffer = buffer;
this.strategy = strategy;
this.allowReverseLookup = allowReverseLookup;
size = theBuffer.getInt();
Suggested numElements, to match the format description in the header. The same goes for the other fields.
The Indexed interface exposes int size(). Using the variable name size, in parity with this function, makes more sense to me. Also, some of the exceptions thrown from this class use size in the description:
throw new IAE(String.format("Index[%s] >= size[%s]", index, size));
Force-pushed from 5ab8861 to d43c39e
@leventov Thanks for reviewing this PR. I updated the PR for the code-formatting related changes.
@leventov yes, it is to introduce larger columns with the existing ByteBuffer-based GenericIndexed. Replacing ByteBuffer with Memory would achieve the same, but that would take some time before getting materialized. @cheddar any other thoughts?
@akashdw @himanshug If we add a more complex format now, we have to support it for a very long time or forever, even though this complexity could be avoided if we replace
@leventov I understand that replacing ByteBuffer with Memory would achieve the same, but the Memory changes are not ready yet. @niketh and @leerho are working on the Memory changes, and it might take some time to replace ByteBuffer with Memory. I think these changes are completely independent of the Memory changes and allow us to create large columns using ByteBuffer. As I mentioned earlier, the Memory changes can be introduced later with a new GenericIndexed version.
@niketh thanks for letting us know. @akashdw IMO being able to go beyond the 2 GB limit one or two public Druid releases earlier isn't worth introducing complexity that would then have to be supported for years. At Yahoo you need it and you are the author of the patch, so you can apply it and use private builds (if you don't do this already).
@leventov I'm not sure I understand the complexity here. There is an explicit need to support large columns; we have users who are actually running into this problem, and you are asking us not to merge it because of a PR that isn't even ready yet? We chose to do these even though we knew that the I don't think it's a good idea to set a precedent of not accepting patches simply because you, personally, don't need the fix. And the argument about support can be used to say that we should not accept any PR whatsoever. This change is incremental, opt-in only, and can clearly and cleanly be migrated forward. I don't think there is any reason to block the PR.
@leventov if you are volunteering to complete the move to the
@cheddar the fact that other people need this, besides Yahoo, changes the picture. I'm not blocking it, if any other committer not from Yahoo is OK with it.
@leventov even if it is only Yahoo at this point in time, I don't see why that's a reason to block it. Btw, as I've calmed down a little bit, let me add that this PR introduces an interface change to In general, Druid has and continues to maintain versioned binary storage to enable us to iterate and make changes as needed. This has served us well and is the mechanism by which we handle complexity and support for features that might initially seem orthogonal to what others also need out of the project. But, in the end, we are running into this problem because we are pushing the limits of Druid and need to resolve those issues. The changes are all within the normal mechanisms that Druid uses to make incremental changes, and they are purely opt-in. If there is some technical reason that you think we are making changes to the code that will have long-lasting, negative implications, please bring them up and we can try to address them. But if the pushback is merely "it's more code and more code increases the maintenance burden," I do not believe that argument is valid.
Stopped on GenericIndexedWriter.java
* Meta File:
* byte 1: version (0x2)
* byte 2 == 0x1 => allowReverseLookup
* bytes 3-6: numberOfElementsPerValueFile expressed as power of 2. That means all the value files contain the same number of items
Line longer than 120 cols
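The layout quoted in the diff can be sketched as a tiny writer/reader pair. The version byte and field order follow the javadoc above; the class and method names are assumptions for illustration, not Druid's actual API.

```java
import java.nio.ByteBuffer;

// Sketch of the 6-byte V2 meta header: version, allowReverseLookup flag,
// and the power of 2 giving the number of elements per value file.
final class MetaHeaderSketch {
    static final byte VERSION_TWO = 0x2;

    static byte[] write(boolean allowReverseLookup, int bagSizePower) {
        return ByteBuffer.allocate(6)
                .put(VERSION_TWO)                             // byte 1: version
                .put((byte) (allowReverseLookup ? 0x1 : 0x0)) // byte 2: reverse-lookup flag
                .putInt(bagSizePower)                         // bytes 3-6: power of 2
                .array();
    }

    static int elementsPerValueFile(byte[] meta) {
        ByteBuffer buf = ByteBuffer.wrap(meta);
        buf.get(); // skip version
        buf.get(); // skip flag
        // every value file holds the same number of items: 1 << bagSizePower
        return 1 << buf.getInt();
    }
}
```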
@Override
public T get(int index)
{
if (index < 0) {
Lines 143-148 replaceable with Preconditions.checkElementIndex()
I won't be able to throw the exception as IAE; my intent here is to throw an IAE with a custom error message.
The error message of the IndexOutOfBoundsException produced by Preconditions is just as informative.
One of the primary things I considered while making these changes was not to change the existing behavior of V1 GenericIndexed. The current V1 in master throws IAE with a custom error message and I'm keeping the same behavior; V2 just follows the same convention as V1. Maybe there can be another PR to replace the IAE with Preconditions checks.
Ok, then could you please extract this block (repeated several times in GenericIndexed) as a helper method in GenericIndexed, and leave a comment on the method explaining why it is used instead of Preconditions?
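A sketch of the helper being requested, keeping the V1-compatible exception type and message quoted elsewhere in this thread. The method name is an assumption, and plain IllegalArgumentException stands in for Druid's IAE.

```java
// Sketch: a single bounds-check helper, kept instead of
// Preconditions.checkElementIndex so that V2 throws the same exception
// type and message as V1 GenericIndexed (IAE, not IndexOutOfBoundsException).
final class GenericIndexedChecks {
    static void checkIndex(int index, int size) {
        if (index < 0) {
            throw new IllegalArgumentException(String.format("Index[%s] < 0", index));
        }
        if (index >= size) {
            throw new IllegalArgumentException(String.format("Index[%s] >= size[%s]", index, size));
        }
    }
}
```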
numElementsPerValueFile = size;

int indexOffset = theBuffer.position();
int valuesOffset = theBuffer.position() + (size << 2);
Suggested * Ints.BYTES
Isn't << 2 more optimal?
No, the HotSpot JIT (if not javac) replaces multiplication or division by a static compile-time constant which is a power of 2 with a shift. Even if it weren't so, this constructor is not a hot place where every CPU cycle counts.
Ints.BYTES is more readable; will change the code to use it.
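For reference, the two spellings being compared are interchangeable (Ints.BYTES equals Integer.BYTES, i.e. 4), so only readability differs:

```java
// Both expressions compute the same header offset in bytes; HotSpot
// strength-reduces the constant multiply to a shift during JIT compilation.
class OffsetSpelling {
    static int byShift(int size) {
        return size << 2;
    }

    static int byMultiply(int size) {
        return size * Integer.BYTES; // Ints.BYTES in the Guava-using code
    }
}
```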
throw new IAE(String.format("Index[%s] >= size[%s]", index, size));
}

ByteBuffer copyHeaderBuffer = headerBuffer.asReadOnlyBuffer();
Suggested not to create this copy, and instead use a few positioned getInt() calls below on the original headerBuffer.
From the javadoc I don't see any copying of data while creating a read-only buffer; it just ensures write protection on the original buffer.
So I suggest removing line 150, and then:
if (index == 0) {
startOffset = 4;
endOffset = headerBuffer.getInt(0);
} else {
int headerPosition = (index - 1) * Ints.BYTES;
startOffset = headerBuffer.getInt(headerPosition) + 4;
endOffset = headerBuffer.getInt(headerPosition + 4);
}
return _get(valueBuffer.asReadOnlyBuffer(), startOffset, endOffset);
2 lines less in total, less garbage created.
How will you ensure BufferedIndexed reads are write-protected? The BufferedIndexed contract is to fetch an element and guarantee no modification of the original ByteBuffer. Any bad implementation of an ObjectStrategy could corrupt the original ByteBuffer. I think protecting the ByteBuffer from writes has more benefit than less garbage creation.
Ah, my mistake. Sure, the suggested code ensures write protection. I'll update the PR.
final ByteBuffer copyBuffer = valueBuffers.get(fileNum).asReadOnlyBuffer();
final ByteBuffer copyHeaderBuffer = headerBuffer.asReadOnlyBuffer();

if (index < 0) {
Replaceable with Preconditions.checkElementIndex()
I won't be able to throw the exception as IAE; my intent here is to throw an IAE with a custom error message.
int fileNum = index / numElementsPerValueFile;
final ByteBuffer copyBuffer = copyValueBuffers.get(fileNum);

if (index < 0) {
Same as above
I won't be able to throw the exception as IAE; my intent here is to throw an IAE with a custom error message.
final int startOffset;
final int endOffset;

if (index % numElementsPerValueFile == 0) {
This block is identical to a block in BufferedIndexed above; consider extracting a method.
I kept the two blocks separate because version 1 checks if (index == 0) whereas the V2 check is if (index % numElementsPerValueFile == 0); we wanted to avoid the % operation for the version 1 Indexed.
}
};
} else {
final List<ByteBuffer> copyValueBuffers = Lists.newArrayList(
Creating an empty array list and filling it in a for-each loop over valueBuffers takes 4 lines; this block takes 13 lines.
👍
throws IOException
{
while (numBytesToPutInFile > 0) {
int bytesRead = is.read(buffer, 0, Math.min(buffer.length, (int) numBytesToPutInFile));
Unsafe cast to int; it may lead to an exception. Suggested Ints.saturatedCast(numBytesToPutInFile).
{
while (numBytesToPutInFile > 0) {
int bytesRead = is.read(buffer, 0, Math.min(buffer.length, (int) numBytesToPutInFile));
smooshChannel.write((ByteBuffer) ByteBuffer.wrap(buffer).limit(bytesRead));
It should be checked that bytesRead is not -1.
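A sketch folding in both review points: clamp the long byte count before the int-typed read (the effect of Ints.saturatedCast, written here with plain JDK math) and fail fast if the stream ends early. Names mirror the diff; the surrounding smoosh plumbing is elided and the exact method shape is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch: copy exactly numBytesToPutInFile bytes, clamping the long count
// to an int for read() and treating premature EOF as an error.
final class CopyLoopSketch {
    static void copyBytes(InputStream is, OutputStream out, byte[] buffer, long numBytesToPutInFile)
            throws IOException {
        while (numBytesToPutInFile > 0) {
            // saturated cast: counts above Integer.MAX_VALUE clamp rather than wrapping negative
            int toRead = (int) Math.min(buffer.length, Math.min(numBytesToPutInFile, Integer.MAX_VALUE));
            int bytesRead = is.read(buffer, 0, toRead);
            if (bytesRead == -1) {
                throw new IOException("Unexpected EOF with " + numBytesToPutInFile + " bytes still expected");
            }
            out.write(buffer, 0, bytesRead);
            numBytesToPutInFile -= bytesRead;
        }
    }
}
```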
Force-pushed from f1b4689 to b13ea5d
@leventov I don't think we are going to make the change to introduce Is that acceptable, or are you actively vetoing this PR?
@cheddar @akashdw could you please go through my arguments for separation:
We will likely need to reimplement for the changes for
This is not a significant storage cost (Parag had a PR out to resolve that and introduce a new version of Generic Indexed that sat without anyone looking at it for months). Introducing that change would also actually have performance implications on indexing, because we currently do not re-process the payloads when we notice that the column needs to become large. In order to switch to that implementation, we would need to start reprocessing then. Once we are in the
That's exactly why we version our binary formats. In the future we will introduce another version and move things towards that.
The code path that I'm talking about is the main contract of the If your issues are just generally with how The point of this PR is to introduce the ability to handle columns larger than 2GB in as minimally invasive of a way as possible. In order to do that we also ended up needing to expand the abstraction for
Ok, I agree not to change LargeGenericIndexed format (not removing the size field and changing ordering)
This abstraction, On the normal
What I'm currently displeased with is that Your argument against separation is that it's going to be "large C&P of the code", but I actually don't see any code that should be copied.
Or e.g.
@leventov if I can repeat what I think you are saying. You are annoyed that
and
If we can consolidate the fields into just one of those two sets for both uses, is that a reasonable compromise? Separating it out into separate classes seems like all it is doing is shuffling code around, and I'm not sure it's really enabling or solving something other than aligning with a specific aesthetic. While it might be nicer to look at each of those pieces individually, it will also mean that the currently relatively self-contained
@cheddar I'm annoyed not so much with the fields, but with the fact that the logic of the cases is interspersed. I'm ok with a compromise that the logic just inside I'll take another (hopefully final) look at the code in this PR later this week
@leventov can we finish this off? This is one of the largest PRs blocking the release.
@fjy will do before the end of this week
@@ -81,7 +154,17 @@ public void write(T objectToWrite) throws IOException
valuesOut.write(Ints.toByteArray(bytesToWrite.length));
valuesOut.write(bytesToWrite);

headerOut.write(Ints.toByteArray((int) valuesOut.getCount()));
if (!requireMultipleFiles) {
writeIntValueToOutputStream(buf, (int) valuesOut.getCount(), headerOut);
checkedCast()
Not sure why GitHub is not showing this line as outdated: 19b69f3#diff-37ba90c2e6d6b833ce38fc1ce76eecc3R158
// to current offset.
if ((pos & (numberOfElementsPerValueFile - 1)) == 0) {
relativeRefBytes = currentNumBytes;
numberOfElementsPerValueFile = 1 << bagSizePower;
This line should be removed
}
currentNumBytes = Long.reverseBytes(headerFile.readLong());
relativeNumBytes = currentNumBytes - relativeRefBytes;
helperBuffer.putInt(0, Ints.checkedCast(relativeNumBytes));
This and the next line are replaceable with:
writeIntValueToOutputStream(helperBuffer, Ints.checkedCast(relativeNumBytes), finalHeaderOut);
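An in-memory sketch of the rebasing step these hunks perform: absolute long offsets are rewritten relative to the start of each value file so they fit back into 4-byte header entries. Variable names mirror the diff, while the file I/O (readLong, writeIntValueToOutputStream) is elided and this standalone form is an assumption.

```java
// Sketch: rebase absolute offsets at every value-file boundary so each
// per-file offset fits in an int (Math.toIntExact plays the role of
// Ints.checkedCast, failing loudly on overflow).
final class OffsetRebaseSketch {
    static int[] rebase(long[] absoluteOffsets, int bagSizePower) {
        int numberOfElementsPerValueFile = 1 << bagSizePower;
        int[] relative = new int[absoluteOffsets.length];
        long relativeRefBytes = 0;
        for (int pos = 0; pos < absoluteOffsets.length; pos++) {
            // Whenever pos enters a new value file, rebase against the current offset.
            if ((pos & (numberOfElementsPerValueFile - 1)) == 0 && pos > 0) {
                relativeRefBytes = absoluteOffsets[pos - 1];
            }
            relative[pos] = Math.toIntExact(absoluteOffsets[pos] - relativeRefBytes);
        }
        return relative;
    }
}
```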
This PR addresses #3513.
Note: Introduced GenericIndexed version 3 instead of 2 because I see an open PR (#3069) by @pjain1 to introduce version 2.