Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General factory method for "fromBytes" #52

Closed
hossman opened this issue Apr 24, 2015 · 7 comments
Closed

General factory method for "fromBytes" #52

hossman opened this issue Apr 24, 2015 · 7 comments

Comments

@hossman
Copy link

hossman commented Apr 24, 2015

The new TDigest.createDigest() factory method in version 3.1 looks very handy, but seems to really only useful for people operating in simple single node environments. In order to serialize/deserialize/merge TDigests in a distributed application, it's currently still neccessary to use hardcoded implementations to leverage the fromBytes and as(Small)Bytes methods.

It would be helpful if there was a general (static) factory contract in the TDigest class for deserializing a ByteBuffer into a TDigest of unknonw concrete type.

@hossman
Copy link
Author

hossman commented Apr 24, 2015

some rough stream of conciousness thoughts on this...

the simplest way to go about this would be to just add something like...

public static TDigest createDigestFromBytes(ByteByffer buf) {
  // TODO: update this class name whenever createDigest is modified
  return AVLTreeDigest.fromBytes(buf);
}

...but this would break in cases of "rolling updates" to distributed systems, where one node might still be using tdigest-3.1 where AVLTreeDigest is the default, but another node has just upgraded to tdigest-3.42 where MergingDigest might be the new the default.

A better approach might be to include in the binary data a prefix indicating which implementation is used, so that any of them could be supported genericly, even if they are not hte current "default" impl. This could be done either by breaking backcompat in all of the current asByte/fromByte methods, or by introducing a new static factory method for the serialization, in addition to the deserialization, and making them responsible for writing/reading an impl prefix before delegating to the eixsting static methods. This wouldn't be very polymorphic, but would ensure that as long as client code used the API consistently, they would continue to work even as future versions of TDigest add future impls and/or changes the default impl.

half brained, ugly, example...

private static final Class<TDigest>[] SERIALIZATION_SUPPORTED_IMPLS = new Class<TDigest>[] {
  // NEVER CHANGE THE ORDER OF THIS ARRAY, ONLY ADD TO THE END
  // IF IMPLS ARE DELETED FROM CODE BASE, REPLACE WITH null TO PRESERVER INDEX OFFSETS
  TreeDigest.class,
  ArrayDigest.class,
  AVLTreeDigest.class,
  MergingDigest.class
};

/** only usable in conjunction with deserialize(ByteBuffer) */
public static void serialize(TDigest d, ByteBuffer buf) {
  int prefix = 0;
  for (Class<TDigest> c : SERIALIZATION_SUPPORTED_IMPLS) {
    if (null != c && c.isInstance(d)) {
      buf.putInt(0 - prefix);
      d.asSmallBytes(buf);
      return;
    }
    prefix++;
  }
  throw new Exception("TODO: unsupported impl error");
}

/** only usable in conjunction with serialize(TDigest, ByteBuffer) */
public static TDigest deserialize(TDigest d, ByteBuffer buf) {
  final int prefix = buf.getInt();
  assert prefix <= 0 : "TODO: better error handling";
  Class<TDigest> c = SERIALIZATION_SUPPORTED_IMPLS[ 0 - prefix ];

  // TODO: this gets more hairy then serialize ...
  // either need to use reflection to call static fromBytes method in each class
  // or abandon the SERIALIZATION_SUPPORTED_IMPLS array and just go with a simple switch
}

@tdunning
Copy link
Owner

Yeah.... good ideas.

I think also that we have enough ideas across the different kinds of
t-digest that we could just as easily define a relatively universal
serialization format that can deserialize into any t-digest. This allows
rolling updates, specific deserialization or non-specific,
give-me-the-best-kind deserialization.

On Fri, Apr 24, 2015 at 2:06 PM, Hoss Man [email protected] wrote:

some rough stream of conciousness thoughts on this...

the simplest way to go about this would be to just add something like...

public static TDigest createDigestFromBytes(ByteByffer buf) {
// TODO: update this class name whenever createDigest is modified
return AVLTreeDigest.fromBytes(buf);
}

...but this would break in cases of "rolling updates" to distributed
systems, where one node might still be using tdigest-3.1 where
AVLTreeDigest is the default, but another node has just upgraded to
tdigest-3.42 where MergingDigest might be the new the default.

A better approach might be to include in the binary data a prefix
indicating which implementation is used, so that any of them could be
supported genericly, even if they are not hte current "default" impl. This
could be done either by breaking backcompat in all of the current
asByte/fromByte methods, or by introducing a new static factory method for
the serialization, in addition to the deserialization, and making them
responsible for writing/reading an impl prefix before delegating to the
eixsting static methods. This wouldn't be very polymorphic, but would
ensure that as long as client code used the API consistently, they would
continue to work even as future versions of TDigest add future impls and/or
changes the default impl.

half brained, ugly, example...

private static final Class[] SERIALIZATION_SUPPORTED_IMPLS = new Class[] {
// NEVER CHANGE THE ORDER OF THIS ARRAY, ONLY ADD TO THE END
// IF IMPLS ARE DELETED FROM CODE BASE, REPLACE WITH null TO PRESERVER INDEX OFFSETS
TreeDigest.class,
ArrayDigest.class,
AVLTreeDigest.class,
MergingDigest.class
};

/** only usable in conjunction with deserialize(ByteBuffer) */
public static void serialize(TDigest d, ByteBuffer buf) {
int prefix = 0;
for (Class c : SERIALIZATION_SUPPORTED_IMPLS) {
if (null != c && c.isInstance(d)) {
buf.putInt(0 - prefix);
d.asSmallBytes(buf);
return;
}
prefix++;
}
throw new Exception("TODO: unsupported impl error");
}

/** only usable in conjunction with serialize(TDigest, ByteBuffer) */
public static TDigest deserialize(TDigest d, ByteBuffer buf) {
final int prefix = buf.getInt();
assert prefix <= 0 : "TODO: better error handling";
Class c = SERIALIZATION_SUPPORTED_IMPLS[ 0 - prefix ];

// TODO: this gets more hairy then serialize ...
// either need to use reflection to call static fromBytes method in each class
// or abandon the SERIALIZATION_SUPPORTED_IMPLS array and just go with a simple switch
}


Reply to this email directly or view it on GitHub
#52 (comment).

@hossman
Copy link
Author

hossman commented Apr 27, 2015

I ill take your word for it that a universal serialization format is possible - i haven't looked at enough diff impls to make sense of how they differ.

one other aspect of this that i just realized would be problematic is generalizing how the client knows how big to make the ByteBuffer - untill/unless issue #53 has a solution that makes the cost of smallByteSize() equally "cheap" for all concrete impls, expecting clients to call that on an arbitrary TDigest isn't really a good idea.

@tdunning
Copy link
Owner

I am addressing this in the current release. My strategy is:

  1. implement serializable for all current digests (AvlTree and MergingDigest)

  2. you will be able serialize any type of digest and then deserialize it into any other.

  3. there will be an easy and default way to deserialize to the "current best" concrete type.

@tdunning
Copy link
Owner

Well, 2 / 3 ain't bad.

Both AVLTreeDigest and MergingDigest are serializable, but there is currently no cross serialization. This meets goal 1. Goal 2 and 3 aren't there yet and I am going to punt it for now.

@tdunning
Copy link
Owner

See #87 for future work.

@tdunning
Copy link
Owner

tdunning commented Aug 6, 2017

Closing this in favor of the work on #87

@tdunning tdunning closed this as completed Aug 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants