In version 1.1-4, when ConcurrentHClientPool releases an HClient, if it is open it is returned to availableClientQueue.
If we use HThriftClient with a TTransport wrapped in TFramedTransport, the TMemoryInputTransport readBuffer_ keeps the data from operations performed by the HClient.
This retained data, multiplied by the number of connections, can quickly increase memory usage.
Why isn't readBuffer_ cleared when the HClient is released?
I have one additional question:
Why is the max active connection count divided by 3 to obtain the number of HClients per host?
Thanks
Why isn't readBuffer_ cleared when the HClient is released?
It keeps the data, but it is overwritten on the next use.
This is a "feature" of that version of Thrift. It keeps the underlying byte[] to avoid having to re-allocate and re-grow it. The problem, as you have discovered, is that the buffer will grow to accept a larger payload but will never shrink, all the way up to the max message length (15 MB by default).
Why is the max active connection count divided by 3 to obtain the number of HClients per host?
We did not need all MAX_CONNECTIONS clients allocated up front, and a third seemed a good number from empirical observation of adding a service into a running architecture. It appears to have been a good guess, as no one has yet had a big enough issue with it to want a MIN_CONNECTIONS setting or similar :)
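For reference, here is a hedged sketch of where that number comes from in user-facing configuration, assuming the Hector 1.x-era API (the host string and pool size below are illustrative):

```java
// Sketch under assumptions: Hector 1.x-era API. With maxActive = 30,
// each host's ConcurrentHClientPool would be seeded with roughly
// 30 / 3 = 10 HClients, per the divide-by-3 behavior discussed above.
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class PoolSizingExample {
    public static void main(String[] args) {
        CassandraHostConfigurator config =
                new CassandraHostConfigurator("host1:9160,host2:9160");
        config.setMaxActive(30); // per-host cap; ~maxActive / 3 clients pooled
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", config);
    }
}
```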
So, the maximum (approximate) heap retained by the ConcurrentHClientPool when using HThriftClient with TFramedTransport follows this rule:
HOSTS_NUMBER * (MAX_ACTIVE_CONNECTIONS / 3) * MAX_MESSAGE_LENGTH ?
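For a sense of scale, here is a back-of-the-envelope worst case using that formula with illustrative numbers (3 hosts, maxActive of 30, the default 15 MB frame limit; all figures hypothetical):

```java
// Worst-case retained read buffers under the formula above,
// with made-up numbers: 3 hosts, maxActive = 30, 15 MB max frame.
public class RetainedHeapEstimate {
    public static void main(String[] args) {
        long hosts = 3;
        long clientsPerHost = 30 / 3;           // MAX_ACTIVE / 3
        long maxFrameBytes = 15L * 1024 * 1024; // default max message length
        long worstCase = hosts * clientsPerHost * maxFrameBytes;
        System.out.println(worstCase / (1024 * 1024) + " MB"); // 450 MB
    }
}
```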
This by itself may be a reason to incorporate the DataStax Java Driver for simple operations in your code as well, keeping a much smaller pool of Hector connections for large batch mutates or for easier access to dynamic columns (a sketch of this hybrid approach follows below).
Further, the binary protocol for CQL uses evented IO via Netty on both the client and the server, so it is significantly more efficient resource-wise.
That said, despite what you may read elsewhere, using raw thrift is more performant and flexible if (a really big "if" there) you understand the underlying storage model and its limits.
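Here is a minimal sketch of the hybrid approach mentioned above, assuming the DataStax Java Driver's 1.x-era API (shutdown() became close() in later versions); the contact point, keyspace, and query are all illustrative:

```java
// Hedged sketch: DataStax Java Driver for simple CQL reads, while a
// small Hector pool handles large batch mutates. All names illustrative.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SimpleCqlRead {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");
        for (Row row : session.execute("SELECT id, value FROM simple_table")) {
            System.out.println(row.getString("id") + " -> " + row.getString("value"));
        }
        cluster.shutdown(); // close() in driver 2.0+
    }
}
```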