batch operations #19

rashtao · 2015-07-28T09:46:59Z

Are batch operations supported? doc_here

In a fluent-interface way, they could be implemented with this approach:

qb
    .insert(valuesA)
    .into("columnFamily")
    .and()
    .insert(valuesB)
    .into("columnFamily")
    .and()
    .insert(valuesC)
    .into("columnFamily")
    .batch(function(err, result) {
      // do something w/ your err/result
    });

UnbounDev · 2015-07-28T21:12:55Z

They're not supported as of yet, adding it to the list!

UnbounDev · 2015-07-28T21:22:37Z

What about a nested query approach? e.g.

qb.batch([
  qb()
    .insert(valuesA)
    .into("columnFamily"),
  qb()
    .insert(valuesB)
    .into("columnFamily"),
  qb()
    .insert(valuesC)
    .into("columnFamily"),
  qb()
    .update("columnFamily")
    .set(valuesD)
    .where(...)
], function(err, result) {
   // do something w/ your err/result
})

That would avoid the semi-vague .and() methods, and would ease readability for large or complex nested queries that should be created in helper functions.

rashtao · 2015-07-29T08:36:18Z

If possible, I would prefer to keep the method chaining and implement this as a pure fluent interface. It offers better separation of the code, since the helper functions do not need to keep a reference to qb().

Of course, the syntax in my suggestion may not be perfect, and I'm happy to look for better alternatives that avoid the .and()

UnbounDev · 2015-08-04T16:38:45Z

I've given this a bit of thought - the following contrasts the two proposed approaches - completely open to suggestions/comments/alternatives.

From the datastax nodejs driver batch example (expanded to include another bandmate/increase batch complexity):

var queries = [
  {
    query: 'UPDATE user_profiles SET email=? WHERE key=?',
    params: [emailAddress, 'hendrix']
  },
  {
    query: 'INSERT INTO user_track (key, text, date) VALUES (?, ?, ?)',
    params: ['hendrix', 'Changed email', new Date()]
  },
  {
    query: 'UPDATE user_profiles SET email=? WHERE key=?',
    params: [emailAddress, 'Buddy Miles']
  },
  {
    query: 'INSERT INTO user_track (key, text, date) VALUES (?, ?, ?)',
    params: ['Buddy Miles', 'Changed email', new Date()]
  }
];
client.batch(queries, { prepare: true }, function(err) {
  assert.ifError(err);
  console.log('Data updated on cluster');
});

The emphasis in the example seems to be on taking any set of query statements, regardless of order, and executing them together as a single atomic batch.

(1) How this would be implemented using nested queries:

var cassanknex = require("cassanknex")();

// Each helper builds and returns one standalone query; nothing is executed here.
function _getUpdateProfiles(email, key) {
  var qb = cassanknex();
  return qb
    .update("user_profiles")
    .set({email: email})
    .where("key", "=", key);
}

function _getInsertUserTrack(key, text) {
  var qb = cassanknex();
  return qb
    .insert({key: key, text: text, date: new Date()})
    .into("user_track");
}

var qb = cassanknex();
// emailAddress is assumed to be defined, as in the driver example above
qb.batch([
    _getUpdateProfiles(emailAddress, "hendrix"),
    _getInsertUserTrack("hendrix", "Changed email"),
    _getUpdateProfiles(emailAddress, "Buddy Miles"),
    _getInsertUserTrack("Buddy Miles", "Changed email")],
  function (err, res) {
    // continue on your merry way
  });

This decouples the individual query statements from the batch execution, which I believe is consistent w/ the intended usage of the datastax nodejs driver method. Also note that there is no shared reference to any particular nested qb: batch simply takes an array of distinct cassanknex objects and executes them.
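To make the "array of distinct objects" idea concrete, here is a minimal sketch of how a batch executor might flatten builders into the {query, params} array that the driver example above passes to client.batch. The cql() and bindings() accessor names and the fakeBuilder stub are assumptions made for illustration only; nothing in this thread confirms them as the cassanknex API.

```javascript
// Hypothetical stand-in for a compiled query builder. A real builder would
// generate the CQL itself; cql() and bindings() are assumed accessor names.
function fakeBuilder(cql, bindings) {
  return {
    cql: function () { return cql; },
    bindings: function () { return bindings; }
  };
}

// Flatten any list of builder-like objects into driver-style batch statements.
// The executor never needs to know which helper produced which builder.
function toBatchStatements(builders) {
  return builders.map(function (builder) {
    return { query: builder.cql(), params: builder.bindings() };
  });
}

var statements = toBatchStatements([
  fakeBuilder(
    "UPDATE user_profiles SET email=? WHERE key=?",
    ["jimi@example.com", "hendrix"]
  ),
  fakeBuilder(
    "INSERT INTO user_track (key, text, date) VALUES (?, ?, ?)",
    ["hendrix", "Changed email", new Date()]
  )
]);
```

Under these assumptions, the executor only depends on each element exposing its statement and parameters, which is what keeps the helpers independent of one another.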

(2) vs a completely fluent approach:

var qb = cassanknex();
qb
  .update("user_profiles")
  .set({email: email})
  .where("key", "=", "hendrix")
  .and()
  .insert({key: "hendrix", text: "Changed email", date: new Date()})
  .into("user_track")
  .and()
  .update("user_profiles")
  .set({email: email})
  .where("key", "=", "Buddy Miles")
  .and()
  .insert({key: "Buddy Miles", text: "Changed email", date: new Date()})
  .into("user_track")
  .batch(function (err, res) {
    // continue on your merry way
  });

This maintains a strict fluent interface, but will quickly get more and more complex as the number of queries in a batch increases.

Clearly I'm more in favor of the nested approach. It handles complex cases: you can easily imagine wanting to insert/update for every member of a band atomically, which is easy given the decoupling. It also permits code reuse: a wrapper around cassanknex can use a single 'batch executor' function and cherry-pick whichever queries it needs before performing an operation. That is possible w/ the fluent approach as well, but easier when the queries are decoupled.
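The "every member in a band" case can be sketched without committing to any cassanknex API. The helpers below return plain {query, params} objects in the shape used by the driver example above; the function names, the member list, and the example email address are hypothetical.

```javascript
// Helpers return plain statements and hold no reference to a client or builder.
function updateProfile(email, key) {
  return {
    query: "UPDATE user_profiles SET email=? WHERE key=?",
    params: [email, key]
  };
}

function insertUserTrack(key, text) {
  return {
    query: "INSERT INTO user_track (key, text, date) VALUES (?, ?, ?)",
    params: [key, text, new Date()]
  };
}

// Build one profile update and one tracking insert per band member.
function emailChangeBatch(members, email) {
  return members.reduce(function (batch, member) {
    return batch.concat([
      updateProfile(email, member),
      insertUserTrack(member, "Changed email")
    ]);
  }, []);
}

var band = ["hendrix", "Buddy Miles"];
var queries = emailChangeBatch(band, "band@example.com");
```

The resulting array could then be handed to a single batch call, such as the driver's client.batch(queries, { prepare: true }, callback) shown earlier, so the batch grows with the member list rather than with the length of a method chain.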

Have I convinced you or have I missed a prime benefit of the fluent approach or a major detriment of the nested query approach? :)

rashtao · 2015-08-05T09:37:38Z

OK, I agree with your approach! It seems clear and powerful enough to me.

UnbounDev · 2015-08-11T23:07:42Z

Feature is in! Docs here.

UnbounDev added the enhancement label Jul 28, 2015

UnbounDev self-assigned this Jul 28, 2015

UnbounDev added a commit that referenced this issue Aug 11, 2015

Add batch functionality - closes #19

178b66d

UnbounDev mentioned this issue Aug 11, 2015

Feature/batch executions #20

Merged

UnbounDev closed this as completed in #20 Aug 11, 2015
