batch operations #19

Closed
rashtao opened this issue Jul 28, 2015 · 6 comments · Fixed by #20

rashtao commented Jul 28, 2015

Are batch operations supported? doc_here

In a fluent-interface way, they could be implemented with this approach:

qb
    .insert(valuesA)
    .into("columnFamily")
    .and()
    .insert(valuesB)
    .into("columnFamily")
    .and()
    .insert(valuesC)
    .into("columnFamily")
    .batch(function(err, result) {
      // do something w/ your err/result
    });
@UnbounDev UnbounDev self-assigned this Jul 28, 2015
@UnbounDev
Contributor

They're not supported as of yet, adding it to the list!

@UnbounDev
Contributor

What about a nested query approach? e.g.

qb.batch([
  qb()
    .insert(valuesA)
    .into("columnFamily"),
  qb()
    .insert(valuesB)
    .into("columnFamily"),
  qb()
    .insert(valuesC)
    .into("columnFamily"),
  qb()
    .update("columnFamily")
    .set(valuesD)
    .where(...)
], function(err, result) {
   // do something w/ your err/result
})

That would avoid the semi-vague .and() methods, and would ease readability for large or complex nested queries that should be created in helper functions.

@rashtao
Author

rashtao commented Jul 29, 2015

If possible I would prefer to keep the method chaining, i.e. implement it with a pure fluent interface. It keeps the code better separated, since the helper functions do not need to keep a reference to qb().

Of course the syntax in my suggestion may not be perfect, and I'm happy to look for better alternatives that avoid the .and()

@UnbounDev
Contributor

I've given this a bit of thought - the following contrasts the two proposed approaches - completely open to suggestions/comments/alternatives.

From the DataStax Node.js driver batch example (expanded to include another bandmate and increase the batch complexity):

var queries = [
  {
    query: 'UPDATE user_profiles SET email=? WHERE key=?',
    params: [emailAddress, 'hendrix']
  },
  {
    query: 'INSERT INTO user_track (key, text, date) VALUES (?, ?, ?)',
    params: ['hendrix', 'Changed email', new Date()]
  },
  {
    query: 'UPDATE user_profiles SET email=? WHERE key=?',
    params: [emailAddress, 'Buddy Miles']
  },
  {
    query: 'INSERT INTO user_track (key, text, date) VALUES (?, ?, ?)',
    params: ['Buddy Miles', 'Changed email', new Date()]
  }
];
client.batch(queries, { prepare: true }, function(err) {
  assert.ifError(err);
  console.log('Data updated on cluster');
});

The emphasis in the example seems to be on taking any set of query statements, regardless of order, and executing them together as one atomic batch.

(1) How this would be implemented using nested queries:

var cassanknex = require("cassanknex")();

// Build an UPDATE for one user's email; returns the builder without executing it.
function _getUpdateProfiles(email, key) {
  var qb = cassanknex();
  return qb
    .update("user_profiles")
    .set({email: email})
    .where("key", "=", key);
}

// Build an INSERT that records a change for one user.
function _getInsertUserTrack(key, text) {
  var qb = cassanknex();
  return qb
    .insert({key: key, text: text, date: new Date()})
    .into("user_track");
}

// emailAddress is assumed to be defined by the caller, as in the driver example above.
var qb = cassanknex();
qb.batch([
    _getUpdateProfiles(emailAddress, "hendrix"),
    _getInsertUserTrack("hendrix", "Changed email"),
    _getUpdateProfiles(emailAddress, "Buddy Miles"),
    _getInsertUserTrack("Buddy Miles", "Changed email")],
  function (err, res) {
    // continue on your merry way
  });

This decouples building the individual query statements from executing the batch, which I believe is consistent w/ the intended usage of the datastax nodejs driver method. Also note that no reference to any particular nested qb has to be kept around: batch simply takes an array of distinct cassanknex objects and executes them.
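To make the "array of distinct objects" idea concrete, here is a rough sketch of how a batch step could compile independent builder objects into the {query, params} array that the driver's client.batch accepts. The names insertInto, updateWhereKey, toCql and compileBatch are hypothetical stand-ins invented for illustration, not cassanknex's actual API or internals, and the email value is made up.

```javascript
// Hypothetical builders: each one only knows how to compile itself.
function insertInto(table, values) {
  var columns = Object.keys(values);
  return {
    toCql: function () {
      var placeholders = columns.map(function () { return "?"; }).join(", ");
      return {
        query: "INSERT INTO " + table + " (" + columns.join(", ") + ") VALUES (" + placeholders + ")",
        params: columns.map(function (c) { return values[c]; })
      };
    }
  };
}

function updateWhereKey(table, values, key) {
  var columns = Object.keys(values);
  return {
    toCql: function () {
      var assignments = columns.map(function (c) { return c + "=?"; }).join(", ");
      return {
        query: "UPDATE " + table + " SET " + assignments + " WHERE key=?",
        params: columns.map(function (c) { return values[c]; }).concat([key])
      };
    }
  };
}

// The batch step holds no reference to any single builder;
// it just compiles whatever array it is handed, in order.
function compileBatch(builders) {
  return builders.map(function (b) { return b.toCql(); });
}

var queries = compileBatch([
  updateWhereKey("user_profiles", { email: "jimi@example.com" }, "hendrix"),
  insertInto("user_track", { key: "hendrix", text: "Changed email" })
]);
console.log(queries[0].query); // UPDATE user_profiles SET email=? WHERE key=?
```

The resulting array has the same shape as the queries array in the driver example above, so it could be handed to client.batch unchanged.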

(2) vs a completely fluent approach:

var qb = cassanknex();
qb
  .update("user_profiles")
  .set({email: email})
  .where("key", "=", "hendrix")
  .and()
  .insert({key: "hendrix", text: "Changed email", date: new Date()})
  .into("user_track")
  .and()
  .update("user_profiles")
  .set({email: email})
  .where("key", "=", "Buddy Miles")
  .and()
  .insert({key: "Buddy Miles", text: "Changed email", date: new Date()})
  .into("user_track")
  .batch(function (err, res) {
    // continue on your merry way
  });

This maintains a strict fluent interface, but will quickly get more and more complex as the number of queries in a batch increases.

Clearly I'm more in favor of the nested approach. It handles complex cases (you can easily imagine wanting to insert/update for every member of a band atomically, which is easy given the decoupling) and permits code reuse: a wrapper around cassanknex can use a single 'batch executor' function and cherry-pick whichever queries it needs prior to performing an operation. This is possible w/ the fluent approach as well, but easier with decoupling.
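As an illustration of the "every member in a band" case, here is a minimal sketch with one reusable batch executor. It uses plain {query, params} objects rather than cassanknex builders, and a stub client so it runs without a cluster; updateProfile, insertUserTrack, emailChangeQueries, executeBatch, stubClient and the email addresses are all made up for this example. Only the client.batch(queries, options, callback) call shape is taken from the driver example above.

```javascript
// Hypothetical helpers that each return one statement.
function updateProfile(email, key) {
  return { query: "UPDATE user_profiles SET email=? WHERE key=?", params: [email, key] };
}

function insertUserTrack(key, text) {
  return {
    query: "INSERT INTO user_track (key, text, date) VALUES (?, ?, ?)",
    params: [key, text, new Date()]
  };
}

// One update plus one insert per band member, flattened into a single array.
function emailChangeQueries(members) {
  return members.reduce(function (acc, m) {
    return acc.concat([updateProfile(m.email, m.key), insertUserTrack(m.key, "Changed email")]);
  }, []);
}

// Single reusable executor: it does not care which queries it receives.
function executeBatch(client, queries, callback) {
  client.batch(queries, { prepare: true }, callback);
}

// Stub standing in for a real driver client; it only records what was sent.
var sent = [];
var stubClient = {
  batch: function (queries, options, callback) {
    sent = queries;
    callback(null);
  }
};

var band = [
  { key: "hendrix", email: "jimi@example.com" },
  { key: "Buddy Miles", email: "buddy@example.com" }
];

executeBatch(stubClient, emailChangeQueries(band), function (err) {
  if (err) throw err;
  console.log("Queued " + sent.length + " statements in one batch");
});
```

Adding a third band member only changes the band array; the helpers and the executor stay untouched.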

Have I convinced you or have I missed a prime benefit of the fluent approach or a major detriment of the nested query approach? :)

@rashtao
Author

rashtao commented Aug 5, 2015

OK, I agree with your approach! It seems clear and powerful enough to me.

@UnbounDev
Contributor

Feature is in! Docs here.
