Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud Spanner DML & PartitionedDML support #3703

Merged
merged 6 commits into from
Sep 27, 2018
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ public interface DatabaseClient {
ReadOnlyTransaction singleUseReadOnlyTransaction();

/**
  * Returns a read-only transaction context in which a single read or query can be performed at the
* Returns a read-only transaction context in which a single read or query can be performed at
* given timestamp bound. This method differs from {@link #singleUse(TimestampBound)} in that the
* read timestamp used may be inspected after the read has returned data or finished successfully.
*
Expand Down Expand Up @@ -269,4 +269,51 @@ public interface DatabaseClient {
*
*/
TransactionManager transactionManager();

/**
* Returns the lower bound of rows modified by this DML statement.
*
* <p>The method will block until the update is complete. Running a DML statement with this method
* does not offer exactly once semantics, and therfore the DML statement should be idempotent. The
* DML statement must be fully-partitionable. Specifically, the statement must be expressible as
* the union of many statements which each access only a single row of the table. This is a
snehashah16 marked this conversation as resolved.
Show resolved Hide resolved
* Partitioned DML transaction in which a single Partitioned DML statement is executed.
* Partitioned DML partitions the key space and runs the DML statement over each partition in
* parallel using separate, internal transactions that commit independently. Partitioned DML
* transactions do not need to be committed.
*
* <p>Partitioned DML updates are used to execute a single DML statement with a different
* execution strategy that provides different, and often better, scalability properties for large,
* table-wide operations than DML in a {@link #readWriteTransaction()} transaction. Smaller scoped
* statements, such as an OLTP workload, should prefer using {@link
* TransactionContext#executeUpdate(Statement)} with {@link #readWriteTransaction()}.
*
* <ul>
* That said, Partitioned DML is not a drop-in replacement for standard DML used in {@link
* #readWriteTransaction()}.
* <li>The DML statement must be fully-partitionable. Specifically, the statement must be
* expressible as the union of many statements which each access only a single row of the
* table.
* <li>The statement is not applied atomically to all rows of the table. Rather, the statement
* is applied atomically to partitions of the table, in independent internal transactions.
* Secondary index rows are updated atomically with the base table rows.
* <li>Partitioned DML does not guarantee exactly-once execution semantics against a partition.
* The statement will be applied at least once to each partition. It is strongly recommended
* that the DML statement should be idempotent to avoid unexpected results. For instance, it
* is potentially dangerous to run a statement such as `UPDATE table SET column = column +
* 1` as it could be run multiple times against some rows.
* <li>The partitions are committed automatically - there is no support for Commit or Rollback.
* If the call returns an error, or if the client issuing the DML statement dies, it is
* possible that some rows had the statement executed on them successfully. It is also
* possible that statement was never executed against other rows.
* <li>If any error is encountered during the execution of the partitioned DML operation (for
* instance, a UNIQUE INDEX violation, division by zero, or a value that cannot be stored
* due to schema constraints), then the operation is stopped at that point and an error is
* returned. It is possible that at this point, some partitions have been committed (or even
* committed multiple times), and other partitions have not been run at all.
*
* <p>Given the above, Partitioned DML is good fit for large, database-wide, operations that are
* idempotent, such as deleting old rows from a very large table.
*/
long executePartitionedUpdate(Statement stmt);
}
Original file line number Diff line number Diff line change
Expand Up @@ -28,10 +28,11 @@
class DatabaseClientImpl implements DatabaseClient {
private static final String READ_WRITE_TRANSACTION = "CloudSpanner.ReadWriteTransaction";
private static final String READ_ONLY_TRANSACTION = "CloudSpanner.ReadOnlyTransaction";
private static final String PARTITION_DML_TRANSACTION = "CloudSpanner.PartitionDMLTransaction";
private static final Tracer tracer = Tracing.getTracer();

static {
TraceUtil.exportSpans(READ_WRITE_TRANSACTION, READ_ONLY_TRANSACTION);
TraceUtil.exportSpans(READ_WRITE_TRANSACTION, READ_ONLY_TRANSACTION, PARTITION_DML_TRANSACTION);
}

private final SessionPool pool;
Expand Down Expand Up @@ -155,6 +156,17 @@ public TransactionManager transactionManager() {
}
}

@Override
public long executePartitionedUpdate(Statement stmt) {
Span span = tracer.spanBuilder(PARTITION_DML_TRANSACTION).startSpan();
try (Scope s = tracer.withSpan(span)) {
return pool.getReadWriteSession().executePartitionedUpdate(stmt);
} catch (RuntimeException e) {
TraceUtil.endSpanWithFailure(span, e);
throw e;
}
}

ListenableFuture<Void> closeAsync() {
return pool.closeAsync();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
package com.google.cloud.spanner;

import com.google.spanner.v1.ResultSetStats;
import javax.annotation.Nullable;

/**
* Provides access to the data returned by a Cloud Spanner read or query. {@code ResultSet} allows a
Expand Down Expand Up @@ -59,13 +60,17 @@ public interface ResultSet extends AutoCloseable, StructReader {
@Override
void close();


/**
* Returns the {@link ResultSetStats} for the query only if the query was executed in either the
* {@code PLAN} or the {@code PROFILE} mode via the {@link ReadContext#analyzeQuery(Statement,
* com.google.cloud.spanner.ReadContext.QueryAnalyzeMode)} method. Attempts to call this method on
* a {@code ResultSet} not obtained from {@code analyzeQuery} result in an {@code
* UnsupportedOperationException}. This method must be called after {@link #next()} has
* returned @{code false}. Calling it before that will result in an {@code IllegalStateException}.
* com.google.cloud.spanner.ReadContext.QueryAnalyzeMode)} method or for DML statements in
* {@link ReadContext#executeQuery(Statement, QueryOption)}. Attempts to call this method on

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

* a {@code ResultSet} not obtained from {@code analyzeQuery} or {@code executeQuery} will return
* a {@code null} {@code ResultSetStats}. This method must be called after {@link #next()} has
* returned @{code false}. Calling it before that will result in {@code null}
* {@code ResultSetStats} too.
*/
@Nullable
ResultSetStats getStats();
}
Original file line number Diff line number Diff line change
Expand Up @@ -312,6 +312,18 @@ public Timestamp write(Iterable<Mutation> mutations) throws SpannerException {
}
}

@Override
public long executePartitionedUpdate(Statement stmt) throws SpannerException {
try {
markUsed();
return delegate.executePartitionedUpdate(stmt);
} catch (SpannerException e) {
throw lastException = e;
} finally {
close();
}
}

@Override
public Timestamp writeAtLeastOnce(Iterable<Mutation> mutations) throws SpannerException {
try {
Expand Down
Loading