MemoryArtifactStore for unit testing and ArtifactStore SPI Validation #3517

chetanmeh · 2018-04-03T11:48:31Z

Provides an in-memory ArtifactStore implementation which can be used for unit testing and ArtifactStore SPI contract validation as proposed in #3387

Description

This PR introduces a MemoryArtifactStore which stores the artifacts in an in-memory map and supports all operations as expected from an ArtifactStore.

Test Suite

The PR also includes a test suite, ArtifactStoreBehavior, to validate the ArtifactStore SPI contract. It is broken down into the following parts:

ArtifactStoreCRUDBehaviors - Checks normal CRUD operations
ArtifactStoreQueryBehaviors - Checks the query contract around skipping, limiting, sorting etc.
ArtifactStoreSubjectQueryBehaviors - Checks views specific to WhiskAuth, especially the join support for limits
- subjects/identities
- namespaceThrottlings/blockedNamespaces

For now these tests are run against two stores and pass for both:

MemoryArtifactStoreTests
CouchDBArtifactStoreTests

Design for Non CouchDB Stores

CouchDB performs queries based on computed views. Most other databases provide a query interface with backing indexes. Currently the OpenWhisk logic provides the view name as part of the query invocation and thus has an implicit dependency on CouchDB view support. This has the following impact on any non-CouchDB store implementation.

Mapping views to queries

For databases which support SQL-like constructs we need to map the query expressed in the CouchDB view to SQL. For example, consider mapping the whisks.v2.1.0/actions view to a query:

function (doc) {
  var PATHSEP = "/";
  var isAction = function (doc) { return (doc.exec !== undefined) };
  if (isAction(doc)) try {
    var ns = doc.namespace.split(PATHSEP);
    var root = ns[0];
    var value = {
      namespace: doc.namespace,
      name: doc.name,
      version: doc.version,
      publish: doc.publish,
      annotations: doc.annotations,
      limits: doc.limits,
      exec: { binary: doc.exec.binary || false},
      updated: doc.updated
    };
    emit([doc.namespace, doc.updated], value);
    if (root !== doc.namespace) {
      emit([root, doc.updated], value);
    }
  } catch (e) {}
}

It would then be expressed as the following pseudo query:

SELECT namespace,
       name,
       version,
       publish,
       annotations,
       limits,
       updated,
       exec
FROM   whisks
WHERE  entitytype = 'action'
       AND ( namespace = $namespace
              OR rootns = $namespace )
       AND ( updated >= $since
             AND updated <= $upto )
ORDER  BY updated

Here

rootns - A computed field created when the document is stored (see below)
exec - A computed field created at runtime during post-processing of the query result
select criteria - The fields in the select part are a function of the view used (see projected fields below)
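As a rough illustration of the pseudo query above, the same predicate and ordering can be written against an in-memory collection. This is only a sketch: the `Row` case class and `actionsFor` function are hypothetical names, and the flattened row shape is an assumption rather than the stored document format.

```scala
object ActionsQuerySketch {
  // Hypothetical flattened row; field names mirror the pseudo query, not an actual schema.
  final case class Row(entityType: String, namespace: String, rootns: String, name: String, updated: Long)

  // Same predicate and ordering as the pseudo query: match on namespace or
  // rootns, keep rows whose `updated` lies in [since, upto], order by `updated`.
  def actionsFor(rows: Seq[Row], namespace: String, since: Long, upto: Long): Seq[Row] =
    rows
      .filter(_.entityType == "action")
      .filter(r => r.namespace == namespace || r.rootns == namespace)
      .filter(r => r.updated >= since && r.updated <= upto)
      .sortBy(_.updated)
}
```

Querying with a root namespace returns actions from that namespace and from packages below it, which is what the second `emit` in the view achieves.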

Sometimes it is not possible to map the view condition completely to a query, for example in subject queries. In such cases the query applies the condition only partially and the full conditions are then applied in DocumentHandler. The result returned from the query may therefore be a superset of the actual result set, and the actual result is obtained via "filtering".
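A minimal sketch of this two-step approach follows. The `Subject` and `Ns` shapes and the specific condition (matching `uuid` and `key` inside the same sub-document) are assumptions chosen for illustration; they are not taken from the actual subject view.

```scala
object PostFilterSketch {
  // Hypothetical shapes for illustration; not the actual WhiskAuth schema.
  final case class Ns(name: String, uuid: String, key: String)
  final case class Subject(id: String, namespaces: Seq[Ns])

  // Coarse store-side condition: each field matches in *some* sub-document,
  // which may accept documents where uuid and key sit in different entries.
  def coarseQuery(docs: Seq[Subject], uuid: String, key: String): Seq[Subject] =
    docs.filter(d => d.namespaces.exists(_.uuid == uuid) && d.namespaces.exists(_.key == key))

  // Full condition re-applied afterwards: both fields must match in the same entry.
  def exactFilter(candidates: Seq[Subject], uuid: String, key: String): Seq[Subject] =
    candidates.filter(_.namespaces.exists(n => n.uuid == uuid && n.key == key))
}
```

The coarse step can be served by a simple index, while the exact step runs over the (usually small) candidate set.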

Interpreting the keys

The query parameters startKey and endKey are closely tied to the view implementation, so any other store needs to interpret the values in the keys according to the view name. To support that, this PR introduces a DocumentHandler trait which defines the db-neutral operations:

trait DocumentHandler {

  /**
   * Returns a JsObject having computed fields. This is a substitution for fields
   * computed in CouchDB views
   */
  def computedFields(js: JsObject): JsObject = JsObject.empty

  def fieldsRequiredForView(ddoc: String, view: String): Set[String] = Set()

  def transformViewResult(
    ddoc: String,
    view: String,
    startKey: List[Any],
    endKey: List[Any],
    includeDocs: Boolean,
    js: JsObject,
    provider: DocumentProvider)(implicit transid: TransactionId, ec: ExecutionContext): Future[JsObject]

  def shouldAlwaysIncludeDocs(ddoc: String, view: String): Boolean = false
}

There is a separate handler implementation provided for each entity, i.e. ActivationHandler, WhisksHandler and SubjectHandler.

The sections below provide more details on the various methods.

Computed Keys - `computedFields`

Some of the views in use rely on computed fields in the view logic. For example, the view for rules computes a root namespace when the namespace is a multi-segment path:

function (doc) {
  var PATHSEP = "/";
  var isRule = function (doc) {  return (doc.trigger !== undefined) };
  if (isRule(doc)) try {
    var ns = doc.namespace.split(PATHSEP);
    var root = ns[0];
    emit([doc.namespace, doc.updated], 1);
    if (root !== doc.namespace) {
      emit([root, doc.updated], 1);
    }
  } catch (e) {}
}

To support such cases in a non-CouchDB store, DocumentHandler#computedFields can be used. It computes the required fields at the time of put, and the ArtifactStore implementation can then persist them along with the main document. For example, in the Mongo case such a computed JSON object can be stored internally under a separate _computed field.

These computed fields can then be used for defining declarative indexes. For now the following computed fields are generated:

WhiskEntity
- rootns - Root namespace.
Activations
- nspath - Namespace with entity path. The nspath is computed based on the logic in whisks-filters.v2.1.0/activations
- deleteLogs - Computed based on the logic in logCleanup/byDateWithLogs, i.e. the flag is enabled for all activations which are not sequences
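The `rootns` case can be sketched as below, following the `split(PATHSEP)[0]` logic of the views shown earlier. A plain `Map[String, Any]` stands in for `JsObject`, and `withComputed` is a hypothetical helper showing one way a store could persist the result under `_computed`; `nspath` and `deleteLogs` are left out because their logic is not spelled out here.

```scala
object ComputedFieldsSketch {
  private val PathSep = "/"

  // First path segment of the namespace, as in `doc.namespace.split(PATHSEP)[0]`.
  def rootNamespace(namespace: String): String =
    namespace.split(PathSep).headOption.getOrElse(namespace)

  // Stand-in for DocumentHandler#computedFields, using a plain Map instead of JsObject.
  def computedFields(doc: Map[String, Any]): Map[String, Any] =
    doc.get("namespace") match {
      case Some(ns: String) => Map("rootns" -> rootNamespace(ns))
      case _                => Map.empty
    }

  // Hypothetical storage layout: keep the computed values under `_computed`.
  def withComputed(doc: Map[String, Any]): Map[String, Any] = {
    val computed = computedFields(doc)
    if (computed.isEmpty) doc else doc + ("_computed" -> computed)
  }
}
```

Because `rootns` is materialized at write time, a store can index it directly instead of evaluating view logic per query.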

See Mongo ArtifactStore for more details and examples

Projected Fields - `fieldsRequiredForView`

CouchDB views determine which fields are included in the document returned as part of a query result. For non-CouchDB stores this field list is defined by fieldsRequiredForView. An ArtifactStore implementation can use it to project which fields should be included.
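A possible shape of such a projection is sketched here. The field list for `whisks.v2.1.0/actions` is copied from the view function shown earlier; the `viewFields` table, the `project` helper and the treatment of an empty set as "no projection" are assumptions of this sketch.

```scala
object ProjectionSketch {
  // Hypothetical field lists keyed by "ddoc/view"; the real lists live in the handlers.
  private val viewFields: Map[String, Set[String]] = Map(
    "whisks.v2.1.0/actions" ->
      Set("namespace", "name", "version", "publish", "annotations", "limits", "updated", "exec"))

  def fieldsRequiredForView(ddoc: String, view: String): Set[String] =
    viewFields.getOrElse(s"$ddoc/$view", Set.empty)

  // Keep only the fields the view needs; an empty set is treated here as "no projection".
  def project(doc: Map[String, Any], fields: Set[String]): Map[String, Any] =
    if (fields.isEmpty) doc else doc.filter { case (k, _) => fields.contains(k) }
}
```

A store with native projection support would pass the field set to its query API instead of filtering after the fact.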

Join Support - `transformViewResult`

CouchDB supports joins, which are used by subject queries to fetch the limits. Most other NoSQL databases do not support such joins. In such cases transformViewResult is responsible for performing the join. For it to work the ArtifactStore needs to provide an implementation of DocumentProvider which returns the raw JSON for the provided id:

trait DocumentProvider {
  protected[database] def get(id: DocId)(implicit transid: TransactionId): Future[Option[JsObject]]
}
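The join can be pictured roughly as follows. This sketch uses simplified stand-ins (`String` ids, `Map[String, Any]` documents, a `Row` case class, a `MapProvider`), all of which are hypothetical; it assumes the row's value carries the `_id` of the limits document, and the `guest/limits` id and `invocationsPerMinute` field in the usage are illustrative only.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object JoinSketch {
  type Json = Map[String, Any]

  // Simplified stand-in for DocumentProvider: raw document lookup by id.
  trait DocumentProvider {
    def get(id: String): Future[Option[Json]]
  }

  // Hypothetical in-memory provider backed by a plain map.
  final class MapProvider(docs: Map[String, Json]) extends DocumentProvider {
    def get(id: String): Future[Option[Json]] = Future.successful(docs.get(id))
  }

  // One view row; `doc` stays None when the join finds nothing (CouchDB returns null).
  final case class Row(id: String, key: List[Any], value: Json, doc: Option[Json] = None)

  // Look up the document referenced by value._id and attach it to the row.
  def transformViewResult(row: Row, provider: DocumentProvider)(
    implicit ec: ExecutionContext): Future[Row] =
    row.value.get("_id") match {
      case Some(limitsId: String) => provider.get(limitsId).map(found => row.copy(doc = found))
      case _                      => Future.successful(row)
    }

  // Blocking helper for examples and tests only.
  def await[T](f: Future[T]): T = Await.result(f, 5.seconds)
}
```

For example, with a provider holding `"guest/limits"`, a row whose value is `Map("_id" -> "guest/limits")` comes back with `doc` populated, while a row pointing at a missing id keeps `doc = None`.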

MemoryViewMapper

This is an ArtifactStore implementation-specific abstraction which converts the query keys passed to query into the underlying storage query syntax. Each store implementation needs to implement similar logic which covers all possible scenarios from all the active views.

Here the test suite plays an important role by validating that all query cases are covered.
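To make the key interpretation concrete, here is a small sketch for keys of the form `[namespace, since]` and `[namespace, upto | TOP, TOP]`, where TOP is the U+FFF0 sentinel visible in the CouchDB query posted later in this thread. `KeyRange`, `interpret` and `matches` are hypothetical names, and real mappers have to handle more key shapes than this one.

```scala
object ViewKeySketch {
  // Sentinel that sorts after any real key component (U+FFF0).
  val Top = "\ufff0"

  final case class KeyRange(namespace: String, since: Long, upto: Long)

  // Interpret startKey = [ns, since] and endKey = [ns, upto | TOP, ...].
  def interpret(startKey: List[Any], endKey: List[Any]): KeyRange = (startKey, endKey) match {
    case ((ns: String) :: (since: Number) :: Nil, (ns2: String) :: upto :: _) if ns == ns2 =>
      val upperBound = upto match {
        case n: Number => n.longValue
        case Top       => Long.MaxValue // open upper bound
        case other     => throw new IllegalArgumentException(s"Unsupported upper bound: $other")
      }
      KeyRange(ns, since.longValue, upperBound)
    case _ =>
      throw new IllegalArgumentException(s"Unsupported keys: $startKey, $endKey")
  }

  // Predicate a store could evaluate (or translate into its own query syntax).
  def matches(range: KeyRange, namespace: String, updated: Long): Boolean =
    namespace == range.namespace && updated >= range.since && updated <= range.upto
}
```

Rejecting unknown key shapes explicitly, rather than silently returning everything, makes gaps in the mapping visible when the test suite runs.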

TestSuite and future view changes

This test suite needs to be kept in sync with any change in view logic or addition of new views. Only then can it be ensured that other ArtifactStore implementations cover all the use cases supported by the default CouchDB store. So going forward MemoryArtifactStore would become a canonical implementation of the ArtifactStore contract!

Pending Work

Support for namespaceThrottlings/blockedNamespaces view
More coverage of entity tests

Related issue and scope

I opened an issue to propose and discuss this change (Mock database service for unit testing #3387)

My changes affect the following components

Types of changes

Bug fix (generally a non-breaking change which closes an issue).
Enhancement or new feature (adds new functionality).
Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

I signed an Apache CLA.
I reviewed the style guides and followed the recommendations (Travis CI will check :).
I added tests to cover my changes.
My changes require further changes to the documentation.
I updated the documentation where necessary.

The subjects/identities view checks a field of a sub-document. A store may not be able to enforce this fully as part of the query, so the query result set can be a superset of the actual result. transformViewResult may therefore recheck the view conditions and omit a result that does not match all of them.

- NamespaceBlacklistTests - This test interacts directly with CouchDB as well as generically via the `authStore` instance. Hence it cannot work with any non-CouchDB storage.
- SequenceMigrationTests - This test requires a running OW setup, hits it using the wsk REST client and also interacts with the store directly. Such a test cannot work with MemoryArtifactStore as two different store instances would not share the same state. It should, however, work with other ArtifactStore implementations as it is not aware of CouchDB.

chetanmeh · 2018-04-10T09:12:51Z

This PR is now ready for review.

In a separate branch a Travis run was done with the default ArtifactStoreProvider switched to MemoryArtifactStoreProvider, and all tests (except a few¹) passed.

The ArtifactStoreBehavior test suite consists of ~30 tests which try to cover all possible access patterns for ArtifactStore and provide a coverage of ~92%.

¹ 2 tests needed to be ignored for the run involving MemoryArtifactStore. See commit notes for details.

rabbah · 2018-04-06T01:09:15Z

common/scala/src/main/scala/whisk/core/database/ArtifactStoreExceptions.scala

@@ -28,3 +28,9 @@ case class DocumentTypeMismatchException(message: String) extends ArtifactStoreE
 case class DocumentUnreadable(message: String) extends ArtifactStoreException(message)

 case class PutException(message: String) extends ArtifactStoreException(message)
+
+sealed abstract class ArtifactStoreRuntimeException(message: String) extends RuntimeException(message)
+


Is this sealed class intended to restrict exceptions for views only vs documents?

chetanmeh · 2018-04-13

Made it sealed to follow the pattern used with ArtifactStoreException, i.e. a base RuntimeException which can be used to indicate logical errors in the code flow. So it can be used for both views and documents.

rabbah · 2018-04-10T16:20:29Z

tests/src/test/scala/whisk/core/database/test/behavior/ArtifactStoreWhisksQueryBehaviors.scala

+import whisk.core.entity._
+
+trait ArtifactStoreWhisksQueryBehaviors extends ArtifactStoreBehaviorBase {
+  this: FlatSpec =>


why not just extend?

chetanmeh · 2018-04-13

Makes sense. Modified the code to extend FlatSpec.

rabbah · 2018-04-13T03:01:20Z

common/scala/src/main/scala/whisk/core/database/memory/MemoryViewMapper.scala

+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+


can you add a java/scala doc (comment) explaining the purpose of the view mapper.
do you anticipate this will be used outside of testing (in which case should this move to the tests package)?

rabbah · 2018-04-13T03:03:32Z

common/scala/src/main/scala/whisk/core/database/memory/MemoryArtifactStore.scala

+import scala.reflect.ClassTag
+import scala.util.Try
+
+object MemoryArtifactStoreProvider extends ArtifactStoreProvider {


can you add a java/scala doc (comment) explaining the purpose of memory store provider; furthermore, do you anticipate this will be used outside of testing (in which case should this move to the tests package)?

chetanmeh · 2018-04-13

Will add docs. So far there is no concrete use case for using them outside testing. One major reason to put them in the main codebase was getting coverage data easily.

Typically code coverage data is only calculated for production code, but here I would like to track coverage of the memory store implementation logic so that it can be ensured that tests cover all aspects of the implementation.

rabbah · 2018-04-13T03:04:39Z

common/scala/src/main/scala/whisk/core/database/DocumentHandler.scala

+  private val triggerFields = commonFields
+
+  protected val supportedTables = Set(
+    "whisks.v2.1.0/actions",


should these be parameterized on the view versions?

chetanmeh · 2018-04-13

You mean have a separate variable for the version and refer to that in the name? That can be done.

Here I wanted to explicitly list the actual version which the current implementation is coded for. If the version of the CouchDB views changes and the config in application.conf is updated, this hardcoding ensures that we make a conscious update here and adapt the implementation logic for the newer version.

rabbah · 2018-04-13T03:06:15Z

@dubee @csantanapr @markusthoemmes 🙏 a pg to be safe although I expect Travis to be sufficient.

chetanmeh · 2018-04-13T05:54:43Z

@rabbah Currently there is an ignored test, count with skip, in ArtifactStoreQueryBehaviors as it fails for CouchDB. It looks like a count call with skip does not work as intended, so that flow does not work:

$ curl -H 'Host: localhost:5984' -H 'Authorization: Basic d2hpc2tfYWRtaW46c29tZV9wYXNzdzByZA==' -H 'Accept: application/json' -H 'User-Agent: akka-http/10.0.10' 'http://localhost:5984/whisk_local_activations/_design/whisks-filters.v2.1.0/_view/activations?startkey=%5B%22artifactTCK_aAAS_ns_zKXPz/testact%22,0%5D&endkey=%5B%22artifactTCK_aAAS_ns_zKXPz/testact%22,%22%EF%BF%B0%22,%22%EF%BF%B0%22%5D&skip=4&reduce=true'
{"rows":[

]}

If I remove skip=4 from the above curl for the same setup, I get the following response:

{"rows":[
{"key":null,"value":10}
]}

Should I create separate issue to track this?

markusthoemmes · 2018-04-13T08:07:23Z

PG2 3021 🔵

chetanmeh · 2018-04-17T04:41:52Z

@markusthoemmes If the PG test has passed can this PR be approved and merged?

chetanmeh · 2018-04-19T04:40:00Z

> Currently there is an ignored test, count with skip, in ArtifactStoreQueryBehaviors as it fails for CouchDB

Opened #3560 to track this

…apache#3517)

Provides MemoryArtifactStore as an in-memory ArtifactStore implementation which can be used for unit testing and ArtifactStore SPI contract validation.

CouchDB views determine which fields are included in returned document as part of query results. For non CouchDB cases this field list is defined by fieldsRequiredForView. ArtifactStore implementations can use it for projecting which fields should be included.

CouchDB supports joins which is used for subject queries to fetch the limits. Most other NoSQL dbs do not support such joins. So for such cases transformViewResult can be used which would be responsible for performing the join. For it to work the ArtifactStore needs to provide an implementation of DocumentProvider which returns the raw json for a provided doc id.

The MemoryViewMapper is an ArtifactStore implementation specific abstraction which converts the query keys passed to query to underlying storage query syntax. Each store implementation needs to have similar logic implemented to cover all possible scenarios from all the active views. Here the test suite plays an important role by validating that all query cases are covered.

The added test suites need to be kept in sync with any change in view logic or addition of new views. Then only it can be ensured that other ArtifactStore implementation cover all the use cases as supported by default CouchDB. So going forward MemoryArtifactStore would become a canonical implementation of ArtifactStore contract.

Initial implementation of MemoryArtifactStore

9b2b0ce

chetanmeh added wip artifactstore labels Apr 3, 2018

chetanmeh self-assigned this Apr 3, 2018

chetanmeh added 26 commits April 4, 2018 11:12

Create Read and Delete support completed

fe7692c

Add DocumentHandler tests

0def54c

Initial query support

e4c7056

Implement query support for activations

9f47647

Also add tests for sorting, skipping and limiting result

Implement count support

ef97eff

Make tests modular

9e5b5fa

Move MemoryArtifactStore to common/scala from tests

afe61d8

This ensures that we can get test coverage stats easily, as those are by default only calculated for main sources. Also fix some missing type annotation warnings

Remove support for "all" view

16d709d

Support for "all" view was removed with apache#3167

Switch to DocId instead of simple String id

a587a5d

Refactor deserialization logic to utility method

cdd96b3

Initial support for subjects/identities view queries

3cb4d25

Test to validate subject limits support

033bec5

Validates that implicit join from subject record to limits is working

Remove test for "all" view

d67145c

Change behavior to match CouchDBRestStore

78a2d57

CouchDBRestStore would throw an AssertionError if there is a revision mismatch. So adapt the logic accordingly. That means that get should do a simple lookup without revision and let deserialization logic enforce revision check

Make EntityName creation random

87d28cc

Make TCK run against CouchDB

48f73ac

- Add waits for views to be updated for query tests
- Perform cleanup post each test

Add check for supported view names

b4dbf69

Test to check blocked subject handling

e828966

Test to check blocked subject handling

b03b323

Test to confirm that query response for subject matches CouchDB format

5a3566a

- With join support the `_id` of the `value` is the limit one
- `doc` is null when no result for join

Add support for blacklisted namespace view

419ee02

Remove unused method

afaf4ad

Implement basic attachment handling support

4fbaa57

Stop gap measure till AttachmentStore PR is merged

Fix the view name

e5d5969

Add test and support for querying public packages

57b6d0b

chetanmeh added 6 commits April 10, 2018 10:36

Remove new activationStore instance

8199238

This instance need not be created as put is possible with the existing activationStore instance. The current mode poses a problem with MemoryArtifactStore as 2 separate store instances do not share the same state.

Refer to view names from entity objects

0ba3b29

Minor code touchup

3babc0a

Assume should be called from within withFixture and not in before

c2b48ca

Add test to assert the query involving since and untill

97c83a6

chetanmeh removed the wip label Apr 10, 2018

rabbah reviewed Apr 13, 2018

chetanmeh added 3 commits April 13, 2018 10:53

Make test extend FlatSpec directly

0f20838

Add scala docs

c58c6c5

Refactor to use the common deserialization utility method

095eebd

Add test to check del with revision check

710aca5

rabbah approved these changes Apr 19, 2018

rabbah merged commit 220a3f4 into apache:master Apr 19, 2018

This was referenced Apr 19, 2018

Mock database service for unit testing #3387

Closed

Count with skip is not working for CouchDB #3560

Closed

chetanmeh mentioned this pull request Apr 24, 2018

ArtifactStore implementation for CosmosDB #3562

Merged

14 tasks
