Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryArtifactStore for unit testing and ArtifactStore SPI Validation #3517

Merged
merged 37 commits into from
Apr 19, 2018

Conversation

chetanmeh
Copy link
Member

@chetanmeh chetanmeh commented Apr 3, 2018

Provides an in-memory ArtifactStore implementation which can be used for unit testing and ArtifactStore SPI contract validation as proposed in #3387

Description

This PR introduces a MemoryArtifactStore which stores the artifacts in an in-memory map and supports all operations as expected from an ArtifactStore.

Test Suite

The PR also includes a test suite ArtifactStoreBehavior to validate the ArtifactStore SPI contract. This is broken down in following parts

  • ArtifactStoreCRUDBehaviors - Checks normal CRUD operations
  • ArtifactStoreQueryBehaviors - Check the query contract around skipping, limiting, sorting etc
  • ArtifactStoreSubjectQueryBehaviors - Check views specific to WhiskAuth specially around the join support for limits
    • subjects/identities
    • namespaceThrottlings/blockedNamespaces

For now these tests are run for 2 stores and passes for them

  • MemoryArtifactStoreTests
  • CouchDBArtifactStoreTests

Design for Non CouchDB Stores

CouchDB performs queries based on computed views. Most other databases provide a query interface with backing indexes. Currently OpenWhisk logic provides the view name as part of query invocation and thus has an implicit dependency on CouchDB view support. This has following impact on any non CouchDB store implementation

Mapping views to queries

For database which support SQL like constructs we need to map the query as expressed in CouchDB view to SQL. For e.g. if we need to map whisks.v2.1.0/actions view to some query

function (doc) {
  var PATHSEP = "/";
  var isAction = function (doc) { return (doc.exec !== undefined) };
  if (isAction(doc)) try {
    var ns = doc.namespace.split(PATHSEP);
    var root = ns[0];
    var value = {
      namespace: doc.namespace,
      name: doc.name,
      version: doc.version,
      publish: doc.publish,
      annotations: doc.annotations,
      limits: doc.limits,
      exec: { binary: doc.exec.binary || false},
      updated: doc.updated
    };
    emit([doc.namespace, doc.updated], value);
    if (root !== doc.namespace) {
      emit([root, doc.updated], value);
    }
  } catch (e) {}
}

Then it would be expressed as following pseudo query

SELECT namespace, 
       name, 
       version, 
       publish, 
       annotations, 
       limits, 
       updated, 
       EXEC 
FROM   whisks 
WHERE  entitytype = 'action' 
       AND ( namespace = $namespace 
              OR rootns = $namespace ) 
       AND ( updated >= since 
             AND updated <= $upto ) 
ORDER  BY updated 

Here

  • rootns - Is a computed field created at time of creation (see below)
  • exec - Would be a computed field created at runtime during query result post processing
  • select criteria - Fields specified in select part are function of view used (see projected fields below)

Sometimes its not possible to completely map the view condition to query. For e.g. in subject queries. In such cases query partially applies the condition and then in DocumentHandler full conditions are applied thus result returned from query may be a super set of actual result set and actual result is thus obtained via "filtering"

Interpreting the keys

Query parameters startKey and endKey are closely tied to view implementation. Thus any other store would need to interpret the values in keys as per the view name. To support that this PR introduces a DocumentHandler trait which defines the db neutral operations

trait DocumentHandler {

  /**
   * Returns a JsObject having computed fields. This is a substitution for fields
   * computed in CouchDB views
   */
  def computedFields(js: JsObject): JsObject = JsObject.empty

  def fieldsRequiredForView(ddoc: String, view: String): Set[String] = Set()

  def transformViewResult(
    ddoc: String,
    view: String,
    startKey: List[Any],
    endKey: List[Any],
    includeDocs: Boolean,
    js: JsObject,
    provider: DocumentProvider)(implicit transid: TransactionId, ec: ExecutionContext): Future[JsObject]

  def shouldAlwaysIncludeDocs(ddoc: String, view: String): Boolean = false
}

There is a separate handle implementation provided for each entity i.e. ActivationHandler, WhisksHandler and SubjectHandler

Sections below would provide more details on the various methods

Computed Keys - computedFields

Some of the views in use make use of computed fields in the view logic. For e.g. view for rules computes a root namespace when namespace is a multi segment path

function (doc) {
  var PATHSEP = "/";
  var isRule = function (doc) {  return (doc.trigger !== undefined) };
  if (isRule(doc)) try {
    var ns = doc.namespace.split(PATHSEP);
    var root = ns[0];
    emit([doc.namespace, doc.updated], 1);
    if (root !== doc.namespace) {
      emit([root, doc.updated], 1);
    }
  } catch (e) {}
}

For supporting such cases in non CouchDB store DocumentHandler#computedFields can be used. This would compute the required field at time of put which the ArtifactStore implementation can then persist along with the main document. For e.g. for Mongo case such a computed json object can be stored under a separate _computed field internally.

These computed fields can then be used for defining declarative indexes. For now following computed fields are generated

  • WhiskEntity
    • rootns - Root name space.
  • Activations
    • nspath - Namespace with Entity path. The nspath is computed based on logic in whisks-filters.v2.1.0/activations
    • deleteLogs - This field is computed based on logic in logCleanup/byDateWithLogs i.e. flag would enabled for all such activations which are not sequence

See Mongo ArtifactStore for more details and examples

Projected Fields - fieldsRequiredForView

CouchDB views determine which all fields are included in returned document as part of query result. For non CouchDB cases this field list is defined by fieldsRequiredForView. ArtifactStore implementation can use it for projecting which all fields should be included

Join Support - transformViewResult

CouchDB supports joins which is used for subject queries to fetch the limits. Most other no sql db does not support such joins. So for such cases transformViewResult can be used which would be responsible for performing the join. For it to work the ArtifactStore need to provide an implementation of DocumentProvider which returns the raw json for provided id

trait DocumentProvider {
  protected[database] def get(id: DocId)(implicit transid: TransactionId): Future[Option[JsObject]]
}

MemoryViewMapper

This is an ArtifactStore implementation specific abstraction which converts the query keys passed to query to underlying storage query syntax. Each store implementation needs to have similar logic implementated which cover all possible scenarios from all the active views

Here the test suite plays an important role by validating that all query cases are covered.

TestSuite and future view changes

This test suite would need to be kept in sync with any change in view logic or addition of new views. Then only it can be ensured that other ArtifactStore implementation cover all the usecases as supported by default CouchDB. So going forward MemoryArtifactStore would become a cononical implementation of ArtifactStore contract!

Pending Work

  • Support for namespaceThrottlings/blockedNamespaces view
  • More coverage of entity tests

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

chetanmeh added 26 commits April 4, 2018 11:12
Also add tests for sorting, skipping and limiting result
This ensure that we can get test coverage stats easily as those are by default only calculated for main sources.

Also fix some missing type annotation warnings
Support for "all" view was removed with apache#3167
Validates that implicit join from subject record to limits is working
CouchDBRestStore would throw an AssertionError if there is a revision mismatch. So adapt the logic accordingly. That means that get should do a simple lookup without revision and let deserialization logic enforce revision check
- Add waits for views to be updated for query tests
- Perform cleanup post each test
- With join support the `_id` of the `value` is the limit one
- `doc` is null when no result for join
The subject/identity view has a check for sub document field. This may not be properly enforced as part of query and thus its possible  that query result set is super set of actual result. Thus transformViewResult may recheck the view conditions and opt to omit the result instance if it does not match all the conditions
Stop gab measure till AttachmentStore PR is merged
This instance need not be created as put is possible with existing activationStore instance. Current mode poses problem with MemoryArtifactStore as 2 separate instance of stores do not share the same state.
- NamespaceBlacklistTests - This test performs a direct interaction with CouchDB along with a generic interaction via `authStore` instance. Hence it cannot work with any non CouchDB storage

- SequenceMigrationTests -  This test requires a running OW setup and hits it using wsk rest client and also does direct interaction. Such test cannot work with MemoryArtifactStore as 2 different store instance would not share same state. This should however work with other ArtifactStore implementations as it is not aware of CouchDB
@chetanmeh chetanmeh removed the wip label Apr 10, 2018
@chetanmeh
Copy link
Member Author

chetanmeh commented Apr 10, 2018

This PR is now ready for review.

In a separate branch travis run was done with default ArtifactStoreProvider switched MemoryArtifactStoreProvider and all test (except few1) have passed.

ArtifactStoreBehavior test suite consist of ~30 tests which try to cover all possible access patterns for ArtifactStore and provides a coverage of ~92%

1 2 test needed to be ignored for run involving MemoryArtifactStore. See commit notes for details

@@ -28,3 +28,9 @@ case class DocumentTypeMismatchException(message: String) extends ArtifactStoreE
case class DocumentUnreadable(message: String) extends ArtifactStoreException(message)

case class PutException(message: String) extends ArtifactStoreException(message)

sealed abstract class ArtifactStoreRuntimeException(message: String) extends RuntimeException(message)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sealed class intended to restrict exceptions for views only vs documents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it sealed to follow the pattern used with ArtifactStoreException i.e. a base RuntimeException which can be used to indicate logical errors in code flow. So can be used for both views and document

import whisk.core.entity._

trait ArtifactStoreWhisksQueryBehaviors extends ArtifactStoreBehaviorBase {
this: FlatSpec =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just extend?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Modified the code to extend FlatSpec

* See the License for the specific language governing permissions and
* limitations under the License.
*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a java/scala doc (comment) explaining the purpose of the view mapper.
do you anticipate this will be used outside of testing (in which case should this move to the tests package)?

import scala.reflect.ClassTag
import scala.util.Try

object MemoryArtifactStoreProvider extends ArtifactStoreProvider {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a java/scala doc (comment) explaining the purpose of memory store provider; furthermore, do you anticipate this will be used outside of testing (in which case should this move to the tests package)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would add docs. So far no concrete usecase to use them outside testing. One major reason to put them in main codebase was getting coverage data easily.

Typically Code Coverage data is only calculated for production code but here I would like to track coverage of Memory store implementation logic such that it can be ensured that tests cover all aspects of the implementation

private val triggerFields = commonFields

protected val supportedTables = Set(
"whisks.v2.1.0/actions",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these be parameterized on the view versions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean have a separate variable for version and refer to that in name? That can be done

Here I wanted to explicitly list the actual version which current implementation is coded for. If any change in version is done in CouchDB views and config in application.conf updated this hardcoding would ensure that we take a conscious update here and ensure that implementations logic is adapted for newer version

@rabbah
Copy link
Member

rabbah commented Apr 13, 2018

@dubee @csantanapr @markusthoemmes 🙏 a pg to be safe although I expect Travis to be sufficient.

@chetanmeh
Copy link
Member Author

@rabbah Currently there is a ignored test count with skip in ArtifactStoreQueryBehaviors as it fails for CouchDB. Looks like a count call with skip does not work as intended so that flow does not work

$ curl -H 'Host: localhost:5984' -H 'Authorization: Basic d2hpc2tfYWRtaW46c29tZV9wYXNzdzByZA==' -H 'Accept: application/json' -H 'User-Agent: akka-http/10.0.10' 'http://localhost:5984/whisk_local_activations/_design/whisks-filters.v2.1.0/_view/activations?startkey=%5B%22artifactTCK_aAAS_ns_zKXPz/testact%22,0%5D&endkey=%5B%22artifactTCK_aAAS_ns_zKXPz/testact%22,%22%EF%BF%B0%22,%22%EF%BF%B0%22%5D&skip=4&reduce=true'
{"rows":[

]}

In above curl for same setup if I remove skip=4 I get following response

{"rows":[
{"key":null,"value":10}
]}

Should I create separate issue to track this?

@markusthoemmes
Copy link
Contributor

PG2 3021 🔵

@chetanmeh
Copy link
Member Author

@markusthoemmes If the PG test has passed can this PR be approved and merged?

@rabbah rabbah merged commit 220a3f4 into apache:master Apr 19, 2018
@chetanmeh
Copy link
Member Author

Currently there is a ignored test count with skip in ArtifactStoreQueryBehaviors as it fails for CouchDB

Opened #3560 to track this

BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
…apache#3517)

Provides MemoryArtifactStore as an in-memory ArtifactStore implementation which can be used for unit testing and ArtifactStore SPI contract validation.

CouchDB views determine which fields are included in returned document as part of query results. For non CouchDB cases this field list is defined by fieldsRequiredForView. ArtifactStore implementations can use it for projecting which fields should be included.

CouchDB supports joins which is used for subject queries to fetch the limits. Most other NoSQL dbs do not support such joins. So for such cases transformViewResult can be used which would be responsible for performing the join. For it to work the ArtifactStore needs to provide an implementation of DocumentProvider which returns the raw json for a provided doc id.

The MemoryViewMapper is an ArtifactStore implementation specific abstraction which converts the query keys passed to query to underlying storage query syntax. Each store implementation needs to have similar logic implemented to cover all possible scenarios from all the active views. Here the test suite plays an important role by validating that all query cases are covered.

The added test suites need to be kept in sync with any change in view logic or addition of new views. Then only it can be ensured that other ArtifactStore implementation cover all the use cases as supported by default CouchDB. So going forward MemoryArtifactStore would become a canonical implementation of ArtifactStore contract.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants