Spark Plugin

This plugin enables Assertainty integration with Apache Spark, using the Kotlin Spark API. It is parameterized in org.apache.spark.sql.Dataset and org.apache.spark.sql.Column.

Gradle

testImplementation("io.github.peterattardo.assertainty:spark-plugin:0.2.0")
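For context, the coordinate above belongs in the test dependencies of a Gradle Kotlin DSL build. A minimal sketch of such a block, assuming Spark itself is not already provided elsewhere on the test classpath (the Spark artifact and versions below are illustrative, not prescribed by the plugin):

dependencies {
    testImplementation("io.github.peterattardo.assertainty:spark-plugin:0.2.0")
    // Illustrative: bring in Spark for local tests if it is not already available transitively.
    testImplementation("org.apache.spark:spark-sql_2.12:3.3.0")
}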

Usage

val ds = // create a dataset (see the sketch after these examples for one way)
ds.assert {
    +Column("someColumn") // grouping column
    +"someOtherColumn" // the Spark DSL adds this convenience function to the core DSL to specify grouping columns by String.
    
    // because the plugin is parameterized in org.apache.spark.sql.Column, assertions can take full advantage of that class's methods
    always(functions.length(Column("someIdColumn")) eq 15)
}

// Logically identical to assert, but under the hood each assertion runs as its own call to RelationalGroupedDataset#agg()
ds.assertSeparateQueries {
    +"someColumn"
    +"someOtherColumn"
    
    minSum(Column("revenue"), 100_000) // we're making good money, eh?
    minCount(100) // averaging $1000/sale is impressive
}
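The ds placeholder above can be any Dataset. As one way to obtain it, here is a minimal sketch using a plain local SparkSession; the app name, master, and input path are illustrative:

import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
    .appName("assertainty-example") // illustrative
    .master("local[*]")
    .getOrCreate()

val ds: Dataset<Row> = spark.read().parquet("path/to/sales") // hypothetical input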

Note

Like the other plugins, the Spark plugin defaults to generating a single combined query. Because duplicate columns between assertions are likely (count() in particular), all columns are aliased during the query-building process. To avoid that behavior, use assertSeparateQueries, which gives each assertion its own aggregation call, at the cost of more passes over the data and slower execution.
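To make the trade-off concrete, here is roughly what the two strategies look like when written directly against the Spark API. This is a conceptual sketch, not the plugin's actual query builder, and the alias names are made up:

// Combined: one pass over the data, every aggregate in a single agg() call,
// with aliases keeping duplicate columns (such as two counts) distinct.
val combined = ds.groupBy(Column("someColumn"), Column("someOtherColumn"))
    .agg(
        functions.sum(Column("revenue")).alias("assertion_0_sum"),
        functions.count(functions.lit(1)).alias("assertion_1_count")
    )

// Separate: one agg() call per assertion, so the data is scanned once per assertion.
val sums = ds.groupBy(Column("someColumn"), Column("someOtherColumn"))
    .agg(functions.sum(Column("revenue")))
val counts = ds.groupBy(Column("someColumn"), Column("someOtherColumn"))
    .agg(functions.count(functions.lit(1)))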