Ecclesiastical Latin IPA: /ˈʃi.o/, [ˈʃiː.o], [ˈʃi.i̯o]
Verb: I can, know, understand, have knowledge.
Scio is a Scala API for Google Cloud Dataflow and Apache Beam inspired by Spark and Scalding. See the current API documentation for more information.
Scio is being donated to Apache Beam as a Scala DSL (BEAM-302).
- Scala API close to that of Spark and Scalding core APIs
- Unified batch and streaming programming model [1, 2]
- Fully managed service [2]
- Integration with Google Cloud products: Cloud Storage, BigQuery, Pub/Sub, Datastore, Bigtable [2]
- HDFS source/sink
- Interactive mode with Scio REPL
- Type safe BigQuery
- Integration with Algebird and Breeze
- Pipeline orchestration with Scala Futures
- Distributed cache
[1] provided by Apache Beam
[2] provided by Google Cloud Dataflow
The ubiquitous word count example can be run directly with SBT in local mode, using README.md as input:
sbt "project scio-examples" "run-main com.spotify.scio.examples.WordCount --input=README.md --output=wc"
cat wc/part-00000-of-00001.txt
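For orientation, a word count pipeline in Scio looks roughly like the sketch below. It is not a copy of the bundled `com.spotify.scio.examples.WordCount`: the object name `WordCountSketch` and the helper `tokenize` are made up for illustration, and the calls (`ContextAndArgs`, `textFile`, `countByValue`, `saveAsTextFile`, `sc.close()`) reflect the Scio API as of this README and may differ in other releases.

```scala
import com.spotify.scio._

object WordCountSketch {
  // Hypothetical helper: split a line into words on anything that is
  // not a letter or an apostrophe, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.split("[^a-zA-Z']+").filter(_.nonEmpty).toSeq

  def main(cmdlineArgs: Array[String]): Unit = {
    // Parse --input/--output style arguments and create the pipeline context.
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))                        // one element per line
      .flatMap(tokenize(_))                           // one element per word
      .countByValue                                   // (word, count) pairs
      .map { case (word, count) => s"$word: $count" } // format for output
      .saveAsTextFile(args("output"))

    // Submit the pipeline for execution.
    sc.close()
  }
}
```

The chain of `flatMap`, `countByValue` and `map` reads much like the same program on a Spark RDD or a Scalding pipe, which is the similarity the feature list refers to.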
- Scio Wiki - wiki page
- ScalaDocs - current API documentation
- Big Data Rosetta Code - comparison of code snippets in Scio, Scalding and Spark
Scio includes the following artifacts:
- scio-core: core library
- scio-test: test utilities, add to your project as a "test" dependency
- scio-bigquery: add-on for BigQuery, included in scio-core but can also be used standalone
- scio-bigtable: add-on for Bigtable
- scio-extra: extra utilities for working with collections, Breeze, etc.
- scio-hdfs: add-on for HDFS
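As a sketch of how these artifacts might be wired into an SBT build: the fragment below assumes the `com.spotify` group ID (inferred from the `com.spotify.scio` package name) and uses a placeholder version string, so check both against the published artifacts before relying on it.

```scala
// build.sbt (illustrative fragment)

// Placeholder, not a real version: substitute a released Scio version.
val scioVersion = "x.y.z"

libraryDependencies ++= Seq(
  // Core library; scio-bigquery is already included in it.
  "com.spotify" %% "scio-core" % scioVersion,
  // Test utilities, scoped to the "test" configuration as described above.
  "com.spotify" %% "scio-test" % scioVersion % "test"
)
```

Add-ons such as scio-bigtable, scio-extra or scio-hdfs would be added the same way, only when a project needs them.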
Copyright 2016 Spotify AB.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0