Idiomatic Clojure bindings for the Apache Spark framework.
Currently there exists no Clojure binding library atop the Apache Spark framework that is pleasant to use. This project aims to fix that by providing functions that operate on traditional Clojure collections and Spark RDD objects alike.
For example, Spark has a concept of `map`. But wait, doesn't Clojure have one of those as well? It does. So why should you, as a developer, worry about another library and namespace when both functions have the same expected output and merely operate on different objects? Well, you don't have to anymore...
While building clj-spark we looked at how the current Spark bindings handle function serialization, and decided there must be a better way. Instead of making custom versions of `map`, `reduce`, etc., clj-spark overrides the default `clojure.core` functionality and merely adds to it the ability to operate on Spark RDD objects as well!
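To make that concrete, here is a minimal sketch of what this looks like in practice. The namespace name `clj-spark.api` is a guess for illustration, not necessarily the library's actual layout; `parallelize` is used exactly as in the example further down, and we assume referring the namespace in shadows `clojure.core`'s `map`/`reduce` as described above.

```clojure
;; A minimal sketch, assuming the public namespace is clj-spark.api
;; (hypothetical name) and that referring it in shadows the
;; clojure.core functions it overrides.
(ns example.core
  (:refer-clojure :exclude [map reduce count])
  (:require [clj-spark.api :refer :all]))

;; parallelize lifts a local Clojure collection into a Spark RDD.
(def rdd (parallelize [1 2 3 4]))

;; The overridden map/reduce accept the RDD directly, just as they
;; would a plain Clojure sequence. Note the anonymous fn wrapper;
;; see the caveat about macros below.
(reduce + (map (fn [x] (* x x)) rdd)) ;=> 30
```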
TODO: usage examples, once the API has settled down.
Now, given that we do things quite differently from both the older clj-spark library and the newer flambo, there are a few things you should know:
- All projects must be AOT-compiled (see the `project.clj` sketch after this list)
- All methods that accept a function cannot be passed a bare macro
  - For example: `(map count (parallelize [1 2 3 4]))` will cause an error because `count` is now a Clojure macro. Instead, this line has to be written as `(map (fn [x] (count x)) (parallelize [1 2 3 4]))`, wrapping the macro in an anonymous function (see the sketch after this list).
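For the AOT requirement, a `project.clj` along these lines should work; the project name, version numbers, and dependency coordinates below are placeholders, not this library's actual ones.

```clojure
;; Hypothetical project.clj illustrating the AOT requirement.
;; Names, versions, and coordinates are placeholders.
(defproject my-spark-job "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]]
  ;; :aot :all compiles every namespace ahead of time so that Spark
  ;; can find and serialize the generated classes on the workers.
  :aot :all
  :main my-spark-job.core)
```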
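And for the macro caveat, here are the failing and working forms side by side; as above, the surrounding namespace setup is an assumption.

```clojure
;; Fails: count is redefined as a macro here, and a macro has no
;; runtime value that can be handed to map.
;; (map count (parallelize [["a"] ["b" "c"]]))

;; Works: wrapping the macro in an anonymous function produces a real
;; function value for map to serialize and ship to the cluster.
(map (fn [x] (count x)) (parallelize [["a"] ["b" "c"]]))
;; => RDD whose elements are the counts: 1, 2
```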
Thanks to @TheClimateCorporation/clj-spark for their initial implementation of the library and for the inspiration to move the concept forward.