Mortar is a platform-as-a-service for Hadoop. With Mortar, you can run jobs on Hadoop using Apache Pig and Python without any special training.
Here we've included some example scripts that explore using MongoDB with Hadoop. These scripts use a sample of tweets from a single day, loaded into a read-only, publicly available MongoDB instance. To start using them:
Clone this repository to your computer and register it as a project with Mortar:

```sh
git clone git@github.com:mortardata/mongo-pig-examples.git
cd mongo-pig-examples
mortar register mongo-pig-examples
```
Once you've set up the project, use the `mortar illustrate` command to show data flowing through a given script, and use `mortar run` to run the script on a Hadoop cluster.
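For example (the script name here is just a placeholder; substitute any of the Pig scripts in this project):

```sh
# Show a sample of data flowing through each step of the script
mortar illustrate <script_name>

# Run the script on a Hadoop cluster
mortar run <script_name>
```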
For lots more help and tutorials on running Mortar, check out the Mortar Help site.
This Pig script returns some basic information about a MongoDB collection. Its output fields are:
- Field name. Embedded fields have their parent's field name prepended to their name. Every field that appears in any document in the collection is listed.
- Unique value count. The number of unique values associated with the field.
- Example value. An example value for the field.
- Example value type. The data type of the example value.
- Value count. The number of times the example value appeared for this field in the collection.
Each field is listed up to five times, once for each of its five most common example values, as sketched below.
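As a rough sketch of the Pig behind the "five most common example values" output (this is not the script's actual implementation; the relation names, input path, and the assumption that documents have already been flattened into (field name, value) pairs are placeholders):

```pig
-- Sketch only: assumes documents have already been flattened into
-- (field_name, value) pairs by an earlier step that is not shown here.
fields       = LOAD 'field_value_pairs' AS (field_name:chararray, value:chararray);

-- Count how often each value appears for each field
by_value     = GROUP fields BY (field_name, value);
value_counts = FOREACH by_value GENERATE
                   group.field_name AS field_name,
                   group.value      AS example_value,
                   COUNT(fields)    AS value_count;

-- Keep the five most common example values for each field
by_field     = GROUP value_counts BY field_name;
top_values   = FOREACH by_field {
                   ordered = ORDER value_counts BY value_count DESC;
                   limited = LIMIT ordered 5;
                   GENERATE FLATTEN(limited);
               };

STORE top_values INTO 'collection_characterization';
```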
This Pig script returns a single text field containing the Pig schema of the loaded collection. That schema can be copied directly into the MongoLoader constructor to load the collection. See Using MongoDB with Mortar for an explanation of why you might want to load your collection using a schema.
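For instance, the generated schema string would be pasted into the MongoLoader constructor roughly like this (a sketch only; the jar paths, connection string, and schema shown are placeholders rather than values from this project):

```pig
-- Register the MongoDB connector jars (paths are placeholders)
REGISTER '/path/to/mongo-java-driver.jar';
REGISTER '/path/to/mongo-hadoop-core.jar';
REGISTER '/path/to/mongo-hadoop-pig.jar';

-- Paste the schema string produced by this script into the constructor
tweets = LOAD 'mongodb://<host>:27017/<database>.<collection>'
         USING com.mongodb.hadoop.pig.MongoLoader('id:chararray, text:chararray, user:map[]');
```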
This Pig script goes through a small sample of a single day's worth of tweets and counts the number of times coffee was tweeted, bucketed into two-hour time blocks in the tweeter's local time.
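A very rough sketch of the counting logic (not the actual script; it assumes the tweets have already been loaded with a text field and a local_hour field derived from the tweeter's local time, and all names here are placeholders):

```pig
-- Sketch only: assumes each tweet record already carries its text and the
-- hour of day (0-23) in the tweeter's local time.
tweets        = LOAD 'tweet_sample' AS (text:chararray, local_hour:int);

-- Keep only tweets that mention coffee (case-insensitive)
coffee_tweets = FILTER tweets BY LOWER(text) MATCHES '.*coffee.*';

-- Bucket into two-hour blocks: hours 0-1 -> bucket 0, 2-3 -> 1, ..., 22-23 -> 11
bucketed      = FOREACH coffee_tweets GENERATE local_hour / 2 AS hour_bucket;

-- Count coffee tweets in each two-hour block
by_bucket     = GROUP bucketed BY hour_bucket;
coffee_counts = FOREACH by_bucket GENERATE
                    group           AS hour_bucket,
                    COUNT(bucketed) AS num_coffee_tweets;

STORE coffee_counts INTO 'coffee_counts_by_time_block';
```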