Revamped README and CONTRIBUTING.
dblock committed Apr 3, 2017
1 parent d9938ed commit 3a2663c
Showing 4 changed files with 321 additions and 205 deletions.
165 changes: 165 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,165 @@
# Contributing

This project is the work of [many contributors](https://github.com/rcongiu/Hive-JSON-Serde/graphs/contributors).

You're encouraged to submit [pull requests](https://github.com/rcongiu/Hive-JSON-Serde/pulls), [propose features and discuss issues](https://github.com/rcongiu/Hive-JSON-Serde/issues).

In the examples below, substitute your GitHub username for `contributor` in URLs.

## Fork the Project

Fork the [project on GitHub](https://github.com/rcongiu/Hive-JSON-Serde) and check out your copy.

```
git clone https://github.com/contributor/Hive-JSON-Serde.git
cd Hive-JSON-Serde
git remote add upstream https://github.com/rcongiu/Hive-JSON-Serde.git
```

## Build

Ensure that you can build the project and run tests.

```
git checkout develop
mvn test
```

### Architecture

JSON encoding and decoding use a somewhat modified version of [Douglas Crockford's JSON library](https://github.com/douglascrockford/JSON-java), which is included in the distribution.

The SerDe builds a series of wrappers around `JSONObject`. Serialization and deserialization run for every record, possibly billions of them, so we want to minimize object creation. Instead of serializing to and deserializing from an `ArrayList`, the SerDe keeps the `JSONObject` and builds a cached `ObjectInspector` around it. When deserializing, Hive gets a `JSONObject` and a `JSONStructObjectInspector` to read from it.

Hive has `Structs`, `Maps`, `Arrays` and primitives, while JSON has `Objects`, `Arrays` and primitives. Hive `Maps` and `Structs` are both represented as a JSON `Object`, which is less restrictive than a Hive map: a JSON `Object` can mix keys and values of different types, while Hive expects you to declare the type of the map (e.g. `map<string,string>`). The user is responsible for making the JSON data structure match the Hive table declaration.
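
As a rough illustration of the mismatch between a JSON `Object` and a declared Hive map type, the sketch below writes two hypothetical JSON records to a temporary file. The field names (`id`, `tags`) and the values are invented for this example and do not come from the project. Only the first record fits a column declared as `map<string,string>`.

```shell
set -eu
# Hypothetical sample data: field names and values are invented for illustration.
data=$(mktemp)
cat > "$data" <<'EOF'
{"id": 1, "tags": {"color": "red", "size": "large"}}
{"id": 2, "tags": {"color": "blue", "count": 3, "extra": {"a": true}}}
EOF
# Record 1: every value under "tags" is a string, matching map<string,string>.
# Record 2: still valid JSON, but "tags" mixes a string, a number and an object,
# so it does not match that declaration.
cat "$data"
```

Both lines are valid JSON, so nothing on the JSON side flags the second one; keeping the data consistent with the table definition is up to you.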

See [www.congiu.com](http://www.congiu.com/?s=serde) for details.

### Compiling for Specific Targets

Use Maven to compile the SerDe. This project uses Maven profiles to support multiple versions of Hive/CDH.

#### CDH4

```
mvn -Pcdh4 clean package
```

#### CDH5

```
mvn -Pcdh5 clean package
```

#### HDP 2.3

```
mvn -Phdp23 clean package
```

### Generate a JAR

All output is generated into `json-serde/target/json-serde-VERSION-jar-with-dependencies.jar`.

```
mvn package
```

#### Specific Versions of Hive

If you want to compile the SerDe against a different version of the Cloudera libraries, use `-D` to override the version property.

```
mvn -Dcdh.version=0.9.0-cdh3u4c-SNAPSHOT package
```

For example, to build for Hive 0.14.0 and Cloudera 1.0.0:

```
mvn -Pcdh5 -Dcdh5.hive.version=1.0.0 clean package
```

## Write Tests

Try to write a test that reproduces the problem you're trying to fix or describes a feature that you want to build.

We definitely appreciate pull requests that highlight or reproduce a problem, even without a fix.

## Write Code

Implement your feature or bug fix.
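
The push step later in this guide assumes a branch named `my-feature-branch`. If you have not created one yet, the following is a minimal sketch of doing so. It runs in a throwaway repository so it is safe to try; the scratch directory, the placeholder identity, and the empty initial commit are assumptions of the example, not part of the project's workflow.

```shell
set -eu
# Throwaway repository standing in for your clone (an assumption of this sketch).
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Your Name"
git config user.email "contributor@example.com"
git commit -q --allow-empty -m "Initial commit"
# Mirror the branch name used in the build step.
git branch -M develop
# Create the feature branch and switch to it.
git checkout -q -b my-feature-branch
# Show the branch you are now on.
git rev-parse --abbrev-ref HEAD
```

In your real clone, only the `git checkout -b my-feature-branch` line is needed, run from the branch you want to start from.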

## Write Documentation

Document any external behavior in the [README](README.md).

## Update Changelog

Add a line to [CHANGELOG](CHANGELOG.md) under the *Next* release.
Make it look like every other line, including your name and a link to your GitHub account.

## Commit Changes

Make sure git knows your name and email address:

```
git config --global user.name "Your Name"
git config --global user.email "contributor@example.com"
```

Writing good commit logs is important. A commit log should describe what changed and why.

```
git add ...
git commit
```
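
As a hedged illustration of a commit log that says what changed and why, the sketch below makes a commit in a throwaway repository. The file name and the message text are placeholders invented for this example. Passing `-m` twice gives `git commit` a subject line and a separate body paragraph.

```shell
set -eu
# Throwaway repository so the example does not touch your real clone.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Your Name"
git config user.email "contributor@example.com"
echo "placeholder" > example.txt
git add example.txt
# First -m: short summary of what changed. Second -m: why it changed.
git commit -q \
  -m "Add example file" \
  -m "Explain here why the change is needed, so reviewers do not have to guess."
# Print the full message of the commit that was just created.
git log -1 --format=%B
```

Running plain `git commit`, as shown above, opens your editor instead, which is usually more comfortable for longer messages.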

## Push

```
git push origin my-feature-branch
```

## Make a Pull Request

Go to https://github.com/contributor/Hive-JSON-Serde and select your feature branch.
Click the 'Pull Request' button and fill out the form. Pull requests are usually reviewed within a few days.

## Rebase

If you've been working on a change for a while, rebase with upstream/master.

```
git fetch upstream
git rebase upstream/master
git push origin my-feature-branch -f
```

## Update CHANGELOG Again

Update the [CHANGELOG](CHANGELOG.md) with the pull request number. A typical entry looks as follows.

```
* [#123](https://github.com/rcongiu/Hive-JSON-Serde/pull/123): Reticulated splines - [@contributor](https://github.com/contributor).
```
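
If you would rather not type the entry by hand, the sketch below assembles a line of the same shape with `printf`. The variable names are hypothetical, and the values are the placeholders from the entry above; substitute your own pull request number, summary and username.

```shell
set -eu
# Placeholder values taken from the sample entry; replace them with your own.
pr=123
user=contributor
summary="Reticulated splines"
repo="https://github.com/rcongiu/Hive-JSON-Serde"
# Build the Markdown bullet: PR link, summary, then a link to the author.
entry=$(printf '* [#%s](%s/pull/%s): %s - [@%s](https://github.com/%s).' \
  "$pr" "$repo" "$pr" "$summary" "$user" "$user")
echo "$entry"
```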

Amend your previous commit and force push the changes.

```
git commit --amend
git push origin my-feature-branch -f
```

## Check on Your Pull Request

Go back to your pull request after a few minutes and see whether it passed muster with Travis-CI. Everything should look green; otherwise, fix the issues and amend your commit as described above.

## Be Patient

It's likely that your change will not be merged right away and that the nitpicky maintainers will ask you to do more, or fix seemingly benign problems. Hang in there!

## Thank You

Please do know that we really appreciate and value your time and work. We love you, really.

