Skip to content

Map Reduce

nestorpersist edited this page Sep 23, 2012 · 60 revisions

OStore map-reduce is specified by

  • The ScalaDoc for the MapReduce object. Map and reduce operations are defined by classes that implement the traits defined in that object.
  • The Configuration page describes how to specify derived map and reduce tables within a database configuration.

Some simple examples of map and reduce are given here. The Examples page includes additional map-reduce examples.

Sample Code

The package com.persist.map contains sample Map classes. The package com.persist.reduce contains sample Reduce classes. These classes are used in the various OStore examples and may be changed in future releases. If you want to use one of these classes, make your own copy into your own package.

Maps

This example shows how to compute inverted indexes. Suppose table1 has the following items

Key     Value
1       {"a":7,"b":3}
2       {"a":2,"b":99}
3       {"a":0,"b":6}

We can then define tablea and tableb within the database configuration as follows.

{"name":"tablea", "map":{"from":"table1", "act":"com.test.Index", "field":"a"}}
{"name":"tableb", "map":{"from":"table1", "act":"com.test.Index", "field":"b"}}

Next we define the mapping action Index as follows.

package com.test
import com.persist.JsonOps._
class Index() extends com.persist.MapReduce.Map {
  private lazy val field = jgetString(options, "field")
  def to(key: JsonKey, value: Json) = {
    val newValue = jget(value, field)
    List((JsonArray(newValue, key), null))
  }
  def from(key: JsonKey): JsonKey = {
    jgetArray(key,1)
  }
}

Then tablea will have the following items.

Key     Value
[0,3]   null
[2,2]   null
[7,1]   null

An tableb will have the following items.

Key     Value
[6,3]    null
[3,1]    null
[99,2]   null

Reduce

This example shows how to count the number of items with a common prefix. Suppose table2 has the following items.

Key       Value
["a",3]   null
["a",5]   null
["a",6]   null
["b",1]   null
["b",2]   null

We can then define the count table within the database configuration as follows.

{"name":"count", "reduce":{"from":"table2", "act":"com.test.Count", "size":1}}

Next we define the reduce action Count as follows.

package com.test
import com.persist.JsonOps._
class Count() extends com.persist.MapReduce.Reduce {
  val zero = 0L
  def item(key: JsonKey, value: Json): Json = 1L
  def add(value1: Json, value2: Json): Json = jgetLong(value1) + jgetLong(value2)
  def subtract(value1: Json, value2: Json): Json = jgetLong(value1) - jgetLong(value2)
}

Then table count will have the following form

Key     Value
["a"]   3
["b"]   2

Prefix Maps

This example shows a join. Suppose table Person contains

Key    Value
"Bob"  {"city":3}
"Jim"  {"city":2}
"Sam"  {"city":1}
"Tom"  {"city":2}

and table City contains

Key    Value
1      "Seattle"
2      "Boston"
3      "Chicago"

We can then define PersonCity, Person1, Person2, and Person3 within the database configuration as follows.

{"name":"PersonCity", "prefix":{"name":"City", "size":1},
                      "map":[{"from":"Person", "act":"com.test.Invert"},
                             {"from":"City", "act":"com.test.Wrap", "toprefix": "City"}]}
{"name":"Person1", "map":{"from":"PersonCity", "act2":"com.test.Combine", "fromprefix":"City"}}
{"name":"Person2", "reduce":{"from":"Person1", "act":"com.test.Simplify", "size":1}}
{"name":"Person3", "map":{"from":"Person2", "act":"com.test.Flat"}}

The implementation for the Invert, Wrap, and Flat simple map functions and the Simplify reduce function is left as an exercise. The implementation of Combine is as follows.

package com.test
import com.persist.JsonOps._
class Combine() extends com.persist.MapReduce.Map2 {
  def to(prefixKey: JsonKey, prefixValue: Json, key: JsonKey, value: Json): 
        Traversable[(JsonKey, Json)] = {
    val key = JsonArray(jget(key,1),jget(key,0))
    val value = JsonObject("city"->prefixValue)
    List((key,value))
  }
  def from(key: JsonKey): JsonKey = JsonArray(jget(key,1),jget(key,0))
}

Table PersonCity will have the following items.

Key        Value
[1,"Sam"]  null
[2,"Jim"]  null
[2,"Tom"]  null
[3,"Bob"]  null

The subtable City of PersonCity will have the following items.

Key    Value
[1]    "Seattle"
[2]    "Boston"
[3]    "Chicago"

The table Person1 will have the following items produced by joining the previous two tables using a prefix map.

Key        Value
["Bob",3]  {"city":"Chicago"}
["Jim",2]  {"city":"Boston"}
["Sam",1]  {"city":"Seattle"}
["Tom",2]  {"city":"Boston"}

The table Person2 will have the following items produced by using reduce to shorten the keys.

Key      Value
["Bob"]  {"city":"Chicago"}
["Jim"]  {"city":"Boston"}
["Sam"]  {"city":"Seattle"}
["Tom"]  {"city":"Boston"}

Finally Person3 will have the following items produced by using map to simplify the keys.

Key    Value
"Bob"  {"city":"Chicago"}
"Jim"  {"city":"Boston"}
"Sam"  {"city":"Seattle"}
"Tom"  {"city":"Boston"}
Clone this wiki locally