-
Notifications
You must be signed in to change notification settings - Fork 3
Map Reduce
OStore map-reduce is specified by
- The ScalaDoc for the MapReduce object. Map and reduce operations are defined by classes that implement the traits defined in that object.
- The Configuration page describes how to specify derived map and reduce tables within a database configuration.
Some simple examples of map and reduce are given here. The Examples page includes additional map-reduce examples.
The package com.persist.map
contains sample Map classes.
The package com.persist.reduce
contains sample Reduce classes.
These classes are used in the various OStore examples and may be changed
in future releases. If you want to use one of these classes, make your own
copy into your own package.
This example shows how to compute inverted indexes. Suppose table1 has the following items
Key Value
1 {"a":7,"b":3}
2 {"a":2,"b":99}
3 {"a":0,"b":6}
We can then define tablea and tableb within the database configuration as follows.
{"name":"tablea", "map":{"from":"table1", "act":"com.test.Index", "field":"a"}}
{"name":"tableb", "map":{"from":"table1", "act":"com.test.Index", "field":"b"}}
Next we define the mapping action Index as follows.
package com.test
import com.persist.JsonOps._
class Index() extends com.persist.MapReduce.Map {
private lazy val field = jgetString(options, "field")
def to(key: JsonKey, value: Json) = {
val newValue = jget(value, field)
List((JsonArray(newValue, key), null))
}
def from(key: JsonKey): JsonKey = {
jgetArray(key,1)
}
}
Then tablea will have the following items.
Key Value
[0,3] null
[2,2] null
[7,1] null
An tableb will have the following items.
Key Value
[6,3] null
[3,1] null
[99,2] null
This example shows how to count the number of items with a common prefix. Suppose table2 has the following items.
Key Value
["a",3] null
["a",5] null
["a",6] null
["b",1] null
["b",2] null
We can then define the count table within the database configuration as follows.
{"name":"count", "reduce":{"from":"table2", "act":"com.test.Count", "size":1}}
Next we define the reduce action Count as follows.
package com.test
import com.persist.JsonOps._
class Count() extends com.persist.MapReduce.Reduce {
val zero = 0L
def item(key: JsonKey, value: Json): Json = 1L
def add(value1: Json, value2: Json): Json = jgetLong(value1) + jgetLong(value2)
def subtract(value1: Json, value2: Json): Json = jgetLong(value1) - jgetLong(value2)
}
Then table count will have the following form
Key Value
["a"] 3
["b"] 2
This example shows a join. Suppose table Person contains
Key Value
"Bob" {"city":3}
"Jim" {"city":2}
"Sam" {"city":1}
"Tom" {"city":2}
and table City contains
Key Value
1 "Seattle"
2 "Boston"
3 "Chicago"
We can then define PersonCity, Person1, Person2, and Person3 within the database configuration as follows.
{"name":"PersonCity", "prefix":{"name":"City", "size":1},
"map":[{"from":"Person", "act":"com.test.Invert"},
{"from":"City", "act":"com.test.Wrap", "toprefix": "City"}]}
{"name":"Person1", "map":{"from":"PersonCity", "act2":"com.test.Combine", "fromprefix":"City"}}
{"name":"Person2", "reduce":{"from":"Person1", "act":"com.test.Simplify", "size":1}}
{"name":"Person3", "map":{"from":"Person2", "act":"com.test.Flat"}}
The implementation for the Invert, Wrap, and Flat simple map functions and the Simplify reduce function is left as an exercise. The implementation of Combine is as follows.
package com.test
import com.persist.JsonOps._
class Combine() extends com.persist.MapReduce.Map2 {
def to(prefixKey: JsonKey, prefixValue: Json, key: JsonKey, value: Json):
Traversable[(JsonKey, Json)] = {
val key = JsonArray(jget(key,1),jget(key,0))
val value = JsonObject("city"->prefixValue)
List((key,value))
}
def from(key: JsonKey): JsonKey = JsonArray(jget(key,1),jget(key,0))
}
Table PersonCity will have the following items.
Key Value
[1,"Sam"] null
[2,"Jim"] null
[2,"Tom"] null
[3,"Bob"] null
The subtable City of PersonCity will have the following items.
Key Value
[1] "Seattle"
[2] "Boston"
[3] "Chicago"
The table Person1 will have the following items produced by joining the previous two tables using a prefix map.
Key Value
["Bob",3] {"city":"Chicago"}
["Jim",2] {"city":"Boston"}
["Sam",1] {"city":"Seattle"}
["Tom",2] {"city":"Boston"}
The table Person2 will have the following items produced by using reduce to shorten the keys.
Key Value
["Bob"] {"city":"Chicago"}
["Jim"] {"city":"Boston"}
["Sam"] {"city":"Seattle"}
["Tom"] {"city":"Boston"}
Finally Person3 will have the following items produced by using map to simplify the keys.
Key Value
"Bob" {"city":"Chicago"}
"Jim" {"city":"Boston"}
"Sam" {"city":"Seattle"}
"Tom" {"city":"Boston"}