Please implement the code for the following tasks and submit a zip file via Canvas. Make sure to write your code so that it would work in a real distributed execution on Hadoop. You can use the unit tests to debug and validate your implementation. Note that a successful unit test execution does not necessarily mean that your solution is 100% correct.
Make sure that your submitted code compiles.
Implement a sparse vector backed by a hashmap in the class SparseVector. Next, please implement a distributed matrix vector multiplication via a broadcast-join in the class SparseMatrixVectorMultiplication, analogous to the implementation from our exercises. Please use a dense representation for all vectors with a sparsity of less than 50% and a sparse representation otherwise.
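A hashmap-backed sparse vector could look roughly like the following sketch. The class name `SparseVectorSketch` and its methods are illustrative assumptions, not the assignment's actual `SparseVector` interface; the `density()` method shows one way to decide between the dense and sparse representations, and `dot()` shows the per-row work a broadcast-join matrix-vector multiplication would do against a broadcast dense vector.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a hashmap-backed sparse vector; the real
// SparseVector class in the assignment may require a different interface.
class SparseVectorSketch {
    private final Map<Integer, Double> entries = new HashMap<>();
    private final int dimension;

    SparseVectorSketch(int dimension) {
        this.dimension = dimension;
    }

    void set(int index, double value) {
        if (value == 0.0) {
            entries.remove(index); // keep the map sparse: store non-zeros only
        } else {
            entries.put(index, value);
        }
    }

    double get(int index) {
        return entries.getOrDefault(index, 0.0);
    }

    // Fraction of non-zero entries; comparing this against the 50%
    // threshold suggests whether to keep the sparse or dense representation.
    double density() {
        return (double) entries.size() / dimension;
    }

    // Dot product with a broadcast dense vector, iterating only over
    // the stored non-zero entries.
    double dot(double[] dense) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            sum += e.getValue() * dense[e.getKey()];
        }
        return sum;
    }
}
```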
You can test your implementation with the following unit tests:
./run_docker.sh mvn -Dtest=nl.uva.bigdata.hadoop.assignment2.SparseMatrixVectorMultiplicationLocalTest test
./run_docker.sh mvn -Dtest=nl.uva.bigdata.hadoop.assignment2.SparseMatrixVectorMultiplicationClusterTest test
In this task, we stop using Hadoop and implement our own (local) MapReduce engine in the class MapReduceEngine. This reverses the previous tasks: now we are given the map and reduce implementations for word counting, and we have to implement the underlying engine according to the three phases of execution in MapReduce.
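The three phases the engine has to run could be sketched as follows for word counting. This is a minimal single-threaded illustration under assumed names (`LocalMapReduceSketch`, `wordCount`), not the assignment's MapReduceEngine API, which takes the map and reduce functions as parameters.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal local sketch of the three MapReduce phases, specialized to
// word counting; the real engine would accept arbitrary map/reduce functions.
class LocalMapReduceSketch {

    static Map<String, Integer> wordCount(List<String> lines) {
        // Phase 1: map -- emit a (word, 1) pair for every token.
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                if (!token.isEmpty()) {
                    mapped.add(new AbstractMap.SimpleEntry<>(token, 1));
                }
            }
        }

        // Phase 2: shuffle -- group all emitted values by their key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }

        // Phase 3: reduce -- sum the grouped counts per word.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int v : entry.getValue()) {
                sum += v;
            }
            result.put(entry.getKey(), sum);
        }
        return result;
    }
}
```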
You can test your implementation with the following unit test:
./run_docker.sh mvn -Dtest=nl.uva.bigdata.hadoop.assignment2.MapReduceEngineTest test
Your final task is to implement distributed linear regression (as discussed in class) on top of your own MapReduce engine in the class DistributedLinearRegression. Compute outer products in the mapper, sum up the intermediate results in the reducer, and solve the corresponding linear system.
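The numerical core of this approach could be sketched as below: each mapper contributes the outer products x·xᵀ and the vectors x·y for its examples, the reducer sums these partial results into XᵀX and Xᵀy, and the driver solves the normal equations (XᵀX)·β = Xᵀy. The class and method names here are assumptions for illustration; Gaussian elimination is used as one possible solver for the small linear system.

```java
// Sketch of the normal-equations computation behind distributed linear
// regression; names like accumulate/solve are illustrative, not the
// assignment's API.
class LinearRegressionSketch {

    // "Mapper/reducer" work: add one example's outer product x*x^T into
    // xtx and its contribution x*y into xty.
    static void accumulate(double[][] xtx, double[] xty, double[] x, double y) {
        for (int i = 0; i < x.length; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < x.length; j++) {
                xtx[i][j] += x[i] * x[j]; // outer product contribution
            }
        }
    }

    // Driver: solve the d x d system a * beta = b via Gaussian elimination
    // with partial pivoting (modifies a and b in place).
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            // pick the row with the largest pivot for numerical stability
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
                    pivot = row;
                }
            }
            double[] tmpRow = a[col]; a[col] = a[pivot]; a[pivot] = tmpRow;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            // eliminate the column below the pivot
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++) {
                    a[row][k] -= factor * a[col][k];
                }
                b[row] -= factor * b[col];
            }
        }
        // back-substitution
        double[] beta = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = b[i];
            for (int k = i + 1; k < n; k++) {
                sum -= a[i][k] * beta[k];
            }
            beta[i] = sum / a[i][i];
        }
        return beta;
    }
}
```

For instance, fitting the points (1, 3), (2, 5), (3, 7) with features [1, x] recovers the intercept 1 and slope 2.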
You can test your implementation with the following unit test:
./run_docker.sh mvn -Dtest=nl.uva.bigdata.hadoop.assignment2.DistributedLinearRegressionTest test