Protocol between Spark Driver and Alchemist Driver (AlDriver):
  - The Spark Driver sends a sequence of commands to AlDriver, and AlDriver
    replies to each command synchronously.
  - All values are sent and received in network byte order.
  - Each command starts with a 32-bit "message type" field (typeCode).
  - MESSAGE TYPES
    - handshake
      cmd:
        uint32_t typeCode = 0xABCD
        uint32_t protocolVersion = 0x1
      reply:
        uint32_t statusCode = 0xDCBA
        uint32_t protocolVersion = 0x1
        uint32_t numWorkers = number of AlWorkers
        WorkerInfo workers[numWorkers]
    - shutdown
      cmd:
        uint32_t typeCode = 0xFFFFFFFF
      reply:
        uint32_t statusCode = 0x1
    - newMatrix
      cmd:
        uint32_t typeCode = 0x1
        uint64_t numRows
        uint64_t numCols
        uint64_t numWorkers
        // each entry of layout is the number of rows to be received by the corresponding worker
        uint32_t layout[numWorkers]
      reply 1: "ready for receiving data"
        uint32_t statusCode = 0x1
        uint32_t matrixHandle
      [Spark then sends the data to the workers]
      reply 2: "matrix successfully loaded"
        uint32_t statusCode = 0x1
    - matrixMul
      cmd:
        // preceded by the 32-bit message type field that starts every command
        // handles of the two matrices to multiply
        uint32_t matrixHandle
        uint32_t matrixHandle
      reply:
        uint32_t statusCode = 0x1
        // handle of the resulting matrix
        uint32_t matrixHandle
  - STRUCTS
    - string
        uint32_t length
        uint8_t data[length]
    - WorkerInfo:
        string hostname
        uint32_t port
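
The driver-level messages above can be sketched in Python with the standard `struct` module, where the `!` prefix selects network byte order. This is only an illustration of the wire layout, not the Alchemist implementation: the function names are hypothetical, and UTF-8 is an assumed encoding for the `string` struct, which the spec defines only as raw bytes.

```python
import struct

HANDSHAKE_TYPE = 0xABCD    # typeCode of the handshake command
HANDSHAKE_OK = 0xDCBA      # statusCode expected in the handshake reply
PROTOCOL_VERSION = 0x1
NEW_MATRIX_TYPE = 0x1      # typeCode of the newMatrix command


def encode_handshake():
    """Handshake command: uint32 typeCode followed by uint32 protocolVersion."""
    return struct.pack("!II", HANDSHAKE_TYPE, PROTOCOL_VERSION)


def encode_string(text):
    """'string' struct: uint32 length followed by that many bytes."""
    data = text.encode("utf-8")  # assumption: the spec does not name an encoding
    return struct.pack("!I", len(data)) + data


def decode_string(buf, offset):
    """Read a 'string' at offset; return (text, offset just past the string)."""
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    return buf[start:start + length].decode("utf-8"), start + length


def decode_handshake_reply(buf):
    """Parse statusCode, protocolVersion, numWorkers and the WorkerInfo array."""
    status, version, num_workers = struct.unpack_from("!III", buf, 0)
    if status != HANDSHAKE_OK:
        raise ValueError("unexpected handshake status 0x%X" % status)
    offset = 12
    workers = []
    for _ in range(num_workers):
        # WorkerInfo: string hostname, then uint32 port
        hostname, offset = decode_string(buf, offset)
        (port,) = struct.unpack_from("!I", buf, offset)
        offset += 4
        workers.append((hostname, port))
    return version, workers


def encode_new_matrix(num_rows, num_cols, layout):
    """newMatrix command; layout[i] is the number of rows destined for worker i."""
    header = struct.pack("!IQQQ", NEW_MATRIX_TYPE, num_rows, num_cols, len(layout))
    return header + struct.pack("!%dI" % len(layout), *layout)
```

For example, `encode_new_matrix(10, 3, [6, 4])` describes a 10 x 3 matrix split across two workers that receive 6 and 4 rows respectively.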
Block transfer protocol of AlWorker:
  uint32_t typeCode = 0x1
  uint32_t matrixHandle
  uint64_t rowIdx
  uint64_t numCols
  double rowData[numCols]
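
A single row message for an AlWorker could be packed and unpacked as in the following Python sketch. It assumes that the network byte order rule stated for the driver protocol also applies here, including to the doubles; the function names are hypothetical and not taken from the Alchemist code.

```python
import struct

ROW_TYPE = 0x1       # typeCode of a row transfer message
HEADER_SIZE = 24     # uint32 + uint32 + uint64 + uint64


def encode_row(matrix_handle, row_idx, row):
    """Pack one matrix row: header fields followed by numCols doubles."""
    header = struct.pack("!IIQQ", ROW_TYPE, matrix_handle, row_idx, len(row))
    # assumption: doubles are sent big-endian like the integer fields
    return header + struct.pack("!%dd" % len(row), *row)


def decode_row(buf):
    """Unpack a row message; return (matrix_handle, row_idx, values)."""
    type_code, handle, row_idx, num_cols = struct.unpack_from("!IIQQ", buf, 0)
    if type_code != ROW_TYPE:
        raise ValueError("unexpected typeCode 0x%X" % type_code)
    values = struct.unpack_from("!%dd" % num_cols, buf, HEADER_SIZE)
    return handle, row_idx, list(values)
```

Because each message carries its own `numCols`, a receiver can read the fixed 24-byte header first and then exactly `8 * numCols` more bytes for the row data.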
###
alchemist:
  - use locality
  - figure out how to force persistence of cached RDDs returned from Alchemist (to avoid replay; maybe persist to disk in the worst case)
  - allow freeing of matrices on both the RDD and Alchemist sides
  - need a logging system for better debugging and profiling (other than println to console)
  - allow AlMatrices to be reshuffled internally (e.g. to row-wise distributions for kmeans)
  - need a more extensible way of adding library functionality (e.g. a separate header for each lib we link against, and strings instead of hex codes for operations, since strings are easier to keep unique, e.g. "elemental/thinSVD" instead of 0x5)
  - support returning vectors (including vectors of ints, e.g. for kmeans cluster assignments)
  - support returning vectors/matrices directly to the driver (again e.g. for kmeans), gathering to the Alchemist driver first if not already there (to avoid collects on the Spark side)
  - have a better, more general way to track distributed (and new driver-only) matrices/vectors/objects in the Alchemist driver
  - more formal testing framework with unit tests, extensible for new functions, etc.