-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbnn_layer.qmd
201 lines (158 loc) · 6.55 KB
/
bnn_layer.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
## BNN Layer
## Sum of Product Networks
Consider a three variable multi-valued Truth table.
| $x_1$ | $x_2$ | $x_3$ | $y_1$ | $y_2$ | $y_3$ | Prod Terms |
|-----|-----|-----|-----|-----|-----|-----|
F | F | F | T | F | F | $h_1 = \neg x_1 \neg x_2 \neg x_3$ |
F | F | T | T | T | F | $h_2 = \neg x_1 \neg x_2 x_3$ |
F | T | F | F | T | F | $h_3 = \neg x_1 x_2 \neg x_3$ |
F | T | T | F | F | T | $h_4 = \neg x_1 x_2 x_3$ |
T | F | F | F | T | F | $h_5 = x_1 \neg x_2 \neg x_3$ |
T | F | T | F | F | F | $h_6 = x_1 \neg x_2 x_3$ |
T | T | F | F | F | F | $h_7 = x_1 x_2 \neg x_3$ |
T | T | T | F | T | T | $h_8 = x_1 x_2 x_3$|
For an n-variable Boolean function, the Truth Table will have $2^n$ unique rows, and the size of all Boolean functions is $2^{2^n}$. Above, we considered $y_1, y_2, y_3$ for illustration.
We can model
$$
\begin{array}{left}
y_1 &=& \neg x_1 \neg x_2 \neg x_3 \lor \neg x_1 \neg x_2 x_3\\
y_2 &=& \neg x_1 \neg x_2 x_3 \lor \neg x_1 x_2 \neg x_3 \lor x_1 \neg x_2 \neg x_3 \lor x_1 x_2 x_3\\
y_3 &=& x_1 x_2 x_3
\end{array}
$$
We notice any Truth Table ($y_1$, for example) can be expressed in Sum-of-Products form. The innards are the product terms which are to be OR-ed.
Generally speaking, we model the Truth Tables via learnable gates of n-variables as follows:
$$
\begin{array}{left}
h_j &=& \land_{i=1}^{n} x_i \oplus w_i^j \\
y_k &=& \uplus_{j=1}^{m} \alpha_{j}^{k} \odot h_j \text{ where } \alpha_{j}^{k} \in \{0,1\} \text{ and } \exists j \text{ s.t } \alpha_{j}^{k} = 1 \forall k
\end{array}
$$
Above, $\alpha_j^k$ is a selection gate. When all of them are cold (off), model will be in an $Ignore$ state. The Truth Table for the selection gate $h_j \otimes \alpha_j$ is:
| $\alpha_j$ | $h_j$ | $\alpha_j \odot h_j$ |
|-----|-----|-----|
T| T | T |
T| F | F |
F| T | 0 |
F| F | 0 |
And, $\uplus(.)$ is an modified OR gate works like regular OR gate when at least one of the inputs is not in $Ignore$ state, which is different from the OR gate we have defined [earlier](./bold.qmd). The Truth Table for an N-ary SelectOR gate is defined as follows:
| $a$ | $x$ | $\uplus(x,a)$ |
|-----|-----|-----|
T | T | T |
T | F | T |
F | T | T |
F | F | F |
0 | T | T |
0 | F | F |
0 | 0 | 0 |
T | 0 | T |
F | 0 | F |
Therefore,
$$
\begin{array}{left}
\uplus(x,a) &=& x \text{ if } a=0 \\
&=& a \text{ if } x=0 \\
&=& \lor (x,a) \text{ o.w }
\end{array}
$$
Further,
$$
\begin{array}{left}
x \oplus w &=& \neg x \text{ if } w = T \\
&=& x \text{ o.w }
\end{array}
$$
In effect, we are using XOR gate to complement the inputs, which is essential to create the product terms and using the SelectOR gate to select the prod terms and OR them.
## ProdTron
We want to realize a module which takes n-inputs and creates a product term, i.e., we want to realize
$$
\begin{array}{left}
h(x,w) &=& \land_{i=1}^{n} x_i \oplus w_i
\end{array}
$$
For convenience, we write them recursively, to ease up the derivation, $h= \land(x_i \oplus w_i, h_{-i})$, where
$h_{-i} = \land_{k \neq i} x_k \oplus w_k$
We obtain the partial derivatives as:
$$
\begin{array}{left}
\frac{\partial{h(x,w)}}{\partial{x_i}} &=& w_i \text{ if } h_{-i} = T\\
&=& 0 \text{ o.w } \\
\frac{\partial{h(x,w)}}{\partial{w_i}} &=& x_i \text{ if } h_{-i} = T\\
&=& 0 \text{ o.w } \\
\end{array}
$$
It can be implemented with Heavyside function as follows:
$$
\begin{array}{left}
\frac{\partial{h(x,w)}}{\partial{x_i}} &=& w_i H(h_{-i}) \\
\frac{\partial{h(x,w)}}{\partial{w_i}} &=& x_i H(h_{-i})
\end{array}
$$
where $\epsilon$ is a small positive constant.
## SumTron
We want to realize a module which takes m-inputs, selects some of the m-inputs, and OR's them.
$$
\begin{array}{left}
y(\alpha, h) &=& \uplus_{j=1}^{m} \alpha_j \odot h_j
\end{array}
$$
For convenience, we write them recursively, to ease up the derivation, $y = \uplus(\alpha_j \odot h_j, y_{-j})$, where $y_{-j} = \uplus_{k=1, \neq j }^{m} \alpha_k \odot h_k$. We can see that $y = y_{-j} \text { if } \alpha_j = F$
We obtain the partial derivatives by observing the Truth Table to get $\frac{\partial{y(\alpha,h)}}{\partial{h_i}}$ as follows:
| $\alpha_j$ | $h_j$ | $y_{-j}$ | $y$ | $y(\neg h_j)$ | $\delta y$ | $\delta h_j$ | $y'$ |
|-----|-----|-----|-----| -----| -----|-----|-----|
T| T | T/F/0 | T | T/F/F | 0/F/F | F | 0/T/T |
T| F | T/F/0 | T/F/F | T | 0/T/T | T | 0/T/T |
F| T | T/F/0 | T/F/0 | T/F/0 | 0 | F | 0 |
F| F | T/F/0 | T/F/0 | T/F/0 | 0 | F | 0 |
Therefore,
$$
\begin{array}{left}
\frac{\partial{y(\alpha,h)}}{\partial{h_i}} &=& T \text{ if } \alpha_j = T\, \& \, y_{-j} = F/0 \\
&=& 0 \text{ o.w } \\
\end{array}
$$
It can be implemented with Heavyside function as follows:
$$
\begin{array}{left}
\frac{\partial{y(\alpha,h)}}{\partial{h_j}} &=& H( \alpha_j)H(-y_{-j} + \epsilon)
\end{array}
$$
where $\epsilon$ is a small positive constant.
We obtain the partial derivatives by observing the Truth Table to get $\frac{\partial{y(\alpha,h)}}{\partial{\alpha_i}}$ as follows:
| $\alpha_j$ | $h_j$ | $y_{-j}$ | $y$ | $y(\neg \alpha_j)$ | $\delta y$ | $\delta \alpha_j$ | $y'$ |
|-----|-----|-----|-----| -----| -----|-----|-----|
T| T | T/F/0 | T | $y_{-j}$ | 0 | F | 0 |
T| F | T/F/0 | T/F/F | $y_{-j}$ | 0 | F | 0 |
F| T | T/F/0 | $y_{-j}$ | T | 0/T/0 | T | 0/T/0 |
F| F | T/F/0 | $y_{-j}$ | T/F/F| 0/0/0 | T | 0 |
Therefore,
$$
\begin{array}{left}
\frac{\partial{y(\alpha,h)}}{\partial{\alpha_j}} &=& T \text{ if } \alpha_j = F\, \& \, y_{-j} = F \, \& \, h_j = T \\
&=& 0 \text{ o.w } \\
\end{array}
$$
It can be implemented with Heavyside function as follows:
$$
\begin{array}{left}
\frac{\partial{y(\alpha,h)}}{\partial{h_i}} &=& H(\alpha_j)H(-y_{-j})H(h_{j})
\end{array}
$$
where $\epsilon$ is a small positive constant.
Finally, we can put it all together
## BNN Layer
A BNN Layer is a Boolean Neural Network that maps an N-dimensional input to an K-dimension output, with a hyper parameter $H$ that controls the layer complexity, resulting in a total of $H(N+K)$ learnable Boolean weights.
$$
\begin{array}{left}
y_{k} &=& \uplus_{j=1}^{m} \alpha_{j}^{k} \odot \left( h_j \equiv \land_{i=1}^{n} x_i \oplus w_i^j \right)
\end{array}
$$
where $n=1,\dots,N$, $j=1,\dots,H$ $k=1,\dots,K$.
We can revisit the example again and see that the following weights will realize the respective Truth Values. To model $y_1 = h_1 \lor h_2$, we need
$$
\begin{array}{left}
\alpha_{j}^1 &=& T \text{ for } j = 1,2 \text{ and } F \text{ o.w}\\
w_{i}^1 &=& T \text{ for } j \in {1,2,3} \text{ corresponding to } h_1 \\
w_{i}^2 &=& T \text{ for } j \in {1,2} \text{ and } F \text{ o.w corresponding to } h_2
\end{array}
$$