Added block-oriented network, ReLU activation function and hinge loss
The BlockNet class is based on the earlier SimpleNet class, but has
been designed to better integrate groupwise dropout during training.
The rectified linear (ReLU) activation function was added to those already
available, along with the standard SVM-like hinge loss.

Test files for the new blocky net were also added.
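
The groupwise dropout that BlockNet is designed around does not appear in the diff below. As a rough sketch of the general idea only, not of the BlockNet code, groupwise dropout zeros whole groups of hidden units with a shared mask; the sizes and names here (group_size, drop_rate) are hypothetical.

% Illustrative sketch of groupwise dropout: whole groups of hidden units
% are dropped together, rather than masking each unit independently.
% (Names and sizes below are hypothetical, not taken from BlockNet.)
obs_count = 100;                  % observations in a minibatch
hid_dim = 64;                     % hidden layer width
group_size = 8;                   % units per group (must divide hid_dim)
drop_rate = 0.5;                  % probability of dropping each group
acts = randn(obs_count, hid_dim); % stand-in hidden activations
group_count = hid_dim / group_size;
group_mask = (rand(obs_count, group_count) > drop_rate);
% Repeat each group's keep/drop decision across all units in that group.
unit_mask = kron(group_mask, ones(1, group_size));
acts_dropped = acts .* unit_mask;
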
Philip-Bachman committed Jan 15, 2013
1 parent 133c593 commit eabd463
Showing 8 changed files with 659 additions and 11 deletions.
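
The SVM-like hinge loss mentioned above is not part of the ActFunc.m excerpt that follows. As a minimal sketch of the standard binary formulation, not the code added in this commit, with per-observation outputs f and labels y in {-1, +1}:

% Standard binary hinge loss and its subgradient (illustration only).
f = [0.5; -2.0; 1.5];                 % classifier outputs (obs_count x 1)
y = [1; -1; -1];                      % labels in {-1, +1}
margins = 1 - (y .* f);
loss = sum(max(0, margins));          % summed hinge loss (= 3.0 here)
dL_df = -(y .* (margins > 0));        % subgradient w.r.t. the outputs f

Observations already beyond the unit margin contribute neither loss nor gradient.
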
ActFunc.m (36 additions, 0 deletions)
@@ -30,6 +30,8 @@
    case 4
        acts = ActFunc.logexp_ff(pre_values, pre_weights);
    case 5
        acts = ActFunc.relu_ff(pre_values, pre_weights);
    case 6
        acts = ActFunc.softmax_ff(pre_values, pre_weights);
    otherwise
        error('No valid activation function type selected.');
@@ -54,6 +56,9 @@
        node_grads = ActFunc.logexp_bp(...
            post_grads, post_weights, pre_values, pre_weights);
    case 5
        node_grads = ActFunc.relu_bp(...
            post_grads, post_weights, pre_values, pre_weights);
    case 6
        node_grads = ActFunc.softmax_bp(...
            post_grads, post_weights, pre_values, pre_weights);
    otherwise
@@ -192,6 +197,37 @@
    return
end

function [ cur_acts ] = relu_ff(pre_acts, pre_weights)
    % Compute simple rectified linear activation function.
    %
    % Parameters:
    %   pre_acts: previous layer activations (obs_count x pre_dim)
    %   pre_weights: weights from pre -> cur (pre_dim x cur_dim)
    % Outputs:
    %   cur_acts: activations at current layer (obs_count x cur_dim)
    %
    cur_acts = max(0, pre_acts * pre_weights);
    return
end

function [ cur_grads ] = ...
        relu_bp(post_grads, post_weights, pre_acts, pre_weights)
    % Compute the gradient for each node in the current layer given
    % the gradients in post_grads for nodes at the next layer.
    %
    % Parameters:
    %   post_grads: grads at next layer (obs_dim x post_dim)
    %   post_weights: weights from current to post (cur_dim x post_dim)
    %   pre_acts: activations at previous layer (obs_dim x pre_dim)
    %   pre_weights: weights from prev to current (pre_dim x cur_dim)
    % Outputs:
    %   cur_grads: gradients at current layer (obs_dim x cur_dim)
    %
    nz_acts = (pre_acts * pre_weights) > 0;
    cur_grads = (post_grads * post_weights') .* nz_acts;
    return
end

function [ cur_acts ] = softmax_ff(pre_acts, pre_weights)
    % Compute simple softmax activation function where each row in the
    % matrix (pre_acts * pre_weights) is "softmaxed".
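
As a quick sanity check of the new ReLU pair above, assuming ActFunc.m is on the path and its static methods are callable directly, with arbitrary matrix sizes:

% Forward and backward passes through the new rectified linear unit.
obs_count = 4; pre_dim = 3; cur_dim = 5; post_dim = 2;
pre_acts = randn(obs_count, pre_dim);
pre_weights = randn(pre_dim, cur_dim);
post_weights = randn(cur_dim, post_dim);
post_grads = randn(obs_count, post_dim);
cur_acts = ActFunc.relu_ff(pre_acts, pre_weights);
cur_grads = ActFunc.relu_bp(post_grads, post_weights, pre_acts, pre_weights);
% Gradients are zeroed exactly where the forward pass clipped to zero.
assert(all(cur_grads(cur_acts == 0) == 0));

Note that relu_bp recomputes the pre-activation sign rather than caching it, so both calls take the same pre_acts and pre_weights.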
