
fma relu combination for convolution-output #31

Closed · wants to merge 8 commits

Conversation

Jokeren
Contributor

@Jokeren Jokeren commented Oct 17, 2016

Thanks for your review!

@Maratyszcza
Owner

Thanks Keren. Looks good, but needs a few changes:

  1. Please use an activation parameter of enum type rather than a bool. In the future, we may want to support other activation types, not just ReLU. E.g.:
enum nnp_activation {
    nnp_activation_identity = 0,
    nnp_activation_relu = 1,
};

enum nnp_status nnp_convolution_output(
    enum nnp_convolution_algorithm algorithm,
    size_t batch_size,
    size_t input_channels,
    size_t output_channels,
    struct nnp_size input_size,
    struct nnp_padding input_padding,
    struct nnp_size kernel_size,
    const float input[],
    const float kernel[],
    const float bias[],
    float output[],
    enum nnp_activation activation,
    pthreadpool_t threadpool,
    struct nnp_profile* profile);
  2. Please generate separate FFT functions with ReLU, e.g. nnp_ifft8x8_with_bias_with_relu__avx2, instead of adding an argument to the inverse FFT functions. Also, note that if you have an arg_relu parameter, an if arg_relu: statement does not generate assembly instructions to check whether arg_relu is non-zero; rather, it checks whether the Python variable arg_relu evaluates to True at code-generation time.
  3. Please use tabs for indentation in C/C++ sources.
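The generation-time versus run-time branching point above can be sketched in plain Python (hypothetical helper names, my own illustration, not actual NNPACK/PeachPy code):

```python
# Sketch of generation-time specialization (hypothetical names, not
# NNPACK/PeachPy code). The `if with_relu:` below runs while the
# function is being built, not every time the built function runs --
# analogous to emitting separate with-ReLU and identity kernels
# instead of branching on a runtime flag inside one kernel.
def make_postprocess(with_relu):
    ops = []
    ops.append(lambda x: x + 1.0)      # stand-in for the bias step
    if with_relu:                      # resolved at "codegen" time
        ops.append(lambda x: max(x, 0.0))

    def run(x):
        # The built function just executes its fixed op list;
        # there is no with_relu check left at this point.
        for op in ops:
            x = op(x)
        return x
    return run

plain = make_postprocess(False)
fused = make_postprocess(True)
```

The same input flows through both variants, but only the specialized one clamps: plain(-3.0) gives -2.0, while fused(-3.0) gives 0.0.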

@@ -509,6 +514,8 @@ def inverse_vfft(reg_t0, reg_t8, reg_t_stride, data_in, reg_row_start=None, reg_
 		elif reg_row_end:
 			CMP(reg_row_end, row_lo)
 			JBE(store_data.end)
+		if relu:
+			VBLENDVPS(ymm_data_lo, ymm_data_lo, ymm_zero, ymm_data_lo)
Owner


It is preferable to use VMAXPS(ymm_data_lo, ymm_data_lo, ymm_zero) for performance reasons (VBLENDVPS may generate multiple micro-operations)

@@ -499,6 +500,10 @@ def inverse_vfft(reg_t0, reg_t8, reg_t_stride, data_in, reg_row_start=None, reg_
 			negate_b=fft8_negate_b.get(id(data_hi), False),
 			writeback=False)
 
+		if relu:
+			ymm_zero = YMMRegister()
+			VMOVAPS(ymm_zero, Constant.uint32x8(0))
Owner


Please use negative signed zero, i.e. Constant.float32x8(-0.0)

Copy link
Contributor Author

@Jokeren Jokeren Oct 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain the reason for using negative signed zero, or give a simple example?

Owner


This is for the backward pass. The backward pass needs to know which values were positive/negative before we applied ReLU. Using negative zero ensures that the sign of the convolution results doesn't change after we apply ReLU. See the discussion in #24 for why it's important.

@Jokeren
Contributor Author

Jokeren commented Oct 18, 2016

Sorry, I had previously set tabs to 8 spaces, so the style was not consistent with yours.

I have changed the formatting, and it looks fine now.

I will fix the other functions later.

Regarding the psimd implementations, I do not have a Mac machine. How can I test them after adding the ReLU function?

@Maratyszcza
Owner

You don't need a Mac to test the psimd functions; any Linux machine with Clang will work. Just run python configure.py --enable-psimd

@Jokeren
Contributor Author

Jokeren commented Oct 22, 2016

Hi, @Maratyszcza!

Please review my latest commits to see whether the structure and formatting meet your requirements.

I have raised a question about testing in the issues.

@@ -94,7 +94,9 @@ double benchmark_vgg(
 	for (size_t layer_index = 0; layer_index < layers_count; layer_index++) {
 		switch (layers[layer_index].type) {
 			case layer_type_convolutional:
-				status = nnp_convolution_output(nnp_convolution_algorithm_auto,
+				status = nnp_convolution_output(
+					nnp_activation_identity,
Owner


Is the order correct? In include/nnpack.h activation is the second argument

			output_transform_function = nnp_hwinfo.transforms.ifft8x8_with_bias;
			break;
		default:
			goto cleanup;
Owner


nnp_convolution_output should return an error code if activation has an unknown value. I suggest checking activation inside validate_convolution_arguments. Then, in these switch statements, you can write NNP_UNREACHABLE; to indicate that the case never happens; the compiler will use it for optimization.

@Maratyszcza
Owner

Looks good. Once the tests are working I will merge.

@Jokeren
Contributor Author

Jokeren commented Oct 24, 2016

Hi, @Maratyszcza, all tests are passing!

I am sorry that my earlier commit messages were not in the correct format.

I used a uniform distribution over [-0.1, 1] for the convolution tests.

@bhack

bhack commented Feb 11, 2017

Any news?

@Maratyszcza
Owner

@bhack This PR is ready to merge, but first NNPACK is moving to a new configuration system.

@Jokeren
Contributor Author

Jokeren commented Feb 11, 2017

Is there anything I can help with? @Maratyszcza

@Maratyszcza
Owner

@Jokeren There are reports of a bug in nnp_fully_connected_output #43 #44. Could you take a look?

@Maratyszcza
Owner

Manually rebased, merged, and committed as 4dbf75d
