
fma relu combination for convolution-output #31

Closed · wants to merge 8 commits

Conversation

Jokeren
Contributor

@Jokeren Jokeren commented Oct 17, 2016

Thanks for your review!

@Maratyszcza
Owner

Thanks Keren. Looks good, but needs a few changes:

  1. Please use an activation parameter of enum type rather than a bool. In the future, we may want to support other activation types, not just ReLU. E.g.:
enum nnp_activation {
    nnp_activation_identity = 0,
    nnp_activation_relu = 1,
};

enum nnp_status nnp_convolution_output(
    enum nnp_convolution_algorithm algorithm,
    size_t batch_size,
    size_t input_channels,
    size_t output_channels,
    struct nnp_size input_size,
    struct nnp_padding input_padding,
    struct nnp_size kernel_size,
    const float input[],
    const float kernel[],
    const float bias[],
    float output[],
    enum nnp_activation activation,
    pthreadpool_t threadpool,
    struct nnp_profile* profile);
  2. Please generate separate FFT functions with ReLU, e.g. nnp_ifft8x8_with_bias_with_relu__avx2, instead of adding an argument to the inverse FFT functions. Also, note that if you have an arg_relu parameter, an if arg_relu: statement does not generate assembly instructions to check whether arg_relu is non-zero; rather, it checks whether the Python variable arg_relu evaluates to True at code-generation time.
  3. Please use tabs for indentation in C/C++ sources.
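The generation-time versus run-time branching point above can be sketched in plain Python (hypothetical helper names, my own illustration, not actual NNPACK/PeachPy code):

```python
# Sketch of generation-time specialization (hypothetical names, not
# NNPACK/PeachPy code). The `if with_relu:` below runs while the
# function is being built, not every time the built function runs --
# analogous to emitting separate with-ReLU and identity kernels
# instead of branching on a runtime flag inside one kernel.
def make_postprocess(with_relu):
    ops = []
    ops.append(lambda x: x + 1.0)      # stand-in for the bias step
    if with_relu:                      # resolved at "codegen" time
        ops.append(lambda x: max(x, 0.0))

    def run(x):
        # The built function just executes its fixed op list;
        # there is no with_relu check left at this point.
        for op in ops:
            x = op(x)
        return x
    return run

plain = make_postprocess(False)
fused = make_postprocess(True)
```

The same input flows through both variants, but only the specialized one clamps: plain(-3.0) gives -2.0, while fused(-3.0) gives 0.0.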

@@ -509,6 +514,8 @@ def inverse_vfft(reg_t0, reg_t8, reg_t_stride, data_in, reg_row_start=None, reg_
 		elif reg_row_end:
 			CMP(reg_row_end, row_lo)
 			JBE(store_data.end)
+		if relu:
+			VBLENDVPS(ymm_data_lo, ymm_data_lo, ymm_zero, ymm_data_lo)
Owner


It is preferable to use VMAXPS(ymm_data_lo, ymm_data_lo, ymm_zero) for performance reasons (VBLENDVPS may generate multiple micro-operations)

@@ -499,6 +500,10 @@ def inverse_vfft(reg_t0, reg_t8, reg_t_stride, data_in, reg_row_start=None, reg_
 			negate_b=fft8_negate_b.get(id(data_hi), False),
 			writeback=False)
 
+		if relu:
+			ymm_zero = YMMRegister()
+			VMOVAPS(ymm_zero, Constant.uint32x8(0))
Owner


Please use negative signed zero, i.e. Constant.float32x8(-0.0)

Copy link
Contributor Author

@Jokeren Jokeren Oct 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain the reason for using negative signed zero, or give a simple example?

Owner


This is for the backward pass. The backward pass needs to know which values were positive/negative before we applied ReLU. Using negative zero ensures that the sign of the convolution results doesn't change after we apply ReLU. See the discussion in #24 for why it's important.

@Jokeren
Contributor Author

Jokeren commented Oct 18, 2016

Sorry, I had previously set tabs to 8 spaces, so the style was not consistent with yours.

I have changed the formatting, and it looks fine now.

I will fix the other functions later.

Regarding the psimd implementations, I do not have a Mac machine. How can I test them after adding the ReLU function?

@Maratyszcza
Owner

You don't need a Mac to test the psimd functions; any Linux machine with Clang will work. Just run python configure.py --enable-psimd

@Jokeren
Contributor Author

Jokeren commented Oct 22, 2016

Hi, @Maratyszcza!

Please review my latest commits to see whether the structure and formatting meet your requirements.

I have raised a question about testing in the issues.

@@ -94,7 +94,9 @@ double benchmark_vgg(
 	for (size_t layer_index = 0; layer_index < layers_count; layer_index++) {
 		switch (layers[layer_index].type) {
 			case layer_type_convolutional:
-				status = nnp_convolution_output(nnp_convolution_algorithm_auto,
+				status = nnp_convolution_output(
+					nnp_activation_identity,
Owner


Is the order correct? In include/nnpack.h activation is the second argument

			output_transform_function = nnp_hwinfo.transforms.ifft8x8_with_bias;
			break;
		default:
			goto cleanup;
Owner


nnp_convolution_output should return an error code if activation has an unknown value. I suggest checking activation inside validate_convolution_arguments. Then, in these switch statements, you can write NNP_UNREACHABLE; to indicate that the case never happens; the compiler will use it for optimization.

@Maratyszcza
Owner

Looks good. Once the tests are working I will merge.

@Jokeren
Contributor Author

Jokeren commented Oct 24, 2016

Hi, @Maratyszcza, all tests are passing!

I am sorry that my earlier commit messages were not in the correct format.

I used a uniform distribution over [-0.1, 1] for the convolution tests.

@bhack

bhack commented Feb 11, 2017

Any news?

@Maratyszcza
Owner

@bhack This PR is ready to merge, but first NNPACK is moving to a new configuration system.

@Jokeren
Contributor Author

Jokeren commented Feb 11, 2017

Is there anything I can help with? @Maratyszcza

@Maratyszcza
Owner

@Jokeren There are reports of a bug in nnp_fully_connected_output #43 #44. Could you take a look?

@Maratyszcza
Owner

Manually rebased, merged, and committed as 4dbf75d
