
Bi-Directional Block Self-Attention for fast and memory-efficient sequence modeling #2


Paper at ICLR 2018: https://openreview.net/forum?id=H1cWzoxA-
aka Bi-BloSAN

Abstract

  • CNNs focus on local dependencies
  • Bi-BloSAN achieves a better efficiency-memory trade-off than RNN/CNN/SAN models

1. Introduction

  • RNNs cannot be parallelized across time steps
  • In CNNs, the number of operations needed to relate two positions grows with their distance
    • Convolutional Seq2Seq: linearly
    • ByteNet: logarithmically
  • Bi-BloSAN builds on
    • Transformer
    • DiSAN (Directional Self-Attention Network)
      • forward/backward masks
      • attention applied at the feature level
  • SANs (Self-Attention Networks) need memory quadratic in sequence length; Bi-BloSAN reduces this
  • Core idea of Bi-BloSAN: split the sequence into equal-length blocks and process tokens block by block
  • Bi-BloSAN benefits: memory-efficient and fast (intra-block followed by inter-block attention makes this possible; see the sketch after this list)
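
As a rough illustration of the block split, here is a minimal NumPy sketch; the block length and zero-padding of the tail are my assumptions, not necessarily the paper's exact choices:

```python
import numpy as np

def split_into_blocks(x, block_len):
    """Split a sequence of token vectors (n, d) into equal-length
    blocks (num_blocks, block_len, d), zero-padding the tail."""
    n, d = x.shape
    num_blocks = -(-n // block_len)  # ceiling division
    padded = np.zeros((num_blocks * block_len, d), dtype=x.dtype)
    padded[:n] = x
    return padded.reshape(num_blocks, block_len, d)

# toy run: 10 tokens of dimension 4, block length 4 -> 3 blocks
blocks = split_into_blocks(np.random.randn(10, 4), block_len=4)
print(blocks.shape)  # (3, 4, 4)
```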

2. Background

2.2 Vanilla Attention and Multi-dimensional Attention

  • Additive attention usually achieves better accuracy than multiplicative (dot-product) attention, but dot-product attention is faster and more memory-efficient (see the sketch below)
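
For concreteness, a small NumPy sketch of the two scoring functions; the parameters W1, W2, v and the v^T tanh(W1 k + W2 q) form of additive attention are the common textbook parameterization, used here as an assumption:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def dot_product_scores(q, K):
    """Multiplicative (dot-product) attention: a single matmul,
    which is what makes it fast and memory-light."""
    return K @ q

def additive_scores(q, K, W1, W2, v):
    """Additive (MLP) attention: score_i = v^T tanh(W1 k_i + W2 q);
    an extra (n, d) activation has to be materialized."""
    return np.tanh(K @ W1.T + q @ W2.T) @ v

d, n = 8, 5
rng = np.random.default_rng(0)
q, K = rng.normal(size=d), rng.normal(size=(n, d))
W1, W2, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)

print(softmax(dot_product_scores(q, K)))
print(softmax(additive_scores(q, K, W1, W2, v)))
```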

2.3 Two types of Self-attention

  • token2token
    • attention between two tokens x_i and x_j of the same sequence x
  • source2token
    • attention of each token to the entire sentence, producing a single summary vector (see the sketch below)
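
A minimal sketch of source2token attention; the paper's version works at the feature level (multi-dimensional), while this simplification uses one scalar score per token, and the tanh-MLP scoring form is an assumption:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def source2token(x, W, v):
    """source2token self-attention: score every token against the whole
    sequence (no external query) and return one summary vector."""
    scores = np.tanh(x @ W.T) @ v   # (n,) one scalar per token
    alpha = softmax(scores)         # attention distribution over tokens
    return alpha @ x                # weighted sum of token vectors, (d,)

rng = np.random.default_rng(0)
n, d = 6, 4
x = rng.normal(size=(n, d))
summary = source2token(x, rng.normal(size=(d, d)), rng.normal(size=d))
print(summary.shape)  # (4,)
```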

2.4 Masked Self-attention

  • (Shen et al., 2017) used a mask to make attention one-directional
  • M is the mask matrix; the forward mask (built in the sketch below) sets
    • M_ij = 0 if i < j
    • M_ij = -inf otherwise
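
The mask definition above translates directly into a few lines of NumPy:

```python
import numpy as np

def forward_mask(n):
    """DiSAN-style forward mask: M[i, j] = 0 if i < j, -inf otherwise
    (including the diagonal). Added to the attention logits before the
    softmax, it zeroes out one direction; the backward mask flips the
    condition to i > j."""
    M = np.full((n, n), -np.inf)
    M[np.triu_indices(n, k=1)] = 0.0
    return M

print(forward_mask(3))
# [[-inf   0.   0.]
#  [-inf -inf   0.]
#  [-inf -inf -inf]]
```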

3. Proposed Model

m-BloSA (masked block self-attention)

(figure from the paper: m-BloSA architecture)

Bi-BloSAN (bi-directional block self-attention network)

(figure from the paper: overall Bi-BloSAN architecture)
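
Since these notes only carry the figures, here is a heavily simplified NumPy sketch of the m-BloSA flow under stated assumptions: it swaps the paper's masked additive attention for plain scaled dot-product attention, replaces the source2token block summary with a mean, and fuses local/global context with an average instead of the paper's learned gate:

```python
import numpy as np

def softmax(s, axis=-1):
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attend(x):
    """Plain scaled dot-product self-attention, standing in for the
    paper's masked additive attention; x is (m, d)."""
    a = softmax(x @ x.T / np.sqrt(x.shape[-1]))
    return a @ x

def m_blosa_sketch(x, block_len):
    """Intra-block attention -> per-block summary -> inter-block
    attention -> fuse local and global context per token.
    Attention memory is O(m^2 + (n/m)^2) instead of O(n^2)."""
    n, d = x.shape
    num_blocks = n // block_len                 # assume n divisible by block_len
    blocks = x.reshape(num_blocks, block_len, d)
    intra = np.stack([self_attend(b) for b in blocks])  # local context
    summaries = intra.mean(axis=1)              # crude stand-in for source2token
    inter = self_attend(summaries)              # context among block summaries
    glob = np.repeat(inter, block_len, axis=0)  # broadcast back to tokens
    return (intra.reshape(n, d) + glob) / 2     # naive fusion (paper uses a gate)

out = m_blosa_sketch(np.random.default_rng(0).normal(size=(12, 8)), block_len=4)
print(out.shape)  # (12, 8)
```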

4. Experiments

Terminology

  • x: input sequence
  • q: query
  • i: position in the sequence
  • P: probability P(z = i | x, q), i.e., the importance assigned to token i given x and q