attention_layer application #1241

Open
alexmosc opened this issue Jul 7, 2021 · 2 comments

Comments

@alexmosc

alexmosc commented Jul 7, 2021

I tried hard to find an example of layer_attention for Keras in RStudio's library, but couldn't find one.

I got the net working in two fashions: LSTM(return_sequences = TRUE) -> Attention -> LSTM(return_sequences = FALSE), and LSTM(return_sequences = TRUE) -> Attention -> Flatten before the dense layers. Note that in my code layer_flatten is commented out; you could comment out the second layer_lstm instead. Both approaches output a 1D tensor per sample, which at least seems to fit the expected dimensionality of the NN output.

Which way is correct, or more sensible? I am not very experienced in this field...

library(keras)

if (exists("nn_model")) rm(nn_model)   # safe to re-run the script

lstm_units <- 16L
lstm_seq_len <- 4L 
nfeatures <- 2L; final_diffs <- 1:3

inputs <- 
  layer_input(shape = list(lstm_seq_len, nfeatures))

lstm_output <- 
  inputs %>% 
  layer_lstm(
    units = lstm_units            # input shape comes from layer_input() above
    , activation = 'relu'
    , return_sequences = TRUE     # keep the whole sequence for the attention layer
    , stateful = FALSE
    , name = 'lstm1'
  )

predictions <-
  layer_attention(
    inputs = list(lstm_output, lstm_output),  # self-attention: query and value are the same sequence
    use_scale = FALSE,
    causal = FALSE,
    batch_size = NULL,
    dtype = NULL,
    name = 'attention',
    trainable = TRUE,
    weights = NULL
  ) %>%
  # Fashion 1: a second LSTM collapses the attended sequence into a single vector.
  layer_lstm(
    units = lstm_units
    , activation = 'relu'
    , return_sequences = FALSE
    , stateful = FALSE
    , name = 'lstm2'
  ) %>%
  # Fashion 2: comment out the layer_lstm above and uncomment layer_flatten instead.
  #layer_flatten() %>%
  layer_dense(units = 64L, activation = NULL, name = 'dense1') %>% 
  layer_batch_normalization(name = 'bn1') %>%  
  layer_activation(activation = "relu", name = 'act1') %>%  
  layer_dense(units = 32L, activation = NULL, name = 'dense2') %>% 
  layer_batch_normalization(name = 'bn2') %>%  
  layer_activation(activation = "relu", name = 'act2') %>%  
  layer_dense(units = length(final_diffs), activation = 'softmax', name = 'dense3')

optimizer <- 
  optimizer_adam(learning_rate = 1e-5)   # `lr` is deprecated in newer keras versions

nn_model <- 
  keras_model(inputs = inputs, outputs = predictions)

nn_model %>% 
  keras::compile(
    optimizer = optimizer,
    loss = 'categorical_crossentropy',
    metrics = 'categorical_accuracy'
  )

summary(nn_model)

# Sanity check: predict on one random sample of shape (1, lstm_seq_len, nfeatures)
predict(nn_model, array(runif(lstm_seq_len * nfeatures, 0, 1), dim = c(1, lstm_seq_len, nfeatures)))
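
For reference, the second fashion mentioned above (Attention -> Flatten before the dense layers) can be written out as its own branch. This is only a sketch, reusing `inputs` and `lstm_output` from the model above; the layer names are illustrative.

# Sketch of the flatten variant: the attention output (batch, lstm_seq_len, lstm_units)
# is flattened to (batch, lstm_seq_len * lstm_units) before a dense head.
predictions_flat <-
  layer_attention(
    inputs = list(lstm_output, lstm_output),
    name = 'attention_flat'
  ) %>%
  layer_flatten(name = 'flatten1') %>%
  layer_dense(units = length(final_diffs), activation = 'softmax', name = 'out_flat')

nn_model_flat <- keras_model(inputs = inputs, outputs = predictions_flat)
summary(nn_model_flat)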

@PM-42

PM-42 commented Jan 27, 2025

I have a similar problem, which I cannot get to work. I wanted to use an attention layer without the complication of an LSTM. The code below is designed to work for an input matrix of tokens (derived from text), categorised as 'positive', 'negative' or 'neutral' (coded as 0, 1 and 2). The dummy random data has 1000 instances (rows), each with 10 tokens (columns). Each row has a label of 0, 1 or 2. The training/test split is 800:200. The problem is that although the model builds, it reports an output shape of (None, 64) (64 is the number of units in the first dense layer), and it therefore fails at the fitting stage. I cannot get the final dense layer to output the required (batch_size, 3) tensor. Can anybody help, please? Many thanks.

library(keras)
library(tensorflow)

set.seed(1234)
n_samples <- 1000
features <- matrix(rnorm(n_samples * 10), nrow = n_samples, ncol = 10)
labels <- sample(0:2, size = n_samples, replace = TRUE)

#Convert labels to categorical variables
labels_categorical <- as.integer(labels)
labels_onehot <- to_categorical(labels)
labelsTrain <- labels_categorical[1:800]
labelsTest <- labels_categorical[801:1000]

# Split data into training and testing sets

train_set <- list(features[1:800, ], labels_onehot[1:800, ])
test_set <- list(features[801:1000, ], labels_onehot[801:1000, ])

# Convert data to tensors

X_train <- tf$convert_to_tensor(train_set[[1]])
Y_train <- tf$convert_to_tensor(train_set[[2]])
X_test <- tf$convert_to_tensor(test_set[[1]])
Y_test <- tf$convert_to_tensor(test_set[[2]])

# Reshape to 3D: (samples, timesteps, features)

X_train <- tf$reshape(X_train, as_tensor(shape(800L, 10L, 1L)))
X_test <- tf$reshape(X_test, as_tensor(shape(200L, 10L, 1L)))

# Build the model

model <- keras_model_sequential()
model %>%
  layer_dense(units = 64, activation = 'relu', input_shape = 10) %>%
  layer_attention(inputs = list(X_train, X_train)) %>%
  layer_flatten() %>%
  #tf$reshape(tf$shape(list()) + c(NULL, 3))
  layer_dense(units = 3)

model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.01),
  metrics = "accuracy",
  loss = "categorical_crossentropy"
)
summary(model)

# Fit the model

history <- model %>% fit(
  X_train, Y_train,
  batch_size = 32,
  epochs = 25,
  verbose = 1
)

################################################
These are the outputs:

Model: "sequential_3"

Layer (type)          Output Shape        Param #
dense_6 (Dense)       (None, 64)          704

Total params: 704
Trainable params: 704
Non-trainable params: 0

ValueError: Shapes (32, 3) and (32, 64) are incompatible

── R Traceback ─────────────────────────────────────────────────────────────────────────────────────────────

  1. ├─model %>% ...
  2. ├─generics::fit(...)
  3. └─keras:::fit.keras.engine.training.Model(...)
  4. ├─base::do.call(object$fit, args)
  5. └─reticulate (local) <python.builtin.method>(...)
  6. └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)
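
The summary above shows that only the first dense layer actually became part of the sequential model (704 parameters, output (None, 64)), which is why fit() complains that the (32, 3) labels do not match a (32, 64) output. Calling layer_attention with inputs = list(X_train, X_train) appears to apply attention to those fixed tensors rather than adding a layer to the pipeline. Below is a minimal, unverified sketch of one way to wire the attention layer into the graph with the functional API so the model ends in the required (batch_size, 3) output; it assumes X_train/Y_train as prepared above, and the layer arrangement (dense projection, self-attention, flatten) is only illustrative.

# Hedged sketch, not a verified fix: functional-API model with self-attention.
library(keras)

inputs <- layer_input(shape = c(10, 1))                            # 10 tokens, 1 feature each

projected <- inputs %>%
  layer_dense(units = 64, activation = 'relu')                     # (batch, 10, 64)

attended <- layer_attention(inputs = list(projected, projected))   # self-attention over the 10 positions

outputs <- attended %>%
  layer_flatten() %>%                                              # (batch, 640)
  layer_dense(units = 3, activation = 'softmax')                   # (batch, 3), matches the one-hot labels

model <- keras_model(inputs, outputs)

model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.01),
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)

history <- model %>% fit(X_train, Y_train, batch_size = 32, epochs = 25, verbose = 1)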
    

@alexmosc (Author)

alexmosc commented Jan 27, 2025

Thank you!

In the meantime I have pretty much switched to Python and PyTorch, which handle this kind of task well. I am using a multi-head-attention Transformer for multidimensional time-series classification, where the embeddings are the time-series values themselves (no tokens, no encoder).

Sorry for the off-topic, but in case you are interested in such a model, I have posted it here for you: kaggle
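
Purely as an illustration of the idea described above (this is not the poster's PyTorch model, which is on Kaggle; the sizes, names and layer choices here are made up), a rough R/keras analogue of multi-head self-attention over a multivariate time series might look like this:

# Illustrative sketch only: multi-head self-attention for time-series classification,
# where each timestep's feature vector plays the role of an embedding (no tokens).
library(keras)

seq_len   <- 50L   # timesteps (assumed)
n_vars    <- 8L    # variables per timestep (assumed)
n_classes <- 3L    # output classes (assumed)

inputs <- layer_input(shape = c(seq_len, n_vars))

x <- inputs %>%
  layer_dense(units = 32)                      # project each timestep to a model dimension

attn <- layer_multi_head_attention(
  inputs    = list(x, x),                      # self-attention: query = value
  num_heads = 4,
  key_dim   = 16
)

outputs <- layer_add(list(x, attn)) %>%        # residual connection
  layer_layer_normalization() %>%
  layer_global_average_pooling_1d() %>%        # pool over time
  layer_dense(units = n_classes, activation = 'softmax')

model <- keras_model(inputs, outputs)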

Best regards,
Alexey
