attention_layer application #1241

Open
alexmosc opened this issue Jul 7, 2021 · 2 comments

Comments

@alexmosc

alexmosc commented Jul 7, 2021

I tried hard to find an example of layer_attention for Keras in RStudio's library, but couldn't find one.

I got the net working in two fashions: LSTM(return_sequences = TRUE) -> Attention -> LSTM(return_sequences = FALSE), and LSTM(return_sequences = TRUE) -> Attention -> Flatten before the dense layers. Note that in my code layer_flatten is commented out; you could comment out the second layer_lstm instead. Both approaches output a 1D tensor per sample, which at least seems to fit the expected dimensionality of the NN output.

Which way is correct, or more sensible? I am not very experienced in this field...

library(keras)

if (exists("nn_model")) rm(nn_model)   # safe to re-run the script

lstm_units <- 16L
lstm_seq_len <- 4L 
nfeatures <- 2L; final_diffs <- 1:3

inputs <- 
  layer_input(shape = list(lstm_seq_len, nfeatures))

lstm_output <- 
  inputs %>% 
  layer_lstm(
    units = lstm_units            # input shape comes from layer_input() above
    , activation = 'relu'
    , return_sequences = TRUE     # keep the whole sequence for the attention layer
    , stateful = FALSE
    , name = 'lstm1'
  )

predictions <-
  layer_attention(
    inputs = list(lstm_output, lstm_output),  # self-attention: query and value are the same sequence
    use_scale = FALSE,
    causal = FALSE,
    batch_size = NULL,
    dtype = NULL,
    name = 'attention',
    trainable = TRUE,
    weights = NULL
  ) %>%
  # Fashion 1: a second LSTM collapses the attended sequence into a single vector.
  layer_lstm(
    units = lstm_units
    , activation = 'relu'
    , return_sequences = FALSE
    , stateful = FALSE
    , name = 'lstm2'
  ) %>%
  # Fashion 2: comment out the layer_lstm above and uncomment layer_flatten instead.
  #layer_flatten() %>%
  layer_dense(units = 64L, activation = NULL, name = 'dense1') %>% 
  layer_batch_normalization(name = 'bn1') %>%  
  layer_activation(activation = "relu", name = 'act1') %>%  
  layer_dense(units = 32L, activation = NULL, name = 'dense2') %>% 
  layer_batch_normalization(name = 'bn2') %>%  
  layer_activation(activation = "relu", name = 'act2') %>%  
  layer_dense(units = length(final_diffs), activation = 'softmax', name = 'dense3')

optimizer <- 
  optimizer_adam(learning_rate = 1e-5)   # `lr` is deprecated in newer keras versions

nn_model <- 
  keras_model(inputs = inputs, outputs = predictions)

nn_model %>% 
  keras::compile(
    optimizer = optimizer,
    loss = 'categorical_crossentropy',
    metrics = 'categorical_accuracy'
  )

summary(nn_model)

# Sanity check: predict on one random sample of shape (1, lstm_seq_len, nfeatures)
predict(nn_model, array(runif(lstm_seq_len * nfeatures, 0, 1), dim = c(1, lstm_seq_len, nfeatures)))
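
For reference, the second fashion mentioned above (Attention -> Flatten before the dense layers) can be written out as its own branch. This is only a sketch, reusing `inputs` and `lstm_output` from the model above; the layer names are illustrative.

# Sketch of the flatten variant: the attention output (batch, lstm_seq_len, lstm_units)
# is flattened to (batch, lstm_seq_len * lstm_units) before a dense head.
predictions_flat <-
  layer_attention(
    inputs = list(lstm_output, lstm_output),
    name = 'attention_flat'
  ) %>%
  layer_flatten(name = 'flatten1') %>%
  layer_dense(units = length(final_diffs), activation = 'softmax', name = 'out_flat')

nn_model_flat <- keras_model(inputs = inputs, outputs = predictions_flat)
summary(nn_model_flat)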

@PM-42

PM-42 commented Jan 27, 2025

I have a similar problem, which I cannot get to work. I wanted to use an attention layer without the complication of an LSTM. The code below is designed to work for an input matrix of tokens (derived from text), categorised as 'positive', 'negative' or 'neutral' (coded as 0, 1 and 2). The dummy random data has 1000 instances (rows), each with 10 tokens (columns). Each row has a label of 0, 1 or 2. The training/test split is 800:200. The problem is that although the model builds, it reports an output shape of (None, 64) (64 is the number of units in the first dense layer), and it therefore fails at the fitting stage. I cannot get the final dense layer to output the required (batch_size, 3) tensor. Can anybody help, please? Many thanks.

library(keras)
library(tensorflow)

set.seed(1234)
n_samples <- 1000
features <- matrix(rnorm(n_samples * 10), nrow = n_samples, ncol = 10)
labels <- sample(0:2, size = n_samples, replace = TRUE)

#Convert labels to categorical variables
labels_categorical <- as.integer(labels)
labels_onehot <- to_categorical(labels)
labelsTrain <- labels_categorical[1:800]
labelsTest <- labels_categorical[801:1000]

# Split data into training and testing sets

train_set <- list(features[1:800, ], labels_onehot[1:800, ])
test_set <- list(features[801:1000, ], labels_onehot[801:1000, ])

# Convert data to tensors

X_train <- tf$convert_to_tensor(train_set[[1]])
Y_train <- tf$convert_to_tensor(train_set[[2]])
X_test <- tf$convert_to_tensor(test_set[[1]])
Y_test <- tf$convert_to_tensor(test_set[[2]])

# Reshape to 3D: (samples, timesteps, features)

X_train <- tf$reshape(X_train, as_tensor(shape(800L, 10L, 1L)))
X_test <- tf$reshape(X_test, as_tensor(shape(200L, 10L, 1L)))

# Build the model

model <- keras_model_sequential()
model %>%
  layer_dense(units = 64, activation = 'relu', input_shape = 10) %>%
  layer_attention(inputs = list(X_train, X_train)) %>%
  layer_flatten() %>%
  #tf$reshape(tf$shape(list()) + c(NULL, 3))
  layer_dense(units = 3)

model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.01),
  metrics = "accuracy",
  loss = "categorical_crossentropy"
)
summary(model)

# Fit the model

history <- model %>% fit(
  X_train, Y_train,
  batch_size = 32,
  epochs = 25,
  verbose = 1
)

################################################
These are the outputs:

Model: "sequential_3"

Layer (type)          Output Shape        Param #
dense_6 (Dense)       (None, 64)          704

Total params: 704
Trainable params: 704
Non-trainable params: 0

ValueError: Shapes (32, 3) and (32, 64) are incompatible

── R Traceback ─────────────────────────────────────────────────────────────────────────────────────────────

  1. ├─model %>% ...
  2. ├─generics::fit(...)
  3. └─keras:::fit.keras.engine.training.Model(...)
  4. ├─base::do.call(object$fit, args)
  5. └─reticulate (local) <python.builtin.method>(...)
  6. └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)
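
The summary above shows that only the first dense layer actually became part of the sequential model (704 parameters, output (None, 64)), which is why fit() complains that the (32, 3) labels do not match a (32, 64) output. Calling layer_attention with inputs = list(X_train, X_train) appears to apply attention to those fixed tensors rather than adding a layer to the pipeline. Below is a minimal, unverified sketch of one way to wire the attention layer into the graph with the functional API so the model ends in the required (batch_size, 3) output; it assumes X_train/Y_train as prepared above, and the layer arrangement (dense projection, self-attention, flatten) is only illustrative.

# Hedged sketch, not a verified fix: functional-API model with self-attention.
library(keras)

inputs <- layer_input(shape = c(10, 1))                            # 10 tokens, 1 feature each

projected <- inputs %>%
  layer_dense(units = 64, activation = 'relu')                     # (batch, 10, 64)

attended <- layer_attention(inputs = list(projected, projected))   # self-attention over the 10 positions

outputs <- attended %>%
  layer_flatten() %>%                                              # (batch, 640)
  layer_dense(units = 3, activation = 'softmax')                   # (batch, 3), matches the one-hot labels

model <- keras_model(inputs, outputs)

model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.01),
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)

history <- model %>% fit(X_train, Y_train, batch_size = 32, epochs = 25, verbose = 1)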
    

@alexmosc (Author)

alexmosc commented Jan 27, 2025

Thank you!

In the meantime I have pretty much switched to Python and PyTorch, which handle this kind of task well. I am using a multi-head-attention Transformer for multidimensional time-series classification, where the embeddings are the time-series values themselves (no tokens, no encoder).

Sorry for the off-topic, but in case you are interested in such a model, I have posted it here for you: kaggle
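
Purely as an illustration of the idea described above (this is not the poster's PyTorch model, which is on Kaggle; the sizes, names and layer choices here are made up), a rough R/keras analogue of multi-head self-attention over a multivariate time series might look like this:

# Illustrative sketch only: multi-head self-attention for time-series classification,
# where each timestep's feature vector plays the role of an embedding (no tokens).
library(keras)

seq_len   <- 50L   # timesteps (assumed)
n_vars    <- 8L    # variables per timestep (assumed)
n_classes <- 3L    # output classes (assumed)

inputs <- layer_input(shape = c(seq_len, n_vars))

x <- inputs %>%
  layer_dense(units = 32)                      # project each timestep to a model dimension

attn <- layer_multi_head_attention(
  inputs    = list(x, x),                      # self-attention: query = value
  num_heads = 4,
  key_dim   = 16
)

outputs <- layer_add(list(x, attn)) %>%        # residual connection
  layer_layer_normalization() %>%
  layer_global_average_pooling_1d() %>%        # pool over time
  layer_dense(units = n_classes, activation = 'softmax')

model <- keras_model(inputs, outputs)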

Best regards,
Alexey
