attention_layer application #1241
Comments
I have a similar problem that I cannot get to work. I wanted to use an attention layer without the complication of an LSTM. The code below is designed to work on an input matrix of tokens (derived from text), categorised as 'positive', 'negative' or 'neutral' (coded as 0, 1 and 2). The dummy random data has 1000 instances (rows), each with 10 tokens (columns), and each row has a label 0, 1 or 2. The training/test split is 800:200. The problem is that although the model builds, it reports an output shape of (None, 64) (64 is the number of units in the first dense layer), so it fails at the fitting stage. I cannot get the final dense layer to output the required (batch_size, 3) tensor. Can anybody help, please? Many thanks.

```r
set.seed(1234)

# Convert labels to categorical variables
# Split data into training and testing sets
train_set <- list(features[1:800, ], labels_onehot[1:800, ])

# Convert data to tensor
X_train <- tf$convert_to_tensor(train_set[[1]])

# Reshape
X_train <- tf$reshape(X_train, as_tensor(shape(800L, 10L, 1L)))

# Build the model
model <- keras_model_sequential()
model %>% compile(

# Fit the model
history <- model %>% fit(
```

The model summary and error:

```
Layer (type)      Output Shape   Param #
dense_6 (Dense)   (None, 64)     704
Total params: 704

ValueError: Shapes (32, 3) and (32, 64) are incompatible
── R Traceback ─────────────────────────────────────────
```
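The summary (only dense_6 with output shape (None, 64) and 704 params) together with the error suggests the model stops at the 64-unit dense layer while the one-hot labels are 3 columns wide, hence "Shapes (32, 3) and (32, 64) are incompatible". Below is a minimal, self-contained sketch of one way to wire this up without an LSTM. It is an assumption-laden illustration rather than the original code: the dummy data, layer sizes and optimizer are made up, and it uses the functional API because layer_attention() takes a list(query, value) (passing the same tensor twice gives plain self-attention). The key part is the final layer_dense(units = 3, activation = "softmax").

```r
library(keras)

# Dummy data: 1000 rows of 10 token ids, labels in {0, 1, 2} (illustrative only)
set.seed(1234)
features      <- matrix(sample(0:99, 1000 * 10, replace = TRUE), nrow = 1000)
labels        <- sample(0:2, 1000, replace = TRUE)
labels_onehot <- to_categorical(labels, num_classes = 3)

x_train <- array_reshape(features[1:800, ],    c(800, 10, 1))
y_train <- labels_onehot[1:800, ]
x_test  <- array_reshape(features[801:1000, ], c(200, 10, 1))
y_test  <- labels_onehot[801:1000, ]

# Functional API: layer_attention() expects list(query, value);
# using the same tensor for both gives self-attention over the 10 tokens.
inputs    <- layer_input(shape = c(10, 1))
attended  <- layer_attention(list(inputs, inputs))
flattened <- layer_flatten(attended)
hidden    <- layer_dense(flattened, units = 64, activation = "relu")
outputs   <- layer_dense(hidden, units = 3, activation = "softmax")  # (batch_size, 3)

model <- keras_model(inputs, outputs)

model %>% compile(
  optimizer = "adam",
  loss      = "categorical_crossentropy",
  metrics   = "accuracy"
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 32,
  validation_data = list(x_test, y_test)
)
```

With a flatten before the dense head, the final output shape is (batch_size, 3), matching the one-hot labels; layer_global_average_pooling_1d() could replace layer_flatten() if you prefer a summary that does not depend on sequence length.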
Thank you! I had already gone a long way toward switching to Python and PyTorch, which handle this kind of task well. I am using a multi-head attention Transformer for multidimensional timeseries classification, where the embeddings are the timeseries values themselves (no tokens, no encoder). Sorry for the off-topic, but in case you are interested in such a model, I have placed it here for you: kaggle. Best regards,
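For anyone who wants to stay in R rather than PyTorch, a rough sketch of that kind of encoder-free attention classifier with the keras package is below. The shapes (120 timesteps, 6 channels, 3 classes), head count and pooling are illustrative assumptions, not taken from the linked kaggle notebook, and layer_multi_head_attention() needs a reasonably recent keras/TensorFlow installation.

```r
library(keras)

# Illustrative shapes: 120 timesteps, 6 channels, 3 classes (assumptions)
inputs <- layer_input(shape = c(120, 6))

# Multi-head self-attention directly over the raw timeseries values
# (no token embedding): the same tensor serves as query and value.
att <- layer_multi_head_attention(list(inputs, inputs), num_heads = 4, key_dim = 16)

pooled  <- layer_global_average_pooling_1d(att)
outputs <- layer_dense(pooled, units = 3, activation = "softmax")

model <- keras_model(inputs, outputs)
```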
I tried hard to find an example of layer_attention for Keras in RStudio's library, but didn't find one.
I got the net working in two fashions (see the sketch below):

LSTM(return_sequences = TRUE) -> Attention -> LSTM(return_sequences = FALSE)
LSTM(return_sequences = TRUE) -> Attention -> Flatten

before the dense layers. Note that in my code layer_flatten is commented out; you could comment out the second layer_lstm instead. Both approaches output a 1D tensor per sample, which at least seems to fit the expected dimensionality of the network output. Which way is correct, or more sensible? I am not very experienced in this field...
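A minimal sketch of the two arrangements in the R keras functional API, assuming the same toy shapes as the first post (10 timesteps, 1 feature, 3 classes); the unit counts are placeholders, and layer_attention() again receives the same tensor as query and value:

```r
library(keras)

inputs  <- layer_input(shape = c(10, 1))
seq_out <- layer_lstm(inputs, units = 32, return_sequences = TRUE)
att     <- layer_attention(list(seq_out, seq_out))  # self-attention over the LSTM outputs

# Variant 1: collapse the time dimension with a second LSTM
out <- layer_lstm(att, units = 32, return_sequences = FALSE)
# Variant 2: flatten the attended sequence instead
# out <- layer_flatten(att)

outputs <- layer_dense(out, units = 3, activation = "softmax")
model   <- keras_model(inputs, outputs)
```

Both variants hand a 2-D (batch, features) tensor to the dense head; the flatten version keeps per-timestep detail (and more parameters), while the second LSTM summarises the sequence, so either can be sensible depending on the data.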