I read your paper, "OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages" (https://arxiv.org/abs/2110.05877).
On page 4, you state:
"For the RNN model, we use a 4-layered bidirectional LSTM with a hidden layer dimension of 128, which takes as input the frame-wise pose representation of 27 keypoints with 2 coordinates each, resulting in a vector of 54 points per frame. We also use a temporal attention layer to weight the most effective frames for classification."
However, I couldn't find a definition of "temporal attention" as used in your method. Could you please explain it?
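For reference, here is my best guess at what the layer might look like: a minimal PyTorch sketch assuming a standard additive attention pooling over the BiLSTM's frame-wise outputs, with a learned per-frame score softmaxed over time. The class name, the attention formulation, and the default hyperparameters other than those quoted above (54 input features, 4 layers, hidden size 128) are my own placeholders, not taken from the paper or the OpenHands codebase. Is this roughly what you do?

```python
import torch
import torch.nn as nn

class BiLSTMWithTemporalAttention(nn.Module):
    """BiLSTM encoder followed by soft temporal-attention pooling.

    Input sizes mirror the quoted description: 27 keypoints x 2
    coordinates = 54 features per frame, 4 BiLSTM layers, hidden
    size 128. The attention formulation itself is an assumption
    (additive attention pooling), not confirmed by the paper.
    """

    def __init__(self, input_dim=54, hidden_dim=128, num_layers=4, num_classes=100):
        super().__init__()
        self.lstm = nn.LSTM(
            input_dim, hidden_dim, num_layers=num_layers,
            batch_first=True, bidirectional=True,
        )
        # One scalar relevance score per frame, computed from that
        # frame's bidirectional hidden state (2 * hidden_dim).
        self.attn_score = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, 54) frame-wise pose features
        h, _ = self.lstm(x)                     # (batch, time, 256)
        scores = self.attn_score(h)             # (batch, time, 1)
        weights = torch.softmax(scores, dim=1)  # attention over frames
        context = (weights * h).sum(dim=1)      # weighted sum of frames
        return self.classifier(context)         # (batch, num_classes)


# Example: a batch of 8 clips, 64 frames each
model = BiLSTMWithTemporalAttention(num_classes=50)
logits = model(torch.randn(8, 64, 54))
print(logits.shape)  # torch.Size([8, 50])
```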