
What is "Temporal Attention" as used alongside the RNN in your paper? #52

Open
argadewanata opened this issue Jun 20, 2024 · 0 comments

@argadewanata

I read your paper "OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages" (https://arxiv.org/abs/2110.05877).

On page 4, you state:
"For the RNN model, we use a 4-layered bidirectional LSTM with a hidden layer dimension of 128, which takes as input the frame-wise pose representation of 27 keypoints with 2 coordinates each, resulting in a vector of 54 points per frame. We also use a temporal attention layer to weight the most effective frames for classification."

However, I couldn't find a definition of "temporal attention" as used in your method. Could you please explain it?
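To make the question concrete, here is a minimal PyTorch sketch of what I would usually understand by a "temporal attention" layer on top of a bidirectional LSTM: a learned per-frame score, softmaxed over the time axis and used to pool the frame-wise outputs into a single vector for classification. The class name, the number of classes, and the exact form of the scoring function are my own assumptions, not taken from your code. Is this roughly the mechanism you use, or is it something different?

```python
import torch
import torch.nn as nn

class TemporalAttentionLSTM(nn.Module):
    """My assumed reading of the paper's RNN + temporal attention model."""

    def __init__(self, input_dim=54, hidden_dim=128, num_layers=4, num_classes=100):
        super().__init__()
        # 4-layer bidirectional LSTM over 54-dim frame-wise pose features,
        # as described in the paper; num_classes=100 is a placeholder.
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        # One scalar score per frame (my assumed form of the attention layer).
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, frames, 54) pose keypoints flattened per frame
        h, _ = self.lstm(x)                     # (batch, frames, 2*hidden_dim)
        scores = self.attn(h)                   # (batch, frames, 1)
        weights = torch.softmax(scores, dim=1)  # normalize across frames
        context = (weights * h).sum(dim=1)      # attention-weighted pooling
        return self.classifier(context)
```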
