This repository contains a sequence-to-sequence model that automatically generates captions for images.
The solution architecture consists of two modules (a sketch of both follows this list):
- A CNN encoder, which encodes each image into an embedded feature vector.
- A decoder, a sequential neural network built from LSTM units, which translates the feature vector into a sequence of caption tokens.
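A minimal sketch of such an encoder-decoder pair in PyTorch is shown below. Class names, layer sizes, and the choice of a pretrained ResNet-50 backbone are illustrative assumptions, not necessarily the repository's actual implementation:

```python
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """Encodes an image into a fixed-size embedded feature vector (assumed design)."""

    def __init__(self, embed_size):
        super().__init__()
        # Pretrained ResNet-50 backbone with its classification head removed.
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        for param in resnet.parameters():
            param.requires_grad_(False)           # keep the backbone frozen
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.backbone(images)          # (batch, 2048, 1, 1)
        features = features.flatten(start_dim=1)  # (batch, 2048)
        return self.embed(features)               # (batch, embed_size)


class DecoderRNN(nn.Module):
    """Translates an image feature vector into a sequence of caption tokens."""

    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Drop the <end> token, embed the remaining caption tokens, and prepend
        # the image features as the first "word" of the input sequence.
        embeddings = self.word_embed(captions[:, :-1])
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)                   # (batch, seq_len, vocab_size)
```

During training, the decoder is typically fed the ground-truth caption tokens (teacher forcing); at inference time, tokens are generated one at a time by feeding each predicted word back into the LSTM.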
The project has been reviewed and graded by Udacity, earning a "Meets Specifications" assessment against the following rubric.