This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Friends Dataset Teacher Add speakers field and flag to exclude speaker labels in text #4693

Merged
chiehminwei merged 4 commits into main from friends_speaker_labels on Aug 1, 2022

Conversation

chiehminwei
Contributor

Patch Description
This patch adds a speakers field to the messages generated by the teacher for the Friends dataset.

It also adds an option to exclude speaker IDs from the text. The new option takes the name --include-speaker-in-context. The original flag of that name, which determines whether the speaker label is added at the end of the text, is renamed to --add-speaker-to-context-end.

Together, these changes make it convenient to feed the data into downstream models such as BlenderBot2 that require cleaned text input without speaker labels, but need to restore the speaker labels later.
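
As a rough illustration of the "restore the speaker labels later" step, here is a minimal sketch. The message layout is an assumption made for this example only: it supposes that text holds newline-separated utterances and that speakers holds exactly one label per utterance. The helper name restore_speaker_labels is hypothetical and is not part of ParlAI.

```python
def restore_speaker_labels(message):
    """Re-attach speaker labels to label-free text.

    Assumption (not confirmed by the PR): message["text"] contains one
    utterance per line and message["speakers"] has one entry per utterance.
    """
    utterances = message["text"].split("\n")
    speakers = message["speakers"]
    if len(utterances) != len(speakers):
        raise ValueError(
            f"{len(utterances)} utterances but {len(speakers)} speakers"
        )
    # Prefix every utterance with its speaker, e.g. "Monica: hi".
    return "\n".join(f"{s}: {u}" for s, u in zip(speakers, utterances))


# Hypothetical message, shaped as assumed above.
example = {"text": "hi\nhello", "speakers": ["Monica", "Joey"]}
labeled = restore_speaker_labels(example)
```

If the real messages carry an extra speaker entry (for instance, for the upcoming turn), the length check above would need to be relaxed accordingly.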

Sample Output
parlai dd -t friends -n 2 --verbose --include-speaker-in-context True --add-speaker-to-context-end True
This is the default behavior.
image

parlai dd -t friends -n 2 --verbose --include-speaker-in-context True --add-speaker-to-context-end False
image

parlai dd -t friends -n 2 --verbose --include-speaker-in-context False --add-speaker-to-context-end True
Notice the empty line, which marks an empty sentence, and that the current speaker label is also added to the speakers field.
image

parlai dd -t friends -n 2 --verbose --include-speaker-in-context False --add-speaker-to-context-end False
image

@chiehminwei
Contributor Author

I took a look at the failing tests, and they don't seem to be related to this PR.
Something else on the main branch is broken.
For the GPU tests, it's the same old sqlite3.OperationalError: database or disk is full.
For the teacher tests it's the parlai/tasks/multiwoz_v22 teacher that's failing.

text, label, speakers, hasAddedSpeaker = self._get_message_fields(
text, speaker, speakers, prev_context
)
_speakers = speakers[:]
Contributor

curious, why do you do this?

Contributor Author

Pass by reference vs. pass by value.
Without the copy, every example holds a reference to the same list, so every example ends up the same as the last one, containing all the speakers in the entire episode (I ran into this bug).
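
The aliasing issue can be reproduced outside the teacher. The following is a minimal sketch, not the teacher's actual code: the function names, the turn data, and the message dictionaries are all made up for illustration. It shows how appending to one shared list changes every example that references it, and how a slice copy (`speakers[:]`) avoids that.

```python
def build_examples_aliased(turns):
    """Buggy variant: every example points at the same list object."""
    speakers = []
    examples = []
    for speaker, text in turns:
        speakers.append(speaker)
        # No copy: later appends are visible through this reference too.
        examples.append({"text": text, "speakers": speakers})
    return examples


def build_examples_copied(turns):
    """Fixed variant: each example gets its own snapshot of the list."""
    speakers = []
    examples = []
    for speaker, text in turns:
        speakers.append(speaker)
        # speakers[:] makes a shallow copy, frozen at this turn.
        examples.append({"text": text, "speakers": speakers[:]})
    return examples


# Hypothetical episode: (speaker, utterance) pairs.
turns = [("Monica", "hi"), ("Joey", "hello"), ("Chandler", "hey")]

aliased = build_examples_aliased(turns)
copied = build_examples_copied(turns)

print(aliased[0]["speakers"])  # ['Monica', 'Joey', 'Chandler']
print(copied[0]["speakers"])   # ['Monica']
```

A shallow copy is enough here because the list elements are strings, which are immutable.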

Contributor

@mojtaba-komeili mojtaba-komeili left a comment

Thanks!

@chiehminwei chiehminwei merged commit 982acb5 into main Aug 1, 2022
@chiehminwei chiehminwei deleted the friends_speaker_labels branch August 1, 2022 20:50