Update modjo import script #33
Merged
Dust has recently changed the endpoint for upserting documents to the datasource. This commit uses the new endpoint. See: https://docs.dust.tt/reference/post_api-v1-w-wid-vaults-vid-data-sources-dsid-documents-documentid
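A minimal sketch of how the new document URL could be assembled. The path segments are inferred from the slug of the docs link above and are an assumption, as are the `https://dust.tt` base URL, the `buildUpsertUrl` helper, and its parameter names; check the linked reference before relying on it.

```typescript
// Hypothetical parameter bag; the names are illustrative, not taken from the script.
interface UpsertTarget {
  workspaceId: string;
  vaultId: string;
  dataSourceId: string;
  documentId: string;
}

// Builds the document upsert URL. The path layout is an assumption
// derived from the documentation slug, not a confirmed API contract.
function buildUpsertUrl(target: UpsertTarget, baseUrl: string = "https://dust.tt"): string {
  const segments = [
    "api", "v1",
    "w", target.workspaceId,
    "vaults", target.vaultId,
    "data_sources", target.dataSourceId,
    "documents", target.documentId,
  ];
  // Encode each segment so document ids containing spaces or slashes stay valid.
  return baseUrl + "/" + segments.map((s) => encodeURIComponent(s)).join("/");
}
```

Keeping the URL construction in one function means a future endpoint change only touches one place.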
The first time I ran the script, I didn't realize the `TRANSCRIPTS_SINCE` configuration was a constant set directly in the script.
Previously, the `TRANSCRIPTS_SINCE` setting was set directly in the script. This is inconvenient if you want to run the script on a schedule. This commit makes the `TRANSCRIPTS_SINCE` setting configurable via an environment variable, while preserving the initial behavior.
…esterday
Previously, if `TRANSCRIPTS_SINCE` was not set, the default behavior was to fetch all transcripts starting from 2024-01-01. This commit changes the default behavior to fetch transcripts starting from yesterday. Although this is a breaking change, it is reasonable to think this won't hurt anyone.
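A minimal sketch of how the setting could be resolved, assuming the value is an ISO date string. Only the `TRANSCRIPTS_SINCE` name comes from this PR; the `resolveTranscriptsSince` helper is hypothetical, and "yesterday" is interpreted here as 24 hours before the current time, which may differ from the script's actual behavior.

```typescript
// Environment passed in explicitly so the helper is easy to test.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the export start date as an ISO string.
function resolveTranscriptsSince(env: Env, now: Date = new Date()): string {
  const raw = env["TRANSCRIPTS_SINCE"];
  if (raw !== undefined && raw.trim() !== "") {
    const parsed = new Date(raw);
    // Fail early on values that cannot be parsed as a date.
    if (isNaN(parsed.getTime())) {
      throw new Error(`Invalid TRANSCRIPTS_SINCE value: ${raw}`);
    }
    return parsed.toISOString();
  }
  // Assumed default: 24 hours before "now".
  const yesterday = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return yesterday.toISOString();
}
```

In a scheduled job this would typically be called as `resolveTranscriptsSince(process.env)`, so a daily run picks up the previous day's calls without editing the script.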
That would allow Dust to link to a specific transcript in Modjo if needed.
Previously, the `AI Summary` field was not correctly rendered in the Dust document. It was displaying:
```
AI Summary: [object Object]
```
Now it displays the actual `content` of the AI Summary.
The 'AI Summary' field is some multiline markdown text. All 'Speakers', 'AI Summary' and 'Transcript' sections are multiline blocks, so I'm using markdown section headers to denote them.
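A minimal sketch of the rendering described in the two commits above, assuming the summary is an object with a `content` string. The `renderSection` and `renderTranscriptDocument` helpers, the `##` header level, and the argument shapes are hypothetical.

```typescript
// `content` is the field named in this PR; the rest of the shape is assumed.
interface AiSummary {
  content: string;
}

// Renders one multiline block under a markdown section header.
function renderSection(title: string, body: string): string {
  return `## ${title}\n\n${body.trim()}\n`;
}

function renderTranscriptDocument(
  speakers: string[],
  summary: AiSummary | null,
  transcript: string
): string {
  const sections = [renderSection("Speakers", speakers.join("\n"))];
  if (summary !== null) {
    // Interpolating `summary` itself would print "[object Object]";
    // the text lives in `summary.content`.
    sections.push(renderSection("AI Summary", summary.content));
  }
  sections.push(renderSection("Transcript", transcript));
  return sections.join("\n");
}
```

Section headers keep each multiline block visually separate, which a single `Label: value` line cannot do.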
The 'AI Summary' field is deprecated, in favor of the 'Highlights' field. See: https://api.modjo.ai/v1#tag/calls/operation/export-calls
In some cases, we might not want to ingest contact details in Dust documents. This commit adds an option for this. If not set, the default behavior is to include contact details, so we preserve the previous behavior.
This gives the user the option to skip ingesting the recording URL.
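A minimal sketch of how the two opt-out options could be read. The variable names `INCLUDE_CONTACT_DETAILS` and `INCLUDE_RECORDING_URL`, the `CallInfo` shape, and both helpers are hypothetical. Defaulting the recording URL to "included" is also an assumption; the PR only states that default for contact details.

```typescript
type EnvVars = Record<string, string | undefined>;

// Hypothetical shape holding only the fields this sketch needs.
interface CallInfo {
  contactEmail?: string;
  recordingUrl?: string;
}

// Reads a boolean flag, falling back to the default when unset or empty.
function readFlag(env: EnvVars, name: string, defaultValue: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return defaultValue;
  }
  const value = raw.trim().toLowerCase();
  return value === "true" || value === "1" || value === "yes";
}

// Returns the optional document lines that the flags allow.
function buildOptionalLines(call: CallInfo, env: EnvVars): string[] {
  const lines: string[] = [];
  if (readFlag(env, "INCLUDE_CONTACT_DETAILS", true) && call.contactEmail) {
    lines.push(`Contact: ${call.contactEmail}`);
  }
  if (readFlag(env, "INCLUDE_RECORDING_URL", true) && call.recordingUrl) {
    lines.push(`Recording: ${call.recordingUrl}`);
  }
  return lines;
}
```

Defaulting both flags to `true` keeps the previous output for anyone who does not set them.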
albandum approved these changes on Oct 23, 2024
Thx for the upgrades 🙏
What
We wanted to import Modjo transcripts in Dust and saw this existing script. However, it didn't entirely fit our needs, so we made a few improvements.
This PR includes small fixes:
Along with additional features:
Notes for reviewers
@albandum, `TRANSCRIPTS_SINCE` was hardcoded within the script itself, which wasn't convenient if we wanted to tweak its value in the context of a scheduled job. We made it configurable through an environment variable and introduced a small breaking change: by default, the script now ingests every transcript since yesterday, not since `2024-01-01`. I guess it is a matter of preference, so I'll understand if you prefer to keep the existing behavior: I can amend the PR.