I get Error with CONFIG = {"generation_config": {"response_modalities": ["AUDIO","TEXT"]}} in gemini-2/live_api_starter.py #386

Open
abdul7235 opened this issue Dec 27, 2024 · 15 comments

@abdul7235

abdul7235 commented Dec 27, 2024

Description of the bug:

From the documentation:

https://ai.google.dev/api/multimodal-live

I believe I can get a response in multiple modalities, but running the above CONFIG in my code gives the following error:

websockets.exceptions.ConnectionClosedError: received 1007 (invalid frame payload data) Request trace id: a0bb7a2dd8834b47, [ORIGINAL ERROR] generic::invalid_argument: Error in program Instantiation for language; then sent 1007 (invalid frame payload data) Request trace id: a0bb7a2dd8834b47, [ORIGINAL ERROR] generic::invalid_argument: Error in program Instantiation for language

Why am I getting this error?

Actual vs expected behavior:

Is it possible to get a response in audio and text simultaneously?

If yes, please help me sort it out.

If not, then for goodness' sake it must be stated clearly in the documentation!

Any other information you'd like to share?

I would appreciate it if Google and GCP had properly organized and clear documentation for their services, APIs, and SDKs. Integrating Google's services is a horrible experience because the documentation is so scattered and vague.

@LarsDu

LarsDu commented Dec 29, 2024

I'm getting the same error. It would be nice to get both text and audio at the same time. This is particularly useful for generating dialogues for things like games...

@Giom-V
Collaborator

Giom-V commented Dec 30, 2024

Hello @abdul7235 and @LarsDu,

At the moment, multimodal output is not publicly available. You can only get audio from the live APIs, and text from the "classic" ones.

I'll see what can be done to make that clear in the documentation.
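
For reference, a minimal sketch of a CONFIG that stays within the constraint described above, assuming the only change needed is to request a single modality. The helper `check_single_modality` is hypothetical and not part of any Google library; it only catches the multi-modality case locally, before the websocket is opened.

```python
# Hypothetical local guard; NOT part of the Gemini SDK or the cookbook.
def check_single_modality(config: dict) -> dict:
    """Raise early if the config does not request exactly one response modality."""
    modalities = config.get("generation_config", {}).get("response_modalities", [])
    if len(modalities) != 1:
        raise ValueError(
            f"Expected exactly one response modality, got {modalities!r}"
        )
    return config


# Audio-only config, following the reply above (the live API returns audio).
CONFIG = check_single_modality(
    {"generation_config": {"response_modalities": ["AUDIO"]}}
)
```

Passing `["AUDIO", "TEXT"]` to this helper raises a `ValueError` locally instead of surfacing later as a 1007 close from the server.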

@Giom-V Giom-V added the type:documentation The documentation needs to be updated label Dec 30, 2024
@Giom-V Giom-V self-assigned this Dec 30, 2024
@abdul7235
Author

@Giom-V

Will multimodal output be available in the coming months?

Also, could you guide me on how to gain access to the non-public multimodal API?

@kshitij01042002

Is it possible to get the Audio and function calling? @Giom-V

@simix

simix commented Jan 4, 2025

This happens when you send unsafe prompts.
I'm looking for a way to block it. I tried this, and it doesn't work:

self.safety_settings = [
    {
        "category": "HARM_CATEGORY_DANGEROUS",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUAL",
        "threshold": "BLOCK_NONE",
    },
]

@simix

simix commented Jan 4, 2025

Now this one works for me:

self.safety_settings = [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE",
    },
]

@kshitij01042002

kshitij01042002 commented Jan 6, 2025

I am getting this error, any idea why I might be getting this?

Error in Gemini session: received 1007 (invalid frame payload data) Request trace id: 4bdc******bb063, No matching per-key-config for API key_salt: 7448*****67170175; then sent 1007 (invalid frame payload data) Request trace id: 4bdc6000403bb063, No matching per-key-config for API key_salt: 74484*****170175

@Giom-V
Collaborator

Giom-V commented Jan 6, 2025

> Will multimodal output be available in the coming months?

Yes it will.

> Also, could you guide me on how to gain access to the non-public multimodal API?

@abdul7235, sorry, but this is not a program you can request to join. The only way is to be very active (like GDEs), and we will reach out to you.

@Giom-V
Collaborator

Giom-V commented Jan 6, 2025

> Is it possible to get the Audio and function calling? @Giom-V

@kshitij01042002 This notebook will show you how to use function calling with the live API: https://github.com/google-gemini/cookbook/blob/main/gemini-2/live_api_tool_use.ipynb

@Giom-V
Collaborator

Giom-V commented Jan 6, 2025

> I am getting this error, any idea why I might be getting this?
>
> Error in Gemini session: received 1007 (invalid frame payload data) Request trace id: 4bdc******bb063, No matching per-key-config for API key_salt: 7448*****67170175; then sent 1007 (invalid frame payload data) Request trace id: 4bdc6000403bb063, No matching per-key-config for API key_salt: 74484*****170175

@kshitij01042002 I'm guessing you're not using the right API key. It should start with "AIza...", which does not seem to be the case for the one you are using.

You need to generate it on AI Studio as documented here.
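
A quick local check along these lines can catch a wrong key before connecting. This is only a sketch: the "AIza" prefix comes from the comment above, while the helper name `looks_like_ai_studio_key` and the `GOOGLE_API_KEY` environment variable name are assumptions for illustration. A matching prefix does not prove the key is valid.

```python
import os


# Hypothetical helper; it checks only the prefix mentioned in this thread.
def looks_like_ai_studio_key(key: str) -> bool:
    """Return True if the key has the 'AIza' prefix expected of AI Studio keys."""
    return key.startswith("AIza")


# Assumption: the key is stored in the GOOGLE_API_KEY environment variable.
api_key = os.environ.get("GOOGLE_API_KEY", "")
if not looks_like_ai_studio_key(api_key):
    print("Warning: GOOGLE_API_KEY does not look like an AI Studio key.")
```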

@Giom-V
Collaborator

Giom-V commented Jan 6, 2025

> This happens when you send unsafe prompts.
> I'm looking for a way to block it. I tried this, and it doesn't work:

@simix I think your mistake is that you were using HARM_CATEGORY_DANGEROUS instead of HARM_CATEGORY_DANGEROUS_CONTENT. Here's the related documentation for reference.
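
To spot this kind of naming mistake locally, one option is to compare each category against a known-good set before sending the settings. The sketch below assumes the four categories from the configuration reported as working in this thread; it is not an exhaustive list of what the API accepts, and `unknown_categories` is a hypothetical helper.

```python
# Categories taken from the configuration reported as working in this thread.
# Assumption: this is not necessarily the full set the API accepts.
KNOWN_WORKING_CATEGORIES = {
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
}


def unknown_categories(safety_settings: list) -> list:
    """Return the category names that are not in the known-working set."""
    return [
        setting["category"]
        for setting in safety_settings
        if setting["category"] not in KNOWN_WORKING_CATEGORIES
    ]


settings = [
    {"category": "HARM_CATEGORY_DANGEROUS", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]
print(unknown_categories(settings))  # prints ['HARM_CATEGORY_DANGEROUS']
```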

@abdul7235
Author

@Giom-V I just need to reconfirm: will I be able to get a response from Gemini in audio + text in the upcoming version?

E.g., if I ask "Hello Gemini, tell me about the weather.", can I get the response in audio and also get what Gemini is speaking as text? I mean, I need the same response in both audio and text.

@Giom-V
Collaborator

Giom-V commented Jan 15, 2025

@abdul7235 I don't think it will be possible, as Gemini generates the audio output directly, without using a TTS mechanism. If you want both (and I can see why), I think you'll have to generate text, then use a TTS service to generate the audio.
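
The "generate text, then TTS" approach can be sketched as a two-step pipeline. Everything here is illustrative: `text_and_audio`, `fake_generate_text`, and `fake_synthesize_speech` are hypothetical names, and the stand-ins would be replaced with a real text-generation call and a real TTS client of your choice.

```python
from typing import Callable, Tuple


def text_and_audio(
    prompt: str,
    generate_text: Callable[[str], str],
    synthesize_speech: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Generate text first, then synthesize audio from that same text."""
    text = generate_text(prompt)
    audio = synthesize_speech(text)
    return text, audio


# Stand-ins for illustration only; they do not call any real service.
def fake_generate_text(prompt: str) -> str:
    return f"(model reply to: {prompt})"


def fake_synthesize_speech(text: str) -> bytes:
    return text.encode("utf-8")


text, audio = text_and_audio(
    "Hello Gemini, tell me about the weather.",
    fake_generate_text,
    fake_synthesize_speech,
)
```

Because the audio is produced from the returned text, the two outputs match by construction, which is the property asked for above.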

@ArthurG

ArthurG commented Jan 17, 2025

With the live-api on websockets [1], is there a way to adjust the safety params [2]? I couldn't see it in the source code of the python lib or the docs @Giom-V

[1] https://github.com/google-gemini/cookbook/blob/main/gemini-2/live_api_tool_use.ipynb
[2] https://ai.google.dev/api/generate-content#v1beta.HarmCategory

@Giom-V
Collaborator

Giom-V commented Jan 20, 2025

@ArthurG I don't think you can at the moment.
