I get Error with CONFIG = {"generation_config": {"response_modalities": ["AUDIO","TEXT"]}} in gemini-2/live_api_starter.py #386
Comments
I'm getting the same error. It would be nice to get both text and audio at the same time. This is particularly useful for generating dialogues for things like games...
Hello @abdul7235 and @LarsDu, at the moment multimodal output is not available publicly. You can only get audio using the live APIs, and text using the "classic" ones. I'll see what can be done to make that clear in the documentation.
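For reference, a minimal sketch of those two paths, assuming the google-genai SDK used by gemini-2/live_api_starter.py (the model name is just an example and exact call names may have shifted since):

```python
# Minimal sketch, assuming the google-genai SDK used by
# gemini-2/live_api_starter.py; call names may have shifted since.
from google import genai

client = genai.Client(api_key="AIza...")  # your AI Studio key
MODEL = "gemini-2.0-flash-exp"

# Text: the "classic" generate_content API.
reply = client.models.generate_content(
    model=MODEL,
    contents="Hello Gemini, tell me about the weather.",
)
print(reply.text)

# Audio: the live API, requesting a single response modality.
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}
# Then connect as in live_api_starter.py, e.g.:
#   async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
#       ...  # stream the audio chunks from session.receive()
```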
Will multimodal output be available in the coming months? Also, could you guide me on how to gain access to the non-public multimodal API?
Is it possible to get audio output and function calling at the same time? @Giom-V
This is what happens when you send unsafe prompts: self.safety_settings = [
Now this one works for me: self.safety_settings = [
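(The contents of those two safety_settings lists were lost in the copy above. As a generic illustration only, safety settings in the Gemini Python SDKs are a list of category/threshold pairs along these lines:)

```python
# Illustrative shape only; these are NOT the values from the comments above,
# which were not preserved in the thread.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]
```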
I am getting this error; any idea why I might be getting it?
Yes, it will.
@abdul7235, sorry, but this is not a program you can request to join. The only way is to be very active (like GDEs are) and we will reach out to you.
@kshitij01042002 This notebook will show you how to use function calling with the live API: https://github.com/google-gemini/cookbook/blob/main/gemini-2/live_api_tool_use.ipynb
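For readers landing here, a rough sketch of the shape of a tool-enabled live API config, loosely based on that notebook (the turn_on_the_lights declaration is a hypothetical example, not taken from the notebook):

```python
# Hypothetical function declaration exposed to the model as a tool.
turn_on_the_lights = {"name": "turn_on_the_lights"}

CONFIG = {
    "generation_config": {"response_modalities": ["AUDIO"]},
    "tools": [{"function_declarations": [turn_on_the_lights]}],
}
# Pass CONFIG to client.aio.live.connect(...), then watch session.receive()
# for tool-call messages and send the function results back to the session.
```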
@kshitij01042002 I'm guessing you're not using the right API key. It should start with "AIza...", which is not the case for the one you seem to be using.
@Giom-V I just need to reconfirm: will I be able to get a response from Gemini in audio + text in the upcoming version? E.g. if I ask "Hello Gemini, tell me about the weather," can I get the response in audio and also, as text, the same thing Gemini is speaking? I mean I need the same response in both audio and text.
@abdul7235 I don't think it will be possible, as Gemini generates the audio output directly, without using a TTS mechanism. If you want both (and I can see why), I think you'll have to generate text and then use a TTS service to generate the audio.
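A sketch of that workaround, assuming the google-genai SDK for the text and the google-cloud-texttospeech package for the audio (any TTS service would work):

```python
from google import genai
from google.cloud import texttospeech

client = genai.Client(api_key="AIza...")

# 1. Generate the reply as text with the "classic" API.
reply = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Hello Gemini, tell me about the weather.",
)
print(reply.text)  # the text you display

# 2. Synthesize the same text to audio with a TTS service.
tts = texttospeech.TextToSpeechClient()
audio = tts.synthesize_speech(
    input=texttospeech.SynthesisInput(text=reply.text),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)
with open("reply.mp3", "wb") as f:
    f.write(audio.audio_content)  # the spoken version of the same reply
```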
With the live API on websockets [1], is there a way to adjust the safety params [2]? I couldn't see it in the source code of the Python lib or the docs. @Giom-V [1] https://github.com/google-gemini/cookbook/blob/main/gemini-2/live_api_tool_use.ipynb
@ArthurG I don't think you can at the moment.
Description of the bug:
From the documentation:
https://ai.google.dev/api/multimodal-live
I believe I can get responses in multiple modalities, but running the above CONFIG in my code I get the following error:
websockets.exceptions.ConnectionClosedError: received 1007 (invalid frame payload data) Request trace id: a0bb7a2dd8834b47, [ORIGINAL ERROR] generic::invalid_argument: Error in program Instantiation for language; then sent 1007 (invalid frame payload data) Request trace id: a0bb7a2dd8834b47, [ORIGINAL ERROR] generic::invalid_argument: Error in program Instantiation for language
Why am I getting this error?
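(Per the maintainers' replies above, the live API currently accepts only one response modality at a time, so a config along these lines avoids the error:)

```python
# A single modality works; ["AUDIO", "TEXT"] together is what triggers the
# 1007 invalid-argument close, per the comments above.
CONFIG = {"generation_config": {"response_modalities": ["AUDIO"]}}  # or ["TEXT"]
```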
Actual vs expected behavior:
Is it possible to have the response in audio and text simultaneously?
If yes, please help me sort it out.
If no, then for goodness' sake it must be mentioned clearly in the documentation!
Any other information you'd like to share?
I would appreciate it if Google and GCP had properly organized and clear documentation for their services, APIs and SDKs. It is such a horrible experience integrating Google's services because the documentation is so scattered and vague.