Summarize a video file with audio with Gemini 1.5 Pro feature but in pydantic #713

Open
RexsyBima opened this issue Jan 19, 2025 · 1 comment

RexsyBima commented Jan 19, 2025

I once came across this sample in the Google Vertex AI docs:

https://cloud.google.com/vertex-ai/generative-ai/docs/samples/generativeaionvertexai-gemini-video-with-audio#generativeaionvertexai_gemini_video_with_audio-python

It shows how to summarize a video, including its audio, with Vertex AI. Can the same thing be done with PydanticAI's Vertex AI model? If so, how? The Vertex AI documentation does it like this:

import vertexai
from vertexai.generative_models import GenerativeModel, Part

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

vertexai.init(project=PROJECT_ID, location="us-central1")

model = GenerativeModel("gemini-1.5-flash-002")

prompt = """
Provide a description of the video.
The description should also contain anything important which people say in the video.
"""

video_file = Part.from_uri(
    uri="gs://cloud-samples-data/generative-ai/video/pixel8.mp4",
    mime_type="video/mp4",
)

contents = [video_file, prompt]

response = model.generate_content(contents)
print(response.text)
# Example response:
# Here is a description of the video.
# ... Then, the scene changes to a woman named Saeko Shimada..
# She says, "Tokyo has many faces. The city at night is totally different
# from what you see during the day."
# ...

izzyacademy (Contributor) commented

@RexsyBima, similar issues were raised in #198 and #126

Multimodal support is coming in the future, but it is not available at the moment.

Some folks have been able to implement multimodal support with OpenAI models and PydanticAI:

https://github.com/rawheel/Pydantic-ai-MultiModal-Example

I'm not sure whether Gemini can do the same yet.
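
Until multimodal support lands, one possible stopgap is to keep calling the Vertex AI SDK directly and hide that call behind a small, swappable function. The sketch below is only an illustration and is not something confirmed in this thread: `VideoSummaryRequest`, `summarize_video`, and `vertex_generate` are hypothetical names, and `vertex_generate` simply repeats the calls from the Vertex AI sample quoted in the issue (it assumes `vertexai.init(...)` has already been called).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VideoSummaryRequest:
    """Inputs for one summarization call (hypothetical, not a PydanticAI type)."""

    uri: str
    prompt: str
    mime_type: str = "video/mp4"


def summarize_video(
    request: VideoSummaryRequest,
    generate: Callable[[VideoSummaryRequest], str],
) -> str:
    """Check the request, delegate to `generate`, and return the trimmed text."""
    if not request.mime_type.startswith("video/"):
        raise ValueError(f"expected a video MIME type, got {request.mime_type!r}")
    summary = generate(request).strip()
    if not summary:
        raise ValueError("backend returned an empty summary")
    return summary


def vertex_generate(request: VideoSummaryRequest) -> str:
    """Backend mirroring the Vertex AI sample from the issue.

    The SDK is imported lazily so this module loads without it installed.
    Assumes vertexai.init(project=..., location=...) was called beforehand.
    """
    from vertexai.generative_models import GenerativeModel, Part

    model = GenerativeModel("gemini-1.5-flash-002")
    video = Part.from_uri(uri=request.uri, mime_type=request.mime_type)
    return model.generate_content([video, request.prompt]).text
```

Calling `summarize_video(request, vertex_generate)` would hit the real API, while passing a stub callable keeps the function testable offline. Whether such a function can then be wired into a PydanticAI agent is untested here.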
