Mike Powers
Gemini 1.5 Pro Audio file support in testing with enterprise users

Google unveiled Gemini 1.5 Pro in mid-February, surprising AI fans with a massive upgrade to its large language model (LLM). Gemini Pro powers the free Gemini product that anyone can access. Gemini Ultra is the version you have to pay for, via a Google One subscription.

Gemini 1.5 Pro is already as powerful as Ultra and recently got a significant upgrade: a context window of up to 1 million tokens. That means you can feed it prompts of around 700,000 words, over 30,000 lines of code, 11 hours of audio, or 1 hour of video content.
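
For a rough sense of scale, the quoted figures imply about 0.7 words per token and roughly 25 tokens per second of audio. The Python sketch below is only a back-of-the-envelope estimate built from those numbers, not anything published by Google. The function names are hypothetical, and real token counts vary with the content and the tokenizer.

```python
import math

# Figures quoted in the article for the 1.5 Pro context window.
CONTEXT_TOKENS = 1_000_000   # "up to 1 million tokens"
QUOTED_WORDS = 700_000       # "around 700,000 words"
QUOTED_AUDIO_HOURS = 11      # "11 hours of audio"


def estimate_tokens(words: int = 0, audio_minutes: float = 0.0) -> int:
    """Estimate tokens by scaling linearly from the quoted limits (an assumption)."""
    text_tokens = words * CONTEXT_TOKENS / QUOTED_WORDS
    audio_tokens = audio_minutes * 60 * CONTEXT_TOKENS / (QUOTED_AUDIO_HOURS * 3600)
    return math.ceil(text_tokens + audio_tokens)


def fits_in_context(words: int = 0, audio_minutes: float = 0.0) -> bool:
    """Return True if the estimated token count stays within the window."""
    return estimate_tokens(words, audio_minutes) <= CONTEXT_TOKENS


if __name__ == "__main__":
    # A one-hour interview recording plus a 100,000-word document.
    print(estimate_tokens(words=100_000, audio_minutes=60))
    print(fits_in_context(words=100_000, audio_minutes=60))
```

By this estimate, an hour-long interview recording uses under a tenth of the window, which is why long audio prompts become practical at this size.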

Fast-forward to mid-April, and Google has announced that Gemini 1.5 Pro is available for testing to enterprise users via the Vertex AI development platform. The testing will include support for using audio files in prompts, which is an amazing feature to have in a genAI product. Unfortunately, not everyone has access to Gemini 1.5 Pro yet.

Those lucky enough to test Gemini 1.5 Pro will be able to upload audio files of any kind and ask the AI for information based on those files. As someone who has been using an app built on OpenAI’s Whisper speech-recognition model to transcribe audio files, I’ll say this Gemini 1.5 Pro feature is something I want to see from other genAI products.

Support for audio files opens up so many doors. I already rely on audio transcription for interviews and video calls, as it significantly improves my ability to recall details. Being able to query the audio directly would obviously make transcription easier, too.

Google compares the context window of Gemini 1.5 Pro to Gemini 1.0, ChatGPT, and Claude. Image source: Google

Support for audio and video files in Gemini also underscores the importance of good privacy policies governing such data. I wouldn’t want to upload audio files to Gemini or any other genAI program without knowing that my data is safe and that it won’t be used to train the AI.

I look forward to seeing how Google will handle the privacy of audio files uploaded to Gemini once the general public has access to the functionality.

Unfortunately, it’s unclear how long we’ll have to wait for a public beta test of Gemini 1.5 Pro, or when Google will bring support for audio and video prompts to the consumer Gemini product. Google I/O 2024 takes place in May, at which point we should learn more about Google’s AI plans for 2024.

For now, Google’s Gemini 1.5 Pro beta test is included in the company’s Google Cloud Next ’24 announcements. In addition to making Gemini 1.5 Pro available to test, Google also announced other AI upgrades.

Of note, Google also updated Imagen 2, its text-to-image generation model. It now supports inpainting and outpainting, which let you add or remove objects in an image or extend it beyond its original borders.

Imagen-generated pictures will also support SynthID digital watermarking. SynthID is another Google product, one that adds an invisible watermark to AI-generated pictures to identify their origin.

Finally, Google will test a way to ground its AI responses in Google Search so the answers contain up-to-date information. Outdated answers can be a problem for all genAI products, Gemini included.

