Google DeepMind's Gemini 2.0 model now comes with a realtime video streaming feature. In simple terms, it can watch whatever's happening on your screen and hold a live voice conversation with you. The model knows its way around a DAW and will act as an AI music producer, offering insights into your sessions.
Musicians have been testing it with Ableton and Logic Pro to see how deep its music knowledge actually goes. We've confirmed that it can observe visual elements on screen and recommend solutions to common challenges, like adjusting the mix or developing a new section.
In this article, we'll show you how to set up Gemini with your audio workstation of choice. Watch the video below to see a live demo of how it works. If this piques your interest, keep scrolling for a step-by-step tutorial.
Gemini 2.0 is currently free to use, so it won't cost you anything to test this out.
How to point Google Gemini 2.0 at your DAW
To get started with Gemini 2.0, all you'll need is a free Google account. Open your DAW of choice and then navigate to Google AI Studio's live streaming page here: https://aistudio.google.com/live.
On the main screen, confirm that your microphone and video permissions are both switched on. I've highlighted those icons in the screenshot below. If they are red and crossed out, you'll need to go into your browser settings and enable them first.
With your computer mic and video turned on, select the third card labeled "Share Your Screen". AI Studio will ask you to share a browser tab, window or full screen. I suggest you go with the "window" option and select your DAW as shown below:
That's all there is to it. Begin speaking with Gemini and ask it questions about what it's seeing in your DAW. Talk to it about the project, show it plugins that you've set up, and drill down to see how far this AI music producer can go.
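If you'd rather drive the same live session from code instead of the browser, Google exposes it through the Multimodal Live API. Here's a minimal text-only sketch, assuming the google-genai Python SDK; the exact method names have shifted between SDK versions, so treat this as a starting point rather than a definitive implementation:

```python
# Minimal text-only sketch of the Multimodal Live API, assuming the
# google-genai SDK (pip install google-genai). Streaming your screen
# or DAW audio into the session requires capture code not shown here.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY",
                      http_options={"api_version": "v1alpha"})

async def main():
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp",  # model name current as of this writing
        config={"response_modalities": ["TEXT"]},
    ) as session:
        await session.send(input="What should I check before mixing a new track?",
                           end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```

The browser route is still the easier path for screen sharing; the SDK mainly matters if you want to build your own tooling on top of the live stream.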
Experiment: Producing music with Gemini's help
After confirming that Gemini 2.0 could view and discuss my DAW, I decided to find out whether it could truly listen to music and understand what it heard. This seemed like an important benchmark for an AI music producer to reach.
I had reason to believe that Gemini may have a basic understanding of audio. See the screenshot below from Google's Gemini developer docs:
Gemini can apparently "understand" non-speech elements like birdsongs and sirens. The docs don't specify whether music was included in the training data. However, as we reported in this article about Google's AI music datasets, they have already cultivated a large repository of music data for training purposes.
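You can probe this yourself outside the live stream, since the standard Gemini API accepts audio file uploads. Here's a quick sketch, again assuming the google-genai SDK; the file path and prompt are placeholders:

```python
# Quick audio-understanding probe via file upload, assuming the
# google-genai SDK. Replace "bounce.mp3" with any exported clip.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

audio = client.files.upload(file="bounce.mp3")  # older SDK versions use path= instead
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=["What instruments and genre do you hear in this clip?", audio],
)
print(response.text)
```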
Based on the results of my tests, I don't think Gemini was trained on those datasets. If you're interested in what Google has accomplished in the realm of generative AI music, check out MusicFX. It's not an AI music producer, but it does act as a kind of AI DJ, creating realtime music on the fly.
For now, let's stay the course and take a look at how Gemini can be set up to receive audio from your DAW and computer mic at the same time.
Cable-free audio routing: DAW output to Gemini
Gemini wasn't designed to listen to DAW output directly, so we need an audio routing utility such as Loopback or the free, open-source BlackHole. These are great apps to become familiar with, not only for this project but for a variety of other use cases. If you don't have one already, they aren't complicated to learn.
Here's an annotated screenshot of the Loopback configuration we used:
Open Loopback or BlackHole and select your DAW and computer mic as inputs.
Next, assign the audio router as your microphone input in the browser settings, as shown above. From there, follow the instructions we've already shared.
As a reminder, make sure your microphone and screen sharing are enabled in Google’s AI Studio web app. Select realtime streaming and then choose your DAW window. Ask Gemini to listen to your music and see what happens!
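If Gemini seems to receive no sound at all, the usual culprit is the browser grabbing the wrong input device. A small sanity check, assuming the sounddevice Python package, lists every audio device your machine exposes so you can confirm the virtual router is actually visible:

```python
# List audio devices to confirm the virtual router (e.g. BlackHole)
# shows up as an input. Requires: pip install sounddevice
import sounddevice as sd

for index, device in enumerate(sd.query_devices()):
    kinds = []
    if device["max_input_channels"] > 0:
        kinds.append("input")
    if device["max_output_channels"] > 0:
        kinds.append("output")
    print(f"{index}: {device['name']} ({', '.join(kinds)})")
```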
Gemini: An AI music producer who pretends to listen
If you watched the demo video at the beginning of this article, then you've already seen how Gemini works. The model appears to be capable of seeing and commenting on events within the DAW. It also speaks with confidence about common music production concepts.
During this test, whenever I asked it to listen and describe the music coming from Logic Pro, it answered with total confidence. I thought that it could genuinely hear the music, until I muted the DAW's output and it continued to "hear" everything.
After some additional tests outside the DAW, using vanilla media players, I found that it could only tell me the duration of a clip. It consistently got the instruments and genres wrong.
My conclusion is that Gemini infers instruments and sound design from visual cues like track labels and other UI elements. It's like working with an AI music producer who can see but cannot hear, yet insists that they're hearing everything. Maybe it's just trying to please us.
A third-party company might be able to use Google's API to fine-tune Gemini with a labeled audio dataset and build a co-producer VST that truly listens to DAW audio output. It might require additional music information retrieval (MIR) models to work adequately, if Gemini's core architecture is incapable of detecting nuances in audio.
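To make that concrete, here's a hedged sketch of the kind of MIR features such a plugin could extract with an off-the-shelf library like librosa, then hand to the LLM as plain text it can actually reason about. The file name is a placeholder:

```python
# Sketch: extract basic MIR features from a bounced clip with librosa
# (pip install librosa) so they can be passed to an LLM as text.
import librosa

y, sr = librosa.load("bounce.wav")

tempo, beats = librosa.beat.beat_track(y=y, sr=sr)               # rough BPM estimate
rms = librosa.feature.rms(y=y).mean()                            # average loudness
centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()  # brightness proxy

print("Estimated tempo (BPM):", tempo)
print(f"Mean RMS level: {rms:.4f}")
print(f"Mean spectral centroid: {centroid:.0f} Hz")
```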
This brings me to a final point that I'd like to make about AI music producer software in general, and where it's likely headed during 2025.
AI music production software from 2023-2025
The Gemini co-producer use case is not the first of its kind. A few other companies have attempted to create an AI-assisted DAW. The best example of that is a company called WavTool, who as of December 2024 appear to be on a temporary hiatus.
WavTool's AI DAW launched in May 2023 with a GPT-4 composer that spoke with users about their project, generated MIDI, created wavetables and made edits directly in the DAW. They paused their service but have hinted that they'll return. Some have speculated that they were acquired by a bigger company.
Phil Speiser's AI-powered mixing assistant plugin "The_Strip" is a second example of an AI music producer: it observes your mix and uses an LLM to comment on how you can improve it. Unlike WavTool, however, Speiser built a plugin that runs inside established DAWs like Ableton.
Then there are tools like RipX and Ace Studio, which are augmented with a suite of AI tools for improving ordinary music production workflows. Calling them a "producer" would be a stretch, since they're neither conversational nor autonomous.
Lastly, there are a few companies who positioned their software as "AI music producers" in their marketing copy, but are in reality little more than web browser apps offering AI audio loop generation as a service.
So where and when will the first serious AI music production tool arrive? I believe Ableton is currently the best candidate, because its API gives third-party developers access to almost every feature of the DAW.
I know of at least one developer who's working on an AI agent in stealth. Their tool will take text commands and act directly on Ableton via Max4Live. I'll be sure to update this article and let you know when that patch becomes publicly available.
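As a taste of what that kind of agent involves, here's a hedged sketch of the plumbing layer, assuming a Max4Live patch listening for OSC messages on localhost. The port and address scheme are hypothetical and not taken from the developer's actual tool:

```python
# Hypothetical plumbing for a text-to-Ableton agent: the LLM's decision
# is translated into an OSC message that a Max4Live patch could receive.
# Requires: pip install python-osc. The port and addresses are made up.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 11000)  # hypothetical Max4Live OSC listener

# e.g. the agent parsed "mute track 3" from a text command
client.send_message("/live/track/3/mute", 1)
```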