By Ezra Sandzer-Bell

Generative Audio Workstations: AI VSTs & The Future of DAWs

Generative audio workstation (GAW) is the name given to a new class of next-generation, AI-powered DAWs. A handful of companies have already released GAWs to the public, and experts in the market predict the trend will continue in the coming years.


FL Studio added a new AI mastering feature and non-AI chord generation that took its users by surprise. In early January 2024, the classic audio editing software Audacity launched a new collection of AI plugins. Apple launched Logic Pro 11 in May 2024, complete with AI session musicians and stem separation.


In this article, we'll walk through several other examples and share some ideas about what GAWs could evolve into over the course of the next year.




What is a generative audio workstation?


The expression generative audio workstation was first popularized in June 2023 by Samim Winiger, AI music expert and CEO at Okio. Its origins can be traced back to a research paper titled Composing with Generative Systems in the Digital Audio Workstation by Ian Clester and Jason Freeman.

The first digital audio workstations emerged in the late 1970s. They streamlined analog processing and improved on the limitations of recording to tape. By the early 1990s, producers gained access to more powerful DAWs like Pro Tools and Cubase, followed in turn by Fruity Loops, Ableton, Logic Pro X and countless others.


Fast forward to today: we're on the brink of a technological revolution.


AI music generators that don't qualify as a GAW


The first wave of commercial AI music generators was marketed primarily to non-musicians looking for an easy, quick path to a finished song. They appeal to content creators who might otherwise be using audio licensing catalogs like Artlist, Epidemic Sound, Soundstripe, AudioJungle, Envato, and so on.


Web applications like Boomy, Soundraw, and Soundful do provide options for customization, but it would be a stretch to call them workstations. They lack the robust controls that seasoned audio engineers and composers require.


In the remainder of this article, we'll highlight DAWs with legitimate AI features.


Logic Pro: AI Session Musicians in the DAW



Logic Pro has been Apple's professional music production software for more than two decades.


The DAW has included a pre-AI generative drummer since 2013, but the latest AI session musicians are of an entirely higher caliber. Users configure a chord progression globally, at the project level, and Logic uses it to spin up highly expressive performances on bass, piano, and drums.


Under the hood, Apple has trained real AI models to compose MIDI for several different instruments. We've written a detailed overview of the pros and cons of their new AI session musicians here.


Logic Pro now also includes AI stem splitting into bass, drums, vocals, and "other," along with a bonus sound design tool for adding analog warmth to your audio tracks. You'll need ~13GB of free hard drive space to update to the latest version.


At this time, Logic Pro 11 is the most concrete example of what the future of generative audio workstations will look and feel like.


ACE Studio: The DAW that generates AI vocals



ACE Studio is a generative audio workstation specializing in AI vocal synthesis and pitch control. Users pick a style of singing voice, compose a melody in the MIDI piano roll and attach lyrics for the voice model to sing. It's really that simple.


In 2024, the company announced several new capabilities, including the option to train your own AI voice models. So if you're a producer working with an artist and want to prototype songs with their voice before handing off a demo, this is a great way to do it.


ACE Studio stem splitting and vocal to MIDI

ACE comes with an AI stem splitter to isolate vocals, plus a vocal-to-MIDI-and-lyrics converter. So if you're dabbling in AI song generation with web apps like Suno, you can plug any song into ACE and modify the vocal track's melody, lyrics, and even the vocal timbre.


Each MIDI note in the ACE piano roll has AI emotion parameters that help bring the voice to life. Control the amount of tension, energy, and breath on a note-by-note level. You can even draw in pitch lines to slide up or down to other notes.
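To make the idea of note-level expression parameters more concrete, here is a minimal Python sketch of how such data might be represented. The class and field names are our own illustration and have nothing to do with ACE Studio's actual file format or API.

```python
# Hypothetical sketch of per-note vocal expression data (not ACE Studio's format).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VocalNote:
    pitch: int                  # MIDI note number, e.g. 60 = middle C
    start_beat: float           # position in beats
    length_beats: float
    lyric: str                  # syllable sung on this note
    tension: float = 0.5        # 0.0-1.0, relaxed vs. strained tone
    energy: float = 0.5         # 0.0-1.0, overall intensity
    breathiness: float = 0.2    # 0.0-1.0, amount of breath in the voice
    # (beat_offset, semitone_offset) points for hand-drawn pitch slides
    pitch_curve: List[Tuple[float, float]] = field(default_factory=list)

phrase = [
    VocalNote(60, 0.0, 1.0, "hold", tension=0.3, energy=0.6),
    VocalNote(62, 1.0, 1.0, "me", tension=0.4, energy=0.7,
              pitch_curve=[(0.0, -2.0), (0.25, 0.0)]),  # slide up into the note
    VocalNote(64, 2.0, 2.0, "now", tension=0.7, energy=0.9, breathiness=0.4),
]
print(len(phrase), "notes with per-note expression data")
```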


Conventional DAWs like Logic Pro don't have any support for vocal generation. Their AI session musicians are great, but those are instrumental only. If you need vocal generation and pitch control, ACE is currently the best option.


The AIVA GAW: Beyond simple music parameters

AIVA's parameter-based tool

AIVA, an early mover in B2C AI music generation, includes the familiar parameter-based web interface. Users can select properties like key signature, BPM, meter, and genre to spin up several songs. But the product goes above and beyond these features by providing a full DAW experience, both in the browser and as a downloadable, standalone desktop application.


Whenever a new piece of music is generated from parameters, AIVA's users have the option to go deeper with a DAW and MIDI piano roll editor. Here they can make changes to the notes manually or leverage generative features to modify the melody and chord progressions. Effect layers and mixing tools are also available.


For this reason, AIVA qualifies as a generative audio workstation.


The AIVA GAW

WavTool: AI Chatbots in the GAW


WavTool is a great example of a GAW that's pushing the DAW landscape forward. Its text-to-music features are still in the early phases of development and could use improvement. Still, they represent a meaningful and innovative shift in the way users think about music production workflows.


The video below showcases WavTool's GPT-4 powered AI chatbot. This creative assistant understands text prompts related to the audio workstation and can act on your behalf. Users can request chord and melody material, new instrument tracks, changes to the mix, and more.
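Under the hood, assistants like this generally work by mapping a natural-language request onto a fixed set of DAW actions. The sketch below is a simplified, hypothetical illustration of that dispatch pattern; the function names and intent format are our own, not WavTool's implementation, and the LLM call is stubbed out.

```python
# Hypothetical chat-to-DAW dispatch loop. In a real assistant, parse_intent()
# would call an LLM (e.g. GPT-4 with tool calling); here it is a simple stub.

def parse_intent(prompt: str) -> dict:
    """Stub: turn a user request into a structured action."""
    if "chord" in prompt.lower():
        return {"action": "add_midi_clip", "track": "Keys",
                "chords": ["Am", "F", "C", "G"]}
    return {"action": "noop"}

def add_midi_clip(project: dict, track: str, chords: list) -> None:
    """Append a MIDI clip containing the requested chords to a track."""
    project.setdefault(track, []).append({"type": "midi", "chords": chords})

DISPATCH = {"add_midi_clip": add_midi_clip}

def handle_chat(project: dict, prompt: str) -> None:
    intent = parse_intent(prompt)
    handler = DISPATCH.get(intent.pop("action"))
    if handler:
        handler(project, **intent)

project = {}
handle_chat(project, "Give me a four-chord progression on a new keys track")
print(project)  # {'Keys': [{'type': 'midi', 'chords': ['Am', 'F', 'C', 'G']}]}
```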



The first version of WavTool's AI chatbot (shown above) was constrained to generating MIDI tracks. However, a recent build introduced a new text-to-audio sample generator. This works nicely to bypass the GAW's limited sound design tools and provide immediate access to loopable samples. Check out a demo of this new feature below:



We expect to see more AI chatbots in future DAWs, perhaps leading to a scenario where producers work alongside AI bandmates to come up with new ideas.


Visit WavTool's website to learn more and sign up for free to try it out!


AI VSTs: Thin client vs local device processing


The symbiosis between DAWs and plugins will persist under the influence of generative AI. A few software companies are ahead of the curve and provide AI VSTs that work with current, conventional DAWs.


A conventional DAW can become a GAW when augmented with AI VSTs. The following apps can be roughly divided into software that runs its generations locally and thin client VSTs that use resources from a centralized cloud server.


Samplab 2: AI VST for Audio to MIDI


Thin client software is designed to consume less memory and processing power on the user's local device. It uses API calls to send information up to the cloud and then pulls the finished output back down to the user's computer. These AI VSTs still hook into the DAW like ordinary plugins; artificial intelligence simply introduces new capabilities that would otherwise be too memory-intensive to run locally.
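As a rough sketch of what that round trip might look like in code (the endpoint URLs, auth header, and response fields below are placeholders for illustration, not any vendor's real API):

```python
# Minimal thin-client workflow: upload audio, poll the cloud job, download the result.
# The service URL, API key, and JSON fields are hypothetical placeholders.
import time
import requests

API_BASE = "https://api.example-audio-cloud.com/v1"  # hypothetical service
API_KEY = "YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def transcribe_to_midi(audio_path: str, out_path: str) -> None:
    # 1. Send the audio file up to the cloud for processing.
    with open(audio_path, "rb") as f:
        job = requests.post(f"{API_BASE}/transcriptions",
                            files={"audio": f}, headers=HEADERS).json()

    # 2. Poll until the server has finished separating and transcribing.
    while True:
        status = requests.get(f"{API_BASE}/transcriptions/{job['id']}",
                              headers=HEADERS).json()
        if status["state"] == "done":
            break
        time.sleep(2)

    # 3. Pull the finished MIDI back down to the local machine.
    midi = requests.get(status["midi_url"], headers=HEADERS)
    with open(out_path, "wb") as f:
        f.write(midi.content)

transcribe_to_midi("loop.wav", "loop.mid")
```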



Samplab is an example of an AI-powered audio-to-MIDI VST that runs stem separation on audio files in the cloud, transcribes them to MIDI, and then surfaces the MIDI files in the plugin's piano roll.


One unique capability that sets the plugin apart is the option to drag individual notes up and down. Samplab's piano roll will make a direct change to the original audio composition, while retaining its timbre and sound design. It can also detect chord progressions for polyphonic instrument layers.


The final audio and MIDI files can be dragged from the plugin into your DAW of choice. In the future, these kinds of transcription features could be part of a GAW.


Local tone transfer: Neutone, Combobulator, Mawf


Neutone is a hub that runs real-time AI audio processing within a DAW. It comes with some models by default, but includes the option to download more from within the VST. A walkthrough of the software can be found in the video above. Neutone includes access to Google Magenta's DDSP model, which can also be downloaded independently as its own plugin.


DataMind Audio published a new timbre transfer plugin in 2024 called The Combobulator. The company combines a slick user interface and high-quality audio with a solid ethical framework: artists get a 50% revenue share on each sale of their models.



One of Google's DDSP developers, Hanoi Hantrakul, was later hired by TikTok and created an improved DSP model called Mawf. The beta version includes timbre transfer for three instrument types (saxophone, trumpet, and a bamboo flute from Thailand called the khlui). The output is significantly better than DDSP.


Mawf plugin settings

When using Mawf on an audio track, switch on control mode under the modulation tab, as shown in the screenshot above. I recommend setting the dry/wet mix to 100% to isolate the timbre transfer. Then you can use the dynamics and effects tabs to experiment with transforming the sound.


Generative vocal synths: Synthesizer V and Vocaloid

Synthesizer V

AI voice generators are extremely popular at the moment, but only a few of them are designed for musicians. Even fewer can run inside of a DAW. Two of the most popular plugins in this category today are Synthesizer V and Vocaloid 6.


Vocal synthesis could eventually be baked into generative audio workstations. Users will type in lyrics, select a voice model, and let generative AI produce novel vocal melodies as a source of inspiration. Autotune features will provide control over the melody through a piano roll and additional dynamic layers will be controlled via the GAW's mixing interface.


The end game with vocal synthesis doesn't have to be the elimination of human vocalists. Instead, it could be a way for non-singers to prototype music and send their rough ideas over to human talent, who then record it to give it a polished feel.


On the other hand, genres like trap and RnB appropriated autotune to create a new musical aesthetic. It follows that AI voices could become a core part of new genres of music and even be a coveted sound.


AI MIDI generation: Lemonaide, Orb, Magenta Studio



The AI MIDI plugin Lemonaide is a kind of self-contained mini GAW. It includes a piano roll, virtual instruments, and the option to export as audio and MIDI.


Users begin by choosing whether to produce melodies, chords, or both. From there, they pick a key signature and hit "get seeds". Each seed can be auditioned from a list and viewed in the bottom half of the app as notes on a MIDI piano roll.


Edit the notes, key signature, and tempo and then drag the MIDI into a DAW to further refine the sound design. AI MIDI generators like this are a great way to inspire new material quickly.
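For a sense of what a MIDI seed boils down to, here is a short sketch that writes a four-chord progression to a .mid file you could drag into any DAW. It uses the mido library and says nothing about how Lemonaide generates its material.

```python
# Write a simple four-chord "seed" to a standard MIDI file with mido.
# The chord voicings and file name are arbitrary; this only shows the output format.
from mido import Message, MidiFile, MidiTrack

CHORDS = [
    [57, 60, 64],  # A minor
    [53, 57, 60],  # F major
    [48, 52, 55],  # C major
    [55, 59, 62],  # G major
]

mid = MidiFile()          # defaults to 480 ticks per beat
track = MidiTrack()
mid.tracks.append(track)

TICKS_PER_BAR = 480 * 4   # one whole-note chord per bar in 4/4
for chord in CHORDS:
    for note in chord:
        track.append(Message("note_on", note=note, velocity=80, time=0))
    # The first note_off carries the bar-length delta; the rest land at the same tick.
    track.append(Message("note_off", note=chord[0], velocity=0, time=TICKS_PER_BAR))
    for note in chord[1:]:
        track.append(Message("note_off", note=note, velocity=0, time=0))

mid.save("seed_progression.mid")
```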


Check out our full article for a complete list of AI MIDI generation VSTs.


AI Audio Generation: Semilla


Max AI plugin

If you're going to run AI music experiments locally, one of the most popular environments is Max 8. In November 2023, AI music developer and live performer Hexorcismos released a new Max plugin called Semilla. It can load pre-trained models and offers a wide range of parameters for shaping the audio output. Semilla is currently the most advanced Max patch we've seen for AI music generation.


Non-generative AI mixing and mastering in a GAW

Tasks like mixing and mastering are not usually considered generative, but they have been a staple of music production for years. Technical tasks like EQ, compression, saturation, and stereo imaging can be streamlined through machine learning. Musicians who prefer to focus on composing can lean on these tools to handle the technical side.


Here are a few popular AI mixing and mastering plugins that exist today:

  1. iZotope Neutron 4 - Mix Assistant runs automatic processing and also supports changes to the parameters, so users maintain control.

  2. iZotope Ozone 10 - Mastering Assistant uses genre references to guide its automation and provides controls for width, EQ, and dynamics.

  3. Sonible Pure Bundle - The Compressor, Limiter, and Reverb plugins adjust parameters according to the incoming signal and provide a single control knob.

There are several other plugins like these on the market today. We can imagine that some of the same core functionality will be expected from a GAW.
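As a toy illustration of the kind of automatic decision-making these assistants perform (and not how any of the plugins above actually work), here is a minimal loudness-matching sketch:

```python
# Toy "mastering assistant" step: match a track's RMS loudness to a reference
# with a crude peak safeguard. Real assistants do far more (EQ, dynamics, imaging).
import numpy as np

def rms(signal: np.ndarray) -> float:
    return float(np.sqrt(np.mean(signal ** 2)))

def match_loudness(track: np.ndarray, reference: np.ndarray,
                   headroom_db: float = 1.0) -> np.ndarray:
    """Scale `track` so its RMS matches `reference`, leaving some peak headroom."""
    gain = rms(reference) / (rms(track) + 1e-12)
    out = track * gain
    ceiling = 10 ** (-headroom_db / 20)  # e.g. roughly -1 dBFS ceiling
    peak = np.max(np.abs(out))
    if peak > ceiling:
        out = out * (ceiling / peak)     # simple scale-down, not a true limiter
    return out

# Example with synthetic audio: a quiet mix matched to a louder reference.
sr = 44100
t = np.linspace(0, 1.0, sr, endpoint=False)
quiet_mix = 0.1 * np.sin(2 * np.pi * 220 * t)
loud_ref = 0.5 * np.sin(2 * np.pi * 220 * t)
mastered = match_loudness(quiet_mix, loud_ref)
print(round(rms(mastered), 3), round(rms(loud_ref), 3))  # roughly equal
```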


Final thoughts on generative audio workstations


In this article we've covered generative AI for MIDI and audio, chatbot-assisted sound design, stem separation, and vocal synthesis. Some of the most innovative Python libraries and models are still tucked away in experimental Google Colab notebooks and Hugging Face Spaces. This creates a barrier to entry for non-programmers, as well as for musicians who need tools that run directly in their DAW.


Most computers won't have a sufficient GPU or enough VRAM to support high-fidelity audio generation locally. However, software development frameworks like JUCE already support API calls, which means that user authentication and pay-to-play generative services could begin rolling out at any time. As we mentioned, Samplab has already accomplished this.


The main barriers to innovation are funding and the lack of legal frameworks to protect the companies who serve up AI music models. We expect to see a breakthrough in the quality and volume of available plugins as these pieces fall into place.


When plugins do begin to show meaningful growth, legacy DAWs will likely begin the inevitable pivot toward native generative audio features. At that stage, VST chains could become less common, with most of the important actions available directly within the GAW.

