By Ezra Sandzer-Bell

Native Instruments Develops New AI Text-to-Instrument Model


Native Instruments has been a trailblazer in virtual instrument and music production technology for more than 25 years. That legacy of innovation is poised to continue as the company breaks new ground with generative AI.


A research paper from their in-house machine learning team, published in July 2024, describes a novel AI audio model that creates virtual instruments from text prompts or audio inputs. They presented the model to a live audience in mid-November at ISMIR, a major conference for music information retrieval research.


We interviewed their lead research scientist to learn more about how the model works. Their CPO followed up with us to share details about the roadmap and how this new tech might fit into their existing product portfolio.


This article will outline the benefits of a generative model for virtual instruments, along with an overview of their existing AI-powered plugins. To get started, let's have a quick look at why this kind of model will be valuable to musicians.


The shortcomings of instant AI song generation


Over the past two years, dozens of AI text-to-music generators have come to market. The most popular services, Suno and Udio, turn short ideas into complete songs in a matter of seconds. But is that what musicians really want?


Most songwriters, beatmakers, and composers enjoy creative decision-making. The slow, painstaking work of writing an album yields a final product that mirrors the creator's imagination.


Even for those who prefer to move slowly, there are known choke points in the DAW that can interrupt the creative flow, and hunting for the right sound is one of the most common.


Native Instruments wants to use AI to help musicians find the right virtual sounds in a fraction of the time. They are exploring a few different approaches, ranging from text input to audio conditioning. We'll cover both in the following sections.





AI text-to-synth: Generating virtual instruments



Text-to-synth is a generative audio technique in which virtual instruments are created from text prompts. Imagine typing in “warm fingerstyle electric bass” and, with a few clicks, generating a fully playable bass instrument in your DAW.


This is no longer a distant dream. Native Instruments is working on a neural audio model that lets users generate instrument sounds from descriptive prompts or audio samples. It’s like having a personalized sound designer at your fingertips.
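
To make the workflow concrete, here is a purely hypothetical sketch in Python. The `generate_instrument` function and its return type are invented for illustration and do not correspond to any released Native Instruments API.

```python
from dataclasses import dataclass

# Hypothetical interface, invented for illustration only. It does not
# correspond to any released Native Instruments API.
@dataclass
class GeneratedInstrument:
    prompt: str
    num_keys: int        # playable key range
    dynamic_levels: int  # velocity layers per key

def generate_instrument(prompt: str) -> GeneratedInstrument:
    """Stand-in for a call to a text-to-instrument model."""
    return GeneratedInstrument(prompt=prompt, num_keys=88, dynamic_levels=5)

bass = generate_instrument("warm fingerstyle electric bass")
print(bass)  # in concept: a fully playable, sample-based bass instrument
```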


I want to clarify upfront that this model is not commercially available. However, the video above demonstrates how it works. There are several additional audio demos available on the webpage that accompanies the core research paper.


How Native Instruments' AI text-to-instrument model works


Each text prompt results in a sample-based instrument spanning 88 keys and five dynamic levels. That's 440 one-shot samples per generated instrument.
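
The arithmetic is easy to verify with a minimal sketch of the sample grid. The note range and level names below are our assumptions, following a standard 88-key piano layout:

```python
# 88 piano keys x 5 dynamic levels = 440 one-shot samples per instrument.
MIDI_NOTES = range(21, 109)                    # A0 (21) through C8 (108)
DYNAMIC_LEVELS = ["pp", "p", "mf", "f", "ff"]  # assumed velocity-layer names

sample_grid = [
    (note, level) for note in MIDI_NOTES for level in DYNAMIC_LEVELS
]
print(len(sample_grid))  # 440
```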


The team has worked hard to maintain what they call “timbral consistency”: they want instruments to sound cohesive regardless of which note you play or how hard you hit it. Timbral consistency is part of what separates an amateur sound from a professional one, and it is challenging to achieve with generative models.
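
The paper does not prescribe a single metric, but one common way to quantify this kind of consistency in music information retrieval is to compare spectral features across the sampled notes. The sketch below uses the spread of spectral centroids as a crude proxy; it is a generic illustration, not Native Instruments' published method.

```python
import numpy as np

def spectral_centroid(signal: np.ndarray, sample_rate: int) -> float:
    """Brightness proxy: magnitude-weighted mean frequency of the spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

def centroid_spread(samples: list[np.ndarray], sample_rate: int) -> float:
    """Relative spread of centroids across one-shots; lower = more consistent."""
    centroids = np.array([spectral_centroid(s, sample_rate) for s in samples])
    return float(np.std(centroids) / (np.mean(centroids) + 1e-12))
```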


For producers, this could mean a new era of creativity with high-quality sounds generated on demand. It would be transformative for people who write excellent melodies and chord progressions but struggle with sound design.


Native Instruments' CPO comments on generative AI audio synthesis

Native Instruments' chief product officer, Simon Cross, responded to clarify that their team will likely publish an AI-powered search and navigation tool first. This makes sense: the company already has a massive catalog of virtual instruments and presets for users to choose from.


Generative virtual instruments would come at a higher compute cost and carry a bigger environmental footprint than a search tool. In the short term, they would also likely be lower quality than human-crafted instruments.


Text-to-synth competitors: WavTool and SynthGPT


Consumers will have to wait patiently for Native Instruments to release a text-to-synth plugin commercially. Meanwhile, other companies are already vying for mind share in what will eventually become a competitive software niche.

WavTool was the first mover, arriving on the scene back in May 2023. They leveraged the ChatGPT API to deliver text-to-wavetable synthesis. It was a bold step forward and ahead of its time, but the quality wasn't quite good enough.


WavTool closed its doors on November 15, 2024, hinting that it would return. It looks like the company may have been acquired.



Meanwhile, the AI music startup Fadr launched SynthGPT in 2024, a service that interprets text prompts and matches them to synth presets. The app has been marketed aggressively and met with skepticism; the video above questions whether it is truly powered by generative artificial intelligence.


SynthGPT appears to be using a model trained on sound descriptors and audio characteristics. This would allow it to understand nuanced descriptions like "warm ambient pad" or "distorted glitch bass."


Unlike Native Instruments' model, SynthGPT interprets the language of a descriptive text prompt to determine core attributes such as tone, texture, modulation style, and era influences (e.g., 80s synthwave or modern trap). It then returns a selection of presets that align closely with those qualities.


Instead of creating raw audio from scratch, the AI model behind SynthGPT seems to analyze the underlying meaning of a text prompt and assign oscillator settings, filter types, and other effects to modify existing virtual instruments.
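
Based on that reading, the retrieval step might look something like the toy sketch below, which scores presets by descriptor-tag overlap with the prompt. The preset names and tags are invented; this is our interpretation, not Fadr's actual implementation.

```python
# Toy descriptor-based preset matching. Preset names and tags are invented.
PRESETS = {
    "Analog Sunset": {"warm", "ambient", "pad", "80s"},
    "Glitch Hornet": {"distorted", "glitch", "bass", "modern"},
    "Trap Door Sub": {"sub", "bass", "trap", "modern"},
}

def match_presets(prompt: str, top_n: int = 2) -> list[str]:
    """Rank presets by how many prompt words overlap their descriptor tags."""
    words = set(prompt.lower().split())
    ranked = sorted(PRESETS, key=lambda p: len(PRESETS[p] & words), reverse=True)
    return ranked[:top_n]

print(match_presets("warm ambient pad"))       # ['Analog Sunset', ...]
print(match_presets("distorted glitch bass"))  # ['Glitch Hornet', ...]
```

A production system would more likely use learned text and audio embeddings rather than literal word overlap, but the retrieve-and-rank structure is the same.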


Sample-to-instrument: Audio inputs instead of text


The generative audio model from Native Instruments doesn't stop at text prompts. It also supports audio-to-synth generation, giving users the ability to feed in an audio sample as a reference.


In practice, this means that musicians could upload a sampled instrument stem and model its tone dynamically. The model captures the essence of the input and applies it across a playable keyboard range, bringing even more versatility to sound design.
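
As a point of contrast, the crudest way to spread one recording across a keyboard is simple repitching, sketched below. A generative model like Native Instruments' would synthesize new samples per note instead, which is what preserves realism at the extremes of the range; this sketch only illustrates the source-to-keymap mapping.

```python
import numpy as np

def repitch(signal: np.ndarray, semitones: float) -> np.ndarray:
    """Naive pitch shift by resampling (changes duration, like varispeed)."""
    ratio = 2.0 ** (semitones / 12.0)
    positions = np.arange(0, len(signal) - 1, ratio)
    return np.interp(positions, np.arange(len(signal)), signal)

def build_keymap(source: np.ndarray, source_midi_note: int) -> dict[int, np.ndarray]:
    """Map one reference sample onto all 88 keys by repitching."""
    return {note: repitch(source, note - source_midi_note)
            for note in range(21, 109)}  # A0 through C8
```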


There are a few audio-to-audio timbre transfer companies on the market today, like Neutone and Combobulator, but this would be the first time we've seen a virtual instrument generator that supports MIDI inputs.



Synplant 2 is currently the closest plugin to Native Instruments' sample-to-instrument model. It generates synth patches by analyzing audio samples and determining optimal synth settings to replicate the source sound.


This approach could be compared to SynthGPT's text-to-synth tool, because the core synthesis engine is not powered by generative AI.


Instead, it uses machine learning (music information retrieval) to understand the timbre of an existing sample and automates its own parameters to get as close to that sound as possible.
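
In spirit, that analysis-and-automation loop resembles the bare-bones parameter search sketched below: render candidate settings, measure how far the result is from the target sample's spectrum, and keep the best match. This is a generic illustration of the idea, not Synplant 2's actual algorithm, and the two-parameter synth is deliberately minimal.

```python
import numpy as np

def render(params: dict, n: int = 2048, sr: int = 44100) -> np.ndarray:
    """Minimal two-parameter synth: a decaying sine tone."""
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * params["freq"] * t) * np.exp(-t / params["decay"])

def spectral_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Squared error between magnitude spectra (inputs must be equal length)."""
    return float(np.mean((np.abs(np.fft.rfft(a)) - np.abs(np.fft.rfft(b))) ** 2))

def fit_params(target: np.ndarray, trials: int = 500, seed: int = 0) -> dict:
    """Random search for settings whose render best matches the target.

    Assumes target has the same length as render()'s output (2048 samples).
    """
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(trials):
        candidate = {"freq": rng.uniform(50, 2000), "decay": rng.uniform(0.01, 1.0)}
        err = spectral_distance(render(candidate), target)
        if err < best_err:
            best, best_err = candidate, err
    return best
```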


While this approach may seem disappointing to the AI crowd, it does allow for more tweaking and controllability than a purely generative approach.


Existing AI-powered plugins from Native Instruments



Native Instruments already offers a commercial suite of AI-powered plugins that streamline music production by simplifying mixing, mastering, and sound creation. Watch the video above for a general overview.


Among these is Ozone, an advanced mastering plugin from iZotope, a partner company under the Native Instruments umbrella. Ozone uses an AI-powered Master Assistant that analyzes the audio and suggests starting points for mastering, adapting to the genre and dynamics of the track.


It also includes tools like tonal balance controls and stabilization modules, which help achieve a polished, professional sound with minimal manual adjustments.
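
The "analyze first, then suggest a starting point" pattern behind assistants like this can be sketched in a few lines. The measurements and targets below (RMS level toward an assumed -14 dB reference, spectral tilt nudged toward flat) are generic illustrations of the pattern, not iZotope's actual logic.

```python
import numpy as np

def suggest_master_settings(audio: np.ndarray, sr: int = 44100) -> dict:
    """Crude analyze-then-suggest pass: level and spectral-tilt corrections."""
    rms_db = 20 * np.log10(np.sqrt(np.mean(audio ** 2)) + 1e-12)
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    low = spectrum[freqs < 250].mean()    # bass energy
    high = spectrum[freqs > 4000].mean()  # treble energy
    tilt_db = 20 * np.log10((high + 1e-12) / (low + 1e-12))
    return {
        "gain_db": -14.0 - rms_db,  # push level toward an assumed -14 dB target
        "high_shelf_db": float(np.clip(-tilt_db, -3.0, 3.0)),  # nudge tilt flat
    }
```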


For vocal processing, Native Instruments’ Nectar plugin stands out with its AI-driven Vocal Assistant, which applies a series of vocal-specific effects based on the input it analyzes. This allows for easy adjustments to vocal tone, presence, and spatial effects.


Nectar’s AI capabilities mean that complex vocal processing tasks, which typically require significant expertise, are now accessible with a few clicks, delivering a polished vocal sound that sits comfortably within the mix.


There are a few more AI plugins in their portfolio, but in the interest of brevity we'll stop here. It's clear that Native Instruments wants to empower music producers by automating complex technical processes, freeing up time for creative expression.


We will update this article in the future if and when they release a generative text-to-synth or audio-to-synth plugin commercially.
