
Riffusion Returns With AI Image-to-Song Mobile Experience

Riffusion is an AI-powered text-to-song generator, on par with Suno and Udio. Like those apps, Riffusion accepts lyrics and descriptions of music styles. It uses those prompts to generate short music clips with singing, rapping and screaming AI vocalists.


The company launched V1 in December 2022, followed by a $4M seed round and V2 in October 2023. Near the end of July 2024, it released a new mobile app centered on an image-to-song generation experience.


Riffusion is moving away from the browser UX and toward native mobile and desktop apps instead. We'll cover everything in this article and try to give you a feel for their product history, starting with present day and going backward.





New mobile app released July 2024

Riffusion released their first-ever AI text-to-song mobile app in July 2024. The experience centers on a fun photo-to-song capability that anyone can enjoy, even people who don't play an instrument and have never made a song before.


I found it incredibly fun to play with, even as a musician. It had me laughing out loud several times as I took photos of random stuff around the house and turned them into music.


My takeaway was that even though I want and need other AI song generation tools for my DAW workflows, an experience like Riffusion is still valuable in its own way. So let's get into exactly how the mobile app works.


How Riffusion's AI mobile app works

  1. Users snap a picture with their smartphone camera

  2. An AI-powered model writes a caption to describe the image

  3. Riffusion writes lyrics about what it "sees"

  4. Those lyrics are rapped or sung by an AI vocalist

  5. Riffusion creates matching background music
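
The five steps above can be read as a simple chain, where each stage feeds the next. The sketch below is only an illustration of that flow. Every function and class name in it is hypothetical, the model calls are replaced with placeholder strings, and nothing here reflects Riffusion's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class Song:
    caption: str        # step 2: text description of the photo
    lyrics: str         # step 3: lyrics written from the caption
    vocal_style: str    # step 4: how the lyrics are performed
    backing_track: str  # step 5: description of the matching music


def caption_image(image_path: str) -> str:
    """Hypothetical stand-in: a real app would call an image-captioning model."""
    return f"a photo taken from {image_path}"


def write_lyrics(caption: str) -> str:
    """Hypothetical stand-in: a real app would prompt a language model."""
    return f"Here's a song about {caption}"


def photo_to_song(image_path: str, vocal_style: str = "sung") -> Song:
    """Chain the steps: photo -> caption -> lyrics -> vocals -> music."""
    caption = caption_image(image_path)
    lyrics = write_lyrics(caption)
    backing_track = f"instrumental matching: {caption}"
    return Song(caption, lyrics, vocal_style, backing_track)


song = photo_to_song("kitchen.jpg", vocal_style="rapped")
print(song.lyrics)
```

Because each stage only depends on the output of the one before it, the same photo can be sent through the chain again and again to produce new songs, which fits the scrolling experience described in this section.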


I tested the app and confirmed that it works as advertised. Not only does it write songs about what's in the photo, but it can do it over and over again at lightning speed.


The same way you might scroll mindlessly through an IG or TikTok feed, you can scroll through songs about whatever was in your photo.


We've covered the timeline of AI image-to-music generators. SoundGen and Mubert are the two commercial products offering the service as of July 2024.


Their mobile app's community has shifted from the old explore page to a new featured page. Click the telescope icon on the app's toolbar to browse. You can submit your own creations to the featured page as well.


Here's a screenshot of the featured page from my own account:


Features in Riffusion Version 2.0


The remainder of this page is a reflection on the previous version of the Riffusion web app. Some of the V2 features have been cut, like the ability to download stems.


In its previous incarnation, Riffusion asked you to sign up for a free account and then presented you with this dashboard first:

Riffusion v2.0 dashboard

Trending riffs on Riffusion's Explore page were a collection of AI music generated by other users. Hovering over a track, you could read the lyrics like karaoke and hear them performed by an AI voice.


Explore was similar to the mobile app's Featured page today.


Users could click the heart icon to improve that song's rank in the community. Clicking through the creator's profile name, you could see a library with other riffs they had made.


We timestamped a walkthrough of Riffusion below, so it takes you straight to the Explore community page. Have a listen to what the music sounds like and see how the app works.



How did you generate an AI song in Riffusion v2?


To create an AI song in Riffusion v2, you clicked the "plus" icon located at the top left corner of your dashboard. You were encouraged to add lyrics and describe the sound of music you wanted, based on features like genre, vocal style, or overall vibe.

Riffusion text-to-music interface

Users would type in lyrics or use a microphone to speak words directly into the app. If you didn't have any ideas about where to start, a one or two word concept was fine. The app included an AI powered "write lyrics" button that created lyrics based on short ideas.

There was a second prompt field called Describe the sound. It had a prompt genius button to help write descriptions of music in a similar way.

The prompt genius feature

Once lyrics were ready, users could hit Riff to generate some music. Within a minute, they'd have three tracks to choose from.


Clicking the play button on any card, they could hear the song played back with AI vocals and instrumentals.

Examples of riffs from Riffusion

Hovering over a card revealed a few options represented by icons.

  • The remix feature refreshed a track and delivered a variation.

  • The favorite icon stashed the track in your User Library.

  • The share icon copied a link to a public URL where anyone could listen.

  • The ellipsis included several additional options. You could open the riff in a dedicated window, save as audio or video, and split the audio into stems.

Saving audio, video, and stems in Riffusion

The Split out stems option would separate your track into vocals, drums, bass and "other" layers. You could then download any individual audio layer and use it in a DAW.

Separate stems into vocals, drums, bass, and other

Clicking on Open riff navigated to a full page view of the track. You could access many of the same features, but stem separation was no longer available and instead there was some control over the riff's visibility (public vs private).

Single riff view after selecting Open Riff

That's pretty much how the V2 web app from October 2023 worked.


Now let's turn the clock back further in our Riffusion time machine, and go all the way back to the company's roots, long long ago, in December 2022.


Riffusion V1: The First AI Text-to-Music Generator

Riffusion version 1.0

Riffusion's first site version debuted in December 2022.


It came in on the tail end of a year rife with AI text-to-image generation. Midjourney and DALL-E had become household words. OpenAI released ChatGPT and suddenly the world had access to conversational artificial intelligence.


Text prompts became a hot topic and the internet wanted to know when AI music generation would have its moment in the sun. Who would be the first to drop a service that converted descriptions of music into raw audio?


At the time, AudioCipher's text-to-MIDI plugin was turning words into melodies and chord progressions, but it wasn't using a diffusion model. It gave users control over key signature, scale, chord extensions and rhythm automation. The app was intended to help DAW musicians beat writer's block.


Riffusion AI: Spectrogram Generator to Music



Riffusion made their first big move in December 2022 with a web app that created short music riffs at lightning speed. The creators, Seth Forsgren and Hayk Martiros, are indie programmers who started the app as a hobby project.


The website became a viral hit overnight.


For everyday people, it was one of those rare moments on the internet where you can send friends to a website and amaze them with something. Riffusion spread quickly among people who enjoyed the latest developments in AI.


The machine learning and genAI programming crowds, meanwhile, celebrated the innovative underlying tech deployed by the company.


Riffusion V1 was built on a fine-tuned version of Stable Diffusion, and it used the img2img technique to condition new clips on an existing image. The model was trained on a dataset of labeled spectrograms. Each short audio clip was captured as an image and paired with a caption describing features like genre, instrument, speed, vibe, etc.


This meant that when a user requested music in a particular style, Riffusion's model would generate new spectrograms in a similar style.


It interpolated and stitched the short clips together, sonifying the spectrograms and turning them into audio files.
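
To give a feel for what "sonifying" a spectrogram means, here is a minimal sketch in plain Python. It is not Riffusion's method. It swaps in simple additive sine synthesis, treats each row of a toy spectrogram as one time frame, and plays each frequency bin as a sine wave scaled by its magnitude. The function name, the bin spacing and the sample rate are all assumptions made for illustration.

```python
import math


def sonify(spectrogram, bin_hz, frame_seconds, sample_rate=8000):
    """Turn a magnitude spectrogram into audio samples by additive synthesis.

    spectrogram: list of frames; each frame is a list of magnitudes, one per
    frequency bin. Bin k is played as a sine wave at (k + 1) * bin_hz.
    """
    samples_per_frame = round(frame_seconds * sample_rate)
    samples = []
    for frame_index, frame in enumerate(spectrogram):
        for n in range(samples_per_frame):
            # Absolute time keeps each sine's phase continuous across frames.
            t = (frame_index * samples_per_frame + n) / sample_rate
            value = sum(
                mag * math.sin(2 * math.pi * (k + 1) * bin_hz * t)
                for k, mag in enumerate(frame)
            )
            samples.append(value)
    # Normalize so the loudest sample sits at +/-1.0.
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else samples


# Toy input with two frames: a 200 Hz tone, then a 400 Hz tone.
toy_spectrogram = [[1.0, 0.0], [0.0, 1.0]]
audio = sonify(toy_spectrogram, bin_hz=200, frame_seconds=0.1)
print(len(audio), "samples")
```

A real spectrogram has hundreds of bins and no phase information, so production systems need a more careful reconstruction step than this. The sketch only shows the basic idea that an image of frequencies over time can be read back out as sound.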


Riffusion remained in this state for most of 2023.


Mid-2023: Google MusicLM, Meta MusicGen


Google announced MusicLM in January 2023, but a working demo didn't reach the public until May 2023. It delivered higher quality audio and compositions, but lacked the exciting audio upload and style transfer features that had been promised by its demo page in January.


Meta swooped in a month later, issuing a press release in June about their own generative AI text-to-music model, called AudioCraft (MusicGen). They one-upped Google by making AudioCraft open source, with a GitHub repo and API that developers could use to embed the service into their own interactive AI tools.


MusicGen was soon available on Hugging Face, and developers surfaced the same style transfer feature MusicLM had claimed to have but failed to deliver.


As of November 2023, MusicLM and MusicGen lacked any AI voice features.


Late-2023: Stable Audio, Splash, and Chirp


Three months later, in September 2023, Stability AI announced a text-to-music web application called Stable Audio. Around this same time, companies like Chirp and Splash surfaced with AI lyric-to-song generators.


Riffusion's V2 website and features, released in October 2023 around the time they announced their $4M seed round, put them firmly in the running with other rising stars. They fell behind in the first half of 2024 as Suno and Udio took the lead, but Riffusion's new mobile and desktop apps show a lot of promise.


---------------------


This post was written by Ezra Sandzer-Bell, founder of AudioCipher Technologies.

