Creating a music video used to be a time-consuming and expensive process. But in a world where success is tied to social media presence, people have started looking for new ways to create visual content that stands out from the crowd. AI music videos are one of the most popular choices in 2024.
The video above is from Die Antwoord, an avant-pop group who turned their song Age of Illusion into an AI-generated animation. They are one of several major artists, alongside Linkin Park, to embrace this new technology.
In February 2024, OpenAI announced a new text-to-video service called Sora that turns text into somewhat realistic-looking footage. It's not audio-reactive, but it does open up a new world where ideas can become a visual reality. In early May, a band called Washed Out released one of the first-ever AI music videos created with Sora.
Neural frames vs Kaiber: An overview
According to keyword traffic analysis on Ahrefs, neural frames and Kaiber are the two most popular AI music video generators in 2024. This is corroborated by other signals like social media popularity, presence in software review articles, and the number of YouTube influencers creating content about the platforms.
It had been more than six months since we last tested Kaiber, so we decided to try out both systems to see how they stack up as of June 2024. We used the same music clip and text prompts on both websites as an experimental control.
In the end, we found that neural frames was superior for audio-reactive videos. It delivered more accurate AI image output, had an easier onboarding process, and provided more granular control over the audio-reactivity. Their anti-flicker feature solved a problem that surfaced with our Kaiber experiment.
On the other hand, neural frames does not offer video-to-video. Kaiber does offer vid2vid, and after testing the same short video clip in RunwayML and Pika, we found that Kaiber had the best output.
Continue reading for a walkthrough on each system and a demo of what the AI music videos looked like at the end of that process. You can use these tutorials to guide yourself through their systems (and set up realistic expectations before signing up).
Neural frames: Audio-reactive AI music videos
Of the four popular tools we tested and will be reviewing in this article, we felt neural frames was the best AI music video generator. Despite having a less flashy interface than Kaiber, the actual output was substantially better in our experience. We'll show you what we mean in the following review.
How to create an AI music video with neural frames
To get started, sign up for a free trial. Once you're in, you'll be asked to pick from one of their visual models. Choose a style that matches the look and feel you're going for. There's also an option to create and fine-tune your own custom models.
Imagery and art style for these videos are based on text prompts, just like other popular text-to-image services. You can describe any kind of image you want to create, and if you're worried your description is too basic, hit "Pimp my prompt" to get a more elaborate one.
On this screen, you can also select the layout format you want, including a 1:1 square, landscape, and portrait views. Hit the render button to generate four images. Choose one and then move on to the video configuration step.
We'll be testing the 1:1 ratio so we can use the video in an Instagram post. Landscape is best for YouTube, while portrait works for TikTok and Spotify.
Here's what the neural frames video editor looks like. Our text prompt called for little bubbles floating in a fantasy world, with a castle on top of a chessboard, floating on a cloud in the sky. The AI image captured that concept perfectly:
How to add music to neural frames
We double-clicked on the timeline row next to the music icon (shown in the screenshot above).
A popover asked us to choose from existing music provided by neural frames, or to upload our own music file.
We uploaded a 10-second music clip.
It took about 15 seconds for neural frames to split the music into stems and render the song's waveform on the timeline as shown below.
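Neural frames handles this step automatically, but if you're curious what stem splitting involves, the open-source Demucs model does the same job from the command line. Here's a rough sketch (the filename is a placeholder, and this is not neural frames' internal tooling):

```python
import subprocess

# Split a song into stems with the open-source Demucs separator.
# Writes drums.wav, bass.wav, vocals.wav, and other.wav under ./separated/
subprocess.run(["demucs", "our_clip.mp3"], check=True)
```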
The next step in this process is to pick an instrument layer that will act as the modulation trigger. You can click the play button on any layer to hear what it sounds like and decide which would be the best choice for your animation.
In less technical terms, this means that every time a snare, kick drum, or loud guitar strum occurs, neural frames will signal to the video generator that it should change the imagery more dramatically at that moment in the song.
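We don't know neural frames' internal implementation, but conceptually, audio-reactive modulation amounts to computing an onset-strength envelope from the chosen stem and using it as a per-frame "how much should the image change" signal. Here's a rough sketch of that idea in Python with librosa (the filename is a placeholder):

```python
import librosa

# Load the isolated snare stem (placeholder filename).
y, sr = librosa.load("snare_stem.wav", sr=None)

# Onset strength spikes wherever a snare hit lands.
envelope = librosa.onset.onset_strength(y=y, sr=sr)

# Normalize to 0..1: values near 1 would tell the generator to change
# the imagery dramatically, values near 0 would let it drift slowly.
modulation = envelope / envelope.max()
```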
We isolated the snare, and you can see below that the audio timeline now has a waveform located above it. These wave shapes represent the snare hits that will modulate our animation.
The trigger is used to modulate a specific target. If you click on that dropdown menu, you can select what you want it to do; this is a second layer of control that Kaiber does not offer. It defaults to strength, but we could also have selected motions like panning, zooming, and rotation.
We updated the duration from its 3 second default to 10 seconds, so that it covered the full length of our clip. With those modulation settings configured, we returned to the main screen and extended the prompt and modulation bars to match the full length of the audio clip. Then we hit render and waited.
Modulation controls are a major difference between neural frames and Kaiber.
Kaiber did not let us choose an instrument layer to act as our trigger. They tried to handle this automatically, but this created a problem. Kaiber's video did not show any signs of audio reactivity, despite being configured to do so. We'll share more on that later in the Kaiber walkthrough section.
A note about rendering speeds: neural frames renders one image at a time with an animation frame rate of 25 FPS. This means that rendering times can be quite slow, even for a short video.
To speed that up, you can switch on the new Turbo Mode feature that was added in June 2024. It will increase your rendering speed by 400%.
Rendering and exporting your neural frames video
We opted for the slow mode in order to get the highest image quality. Since we're using the Juggernaut XL model, it took about 5 minutes to render ten seconds. The speed varies depending on the model you choose. We were happy with the results and hit the download icon in the upper right corner of the screen to export.
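For a sense of scale, here's the back-of-envelope math on that render, based on the timings we observed:

```python
clip_seconds = 10
fps = 25
render_seconds = 5 * 60          # ~5 minutes observed in slow mode

frames = clip_seconds * fps      # 250 frames to generate, one at a time
per_frame = render_seconds / frames

print(f"{frames} frames at ~{per_frame:.1f}s each")  # 250 frames at ~1.2s each
```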
After confirming the export, we were routed to our video collection where we waited for it to upscale to 2x quality. Here's what that dashboard looks like:
It was ready to go within a couple of minutes. Here's how the final neural frames AI music video turned out. Notice that the animation very clearly changes with each snare hit, while maintaining slower, consistent movement between those modulations.
That covers the full process we went through for this round. Neural frames is actually capable of a lot more: it can animate a single image, as we showed here, but it can also animate between two image keyframes to create a kind of morphing effect. Check out their blog for more tutorials and announcements.
Watch a neural frames compilation of their best AI music videos of 2023 below:
Kaiber: Trouble with audio-reactivity & flickering
Kaiber made headlines back in 2023 after a collab with Linkin Park that garnered 80M+ views on YouTube. Their system includes three different types of AI music video generation: a neural frames-style flipbook (frame-by-frame), a video-to-video system that paints an animation over your existing footage, and a fluid motion option that looks more like traditional animation.
Their video-to-video feature is unique and something that neural frames does not currently offer. Check out a video walkthrough of how it works here:
We tested the system in June 2024 to see how it's currently set up. Here's a summary of our experience, including some problems we ran into. In short, the audio reactivity didn't work properly and our video had a buggy, flickering effect that ruined our test during the free trial.
How to get started with Kaiber
You can sign up with Kaiber for free. The site will take you through a seven-step questionnaire that you have the option to skip (for some reason the screen says they will ask only three questions, but that wasn't true for us).
Two paths: We tried two versions of the onboarding, one where we filled out the survey and one where we skipped it. Here's the main difference:
When we skipped the questionnaire, we received 60 generation credits and did not have to enter our credit card information.
When we filled out the survey, we received 110 credits but had to add our credit card or connect PayPal.
After completing the survey, we reached a dashboard where we were encouraged to "create your first video". When we hit that button, there was a new popover that forced us to add a payment method before we could continue.
Yet another roadblock to the creative experience, but okay. We connected our PayPal account and moved forward. Kaiber's free trial comes with 100 free credits, though it wasn't clear what that equated to in terms of seconds of rendered video output.
How to create an AI music video with Kaiber
1. Choose an AI video model
2. Upload a short, 24-second music file to use for the video
3. Write a detailed text prompt and image style prompt
4. Review the video settings to select the video layout
Note that in the screenshot above, as we're first entering our prompts, there's no indication of how many credits it will cost to create an AI animation for this audio file. You have to click through the "video settings" button to see that detail.
On that next screen, there's a tiny bit of text in the bottom right that specifies the credit cost: 115/100, meaning the render needed 115 credits against our balance of 100. This was easy to miss while we were choosing the aspect ratio, model, motion settings, and audio reactivity level.
We only noticed this limitation at the very end, as we went to generate previews.
The generate preview button teases you with four AI-generated images that you will use as your starting frame. There's no credit cost indicator on this "preview frames" screen. You have to click the large "create video" button and then...
The excitement builds; you're about to get your first AI music video for a short, sub-30-second clip. Then they throw up the "insufficient funds" paywall and a second error message in the top right. Both inform you that you don't have enough credits to run this:
By now, you've invested ~10-30 minutes into their questionnaire, adding your payment card, coming up with a prompt, choosing a style, and configuring your settings. Abandoning the process would be painful and so most users will buckle.
There's no option to trim the audio clip, which would be the obvious way to reduce the number of credits spent.
Instead, you have to quit the process and start over: manually edit the audio clip on your local machine, re-upload it, rewrite the prompt, and go through the settings configuration a second time.
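If you'd rather not open an audio editor for that step, a few lines of Python with pydub can handle the trim. This is a minimal sketch with placeholder filenames (pydub requires ffmpeg to be installed):

```python
from pydub import AudioSegment

# Trim a local copy of the track down to its first 10 seconds.
song = AudioSegment.from_file("full_track.mp3")
clip = song[:10_000]  # pydub slices are in milliseconds
clip.export("clip_10s.mp3", format="mp3")
```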
By reducing the music file to 10 seconds, we were able to get the credit cost down to 55/100 and move forward with our generation. This was a shot in the dark because Kaiber's interface didn't transparently tell us the equation for calculating credits per second of audio.
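For what it's worth, fitting a line through our only two data points suggests a rough formula, though two samples is thin evidence and this is purely our own extrapolation, not anything Kaiber publishes:

```python
# Two observed (seconds, credits) pairs from our tests.
seconds = [24, 10]
credits = [115, 55]

rate = (credits[0] - credits[1]) / (seconds[0] - seconds[1])  # ~4.3 credits/sec
base = credits[1] - rate * seconds[1]                         # ~12 credit flat fee

print(f"estimated cost ≈ {rate:.1f} × seconds + {base:.0f}")
```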
Unlike neural frames, Kaiber did not reflect the prompt accurately in its AI image output. We asked for the same image of a castle sitting on a chessboard that floats on a cloud in the sky. What we got was a castle on an island in the water, with clouds in the sky. There's no chessboard to be seen:
How to render and export your Kaiber video
We hit "create video" and were taken to a render screen where it took about five minutes to complete the rendering for our 10 second clip. This is a normal speed for AI video generation tools. They're much slower than AI image or AI music apps.
The two videos we created were disappointing. Neither showed any sign of audio reactivity, despite testing on both the medium and high settings. As you can see in the short demo below, the video has a flicker bug, and the motion does not respond to loud transients like the snare hits or sudden guitar strums.
Flickering is a known problem in AI video generators that use diffusion models. Neural frames recently rolled out a patch to their system and announced their new anti-flicker feature in June 2024.
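Neither company has published how its de-flickering works, but the general idea behind most anti-flicker techniques is temporal smoothing: forcing each frame to stay partly consistent with the one before it. Here's a crude post-hoc illustration of that concept with OpenCV (placeholder filenames; this is not either product's actual method):

```python
import cv2

cap = cv2.VideoCapture("flickery.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("smoothed.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

prev = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Keep 70% of the new frame and carry 30% of the previous output
    # forward, damping frame-to-frame jitter at the cost of some motion blur.
    prev = frame if prev is None else cv2.addWeighted(frame, 0.7, prev, 0.3, 0)
    out.write(prev)

cap.release()
out.release()
```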
Restyling with Kaiber's video to video model
We had a better experience with Kaiber's video-to-video model, but it wasn't audio-reactive and the imagery wasn't ideal either. As you can see in the output below, one of the guitarists gained a kind of cowboy hat. However, the drummer's hat remained a beanie and he was suddenly holding a guitar. The bassist in the foreground still wore a regular t-shirt with no hat or cowboy gear.
Runway got much closer to the image we requested, but the frame rate was extremely low and clips were limited to 4 seconds on the free plan. Paid plans raise that limit to 15-second uploads.
As with RunwayML, Pika got us closer to the cowboy aesthetic, but the image quality was really bad. And like Kaiber, it suddenly put a guitar in the drummer's lap. Pika's maximum input video file size is 10 MB on the free plan, so the clip had to be short and compressed.
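One way to get a clip under that cap is to downscale and recompress it with ffmpeg before uploading. A minimal sketch with placeholder filenames (raise the CRF further if the file is still too large):

```python
import subprocess

# Downscale to 720px wide and recompress so the file fits under 10 MB.
subprocess.run([
    "ffmpeg", "-y", "-i", "input.mp4",
    "-vf", "scale=720:-2",  # keep aspect ratio, force an even height
    "-crf", "30",           # higher CRF = smaller file, lower quality
    "output_small.mp4",
], check=True)
```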
Overall, the results of video-to-video with Kaiber, RunwayML, and Pika all fell short of our hopes and expectations. But these were just initial tests. Like any tool, with enough tweaking and experimentation it may be possible to do great things.
The band Contrast produced the best Kaiber AI music video that we've seen to date. It bears some resemblance to the aesthetic of Linkin Park's Lost video, also made with Kaiber. And because it's user-generated content, it's easier to validate that high-quality video-to-video restyling is in fact possible with the app.
Alternatives to neural frames and Kaiber
Noisee and Deforum are two alternatives to neural frames and Kaiber. However, they offer very limited control by comparison. Neither tool is audio-reactive, though Noisee does support audio upload. Here's a quick overview of each tool.
Deforum has a popular grassroots community, but their commercial web application, Deforum Studio, still seems to be a work in progress. Deforum offers text-to-video but does not include audio-reactivity. This means you can use it to create animations, but they won't respond to the music, which makes it less appealing for creating AI music videos.
Noisee accepts a few different types of audio sources. You can upload MP3, MP4, and WAV files or paste URLs from popular sites like YouTube and Suno. As you can see in the screenshot above, there is virtually no control over the output beyond a layout format and a text prompt field.
We used the same prompts as before and received mediocre results. The input prompt was expanded without our consent, and the expanded version omitted some of the key ingredients, like the bubbles. As you can see in the left navigation menu, there's a series of keyframes every seven seconds that are totally unrelated to one another.
Noisee doesn't really create an AI music video; it creates animations that pan slowly and attaches your audio file to them. Maybe the system will get better in the future, but for now, we wouldn't recommend it.
Based on all of these experiments, we felt neural frames offered the best results with the most control. They update their user interface regularly and maintain an active presence on the neural frames Twitter account, so it's easy to follow along and see what they're up to.