Riffusion is a model developed by Seth Forsgren and Hayk Martiros that uses artificial intelligence to create music from text. A fascinating aspect of Riffusion is how it uses Stable Diffusion, an open-source AI model, to convert text into images that can be played back as audio.
“This is a v1.5 Stable Diffusion model, just fine-tuned on spectrogram images paired with text. Audio processing happens downstream of the model,” the creators of Riffusion explain.
Riffusion works by creating a spectrogram from text input. A spectrogram visualizes the spectrum of frequencies in a signal as it changes over time. When applied to an audio signal, a spectrogram is sometimes called a sonograph.
A sonograph is a two-dimensional image where the x-axis is time, the y-axis is sound frequency, and color represents sound amplitude as a function of both axes.
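To make the idea concrete, here is a minimal sketch of computing such a spectrogram. It uses SciPy rather than Riffusion's own pipeline, and the 440 Hz test tone is just a stand-in for real audio:

```python
import numpy as np
from scipy import signal

# A 1-second 440 Hz sine wave at 16 kHz, standing in for real audio
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440.0 * t)

# f: frequency bins (y-axis), times: frame centers (x-axis),
# Sxx: power at each (frequency, time) cell -- the "color" in a rendered image
f, times, Sxx = signal.spectrogram(waveform, fs=sample_rate, nperseg=512)

# The strongest frequency bin should sit near the 440 Hz tone
peak_hz = f[Sxx.mean(axis=1).argmax()]
```

Rendering `Sxx` as an image (e.g. with `matplotlib.pyplot.pcolormesh`) produces exactly the kind of two-dimensional picture described above.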
Since a sonograph is essentially an image, Stable Diffusion can be applied to it, which is how Riffusion creates a musical composition. As the creators note: “I use Torchaudio, which has great modules for efficient audio processing on the GPU.”
After experimenting with Riffusion, I found that the AI does a good job of blending contrasting genres depending on the text input. For example, when I instructed the AI to play an Indian sitar over a hip-hop rhythm, it slowed down the sitar sound and combined it with the beat to create an infinitely looping AI-generated jam.
Riffusion is a big step forward in generating AI audio from simple text, even if you can't yet use it to create royalty-free AI music for your own material.