Stability AI, a generative AI company based in London, has introduced a new text-to-audio AI platform named “Stable Audio.” The platform is the company’s first push into music and sound generation. It can create songs up to 90 seconds long, which suits it to short-form projects such as advertisements, audiobooks, and video games.
The company has been a major leader in the field of artificial intelligence, though it was previously recognized mostly for AI-generated images. With the launch of its first text-to-audio generative AI platform, it now competes directly with other industry heavyweights such as OpenAI, Google, and Meta.
Stable Audio uses a diffusion model, the same type of AI model that drives the company’s more popular image platform, Stable Diffusion, but trained on audio instead of images. It can be used to create tunes or background audio for any project. The platform also overcomes a constraint of traditional audio diffusion models through music-specific training and by adding text metadata that specifies a song’s starting and ending times. This lets users choose the length of the generated song, up to the platform’s limit, which is useful for music creation.
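The timing metadata described above can be pictured with a small sketch. This is not Stability AI's code: the field names `seconds_start` and `seconds_total`, both helper functions, and the window length are illustrative assumptions, and only the 90-second cap comes from the article. The idea is that each training window records where it sits inside its source track, so that at generation time a user can ask for a specific total length.

```python
import random
from dataclasses import dataclass


@dataclass
class TimingMetadata:
    # Hypothetical field names, chosen for illustration only.
    seconds_start: float  # where the training window begins in the source track
    seconds_total: float  # full duration of the source track


def sample_training_window(track_seconds: float, window_seconds: float,
                           rng: random.Random) -> TimingMetadata:
    """Pick a random fixed-size window from a track and record its timing."""
    if track_seconds <= 0 or window_seconds <= 0:
        raise ValueError("durations must be positive")
    # Tracks shorter than the window can only start at 0.
    latest_start = max(0.0, track_seconds - window_seconds)
    start = rng.uniform(0.0, latest_start)
    return TimingMetadata(seconds_start=start, seconds_total=track_seconds)


def generation_request(prompt: str, desired_seconds: float,
                       max_seconds: float = 90.0) -> dict:
    """Build conditioning for generation: start at 0, request a total length."""
    if not 0 < desired_seconds <= max_seconds:
        raise ValueError(f"length must be in (0, {max_seconds}] seconds")
    return {
        "prompt": prompt,
        "seconds_start": 0.0,
        "seconds_total": desired_seconds,
    }


# Example: a 30-second request, well under the assumed 90-second cap.
request = generation_request("ambient piano", 30.0)
```

Because the model would see the timing values alongside the text prompt during training, asking for a 30-second piece becomes a matter of setting the requested total length rather than trimming a fixed-length output.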
Stable Audio was trained using “a dataset consisting of over 800,000 audio files containing music, sound effects, and single-instrument stems” and text metadata from stock music licensing provider AudioSparx, according to the company. The collection contains around 19,500 hours of audio. Stability AI claims it has a license to use the copyrighted content because of its collaboration with the licensing provider.
Audio diffusion models could previously only generate audio samples with predetermined durations. This hindered their ability to create full tunes. Stability AI has enhanced the model to give Stable Audio users more flexibility in deciding the length of the generated song, and with it more control over the creative process.
Stable Audio offers three pricing tiers:
- The free version allows users to create up to 20 tracks per month, each up to 45 seconds long. However, audio generated on the free version may not be used for commercial purposes.
- The Professional level costs $11.99 per month and allows users to create 500 tracks per month, each of which may last up to 90 seconds.
- For companies looking for customized consumption plans and pricing structures, the Enterprise subscription is offered.
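To put the two published tiers side by side, the rough arithmetic below works out the maximum audio each one allows. It is a sketch under stated assumptions: the quotas are treated as monthly, the free tier is read as 20 tracks of up to 45 seconds each, and the `Tier` class is a hypothetical helper, not part of any Stable Audio API. The Enterprise tier is left out because no figures are given for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    # Hypothetical helper for illustration; not part of any Stable Audio API.
    name: str
    tracks_per_month: int       # assumed to be a monthly quota
    max_seconds_per_track: int

    def max_minutes_per_month(self) -> float:
        """Upper bound on generated audio if every track uses the full length."""
        return self.tracks_per_month * self.max_seconds_per_track / 60


FREE = Tier("Free", tracks_per_month=20, max_seconds_per_track=45)
PRO = Tier("Professional", tracks_per_month=500, max_seconds_per_track=90)

for tier in (FREE, PRO):
    minutes = tier.max_minutes_per_month()
    print(f"{tier.name}: up to {minutes:.0f} minutes of audio per month")
```

Under these assumptions the free tier tops out at 15 minutes of audio per month and the Professional tier at 750 minutes, or 12.5 hours.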
“Stable Audio represents the cutting-edge audio generation research by Stability AI’s generative audio research lab, Harmonai,” the company said in a statement. “We continue to improve our model architectures, datasets, and training procedures to improve output quality, controllability, inference speed, and output length.”
Text-to-audio generation is not a new idea, and several major players in generative AI have been exploring it for some time. For example, Meta released AudioCraft in August, a collection of generative AI models designed to generate natural-sounding speech, sound, and music from text prompts. AudioCraft, however, is currently available only to researchers and select audio professionals. Google has introduced MusicLM, which lets users generate music and sounds, but it is also restricted to researchers.