“Enhance Speech” Uses AI to Improve the Sound of Poor Voice Recordings
A recent free AI-powered audio processing tool from Adobe can improve some poor voice recordings by removing background noise and enhancing the voice. When it works, the result sounds like a recording made in a professional sound booth with a top-notch microphone.
The new tool, called Enhance Speech, first appeared in Project Shasta, an Adobe research project into artificial intelligence. Adobe recently renamed Project Shasta to Adobe Podcast.
Enhance Speech is best used in a desktop web browser, and it requires creating an Adobe account. After registering and logging in, users can upload MP3 or WAV files up to 1GB in size or an hour long. Once the audio has been cleaned up, they can listen to the result in the browser or download it.
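For readers who want to check a file against those limits before uploading, here is a minimal Python sketch. It is not part of Adobe's service. The names `upload_problems` and `wav_duration_seconds` are hypothetical, treating "1GB" as one billion bytes is an assumption, and the duration helper only handles uncompressed WAV files.

```python
import os
import wave

# Limits as stated in the article. The exact byte count behind "1GB" is an assumption.
MAX_BYTES = 1_000_000_000
MAX_SECONDS = 60 * 60
ALLOWED_EXTENSIONS = {".mp3", ".wav"}


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def upload_problems(filename, size_bytes, duration_seconds=None):
    """List the reasons a file would exceed the stated limits (empty if none)."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("larger than 1GB")
    # Duration is optional because this sketch cannot measure MP3 length.
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append("longer than one hour")
    return problems
```

For a WAV file on disk, a call might look like `upload_problems(path, os.path.getsize(path), wav_duration_seconds(path))`. An empty list means the file fits the limits as described above.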
In our tests, Enhance Speech performed best on audio that contained a voice without excessive noise or crosstalk. For instance, we used the built-in microphone of an iMac to record a person standing 10 feet away, along with the sound of nearby fans. After processing with Enhance Speech, the audio sounded like it had been recorded up close with a professional microphone in a quiet studio.
How does it work?
Adobe did not provide any technical details, but we believe that a deep-learning model was trained on a significant amount (possibly thousands of hours) of both clear and noisy audio. The model could then “learn” to recognise the frequencies of the human voice and create a replica that closely resembles the original. We have contacted the company for comment, but until Adobe provides more technical information, this is just conjecture.
On that note, some commenters on Hacker News have reported hallucinated results from extremely noisy audio (such as speech recorded next to a waterfall) or from non-English sources, which suggests that Enhance Speech is doing more than basic noise reduction. These results include unexpected output, such as phantom voices, where the AI misinterprets the audio input.
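To make the contrast concrete, here is a minimal sketch of a noise gate, one of the simplest classical noise-reduction techniques. It is an illustration only, not Adobe's method. The function name `noise_gate`, the threshold, and the frame size are all hypothetical choices.

```python
def noise_gate(samples, threshold=0.05, frame_size=4):
    """Silence every frame whose peak amplitude is below the threshold."""
    gated = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        peak = max(abs(sample) for sample in frame)
        if peak >= threshold:
            # Loud enough to count as signal: pass the frame through unchanged.
            gated.extend(frame)
        else:
            # Quiet frame: treat it as noise and mute it.
            gated.extend([0.0] * len(frame))
    return gated
```

A gate like this can only pass or mute what is already in the recording, so it cannot invent a voice that was never there. Phantom voices in the output are therefore a hint that the tool generates audio rather than merely filtering it.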
Enhance Speech is not the first tool to use AI to reduce noise. For example, the open source programme mayavoz and the commercial service Audo Studio carry out a similar task.
Enhance Speech is part of a larger collection of AI-powered podcasting tools from Adobe, which also includes a Mic Check tool (currently offered for free) and a transcript-based audio-editing tool that is still in an invitation-only beta test.