Benn Jordan’s video “AI Mastering Is Stupid” presented a large‑scale blind test comparing AI‑driven mastering services to professional human mastering engineers. In 2024, Jordan – an electronic music producer and YouTuber also known as The Flashbulb – set out to determine whether algorithmic mastering tools could compete with the nuanced touch of human experts. He rallied hundreds of listeners for a double‑blind comparison and found striking differences in quality and preference between machine and human. The provocative title reflects the surprising outcome: despite the promise of fast, affordable results, current AI mastering often fell short of seasoned human engineers. For more information, head over to the definitive guide to AI Mastering in 2025.
Background and Motivation
In recent years, AI‑powered audio mastering services have exploded in popularity. Platforms like Beats To Rap On AI Mastering, LANDR, CloudBounce, BandLab, and others promise musicians instant, “studio‑quality” masters at a fraction of the cost of a traditional mastering engineer. The allure is clear: fast turnaround, consistency, and low cost, all without needing expert skills. For independent artists on tight budgets, uploading a mix and receiving a polished master in minutes is a game‑changer (Beats To Rap On Guide).
However, the rise of these tools has sparked debate in the music production community. Can an algorithm truly replicate the seasoned ears and artistic judgment of a human mastering engineer? Jordan’s interest in this question led him to conduct a comprehensive study. As he noted, “a bad master can easily ruin an otherwise great song,” underscoring that mastering is a “much overlooked creative process” essential to recorded music (LinkedIn Insight). This sentiment encapsulates why the stakes are high – if AI mastering isn’t up to par, it could degrade the music it’s meant to enhance. Jordan’s background as a producer and technologist made him well‑suited to examine this issue rigorously. By late 2024, he decided to put AI vs. human mastering to the test on a scale far beyond anecdotal A/B comparisons.
Test Design and Methodology
Benn Jordan’s experiment was ambitious in scope and careful in design. He selected one of his own tracks – “Starlight” – as the test material. Starlight is an instrumental electronic piece from Jordan’s album Piety of Ashes, featuring lush dynamics and full‑range frequency content, making it an excellent canvas for a mastering test (Beats To Rap On Analysis). Using a single track ensured any differences heard by listeners would come solely from the mastering process, not from variations between songs or genres. However, it also means the results are specific to how each approach handled that particular style – IDM/electronic – and may not fully generalize to other genres.
Jordan then prepared multiple versions of the track “Starlight”, each mastered through a different method, and conducted a 472‑person blind listening test. The approaches included:
- AI Mastering Services/Plugins: LANDR, BandLab online mastering, Kits.ai, Virtu, Mixea, and Matchering 2.0 (an open‑source reference‑matching algorithm; see the sketch after this list), plus the AI assistant in iZotope Ozone 11. Some were run locally to ensure consistent conditions (Gearnews Report).
- AI‑Assisted with Human Input: Jordan used iZotope Ozone’s Master Assistant together with Neutron (an AI mixing assistant), tweaking settings himself to represent a hybrid workflow.
- Professional Human Engineers: He enlisted two experienced mastering engineers, Max Honsinger and “Ed the Soundman,” to produce masters by hand with no knowledge of the other versions.
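Matchering is notable here because anyone can reproduce its approach: it is an open‑source Python library that matches a mix to a reference master. A minimal usage sketch follows, with placeholder file names rather than the files used in Jordan's test:

```python
# Minimal Matchering 2.0 sketch (pip install matchering).
# File names are placeholders, not Jordan's actual test files.
import matchering as mg

mg.process(
    # The mix you want mastered:
    target="starlight_mix.wav",
    # A commercially mastered track whose tonal balance
    # and loudness you want to match:
    reference="reference_master.wav",
    # Render the result as 16-bit and 24-bit WAV files:
    results=[mg.pcm16("master_16bit.wav"), mg.pcm24("master_24bit.wav")],
)
```

Because the entire algorithm is driven by the reference track, Matchering's output quality depends heavily on how well the chosen reference suits the material.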
Initially, 12 masters were generated. Jordan performed a quality control step: any master that was objectively poor (clipping, distortion, mangled balance) was removed. Notably, LANDR, BandLab, Waves, Virtu, and Mixea were disqualified at this stage for subpar results, leaving 7 finalists for the formal test (Gearnews QC).
For the listening test, Jordan adopted a double‑blind methodology. The participants did not know which master came from AI or human, nor the services or engineers’ names. The 7 finalists were labeled with anonymized IDs and presented in randomized order. 472 participants – likely drawn from Jordan’s YouTube audience, social media, and audio communities – evaluated each version on key audio quality criteria:
- Clarity: How well defined were the mix elements; were frequencies clean and details audible?
- Presence: The sense of fullness and engagement; did it feel lively or dull?
- Depth: Perceived dimensionality and dynamics; was it three‑dimensional or flat/compressed?
Listeners likely rated each master numerically (e.g., average scores out of 10) and provided qualitative feedback. Community members later reported average scores, confirming the aggregate approach (Reddit summaries).
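As an illustration of that aggregate approach, here is a short sketch of how per‑listener ratings would collapse into the average scores reported below. The CSV layout and column names are a hypothetical reconstruction, not Jordan's actual data:

```python
# Sketch: aggregating blind-test ratings into per-master averages.
# One row per listener per master; column names are hypothetical.
import pandas as pd

ratings = pd.read_csv("ratings.csv")  # listener_id, master_id, clarity, presence, depth

# Overall score per response, then mean per anonymized master.
ratings["overall"] = ratings[["clarity", "presence", "depth"]].mean(axis=1)
leaderboard = (
    ratings.groupby("master_id")["overall"]
    .agg(["mean", "std", "count"])
    .sort_values("mean", ascending=False)
)
print(leaderboard.round(2))
```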
Blind Comparison Execution
To prevent listener fatigue, the test was limited to seven options: of the original 12 masters, five were pruned for poor quality. Participants listened on their own headphones or speakers and could switch between masters to discern differences. This double‑blind design (both participants and administrator blinded to the sources) and randomized ordering helped keep the results unbiased.
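A blind test like this hinges on the anonymization step. Below is a minimal sketch of how the labeling and per‑participant shuffling might be done – a plausible reconstruction, not Jordan's actual tooling, and the file names are hypothetical:

```python
# Sketch: anonymize masters and randomize playback order per participant.
import random

masters = [
    "honsinger.wav", "ed_soundman.wav", "ozone_neutron.wav",
    "matchering.wav", "compound.wav", "kitsai.wav", "ozone_assistant.wav",
]

# Fixed anonymous IDs; the admin keeps this mapping sealed until scoring ends.
id_map = {f"Master {chr(65 + i)}": path for i, path in enumerate(masters)}

def playlist_for(participant_seed: int) -> list[str]:
    """Return the anonymized IDs in a random order unique to one participant."""
    order = list(id_map.keys())
    random.Random(participant_seed).shuffle(order)
    return order

print(playlist_for(42))  # e.g. ['Master C', 'Master A', ...]
```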
Results and Findings: AI Mastering vs Human Mastering
The outcome was clear: Human mastering engineers outperformed AI, with AI ranging from decent to disappointing (Gearnews Findings). Key takeaways:
- Human Engineers Dominated: Max Honsinger’s master ranked #1 (≈6.4/10) and Ed the Soundman was #2 (≈6.1/10). Listeners praised them as coherent, detailed, and natural-sounding (Beats To Rap On Results).
- Top AI Performers (Close, but Not Equal): The hybrid Ozone+Neutron chain and Matchering 2.0 came in essentially tied for #3, scoring ~5.8–5.9/10. Though respectable, they still trailed the two human masters by roughly 0.2–0.6 points.
- Other AI Attempts (Mediocre to Poor): Compound Audio’s Stereo Mastering (~4.8/10), Kits.ai (~4.9/10), and Ozone 11’s standalone Master Assistant (~3.8/10) ranked lower. These versions often sounded flatter, over-compressed, or harsh.
- Disqualified Services: LANDR, BandLab, Waves, Virtu, and Mixea were removed pre‑test for severe clipping or distortion, illustrating how AI without oversight can ruin a track. You can read our Landr vs eMastered vs BeatsToRapOn (2025) shootout here.
Interestingly, the top human masters weren’t the loudest: Honsinger’s sat around –10.2 LUFS (integrated loudness) with a dynamic range (DR) of 10, while Matchering pushed loudness to –8.9 LUFS with a DR of 7, sacrificing dynamics for loudness. Despite being quieter, the human masters were preferred, underscoring that more loudness doesn’t equate to better sound (LUFS & DR Comparison).
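Figures like these are easy to reproduce at home. A sketch using the pyloudnorm library for integrated LUFS, with crest factor as a rough stand‑in for dynamics (note the quoted DR values come from a different measurement algorithm, so this is only an approximation), and hypothetical file names:

```python
# Sketch: measure integrated loudness (LUFS) and a rough dynamics figure.
# pip install soundfile pyloudnorm
import numpy as np
import soundfile as sf
import pyloudnorm as pyln

def measure(path: str) -> tuple[float, float]:
    data, rate = sf.read(path)
    meter = pyln.Meter(rate)                       # ITU-R BS.1770 meter
    lufs = meter.integrated_loudness(data)
    peak_db = 20 * np.log10(np.max(np.abs(data)))
    rms_db = 20 * np.log10(np.sqrt(np.mean(data ** 2)))
    return lufs, peak_db - rms_db                  # (loudness, crest factor)

for name in ["human_master.wav", "ai_master.wav"]:  # hypothetical files
    lufs, crest = measure(name)
    print(f"{name}: {lufs:.1f} LUFS, crest factor {crest:.1f} dB")
```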
Qualitatively, listeners noted that human masters had superior sonic balance and musicality: nuanced presentation, coherent reverb tails, punchy drums, and a sheen on hi‑hats that AI often missed. The human engineers’ subtle aesthetic choices – when to cut or boost and why – created a measurably superior listening experience.
Spectrogram Analysis: Unmastered Original
Dynamic Range:
- The overall dynamic range looks reasonably well defined, with visible transient spikes indicating clear attack phases, especially in the lower and mid-frequency regions.
- Some low-frequency content has strong, sustained energy bursts, which may risk muddiness if unchecked.
- Mid to high frequencies appear fairly consistent and steady, so the dynamic interplay between these ranges seems balanced.
Frequency Balance:
- Low Frequencies (20Hz – 250Hz): There is a broad and relatively strong presence in the sub‑bass and bass region. However, the band around 100Hz to 250Hz shows prolonged energy that can cause muddiness. Consider a gentle cut or a narrow parametric reduction near 200–250Hz to clarify the mix (see the EQ sketch after this list).
- Mid Frequencies (250Hz – 2kHz): These appear evenly distributed with some discrete peaks. This generally contributes to clarity, but watch for any honky or boxy resonances around 400–600Hz.
- High‑Mid to High Frequencies (2kHz – 10kHz): Harmonics and transient details show up well, but the brighter regions above 6kHz warrant attention. If these areas feel harsh or piercing in the mix, a soft shelf or a slight reduction around 6–8kHz might help to tame any excessive brightness without dulling airiness.
- Very High Frequencies (8kHz+): The presence here is moderate and transient peaks are visible but not overly aggressive. There doesn’t seem to be harshness from cymbals or sibilance issues, though subtle de‑essing could be considered later if vocals are present.
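The two corrective moves suggested above – a narrow cut in the low‑mid mud and a mild high shelf – translate directly into standard RBJ‑cookbook biquads. A sketch with scipy follows; the center frequencies, gains, and file names are illustrative values drawn from the notes above, not fixed prescriptions:

```python
# Sketch: the two suggested corrective EQ moves as RBJ-cookbook biquads.
import numpy as np
import soundfile as sf
from scipy.signal import lfilter

def peaking(fs, f0, gain_db, q):
    """RBJ peaking EQ biquad (negative gain_db = cut)."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * np.cos(w0), 1 - alpha * amp]
    den = [1 + alpha / amp, -2 * np.cos(w0), 1 - alpha / amp]
    return np.array(b) / den[0], np.array(den) / den[0]

def high_shelf(fs, f0, gain_db, q=0.707):
    """RBJ high-shelf biquad."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    cos, sq = np.cos(w0), 2 * np.sqrt(amp) * alpha
    b = [amp * ((amp + 1) + (amp - 1) * cos + sq),
         -2 * amp * ((amp - 1) + (amp + 1) * cos),
         amp * ((amp + 1) + (amp - 1) * cos - sq)]
    den = [(amp + 1) - (amp - 1) * cos + sq,
           2 * ((amp - 1) - (amp + 1) * cos),
           (amp + 1) - (amp - 1) * cos - sq]
    return np.array(b) / den[0], np.array(den) / den[0]

audio, fs = sf.read("starlight_mix.wav")       # hypothetical input file
for b, a in (peaking(fs, 225, -2.0, 1.4),      # gentle cut in the 200-250Hz mud
             high_shelf(fs, 7000, -1.5)):      # mild taming around 6-8kHz
    audio = lfilter(b, a, audio, axis=0)
sf.write("starlight_eq.wav", audio, fs)
```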
Transients:
- Clear transient spikes are visible, especially in low-mid and mid regions, indicating that percussive elements have good definition.
- The transient details do not seem overly compressed or squashed, suggesting a natural and lively attack.
- Some transient energy in the high-frequency spectrum (e.g., cymbals or hi-hats) is noticeable but is contained and not overly dominant.
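One way to sanity‑check those transient observations is a spectral‑flux onset envelope: tall, well‑separated peaks suggest preserved attacks, while a flattened envelope suggests heavy compression. A rough sketch (not the tool used to generate this analysis, file name hypothetical):

```python
# Sketch: spectral-flux onset envelope to check transient definition.
import numpy as np
import soundfile as sf
from scipy.signal import stft

audio, fs = sf.read("starlight_mix.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)                 # mono sum for analysis

_, _, spec = stft(audio, fs=fs, nperseg=1024, noverlap=768)
mag = np.abs(spec)

# Half-wave-rectified frame-to-frame magnitude increase = spectral flux.
flux = np.maximum(mag[:, 1:] - mag[:, :-1], 0).sum(axis=0)
flux = flux / (flux.max() + 1e-12)

print(f"mean flux {flux.mean():.3f}, peak/mean ratio {flux.max() / flux.mean():.1f}")
```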
Summary:
The track presents a solid overall sonic foundation with well‑preserved dynamic range and clear transient definition. It has a slightly pronounced low end with potential muddiness in the 200–250Hz range that could benefit from a gentle cut. The midrange is balanced but deserves a critical listen for any boxiness around 400–600Hz. High frequencies are present and detailed without harshness, though mild taming in the 6–8kHz region may enhance listening comfort during mastering.
The track is well on its way toward clarity and fullness but will benefit from minor corrective EQ, and possibly subtle dynamic control in the low‑frequency and upper‑midrange bands, during the mastering stage to achieve a more polished final sound.
Spectrogram Analysis: AI‑Mastered Version
Dynamic Range
- Overall Loudness: The AI‑MASTERED version shows a denser, more consistent energy distribution, suggesting that the overall loudness has been increased. The amplitude intensity (brightness of the spectrogram) is more uniform throughout, pointing to a higher RMS level.
- Peak-to-Average Ratio: The original spectrogram shows more pronounced spikes and darker gaps, indicating higher dynamic variations. The mastered version smooths these peaks, compressing the dynamics to reduce peak-to-average differences and increase perceived loudness.
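That peak‑to‑average observation can be quantified directly from the audio rather than eyeballed from the spectrogram. A sketch computing a short‑term RMS envelope for both versions (file names hypothetical); a flatter envelope in the mastered file reflects the smoothed dynamics described above:

```python
# Sketch: short-term RMS envelope to compare loudness consistency.
import numpy as np
import soundfile as sf

def rms_envelope_db(path: str, window_s: float = 0.4) -> np.ndarray:
    data, rate = sf.read(path)
    if data.ndim > 1:
        data = data.mean(axis=1)
    hop = int(rate * window_s)
    frames = [data[i:i + hop] for i in range(0, len(data) - hop, hop)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return 20 * np.log10(rms + 1e-12)

for name in ["starlight_unmastered.wav", "starlight_ai_master.wav"]:
    env = rms_envelope_db(name)
    print(f"{name}: mean {env.mean():.1f} dBFS, spread {env.std():.1f} dB")
```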
Frequency Balance Adjustments
- Enhanced Lows: The AI-MASTERED track appears to have a subtle boost in the lower frequency bands (bottom area of the spectrogram). This is visible as a richer presence and more sustained energy in the low-frequency region.
- Clearer Mids: The mid frequencies (middle of the spectrogram) in the mastered version appear more evenly distributed and less cluttered, which implies some equalization to clean up muddiness and enhance clarity in vocal or instrumental mids.
- Controlled Highs: The high frequencies (top portion of the spectrogram) in the AI-MASTERED version show a more controlled and less harsh representation, indicating possible subtle high-frequency attenuation or de-essing to avoid sibilance or harshness.
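The band‑by‑band observations above can be reproduced with a simple spectrogram comparison. A sketch using scipy, with band edges mirroring the ranges discussed and hypothetical file names:

```python
# Sketch: compare average energy per frequency band between the
# unprocessed and AI-mastered files.
import numpy as np
import soundfile as sf
from scipy.signal import spectrogram

BANDS = {"lows": (20, 250), "mids": (250, 2000), "highs": (2000, 16000)}

def band_energy_db(path: str) -> dict[str, float]:
    data, rate = sf.read(path)
    if data.ndim > 1:
        data = data.mean(axis=1)
    freqs, _, sxx = spectrogram(data, fs=rate, nperseg=4096)
    out = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        out[name] = 10 * np.log10(sxx[mask].mean() + 1e-20)
    return out

orig = band_energy_db("starlight_unmastered.wav")
mastered = band_energy_db("starlight_ai_master.wav")
for band in BANDS:
    print(f"{band}: {mastered[band] - orig[band]:+.1f} dB vs original")
```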
Perceived Clarity and Punch
- The AI-MASTERED version demonstrates improved punch, primarily through tighter low-end control and better mid-frequency definition. Dynamics smoothing translates to a more consistent energy flow throughout, which enhances perceived clarity and presence.
- The noise floor and quiet sections in the mastered version are cleaner and less variable, supporting a clearer overall sound.
Addressed Issues
- The original track’s higher dynamic range and inconsistent loudness are managed to achieve a more commercially competitive level in the AI-mastered version.
- Frequency clutter in the mids and potential harshness in highs appear mitigated.
- The low-end response is more solidified without overpowering, contributing to a balanced spectral profile.
In summary, the AI-MASTERED spectrogram indicates a mastering process that compresses dynamics to increase loudness, applies selective equalization to enhance spectral balance, and improves clarity and punch, effectively addressing usual mix-stage issues visible in the ORIGINAL UNPROCESSED track.
Listener Demographics and Preferences
The diverse pool of 472 listeners included audio engineers, musicians, and general music fans. This breadth adds credibility: the preference for human mastering transcended technical expertise. Even casual listeners, when given back‑to‑back comparisons, tended to prefer the human masters, showing that mastering quality matters for all audiences.
Community and Expert Commentary
Jordan’s experiment sparked discussions across forums and publications:
- Mastering Engineers Agreed: Pros praised the rigorous approach, highlighting that mastering is part art, part science – requiring human taste and context.
- Producers Acknowledged AI Use‑Cases: AI is useful for demos, drafts, and learning, but not yet a substitute for pro releases. “Good enough for now, but not great” was a common refrain.
- AI Developers Responded Constructively: Some clarified their proprietary methods or committed to improvements, welcoming benchmarking.
- Music Tech Press Amplified Findings: Outlets like MusicTech and Gearnews covered the study, educating readers on best practices and the pitfalls of chasing loudness.
Experts emphasize that AI mastering can assist with technical chores but cannot replicate the “love of the tune” – the emotional connection and nuanced decision‑making that humans bring.
Broader Implications for Audio Mastering and Music Production
- Value Proposition of Mastering Engineers: Allocating budget for pro mastering remains worthwhile; humans deliver that final 10–15% of polish AI cannot.
- AI as a Democratizing Tool (with Caveats): For hobbyists, AI offers accessible, affordable masters. But for commercial, radio‑ready sound, AI still falls short.
- Hybrid Workflows: The future likely combines AI precision with human taste. Engineers may use AI for initial suggestions and handle the artistic refinements themselves.
- Push for Better AI: Service providers must address clipping issues, add fail‑safes (a minimal pre‑delivery check is sketched after this list), and increase transparency – e.g., clarify whether they repurpose open‑source algorithms like Matchering.
- Genre and Context Sensitivity: AI may excel in some genres but falter in others. A segmented approach – AI for baseline consistency, humans for creative mastery – could become standard.
- Educational Benefit: Blind tests raise awareness of mastering’s impact, serving as ear training for producers and listeners alike.
- Emotional Artistry: Music remains an art; the emotional, creative layer in mastering resists full automation. AI should augment, not replace, human ingenuity.
- Future Standards: Jordan’s test could inspire more genre‑diverse benchmarks and even certifications, pushing AI tools to meet listener‑verified quality thresholds.
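On the fail‑safe point raised above: the clipping that disqualified several services is cheap to detect before delivery. A minimal sketch of such a check, with illustrative (not standardized) thresholds and a hypothetical file name:

```python
# Sketch: a minimal pre-delivery fail-safe against hard clipping.
import numpy as np
import soundfile as sf

def clipping_report(path: str, ceiling: float = 0.999, run_len: int = 3) -> bool:
    """Flag files containing runs of consecutive samples at/above the ceiling."""
    data, _ = sf.read(path)
    level = np.abs(data).max(axis=1) if data.ndim > 1 else np.abs(data)
    hot = level >= ceiling
    # Runs of `run_len`+ consecutive hot samples usually mean hard clipping.
    runs, streak = 0, 0
    for h in hot:
        streak = streak + 1 if h else 0
        if streak == run_len:
            runs += 1
    print(f"{path}: {hot.sum()} hot samples, {runs} clip-like runs")
    return runs > 0

if clipping_report("candidate_master.wav"):
    print("Reject: send back for a lower-ceiling render.")
```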
Conclusion
Benn Jordan’s 472‑person blind test reaffirmed that human mastering engineers remain the gold standard for delivering emotionally resonant, high‑quality masters. While AI tools have improved and can serve well for demos or budget projects, they are not yet a full replacement for the critical, creative decisions of an experienced engineer. The likely path forward is a collaborative model: AI handling routine tasks, humans refining the artistry and emotional nuance.