A Remarkable Achievement (Old-Fashioned Software)


In 1997, a German musician, mathematician and guitar-maker named Peter Neubäcker asked a question that would change recording forever:

What does a stone sound like?

This wasn’t poetry. It was a genuine philosophical inquiry into the nature of sound itself.

Neubäcker wasn’t a conventional signal processing engineer. He was a philosopher, a specialist in harmonics, a luthier, and a student of Pythagoras and Johannes Kepler. He spent years studying a single plucked guitar string. He 3D-printed a two-foot model of its waveform. He built Pythagorean monochords in his Munich apartment. He was obsessed with understanding what sound is made of — not as raw waveform data, which is how every other engineer approached it, but as musical phenomena. Notes. Harmonics. The mathematical relationships between tones.

When a researcher visited his apartment in 2016 and asked him about Melodyne as a pitch correction tool — the thing it’s most famous for — he said: “I don’t know so much about that. To me it’s all rubbish.”

He didn’t care about tuning pop vocals. He wanted to understand the true nature of sound. The stone question was about whether software could perceive the resonant character of any object — the harmonics, the overtones, the timbral signature — the way a musician perceives them. Not as data. As music.

That obsession is the origin of everything that followed.

Here’s something most people don’t think about.

When you record a band playing together — or even a single piano chord — the result is a single waveform. One continuous stream of audio data. Every instrument, every note, every harmonic is folded together into one signal.
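
In signal terms, “folded together” means nothing more exotic than addition: every source is summed, sample by sample, into one array. A tiny Python sketch makes the point (the frequencies and amplitudes are arbitrary stand-ins):

    import numpy as np

    t = np.linspace(0, 1, 44100, endpoint=False)   # one second at 44.1 kHz
    voice = 0.5 * np.sin(2 * np.pi * 220 * t)      # stand-in for a vocal
    bass = 0.5 * np.sin(2 * np.pi * 55 * t)        # stand-in for a bass line
    mix = voice + bass                             # the "finished record"
    # Given only `mix`, infinitely many pairs of signals sum to exactly this
    # array; recovering the musically meaningful pair is the whole problem.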

Separating them again was supposed to be impossible. Not difficult. Impossible. The standard analogy in signal processing was: it’s like trying to un-bake a cake. You can’t separate the eggs from the flour once they’ve been mixed and cooked.

For decades, this was just accepted. If you wanted isolated parts, you needed the original multitrack recordings. If those didn’t exist — if all you had was the final mix — you were stuck. You could EQ out certain frequency ranges. You could use phase cancellation tricks. But you couldn’t cleanly extract a vocal from a guitar from a bass from a drum kit.

Peter Neubäcker didn’t accept that.

In 2000, Neubäcker co-founded a company called Celemony in Munich. In 2001, they released Melodyne.

The first version worked on monophonic audio — single voices, single instruments. You could take a vocal recording and see every note laid out as a blob on a pitch grid. Move a blob up, the pitch changes. Stretch it, the timing changes. It was remarkable, but it was still operating on one note at a time.

Because Neubäcker approached sound as music rather than as data, Melodyne understood recordings differently from every other audio tool. It didn’t see a waveform. It saw notes — with pitch, duration, timing, and harmonic structure. It perceived the musical content within the signal.

That philosophical difference — treating audio as music rather than maths — is what made the next breakthrough possible.

In 2008, at the Musikmesse in Frankfurt, Neubäcker demonstrated something the audio engineering world had considered flatly impossible.

He called it DNA — Direct Note Access.

He played a recording of a piano chord. Multiple notes, all sounding simultaneously, all baked into one waveform. And then he reached in and moved a single note. Changed its pitch. Left the others untouched.

The audience lost their minds.

The key technical challenge was separating overlapping harmonics. When two notes play simultaneously, their harmonic series interleave and overlap in frequency space. Working out which harmonics belong to which note is, as one writer put it, “a mind-curdling problem.”
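
To see how tangled this gets, take two notes a fifth apart. A few lines of Python (purely illustrative, nothing to do with Celemony’s code; the note choice and the 5 Hz threshold are mine) are enough to show partials from one note landing almost exactly on partials of the other:

    # Near-collisions between the harmonic series of two simultaneous notes
    # (C3 and G3, equal temperament). Illustrative only.
    C3, G3 = 130.81, 196.00   # fundamental frequencies in Hz

    def harmonics(f0, count=10):
        return [f0 * k for k in range(1, count + 1)]

    for i, hc in enumerate(harmonics(C3), start=1):
        for j, hg in enumerate(harmonics(G3), start=1):
            if abs(hc - hg) < 5:   # partials this close blur into one another
                print(f"C3 partial {i} ({hc:.1f} Hz) vs G3 partial {j} ({hg:.1f} Hz)")

Three pairs of partials come out well under 2 Hz apart; in a real recording each pair is a single smeared lump of energy that somehow has to be credited to the right note.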

Neubäcker had been chipping away at it for over a decade. Every other engineer in the field relied on a technique called FFT — Fast Fourier Transform — the standard mathematical method that underpins virtually all digital audio analysis. FFT takes a sound wave and converts it into a frequency spectrum: it tells you what pitches are present in a signal. It’s been the foundational tool of digital signal processing since the 1960s. It’s what everyone uses. Neubäcker tried it, found it wasn’t precise enough to separate closely overlapping harmonics from multiple simultaneous notes, and essentially threw it out. He built his own analysis system from scratch — something far more precise, rooted in his understanding of how musical harmonics actually behave. He abandoned the standard toolkit because the standard toolkit couldn’t hear what he could hear.
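
A back-of-the-envelope sketch shows where the stock approach runs out of precision (the window length and sample rate below are my assumptions, chosen only to illustrate the trade-off):

    # Frequency resolution of a plain FFT over a short analysis window.
    sample_rate = 44100          # Hz
    window_seconds = 0.1         # roughly the timescale of a short note
    n_samples = int(sample_rate * window_seconds)   # 4410 samples
    bin_width = sample_rate / n_samples             # 10.0 Hz per bin
    print(f"FFT bin width: {bin_width:.1f} Hz")
    # The near-colliding partials above differ by far less than one bin, so a
    # raw spectrum lumps them together; a longer window buys finer bins but
    # smears the timing of the notes instead.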

The result was an algorithm that could perceive the individual musical components within a polyphonic signal.

It took until November 2009 to ship as a product — so long that some users suspected the demo had been faked and Neubäcker had absconded to the South Seas. But it was real. And it worked.

In 2012, Celemony received a Technical Grammy — the music industry’s equivalent of a lifetime achievement Oscar for engineering. Pete Townshend called Melodyne “a miracle.” Midge Ure called it “black magic.”

They weren’t wrong.

I want to pause here because something important gets lost in the current conversation.

DNA (Direct Note Access) was not built with machine learning. It wasn’t trained on data. It didn’t require GPU clusters or vast datasets of labelled audio. It was engineered — by a man who understood harmonics deeply enough to write algorithms that could perceive musical structure within a waveform.

This is old-fashioned software engineering in the most admirable sense. A human being understood a problem so deeply that he could write rules to solve it. Not approximate it. Not statistically guess at it. Solve it.

In 2026, that feels almost quaint. We’re surrounded by AI tools that achieve remarkable things through pattern-matching at scale. And those tools are impressive. But there’s something different about a solution that comes from understanding rather than training. Something that deserves recognition.

Neubäcker didn’t ask: “What does a neural network think this waveform contains?”

He asked: “What does a stone sound like?” — and spent a decade building an answer from first principles.

From 2018 onwards, machine learning entered the field of audio source separation. But here’s the thing: the AI approach has almost nothing in common with what Neubäcker built.

They are two completely separate paths to a related destination.

Melodyne DNA works by understanding music. It analyses the physics of harmonics — the overtone series, the way notes combine, how harmonic signatures interleave. It identifies individual notes within polyphonic audio using hand-built algorithms rooted in acoustics and mathematics. It’s rule-based. It knows why sounds combine the way they do. It operates at the note level — individual notes within a chord.
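
For a flavour of what “rule-based” means here, the textbook version of the idea is harmonic-template matching: score each candidate fundamental by how many observed spectral peaks sit near its integer multiples. The toy sketch below is a generic illustration of that idea, emphatically not Celemony’s algorithm; the peak list and tolerance are invented:

    # Toy harmonic-template scoring: which candidate fundamentals best explain
    # the spectral peaks measured from a chord? (Illustrative only.)
    def harmonic_score(f0, peaks_hz, n_harmonics=8, tolerance=0.005):
        hits = 0
        for k in range(1, n_harmonics + 1):
            target = f0 * k
            if any(abs(p - target) / target < tolerance for p in peaks_hz):
                hits += 1
        return hits

    # Hypothetical peaks from a C major triad (C3, E3, G3):
    peaks = [130.8, 164.8, 196.0, 261.6, 329.6, 392.0, 523.3, 659.3, 784.0]
    for f0 in (130.8, 164.8, 196.0, 220.0):   # the last candidate (A3) is absent
        print(f"{f0:6.1f} Hz -> {harmonic_score(f0, peaks)}/8 partials matched")

The chord tones score higher than the absent candidate, but notice that the 392 Hz peak is claimed by both C3 and G3. Deciding how to split that shared energy between notes is where the real difficulty lives, and it is the part no toy sketch captures.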

AI stem separation works by pattern recognition. Neural networks — originally built for image segmentation and medical imaging — are trained on massive datasets of isolated stems paired with their mixed versions. The AI learns statistical patterns: what vocals “look like” in a spectrogram versus drums versus bass. It doesn’t understand harmonics. It doesn’t know what a note is. It recognises patterns it’s seen before. It operates at the instrument level — vocals versus drums versus bass versus other.
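
The mechanical core most neural stem separators share is spectrogram masking: the network predicts, for every time-frequency bin, how much of the energy belongs to the target stem, and the masked spectrogram is turned back into audio. The sketch below shows only that plumbing; the “model” is a placeholder function standing in for a trained network, not anything these products actually ship:

    import numpy as np
    from scipy.signal import stft, istft

    def placeholder_vocal_mask(magnitude):
        # Stand-in for a trained U-Net style network, which would output a
        # value between 0 and 1 for every time-frequency bin.
        return np.clip(magnitude / (magnitude.max() + 1e-9), 0.0, 1.0)

    def separate(mix, sample_rate=44100):
        _, _, Z = stft(mix, fs=sample_rate, nperseg=2048)
        mask = placeholder_vocal_mask(np.abs(Z))    # "what do vocals look like?"
        _, vocals = istft(Z * mask, fs=sample_rate, nperseg=2048)
        _, backing = istft(Z * (1.0 - mask), fs=sample_rate, nperseg=2048)
        return vocals, backing

    # Smoke test on two seconds of noise standing in for a mix:
    mix = np.random.default_rng(0).standard_normal(2 * 44100)
    vocals, backing = separate(mix)
    print(vocals.shape, backing.shape)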

The AI tools didn’t build on Neubäcker’s algorithms. They didn’t even come from the same branch of computer science. The neural network architectures — U-Net, convolutional networks, transformers — were borrowed from fields like image processing and biomedical research. Deezer’s Spleeter, released as open source in 2019, used pre-trained deep neural networks. LALAL.AI developed successive generations of neural networks (Rocknet, then Cassiopeia, then Phoenix), each trained on increasingly vast datasets. The models got better not through deeper understanding of sound but through more data and more sophisticated statistical processing.
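
Spleeter in particular dropped the barrier to entry to almost nothing. Its documented Python usage for the standard four-stem split is roughly the following (the file paths are placeholders, and details vary between versions):

    from spleeter.separator import Separator

    # Load the pretrained four-stem model (vocals / drums / bass / other)
    # and write one audio file per stem into the output directory.
    separator = Separator('spleeter:4stems')
    separator.separate_to_file('mix.wav', 'stems_out/')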

The quality kept improving. Four-stem separation (vocals, drums, bass, other) became standard. Then six stems. Then more granular: piano, strings, guitar, synthesiser. Companies like iZotope, AudioSourceRe, and Apple all shipped their own implementations.

By 2025, stem separation had gone from impossible to ubiquitous.

But here’s the thing: the AI tools didn’t prove it was possible. Neubäcker did. He un-baked the cake first. He demonstrated — using nothing but his understanding of sound — that a mixed signal could be decomposed into its musical components. The AI world arrived at its own, completely different way of doing it. But the door was already open.

Two parallel paths. One from philosophy and harmonics. One from statistics and data. Same impossible destination.

In November 2025, Ableton released Live 12.3. And included in the update was the feature that changed how I think about music.

Built-in stem separation. Right there in the DAW.

Right-click any audio clip. Select “Separate Stems to New Audio Tracks.” Wait a few seconds. And suddenly your single stereo file is four separate tracks: Vocals, Drums, Bass, Other. Each on its own channel. Ready to edit, effect, rearrange, or delete.

Ableton’s implementation is powered by technology from Music AI (the team behind Moises). It uses machine learning — the AI path, not the Neubäcker path. It runs locally on your machine, no internet connection required, and the results are impressive. Not perfect. You occasionally get bleed between stems, or a hi-hat that ends up in “Other.” But for practical creative work, it’s extraordinary.

Two clicks. A few seconds. And you have stems.

This isn’t just a convenience feature. It’s a fundamental shift in what’s possible for anyone with a laptop and a DAW.

Before stem separation, remixing required access. You needed the original multitracks, which meant you needed to know people, or have a label deal, or get lucky. The raw materials were locked away. If all you had was the final mix — the MP3, the WAV, the YouTube rip — you could listen to it but you couldn’t work with it.

Now you can.

Any song. Any recording. Any audio file. You can pull it apart into its components and rebuild something new. Not with perfect fidelity — this isn’t magic — but with enough quality that the results are usable, musical, and good.

Every finished track is now also a library of raw materials. Every mix is a starting point. Every song you’ve ever made — even the ones buried on old hard drives, exported years ago, project files long since lost — is now a source of stems you can work with again.

That realisation is unbelievably freeing.

  • 1997 – Peter Neubäcker begins research. “What does a stone sound like?”

  • 2001 – Melodyne ships. Monophonic pitch editing. Revolutionary but limited.

  • 2008 – DNA demonstrated at Musikmesse. Polyphonic note editing. Supposed to be impossible.

  • 2009 – DNA ships as a product.

  • 2012 – Technical Grammy for Celemony.

  • 2018/19 – Machine learning enters the field. Spleeter, RX, RipX. A completely separate approach.

  • 2025 – Stem separation becomes a standard DAW feature. Two clicks.

Most of the current conversation about audio technology focuses on AI — and rightly so. The machine learning tools are what made stem separation fast enough and good enough to be practical for everyday use. They’re what put it inside Ableton, inside Logic, inside the browser. They’re what made two-click stem splitting a reality.

But the AI tools didn’t build on Neubäcker’s work. They came from a completely different lineage — image processing, biomedical research, statistical pattern recognition. They solve a related problem by a fundamentally different method.

What Neubäcker did was something else entirely. He proved — using nothing but his understanding of music, mathematics, and the physics of sound — that the impossible was possible. That a baked cake could be un-baked. That a mixed signal could be decomposed into its musical components. He did it with old-fashioned software: hand-built algorithms rooted in philosophy and acoustics, not trained on data.

That’s not a footnote in the AI story. That’s a separate story. And it’s remarkable on its own terms.

A philosopher in Munich asked what a stone sounds like, spent a decade building an answer from first principles, and opened a door that the entire world then walked through — twice, by two completely different routes.

The stone question led to everything.

— Mcauldronism


