Comparing classical music interpretations

I built an audio player that makes it easy to compare multiple interpretations of the same piece. Here’s an interactive demo and a video to give you a sense of how it works:


What does it mean to interpret classical music?

At first glance, sheet music seems prescriptive: the composer has provided all of the notes, the dynamics (forte, piano), the tempo (lento, presto) and changes in tempo (accelerando, ritardando).

In practice, however, the interpreter has a lot of leeway. In some extreme cases, such as the cadenza in a solo concerto, the performer gets to improvise a melody over a chord progression. Some pieces include ornamentation (e.g., trills), which is largely left up to the performer to interpret.

That said, cadenzas and ornaments are somewhat rare. More generally, every piece is under-specified by the composer, which gives the performer room to express themselves through their choice of tempo, phrasing, articulation and tone.

Example: Bach’s Goldberg Variations

The Goldberg Variations were composed by Johann Sebastian Bach in 1741 and later popularized by Glenn Gould with his 1955 debut album, which turned a work once considered esoteric into one of the most iconic piano recordings.

In 1981, a year before his death, Gould recorded the pieces again. After many years away from the concert stage, he was able to revisit the variations and produce a completely different take. In an interview, he said:

…since I stopped playing concerts, about 20 years, having not played it in all that time, maybe I wasn’t savaged by any over-exposure to it…

Compare Gould’s 1955 and 1981 recordings

Both the 1955 and 1981 recordings are available on YouTube, of course. But I found that listening to two separate recordings is not the same as comparing them in one integrated player. So I built one: a player specifically for comparing multiple interpretations of the same piece.

Here is a demo that lets you compare interpretations of the first variation of the Goldberg Variations. Try it out here. You can use your keyboard to switch between interpretations (↑, ↓) just as easily as you can seek within a track (←, →). The mouse works as well. Note that I haven’t tested it on mobile at all. Sorry, it’s just a prototype and I’m on paternity leave 😇

I also tried it on Mozart’s Requiem

I am a huge fan of Mozart’s Requiem, and once came across an online thread debating which conductor’s performance was the best. I soon found myself listening to a dozen or so different versions of the same piece. When I was a younger music appreciator, I would often wonder what the point of a conductor really was. I no longer have this question.

Just to give you a taste of how different the interpretations are, here’s an example of three conductors’ performances of the Introitus, the first movement of the Requiem. Check it out here, but be patient, as it may take a minute to load and decode the audio. Böhm’s brooding tempo and lumbering chorus (ugh) contrast especially well with Levin’s crisp and minimalist take.


Technical details

For this prototype, I focused on creating a reasonable UI to play back and interact with multiple time-aligned performances of the same piece. An index file specifies metadata for each track, most importantly the URL to the label file and the URL to the audio file. Each label file is a text file with lines in the format START_TIME END_TIME BAR_NUMBER.
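As a rough illustration of the label format, here is a minimal Python parser. It is a sketch, not the prototype’s code: it assumes times are in seconds and that fields are whitespace-separated (Audacity’s label export writes tab-separated start time, end time and label text, so this assumes the label text is the bar number). The `BarLabel` and `parse_labels` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BarLabel:
    start: float  # seconds (assumed unit)
    end: float    # seconds (assumed unit)
    bar: int      # bar number in the score


def parse_labels(text: str) -> List[BarLabel]:
    """Parse lines of the form START_TIME END_TIME BAR_NUMBER.

    Blank lines are skipped; any other line must have exactly three
    whitespace-separated fields. Results are sorted by start time.
    """
    labels: List[BarLabel] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        start, end, bar = float(fields[0]), float(fields[1]), int(fields[2])
        if end < start:
            raise ValueError(f"line {lineno}: end time precedes start time")
        labels.append(BarLabel(start, end, bar))
    return sorted(labels, key=lambda lab: lab.start)
```

Sorting by start time keeps later lookups simple even if the labels were exported out of order.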

To create the label files, I manually annotated the waveform. Even with Audacity’s extremely useful label track feature, it was a lot of manual work to go through the score and find each bar’s time range in each recording. In the end, I had start and end times for each bar. For times that don’t fall exactly on bar lines, I linearly interpolate between the bar boundaries. This works reasonably well but is sometimes a bit off, since a performer’s tempo can vary within a bar. Finer-grained timing references (for example, per beat) would address this better, but for now that means even more manual labor. No thanks!
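One way the bar-boundary interpolation could work when switching tracks is sketched below: find the bar containing the current time, compute how far through that bar playback is, and jump to the same fraction of the same bar in the other recording. This is an illustrative guess, not the prototype’s code; labels are hypothetical `(start, end, bar)` tuples in seconds, and it assumes each bar number appears only once per recording (a performance with repeats would need an occurrence index as well).

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (start_seconds, end_seconds, bar_number), sorted by start time.
Label = Tuple[float, float, int]


def time_to_position(labels: List[Label], t: float) -> Optional[Tuple[int, float]]:
    """Map a time in one recording to (bar number, fraction through that bar)."""
    starts = [start for start, _, _ in labels]
    i = bisect_right(starts, t) - 1
    if i < 0:
        return None  # before the first labelled bar
    start, end, bar = labels[i]
    if t > end:
        return None  # in a gap between labels, or past the last bar
    fraction = (t - start) / (end - start) if end > start else 0.0
    return bar, fraction


def position_to_time(labels: List[Label], bar: int, fraction: float) -> Optional[float]:
    """Map (bar number, fraction) to a time by linear interpolation within the bar."""
    for start, end, b in labels:
        if b == bar:  # first occurrence only; see the note on repeats
            return start + fraction * (end - start)
    return None  # this recording has no such bar


def switch_time(src: List[Label], dst: List[Label], t: float) -> Optional[float]:
    """Find the time in dst that corresponds to time t in src."""
    pos = time_to_position(src, t)
    return None if pos is None else position_to_time(dst, *pos)
```

With made-up labels `a = [(0.0, 2.0, 1), (2.0, 4.0, 2)]` and `b = [(0.0, 3.0, 1), (3.0, 6.0, 2)]`, `switch_time(a, b, 3.0)` returns `4.5`: halfway through bar 2 in one recording maps to halfway through bar 2 in the other. The error described above comes from exactly this assumption that tempo is constant inside a bar.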

Science, help me automate this, please

An obvious question is how to automate the labor of synchronizing a recording to a score. In general, I think this is an unsolved problem, especially for complex tracks containing hundreds of instruments and varying levels of background noise.

A promising approach for solo piano music might be to use something like Onsets and Frames to extract piano rolls and then apply something like Dynamic Time Warping (DTW) in piano-roll space. A more general approach might be to synthesize each bar into raw audio (from MIDI) and then align recordings to the synthesized audio using something like DTW on a Constant-Q transform (CQT).
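To make the DTW idea a bit more concrete, here is a generic textbook DTW in plain Python. It is a sketch under stated assumptions, not a working alignment pipeline: it assumes both inputs are already sequences of equal-length feature vectors (for example CQT columns, or piano-roll columns from a transcription model; feature extraction is not shown), and it uses Euclidean distance between frames, which is only one of several reasonable choices.

```python
import math
from typing import List, Sequence, Tuple

# One feature frame, e.g. a CQT column or a piano-roll column (assumed input).
Frame = Sequence[float]


def dtw_align(x: Sequence[Frame], y: Sequence[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic dynamic time warping with Euclidean frame distance.

    Returns the total alignment cost and the warping path as (index_in_x,
    index_in_y) pairs from (0, 0) to (len(x) - 1, len(y) - 1).
    Time and memory are O(len(x) * len(y)).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = cheapest cost of aligning x[:i] with y[:j]; row/column 0 are padding.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # both sequences advance
                cost[i - 1][j],      # x advances while y holds
                cost[i][j - 1],      # y advances while x holds
            )
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda c: cost[c[0]][c[1]])
    path.reverse()
    return cost[n][m], path
```

In the synthesis approach, `y` would be frames rendered from the score, where each frame’s bar position is known, so the path maps every bar boundary in `y` to a frame (and hence a time) in the recording `x`, which is essentially what a label file contains. For full movements, an optimized implementation such as `librosa.sequence.dtw` would be more practical than this quadratic pure-Python loop.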

My brief and admittedly misguided attempts to do something like this on real-world recordings didn’t yield good enough results. Any ML/DSP experts want to take this on?


This is a post by Boris Smus, originally from Boris’ website, posted to XRDS with permission of the author.

Information superwhichway revisited: XRDS, 24 years ago

Back in September 1994, the ACM took a bold step into the mostly unknown and started its first digital-only publication, Crossroads: The ACM Student Magazine. It has changed over the years, including the transition to the dual-format (digital plus print) magazine it is today, a format that now seems to be the norm. I found it very interesting (and fun!) to look at our first issue and peek into the future that was being forecast for us almost a quarter of a century ago.

Very aptly, this first issue’s main topic is the Internet. Quite a bold step back then! The Internet had existed in some form since the late 1960s, and in a form very similar to what we use now (TCP/IP-based networking) since 1983, but its use was mostly restricted to academia and to military research and communications. Although Crossroads was aimed at students in Computer Science-related disciplines, most of them knew little about this network beyond what their tutors specifically required.

Crossroads’ first editor, Saveen Reddy, mentions in his editorial: “The theme of this issue is the Internet and computer networking. These represent relatively recent inventions. However, the general public’s knowledge and appreciation for them is even more recent, spurred on by a deluge of coverage by popular media. Unconfined to military or research purposes, the Internet has grown rapidly. Currently experiencing rapid growth for commercial uses, it is becoming a global resource”.

Commercial use of the Internet had only been allowed in 1993, and its growth was truly explosive. While most current XRDS readers won’t remember the state of computing in 1994, I have the relative luck of having come late to formal studies: having been a computer enthusiast as a teenager in the early nineties, I can still remember a world before the Internet.

In its early days, the media would usually refer to the Internet as the Information Superhighway, and we would laugh at the moniker. So, of course, did Purdue student Craig Pfeifer when he wrote his article “Information Superwhichway?”. If you look at the specific technologies it mentions, the article is indeed old and dated: USENET newsgroups? Apple Newton? FTP and Gopher? Fax machines? MUDs (Multi-User Dungeons)? Telnet? Please!

But a slightly deeper reading shows how, in a way, we have come full circle in how humans communicate. It would be foolish of me to argue about whether the Internet has changed the way we perceive the world; still, Pfeifer’s analysis can be read almost completely detached from the circumstances of its time.

Other defining items in communications history

Every technology that has substantially improved humans’ ability to communicate has been attacked by the holders of central power. Gutenberg’s movable-type printing press was a true revolution in the spread of culture, but it was met with attempts to control and censor its products through royally granted printing licenses (which evolved into what we now know as copyright), as well as the ever-present church censorship. Nevertheless, given its social effects, the printing press is often regarded as the most important invention in history.

Mimeographs were invented in the late 19th century. They offered no qualitative improvement over the many printing processes available by then, but they democratized printing: mimeographs are portable and cheap, so schools, churches and clubs started printing their own leaflets. Of course, this also meant they could completely escape compulsory censorship regimes. In fact, several revolutions in the early 20th century were strongly fueled by clandestine mimeographers, and trying to stop them became a routine (and routinely failed) task for the ruling regimes.

In the eighties and nineties, the very peculiar BBS culture grew among computer enthusiasts around the world. BBSs (Bulletin Board Systems) were mainly hobbyist-run computers with a modem, usually offering some discussion forums, online games (turn-based, of course, as they had no network connection in the sense we understand it today) and some file sharing. BBSs were the breeding ground for the early free software and shareware distribution models.

Communication was fully decentralized (most mid-sized cities had dozens to hundreds of BBSs), near-instantaneous and virtually impossible to control. And, of course, as you can see in a particularly relevant editorial in the April 1993 issue of Boardwatch Magazine, the censorship machinery throughout the United States was ready and well oiled. What were the arguments? Alleged distribution of hacking tools and information, software piracy and pornography. Thanks to the cohesion of the BBS community and the noise it generated, most of the accused operators were freed after long legal processes, with no charges filed.

The Internet, then and now

Just 18 months after the Boardwatch editorial, Pfeifer’s article in Crossroads talks about the image problems the early commercially available Internet had: “When the Internet is the focus of a story, it’s usually negative. Whether it is how child pornography runs rampant on the Information Superhighway or how easy it is to receive pirated software, it seems that the media doesn’t focus on the positive events that take place daily on the Internet”.

Pfeifer continues, “The Internet never sleeps. It’s kind of like New York, but a little bit cleaner, and the high crime rate isn’t so obvious. Of course, with the influx of new users onto the eighth wonder of the world, there is bound to be some friction. Computer crime will probably increase. The Internet (…) is a system based on trust. But when fiendishly minded people see the Internet as an untapped resource, ripe for the plucking, we have a problem.”

These last paragraphs could apply perfectly well today, only not to the Internet as a whole (it is too ingrained in our social consciousness and lifestyle). But this is precisely the kind of attack we see against privacy-enhancing technologies that try to protect users’ privacy and anonymity on the Internet: tools akin to those we discussed in the XRDS Summer 2018 issue, for which I was honored to serve as lead editor.

And yes, what is the media narrative today when tools such as Tor are discussed? “Oh, but that’s just a gateway to the dark net, and… You don’t want to go there! That’s bad and dangerous. There are criminals on the loose! There is child porn and drugs, and guns and whatnot!” Of course, this same narrative was applied to the Internet as a whole back in 1994. Or to BBSs slightly before that. Or, with bogeymen suited to the spirit of their day, to the agents of social change a hundred or more years ago.

Throughout history, communication technologies have appeared that allow easier, better circulation of knowledge: tools that bring the flow of information closer to the individual and further from the centers of power, and with that, greater resistance to surveillance and the ability to remain anonymous. Twenty-four years ago, our magazine began by looking at the great potential the Internet held for changing society, although nobody could really foresee the depth of its impact. My hope is that, over time, privacy-enhancing technologies will become as ingrained in the way we communicate as the Internet has.

Pfeifer concludes by quoting a then-new meme: “You never know to whom you are writing, because, on the Internet, nobody knows you’re a dog.” Somehow, though, and no matter how careful I am, all of the ads I have seen today are for dog food.