How does Music++ Work?

Music++ is composed of three tasks, or threads, that run simultaneously on the computer: Listen, Predict, and Play. Here is a video that illustrates how these tasks work together. Read on for further explanation.

Listen is the module responsible for hearing the live player. This task is made much easier because the program has the musical score for the piece. The Listen problem is thus one of performing a real-time alignment, or match, between the score and the incoming audio. In essence, the program provides a running commentary on the incoming audio, reporting the times at which the various solo notes begin. Since the program needs to hear some of a note before it can determine that the note has begun, the Listen module delivers its onset times with some latency. This detection latency is a part of human hearing as well. In the video clip, the green markers that appear at the top of the image illustrate the solo note onset times provided by Listen. If you look and listen carefully, you can observe this latency.
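To make the idea of detection latency concrete, here is a minimal sketch in Python. It is not the Music++ algorithm: it assumes the audio has already been reduced to one detected pitch per frame, and it uses a hypothetical rule that a note counts as "begun" only after its pitch has been heard for a few consecutive frames. The names follow_score, frame_ms, and confirm_frames are illustrative only.

```python
def follow_score(score_pitches, frame_pitches, frame_ms=10, confirm_frames=3):
    """Toy score follower (an illustrative assumption, not the Music++ method).

    score_pitches: pitches of the solo notes, in score order.
    frame_pitches: one detected pitch per audio frame (0 means silence).
    Returns a list of (note_index, onset_ms, reported_ms) tuples.
    A note is reported only after `confirm_frames` consecutive matching
    frames, so reported_ms is always later than onset_ms.
    Limitation: consecutive score notes with the same pitch are not
    handled correctly by this toy rule.
    """
    events = []
    next_note = 0   # index of the score note we are waiting for
    run = 0         # consecutive frames that match that note
    for i, pitch in enumerate(frame_pitches):
        if next_note >= len(score_pitches):
            break
        if pitch == score_pitches[next_note]:
            run += 1
            if run == confirm_frames:
                onset_ms = (i - confirm_frames + 1) * frame_ms
                reported_ms = (i + 1) * frame_ms  # end of the current frame
                events.append((next_note, onset_ms, reported_ms))
                next_note += 1
                run = 0
        else:
            run = 0
    return events


if __name__ == "__main__":
    frames = [0, 0, 60, 60, 60, 60, 62, 62, 62, 62]
    for note, onset, reported in follow_score([60, 62], frames):
        print(f"note {note}: began at {onset} ms, reported at {reported} ms")
```

In this sketch each onset is reported confirm_frames * frame_ms milliseconds after the note actually began, which is the kind of delay visible in the green markers in the video.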

I believe it is not possible to coordinate musical parts in a purely responsive manner. That is, a system that simply triggers its events on the detection of notes in the solo part will perform badly in most musical contexts. This is due, in part, to the inherent detection latency built into the problem, which makes a responsive system perpetually late. Instead, Music++ schedules the accompaniment's note times by continually predicting into the future based on what it has observed so far. During the initial rehearsal, the Predict module schedules future accompaniment notes through an evolving tempo estimate that adjusts to the solo player's time-varying tempo. Over time, however, the Predict module develops a model of the solo player's musical interpretation, phrased in terms of both tempo and individual note elongations or compressions. In this way Predict learns from the soloist in a manner analogous to what human musicians accomplish during rehearsal.
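The scheduling idea described above, an evolving tempo estimate used to predict future note times, can be sketched as follows. This is a simplified stand-in, not the model Music++ uses: it assumes score positions are measured in beats and updates the tempo by simple exponential smoothing. The class name TempoPredictor and the smoothing parameter are hypothetical.

```python
class TempoPredictor:
    """Toy tempo tracker (an assumption for illustration, not the Music++ model)."""

    def __init__(self, initial_sec_per_beat, smoothing=0.5):
        self.sec_per_beat = initial_sec_per_beat  # current tempo estimate
        self.smoothing = smoothing                # 0 = ignore soloist, 1 = follow fully
        self.last_beat = None
        self.last_time = None

    def observe(self, beat, time_sec):
        """Update the tempo estimate from a detected solo note onset."""
        if self.last_beat is not None and beat > self.last_beat:
            # Tempo implied by the two most recent solo notes.
            local = (time_sec - self.last_time) / (beat - self.last_beat)
            # Move the running estimate part of the way toward it.
            self.sec_per_beat += self.smoothing * (local - self.sec_per_beat)
        self.last_beat, self.last_time = beat, time_sec

    def predict(self, beat):
        """Predicted clock time at which score position `beat` should sound."""
        if self.last_beat is None:
            raise ValueError("no solo note observed yet")
        return self.last_time + (beat - self.last_beat) * self.sec_per_beat


if __name__ == "__main__":
    p = TempoPredictor(initial_sec_per_beat=0.5)
    p.observe(0, 0.0)
    p.observe(1, 0.5)
    print(p.predict(3))   # prediction for beat 3 before the soloist slows down
    p.observe(2, 1.25)    # the soloist arrives late on beat 2
    print(p.predict(3))   # the prediction for beat 3 moves later
```

Note that the prediction for the same future beat changes each time a new solo note is observed, so the schedule is always provisional.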

The series of predictions that come from Predict acts like a trail of breadcrumbs for the Play module to follow. However, you can see in the video that the breadcrumbs can move before they are reached. These prediction times drive the synthesis of the accompaniment audio through a well-established technique known as phase vocoding. This technique allows variable-rate playback of audio without introducing pitch change. Thus, the playback rate changes continually so that the pending accompaniment note is played at the time prescribed by Predict.
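As a rough illustration of how a playback rate could be chosen, the sketch below computes the rate needed for the pending accompaniment note to arrive at the predicted time. It does not implement the phase vocoder itself, and it is not taken from Music++: the function name playback_rate and the rate limits min_rate and max_rate are assumptions made for this example.

```python
def playback_rate(audio_pos_sec, next_note_audio_sec, now_sec, target_sec,
                  min_rate=0.25, max_rate=4.0):
    """Rate at which to play the accompaniment recording (1.0 = original speed).

    audio_pos_sec:       current position in the accompaniment recording.
    next_note_audio_sec: position of the pending note in that recording.
    now_sec:             current clock time.
    target_sec:          clock time at which the pending note should sound.
    The rate limits are hypothetical safeguards, not values from Music++.
    """
    remaining_audio = next_note_audio_sec - audio_pos_sec
    remaining_time = target_sec - now_sec
    if remaining_time <= 0:
        # The target time has already passed: catch up as fast as allowed.
        return max_rate
    rate = remaining_audio / remaining_time
    return min(max_rate, max(min_rate, rate))


if __name__ == "__main__":
    # One second of recorded audio remains before the pending note.
    print(playback_rate(10.0, 11.0, now_sec=20.0, target_sec=21.0))  # on schedule
    print(playback_rate(10.0, 11.0, now_sec=20.0, target_sec=22.0))  # must slow down
```

Recomputing this rate whenever a prediction changes is one way to picture the playback continually speeding up and slowing down to follow the moving breadcrumbs.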