Neural Network Ensembles and Representing Collaborative Interaction

I recently had the chance to present a paper about my “Neural iPad Ensemble” at the Audio Mostly conference in London. The paper discusses how deep learning can help model and create free-improvised music on new interfaces, where the normal rules of music theory may not apply. I described the Recurrent Neural Network (RNN) design that I used to produce an AI touchscreen ensemble that responds to a “lead” human performer. In the demonstration session, I set up the iPads and RNN and had lots of fun jamming with the conference attendees.

Playing with the Neural iPad Ensemble at Audio Mostly

Many of those at the conference were very curious about making music with AI systems and the practical implications of using deep learning in concert. Some had appealing, enigmatic, and sometimes confused ideas about musical AI. I was often asked whether I could recognise the personalities of the performers in the output. Unfortunately, this isn’t possible in my RNN because, as in all machine learning systems, the output reflects only the training data, not the context that we humans see around it. The data just doesn’t record enough about individuals for the network to reproduce their performances!

A limitation of my system is that it learns musical interactions from sequences of high-level gestures. These measurements are made once every second on the raw touchscreen data and summarise it as, for example, “fast taps” or “small swirls”. In the performance system, a synthesiser replays chunks of performances that correspond to the desired gestures. This simplification makes it easier to design the neural network, but means that the RNN isn’t trained on the low-level data that we might consider to contain the nuanced “personality” of a performance.
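To make this concrete, here’s a rough Python sketch of the kind of once-per-second summarisation I’m describing. The gesture labels, features, and thresholds below are invented for illustration rather than taken from the real system, but the principle of collapsing a second of raw touches into a single gesture label is the same.

```python
# Illustrative sketch: reduce one second of raw touch events to a gesture label.
# The labels and thresholds here are hypothetical, not the real classifier.
import numpy as np

GESTURES = ["nothing", "small_taps", "fast_taps", "small_swirls", "big_swirls"]

def summarise_second(touches: np.ndarray) -> str:
    """touches: array of (t, x, y) events recorded within a one-second window."""
    if len(touches) == 0:
        return "nothing"
    rate = len(touches)  # how many touch events arrived this second
    # Average distance moved between consecutive events (zero if only one event).
    if len(touches) > 1:
        moves = np.linalg.norm(np.diff(touches[:, 1:3], axis=0), axis=1)
        mean_move = moves.mean()
    else:
        mean_move = 0.0
    if mean_move < 0.01:  # barely any movement: treat as taps
        return "fast_taps" if rate > 10 else "small_taps"
    return "big_swirls" if mean_move > 0.1 else "small_swirls"
```

Whatever the exact features, everything below this one-label-per-second resolution is thrown away before the RNN ever sees it.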

Another point of confusion is that the human performer isn’t really the “leader”. In the interest of making the most of a limited data set, every performer in every example is permuted into the “leader” role and into each of the three “ensemble” roles. In fact, few (if any) performances in the corpus had a leader at all. In my system, the human performer is really just one of four equal performers, so it’s unreasonable to expect that you could “control” the RNN performers by performing in a certain way, or doing something unexpected.
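Here’s a sketch of that augmentation step. I’m assuming each recorded performance is stored as a mapping from performer name to gesture sequence; the data structure is illustrative, not the real corpus format.

```python
# Sketch of the augmentation idea: every quartet recording yields one training
# example per ordering of performers, so no single performer is special.
from itertools import permutations

def leader_permutations(performance: dict) -> list:
    """performance: {performer_name: gesture_sequence} for one quartet recording."""
    players = list(performance.keys())
    examples = []
    for ordering in permutations(players):
        leader, *ensemble = ordering
        examples.append({
            "lead_input": performance[leader],                      # conditioning sequence
            "ensemble_target": [performance[p] for p in ensemble],  # sequences the RNN learns to produce
        })
    return examples
```

Because every player takes a turn in every role, the trained network has no notion of a privileged human part to follow.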

A practical issue when performing with the Neural iPad Ensemble is starting and stopping the music! In a human ensemble, the performers use cues like counting in, an audible breath, or a look, to bring in the ensemble and to signal the end of an improvisation. With my RNN ensemble, cues have no effect: sometimes the group starts playing, sometimes it doesn’t. Just when you think the performance might be over, the group starts up again without you! Thinking about the training data, this behaviour makes sense. Each training example is 120 seconds long — too short to capture the long-term curve of a performance, including starting and ending. The examples do contain plenty of instances where one performer plays alone, or three play while one lays out, so there’s precedent in the data for completely ignoring the “leader’s” entrances!
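A quick sketch of why the fixed window hides beginnings and endings: if a corpus of per-second gestures is sliced into 120-step examples (the hop size below is an illustrative choice), most windows land in the middle of a performance and never contain an explicit start or stop.

```python
# Sketch: slicing a performance into fixed-length training windows.
# Only the first and last windows can contain the actual start or end.
import numpy as np

def slice_examples(gesture_seq: np.ndarray, window: int = 120, hop: int = 30) -> list:
    """gesture_seq: (T, players) array of per-second gesture codes for one performance."""
    return [gesture_seq[i:i + window]
            for i in range(0, len(gesture_seq) - window + 1, hop)]
```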

A short performance with the Neural iPad Ensemble

As with many AI systems, these limitations reveal that humans are so good at combining different data sources and contexts that we forget we’re doing it. A truly “natural” iPad ensemble might have to be trained on much more than just high-level gestures in order to keep up with “obvious” musical cues and reproduce musical personalities.

While this system has limitations, it is still fun to play with and useful as a reflection on creative AI. Of course, there are many ways to improve it; one of the most important would be to train the RNN on the lowest-level data available, in this case the raw touch event data from the iPad screens. A promising way to approach this is with Mixture Density RNNs (more on that later).
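For a taste of what that could look like, here’s a minimal sketch of a Mixture Density RNN head over continuous touch data. It assumes TensorFlow/Keras and a 2D (x, y) touch-position target; the component count, hidden size, and architecture are illustrative choices, not the design I’ll actually use.

```python
# Minimal sketch of a Mixture Density RNN: an LSTM whose output parameterises a
# Gaussian mixture over the next (x, y) touch position. Hyperparameters are illustrative.
import numpy as np
import tensorflow as tf

K = 5        # number of Gaussian mixture components
DIM = 2      # predict an (x, y) touch position
HIDDEN = 64

inputs = tf.keras.Input(shape=(None, DIM))            # sequence of past touch positions
h = tf.keras.layers.LSTM(HIDDEN)(inputs)
pi = tf.keras.layers.Dense(K, activation="softmax")(h)                 # mixture weights
mu = tf.keras.layers.Dense(K * DIM)(h)                                 # component means
sigma = tf.keras.layers.Dense(K * DIM, activation="exponential")(h)    # positive std devs
outputs = tf.keras.layers.Concatenate()([pi, mu, sigma])
model = tf.keras.Model(inputs, outputs)

def mdn_loss(y_true, y_pred):
    """Negative log-likelihood of y_true under the predicted Gaussian mixture."""
    mix = y_pred[:, :K]
    means = tf.reshape(y_pred[:, K:K + K * DIM], (-1, K, DIM))
    scales = tf.reshape(y_pred[:, K + K * DIM:], (-1, K, DIM))
    y = tf.expand_dims(y_true, 1)                                      # (batch, 1, DIM)
    # Log-density of each diagonal-covariance Gaussian component.
    log_norm = -0.5 * tf.reduce_sum(
        tf.square((y - means) / scales) + 2.0 * tf.math.log(scales) + np.log(2.0 * np.pi),
        axis=2)
    log_mix = tf.reduce_logsumexp(tf.math.log(mix + 1e-8) + log_norm, axis=1)
    return -tf.reduce_mean(log_mix)

model.compile(optimizer="adam", loss=mdn_loss)
```

The key idea is that instead of predicting a single next point, the network outputs the parameters of a probability distribution, and at performance time you sample from that distribution to get varied but plausible touch movements.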

I’m looking forward to more chances to perform with the Neural iPad Ensemble soon, and to find new ways to use machine learning in creative performances. If you happen to be in Oslo, come check out my iPad Ensemble at the Elvelangs i Fakkelys festival, or chat with me about musical AI at Musikkteknologidagene!