
What/when causal expectation modelling applied to audio signals

A causal system that represents a stream of music as a sequence of musical events, and generates further expected events, is presented. Starting from an auditory front-end that extracts low-level features (e.g. MFCCs) and mid-level features such as onsets and beats, an unsupervised clustering process builds and maintains a set of symbols that represent events in the musical stream by both their timbre and their timing. Timing is represented by inter-onset intervals expressed relative to the beat. These symbols are then processed by an expectation module based on Prediction by Partial Matching, a multiscale technique built on N-grams. To characterise the ability of the system to generate expectations that match both the ground truth and the system's own transcription, we introduce several measures that take into account the uncertainty associated with the unsupervised encoding of the musical sequence. The system is evaluated on a subset of the ENST-drums database of annotated drum recordings. We compare three approaches to combining timing (when) and timbre (what) expectations. Our experiments show that the induced representation is useful for generating expectation patterns in a causal way.
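As an illustration of the pipeline the abstract describes, the sketch below shows, under loose assumptions, how onset events might be encoded causally as (what, when) symbols and fed to an n-gram predictor with back-off. The class names, the clustering rule, the metrical grid and the back-off scheme are assumptions made for this example and are not taken from the paper; in particular, the predictor is a simplified stand-in for Prediction by Partial Matching, which would additionally maintain escape probabilities across context orders.

```python
# Minimal, hypothetical sketch of a what/when expectation pipeline:
# online symbolisation of (timbre, timing) events plus an n-gram
# predictor with back-off.  Not the authors' implementation.

from collections import defaultdict
import math


class OnlineSymbolizer:
    """Map continuous timbre vectors to discrete 'what' symbols by keeping
    a growing set of centroids (a crude stand-in for the unsupervised
    clustering stage described in the abstract)."""

    def __init__(self, radius=1.0):
        self.radius = radius
        self.centroids = []  # one centroid per symbol

    def encode(self, feature_vec):
        # Assign to the nearest existing centroid, or create a new symbol.
        best, best_dist = None, float("inf")
        for idx, c in enumerate(self.centroids):
            d = math.dist(c, feature_vec)
            if d < best_dist:
                best, best_dist = idx, d
        if best is not None and best_dist <= self.radius:
            return best
        self.centroids.append(list(feature_vec))
        return len(self.centroids) - 1


def quantize_ioi(ioi_seconds, beat_period, grid=4):
    """Express an inter-onset interval as a fraction of the beat and snap it
    to a metrical grid (the 'when' symbol)."""
    return round(grid * ioi_seconds / beat_period) / grid


class NGramPredictor:
    """Longest-context n-gram predictor with simple back-off (a simplified
    stand-in for Prediction by Partial Matching)."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, symbol):
        # Record the observed symbol under every available context length.
        for order in range(1, min(self.max_order, len(history)) + 1):
            context = tuple(history[-order:])
            self.counts[context][symbol] += 1

    def predict(self, history):
        # Back off from the longest context that has been observed.
        for order in range(self.max_order, 0, -1):
            context = tuple(history[-order:])
            if context in self.counts:
                table = self.counts[context]
                return max(table, key=table.get)
        return None


if __name__ == "__main__":
    symbolizer = OnlineSymbolizer(radius=0.5)
    predictor = NGramPredictor(max_order=2)
    history = []

    # Hypothetical causal stream: (timbre feature vector, IOI in s, beat period in s).
    stream = [([0.1, 0.2], 0.50, 0.5), ([0.9, 0.8], 0.25, 0.5),
              ([0.1, 0.2], 0.50, 0.5), ([0.9, 0.8], 0.25, 0.5)]

    for feat, ioi, beat in stream:
        event = (symbolizer.encode(feat), quantize_ioi(ioi, beat))
        if history:
            print("expected:", predictor.predict(history), "observed:", event)
            predictor.update(history, event)
        history.append(event)
```

Run as a script on the toy stream above, the expectation matches the observed event once the two-event pattern has been seen a single time.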

Keywords: audio; expectation; music; unsupervised learning

Document Type: Research Article

Affiliations: Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain

Publication date: 01 June 2009
