
Autocorrelation-Based Features for Speech Representation

This study investigates autocorrelation-based features as a potential basis for phonetic and syllabic distinctions. These features have emerged from a theory of auditory signal processing that was originally developed for architectural acoustics. Correlation-based auditory features extracted from monaural autocorrelation and binaural cross-correlation functions are used to predict perceptual attributes important for the design of concert halls: pitch, timbre, loudness, duration, reverberation-related coloration, sound direction, apparent source width, and envelopment [1, 2, 3, 4]. The current study investigates the use of features of monaural autocorrelation functions (ACFs) for representing phonetic elements (vowels), syllables (consonant-vowel (CV) pairs), and phrases using a small set of temporal factors extracted from the short-term running ACF. These factors include listening level (loudness), zero-lag ACF peak width (spectral tilt), τ1 (voice pitch period), φ1 (voice pitch strength), τe (effective duration of the ACF envelope, temporal repetitive continuity/contrast), segment duration, and Δφ1/Δt (the rate of pitch-strength change, related to voice pitch attack-decay dynamics). Times at which ACF effective duration τe is minimal reflect rapid signal pattern changes that usefully demarcate segmental boundaries. Results suggest that vowels, CV syllables, and phrases can be partially distinguished on the basis of this ACF-derived feature set, whose neural correlates lie in population-wide distributions of all-order interspike intervals in early auditory stations.
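By way of illustration, the Python sketch below shows one way such temporal factors could be extracted from a single windowed speech frame. The function name, parameter defaults, and pitch-range bounds are our assumptions, not the paper's procedure, and the τe estimate here uses a simplified straight-line fit to the log-magnitude ACF rather than the regression on envelope peaks conventional in the architectural-acoustics literature.

```python
import numpy as np

def acf_factors(frame, fs, fmin=60.0, fmax=400.0):
    """Extract ACF temporal factors from one windowed speech frame.

    Returns (Phi0, tau1, phi1, tau_e):
      Phi0  - zero-lag ACF value (frame energy, a listening-level correlate)
      tau1  - lag of the highest ACF peak in the pitch range (pitch period, s)
      phi1  - normalized ACF height at tau1 (pitch strength, 0..1)
      tau_e - effective duration: lag at which the fitted log-ACF decay
              reaches -10 dB, i.e. 10% of the zero-lag value
    The frame should be at least ~2/fmin seconds long.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    n = len(frame)

    # Linear (zero-padded) autocorrelation via FFT, normalized so acf[0] == 1.
    power = np.abs(np.fft.rfft(frame, 2 * n)) ** 2
    acf = np.fft.irfft(power)[:n]
    Phi0 = float(acf[0])
    acf = acf / (Phi0 + 1e-12)

    # tau1, phi1: strongest normalized ACF peak within the plausible
    # voice-pitch lag range [1/fmax, 1/fmin].
    lo, hi = int(fs / fmax), min(int(fs / fmin), n - 1)
    k = lo + int(np.argmax(acf[lo:hi]))
    tau1, phi1 = k / fs, float(acf[k])

    # tau_e: fit a line to the log-magnitude ACF over the early lags and
    # find where it crosses -10 dB (a simplification of the usual fit to
    # the envelope peaks of the log ACF).
    lags = np.arange(1, hi) / fs
    db = 10.0 * np.log10(np.abs(acf[1:hi]) + 1e-12)
    slope, intercept = np.polyfit(lags, db, 1)
    tau_e = (-10.0 - intercept) / slope if slope < 0 else float(lags[-1])

    return Phi0, tau1, phi1, tau_e
```

Applied frame by frame (for instance, 30 ms windows with a 10 ms hop), this would yield the running ACF factors; frame-to-frame differences of phi1 then approximate Δφ1/Δt, and local minima of tau_e across frames would mark the candidate segment boundaries described above.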

Document Type: Research Article

Publication date: 01 January 2015
