
Two-Formant Models, Pitch, and Vowel Perception

It is well known from synthesis experiments that vowels may be approximated to a fair degree of phonetic quality by two formants only. The lower formant is then identical to F1 of the corresponding natural vowel, whilst the upper formant, here labelled F2′, is matched to F2 in back vowels and, in front vowels, to a location somewhere in the range of F2, F3 and F4. We have carried out such matching experiments for nine Swedish vowels. The highest value of F2′ is found for the vowel [i], in the F3-F4 range. Evidently F2′ need not coincide with any particular one of F2 and the higher formants, and should to a smaller or larger extent be related to the finite frequency resolution of the ear in this region.

As a reference set we chose 4-formant synthetic vowels. The F2′ values from the matching experiments have been compared with two other derivations of F2′. One is from an empirical formula which enables us to calculate F2′ given F1, F2, F3 and F4 of the reference. This formula relies on the auditory space-frequency representation and on considerations of predictable shifts of the spectral envelope with a shift in formant frequencies. The second approach is to pass the reference vowels through a multiple-output filter bank designed from the Békésy-Flanagan data. In each output channel, corresponding to a definite space coordinate, a zero-crossing frequency was calculated. The spatial distribution of zero-crossings was next converted into a quite novel signal parameter: the number of channels carrying the same zero-crossing frequency within a certain small quantal value. For each coordinate we thus determine how many adjacent taps carry the same characteristic frequency. Evidently a single dominating spectral peak will impose its mean frequency on the response at many taps. Accordingly there is a correspondence between spectral place prominence and the number of taps carrying the same zero-crossing frequency.
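The tap-counting parameter described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the sign-change estimate of zero-crossing frequency, and the contiguous-run reading of "adjacent taps" are all hypothetical choices, and the filter bank itself (designed from the Békésy-Flanagan data) is not modelled here.

```python
def zero_crossing_frequency(samples, sample_rate):
    """Estimate a channel's frequency in Hz from its sign changes.

    A periodic signal changes sign twice per cycle, so the frequency is
    taken as (number of sign changes) / (2 * duration).
    """
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(samples) - 1) / sample_rate
    return crossings / (2.0 * duration)


def same_frequency_run_lengths(channel_freqs, quantum_hz):
    """For each channel, count the contiguous neighbouring channels
    (itself included) whose zero-crossing frequency lies within
    quantum_hz of that channel's own frequency.

    quantum_hz plays the role of the "small quantal value" (assumed
    here to be an absolute tolerance in Hz).
    """
    n = len(channel_freqs)
    counts = []
    for i, f in enumerate(channel_freqs):
        lo = i
        while lo > 0 and abs(channel_freqs[lo - 1] - f) <= quantum_hz:
            lo -= 1
        hi = i
        while hi < n - 1 and abs(channel_freqs[hi + 1] - f) <= quantum_hz:
            hi += 1
        counts.append(hi - lo + 1)
    return counts
```

Given one zero-crossing frequency per filter-bank tap, `same_frequency_run_lengths` returns large counts wherever a dominant spectral peak has captured many neighbouring taps, and a count of 1 for taps that follow no common frequency.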

The interesting outcome of these simulations and calculations was that the density of channels with the same zero-crossing frequency often displayed only two sharp peaks, one corresponding to F1 and the other to F2′, whilst the amplitude coordinate representation was very unselective. When several peaks occurred in the region above F1, the highest peak was selected as a candidate for F2′. These derivations of F2′, as well as those calculated from the empirical formula, agreed with the F2′ from the matching experiments with an average spread of 75 Hz.
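The selection rule for the F2′ candidate can be written down as a small sketch. It assumes that "highest peak" means the peak with the largest channel count rather than the highest frequency, which the abstract does not state explicitly; the function name and the `(frequency, count)` peak representation are hypothetical.

```python
def pick_f2_prime(peaks, f1_hz):
    """Pick an F2' candidate from density peaks.

    peaks: list of (frequency_hz, channel_count) pairs, one per peak in
           the density of channels sharing a zero-crossing frequency.
    f1_hz: frequency of the peak already attributed to F1.

    Returns the frequency of the peak above F1 with the largest channel
    count (assumed reading of "highest peak"), or None if there is none.
    """
    above_f1 = [(f, c) for f, c in peaks if f > f1_hz]
    if not above_f1:
        return None
    best_freq, _ = max(above_f1, key=lambda peak: peak[1])
    return best_freq
```

With illustrative peaks at 300, 900, 2300 and 3100 Hz carrying counts of 4, 1, 5 and 2 channels, and F1 at 300 Hz, the function returns 2300.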

This model could have physiological relevance only to the extent that stimulus periodicity is retained in the auditory nerve. However, the amplitude-place information has a similar distribution and the calculations of Karnickaya et al. demonstrate that psycho-acoustic models may predict results similar to ours, that is, a selection of the two largest peaks in the auditory spatial stimulation pattern.

It remains unsettled whether two harmonics in the F1 region, further apart than a critical band, are weighted at a higher auditory level, or whether simply the loudest or otherwise most prominent harmonic is picked to represent the formant. Our experiments demonstrate a continuous shift of the phonetic [i-e] boundary with increasing F0, which supports an averaging process rather than a sharp peak-picking process.
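The difference between the two hypotheses can be made concrete with a toy calculation. The sketch below is not the authors' procedure: the single-resonance (Lorentzian-shaped) envelope, the 80 Hz bandwidth, the restriction to the two strongest harmonics, and all function names are assumptions chosen only to show why peak picking moves in harmonic-sized steps while an amplitude-weighted average varies smoothly.

```python
def harmonic_amplitudes(f0_hz, f1_hz, bandwidth_hz=80.0, n_harmonics=10):
    """Return (frequency, amplitude) for harmonics k * f0 under a single
    resonance centred on f1 (illustrative envelope shape)."""
    harmonics = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0_hz
        detune = (freq - f1_hz) / (bandwidth_hz / 2.0)
        harmonics.append((freq, 1.0 / (1.0 + detune ** 2)))
    return harmonics


def peak_picked_f1(harmonics):
    """Hypothesis A: the single strongest harmonic represents the formant."""
    return max(harmonics, key=lambda h: h[1])[0]


def averaged_f1(harmonics):
    """Hypothesis B: amplitude-weighted mean of the two strongest harmonics."""
    strongest = sorted(harmonics, key=lambda h: h[1], reverse=True)[:2]
    total = sum(amp for _, amp in strongest)
    return sum(freq * amp for freq, amp in strongest) / total
```

With F0 = 200 Hz and a resonance at 450 Hz, `peak_picked_f1` can only return a harmonic frequency (here 400 Hz), whereas `averaged_f1` returns a value between the 400 Hz and 600 Hz harmonics that moves continuously as F0 or the resonance changes.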

A model of F0 and formant interaction is proposed whereby a successive rise of F0 causes the fundamental tone to gain importance as a timbre constituent. This is typical of high soprano voices. At high F0 we judge the phonetic value of the vowel more from F0 than from formants. The shift of auditory dominance from F1 proper to F0 would have an effect similar to a shift down in F1 and F2, which might explain the shift of phonetic F1 and F2 boundaries to higher frequencies with increasing F0.

Experiments with a dichotic split of the stimulus, one ear receiving some of the formants and the other ear the remaining formants of a vowel, show that vowel identity is retained at a level of binaural summation.

Document Type: Research Article

Publication date: 01 December 1974
