The Source-Filter Theory: Model of Speech
First formulated by the phonetician Gunnar Fant in 1960 [1], the source-filter model provides an acoustic description of speech. The idea is that a simple sound from the vocal folds has to travel through the throat and the mouth, passing all sorts of obstacles in order to produce a meaningful sound. According to this model, speech production involves two stages: the source stage and the filter stage. Here's a summary of the key components of the source-filter model.
- The sound source is a movement of air that produces a relatively simple pressure wave. In vowels, the sound source is the vibration of the vocal folds.
- The various structures in the vocal tract form a filter that morphs the pressure wave created by the source.
- Posturing this filter in different ways causes changes in the wave. Changing the position of the tongue, for example, can change the wave of a vowel from [i] (as in cheese) to [ɑ] (as in bot).
- After the wave from the source has passed through the filter, it exits the mouth as a complex pressure wave.
- The complex wave that results from the source and the filter is a distinct, recognizable speech sound.
Running a recording of human speech (aka a speech signal) through a phonetic-analysis program presents you with a great deal of information and graphs. The source-filter model is important because it helps you interpret and analyze this information. It allows you to make predictions about the nature of a speech signal just by looking at it in a computer program.
Definition of Source-Filter Theory
The definition of source-filter theory is relatively simple:
Source-filter theory is a theory of phonetics that describes speech as the sound produced by a source (usually the vocal folds) and modified by a filter (the vocal tract).
In other words, source-filter theory divides speech into two parts. One part creates a raw sound, and the other part acts as a tube to shape the sound as it passes through.
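The two parts can be sketched in a few lines of Python. This is a minimal illustration, not a real speech synthesizer: the source is idealized as one impulse per vocal fold cycle, the filter is a chain of simple resonators, and the function names, the 8000 Hz sample rate, the 80 Hz bandwidth, and the resonance values (700 Hz and 1100 Hz, loosely [ɑ]-like) are all assumptions chosen for the example.

```python
import math

def glottal_source(f0_hz, duration_s, sample_rate):
    """Idealized source: one unit impulse per cycle (a simple buzz)."""
    n_samples = round(duration_s * sample_rate)
    period = round(sample_rate / f0_hz)  # samples between pulses
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: boosts frequencies near center_hz."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vocal_tract_filter(signal, resonances_hz, sample_rate, bandwidth_hz=80.0):
    """Filter: pass the signal through one resonator per resonance."""
    for center_hz in resonances_hz:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

SAMPLE_RATE = 8000  # assumed sample rate in Hz
source = glottal_source(f0_hz=100, duration_s=0.1, sample_rate=SAMPLE_RATE)
vowel = vocal_tract_filter(source, [700, 1100], SAMPLE_RATE)
```

Changing the list of resonances changes the output wave without touching the source, which is the separation the theory describes.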
Playing the trumpet provides an example of source-filter theory in action.
In order to get a sound out of a trumpet, you have to purse your lips and blow air into the trumpet's mouthpiece. This is sometimes called "buzzing" your lips against the mouthpiece. This action creates the sound source.
The body of the trumpet acts as a filter for the sound source. It turns the simple buzz of the mouthpiece into an amplified, clear, recognizable trumpet sound.
Fig. 1 - "Buzzing" your lips against a trumpet's mouthpiece creates a sound source, which is then filtered by the body of the trumpet.
Remember that cruel and unusual method of recording the voice right at the level of the vocal folds? The resulting recording sounds like a simple buzzing sound—not unlike the buzz of a trumpet mouthpiece. If you're curious, you can look up the sound and hear it for yourself.
The Source-Filter Theory: Model of the Vocal Tract
Now to translate the source-filter model from trumpets to the vocal tract. In most speech sounds, the sound source is the vibration of the vocal folds, and the filter is the remainder of the vocal tract.
The Sound Source (The Glottis)
The source of voiced speech sounds is the vibration of the vocal folds. The vocal folds sit inside the larynx, in the region called the glottis.
The larynx, also known as the voice box, is an organ made of cartilage and muscle located at the center of the throat.
The glottis is the part of the larynx made up of the vocal folds and the opening between them.
The vocal folds are the two bands of tissue whose vibration produces voicing in speech.
When you force air from your lungs through your closed glottis, you cause your vocal folds to vibrate. This vibration, called the glottal source wave, is the sound source for most speech sounds.
The glottal source wave is the sound source for most, but not all, speech sounds. The source for voiceless speech sounds that originate higher in the vocal tract is the constriction within the vocal tract. For example, when you produce the voiceless labiodental fricative [f], the sound source is air passing through the constriction between the lower lip and the upper teeth. The filter for this sound is very small because there isn't much in front of those structures to morph the sound.
The Filter (The Vocal Tract)
On its way out into the world, the glottal source wave must pass through the filter: the vocal tract.
The vocal tract consists of all the speech organs from the larynx to the lips.
Fig. 2 - The sound source is the vibration of the vocal folds (glottis), and the filter is the vocal tract.
The primary organs in the vocal tract that are relevant to speech production are the epiglottis, pharynx, velum, tongue (tip, blade, body, and root), alveolar ridge, hard palate, teeth, lips, and nasal cavity. Each of these organs can filter the sound from the glottal source wave.
Source-Filter Theory in Vowels
Source-filter theory is also useful because it can help explain the key characteristics of vowels, specifically fundamental frequency (pitch) and formants.
The fundamental frequency of a voiced sound is the lowest frequency component of its wave, set by the rate of vocal fold vibration. Listeners perceive it as the pitch of the sound.
Formants are amplified frequency ranges that distinguish one vowel from another.
Fundamental Frequency
When you sing an A at 440 Hz, the fundamental frequency of your voice is 440 Hz. The fundamental frequency of a speech sound depends on the rate of vocal fold vibration. In the source-filter model it is a property of the source; the length and shape of the filter determine the formants instead.
Frequency is usually measured in hertz (Hz), which means cycles per second. If the wave from your voice has a fundamental frequency of 440 Hz, then the wave repeats its pattern 440 times every second. You can raise the fundamental frequency of your voice either by tensing your vocal folds, which stretches them so that they vibrate faster, or by pushing air through them with more pressure.
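The relationship between hertz and time can be worked out directly. The short sketch below converts a frequency into the duration of one cycle and counts the cycles in a given stretch of time; the function names are made up for this example.

```python
def period_ms(frequency_hz):
    """Duration of one cycle in milliseconds."""
    return 1000.0 / frequency_hz

def cycles_in(frequency_hz, duration_s):
    """Number of cycles completed in duration_s seconds."""
    return frequency_hz * duration_s

print(round(period_ms(440), 2))  # 2.27
print(cycles_in(440, 1.0))       # 440.0
print(cycles_in(440, 0.5))       # 220.0
```

So a 440 Hz wave repeats roughly every 2.27 milliseconds, and halving the time window halves the number of cycles.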
Formants
Formants are bands of amplified frequencies that characterize each vowel. The three lowest formants matter most for speech:
- F1, the lowest formant, correlates to vowel height. The lower the F1 frequency, the higher the vowel. For example, the vowels with the lowest F1 values are the high vowels, like [i] as in beet and [u] as in boot.
- F2, the next lowest formant after F1, correlates to vowel backness. The lower the F2 frequency, the further back the vowel. For example, the vowels with the lowest F2 values are the back vowels, like [o] as in boat and [ɑ] as in bot.
- F3, the highest formant relevant to speech sounds, correlates to certain vowel constrictions, especially r-colored vowels like the [ɝ] in the General American pronunciation of bird. This sound has a relatively low F3 value.
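The F1 and F2 relationships can be turned into a rough rule of thumb. The sketch below is only an illustration: the formant values are approximate figures of the kind commonly reported for adult male speakers, not measurements from this article, and the 500 Hz and 1500 Hz cutoffs are arbitrary assumptions chosen to separate the four example vowels.

```python
# Approximate, illustrative (F1, F2) values in Hz; assumed for this example.
VOWELS = {
    "i (beet)": (270, 2290),
    "u (boot)": (300, 870),
    "ɑ (bot)": (730, 1090),
    "æ (bat)": (660, 1720),
}

def describe(f1_hz, f2_hz, f1_cutoff=500, f2_cutoff=1500):
    """Low F1 means a high vowel; low F2 means a back vowel."""
    height = "high" if f1_hz < f1_cutoff else "low"
    backness = "back" if f2_hz < f2_cutoff else "front"
    return height, backness

for vowel, (f1_hz, f2_hz) in VOWELS.items():
    print(vowel, describe(f1_hz, f2_hz))
```

Real vowels vary by speaker and context, so fixed cutoffs like these would not work as a general classifier. The point is only the direction of each relationship.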
What does this have to do with source-filter theory? According to the source-filter model, formants are determined by the shape of the filter.
The low back vowel [ɑ] includes a constriction, or area of tension, at the pharynx. This constriction separates the vocal tract filter into a short tube from the larynx to the pharynx and a long tube from the pharynx to the lips. This filters the glottal source wave into a sound with a high F1 and a low F2, which you perceive as the [ɑ] vowel.
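The two-tube shape of [ɑ] is awkward to compute by hand, but a simpler case shows the same principle. If the vocal tract is idealized as a single uniform tube, closed at the glottis and open at the lips, its resonances fall at odd multiples of the speed of sound divided by four times the tube length. This is a standard textbook idealization rather than something stated in the text above, and the 350 m/s speed of sound and the two tube lengths are assumed values.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air

def tube_resonances(length_m, count=3, speed=SPEED_OF_SOUND_M_S):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]

longer = tube_resonances(0.175)  # 17.5 cm tube
shorter = tube_resonances(0.15)  # 15 cm tube

print([round(f) for f in longer])   # [500, 1500, 2500]
print([round(f) for f in shorter])  # [583, 1750, 2917]
```

Shortening the tube raises every resonance, even though nothing about the source has changed. Constrictions such as the one in [ɑ] shift individual resonances up or down from this neutral pattern.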
Examples of Source-Filter Theory
You can see examples of the acoustic effects of the source and filter by looking at a sound spectrum.
A sound spectrum is a plot of the simple wave components of a complex wave.
The sound spectrum is like a snapshot of a vowel at a single point in time. It shows all of the frequencies present in the wave (on the x-axis) and the amplitude, or loudness, of each frequency (on the y-axis).
Remember that the sound source determines the fundamental frequency of the wave. The evenly spaced frequencies along the x-axis are evidence of the glottal source wave. They are its harmonics, whole-number multiples of the fundamental frequency, and they look like a row of "spikes" on the spectrum.
The first big spike on the sound spectrum is the wave's fundamental frequency.
The filter determines a vowel's formants. Formants are characterized by the loudest frequencies in the wave. On the sound spectrum, the curvy pattern in amplitude at the top of the spikes provides evidence of the filter.
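A small experiment makes both kinds of evidence visible. The sketch below builds a complex wave from six simple sine waves and then measures how strong each frequency is, which is what a spectrum plots. Everything here is an assumption for illustration: the 200 Hz fundamental, the amplitude values (shaped to peak near 800 Hz, as a filter might), and the helper name `magnitude_at`.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz
N_SAMPLES = 800     # 0.1 s of signal

# Spikes at multiples of 200 Hz (the source); their heights peak at 800 Hz (the filter).
AMPLITUDES = {200: 0.5, 400: 0.7, 600: 0.9, 800: 1.0, 1000: 0.6, 1200: 0.3}

wave = [
    sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in AMPLITUDES.items())
    for n in range(N_SAMPLES)
]

def magnitude_at(signal, freq_hz, sample_rate):
    """Strength of one frequency in the signal (a single DFT component)."""
    total = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
        for n, x in enumerate(signal)
    )
    return 2 * abs(total) / len(signal)

# Measure every 100 Hz from 100 Hz to 1300 Hz.
spectrum = {f: magnitude_at(wave, f, SAMPLE_RATE) for f in range(100, 1400, 100)}
spikes = [f for f, m in spectrum.items() if m > 0.1]

fundamental = spikes[0]                      # first spike
loudest = max(spectrum, key=spectrum.get)    # top of the amplitude curve

print(spikes)       # [200, 400, 600, 800, 1000, 1200]
print(fundamental)  # 200
print(loudest)      # 800
```

The positions of the spikes reflect the source, and the rise and fall of their heights reflects the filter.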
Fig. 3 - The sound source provides a compilation of frequencies. The filter changes the amplitudes of the source's frequencies.
The most important points of this example of the source-filter theory are these:
Source-Filter Theory - Key Takeaways
- Source-filter theory is an acoustic model that describes speech as the sound produced by a source (usually the vocal folds) and modified by a filter (the vocal tract).
- The source of voiced speech sounds is the vibration of the vocal folds.
- The filter of speech sounds is the vocal tract.
- The fundamental frequency of a speech sound depends on the rate of vocal fold vibration, which is set by the source rather than by the filter.
- Formants are determined by the shape of the filter.
References
- Gunnar Fant (1960). Acoustic Theory of Speech Production.