Unit 2: Description of speech sounds

2.1   Non segmental: Intensity, pitch and quality

2.2   Segmental aspects of speech: Definition of consonants, vowels, diphthong and blends

2.3   Classification of consonants: place, manner, voicing

2.4   Classification of vowels

2.5   Supra-segmental: Intonation, stress, pause, etc.


 

2.1   Non segmental: Intensity, pitch and quality

 

 

The non-segmental aspects of speech refer to features that go beyond the individual sounds or segments (phonemes) of speech. While segmental aspects include consonants and vowels, the non-segmental aspects involve prosodic features like intensity, pitch, and quality. These elements play a crucial role in how speech is interpreted, influencing meaning, emotion, emphasis, and the flow of conversation. Let's break down each of these non-segmental features:

1. Intensity: Intensity refers to the loudness or volume of speech. It is the force with which speech is produced and is often related to the amount of air pressure generated by the lungs.

Role in Speech:

      Emphasis: Intensity can be used to highlight certain words or phrases, making them stand out in a sentence. For example, saying "I really want THAT" can emphasize the word "that" through increased intensity.

      Emotional Expression: Loudness or softness can convey emotions. For instance, shouting can express anger or excitement, while a softer voice may indicate calmness or sadness.

      Speech Act: Intensity can also differentiate between types of speech acts. For instance, a louder voice can indicate a command, while a softer voice may indicate a question or request.

      Perceptual Impact: Intensity helps listeners perceive the emotional tone or urgency of speech. It's often tied to the stress of syllables, words, or phrases, making certain parts of speech more noticeable.
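The idea that intensity is a measurable property of the speech signal can be illustrated with a short sketch. It assumes a waveform is available as a list of sample values and expresses its level in decibels relative to an arbitrary reference amplitude of 1.0. The function names and the synthetic tone are illustrative assumptions, not part of these notes.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of waveform samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_db(samples, reference=1.0):
    """Level in decibels relative to a reference RMS amplitude (assumed 1.0)."""
    return 20 * math.log10(rms(samples) / reference)

# Hypothetical waveforms: the same 100 Hz tone at two amplitudes
soft = [0.1 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
loud = [2 * s for s in soft]

# Doubling the amplitude raises the level by about 6 dB
difference = level_db(loud) - level_db(soft)
print(round(difference, 2))
```

Perceived loudness also depends on the listener and on frequency, so a decibel value of this kind is only a rough physical correlate of what the notes call intensity.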

2. Pitch: Pitch refers to the perceived frequency of a sound, ranging from low to high. It's determined by the vibration rate of the vocal cords: faster vibrations create higher pitches, while slower vibrations produce lower pitches.
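The link between vibration rate and pitch can be sketched with a simple autocorrelation estimate of the fundamental frequency. This is a minimal illustration on synthetic sine tones, assuming a sample rate of 8000 Hz and a search range of 50 to 400 Hz; the function names are hypothetical, and real speech needs a more robust method.

```python
import math

def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and a copy of itself shifted by `lag`
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def tone(freq, sample_rate=8000, n_samples=400):
    """Synthetic stand-in for vocal fold vibration at `freq` cycles per second."""
    return [math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(n_samples)]

low = estimate_pitch(tone(100), 8000)   # slower vibration, lower pitch
high = estimate_pitch(tone(200), 8000)  # faster vibration, higher pitch
print(low, high)
```

The estimate for the faster tone comes out higher than the estimate for the slower one, which mirrors the statement that faster vibrations create higher pitches.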

Role in Speech:

3. Quality: Voice quality, or timbre, refers to the unique characteristics of a person's voice that distinguish it from others. It is the result of the shape and size of the vocal tract, as well as the tension and vibration of the vocal cords.

Role in Speech:

 

Interaction of Intensity, Pitch, and Quality

 

Together, these non-segmental features work harmoniously to convey meaning, emotion, and context in speech. They complement the segmental aspects (the individual phonemes) to create a more complete and expressive form of communication.

Example: Imagine someone saying, "I love you." The meaning can change dramatically based on the intensity, pitch, and quality:

 

Intensity, pitch, and quality are essential non-segmental aspects of speech that provide depth and richness to communication. They help speakers express emotions, indicate the structure of speech, convey emphasis, and even reveal personal characteristics. These elements, in combination with segmental features (individual sounds and words), create the full spectrum of human speech.

 


 

2.2   Segmental aspects of speech: Definition of consonants, vowels, diphthong and blends

 

 

While speaking, we use both segmental features and suprasegmental features. The segmental features comprise consonant and vowel sounds (phonemes), whereas stress, pitch, intonation, length, juncture, etc. are suprasegmental features. The suprasegmental features play a prominent role in signalling the mood, emotional state, gender and age of the speaker and the meanings of his or her utterances.

Teaching pronunciation gives students a basic idea of the intelligible pronunciation of segmental and suprasegmental features, resulting in intelligible communication. Teaching the pronunciation of English develops students' confidence to speak English wherever they go. Fraser (2000) remarks that "with good pronunciation, a speaker is intelligible despite other errors; with poor pronunciation, a speaker can be very difficult to understand, despite accuracy in other areas".

Segmental Features

All the consonant and vowel phonemes are segmental features. They refer to discrete units that can be identified physically or auditorily in the stream of speech of any language.

Consonant Sounds

Consonant sounds are those speech sounds for the production of which there is some sort of obstruction in the vocal tract. Crystal (2003) defines consonants as "sounds made by a closure or narrowing in the vocal tract so that the airflow is either completely blocked, or so restricted that audible friction is produced" (p. 103). It means consonant sounds are produced when there is an obstruction of the airflow somewhere in the vocal tract.

Vowel Sounds

Vowels are the speech sounds which are produced without any sort of obstruction in the vocal tract. They are the peaks of syllables. Crystal (2003) describes vowels as "sounds articulated without a complete closure in the mouth or a degree of narrowing which would produce audible friction" (p. 517). It means vowel sounds are produced without audible friction of the airflow in the vocal tract. In English, they are normally voiced.

Diphthongs as speech sounds involve two vowels. They glide from one vowel to another one, and the whole glide acts like one vowel sound. There is a noticeable change in the quality of vowel when we pronounce them.

 

 

 

Definition of consonants, vowels, diphthong and blends

 

A vowel is a speech sound produced by humans when the breath flows out through the mouth without being blocked by the teeth, tongue, or lips.

A vowel is a speech sound where the mouth is open and the tongue doesn't touch the top of the mouth, the teeth, etc., so that the flow of air is not limited. It is necessary to know that there is a difference between a vowel sound and a vowel letter in the alphabet.

 

The English vowel sounds are written with letters in the English alphabet. Almost all English words contain vowel letters. The vowel letters in English are A, E, I, O, U, and sometimes Y. Y is sometimes a vowel because the letter Y can represent both vowel and consonant sounds; in "fry", for example, it represents a vowel sound.

 

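Six vowel letters (counting Y) are used to represent a much larger number of vowel sounds: about 13-15 in some accents and counts, and 20 in the classification used later in this unit. This means there are more vowel sounds than vowel letters in the English alphabet. Monophthongs and diphthongs are the two main categories of vowel sounds.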

 

A monophthong is a single, simple vowel. The part "mono" means one and "phthong" means sound, so a monophthong represents one vowel sound in the word. The position of the tongue and the mouth stays the same while the vowel is uttered.

 

A diphthong is a vowel sound in which the tongue changes position to produce the sound of two vowels.

A diphthong is a combination of two vowel sounds, like /aɪ/ in "pipe" /paɪp/ or the sound spelled "ou" in "doubt". The part "di" means two and the other part means sound. Although a diphthong has two different vowel sounds, they stay within the same syllable or unit of sound.

 

The best way to know if a word has a diphthong with two vowel sounds, or a monophthong, is to listen to how it sounds when you say it out loud. If the vowel sound changes within the same syllable, it's most certainly a diphthong.

 

A consonant is one of the speech sounds or letters of the alphabet that is not a vowel. Consonants are pronounced by stopping the air from flowing easily through the mouth, especially by closing the lips or touching the teeth with the tongue.

Consonants may come alone or in clusters but have to be connected to a vowel to form a syllable.

 

English has 21 consonant letters for 24 consonant sounds in most English accents: B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Z and (sometimes) Y. The letter Y produces a consonant sound at the beginning of a word ("yellow") but a vowel sound at the end of a word ("sunny").
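The positional rule for the letter Y can be written as a small sketch. It encodes only the simplified rule stated above (consonant at the start of a word, vowel at the end) and deliberately leaves word-internal Y undecided; the function name is a hypothetical choice.

```python
def classify_y(word):
    """Label each letter y in `word` using the simplified positional rule."""
    word = word.lower()
    roles = []
    for index, letter in enumerate(word):
        if letter != "y":
            continue
        if index == 0:
            roles.append((index, "consonant"))  # as in "yellow"
        elif index == len(word) - 1:
            roles.append((index, "vowel"))      # as in "sunny" or "fry"
        else:
            roles.append((index, "not covered by this rule"))
    return roles

print(classify_y("yellow"))
print(classify_y("sunny"))
print(classify_y("fry"))
```

Spelling is only a rough guide to pronunciation, so a rule like this is a teaching aid rather than a reliable classifier.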

 

A consonant blend occurs when two or more consonants are blended, yet each sound may be heard in the blend. The most common beginning consonant blends include: bl, br, cl, cr, dr, fl, fr, gr, pl, pr, sl, sm, sp, st and tr. Blends can also appear at the end of words.

 

Blends are usually categorized into r-blends, such as "br" and "cr"; s-blends, such as "sc" and "sk"; and l-blends, such as "bl" and "cl". Some blends include three consonants. Common three-consonant blends include "str", "spl", and "spr".
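The grouping of blends into families can be sketched as a small lookup by letter position. This is an illustration only: the function name is hypothetical, and treating a blend such as "sl" as an s-blend rather than an l-blend is an assumption made here, since the notes do not say how overlapping cases are assigned.

```python
def blend_family(blend):
    """Assign a written consonant blend to a family by its letters."""
    blend = blend.lower()
    if len(blend) == 3:
        return "three-consonant blend"  # e.g. str, spl, spr
    if blend.startswith("s"):
        return "s-blend"                # assumption: s-blends are checked first
    if blend.endswith("r"):
        return "r-blend"
    if blend.endswith("l"):
        return "l-blend"
    return "other"

for example in ["br", "cr", "sc", "sk", "bl", "cl", "str"]:
    print(example, "->", blend_family(example))
```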

Digraphs are two letters that produce just one sound. The seven basic consonant digraphs are ch, ck, th, sh, ph, ng, and wh. Some digraphs have more than one pronunciation. Digraphs like "wr" and "gn" are called ghost digraphs because the first letter is not pronounced.

Consonant cluster

A consonant cluster refers to two or more consonants positioned side by side in a word. Clusters are also called consonant blends. The longest possible cluster at the beginning of an English word is three consonant sounds, as in "street". At the end of a word, clusters can be up to four consonants long, as in "texts".
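The limits on cluster length can be checked with a short sketch that counts consonant sounds before the first vowel and after the last vowel of a transcribed word. The transcriptions of "street" and "texts" are supplied by hand as illustrative examples, and the vowel set uses the vowel symbols of these notes; the function name is hypothetical.

```python
# Vowel symbols as used in these notes (12 pure vowels and 8 diphthongs)
VOWELS = {
    "ɪ", "e", "æ", "ɒ", "ʊ", "ʌ", "ə", "iː", "aː", "ɔː", "uː", "ɜː",
    "aɪ", "aʊ", "eə", "əʊ", "eɪ", "ɪə", "ʊə", "ɔɪ",
}

def cluster_lengths(phonemes):
    """Return (initial, final) consonant cluster lengths for a list of phonemes."""
    initial = 0
    for phoneme in phonemes:
        if phoneme in VOWELS:
            break
        initial += 1
    final = 0
    for phoneme in reversed(phonemes):
        if phoneme in VOWELS:
            break
        final += 1
    return initial, final

street = ["s", "t", "r", "iː", "t"]      # hand-written transcription
texts = ["t", "e", "k", "s", "t", "s"]   # hand-written transcription

print(cluster_lengths(street))  # three consonants before the vowel
print(cluster_lengths(texts))   # four consonants after the vowel
```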

 

So, vowels and consonants are the two main categories of speech sounds in the English language. No English word begins with more than three consecutive consonant sounds. Each word contains at least one vowel sound. The English language has short vowel sounds and long vowel sounds. The sounds are produced and described according to the place and manner of articulation.

 

Comparison Chart

 

| Basis for Comparison | Vowels | Consonants |
|---|---|---|
| Meaning | A vowel is a speech sound, produced by a fairly open vocal passage, with vibration in the vocal cords, but no audible friction. | A consonant is a speech sound produced when the vocal passage is fully or partially closed by vocal organs. |
| Obstruction | There is no obstruction when lungs expel air. | Something obstructs the air expelled by the lungs. |
| Letter | 5 or sometimes 7 vowels | 21 consonants |
| Sound | 20 vowel sounds | 24 consonant sounds |
| Article used | An | A |

 

 

 

 


 

2.3   Classification of consonants: place, manner, voicing

 

 

 

Humans use complex coordination of airflow, vocal fold vibration, and movement of the articulators (lips, tongue, etc.) to make speech sounds. The 24 consonant speech sounds are organized into three areas to describe how each sound is produced. These include the place of articulation, the manner of articulation, and voice (or voicing).

Figure: Place, manner, and voicing chart for all 24 English consonants.

Place of Articulation

The place of articulation refers to WHERE a sound is produced in the mouth. We produce speech sounds by constricting airflow using the lips, teeth, tongue, hard palate, soft palate, and/or throat.

Let's break down each place of articulation in English consonants further. In the place-manner-voice chart, the sounds move from the front of the mouth on the left of the chart to the back of the mouth on the right.

Bilabial

A bilabial sound is made using both lips. We produce bilabial sounds when we put both lips together and release air with the speech sound. English bilabial sounds include P /p/, B /b/, M /m/, and W /w/.

W is a unique sound that is considered both a labial and a velar because the sound is produced beginning with the lips and ending with the back of the tongue.

Labio-dental

Labio-dental sounds are made by placing the upper teeth on top of the lower lip. English has two labio-dental sounds: F /f/ and V /v/.

Interdental

Interdental sounds are made by placing the tongue between the upper and lower teeth. English interdental sounds include voiceless TH /θ/ (as in "thin") and voiced TH /ð/ (as in "this").

Alveolar

An alveolar sound is made when the tip of the tongue touches or is just below the alveolar ridge. The alveolar ridge is the bumpy part of the roof of the mouth that is just behind the top teeth. Alveolar sounds include T /t/, D /d/, N /n/, S /s/, Z /z/, and L /l/.

Post-alveolar

Post-alveolar sounds are made when the middle part of the tongue is touching or just behind the alveolar ridge. These include SH /ʃ/, ZH /ʒ/, CH /tʃ/, and J /dʒ/.

Palatal

Palatal sounds are made when the tongue is close to or touching the middle part of the roof of the mouth (hard palate). There is only 1 palatal sound in English: Y /j/.

Velar

We produce velar sounds when we raise the tongue to the soft palate (the roof of the mouth just behind the hard palate). Velar sounds in English include K /k/, G /g/, NG /ŋ/, and W /w/.

Glottal

Glottal sounds are made at the glottis, the space between the vocal folds in your throat. Unlike most other sounds, the tongue does not help make this sound. English only has 1 glottal sound: H /h/.


Manner of Articulation

The manner of articulation refers to HOW a sound is produced, meaning how the air is released from your mouth to make each speech sound. We produce these sounds in many different ways, from releasing a small puff of air out your lips to allowing the air to flow over the tongue in a small channel.

Speech sounds are made by either restricting the airflow out of the mouth/nose or allowing little to no restriction. Sounds made with restricted airflow are called obstruents, and sounds made with little to no restriction are called sonorants.

Stop/Plosive

We produce stops by stopping the airflow and then releasing it with a burst. For example, you make P /p/ and B /b/ by closing both the top and bottom lips, then releasing the air out after building up pressure. English has 6 stops: P /p/, B /b/, T /t/, D /d/, K /k/, and G /g/.

Fricative

A fricative is produced by creating a narrow passageway for air to escape the mouth. The air creates a noisy sound as it blows through the mouth. An example of this is when you produce F /f/ and V /v/ by forcing air through the narrow space between the top teeth and bottom lip. English has 9 fricatives: F /f/, V /v/, voiceless TH /θ/, voiced TH /ð/, S /s/, Z /z/, SH /ʃ/, ZH /ʒ/, and H /h/.

Affricate

An affricate is a combination of a stop and a fricative. We produce these sounds by first blocking the airflow completely (like a stop) and then releasing it gradually through a narrow passageway (like a fricative). English has 2 affricates: CH /tʃ/ and J /dʒ/.

Nasal

As the name suggests, we produce nasals by sending the sound through the nasal cavity (nose). The 3 nasal sounds in English are M /m/, N /n/, and NG /ŋ/.

Liquid

Liquid consonants are complex sounds that include both lateral and rhotic sounds. The tongue and palate create a partial restriction of the airflow out the mouth. This produces the vowel-like consonants L /l/ and R /r/.

Glide

Lastly, a glide (AKA a semivowel or semiconsonant) is a consonant that has a vowel-like quality. The tongue restricts airflow through the mouth, creating a space over the tongue for the air to flow before releasing out the mouth. You make the sound by gliding the lips or tongue from one shape into a vowel. English is usually described as having 2 glides: W /w/ and Y /j/.

Voice (Voicing)

Voicing is when the vocal cords are vibrating. We produce some speech sounds with the vocal cords on and vibrating (voiced sounds), and we produce others with them off and not vibrating (voiceless sounds).

If you place your fingers over your throat while saying a Z sound, you'll feel your vocal cords vibrating. Z is a voiced sound since the vocal cords vibrate. Now place your fingers on your throat and say an S sound. You will not feel any buzzing, indicating that S is a voiceless sound.

Many speech sounds come in pairs and are made in the same place and the same manner. The only difference between these sounds is one is voiced and one is voiceless. P and B are examples of these sister sounds. P and B are both bilabial stops, but P is voiceless (no vibrating vocal folds), and B is voiced (vibrating vocal folds).
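The three-way description (place, manner, voicing) lends itself to a small feature table. The sketch below is one possible encoding of the 24 English consonants and is not a definitive chart: placing /r/ at the alveolar ridge and listing /w/ only as bilabial are simplifying assumptions, and the names CONSONANTS and voicing_pairs are hypothetical. It then searches for the "sister sounds" that differ only in voicing.

```python
from collections import Counter

# symbol: (place, manner, voiced). Placement of /r/ and /w/ is an assumption.
CONSONANTS = {
    "p": ("bilabial", "stop", False),        "b": ("bilabial", "stop", True),
    "t": ("alveolar", "stop", False),        "d": ("alveolar", "stop", True),
    "k": ("velar", "stop", False),           "g": ("velar", "stop", True),
    "f": ("labiodental", "fricative", False), "v": ("labiodental", "fricative", True),
    "θ": ("interdental", "fricative", False), "ð": ("interdental", "fricative", True),
    "s": ("alveolar", "fricative", False),   "z": ("alveolar", "fricative", True),
    "ʃ": ("post-alveolar", "fricative", False), "ʒ": ("post-alveolar", "fricative", True),
    "h": ("glottal", "fricative", False),
    "tʃ": ("post-alveolar", "affricate", False), "dʒ": ("post-alveolar", "affricate", True),
    "m": ("bilabial", "nasal", True),        "n": ("alveolar", "nasal", True),
    "ŋ": ("velar", "nasal", True),
    "l": ("alveolar", "liquid", True),       "r": ("alveolar", "liquid", True),
    "w": ("bilabial", "glide", True),        "j": ("palatal", "glide", True),
}

def voicing_pairs():
    """Return (voiceless, voiced) pairs sharing the same place and manner."""
    pairs = []
    for voiceless, (place, manner, voiced) in CONSONANTS.items():
        if voiced:
            continue
        for other, (place2, manner2, voiced2) in CONSONANTS.items():
            if voiced2 and place2 == place and manner2 == manner:
                pairs.append((voiceless, other))
    return pairs

manner_counts = Counter(manner for _, manner, _ in CONSONANTS.values())
print(voicing_pairs())
print(manner_counts)
```

With this encoding, /p/ and /b/ come out as a pair because both are bilabial stops and only the voicing value differs, which is the relationship described above.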

Other Classifications

 

Summary of Consonant Classification

| Parameter | Types | Examples |
|---|---|---|
| Place of Articulation | Bilabial, Labiodental, Dental, Alveolar, Post-Alveolar, Palatal, Velar, Glottal | /p/ (pat), /f/ (fat), /s/ (sip), /ʃ/ (shoe), /k/ (cat), /h/ (hat) |
| Manner of Articulation | Stop (Plosive), Fricative, Affricate, Nasal, Liquid, Glide | /p/ (pat), /f/ (fat), /tʃ/ (church), /m/ (mat), /l/ (lip), /j/ (yes) |
| Voicing | Voiced, Voiceless | /b/ (bat), /s/ (sip), /d/ (dog), /k/ (cat) |
| Aspiration | Aspiration (e.g., /pʰ/) | /pʰ/ (pat) |

 

This classification helps in understanding how consonants are produced and how different features of speech can be identified and categorized. It is essential for linguistic analysis, language teaching, and speech therapy.


 

2.4   Classification of vowels

 

 

Vowels can be defined as speech sounds produced without obstruction or audible friction of the lung air passing through the oral passage. The tongue and the lips can produce various resonating chambers by assuming different shapes. Vowels, thus, result from various resonating cavities formed by these articulators.

Vowel sounds are classified into sets based on the following main factors.

i. The Shape of the Lips:

The lips can assume spread, neutral or rounded positions. Vowels produced with the lips in a rounded shape are called rounded vowels. The vowels in "do", "shoe", and "fruit" are, for example, rounded vowels. The vowels produced with a spread or neutral shape of the lips are called unrounded vowels. The vowels in "tree", "egg", "friend", and "come" are examples of unrounded vowels.

ii. The Raising of the Tongue:

The tongue can assume different shapes because of its flexibility. Every change in its shape results in a different vowel sound. Such changes are determined by two factors: (a) the part of the tongue that is raised, and (b) how high it is raised. The parts of the tongue which produce the vowels are called the front, the back, and the central. These are imaginary divisions of the tongue. These parts again can be raised or lowered to produce certain sound effects. The raising or lowering, however, remains restricted to a certain degree. The highest point to which the tongue can be raised is called the close position. The lowest point to which it can be brought down is called an open position.

Two more intermediate imaginary positions are also demarcated to describe the raising of the tongue. They are the half-close and the half-open positions.
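The factors above (part of the tongue raised, height of the tongue, and lip shape) combine into a three-term label for each vowel. The sketch below shows the idea for four vowels whose positions are relatively uncontroversial. The feature values are approximate textbook placements supplied here as assumptions, the symbol aː follows the spelling used in these notes, and the names are hypothetical.

```python
# symbol: (part of tongue raised, tongue height, lips rounded)
# Approximate placements, given here as assumptions for illustration.
VOWEL_FEATURES = {
    "iː": ("front", "close", False),  # as in "tree"
    "uː": ("back", "close", True),    # as in "shoe"
    "ɒ": ("back", "open", True),      # as in "dot"
    "aː": ("back", "open", False),    # as in "arm"
}

def describe(symbol):
    """Build a three-term label such as 'close front unrounded vowel'."""
    part, height, rounded = VOWEL_FEATURES[symbol]
    lips = "rounded" if rounded else "unrounded"
    return f"{height} {part} {lips} vowel"

for symbol in VOWEL_FEATURES:
    print(symbol, "->", describe(symbol))
```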

 

Vowels are classified into two types: pure vowels and diphthongs.

Pure Vowels

Vowels which have a single vowel sound when pronounced are called pure vowels. English has twelve pure vowels. They occur in words such as "announce" (ə), "fret" (e), "sun" (ʌ), "tick" (ɪ), "please" (iː), "dot" (ɒ), "foot" (ʊ), "food" (uː), "word" (ɜː), "warm" (ɔː), "arm" (aː) and "pant" (æ).

Pure vowels are further classified into two types: checked vowels and free vowels.

Checked Vowels

Among the 12 vowel sounds, 7 are considered checked vowels. They are æ, ʌ, e, ə, ʊ, ɪ, and ɒ. These vowels do not occur at the end of a stressed open syllable.

Free Vowels

The five vowel sounds uː, iː, ɜː, aː and ɔː are considered free vowels. These vowels can be used at the end of open syllables.

Types of Vowels/Vowel Sounds

Vowels and vowel sounds are also categorised into two types, based on the length of the vowel sound and the number of vowel letters. They are:

       Short vowels

       Long vowels

Let us look at each of them in detail.

Short Vowels

In spelling terms, a short vowel is usually written with a single vowel letter. Words with short vowels normally end with consonants; a short vowel does not normally appear at the end of the last syllable of a word.

Here are a few examples.

       a in pan

       e in rent

       i in pit

       o in cot

       u in truck

Long Vowels

The term "long vowel" refers to a vowel pronounced like the name of its letter. Words with long vowels often contain two or more vowel letters, and they can start or end with vowels. Take a look at the following examples to understand.

       a in fake

       e in tedious

       i in blind

       o in rote

       u in cumin

When vowel sounds, rather than vowel letters, are categorised as long and short, the grouping is not the same. The same sound can be spelled in several ways, and the long vowel sounds include diphthongs as well. Let us look at the following examples to comprehend how it works.

Short Vowel Sounds

       a in "plaid", "salmon", "have"

       e in "friend", "tread", "says"

       i in "women", "eject", "houses"

       o in "entrepreneur", "cough", "want"

       u in "flood", "done", "son"

Long Vowel Sounds

       a in "faint", "weight", "dainty"

       e in "receive", "weak", "encyclopaedia"

       i in "tight", "ice", "eye"

       o in "blow", "road", "though"

       u in "new", "queue", "vacuum"

 

List of Pure Vowels with Examples

The 12 vowel sounds in English have been provided below with examples to help you understand. Check them out.

| Vowel sounds | Examples |
|---|---|
| /ʌ/ | cut, butter |
| /aː/ | park, far |
| /æ/ | bat, fan |
| /ɒ/ | goggles, fog |
| /ɔː/ | more, warn |
| /ɜː/ | bird, worm |
| /e/ | pet, ten |
| /ə/ | vendor, monitor |
| /ɪ/ | sit, pin |
| /iː/ | theme, fleet |
| /ʊ/ | cook, put |
| /uː/ | flute, boon |

Diphthongs

Diphthongs are speech sounds formed by the combination of two vowel sounds. They do not sound exactly like either vowel on its own; instead, they form a new speech sound.

List of Diphthongs with Examples

Given below is a table with the eight diphthongs in the English language. Go through the examples given for each diphthong to clearly understand what the phoneme sounds like.

| Diphthongs | Examples |
|---|---|
| /aɪ/ | fight, write |
| /aʊ/ | plough, cow |
| /eə/ | their, chair |
| /əʊ/ | soak, rodent |
| /eɪ/ | fate, pain |
| /ɪə/ | here, cheer |
| /ʊə/ | poor, sure |
| /ɔɪ/ | toy, exploit |

 

All the English vowels are voiced, meaning that the vocal cords vibrate to produce them. Altogether there are twenty vowels in English. Twelve of them are monophthongs and the remaining eight are diphthongs. Of the twelve monophthongs, seven are short vowels and five are long vowels. The monophthongs are also called pure vowels since they do not change in their quality. Vowels which involve a gliding movement from one quality to another are called diphthongs. The glide, however, takes place within the same syllable.
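The vowel inventory summarised above can be written out and checked with a short sketch. The symbol lists follow the tables in this section (with æ as the vowel of "bat"), and the names SHORT, LONG, DIPHTHONGS and classify_vowel are hypothetical choices made for this illustration.

```python
SHORT = ["ɪ", "e", "æ", "ɒ", "ʊ", "ʌ", "ə"]    # seven short monophthongs
LONG = ["iː", "aː", "ɔː", "uː", "ɜː"]          # five long monophthongs
DIPHTHONGS = ["aɪ", "aʊ", "eə", "əʊ", "eɪ", "ɪə", "ʊə", "ɔɪ"]

def classify_vowel(symbol):
    """Return the category of a vowel symbol from the inventory above."""
    if symbol in SHORT:
        return "short monophthong"
    if symbol in LONG:
        return "long monophthong"
    if symbol in DIPHTHONGS:
        return "diphthong"
    return "unknown"

total = len(SHORT) + len(LONG) + len(DIPHTHONGS)
print(total)                 # twelve monophthongs plus eight diphthongs
print(classify_vowel("iː"))
print(classify_vowel("aɪ"))
```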

 

 

 

 


 

2.5   Supra-segmental: Intonation, stress, pause, etc.

 

 

In speech, suprasegmental refers to a phonological property of more than one sound segment. Also called nonsegmental, the term suprasegmental, which was coined by American structuralists in the 1940s, is used to refer to functions that are "over" vowels and consonants.

Suprasegmental information applies to several different linguistic phenomena (including pitch, duration, and loudness). Suprasegmentals are often regarded as the "musical" aspects of speech.

 

Length: The length of a sound is the duration, or period of time, taken for its articulation. In most languages, length is a property of vowels. Jones (1979) defines the length of a sound as "the length of time during which it is held on continuously in a given word or phrase" (232). For example, /ɪ/ is a short vowel and /iː/ is a long vowel, and they create different meanings in words: /sɪt/ is the phonemic transcription of the word "sit" (to take a seat), and /siːt/ is the phonemic transcription of the word "seat" (a place to sit).

 

Stress: Stress is an extra force used in pronouncing a syllable. It is the degree of loudness, tenseness, sonority and muscular energy used while pronouncing a particular syllable. Jones (1979) describes stress as "the degree of force with which a sound or syllable is uttered". Gimson (1990) affirms that "the number of syllables stressed by the speaker depends largely upon the nature of words composing the utterance". Cross (1992) defines stress as "the articulation of a syllable with greater emphasis, or more force than others". Stress plays a distinctive (phonemic) role in English. The place of stress in the same word can signal different meanings and parts of speech. For example, "record" is a noun when the first syllable is stressed and a verb when the second syllable is stressed.
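The distinctive role of stress placement can be shown with a small sketch that capitalises the stressed syllable of a word. The word "record" is used as a commonly cited noun/verb pair, the syllable division is supplied by hand, and the function name is hypothetical.

```python
def mark_stress(syllables, stressed_index):
    """Join syllables, writing the stressed one in capitals."""
    return "".join(
        syllable.upper() if index == stressed_index else syllable
        for index, syllable in enumerate(syllables)
    )

record = ["re", "cord"]           # hand-divided syllables
noun = mark_stress(record, 0)     # stress on the first syllable
verb = mark_stress(record, 1)     # stress on the second syllable
print(noun, verb)
```

The segments are identical in both forms; only the position of the stress differs, which is why stress is treated as a suprasegmental feature.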

Role in Speech:

 

Intonation: Intonation is defined as the linguistic use of pitch at a sentence level. The rise or fall of pitch in the utterance of a phrase or sentence is called intonation. It is a property of the utterance as a whole. Harmer (1990) considers intonation as "the music of speech". Kelly (2006) defines intonation as "the way voice goes up and down in pitch when we are speaking".

Role in Speech:

 

Pause: A pause is a brief stop or break in speech. It can vary in length and can occur between words, phrases, or sentences.

Role in Speech:

 

Juncture: Juncture is a phonetic boundary between phonemes or syllables. It is related to pausing in the proper place while speaking. Carr (2008) describes juncture as "a boundary or transition point in a phonological sequence". Trask (2005) asserts that juncture is "any phonetic feature whose presence signals the existence of a grammatical boundary". The same phonological sequence may have different meanings when the pause falls in different places. For example, in /ən-eɪm/ the pause after /n/ gives the phrase "an aim", whereas in /ə-neɪm/ the pause after /ə/ gives the phrase "a name".
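The effect of juncture can be sketched by placing a boundary at different points in the same sequence of phonemes. The phoneme list and the hyphen notation follow the example above; the function name is hypothetical.

```python
def with_juncture(phonemes, boundary):
    """Insert a juncture mark (hyphen) before the phoneme at index `boundary`."""
    return "".join(phonemes[:boundary]) + "-" + "".join(phonemes[boundary:])

sequence = ["ə", "n", "eɪ", "m"]

an_aim = with_juncture(sequence, 2)   # boundary after /n/
a_name = with_juncture(sequence, 1)   # boundary after /ə/
print(an_aim, a_name)
```

Both results contain exactly the same segments in the same order, so the difference in meaning comes only from where the boundary falls.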

 

Pitch / Tone: The pitch of a sound is an auditory property that enables a listener to place it on a scale going from low to high, without considering its acoustic properties. Crystal (2003) defines pitch as "the attribute of auditory sensation in terms of which a sound may be ordered on a scale from low to high". Ladefoged (1982) asserts that "pitch variations that affect the meaning of a word are called tones". Richards, Platt and Platt (1999) define tone as "height of pitch and change of pitch which is associated with the pronunciation of syllables or words", and which affects the meaning of the words. The variation in pitch may give different kinds of information, such as the gender, age and emotional state of the speaker and the meanings of words.

Pitch Range: Pitch range refers to the variation in pitch from the lowest to the highest point in a speaker's voice. It is closely related to intonation.
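Pitch range can be quantified from a series of pitch measurements. The sketch below assumes a hypothetical list of fundamental frequency values in hertz and reports the range both in hertz and in semitones, using the standard relation of 12 semitones per doubling of frequency. The function name and the sample values are illustrative assumptions.

```python
import math

def pitch_range(f0_values):
    """Return the pitch range as (hertz, semitones) for a list of f0 values."""
    lowest, highest = min(f0_values), max(f0_values)
    hertz = highest - lowest
    semitones = 12 * math.log2(highest / lowest)  # 12 semitones per octave
    return hertz, semitones

# Hypothetical pitch contour of one utterance, in Hz
contour = [110.0, 150.0, 220.0, 180.0, 120.0]
range_hz, range_st = pitch_range(contour)
print(range_hz, range_st)
```

Semitones are often preferred when comparing speakers, because the same interval in hertz sounds larger in a low voice than in a high one.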

Role in Speech:

 

Rhythm: Rhythm in speech refers to the pattern of stressed and unstressed syllables or beats in spoken language. It is closely tied to the pace and flow of speech.

Role in Speech:

 

Tempo (Speech Rate): Tempo refers to the speed at which a speaker talks, including both fast and slow speech.
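Tempo is commonly expressed as a rate. The sketch below computes syllables per second and words per minute from counts and a duration; the figures in the example are invented for illustration, and the function names are hypothetical.

```python
def syllables_per_second(syllable_count, duration_seconds):
    """Speech rate as syllables per second."""
    return syllable_count / duration_seconds

def words_per_minute(word_count, duration_seconds):
    """Speech rate as words per minute."""
    return word_count * 60 / duration_seconds

# Hypothetical utterance: 12 syllables and 8 words spoken in 3 seconds
rate_syllables = syllables_per_second(12, 3.0)
rate_words = words_per_minute(8, 3.0)
print(rate_syllables, rate_words)
```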

Role in Speech:

 

Volume (Loudness): Volume refers to the loudness or softness of speech.

Role in Speech:

 

Summary of Supra-segmental

 

| Feature | Description | Examples/Effects |
|---|---|---|
| Intonation | Variation in pitch across speech. | Signals questions, statements, emotions. |
| Stress | Emphasis on syllables or words. | Distinguishes meaning (e.g., noun vs. verb). |
| Pause | Breaks in speech, from short to long pauses. | Clarifies meaning, adds emotion, controls tempo. |
| Rhythm | Pattern of stressed and unstressed syllables. | Varies across languages (stress-timed vs. syllable-timed). |
| Pitch Range | Variation between high and low pitch. | Expressiveness, signals meaning and emotion. |
| Tempo | Speed of speech. | Reflects urgency or calmness. |
| Volume | Loudness or softness of speech. | Conveys emotions, context, and urgency. |

 

In conclusion, supra-segmentals are essential in shaping how speech is interpreted. They provide critical context, emotional tone, and emphasis that help listeners understand not only the literal meaning of words but also the speaker's intentions, emotions, and the structure of their speech.