The production of speech sounds

All the sounds we make when we speak are the result of muscles contracting. The muscles in the chest that we use for breathing produce the flow of air that is needed for almost all speech sounds; muscles in the larynx produce many different modifications in the flow of air from the chest to the mouth. After passing through the larynx, the air goes through what we call the vocal tract, which ends at the mouth and nostrils; we call the part comprising the mouth the oral cavity and the part that leads to the nostrils the nasal cavity. Here the air from the lungs escapes into the atmosphere. We have a large and complex set of muscles that can produce changes in the shape of the vocal tract, and in order to learn how the sounds of speech are produced it is necessary to become familiar with the different parts of the vocal tract. These different parts are called articulators, and the study of them is called articulatory phonetics.
Fig. 3.1 is a diagram that is used frequently in the study of phonetics. It represents the human head, seen from the side, displayed as though it had been cut in half. You will need to look at it carefully as the articulators are described, and you will find it useful to have a mirror and a good light placed so that you can look at the inside of your mouth.

i) The pharynx is a tube which begins just above the larynx. It is about 7 cm long in women and about 8 cm in men, and at its top end it is divided into two, one part being the back of the oral cavity and the other being the beginning of the way through the nasal cavity. If you look in your mirror with your mouth open, you can see the back of the pharynx.

ii) The soft palate or velum is seen in the diagram in a position that allows air to pass through the nose and through the mouth. Yours is probably in that position now, but often in speech it is raised so that air cannot escape through the nose. The other important thing about the soft palate is that it is one of the articulators that can be touched by the tongue. When we make the sounds k, g the tongue is in contact with the lower side of the soft palate, and we call these velar consonants.
iii) The hard palate is often called the “roof of the mouth”. You can feel its smooth curved surface with your tongue. A consonant made with the tongue close to the hard palate is called palatal. The sound j in ‘yes’ is palatal.
iv) The alveolar ridge is between the top front teeth and the hard palate. You can feel its shape with your tongue. Its surface is really much rougher than it feels, and is covered with little ridges. You can only see these if you have a mirror small enough to go inside your mouth, such as those used by dentists. Sounds made with the tongue touching here (such as t, d, n) are called alveolar.
v) The tongue is a very important articulator and it can be moved into many different places and different shapes. It is usual to divide the tongue into different parts, though there are no clear dividing lines within its structure. Fig. 3.2 shows the tongue on a larger scale with these parts shown: tip, blade, front, back and root. (This use of the word “front” often seems rather strange at first.)
vi) The teeth (upper and lower) are usually shown in diagrams like Fig. 3.1 only at the front of the mouth, immediately behind the lips. This is for the sake of a simple diagram, and you should remember that most speakers have teeth to the sides of their mouths, back almost to the soft palate. The tongue is in contact with the upper side teeth for most speech sounds. Sounds made with the tongue touching the front teeth, such as English θ, ð, are called dental.

vii) The lips are important in speech. They can be pressed together (when we produce the sounds p, b), brought into contact with the teeth (as in f, v), or rounded to produce the lip-shape for vowels like u:. Sounds in which the lips are in contact with each other are called bilabial, while those with lip-to-teeth contact are called labiodental.
The seven articulators described above are the main ones used in speech, but there are a few other things to remember. Firstly, the larynx could also be described as an articulator – a very complex and independent one. Secondly, the jaws are sometimes called articulators; certainly we move the lower jaw a lot in speaking. But the jaws are not articulators in the same way as the others, because they cannot themselves make contact with other articulators. Finally, although there is practically nothing active that we can do with the nose and the nasal cavity when speaking, they are a very important part of our equipment for making sounds (which is sometimes called our vocal apparatus), particularly nasal consonants such as m, n. Again, we cannot really describe the nose and the nasal cavity as articulators in the same sense as (i) to (vii) above.
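As a study aid, the place labels introduced in (i) to (vii) can be collected into a small lookup table. The sketch below is only an illustration: the names PLACE_OF_ARTICULATION and place_of are invented here, and the table holds only the example sounds that the descriptions above actually name.

```python
# Place labels and example sounds taken from the descriptions of the
# articulators above. The names used here are illustrative only.
PLACE_OF_ARTICULATION = {
    "velar": ["k", "g"],          # tongue against the soft palate
    "palatal": ["j"],             # tongue close to the hard palate
    "alveolar": ["t", "d", "n"],  # tongue touching the alveolar ridge
    "dental": ["θ", "ð"],         # tongue touching the front teeth
    "bilabial": ["p", "b"],       # lips in contact with each other
    "labiodental": ["f", "v"],    # lip-to-teeth contact
}


def place_of(sound):
    """Return the place label for a sound, or None if it is not listed."""
    for place, sounds in PLACE_OF_ARTICULATION.items():
        if sound in sounds:
            return place
    return None


if __name__ == "__main__":
    for sound in ["k", "j", "ð", "f"]:
        print(sound, "is", place_of(sound))
```

A sound that the text has not classified, such as h, returns None and is not guessed at.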
The words vowel and consonant are very familiar ones, but when we study the sounds of speech scientifically we find that it is not easy to define exactly what they mean. The most common view is that vowels are sounds in which there is no obstruction to the flow of air as it passes from the larynx to the lips. A doctor who wants to look at the back of a patient’s mouth often asks them to say “ah”; making this vowel sound is the best way of presenting an unobstructed view. But if we make a sound like s, d it can be clearly felt that we are making it difficult or impossible for the air to pass through the mouth. Most people would have no doubt that sounds like s, d should be called consonants. However, there are many cases where the decision is not so easy to make. One problem is that some English sounds that we think of as consonants, such as the sounds at the beginning of the words ‘hay’ and ‘way’, do not really obstruct the flow of air more than some vowels do. Another problem is that different languages have different ways of dividing their sounds into vowels and consonants; for example, the usual sound produced at the beginning of the word ‘red’ is felt to be a consonant by most English speakers, but in some other languages (e.g. Mandarin Chinese) the same sound is treated as one of the vowels.
If we say that the difference between vowels and consonants is a difference in the way that they are produced, there will inevitably be some cases of uncertainty or disagreement; this is a problem that cannot be avoided. It is possible to establish two distinct groups of sounds (vowels and consonants) in another way. Consider English words beginning with the sound h; what sounds can come next after this h? We find that most of the sounds we normally think of as vowels can follow (e.g. e in the word ‘hen’), but practically none of the sounds we class as consonants, with the possible exception of j in a word such as ‘huge’. Now think of English words beginning with the two sounds bI; we find many cases where a consonant can follow (e.g. d in the word ‘bid’, or l in the word ‘bill’), but practically no cases where a vowel may follow. What we are doing here is looking at the different contexts and positions in which particular sounds can occur; this is the study of the distribution of the sounds, and is of great importance in phonology. Study of the sounds found at the beginning and end of English words has shown that two groups of sounds with quite different patterns of distribution can be identified, and these two groups are those of vowel and consonant. If we look at the vowel-consonant distinction in this way, we must say that the most important difference between vowel and consonant is not the way that they are made, but their different distributions. It is important to remember that the distribution of vowels and consonants is different for each language.
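The distributional test described above can be tried out on a very small scale. The following sketch assumes a toy lexicon of six words, transcribed by hand as lists of sounds in the notation of this course; the lexicon and the function name sounds_after are hypothetical, and a real study would need a full pronouncing dictionary.

```python
# A toy lexicon: each word is a list of sounds. This tiny sample is an
# assumption made for illustration, not a real dictionary.
LEXICON = {
    "hen": ["h", "e", "n"],
    "hat": ["h", "æ", "t"],
    "huge": ["h", "j", "u:", "dʒ"],
    "bid": ["b", "I", "d"],
    "bill": ["b", "I", "l"],
    "bit": ["b", "I", "t"],
}


def sounds_after(prefix, lexicon):
    """Collect every sound that directly follows the given initial sounds."""
    n = len(prefix)
    return {
        segments[n]
        for segments in lexicon.values()
        if len(segments) > n and segments[:n] == prefix
    }


if __name__ == "__main__":
    print("after h:  ", sorted(sounds_after(["h"], LEXICON)))
    print("after b I:", sorted(sounds_after(["b", "I"], LEXICON)))
```

In this sample the sounds found after h are e, æ and j, while the sounds found after b I are d, l and t. The two sets do not overlap, which is the kind of pattern the distributional approach relies on.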
We begin the study of English sounds in this course by looking at vowels, and it is necessary to say something about vowels in general before turning to the vowels of English. We need to know in what ways vowels differ from each other. The first matter to consider is the shape and position of the tongue. It is usual to simplify the very complex possibilities by describing just two things: firstly, the vertical distance between the upper surface of the tongue and the palate and, secondly, the part of the tongue, between front and back, which is raised highest. Let us look at some examples:
i) Make a vowel like the i: in the English word ‘see’ and look in a mirror; if you tilt your head back slightly you will be able to see that the tongue is held up close to the roof of the mouth. Now make an æ vowel (as in the word ‘cat’) and notice how the distance between the surface of the tongue and the roof of the mouth is now much greater. The difference between i: and æ is a difference of tongue height, and we would describe i: as a relatively close vowel and æ as a relatively open vowel. Tongue height can be changed by moving the tongue up or down, or moving the lower jaw up or down. We would illustrate the tongue height difference between i: and æ as in Fig. 3.3.

ii) In making the two vowels described above, it is the front part of the tongue that is raised. We could therefore describe /i:/ and /æ/ as comparatively front vowels. By changing the shape of the tongue we can produce vowels in which a different part of the tongue is the highest point. A vowel in which the back of the tongue is the highest point is called a back vowel. If you make the vowel in the word ‘calm’, which we write phonetically as /ɑ:/, you can see that the back of the tongue is raised. Compare this with /æ/ in front of a mirror; /æ/ is a front vowel and /ɑ:/ is a back vowel. The vowel in ‘too’ /u:/ is also a comparatively back vowel, but compared with /ɑ:/ it is close.
So now we have seen how four vowels differ from each other; we can show this in a simple diagram.
               Front          Back
Close          i:             u:
Open           æ              ɑ:
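The same two-way classification can be written as a small table of records, which makes it easy to ask questions such as "which of these vowels are close?". This is a minimal sketch covering only the four vowels discussed so far; the names Vowel, FOUR_VOWELS and vowels_with are invented for the example.

```python
from typing import NamedTuple


class Vowel(NamedTuple):
    symbol: str
    height: str    # "close" or "open"
    backness: str  # "front" or "back"


# The four vowels compared in the text, with the labels given there.
FOUR_VOWELS = [
    Vowel("i:", "close", "front"),
    Vowel("u:", "close", "back"),
    Vowel("æ", "open", "front"),
    Vowel("ɑ:", "open", "back"),
]


def vowels_with(height=None, backness=None):
    """Return the symbols that match the requested height and/or backness."""
    return [
        v.symbol
        for v in FOUR_VOWELS
        if (height is None or v.height == height)
        and (backness is None or v.backness == backness)
    ]


if __name__ == "__main__":
    print("close vowels:", vowels_with(height="close"))
    print("back vowels: ", vowels_with(backness="back"))
```

The labels are relative ones, as in the text: "close" and "open" here only rank these four vowels against each other.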
It has become traditional to locate cardinal vowels on a four-sided figure (a quadrilateral of the shape seen in Fig. 7 – the design used here is the one recommended by the International Phonetic Association). The exact shape is not really important – a square would do quite well – but we will use the traditional shape. The vowels in Fig. 7 are the so-called primary cardinal vowels; these are the vowels that are most familiar to the speakers of most European languages, and there are other cardinal vowels (secondary cardinal vowels) that sound less familiar. In this course cardinal vowels are printed within square brackets [ ] to distinguish them clearly from English vowel sounds.

There is another important variable of vowel quality, and that is lip-position. Although the lips can have many different shapes and positions, we will at this stage consider only three possibilities. These are:
i)  Rounded, where the corners of the lips are brought towards each other and the lips pushed forwards. This is most clearly seen in cardinal vowel no. 8 [u].
ii) Spread, with the corners of the lips moved away from each other, as for a smile. This is most clearly seen in cardinal vowel no. 1 [i].
iii) Neutral, where the lips are not noticeably rounded or spread. The noise most English people make when they are hesitating (written ‘er’) has neutral lip position.
English has a large number of vowel sounds; the first ones to be examined are short vowels. The symbols for these short vowels are: I, e, æ, ʌ, ɒ and ʊ. Short vowels are only relatively short; as we shall see later, vowels can have quite different lengths in different contexts.
Each vowel is described in relation to the cardinal vowels.
/I/       (example words: ‘bit’, ‘pin’, ‘fish’) The diagram shows that, though this vowel is in the close front area, it is more open and nearer to the centre than cardinal vowel no. 1 [i]. The lips are slightly spread.
/e/       (example words: ‘bet’, ‘men’, ‘yes’) This is a front vowel between cardinal vowel no. 2 [e] and no. 3 [ɛ]. The lips are slightly spread.
/æ/       (example words: ‘bat’, ‘man’, ‘gas’) This vowel is front, but not quite as open as cardinal vowel no. 4 [a]. The lips are slightly spread.
/ʌ/        (example words: ‘cut’, ‘come’, ‘rush’) This is a central vowel, and the diagram shows that it is more open than the open-mid tongue height. The lip position is neutral.
/ɒ/       (example words: ‘pot’, ‘gone’, ‘cross’) This vowel is not quite fully back, and between open-mid and open in tongue height. The lips are slightly rounded.
/ʊ/       (example words: ‘put’, ‘pull’, ‘push’) The nearest cardinal vowel is no. 8 [u], but it can be seen that /ʊ/ is more open and nearer to central. The lips are rounded.
There is one other short vowel, for which the symbol is  Ə. This central vowel – which is called schwa – is a very familiar sound in English; it is heard in the first syllable of the words ‘about’, ‘oppose’, ‘perhaps’, for example.
English Long vowels
If we compare some similar pairs of long and short vowels, for example /i:/ with /I/, or /u:/ with /ʊ/, or /æ/ with /ɑ:/, we can see distinct differences in quality (resulting from differences in tongue shape and position, and lip position) as well as in length. For this reason, all the long vowels have symbols which are different from those of short vowels; you can see that the long and short vowel symbols would still all be different from each other even if we omitted the length mark, so it is important to remember that the length mark is used not because it is essential but because it helps learners to remember the length difference. Perhaps the only case where a long and a short vowel are closely similar in quality is that of /ɜ:/ and /Ə/, but /Ə/ is a special case.

/i:/      (example words: ‘beat’, ‘mean’, ‘peace’) This vowel is nearer to cardinal vowel no. 1 [i] (i.e. it is closer and more front) than is the short vowel of ‘bid’, ‘pin’, ‘fish’. Although the tongue shape is not much different from cardinal vowel no. 1, the lips are only slightly spread and this results in a rather different vowel quality.
/ɜ:/      (example words: ‘bird’, ‘fern’, ‘purse’) This is a mid-central vowel which is used in most English accents as a hesitation sound (written ‘er’), but which many learners find difficult to copy. The lip position is neutral.
/ɑ:/      (example words: ‘card’, ‘half’, ‘pass’) This is an open vowel in the region of cardinal vowel no. 5 [ɑ], but not as back as this. The lip position is neutral.
/ɔ:/    (example words: ‘board’, ‘torn’, ‘horse’) The tongue height for this vowel is between cardinal vowel no. 6 [ɔ] and no. 7 [o], and closer to the latter. This vowel is almost fully back and has quite strong lip-rounding.
/u:/     (example words: ‘food’, ‘soon’, ‘loose’) The nearest cardinal vowel to this is no. 8 [u], but BBC /u:/ is much less back and less close, while the lips are only moderately rounded.
In terms of length, diphthongs are similar to the long vowels described above. Perhaps the most important thing to remember about all the diphthongs is that the first part is much longer and stronger than the second part; for example, most of the diphthong /aI/ (as in the words ‘eye’, ‘I’) consists of the /a/ vowel, and only in about the last quarter of the diphthong does the glide to /I/ become noticeable. As the glide to /I/ happens, the loudness of the sound decreases. As a result, the /I/ part is shorter and quieter. Foreign learners should, therefore, always remember that the last part of English diphthongs must not be made too strongly.
The total number of diphthongs is eight (though /ʊƏ/ is increasingly rare). The easiest way to remember them is in terms of three groups divided as in this diagram.

The centring diphthongs glide towards the /Ə/ (schwa) vowel, as the symbols indicate.

The closing diphthongs have the characteristic that they all end with a glide towards a closer vowel. Because the second part of the diphthong is weak, they often do not reach a position that could be called close. The important thing is that a glide from a relatively more open towards a relatively closer vowel is produced.
Three of the diphthongs glide towards /I/, as described below:

Two diphthongs glide towards /ʊ/, so that as the tongue moves closer to the roof of the mouth there is at the same time a rounding movement of the lips. This movement is not a large one, again because the second part of the diphthong is weak.
/Əʊ/ (example words: ‘load’, ‘home’, ‘most’) The vowel position for the beginning of this is the same as for the “schwa” vowel /Ə/, as found in the first syllable of the word ‘about’. The lips may be slightly rounded in anticipation of the glide towards /ʊ/, for which there is quite noticeable lip-rounding.
/aʊ/ (example words: ‘loud’, ‘gown’, ‘house’) This diphthong begins with a vowel similar to the start of /aI/. Since this is an open vowel, a glide to /ʊ/ would necessitate a large movement, and the tongue often does not reach the /ʊ/ position. There is only slight lip-rounding.
3.3 Triphthongs
The most complex English sounds of the vowel type are the triphthongs. They can be rather difficult to pronounce, and very difficult to recognise. A triphthong is a glide from one vowel to another and then to a third, all produced rapidly and without interruption. For example, a careful pronunciation of the word ‘hour’ begins with a vowel quality similar to /ɑ:/, goes on to a glide towards the back close rounded area (for which we use the symbol /ʊ/), then ends with a mid-central vowel (schwa, /Ə/). We use the symbol /aʊƏ/ to represent the pronunciation of ‘hour’, but this is not always an accurate representation of the pronunciation.
The triphthongs can be looked on as being composed of the five closing diphthongs described in the last section, with /Ə/ added on the end. Thus we get:
eI + Ə  =  eIƏ                   Əʊ + Ə  =  ƏʊƏ
aI + Ə  =  aIƏ                   aʊ + Ə  =  aʊƏ
ɔI + Ə  =  ɔIƏ
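Because each triphthong is simply a closing diphthong followed by schwa, the full set can be generated mechanically. The short sketch below does this with the symbols used in this course; the names CLOSING_DIPHTHONGS, SCHWA and build_triphthongs are chosen only for this illustration.

```python
# The five closing diphthongs, written with the symbols used in this course.
CLOSING_DIPHTHONGS = ["eI", "aI", "ɔI", "Əʊ", "aʊ"]
SCHWA = "Ə"


def build_triphthongs(diphthongs, final=SCHWA):
    """Form one triphthong per diphthong by adding the final vowel."""
    return [diphthong + final for diphthong in diphthongs]


if __name__ == "__main__":
    for source, result in zip(CLOSING_DIPHTHONGS,
                              build_triphthongs(CLOSING_DIPHTHONGS)):
        print(f"{source} + {SCHWA} = {result}")
```

This reflects only how the symbols are built up; it says nothing about how far the tongue actually moves in a given pronunciation.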
The principal cause of difficulty for the foreign learner is that in present-day English the extent of the vowel movement is very small, except in very careful pronunciation. Because of this, the middle of the three vowel qualities of the triphthong (i.e. the /I/ or /ʊ/ part) can hardly be heard and the resulting sound is difficult to distinguish from some of the diphthongs and long vowels. To add to the difficulty, there is also the problem of whether a triphthong is felt to contain one or two syllables. Words such as ‘fire’ /faIƏ/ or ‘hour’ /aʊƏ/ are probably felt by most English speakers (with BBC pronunciation) to consist of only one syllable, whereas ‘player’ /pleIƏ/ or ‘slower’ /slƏʊƏ/ are more likely to be heard as two syllables.
We will not go through a detailed description of each triphthong. This is partly because there is so much variation in the amount of vowel movement according to how slow and careful the pronunciation is, and also because the “careful” pronunciation can be found by looking at the description of the corresponding diphthong and adding /Ə/ to the end. However, to help identify these triphthongs, some example words are given here:
/eIƏ/ ‘layer’, ‘player’               /ƏʊƏ/ ‘lower’, ‘mower’
/aIƏ/ ‘liar’, ‘fire’                  /aʊƏ/ ‘power’, ‘hour’
/ɔIƏ/ ‘loyal’, ‘royal’

