Japanese Phonemes and Phones

You’d be surprised how hard it is to get a straight answer on how many phonemes exist in Japanese. Most of the time, what you get are the phones of Japanese, which, of course, are quite different. Today we’ll try to settle the question once and for all:

If you’ve ever seen a Kana chart, then you’ve seen the Gojuuon, or the 50 sounds. This in itself gives us a really great starting point for finding the answer.

(Image: Gojuuon kana chart, courtesy of myjapaneseprofessor.com)

From here, we can guess that there have to be at least 5 vowels and 9 consonants, assuming that the solo ん and な,に,ぬ,ね, and の use the same /n/.

/k/ /s/ /t/ /n/ /h/ /m/ /y/ /r/ /w/ ||  /a/ /i/ /u/ /e/ /o/

But wait, there’s more!

We also know that Japanese voices the consonants of several columns with the dakuten, the ゛ mark you see after a kana.

With the dakuten, you get voiced versions of /k/, /s/, and /t/; you also get /b/, the voiced counterpart of the bilabial plosive /p/, which is what /h/ was once upon a time. So we have /g/, /z/, /d/, and /b/. We also have the handakuten, the ゜ you sometimes see after the kana starting with /h/, which turns /h/ into /p/.

/k/ /g/ /s/ /z/ /t/ /d/ /n/ /h/ /b/ /p/ /m/ /y/ /r/ /w/ ||  /a/ /i/ /u/ /e/ /o/

But wait, there’s more!

We also have the phenomenon of palatalization, which is represented by the ゃゅょ you can see after き, ぎ, し, じ, ち, に, ひ, び, ぴ, and み, effectively making digraphs.

For the sake of simplification, we will call these digraphs /ky/ /gy/ /sy/ /zy/ /ty/ /ny/ /hy/ /by/ /py/ and /my/. This does not correspond with the romanization system we use, but that’s okay. When we’re just transliterating, we’re not terribly concerned about phonemes as such.

Okay, so here’s our answer:

/k/ /g/ /ky/ /gy/ /s/ /sy/ /zy/ /z/ /t/ /ty/ /d/ /n/ /ny/ /h/ /hy/ /b/ /by/ /p/ /py/ /m/ /my/ /y/ /r/ /w/ ||  /a/ /i/ /u/ /e/ /o/

24 consonants and 5 vowels. (We’ll eventually concede them an extra nasal phoneme, but let’s pretend all non-frontal nasals are the same.)
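If you want to double-check the tally, here is a minimal Python sketch that rebuilds the inventory in the same steps we just took. The variable names are our own invention for illustration; the phoneme labels are the simplified ones used in this post, not a standard notation.

```python
# Rebuild the inventory step by step, mirroring the derivation above.
# All names here are illustrative; the labels follow this post's shorthand.
base = ["k", "s", "t", "n", "h", "m", "y", "r", "w"]  # read off the Gojuuon
dakuten = ["g", "z", "d", "b"]                         # voiced with the dakuten
handakuten = ["p"]                                     # /h/ plus the handakuten
palatalized = ["ky", "gy", "sy", "zy", "ty",
               "ny", "hy", "by", "py", "my"]           # digraphs with small ya/yu/yo
vowels = ["a", "i", "u", "e", "o"]

consonants = base + dakuten + handakuten + palatalized

# Sanity check: no phoneme was listed twice.
assert len(set(consonants)) == len(consonants)

print(len(consonants), "consonants,", len(vowels), "vowels")
```

Running it should report the same 24 and 5 we counted by hand.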

Now let’s figure out the phones.

In a perfect world, one would have one phone for one phoneme. But that’s really never the case. Fair warning, phonetics is a hotly debated topic, so if you talk to 5 different people, you’ll get 5 different answers.

The crossed out phonemes are the ones that we won’t worry much about because they do not have allophones.

/k/ /g/ /ky/ /gy/ /s/ /sy/ /zy/ /z/ /t/ /ty/ /d/ /n/ /ny/ /h/ /hy/ /b/ /by/ /p/ /py/ /m/ /my/ /y/ /r/ /w/ ||  /a/ /i/ /u/ /e/ /o/

Allophones are two or more phones that correspond to the same phoneme under different distributions. You’ll see what I mean in a minute.

Distribution refers to the surroundings of a phone, i.e. what comes before and after it. Depending on its distribution, the sound a phoneme takes will change. For example, take the words “house” and “Hugh.” Notice how the /h/ changes from an open sound to a constrained sound. The sound changes because of what comes after it, which is part of its distribution.

In Japanese, the most important factor in a phoneme’s distribution is what comes after it, meaning, in most cases, what vowel comes after the consonant. There are three instances where what comes before the phoneme is important: /g/, /u/, and /n/.


The easiest case to talk about is /u/, because everyone becomes savvy to this very quickly.

Let’s take two anime names: “Uzumaki” and “Sasuke.” Note that the middle /u/ in each is pronounced differently. In the first case, it’s pronounced; in the second, it isn’t.

When /u/ is between two voiceless consonants, it is silent.

Let’s look at one more case, that of verbs: “desu,” “kimasu,” “korosu.” Here /u/ is also silent. But if you look at verbs such as “kiku” and “naku,” the /u/ is pronounced.

In final position, if preceded by /s/, /u/ is silent.
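The two /u/ rules above are simple enough to write down as a small Python sketch. Everything here is an assumption made for illustration: the function name is made up, the input is a list of phoneme labels rather than real kana, and the set of voiceless consonants is our own reading of the inventory from earlier.

```python
# Voiceless consonants from this post's inventory (our own grouping).
VOICELESS = {"k", "s", "t", "h", "p", "ky", "sy", "ty", "hy", "py"}

def silent_u_positions(phonemes):
    """Return the indexes of every /u/ that the two rules mark as silent."""
    silent = []
    last = len(phonemes) - 1
    for i, p in enumerate(phonemes):
        if p != "u" or i == 0:
            continue  # not a /u/, or nothing precedes it
        prev = phonemes[i - 1]
        if i == last:
            # Rule 2: final /u/ preceded by /s/ is silent.
            if prev == "s":
                silent.append(i)
        elif prev in VOICELESS and phonemes[i + 1] in VOICELESS:
            # Rule 1: /u/ between two voiceless consonants is silent.
            silent.append(i)
    return silent

for word in ["sasuke", "uzumaki", "desu", "kiku"]:
    print(word, silent_u_positions(list(word)))
```

Splitting a word with `list(word)` only works here because each of these examples happens to use one letter per phoneme; a word with a digraph such as /ky/ would need to be passed in as a hand-built list.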

In the case of /g/, this is a dialectal thing, but it’s common enough to talk about. (I’d even say that it’s exaggerated in some instances where people are trying to speak clear, Standard Japanese.)


With /g/, there is a chance it will turn into the velar nasal: [ŋ]

[ŋ] is the sound spelled “ng” in “sing.” Don’t confuse it with Spanish’s “ñ,” which is the palatal nasal [ɲ].

You’d be surprised by how difficult it is to pin down when this phenomenon happens. It has probably changed a lot in the past 50 years. We have heard from people who learned Japanese 30 years ago that it’s supposed to be a posh feminine thing, but we’ve heard audio of men doing it all the time. Just keep in mind that it does happen.


There seems to be some consensus that the /n/’s in な, に, ぬ, ね, and の are all the same. It’s the alveolar nasal [n].

In the case of ん, where the thing following it is not a vowel, it will assimilate in position according to the features of the consonant. If it’s followed by a vowel, however, it will be [n], which is why we’ll try to keep it one phoneme. (Some linguists believe that this is actually a nasalized vowel, but we cannot quite see it.)

The most famous case of this kind of assimilation happening is with the word 先輩 (せんぱい/senpai), where it is pronounced with an [m], “sempai.” [m] is the bilabial nasal, and [p] is a bilabial plosive.

The same thing happens with the velar consonants [k] and [g]. 産休 (さんきゅう/sankyuu) will be pronounced with our friend [ŋ].

In cases where it is at the end of the sentence, it will be pronounced as the uvular nasal [ɴ].

When /n/ precedes a bilabial consonant, it is [m].

When /n/ precedes a velar consonant, it is [ŋ].

When /n/ is in final position, it is [ɴ].

In all other cases, /n/ is [n].
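The four rules for /n/ can be sketched as a tiny Python function. This is only an illustration under our own assumptions: the function name is hypothetical, the argument is the phoneme that follows /n/ (or `None` when nothing follows), and we have chosen to count the palatalized digraphs as bilabial or velar along with their plain counterparts, as the sankyuu example suggests.

```python
# Place-of-articulation groups, using this post's phoneme labels.
# Including the palatalized digraphs here is our own assumption.
BILABIAL = {"p", "b", "m", "py", "by", "my"}
VELAR = {"k", "g", "ky", "gy"}

def n_allophone(following):
    """Pick the phone for /n/ given the next phoneme (None if /n/ is final)."""
    if following is None:
        return "ɴ"  # final position: uvular nasal
    if following in BILABIAL:
        return "m"  # assimilates to a bilabial
    if following in VELAR:
        return "ŋ"  # assimilates to a velar
    return "n"      # all other cases

print(n_allophone("p"))   # the /n/ in senpai
print(n_allophone("ky"))  # the /n/ in sankyuu
print(n_allophone(None))  # a word-final /n/
```

The order of the checks follows the order of the rules above, with the catch-all [n] last.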


/r/ is another famous case. This is a case of two allophones having “equal distribution” (what linguists usually call free variation), i.e. you can exchange one for another and you’ll never really have a problem.

/r/ is both a flap and a lateral. To put it colloquially, you can pronounce it either as an r or as an l. Some people have developed rationales for when it is pronounced as the flap [ɾ] and when as the lateral [l], but there is a lot of inconsistency between them.


There exist a couple of variations with /h/ (and aspirations with certain plosives), but the one case we need to talk about is when /h/ precedes /u/.

The tendency is to place /h/ generally in the glottal area, so the glottal fricative [h].

Before /u/, /h/ can be pronounced as a bilabial fricative [ɸ], the labiodental fricative [f], or the glottal fricative [h].

The standard pronunciation, to our understanding, is [ɸ], but the others are heard.


/w/ is a dying phoneme. There originally existed four Kana for it (which can still be heard in the Iroha poem): わ, ゐ (wi), ゑ (we), and を. Now only two are in use: わ and を; and を is dying out. If it were not for its functional use, for clearly marking the accusative case particle, it would be gone as well.

We bring up /w/ because there are those who pronounce the /w/ before /o/ and those who don’t. Both seem to be fine, though some may claim that pronouncing it is an abnormality known as hypercorrection, where a sound is produced unnaturally for the sake of some kind of regularity (this is common in cases of assimilation, such as pronouncing the final /s/ in “rings” as an [s] and not as a [z]).

Regardless, there is an older generation that did learn to pronounce を with the /w/.

When /w/ precedes /o/, it can either be silent or pronounced.

/t/ /d/ /s/ /z/

These four consonants undergo a similar transformation before /i/.

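The consensus is that they become alveolo-palatal (loosely speaking, post-alveolar) sounds: affricates in the case of /t/ and /d/, fricatives in the case of /s/ and /z/.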

/t/ before /i/ becomes [tɕ]

/d/ before /i/ becomes [dʑ]

/s/ before /i/ becomes [ɕ]

/z/ before /i/ becomes [ʑ]

It’s worth noting here that the palatal phonemes /ty/, /sy/, and /zy/ have these very same phones.

Then, in the case of /t/ and /d/, when they precede /u/, they become affricates, but alveolar and not post-alveolar, so [t͡s] [d͡z].

/t/ before /u/ becomes [t͡s]

/d/ before /u/ becomes [d͡z]
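Since these rules are really just a lookup from a consonant and the vowel after it to a phone, here is a minimal Python sketch of them as a table. The names `PHONES` and `realize` are made up for this example, and falling back to the plain consonant for every other combination is our simplification, not a claim about the rest of the sound system.

```python
# (consonant, following vowel) -> phone, copied from the rules above.
PHONES = {
    ("t", "i"): "tɕ",
    ("d", "i"): "dʑ",
    ("s", "i"): "ɕ",
    ("z", "i"): "ʑ",
    ("t", "u"): "t͡s",
    ("d", "u"): "d͡z",
}

def realize(consonant, vowel):
    """Look up the phone for a consonant before a vowel.

    Any pair not in the table falls back to the plain consonant,
    which is a simplification made for this sketch.
    """
    return PHONES.get((consonant, vowel), consonant)

for pair in [("t", "i"), ("s", "i"), ("t", "u"), ("s", "a")]:
    print(pair, "->", realize(*pair))
```

A table like this makes it easy to see that /s/ and /z/ change only before /i/, while /t/ and /d/ change before both /i/ and /u/.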

And that, friends, covers most of everything.

There are some things involving loanwords and nuances, such as /s/ remaining [s] before /i/ and there being a [v] and not just a [b], but that’s secondary to all this.