Robin Allott. 1974. Language and Speech 17: 377-402.
A survey of colour-names in many different languages shows that there is a greater resemblance between them in geographically distant and unrelated languages than can plausibly be explained as the result of chance, borrowing or hitherto unsuspected language relationships. This suggests a universal tendency for there to be a relation between the meaning, the percept and the phonological form which does not operate absolutely but tends to restrict the sounds used and increase the probability that certain patterns of words will be found for certain percepts, particularly where the percepts are sharply defined and uniformly recognized.
The starting point for this article is the excellent study by Berlin and Kay (1969) published under the title 'Basic Color Terms, their Universality and Evolution'. The direction and conclusion of that study can be described briefly as follows. Current doctrine in linguistics and anthropology holds that each language and culture expresses a unique world view by its particular way of slicing up reality into named categories. This doctrine emphasizes the difficulty of word-for-word translations between languages, and interprets this difficulty as evidence that each language (and culture) predisposes its hearers to see the world in terms different from those employed by other languages (and cultures). The typical example of this tenet, which appears most frequently in texts on linguistics and anthropology, is colour vocabulary. According to accepted doctrine, words for basic colours are not translatable across languages; each language encodes colour perceptions into basic colour words (e.g. white, black, red) in a way that is totally arbitrary with respect to comparable encoding in other unrelated languages. However, on the basis of a careful linguistic and psychophysical investigation in ninety-eight languages of diverse language families, this traditional doctrine is rejected. Berlin and Kay find that eleven psychophysically defined colours serve as the perceptual focal points of all the basic colour words in all the languages of the world. This set of eleven psychophysically defined percepts thus constitutes a substantive semantic universal. Basic colour words are translatable. Furthermore, though it has no particular relevance for this article, they found that words for the basic colours arose in different languages in a particular sequence: all languages with only two basic colour words have words for black and white; languages with exactly three basic colour words have words for black, white and red; and so on. They interpret this ordering as an evolutionary one.
The following points can be picked out from their study as of significance for the purposes of this article:
(a) their collection in a very careful way of the words for different colours from 98 languages belonging to very many different language families from widely separated parts of the world;
(b) their highly interesting and comprehensive testing of the perceived colours in fact associated with particular colour-names in different languages through mapping of colour terms with the aid of a display of 329 colour chips;
(c) the extent to which their conclusion that semantic universals do exist in the domain of colour vocabulary carries conviction in the light of their research i.e. the word for RED in different languages does in fact mean the same perceived RED;
(d) their rejection of semantic arbitrariness, controverting Sapir and Whorf: "We suspect that this allegation of total arbitrariness in the way languages segment the colour space is a gross overstatement."
Standardised colour stimuli were used in conducting the research. These consisted of a set of 329 colour chips, composed of 320 colour chips of forty equally spaced hues and eight degrees of brightness, all at maximum saturation and nine chips of neutral hue (white, black and greys). Their data were gathered in two stages. First the basic colour words of the language in question were elicited from the informant, using as little as possible of any other language. Secondly, each subject was instructed, using the array of colour-chips, to map both the focal point and the outer boundary of each of his basic colour terms. No informant was asked to map his colour terms until the investigator had elicited verbally his full list of basic colour terms. The languages studied were genetically diverse. All informants were native speakers of their respective languages. The primary data included basic colour terminologies for the following languages: Arabic (Lebanon), Bulgarian, Catalan, Cantonese, Mandarin, English, Hebrew, Hungarian, Ibibio (Nigeria), Indonesian, Japanese, Korean, Pomo (California), Spanish, Swahili, Tagalog (Philippines), Thai, Tzeltal (Southern Mexico), Urdu and Vietnamese. Their main conclusions were based on study with these languages. To confirm their findings they searched the literature for reports on colour terminologies in other languages and gathered what they described as reasonably reliable information on seventy-eight languages in addition to the twenty for which they had experimental data. They found that the information from these other languages conformed almost totally to the conclusions they had drawn from their experimental data. The main conclusion from the experimental data was that the foci of (perceived) colour categories are similar among totally unrelated languages. The location of colour foci varies no more between speakers of different languages than between speakers of the same language.
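The figures in the preceding paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are invented here, and the numbers and language names are taken directly from the description above.

```python
# Figures as stated in the text above; all names are illustrative only.
HUES = 40               # equally spaced hues
BRIGHTNESS_LEVELS = 8   # degrees of brightness
NEUTRAL_CHIPS = 9       # white, black and greys

chromatic_chips = HUES * BRIGHTNESS_LEVELS
total_chips = chromatic_chips + NEUTRAL_CHIPS

# Languages for which experimental data were gathered, as listed above.
EXPERIMENTAL_LANGUAGES = [
    "Arabic", "Bulgarian", "Catalan", "Cantonese", "Mandarin", "English",
    "Hebrew", "Hungarian", "Ibibio", "Indonesian", "Japanese", "Korean",
    "Pomo", "Spanish", "Swahili", "Tagalog", "Thai", "Tzeltal", "Urdu",
    "Vietnamese",
]
LITERATURE_LANGUAGES = 78  # additional languages drawn from the literature

total_languages = len(EXPERIMENTAL_LANGUAGES) + LITERATURE_LANGUAGES

print(chromatic_chips, total_chips, total_languages)
```

The three values agree with the 320 chromatic chips, the 329 chips in all, and the ninety-eight languages mentioned in the text.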
For the purposes of their study, they grouped the ninety-eight languages studied into seven stages of an evolutionary sequence running from primitive languages with words only for WHITE and BLACK to more advanced languages with words for the whole range of colours. The classification was as follows:
STAGE I: WHITE BLACK
Nine languages: 7 New Guinea, 1 Congo, 1 South India
STAGE II: WHITE BLACK RED
Twenty-one languages: 2 Amerindian, 16 African, 1 Pacific, 1 Australian Aboriginal, 1 South India
STAGE IIIa: WHITE BLACK RED GREEN
Eight languages: 6 African, 1 Philippine, 1 New Guinea
STAGE IIIb: WHITE BLACK RED YELLOW
Nine languages: 2 Australian Aboriginal, 1 Philippine, 3 Polynesian, 1 Greek (Homeric), 2 African
STAGE IV: WHITE BLACK RED GREEN YELLOW
Eighteen languages: 12 Amerindian, 1 Sumatra, 4 African, 1 Eskimo
STAGE V: WHITE BLACK RED GREEN YELLOW BLUE
Eight languages: 5 African, 1 Chinese, 1 Philippine, 1 South India
STAGE VI: WHITE BLACK RED GREEN YELLOW BLUE BROWN
Five languages: 2 African, 1 Sumatra, 1 South India, 1 Amerindian
STAGE VII: COMPLETE ARRAY OF COLOURS
Twenty languages: 1 Arabic, 2 Malayan, 6 European, 1 Chinese, 1 Indian, 2 African, 1 Hebrew, 1 Japanese, 1 Korean, 2 South East Asian, 1 Amerindian, 1 Philippine
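The classification above can be written down as a small data table, which makes it easy to confirm that each regional breakdown adds up to the stated number of languages and that the stages together account for all ninety-eight. This is a minimal sketch: the names `STAGES`, `breakdown_matches` and `stage_of` are invented for illustration, and the term list for Stage VII is left empty because the text gives it only as the "complete array of colours".

```python
# Stage -> (basic colour terms, stated number of languages, regional breakdown),
# transcribed from the classification above. Stage VII is given in the text only
# as the "complete array of colours", so its term list is left as None here.
STAGES = {
    "I": (("WHITE", "BLACK"), 9,
          {"New Guinea": 7, "Congo": 1, "South India": 1}),
    "II": (("WHITE", "BLACK", "RED"), 21,
           {"Amerindian": 2, "African": 16, "Pacific": 1,
            "Australian Aboriginal": 1, "South India": 1}),
    "IIIa": (("WHITE", "BLACK", "RED", "GREEN"), 8,
             {"African": 6, "Philippine": 1, "New Guinea": 1}),
    "IIIb": (("WHITE", "BLACK", "RED", "YELLOW"), 9,
             {"Australian Aboriginal": 2, "Philippine": 1, "Polynesian": 3,
              "Greek (Homeric)": 1, "African": 2}),
    "IV": (("WHITE", "BLACK", "RED", "GREEN", "YELLOW"), 18,
           {"Amerindian": 12, "Sumatra": 1, "African": 4, "Eskimo": 1}),
    "V": (("WHITE", "BLACK", "RED", "GREEN", "YELLOW", "BLUE"), 8,
          {"African": 5, "Chinese": 1, "Philippine": 1, "South India": 1}),
    "VI": (("WHITE", "BLACK", "RED", "GREEN", "YELLOW", "BLUE", "BROWN"), 5,
           {"African": 2, "Sumatra": 1, "South India": 1, "Amerindian": 1}),
    "VII": (None, 20,
            {"Arabic": 1, "Malayan": 2, "European": 6, "Chinese": 1, "Indian": 1,
             "African": 2, "Hebrew": 1, "Japanese": 1, "Korean": 1,
             "South East Asian": 2, "Amerindian": 1, "Philippine": 1}),
}


def breakdown_matches(stage):
    """True if the regional counts for a stage sum to the stated total."""
    _, stated, regions = STAGES[stage]
    return sum(regions.values()) == stated


def stage_of(inventory):
    """Return the stage whose term set equals the given inventory, else None."""
    wanted = frozenset(inventory)
    for label, (terms, _, _) in STAGES.items():
        if terms is not None and frozenset(terms) == wanted:
            return label
    return None


total_languages = sum(stated for _, stated, _ in STAGES.values())
```

For example, `stage_of(["BLACK", "WHITE", "RED"])` yields Stage II, while an inventory such as WHITE, BLACK and BLUE matches no stage in the sequence and returns `None`.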
The basis for the examination of the colour-words in the remainder of this article is the statement that there exist universally for humans eleven basic perceptual colour categories which serve as the psychophysical referents of the eleven or fewer basic colour terms in any language. The quotation by Berlin and Kay from Jakobson and Halle's discussion (1956) of the relation between sound and colour distinctions is of particular relevance: "a cautious study of synaesthetic associations between phonemic features and colour attributes should yield clues to the perceptual aspects of speech sounds". This is the underlying purpose of this study.
Berlin and Kay's study, with its systematic examination of the relation between colour-perception and colour-naming in different languages, serves as a convenient basis for exploration of some of the questions involved in the perennial discussion of sound symbolism - or, better, of the issue of synaesthetic associations between phonemic features and perceptual aspects of speech sounds referred to by Jakobson and Halle. This involves analysing the material in Berlin and Kay on somewhat different lines, looking at the similarities and differences between the words used for the same colour in different languages. What is proposed is to study one by one four principal colours - WHITE, BLACK, RED and BLUE - and the names for them found in the 98 languages (together with a few additional languages omitted from their study).
WHITE
The words for WHITE in the order in which they appear in Berlin and Kay's data (including several names where languages have more than one word) are:
modla tooka mabosag karey bjalo mola tótokin kena kwara pak hólo botani bura urá- blanc kakekakek yopa - sak yer bopu wapok leukos sak mabior velle cicena nzu era white vellai -eupe kena wewe lavan moða pupu ofuafu fari fejér [feher] mola pelthiti likai pái shiro miakalunga cituba bontar eborr hayahtar merkalunga fufu bottar bókùn poetih a-li iftin !gow - putih eru dap tk?up pote? lel dyéma lagti? cuo vellai belyy linteh àfiá gakurktak lo'kwe blanco dinteh tuba qöcá poetih putî fum kole ruwa vellá kha.w fufé danédjo coa cimú.xcimux safaid - ranébé amilal - mpembe 'ad lagai ?abiad trang mpembe ratuan chijme klojanna kakara churunkura pópo? putih
The next step is, without any consideration for the accepted relationships of the languages (not identified in the above list) or for the geographical proximity of the use of the different words, to examine the list objectively and systematically for phonological resemblances between these words, all of which express a single, fairly sharply-defined visual percept, the colour white. The words may resemble each other in varying degrees: (a) identity (b) similar consonants (c) similar vowels and word structure (d) related consonants (e) vaguer resemblances, and of course there may be a class of totally dissimilar (isolated) words. The question of the straightforward probability of varying degrees of similarity and dissimilarity is discussed more fully at the end of this article.
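To make the degrees of resemblance (a) to (d) concrete, the following is a rough mechanical sketch of how two word-forms might be compared. It is not the procedure used in this article, which rests on inspection and judgment: the stripping of diacritics, the treatment of every letter other than a, e, i, o, u as a consonant, and the `RELATED` consonant classes are all assumptions made purely for illustration, and degree (e), vaguer resemblance, is not modelled at all.

```python
import unicodedata

VOWELS = set("aeiou")

# Hypothetical classes of "related" consonants, chosen only for this sketch.
RELATED = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "t": "T", "d": "T",
    "k": "K", "g": "K",
    "l": "L", "r": "L",
    "m": "N", "n": "N",
    "s": "S", "z": "S",
}


def normalize(word):
    """Lower-case, strip diacritics, keep ASCII letters only (a crude simplification)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalpha())


def consonants(word):
    return "".join(ch for ch in normalize(word) if ch not in VOWELS)


def vowels(word):
    return "".join(ch for ch in normalize(word) if ch in VOWELS)


def shape(word):
    """Consonant/vowel pattern of the word, e.g. 'pupu' -> 'CVCV'."""
    return "".join("V" if ch in VOWELS else "C" for ch in normalize(word))


def related_skeleton(word):
    """Consonant skeleton with each consonant replaced by its assumed class."""
    return "".join(RELATED.get(ch, ch) for ch in consonants(word))


def classify(a, b):
    """Return the first of degrees (a)-(d) that the pair satisfies."""
    if normalize(a) == normalize(b):
        return "identity"
    if consonants(a) and consonants(a) == consonants(b):
        return "similar consonants"
    if vowels(a) == vowels(b) and shape(a) == shape(b):
        return "similar vowels and word structure"
    if consonants(a) and related_skeleton(a) == related_skeleton(b):
        return "related consonants"
    return "no resemblance under these rules"
```

Under these assumed rules, 'putih' and 'poetih' share a consonant skeleton, 'pupu' and 'fufu' share vowels and word-shape, and 'lagai' and 'likai' differ only in related consonants. Different rules would of course give different groupings, which is the point at issue when the probability of chance resemblance is considered.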
An objective and systematic examination of the list produces the following groupings (which include words which resemble each other to a greater or less extent):
modla kakekakek bopu velle miakalunga eru
mola kakara (fum) vellai merkalunga bura
hólo cicena fufé ali ruwa
moða kena yopa (kole) mabosag (karey)
mola kena (wapok) vellai mabior (kwara)
(kole) kha.w -eupe vellá urá-
pupu bjalo linteh era
fufu (blanc)dinteh (fari)
(dape) belyy (fejér)[feher]
(àfiá) (blanco) eborr
(tuba) (lavan) mpembe yer
ofuafu mpembe (shiro)
pópo? (dape)
(pái)
(pak) 'ad dyéma
?abiad chijme
(pelthiti)lagti? tooka !gow wewe
pote? leukos tótokin (tk?up)(white) ratuan
poetih likai (tuba) cuo (hayahta)
putih lagai (cituba) qöcá (kha.w) churunkura
poetih lo'kwe (tk?up) coa
putih (lel) iftin nzu
putî
(botani) sak danédjo gakurktak
(bottar) sak
(bontar) ranébé ókùn
amilàl cimú.xcimux safaid trang klojanna
The result of this exercise carried through without regard to language relationships is to produce a number of groups where the words appear to have a certain degree of similarity (words in brackets are those where the similarity is more distant), some with a good number of words in them, some with only a few, and to leave a certain number of words still isolated. It will be interesting to look more closely at these different groups first to see how plausible the resemblance appears to be when a group contains a substantial number of words and then at the isolated words and small groups to see whether they can be associated with any of the larger groups (some words with a resemblance to more than one group are included in several groups in brackets).
There are eight groups each containing six or more words. Out of the 103 distinct words listed (including duplicate words from single languages), 71 are contained in one or other of the larger groups; 15 words are listed as apparently completely isolated from any plausible resemblance to other words; 23 words are listed with brackets round them to show that the resemblance is rather remote.
Now at first sight 71 words out of 103 which fall into groups could suggest a rather impressive degree of uniformity but this may be due to the fact that the languages from which the words are drawn are related or because the resemblance is less convincing when critically examined. Taking the largest group first, that starting with 'bopu' and containing 15 words for white, they can be broken into sub-groups and rearranged to show the resemblance more clearly as follows:
bopu      yopa      (dape)
pupu      -eupe     (tuba)
fufu      (wapok)
pópo?     (pái)
fufé      (pak)
ofuafu
(fum)
(àfiá)
The first one seems a quite plausible collection of resemblances (with the two words in brackets as possibly similar, though more distant). The second group except for the first two words is clearly more speculative; the final two words in brackets do not fit well into the group or with each other. The words in the third sub-group have some resemblance in structure to the other words in the group but that is all that can be said. Another point which it may be of interest to consider before looking at the language relationships of the words in the first group is whether there are any distant or plausible resemblances with words in other groups. Thus the first group starting with 'modla', the fourth group starting with 'velle', the sixth group starting with 'eru' and the group in which the main element is 'pote?' are mainly composed of two-syllable words with some similarity of word-shape, but beyond this nothing apparently of interest arises. In particular the isolated words look very different from the words in this group.
How far, however, are resemblances in this group due to language relationships? Taking the first sub-group in the previous paragraph, the languages from which the words are drawn are as follows:
bopu Ngombe : Afro-Asiatic (Congo)
pupu Tiv : Congo-Kordofanian (Nigeria)
fufu Tshi : unclassified (West Africa)
pópo? Sierra Popoluca : Penutian (Mexico)
fufé Jekri : Congo-Kordofanian (Nigeria)
fum Bulu : Congo-Kordofanian (West Africa)
àfiá Ibibio : Congo-Kordofanian (South Nigeria)
Except for one word of Mexican origin, all these words are from African languages; it is interesting that the two where the resemblance is less clear, 'fum' and 'àfiá', come from the same language group as most of the rest of the words. The Mexican word resembles other words in the group more closely than words from related African languages. The fact that selection on this basis of resemblance should have brought words largely from the same language family together is reassuring.
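The composition of this first sub-group can be tallied mechanically. The sketch below simply counts the family and region labels given in the list above; the spellings follow the master list of words for WHITE, and the set `AFRICAN_REGIONS` is an assumption introduced here to decide which region labels count as African.

```python
from collections import Counter

# (word, language, family, region) as given in the list above.
SUBGROUP = [
    ("bopu", "Ngombe", "Afro-Asiatic", "Congo"),
    ("pupu", "Tiv", "Congo-Kordofanian", "Nigeria"),
    ("fufu", "Tshi", "unclassified", "West Africa"),
    ("pópo?", "Sierra Popoluca", "Penutian", "Mexico"),
    ("fufé", "Jekri", "Congo-Kordofanian", "Nigeria"),
    ("fum", "Bulu", "Congo-Kordofanian", "West Africa"),
    ("àfiá", "Ibibio", "Congo-Kordofanian", "South Nigeria"),
]

# Assumption for this sketch: these region labels are treated as African.
AFRICAN_REGIONS = {"Congo", "Nigeria", "West Africa", "South Nigeria"}

# How many words fall under each language-family label.
family_counts = Counter(family for _, _, family, _ in SUBGROUP)

# Words whose region label is not one of the African ones.
non_african = [word for word, _, _, region in SUBGROUP
               if region not in AFRICAN_REGIONS]

print(family_counts.most_common())
print(non_african)
```

The tally shows four of the seven words carrying the Congo-Kordofanian label and a single word, the Sierra Popoluca one, from outside Africa.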
Looking now at the words in the second sub-group, they come from the following languages:
yopa Queensland : non-Austronesian
-eupe Swahili : Congo-Kordofanian (Tanzania)
wapok Queensland : non-Austronesian
pak Cantonese : Sino-Tibetan (South China)
pái Mandarin : Sino-Tibetan (North China)
and the two words in the third sub-group:
dape Bagirmi : Nilo-Saharan (Chad)
tuba Ila : Congo-Kordofanian (North Rhodesia)
On this, observe as interesting the resemblance between the Swahili and the Queensland word and the fact that this grouping including remoter resemblances has succeeded in bringing together the words for WHITE from a good number of possibly related African languages. The degree of similarity and difference between two undoubtedly related Chinese words is also instructive.
Taking the sixth group which contains the next largest number of words, 12, beginning 'eru', the words can again be divided into sub-groups as follows:
eru bura kwara
era eborr (karey)
urá- - (fari) (shiro)
yer (fejér)[feher]
ruwa
The languages from which these words are drawn are as follows:
eru Baganda : Congo-Kordofanian (Uganda)
era Bedauye : Afro-Asiatic (Ethiopia)
urá- - Tarascan (Mexico)
yer Dinka : Nilo-Saharan (Sudan)
ruwa Ixcatec : Otomanguean (Mexico)
bura Fitzroy River Group : Australian (Queensland)
eborr Masai : Nilo-Saharan (Sudan)
(fari) Hausa : Afro-Asiatic (Nigeria)
(fejér) Hungarian : Altaic (Hungary) [correct to: feher Hungarian : Finno-Ugrian]
karey Songhai : Nilo-Saharan (Mali)
(kwara) Songhai : Nilo-Saharan (Mali)
(shiro) Japanese : Altaic (Japan)
On this, one can observe once again the bringing together of words from a somewhat related geographical area in Africa, the presence of the Amerindian 'urá- -' close to words from African languages, the resemblance of the words 'eborr' and 'bura' from the Sudan and Queensland. The Japanese 'shiro' is included on the basis of similarity of word-form.
The next group examined is that shown as starting with the word 'lagti?'. Rearranging to bring out the similarities:
likai Western Apache : Athapascan (S.W. United States)
lagai Navaho : Athapascan (S.W. United States)
lagti? Hanunoo : Austronesian (Philippines)
leukos Greek : Indo-European (Greece)
lo'kwe Bari : Nilo-Saharan (Sudan)
(lel) Nandi : Nilo-Saharan (Ethiopia)
Again this shows words from related geographical areas being brought together, but with some quite striking resemblances between remote languages - 'lagai' Navaho and 'lagti?' from the Philippines, 'leukos' Greek and 'lo'kwe' from the Sudan. The question whether these can be described simply as coincidences, and indeed what meaning in this context can be given to the idea of coincidence, is considered at the end of this article. As a minimum statement, some of the resemblances seem of interest.
This leaves four sizable sub-groups. Taking first that beginning '!gow' and rearranging somewhat to bring out similarities:
!gow Kung Bushman : Khoisan (Kalahari Desert)
cuo Daza : Nilo-Saharan (East Nigeria)
qöcá Hopi : Uto-Aztecan (S.W. United States)
tk?up Chinook Jargon (America N.W. Coast)
sak Tzeltal : Mayan (Mexico)
sak Tzotzil : Mayan (Mexico)
Except for the first two words this brings together words from North and Central America. The degree of resemblance is quite good.
The three remaining groups considered involve for the most part clearly related languages:
vellai Plains Tamil : Dravidian (S. India)
velle Paliyan : Dravidian (S. India)
vellá Malayalam : Dravidian (S. India)
belyy Russian : Indo-European (Russia)
bjalo Bulgarian : Indo-European (Bulgaria)
(blanco) Spanish : Indo-European (Spain)
(blanc) Catalan : Indo-European (Spain)
(a-li) Arawak : Andean-Equatorial (Surinam)
(lavan) Hebrew : Semitic (Israel)
The resemblance between the Slav and Dravidian words is of some interest. The Hebrew word is included allowing for metathesis.
The remaining two broadly homogeneous groups are:
modla Dugum Dani : Ndani (New Guinea)
moða Pyramid Wodo : Ndani (New Guinea)
mola Hitigima : Ndani (New Guinea)
mola Upper Pyramid : Ndani (New Guinea)
hólo Jale' : Central New Guinea
(kole) Mende : Congo-Kordofanian (Sierra Leone)
The last word resembles the others in word-shape and is the only word from a geographically distant language.
And a final group of words for WHITE:
pote? Samal : Austronesian (Philippines)
putî Tagalog : Austronesian (Philippines)
putih Malay : Austronesian (Malaya)
putih Bahasa Indonesia : Austronesian (Indonesia)
poetih Malay : Austronesian (Malaya)
poetih Javanese : Austronesian (Sumatra)
(pelthiti) Toda : Dravidian (S. India)
Except for the Dravidian word which has a broad resemblance, the other words come from the same language family. To these might possibly be added:
botani Poto : Congo-Kordofanian (Congo)
bottar Batak : Austronesian (Sumatra)
bontar Batak : Austronesian (Sumatra)
To complete the systematic consideration of the words for WHITE, it may be useful finally to consider whether there is any plausible but more distant resemblance between the main groups considered and whether any of the isolated words and small groups can be related. The following observations are made:
There is some similarity of word-form between the groups 'modla' 'vellai' 'pote?' 'bopu' and some isolated resemblances:
white (English) tooka (Ndembu : Congo) hayahta (Korean) tótokin (Pomo : California) mabosag (Bisayan : Philippines) 'ad (Somali : Chad) mabior (Dinka : Sudan) ?abiad (Arabic : Lebanon)
Following the same approach in considering the words for BLACK, the words in the order in which they appear in Berlin and Kay are as follows:
mili boindu uli yik hitam
muli unma guru bi cerno
sin manara (melas) bibi hak
golegole citema oji turí- negre
bohindu -eusi uli ?ihk' car
sihappu ii uliuli ?ik' macar
muði kârthiti obyibi hadál black
muli tuntum dilhil baki shahor
kubikubinga humáksan birong wiwi fekete
o-ri ili agong hei kuro
dagaru biru zho erok kkamahta
fima li?el zìkò itam
të ebúbít yasko ?etom tui
vin shiaa girmitak karuppu chërnyy
dudu teli lurnö negro
ndombe balédjio qöym irang itím
mwindo balébé tiye kadupo dam
mutanga mado hma xayxayx kálá
wuyila rapen lidzin aztuf ðen
likolkokin urapulla cuch ?aswad glinna
maitum nema
These words can be formed into groups in a similar way as follows:
mili turí- (golegole) urapulla dudu
muli teli kálá o-ri (tuntum)
muði tiye kuro (oji) (ndombe)
(melas) tui guru (-eusi)
uli të karuppu mutanga
uli (hei) (kadúpe) bohindu (maitum)
(wiwi) car boindu (mado)
(wuyila) sin macar (mwindo)
ili shia (cerno) likolkokin
ii sihappu (chërnyy) dagaru li?el
l?el shahor (kârthiti)
(kubikubinga) fima dilhil
(vin)
unma agong cuch ?ik' lurnö (ii)
manara (hok) ?ihk' (cerno) itím
nema zho yik (chërnyy) itam
hma (shahor) bi zìkò? ?etom
humáksan (shia) bibi (yasko) xayxayxh hitam
(aztùf) obyibi (hok) (citema)
balédjio biru ?aswad (dam)
balébé girmitak birong hadál (ðen)
(irang)
(erok) negre
rapen qöym (agong) baki negro glinna
(karuppu) black
(urapulla)lidzin ebúbít (kkamahta) fekete
Again it is of interest to see how far the similarities of words belonging to one of the groups above may be due to a relationship of the languages involved and to look more closely at the degree of resemblance. Taking the largest group first (which contains several sub-groups) starting with the word 'mili', this can be divided as follows:
mili Dugum Dani : Ndani (New Guinea)
muli Hitigima : Ndani (New Guinea)
muði Pyramid Wodo : Ndani (New Guinea)
muli Upper Pyramid : Ndani (New Guinea)
melas Greek : Indo-European (Greece)
Except for the Greek word, the resemblance here is due directly to language relationship.
The next sub-group is drawn as follows:
uli Ellice Island : Austronesian (Polynesia)
uli Pukapuka : Austronesian (Polynesia)
uliuli Tongan : Austronesian (Polynesia)
(wiwi) Dahomeyan : Congo-Kordofanian (Dahomey)
wuyila Ndembu : Congo-Kordofanian (Congo)
ili Bagirmi : Nilo-Saharan (Chad)
ii Tiv : Congo-Kordofanian (Nigeria)
li?el Chinook Jargon : Amerindian (N.W. America)
The resemblances are quite strong, partly due to language relationship, but also between African and Austronesian languages. It is interesting that the resemblance between the first and second sub-groups brings together words from the same general area (the Pacific).
The next sub-group of the first group for BLACK is derived as follows:
tui Nandi : Nilo-Saharan (Ethiopia)
tiye Ixcatec : Otomanguean (Mexico)
teli Mende : Congo-Kordofanian (West Africa)
të Bullom : Congo-Kordofanian (Sierra Leone)
turí- Tarascan (Mexico)
(hei) Mandarin : Sino-Tibetan (North China)
Within this group, except for the Chinese word 'hei' which seems a very doubtful inclusion, the resemblance is quite strong with words drawn from Africa and America which are fairly similar. The resemblance of this sub-group to the previous two is mostly in the shape of the words though there are language-relationship links between the sub-groups e.g. 'teli' Mende and 'ii' Tiv, both from the same broad African group.
The next substantial group of words for BLACK (though clearly the resemblances are less striking than in the first group) starts with the word 'golegole'. It can be divided into sub-groups and arranged as follows:
kuro Japanese : Altaic (Japan)
guru Fitzroy River : Australian (Queensland)
karuppu Plains Tamil : Dravidian (S. India)
(kadúpe) Malayalam : Dravidian (S. India)
car Dinka : Nilo-Saharan (Sudan)
macar Dinka : Nilo-Saharan (Sudan)
kálá Urdu : Indo-European (India)
(golegole) Murray Island : non-Austronesian (New Guinea)
(kârthiti) Toda : Dravidian (S. India)
(cerno) Bulgarian : Indo-European (Bulgaria)
(chërnyy) Russian : Indo-European (Russia)
There are some considerable resemblances here between remote languages even though the sub-group as a whole does not hold together very closely. Perhaps the Japanese/Queensland resemblance is of most interest, along with the Dinka/Toda resemblance ('kârthiti' meaning 'it is black', so that the ending should be disregarded).
It is doubtful whether the remaining sub-group is of sufficient interest to treat here but for what it is worth it is drawn as follows:
o-ri Arawak : Andean-Equatorial (Surinam)
urapulla Arunta : Australian (Australia)
oji Ibo : Congo-Kordofanian (Nigeria)
-eusi Swahili : Congo-Kordofanian (Tanzania)
The next group considered of words for BLACK is that shown as starting 'ii':
(ii) Tiv : Congo-Kordofanian (Nigeria)
itím Tagalog : Austronesian (Philippines)
itam Malay : Austronesian (Malaya)
?etom Samal : Austronesian (Philippines)
hitam Bahasa Indonesia : Austronesian (Indonesia)
(citema) Shona : Congo-Kordofanian (Rhodesia)
(dam) Thai : Sino-Tibetan (Thailand)
ðen Vietnamese (Vietnam)
The resemblances here are strong, mostly because of language relationships; the Shona word 'citema' is interesting if, as appears the case, the specific element for black in the word is 'tema'. The variants from Austronesian languages show the diversity of forms possible.
The next group starts 'bi':
bi Songhai : Nilo-Saharan (Mali)
bibi Songhai : Nilo-Saharan (Mali)
ebúbít Ibibio : Congo-Kordofanian (S. Nigeria)
obyibi Urhobo : Congo-Kordofanian (Nigeria)
biru Hanunoo : Austronesian (Philippines)
birong Batak : Austronesian (Sumatra)
(irang) Javanese : Austronesian (Sumatra)
(erok) Masai : Nilo-Saharan (Sudan)
(agong) Batak : Austronesian (Sumatra)
The relation between the words from different families of African languages is of interest, and also again the variant forms in the Austronesian group. The Philippine/African resemblance is quite good. 'erok'/'irang' is a reasonably good resemblance, certainly as close as some of those in related languages.
The final group of words for BLACK considered is that beginning '?ik'':
?ik' Tzotzil : Mayan (Mexico)
?ihk' Tzeltal : Mayan (Mexico)
yik Sierra Popoluca : Penutian (Mexico)
zìkò Nupe : Congo-Kordofanian (Nigeria)
(yasko) Daza : Nilo-Saharan (East Nigeria)
hak Cantonese : Sino-Tibetan (South China)
The resemblance between the Mexican group of words and the Nupe word 'zìkò' is of interest. The Cantonese word hardly fits.
Apart from resemblances in the larger groups, attention can also be drawn to:
shia Ila : Congo-Kordofanian (Africa)
shahor Hebrew : Semitic (Israel)
mado Somali : Afro-Asiatic (Chad)
maitum Bisayan : Austronesian (Philippines)
unma Queensland : non-Austronesian (Queensland)
nema Shona : Congo-Kordofanian (Rhodesia)
hma Mazatec : Otomanguean (Mexico)
manara Queensland : non-Austronesian (Queensland)
humáksan Yibir : Afro-Asiatic (Chad)
black English : Indo-European
baki Hausa : Afro-Asiatic (N.W. Nigeria)
The same approach can be applied to words for the colour RED, which, in the order in which they appear in Berlin and Kay, are:
kore palá abang
mynfu ndàídàt kace
bléma subila ni cuweppe
shah kpou ?ilp'ilp
goddioudo lichi azgàh
re 'as beg ?ahmar
didé laulau s-wigi mérah
mbwaki tutuka tsuku cerveno
motáné mapula cábac hung
erereng kula kyirey vermell
chinana kiran kondon lual
tantankin erythros kyama adom
eyeyengo uhie cará piros [vörös]
oti kula cah aka
owang
cipswuka gula coh ppalkhahta
ekundu ßaßare adaro pirir
nyian nchi veve krasnyy
pògh rara ja rojo
subila gara húng pulá
koko bara enyuki de.ng
ásèrah !ga dzuúfú lal
'at'e pel peat
rara? maado sivappu ðo
anpaluktak lo'tor shilowa
red
These can be grouped as follows:
kore red didé tantankin koko
re mérah (goddioudo) tutuka (kace)
erereng (krasnyy) (de.ng) tsuku cará
(eyeyengo)rojo cipswuka cah
(aserah) (ekundu) (subila) coh
rara? (koko) !ga
kiran ja
cuweppe aka
erythros mbwaki
rara mynfu (owang)
gara cábac
bara
(ßaßare) bléma pokh ásèrah
kyirey (subila) kpou 'as
(kondon) motáné pel adaro
(kyama) (oti) palá azgàh
cará piros 'at'e
(cah) shah pirir adom
(coh) cará pulá adaro
merah (cah) chinana mapula
cerveno (coh) (nyian) ?ilp'ilp ndàídàt
laulau nchi maado hung lo'tor
kula ni adom hung
kula lichi krasnyy
kula swigi anpaluktak enyuki
lual ppalkahta beg
veve cábac
uhie shilowa peat vermell abang
The result is that there are three fairly large groups (one very large) and several smaller groups, with about a dozen apparently isolated words. Out of the 92 words listed 43 are contained in one or other of the three larger groups. Looking at the groups more closely and dividing them where it seems appropriate into sub-groups, the first large group can be divided into two sub-groups:
re          kore
red         kyirey
rara?       kiran
(gara)      koko
(bara)      (kyama)
(ßaßare)    (kondon)
rara        (ekundu)
erereng     coh
(eyeyengo)  cah
erythros    (cerveno)
rojo        (krasnyy)
cará
ásèrah
mérah
mérah
The languages from which the words in the first sub-group are drawn are as follows:
re Bulu : Congo-Kordofanian (Africa)
red English : Indo-European
rara? Hanunoo : Austronesian (Philippines)
(gara) Batak : Austronesian (Sumatra)
(bara) Batak : Austronesian (Sumatra)
(ßaßare) Urhobo : Congo-Kordofanian (Nigeria)
erereng Nasioi : Southern Bougainville (South Pacific)
(eyeyengo) Poto : Congo-Kordofanian (Nigeria)
erythros Greek : Indo-European
rojo Spanish : Indo-European
cará Tarascan (Mexico)
ásèrah Yibir : Afro-Asiatic (Chad)
mérah Bahasa Indonesia : Austronesian (Indonesia)
mérah Malay : Austronesian (Malaya)
The similarities between the first ten words in this list are of very considerable interest, particularly the words from Bulu, English, Hanunoo and Nasioi which arise in very widely separated language groups and geographical areas. It is striking that no single word of this pattern is found among the words for BLACK.
The second sub-group is drawn from the following languages:
kore Arawak : Andean-Equatorial (Surinam)
kyirey Songhai : Nilo-Saharan (Mali)
kiran Fitzroy River : Australian (Queensland)
koko Tshi (West Africa)
(kyama) Songhai : Nilo-Saharan (Mali)
(kondon) Songhai : Nilo-Saharan (Mali)
(ekundu) Swahili : Congo-Kordofanian (Tanzania)
coh Tzotzil : Mayan (Mexico)
cah Tzeltal : Mayan (Mexico)
(cerveno) Bulgarian : Indo-European
(krasnyy) Russian : Indo-European
Apart from the first three words which show a considerable resemblance (and provide the link to the first sub-group) this is not a very homogeneous sub-group though the resemblances of Tshi, Tzotzil and Tzeltal are interesting as also is the Swahili/Songhai from widely separated parts of Africa. The Russian and Bulgarian words are included to recall the relationship of the Indo-European words for RED.
The second main group of words for RED to be examined is that shown as starting with 'pògh'. This can be rearranged as follows:
pògh Toda : Dravidian (S. India)
kpou Mende : Congo-Kordofanian (Sierra Leone)
pel Chinook Jargon : Amerindian (NW America)
palá Hopi : Uto-Aztecan (SW United States)
pulá Tagalog : Austronesian (Philippines)
(anpaluktak) Eskimo : Eskimo-Aleut (Canada)
(ppalkahta) Korean : Altaic (Korea)
mapula Bisayan : Austronesian (Philippines)
?ilp'ilp Nez Perce : Penutian (United States)
piros Hungarian : Altaic (Hungary) [correct to: Finno-Ugrian]
pirir Nandi : Nilo-Saharan (Ethiopia)
There are some interesting similarities in this group drawn from Africa, North America and the Pacific plus the resemblance between the Hungarian and Ethiopian word. The Eskimo root almost certainly goes with the Hopi and Chinook words; the Korean is included for the similarity of word-pattern with the Eskimo.
Apart from similarities in the main groups of words for RED, the following more restricted resemblances can be noted:
tantankin Pomo : Hokan (United States)
tutuka Arunta : Australian (Australia)
lual Dinka : Nilo-Saharan (Sudan)
laulau Tanna Island : Non-Austronesian (New Guinea)
maado Dazo : Nilo-Saharan (E. Nigeria)
adom Hebrew : Semitic (Israel)
?ahmar Arabic : Semitic (Lebanon)
Finally, the words for BLUE are considered, even though a good number of languages lack a distinct word and there is some degree of confusion in colour-naming, particularly with GREEN, so that the relation between word and distinct colour-perception is more doubtful. The words for blue are as follows:
delíf biroe blau siniy fefe belaoe mangok goluboy shudi nilá blue azul lán ku.skú.s kahol bugháw mbusth ollonyori asmawêê kék asúl dòfa ?azra? ao fá.? bilu biru changsayk nílá nilam sino biroe (xanh) murye l'am arus lhil lanna
On the basis of some degree of resemblance, these words (listed in the order in which they occur in Berlin and Kay) can be grouped as follows:
delíf bilu (asmawêê) mangok
(fefe) biroe ?azra?
(dòfa) belaoe (ao)
(fá.?) biru arus kahol
blau azul
lán blue asúl
l'am biroe changsayk
nilam (goluboy)
nilá sino
nilá siniy bugháw
(xanh) ku.skú.s (xanh)
lhil lanna kék
mbusth ollonyori
There are three groups of interest. The largest, starting with 'bilu' can be arranged as follows:
bilu Samal : Austronesian (Philippines)
blue English : Indo-European
blau Catalan : Indo-European (Spain)
belaoe Javanese : Austronesian (Sumatra)
(goluboy) Russian : Indo-European
biru Bahasa Indonesia : Austronesian (Indonesia)
biroe Javanese : Austronesian (Sumatra)
biroe Malay : Austronesian (Malaya)
There is an unsolved, and perhaps insoluble, problem of how far these words in the Philippines and Indonesia have been borrowed from European languages and how far they are native. If they are native, the resemblance is striking. Berlin and Kay quote their source as saying for one of the languages, Samal, "'bilu' is not a loan; it traces back to proto-Austronesian". Apart from this, if the word is a borrowing, it is strange that 'blue' should have been borrowed in an area where the predominant European influence is Spanish, which uses not a word related to 'blue' but 'azul', borrowed from the Arabic.
The second largest group can be rearranged as follows:
lán Mandarin : Sino-Tibetan (North China)
l'am Cantonese : Sino-Tibetan (South China)
(lhil lanna) Zuni : Penutian (SW United States)
nilá Malayalam : Dravidian (S. India)
nílá Urdu : Indo-European (India)
nilam Plains Tamil : Dravidian (S. India)
(xanh) Vietnamese
Again there may be borrowing in this group but nevertheless the relation of the Chinese, Zuni and Indian words is interesting.
Apart from these two groups, limited similarities of some interest are:
?azra? Arabic : Semitic (Lebanon)
arus Nandi : Nilo-Saharan (Ethiopia) (perhaps a borrowing)
kék Hungarian : Finno-Ugrian (probably a borrowing)
ku.skú.s Nez Perce : Penutian (United States)
fá.? Thai : Sino-Tibetan (Thailand)
dòfa Nupe : Congo-Kordofanian (Nigeria)
This completes the survey of the differing words for the physiologically well-defined percepts of WHITE, BLACK, RED and (subject to the qualification already made) BLUE. The study could be extended to other colours but many languages lack words for these or do not sharply distinguish them in perception; also the prevalence of borrowing, ancient and modern, between languages is more apparent and there is a greater likelihood that the colours have, not distinct and specific names, but names containing descriptions drawn from particular objects e.g. chocolate for brown, orange, and so on.
One can now take stock of this examination of words from many languages in an attempt to see on an overall view how considerable the resemblances are between words for identical concepts. So one can say:
(1) resemblances certainly do exist between the words in unrelated languages and in languages spoken in geographically remote parts of the world, where the likelihood of borrowing is small; some of the resemblances are very striking; other resemblances are about as close as those between words derived at a distance from historically earlier forms (as for example under Grimm's Law).
(2) Is it necessary to attempt to find an explanation for the resemblance - or can one say that this degree of resemblance and coincidence is what one would expect looking at any large collection of words drawn from a number of languages i.e. are the results within the limits of mere chance word-formation?
(3) If an explanation of the resemblances does appear to be needed (other than chance or language relationship) what form could it take?
As regards the reality of the resemblances noted, to bring this out more clearly one can list words where an undeniable and significant resemblance exists, drawn from the comparisons in earlier paragraphs:
bopu yopa eru bura
pupu -eupe era eborr
pópo? urá-
ruwa
likai velle botani
lagai belyy bottar
lagti?
leukos white
lo'kwe hayahta
all words for WHITE;
uli tui kuro itam
ili tiye guru ?etom
li?el
yik mado unma baki
zìkò maitum nema black
hma
manara
all words for BLACK;
re kore palá piros tantankin lual
red kyirey pulá pirir tutuka laulau
rara? kiran
erereng
erythros
all words for RED;
bilu lán ?azra?
blue l'am arus
biru nilam
lhil lanna
all words for BLUE
A quick and less systematic survey of a few other languages not covered by Berlin and Kay throws up some additional resemblances. Compare:
albus Latin with belyy Russian valkoinen Finnish velláa Tamil bore Tucanoan eborr Masai bura Queensland karaa Turkish car Dinka iitadali Arawak itam Malay kellwe Arawak kálá Urdu baksoya Chibcham baki Hausa kira Arawak kyirey Songhai kiraizi Turkish kiran Queensland hararai Arawak erereng Nasioi eirei Siriona hiarede Witoto errani Telegu
(The Amerindian words are drawn from 'Comparative Studies in Amerindian Languages', Matteson et al. (1972).)
The easiest explanation of these resemblances between widely differing languages in the words they use for the same percepts is that there is nothing to explain - that it is probable that when a large number of words is compared, there will be some apparent resemblances between them - and that in any case subjectively the degree of resemblance may be exaggerated. This is a difficult argument to dispose of - though of course it could have been used, and was used, to deny that there was any need for systematic explanation of the resemblances observed between what are now classified as Indo-European languages. There can be long theoretical argument on how probable mathematically, or pseudo-mathematically, it is that by a random process of selection of sounds (from the millions of possible combinations) in several cases the same combination or a closely similar combination will be used for one of the unlimited number of distinct percepts and concepts that can be identified by the human mind. To say that the resemblances are the result of chance alone does not, however, seem a probable explanation and it is not strictly a scientific one - since science has progressed not by assuming that even small regularities are accidental but by looking for principles which reduce similarities and regularities to expressions of an underlying real uniformity. This is not to say that there may not be some accidental resemblances amongst those noted but it is too much to assume that they are all accidental.
It is impossible to construct any precise measure of joint similarity of words and the concepts they refer to but there are two approaches which illuminate the matter to some extent. The first is a kind of internal test that can be applied to the collection of words listed in this article. If the resemblances between words for any particular colour are an artefact - the random product of looking at a large number of words - then there should be as extensive and apparent resemblances between the words for different colours as there are between words for a single colour. Now one can expect some resemblances between these groups for several reasons; there is a limited possibility of confusion between the percepts to which the words relate, so dark-blue and dark-red are close to black and in some languages at an early stage in Berlin and Kay's evolutionary scheme, there is in fact only a single word meaning black and all dark colours, plus another word meaning white and all light colours. An example of this is that one New Guinea language has as the word for BLACK the word 'biru' which is otherwise used in that part of the world for BLUE. Another possibility from which superficial resemblance between words for, say, WHITE and BLACK may arise is that in a single language the word for BLACK may simply be a variant of the word for WHITE, as again in some New Guinea languages, 'mola' and 'muli'. But beyond resemblances of this kind, which are not the result of a random process, there should be much the same degree of resemblance between words for different colours, if the hypothesis is sound that resemblances between the words for a single colour are the product of chance.
The most objective way of deciding whether this is the case no doubt would be to list together in no particular order all the words for WHITE and BLACK and then arrange them according to similarity, without reference to meaning. However, it is possible to reach much the same result by comparing the groupings of words for WHITE and BLACK given above. Doing this, one notes the following:
(1) There are no words for BLACK which resemble the
words of the pattern 'mola' for WHITE, other than those which
come from the same group of New Guinea languages;
(2) 'kole' (New Guinea) resembles 'kálá'
(Mende);
(3) There is no reasonable resemblance to the group
containing 'kakekek' and 'kena' for WHITE amongst the words
for BLACK;
(4) The nearest resemblance to the large group of words
for WHITE beginning with the word 'bopu' amongst the words for
BLACK is 'bibi', which does not fit the pattern of the WHITE
Group;
(5) There is no word in the BLACK group which nearly
resembles words in the WHITE group containing 'velle' and
'belyy'; the nearest words are probably 'balédjio' and
'balébé';
(6) The nearest resemblances to words in the group
starting 'eru' are the three words for BLACK 'urapulla',
'o-ri' and 'irang', none of them very striking;
(7) There is a resemblance 'karey', 'kwara' for WHITE
with the words 'car', 'karuppu', 'kuro', 'Itarthiti' for
BLACK;
(8) 'fejér' ['feher'] resembles 'fekete' but both come from
Hungarian;
(9) 'shiro' (Japanese) resembles 'shia' (Ila) and
'shahor' (Hebrew);
(10) 'dyéma' resembles 'dam' and 'ðen';
(11) The only resemblances to words in the 'lagti?' group
are the words for BLACK 'likolkokin', 'li?el' and 'lel';
(12) 'cuo', 'qöcá' and 'coa' for WHITE bear
some resemblance to 'cuch' and 'zho'.
On these resemblances between words for WHITE and BLACK one can comment first, that there are not many of them. With double the number of words, there should be many more resemblances than were found amongst words for either BLACK or WHITE separately, if resemblance is a chance occurrence altogether. Secondly, of the resemblances noted (items (7) to (12) above), only the resemblances 'karey' 'car', 'li?el' 'lel', 'shiro' 'shia', and possibly 'dyéma' 'dam', are really close. It is impossible on this basis to conclude that between the words for WHITE taken as a whole and the words for BLACK taken as a whole, the resemblances are as numerous and as striking as those between words for WHITE taken separately and words for BLACK taken separately. It seems fair to conclude that the resemblances between the words for the same colour cannot properly be explained as simply the result of a random process, coincidence.
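The internal test described above can be sketched in code. This is a minimal illustration and not the procedure followed in this article: it assumes that a crude letter-based score (the `ratio` of Python's `difflib.SequenceMatcher`) can stand in for the judgement of resemblance, it uses an arbitrary threshold of 0.6, and it takes only a small sample of the words for WHITE and BLACK listed earlier. The names `resemblance` and `close_pairs` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations, product

# Small illustrative samples copied from the word lists in this article.
WHITE = ["bopu", "pupu", "eru", "era", "likai", "lagai", "velle", "belyy"]
BLACK = ["uli", "ili", "kuro", "guru", "itam", "mado", "baki", "nema"]


def resemblance(a, b):
    """Crude spelling similarity in [0, 1]; an assumed proxy, not a phonetic measure."""
    return SequenceMatcher(None, a, b).ratio()


def close_pairs(pairs, threshold=0.6):
    """Return the pairs whose score reaches the (arbitrary) threshold."""
    return [(a, b) for a, b in pairs if resemblance(a, b) >= threshold]


white_pairs = list(combinations(WHITE, 2))   # WHITE compared with WHITE
black_pairs = list(combinations(BLACK, 2))   # BLACK compared with BLACK
mixed_pairs = list(product(WHITE, BLACK))    # WHITE compared with BLACK

for label, pairs in [("WHITE/WHITE", white_pairs),
                     ("BLACK/BLACK", black_pairs),
                     ("WHITE/BLACK", mixed_pairs)]:
    found = close_pairs(pairs)
    # Report how many of the candidate pairs count as "close" under the proxy.
    print(label, len(found), "of", len(pairs), "pairs:", found)
```

If resemblance were purely a matter of chance, the share of close pairs among the mixed WHITE/BLACK pairs should be about the same as the share within each colour. A score based on spelling is of course only a rough substitute for a judgement of phonetic similarity, and the result will shift with the threshold chosen.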
Another approach which can be followed to see if it throws light on the degree of resemblance that may in practice be found and be indicative of relationship is to look at colour words taken from a single language family and to apply to the collection of words so made the same sort of classification as has been applied earlier in this article to the colour-words in Berlin and Kay's study. In practice this can most conveniently be done for Indo-European languages, taking words for WHITE, BLACK, RED and BLUE:
abi cerveno nero safaid albus cerny negro sada argos cerno niger sefid azul chërnyy noir seyah belyy erythros nil sujah bily goluboy nila schwarz bjalo kala preto surkh bianco kala red sino black kelainos rakt sinyy blanco krasnyy rojo turchino blanc kuaneos rosso vermelho blau lal rouge weiss bleu leukos ruber white branco melas rudy caeruleus modry rot
This makes 58 words in all, roughly arranged in alphabetical order. Grouping them simply according to apparent resemblance one gets a number of groups as follows:
abi caeruleus kala safaid
albus (kuaneos) kala sefid
argos kelainos sada
(krasnyy)
azul cerveno seyah
cerny lal sujah
belyy cerno (melas)
bily chërnyy nila schwarz
bjalo nil (surkh)
bleu (erythros) leukos sino
blue ruber sinyy
blau rudy modry
(goluboy) rouge turchino
blanc rosso nero
blanco rojo negro vermelho
bianco rot niger
(branco) rakt noir weiss
red white
black (preto)
The words fall into two largish groups and three or four smaller ones. Six or so words appear to be isolated. In the largest group of words beginning 'belyy', there are words meaning BLUE, WHITE and BLACK. This seems at first sight puzzling but of course in English there are resemblances between the words for different colours, not only 'blue' and 'black' but also 'grey' and 'green'. The resemblance between 'black' in English and 'blanc' and 'blanco' in Latin languages has long been noted by the lexicographers as a source of difficulty, particularly since 'blanc' also seems originally to be of Germanic origin. Perhaps the explanation here is similar to that for the resemblance between 'mola' and 'muli' in New Guinea languages, namely that the difference between similar word forms comes precisely from the fact that they refer to opposed colours; another example might be 'vert' and 'vermelho' for GREEN and RED. In the next largest group of words which resemble each other, beginning '(erythros)', all the words mean RED except for 'preto' which means BLACK in Portuguese, and resembles the other words more distantly.
The other groups are comparatively small and this in itself demonstrates that quite a number of distinct words for colours are used within a single language family. The group beginning with 'cerveno' contains words from Slav languages but one of them means RED and the rest BLACK. The group beginning 'kala' contains words meaning BLACK except for the rather remote resemblance 'krasnyy' meaning RED. The group beginning 'lal' contains one word for RED, 'lal', one rather remote resemblance meaning BLACK, 'melas', and the remaining two words meaning BLUE. The group beginning 'safaid' contains three words meaning WHITE and two from the same group of languages meaning BLACK. 'schwarz' meaning BLACK does not go closely with 'surkh' meaning RED.
What conclusions if any then can be drawn from this comparison of colour-names in Indo-European languages? Obviously it could be much extended by bringing in other languages but it is doubtful whether this would lead to different conclusions:
(1) for the most part, grouping words by resemblance
brings together words with the same meaning - this is what one
would expect. It is especially striking for words for RED;
(2) But there are some puzzling resemblances between the
words for different colours as noted above, 'black', 'blue',
'blanco';
(3) there is a number of isolated words; some of these
are borrowed from other language groups e.g. 'azul', or have
a descriptive origin e.g. 'turchino', the Turkish colour, for
BLUE in Italian;
(4) There can be a continuous spectrum of resemblance so
that each end may only have a distant resemblance to each
other but links between them can be found e.g. 'erythros' or
'ruber' and 'red', 'belyy' and 'branco'.
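Point (4), the chaining of resemblances through intermediate forms, can be illustrated with a short sketch. It is offered under stated assumptions only: the grouping in this article was done by eye, whereas the code below uses a spelling-based score (`difflib.SequenceMatcher`) and an arbitrary threshold of 0.6, and the function name `group_by_resemblance` is hypothetical. A word joins a group if it resembles any one member, so the two ends of a group need not resemble each other directly.

```python
from difflib import SequenceMatcher

# A few of the Indo-European colour words listed above.
WORDS = ["black", "blanc", "blanco", "branco", "rosso", "rojo"]


def resemblance(a, b):
    """Crude spelling similarity in [0, 1]; an assumed proxy for judged resemblance."""
    return SequenceMatcher(None, a, b).ratio()


def group_by_resemblance(words, threshold=0.6):
    """Single-linkage grouping: a word is attached to every group in which
    it resembles at least one member, and such groups are merged."""
    groups = []
    for word in words:
        members, rest = [], []
        for group in groups:
            if any(resemblance(word, m) >= threshold for m in group):
                members.extend(group)   # linked: absorb this group
            else:
                rest.append(group)      # not linked: keep it separate
        groups = rest + [members + [word]]
    return groups


groups = group_by_resemblance(WORDS)
for group in groups:
    print(group)

# 'branco' scores below the threshold against 'black' directly,
# but is linked to it through the intermediate form 'blanc'.
print(round(resemblance("branco", "black"), 3))
```

In this small sample the word for BLACK and the words for WHITE fall into one group, which is the kind of puzzling cross-colour resemblance noted in point (2), while the two words for RED form a group of their own.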
In the light of this, one would not be surprised to find some resemblances between the words for different colours in the 98 languages considered earlier in this article; indeed it is surprising that there were not more between the words for WHITE and BLACK than those noted earlier. On the other hand, one could not criticise the similarities found in some of the larger groups of words as being the result of subjective exaggeration; most of the similarities are not much more remote than those found in Indo-European languages for words meaning the same colour. There may well be a similar explanation for the isolated words noted earlier in the article, i.e. that they are not genuine colour-words but transferred words with an originally different non-colour meaning in the same way as 'caeruleus' means the colour of the sky and not originally simply BLUE.
If then the resemblances noted earlier in this article do exist and are not simply a subjective exaggeration, and are not to be explained as the result of coincidence, a random product, or of known language relationships or geographical contiguity, what is the explanation for them? All that is left is: (a) some so far unrecognised language relationships; (b) borrowing; (c) a tendency for there to be a relation between the colour perceived and the sounds used to name it (a sound-meaning link, a sound-percept link) - a less tendentious way of putting it than saying sound symbolism (which involves unexamined assumptions).
It would require rather heroic assumptions to posit unrecognised language relationships between geographically widely separated languages to explain the resemblances found between the words used for colours. If this is the explanation, then a vast amount of systematic work would be needed before it could be accepted. The difficulties of establishing clear family relationships between languages even in geographically proximate areas - in Africa, in Latin America, in North America, in South East Asia - do not offer much hope that there will ever be adequate proof of relationship between even more widely separated languages.
As regards borrowing as an explanation, obviously care is needed. Some colour words, for less common colours, are very obviously borrowed (see the array of words of French origin used in English for shades of colours). On the other hand, borrowing is likely to take place only where there is lack of an appropriate native word for a particular percept. It is less likely that words should be borrowed for common percepts, as clearly WHITE, BLACK and RED are, and borrowing is improbable between languages where there has been no special political, cultural or geographical reason for borrowing to take place. The most doubtful case of borrowing, as already noted, is the naming of the colour BLUE where a number of languages lacked a distinction between BLUE and GREEN but even here one should not readily explain as borrowing the use, for example, of 'biru' in the Spanish-speaking Philippines. It is not enough to say that closely similar words must have been borrowed; there must also be a special probability that they have been borrowed. For most of the resemblances noted in this article, borrowing does not seem an adequate explanation.
If chance, language relationship and borrowing are ruled out as sufficient explanations of the resemblances observed between the names for colours in many languages, then all that is left is some universal tendency for there to be a relation between the meaning of the word, the percept, and the phonological form of the word, some sound-meaning link which clearly does not operate absolutely (since then there would not be many different words for the same colour) but which tends to restrict the sounds which may be formed into words to represent sharply defined percepts, or better, which increases the probability that certain patterns of words will be found for certain percepts; this implies semantic universals in language (following Berlin and Kay) coupled with some underlying universal principles in generating words that appear in languages. This is of course a much larger and more debatable field which cannot be attacked in this article¹. The conclusion remains that, as Jakobson and Halle suggested, study of synaesthetic associations between phonemic features and colour attributes does appear to be profitable in the approach to the perceptual aspects of speech sounds.