The vocabulary does not aim to be exhaustive, but most Indonesian words that are reasonably close equivalents of LWT meanings are probably listed. These include formal words which are rarely used, but appear in Indonesian dictionaries, and are understood by educated speakers. Such words are often current in standard Malaysian Malay and/or some regional dialects of Malay-Indonesian. Colloquial words were included if they are also used to some extent in written literature (including journalism). Some colloquialisms are used throughout Indonesia, but some are specific to Jakarta Indonesian (which is rapidly becoming a lingua franca, including for certain genres of written communication). Words or forms likely to be used only by natives of Jakarta and not understood in other regions were avoided.
The word form is cited in standard Indonesian orthography, with one exception. In the standard orthography, two phonemes are not distinguished: the mid front vowel /e/ and the mid central vowel /ə/, which are both spelled <e>. In this database, the two are distinguished: /e/ is written <é>, and /ə/ is written <e>. Whenever the pronunciation of a word cannot be transparently inferred from the spelling, a phonemic transcription is provided in ‘Comments on word form’.
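The spelling convention can be illustrated with a minimal Python sketch. The function name `mid_vowel_phonemes` is hypothetical and not part of the database; it only reads off the two mid vowels, assuming that `é` stands for /e/ and plain `e` for /ə/ as described above, and it ignores every other letter.

```python
import unicodedata
from typing import List


def mid_vowel_phonemes(word_form: str) -> List[str]:
    """Return the mid-vowel phonemes implied by the database spelling.

    Assumption: 'é' is /e/ and plain 'e' is /ə/. Other letters are skipped,
    so this is not a full grapheme-to-phoneme conversion.
    """
    # Normalise so that 'e' + combining acute accent counts as the letter 'é'.
    normalised = unicodedata.normalize("NFC", word_form.lower())
    mapping = {"é": "e", "e": "ə"}
    return [mapping[ch] for ch in normalised if ch in mapping]


# Illustrative inputs: 'mérah' has one /e/, 'pengemis' has two schwas.
print(mid_vowel_phonemes("mérah"))
print(mid_vowel_phonemes("pengemis"))
```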
There are three types of cases when more than one form is listed in the Word form field. Sometimes a word may be represented by different morphological forms derived from the same root, for example potong / potongan ‘piece’. Sometimes there are different phonological forms of the same word, for example mangkuk / mangkok ‘bowl’. And sometimes there are alternative spellings of the same word, e.g. cabai / cabe ‘chili pepper’. However, in no case were forms representing different etyma (having different roots or bases) entered in the same record.
Elements given in parentheses are optional. For example, gempa (bumi) ‘the earthquake’ means the Indonesian form is either gempa or gempa bumi; (burung) gagak ‘crow’ means the form is either gagak or burung gagak. This is important, because for determining whether a form is classified as a loanword or as containing a borrowed element (W9 and all the fields relating to loanwords), the parenthesized word was ignored. However, it was taken into consideration for the purpose of determining its analyzability.
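The notation for alternative forms and optional elements can be unpacked mechanically. The following is a minimal sketch, assuming that alternatives are separated by a slash and that each entry contains at most simple, non-nested parentheses; the function name `expand_word_form` is hypothetical.

```python
import re
from typing import List


def expand_word_form(field: str) -> List[str]:
    """Expand a Word form entry into the concrete forms it stands for."""
    forms: List[str] = []
    for alternative in field.split("/"):
        # Form without the optional element: drop "(...)" and tidy the spacing.
        without = " ".join(re.sub(r"\([^)]*\)", " ", alternative).split())
        # Form with the optional element: keep its content, drop the brackets.
        with_optional = " ".join(re.sub(r"[()]", "", alternative).split())
        for form in (without, with_optional):
            if form and form not in forms:
                forms.append(form)
    return forms


print(expand_word_form("gempa (bumi)"))
print(expand_word_form("(burung) gagak"))
print(expand_word_form("mangkuk / mangkok"))
```

Under the rule stated above, only the form without the parenthesized element would be considered when deciding loanword status, while the longer form is relevant for analyzability.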
A meaning is entered if there is a significant difference between the LWT meaning and the Indonesian counterpart, or to explain shades of meaning in case of multiple counterparts of LWT meanings. Comments on the meaning of the word form are also entered here.
The criteria used to determine the analyzability of word forms are as follows:
This is chosen when no analysis is possible in modern Indonesian. This includes:
A. Historically monomorphemic words.
B. Words which were historically analyzable but are no longer perceived as such by lay speakers, e.g. sembahyang ‘pray’, historically formed from sembah ‘worship’ + hiang ‘deity’.
C. Reduplicated forms which only occur in a reduplicated form. For example, kunang-kunang ‘firefly’ exists only as such; the base *kunang does not occur by itself.
This is chosen in the following cases:
A. Words whose base does not occur by itself and has no independent meaning (‘cranberry morph’); for example, pengemis ‘beggar’ is derived from the root *kemis (which does not occur by itself with this meaning) + the agentive prefix peng-.
B. Words that appear to have a root, but also contain phonological material which is neither another root nor an affix. For example, the form keluar ‘to go out’ contains the Indonesian root luar ‘out, outside’, but standard Indonesian does not have a prefix ke-.
C. Words which were originally monomorphemic, but by metanalysis are now perceived as containing more than one morpheme. For example, pertama ‘first’ (from Sanskrit prathama) is perceived to contain the Indonesian prefix per- and a base *tama, which resulted in the derived form pertama-tama ‘at first’ by reduplicating the perceived base *tama.
D. Reduplicated forms whose base does occur by itself but which are not derived from the base by any regular process (that is, the meaning of the derived form cannot be predicted). For example, the base of laki-laki ‘man’ is laki, which does occur by itself with the meaning ‘husband’, but the meaning of laki-laki cannot be predicted from it.
E. A compound one of whose constituents does not occur independently. For example iri hati ‘envious’ contains hati ‘liver’ which often occurs in idiomatic expressions as the seat of emotions (like English ‘heart’). However iri by itself has no meaning in modern Indonesian.
F. Words which are historically polymorphemic but perceived so only by some speakers. For example, the form perempuan ‘woman’ is transparently derived from empu ‘master’ + the noun-deriving circumfix per-an, but most lay speakers do not realize that, and regard the word as monomorphemic.
G. Words which are historically analyzable and still transparent. For example lelaki ‘male’ is transparently derived from laki ‘husband’ by partial reduplication, yet this is a frozen form, and partial reduplication is no longer productive in standard Indonesian.
This category contains words derived by reduplication and words that contain affixes. These affixes include meng- (‘active’) and di- (‘passive’) when required in standard Indonesian. (See Note 2 under W9 below.)
Determining which words are compounds is not always a straightforward process in Indonesian. Words were marked ‘analyzable compound’ in the following cases.
A. A coordinate compound that, were it a phrase, would require a conjunction, e.g. tanah-air ‘native country’ (lit. ‘land-water’); were it a phrase, it would be tanah dan air (land and water).
B. A sequence of words which could be interpreted as a phrase, but the meaning of the phrase would be different from the intended meaning. That is, the compositional semantics is not predictable. For example, rumah sakit (rumah ‘house’ + sakit ‘sick’) does not have the predictable meaning ‘a house that is sick’, but rather means ‘a hospital’.
C. Some other expressions that were intuitively deemed to be fixed. (There are no clear phonological and morphological criteria to distinguish between phrases and compounds in Indonesian.)
This choice was rarely used. Phrases are not analyzed as lexical entries in Indonesian, unless they consist of idioms. However, the distinction between ‘compound’ and ‘phrase’ was sometimes difficult, and the choice arbitrary.
The following ages were used.
"Prehistorical" refers to any stage of the language or its ancestors before the emergence of written Malay. This was in 7th century, but the language of these early inscriptions did not arise overnight, so the year 500 was chosen as the end of this stage.
"Modern" refers to the period from the 18th century, when written documents from Indonesia begin to exhibit patterns different from the written language of the Malay heartland on the Malay Peninsula.
"Early Malay" refers to the period between Prehistorical and Modern.
This field is filled in for all words. When the relevant word included one of the prefixes meng- or ber-, the choice of register refers to the base rather than to the entire form, because these two prefixes are by definition formal (except for a handful of frozen forms) and mostly obligatory in standard Indonesian.
Below are some examples for the criteria used to determine the degree of certainty.
1. Very little evidence for borrowing
A. Words for which it is difficult to determine the direction of the borrowing because the etymon is similarly attested (either well or poorly) in both donor language and borrowing language families, but the evidence does not favor borrowing into Indonesian.
B. Words which have been claimed by some authorities to be loanwords, but I find the evidence unconvincing.
C. Words which show some similarity in sound and meaning to a word in another language, but the similarity is not great, and there is insufficient independent evidence to determine whether it is indeed a case of borrowing.
2. Perhaps borrowed
A. Words which violate a minor phonotactic constraint of inherited Malay vocabulary, and do not have a good internal etymology.
B. Words that are related by borrowing to similar words in other languages, but for which it is more likely that Indonesian (or Austronesian) was the donor language rather than the borrowing language.
C. Words which show some similarity in sound and meaning to a word in another language; while the similarity is not great, there is some independent evidence in support of borrowing, for example from an intermediate language.
3. Probably borrowed
A. Words which clearly violate the phonotactics of inherited Malay vocabulary, and cannot be reconstructed in Proto Malayic, even if no source word can be identified.
B. Words which somewhat violate Malayic phonotactics, and for which there is a reasonable candidate for a source word, although there are phonological and/or semantic differences between the source word and loanword that are not easily explained.
C. Words which clearly violate Malayic phonotactics, and are well represented in modern Malayic isolects, but denote a concept which was not present during the period when Proto Malayic was spoken.
4. Clearly borrowed
A. Words which have an equivalent in another language that constitutes a close phonological and semantic match, and the proposed source word is well attested in the donor language family.
B. Words which have a close phonological match in another language, even though the semantics do not match well, but the semantic difference can be logically explained.
C. Words which have a close semantic match in another language, even though the phonology does not match well, but the phonological differences can be logically explained.
1. A lack of clear etymology with no further evidence for borrowing was not deemed sufficient to suspect a word to be a loanword.
2. For the purpose of this field, the presence of the verbal prefixes meng- (active), di- (passive), and ber- (intransitive) was ignored. A word containing just one of these prefixes and no other affixation was considered as a loanword if its base was determined to be a loanword. This is because the process of borrowing words which function as verbs in standard Indonesian requires the use of a verbal prefix. If other affixation is present, even if one of the above three prefixes is also used, the word was classified as ‘Created on loan basis’.
3. Some extralinguistic factors were also considered. For example, plausibility of contact with speakers of the putative donor language; presence of the referent in the environment (or lack thereof); date of introduction of the referent or of the word.
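The rule in Note 2 can be written out as a small decision procedure. The sketch below is only an illustration of that rule, not the procedure actually used to fill in the database: the function name `loan_status`, the affix notation, and the fallback label for words with a non-borrowed base are all assumptions.

```python
from typing import Iterable

# The three verbal prefixes that Note 2 says were ignored.
VERBAL_PREFIXES = {"meng-", "di-", "ber-"}


def loan_status(base_is_loanword: bool, affixes: Iterable[str]) -> str:
    """Classify a word form following Note 2.

    `affixes` lists the affixes attached to the base, e.g. ["meng-", "-kan"].
    """
    if not base_is_loanword:
        # Hypothetical label: Note 2 only covers words with a borrowed base.
        return "not covered by Note 2"
    other_affixes = [a for a in affixes if a not in VERBAL_PREFIXES]
    if other_affixes:
        # Any further affixation, with or without a verbal prefix.
        return "Created on loan basis"
    # Bare base, or base plus one of the ignored verbal prefixes.
    return "Loanword"


print(loan_status(True, ["meng-"]))
print(loan_status(True, ["meng-", "-kan"]))
print(loan_status(True, []))
```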
Notes are provided regarding words which are not loanwords in their entirety but which contain a borrowed element. If this borrowed element already occurs in the database as a Word Form, a cross-reference is provided.
If the fields of both immediate and earliest source word are filled in, and the word was not directly borrowed from the earlier source language into the immediate source language, I entered information about the intermediate loanwords here (as far as I knew them). Forms which represent different historical stages of the same etymon in the same language are not entered.
At least one reference is provided for practically all loanwords with an identifiable source word. The references are either for one of the existing works on loanwords in Indonesian or for a dictionary of the source language (frequently both).
Abdul Kadir Usman, 2002. Kamus Umum Bahasa Minangkabau Indonesia (comprehensive Minangkabau-Indonesian dictionary). Padang: Anggrek Mulia.
Adelaar, K.A., 1989. Malay influence on Malagasy: historical and linguistic inferences. Oceanic Linguistics 28/1:1-46.
Adelaar, K. Alexander, 1992. Proto Malayic: The reconstruction of its phonology and parts of its lexicon and morphology. Pacific Linguistics Series C – 119. Canberra: Department of Linguistics, Research School of Pacific Studies, The Australian National University.
American Heritage Dictionary (AHD), 1976. Second College Edition. Boston: Houghton Mifflin Company.
Burrow, T., and M.B. Emeneau, 1966. A Dravidian Etymological Dictionary. Oxford University Press.
Casparis, J.G. de, 1997. Sanskrit Loan-words in Indonesian. NUSA 41. Jakarta: Atma Jaya University.
Collins Robert French Dictionary, 2003. Fifth Edition. Glasgow: HarperCollins.
Dyen, Isidore, 1946. Malay tiga ‘three’. Language 22:131-37.
Echols, John M., and Hassan Shadily, 1975. Kamus Inggris-Indonesia: An English-Indonesian Dictionary. Ithaca: Cornell University Press and Jakarta: Gramedia.
Echols, John M., and Hassan Shadily, 1998. Kamus Indonesia-Inggris: An Indonesian-English Dictionary. Third edition, ed. by John U. Wolff and James T. Collins. Ithaca: Cornell University Press and Jakarta: Gramedia.
Even Shoshan, Avraham, 1985. Hamilon Hehadash (the new dictionary). Four volumes. Jerusalem: Kiryat Sefer.
Fabricius, Johann Philipp, 1972. Tamil and English dictionary. 4th edition. Tranquebar: Evangelical Lutheran Mission Pub. House. Accessed online via Digital Dictionaries of South Asia, a project of the University of Chicago, at:
Garden, Domnern and Sathienpong Wannapok, 1999. Phojjananukrom Thai-Angkrit: Thai-English Dictionary. Bangkok: Amarin.
Glare, P. G. W., ed., 1996. Oxford Latin Dictionary. Oxford: Clarendon Press.
Gonda, J., 1952. Sanskrit in Indonesia. Den Haag: Oriental Bookshop.
Grijns, D. J., J. W. de Vries and L. Santa Maria, 1983. European Loan-words in Indonesian. Indonesian Etymological Project V. Leiden: KITLV.
Hardjadibrata, R. R., 2003. Sundanese English Dictionary. Based on Soendanees Nederlands Woordenboek by F.S. Eringa. Jakarta: Pustaka Jaya.
HarperCollins Portuguese Dictionary, 2001. Second Edition. Two parts: (A) English-Portuguese, (B) Portuguese English. Glasgow: HarperCollins Publishers.
Hayyim, Sulayman. 1934-36. New Persian-English dictionary. Teheran: Librairie-imprimerie Beroukhim. Accessed online via Digital Dictionaries of South Asia, a project of the University of Chicago, at:
Jones, Russell, 1978. Arabic Loan-words in Indonesian. Indonesian Etymological Dictionary III. London: School of Oriental and African Studies.
Kamus Besar Bahasa Indonesia (KBBI) (unabridged Indonesian Dictionary), 2001. Third edition. Chief editor: Hasan Alwi. Jakarta: Pusat Bahasa.
Kamus Dewan: Edisi Baru (Dewan dictionary: new edition), 1991. Chief Editor: Othman bin Sheikh Salim, Sheik.
Kamus Inggris-Melayu Dewan (KIMD): An English-Malay Dictionary, 1992. Editors in Chief: A.H. Johns and D.J. Prentice. Kuala Lumpur: Dewan Bahasa dan Pustaka.
Kramers Dutch-English Dictionary, 1987. Amsterdam/Brussels: Elsevier.
Larousse de poche, 1954. (French dictionary.) Paris: Librairie Larousse.
Leo, Philip, 1975. Chinese Loanwords Spoken by the Inhabitants of the City of Jakarta. Seri Data Dasar No. 7. Jakarta: LIPI (Lembaga Ilmu Pengetahuan Indonesia).
Liddell, H.G., and Scott. 1888. An intermediate Greek-English Lexicon. Oxford: Clarendon Press.
Made Sutjaja, I Gusti, 2000. Practical Balinese-English English-Balinese Dictionary. Denpasar: BP.
Malay Concordance Project (MCP). An online project of the Australian National University, run by Ian Proudfoot. Accessed online at:
Mardiwarsito, L., 1981. Kamus Jawa Kuna-Indonesia (Old Javanese-Indonesian dictionary). Ende, Flores: Nusa Indah.
Monier-Williams, Monier, 1899. A Sanskrit-English dictionary. Oxford: Clarendon Press.
McAlpin, David W., 1981. A core vocabulary for Tamil. Revised edition. Philadelphia: Dept. of South Asia Regional Studies, University of Pennsylvania. Accessed online via Digital Dictionaries of South Asia, a project of the University of Chicago, at:
McGregor, R.S., 1993. Oxford Hindi-English Dictionary. Oxford University Press.
Nurlela Adnan, Ermitati, and Rosnida M. Nur, 2001. Kamus Bahasa Indonesia-Minangkabau (Indonesian-Minangkabau dictionary). Jakarta: Balai Pustaka.
Pearsall, Judy, ed, 1999. The Concise Oxford Dictionary (COD). Oxford University Press.
Platts, John T., 1884. A dictionary of Urdu, classical Hindi, and English. London: W.H. Allen & Co. Accessed online via Digital Dictionaries of South Asia, a project of the University of Chicago, at:
Robson, Stuart, and Singgih Wibisono, 2002. Javanese English Dictionary. Singapore: Periplus.
Shorto, Harry, 2006. A Mon-Khmer comparative dictionary. Edited by Paul Sidwell with Doug Cooper and Christian Bauer. Pacific Linguistics 579. Canberra: Research School of Pacific and Asian Studies, The Australian National University.
Sijs, Nicoline van der, forthcoming. Dutch Loanwords Database. Loanword Typology Project.
Steingass, Francis Joseph, 1892. A comprehensive Persian-English dictionary. London: Routledge & K. Paul. Accessed online via Digital Dictionaries of South Asia, a project of the University of Chicago, at:
Stevens, Alan M., and A. Ed. Schmidgall-Tellings, 2004. Kamus Lengkap Indonesia-Inggris: A Comprehensive Indonesian-English Dictionary. Athens: Ohio University Press and Bandung: Mizan.
Sugiarto et al., ed., 1995. Kamus Indonesia-Daerah (Indonesian-regional languages dictionary). Jakarta: Gramedia.
Tamil Lexicon, 1924-36. University of Madras. Accessed online via Digital Dictionaries of South Asia, a project of the University of Chicago, at:
Wehr, Hans, 1976. Arabic-English Dictionary. Edited by J M. Cowan. Ithaca: Spoken Language Services.
Wilkinson, R.J., 1959. A Malay-English Dictionary (Romanised). Two volumes. London: MacMillan & Co.
Williams, Edwin B., 1987. The Bantam New College Spanish & English Dictionary. Revised Edition. New York: Bantam Books.
Yule, Henry, and A.C. Burnell, 1968. Hobson-Jobson: A Glossary of Colloquial Anglo-Indian Words and Phrases and of Kindred Terms. New edition, edited by William Crooke. London: Routledge & Kegan Paul.
Zoetmulder, P.J., 1995, with the collaboration of S.O. Robson. Kamus Jawa Kuna Indonesia. Original Title: Old Javanese-English Dictionary. Jakarta: Gramedia.
Zorc, R. David, and Malcolm D. Ross, 1995. “A glossary of Austronesian reconstructions”. In: Tryon, Darrell T. (ed.): Comparative Austronesian Dictionary: An Introduction to Austronesian Studies. Berlin & New York: Mouton de Gruyter, 1105-1197.
Sometimes it is relatively easy to tell if a word had existed in Malay before its replacement by a loanword, if the native word is still used in some varieties of Malay-Indonesian. For example, the original word might still be used in Malaysia but not in Indonesia; or it may be used in very formal or poetic Indonesian only; or it may be found in old literature, but has since become obsolete. However, there may have been Malay words that were completely replaced by loanwords at an early stage, and left no trace; in such cases, it was not possible to determine whether replacement had taken place.
This option was chosen in cases where there is good reason to believe that the concept was introduced together with the word for it.
This option was chosen when it is known that when the loanword entered the language, another word (borrowed or not) had already existed in the language with the same meaning, and this word is still in use. It is thus a combined diachronic + synchronic designation.
Note: A word was marked with ‘Replacement’ or ‘Coexistence’ only if both the original word and the loanword were determined to be exact or near-exact counterparts of the LWT meaning.
This field is only filled in for about a third of the words.
Highly integrated: The word contains only phonemes derived from Proto Malayic and does not violate the phonological structure of inherited Malay vocabulary.
Intermediate: Violates the phonological structure of inherited Malay vocabulary but not that of modern Indonesian.
Unintegrated: Contains phonemes which do not normally occur in modern Indonesian or violates one or more phonotactic constraints of modern Indonesian.
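The three integration levels can be read as an ordered set of checks, sketched below. This is one possible reading of the definitions, not the author's stated procedure: the function name and the four boolean inputs are hypothetical, the phonological judgments themselves must be supplied by the analyst, and treating a word with a non-inherited phoneme that is nevertheless normal in modern Indonesian as Intermediate is an interpretation.

```python
def integration_level(
    only_proto_malayic_phonemes: bool,
    fits_inherited_structure: bool,
    only_modern_phonemes: bool,
    fits_modern_phonotactics: bool,
) -> str:
    """Assign an integration level from four analyst-supplied judgments."""
    # Anything that is foreign even by modern Indonesian standards.
    if not (only_modern_phonemes and fits_modern_phonotactics):
        return "Unintegrated"
    # Indistinguishable from inherited Malay vocabulary.
    if only_proto_malayic_phonemes and fits_inherited_structure:
        return "Highly integrated"
    # Acceptable in modern Indonesian but not in inherited vocabulary.
    return "Intermediate"


print(integration_level(True, True, True, True))
print(integration_level(True, False, True, True))
print(integration_level(False, False, False, True))
```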
Very little information is available regarding the presence or absence of certain concepts and entities before contact, especially when the contact was very early. So often this field was filled with ‘no information’. For function words, ‘Not applicable’ was written in, since grammatical categories and relations cannot be said to be present in or absent from the environment.
The following abbreviations, which are not listed in the Leipzig Glossing Rules, are used in the morpheme-by-morpheme gloss.
ABST – abstract noun
ACT – active
AGT – agent, agentive
CIRC – circumfix (second part of a circumfix; see Leipzig Glossing Rules, Rule 7)
INVOL – involitive
NOUN – noun-forming affix (can derive nouns from other nouns)
ORD – ordinal
PART.RED – partial reduplication
RED – reduplication
STAT – stative
SUPERL – superlative