a word from Vocabulary Japanese
In most cases, the aim is to enter the most basic and readily understood Japanese words into the database. Overly formal or colloquial words are used only when the alternative would be a circumlocution; circumlocutions themselves are included only when they constitute set phrases. Somewhat formal words that are prevalent in writing are also used, usually alongside words more common in spoken language.
The word forms are cited in the standard Hepburn transcription, with vowel length distinguished by means of the macron, with the exception of /i:/, which is doubled as <ii>. Pitch accent is usually not marked, except where it is relevant for instances of apparent homophony.
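The length convention just described can be sketched in a few lines of Python. This is only an illustration: the input notation, in which a colon after a vowel marks length, and the function name mark_length are assumptions made for this sketch and are not part of the database.

```python
# Hypothetical helper: the ':' length mark in the input is an assumption
# of this sketch, not a convention used in the database itself.
MACRON = {"a": "ā", "u": "ū", "e": "ē", "o": "ō"}

def mark_length(form: str) -> str:
    """Render 'V:' as a macron vowel, except 'i:', which is doubled as 'ii'."""
    out = []
    for ch in form:
        if ch == ":" and out:
            prev = out[-1]
            if prev == "i":
                out.append("i")         # /i:/ is written <ii>
            elif prev in MACRON:
                out[-1] = MACRON[prev]  # other long vowels take the macron
            else:
                out.append(ch)          # ':' after a non-vowel is kept as-is
        else:
            out.append(ch)
    return "".join(out)
```

For example, mark_length("gakko:") yields gakkō, while a long /i:/ as in the illustrative input "chi:sai" comes out with a doubled vowel.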
The word forms are cited in the Japanese script. Japanese employs a mixture of three scripts, two syllabaries and a logographic script:
(1) Kanji: this is the logographic script borrowed from Chinese. It is used for most full words: nouns, personal pronouns and the unchanging part of verbs and adjectives, of both the Native Japanese and Sino-Japanese strata. Whereas with words from the Sino-Japanese stratum each character reflects a monosyllabic Chinese pronunciation (called on’yomi ‘sound reading’ in Japanese), Native words are usually assigned as a whole to a single Chinese character based on the meaning of that character, and seldom to multi-character combinations (such as kyō ‘today’, which is assigned to the multi-character combination 今日 meaning ‘today’ in Chinese). This way of pronouncing the kanji as Native words is called kun’yomi ‘meaning reading’. This was also done for words of Foreign origin. Also, some words of Native Japanese origin are distinguished in writing but can reasonably be classified as instances of polysemy rather than homophony. In these cases, each instance is entered into W3 together with a meaning description.
There is the phenomenon of ateji (‘directed characters’, cf. Tajima 1998:452-461), where Chinese characters are applied to words that are not Sino-Japanese according to their sound rather than their meaning. For instance, kega 怪我 ‘wound’ is a Native word, yet the Chinese characters, meaning “suspicious” and “self” respectively, do not reflect the meaning of the word, but rather were chosen only for their pronunciation. There are a few instances of this throughout the database, which have all been marked as such in W3. Earlier, ateji were also used for words of Foreign origin, and these spellings have also been included where applicable.
The Ministry of Education prescribes a set of 1,945 characters to be taught in schools. After the Second World War, the characters underwent a series of mild reforms, which applied to the prescribed set only and were of much smaller scale than the efforts undertaken in China. I have followed this and have not noted the traditional forms in cases where the characters have been reformed.
(2) Hiragana: this script was originally developed from the cursive writing style of Chinese calligraphy and was first used in diaries and novels that were written in the Japanese vernacular, as opposed to the more formal writing, which was done in Chinese. The script is currently used for most function words (with the most important exception being personal pronouns) and for verb and adjective desinences. In the latter case, there are sometimes several possible variants, of which only one is included in the database. For instance, azukaru ‘keep’ can be written both 預かる and 預る. Hiragana is also used in cases where a word originally written in kanji has grammaticalized into a function word (for instance miru as a full verb means ‘see’ and is written 見る, while as an aspectual auxiliary expressing the conative aspect, it is written completely in hiragana as みる). Hiragana is also used for any word of either the Native Japanese or the Sino-Japanese stratum that is written with characters outside of the set prescribed by the Ministry of Education. However, I have generally tried to give the kanji spelling for lexical words where applicable.
(3) Katakana: this script was originally developed from shorthand renderings of Chinese characters and used in conjunction with Chinese characters as a reading aid. The script is now used to write any foreign names that are not part of the Sinosphere and foreign loanwords (these include all words from languages other than Chinese and Chinese loanwords that were borrowed after the 19th century). As mentioned above, some Foreign words used to be written in kanji as well, but this has ceased, and where applicable the kanji spelling has been included alongside the primary katakana one. For instance, buriki ‘tinplate’ used to have two kanji spellings: 錻力, which is a case of ateji, and 鉄葉, which is a case of semantically-based kanji assignment (characters meaning ‘steel’ and ‘leaf’); both have been discarded today in favour of the katakana spelling ブリキ.
Katakana are also used for animal names in a scientific context, even in cases where kanji are available. However, in cases like these I have chosen the kanji whenever possible.
The so-called mimetic words are usually written in hiragana, but sometimes in katakana for emphasis. For more discussion of the different scripts and the development of their modern usage, cf. Hayashi (1982) and Gottlieb (1995:ch.1).
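As a rough illustration of how the three scripts combine within a single word form, the following Python sketch labels each character by its Unicode block. The function names are hypothetical. The sketch assumes that only the Hiragana, Katakana and basic CJK Unified Ideographs blocks need to be covered, so rarer characters outside these blocks are simply reported as "other".

```python
def script_of(ch: str) -> str:
    """Label one character by Unicode block (basic blocks only, by assumption)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"

def scripts_in(word: str) -> list:
    """Return the script label of every character in a word form."""
    return [script_of(ch) for ch in word]
```

Applied to the examples above, 見る comes out as kanji followed by hiragana, みる as hiragana only, and ブリキ as katakana only.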
Sometimes a word is written with different characters, but it can be argued that these constitute cases of polysemy. Since Japanese also employs Chinese characters to write Native Japanese words, semantic distinctions made in Chinese are sometimes superimposed on Native Japanese items; for instance, the verb miru “see” can be written with 4 or 5 different Chinese characters. Thus we have tobu 跳ぶ “jump” and tobu 飛ぶ “fly”, and fune 舟 “boat” and fune 船 “ship”. In cases like these, I have included all the relevant variants in this field, as I regard the differences in writing to be secondary.
Comments on word
This field is used for the following purposes:
1. Whenever a word underwent a noteworthy phonological or morphological change, this is noted here. Regular sound changes that led to Modern Japanese forms differing from those of older stages of Japanese are usually not recorded in the database.
2. Variants are also recorded in this field. These are usually limited to words sharing the same lexical material. Phrases and compounds that are also commonly used are noted in the Meaning field, and in case of loanwords synonyms are discussed in the Other Comments field.
3. Also, in cases of apparent homophony in Native words, the locus of the pitch-accent is noted. For instance, awa ‘foam’ and awa ‘millet’ can be ruled out as an instance of polysemy since the former has the accent on the second mora and the latter on the first.
4. When a word is written according to the principle of ateji, i.e. Chinese characters used according to their pronunciation, not their meaning, this is recorded here, and also if the ateji writing is still being used.
|obsolete variants jin 腎 (1177) and jin no zō 腎の臓 (1481). Current form first attested in 1283.|
The following criteria have been followed in assigning the analyzability values in the database:
The value ‘unanalyzable’ is chosen when the word cannot be further analyzed in modern Japanese, including verbs and adjectives, which are inflected but whose stem cannot be further broken down. However, reduplicated forms and words that are historically analyzable are viewed as ‘semi-analyzable’.
A. Reduplicated forms whose base does not occur on its own. These are usually adverbs, and the process is not overly productive. Cf. Hamano 1998. There is also another type of reduplication, which is regarded as a derived form and is discussed below.
B. Words that include a cranberry morph, i.e. an element that does not occur by itself. For instance, katatsumuri ‘snail’ consists of two elements, kata ‘single’ and tsumuri, with the latter not appearing on its own.
C. Words that include affixes that are no longer productive and often opaque. For instance, taira is a contraction of a now opaque prefix ta- and the root hira ‘flat’. There are also a number of cases involving an adjective suffix -ka, which is no longer productive.
D. Words which are historically polymorphemic but which are now perceived to be monomorphemic. For instance, sakana is originally a compound of sake ‘wine’ and na ‘food’, but it is no longer perceived as such.
E. Words whose original structure has been obscured through time. This is usually because of one of the following factors, or any combination of the three:
a. writing: a compound word is written with one character, obscuring the internal structure, or a derived word is written with a character different than the root it is derived from. For instance, mabuta ‘eyelid’ is originally a compound of me ‘eye’ and futa ‘lid’, a fact that is obscured by the fact that mabuta is written in one character. Also, mise ‘shop’ is said to have been derived from the nominalised form of the verb miseru ‘to show’, but again this is obscured through the writing.
b. contraction: a compound word that has undergone exceptional phonological change (for instances of regular phonological or morphological change, see Frellesvig 1995). Examples include otto ‘husband’, a contraction of o ‘man’ and hito ‘person’; fude ‘pen’, from fumi ‘text’ and te ‘hand’; hōki ‘broom’, from ha ‘leaf’ and haki, the nominalised form of the verb haku ‘to sweep’.
c. phrase: in some cases, what was originally a phrase has become a single word. Often this involves two nouns connected with an attributive marker, such as no, na or the archaic tsu. Without exception, these words are written in one character, and have sometimes undergone further phonological changes of their own. Some examples: kinoko ‘mushroom’, from the phrase ki no ko ‘tree’s child’; honoo ‘flame’, from the phrase hi no ho ‘fire’s ear’; matsuge ‘eyelash’, from the phrase me tsu ke ‘eye’s hair’.
F. Verbs that are marked as transitive or intransitive. As discussed at length in Jacobsen 1992, there is a small number of verbs that form sets of intransitive-transitive pairs. This process is no longer productive and the marking is unpredictable. Whenever this is applicable to a verb in the database, it is noted in W4 whether it is transitive or intransitive and what the form of its counterpart verb is.
G. Sino-Japanese compounds were usually regarded as ‘analyzable compounds’ because of the fact that their internal structure was readily available on account of the logographic characters. However, in certain cases, exceptional phonological change has led to making this less obvious. Usually these words are also no longer written in Chinese characters, but rather with hiragana. Examples include yakan ‘kettle’, originally a compound of yaku ‘medicine’ and kan ‘can’,
A. Reduplicated forms that form the collective of the base. These are usually from nouns and are regarded as derived forms here. One example is hito-bito ‘people’ from hito ‘person’.
B. Words that are derived by affixes. As far as Native Japanese words are concerned, by far the most frequent case is a deverbal noun, derived by means of a nominalising suffix which is identical in form to the stem form (but not in accent, cf. Martin 1988:387). For instance, hikari 'light' is derived from the verb hikaru 'to light'. As for Sino-Japanese words, this usually refers to suffixes used to form nouns.
Compounds are clearly recognisable as such, both from the accent (cf. Akamatsu 2000:268-270) and also from certain morphophonological phenomena that occur with them. For the NJ stratum and some SJ words, rendaku is a common occurrence (however, it should be noted that this is a phenomenon with many exceptions, cf. Shibatani 1990:173-175 and Vance 1987:146-148), and for the SJ stratum, an assimilation of a bisyllabic first element to certain following consonants is common, as in gakkō 'school', where gaku becomes gak- in front of kō (Vance 1995: 155-164). For the Sino-Japanese words, it can be argued in some cases whether these are really compound words since some elements nowadays never occur outside of compounds, but due to the high degree of transparency of the Chinese characters to native speakers I have opted to analyse these as compounds.
The entering of phrases into the database was usually avoided and only undertaken when it was the only choice available (sometimes, commonly used phrases were also entered into the W6 field). This was only deviated from when the only nonphrasal expression available was nonstandard or extremely rare.
Except for words marked as unanalyzable, and some marked as semi-analyzable, morpheme-by-morpheme glosses have been provided for all entries. The rules and abbreviations laid out in the Leipzig Glossing Rules were followed, with the following exceptions:
The genealogical classification of the Japanese language is a famously controversial question. Except for the really far-fetched theories such as those linking Japanese to Indo-European, Basque or Sumerian, the majority of scholars working on the question seem to prefer a relation to either Altaic or Austronesian. In the case of Altaic theories, some scholars restrict themselves to positing a closer relationship between Japanese and Korean (Martin 1966, Lewin 1976, Whitman 1995, Beckwith 2005 arguing for a connection between Koguryoic and Japanese rather than Korean and Japanese), while others then relate Japanese and Korean (and usually Ainu as well) to the Altaic family as a whole (Miller 1971, Miller 1996, and Vovin 1994). For the Austronesian theories, usually an Altaic-Austronesian superstrate-substrate mix is proposed (Polivanov 1918), although Benedict 1990 has proposed a genetic connection to Austronesian, Miao-Yao and Tai-Kadai, resulting in a super-family called “Japanese/Austro-Tai”. However, Vovin 1994 argues against Benedict 1990 on a number of methodological grounds. I will follow Vovin 1994 in its criticism and assume that Austronesian does not have a genetic link to Japanese, but very well might have a substrate relationship. Finally, it should be noted that even for the most convincing theory, the Altaic/Korean hypothesis, the number of cognates does not exceed 320. For this reason I decided to include the information on possible Korean-Japanese cognates in a separate custom field 8 “Korean”, rather than including this in the Age field, which I have restricted to periods for which Japanese written records are available. This would set the earliest period at the 8th century AD. The field follows a periodisation based on a superset of historical periods that seems to be agreed upon by most authors, even though there are slight differences:
— Old Japanese (Jōko Nihongo 上古日本語, abbreviated OJ): this is usually equated with the historical Nara Period (710-794). It represents the period with the earliest written records of Japanese, even though some ritual texts (Norito, s. Philippi 1959:1-4 and Bentley 2001:6-36) that were recorded in that period might originally have been devised a century or so prior to their publication date.
— Late Old Japanese, also called Classical Japanese by some (Chūko Nihongo 中古日本語, abbreviated as CJ): this is usually equated with the historical Heian Period (794-1185).
— Middle Japanese (Chūsei Nihongo 中世日本語): this spans the three historical periods, which are usually referred to as the Japanese Middle Ages: Kamakura, Muromachi and Azuchi-Momoyama, all together from 1185-1603. Contact with the West actually ensues from the mid-1500s, thus the first influx of vocabulary from Western languages still falls towards the end of Middle Japanese.
— Early Modern Japanese (Kinsei Nihongo 近世日本語): this is usually equated with the Edo Period (1604-1867), a period where Japan isolates itself politically, but where a trading outpost is maintained in Nagasaki with an intensive exchange with the Dutch trading mission posted there.
— Modern Japanese (Gendai Nihongo 現代日本語): Japan is forced to open up to the outside world in the 1850s and embarks on an endeavour of rapid modernisation and westernisation of the country, ushering in the Meiji Restoration in 1867/8.
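The periodisation above can be read as a simple lookup from a year of first attestation to a language period. The Python sketch below is one possible rendering and rests on several assumptions: boundary years that appear in two periods (794, 1185) are assigned to the later period, Modern Japanese is taken to begin in 1868, and years before 710 return no period. The names PERIODS and period_for are hypothetical.

```python
from typing import Optional

# Half-open [start, end) intervals. The treatment of shared boundary years
# and the 1868 start of Modern Japanese are assumptions of this sketch.
PERIODS = [
    ("Old Japanese", 710, 794),
    ("Late Old Japanese", 794, 1185),
    ("Middle Japanese", 1185, 1604),
    ("Early Modern Japanese", 1604, 1868),
    ("Modern Japanese", 1868, None),
]

def period_for(year: int) -> Optional[str]:
    """Return the language period for a year of first attestation, if any."""
    for name, start, end in PERIODS:
        if year >= start and (end is None or year < end):
            return name
    return None  # before the earliest written records
```

Under these assumptions, a form first attested in 1177 falls into Late Old Japanese, and one first attested in 1283 or 1481 into Middle Japanese.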
As far as the period prior to the first written records is concerned, cf. the remarks under W9 in the section on the Native Japanese stratum, regarding the putative early loans from Classical Chinese into Old Japanese and the alleged substrate items from Austronesian.
It is well known and often noted in descriptions of Japanese that the Japanese lexicon is stratified (e.g. Shibatani 1990).
Five main strata are distinguished in Modern Japanese:
➢ Native vocabulary (NJ) 和語: in Japanese this is called either wago ‘Japanese words’ or Yamato kotoba ‘Yamato words’, after the first Japanese kingdom established in the 4th century CE. This refers to the native vocabulary that is considered not to have been borrowed from any other language.
➢ Mimetic vocabulary 擬音語: there is a number of onomatopoetic words in Japanese, which are usually divided in three categories: giongo ‘sound-mimicking words’, giseigo ‘voice-mimicking words’ and gitaigo ‘mode-mimicking words’, but as the distinctions are notoriously difficult to draw (Hamano 1998:6), I have only used giongo in the database. Only words that follow the patterns as described in Hamano 1996 are recorded as mimetic words here; other words that contain mimetic elements are marked accordingly in Custom Field 2.
➢ Sino-Japanese vocabulary (SJ) 漢語: called kango in Japanese or ‘Han words’, there was a massive influx of Chinese vocabulary into the Japanese language from the time Japan made contact with the Chinese civilisation via Korea beginning from the 5th century A.D. onwards. By the time of the first written records from the Japanese archipelago the Chinese language had already firmly established its place among the Japanese elite, who had become bilingual in Chinese and Japanese. Even after bilingualism declined, Chinese remained the language of administration.
➢ Non-Chinese foreign vocabulary (F) 外来語: called gairaigo in Japanese, ‘words that came from outside (of Japan)’, this refers to borrowings that have been taken from languages other than Chinese since the exposure to the West in the 16th century.
➢ Hybrid vocabulary (H) 混種語: known as konshugo ‘mixed words’ in Japanese, there are also some lexemes that consist of morphemes of different strata.
Some group the native vocabulary and the mimetic vocabulary together. They may be both of native origin, but they follow different phonological, morphosyntactic and graphematic rules (Vance 1987:140-152, Ito and Mester 1996, Irwin 2005, and also the remarks on the script in the annotation to the Original script field).
It is important to note that the above-mentioned strata correspond to the intuitions of contemporary native speakers and do not necessarily reflect the true origins of the words concerned. Some words will be perceived as Native Japanese although they might have been borrowed in prehistoric times, and Sino-Japanese words will be perceived as such regardless of the individual word history, i.e. whether the word in question was borrowed from Chinese, coined in Japan or somesuch (Ota 2004, Irwin 2005).
In parentheses, there are sometimes comments on the different strata, chiefly in the following situations:
➢ Words that have an onomatopoetic component but do not follow the patterns for mimetic words (cf. Hamano 1998) are marked as such in this field.
➢ Some Chinese characters can have several SJ readings, which are called on’yomi. There are two types: go’on (“Wú sounds”) and kan’on (“Hàn sounds”) on the one hand, and tō’on (“Táng sounds”) and sō’on (“Sòng sounds”) on the other. The former refer to the pronunciation of SJ terms that were borrowed during the period of Japanese-Chinese bilingualism of the Japanese elite, the pronunciation of which became frozen after the decline of said bilingualism. So even if a SJ word was borrowed after the decline of bilingualism, if it was through writing, the pronunciation would be either go’on or kan’on. The latter reflect the pronunciation of SJ words that were borrowed after the decline of bilingualism not through writing but through oral communication (“oral loans”). In the case of tō’on and sō’on, there are no clear-cut distinctions between them, and sometimes even the designation sōtō’on is found. As far as go’on and kan’on go, they are clearly differentiated (even if in some cases the forms are homonymous), but they were for the most part not included in this field. For more information cf. Miller (1967:102-112), for a more detailed account Yamada (1940); also for a character dictionary Morohashi (1955).
➢ There is another category of on’yomi, that refers to the case where a character has changed its SJ reading due to a mistake or an unexpected sound change: kan’yōon 慣用音 ‘idiomatic reading’ (Miller 1967:106). If this occurred in a word, it was marked as such. This does not include cases
As the Age field was restricted to the language periods, this field was used to record the year in which the word was first attested in Japanese written records. The years are based on Nippon Kokugo Daijiten (2000), and if the first attestation was in an important work of literature, the name is also given in Japanese characters. This is especially true for Old Japanese.
This field was only used very sparingly. The default value was set to “regular” for all entries, and only words that were specifically of a highly colloquial or of an exclusively formal nature were marked as such.
The frequency figures are based on the data in Kokuritsu Kokugo Kenkyūsho 2005. 70 magazines from the year 1994 were used in the study, with a total of 1,074,617 morphemes (the study usually counted inflectional verb desinences as separate words, but as far as derivational morphology goes, this depended on productivity). Due to the relatively low number of tokens, only approximately the 500 most frequent terms from the database were entered into this field.