a word from Vocabulary Selice Romani
Alternative forms of a lexeme are separated by a comma: e.g. felhó, felhóva ‘cloud’. Optional parts of the lexeme’s form are bracketed, e.g. daj (taj) dad ‘parents’.
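The two notational conventions just described (comma-separated alternatives, bracketed optional parts) can be unpacked mechanically. The following is a minimal sketch, not part of the database itself: the function name `expand_forms` is hypothetical, and it assumes that commas never occur inside brackets and that brackets are not nested.

```python
import re
from itertools import product


def expand_forms(field: str) -> list[str]:
    """Expand a word-form field into every surface variant it encodes.

    Assumptions (not confirmed by the source): commas only separate
    alternative forms, and bracketed optional parts are never nested.
    """
    variants: list[str] = []
    for alternative in field.split(","):
        # re.split with a capturing group yields fixed and optional
        # pieces in alternation: fixed, optional, fixed, ...
        pieces = re.split(r"\(([^)]*)\)", alternative.strip())
        options = [
            [piece] if index % 2 == 0 else ["", piece]
            for index, piece in enumerate(pieces)
        ]
        for combination in product(*options):
            # Collapse the double space left behind by an omitted part.
            form = " ".join("".join(combination).split())
            if form and form not in variants:
                variants.append(form)
    return variants


print(expand_forms("felhó, felhóva"))
print(expand_forms("daj (taj) dad"))
```

Under these assumptions the first call lists the two alternative forms, and the second lists the phrase without and with its optional middle word.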
With inflecting words, the field contains information on inflectional irregularities (oblique stems of nouns, comparatives of adjectives, perfective stems of verbs etc.). With nouns, the field almost always indicates their gender. With verbs, the field sometimes indicates their transitivity. With function words and adverbs, the field indicates the class of the word: pronoun (personal or reflexive), demonstrative, pro-word (interrogative or indefinite), preposition, adverb, co-verb (preverb, adverbial verb modifier), numeral and quantifier, particle etc. Occasionally, those functions of function words that are not sampled in the database are also mentioned. Especially with phrasal forms but also elsewhere, the field may contain information on syntactic construction. The field was also used to highlight mismatches between the pre-defined semantic category of the Meanings table and the grammatical word-class of the Romani word form, which are very rare: e.g. there is no adjective ‘stinking’ in Selice Romani, and so the verb khanden ‘to stink’ was used as an equivalent.
The category ANALYZABLE PHRASAL is used for lexemes consisting of two or more words. The categories ANALYZABLE COMPOUND and ANALYZABLE DERIVED are used for compound and derived lexemes, respectively, whose morphological structure is fully transparent synchronically. The categories SEMI-ANALYZABLE and UNANALYZABLE require more detailed comments. The former category is assigned to several types of morphologically complex lexemes:
1. To synchronically non-transparent compounds: for example, the per nang o [ROOT/PREFIX-naked-INFL] ‘barefoot’ is, diachronically, a compound of the noun pr o ‘foot’ and the adjective nang o ‘naked’, but its first morpheme per is not regularly related to the nominal root.
2. To lexemes that contain a synchronically identifiable derivational marker but which, at the same time, have no synchronic base lexeme: for example, the noun lub ip e [ROOT-ABSTRACT-INFL] ‘adultery’ contains the productive de-adjectival marker of abstract nominalizations but there is no base lexeme *lub( o).
3. To lexemes that are derived from an existing lexeme by means of an obsolete or otherwise idiosyncratic derivational marker: for example, the noun šer and [head-SUFFIX] ‘space under one’s head’ is derived from the noun šer o ‘head’ by means of the unique marker and.
4. To pronouns, pro-words and other function words (e.g. modals) which involve idiosyncratic ‘derivational’ morphology. Although spatial adverbs and prepositions show somewhat more regular morphological relations, they too are assigned to this category.
5. To loanwords that are morphologically analyzable in the source language and whose source base has also been borrowed, unless the imported derivational marker has become productive in Selice Romani (e.g. šetít( )šíg o ‘darkness’ and šetít n o ‘dark’, from Hungarian sötít síg and its base sötít).
The category UNANALYZABLE is assigned to the following types of lexemes:
1. To lexemes with monomorphemic inflectional stems. Importantly, pre-inflectional adaptation markers of loanwords are excluded from consideration: for example, although the inflectional stem of the verb čukl in en [hiccough-LOAN-INFL] ‘to hiccough’, a loanword from Hungarian csukl ik, is bimorphemic, the stem before the adaptation suffix in is monomorphemic.
2. To loanwords that are morphologically analyzable in the source language but whose source base has not been borrowed (e.g. zéčíg o ‘vegetables’ < Hungarian ződsíg ‘greenness; vegetables’, cf. *zéd n o < Hungarian ződ ‘green’).
Age refers to the (diachronic) syntactic or derivational structure of the item, but not to its phonological form or to its meaning. The age of analyzable lexemes (collocations, compounds, and derivations) reflects the time of creation of such complex expressions rather than the age of their parts. The relevant units of temporal continuity of univerbal lexemes are inflectional stems. For example, although the ‘canonical’ forms of the Selice Romani verb d-en ‘to give’ (citation form: third plural present indicative) do not directly continue the ‘canonical’ forms of the Old Indo-Aryan verb dā-da-ti ‘to give’ (citation form: third singular present indicative), but rather certain inflectional forms without the initial reduplication, there is continuity of the verbs’ inflectional stems between Old Indo-Aryan and Selice Romani, viz. da > d-, and so the Selice Romani verb d-en ‘to give’ may be considered to go back, via Old Indo-Aryan, to Proto-Indo-European deh- ‘to give’. Two types of age categories are used:
• First, there are genealogical age categories (1–6). Some of these (1–3 and 5) represent nodes on the tree model of the genealogical affiliation of Selice Romani, and are assigned to lexemes that can be reconstructed for these stages of the language. I should note that some lexemes assigned to the OLD INDO-ARYAN category might actually be older, i.e. PROTO-INDO-IRANIAN or even PROTO-INDO-EUROPEAN, as I have checked only selectively for pre-Indo-Aryan etymologies of Old Indo-Aryan etymons. The category LATER THAN EARLY ROMANI includes lexemes that are dialect-specific within Romani, i.e. not reconstructable for Early Romani, and that, in addition, are not loanwords, based on loanwords, or calqued from a post-Early Romani L2.
• Second, there are two subtypes of contact-related age categories: those that are defined with reference to the period of contact with an L2 or a cluster of L2s (7–13) and those that are defined with reference to the beginning of such a period only (14–16). The former are assigned not only to loanwords but also to calques from a given L2. The latter subtype of contact-related age categories is only relevant for past L2s; they are assigned to lexemes that are based on loanwords from a given L2 or that contain derivational markers from that L2. Certain arbitrary decisions had to be taken with regard to the assignment of lexemes to concrete L2s (e.g. a loanword that may originate in Slovak or Czech has been assigned to the age category SLOVAK; see also Chapter).
NO.  AGE CATEGORY               FROM      TO
1    Proto-Indo-European        4500 BCE  3000 BCE
2    Proto-Indo-Iranian         2500 BCE  2000 BCE
3    Old Indo-Aryan             1900 BCE  500 BCE
4    Middle Indo-Aryan          500 BCE   700 CE
5    Early Romani               900 CE    1300 CE
6    later than Early Romani    1300 CE   2007 CE
7    West Asian L2              700 CE    1000 CE
8    Greek L2                   900 CE    1300 CE
9    South Slavic L2            1300 CE   1750 CE
10   Hungarian L2               1650 CE   2007 CE
11   Vlax presence              1850 CE   2007 CE
12   Slovak L2                  1920 CE   2007 CE
13   Czech L2                   1950 CE   2007 CE
14   West Asian L2 or later     700 CE    2007 CE
15   Greek L2 or later          900 CE    2007 CE
16   South Slavic L2 or later   1300 CE   2007 CE
The dates of the age categories are generally very approximate (see the book chapter for discussion). Note that some age categories show temporal overlap or even coincidence (e.g. EARLY ROMANI and GREEK L2).
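The overlap and coincidence of age categories noted here can be checked directly against the dates in the table. The sketch below is illustrative only: the names `AgeCategory`, `coincide` and `overlaps` are hypothetical, BCE years are encoded as negative integers, and two periods that merely share an endpoint are assumed not to overlap.

```python
from typing import NamedTuple


class AgeCategory(NamedTuple):
    number: int
    name: str
    start: int  # year; negative values are BCE
    end: int


# Dates copied from the age category table; all are very approximate.
AGE_CATEGORIES = [
    AgeCategory(1, "Proto-Indo-European", -4500, -3000),
    AgeCategory(2, "Proto-Indo-Iranian", -2500, -2000),
    AgeCategory(3, "Old Indo-Aryan", -1900, -500),
    AgeCategory(4, "Middle Indo-Aryan", -500, 700),
    AgeCategory(5, "Early Romani", 900, 1300),
    AgeCategory(6, "later than Early Romani", 1300, 2007),
    AgeCategory(7, "West Asian L2", 700, 1000),
    AgeCategory(8, "Greek L2", 900, 1300),
    AgeCategory(9, "South Slavic L2", 1300, 1750),
    AgeCategory(10, "Hungarian L2", 1650, 2007),
    AgeCategory(11, "Vlax presence", 1850, 2007),
    AgeCategory(12, "Slovak L2", 1920, 2007),
    AgeCategory(13, "Czech L2", 1950, 2007),
    AgeCategory(14, "West Asian L2 or later", 700, 2007),
    AgeCategory(15, "Greek L2 or later", 900, 2007),
    AgeCategory(16, "South Slavic L2 or later", 1300, 2007),
]

BY_NAME = {category.name: category for category in AGE_CATEGORIES}


def coincide(a: AgeCategory, b: AgeCategory) -> bool:
    """True if both categories span exactly the same period."""
    return (a.start, a.end) == (b.start, b.end)


def overlaps(a: AgeCategory, b: AgeCategory) -> bool:
    """True if the periods share more than a single boundary year."""
    return a.start < b.end and b.start < a.end


print(coincide(BY_NAME["Early Romani"], BY_NAME["Greek L2"]))
print(overlaps(BY_NAME["South Slavic L2"], BY_NAME["Hungarian L2"]))
```

The strict comparison in `overlaps` is a design choice: it keeps adjacent stages such as Old Indo-Aryan and Middle Indo-Aryan, which meet at 500 BCE, from being reported as overlapping.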
|West Asian L2 (700–1000)|
Early Romani reconstruction
This field contains three kinds of information: (a) with those Early Romani etymons that have been retained in Selice Romani, the field contains the reconstructed Early Romani form and meaning of the etymon (plus, mostly, the reconstructed gender with nouns); (b) with those Early Romani etymons that have been lost in Selice Romani, the field contains those reconstructed Early Romani forms that match the relevant Selice Romani form in meaning, and etymological notes on these Early Romani forms; (c) occasionally, the field also contains information on post-Early Romani (i.e. dialect-specific) forms that have been lost in Selice Romani but retained in closely related Romani dialects.
Boretzky & Igla's etymology
This field gives Boretzky & Igla’s (1994) etymologies for Early Romani (plus a few post-split) etymons that have been retained in Selice Romani, indicating the page number of the etymology in the source.
Boretzky, Norbert & Igla, Birgit. 1994. Wörterbuch Romani–Deutsch–Englisch für den südosteuropäischen Raum: mit einer Grammatik der Dialektvarianten. Wiesbaden: Harrassowitz.
|80: cf. Hindi dum ‘tail’, cf. also Persian dum(b) ‘tail’|
Mānušs et al. etymology
This field gives Mānušs et al.’s etymologies for Early Romani (plus a few post-split) etymons that have been retained in Selice Romani, indicating the page number of the etymology in the source.
Mānušs, Leksa, Neilands, Jānis & Rudevičs, Kārlis. 1997. Čigānu–latviešu–angļu etimoloģiskā vārdnīca un latviešu–čigānu vārdnīca. [Gypsy–Latvian–English etymological dictionary and Latvian–Gypsy dictionary.] Rīgā: Zvaigzne ABC.
|53: < Indo-Iranian, cf. Persian and Hindi dum ‘tail, hind-quarters’|
This field gives Vekerdi’s etymologies for Early Romani and some post-split etymons that have been retained in Selice Romani, indicating the page number of the etymology in the source. It also indicates Vekerdi’s reference to Turner 1962–6.
Vekerdi, József; with the assistance of Zsuzsa Várnai. 2000. A comparative dictionary of Gypsy dialects in Hungary. Gypsy–English–Hungarian dictionary with English to Gypsy and Hungarian to Gypsy word lists. Budapest: Terebess Publications.
|59: perhaps < Sanskrit dumbha [Turner 6419]|
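The example values shown for the three etymology fields share one shape: a page number, a colon, and a free-text note, with Vekerdi's entries optionally carrying a bracketed Turner number. The following is a rough sketch for splitting such a value, assuming that this shape holds throughout (the source shows only three examples); the function name `parse_etymology` and the regular expressions are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed shape: "<page>: <note>", optionally wrapped in "|" cell markers.
ENTRY = re.compile(r"^\s*(\d+)\s*:\s*(.*)$", re.DOTALL)
# Assumed shape of a Turner cross-reference inside the note.
TURNER = re.compile(r"\[Turner\s+(\d+)\]")


def parse_etymology(cell: str) -> Tuple[Optional[int], str, Optional[int]]:
    """Return (page number, note, Turner number) for one field value.

    The page number and the Turner number are None when absent.
    """
    text = cell.strip().strip("|").strip()
    match = ENTRY.match(text)
    page = int(match.group(1)) if match else None
    note = match.group(2).strip() if match else text
    turner = TURNER.search(note)
    return page, note, int(turner.group(1)) if turner else None


print(parse_etymology("|59: perhaps < Sanskrit dumbha [Turner 6419]|"))
```

Values that do not begin with a page number are returned unchanged as the note, so no information is dropped if the assumed shape does not hold.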