Abstract
The Khitan large script is generally believed to have been created in 920
The findings further indicate that the Khitan large script shows a lower degree of phonographic development than the Khitan small script and can be understood as a transitional stage toward a fully phonographic writing system.
Introduction
Yelü Abaoji or Emperor Taizu, founder of the Liao Dynasty, is traditionally credited with the invention of the Khitan large script in 920
In 1935, the Japanese physician Yamashita Taizō discovered Khitan large script inscriptions in the Chifeng region of Inner Mongolia, providing the material basis for subsequent research. Progress, however, remained slow due to the large script's substantial logographic component, the scarcity of bilingual texts, and persistent misconceptions about its nature.
Systematic decipherment began in 1957 when Yan Wanzhang interpreted the Epitaph of Xiao Xiaozhong (Ye, 1957), marking the beginning of sustained scholarly inquiry.
As research advanced, Liu (1983) proposed that the Khitan large script contained a significant phonetic component. This hypothesis was further supported by his later work on the Epitaph of Yelü Changyun (Liu and Wang, 2004), establishing a crucial foundation for subsequent research. In the decades since, scholars have continued to reconstruct and interpret the pronunciations and meanings of the large script forms, building upon Liu’s initial findings and incorporating newly discovered materials. By incomplete counts, around 500 Khitan large script graphemes and several hundred lexical items have been deciphered to date.
Building on these achievements, and drawing on insights from Chinese character structure, Khitan linguistic typology, and Khitan small script research, this article further examines the nature and structural characteristics of the Khitan large script. It offers preliminary findings to stimulate further discussion.
Intrinsic nature and characteristics of the Khitan large script
Historical records concerning the nature and characteristics of the Khitan large script
The Khitan large script is attested in a range of historical sources. Prior to the establishment of their state, the Khitan people “possessed no written records, and relied on notched wood as tokens” (“ben wu wen ji, wei ke mu wei xin.” “本无文纪, 惟刻木为信。”) (Wang, 1978). Following the founding of the Liao Dynasty, Emperor Taizu, Yelü Abaoji, created a writing system with the assistance of his ministers, Yelü Tulübu and Yelü Lubugu. As recorded in the Liao Shi, “in the spring, first month, on the yichou day of the fifth year of the Shence era (920
Additional sources, including the Wudai Huiyao 五代会要 (Essentials of the Five Dynasties), Xin Wudai Shi 新五代史 (New History of the Five Dynasties), Qidan Guozhi 契丹国志 (Record of the Khitan State), Wenxian Tongkao 文献通考 (Comprehensive Examination of Literature), and Shushi Huiyao 书史会要 (Essentials of Calligraphy History), offer broadly consistent accounts. They report that “[During the reign of] Abaoji … employed many Han Chinese scholars, who taught him to modify half of the lishu (clerical script) characters, creating several thousand characters to replace the use of notched wood” (“zhi Abaoji……er duo yong Hanren, Hanren jiao yi lishu zhi ban zeng sun zhi, zuo wen zi shu qian, yi dai ke mu zhi yue.” “至阿保机……而多用汉人, 汉人教以隶书之半增损之, 作文字数千, 以代刻木之约。”) (Ye, 1985). The phrase “modify half of the lishu characters” is the only surviving description of the script's formation. It indicates that the Khitan large script was not created ex nihilo, but developed through the modification of Chinese character structures with the assistance of Han literati.
However, comparison with extant inscriptions revealed certain discrepancies between these accounts and the script itself in terms of calligraphic style and grapheme formation. Although historical sources describe the script as based on “half of the lishu,” the calligraphic style of surviving Khitan large script materials more closely resembles kaishu (regular script). In terms of grapheme formation, the script is not limited to adding or removing strokes from lishu characters: alongside inherited and modified Chinese characters, it also includes newly created Khitan graphemes.
Overall, despite minor inconsistencies, the historical sources converge on a key point: the Khitan large script originated through the modification of Chinese characters. This consensus provides an essential framework for further analysis of its linguistic structure and typological characteristics.
Logographs and phonograms in the Khitan large script
A comprehensive review of deciphered Khitan large script graphemes revealed a distinction between logographic and phonographic forms. By origin, these can be further classified into up to three types: graphemes identical to Chinese characters, those derived from modified Chinese characters, and those uniquely created within the Khitan script.
Logographs in the Khitan large script
Logographs constitute a core component of the Khitan large script and are characterized by the principle that “meaning precedes pronunciation.” Their structure largely follows the logic of Chinese character formation, showing clear affinities with the liushu 六书 (the six traditional categories of Chinese characters), while also incorporating innovations that give rise to distinct features. These graphemes can be divided into three main types.
First, some logographs are identical to Chinese characters and retain clear traces of the liushu. They directly adopt Chinese structural principles, retaining features of xiangxing (pictographs 象形), huiyi (compound ideographs 会意), and zhishi (simple ideographs 指事). Pictographic examples include
“sun 日,”
“moon 月,”
“small 小” (Wu, 2024), which fully preserve both the form and meaning of their Chinese counterparts. Compound ideographs, such as
, and
<di>, follow Chinese forms, though their meanings may differ. Simple ideographs, for example,
“one 一,”
“two 二,” and
“three 三,” encode numerical meaning directly through stroke count.
Second, some logographs are derived from modified Chinese characters. These were created by adding or removing strokes. Examples of addition include
, created by adding a horizontal stroke to 兄 “elder brother,” and
, created by adding a dot to 仸 “bent.” Examples of stroke reduction include
, derived from 馬 “horse” by removing three dots;
, from 国 “state” with one dot removed; and
, a simplified form of 黄 “yellow.”
Third, some logographs were independently created within the Khitan script. In addition to borrowing and modifying Chinese characters, the Khitans developed new graphemes, which may also be classified as pictographic or compound ideographs. Pictographic examples include
, “house 房,” whose shape resembles the “Stone House” in Zuzhou City 祖州城 of the Liao Dynasty;
“official 官,” resembling an official's hat; and
“to walk 行,” reflecting the swaying motion of a walking person (Wu, 2024). Compound ideographs include
“Khitan 契丹,” whose internal structure may symbolically represent social or political divisions (e.g., northern and southern administrations, or the Yelü and Xiao clans), and
“tomb 坟,” composed of “person 人” beneath “field 田.” In addition, there are unique graphemes, such as
, whose meanings remain unclear and show no direct correspondence to Chinese characters.
Phonograms in the Khitan large script
Phonograms in the Khitan large script are characterized by the principle that “pronunciation precedes meaning.” They can be divided into two main categories. First, phonograms modeled on Chinese characters fall into two subtypes: form-meaning borrowing and form-sound borrowing. In form-meaning borrowing, a grapheme adopts the form and meaning of a Chinese character but is read with the corresponding Khitan pronunciation. In other words, while the graphic form is borrowed from Chinese, the reading reflects the Khitan name of the object denoted by the character rather than its Chinese pronunciation. For example, the grapheme
“country” derives its form from 杏 “apricot,” but its pronunciation corresponds to the Khitan word for “apricot.” Comparative evidence from related languages supports this interpretation: Mongolian (ᠭᠦᠢᠯᠡᠰᠦ güilesü) and Daur (güilees) both show initial gui- in the word for “apricot” (Liu, 2014). Similarly,
(Liu, 2014), graphically based on 吹 “to blow,” likely reflects the Khitan verb “to blow.” This interpretation is supported by the form puleːx in the Khorchin dialect of Mongolian. Other examples include
, and
.
In form-sound borrowing, graphemes adopt both the form and pronunciation of Chinese characters. Where such graphemes write direct loanwords from Chinese, they also retain the meanings of their Chinese counterparts. Examples include
, and
.
Second, some phonograms are derived from modified Chinese characters. These were created by adding or removing strokes, while retaining pronunciations identical or similar to the original Chinese characters. Examples of reduction include
, derived from 印 (yin) by removing a left-falling stroke;
, from 斗 (dou) with a dot omitted; and
, from 伐 (fa) with a dot removed. Examples of addition include
, formed by adding a left-falling stroke to 武 (wu);
, from 尺 (chi) with an added horizontal stroke; and
, from 水 (shui) with an added dot.
Linguistic units recorded by the Khitan large script
A script is a system of written symbols for recording language and cannot be understood in isolation from it. To clarify the characteristics of the Khitan large script, it is essential to examine the linguistic units it represents. The following discussion addresses three aspects: the features of the Khitan language, the script's marking of vowels and consonants, and its marking of syllables.
Historical records concerning the features of the Khitan language
The early historical text Book of Wei: Biography of the Shiwei states that “the language of the Shiwei state … is the same as that of the Kumoxi, Khitan, and Doumolou states” (“shiwei guo……yu yu kumoxi, qidan, doumolou guo tong.” “失韦国……语与库莫奚、契丹、豆莫娄国同”) (Wei, 1974). The Shiwei language is generally identified as Proto-Mongolic, suggesting that Khitan shares similarities with Mongolian and is genetically related to it.
This relationship is also reflected in the grammatical structure. The Song Dynasty work Yijian Zhi 夷坚志 records that, “When Khitan children first learn to read, they practice by inverting sentence order in colloquial speech … For instance, the lines ‘A bird roosts in the tree by the pond; a monk knocks at the gate beneath the moon’ are recited as ‘In the moonlight, the monk the gate knocks; at the water's bottom, on the tree, the old crow sits’” (“qidan xiao'er chu du shu, xian yi su yu dian dao qi wen ju er xi zhi……ru ‘niao su chi bian shu, seng qiao yue xia men’ liang ju, qi du shi ze yue ‘yue ming li he shang men zi da, shui di li shu shang lao ya zuo,’ da lv ru ci.” “契丹小儿初读书,先以俗语颠倒其文句而习之……如‘鸟宿池边树, 僧敲月下门’两句, 其读时则曰‘月明里和尚门子打, 水底里树上老鸦坐,’ 大率如此”) (Hong, 1998). In the recited version the verb (“knocks,” “sits”) moves to the end of each clause. This example indicates that Khitan follows a “subject–object–verb” word order, in contrast to the subject–verb–object order of Chinese but consistent with Mongolic languages.
Accordingly, Khitan is commonly classified as a member of the Altaic language family and as closely related to the Mongolic branch. Like Mongolian, it may therefore be characterized as an agglutinative language.
Features of vowel and consonant marking
Features of vowel markers
Vowel marking is a prominent feature of the Khitan large script. Analysis of deciphered graphemes revealed the presence of dedicated signs for vowels, including ɑ, ə, i, o, ʊ, and u. The existence of these specialized graphemes suggests that the vowels were systematically distinguished in the design of the script, as shown in Table 1.
Table 1. Khitan large script with marked vowels.
From a phonetic perspective, the syllable is the most naturally perceptible unit of speech. It consists of a vowel nucleus, either alone or combined with consonants. In the Khitan large script, dedicated graphemes are used to represent monophthongs—a feature likely related to their ability to function as independent syllables.
Features of pure consonant marking
The situation is considerably different for the marking of pure consonants. No graphemes consistently used to represent consonants alone have been identified among the deciphered Khitan large script graphemes. Although a few isolated cases exist—for example, in the terms for yuan shuai 元帅 “marshal,”
and
, where the initial graphemes
and
appear to represent the consonant <ŋ>—these instances are exceptional and do not support the existence of a productive system of consonant-only markers. From the perspective of script design, this is unsurprising: had the Khitans developed dedicated graphemes for pure consonants, there would have been little need to create the thousands of graphemes attested in the script.
In sum, the presence of graphemes marking monophthongs can be explained by the fact that vowels may function as independent syllables. By contrast, the absence of dedicated consonant graphemes indicates that the Khitan large script was not designed at the level of individual phonemes, but rather around a higher-level phonetic unit: the syllable.
Khitan large script graphemes for syllable marking
Feature of monosyllabic Khitan large script graphemes
A systematic collation and analysis of deciphered Khitan large script graphemes shows that monosyllabic graphemes constitute the largest group, which can be further subdivided according to syllabic structure. VC-type syllables (V = vowel, C = consonant) number approximately 40, represented by graphemes such as
, and
. CV-type syllables, totaling around 76, include
, and
. CVC-type syllables form the largest subgroup, with roughly 80 syllables, such as
, and
. CVV-type syllables account for about 40 instances, including
, and
. By contrast, VVC-type syllables are relatively rare, numbering fewer than 20, as exemplified by
, and
. In total, monosyllabic graphemes number around 260, representing approximately 49% of the attested corpus.
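The per-type figures above can be cross-checked with a short tally. The Python sketch below simply sums the approximate counts reported in this section; because VVC-type syllables are given only as “fewer than 20,” that subgroup is treated as an upper bound of 19 rather than a fixed value. It also derives the corpus size implied by the rounded figures of 260 graphemes and 49%; that derived number is an inference from rounded values, not a count reported in the sources.

```python
# Approximate counts of monosyllabic Khitan large script graphemes by
# syllable template (V = vowel, C = consonant), as reported in this section.
KNOWN_COUNTS = {"VC": 40, "CV": 76, "CVC": 80, "CVV": 40}

# VVC is reported only as "fewer than 20", so 19 is used as an upper bound.
VVC_MAX = 19

low = sum(KNOWN_COUNTS.values())   # lower bound: VVC contributes nothing
high = low + VVC_MAX               # upper bound: VVC at its maximum
largest = max(KNOWN_COUNTS, key=KNOWN_COUNTS.get)

# Corpus size implied by the rounded figures "around 260" and "approximately 49%".
implied_corpus = 260 / 0.49

print(f"monosyllabic total: {low} to {high}")
print(f"largest subgroup: {largest}")
print(f"implied corpus size: about {implied_corpus:.0f}")
```

On these figures, the listed subtypes sum to between 236 and 255, close to the rounded total of around 260, with CVC as the largest subgroup. The implied corpus of roughly 530 graphemes is in line with the estimate of around 500 deciphered graphemes given in the Introduction.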
Feature of polysyllabic Khitan large script graphemes
In addition to monosyllabic graphemes, the Khitan large script also includes a substantial number of graphemes representing polysyllabic terms. Among these, approximately 60 are disyllabic graphemes. For instance,
corresponds to the glyph
, meaning “one 一”;
corresponds to
“bali 拔里”; and
corresponds to
“confer 封.” Trisyllabic and longer forms are relatively rare and occur primarily in personal names and official titles. Examples include
, which corresponds to
“chubulu 楚不鲁”;
, corresponding to
.”
In summary, the Khitan large script is dominated by monosyllabic graphemes, supplemented by polysyllabic forms. The emergence of polysyllabic graphemes may be linked to the logographic influence of Chinese characters, although this issue requires further investigation. Given the agglutinative nature of the Khitan language, the script appears to have been designed following the “one grapheme per syllable” principle, whereby each distinct syllable was represented by a dedicated graph. This principle allowed for relatively precise representation of the language, but also resulted in an extensive grapheme inventory.
Correspondence between Khitan large script and Khitan small script
Historical records concerning the relationship between the two scripts
Historical documents clearly record that the Khitan people employed two distinct writing systems to represent their language: the Khitan large script and the Khitan small script. The large script was created in 920
As recorded in the Liao Shi, its creation is associated with Diela: When an envoy from the Uyghurs arrived and no one could understand their language, the Empress said to Emperor Taizu: “Diela is intelligent and competent for this task.” He was sent to meet the envoy. After staying with the envoy for twenty days, Diela mastered their language and writing, and thereupon created the Khitan small script, whose signs were few in number but comprehensive and consistent (huihu shi zhi, wu neng tong qi yu zhe, taihou wei taizu yue: “diela cong min ke shi” qian ya zhi. Xiang cong er xun, neng xi qi yan yu shu, yin zhi qidan xiaozi, shu shao er gai guan. 回鹘使至, 无能通其语者, 太后谓太祖曰: “迭剌聪敏可使。” 遣迓之。相从二旬, 能习其言与书, 因制契丹小字, 数少而该贯).
These historical accounts reveal the fundamental contrast between the two systems: the large script comprises a vast number of graphemes, whereas the small script has a limited glyph inventory and distinct phonographic (alphabetic–syllabic) properties.
Surviving Khitan inscriptions also make reference to both scripts, providing more direct evidence of their relationship. For example, the sequence
, in Line 5 of the Epitaph of Yelü Qi, has been interpreted as “da li (yin) zhi zi 大礼(印)之字,” meaning “Khitan Large Script.” The corresponding small script form,
, appears in Line 11 of the Epitaph of Yelü Jue, alongside the form
, interpreted as “da fu li (yin) zhi zi 大副礼(印)之字,” referring to the “Khitan Small Script.” This contrast between “da li (yin) zhi zi 大礼(印)之字” and “da fu li (yin) zhi zi 大副礼(印)之字” shows that the Khitan people clearly distinguished between the two writing systems in their documentary records.
Correspondence between the two scripts
The Khitan large script and Khitan small script functioned as parallel writing systems during the Liao Dynasty, both used to represent the Khitan language. As a result, corresponding terms in the two scripts generally share the same pronunciations. Analysis of these correspondences provides important insights into the structure and nature of the large script.
A survey of deciphered Khitan large script graphemes revealed a large inventory and systematic patterns of correspondence with small script glyphs. The most common pattern was one-to-one correspondence, with approximately 276 documented cases. Examples include the grapheme
, corresponding to the glyph
“east 东”;
, corresponding to
“tai 太”, and
, corresponding to
“white 白.” One-to-two correspondences were also frequent, with around 200 instances. Examples include
, corresponding to
“summer 夏”,
, corresponding to
“winter 冬”, and
, corresponding to
“gong 公.” One-to-three correspondences were less common, with 24 identified cases, such as
, corresponding to
<tʻ-ie-en> “sky 天” and
, corresponding to
<n-ɑm-ur> “autumn 秋.”
Higher-order correspondences are rare. Only four one-to-four correspondences have been identified. Examples include
, corresponding to
“become 成” and
, corresponding to 
“tabuye 挞不也.” Instances of one-to-five or one-to-six correspondences are exceptional, with only a single example attested in each category:
, corresponding to
“breed 孳息” and
corresponding to
“xi(重)熙.”
Taken together, one-to-one and one-to-two correspondences account for the majority of deciphered large script graphemes. The fact that a single large script grapheme may correspond to multiple small script glyphs indicates a lower degree of phonographic specification in the large script. This pattern suggests that, between the creation of the large script and that of the small script, the Khitans moved toward a more systematic phonographic representation.
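The correspondence figures can also be summarized numerically. The Python sketch below tallies the approximate counts reported above and computes two derived measures that the text does not itself state: the share of one-to-one and one-to-two cases, and the mean number of small script glyphs per large script grapheme. The function name `summarize` is a hypothetical helper for illustration, and because the underlying counts are approximate, the outputs should be read only as rough indicators.

```python
# Approximate numbers of large script graphemes by how many small script
# glyphs each corresponds to (1 -> one-to-one, 2 -> one-to-two, ...),
# as reported in this section.
CORRESPONDENCE_COUNTS = {1: 276, 2: 200, 3: 24, 4: 4, 5: 1, 6: 1}


def summarize(counts: dict[int, int]) -> tuple[int, float, float]:
    """Return (total graphemes, share of 1:1 and 1:2 cases, mean glyphs per grapheme)."""
    total = sum(counts.values())
    share_low_order = (counts.get(1, 0) + counts.get(2, 0)) / total
    # Weighted mean: each grapheme contributes the number of glyphs it maps to.
    mean_glyphs = sum(k * n for k, n in counts.items()) / total
    return total, share_low_order, mean_glyphs


total, share, mean = summarize(CORRESPONDENCE_COUNTS)
print(f"graphemes surveyed: {total}")
print(f"one-to-one or one-to-two: {share:.1%}")
print(f"mean small script glyphs per large script grapheme: {mean:.2f}")
```

On these figures, one-to-one and one-to-two cases make up about 94% of the 506 surveyed graphemes, and a large script grapheme corresponds on average to about 1.5 small script glyphs. The mean is only one simple way to express the lower degree of phonographic specification discussed above; other measures could serve equally well.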
Conclusion
First, in terms of its intrinsic nature, the Khitan large script can be characterized as a phonographic–logographic writing system that combines the form of Chinese characters with Khitan phonology. While it draws on Chinese characters, it also reflects systematic adaptations to the structural features of the Khitan language.
Second, with respect to the linguistic units it recorded, the script encompasses both monosyllabic and polysyllabic graphemes, with the former predominating. This distribution is consistent with the agglutinative typology of the Khitan language.
Third, the correspondence between the Khitan large and small scripts indicates that the former exhibits a lower degree of phonographic development than the latter, indirectly suggesting that the large script represents an earlier, transitional stage toward a more fully phonographic writing system. Given that the two scripts are traditionally dated only a few years apart, it is unlikely that the large script was an entirely new invention in 920
Finally, the emergence of the small script did not lead to the immediate disappearance of the Khitan large script. The two systems coexisted for a long time, and the large script remained in use even during the Jin Dynasty. This suggests that the large script was continuously revised and refined in practical application, retaining its practical value and serving the social needs of the Liao Dynasty alongside the small script.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Key Project of the National Social Science Fund of China: “Interpretation of Newly Discovered Epigraphic Documents of the Liao Dynasty Royal Consort Clan and Study on the Phonetic System of the Khitan Small Script” (21AYY023).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
