Semantic ontology creation and vocabulary expansion method for Tibetan language

An ontology and semantics technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that improves processing accuracy.

Status: Inactive
Publication Date: 2013-12-25
MINZU UNIVERSITY OF CHINA
4 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0014] Shortcomings of the prior art: semantic ontology creation technologies for languages such as Chinese and English all rely on statistical algorithms applied to a large-scale corpus. However, because of the linguistic characteristics and data sparsity of Tibetan, the ontology creation and vocabulary expansion technologies successfully applied to Chinese, English and other languages cannot be directly applied to Tibetan.

Examples

Embodiment Construction

[0043] In order to make the technical means, creative features, goals and effects achieved by the present invention easy to understand, the present invention will be further described below in conjunction with specific embodiments.

[0044] The Tibetan ontology creation and vocabulary expansion method proceeds in two steps. In the first step, knowledge engineers and language experts manually build the upper-level ontology; after synonyms are expanded using the electronic dictionary, the corresponding lexical expansion and translation are carried out based on pattern matching. In the second step, according to the ontology concepts and their hyponymy relationships, synonyms are looked up in the annotated corpus or electronic dictionary and ranked from high to low similarity by the lexical semantic similarity algorithm. Knowledge engineers then revise the ranking results and edit the ontology.
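The ranking part of the second step can be sketched as follows. The text does not give the lexical semantic similarity algorithm, so this sketch substitutes a plain Jaccard overlap of feature sets as a stand-in; the function names `jaccard` and `rank_candidates`, the feature labels, and the candidate words are all hypothetical and chosen only for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Stand-in similarity (an assumption): overlap of two feature sets, 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(concept_features: set[str],
                    candidates: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return (candidate, score) pairs sorted from most to least similar."""
    scored = [(word, jaccard(concept_features, feats))
              for word, feats in candidates.items()]
    # Sort by score descending; break ties alphabetically for a stable review order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical feature sets for one ontology concept and three candidate synonyms.
concept = {"animal", "domestic", "mammal"}
candidates = {
    "cand_a": {"animal", "domestic", "mammal"},
    "cand_b": {"animal", "mammal", "wild"},
    "cand_c": {"tool", "metal"},
}

ranking = rank_candidates(concept, candidates)
for word, score in ranking:
    print(f"{word}\t{score:.2f}")
```

The sorted list is what a knowledge engineer would review and correct before the ontology is edited; any similarity measure that returns a comparable score could replace `jaccard` here.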

[0045] With reference to Figure 1, the present invention comprises the following steps:

[0046] (1) Manually...

Abstract

The invention relates to a method for processing Chinese minority scripts, in particular to a semantic ontology creation and vocabulary expansion method for the Tibetan language. The method comprises the following steps: (1) establishing an upper-level ontology on the basis of the HowNet Chinese dictionary; (2) expanding the synonyms of concepts appearing in the upper-level ontology by using definitions in an electronic dictionary; (3) applying a concept hyponymy pattern matching algorithm to the upper-level ontology against a multi-language ontology library to expand the concepts of the upper-level ontology; (4) searching for concept synonyms in the expanded ontology; (5) ranking the results from higher to lower similarity on the basis of an ontology concept lexical semantic similarity algorithm; (6) modifying the ranking results and editing the ontology. According to the method, the upper-level ontology is established on the basis of the HowNet Chinese dictionary, the levels of different concepts are defined according to the hyponymy relations in the ontology, and new semantic words can be obtained on the basis of hyponymy, so that the vocabulary of the existing Tibetan language ontology is expanded and the accuracy of Tibetan language information processing is greatly increased.
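As a rough illustration of how concept levels can follow from hyponymy and how step (3) might add concepts from a library, here is a minimal sketch. It assumes the ontology is stored as a mapping from each concept to its direct hyponyms and that the multi-language ontology library can be reduced to (hypernym, hyponym) pairs; both representations, the function names, and the sample concepts are assumptions, and the matching shown is a simple single-pass lookup, not the patent's pattern matching algorithm.

```python
from collections import deque


def concept_levels(hyponyms: dict[str, list[str]], root: str) -> dict[str, int]:
    """Breadth-first walk assigning each concept its depth below the root."""
    levels = {root: 0}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in hyponyms.get(parent, []):
            if child not in levels:  # guard against cycles and repeats
                levels[child] = levels[parent] + 1
                queue.append(child)
    return levels


def expand_with_library(hyponyms: dict[str, list[str]],
                        library_pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Add library (hypernym, hyponym) pairs whose hypernym is already in the ontology."""
    known = set(hyponyms) | {c for kids in hyponyms.values() for c in kids}
    expanded = {parent: list(kids) for parent, kids in hyponyms.items()}
    for hypernym, hyponym in library_pairs:
        # Single pass: only concepts present before expansion count as known.
        if hypernym in known and hyponym not in expanded.get(hypernym, []):
            expanded.setdefault(hypernym, []).append(hyponym)
    return expanded


# Hypothetical upper-level ontology and library entries.
upper = {"entity": ["animal", "artifact"], "animal": ["livestock"]}
library = [("livestock", "yak"), ("artifact", "tool"), ("planet", "moon")]

expanded = expand_with_library(upper, library)
levels = concept_levels(expanded, "entity")
print(levels)
```

The pair whose hypernym is not in the ontology ("planet") is ignored, while the other two pairs each add one new concept one level below their hypernym.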

Description

Technical field

[0001] The invention relates to a method for processing ethnic minority scripts, in particular to a method for Tibetan semantic ontology creation and vocabulary expansion.

Background

[0002] A concept defined in a dictionary is unambiguous in itself: it points uniquely and accurately to an entity or object in the real world. In sentence processing, however, concepts are represented by words. For example, the concept word "Trojan horse" can represent at least three concepts in the following three sentences:

[0003] (1) Trojan horse is a kind of toy.

[0004] (2) Trojan horse is a kind of sports equipment.

[0005] (3) Trojan horse is a virus.

[0006] Concept ambiguity therefore arises because one concept word can represent multiple concepts. Tibetan words likewise have different Chinese translations in different contexts:

[0007]

[0008] In addition, for Tibetan, there are many f...
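The concept ambiguity described in the background, where one concept word stands for several concepts, can be pictured as a one-to-many index from words to concepts. The sketch below is only an illustration: the function names are hypothetical, the three "Trojan horse" senses come from the example sentences, and the "yak" entry is an invented unambiguous word added for contrast.

```python
from collections import defaultdict


def build_sense_index(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (word, concept) pairs into a word -> list of concepts index."""
    index: dict[str, list[str]] = defaultdict(list)
    for word, concept in pairs:
        if concept not in index[word]:  # skip duplicate senses
            index[word].append(concept)
    return dict(index)


def ambiguous_words(index: dict[str, list[str]]) -> list[str]:
    """Return the words that map to more than one concept, sorted alphabetically."""
    return sorted(word for word, concepts in index.items() if len(concepts) > 1)


pairs = [
    ("Trojan horse", "toy"),
    ("Trojan horse", "sports equipment"),
    ("Trojan horse", "virus"),
    ("yak", "animal"),  # invented unambiguous entry, for contrast only
]

index = build_sense_index(pairs)
print(ambiguous_words(index))
```

Only words with more than one concept are reported, which is the case that requires context to resolve.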

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 邱莉榕
Owner: MINZU UNIVERSITY OF CHINA