Method and device for constructing Tibetan language question and answer corpus

A Tibetan-language corpus technology, applied in the field of big data processing, that addresses problems such as the lack of corpus data for Tibetan question answering, the inability to build a large-scale Tibetan question-and-answer corpus, and insufficient translation technology.

Active Publication Date: 2019-12-24
MINZU UNIVERSITY OF CHINA

AI Technical Summary

Problems solved by technology

[0003] Compared with the rich question-and-answer resources available in Chinese and English, Tibetan question-and-answer data is scarce and of a single type, and Chinese-Tibetan and English-Tibetan translation technology is insufficient. It is therefore difficult to apply Chinese or English question-answer corpora directly to Tibetan, and at present there is no way to build a large-scale Tibetan question-and-answer corpus.

Examples

Embodiment 1

[0023] Figure 1 shows a flow chart of a method for constructing a Tibetan question-and-answer corpus in an embodiment of the present invention. The method includes:

[0024] Step 101, taking a Tibetan triple entity as the central word entity, and obtaining all triples related to the central word entity;

[0025] Step 102, mapping all entities in all of the triples into correspondences between entities and labels;

[0026] Step 103, constructing a Tibetan question-and-answer corpus according to the correspondences and the central word entity.
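Steps 101 to 103 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Triple` type, the function names, the toy knowledge base with placeholder entities `A` to `F`, and the English question template are all assumptions made for the example. In particular, the source does not say how the question text is generated from a triple.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def related_triples(center, triples):
    """Step 101: keep every triple in which the central entity appears."""
    return [t for t in triples if center in (t.subject, t.obj)]


def map_labels(triples, entity_labels):
    """Step 102: build the entity -> label correspondence for all entities."""
    entities = {e for t in triples for e in (t.subject, t.obj)}
    return {e: entity_labels.get(e, "unknown") for e in entities}


def build_qa(center, triples, labels):
    """Step 103: emit (question, answer, answer label) per triple about center.

    The English template below is a hypothetical stand-in for a Tibetan one.
    """
    pairs = []
    for t in triples:
        if t.subject == center:
            question = f"What is the {t.relation} of {t.subject}?"
            pairs.append((question, t.obj, labels[t.obj]))
    return pairs


# Toy knowledge base; entity names and labels are placeholders.
KB = [
    Triple("A", "father", "B"),
    Triple("A", "birthplace", "C"),
    Triple("D", "father", "A"),
    Triple("E", "father", "F"),
]
ENTITY_LABELS = {"A": "person", "B": "person", "C": "place", "D": "person"}

related = related_triples("A", KB)
labels = map_labels(related, ENTITY_LABELS)
qa_pairs = build_qa("A", related, labels)
```

Here `related` holds the three triples that mention `A`, and `qa_pairs` contains one question for each triple whose subject is `A`.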

[0027] In step 101, a Tibetan triple entity can be randomly selected as the central word entity. As shown in Figure 2, the selected triple entity is Father, >

[0028] The labels in step 102 include shallow labels and deep labels. Shallow labels are not related to triple attributes and are generally categories such as person, place, or organization. Deep labels are related to triple attributes, such as the time of death,...
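The distinction between shallow and deep labels might be represented as in the sketch below. It assumes, without confirmation from the source, that a deep label is simply the name of the triple attribute attached to the object, and that shallow labels come from a separate entity-type lookup. `label_triple` and `ENTITY_TYPES` are hypothetical names.

```python
def label_triple(triple, entity_types):
    """Return {entity: (shallow_label, deep_label)} for one triple.

    Shallow label: a coarse category looked up independently of the triple.
    Deep label: taken from the triple's attribute (an assumption here).
    """
    subject, relation, obj = triple
    return {
        subject: (entity_types.get(subject), None),
        obj: (entity_types.get(obj), relation),
    }


# Placeholder data: "A" is a person, "1900" has no coarse category.
ENTITY_TYPES = {"A": "person"}
tagged = label_triple(("A", "time of death", "1900"), ENTITY_TYPES)
```

In this toy case the subject carries only the shallow label `person`, while the object carries only the deep label `time of death`.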

Embodiment 2

[0037] Embodiment 2 of the present invention builds on the construction of the Tibetan question-and-answer corpus in the above embodiment and adds a scheme for optimizing the natural sentences in the constructed corpus.

[0038] Figure 3 shows a schematic diagram of optimizing natural sentences in the Tibetan question-and-answer corpus in the embodiment of the present invention. The natural sentences in the corpus include template questions and real questions. The specific optimization steps are shown in Figure 4:

[0039] Step 201, calculating the vector of template questions in the Tibetan question-and-answer corpus and the vector of real questions in the Tibetan question-and-answer corpus;

[0040] Specifically, the word2vec tool is used to obtain a vector for each word, and the word vectors are added dimension by dimension to obtain the sentence vector. The vector of the template question is denoted Z, and the ...
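The sentence-vector computation in step 201 can be illustrated with plain Python lists. The sketch assumes a small hand-written lookup table in place of a trained word2vec model, and English placeholder tokens in place of Tibetan words. The cosine comparison at the end is also an assumption: the excerpt does not state how the template and real question vectors are compared.

```python
import math


def sentence_vector(tokens, word_vectors):
    """Sum the word vectors of a sentence dimension by dimension."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:  # skip out-of-vocabulary tokens
            continue
        total = [a + b for a, b in zip(total, vec)]
    return total


def cosine(u, v):
    """Cosine similarity (assumed comparison measure, not from the source)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy two-dimensional embeddings standing in for word2vec output.
WORD_VECTORS = {"who": [1.0, 0.0], "is": [0.5, 0.5], "father": [0.0, 1.0]}

Z = sentence_vector(["who", "is", "father"], WORD_VECTORS)  # template question
real = sentence_vector(["father", "who"], WORD_VECTORS)     # real question
similarity = cosine(Z, real)
```

With a trained model, the lookup table would be replaced by the model's word-to-vector mapping, and the rest of the computation would stay the same.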

Embodiment 3

[0047] Embodiment 3 of the present invention builds on the optimized construction of the Tibetan question-and-answer corpus in Embodiment 2 above and adds a scheme for expanding the corpus, in which an end-to-end neural network is trained.

[0048] As shown in Figure 5, the specific scheme includes:

[0049] Corpus construction: quadruples are constructed from the Tibetan question-and-answer corpus built according to Embodiment 1 and the effective template questions built in Embodiment 2, where the order of each quadruple is subject, relation, object, and question;

[0050] Encoding stage: use the TransE algorithm to obtain the vector expressions of entities and relations in the Tibetan question-and-answer corpus, obtain subject vector expressions, relation vector expressions, and object vector expressions, and form triple word vectors based on the subject vector expressions, relational vector expressio...
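For readers unfamiliar with TransE, the idea behind the encoding stage is that a valid triple should satisfy subject + relation ≈ object in the embedding space. The sketch below shows the standard TransE distance and margin-based ranking loss on hand-picked two-dimensional vectors. The vector values and the margin are illustrative assumptions, and the sketch does not cover how the patent combines the resulting vectors afterwards.

```python
import math


def transe_distance(h, r, t):
    """L2 distance between (h + r) and t; small for plausible triples."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss: push a true triple closer than a corrupted one."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))


# Illustrative embeddings (not learned): subject + relation lands on obj.
subject = [0.1, 0.2]
relation = [0.3, 0.1]
obj = [0.4, 0.3]
corrupted_obj = [0.9, 0.9]  # a wrong object used as a negative sample

good = transe_distance(subject, relation, obj)
bad = transe_distance(subject, relation, corrupted_obj)
loss = margin_loss((subject, relation, obj), (subject, relation, corrupted_obj))
```

During training, the embeddings would be adjusted by gradient descent to reduce this loss over many true and corrupted triples.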

Abstract

The invention provides a method and device for constructing a Tibetan question-and-answer corpus, and belongs to the field of big data processing. The method comprises the steps of selecting a Tibetan triple entity as a central word entity and obtaining all triples related to the central word entity; mapping all entities in all of the triples into correspondences between entities and labels; and constructing a Tibetan question-and-answer corpus according to the correspondences and the central word entity. Under this scheme, the corpus is constructed by finding all triples related to the Tibetan triple entity and mapping them into correspondences between entities and labels, which overcomes the time and labor costs of manual construction.

Description

Technical field

[0001] The invention relates to the technical field of big data processing, in particular to a method and equipment for constructing a Tibetan question-and-answer corpus.

Background

[0002] Question answering systems have been an important research focus in natural language processing in recent years. They allow users to ask questions in natural language and return a relatively accurate and satisfactory answer.

[0003] Compared with the rich question-and-answer resources available in Chinese and English, Tibetan question-and-answer data is scarce and of a single type, and Chinese-Tibetan and English-Tibetan translation technology is insufficient. It is therefore difficult to apply Chinese or English question-answer corpora directly to Tibetan, and at present there is no way to build a large-scale Tibetan question-and-answer corpus.

Summary of the invention

[0004] The embodiment of the present inven...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
CPC: Y02D10/00
Inventor: 孙媛, 夏天赐
Owner: MINZU UNIVERSITY OF CHINA