An Auto-Enlargement Method of Domain Dictionary Based on Vocabulary Annotation

A technique for automatically expanding domain dictionaries, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems in existing methods such as ignoring the correlation between domains, failing to use existing dictionary resources effectively, and skewed domain labeling of words.

Active Publication Date: 2015-10-28
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

Most existing semi-automatic and fully automatic methods for building domain dictionaries require a domain corpus. The quality of the generated domain dictionary depends on the quality of the corpus used, and its completeness is limited by the size of that corpus. In addition, because of corpus imbalance, the domain labeling of words tends to be skewed toward domains with larger corpora.
These methods also fail to make effective use of existing dictionary resources and do not consider the correlation between domains.

Embodiment Construction

[0039] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0040] Table 1 shows the vocabulary counts of, and the overlaps between, the domain dictionaries for the four domains of communication, aviation, machinery and computer in the Huajian machine translation dictionary. The four domain dictionaries contain 12626, 7592, 19250 and 5156 words respectively. The communication and aviation dictionaries share 4432 words; communication and machinery share 6210; communication and computer share 2705; aviation and machinery share 4908; aviation and computer share 2064; and machinery and computer share 2383.
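To make these overlap counts easier to compare, the following minimal sketch normalizes each pairwise intersection by the size of the union of the two dictionaries (the Jaccard coefficient). This is an illustrative measure only; the excerpt does not state which relevancy measure the patent uses. The names `sizes`, `overlap`, `jaccard` and `ranked_pairs` are hypothetical, and the count 2064 is assumed to belong to the aviation and computer pair, the only pair not otherwise listed.

```python
from itertools import combinations

# Dictionary sizes as reported for Table 1.
sizes = {
    "communication": 12626,
    "aviation": 7592,
    "machinery": 19250,
    "computer": 5156,
}

# Pairwise intersection counts as reported for Table 1.
# Assumption: 2064 is the aviation/computer overlap.
overlap = {
    frozenset(("communication", "aviation")): 4432,
    frozenset(("communication", "machinery")): 6210,
    frozenset(("communication", "computer")): 2705,
    frozenset(("aviation", "machinery")): 4908,
    frozenset(("aviation", "computer")): 2064,
    frozenset(("machinery", "computer")): 2383,
}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| computed from counts alone."""
    inter = overlap[frozenset((a, b))]
    return inter / (sizes[a] + sizes[b] - inter)


def ranked_pairs():
    """Return all domain pairs, most strongly overlapping first."""
    return sorted(combinations(sizes, 2), key=lambda p: jaccard(*p), reverse=True)


if __name__ == "__main__":
    for a, b in ranked_pairs():
        print(f"{a:13s} {b:13s} {jaccard(a, b):.4f}")
```

Under this particular normalization, communication and aviation come out as the most closely related pair and machinery and computer as the least, even though communication and machinery have the largest raw intersection.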

[0041] T...

Abstract

The invention relates to a method for automatically extending a domain dictionary based on vocabulary annotation and belongs to the technical field of natural language processing. The method comprises the following steps: (1) building a domain classification tree by analyzing the relevancy between the domain dictionaries of the individual domains; (2) obtaining a training set for each domain dictionary to be extended; (3) preprocessing the training set to obtain a corpus feature set; (4) for each node, counting how many times each word in the node's corpus feature set occurs in that set, and how many of the corpus feature sets belonging to the node's subordinate nodes contain the word; (5) calculating the confidence of each word in each corpus feature set; (6) adding new words to the domain dictionary to be extended. The method does not require a domain corpus to be collected manually, and so avoids the effects of domain corpus quality, limited scale, and imbalance.
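Steps (4) to (6) can be illustrated with a rough sketch. The abstract does not give the confidence formula, so the code below substitutes a TF-IDF-style score as a stand-in: a word's occurrence count in the target feature set, weighted by how few of the feature sets contain it. The classification tree is also flattened into a plain mapping of feature sets. The functions `score_candidates` and `extend_dictionary`, the `threshold` parameter, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def score_candidates(feature_sets, target):
    """Score each word in the target feature set.

    feature_sets: dict mapping a domain name to a list of tokens.
    The score is a TF-IDF-style stand-in, not the patent's formula.
    """
    counts = Counter(feature_sets[target])  # occurrences per word (step 4)
    n_sets = len(feature_sets)
    scores = {}
    for word, count in counts.items():
        # Number of feature sets that contain the word (step 4).
        containing = sum(1 for tokens in feature_sets.values() if word in tokens)
        # Words found in every set score 0; domain-specific words score higher (step 5).
        scores[word] = count * math.log(n_sets / containing)
    return scores


def extend_dictionary(dictionary, feature_sets, target, threshold):
    """Return the dictionary plus words whose score reaches the threshold (step 6)."""
    scores = score_candidates(feature_sets, target)
    new_words = {w for w, s in scores.items() if s >= threshold and w not in dictionary}
    return set(dictionary) | new_words


if __name__ == "__main__":
    sample = {
        "aviation": ["wing", "engine", "wing", "system"],
        "computer": ["cpu", "system", "memory"],
    }
    print(extend_dictionary({"engine"}, sample, "aviation", threshold=1.0))
```

In this toy input, "system" appears in both feature sets and therefore scores zero, while "wing" occurs twice in the aviation set only and passes the threshold.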

Description

Technical field

[0001] The invention relates to a method for automatically expanding a domain dictionary, in particular to a method for automatically expanding a domain dictionary based on vocabulary annotation, and belongs to the technical field of natural language processing.

Background technique

[0002] A domain dictionary is a collection of terms or expressions specific to a particular domain. Domain dictionaries are basic resources for natural language processing. Domain knowledge is widely used in word sense disambiguation and syntactic analysis for tasks such as machine translation, information retrieval, data mining, and text classification. The scale and quality of domain dictionaries directly affect the performance of these applications.

[0003] Methods for constructing and expanding domain dictionaries can be divided into three categories according to their degree of automation: manual construction and expansion methods based on expert knowledge...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/27
Inventor: 黄河燕, 史树敏, 朱朝勇
Owner: BEIJING INSTITUTE OF TECHNOLOGY