Multi-word expression extraction method and device

A multi-word expression extraction technology, applied in special data processing applications, instruments, and electronic digital data processing. It addresses the low accuracy of multi-word expression databases and the inability to obtain multi-word expressions in one pass, and aims to improve the utilization rate of multi-word expressions and the accuracy of the resulting database.

Status: Inactive; Publication Date: 2017-05-10
CAS HEFEI INST OF TECH INNOVATION
5 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

However, existing statistical methods face several problems. One-dimensional mutual information requires a manually set threshold and adapts poorly to different data. It is also limited to two-word (binary) structures, so expressions made of more than two words cannot be obtained in one pass and must instead be built up step by step. As a result, the accuracy of multi-word expression database construction is low.


Examples

Embodiment Construction

[0031] A multi-word expression extraction method comprises the following steps in sequence: (1) the document library is preprocessed by word segmentation, part-of-speech tagging, and the like to form source-language documents; (2) the mutual information of adjacent words across the documents is calculated, and the jump information before and after each value in the mutual information sequence is then calculated; (3) the mutual information sequence and the jump information sequence together form a two-dimensional mutual information set; (4) a classifier is applied to the two-dimensional mutual information set to separate multi-word-expression inliers from outliers, and links of consecutive inliers are selected to construct multi-word expressions. The procedure is shown in Figure 1.
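Steps (2) and (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the excerpt does not give the formulas, so the sketch assumes pointwise mutual information, log2(P(a,b) / (P(a)P(b))), for each adjacent word pair, and assumes that the "jump" of a value is its difference from the mean of its neighbours in the sequence. The function names `adjacent_pmi`, `jumps`, and `two_dimensional_set` are hypothetical.

```python
import math
from collections import Counter


def adjacent_pmi(sentences):
    """For each tokenised sentence, return [(pair, pmi), ...] over adjacent words.

    Assumption: mutual information is taken as pointwise mutual information
    estimated from unigram and adjacent-bigram counts over all sentences.
    """
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(p for s in sentences for p in zip(s, s[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    result = []
    for s in sentences:
        row = []
        for a, b in zip(s, s[1:]):
            p_ab = bigrams[(a, b)] / n_bi
            p_a = unigrams[a] / n_uni
            p_b = unigrams[b] / n_uni
            row.append(((a, b), math.log2(p_ab / (p_a * p_b))))
        result.append(row)
    return result


def jumps(mi_values):
    """Assumed jump definition: value minus the mean of its existing neighbours."""
    out = []
    for i, v in enumerate(mi_values):
        neighbours = mi_values[max(i - 1, 0):i] + mi_values[i + 1:i + 2]
        out.append(v - sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return out


def two_dimensional_set(sentences):
    """Pair every adjacent-word mutual information value with its jump value."""
    rows = []
    for row in adjacent_pmi(sentences):
        mi = [value for _, value in row]
        rows.append([(pair, m, j) for (pair, _), m, j in zip(row, mi, jumps(mi))])
    return rows
```

Each sentence yields a list of `(word_pair, mutual_information, jump)` triples; the last two fields are the two dimensions that step (4) would receive.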

[0032] The present invention is further described below with reference to Figure 1.

[0033] In step (1), Chinese word segmentation, part-of-speech tagging, named entity recognition, and part-of-speech selection are performed on all t...

Abstract

The invention relates to a multi-word expression extraction method and device. In the method, a document library is preprocessed to form a vocabulary set; the mutual information of every two adjacent words in the documents is calculated; the transient (jump) information before and after each mutual information value is obtained; the mutual information and the transient information together form two-dimensional mutual information; multi-word expressions are screened out by clustering the two-dimensional mutual information; and a multi-word expression library is then constructed. The method avoids two problems of one-dimensional mutual information: its threshold must be set manually, and it adapts poorly to different data. The method is also not limited to two-word (binary) structures, so multi-word expressions consisting of several words can be acquired in one pass rather than step by step. The utilization rate of multi-word expressions is effectively increased, and the accuracy of multi-word expression library construction is improved.
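The screening step can be sketched as follows. The record does not name the clustering algorithm, so this sketch swaps in a plain two-cluster k-means as a stand-in and assumes that the cluster with the higher mean mutual information holds the inliers. The names `two_means` and `link_inliers` are hypothetical, and the input is assumed to be one sentence's sequence of (mutual information, transient information) points, one per adjacent word pair.

```python
def two_means(points, iters=20):
    """Split 2-D points into two clusters and return one inlier flag per point.

    Assumption: k-means with k=2 stands in for the unspecified clustering,
    and the cluster whose centroid has the higher mutual information
    (first coordinate) is treated as the inlier cluster.
    """
    centroids = [min(points), max(points)]  # deterministic initialisation
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min((0, 1), key=lambda k: (p[0] - centroids[k][0]) ** 2
                                      + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    inlier = 0 if centroids[0][0] > centroids[1][0] else 1
    return [label == inlier for label in labels]


def link_inliers(words, flags):
    """Join runs of consecutive inlier pairs into multi-word expressions.

    flags[i] is True when the pair (words[i], words[i + 1]) is an inlier.
    """
    expressions, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        if not flag and start is not None:
            expressions.append(words[start:i + 1])
            start = None
    if start is not None:
        expressions.append(words[start:len(flags) + 1])
    return expressions
```

Because runs of any length are linked, an expression of three or more words comes out in a single pass, which is the property the abstract contrasts with step-by-step binary merging.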

Description

Technical field

[0001] The invention relates to the technical field of statistical machine translation and cross-language information retrieval, and in particular to a multi-word expression extraction method and device.

Background art

[0002] A multi-word expression is a combination of multiple words that has grammatical, semantic, or pragmatic characteristics and a complete meaning. Recognizing multi-word expressions can improve the efficiency and accuracy of word segmentation, part-of-speech tagging, and machine translation. In machine translation, correctly identifying multi-word expressions in the source language helps in choosing a suitable translation and avoids the unnatural or even unintelligible target-language output that results from translating each word separately.

[0003] Extraction methods for multi-word expressions are broadly divided into statistical methods and rule-based methods. Rule-based methods generally study a certain type such as verb phr...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/216, G06F40/295, G06F40/30
Inventor: 朱泽德, 曾新华, 郑守国, 孙熊伟, 翁士状
Owner: CAS HEFEI INST OF TECH INNOVATION