Chinese text term extracting method utilizing quadratic mutual information

A technology in the field of mutual information and terminology, applied to special data processing applications and natural language data processing. It addresses the low accuracy of term extraction based on a single feature, and achieves strong domain adaptability, low-noise extraction results, and high accuracy.

Active Publication Date: 2017-02-22
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0021] The present invention proposes a Chinese text term extraction method using quadratic mutual information. It addresses the low accuracy of term extraction that relies on a single feature by combining three statistical features (quadratic mutual information, word frequency, and word length) with part-of-speech features, and by removing redundant terms based on the nesting relationships between terms.
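This excerpt does not define the quadratic mutual information statistic, so the following is only a minimal sketch of ordinary pointwise mutual information (PMI) between adjacent words, a common building block for association-based term extraction. The function name `adjacent_pmi` and the sample tokens are hypothetical, and the input is assumed to be already word-segmented.

```python
import math
from collections import Counter

def adjacent_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in a token list.

    Assumption: plain PMI, log2(P(w1, w2) / (P(w1) * P(w2))), with
    probabilities estimated from raw counts. This is NOT the patent's
    quadratic mutual information, which the excerpt does not define.
    """
    pairs = list(zip(tokens, tokens[1:]))
    word_freq = Counter(tokens)
    pair_freq = Counter(pairs)
    n_words = len(tokens)
    n_pairs = len(pairs)
    scores = {}
    for (w1, w2), count in pair_freq.items():
        p_pair = count / n_pairs
        p_w1 = word_freq[w1] / n_words
        p_w2 = word_freq[w2] / n_words
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Hypothetical segmented input
tokens = ["水质", "分析", "技术", "用于", "水质", "分析"]
scores = adjacent_pmi(tokens)
```

A higher score suggests that two words co-occur more often than their individual frequencies would predict. In a combined-feature method such a score would be one signal alongside word frequency, word length, and part of speech.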

Examples

Embodiment Construction

[0061] To better illustrate the purpose and advantages of the present invention, its implementation is described in further detail below with examples.

[0062] The data source used in the experiment is the book "Water Supply Water Quality Testing 3: Water Quality Analysis Technology", published by Water Conservancy and Hydropower Press on October 1, 2014. It consists of four chapters and 18 subsections. In the experiment, terms were extracted from the text of each chapter and section of the book; the statistics of the data source are shown in Table 1.

[0063] Table 1. Data source of term extraction experiment

[0064]

[0065] Here, the effective number of words is the number of words remaining after all characters other than hyphens and Chinese characters have been removed.
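The filtering described in [0065] can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), that "hyphen" means the ASCII `-`, and that the count is taken per retained character. The function name `effective_length` is hypothetical.

```python
import re

# Assumption: keep only CJK Unified Ideographs (U+4E00..U+9FFF) and the
# ASCII hyphen; everything else (punctuation, Latin letters, digits,
# whitespace) is discarded before counting.
_KEEP = re.compile(r"[\u4e00-\u9fff-]")

def effective_length(text: str) -> int:
    """Count the characters left after dropping everything except
    Chinese characters and hyphens."""
    return len(_KEEP.findall(text))

# Hypothetical sample mixing Chinese text, a hyphen, Latin letters, and digits
sample = "水质-分析, abc 123。"
count = effective_length(sample)
```

If the source text uses other hyphen forms or rarer ideographs outside this block, the character class would need to be widened accordingly.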

[0066] The experimental procedure is as follows:

[0067] Step 1, import the experimental ...

Abstract

The invention relates to a Chinese text term extraction method using quadratic mutual information and belongs to the field of computer science and natural language processing. The method first expands each core word forward and backward into multi-word candidate terms, combining quadratic mutual information, word frequency, word length, and part-of-speech features. It then removes redundant terms from the candidate set and scores and ranks the terms according to the nesting relationships between terms, word length, and word frequency. The method takes into account both the linguistic rules and the statistical characteristics of terms, which improves the accuracy of term extraction.
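The redundancy-removal and ranking stage can be illustrated with a small sketch. The abstract does not give the actual rules, so everything here is an assumption: a candidate is treated as redundant when a longer candidate contains it and occurs at least as often, and the ranking key (frequency, then length) is only a placeholder. The names `remove_nested` and `rank_terms` and the sample counts are hypothetical.

```python
def remove_nested(candidates):
    """Drop candidates nested inside a longer candidate.

    candidates: dict mapping term -> frequency.
    Assumed rule (not from the patent): a term is redundant if some
    other candidate contains it as a substring and occurs at least as
    often, meaning the shorter term never appears on its own.
    """
    kept = {}
    for term, freq in candidates.items():
        absorbed = any(
            term != other and term in other and other_freq >= freq
            for other, other_freq in candidates.items()
        )
        if not absorbed:
            kept[term] = freq
    return kept

def rank_terms(candidates):
    """Placeholder ranking: higher frequency first, then longer terms."""
    return sorted(candidates, key=lambda t: (candidates[t], len(t)), reverse=True)

# Hypothetical candidate set with frequencies
candidates = {"水质": 10, "水质分析": 4, "分析技术": 4, "水质分析技术": 4}
kept = remove_nested(candidates)
ranked = rank_terms(kept)
```

In this example "水质" survives because it occurs more often than any longer candidate containing it, while "水质分析" and "分析技术" are absorbed by "水质分析技术".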

Description

Technical field

[0001] The invention relates to a Chinese text term extraction method using quadratic mutual information, and belongs to the technical field of computer science and natural language processing.

Background art

[0002] Term extraction is the process of extracting from a text the phrases that describe its subject, are complete, and are representative of its domain. It is an important basic research task in natural language processing, with important applications in fields such as automatic summarization, information retrieval, and text classification and clustering.

[0003] Terminology is the set of designations used to represent concepts in a specific field, also known as professional terms or technical terms. The characteristics of terms are usually summarized as two measurable features: 1) Unithood, which measures whether a term can express an independent and complete meaning and has a stable struct...

Application Information

IPC(8): G06F17/27
CPC: G06F40/284
Inventor: 罗森林, 陈倩柔, 潘丽敏, 吴舟婷
Owner: BEIJING INSTITUTE OF TECHNOLOGY