Chinese phrase string-based fine-grained thematic information extraction method

A fine-grained topic information extraction technology, applied in semantic analysis, natural language data processing, special-purpose data processing applications, etc. It addresses problems such as the lack of research on Chinese text mining technology, the complexity of Chinese characters, and the late development of text mining technology.

Status: Inactive. Publication date: 2016-09-28
SOUTH CHINA UNIV OF TECH
Cites: 3; Cited by: 17

AI Technical Summary

Problems solved by technology

[0003] Due to the complexity of Chinese characters and the lack of research on Chinese text mining technology, domestic text mining technology developed relatively late.

Examples

Embodiment Construction

[0070] The embodiments of the present invention will be further described below in conjunction with the examples, but the implementation of the present invention is not limited thereto.

[0071] The following example uses text about the Chinese economy retrieved from the Internet. A set of sentences is extracted from the collected texts for illustration, and the subsequent steps are carried out on them.

[0072] (1) Global economic and trade growth is sluggish because the international economic crisis that began in 2008 has not fully receded and its impact persists.

[0073] (2) As China's aggregate economic output has become the second largest in the world, many major countries have become somewhat defensive towards China, and some neighboring countries are also somewhat hostile towards it.

[0074] (3) China has entered the middle-income stage; the gap between rich and poor remains relatively large, some social contradictions are intensifying, and ordinary people's demands for...

Abstract

The invention discloses a fine-grained topic information extraction method based on Chinese phrase strings. The method comprises the following steps. First, pre-processing such as Chinese word segmentation, stop-word removal and part-of-speech tagging is applied to the input original text set; during pre-processing, an expanded vocabulary is supplied as additional input to improve the accuracy of Chinese word segmentation. When the pre-processing stage is finished, a processed, structured text set is obtained. Part-of-speech-based regular expression matching is then performed to obtain a preliminary phrase screening result. Finally, string-frequency statistics are computed for each word, seed words are selected, and the phrases are expanded to obtain the final phrase extraction result. Experiments show that the method can extract text phrases effectively and concisely, with a certain degree of reliability and applicability.
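The pre-processing stage described above (segmentation with an expanded vocabulary, stop-word removal, part-of-speech tagging) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not name a segmenter, so forward maximum matching over a toy lexicon stands in for it, and `BASE_LEXICON`, `STOP_WORDS`, `fmm_segment`, `preprocess` and the `user_vocab` parameter are all hypothetical names. It shows how merging extra vocabulary into the dictionary keeps a domain term such as 经济危机 ("economic crisis") whole instead of splitting it.

```python
# Hypothetical toy lexicon (word -> POS tag: n=noun, v=verb, a=adjective, u=particle).
# Illustrative only; a real pipeline would use a full Chinese segmenter and tagger.
BASE_LEXICON = {
    "全球": "n", "贸易": "n", "增长": "v", "乏力": "a",
    "国际": "n", "经济": "n", "危机": "n", "的": "u",
}
# Hypothetical stop-word list.
STOP_WORDS = {"的", "了", "是"}


def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position, take the longest lexicon word.

    Characters not covered by any lexicon word fall back to single-character tokens.
    """
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def preprocess(text, lexicon, stop_words=STOP_WORDS, user_vocab=None):
    """Segment, drop stop words and attach POS tags.

    user_vocab (hypothetical) is the expanded vocabulary: extra word -> tag
    entries merged into the lexicon so that domain terms are kept whole.
    """
    merged = {**lexicon, **(user_vocab or {})}
    max_len = max(len(w) for w in merged)
    return [
        (w, merged.get(w, "x"))  # "x" marks words with no known tag
        for w in fmm_segment(text, merged, max_len)
        if w not in stop_words
    ]


if __name__ == "__main__":
    text = "国际的经济危机"
    print(preprocess(text, BASE_LEXICON))
    # [('国际', 'n'), ('经济', 'n'), ('危机', 'n')]
    print(preprocess(text, BASE_LEXICON, user_vocab={"经济危机": "n"}))
    # [('国际', 'n'), ('经济危机', 'n')]
```

Without the extra entry the term is split into 经济 / 危机; with it, maximum matching prefers the longer dictionary word, which is the effect the expanded vocabulary input is meant to have on segmentation accuracy.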
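The later stages (part-of-speech-based regular expression screening, word-frequency statistics, seed selection and phrase expansion) might look roughly like the sketch below. The abstract gives neither the actual pattern, the frequency measure nor the expansion rule, so these are assumptions: the pattern `[an]*n` (adjectives/nouns ending in a noun), seeds taken as the most frequent nouns, and "expansion" read as growing each seed to the full regex-matched phrase that contains it. `PHRASE_PATTERN`, `candidate_phrases` and `extract_phrases` are hypothetical names, and the tagged input is toy data.

```python
import re
from collections import Counter

# Hypothetical POS pattern: a run of adjectives/nouns ending in a noun head
# (e.g. "n n", "a n n"). Tags must be single characters, so offsets in the
# joined tag string coincide with token indices.
PHRASE_PATTERN = re.compile(r"[an]*n")


def candidate_phrases(tagged_sentence, min_tokens=2):
    """Preliminary screening: token spans whose POS sequence matches PHRASE_PATTERN."""
    tags = "".join(tag for _, tag in tagged_sentence)
    spans = []
    for m in PHRASE_PATTERN.finditer(tags):
        if m.end() - m.start() >= min_tokens:
            spans.append(tuple(w for w, _ in tagged_sentence[m.start():m.end()]))
    return spans


def extract_phrases(tagged_corpus, num_seeds=1, min_tokens=2):
    """Seed-and-expand phrase extraction over a structured (tagged) text set."""
    # Frequency statistics for every word across the whole text set.
    word_freq = Counter(w for sent in tagged_corpus for w, _ in sent)
    nouns = {w for sent in tagged_corpus for w, t in sent if t == "n"}
    # Hypothetical seed rule: the num_seeds most frequent nouns.
    ranked_nouns = [w for w, _ in word_freq.most_common() if w in nouns]
    seeds = set(ranked_nouns[:num_seeds])
    # Expansion: grow each seed to the full candidate phrase that contains it.
    phrase_freq = Counter()
    for sent in tagged_corpus:
        for span in candidate_phrases(sent, min_tokens):
            if seeds & set(span):
                phrase_freq["".join(span)] += 1
    # Most frequent phrases first; ties broken by the phrase string for determinism.
    return sorted(phrase_freq.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    corpus = [
        [("全球", "n"), ("贸易", "n"), ("增长", "v"), ("乏力", "a")],
        [("国际", "n"), ("经济", "n"), ("危机", "n"), ("影响", "v")],
        [("中国", "n"), ("经济", "n"), ("总量", "n")],
        [("经济", "n"), ("增长", "v")],
    ]
    print(extract_phrases(corpus))
    # [('中国经济总量', 1), ('国际经济危机', 1)]
```

Encoding each tag as one character lets the standard `re` module screen phrases directly on the tag string; a tag set with multi-character tags (such as `vn`) would first need to be mapped to single characters.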

Description

Technical field

[0001] The invention relates generally to the field of text mining, and specifically to a method for extracting fine-grained topic information based on Chinese phrase strings.

Background art

[0002] With the continuous development of the Internet, the volume of information has grown explosively. In recent years, "big data" and cloud computing technologies have attracted intense attention and have been applied in many different fields. The present method, topic information extraction based on Chinese phrase strings, belongs to text mining technology. In the era of information explosion, people passively receive a large amount of useless information, such as emails, advertisements and false news on the Internet, which wastes a great deal of time and energy. Although search engines can help people obtain specific information to some extent, they still cannot dig deeper into hidden useful information. Therefore, people urgently need to search for ...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30; G06F17/27
CPC: G06F16/313; G06F40/253; G06F40/30
Inventors: 黄翰 (Huang Han), 丁东辉 (Ding Donghui), 林伟佳 (Lin Weijia), 郝志峰 (Hao Zhifeng), 杨晓伟 (Yang Xiaowei)
Owner: SOUTH CHINA UNIV OF TECH