Text event classification method based on CHI feature selection

A feature selection and classification technique, applied to unstructured text data retrieval and special data processing applications, among other areas. It addresses the lack of event classification methods and improves analysis efficiency and accuracy.

Inactive Publication Date: 2015-10-07
NANJING NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

At present, text classification mostly relies on classification systems, pattern matching, and machine learning, but a complete method for classifying the events described in text is lacking.



Examples


Embodiment Construction

[0028] Further details will be given below in conjunction with the accompanying drawings and embodiments.

[0029] The overall process of this method is shown in Figure 1. This example uses online news reports as the original corpus for model training. A total of 9 topic categories are selected, including: automobile, finance, IT, health, sports, tourism, education, recruitment, culture, and military texts. There are 2,000 texts of each type, 18,000 in total. The corpus covers most event topics in social life, with high coverage, rich features, and moderate size, so it provides sufficient material for training and testing. As the text to be classified, 20 online news reports of a "rainstorm" event were selected as the implementation example.

[0030] (1) Classification model training process:

[0031] Step 11: Text training corpus selection, that is, selecting the training corpus from online texts.

[0032] Step 12: text cor...


Abstract

The present invention discloses a text event classification method based on CHI feature selection, comprising a classification model training process applied to a training corpus and a text classification process. The method analyzes the language description features of events in Chinese text, uses CHI values as topic feature vectors, and builds feature files and a training template from the selected training corpus in order to classify text event information. The model training process comprises the following steps: (1) selecting the text training corpus; (2) preprocessing the text corpus; (3) selecting category features and generating a feature file set; (4) generating text feature vectors, normalizing them, and generating a feature vector file; and (5) training an SVM model. The text classification process is similar to the model training process. The method can be widely applied to identifying, classifying, analyzing, and monitoring Chinese text in data mining, and can effectively improve the analysis efficiency and accuracy of Chinese natural language processing.
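Steps (3) and (4) can be illustrated with a minimal sketch. It assumes the standard chi-square (CHI) statistic over a 2x2 document-count table for each term and category; the function names, the toy tokens, the use of raw term counts as vector entries, and the L2 normalization are all illustrative assumptions, not details taken from the patent text.

```python
from collections import Counter, defaultdict
import math


def chi_square(a, b, c, d):
    """CHI statistic for one (term, category) pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, top_k):
    """Return {category: top_k terms ranked by CHI} (hypothetical helper).

    docs is a list of token lists, assumed to be already segmented and
    cleaned by the preprocessing step.
    """
    n_docs = len(docs)
    cat_count = Counter(labels)
    df = Counter()                  # term -> number of documents containing it
    df_cat = defaultdict(Counter)   # category -> term -> document count
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_cat[label][term] += 1

    features = {}
    for cat, n_cat in cat_count.items():
        scores = {}
        for term, n_term in df.items():
            a = df_cat[cat][term]
            b = n_term - a
            c = n_cat - a
            d = n_docs - n_cat - b
            scores[term] = chi_square(a, b, c, d)
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        features[cat] = ranked[:top_k]
    return features


def to_unit_vector(tokens, vocabulary):
    """Count vector over the selected vocabulary, L2-normalized (assumed)."""
    counts = Counter(tokens)
    vec = [float(counts[t]) for t in vocabulary]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Toy corpus: two "weather" documents and two "sports" documents.
docs = [["rain", "flood"], ["rain", "storm"], ["match", "goal"], ["goal", "team"]]
labels = ["weather", "weather", "sports", "sports"]
features = select_features(docs, labels, top_k=2)
vector = to_unit_vector(["rain", "rain", "flood"], ["flood", "rain"])
```

Because the statistic squares the difference `a * d - b * c`, a term that is strongly absent from a category scores as highly as one that is strongly present. Implementations that want only positively associated terms typically add a sign check. The normalized vectors would then be passed to an SVM trainer of the reader's choice.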

Description

Technical field

[0001] The invention belongs to the field of data mining of geographic information, and specifically discloses a text event classification method based on CHI feature selection.

Background technique

[0002] With the explosive growth of Internet resources, classification technology has become a field of concern and a research hotspot. According to survey reports from global technology research and consulting companies, at least 95% of human-computer interaction information in the next 10 years will be text language. Events are the basic unit of people's cognition and understanding of the world; an event description includes temporal and spatial information, attribute element information, and the semantic relationships between the elements of the event.

[0003] However, the effective use of event information and knowledge discovery in text has become an urgent problem in the field of text data mining. Text classification can not on...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/36, G06F16/353
Inventors: 张雪英, 王曙, 顾佳诚, 廖健平, 朱瑞军
Owner: NANJING NORMAL UNIVERSITY