Method and device for classifying text

A text classification technology, applied in fields such as special data processing applications, instruments, and electrical digital data processing, that addresses the problem that existing methods do not consider the correlation and mutual influence between words.

Inactive Publication Date: 2016-11-23
HITACHI LTD
4 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, the problem with such a classification method is that the words contained in the text are considere



Examples


Embodiment Construction

[0029] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0030] In the following description of the present invention, a single sentence, several sentences, or short phrases are used as an example of a text. Note, however, that this is done only for convenience in describing the embodiments and should not be taken as the actual processing scenario. In practice, it is preferable to treat a paragraph or an article as a text.

[0031] With the text classification method provided according to the embodiment of the present invention, texts can be divided into ordinary texts and valuable texts according to their value (amount of effective information). Ordinary texts are those considered to have a smaller value (amount of effective information), that is, texts of ...


Abstract

The invention discloses a method for classifying texts. The method comprises: establishing a training text set and generating a first text classifier and a second text classifier; preprocessing the to-be-classified text by replacing text noise in it with a replacement string; and computing the probability of the replacement string. When the probability is greater than or equal to a filtering threshold of the first text classifier, the to-be-classified text is classified as common text. When the probability is smaller than the filtering threshold, word segmentation is performed on the preprocessed text, and a first, a second, and a third text representation of the text are established. Based on a feature representation method, a first, a second, and a third text feature representation are calculated from the respective text representations, and the second text classifier uses these three feature representations to classify the to-be-classified text. Also disclosed is a device for classifying texts.
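The first-stage filter described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions made only for illustration, since the abstract does not specify them: the noise patterns (URLs, digit runs, punctuation runs), the replacement string `<NOISE>`, whitespace tokenization, and the definition of the "probability" as the share of tokens equal to the replacement string. The names `NOISE_RE`, `replacement_probability`, and `prefilter` are hypothetical. The second stage (three text representations and the second classifier) is not sketched because the abstract does not define it.

```python
import re

# Hypothetical noise definition: URLs, digit runs, and punctuation runs.
# The patent abstract does not say what counts as "text noise".
NOISE_RE = re.compile(r"https?://\S+|\d+|[^\w\s]+")

# Hypothetical replacement string substituted for every noise span.
REPLACEMENT = "<NOISE>"


def replacement_probability(text: str) -> float:
    """Share of whitespace-separated tokens that are the replacement string.

    Assumption: the 'probability of the replacement string' is taken to be
    its relative frequency among tokens after preprocessing.
    """
    # Single-pass substitution, so the inserted token is never re-scanned.
    tokens = NOISE_RE.sub(f" {REPLACEMENT} ", text).split()
    if not tokens:
        return 1.0  # assumption: an empty text is treated as pure noise
    return tokens.count(REPLACEMENT) / len(tokens)


def prefilter(text: str, threshold: float) -> str:
    """Return 'common' if the text is filtered out at the first stage,
    otherwise 'second_stage' to indicate it goes on to word segmentation
    and the second classifier."""
    if replacement_probability(text) >= threshold:
        return "common"
    return "second_stage"


if __name__ == "__main__":
    for sample in ["Visit http://a.b now!!!", "hello world", "!!! 123 ???"]:
        print(sample, replacement_probability(sample), prefilter(sample, 0.6))
```

In this sketch the threshold is passed in by the caller; in the method as described, it is a property of the first text classifier generated from the training text set.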

Description

Technical field

[0001] The invention relates to a text classification method and device.

Background

[0002] With the continuous development of information technology, the amount of text information people face is increasing day by day, and there are more and more channels for obtaining it, for example, browsing the web, using search engines for information retrieval, and receiving emails. However, among the massive text information available to users, the value (effective information amount) of the text information is uneven. Therefore, classifying text information according to the value (effective information amount) it contains is an effective means of organizing and managing text information. Such classification separates text information by value (effective information amount), in order to facilitate the further processing and utilization of text information with higher value and reduce the waste caused by the processing of text information with lower valu...


Application Information

IPC(8): G06F17/30
Inventor 周樟俊张学
Owner HITACHI LTD