Text data analysis method and device, server and storage medium

A text data analysis technology, applied in text database clustering/classification, unstructured text data retrieval, semantic analysis, etc. It addresses the problem that an incomplete subject word vocabulary reduces the similarity between text content features and subject words and thereby reduces the accuracy of text classification, and it achieves the effect of improving classification accuracy.

Active Publication Date: 2018-05-29
RUN TECH CO LTD BEIJING
Cites: 6; Cited by: 9

AI Technical Summary

Problems solved by technology

[0004] However, under normal circumstances, many words can reflect the meaning of a given subject word, and determining the subject word vector manually easily leaves the vocabulary for expressing that subject word incomplete. With an incomplete subject word vector, the similarity between text content features and subject words is reduced, which greatly reduces the accuracy of text classification.
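
The effect described in [0004] can be illustrated with a small worked example. This is only a sketch under stated assumptions: the bag-of-words representation, the cosine measure, the sample sentence and both vocabularies are hypothetical choices made for illustration, not details taken from the patent.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_to_subject(text, subject_vocabulary):
    """Compare a text with a subject vocabulary over their shared word list."""
    tokens = text.lower().split()
    vocabulary = sorted(set(tokens) | set(subject_vocabulary))
    text_vec = [tokens.count(w) for w in vocabulary]           # word counts in the text
    subject_vec = [1 if w in subject_vocabulary else 0 for w in vocabulary]
    return cosine(text_vec, subject_vec)

# Hypothetical data: a finance-related sentence and two vocabularies for "finance".
text = "the central bank raised interest rates"
manual = {"finance"}                                          # manually chosen, incomplete
expanded = {"finance", "bank", "interest", "rates", "stock"}  # after expansion

sim_manual = similarity_to_subject(text, manual)
sim_expanded = similarity_to_subject(text, expanded)
print(sim_manual, sim_expanded)
```

With the one-word vocabulary the sentence shares no words with the subject, so the similarity is zero; with the expanded vocabulary three words overlap and the similarity rises to 3/sqrt(30).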


Examples

Embodiment 1

[0028] Figure 1 is a flow chart of a text data analysis method provided by Embodiment 1 of the present invention. This embodiment is applicable to classifying text, and the method can be executed by a text data analysis device. The method specifically includes the following steps:

[0029] Step 110: expand the predetermined subject words and determine the subject word vectors.

[0030] In a specific embodiment of the present invention, the subject words are the set of subject categories of the texts to be classified, such as politics, finance and economics, and education. Since many words can represent the meaning of a subject word, the subject words need to be expanded. In this embodiment, each subject word can be matched against each word in the preset corpus through semantic analysis, and the words in the corpus that match a subject word can be used as the extended vocabulary of that subject t...
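
The expansion in Step 110 might be sketched as follows. Everything here is an assumption for illustration: the toy three-dimensional word vectors stand in for whatever semantic analysis the method actually uses, and the names `TOY_CORPUS`, `expand_subject_words` and the 0.8 threshold are hypothetical.

```python
import math

# Hypothetical toy corpus: each word has a made-up 3-dimensional "semantic" vector.
TOY_CORPUS = {
    "politics": (1.0, 0.0, 0.0),
    "election": (0.9, 0.1, 0.0),
    "government": (0.8, 0.2, 0.1),
    "finance": (0.0, 1.0, 0.0),
    "bank": (0.1, 0.9, 0.0),
    "stock": (0.0, 0.8, 0.2),
    "education": (0.0, 0.0, 1.0),
    "school": (0.1, 0.0, 0.9),
    "teacher": (0.0, 0.1, 0.9),
}

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand_subject_words(subject_words, corpus, threshold=0.8):
    """Match every subject word against every corpus word and keep the close ones."""
    expanded = {}
    for subject in subject_words:
        anchor = corpus[subject]
        matches = [
            word for word, vector in corpus.items()
            if word != subject and cosine(anchor, vector) >= threshold
        ]
        # The subject word vector is treated here as the subject word plus its matches.
        expanded[subject] = [subject] + sorted(matches)
    return expanded

subject_vectors = expand_subject_words(["politics", "finance", "education"], TOY_CORPUS)
for subject, words in subject_vectors.items():
    print(subject, words)
```

In this toy setting each subject word gains two extended words; a real corpus would replace the hand-written vectors with an actual semantic resource.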

Embodiment 2

[0044] On the basis of Embodiment 1 above, this embodiment provides a preferred implementation of a text data analysis method, which can determine the training text feature vectors and the test text feature vectors according to relatively complete subject word vectors. Figure 2 is a flow chart of a text data analysis method provided by Embodiment 2 of the present invention. As shown in Figure 2, the method includes the following specific steps:

[0045] Step 201: match each subject word against each word in a preset corpus through semantic analysis.

[0046] In a specific embodiment of the present invention, the corpus is a basic language-knowledge resource stored on a computer. It holds language material that has actually appeared in real language use, and it needs to be processed to become a useful resource. In this embodiment, the HowNet Chinese thesaurus can be used as the corpus for expanding subject words. Through the m...

Embodiment 3

[0078] Figure 3 is a schematic structural diagram of a text data analysis device provided by Embodiment 3 of the present invention. This embodiment is applicable to classifying texts, and the device can implement the text data analysis method described in any embodiment of the present invention. Specifically, the device includes:

[0079] The subject word vector determination module 310 is used to expand the predetermined subject words and determine the subject word vectors;

[0080] The training text feature vector determination module 320 is used to determine the training text feature vector according to the subject word vector;

[0081] The test text feature vector determination module 330 is used to convert the text to be tested into a test text feature vector according to the subject word vector;

[0082] A classification module 340, configured to classify the text to be tested according to the training text feature vector and the test text feature v...
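
The four modules listed above could be arranged roughly as in the sketch below. It is a minimal illustration, not the device itself: the class name `TextAnalysisDevice`, the count-based feature vectors and the nearest-training-text rule in `classify` are all assumptions, since the text shown here does not specify how feature vectors are computed or compared. Module 310 is assumed to receive an already expanded vocabulary.

```python
import math
from dataclasses import dataclass, field

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

@dataclass
class TextAnalysisDevice:
    subject_vectors: dict = field(default_factory=dict)  # subject -> set of words
    training: list = field(default_factory=list)         # (feature vector, label) pairs

    def determine_subject_vectors(self, expanded):
        """Role of module 310 (assumes the expansion has already been computed)."""
        self.subject_vectors = {s: set(words) for s, words in expanded.items()}

    def to_feature_vector(self, text):
        """One entry per subject: how many tokens fall in that subject's vocabulary."""
        tokens = text.lower().split()
        return [sum(t in words for t in tokens) for words in self.subject_vectors.values()]

    def add_training_text(self, text, label):
        """Role of module 320: store the training text feature vector."""
        self.training.append((self.to_feature_vector(text), label))

    def classify(self, text):
        """Roles of modules 330 and 340: vectorize the test text, pick the closest label."""
        test_vector = self.to_feature_vector(text)
        best = max(self.training, key=lambda pair: cosine(pair[0], test_vector))
        return best[1]

# Hypothetical usage with made-up vocabularies and texts.
device = TextAnalysisDevice()
device.determine_subject_vectors({
    "finance": ["finance", "bank", "stock"],
    "education": ["education", "school", "teacher"],
})
device.add_training_text("the bank sold stock", "finance")
device.add_training_text("the teacher left the school", "education")
predicted = device.classify("a new school hired a teacher")
print(predicted)
```

Training and test texts share `to_feature_vector`, so both are expressed against the same subject word vectors, which is the property the module list emphasizes.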


Abstract

The invention discloses a text data analysis method and device, a server and a storage medium. The method includes the following steps: predetermined subject words are expanded, and subject word vectors are determined; training text feature vectors are determined according to the subject word vectors; texts to be tested are converted into test text feature vectors according to the subject word vectors; and the texts to be tested are classified according to the training text feature vectors and the test text feature vectors. By establishing complete subject word vectors, the training text feature vectors and the test text feature vectors are both determined from them, the texts to be tested are classified according to these feature vectors, and the accuracy of text classification is improved.

Description

Technical field

[0001] The invention relates to the technical field of intelligent information processing, in particular to a text data analysis method, device, server and storage medium.

Background

[0002] With the rapid development of Internet technology, most information is stored and displayed in the form of text. Therefore, to facilitate the storage, management and querying of information, classifying text data is particularly important.

[0003] At present, there are two main types of text data analysis methods for text classification: methods based on link analysis and methods based on content analysis. The link-analysis method mainly makes a direct or indirect evaluation through the link relationships between document pages. This method is widely applicable, but its accuracy is not high. The content-based method uses the similarity between the content characteristics of the text data to be analyzed and th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/30
Inventor: 谢永恒, 刘忠松, 火一莽, 万月亮
Owner: RUN TECH CO LTD BEIJING