Text clustering method, electronic device and storage medium

A text clustering method and electronic device technology, applied in the Internet field. It addresses problems such as context information being ignored, parameters that cannot be selected accurately, and the difficulty of defining the number of categories, and thereby improves the accuracy and efficiency of text clustering.

Active Publication Date: 2019-10-18
招商局金融科技有限公司

AI Technical Summary

Problems solved by technology

[0004] 1) Among current mainstream methods of generating sentence vectors, summing and averaging existing word vectors ignores the context information of the text, while models such as sent2vec and doc2vec require a large-scale, high-quality training corpus, a condition that the user insurance consultation corpus does not meet;
[0005] 2) For clustering algorithms, the complexity and diversity of insurance questions make it difficult to define the number of categories, so parameters cannot be selected accurately;
[0006] Because of these defects, the efficiency and accuracy of text clustering are greatly reduced.
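
The first problem can be illustrated with a minimal Python sketch. The word vectors below are hypothetical toy values invented for this example, not taken from any trained model; the point is only that averaging word vectors gives the same sentence vector for any ordering of the same words, so word order and context are lost.

```python
from typing import Dict, List

# Hypothetical toy word vectors, invented purely for illustration.
WORD_VECTORS: Dict[str, List[float]] = {
    "claim": [1.0, 0.0, 0.0],
    "before": [0.0, 1.0, 0.0],
    "paying": [0.0, 0.0, 1.0],
    "premium": [1.0, 1.0, 0.0],
}


def mean_sentence_vector(tokens: List[str]) -> List[float]:
    """Average the word vectors of the tokens, dimension by dimension."""
    vectors = [WORD_VECTORS[t] for t in tokens]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Two token sequences with the same words in a different order.
a = mean_sentence_vector(["claim", "before", "paying", "premium"])
b = mean_sentence_vector(["paying", "premium", "before", "claim"])

# The averaged vectors are identical, although the sentences differ.
same = all(abs(x - y) < 1e-12 for x, y in zip(a, b))
print(same)  # prints True
```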

Embodiment Construction

[0022] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0023] The invention provides a text clustering method. The method may be performed by a device, and the device may be implemented in software and/or hardware.

[0024] Referring to Figure 1, a flow chart of a preferred embodiment of the text clustering method of the present invention is shown.

[0025] In an embodiment of the text clustering method of the present invention, the method includes steps S1 to S4.

[0026] Step S1, receiving a text clustering instruction issued by a user, the instruction including the corpus to be clustered.

[0027] In the following description, the embodiments of the present invention are described with an electronic device as the executing entity. In this embodiment, the electronic device receives a text clustering instruction sent by the user through the terminal, and the electroni...

Abstract

The invention discloses a text clustering method. The method comprises: receiving a text clustering instruction sent by a user; pre-training a predetermined initial language model on the corpus to be clustered to obtain a target language model; sequentially inputting each text in the corpus into the target language model for feature extraction, obtaining a sentence vector for each text from the model output, and generating a set of sentence vectors to be clustered; and clustering the corpus with a preset clustering algorithm based on that set, to obtain the sentence vectors corresponding to each category and determine the clustering result of the corpus. The invention further discloses an electronic device and a computer storage medium. The method and the device can improve the accuracy and efficiency of text clustering.
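
The flow described in the abstract (adapt a model to the corpus, extract one sentence vector per text, then cluster the vectors) can be sketched in Python as follows. This is only an illustration under stated assumptions: the excerpt does not name the language model or the clustering algorithm, so `fit_encoder` is a hypothetical bag-of-words stand-in for the pre-trained target language model, and `leader_cluster` is a simple threshold-based greedy clustering chosen here because it does not need the number of categories in advance. All function names and the sample questions are invented for the example.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def fit_encoder(corpus: List[str]) -> Callable[[str], Vector]:
    """Stand-in for pre-training a language model on the corpus.

    Assumption: a bag-of-words encoder over the corpus vocabulary,
    not the neural language model the patent refers to.
    """
    vocab = sorted({w for text in corpus for w in text.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    def encode(text: str) -> Vector:
        vec = [0.0] * len(vocab)
        for w in text.lower().split():
            if w in index:
                vec[index[w]] += 1.0
        return vec

    return encode


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def leader_cluster(vectors: List[Vector], threshold: float) -> List[int]:
    """Greedy clustering (an assumption, not the patent's algorithm).

    Each vector joins the first cluster whose leader is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    leaders: List[Vector] = []
    labels: List[int] = []
    for vec in vectors:
        for cid, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(cid)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


def cluster_corpus(corpus: List[str], threshold: float = 0.5) -> Dict[int, List[str]]:
    """Run the three stages: fit encoder, extract vectors, cluster."""
    encode = fit_encoder(corpus)
    vectors = [encode(text) for text in corpus]
    labels = leader_cluster(vectors, threshold)
    clusters: Dict[int, List[str]] = {}
    for text, label in zip(corpus, labels):
        clusters.setdefault(label, []).append(text)
    return clusters


# Hypothetical insurance questions, invented for the example.
corpus = [
    "how do I file a claim",
    "how can I file a claim",
    "what is the premium for this policy",
    "what is the premium of the policy",
]
result = cluster_corpus(corpus, threshold=0.5)
for label, texts in result.items():
    print(label, texts)
```

With this toy corpus and threshold, the two claim questions fall into one cluster and the two premium questions into another. The threshold is a tunable parameter of the sketch and would need to be chosen for a real corpus.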

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a text clustering method, an electronic device and a computer-readable storage medium.

Background

[0002] As artificial intelligence becomes widespread in everyday applications, natural language processing is becoming increasingly important. Since most corpora are unlabeled and labeling is costly, unsupervised text clustering is particularly important.

[0003] However, existing techniques do not cluster texts from professional-domain corpora effectively. Take common insurance questions as an example: users' inquiries about insurance belong to an insurance-specific corpus, which is characterized by small data size, diverse modes of expression, specialized content that is hard to interpret, and noisy data (for example, advertisements). For this type of text, the ex...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/36, G06Q40/08
CPC: G06Q40/08, G06F16/35, G06F16/36
Inventor: 张蓓, 刘屹, 徐君妍, 刘濂, 邵嘉琦, 徐楠, 沈志勇, 万正勇
Owner 招商局金融科技有限公司