Text classification method, device, computer equipment and storage medium

A text classification technology, applied in the field of semantic analysis, that addresses low algorithm efficiency and high time complexity by reducing vector dimensions.

Active Publication Date: 2021-09-21
浙江大搜车软件技术有限公司

AI Technical Summary

Problems solved by technology

WMD and SWMD use word vectors to capture semantics and greatly improve classification accuracy, but their word-based representation gives them high time complexity.
When the dictionary of the corpus is large, the algorithms become very inefficient.




Embodiment Construction

[0061] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concepts of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

[0062] Furthermore, the drawings are merely schematic illustrations of the application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus repeated descriptions thereof will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may ...



Abstract

The application discloses a text classification method, device, computer equipment and storage medium. The text classification method includes: obtaining the texts to be classified and constructing a topic frequency vector for each text, where the topic frequency vector is formed by the frequencies of the topics contained in the text; obtaining the distance between the texts to be classified according to preset topic vectors and the topic frequency vectors, where the distance between texts is inversely proportional to their similarity; and classifying the texts according to their similarity. The application can thus abstract a text into a small set of topics and determine its classification through those topics, which can greatly reduce computational complexity.
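
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed method: the word-to-topic table `WORD_TO_TOPIC`, the sample texts, and all function names are hypothetical, and plain Euclidean distance stands in for the distance that the application derives from preset topic vectors, which this excerpt does not spell out.

```python
import math
from collections import Counter

# Hypothetical word-to-topic table. The source does not say how topics are
# obtained, so this hand-written mapping is an assumption for illustration.
WORD_TO_TOPIC = {
    "engine": 0, "sedan": 0, "mileage": 0,
    "loan": 1, "interest": 1, "bank": 1,
    "goal": 2, "match": 2, "team": 2,
}
NUM_TOPICS = 3


def topic_frequency_vector(text):
    """Return the relative frequency of each topic among the known words."""
    counts = Counter(
        WORD_TO_TOPIC[word]
        for word in text.lower().split()
        if word in WORD_TO_TOPIC
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * NUM_TOPICS
    return [counts.get(topic, 0) / total for topic in range(NUM_TOPICS)]


def distance(u, v):
    """Euclidean distance between two topic frequency vectors.

    Assumption: a stand-in for the topic-vector-based distance in the source.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(text, labeled_texts):
    """Assign the label of the closest (most similar) labeled text."""
    query = topic_frequency_vector(text)
    label, _ = min(
        ((lbl, distance(query, topic_frequency_vector(ref)))
         for ref, lbl in labeled_texts),
        key=lambda pair: pair[1],
    )
    return label


# Hypothetical labeled reference texts.
LABELED = [
    ("the sedan has a new engine and low mileage", "cars"),
    ("the bank raised the loan interest", "finance"),
    ("the team scored a goal late in the match", "sports"),
]

if __name__ == "__main__":
    print(classify("engine mileage of this sedan", LABELED))
```

Each text is reduced to a vector with one entry per topic instead of one entry per dictionary word, which is where the dimension reduction described in the abstract would come from.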

Description

Technical field

[0001] The present application relates to the technical field of semantic analysis, in particular to a text classification method, device, computer equipment and storage medium.

Background

[0002] The semantic distance between two texts plays a very important role in many natural language processing applications, such as information retrieval, sentiment analysis, and news classification.

[0003] The vectorized representation of text is the key to text classification algorithms. A commonly used text representation method is the Bag of Words (BOW) model. By combining the BOW model with word vectors, the WMD (Word Mover's Distance) algorithm based on word matching was proposed; it uses word vectors to capture the semantic distance between words and achieves high classification accuracy. Later, the Supervised Word Mover's Distance (SWMD) was proposed, a variant of the WMD algorithm for supervised scenarios. These tw...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06K9/62
CPC: G06F16/35, G06F18/22, G06F18/2411
Inventors: 吴欣辉, 姜楠
Owner: 浙江大搜车软件技术有限公司