Topic text sentence vector generation method and device

A sentence vector generation technology, applied in the field of text processing, that addresses the difficulty of achieving good results when labeling topics with knowledge points and when recommending topics, with the effect of improving extraction quality and accuracy.

Active Publication Date: 2019-07-02
江西风向标智能科技有限公司
Cites: 4; Cited by: 13

AI Technical Summary

Problems solved by technology

Therefore, when these mathematical characters are processed with traditional training methods, the influence of the formulas in a sentence on its semantics is easily magnified, which...

Image

  • Topic text sentence vector generation method and device

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0012] According to one or more embodiments, and as shown in Figure 1, a sentence vector generation method for the professional domain of basic subjects includes the following steps:

[0013] S1: screen out all keywords according to the textual expressions of the basic subject and add them to the dictionary; then perform dictionary-based word segmentation on the sentences in the topic text, marking the keywords that appear in each sentence at the same time;

[0014] S2: based on the word segmentation results and all the screened keywords, encode each sentence together with the keywords it contains, then build an RNN model and train it on a prediction task in which keywords are randomly removed;

[0015] S3: use the features extracted by the trained model to generate a sentence vector for each sentence in the topic text.
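Step S1 does not name a segmentation algorithm. As a minimal sketch, assuming forward maximum matching (FMM), a common dictionary-based method for Chinese text, the segmentation and keyword marking could look like the following. The function name `segment_with_keywords` and the sample dictionary are hypothetical.

```python
def segment_with_keywords(sentence, dictionary, keywords):
    """Forward maximum matching (FMM) against a user dictionary (assumed method).

    At each position, take the longest dictionary entry that matches; if none
    matches, emit a single character. Returns the tokens plus a parallel list
    of booleans marking which tokens are screened keywords.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        match = sentence[i]  # fallback: a single character
        # Try the longest candidate first, down to two characters.
        for size in range(min(max_len, len(sentence) - i), 1, -1):
            candidate = sentence[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    flags = [tok in keywords for tok in tokens]
    return tokens, flags


if __name__ == "__main__":
    # Hypothetical screened keywords: "quadratic equation", "discriminant".
    keywords = {"一元二次方程", "判别式"}
    # Dictionary = keywords plus ordinary words ("equation", "greater than").
    dictionary = keywords | {"方程", "大于"}
    tokens, flags = segment_with_keywords("一元二次方程的判别式大于零", dictionary, keywords)
    print(tokens)  # ['一元二次方程', '的', '判别式', '大于', '零']
    print(flags)   # [True, False, True, False, False]
```

Because the longest match wins, a keyword such as 一元二次方程 is kept whole instead of being split into shorter dictionary words like 方程.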

[0016] A sentence vector is commonly taken as the average of the word vectors; it can be obtained by summing the word vectors and th...
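The following is a minimal sketch of the averaging baseline described in [0016]. It assumes word vectors are already available as a mapping from token to a list of floats. The function name and the toy three-dimensional vectors are hypothetical. Dropping the division by the count gives the summed variant.

```python
def average_sentence_vector(tokens, word_vectors):
    """Mean of the word vectors of the tokens that have one (unknown tokens are skipped)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        raise ValueError("none of the tokens has a word vector")
    dim = len(vecs[0])
    # Component-wise sum divided by the number of vectors.
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


if __name__ == "__main__":
    # Hypothetical toy vectors: "function", a particle, "derivative".
    word_vectors = {
        "函数": [1.0, 0.0, 2.0],
        "的": [0.5, 0.5, 0.5],
        "导数": [0.0, 1.0, 2.0],
    }
    print(average_sentence_vector(["函数", "的", "导数"], word_vectors))  # [0.5, 0.5, 1.5]
```

Every token gets the same weight in this baseline, whether it is a mathematical symbol or an ordinary word.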

Abstract

A topic text sentence vector generation method comprises the following steps: S1, screening out all keywords according to the topic text expressions, adding them to a dictionary, performing dictionary-based word segmentation on the sentences in the topic text, and marking the keywords appearing in each sentence at the same time; S2, on the basis of the word segmentation results and all the screened keywords, encoding each sentence and the keywords it contains, establishing an RNN model, and carrying out prediction training by randomly removing keywords; and S3, generating a sentence vector for each sentence in the topic text using the features extracted by the trained model.
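The abstract does not specify the RNN architecture, the encoding, or exactly what is predicted once a keyword is removed. The sketch below assumes one interpretation:

- each token is mapped to an integer id;
- one keyword is deleted at random;
- a plain Elman RNN reads the remaining tokens;
- the deleted keyword's id is the softmax prediction target (S2);
- the final hidden state over the full sentence serves as the sentence vector (S3).

All names (`remove_random_keyword`, `TinyRNN`, `keyword_loss`) and sizes are hypothetical. The weights are random because the training loop (backpropagation through time) is omitted.

```python
import math
import random


def remove_random_keyword(token_ids, flags, rng):
    """One assumed S2 sample: delete a randomly chosen keyword; it becomes the target."""
    positions = [i for i, is_kw in enumerate(flags) if is_kw]
    if not positions:
        return None  # the sentence has no keyword to remove
    pos = rng.choice(positions)
    return token_ids[:pos] + token_ids[pos + 1:], token_ids[pos]


class TinyRNN:
    """Elman RNN, h_t = tanh(W_x e(x_t) + W_h h_{t-1} + b), with untrained random weights."""

    def __init__(self, vocab_size, emb_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.emb = mat(vocab_size, emb_dim)       # token embeddings
        self.w_x = mat(hidden_dim, emb_dim)       # input-to-hidden weights
        self.w_h = mat(hidden_dim, hidden_dim)    # hidden-to-hidden weights
        self.b = [0.0] * hidden_dim
        self.w_out = mat(vocab_size, hidden_dim)  # hidden-to-vocabulary scores

    def encode(self, token_ids):
        """Run the recurrence; the final hidden state is the sentence feature."""
        h = [0.0] * len(self.b)
        for tid in token_ids:
            e = self.emb[tid]
            h = [math.tanh(sum(w * x for w, x in zip(self.w_x[j], e))
                           + sum(w * p for w, p in zip(self.w_h[j], h))
                           + self.b[j])
                 for j in range(len(h))]
        return h

    def keyword_loss(self, inputs, target):
        """Softmax cross-entropy for predicting the removed keyword from the remaining tokens."""
        h = self.encode(inputs)
        logits = [sum(w * x for w, x in zip(row, h)) for row in self.w_out]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        return log_z - logits[target]


if __name__ == "__main__":
    token_ids = [0, 1, 2, 3, 4]                # a five-token sentence
    flags = [True, False, True, False, False]  # tokens 0 and 2 are keywords
    inputs, target = remove_random_keyword(token_ids, flags, random.Random(1))
    model = TinyRNN(vocab_size=5, emb_dim=4, hidden_dim=3)
    print("training loss for one sample:", model.keyword_loss(inputs, target))
    print("sentence vector:", model.encode(token_ids))
```

In a real system the loss would be minimized over many such samples before `encode` is used to produce sentence vectors.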

Description

Technical field

[0001] The invention belongs to the technical field of text processing, and in particular relates to a method and device for generating topic text sentence vectors.

Background art

[0002] Converting text into vectors is a method often used in natural language processing. The main models and representations are CBOW and Skip-gram, one-hot, TF-IDF, etc. Text vectorization mainly serves to facilitate text classification, clustering and similarity calculation, so that data can be processed effectively. It is widely used in business fields such as news recommendation, document classification, sentiment analysis, automatic summarization, information retrieval and machine translation. Some of them are expressed with specialized mathematical characters, and these characters are closely related: not only do they account for a high proportion of the text, but also the frequency of co-occurre...
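The background lists TF-IDF among common text-vectorization methods used for similarity calculation. The following is a minimal sketch of one common TF-IDF variant with cosine similarity. It uses term frequency normalized by document length and idf = log(N / df). The document does not say which weighting it has in mind, and the function names and tokenized example sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF with tf = count / document length and idf = log(N / df).

    docs: list of non-empty token lists. Returns (vocabulary, one vector per doc).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    docs = [
        ["function", "derivative", "monotonic"],
        ["function", "derivative", "extremum"],
        ["triangle", "area"],
    ]
    vocab, vecs = tfidf_vectors(docs)
    # The two calculus sentences share weighted terms; the geometry one shares none.
    print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

A term that appears in every document gets idf = 0 under this variant, so it contributes nothing to similarity.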

Application Information

IPC(8): G06F17/27; G06F16/33; G06F16/35
CPC: G06F40/242; G06F40/289; Y02D10/00
Inventors: 梅阳阳 (Mei Yangyang), 郑文娟 (Zheng Wenjuan)
Owner: 江西风向标智能科技有限公司