Unsupervised text similarity calculation method

A text similarity calculation technology, applied in the field of unsupervised text similarity calculation. It addresses the problems that existing methods do not consider the meaning of words or the relationships between words, cannot calculate similarity accurately, and struggle to keep up with the rapid growth of information. The effect is improved text matching accuracy and improved information retrieval precision.

Active Publication Date: 2019-12-03
BEIJING INST OF COMP TECH & APPL
Cites: 3; Cited by: 15
AI Technical Summary

Problems solved by technology

[0003] The string-based method considers characters or words as independent knowledge units, and does not consider the meaning of the words themselves and the relationship between words, so this method cannot accurately calculate the
Although methods based on supervised neural networks can make good use of semantic information, the quality of the trained classifier depends largely on the accuracy of the training samples, and constructing labeled data is time-consuming and labor-intensive. Supervised methods therefore find it increasingly difficult to keep up with rapid information growth.

Examples

Embodiment Construction

[0019] To make the purpose, content, and advantages of the present invention clearer, specific embodiments of the invention are described in further detail below with reference to the accompanying drawings and examples.

[0020] Figure 1 is a schematic diagram of the overall network model framework. As shown in Figure 1, the unsupervised text similarity calculation method includes:

[0021] Step 1: pre-training the embedding layer model, which includes:

[0022] Preprocessing the question-and-answer corpus yields a question set composed of words. Because a neural network accepts only numerical data and cannot directly process Chinese words, all words in the question set must be pre-trained to generate word vectors that meet the needs of the model.
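
As a rough illustration of why words must be turned into numbers before a network can use them, the sketch below maps a tokenized question set to integer indices and then to vectors. The function names, the sample questions, and the vector dimension are all hypothetical, and the randomly initialized table only stands in for the pre-trained word vectors; the excerpt does not state which pre-training algorithm the invention uses.

```python
import random

def build_vocab(questions):
    """Map each distinct word to an integer index, in first-seen order."""
    vocab = {}
    for words in questions:
        for w in words:
            if w not in vocab:
                vocab[w] = len(vocab)
    return vocab

def init_embeddings(vocab, dim=4, seed=0):
    """Create one vector per word. Random values are a placeholder
    (assumption) for vectors that real pre-training would produce."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in vocab]

def encode(words, vocab, table):
    """Replace each known word with its vector; unknown words are skipped."""
    return [table[vocab[w]] for w in words if w in vocab]

# Hypothetical, already-segmented questions.
questions = [["如何", "重置", "密码"], ["密码", "忘记", "怎么办"]]
vocab = build_vocab(questions)
table = init_embeddings(vocab)
vectors = encode(questions[0], vocab, table)
```

The same word always maps to the same vector, so "密码" is represented identically in both questions; that shared representation is what later layers build on.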

[0023] Neural-network-based word embedding methods perform very well in the semantic representation of words. The word embedding method...

Abstract

The invention relates to an unsupervised text similarity calculation method comprising the following steps. Step 1: pre-train the embedding layer model, pre-training all words in the question set to generate word vectors that meet the requirements of the model. Step 2: mine the semantic information of sentences through an encoding layer network. Step 3: improve the model through TFIDF fusion: when each question is input into the neural network, TFIDF is calculated for that question and the resulting weights are fed into the network to control the final sentence vector representation; a normalized TFIDF calculation method is adopted and fused into the encoding layer and the representation layer. The method uses a deep neural network model (Bi-LSTM) for unsupervised training on the corpus to obtain a language model. Unsupervised training makes full use of the information in a large-scale corpus, which improves text matching accuracy and information retrieval precision.
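
The idea of letting normalized TFIDF weights control the sentence vector can be sketched as follows. This is a simplified stand-in, not the patented model: a weighted sum of word vectors replaces the Bi-LSTM encoding and representation layers, the smoothed IDF formula and the sum-to-one normalization are assumptions (the excerpt does not give the exact formula), and the two-dimensional embeddings are invented toy values.

```python
import math
from collections import Counter

def normalized_tfidf(questions):
    """Per question, return {word: weight} with weights scaled to sum to 1.
    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1 (an assumption)."""
    n = len(questions)
    df = Counter(w for q in questions for w in set(q))
    result = []
    for q in questions:
        tf = Counter(q)
        raw = {w: (tf[w] / len(q)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
               for w in tf}
        total = sum(raw.values())
        result.append({w: v / total for w, v in raw.items()})
    return result

def sentence_vector(weights, embeddings):
    """Weighted sum of word vectors; stands in for the encoder output."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for w, weight in weights.items():
        for i, x in enumerate(embeddings[w]):
            vec[i] += weight * x
    return vec

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy data: hypothetical questions and hand-made 2-d word vectors.
questions = [["reset", "password"], ["forgot", "password"], ["track", "order"]]
embeddings = {"reset": [1.0, 0.2], "password": [0.9, 0.1],
              "forgot": [0.8, 0.3], "track": [0.1, 1.0], "order": [0.0, 0.9]}
weights = normalized_tfidf(questions)
vecs = [sentence_vector(w, embeddings) for w in weights]
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

In this toy corpus "password" appears in two questions, so it receives a lower weight than "reset" within the first question, and the two password-related questions come out more similar to each other than to the order-tracking one.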

Description

Technical field

[0001] The invention relates to a communication method, in particular to an unsupervised text similarity calculation method.

Background technique

[0002] With the advent of the big data era and the explosive growth of information, information retrieval and matching play an increasingly important role in many fields, and text similarity calculation is one of their key technologies. Traditional text similarity calculation methods are mainly string-based or corpus-based. String-based methods compare texts at the literal level, using the co-occurrence and degree of repetition of strings as the measure of similarity; corpus-based methods use information obtained from a corpus to calculate text similarity. Corpus-based methods mainly rely on neural networks: a classifier is trained through a supervised learning algorithm, and the similarity sco...
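
The limitation of literal, string-based comparison described in the background can be seen with a small example. The sketch below uses Jaccard overlap of word sets as one common string-level measure; it is chosen here for illustration only and is not named in the source, and the sample sentences are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct words that two token lists have in common."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)

# Same meaning, no shared words: literal overlap is zero.
paraphrase_score = jaccard(["buy", "a", "car"],
                           ["purchase", "an", "automobile"])

# Different meaning, mostly shared words: literal overlap is high.
lookalike_score = jaccard(["buy", "a", "car"], ["buy", "a", "boat"])
```

Because the measure sees only surface tokens, the paraphrase pair scores lower than the pair that merely shares words, which is the weakness that semantic representations aim to address.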

Application Information

IPC(8): G06F17/27, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/08, G06F18/22, Y02D10/00
Inventor: 吴超, 宋颖毅, 柯文俊, 陈旭, 陈静, 王坤龙, 杨雨婷
Owner BEIJING INST OF COMP TECH & APPL