Text similarity calculation system and method based on multi-keyword pair matching, and storage medium

Text similarity calculation technology, applied in computing, computer components, neural learning methods, etc. It can solve problems such as noise and redundancy, and achieves a simple model structure, strong robustness, and accurate results.

Pending Publication Date: 2021-01-15
NORTHWEST UNIV(CN)
Cites: 0; Cited by: 6

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to address the defects and deficiencies of prior-art text similarity calculation: to solve the redundancy and noise problems of the text pair <P, Q>, and to calculate the similarity of the text pair <P, Q> at multiple semantic levels and at two granularities, words and sentences. To this end, the present invention discloses a text similarity calculation system, method, and storage medium based on multi-keyword pair matching.



Examples


Embodiment Construction

[0033] Referring to Figure 1, a text similarity calculation system based on multi-keyword pair matching comprises at least the following modules, connected in sequence:

[0034] Preprocessing module: preprocesses the acquired text P and text Q respectively to obtain the preprocessed text pair <P, Q>;

[0035] Pre-training module: pre-trains on the preprocessed text pair <P, Q> to obtain the word vector of each word;

[0036] Context module: performs semantic encoding on the word vector of each word obtained from pre-training to obtain the semantic vector of the text pair <P, Q>;

[0037] Keyword pair extraction module: extracts multiple keyword pairs with different semantic levels from the text pair <P, Q>;

[0038] Word-level similarity calculation module: calculate the word-level similarity of text pairs through word-level tasks;

[0039] Sentence-level similarity calculation module: calculate the sentence-level similarity of text pairs through sentence-level tasks;

[0040] Similarity result output module: The simila...
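The module chain listed above can be illustrated with a small, self-contained sketch. It is not the patented method: the toy `word_vector` function stands in for pre-trained word vectors, the context (semantic encoding) module is skipped entirely, `keyword_pairs` simply keeps the top-scoring word pairs by cosine similarity, and the equal-weight combination in `similarity` is a hypothetical choice, since the description of the output module is truncated in the source. All function names and the constant `DIM` are assumptions made for illustration.

```python
import math
import re
import zlib

DIM = 32  # hypothetical vector size for the toy embedding


def preprocess(text):
    """Preprocessing stand-in: lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_vector(word):
    """Toy word vector: character-bigram counts hashed into DIM buckets.

    A stand-in for pre-trained vectors; it captures spelling, not meaning.
    """
    padded = f"^{word}$"
    vec = [0.0] * DIM
    for i in range(len(padded) - 1):
        bucket = zlib.crc32(padded[i:i + 2].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_pairs(p_tokens, q_tokens, k=3):
    """Keep the k highest-scoring (score, p_word, q_word) pairs across P and Q."""
    scored = [(cosine(word_vector(p), word_vector(q)), p, q)
              for p in p_tokens for q in q_tokens]
    scored.sort(reverse=True)
    return scored[:k]


def sentence_vector(tokens):
    """Sentence-level stand-in: element-wise sum of the word vectors."""
    vec = [0.0] * DIM
    for token in tokens:
        for i, value in enumerate(word_vector(token)):
            vec[i] += value
    return vec


def similarity(text_p, text_q, k=3):
    """Combine word-level and sentence-level scores (equal weights are an assumption)."""
    p_tokens, q_tokens = preprocess(text_p), preprocess(text_q)
    pairs = keyword_pairs(p_tokens, q_tokens, k)
    word_level = sum(score for score, _, _ in pairs) / len(pairs) if pairs else 0.0
    sentence_level = cosine(sentence_vector(p_tokens), sentence_vector(q_tokens))
    return 0.5 * word_level + 0.5 * sentence_level
```

Calling `similarity("the cat sat on the mat", "the cat sat on the mat")` yields a score of approximately 1, while unrelated sentences score lower. Restricting the word-level score to a few best-matching pairs is one simple way to keep redundant or noisy words from dominating the result.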


Abstract

The invention provides a text similarity calculation system and method based on multi-keyword pair matching, and a storage medium. For texts P and Q, text similarity calculation is completed by means of Word2vec, a bidirectional long short-term memory network (BiLSTM), an attention mechanism and a dual-task architecture. Through WP-Attention and the dual-task architecture, the method avoids the influence of noise and redundant data on model performance, and the model is simple in structure, easy to extend, highly robust, and easy to apply in practice.
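As a rough illustration of how an attention mechanism can down-weight noisy or redundant words, the sketch below pools a list of word vectors into one vector using softmax weights. It shows generic dot-product attention only; the source does not define WP-Attention in the visible text, so nothing here should be read as that mechanism. The names `softmax` and `attention_pool` and the idea of a `query` vector are assumptions for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weight each vector by its dot product with `query`, then sum.

    Returns (pooled_vector, weights). Vectors that align poorly with the
    query receive small weights and contribute little to the pooled result.
    """
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights
```

For example, with `vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]` and `query = [2.0, 0.0]`, the first vector receives the largest weight and the second the smallest, so the pooled vector leans toward the first dimension.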

Description

Technical field

[0001] The present invention relates to the field of text mining and computer technology, in particular to a text similarity calculation system, method and storage medium based on multi-keyword pair matching.

Background art

[0002] With the rapid development of the Internet and artificial intelligence, the information generated on the Internet has grown explosively, and people are eager to extract content that closely matches their own needs and interests from this mass of information. To meet this demand, a variety of applications have emerged, such as search engines, automatic question answering systems, document classification and clustering, and text information retrieval. One of the key technologies in these application scenarios is text similarity calculation, and the performance of these applications depends on the accuracy of sentence similarity calculation.

[0003] Text similarity in natural language proce...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/194, G06F16/332, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/194, G06F16/3329, G06N3/08, G06N3/045, G06F18/22, G06F18/214
Inventor: 冯筠, 卢鑫, 孙霞, 邓瑶
Owner: NORTHWEST UNIV(CN)