Long text-oriented semantic matching method and system

A semantic matching technology for long texts, applied in the field of natural language understanding. It addresses the unsatisfactory performance of prior-art text semantic understanding methods, with the aims of optimizing the user experience and improving search speed.

Active Publication Date: 2020-02-21
SICHUAN CHANGHONG ELECTRIC CO LTD

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to provide a long text-oriented semantic matching method and system that address the unsatisfactory performance of text semantic understanding methods in the prior art.

Examples

Embodiment 1

[0038] Embodiment 1 provides a semantic matching method for long texts, which is mainly used in the field of long-text semantic matching to find the TOP-K text entries most similar to a target text. As shown in Figure 1, the specific implementation steps are as follows:

[0039] Step s1: Perform data processing on the input text, including operations such as removing special characters, word segmentation, character segmentation, and other text preprocessing.

[0040] During the data processing in step s1, invalid characters in the input text can be removed, and then the input text can be converted into a text sequence in units of characters and a text sequence in units of words.
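Step s1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `preprocess` is hypothetical, "invalid characters" is assumed to mean anything that is neither a word character nor whitespace, and whitespace splitting stands in for a real word segmenter, which the source does not name.

```python
import re

# Assumption: "invalid characters" are anything that is not a word
# character or whitespace. The source does not define the exact set.
_INVALID = re.compile(r"[^\w\s]", re.UNICODE)


def preprocess(text):
    """Return (char_sequence, word_sequence) for one input text."""
    cleaned = _INVALID.sub(" ", text)
    # Assumption: whitespace splitting stands in for a real word segmenter.
    words = cleaned.split()
    # Character-level sequence: every non-whitespace character, in order.
    chars = [ch for ch in cleaned if not ch.isspace()]
    return chars, words


chars, words = preprocess("Long text, matching!")
```

For Chinese input, the whitespace split would need to be replaced by a dedicated word segmentation tool; the character-level sequence works unchanged.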

[0041] Step s2: Map the input text after data processing into a numerical sequence. Specifically, it may include:

[0042] Step s21: Perform word vector training based on the data in the database, and generate a dictionary to obtain a word vector model. Different sub-feature extraction modules have different word ...
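The dictionary-based mapping of step s2 might look like the sketch below. It covers only the dictionary and the token-to-number mapping, not word vector training. The names `build_dictionary` and `to_numeric_sequence`, the reserved ids for padding and unknown tokens, and the fixed output length are all assumptions made for illustration.

```python
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed reserved ids, not taken from the source


def build_dictionary(token_sequences, min_count=1):
    """Assign an integer id to every token seen at least min_count times."""
    counts = Counter(tok for seq in token_sequences for tok in seq)
    # Sort by descending frequency, then alphabetically, for stable ids.
    vocab = sorted((t for t, c in counts.items() if c >= min_count),
                   key=lambda t: (-counts[t], t))
    return {tok: i + 2 for i, tok in enumerate(vocab)}


def to_numeric_sequence(tokens, dictionary, max_len):
    """Map tokens to ids, truncating or padding to exactly max_len."""
    ids = [dictionary.get(t, UNK_ID) for t in tokens[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


corpus = [["long", "text", "matching"], ["long", "text", "search"]]
dictionary = build_dictionary(corpus)
seq = to_numeric_sequence(["long", "query", "text"], dictionary, max_len=5)
```

The same two functions can be applied once to the character-level sequence and once to the word-level sequence, each with its own dictionary.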

Embodiment 2

[0072] Embodiment 2 provides a long text-oriented semantic matching system, including:

[0073] The text processing module is used to perform data processing on the input text, including operations such as removing special characters, word segmentation, character segmentation, and other text preprocessing;

[0074] A numerical sequence generation module, which is used to map the input text after data processing into a numerical sequence in units of characters and a numerical sequence in units of words;

[0075] The feature vector extraction module is used to input the numerical sequence of the input text into the feature extraction model to obtain the feature vector of the input text. The feature extraction model includes multiple sub-feature extraction models, and the feature vector of the input text is the fusion of the outputs of these sub-feature extraction models;

[0076] The database processing module is used to pass each piece of data in the database through a text processing module, a nume...
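The fusion performed by the feature vector extraction module can be illustrated with a small sketch. The source only says that the outputs of several sub-feature extraction models are fused; concatenation followed by L2 normalization is an assumption here, and `hashed_counts` and `mean_and_max` are toy stand-ins for the real sub-models, which the source does not specify.

```python
import math


def fuse_features(numeric_seq, sub_extractors):
    """Concatenate sub-model outputs and L2-normalize (assumed fusion)."""
    fused = []
    for extract in sub_extractors:
        fused.extend(extract(numeric_seq))
    norm = math.sqrt(sum(x * x for x in fused))
    return [x / norm for x in fused] if norm else fused


# Toy stand-ins for sub-feature extraction models (hypothetical).
def hashed_counts(seq, dim=4):
    """Count ids into `dim` buckets by id modulo dim."""
    vec = [0.0] * dim
    for i in seq:
        vec[i % dim] += 1.0
    return vec


def mean_and_max(seq):
    """Two summary statistics of the id sequence."""
    return [sum(seq) / len(seq), float(max(seq))] if seq else [0.0, 0.0]


vector = fuse_features([2, 1, 3, 0, 0], [hashed_counts, mean_and_max])
```

Normalizing the fused vector keeps later cosine-style similarity comparisons independent of how many sub-models contribute.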

Abstract

The invention relates to the technical field of natural language understanding and discloses a long text-oriented semantic matching method and system, which are used to solve the problem of the unsatisfactory effect of text semantic understanding methods in the prior art. The method comprises the following steps: performing data processing on an input text, wherein the data processing comprises removing special characters, segmenting words and segmenting characters; mapping the input text subjected to data processing into a numerical sequence; inputting the numerical sequence of the input text into a feature extraction model to obtain a feature vector of the input text; clustering based on the feature vectors; based on the clustered database, selecting TOP-N types of candidate data most similar to the input text from the database; and performing similarity measurement on the feature vector of the input text and the feature vector of the candidate data, and selecting TOP-K data most similar to the input text from the candidate data. The method is suitable for semantic matching of the long text.
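The two-stage retrieval described in the abstract (first the TOP-N most similar clusters, then the TOP-K most similar entries among their members) can be sketched as follows. The sketch assumes that cluster labels are already available, that each cluster is represented by the mean of its members, and that cosine similarity is the similarity measure; the source does not fix the clustering algorithm or the metric, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroids(vectors, labels):
    """Mean vector of each cluster (assumed cluster representative)."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {label: [sum(col) / len(vs) for col in zip(*vs)]
            for label, vs in groups.items()}


def search(query, vectors, labels, top_n, top_k):
    """Return indices of the top_k entries within the top_n clusters."""
    cents = centroids(vectors, labels)
    # Stage 1: keep the top_n clusters whose centroid is closest to the query.
    best = sorted(cents, key=lambda c: cosine(query, cents[c]),
                  reverse=True)[:top_n]
    keep = set(best)
    candidates = [i for i, label in enumerate(labels) if label in keep]
    # Stage 2: rank only the candidates by similarity to the query.
    ranked = sorted(candidates, key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return ranked[:top_k]


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]]
labels = [0, 0, 1, 1, 2]
result = search([1.0, 0.2], vectors, labels, top_n=1, top_k=2)
```

Restricting the fine-grained comparison to members of a few clusters is what reduces the number of similarity computations per query relative to scanning the whole database.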

Description

Technical field

[0001] The invention relates to the technical field of natural language understanding, in particular to a long text-oriented semantic matching method and system.

Background

[0002] As one of the important directions in the field of artificial intelligence, natural language understanding technology has long been a research hotspot in related fields. Especially in recent years, with the rapid development of mobile Internet technology and the increasing degree of informatization, people are increasingly eager to have machines understand natural language, so as to reduce manual effort and share massive amounts of data.

[0003] In related technologies, the mainstream methods are text semantic understanding methods based on recurrent neural networks and text semantic understanding methods based on convolutional neural networks. However, the usual recurrent neural network and convolutional neural network are difficult...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/33, G06F40/289, G06F40/30, G06K9/62, G06N3/04
CPC: G06F16/3344, G06N3/045, G06F18/23, G06F18/214
Inventor: 杨兰, 展华益, 孙锐, 周兴发, 饶璐, 谭斌
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD