Text similarity measuring system based on multi-feature fusion

A multi-feature fusion technique for text similarity, applied to semantics-based text similarity measurement methods and systems. It addresses the lack of semantics, large differences in text length, and low accuracy of similarity results.

Active Publication Date: 2015-06-10

XINJIANG TECHN INST OF PHYSICS & CHEM CHINESE ACAD OF SCI

Cites: 4; Cited by: 59

Problems solved by technology

[0016] The present invention provides a text similarity measurement system based on multi-feature fusion, which measures similarity by combining multiple features based on word frequency, word vectors and Wikipedia tags. It aims to solve the lack of semantics caused by conventional text similarity measurement systems ignoring text context, and the low accuracy of similarity results caused by large differences in text length.

Embodiment

[0066] In order to make those skilled in the art better understand the present invention, the present invention will be described in further detail below in conjunction with the accompanying drawings:

[0067] As shown in Figure 1, the present invention comprises the following steps:

[0068] Training text preprocessing: the training text is preprocessed by word segmentation, stop word removal, and punctuation removal. For example, sentence A, "The leader reprimanded the employee", and sentence B, "The employee was criticized by the boss", are represented after word segmentation, stop word removal and punctuation removal as A: [leader, reprimand, employee] and B: [employee, boss, criticize];
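The preprocessing step can be sketched roughly as follows. This is a minimal illustration and not the patent's implementation: the regex tokenizer stands in for a real word segmenter (which Chinese text would require), and `STOP_WORDS` and `preprocess` are hypothetical names with a deliberately tiny stop word list. No lemmatization is performed, so inflected forms such as "reprimanded" are kept as they appear.

```python
import re
from typing import List

# Hypothetical stop word list, for illustration only; a real system would
# load a full list for the target language.
STOP_WORDS = {"the", "a", "an", "was", "by", "of", "to"}


def preprocess(text: str) -> List[str]:
    """Lowercase the text, keep only word tokens, and drop stop words."""
    # The character class skips punctuation, so it is removed implicitly.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


sentence_a = "The leader reprimanded the employee."
sentence_b = "The employee was criticized by the boss."
print(preprocess(sentence_a))  # ['leader', 'reprimanded', 'employee']
print(preprocess(sentence_b))  # ['employee', 'criticized', 'boss']
```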

[0069] Word vector model training: to capture the semantic features between words in the text, a deep learning method is used to train on the text over multiple iterations, and each word in the training text set is represented as a 200-d...
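How the trained vectors are turned into a text-level similarity is not visible in this excerpt. One common baseline, shown here purely as an assumption, is to average the word vectors of each text and compare the averages by cosine similarity. The vectors in `TOY_VECTORS` are invented 3-dimensional values; a trained model would supply the real ones.

```python
import math
from typing import Dict, List

# Invented toy embeddings, for illustration only.
TOY_VECTORS: Dict[str, List[float]] = {
    "leader": [0.9, 0.1, 0.0],
    "boss": [0.8, 0.2, 0.1],
    "reprimand": [0.1, 0.9, 0.2],
    "criticize": [0.2, 0.8, 0.3],
    "employee": [0.0, 0.2, 0.9],
}


def mean_vector(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all tokens found in the vocabulary."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return []
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 for empty or zero-length vectors."""
    if not u or not v:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_similarity(a: List[str], b: List[str]) -> float:
    return cosine(mean_vector(a, TOY_VECTORS), mean_vector(b, TOY_VECTORS))


tokens_a = ["leader", "reprimand", "employee"]
tokens_b = ["employee", "boss", "criticize"]
print(vector_similarity(tokens_a, tokens_b))
```

With these toy values the two sentences score close to 1 even though they share only one word, which is the kind of semantic signal a pure word-frequency comparison misses.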

Abstract

The invention provides a text similarity measuring system based on multi-feature fusion and relates to the field of intelligent information processing. The system measures text similarity by fusing multiple features based on word frequencies, word vectors and Wikipedia tags. The invention aims to solve the loss of semantics caused by conventional text similarity measuring systems ignoring context, and the low accuracy of similarity results caused by large differences in text length. The system is implemented by the following steps: preprocessing the training text, including word segmentation and stop word removal; training a word vector model on the processed training corpus; and, for each input text pair, computing the similarity based on word frequencies, the similarity based on word vectors and the similarity based on Wikipedia tags, then taking their weighted sum as the final text semantic similarity result. The system improves the accuracy of text similarity measurement and thereby meets the requirements of intelligent information processing.
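The weighted summation described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented formulas: cosine over term counts stands in for the word-frequency similarity, Jaccard overlap of tag sets stands in for the Wikipedia-tag similarity, and the word-vector score, the tag sets and the weights are all hypothetical values.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def term_frequency_cosine(a: List[str], b: List[str]) -> float:
    """Cosine similarity between the term-count vectors of two token lists."""
    fa, fb = Counter(a), Counter(b)
    dot = sum(fa[t] * fb[t] for t in fa.keys() & fb.keys())
    norm = math.sqrt(sum(c * c for c in fa.values())) * math.sqrt(
        sum(c * c for c in fb.values())
    )
    return dot / norm if norm else 0.0


def tag_overlap(tags_a: Set[str], tags_b: Set[str]) -> float:
    """Jaccard overlap of two tag sets (an assumed stand-in measure)."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def fuse(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-feature similarity scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


tokens_a = ["leader", "reprimand", "employee"]
tokens_b = ["employee", "boss", "criticize"]
freq_sim = term_frequency_cosine(tokens_a, tokens_b)
vec_sim = 0.9  # hypothetical word-vector similarity score
tag_sim = tag_overlap({"Management", "Employment"}, {"Employment", "Workplace"})

# Hypothetical weights chosen to sum to 1 so the result stays in [0, 1].
final = fuse([freq_sim, vec_sim, tag_sim], [0.3, 0.4, 0.3])
print(round(final, 2))  # 0.56
```

Here the word-frequency and tag scores are both about 0.33 because the texts share one of three words and one of three tags; the higher word-vector score lifts the fused result.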

Description

Technical field

[0001] The invention relates to the technical field of intelligent information processing within information technology, and in particular to a method and system for measuring text similarity based on semantics.

Background art

[0002] Semantic similarity is a core technology in the field of intelligent information processing and can be applied to query expansion, word sense disambiguation, question answering systems, information retrieval, and so on. Assessing semantic similarity is also an important task in numerous research fields, such as psychology, cognitive science and artificial intelligence.

[0003] Supervised and unsupervised methods are the two mainstream approaches to semantic similarity measurement. Supervised methods require prior knowledge, such as knowledge base systems or ontology resources, for example DBPedia, WordNet and HowNet; unsupervised methods mainly use statistical learning methods to obtain context information and ru...

Application Information

IPC(8): G06F17/30

CPC: G06F16/30

Inventor: 马博, 李晓, 蒋同海, 周喜, 王磊, 杨雅婷, 赵凡

Owner: XINJIANG TECHN INST OF PHYSICS & CHEM CHINESE ACAD OF SCI
