Text similarity matching method on basis of vector space model

A text similarity matching technique based on the vector space model. It addresses the problems that keywords cannot be related to one another across texts, that keywords do not accurately express user needs, and that query keywords do not reflect user intent well.

Status: Inactive
Publication date: 2013-04-17
IOL WUHAN INFORMATION TECH CO LTD
Cites: 3; cited by: 37

AI Technical Summary

Problems solved by technology

[0005] Because keywords may have synonyms or multiple meanings, the similarity results produced by the traditional vector space model method are not very accurate and are often unsatisfactory. The keyword weighting algorithm only captures the relationship between a text and its keywords; it cannot link keywords horizontally across different texts, which brings the following problems to text retr

Embodiment Construction

[0041] The invention mainly concerns text similarity, a technique within text retrieval. Text retrieval is an interdisciplinary field. At the level of major disciplines, it spans computer science, information science, mathematical statistics and others. At the level of specific research directions, it draws on natural language processing, data mining, and machine learning.

[0042] The translation reference library (referred to as the reference library) is a huge resource library containing a large number of texts. Running a complex similarity search over it to find the set of reference texts similar to the text to be translated is very slow, which makes fast retrieval difficult. However, using the relatively simple VSM (vector space model) method for similarity retrieval gives very low accuracy. This method uses an improved VSM method, which can greatly improve the retriev...
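To make the accuracy problem of the plain VSM concrete, the following is a minimal sketch of a baseline vector space comparison, assuming whitespace tokenization and raw term-frequency weights. The function names `term_vector` and `cosine_similarity` and the sample texts are illustrative assumptions, not taken from the patent.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    shared_terms = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty texts
    return dot / (norm_a * norm_b)


query = term_vector("buy a car")
same_wording = term_vector("buy a car")
synonym_wording = term_vector("purchase an automobile")

sim_same = cosine_similarity(query, same_wording)
sim_synonym = cosine_similarity(query, synonym_wording)
```

In this sketch the two paraphrases share no surface terms, so `sim_synonym` is zero even though the texts mean the same thing, which is the kind of synonym failure the description attributes to the traditional method.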

Abstract

The invention discloses a text similarity matching method based on a vector space model. The method comprises extracting the keywords of the texts, clustering all the keywords and generating a keyword concept tree; computing the similarity of the texts according to the keyword concept tree built for the keywords in the text to be translated; and retrieving, according to that similarity, the texts in a translation depository that match the text to be translated. The advantage of this technical scheme is that the relations among texts are reflected relatively accurately, so the similarity of the texts is captured more fully.
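The listing does not show how the concept tree is built or how the similarity formula uses it, so the following is only one possible reading, sketched under strong assumptions: each keyword is replaced by a parent concept before the vectors are compared. The hand-written `CONCEPT_OF` mapping is a hypothetical stand-in for the clustering-generated concept tree, and all names are illustrative.

```python
import math
from collections import Counter

# Hypothetical one-level "concept tree": keyword -> parent concept.
# In the patent this structure comes from clustering; here it is hand-written.
CONCEPT_OF = {
    "car": "vehicle",
    "automobile": "vehicle",
    "buy": "acquire",
    "purchase": "acquire",
}


def concept_vector(keywords):
    """Count concepts instead of surface keywords; unknown keywords map to themselves."""
    return Counter(CONCEPT_OF.get(k, k) for k in keywords)


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


text_a = concept_vector(["buy", "car"])
text_b = concept_vector(["purchase", "automobile"])
text_c = concept_vector(["sell", "house"])

sim_ab = cosine(text_a, text_b)
sim_ac = cosine(text_a, text_c)
```

Under this assumed mapping, the synonym pair ends up with identical concept vectors, while the unrelated pair still scores zero. A real concept tree would have several levels and would likely weight matches by depth, which this sketch does not attempt.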

Description

Technical field

[0001] The invention relates to computer technology, in particular to a text similarity matching method based on a vector space model.

Background

[0002] Commonly used text retrieval models include text-based retrieval models and structure-based retrieval models. Text-based retrieval models include the vector space model, the approximate model, the probability model and the statistical language retrieval model; structure-based text retrieval models include the internal structure retrieval model and the external structure retrieval model.

[0003] Text similarity is a numerical measure of how similar two texts are. Take two texts D1 and D2: the closer |D1∩D2| / |D1∪D2| is to 1, the higher the similarity between the two texts, and vice versa. In text retrieval, similarity calculation is mainly used to measure the similarity between text objects, and it is a basic calculation in data mining and natural language processing. The key technology ...
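The overlap ratio in [0003] can be sketched as follows, assuming each text is treated as the set of its lowercase whitespace-separated tokens. The tokenization and the function name `jaccard_similarity` are assumptions made for illustration.

```python
def jaccard_similarity(text_a, text_b):
    """Return |A ∩ B| / |A ∪ B| over the token sets of two texts."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # convention chosen here when both texts are empty
    return len(tokens_a & tokens_b) / len(union)


d1 = "the cat sat on the mat"
d2 = "the cat lay on the rug"

# Shared tokens: the, cat, on (3); distinct tokens overall: 7.
score = jaccard_similarity(d1, d2)
```

For these two sample sentences the ratio is 3/7, and it reaches 1 only when both texts have exactly the same token set.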

Application Information

IPC(8): G06F17/30, G06F17/28
Inventor: 江潮
Owner: IOL WUHAN INFORMATION TECH CO LTD