Improved text similarity solution method

This technology applies to the field of text similarity calculation. It addresses the problems that existing methods ignore differences in word importance, have low accuracy, and do not capture the internal connections between meanings, and it achieves high accuracy and good applicability.

Status: Inactive. Publication date: 2017-05-03

SICHUAN YONGLIAN INFORMATION TECH CO LTD

Cites: 0. Cited by: 3.

Problems solved by technology

Accurately calculating the similarity between texts requires working at the semantic level. Existing methods based on semantic analysis use a semantic dictionary to analyze the words in a text, but they do not capture the deeper internal relationships between meanings and do not consider how much each word contributes to the text, so their accuracy is low.

Embodiment Construction

[0015] To account for the differing importance of individual feature words to the text, the present invention is described in detail below with reference to Figure 1. The specific implementation steps are as follows:

[0016] Step 1: Initialize the text corpus module and preprocess the texts to be compared, (W_1, W_2), as follows:

[0017] Apply word segmentation and stop-word removal to the texts (W_1, W_2).
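The preprocessing in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the `preprocess` function, the regex tokenizer, and the tiny `STOP_WORDS` set are all assumptions made here for English text. Chinese text would need a dedicated word segmenter in place of the regex.

```python
import re

# Hypothetical stop-word list for illustration only; a real system
# would load a much larger, language-specific list.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "on"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

w1 = preprocess("The cat is on the mat")
w2 = preprocess("A cat sat on a mat")
print(w1)  # ['cat', 'mat']
print(w2)  # ['cat', 'sat', 'mat']
```

The resulting token lists are what the later weighting steps would operate on.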

[0018] Step 2: Using an information-theoretic method, calculate the weight W_I of each word in the text, as follows:

[0019] The information-theoretic formulas for word frequency are:

[0020]

[0021] The formula above gives the amount of information that each word carries in the document with respect to word frequency, where p(c_1) and p(c_2) are the probabilities of the words c_1 and c_2 in the text.
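The formula in paragraph [0020] is not reproduced in this text. As a rough sketch only, the code below assumes it is the standard Shannon self-information, I(c) = -log2 p(c), with p(c) estimated as the relative frequency of word c in the text. The function name `information_content` and the base-2 logarithm are assumptions made here, not taken from the patent.

```python
import math
from collections import Counter

def information_content(tokens):
    """Return {word: -log2 p(word)}, with p estimated as relative frequency.

    Assumed form: the source formula is missing, so this uses the
    standard self-information definition as a stand-in.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: -math.log2(n / total) for word, n in counts.items()}

weights = information_content(["cat", "mat", "cat", "dog"])
print(weights)  # {'cat': 1.0, 'mat': 2.0, 'dog': 2.0}
```

Under this assumption, words that occur less often in a text receive a larger information value than words that occur often.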

[0022] The formula for calcu...

Abstract

The invention provides an improved text similarity solution method. The method comprises the following steps: performing word segmentation and stop-word removal on two texts (the formula is as shown in the specification); calculating the weights of the feature words in the texts using an information-theoretic method; calculating position weights for the words from their position information; constructing a feature value objective function (the formula is as shown in the specification) for the texts by combining all factors; extracting two feature word vectors; and converting the text similarity problem into a vector distance problem. Compared with traditional text similarity calculation methods, the improved method has higher accuracy and a wider application range, and it also provides a good theoretical basis for subsequent text clustering.
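The overall flow in the abstract can be illustrated with a small sketch. The patent's actual objective function and position-weight formula are not given in this text, so everything specific below is an assumption: the self-information weight, the hypothetical position weight that favors earlier words, their product as the feature value, and cosine similarity as the vector distance measure.

```python
import math
from collections import Counter

def feature_values(tokens):
    """Score each word as information weight times position weight (assumed forms)."""
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for index, word in enumerate(tokens):
        if word in scores:
            continue  # the first occurrence decides the position weight
        info = -math.log2(counts[word] / total)  # assumed self-information weight
        position = 1.0 - index / total           # hypothetical: earlier words weigh more
        scores[word] = info * position
    return scores

def cosine_similarity(a, b):
    """Cosine similarity of two sparse {word: value} vectors over their joint vocabulary."""
    vocab = sorted(set(a) | set(b))
    va = [a.get(w, 0.0) for w in vocab]
    vb = [b.get(w, 0.0) for w in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

f1 = feature_values(["text", "similarity", "method", "text"])
f2 = feature_values(["similarity", "method", "vector"])
print(cosine_similarity(f1, f2))
```

Representing each text as a weighted feature vector is what turns the comparison into a distance calculation; any other vector distance could be substituted for the cosine measure used here.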

Description

Technical field

[0001] The invention relates to the technical field of semantic networks, in particular to an improved method for solving text similarity.

Background technique

[0002] At present, there are two main types of text similarity calculation methods. The first type is based on mathematical statistics, such as the classic vector space model method. This type of method is simple to calculate and can reflect the similarity between two texts at the level of word frequency. However, a meaningful text has a central idea that it wants to express, which is a concept at the semantic level, and the central idea extracted by mathematical statistics methods is far from the one the text actually expresses. Therefore, to accurately calculate the similarity between texts, one must start from the semantic level. The second type is a method based on semantic analysis, which uses a semantic dictionary to perform semantic analys...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/27

CPC: G06F40/30

Inventor: 金平艳

Owner: SICHUAN YONGLIAN INFORMATION TECH CO LTD
