Text similarity calculation method based on χ²-C

A text similarity calculation method based on χ²-C, applied in the field of text similarity calculation. It addresses the problems of the CNN model's complex structure, many parameters, and long running time.

Active Publication Date: 2020-01-17

SHANDONG UNIV OF SCI & TECH

7 Cites, 1 Cited by

Problems solved by technology

The CNN model has disadvantages such as a complex structure, many parameters, and a long running time.

Embodiment Construction

[0049] Specific embodiments of the present invention are further described below with reference to the accompanying drawings:

[0050] The text similarity calculation method based on χ²-C comprises the following steps:

[0051] Step 1: Preprocess the test data and the content of the corpus;

[0052] Step 2: Use the convolutional neural network CNN to classify the test data set;

[0053] Step 3: Use the TF-IDF algorithm to calculate the initial weight of the feature words in the detection sample;

[0054] The TF-IDF algorithm is expressed as:

[0055] $W_{dt} = TF_{dt} \times IDF_{dt}, \qquad TF_{dt} = \frac{m_d}{S}$

[0056] Here $W_{dt}$ is the weight of feature word d in document t; $TF_{dt}$ is the term frequency of feature word d in document t; $m_d$ is the number of occurrences of feature word d in document t; $S$ is the total number of feature words in document t; $IDF_{dt}$ is the inverse document frequency of feature word d; $n_d$ is the number of texts co...
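The initial weighting in Step 3 can be sketched as follows. The term frequency follows the definitions above (occurrences of the word divided by the total number of feature words in the document). The IDF expression is cut off in this text, so the sketch assumes the common textbook form log(N / n_d), where N is the number of documents; the function name `tf_idf_weights` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_weights(docs):
    """Return one {word: weight} dict per tokenized document.

    TF follows the definition in the text: m_d / S.
    IDF = log(N / n_d) is an ASSUMED textbook form; the source's
    own IDF formula is not visible in this excerpt.
    """
    n_docs = len(docs)  # N: total number of documents (assumed symbol)
    # n_d: number of documents that contain each word at least once
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # m_d for every word in this document
        total = len(doc)       # S: total number of feature words
        weights.append({
            word: (m / total) * math.log(n_docs / doc_freq[word])
            for word, m in counts.items()
        })
    return weights

# Hypothetical, already preprocessed documents (Step 1 is assumed done).
docs = [["text", "similarity", "text"], ["text", "vector"], ["cosine", "vector"]]
weights = tf_idf_weights(docs)
```

With this assumed IDF, a word that appears in every document receives a weight of zero, which is one reason practical variants often add smoothing.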

Abstract

The invention discloses a text similarity calculation method based on χ²-C and relates to the field of text information processing. In the method, a convolutional neural network (CNN) is used to classify the test data set; the initial weight of each feature word in the detection sample is calculated with TF-IDF; a domain correlation factor is calculated with the χ²-C algorithm; and the initial weight is combined with the word position factor α and the domain correlation factor to obtain the feature word weight. A word bank is built from all feature words of the detection sample, and the detection sample is expressed as an initial text vector using the word bank and the feature word weights. The word2vec tool is used to calculate the similarity between the words in the word bank, forming a word-sense similarity matrix. The initial text vector is processed with this matrix to obtain the text vector, and the cosine similarity of the text vectors gives the similarity between texts. The method thereby takes into account the degree of association between feature words and their domain, the semantic relationships between feature words, and the position information of feature words, which improves the accuracy of text similarity calculation.
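The last two stages of the abstract (applying the word-sense similarity matrix, then cosine similarity) can be illustrated with a minimal sketch. It assumes the matrix is applied to the initial text vector by ordinary matrix-vector multiplication, which the abstract does not state explicitly. The similarity values are hard-coded, hypothetical stand-ins for what word2vec would supply, and the function names are illustrative only.

```python
import math

def apply_similarity_matrix(vec, matrix):
    """Multiply a word-sense similarity matrix by an initial text vector.

    ASSUMPTION: plain matrix-vector multiplication; the source only says
    the initial vector is 'calculated by using the matrix'.
    """
    return [sum(m_ij * x for m_ij, x in zip(row, vec)) for row in matrix]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical 3-word bank; words 0 and 1 are near-synonyms (similarity 0.8).
sim = [[1.0, 0.8, 0.0],
       [0.8, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
text_a = [1.0, 0.0, 0.0]  # initial vector: only word 0 has weight
text_b = [0.0, 1.0, 0.0]  # initial vector: only word 1 has weight

raw = cosine_similarity(text_a, text_b)
smoothed = cosine_similarity(apply_similarity_matrix(text_a, sim),
                             apply_similarity_matrix(text_b, sim))
```

In this toy case the two texts share no words, so the raw cosine similarity is zero, while the matrix-adjusted vectors score as highly similar because the two words are semantically close.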

Description

Technical field

[0001] The invention relates to the field of text information processing, in particular to a text similarity calculation method based on χ²-C.

Background

[0002] Text similarity is the degree of semantic similarity between texts. In the era of information explosion, text similarity is used in many fields, for example question answering systems, automatic marking, and plagiarism checking. The traditional calculation of text similarity is based on the vector space model (VSM), which uses TF-IDF to calculate the weights of the feature words in a text, converts the text into a text vector in a multidimensional space, and measures the similarity between texts by calculating the similarity of their text vectors. However, the TF-IDF algorithm only considers the relationship between feature words and documents and does not consider their relationship with categories, resulting in low accuracy of text similarity calculation. Th...
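To illustrate what a category-aware statistic adds over TF-IDF, the sketch below computes the standard χ² (chi-square) term-category statistic used in text feature selection. This is the textbook statistic, not the patent's χ²-C variant, whose formula does not appear in this excerpt. The function name and the document counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic for one word and one category.

    a: documents in the category that contain the word
    b: documents outside the category that contain the word
    c: documents in the category that do not contain the word
    d: documents outside the category that do not contain the word
    NOTE: textbook form only; the patent's chi-square-C variant is not shown here.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Hypothetical counts: the word is concentrated in the category.
related = chi_square(40, 10, 10, 40)
# Hypothetical counts: the word is spread evenly, so it carries no category signal.
unrelated = chi_square(25, 25, 25, 25)
```

A larger value indicates a stronger dependence between the word and the category, which is the kind of information plain TF-IDF weights do not capture.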

Application Information

IPC(8): G06F40/194, G06F16/35

CPC: G06F16/35

Inventors: 赵卫东, 李化泽, 王铭, 刘昊

Owner: SHANDONG UNIV OF SCI & TECH
