
Text similarity calculation method based on χ²-C

A text similarity calculation method based on χ²-C, applied in the field of text information processing, which can address the problems of complex CNN model structure, many parameters, and long running time.

Active Publication Date: 2020-01-17
SHANDONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

Existing methods have disadvantages such as a complex CNN model structure, a large number of parameters, and long running time.




Embodiment Construction

[0049] Specific embodiments of the present invention are described further below with reference to the accompanying drawings:

[0050] The text similarity calculation method based on χ²-C comprises the following steps:

[0051] Step 1: Preprocess the test data and the content of the corpus;

[0052] Step 2: Use the convolutional neural network CNN to classify the test data set;

[0053] Step 3: Use the TF-IDF algorithm to calculate the initial weight of the feature words in the detection sample;

[0054] The TF-IDF algorithm is expressed as

[0055] $W_{dt} = TF_{dt} \times IDF_{dt}$, where $TF_{dt} = m_d / S$

[0056] Here, $W_{dt}$ denotes the weight of feature word d in document t; $TF_{dt}$ denotes the term frequency of feature word d in document t; $m_d$ is the number of occurrences of feature word d in document t; $S$ is the total number of feature words in document t; $IDF_{dt}$ is the inverse document frequency of feature word d; and $n_d$ is the number of texts co...
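The weighting in Step 3 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the term frequency follows the definitions above (occurrences of the word divided by the total number of feature words in the document), while the inverse document frequency is assumed here to take the common form log(N / n_d), because the source formula is truncated. The function name `tf_idf_weights` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {word: weight} dict per tokenized document.

    TF follows the definitions in the text: m_d / S.
    IDF is ASSUMED to be log(N / n_d); the source formula is truncated.
    """
    n_docs = len(docs)
    # n_d: number of documents that contain each word at least once
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)   # m_d for every word d in this document
        total = len(doc)        # S: total number of feature words
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical, already preprocessed documents (Step 1 is assumed done).
docs = [
    ["text", "similarity", "text"],
    ["text", "vector"],
    ["cosine", "vector"],
]
weights = tf_idf_weights(docs)
```

With this assumed IDF, a word that appears in every document receives a weight of zero, which is one reason practical variants often add a smoothing term.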


Abstract

The invention discloses a text similarity calculation method based on χ²-C and relates to the field of text information processing. In the method, a convolutional neural network (CNN) classifies the test data set, and TF-IDF is used to calculate an initial weight for each feature word in the detection sample. A domain correlation factor is calculated with the χ²-C algorithm, and the initial weight is then adjusted using the word position factor α together with the domain correlation factor to obtain the feature word weight. A word bank is built from all feature words of the detection sample, and the detection sample is expressed as an initial text vector using the word bank and the feature word weights. The word2vec tool is used to calculate the similarity between the words in the word bank, forming a word-sense similarity matrix. The initial text vector is combined with this matrix to obtain the text vector, and the cosine similarity algorithm is applied to the text vectors to obtain the similarity between texts. The method thereby takes into account the association between feature words and their domain, the semantic relationships between feature words, and the position information of feature words, improving the accuracy of text similarity calculation.
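The last two stages of the abstract (applying the word-sense similarity matrix to the initial text vector, then taking the cosine similarity) can be sketched as below. This is an illustration under assumptions: the abstract does not say how the matrix is applied, so a plain vector-matrix product is assumed, and the similarity values are invented for the example rather than produced by word2vec. The function names and the three-word word bank are hypothetical.

```python
import math


def apply_similarity_matrix(vector, sim_matrix):
    """Multiply a row vector by a square word-similarity matrix (assumed operation)."""
    size = len(vector)
    return [
        sum(vector[i] * sim_matrix[i][j] for i in range(size))
        for j in range(size)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical word bank: ["car", "automobile", "banana"].
# Similarity values are made up; in the described method they come from word2vec.
sim_matrix = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
doc_a = [1.0, 0.0, 0.0]  # initial text vector: only "car" has weight
doc_b = [0.0, 1.0, 0.0]  # initial text vector: only "automobile" has weight

raw_similarity = cosine_similarity(doc_a, doc_b)
semantic_similarity = cosine_similarity(
    apply_similarity_matrix(doc_a, sim_matrix),
    apply_similarity_matrix(doc_b, sim_matrix),
)
```

In this toy case the two documents share no words, so the raw cosine similarity is zero, while the matrix-adjusted vectors overlap because the two words are marked as near-synonyms.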

Description

Technical field

[0001] The invention relates to the field of text information processing, and in particular to a text similarity calculation method based on χ²-C.

Background

[0002] Text similarity measures the degree of semantic similarity between texts. In the era of information explosion, it is used in many fields, for example question answering systems, automatic marking, and plagiarism checking. The traditional calculation of text similarity is based on the vector space model (VSM), which uses TF-IDF to calculate the weights of feature words in a text, converts the text into a vector in a multidimensional space, and measures the similarity between texts by the similarity of their text vectors. However, the TF-IDF algorithm considers only the relationship between feature terms and documents, not their relationship with categories, which lowers the accuracy of text similarity calculation. Th...
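The background's point that TF-IDF ignores the relationship between terms and categories can be illustrated with the classical chi-square statistic for a term and a category. The sketch below shows that standard statistic only; it is not the χ²-C algorithm of the invention, whose definition is not included in this excerpt. The function name `chi_square` and the document counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Classical chi-square statistic for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # a marginal total is zero, so no association can be measured
    return n * (a * d - c * b) ** 2 / denominator


# Hypothetical counts: a term concentrated in one category ...
strong_association = chi_square(40, 10, 10, 40)
# ... and a term spread evenly across categories.
no_association = chi_square(25, 25, 25, 25)
```

A term that is spread evenly across categories scores zero, while a term concentrated in one category scores high, which is the kind of category signal that plain TF-IDF does not capture.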


Application Information

IPC(8): G06F40/194, G06F16/35
CPC: G06F16/35
Inventors: 赵卫东, 李化泽, 王铭, 刘昊
Owner: SHANDONG UNIV OF SCI & TECH