Calculation method, device, equipment and storage medium for text similarity

A text similarity calculation technology, applied in the fields of text similarity calculation methods, devices, equipment and storage media. It addresses the problems that the overall similarity of a text fails to represent the similarity of each of its topics, which weakens the similarity relationship between related text content and reduces how often users look up similar text content. It achieves the effects of more diverse similarity judgments, more references to similar content, and accurate understanding of the target text.

Active Publication Date: 2021-11-09
鼎易创展咨询(北京)有限公司
Cites: 6 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] Prior-art text similarity judgment mainly targets the overall similarity of texts. When a text contains multiple topics, its overall similarity cannot represent the similarity of each topic in the text. This weakens the similarity relationship between related text content and reduces how much similar text content users view.

Examples

Embodiment 1

[0030] Figure 1A is a flowchart of a method for calculating text similarity provided by Embodiment 1 of the present invention. This embodiment can be applied to any document management system or expert system that needs to analyze text data. The text similarity calculation method of this embodiment can be executed by the text similarity calculation device provided by the embodiments of the present invention; the device can be implemented in software and/or hardware and integrated into the equipment that executes the method. The equipment executing the method in this embodiment may be any equipment capable of querying and analyzing document data, such as a tablet computer, a desktop computer, or a notebook computer. Specifically, referring to Figure 1A, the method may include the following steps:

[0031] S110. Obtain the target text and at least one benchmark text according to user requirements, and perform word segmentation on the at least one benchmark text to obta...
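Step S110 can be pictured as a small preprocessing routine. The sketch below is illustrative only: the patent does not name a segmenter, so a regular-expression tokenizer with a hypothetical stop-word list (`STOP_WORDS`) stands in for one, and the helper names `segment` and `prepare_texts` are invented for the example. Chinese text would need a dedicated word segmenter instead of the regex.

```python
import re

# Hypothetical stop-word list for illustration; a real system would load a curated list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}


def segment(text: str) -> list[str]:
    """Return the word sequence of `text`, lowercased, with stop words removed.

    Stand-in for a real word segmenter: here a "word" is any run of ASCII
    letters or digits, which is adequate for English but not for Chinese.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def prepare_texts(target: str, benchmarks: list[str]) -> tuple[str, list[list[str]]]:
    """Keep the target text as-is and segment each benchmark text into a word sequence."""
    return target, [segment(b) for b in benchmarks]


if __name__ == "__main__":
    _, sequences = prepare_texts(
        "Battery cooling design",
        ["The battery pack and the cooling fan", "Market analysis of sales"],
    )
    print(sequences)  # [['battery', 'pack', 'cooling', 'fan'], ['market', 'analysis', 'sales']]
```

Leaving the target text unsegmented here mirrors S110, which segments only the benchmark texts; the target text is segmented in a later step (S330 in Embodiment 3).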

Embodiment 2

[0057] Figure 2 is a flowchart of the method provided by Embodiment 2 of the present invention for clustering each word in the word sequence of the benchmark text to obtain the topics and corresponding keywords in the benchmark text. Building on the foregoing embodiments, this embodiment further explains how the words in the word sequence of the benchmark text are clustered to obtain its topics and corresponding keywords. Specifically, as shown in Figure 2, the method may include the following steps:

[0058] S210. Determine the text feature words in the benchmark text and their corresponding word vectors according to the weight of each word in the word sequence of the benchmark text.

[0059] Here, once the segmented word sequence of the benchmark text has been obtained, in order to filter out words that contribute little or are of little importance in the benchmarking tex...
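The visible text says only that feature words are chosen by their weights in the benchmark word sequence and that the words are then clustered into topics with keywords; it gives neither the weighting scheme nor the clustering algorithm. The following sketch assumes TF-IDF weights with a top-k cutoff, followed by plain k-means over word vectors. The functions `tfidf_weights`, `select_feature_words`, `kmeans` and `topics_and_keywords`, the toy corpus, and the 2-D word vectors are all hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(doc: list[str], corpus: list[list[str]]) -> dict[str, float]:
    """TF-IDF weight of each distinct word in `doc`, with `corpus` as the document set.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, so every weight is positive.
    """
    counts = Counter(doc)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[word] = (count / len(doc)) * idf
    return weights


def select_feature_words(doc, corpus, top_k):
    """Keep the top_k highest-weighted words (ties broken alphabetically), in rank order."""
    ranked = sorted(tfidf_weights(doc, corpus).items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def kmeans(points, k, iters=20):
    """Plain k-means; assumes len(points) >= k.

    Centroids start at the first k points so results are deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Move each centroid to the mean of its members; an empty cluster keeps its centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def topics_and_keywords(weights, vectors, k):
    """Cluster the feature words into k topics; each topic lists its keywords by descending weight."""
    words = list(weights)
    labels = kmeans([vectors[w] for w in words], k)
    topics = {}
    for word, lab in zip(words, labels):
        topics.setdefault(lab, []).append(word)
    return [sorted(ws, key=lambda w: (-weights[w], w)) for _, ws in sorted(topics.items())]


if __name__ == "__main__":
    benchmark = ["battery", "battery", "cooling", "sales", "sales", "market", "report"]
    corpus = [benchmark, ["report", "weather"], ["report", "sports"]]
    # Hypothetical 2-D word vectors; a real system would use trained embeddings.
    vectors = {"battery": [1.0, 0.0], "cooling": [0.9, 0.1],
               "sales": [0.0, 1.0], "market": [0.1, 0.9]}
    features = select_feature_words(benchmark, corpus, top_k=4)  # "report" is dropped
    print(topics_and_keywords(features, vectors, k=2))  # [['battery', 'cooling'], ['sales', 'market']]
```

Initialising the centroids from the first k feature words keeps the example deterministic; a production system would more likely use randomized or k-means++ seeding and choose the number of topics more carefully.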

Embodiment 3

[0086] Figure 3 is a flowchart of a method for calculating text similarity provided by Embodiment 3 of the present invention. This embodiment is optimized on the basis of the foregoing embodiments. Specifically, referring to Figure 3, this embodiment may include the following steps:

[0087] S310. Obtain the target text and at least one benchmark text according to user requirements, and perform word segmentation on the at least one benchmark text to obtain a corresponding word sequence.

[0088] S320. Cluster each word in the word sequence of the benchmark text to obtain the topics and corresponding keywords in the benchmark text.

[0089] S330. Perform word segmentation on the target text to obtain all target words in the target text.

[0090] S340. According to the word vectors and weights of all keywords in each topic of the at least one benchmark text, determine the similarity between each target word and each keywo...
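S340 is cut off mid-sentence, so only its inputs are known: the word vectors and weights of each topic's keywords, and the target words. Assuming that word similarity is cosine similarity and that a topic score is the keyword-weighted average of each keyword's best match among the target words, one possible sketch follows. The functions `cosine` and `topic_similarity`, the aggregation rule, and all data are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def topic_similarity(target_words, keywords, vectors, weights):
    """Similarity between the target text and one topic of a benchmark text.

    Assumed aggregation: each keyword is matched to its most similar target word,
    and those best-match scores are averaged using the keyword weights.
    Target words without a vector are ignored.
    """
    known = [t for t in target_words if t in vectors]
    total = sum(weights[k] for k in keywords)
    if not known or total == 0:
        return 0.0
    score = sum(
        weights[k] * max(cosine(vectors[k], vectors[t]) for t in known) for k in keywords
    )
    return score / total


if __name__ == "__main__":
    vectors = {"battery": [1.0, 0.0], "cooling": [0.9, 0.1], "fan": [0.8, 0.2],
               "sales": [0.0, 1.0], "market": [0.1, 0.9]}
    weights = {"battery": 0.9, "cooling": 0.6, "fan": 0.3, "sales": 0.8, "market": 0.5}
    topics = [["battery", "cooling", "fan"], ["sales", "market"]]
    target_words = ["battery", "fan", "warranty"]  # "warranty" has no vector
    for keywords in topics:
        print(keywords, round(topic_similarity(target_words, keywords, vectors, weights), 3))
```

In the demo the target words score close to 1 against the first topic and much lower against the second, which is the kind of per-topic distinction the method aims for instead of a single overall score.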

Abstract

The embodiments of the invention disclose a text similarity calculation method, device, equipment and storage medium. The method includes: obtaining the target text and at least one benchmark text according to user requirements, and performing word segmentation on the at least one benchmark text to obtain a corresponding word sequence; clustering each word in the word sequence of the benchmark text to obtain the topics and corresponding keywords in the benchmark text; and, according to the keywords of the benchmark text, calculating the text similarity between the target text and each topic in the at least one benchmark text. In the technical solution of the embodiments, the topics contained in the benchmark text and their corresponding keywords are acquired through clustering, so that the similarity between the target text and each distinct topic in the benchmark text can be judged. This increases the diversity of text similarity judgments and the number of times users access similar text content, enabling users to understand the target text quickly and accurately.

Description

Technical Field
[0001] The embodiments of the present invention relate to the field of data processing, and in particular to a method, device, equipment and storage medium for calculating text similarity.

Background
[0002] With the development of digital technology, enterprises store large amounts of text data internally. When analyzing this data, users need to find and consult similar text information in order to understand each text quickly. As the volume of text data grows, manually reading each text and judging similarity from manually labeled text categories or tags can no longer meet timeliness requirements, nor can it guarantee uniform annotation quality. Computer natural language processing is therefore needed to judge the similarity between texts.
[0003] At present, when judging text similarity, the text data to be analyzed, that is, the target text, and the text data used for reference, that is, the benchmark tex...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35; G06F40/289; G06K9/62
CPC: G06F40/284
Inventors: 应文池, 王虹森
Owner: 鼎易创展咨询(北京)有限公司