
Method, device, electronic device and storage medium for determining similar text information

A technique for determining similar text information, applied in the field of Internet information technology. It addresses the high cost of manually annotating parallel corpora, saving manpower and reducing costs.

Active Publication Date: 2021-07-16
TENCENT TECH (SHENZHEN) CO LTD
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of this application is to solve at least one of the above-mentioned technical defects, in particular the high cost of manually annotating parallel corpora.



Examples


Embodiment 1

[0036] This embodiment of the present application provides a method for determining similar text information. As shown in Figure 1, the method includes:

[0037] S101: For multiple pieces of text information to be processed, determine the semantic similarity between each pair of them according to the semantic vector of each piece;

[0038] The multiple pieces of text information to be processed are obtained in advance, whether by manual labeling, by machine, or by a combination of the two. Preferably, the amount of text information to be processed is on the order of millions or more.

[0039] The method of determining the semantic vector of each piece of text information to be processed is not limited. One option is to input the text information to be processed into a pre-trained word vector model, which outputs the semantic vector corresponding to each piece of text information to be processed, and dete...
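As a rough illustration of paragraph [0039], the sketch below turns a piece of text into a semantic vector. The source only says that a pre-trained word vector model outputs the vector; the toy lookup table `TOY_WORD_VECTORS`, the whitespace tokenization, and the averaging of word vectors are all assumptions made here for illustration, not the method described in the application.

```python
from typing import Dict, List

# Hypothetical toy table standing in for a pre-trained word vector model.
TOY_WORD_VECTORS: Dict[str, List[float]] = {
    "reset": [1.0, 0.0, 0.0],
    "change": [0.9, 0.1, 0.0],
    "password": [0.0, 1.0, 0.0],
    "weather": [0.0, 0.0, 1.0],
}


def semantic_vector(
    text: str, word_vectors: Dict[str, List[float]], dim: int = 3
) -> List[float]:
    """Average the vectors of known tokens (an assumed pooling strategy)."""
    known = [word_vectors[tok] for tok in text.lower().split() if tok in word_vectors]
    if not known:
        return [0.0] * dim  # no known tokens: fall back to the zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

Calling `semantic_vector("reset password", TOY_WORD_VECTORS)` averages the two known word vectors component by component. A real system would replace the table with a trained model and a tokenizer suited to the language of the corpus.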

Embodiment 2

[0047] This embodiment of the present application provides another possible implementation. Building on Embodiment 1, it further includes the following, wherein S101 includes S1011 (not shown in the figure):

[0048] S1011: For multiple pieces of text information to be processed, calculate the vector angle between the semantic vectors of any two pieces, and use that vector angle as the semantic similarity between those two pieces;

[0049] S102 includes S1021 (not marked in the figure):

[0050] S1021: If the vector angle between the semantic vector of a given piece of text information to be processed and that of another piece is greater than the preset first threshold, determine that the other piece is one of the other pieces of text information to be processed that semantically correspond to the given piece.
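Steps S1011 and S1021 can be sketched as follows. The text speaks of a "vector angle" that must be greater than the first threshold; since a larger value is treated as more similar, this sketch assumes the measure is the cosine of the angle between the two semantic vectors. The function names and the example threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidates_above_threshold(
    vectors: Sequence[Sequence[float]], first_threshold: float
) -> Dict[int, List[int]]:
    """Map each text index to the indices whose similarity exceeds the threshold."""
    result: Dict[int, List[int]] = {i: [] for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            # Assumed reading of S1021: similarity strictly greater than the threshold.
            if cosine_similarity(vectors[i], vectors[j]) > first_threshold:
                result[i].append(j)
                result[j].append(i)
    return result
```

For example, `candidates_above_threshold([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]], 0.9)` links the first two vectors to each other and leaves the third without candidates. The nested loop compares every pair, so its cost grows quadratically with the number of texts; at the scale of millions mentioned in [0038], some form of indexing or approximate search would likely be needed, which this sketch does not attempt.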

[0051] Calculate the vector angle between the semantic ...

Embodiment 3

[0086] This embodiment of the present application provides a device for determining similar text information. As shown in Figure 2, the device 20 for determining similar text information may include a first determination module 201, a second determination module 202, and a filtering determination module 203, wherein:

[0087] The first determination module 201 is configured to determine, for multiple pieces of text information to be processed, the semantic similarity between each pair of them according to the semantic vector of each piece;

[0088] The second determination module 202 is configured to determine, according to the semantic similarity, at least one other piece of text information to be processed, among the multiple pieces, that semantically corresponds to each piece of text information to be processed;

[0089] The filtering determination module 203 is configured to perform filtering processing on each text ...
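The three modules of device 20 can be pictured as three methods on one class, as in the minimal sketch below. Several things here are assumptions rather than details from the application: the class and method names, the use of cosine similarity, the choice to drop texts whose filtered list is empty, and above all the filter itself. The description of the filtering step is truncated in this excerpt, so the filter is passed in as a caller-supplied function `filter_fn`.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical filter signature: (text, candidate texts) -> kept candidate texts.
FilterFn = Callable[[str, List[str]], List[str]]


def _cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class SimilarTextDevice:
    """Illustrative stand-in for device 20; not the application's implementation."""

    def __init__(self, first_threshold: float, filter_fn: FilterFn) -> None:
        self.first_threshold = first_threshold
        self.filter_fn = filter_fn

    def first_determination(
        self, vectors: Sequence[Vector]
    ) -> Dict[Tuple[int, int], float]:
        """Module 201: pairwise semantic similarity from semantic vectors."""
        n = len(vectors)
        return {
            (i, j): _cosine(vectors[i], vectors[j])
            for i in range(n)
            for j in range(i + 1, n)
        }

    def second_determination(
        self, n: int, sims: Dict[Tuple[int, int], float]
    ) -> Dict[int, List[int]]:
        """Module 202: for each text, the other texts that exceed the threshold."""
        matches: Dict[int, List[int]] = {i: [] for i in range(n)}
        for (i, j), score in sims.items():
            if score > self.first_threshold:
                matches[i].append(j)
                matches[j].append(i)
        return matches

    def filtering_determination(
        self, texts: Sequence[str], matches: Dict[int, List[int]]
    ) -> List[Tuple[str, List[str]]]:
        """Module 203: apply the caller-supplied filter and pair each text with the result."""
        pairs: List[Tuple[str, List[str]]] = []
        for i, others in matches.items():
            kept = self.filter_fn(texts[i], [texts[j] for j in others])
            if kept:  # assumption: texts with nothing left after filtering are skipped
                pairs.append((texts[i], kept))
        return pairs

    def run(
        self, texts: Sequence[str], vectors: Sequence[Vector]
    ) -> List[Tuple[str, List[str]]]:
        sims = self.first_determination(vectors)
        matches = self.second_determination(len(texts), sims)
        return self.filtering_determination(texts, matches)
```

A pass-through filter such as `lambda text, candidates: candidates` is enough to exercise the flow; any real filtering rule would be plugged in through the same parameter.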



Abstract

Embodiments of the present application provide a method, device, electronic device, and storage medium for determining similar text information. The method includes: for multiple pieces of text information to be processed, determining the semantic similarity between each pair of them according to the semantic vector of each piece; determining, according to the semantic similarity, at least one other piece of text information to be processed, among the multiple pieces, that semantically corresponds to each piece; performing filtering processing on each piece of text information to be processed and its at least one semantically corresponding other piece; and determining each piece of text information to be processed, together with its corresponding filtering result, as a pair of similar text information. The similar text information pairs in the embodiments of the present application are obtained through machine processing and follow a unified standard, which saves manpower, money, and time and effectively reduces costs.

Description

Technical field

[0001] The present application relates to the field of Internet information technology, and in particular to a method, device, electronic device and storage medium for determining similar text information.

Background

[0002] Deep learning is one of the technical and research fields of machine learning; it realizes artificial intelligence in computing systems by building artificial neural networks with hierarchical structures. In artificial intelligence question answering, sentence similarity calculation is the core of ensuring the accuracy of answers. Similarity calculation depends mainly on the training of various deep learning models, and training these models requires a sufficient amount of training text information.

[0003] Usually, the magnitude of the training text information of the deep learning model must be at least tens of millions, so as to ensure the h...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/194, G06F16/36, G06F40/30
CPC: G06F16/374, G06F40/194, G06F40/30
Inventor: 王卓然, 亓超, 马宇驰, 郭伟, 陈华荣
Owner: TENCENT TECH (SHENZHEN) CO LTD