Method, device, electronic device and storage medium for determining similar text information
A technique for determining similar text information, applied in the field of Internet information, that addresses the high cost of manually annotating parallel corpora and thereby saves manpower and reduces costs.
Examples
Embodiment 1
[0036] This embodiment of the present application provides a method for determining similar text information. As shown in Figure 1, the method includes:
[0037] S101: for a plurality of pieces of text information to be processed, determine the semantic similarity between the pieces of text information to be processed according to the semantic vector of each piece;
[0038] The pieces of text information to be processed are obtained in advance, whether by manual labeling, by machine, or by a combination of the two. Preferably, the number of pieces of text information to be processed is on the order of millions or more.
[0039] The method of determining the semantic vector of each piece of text information to be processed is not limited. One option is to input the text information to be processed into a pre-trained word vector model, which outputs the semantic vector corresponding to each piece of text information to be processed, and dete...
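The word-vector option in [0039] can be illustrated with a minimal sketch. It assumes, purely for illustration, that a text's semantic vector is the average of the vectors of its words; the source does not say how the model pools word vectors. The table `WORD_VECTORS`, its values, and the function `semantic_vector` are hypothetical stand-ins for a real pre-trained model.

```python
from typing import Dict, List

# Hypothetical toy table standing in for a pre-trained word vector model.
WORD_VECTORS: Dict[str, List[float]] = {
    "reset": [1.0, 0.0, 0.0],
    "change": [0.9, 0.1, 0.0],
    "password": [0.0, 1.0, 0.0],
    "weather": [0.0, 0.0, 1.0],
}
DIM = 3  # dimensionality of the toy vectors


def semantic_vector(text: str) -> List[float]:
    """Return the mean of the word vectors of the known words in `text`.

    Mean pooling and skipping unknown words are assumptions of this sketch,
    not something the source specifies.
    """
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        # No known word: fall back to the zero vector.
        return [0.0] * DIM
    return [sum(component) / len(vectors) for component in zip(*vectors)]
```

With this toy table, `semantic_vector("reset password")` averages the two word vectors component by component.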
Embodiment 2
[0047] This embodiment of the present application provides another possible implementation. On the basis of Embodiment 1, it further includes the method described in this Embodiment 2, in which S101 includes S1011 (not shown in the figure):
[0048] S1011: for a plurality of pieces of text information to be processed, calculate the vector angle between the semantic vectors of any two pieces, and use that vector angle as the semantic similarity between those two pieces;
[0049] S102 includes S1021 (not shown in the figure):
[0050] S1021: if the vector angle between the semantic vector of any piece of text information to be processed and the semantic vector of another piece is greater than a preset first threshold, determine that the other piece is text information to be processed that semantically corresponds to the former.
[0051] Calculate the vector angle between the semantic ...
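Steps S1011 and S1021 can be sketched as follows. The sketch assumes that the "vector angle" is measured by its cosine, so that a larger value means the two texts are semantically closer and "greater than the first threshold" selects similar texts; the source does not state this explicitly. The names `cosine` and `corresponding_texts` and the threshold value used below are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def corresponding_texts(
    vectors: Sequence[Sequence[float]], first_threshold: float
) -> Dict[int, List[int]]:
    """Map each text index to the indices of texts whose similarity exceeds the threshold."""
    matches: Dict[int, List[int]] = {i: [] for i in range(len(vectors))}
    # S1011: compare every unordered pair of semantic vectors once.
    for i, j in combinations(range(len(vectors)), 2):
        # S1021: keep the pair if its similarity is above the first threshold.
        if cosine(vectors[i], vectors[j]) > first_threshold:
            matches[i].append(j)
            matches[j].append(i)
    return matches
```

The pairwise loop is quadratic in the number of texts, so at the scale of millions of texts mentioned in Embodiment 1 an approximate nearest-neighbor index would likely be needed in practice; the loop is kept here only because it mirrors the steps directly.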
Embodiment 3
[0086] This embodiment of the present application provides a device for determining similar text information. As shown in Figure 2, the device 20 for determining similar text information may include a first determination module 201, a second determination module 202, and a filtering determination module 203, wherein:
[0087] The first determination module 201 is configured to determine, for a plurality of pieces of text information to be processed, the semantic similarity between each pair of pieces according to the semantic vector of each piece;
[0088] The second determination module 202 is configured to determine, according to the semantic similarity, at least one other piece of text information to be processed, among the plurality, that semantically corresponds to each piece of text information to be processed;
[0089] The filtering determination module 203 is configured to perform filtering processing on each text ...
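The three-module structure of device 20 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class `SimilarTextDevice`, the helper functions, and the use of a dot product as the similarity are hypothetical, and because the filtering rule of module 203 is cut off in the text above, the filter is left as a pass-through placeholder rather than guessed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Similarities = Dict[Tuple[int, int], float]
Candidates = Dict[int, List[int]]


@dataclass
class SimilarTextDevice:
    """Hypothetical wiring of modules 201, 202 and 203 as pluggable callables."""
    first_determination: Callable[[Sequence[Vector]], Similarities]   # module 201
    second_determination: Callable[[Similarities, int], Candidates]   # module 202
    filter_determination: Callable[[Candidates], Candidates]          # module 203

    def run(self, vectors: Sequence[Vector]) -> Candidates:
        similarities = self.first_determination(vectors)
        candidates = self.second_determination(similarities, len(vectors))
        return self.filter_determination(candidates)


def dot_similarities(vectors: Sequence[Vector]) -> Similarities:
    """Pairwise dot products, used here only as a stand-in similarity measure."""
    return {
        (i, j): sum(a * b for a, b in zip(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    }


def above_threshold(threshold: float) -> Callable[[Similarities, int], Candidates]:
    """Build a selector that keeps pairs whose similarity exceeds `threshold`."""
    def select(similarities: Similarities, count: int) -> Candidates:
        candidates: Candidates = {i: [] for i in range(count)}
        for (i, j), score in similarities.items():
            if score > threshold:
                candidates[i].append(j)
                candidates[j].append(i)
        return candidates
    return select


def keep_all(candidates: Candidates) -> Candidates:
    """Placeholder filter: the source's filtering rule is elided, so nothing is dropped."""
    return candidates
```

A device could then be assembled as `SimilarTextDevice(dot_similarities, above_threshold(0.5), keep_all)` and applied to a list of semantic vectors with `run`; a real filter would replace `keep_all`.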