Text duplicate removal method and device
A text deduplication technology, applied in the field of text processing, that addresses the problems of a high false-positive rate, complex implementation, and a large amount of calculation.
Embodiment Construction
[0025] To make the objectives, technical solutions, and advantages of the present invention clearer, the solutions of the present invention are described in further detail below with reference to the accompanying drawings and examples.
[0026] In the embodiment of the present invention, text deduplication is completed through the following three steps:
[0027] Step 1. Create a case library:
[0028] In order to deduplicate text, it is first necessary to designate multiple pieces of text as case texts, and process each of the case texts to build a case library.
[0029] The processing of each case text includes the following steps:
[0030] A1. Extract the feature words of the case text to obtain a feature word string.
[0031] Existing word segmentation methods can be used to extract text feature words.
[0032] For example, for the case text "What the hell happened to your car":
[0033] Extract the feature words to get the following feature word string: what happ...
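Step A1 can be illustrated with a minimal sketch, assuming the text is in English and that feature words are simply the lowercased tokens left after removing stop words. The source only says that existing word segmentation methods can be used, so the tokenizer, the stop-word list, and the names `STOP_WORDS`, `extract_feature_string`, and `build_case_library` are all hypothetical. Representing the case library as a dictionary is likewise an assumption. The resulting string is not necessarily the feature word string given in the example above.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = frozenset({"a", "an", "the", "to", "of", "your"})


def extract_feature_string(text, stop_words=STOP_WORDS):
    """Return the feature words of `text` joined by single spaces.

    Assumption: a feature word is any lowercased alphanumeric token
    that is not in the stop-word list.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in stop_words)


def build_case_library(case_texts):
    """Map each case text to its feature word string (assumed layout)."""
    return {text: extract_feature_string(text) for text in case_texts}


if __name__ == "__main__":
    library = build_case_library(["What the hell happened to your car"])
    for case_text, feature_string in library.items():
        print(f"{case_text!r} -> {feature_string!r}")
```

A real system would replace the regular-expression tokenizer with a word segmenter suited to the language of the texts being processed.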