
Text deduplication method and device, equipment and medium

A technology for texts and text libraries, applied in fields such as complex mathematical operations, instruments, and electrical digital data processing, that addresses the problem of unsatisfactory deduplication results.

Pending Publication Date: 2022-04-08
CHINA CONSTRUCTION BANK

AI Technical Summary

Problems solved by technology

[0003] Current methods of filtering out duplicate data information for users rely on relatively simple similarity-comparison algorithms. This leads to unsatisfactory deduplication results, and users still see a large amount of duplicate content.



Detailed Description of the Embodiments

[0076] Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present application as recited in the appended claims.

[0077] The present application provides a text deduplication method, device, equipment and medium, aiming to solve the above technical problems in the prior art.

[0078] The technical solution of the present application, and how it solves the above technical problems, will be described in detail below with reference to specific embodiments. The following...


Abstract

The invention provides a text deduplication method, device, equipment and medium, and relates to the technical field of natural language processing. The method comprises: determining a feature vector for each text according to its text features, and generating a feature matrix over the texts in a preset text library from those feature vectors; determining similarity information between the texts in the preset text library according to the feature matrix, and screening out, for each text, a first preset number of similar texts according to the similarity information; and determining and removing the texts to be deduplicated from the preset text library according to the relationship between the similarity information of a second preset number of texts and that of a third preset number of texts. With this technical scheme, when the data volume is large and CPU resources are limited, the efficiency of computing duplicate data information can be improved and duplicate texts can be accurately deleted.
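The pipeline in the abstract (feature vectors, feature matrix, similarity information, screening of the most similar texts, removal) can be illustrated with a minimal sketch. Everything specific in it is an assumption rather than the patent's method: the text feature is a plain term count, the similarity is cosine similarity, and the final decision is a simple threshold on the `top_k` nearest texts, which stands in for the relationship between the second and third preset numbers that this excerpt does not spell out. The names `feature_vector`, `cosine`, `deduplicate`, `top_k` and `threshold` are hypothetical.

```python
import math
from collections import Counter


def feature_vector(text, vocabulary):
    """Term-count vector over a fixed vocabulary (an assumed text feature)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def deduplicate(texts, top_k=2, threshold=0.9):
    """Keep the first text of each near-duplicate group (assumed rule)."""
    # Feature matrix: one feature vector per text in the library.
    vocabulary = sorted({term for text in texts for term in text.lower().split()})
    matrix = [feature_vector(text, vocabulary) for text in texts]

    # Similarity information between every pair of texts.
    n = len(texts)
    sim = [[cosine(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]

    removed = set()
    for i in range(n):
        if i in removed:
            continue
        # Screen out the top_k texts most similar to text i.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )[:top_k]
        # Assumed decision rule: drop later texts that are close enough to text i.
        for j in neighbours:
            if j > i and sim[i][j] >= threshold:
                removed.add(j)
    return [text for idx, text in enumerate(texts) if idx not in removed]
```

Calling `deduplicate(library)` on a list of strings returns the list with later near-duplicates dropped. Building the full pairwise similarity table costs quadratic time and memory in the number of texts, so this sketch only shows the data flow and does not reflect the efficiency claims made in the abstract.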

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, and in particular to a text deduplication method, device, equipment and medium.

Background

[0002] At present, data information on the Internet is relatively cluttered, and users have to read through a large amount of repeated data information when looking for the information they need. How to filter out duplicate data information is therefore a problem of considerable public concern.

[0003] Current methods of filtering out duplicate data information for users rely on relatively simple similarity-comparison algorithms. This leads to unsatisfactory deduplication results, and users still see a large amount of duplicate content.

[0004] Therefore, there is an urgent need for a text deduplication method, which can improve the calculation efficiency of duplicate data information and accurately delete duplicate t...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/62, G06F40/30, G06F40/284, G06F17/16
Inventors: 鄢秋霞, 李昱, 张圳, 李斌, 安飞飞
Owner: CHINA CONSTRUCTION BANK