De-duplication method, device, apparatus and storage medium for feature words

A feature word technology applied in the field of semantic analysis. It addresses the high computational complexity of existing methods: it saves computational space, reduces computational complexity, and improves generalization ability.

Pending Publication Date: 2018-12-21
DONGJUN NEW ENERGY CO LTD

AI Technical Summary

Problems solved by technology

However, existing de-duplication techniques are effective only to a limited degree and cannot meet the requirement that feature words extracted from multiple texts remain consistent.
[0004] In addition, the computational complexity of existing feature mapping methods is too high.



Examples


Detailed Description of Embodiments

[0025] The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.

[0026] It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other. The present application will be described in detail below with reference to the accompanying drawings and embodiments.

[0027] Please refer to Figure 1, which shows a schematic flowchart of a method for de-duplicating feature words provided by an embodiment of the present application.

[0028] As shown in Figure 1, the method includes:

[0029] Step 110, acquiring a set of phrases associated w...


Abstract

A method, a device, an apparatus, and a storage medium for de-duplication of feature words are disclosed. The method includes: acquiring a set of phrases associated with a current feature word in a set of feature words; calculating, phrase by phrase, the sum of the ASCII codes of a specified part of each phrase to obtain a first set of sum values; and determining the de-duplicated feature words by judging the number of minimum values in the first set of sum values. According to the technical solution of the embodiments of the present application, feature words with the same meaning are de-duplicated by calculating sums of ASCII codes, which reduces the computational complexity of current feature word de-duplication methods, saves computational space, and significantly improves the generalization ability of the feature words to the text.
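
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say which part of a phrase is the "specified part", nor what is done once the number of minimum values is known, so the sketch makes two assumptions: the specified part is a character slice given by the hypothetical parameters `start` and `length`, and the function simply returns every phrase that attains the minimum sum so that the caller can count them. The function names are also hypothetical, and `ord` is used as a stand-in for the ASCII code of each character.

```python
from typing import List, Optional, Tuple


def ascii_sum(phrase: str, start: int = 0, length: Optional[int] = None) -> int:
    """Sum the character codes of the selected slice of `phrase`.

    `start` and `length` are assumptions standing in for the
    "specified part" mentioned in the abstract.
    """
    end = len(phrase) if length is None else start + length
    return sum(ord(ch) for ch in phrase[start:end])


def minimum_sum_candidates(
    phrases: List[str], start: int = 0, length: Optional[int] = None
) -> Tuple[int, List[str]]:
    """Return the minimum sum value and all phrases that attain it.

    The list of sums corresponds to the "first set of sum values".
    The length of the returned candidate list is the "number of
    minimum values"; how a tie is resolved is not stated in the
    source, so it is left to the caller.
    """
    if not phrases:
        raise ValueError("phrases must not be empty")
    sums = [ascii_sum(p, start, length) for p in phrases]
    minimum = min(sums)
    candidates = [p for p, s in zip(phrases, sums) if s == minimum]
    return minimum, candidates


if __name__ == "__main__":
    # Hypothetical set of phrases associated with one feature word.
    minimum, candidates = minimum_sum_candidates(["car", "auto", "cab"])
    print(minimum, candidates)
```

If exactly one candidate is returned, it could serve as the single representative for the group of phrases. If several are returned, a further rule is needed, and the visible part of the source does not specify one.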

Description

Technical field

[0001] The present application generally relates to, but is not limited to, the technical field of semantic analysis, and specifically relates to a method, device, equipment and storage medium for de-duplication of feature words.

Background

[0002] In natural language processing, the smallest meaningful unit of natural language is a phrase or word. Generally speaking, extracting a single phrase as a feature word is valuable because it can summarize the main content of a text well and reduce the complexity of text processing. In the prior art, there are many algorithms for extracting feature words from text, such as the term frequency-inverse document frequency (TF-IDF) method and information gain.

[0003] With the development of technology, the feature words extracted from multiple texts may include several words that express the same meaning, resulting in redundant feature words. The current fea...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/22
CPC: G06F40/126, G06F40/205, G06F40/284
Inventor: 李利明
Owner: DONGJUN NEW ENERGY CO LTD