Characteristic word extraction method, text similarity calculation method, device and equipment

A feature word extraction technology in the computer field. It addresses two problems: texts rarely state their feature words directly, and prior-art extraction schemes have low accuracy. Its effects are a wider screening scope and improved accuracy.

Active Publication Date: 2022-02-01
PING AN TECH (SHENZHEN) CO LTD
Cites: 9; Cited by: 0

AI Technical Summary

Problems solved by technology

However, in most cases a text does not explicitly state its feature words, so a feature word extraction scheme is needed.
[0003] However, in the course of researching the present invention, the inventor found that the feature word extraction schemes provided by the prior art have low accuracy.



Examples


Embodiment 1

[0031] Embodiment 1 of the present invention provides a feature word extraction method that extracts the feature words of a target text using an improved TF-IDF algorithm. Specifically, Figure 1 schematically shows a flow chart of the feature word extraction method according to Embodiment 1. As shown in Figure 1, the method may include steps S101 to S106:

[0032] Step S101, in response to a word segmentation instruction for the target text, perform word segmentation on the target text to obtain a word segmentation set.

[0033] The target text can be any text, such as a paper, a patent or a technical article. A word segment can be a single character or a multi-character word; for example, one segment may be "most" and another may be "similar".

[0034] In one implementation, the word segmentation set includes all the word segments that make up the target text.

[0035] For example, the target text is "Beijing welco...
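The excerpt does not say which segmenter Step S101 uses. As a hedged illustration only, the sketch below swaps in greedy forward maximum matching, a classic dictionary-based segmentation technique; the function name `segment_fmm`, the toy dictionary and the `max_len` parameter are hypothetical. Consistent with [0033], a segment falls back to a single character when no dictionary word matches, and the result is kept as an ordered list so that repeated segments survive for the later TF counts.

```python
def segment_fmm(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Hypothetical segmenter: greedy forward maximum matching.

    At each position, take the longest dictionary entry of at most
    `max_len` characters that matches; if none matches, emit a single
    character. Dictionary entries longer than `max_len` are never matched.
    """
    segments: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match or 1 char.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                segments.append(candidate)
                i += size
                break
    return segments
```

For example, `segment_fmm("featurewordextraction", {"feature", "word", "extraction"}, max_len=10)` yields `["feature", "word", "extraction"]`. A production system would normally use a trained segmenter rather than a toy dictionary.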

Embodiment 2

[0069] Embodiment 2 of the present invention provides a text similarity calculation method. Some of its steps correspond to steps in Embodiment 1 above; those steps are not repeated in Embodiment 2, and reference may be made to Embodiment 1 for details. Specifically, Figure 2 schematically shows a flow chart of the text similarity calculation method according to Embodiment 2. As shown in Figure 2, the method may include steps S201 to S204:

[0070] Step S201, select the feature words of the target text, where the feature words are selected by the method described in Embodiment 1.

[0071] Step S202, input the feature words into the first text retrieval database to obtain several first texts.

[0072] In this embodiment, the first text retrieval library is composed of text ...
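The excerpt does not describe how the first text retrieval library is implemented. As a hedged sketch, assuming each library text has already been reduced to string terms (for example, joined phrases), Step S202 can be pictured as a lookup in an in-memory inverted index; `build_index`, `retrieve_first_texts` and the `limit` parameter are hypothetical names, and ranking by the number of matched feature words is an assumption.

```python
from collections import defaultdict


def build_index(library: dict[str, list[str]]) -> dict[str, set[str]]:
    """Hypothetical inverted index: map each term to the ids of texts containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for text_id, terms in library.items():
        for term in terms:
            index[term].add(text_id)
    return index


def retrieve_first_texts(feature_words: list[str],
                         index: dict[str, set[str]],
                         limit: int = 10) -> list[str]:
    """Return ids of texts matching at least one feature word, most matches first."""
    hits: dict[str, int] = defaultdict(int)
    for word in set(feature_words):          # count each feature word once
        for text_id in index.get(word, ()):  # .get avoids inserting empty keys
            hits[text_id] += 1
    # Sort by descending match count, then by id for a deterministic order.
    ranked = sorted(hits, key=lambda t: (-hits[t], t))
    return ranked[:limit]
```

With a library `{"t1": ["feature word", "tf-idf"], "t2": ["feature word"], "t3": ["yarn"]}`, querying `["feature word", "tf-idf"]` returns `["t1", "t2"]`: `t1` matches both feature words and `t2` matches one.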

Embodiment 3

[0078] Embodiment 3 of the present invention provides a text similarity calculation method. Some of its steps are the same as those in Embodiments 1 and 2 above; those steps are not repeated in Embodiment 3, and reference may be made to Embodiments 1 and 2 for details. Specifically, Figure 3 schematically shows a flow chart of the text similarity calculation method according to Embodiment 3. As shown in Figure 3, the method may include steps S301 to S307:

[0079] Step S301, select the feature words of the target text, where the feature words are selected by the method described in Embodiment 1.

[0080] Step S302, input the feature words into the first text retrieval database to obtain several first texts.

[0081] Step S303, expand the feature words to...


Abstract

The invention discloses a feature word extraction method, comprising: in response to a word segmentation instruction for a target text, performing word segmentation on the target text to obtain a word segmentation set; combining the word segments in the set to obtain several phrases, each phrase including several of the word segments; calculating a first TF value and a TF-IDF value for each phrase; calculating a second TF value for each word segment that makes up the phrase, obtaining several second TF values; calculating a probability-limited TF-IDF value for the phrase from the TF-IDF value, the first TF value and the second TF values; and selecting the phrases whose probability-limited TF-IDF values rank before a predetermined position as the feature words of the target text. The present disclosure also provides a text similarity calculation method, a feature word extraction device, a text similarity calculation device, a computer device and a computer-readable storage medium.
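The abstract names the inputs of the probability-limited TF-IDF value but, in this excerpt, not its formula. The sketch below wires up the described steps under stated assumptions, all hypothetical: phrases are runs of 2 to `max_n` consecutive segments; TF is a count divided by the number of segments in the target text; IDF is the smoothed form log((1 + N) / (1 + df)) + 1 over a pre-segmented reference corpus of N texts; and `illustrative_limit` is a placeholder for the patent's formula, not a reconstruction of it.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Phrase = tuple[str, ...]


def make_phrases(segments: Sequence[str], max_n: int = 3) -> list[Phrase]:
    """Assumed combination rule: every run of 2..max_n consecutive segments."""
    return [tuple(segments[i:i + n])
            for n in range(2, max_n + 1)
            for i in range(len(segments) - n + 1)]


def illustrative_limit(tfidf: float, phrase_tf: float, part_tfs: list[float]) -> float:
    """HYPOTHETICAL placeholder for the probability-limited TF-IDF value.

    Scales TF-IDF by phrase_tf / max(part_tfs): the share of the most frequent
    component's occurrences that fall inside this phrase (always <= 1).
    """
    return tfidf * (phrase_tf / max(part_tfs))


def extract_feature_words(
    segments: Sequence[str],
    corpus: Sequence[Sequence[str]],
    top_k: int = 5,
    max_n: int = 3,
    combine: Callable[[float, float, list[float]], float] = illustrative_limit,
) -> list[Phrase]:
    total = len(segments)
    seg_counts = Counter(segments)
    phrase_counts = Counter(make_phrases(segments, max_n))
    # Phrase sets of each (already segmented) corpus text, for document frequency.
    corpus_sets = [set(make_phrases(doc, max_n)) for doc in corpus]
    n_docs = len(corpus)

    scores: dict[Phrase, float] = {}
    for phrase, count in phrase_counts.items():
        first_tf = count / total                           # first TF value
        df = sum(phrase in s for s in corpus_sets)
        idf = math.log((1 + n_docs) / (1 + df)) + 1        # smoothed IDF
        tfidf = first_tf * idf
        second_tfs = [seg_counts[s] / total for s in phrase]  # second TF values
        scores[phrase] = combine(tfidf, first_tf, second_tfs)

    # Keep the phrases ranked before the predetermined position (top_k);
    # ties are broken lexicographically for a deterministic result.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_k]
```

For the segments `["feature", "word", "extraction", "feature", "word"]` and a corpus `[["feature", "word"], ["text", "similarity"]]`, `extract_feature_words(..., top_k=1)` returns `[("feature", "word")]`. Because `combine` is a parameter, the placeholder can be swapped for the actual formula without touching the rest of the pipeline.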

Description

Technical Field

[0001] The present invention relates to the field of computer technology, and in particular to a feature word extraction method, a text similarity calculation method, a device, computer equipment and a computer-readable storage medium.

Background

[0002] Feature words are the words or phrases that best represent the topic and key content of a text, and they have important applications in many fields, such as text comparison, text classification, content push and search engines. However, in most cases a text does not explicitly state its feature words, so a feature word extraction scheme is needed.

[0003] However, in the course of researching the present invention, the inventor found that the feature word extraction schemes provided by the prior art have low accuracy.

Summary of the Invention

[0004] The purpose of the present invention is to provide a feature word extract...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F40/289, G06F40/194, G06F40/216
CPC: G06F40/216, G06F40/289, G06F40/194
Inventors: Liu Xiang (刘翔), Yao Fei (姚飞)
Owner: PING AN TECH (SHENZHEN) CO LTD