Method, system, device and medium for screening Chinese nouns based on edit distance

An edit-distance-based screening technology, applied in the field of text processing, that addresses the problems of complex calculation methods, dependence on a training corpus, and low accuracy, with the effects of expanding the screening range, increasing the number of data samples, and achieving high accuracy.

Active Publication Date: 2022-04-29
CHENGDU SHULIANYUNSUAN TECH CORP
Cites: 17; Cited by: 0

AI Technical Summary

Problems solved by technology

[0015] The purpose of the present invention is to solve the technical problems that existing Chinese noun screening methods have low accuracy and complicated calculation methods, and require a training corpus.



Examples


Embodiment 1

[0070] Please refer to Figure 1, which is a schematic flow chart of a Chinese noun screening method based on edit distance. Embodiment 1 of the present invention provides a Chinese noun screening method based on edit distance. The method includes:

[0071] Constructing a data dictionary, in which words are stored in groups, each group consisting of an index word and a plurality of similar words;

[0072] Obtaining a reference word and matching it against the index words in the data dictionary; if the match succeeds, obtaining the plurality of similar words corresponding to the matched index word;

[0073] Combining the similar words obtained by matching with the reference word to obtain a screening word group;

[0074] Calculating the similarity between each word in the screening word group and each word in the screening data set;

[0075] Screening out, from the screening data set, the words whose similarity is greater than a threshold to o...
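A minimal Python sketch of the Embodiment 1 steps follows. It illustrates the technique under stated assumptions and is not the patented implementation. Because the text above is truncated before it defines the similarity measure and the final filtering rule, the following are assumptions, and all function names and sample words are hypothetical: the data dictionary maps each index word to its similar words; similarity is Levenshtein edit distance normalized as 1 - d / max(len(a), len(b)); a data-set word is kept when its similarity to at least one word of the screening word group is strictly greater than the threshold; and an unmatched reference word is screened on its own.

```python
from typing import Dict, Iterable, List


def build_data_dictionary(groups: Iterable[List[str]]) -> Dict[str, List[str]]:
    """Construct the data dictionary. Assumed layout: the first word of each
    group is its index word; the remaining words are its similar words."""
    dictionary: Dict[str, List[str]] = {}
    for index_word, *similar_words in groups:
        dictionary[index_word] = similar_words
    return dictionary


def build_screening_group(reference: str,
                          dictionary: Dict[str, List[str]]) -> List[str]:
    """Match the reference word against the index words. On a match, combine
    the similar words with the reference word; otherwise (assumed behaviour)
    the screening word group is the reference word alone."""
    similar_words = dictionary.get(reference, [])
    return [reference] + [w for w in similar_words if w != reference]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insertion, deletion and
    substitution, computed per character so it applies directly to Chinese."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        dp[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer word."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def screen(reference: str, dictionary: Dict[str, List[str]],
           dataset: List[str], threshold: float) -> List[str]:
    """Keep each word of the screening data set whose similarity to at least
    one word of the screening word group is greater than the threshold."""
    group = build_screening_group(reference, dictionary)
    return [word for word in dataset
            if any(similarity(g, word) > threshold for g in group)]


if __name__ == "__main__":
    # Hypothetical sample data.
    dictionary = build_data_dictionary([["计算机", "电脑", "微机"],
                                        ["番茄", "西红柿"]])
    dataset = ["电脑", "微型计算机", "西红柿", "手机"]
    print(screen("计算机", dictionary, dataset, threshold=0.5))
    # ['电脑', '微型计算机']
    print(screen("计算机", {}, dataset, threshold=0.5))
    # ['微型计算机']
```

The second call shows what the data dictionary contributes in this sketch: 电脑 shares no characters with 计算机, so edit distance alone scores it 0, and it is retrieved only through the dictionary group. 手机 is dropped in both calls because its best similarity (0.5, against 微机) is not strictly greater than the threshold.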

Embodiment 2

[0095] Please refer to Figure 9, which is a schematic diagram of the composition of a Chinese noun screening system based on edit distance. Embodiment 2 of the present invention provides a Chinese noun screening system based on edit distance. The system includes:

[0096] A construction unit, configured to construct a data dictionary in which words are stored in groups, each group consisting of an index word and a plurality of similar words;

[0097] A matching unit, configured to obtain a reference word and match it against the index words in the data dictionary, and, if the match succeeds, to obtain the plurality of similar words corresponding to the matched index word;

[0098] A combination unit, configured to combine the similar words obtained by matching with the reference word to obtain a screening word group;

[0099] A computing unit, configured to calculate the similarity between each word in the screening word group and ea...
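The unit decomposition of Embodiment 2 can be sketched as small Python classes, one per unit. This is a hedged illustration, not the patented system: the class names mirror the unit names above but are hypothetical, and the dictionary layout (index word mapped to its similar words), the similarity measure (1 - d / max(len(a), len(b)) over Levenshtein distance) and the strict comparison with the threshold are assumptions. Since the text is truncated after the computing unit, the final threshold filtering is placed in a wrapper class here rather than attributed to any named unit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConstructionUnit:
    """Builds the data dictionary: index word -> list of similar words."""
    dictionary: Dict[str, List[str]] = field(default_factory=dict)

    def add_group(self, index_word: str, similar_words: List[str]) -> None:
        self.dictionary[index_word] = list(similar_words)


class MatchingUnit:
    """Matches a reference word against the index words."""

    def __init__(self, dictionary: Dict[str, List[str]]) -> None:
        self.dictionary = dictionary

    def match(self, reference: str) -> List[str]:
        # An empty list means no index word matched (assumed behaviour).
        return self.dictionary.get(reference, [])


class CombinationUnit:
    """Combines the matched similar words with the reference word."""

    @staticmethod
    def combine(reference: str, similar_words: List[str]) -> List[str]:
        return [reference] + [w for w in similar_words if w != reference]


class ComputingUnit:
    """Computes an edit-distance-based similarity between two words."""

    @staticmethod
    def edit_distance(a: str, b: str) -> int:
        # Levenshtein distance keeping only two rows of the DP table.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                # deletion
                                curr[j - 1] + 1,            # insertion
                                prev[j - 1] + (ca != cb)))  # substitution
            prev = curr
        return prev[-1]

    def similarity(self, a: str, b: str) -> float:
        longest = max(len(a), len(b))
        return 1.0 if longest == 0 else 1.0 - self.edit_distance(a, b) / longest


class ScreeningSystem:
    """Hypothetical wrapper wiring the units together."""

    def __init__(self, construction: ConstructionUnit, threshold: float) -> None:
        self.matching = MatchingUnit(construction.dictionary)
        self.combination = CombinationUnit()
        self.computing = ComputingUnit()
        self.threshold = threshold

    def screen(self, reference: str, dataset: List[str]) -> List[str]:
        group = self.combination.combine(reference, self.matching.match(reference))
        return [w for w in dataset
                if any(self.computing.similarity(g, w) > self.threshold
                       for g in group)]


if __name__ == "__main__":
    construction = ConstructionUnit()
    construction.add_group("番茄", ["西红柿", "洋柿子"])  # hypothetical group
    system = ScreeningSystem(construction, threshold=0.5)
    print(system.screen("番茄", ["西红柿", "洋柿子", "番茄炒蛋", "土豆"]))
    # ['西红柿', '洋柿子']
```

Keeping each unit behind its own small interface means, for example, that the computing unit could be replaced by another string-similarity measure without changing dictionary construction or matching. In the usage example, 番茄炒蛋 scores exactly 0.5 against 番茄 and is therefore not retained.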

Embodiment 3

[0102] Embodiment 3 of the present invention provides a device for screening Chinese nouns based on edit distance, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the Chinese noun screening method based on edit distance.


Abstract

The invention discloses a Chinese noun screening method, system, device and medium based on edit distance, relating to the field of text processing. The method includes: constructing a data dictionary in which words are stored in groups, each group consisting of an index word and a plurality of similar words; obtaining a reference word and matching it against the index words in the data dictionary, and, if the match succeeds, obtaining the plurality of similar words corresponding to the matched index word; combining the similar words with the reference word to obtain a screening word group; calculating the similarity between each word in the screening word group and each word in the screening data set; and screening out, from the screening data set, the words whose similarity is greater than a threshold to obtain the screening result. By using a data dictionary, the invention expands the scope of noun data screening and improves screening accuracy.

Description

Technical field

[0001] The invention relates to the field of text processing, in particular to a method, system, device and medium for screening Chinese nouns based on edit distance.

Background art

[0002] In text processing, text screening techniques are often needed to filter text and obtain the required results.

[0003] In text processing, the texts being processed come from different scenarios. Some Chinese nouns are written with different characters but express the same meaning, so at a certain level of processing such nouns should be classified into one category. Calculating the pairwise similarity of nouns by edit distance and filtering with a threshold to obtain the final result only handles cases where the character composition of two Chinese nouns differs slightly, and cannot solve the problem that the difference in character composition is large or e...
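The limitation described in [0003] can be made concrete with a small worked example. The Python sketch below is hedged: it assumes the similarity is Levenshtein distance normalized as 1 - d / max(len(a), len(b)) (the source does not specify a normalization), and the word pairs are hypothetical. A pair with a small character difference scores high, while a synonym pair sharing no characters scores 0, so threshold filtering on edit distance alone would miss it.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer word."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical pairs: "ID card number" written two ways (small difference),
# and two words for "tomato" that share no characters (large difference).
for a, b in [("身份证号", "身份证号码"), ("番茄", "西红柿")]:
    print(a, b, edit_distance(a, b), round(similarity(a, b), 2))
# 身份证号 身份证号码 1 0.8
# 番茄 西红柿 3 0.0
```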


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/335; G06F16/903; G06F40/169; G06F40/242; G06F40/247; G06K9/62
CPC: G06F16/335; G06F16/90344; G06F40/242; G06F40/247; G06F40/169; G06F18/22; G06F18/25
Inventor: Not disclosed
Owner: CHENGDU SHULIANYUNSUAN TECH CORP