Method, system and equipment for automatically constructing thesaurus and computer storage medium

A technology for automatically constructing a thesaurus, applied in computer components, computing, natural language data processing, etc., with the effect of achieving high similarity and even distribution between words

Status: Pending; Publication date: 2021-08-03
CAPITAL NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

However, traditional thesauri face difficulties both in compiling and maintaining their vocabulary and in being applied to network information retrieval environments. It is therefore of great significance to study how to construct natural language thesauri automatically.

Examples

Embodiment 1

[0059] Referring to Figure 1, Figure 1 is a schematic flowchart of a method for automatically constructing a thesaurus disclosed in an embodiment of the present application. As shown in Figure 1, the first aspect of the present application provides a method for automatically constructing a thesaurus, the method comprising:

[0060] S1. Vocabulary collection: input the raw data files required for constructing the thesaurus;

[0061] S2. Extract each word from the raw data files to form a set of descriptors;

[0062] S3. Calculate the co-occurrence weight between words from the frequency of each word in the file, the co-occurrence frequency between words, and an adjustment factor, so as to obtain the degree of association between words;

[0063] S4. Construct a feature vector for each word from its degrees of association with other words, where the other words are the K most relevant words;

[0064] S5. For the h...
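Steps S3 and S4 can be illustrated with a minimal Python sketch. The visible text does not give the exact weighting formula, so the sketch assumes a hypothetical one: the weight of a word pair is its co-occurrence count c_ij divided by (f_i · f_j)^α, where f_i is the number of text units (sentences or documents) containing word i and the exponent α stands in for the adjustment factor. The function names, the choice of text unit, the default α = 0.5 and K = 2 are all assumptions for illustration, not the application's own definitions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(units, alpha=0.5):
    """Degree of association between every co-occurring word pair (S3 sketch).

    units: list of token lists (e.g. sentences or documents).
    alpha: hypothetical adjustment factor; larger values penalize frequent words more.
    Returns {(a, b): weight} with a < b alphabetically.
    """
    freq = Counter()  # f_i: number of units containing word i
    co = Counter()    # c_ij: number of units containing both i and j
    for unit in units:
        vocab = sorted(set(unit))
        freq.update(vocab)
        co.update(combinations(vocab, 2))
    return {(a, b): c / (freq[a] * freq[b]) ** alpha for (a, b), c in co.items()}


def top_k_vectors(weights, k=2):
    """Feature vector of each word (S4 sketch): weights of its K most associated words."""
    neighbours = {}
    for (a, b), w in weights.items():
        neighbours.setdefault(a, {})[b] = w
        neighbours.setdefault(b, {})[a] = w
    # Sort by descending weight, breaking ties alphabetically, and keep the top K.
    return {
        word: dict(sorted(nbrs.items(), key=lambda kv: (-kv[1], kv[0]))[:k])
        for word, nbrs in neighbours.items()
    }


units = [["thesaurus", "word", "retrieval"], ["thesaurus", "word"], ["retrieval", "search"]]
vectors = top_k_vectors(cooccurrence_weights(units), k=2)
print(list(vectors["retrieval"]))  # ['search', 'thesaurus']
```

With α = 0.5 the weight equals the cosine similarity of the two words' binary occurrence vectors; pushing α toward 1 penalizes pairs of frequent words more heavily.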

Embodiment 2

[0089] Referring to Figure 3, Figure 3 is a schematic structural diagram of a system for automatically constructing a thesaurus disclosed in an embodiment of the present application. As shown in Figure 3, the second aspect of the present application provides a system for automatically constructing a thesaurus, the system comprising an original file acquisition module, a word division module, a descriptor extraction module and a thesaurus building module, wherein:

[0090] The original file acquisition module is used to obtain the original file data;

[0091] The word division module is used to obtain each word in the original file;

[0092] The descriptor extraction module implements the calculations of the method described above, thereby determining the correlation between words and their hypernymy (broader/narrower) relationships;

[0093] The thesaurus building module constructs the thesaurus according to the correlation between words and the hypernymy relationships.
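The four modules of Embodiment 2 can be wired together as plain functions, as in the hypothetical sketch below. Here the word division module uses a regex tokenizer in place of a real word segmenter, the descriptor extraction module is reduced to a co-occurrence count threshold (a stand-in for the full calculation of Embodiment 1), and hypernymy links are supplied by the caller. The `Entry` record follows the common thesaurus convention of broader (BT), narrower (NT) and related (RT) terms; every name here is an assumption, not an identifier from the application.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations
from pathlib import Path
from typing import Dict, List, Set, Tuple


@dataclass
class Entry:
    """One descriptor of the thesaurus (hypothetical record layout)."""
    term: str
    broader: Set[str] = field(default_factory=set)   # BT: hypernyms
    narrower: Set[str] = field(default_factory=set)  # NT: hyponyms
    related: Set[str] = field(default_factory=set)   # RT: associated terms


def acquire_original_file(path: str) -> str:
    """Original file acquisition module: read the raw text as UTF-8."""
    return Path(path).read_text(encoding="utf-8")


def divide_words(text: str) -> List[List[str]]:
    """Word division module: split into sentences, then into lowercase tokens.
    The regex tokenizer is a stand-in for a real word segmenter."""
    sentences = [s for s in re.split(r"[.!?]", text) if s.strip()]
    return [re.findall(r"[a-z]+", s.lower()) for s in sentences]


def extract_descriptors(sentences: List[List[str]], min_count: int = 2) -> List[Tuple[str, str]]:
    """Stand-in for the descriptor extraction module: word pairs that
    co-occur in at least min_count sentences count as correlated."""
    co = Counter()
    for tokens in sentences:
        co.update(combinations(sorted(set(tokens)), 2))
    return sorted(pair for pair, c in co.items() if c >= min_count)


def build_thesaurus(related_pairs, hyponymy_links) -> Dict[str, Entry]:
    """Thesaurus building module: merge correlations and (hypernym, hyponym) links."""
    entries: Dict[str, Entry] = {}

    def get(term: str) -> Entry:
        return entries.setdefault(term, Entry(term))

    for a, b in related_pairs:
        get(a).related.add(b)
        get(b).related.add(a)
    for hyper, hypo in hyponymy_links:
        get(hypo).broader.add(hyper)
        get(hyper).narrower.add(hypo)
    return entries


text = "Search engines use keyword matching. Search engines rank pages."
pairs = extract_descriptors(divide_words(text))  # [('engines', 'search')]
thesaurus = build_thesaurus(pairs, [("retrieval", "search")])
print(sorted(thesaurus["search"].broader))  # ['retrieval']
```

Keeping each module as a separate function mirrors the module split of the system embodiment, so any one stage (for example, a proper segmenter for Chinese text) can be swapped without touching the others.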

Embodiment 3

[0095] Referring to Figure 4, Figure 4 is a schematic structural diagram of a device for automatically constructing a thesaurus disclosed in an embodiment of the present application. As shown in Figure 4, the third aspect of the present application provides a device for automatically constructing a thesaurus, the device comprising:

[0096] a memory storing executable program code;

[0097] a processor coupled to the memory;

[0098] The processor invokes the executable program code stored in the memory to execute the method for automatically constructing a thesaurus in Embodiment 1.

Abstract

The method for automatically constructing a thesaurus combines co-occurrence statistics with distributional similarity calculation to recognize hierarchical relations between words and compile a natural language thesaurus. It comprises the following steps: the co-occurrence weight between words is calculated from the frequency of each word in the file, the co-occurrence frequency between words and an adjustment factor; feature vectors are constructed, semantic similarity is calculated, and the words are accordingly grouped into clusters; the words in each cluster are assigned to levels according to a level coefficient, and the hyponymy relations between words are recognized; finally, the thesaurus is constructed according to the inter-word correlation and the hyponymy relations of the descriptor set.
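The later stages summarized in the abstract (semantic similarity, clustering, level assignment and hyponymy recognition) can be sketched in the same hedged way. This sketch assumes sparse feature vectors like those of step S4 (each word mapped to the weights of its most associated words), cosine similarity as the semantic similarity measure, and single-linkage clustering at a fixed threshold. The visible text does not define the level coefficient, so the sketch uses a hypothetical stand-in: a word's frequency divided by the highest frequency in its cluster, with more frequent words treated as broader terms. The thresholds, cut points and function names are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, threshold=0.5):
    """Single-linkage clustering: words linked by similarity >= threshold share a cluster."""
    parent = {w: w for w in vectors}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    words = sorted(vectors)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())


def hyponymy(words, freq, vectors, cuts=(0.6, 0.3)):
    """Assign levels with a hypothetical level coefficient, then link each word
    to the most similar word on the nearest higher level (its hypernym)."""
    top = max(freq[w] for w in words)
    level = {}
    for w in words:
        coef = freq[w] / top  # hypothetical level coefficient in (0, 1]
        level[w] = 1 + sum(coef < c for c in cuts)  # level 1 = broadest
    links = []
    for w in words:
        higher = [level[p] for p in words if level[p] < level[w]]
        if not higher:
            continue  # top-level word: no hypernym inside this cluster
        parents = [p for p in words if level[p] == max(higher)]
        best = max(parents, key=lambda p: (cosine(vectors[w], vectors[p]), p))
        links.append((best, w))  # (hypernym, hyponym)
    return level, sorted(links)


vectors = {
    "car": {"road": 1.0, "engine": 1.0},
    "truck": {"road": 1.0, "engine": 0.8},
    "vehicle": {"road": 1.0, "engine": 0.9},
    "apple": {"fruit": 1.0},
}
print(cluster(vectors))  # [['apple'], ['car', 'truck', 'vehicle']]
levels, links = hyponymy(["car", "truck", "vehicle"], {"vehicle": 10, "car": 5, "truck": 2}, vectors)
print(links)  # [('car', 'truck'), ('vehicle', 'car')]
```

Linking each word to a single most similar word one level up keeps each cluster's hierarchy a tree (or a forest when several words share the top level); allowing multiple hypernyms per word would be an equally plausible policy.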

Description

Technical field

[0001] The present application relates to the field of artificial intelligence, and in particular to a method, system, device and computer storage medium for automatically constructing a thesaurus.

Background

[0002] The rapid development of the network has brought about explosive growth in information resources. While this provides convenience, it also makes people realize that they are "submerged" in an ocean of information, and how to obtain the information they need accurately and efficiently from massive amounts of information has become an urgent problem. Most current network information retrieval tools (such as search engines) use full-text retrieval based on literal keyword matching. This method is simple, feasible and convenient, and has a high recall rate; however, only a small part of the retrieved results meets the searcher's requirements, so precision is low. At the same time, there are also missed and f...

Application Information

IPC (8): G06F16/33; G06F16/35; G06F40/216; G06F40/284; G06K9/62
CPC: G06F16/3346; G06F16/35; G06F16/3344; G06F40/216; G06F40/284; G06F18/23
Inventors: 张凯 (Zhang Kai), 周建设 (Zhou Jianshe), 刘杰 (Liu Jie), 王伟丽 (Wang Weili)
Owner: CAPITAL NORMAL UNIVERSITY