Method, system and equipment for automatically constructing thesaurus and computer storage medium
A technology for automatically constructing a thesaurus, applied in computer parts, computing, natural language data processing, etc., achieving high similarity and an even distribution among words
Examples
Embodiment 1
[0059] See Figure 1, a schematic flowchart of a method for automatically constructing a thesaurus disclosed in an embodiment of the present application. As shown in Figure 1, the first aspect of the present application provides a method for automatically constructing a thesaurus, the method comprising:
[0060] S1. Vocabulary collection: input the raw data files required for constructing the thesaurus;
[0061] S2. Extract each word from the raw data files to form a set of descriptors;
[0062] S3. Calculate the co-occurrence weight between each pair of words from each word's own frequency in the file, the co-occurrence frequency of the pair, and an adjustment factor, so as to obtain the degree of association between the words;
[0063] S4. Construct, for each word, a feature vector over the other words according to the degree of association, wherein the other words are the K words most relevant to that word;
[0064] S5, for the h...
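Steps S3 and S4 can be illustrated with a minimal Python sketch. The text available here does not give the exact weighting formula or define the adjustment factor, so the sketch makes its own assumptions: each word's frequency is counted as the number of records containing it, the co-occurrence weight is the pair's co-occurrence count divided by the geometric mean of the two word frequencies, and the adjustment factor is a plain scalar multiplier. The names `cooccurrence_weights`, `top_k_vectors`, and `adjust` are hypothetical and do not come from the application.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_weights(docs, adjust=1.0):
    """Return {(a, b): weight} for every word pair that co-occurs in a record.

    docs   -- list of word lists, one list per record of the raw data (S2 output)
    adjust -- ASSUMED scalar adjustment factor; the source does not define it
    """
    freq = Counter()       # number of records containing each word
    pair_freq = Counter()  # number of records containing both words of a pair
    for words in docs:
        unique = sorted(set(words))
        freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    # ASSUMED formula: co-occurrence count normalised by the geometric mean
    # of the two word frequencies, scaled by the adjustment factor.
    return {
        (a, b): adjust * co / math.sqrt(freq[a] * freq[b])
        for (a, b), co in pair_freq.items()
    }


def top_k_vectors(weights, k):
    """Return {word: {neighbour: weight}} keeping the K most associated words."""
    neighbours = {}
    for (a, b), w in weights.items():
        neighbours.setdefault(a, {})[b] = w
        neighbours.setdefault(b, {})[a] = w
    # Sort by descending weight; ties are broken alphabetically.
    return {
        word: dict(sorted(nb.items(), key=lambda kv: (-kv[1], kv[0]))[:k])
        for word, nb in neighbours.items()
    }


docs = [["cat", "dog"], ["cat", "dog", "fish"], ["cat", "bird"]]
weights = cooccurrence_weights(docs)
vectors = top_k_vectors(weights, k=2)
```

Each feature vector is stored as a sparse dictionary, so a word keeps only its K strongest associations rather than one entry per word in the descriptor set.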
Embodiment 2
[0089] See Figure 3, a schematic structural diagram of a system for automatically constructing a thesaurus disclosed in an embodiment of the present application. As shown in Figure 3, the second aspect of the present application provides a system for automatically constructing a thesaurus, wherein the system includes an original file acquisition module, a word division module, a descriptor extraction module, and a thesaurus construction module, wherein:
[0090] The original file acquisition module is used to obtain the original file data;
[0091] The word division module is used to obtain each word in the original file;
[0092] The descriptor extraction module implements the calculation of the method described above, thereby determining the correlation between words and their hypernym-hyponym relationships;
[0093] The thesaurus construction module constructs the thesaurus according to the correlation between words and the hypernym-hyponym relationships.
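The way the four modules hand data to one another can be sketched as a small Python pipeline. This is only an illustration of the module boundaries described above: the class name `ThesaurusSystem`, the field names, and the data shapes (word pairs mapped to a correlation score, and a list of broader/narrower term pairs) are assumptions, and each module is passed in as a callable because the text here does not specify its internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# ASSUMED data shapes, chosen only for illustration.
Relatedness = Dict[Tuple[str, str], float]  # (word, word) -> correlation
Hypernyms = List[Tuple[str, str]]           # (broader term, narrower term)


@dataclass
class ThesaurusSystem:
    # Original file acquisition module: returns the raw records.
    acquire: Callable[[], List[str]]
    # Word division module: splits each record into words.
    divide: Callable[[List[str]], List[List[str]]]
    # Descriptor extraction module: correlation and hypernym-hyponym relations.
    extract: Callable[[List[List[str]]], Tuple[Relatedness, Hypernyms]]
    # Thesaurus construction module: assembles the final thesaurus.
    build: Callable[[Relatedness, Hypernyms], dict]

    def run(self) -> dict:
        """Run the four modules in the order given in the description."""
        raw = self.acquire()
        words = self.divide(raw)
        related, hypernyms = self.extract(words)
        return self.build(related, hypernyms)
```

Keeping each module behind a plain callable means the extraction logic of Embodiment 1 can be swapped in without changing how the system is wired together.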
Embodiment 3
[0095] See Figure 4, a schematic structural diagram of a device for automatically constructing a thesaurus disclosed in an embodiment of the present application. As shown in Figure 4, the third aspect of the present application provides a device for automatically constructing a thesaurus, wherein the device includes:
[0096] a memory storing executable program code;
[0097] a processor coupled to the memory;
[0098] The processor invokes the executable program code stored in the memory to execute the method for automatically constructing a thesaurus in Embodiment 1.