Mining method of field specific word

A method for mining domain-specific words, applied in the field of natural language processing, that addresses problems such as difficult migration to different domains, high labor requirements, and large numbers of missed vocabulary items.

Inactive Publication Date: 2016-06-08
贺惠新
7 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

Current computers have strong memory but insufficient reasoning ability. For recognizing domain-specific words, rule-based methods analyze grammatical structure to construct word-formation rules and then apply those rules to find words in a corpus. This approach demands considerable linguistic and domain expertise from the participants, omissions in human thinking and design inevitably cause many words to be missed, and the resulting system is not easy to migrate to other domains. Statistics-based methods estimate the probability of word formation from natural language text. They need a sufficiently large volume of in-domain training data, which requires a great deal of manual labeling. Current practice is therefore mostly to train a single model on a familiar corpus regardless of domain and then apply it across different domains, which reduces accuracy.
Domain-specific vocabulary is mined and compiled into a specialized dictionary to serve subsequent application tasks. However, because general methods are not targeted, adding domain-specific words from different fields can cause subsequent tasks to fail.



Examples


Embodiment Construction

[0062] This embodiment is described below with reference to Figure 1 and Figure 2.

[0063] The method of the present invention consists of two stages, model training and model application, and comprises the following steps:

[0064] Training phase

[0065] Training step 1: Obtain the resources required for the model training phase: a training corpus S = {S(i)} consisting of NS sentences, where each sentence is denoted S(i), 1 ≤ i ≤ NS, and NS ≥ 10000 is required; and a domain topic dictionary Dz.
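The resource check in training step 1 can be sketched as follows. This is a minimal illustration, assuming the corpus is held in memory as a list of sentence strings and the dictionary Dz as a set of words; the function name `check_training_resources` and this representation are assumptions made for the example and do not come from the patent text.

```python
MIN_SENTENCES = 10_000  # NS >= 10000, as required by training step 1


def check_training_resources(sentences, domain_dict, min_sentences=MIN_SENTENCES):
    """Return (S, Dz) after validating the size constraint on the corpus.

    sentences   -- iterable of sentence strings, the corpus S = {S(i)}
    domain_dict -- iterable of domain topic words, the dictionary Dz
    Both argument names are hypothetical.
    """
    # Drop empty lines so that NS counts real sentences only.
    corpus = [s.strip() for s in sentences if s.strip()]
    ns = len(corpus)
    if ns < min_sentences:
        raise ValueError(
            f"corpus has NS={ns} sentences; at least {min_sentences} are required"
        )
    # Store Dz as a set for fast membership tests and to remove duplicates.
    dz = {w.strip() for w in domain_dict if w.strip()}
    return corpus, dz
```

Keeping the threshold as a parameter makes it possible to run the same check on a small sample during development while leaving the default at the value the step requires.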

[0066] Training step 2: Compute a feature representation for every character of the training corpus S, obtaining the extracted feature representation of each character. Let s(i, j_i) denote the j_i-th character of sentence S(i), where 1 ≤ j_i ≤ the total number of characters in sentence S(i); then s(i, j_i) and its corresponding feature representation are:

[0067] ;
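The indexing used in training step 2 can be illustrated with a short sketch. The formula in [0067] is not reproduced here, so the two features below (the character itself and whether it lies inside an occurrence of a Dz entry) are hypothetical stand-ins, chosen only to show the shape of a per-character representation keyed by (i, j_i); they should not be read as the features the patent defines.

```python
def char_in_dict_word(sentence, pos, domain_dict):
    """True if the character at 0-based index pos lies inside an occurrence of any Dz word."""
    for word in domain_dict:
        start = sentence.find(word)
        while start != -1:
            if start <= pos < start + len(word):
                return True
            start = sentence.find(word, start + 1)
    return False


def represent_characters(corpus, domain_dict):
    """Map (i, j) -> feature dict for every character s(i, j), with i and j 1-based."""
    features = {}
    for i, sentence in enumerate(corpus, start=1):
        for j, ch in enumerate(sentence, start=1):
            features[(i, j)] = {
                "char": ch,  # hypothetical feature: the character itself
                # hypothetical feature: overlap with a domain topic dictionary entry
                "in_dz": char_in_dict_word(sentence, j - 1, domain_dict),
            }
    return features
```

For example, `represent_characters(["abcde"], {"bc"})` produces five entries, one per character, with `in_dz` set only for the characters at positions 2 and 3.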

[0068] The detailed steps of extracting each feature are as f...


Abstract

The invention relates to a method for mining and constructing domain-specific words, and belongs to the field of computer applications of natural language processing. Building on the domain relevance of a corpus, the method combines a domain topic dictionary with a statistical model to mine domain-specific words. The algorithm generates and combines features efficiently, which reduces computational complexity and yields a high-accuracy mining model. When the method is applied, relevant words can be added to a new dictionary in a targeted way, which improves the applicability of the algorithm across different scenarios. The method enables a computer to automatically extract domain-related words from a broad natural language corpus; the extracted words can be added to a domain-specific dictionary, which the computer can then use for various kinds of subsequent analysis.

Description

Technical field

[0001] The invention relates to a method for mining and constructing domain-specific vocabulary, and belongs to the field of computer applications of natural language processing.

Background art

[0002] Natural language is an information-carrying system of communication symbols that human beings have formed over a long period of life. The meaning of this symbolic language is influenced by people's living environment, division of labor across fields, and work experience. Because words are the basic elements for expressing information in language, people with shared experience combine words into specialized vocabulary in order to express an entity or behavior in a specific field.

[0003] With the continuing differentiation of the social division of labor, the number of fields that people work in keeps increasing, the amount of specialized vocabulary produced in each field has become huge, and the meanings of words differ from field to field. The cognit...


Application Information

IPC(8): G06F17/27; G06K9/62
CPC: G06F40/284; G06F18/214
Inventor: 贺惠新
Owner: 贺惠新