New word discovery methods for specific domains

A new word discovery method for specific fields, applied to new word discovery and text mining, that addresses problems such as high algorithmic complexity and incomplete recognition of new words.

Inactive Publication Date: 2018-06-29
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Although a few new word discovery algorithms exist in the prior art, they generally suffer from high algorithmic complexity and struggle to identify new words quickly and accurately. Their recognition is also incomplete: it is difficult to comprehensively identify all the new words contained in the analyzed documents.


Examples

Embodiment Construction

[0029] The present invention is described in detail below with reference to the accompanying drawings:

[0030] With the continuous development of Chinese word segmentation, two concepts have emerged: new words and unregistered words. The two are distinct. Unregistered words are words that have not been included in the dictionary, while new words are words with new forms, meanings, or usages. New words also do not appear in the dictionary and are therefore unregistered words as well, but the term carries a broader meaning. Statistically, new words can be divided into the following five categories:

[0031] (1) Abbreviations (acronyms): words formed from selected words or characters of a longer term and used in place of the whole term. They are divided into Chinese abbreviations and English abbreviations. For example, "China National Petroleum Corporation" is abbreviated as "PetroChina", and "General Manager" is abbreviated as "GM";

[0...

Abstract

The invention provides a new word discovery method for a specific field, comprising the following steps: S1, preprocessing a document; S2, constructing a candidate new word set, in which each candidate new word is expressed, in the new word representation, as a word, the distance vector value from that word to a center word, and the center word; S3, mining the candidate new words. The method adopts a more flexible representation of new words, introduces association rule mining from the data mining field into the new word discovery process, and proposes using the distance vector between each word and a designated keyword as an important feature for association rule mining, so that all new words contained in the document can be identified rapidly, accurately, and comprehensively.
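
As a rough illustration of steps S2 and S3, the sketch below pairs each word near a designated center word with its signed token distance to that center word, then keeps the pairs that recur in enough documents. It is a minimal sketch under stated assumptions, not the patented procedure: the excerpt does not give the exact candidate representation or mining algorithm, so the function names, the `window` and `min_support` parameters, the pre-tokenized input, and the use of a plain support-count filter in place of full association rule mining are all hypothetical.

```python
from collections import Counter


def build_candidates(tokens, center_word, window=2):
    """Return (word, signed distance to the center word) pairs.

    Hypothetical representation: the distance is the token offset
    relative to each occurrence of center_word, within `window` tokens.
    """
    candidates = []
    for i, tok in enumerate(tokens):
        if tok != center_word:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                candidates.append((tokens[j], j - i))
    return candidates


def mine_candidates(documents, center_word, window=2, min_support=2):
    """Keep (word, distance) pairs found in at least min_support documents.

    This support filter is a simplified stand-in for association rule
    mining; each pair is counted at most once per document.
    """
    support = Counter()
    for tokens in documents:
        support.update(set(build_candidates(tokens, center_word, window)))
    return {item: n for item, n in support.items() if n >= min_support}


# Toy, already-tokenized documents (step S1 is assumed to have run).
docs = [
    ["deep", "learning", "model"],
    ["deep", "learning", "framework"],
    ["machine", "learning", "model"],
]
frequent = mine_candidates(docs, center_word="learning")
```

Here `frequent` retains the pairs ("deep", -1) and ("model", 1), each seen in two documents, which suggests "deep learning" and "learning model" as candidate multi-word terms. A fuller treatment would also compute rule confidence, which this sketch omits.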

Description

Technical field

[0001] The invention belongs to the technical field of new word discovery and text mining, and in particular relates to a new word discovery method for a specific field.

Background art

[0002] With the rapid development of information technology and Internet technology, the network is flooded with all kinds of information, which is growing exponentially. Internet information in various professional fields has likewise grown explosively.

[0003] As network information grows, new words constantly emerge, so discovering Chinese new words, especially in specific fields, is of great significance. On the one hand, the large number and rapid emergence of new words seriously degrade the quality of Chinese word segmentation: more unrecognizable "words" appear in the segmentation results, which greatly reduces their accuracy. Recent research also shows that 60% ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27
CPC: G06F40/216, G06F40/284
Inventor: 王卿, 吴琼, 程工, 杜漫, 庞琳, 李雄, 刘春阳, 张旭
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT