Field-specific vocabulary discovery and classifier training method and device

A technology for discovering domain-specific vocabulary, applied in text database clustering/classification, instrumentation, unstructured text data retrieval, etc. It addresses problems such as the inability to capture semantic information, the complexity of CRF and HMM models, and the difficulty of threshold selection.

Active Publication Date: 2022-04-08
BEIJING UNIV OF POSTS & TELECOMM +2
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] The unsupervised word-formation scheme based entirely on pointwise mutual information (PMI) is simple and efficient, but it has two serious problems: the threshold is difficult to select, and basic statistics cannot capture semantic information.
[0005] The supervised new-word discovery scheme relies mainly on two models, CRF and HMM. Its effectiveness depends heavily on the training data, and the CRF and HMM models are relatively complicated.
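
The threshold problem described in [0004] can be illustrated with a minimal sketch. It is not the patent's method: the toy corpus, the restriction to two-character pairs, the `min_count` filter, and the function names `pmi_scores` and `candidates` are all assumptions made for illustration. It scores adjacent character pairs with PMI(x, y) = log(p(xy) / (p(x) p(y))) and shows how the candidate set changes with the cutoff.

```python
import math
from collections import Counter


def pmi_scores(corpus, min_count=2):
    """Score each adjacent character pair by pointwise mutual information."""
    chars = Counter()
    pairs = Counter()
    for sentence in corpus:
        chars.update(sentence)
        pairs.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())
    scores = {}
    for pair, count in pairs.items():
        if count < min_count:  # skip pairs too rare to estimate reliably
            continue
        p_xy = count / n_pairs
        p_x = chars[pair[0]] / n_chars
        p_y = chars[pair[1]] / n_chars
        scores[pair] = math.log(p_xy / (p_x * p_y))
    return scores


def candidates(scores, threshold):
    """Keep the pairs whose PMI reaches the chosen threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


# Hypothetical toy corpus, not taken from the patent.
corpus = ["变压器故障", "变压器检修", "检修故障", "电网故障", "电网检修"]
scores = pmi_scores(corpus)
loose = candidates(scores, 2.0)   # keeps every pair that passed min_count
strict = candidates(scores, 2.5)  # drops the frequent words 故障 and 检修
```

In this toy corpus a small change in the cutoff removes genuine words such as 故障, because frequent characters lower the PMI of the pairs they appear in. The score also says nothing about what a pair means, which is the second weakness named above.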

Examples

Embodiment Construction

[0047] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0048] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0049] Those skilled in the art will understand that unless otherwise stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that when an element is referred to as being "connected" or "coupled" to another el...

Abstract

The invention discloses a method and device for discovering domain-specific vocabulary and for training a classifier. The method includes: segmenting a text into a number of character string fragments and selecting the fragments that form words; then using a pre-trained domain-specific vocabulary classifier to separate the word-forming fragments into domain-specific vocabulary and general-domain vocabulary. The classifier is trained with a logistic regression model on the word-formation feature vectors and domain-specific feature vectors of the words in a general vocabulary and a domain-specific vocabulary. Applying the invention yields a complete, simple, and efficient scheme for discovering domain-specific vocabulary and training a classifier.
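
As a rough sketch of the classification step described in the abstract, the following trains a logistic regression model by batch gradient descent on two-dimensional feature vectors. The abstract does not say which features are used, so the two features here (a word-formation score and a domain-specificity score), the toy values, the hyperparameters, and the function names are all hypothetical and chosen only to make the example runnable.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the domain-specific class."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical features: [word_formation_score, domain_specificity_score].
# Label 1 = domain-specific vocabulary, 0 = general-domain vocabulary.
features = [
    [0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.9, 0.7],  # domain-specific words
    [0.8, 0.1], [0.9, 0.2], [0.7, 0.2], [0.6, 0.1],  # general words
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic_regression(features, labels)
p_domain = predict_proba(weights, bias, [0.8, 0.95])
p_general = predict_proba(weights, bias, [0.7, 0.05])
```

Unlike a fixed PMI cutoff, the decision boundary here is learned from labeled vocabulary, and the model outputs a probability that can be compared against 0.5 to assign each word-forming fragment to the domain-specific or general class.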

Description

Technical field

[0001] The invention relates to the technical field of new word discovery, in particular to a method and device for discovering domain-specific vocabulary and training a classifier.

Background technique

[0002] With the development of information technology, electronic documents in various fields have become increasingly abundant. In various professional fields, the number of documents in document databases has grown exponentially, and processing this document information has become increasingly difficult. Chinese has a special organizational structure, and processing it places higher demands on word segmentation technology, so Chinese documents are more difficult to process than English documents. Discovering new professional vocabulary in a specific field, given the grammatical features of Chinese, plays an extremely important role in information retrieval. The specific field may be the financial field, the ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/289
CPC: G06F40/289
Inventor: 熊永平, 邓春宇, 伍贵宾, 季知祥, 史梦洁, 陈睿, 陈立斌, 王冠群, 王頔, 朱承治, 孙黎滢, 谷纪亭
Owner: BEIJING UNIV OF POSTS & TELECOMM