Subject area-oriented method for recognizing new specialized vocabulary

A method for recognizing specialized vocabulary, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems that new words follow no fixed formation rules, that existing methods are poor at discovering low-frequency and long new words, and that the verification of results depends on manual judgment.

Status: Inactive
Publication Date: 2011-01-19
Owner: HUAZHONG NORMAL UNIV
Cites: 3; Cited by: 34

AI Technical Summary

Problems solved by technology

[0005] At present, because new words emerge rapidly, take flexible forms, and follow no fixed formation rules, there is no authoritative standard for judging whether a word is a new word, so the verification of results relies largely on manual, experience-based judgment.
Among the commonly used methods, statistical methods are affected by the data sparsity problem, which hinders the discovery of low-frequency new words and long new words.
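
The data sparsity problem mentioned above can be illustrated with a small sketch. It is not part of the patent; the function name `singleton_ratio` and the toy sentence are assumptions chosen for illustration. The sketch measures what fraction of distinct word n-grams occur only once, which tends to grow as the strings get longer and explains why a pure frequency threshold misses long or rare candidates.

```python
from collections import Counter

def singleton_ratio(tokens, n):
    """Fraction of distinct n-grams that occur exactly once in the token list."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not grams:
        return 0.0
    return sum(1 for c in grams.values() if c == 1) / len(grams)

# Toy input (hypothetical): longer strings are more likely to be seen only once.
tokens = "new word discovery finds new word strings in new text".split()
for n in (1, 2, 3):
    print(n, round(singleton_ratio(tokens, n), 3))
```

In this toy input every three-word string is unique, so any minimum-frequency cutoff above 1 would discard all of them.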

Examples

Embodiment Construction

[0038] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0039] Figure 1 shows the basic principle of the present invention. After the initial document has undergone text preprocessing, candidate new word string statistics, junk word string filtering, and result sorting, the new words found in the text are output. The process uses a word segmentation system and one or more rule bases. The domain relevance calculation uses an already constructed domain vocabulary, and when the ranking value of new words is calculated to sort the results, the discovered new words are also used to enrich the general dictionary of the word segmentation system. The core algorithm of the present invention is used in the statistical part of candidate new word strings, and at the same time, factors such as part of speech, word-forming ability and word-forming mode of words are ful...
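
The four stages named above (text preprocessing, candidate string statistics, junk string filtering, result sorting) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented algorithm: the source text is truncated before the core algorithm is described, so plain n-gram counting stands in for it, English word splitting stands in for the word segmentation system, a stop-word check stands in for the rule bases, and the scoring formula in `rank` is hypothetical. All function names and parameters are invented for this example.

```python
import re
from collections import Counter

def preprocess(text):
    """Split text into segments at punctuation, then into lowercase word tokens."""
    segments = re.split(r"[.!?;,\n]+", text.lower())
    return [re.findall(r"[a-z]+", seg) for seg in segments if seg.strip()]

def count_candidates(segments, known_words, max_len=3):
    """Count word n-grams (n = 1..max_len), skipping single words already in the dictionary."""
    counts = Counter()
    for tokens in segments:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                if n == 1 and gram[0] in known_words:
                    continue
                counts[gram] += 1
    return counts

def filter_junk(counts, stop_words, min_freq=2):
    """Drop rare strings and strings that begin or end with a stop word (assumed rule)."""
    return {g: c for g, c in counts.items()
            if c >= min_freq and g[0] not in stop_words and g[-1] not in stop_words}

def rank(candidates, domain_vocab):
    """Sort by frequency times (1 + number of tokens found in the domain vocabulary)."""
    def score(item):
        gram, freq = item
        return freq * (1 + sum(tok in domain_vocab for tok in gram))
    ordered = sorted(candidates.items(), key=lambda it: (-score(it), it[0]))
    return [" ".join(gram) for gram, _ in ordered]

# Hypothetical inputs for illustration only.
text = ("The support vector machine is popular. "
        "A support vector machine separates classes. "
        "The kernel trick helps the support vector machine.")
known_words = set(re.findall(r"[a-z]+", text.lower()))  # every single word is "known"
stop_words = {"the", "a", "is"}
domain_vocab = {"vector", "machine", "kernel"}

candidates = filter_junk(count_candidates(preprocess(text), known_words), stop_words)
print(rank(candidates, domain_vocab))
```

A fuller version would feed the top-ranked strings back into the segmenter's general dictionary, as the paragraph describes, and would take part of speech and word-forming ability into account, which this sketch omits.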

Abstract

The invention belongs to the fields of computer application and natural language processing and provides a subject area-oriented method for recognizing new specialized vocabulary. The principle of the method is that an initial document undergoes text preprocessing, new word string counting, junk word string filtering, result sorting and other steps, and the new words found in the document are output. The method finds specialized terms of the subject area well and sorts the results, so it overcomes the shortcomings of conventional algorithms and makes it easier to grasp the development trends and core value of the subject.

Description

Technical field

[0001] The invention belongs to the fields of computer application and natural language processing, and in particular relates to a subject area-oriented method for recognizing new specialized vocabulary.

Background technique

[0002] Unlike general vocabulary, the specialized vocabulary of a subject area has distinct domain characteristics. It usually consists of basic roots, basic specialized vocabulary, etc., and is developed on the basis of the general vocabulary. New specialized vocabulary refers to unregistered words and new words in the specialized field. Unregistered words are defined as words that do not appear in the dictionary, usually including abbreviations, proper nouns, derivatives, compound words, numerical compound words, etc. New words are also words that do not appear in the dictionary. They are unregistered words and contain two layers of meaning: words that have new forms, new meanings, or new usa...
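
The definition of unregistered words as words that do not appear in the dictionary can be sketched as a simple lookup. This is a minimal illustration, assuming the text has already been segmented into tokens and that the dictionary is a set of lowercase entries; the function name and sample data are hypothetical.

```python
def unregistered_words(tokens, dictionary):
    """Return tokens absent from the dictionary, in first-seen order, without duplicates."""
    seen = set()
    result = []
    for tok in tokens:
        key = tok.lower()
        if key not in dictionary and key not in seen:
            seen.add(key)
            result.append(tok)
    return result

# Hypothetical dictionary and token list.
dictionary = {"the", "classifier", "uses", "a"}
tokens = ["The", "SVM", "classifier", "uses", "a", "hyperplane", "SVM"]
print(unregistered_words(tokens, dictionary))
```

A lookup like this only identifies candidates; deciding which of them are genuine new words of the subject area still requires the filtering and ranking steps.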

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 刘清堂, 黄涛, 刘瑶瑶, 黄焕, 吴林静
Owner: HUAZHONG NORMAL UNIV