Subject term extraction method and system based on sequence labeling model

A sequence labeling technology for subject terms, applied in instrumentation, computing, electrical digital data processing, etc. It addresses three problems of existing methods: they do not adapt to various languages, they do not consider term denoising, and they do not consider the context in which terms occur.

Active Publication Date: 2015-07-22
Mingbo Education Technology Co., Ltd. (明博教育科技股份有限公司) and 1 other
Cites 4; cited by 22

AI Technical Summary

Problems solved by technology

[0005] The extraction methods mentioned above save considerable labor, but they do not consider the context in which a term occurs, and they need a large amount of data to train the extractor. Some emerging fields lack a relevant corpus, so the extraction results in those fields are relatively poor. In addition, these methods were all developed for English and are not suited to other languages. Finally, no further term denoising is performed after new subject terms are extracted, so some irrelevant words remain among the extracted terms.

Examples

Embodiment

[0191] In this embodiment, junior high school English grammar is taken as the specific subject area. The goal is to extract the subject terms from a corpus of this subject area and to improve the existing knowledge system structure of junior high school English grammar.

[0192] Figure 3 shows an existing knowledge system structure of junior high school English grammar in this embodiment. As the figure shows, this structure is a knowledge structure tree that reflects the knowledge points (the subject terms in this embodiment) and the hierarchical relationships between them. For example, the subject clause, predicative clause, and appositive clause sit at the same level of the structure; the noun clause is their first-level parent node, and syntactic knowledge is their second-level parent node.
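The hierarchy described for Figure 3 can be modeled as a plain tree of knowledge points. The following is a minimal sketch under stated assumptions: the `KnowledgeNode` class and its `add_child` and `path_to` methods are hypothetical, and only the four nodes named in [0192] are included, not the whole figure. It shows how a term's parent levels can be read off such a tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnowledgeNode:
    """One knowledge point (a subject term) in a hypothetical knowledge structure tree."""
    term: str
    children: List["KnowledgeNode"] = field(default_factory=list)

    def add_child(self, term: str) -> "KnowledgeNode":
        """Attach a knowledge point directly under this node and return it."""
        child = KnowledgeNode(term)
        self.children.append(child)
        return child

    def path_to(self, term: str) -> Optional[List[str]]:
        """Return the terms from this node down to `term`, or None if absent."""
        if self.term == term:
            return [self.term]
        for child in self.children:
            sub_path = child.path_to(term)
            if sub_path is not None:
                return [self.term] + sub_path
        return None


# Only the part of the Figure 3 hierarchy that is named in [0192].
syntax = KnowledgeNode("syntactic knowledge")
noun_clause = syntax.add_child("noun clause")
for name in ("subject clause", "predicative clause", "appositive clause"):
    noun_clause.add_child(name)

print(syntax.path_to("appositive clause"))
# ['syntactic knowledge', 'noun clause', 'appositive clause']
```

Reading the path from right to left gives a term's first-level parent (noun clause) and second-level parent (syntactic knowledge). A newly extracted term could then be attached with `add_child` once its parent node is decided.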

[0193] The steps of extracting subject terms in this field by ...

Abstract

The invention discloses a subject term extraction method and system based on a sequence labeling model, in the technical field of data extraction. First, the subject terms in a training corpus are annotated and assigned class labels, which yields a label sequence. A subject term extraction model is trained with the training corpus as the observation sequence and the label sequence as the state sequence, and this model is then used as an extractor to make a preliminary extraction of the subject terms in the corpus to be processed. Next, the preliminary results are screened according to the similarity between subject terms, which yields the true subject terms belonging to the corresponding subject field. Because only a small amount of training corpus has to be annotated, the method and system extract subject terms from a corpus quickly and accurately. They also allow the existing knowledge hierarchy of a subject field to be improved step by step, overcoming the shortcomings of traditional subject term extraction methods.
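The abstract does not name the sequence labeling model. Its wording (training corpus as the observation sequence, label sequence as the state sequence) matches a hidden Markov model, so the sketch below assumes a first-order HMM with a B/I/O tag set, add-one smoothing, and Viterbi decoding; a CRF would be an equally plausible reading. The screening step is also unspecified. Here it is assumed to keep a candidate only if its character-bigram Jaccard similarity to some already known subject term reaches 0.3. All names (`bio_labels`, `HmmTermTagger`, `extract_terms`, `screen`), the toy corpus, and the candidate "object clause" are hypothetical illustrations, not the patent's own method.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

TAGS = ("B", "I", "O")  # assumed tag set: Begin / Inside / Outside a subject term

def bio_labels(tokens: List[str], terms: List[List[str]]) -> List[str]:
    """Turn annotated (non-empty) subject terms into a B/I/O label (state) sequence."""
    labels = ["O"] * len(tokens)
    for term in filter(None, terms):
        for s in range(len(tokens) - len(term) + 1):
            if tokens[s:s + len(term)] == term:
                labels[s:s + len(term)] = ["B"] + ["I"] * (len(term) - 1)
    return labels

class HmmTermTagger:
    """Assumed model: first-order HMM, tokens = observations, B/I/O = hidden states."""
    def __init__(self) -> None:
        self.start: Counter = Counter()
        self.trans: Dict[str, Counter] = defaultdict(Counter)
        self.emit: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, sentences: List[List[str]], label_seqs: List[List[str]]) -> "HmmTermTagger":
        """Count start, transition and emission events in the labeled corpus."""
        for tokens, labels in zip(sentences, label_seqs):
            self.start[labels[0]] += 1
            for prev, cur in zip(labels, labels[1:]):
                self.trans[prev][cur] += 1
            for tok, tag in zip(tokens, labels):
                self.emit[tag][tok] += 1
                self.vocab.add(tok)
        return self

    @staticmethod
    def _logp(counts: Counter, key: str, outcomes: int) -> float:
        # Add-one (Laplace) smoothing keeps unseen tokens/transitions possible.
        return math.log((counts[key] + 1) / (sum(counts.values()) + outcomes))

    def decode(self, tokens: List[str]) -> List[str]:
        """Viterbi search for the most probable label sequence."""
        v, n = len(self.vocab) + 1, len(TAGS)  # +1 reserves mass for unseen tokens
        score = {t: self._logp(self.start, t, n) + self._logp(self.emit[t], tokens[0], v)
                 for t in TAGS}
        backptrs = []
        for tok in tokens[1:]:
            new_score, ptr = {}, {}
            for t in TAGS:
                prev = max(TAGS, key=lambda p: score[p] + self._logp(self.trans[p], t, n))
                new_score[t] = (score[prev] + self._logp(self.trans[prev], t, n)
                                + self._logp(self.emit[t], tok, v))
                ptr[t] = prev
            score = new_score
            backptrs.append(ptr)
        path = [max(TAGS, key=lambda t: score[t])]
        for ptr in reversed(backptrs):
            path.append(ptr[path[-1]])
        return path[::-1]

def extract_terms(tokens: List[str], tags: List[str]) -> List[str]:
    """Join each B (or stray I) plus the I tokens that follow it into one candidate."""
    terms, inside = [], False
    for tok, tag in zip(tokens, tags):
        if tag == "B" or (tag == "I" and not inside):
            terms.append([tok])
            inside = True
        elif tag == "I":
            terms[-1].append(tok)
        else:
            inside = False
    return [" ".join(t) for t in terms]

def screen(candidates: List[str], known: List[str], threshold: float = 0.3) -> List[str]:
    """Assumed denoising rule: keep a candidate whose character-bigram Jaccard
    similarity to at least one known subject term reaches `threshold`."""
    def bigrams(term: str) -> set:
        s = term.replace(" ", "")
        return {s[i:i + 2] for i in range(len(s) - 1)}

    def sim(a: str, b: str) -> float:
        x, y = bigrams(a), bigrams(b)
        return len(x & y) / len(x | y) if x | y else 0.0

    return [c for c in candidates if any(sim(c, k) >= threshold for k in known)]

# Toy training corpus (hypothetical): (sentence, annotated subject term).
train = [("a noun clause is used .", ["noun", "clause"]),
         ("the noun clause follows .", ["noun", "clause"]),
         ("an appositive clause explains .", ["appositive", "clause"])]
sents = [s.split() for s, _ in train]
label_seqs = [bio_labels(toks, [term]) for toks, (_, term) in zip(sents, train)]
tagger = HmmTermTagger().fit(sents, label_seqs)
test = "the appositive clause is used .".split()
print(extract_terms(test, tagger.decode(test)))  # ['appositive clause']
known = ["noun clause", "subject clause", "predicative clause", "appositive clause"]
print(screen(["object clause", "is used"], known))  # ['object clause']
```

Character bigrams are used in the screening sketch because they need no word segmentation, so the same rule can be applied to Chinese text as well as English. The 0.3 threshold is only a placeholder and would need tuning on real data.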

Description

Technical field

[0001] The invention relates to the technical field of data extraction, and in particular to a method and system for extracting subject terms based on a sequence labeling model.

Background

[0002] Subject terms are the most basic units of a field. To describe the knowledge system of a field well, all of the field's subject terms need to be obtained. With the development of the Internet and information technology, knowledge within existing fields is expanding rapidly and new fields keep emerging, so the extraction and application of subject terms are attracting more and more attention. For example, many online education companies (such as Mingbo Education) mark the subject terms in the text a user is reading and recommend related educational resources based on those terms, which better meets the user's needs. With the continuous increase...

Application Information

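Patent type & authority: Application (China)
IPC(8): G06F17/30; G06F17/27
Inventors: Yang Shuo (杨硕), Gao Fei (高飞), Feng Yansong (冯岩松), Jia Aixia (贾爱霞), Zhao Dongyan (赵东岩), Lu Zuowei (卢作伟), Wang Dong (王冬)
Owner: Mingbo Education Technology Co., Ltd. (明博教育科技股份有限公司)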