Domain ontology concept automatic-acquisition method based on Bootstrapping technology

A technology for the automatic acquisition of domain ontology concepts, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses two problems: domain concepts being missed because of their low frequency of occurrence, and compound words being ignored during extraction.

Inactive Publication Date: 2012-08-01
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0003] At present, concept acquisition mainly relies on linguistic and statistical methods. With linguistic methods, rules and templates are difficult to establish and maintain, and portability is poor. Most statistical methods do not consider compound word extraction, so domain concepts in the form of compound words often cannot be extracted. In addition, statistical methods generally ignore semantic factors, so some domain concepts with similar semantics are missed because they occur infrequently.



Examples


Embodiment Construction

[0030] To make the purpose, algorithm, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.

[0031] The algorithm for automatic acquisition of domain ontology concepts based on Bootstrapping technology is shown in Figure 1. It comprises three parts: compound word extraction, semantic similarity judgment, and domain concept acquisition. The algorithm uses seed concepts as key concepts to learn from an unlabeled corpus. First, the corpus is segmented into words and stop words are removed. Multi-word sequences are then extracted, and their mutual information and information entropy are calculated. A sequence that meets the threshold conditions is judged to be a compound word and added to the word set. Candidate concepts are then selected according to the candidate concept selection conditions, evaluated with a combination of M evaluation and T evaluation, and ...
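The compound word step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes two-word sequences only, pointwise mutual information as the mutual information measure, and a single shared threshold for left and right entropy. The names `pmi_and_entropy`, `extract_compounds`, `mi_min`, and `entropy_min` are hypothetical, and the source does not state the threshold values.

```python
import math
from collections import Counter, defaultdict


def _entropy(counter):
    """Shannon entropy (base 2) of a neighbour-count distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def pmi_and_entropy(sentences):
    """Return {bigram: (pmi, left_entropy, right_entropy)}.

    `sentences` is a list of token lists, assumed to be already
    segmented and stripped of stop words.
    """
    unigrams = Counter()
    bigrams = Counter()
    left = defaultdict(Counter)   # bigram -> counts of the preceding token
    right = defaultdict(Counter)  # bigram -> counts of the following token
    for toks in sentences:
        unigrams.update(toks)
        for i in range(len(toks) - 1):
            bg = (toks[i], toks[i + 1])
            bigrams[bg] += 1
            if i > 0:
                left[bg][toks[i - 1]] += 1
            if i + 2 < len(toks):
                right[bg][toks[i + 2]] += 1
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    stats = {}
    for bg, count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[bg[0]] / n_uni
        p_y = unigrams[bg[1]] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        stats[bg] = (pmi, _entropy(left[bg]), _entropy(right[bg]))
    return stats


def extract_compounds(sentences, mi_min, entropy_min):
    """Keep bigrams whose PMI and both boundary entropies meet the thresholds."""
    return {
        bg
        for bg, (pmi, h_left, h_right) in pmi_and_entropy(sentences).items()
        if pmi >= mi_min and h_left >= entropy_min and h_right >= entropy_min
    }
```

High mutual information indicates that the two words are strongly bound to each other, while high left and right entropy indicates that the pair appears in varied contexts and therefore behaves like a standalone unit. Requiring both is what separates a compound word from a fragment of a longer phrase.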


Abstract

To realize the automatic acquisition of domain concepts, the invention provides a method for automatically acquiring domain ontology concepts based on Bootstrapping technology. The method comprises the following steps. First, to solve the problem that ontology concepts in compound form cannot be extracted, compound words are extracted based on mutual information and left and right information entropy. Candidate domain concepts are then extracted according to a candidate concept decision condition based on co-occurring sentence frequency. The candidate concepts are evaluated by combining M evaluation with T evaluation; domain concepts that meet the evaluation criteria are extracted, and those with higher evaluation values are chosen as key concepts for a new round of learning. To avoid missing domain concepts that occur infrequently but have similar meanings, a semantic factor is introduced, and domain concepts with similar meanings are extracted by calculating semantic similarity. Finally, a detailed algorithm implementation flow is provided.
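The iterative part of the abstract (select candidates by co-occurring sentence frequency, evaluate them, and reuse the best ones as key concepts in the next round) can be sketched as a loop. This is only a sketch under stated assumptions: the excerpt does not define M evaluation or T evaluation, so the evaluation is passed in as a hypothetical `score` callable, and `cooccurrence_ratio` is a simple stand-in rather than either of those measures. The parameters `min_cooccur`, `accept_min`, `top_k`, and `max_rounds` are likewise illustrative.

```python
from collections import Counter


def bootstrap_concepts(sentences, seeds, score, min_cooccur=2,
                       accept_min=0.5, top_k=5, max_rounds=10):
    """Grow a concept set from seed concepts over an unlabeled corpus.

    `score(term, concepts, sentences)` is a placeholder for the
    evaluation step; it must return a number.
    """
    concepts = set(seeds)
    key_concepts = set(seeds)
    for _ in range(max_rounds):
        # Count the sentences in which each unknown term co-occurs
        # with at least one current key concept.
        cooccur = Counter()
        for toks in sentences:
            terms = set(toks)
            if terms & key_concepts:
                for term in terms - concepts:
                    cooccur[term] += 1
        candidates = [t for t, c in cooccur.items() if c >= min_cooccur]
        scores = {t: score(t, concepts, sentences) for t in candidates}
        accepted = {t for t, s in scores.items() if s >= accept_min}
        if not accepted:
            break  # nothing new was learned in this round
        concepts |= accepted
        # The highest-scoring accepted concepts drive the next round.
        ranked = sorted(accepted, key=lambda t: (-scores[t], t))
        key_concepts = set(ranked[:top_k])
    return concepts


def cooccurrence_ratio(term, concepts, sentences):
    """Stand-in scorer: share of the term's sentences that mention a known concept."""
    containing = [set(s) for s in sentences if term in s]
    if not containing:
        return 0.0
    return sum(1 for s in containing if s & concepts) / len(containing)
```

Passing the scorer as an argument keeps the loop independent of the evaluation measure, so a real evaluation function could replace the stand-in without changing the control flow. The semantic similarity step mentioned in the abstract is not covered by this sketch.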

Description

Technical field

[0001] The invention relates to a method for automatically acquiring domain concepts, in particular to a method for automatically acquiring domain ontology concepts based on Bootstrapping technology. The method improves on existing methods: it can extract compound domain concepts and domain concepts with similar semantics, and it improves the recall and accuracy of automatic acquisition.

Background art

[0002] An ontology is a knowledge representation method used to describe concepts and the relationships between them. Since it was proposed, it has attracted extensive attention from researchers at home and abroad, and it has been applied in many fields such as the semantic Web, intelligent information retrieval, and information integration. The construction of domain ontologies is the basis of these studies. However, many ontologies currently rely heavily on domain experts to build them. This method of building ontologies that relies entirely on ma...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 杜军平, 杨月华, 李雪
Owner: BEIJING UNIV OF POSTS & TELECOMM