Domain concept extraction method for open texts

A domain concept extraction technology, applied in special data processing applications, knowledge representation, instruments, etc. It addresses the problem that the recognition accuracy and recall of existing methods need to be improved, and achieves higher accuracy and recall.

Inactive Publication Date: 2016-06-15
INST OF COMPUTING TECH CHINESE ACAD OF SCI
4 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

Whether existing methods rely on traditional statistics or on machine-learning-based statistics, their recognition accuracy and recall still need to be improved.




Embodiment Construction

[0039] As mentioned above, the accuracy and recall of existing domain concept extraction methods for open text need to be improved. The inventor analyzed this in depth and found two causes. First, existing domain concept extraction schemes usually consider only the literal features of the candidate domain concepts themselves and ignore their contextual features, so the influence of contextual information cannot be taken into account, which leads to poor performance in practical applications. Second, existing recognition schemes often use word frequency as an important basis for recognition. However, in many fields, some important domain concepts do not appear frequently in open texts, so low-frequency domain concepts that are actually important may be ignored during extraction. Based on this, on the one hand, the inventor introduces the context features of candidate domain conc...



Abstract

The invention provides a domain concept extraction method for open texts. The method includes the following steps: first, traversing an open text set and extracting candidate domain concepts from all the open texts; second, for each candidate domain concept, obtaining the word vector associated with that concept from its phrase resolution result, contextual information and encyclopedia classification information, and using all words in the word vector as domain labels associated with the candidate domain concept; third, establishing a candidate domain concept set A from all the candidate domain concepts obtained in the first step, establishing a domain label set B from the domain labels obtained in the second step, and performing iterative computation with the HITS algorithm to obtain the domain relevancy of every candidate domain concept; fourth, judging which candidates are domain concepts according to their domain relevancy. The method can increase accuracy and recall, and can better identify important low-frequency concepts.
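The third and fourth steps can be illustrated with a minimal Python sketch of HITS on a bipartite graph between candidate concepts (set A) and domain labels (set B). This is an illustration under assumptions, not the patented procedure: the function names `hits_domain_relevance` and `select_domain_concepts`, the unweighted edges, the uniform initial scores, the L2 normalization, the threshold value and the sample data are all hypothetical, since the abstract does not specify them. Here concepts are treated as hubs and labels as authorities.

```python
import math

def hits_domain_relevance(concept_labels, iterations=50, tol=1e-9):
    """Run HITS on a bipartite concept-label graph.

    concept_labels maps each candidate concept (set A) to the set of
    domain labels (set B) associated with it. Returns (hub, auth):
    hub scores for concepts, authority scores for labels.
    Assumption: unweighted edges and uniform initial scores.
    """
    concepts = list(concept_labels)
    labels = sorted({l for ls in concept_labels.values() for l in ls})
    hub = {c: 1.0 for c in concepts}
    auth = {l: 1.0 for l in labels}
    for _ in range(iterations):
        # Authority update: a label is scored by the concepts pointing to it.
        new_auth = {l: 0.0 for l in labels}
        for c, ls in concept_labels.items():
            for l in ls:
                new_auth[l] += hub[c]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        new_auth = {l: v / norm for l, v in new_auth.items()}
        # Hub update: a concept is scored by the labels it points to.
        new_hub = {c: sum(new_auth[l] for l in ls)
                   for c, ls in concept_labels.items()}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        new_hub = {c: v / norm for c, v in new_hub.items()}
        delta = max((abs(new_hub[c] - hub[c]) for c in concepts), default=0.0)
        hub, auth = new_hub, new_auth
        if delta < tol:  # stop once concept scores have stabilized
            break
    return hub, auth

def select_domain_concepts(relevance, threshold):
    """Step four (assumed form): keep concepts scoring at or above a threshold."""
    return {c for c, score in relevance.items() if score >= threshold}

# Hypothetical sample data: labels would come from step two in practice.
candidates = {
    "neural network": {"machine", "learning", "network"},
    "gradient descent": {"machine", "learning", "optimization"},
    "support vector machine": {"machine", "learning"},
    "coffee table": {"furniture"},
}
hub, auth = hits_domain_relevance(candidates)
print(select_domain_concepts(hub, threshold=0.3))
```

In this sketch a concept scores highly because it shares labels with many other concepts, not because it occurs often in the text, which is consistent with the abstract's claim about low-frequency concepts. The isolated candidate "coffee table" shares no label with the others, so its score decays toward zero over the iterations.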

Description

Technical field

[0001] The invention relates to the technical field of domain knowledge base construction, and in particular to an open-text-oriented domain concept extraction method.

Background

[0002] The world has entered the era of networked big data. Networked big data is huge in quantity, complex in form, and low in value density. To fully tap the value it contains, the data needs to be organized in the form of a knowledge base. Knowledge bases are divided into general knowledge bases and domain knowledge bases. A domain knowledge base focuses on the depth of knowledge and reflects domain concepts and their relationships. A domain concept is a manifestation of domain knowledge: an abstract description of specific things in the process of human cognition. Domain concept recognition for open text mainly focuses on how to use computers to automatically or semi-automatically obtain the above domain concepts from m...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06N5/02
CPC: G06F40/279, G06N5/022
Inventor: 贾岩涛, 陈新蕾, 王元卓, 徐君, 程学旗
Owner INST OF COMPUTING TECH CHINESE ACAD OF SCI