A deepdive-based domain text knowledge extraction method

A text knowledge extraction technology, applied in text database query, unstructured text data retrieval, instruments, etc., that addresses the difficulty of utilizing unstructured data, offers strong practicability and flexibility, and reduces costs.

Active Publication Date: 2019-09-20
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

For structured and semi-structured data, many tools already exist for converting the data into knowledge in a knowledge base. Most data sources, however, are currently unstructured, including document data, dialogue data, etc. Automatic knowledge extraction methods for this class of Chinese data are lacking, which makes the data very difficult to use. A domain text knowledge extraction method is urgently needed to fill this gap.



Examples

Embodiment Construction

[0033] In order to describe the present invention more specifically, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0034] This example analyzes financial announcement data to extract knowledge about equity changes in the financial sector and thereby build a corresponding company equity knowledge base. The overall method for constructing the company equity knowledge base is shown in Figure 1:

[0035] S01: Obtain the relevant financial announcement data and convert it into plain-text (txt) content using a series of tools. Segment the announcement text into words with the jieba tool, then use the Stanford CoreNLP tool to perform part-of-speech tagging, named entity tagging, and dependency parsing on the segmented text to obtain the preprocessed announcement data. Figure 2 is a schematic diagram of the res...
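As an illustration of what step S01 produces, the following is a minimal Python sketch of a per-sentence record holding the segmentation, part-of-speech, named entity, and dependency annotations side by side. The record layout, the `preprocess` helper, and the toy `toy_segment` / `toy_annotate` functions are assumptions made for this sketch and do not come from the patent; in practice the segmenter could be something like `jieba.lcut`, and the annotator a wrapper around Stanford CoreNLP. An English sentence with whitespace splitting is used only so the sketch runs without external libraries.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SentenceRecord:
    """One preprocessed sentence; all annotation lists are parallel to tokens."""
    doc_id: str
    sentence_index: int
    tokens: List[str]
    pos_tags: List[str]
    ner_tags: List[str]
    dep_heads: List[int]  # 1-based index of each token's head, 0 for the root
    dep_types: List[str]

    def __post_init__(self) -> None:
        # Reject records whose annotations do not line up with the tokens.
        n = len(self.tokens)
        for name in ("pos_tags", "ner_tags", "dep_heads", "dep_types"):
            if len(getattr(self, name)) != n:
                raise ValueError(f"{name} has a different length than tokens")


# (pos_tags, ner_tags, dep_heads, dep_types) for one token list
Annotation = Tuple[List[str], List[str], List[int], List[str]]


def preprocess(
    doc_id: str,
    sentences: List[str],
    segment: Callable[[str], List[str]],
    annotate: Callable[[List[str]], Annotation],
) -> List[SentenceRecord]:
    """Segment each sentence, annotate the tokens, and collect the records."""
    records = []
    for index, text in enumerate(sentences):
        tokens = segment(text)
        pos, ner, heads, types = annotate(tokens)
        records.append(SentenceRecord(doc_id, index, tokens, pos, ner, heads, types))
    return records


def toy_segment(text: str) -> List[str]:
    # Placeholder segmenter: whitespace split (hypothetical stand-in for jieba).
    return text.split()


def toy_annotate(tokens: List[str]) -> Annotation:
    # Placeholder annotator (hypothetical stand-in for CoreNLP): tokens ending
    # in "Co" are tagged ORG, and every token attaches to the first token.
    n = len(tokens)
    ner = ["ORG" if t.endswith("Co") else "O" for t in tokens]
    heads = [0] + [1] * (n - 1)
    types = ["root"] + ["dep"] * (n - 1)
    return ["X"] * n, ner, heads, types


records = preprocess(
    "announcement-001",
    ["AlphaCo increased its stake in BetaCo"],
    toy_segment,
    toy_annotate,
)
```

Keeping every annotation list parallel to `tokens` makes it straightforward for later steps to look up the tag or dependency head of any token by position.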

Abstract

The invention provides a Deepdive-based domain text knowledge extraction method comprising the steps of: (1) acquiring the original texts required by a knowledge base construction system and preprocessing the texts; (2) performing entity linking on the preprocessed texts, finding the target entities corresponding to a preset specific relation, generating entity-relation-entity triples, and forming a set of candidate relation-entity pairs; (3) labeling a number of the candidate relation-entity pairs by a weak supervision method to generate training samples for the Deepdive tool; (4) inputting the training samples into the Deepdive tool for training, and outputting the candidate relation-entity pairs whose probability values are greater than a threshold to form the extracted knowledge base. The method can complete the construction of a domain knowledge base, is highly extensible, and has high practical value for extracting and utilizing unstructured data.
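Steps (2) to (4) of the abstract can be pictured with the following Python sketch. It is a simplified illustration under stated assumptions, not the patented implementation: the function names, the `ORG` entity tag, the negative cue word, the example probabilities, and the 0.9 threshold are all hypothetical, and the probabilities stand in for the values that the Deepdive tool would output after training.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]


def candidate_pairs(tokens: List[str], ner_tags: List[str],
                    entity_tag: str = "ORG") -> List[Pair]:
    """Step (2): ordered pairs of distinct entity mentions in one sentence."""
    mentions = [t for t, tag in zip(tokens, ner_tags) if tag == entity_tag]
    return [(a, b) for a, b in permutations(mentions, 2) if a != b]


def weak_label(pair: Pair, tokens: List[str], known_positive: Set[Pair],
               negative_cues: Set[str]) -> Optional[bool]:
    """Step (3): label a candidate with simple weak-supervision rules.

    Returns True if the pair is already in a small seed knowledge base,
    False if the sentence contains a negative cue word, otherwise None
    (left unlabeled for the model to infer).
    """
    if pair in known_positive:
        return True
    if any(cue in tokens for cue in negative_cues):
        return False
    return None


def extract(probabilities: Dict[Pair, float], threshold: float) -> List[Pair]:
    """Step (4): keep the pairs whose probability is greater than the threshold."""
    return sorted(p for p, prob in probabilities.items() if prob > threshold)


tokens = ["AlphaCo", "acquired", "shares", "of", "BetaCo"]
ner_tags = ["ORG", "O", "O", "O", "ORG"]

candidates = candidate_pairs(tokens, ner_tags)
seed_pairs = {("AlphaCo", "BetaCo")}  # hypothetical seed knowledge
labels = {p: weak_label(p, tokens, seed_pairs, {"denied"}) for p in candidates}

# Hypothetical probabilities standing in for the trained model's output.
probabilities = {("AlphaCo", "BetaCo"): 0.97, ("BetaCo", "AlphaCo"): 0.12}
knowledge_base = extract(probabilities, threshold=0.9)
```

Leaving uncertain candidates unlabeled (`None`) rather than forcing a label keeps the weakly supervised training set small but cleaner, and the final threshold trades recall for precision in the extracted knowledge base.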

Description

Technical field

[0001] The invention relates to computer natural language processing technology, and specifically to a method for extracting domain text knowledge based on Deepdive.

Background technique

[0002] Knowledge base construction has practical significance and broad application prospects. The daily operation of Apple's Siri and Microsoft's Cortana relies on a large knowledge base to quickly return correct answers to users' questions. However, in some vertical fields, such as customer service, finance, and chatbots, knowledge bases for specific relationships are lacking, or the existing knowledge bases are incomplete or not updated in a timely manner. If a knowledge base can be constructed automatically for a specific field and for certain specific relationships with high accuracy, it can effectively reduce the manpower and time costs of knowledge base construction and provide better service for downstream applications.

[0003...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F16/36, G06F17/27, G06N99/00
CPC: G06F16/3344, G06F16/367, G06F40/253, G06N20/00
Inventor: 陈华钧, 陈曦, 张宁豫, 吴朝晖
Owner: ZHEJIANG UNIV