Distant supervision relationship extractor

A relation and fact extraction technology, applied in the field of distantly supervised relation extractors, that addresses the problems of pattern-based extraction being unscalable, collecting irrelevant data, and being expensive.

Inactive Publication Date: 2017-08-18
MICROSOFT TECH LICENSING LLC
9 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

Adding more patterns can reduce the number of missed expressions, but may also lead to the collection of irrelevant data.
Ultimately, while careful pattern design may improve results, creating patterns is time-consuming, expensive, and not scalable.
Natural language processing using statistical models is not bound to fixed patterns, but building good models requires large amounts of properly annotated training data.
Manually annotating large datasets to build high-precision models is time-consuming and expensive.

Embodiment Construction

[0018] Aspects of the invention are described more fully hereinafter with reference to the accompanying drawings, which form a part hereof and which illustrate certain exemplary aspects of the invention. However, this invention may be embodied in many different forms and should not be construed as limited to the aspects set forth herein; rather, these aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the aspects to those skilled in the art. Aspects may be practiced as methods, systems, or devices. Accordingly, the embodiments can be practiced using hardware, software, or a combination of hardware and software. The following detailed description should therefore not be read in a limiting sense.

[0019] Aspects of a relation extractor and accompanying methods are described herein and illustrated in the accompanying drawings. The relation extractor will interactively combine machine learn...

Abstract

Automatic relationship extraction is provided. A machine learning approach using statistical entity-type prediction and relationship prediction models built from large unlabeled datasets is interactively combined with minimal human intervention and a light pattern-based approach to extract relationships from unstructured, semi-structured, and structured documents. Training data is collected from a collection of unlabeled documents by matching ground truths for a known entity from existing fact databases with text in the documents describing the known entity, and corresponding models are built for one or more relationship types. For a modeled relationship type, text chunks of interest are found in a document. A machine learning classifier predicts the probability that one of the text chunks is the entity being sought. The combined machine learning and light pattern-based approach provides both improved recall and high precision through filtering, and allows constraining and normalization of the extracted relationships.
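The training-data collection and chunk-scoring steps summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the fact database `FACTS`, the documents `DOCS`, the `birth_year` relationship type, the four-digit-year chunk finder, and the small naive Bayes classifier over context words are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter

# Hypothetical fact database: known entity -> {relationship type: ground truth}.
FACTS = {
    "Ada Lovelace": {"birth_year": "1815"},
    "Alan Turing": {"birth_year": "1912"},
}

# Hypothetical unlabeled documents that mention the known entities.
DOCS = [
    "Ada Lovelace was born in 1815 and died in 1852.",
    "Alan Turing was born in 1912 and published a famous paper in 1936.",
]

# Assumed chunk finder: four-digit numbers are the text chunks of interest.
YEAR = re.compile(r"\b\d{4}\b")


def context(sentence, start, width=2):
    """Return the last `width` lowercase words before character offset `start`."""
    return re.findall(r"[a-z]+", sentence[:start].lower())[-width:]


def label_examples(facts, docs, relation):
    """Label each chunk by comparing it with the ground truth for a known entity."""
    examples = []
    for entity, attrs in facts.items():
        truth = attrs.get(relation)
        if truth is None:
            continue
        for sentence in docs:
            if entity not in sentence:
                continue
            for m in YEAR.finditer(sentence):
                examples.append((context(sentence, m.start()), m.group() == truth))
    return examples


def train(examples):
    """Collect per-label context-word counts for a naive Bayes model."""
    counts = {True: Counter(), False: Counter()}
    totals = Counter()
    for words, label in examples:
        counts[label].update(words)
        totals[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, totals, vocab


def predict_proba(model, words):
    """Probability that a chunk with these context words is the sought value."""
    counts, totals, vocab = model
    log_score = {}
    for label in (True, False):
        n = sum(counts[label].values())
        score = math.log((totals[label] + 1) / (sum(totals.values()) + 2))
        for w in words:
            # Laplace smoothing, with one extra slot for unseen words.
            score += math.log((counts[label][w] + 1) / (n + len(vocab) + 1))
        log_score[label] = score
    top = max(log_score.values())
    exp = {k: math.exp(v - top) for k, v in log_score.items()}
    return exp[True] / (exp[True] + exp[False])


def extract(model, sentence):
    """Return the highest-scoring chunk in a sentence and its probability."""
    scored = [(predict_proba(model, context(sentence, m.start())), m.group())
              for m in YEAR.finditer(sentence)]
    prob, chunk = max(scored)
    return chunk, prob


model = train(label_examples(FACTS, DOCS, "birth_year"))
best_chunk, best_prob = extract(model, "Grace Hopper was born in 1906 and retired in 1986.")
```

No sentence is annotated by hand here: the labels come entirely from matching the existing facts against the text, which is the point of distant supervision. A light pattern-based filter, as the abstract describes, could then constrain or normalize the candidates before they are accepted.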

Description

Background

[0001] Populating a database of facts describing the relationships between entities and their attributes typically requires the aggregation of a lot of information with a high level of precision. Populating large fact databases manually is time-consuming, expensive, and often impractical. Automatically populating fact databases can also be a challenge, because it is difficult to extract data with the necessary precision, without human supervision, from varying sources of structured, semi-structured, and unstructured information that use inconsistent language, units, and formats. Conventional automatic fact extraction techniques include pattern matching and natural language processing.

[0002] Pattern matching typically uses hand-crafted and hard-coded regular expressions and/or specific rules that rely on information being expressed in the same order using the same words. Without a comprehensive set of patterns, many expressions of relationships ca...
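The brittleness of hand-crafted patterns described in the background can be illustrated with a short sketch. The pattern `BORN` and the sample sentences are hypothetical and chosen only for illustration; the point is that a single regular expression captures one phrasing and silently misses the others.

```python
import re

# Hypothetical hand-crafted pattern: it expects "<Name> was born in <year>".
BORN = re.compile(
    r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in (?P<year>\d{4})"
)

# Three sentences expressing the same relationship type in different ways.
SENTENCES = [
    "Ada Lovelace was born in 1815.",
    "Born in 1912, Alan Turing studied at Cambridge.",
    "The year 1906 saw the birth of Grace Hopper.",
]


def extract_birth_years(sentences):
    """Return (name, year) pairs for sentences that match the fixed pattern."""
    results = []
    for sentence in sentences:
        m = BORN.search(sentence)
        if m:
            results.append((m.group("name"), m.group("year")))
    return results


found = extract_birth_years(SENTENCES)
```

Only the first sentence matches, because the other two use a different word order. Covering them would require writing additional patterns, which is the scaling cost the background section points to.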

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30, G06N20/00
CPC: G06F16/313, G06F16/36, G06F40/289, G06N20/00, G06N7/01
Inventor: A·夏尔马张见闻S·阿罗尼超柳元沇汪瑜婧
Owner: MICROSOFT TECH LICENSING LLC