Method and system for extracting information from unstructured text using symbolic machine learning

Status: Inactive. Publication date: 2006-01-12
IBM CORP
Cites 17 documents; cited by 35 documents.

AI Technical Summary

Benefits of technology

[0024] It is another exemplary feature of the present invention to provide a method that allows a user with no special knowledge of linguistics to dynamically define patterns on the basis of a small number of example sentences or pseudo-examples in which the user has marked those named entity mentions that are involved in a relation instance. The defined patterns can then be used to identify relation instances in hitherto unseen sentences with high precision.
[0030] Thus, the present invention provides an improved method for relational learning in which a non-specialist can intuitively use the tool that embodies this method to develop a PI pattern template to be used for comparison with unseen text.
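The step of using a defined pattern to find relation instances in unseen sentences can be pictured with a short sketch. It assumes a hypothetical encoding that is not taken from the patent: a pattern and a sentence are both sets of elements carrying feature sets (for example `PERSON` or `word=joined`), linked by named relations such as precedence, and a pattern matches wherever its elements can be mapped onto sentence elements so that every feature and every link is preserved. The function `matches` and all feature names are illustrative.

```python
from typing import Dict, FrozenSet, Hashable, Iterator, Set, Tuple

Labels = Dict[Hashable, FrozenSet[str]]        # element -> features (hypothetical encoding)
Edges = Set[Tuple[str, Hashable, Hashable]]    # (relation, source, target)


def matches(pat_labels: Labels, pat_edges: Edges,
            sent_labels: Labels, sent_edges: Edges) -> Iterator[Dict[Hashable, Hashable]]:
    """Yield each mapping of pattern elements onto sentence elements such that
    every pattern feature is present and every pattern edge is preserved."""
    order = list(pat_labels)

    def consistent(binding: Dict[Hashable, Hashable]) -> bool:
        # Check only the edges whose two endpoints are already bound.
        return all((r, binding[u], binding[v]) in sent_edges
                   for r, u, v in pat_edges if u in binding and v in binding)

    def extend(binding: Dict[Hashable, Hashable], i: int):
        if i == len(order):
            yield dict(binding)                 # copy: binding keeps changing
            return
        x = order[i]
        for s, feats in sent_labels.items():
            if pat_labels[x] <= feats:          # pattern features must be present
                binding[x] = s
                if consistent(binding):
                    yield from extend(binding, i + 1)
                del binding[x]                  # backtrack

    yield from extend({}, 0)


# Hypothetical pattern: a PERSON precedes "joined", which precedes an ORG.
pattern = {"p": frozenset({"PERSON"}), "v": frozenset({"VERB", "word=joined"}),
           "o": frozenset({"ORG"})}
pattern_edges = {("prec", "p", "v"), ("prec", "v", "o")}

# Unseen sentence "Yesterday Lee joined Acme", one element per token.
tokens = [{"word=Yesterday"}, {"PERSON", "word=Lee"},
          {"VERB", "word=joined"}, {"ORG", "word=Acme"}]
sentence = {f"t{i}": frozenset(f) for i, f in enumerate(tokens)}
sentence_edges = {("prec", f"t{i}", f"t{j}")
                  for i in range(len(tokens)) for j in range(i + 1, len(tokens))}

print(list(matches(pattern, pattern_edges, sentence, sentence_edges)))
```

The only match maps the pattern onto Lee, joined and Acme, so the extracted relation instance is (Lee, Acme). Backtracking is exponential in the pattern size in the worst case, which is tolerable for the few-element patterns a user would mark by hand.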

Problems solved by technology

Extracting relational information from text is an important and unsolved problem in the area of Unstructured Information Management.
Manual approaches are very costly to develop, since they require experts in computational linguistics or related disciplines to develop formal grammars or special purpose programs.
Non-specialists cannot customize manual systems for new domains, tasks or languages.
Statistical methods are quite popular, but they require a large amount of accurately labeled training data, and producing that data is a major problem for such approaches.
There are currently no adequate trainable relation extraction systems, and in particular none that non-specialists can use.




Embodiment Construction

[0044] Referring now to the drawings, and more particularly to FIGS. 1-12, exemplary embodiments of the present invention will now be described.

[0045] Machine learning approaches have the advantage that they require only labeled examples of the information sought. Much recent work on relational learning has been statistical. One approach that reflects the state of the art for statistical methods is “Kernel Methods for Relation Extraction” by D. Zelenko, C. Aone, and A. Richardella, in which the learned function measures the similarity between shallow parses of examples. Statistical methods in particular need a large amount of labeled training data before anything useful can be done, which is a major problem for statistical approaches.
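To make the idea of a similarity function over shallow parses concrete, here is a much-simplified sketch. It is not the kernel defined by Zelenko et al.; it assumes hypothetical parses whose nodes carry only a type label, and it simply counts the nodes that match when two trees are aligned from the root, pairing children by position.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a (hypothetical) shallow parse: a type label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def similarity(a: Node, b: Node) -> int:
    """Count nodes that match when two parses are aligned from the root down.

    Nodes with different labels contribute nothing, and their subtrees are not
    compared. Matching nodes contribute 1 plus the similarity of their
    children, paired by position (extra children on either side are ignored).
    """
    if a.label != b.label:
        return 0
    return 1 + sum(similarity(x, y) for x, y in zip(a.children, b.children))


# Two hypothetical shallow parses that differ only in the last entity type.
s1 = Node("S", [Node("PERSON"), Node("VP", [Node("V"), Node("ORG")])])
s2 = Node("S", [Node("PERSON"), Node("VP", [Node("V"), Node("LOC")])])
print(similarity(s1, s2), similarity(s1, s1))
```

The two parses share four of their five nodes, so the call prints `4 5`. A statistical learner would use a score like this inside a classifier trained on many labeled parses, which is the data requirement criticized in [0045].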

[0046] Work in another vein has concerned various attempts to accomplish relational learning by using heuristics to learn finite state recognizers or regular expressions, as exemplified by “Learning Information Extraction Rules for Se...



Abstract

A method (and structure) of extracting information from text includes parsing an input sample of text to form a parse tree and using user inputs to define a machine-labeled learning pattern from the parse tree.
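The abstract's two steps, parsing a sample and then using user input to turn the parse tree into a learning pattern, can be sketched as follows. The tree format, the `roles` and `keep` arguments, and the `?ROLE` slot notation are all hypothetical: entity mentions the user marks become slots, words the user chooses to keep stay literal, and every branch that leads to nothing marked is pruned. The parse is written out by hand rather than produced by a real parser.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Node:
    label: str                                   # phrase or POS tag, e.g. "NP"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on leaves


def to_pattern(node: Node, roles: Dict[str, str],
               keep: FrozenSet[str] = frozenset()) -> Optional[Node]:
    """Prune a parse tree down to the parts the user marked (hypothetical scheme).

    Words in `roles` become slots named after the role the user gave them,
    words in `keep` stay as literal leaves, and every other word is dropped
    together with any subtree that no longer contains anything.
    """
    if node.word is not None:                    # leaf
        if node.word in roles:
            return Node(f"?{roles[node.word]}")
        if node.word in keep:
            return Node(node.label, word=node.word)
        return None
    kept = [p for c in node.children if (p := to_pattern(c, roles, keep)) is not None]
    return Node(node.label, kept) if kept else None


def show(node: Node) -> str:
    """Render a tree in bracketed form."""
    if node.word is not None:
        return f"({node.label} {node.word})"
    if not node.children:
        return node.label
    return f"({node.label} " + " ".join(show(c) for c in node.children) + ")"


# "Smith joined IBM": the user marks Smith and IBM and keeps "joined".
tree = Node("S", [
    Node("NP", [Node("NNP", word="Smith")]),
    Node("VP", [Node("VBD", word="joined"), Node("NP", [Node("NNP", word="IBM")])]),
])
print(show(to_pattern(tree, {"Smith": "PERSON", "IBM": "EMPLOYER"}, frozenset({"joined"}))))
```

This prints `(S (NP ?PERSON) (VP (VBD joined) (NP ?EMPLOYER)))`: the tree structure linking the two marked mentions survives, while unmarked material is dropped.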

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present Application is related to U.S. Provisional Patent Application No. 60/586,877, filed on Jul. 12, 2004, to Johnson et al., entitled “System and Method for Extracting Information from Unstructured Text Using Symbolic Machine Learning”, having IBM Docket YOR920040239US1, assigned to the present assignee, and incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention generally relates to extracting information from text. More specifically, in a relational learning system, a pattern learner module receives a small number of learning samples, defined by user interactions in a relational pattern template format in which elements stand in a precedence relation and an inclusion relation, and calculates a minimal most specific generalization (MMSG) for these samples so that information matching the generalized template can then be extracted from unseen text.

[0004] 2...
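One way to read the MMSG computation, under assumptions this excerpt does not confirm, is as generalization of relational structures. Suppose each learning sample is a set of elements with feature sets, linked by the precedence (`prec`) and inclusion (`incl`) relations, and that a pattern matches a text when its elements map onto the text's elements preserving features and both relations. Then the most specific pattern matching two samples is their direct product, and a smaller equivalent pattern can be obtained by folding away redundant elements. This sketch does exactly that; the names `Pattern`, `most_specific_generalization` and the feature strings are hypothetical, and the shrinking step only approximates the minimal form, so it should not be taken as the patent's own algorithm.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, FrozenSet, Hashable, Set, Tuple

RELATIONS = ("prec", "incl")   # precedence and inclusion


@dataclass
class Pattern:
    labels: Dict[Hashable, FrozenSet[str]]                 # element -> features
    rels: Dict[str, Set[Tuple[Hashable, Hashable]]] = field(
        default_factory=lambda: {r: set() for r in RELATIONS})


def _can_fold(p: Pattern, x: Hashable, y: Hashable) -> bool:
    """True if sending x to y (every other element to itself) maps p into p."""
    if not p.labels[x] <= p.labels[y]:
        return False

    def h(e: Hashable) -> Hashable:
        return y if e == x else e

    return all((h(u), h(v)) in edges for edges in p.rels.values() for u, v in edges)


def _shrink(p: Pattern) -> Pattern:
    """Delete elements that fold onto another element until none do.

    Each fold is a retraction, so the result matches exactly the same texts;
    single-element folds may stop short of the true minimum in general.
    """
    changed = True
    while changed:
        changed = False
        for x, y in product(list(p.labels), repeat=2):
            if x != y and _can_fold(p, x, y):
                del p.labels[x]
                p.rels = {r: {(u, v) for u, v in es if x not in (u, v)}
                          for r, es in p.rels.items()}
                changed = True
                break
    return p


def most_specific_generalization(a: Pattern, b: Pattern) -> Pattern:
    """Direct product of two samples, then shrunk.

    Element (x, y) keeps only the features x and y share; (x1, y1) -> (x2, y2)
    is an edge when x1 -> x2 in a and y1 -> y2 in b, for each relation.
    """
    labels = {(x, y): a.labels[x] & b.labels[y] for x, y in product(a.labels, b.labels)}
    rels = {r: {((x1, y1), (x2, y2)) for x1, x2 in a.rels[r] for y1, y2 in b.rels[r]}
            for r in RELATIONS}
    return _shrink(Pattern(labels, rels))


def sample(person: str, org: str) -> Pattern:
    """Hypothetical encoding of '<person> joined <org>', with a VP over the last two words."""
    return Pattern(
        labels={"p": frozenset({"PERSON", f"word={person}"}),
                "v": frozenset({"VERB", "word=joined"}),
                "o": frozenset({"ORG", f"word={org}"}),
                "vp": frozenset({"VP"})},
        rels={"prec": {("p", "v"), ("v", "o"), ("p", "o")},
              "incl": {("vp", "v"), ("vp", "o")}})


g = most_specific_generalization(sample("Smith", "IBM"), sample("Jones", "Intel"))
for element, features in sorted(g.labels.items()):
    print(element, sorted(features))
```

Generalizing "Smith joined IBM" with "Jones joined Intel" leaves four elements whose features are {PERSON}, {VERB, word=joined}, {ORG} and {VP}, with the same precedence and inclusion links as each sample: the names are generalized away while the shared verb survives. Further samples can be folded in pairwise, e.g. `most_specific_generalization(g, c)`. The product grows with the product of the sample sizes, so this approach only suits a small number of samples.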


Application Information

IPC(8): G06F17/21; G06F40/00
CPC: G06F17/2705; G06F40/205
Inventors: JOHNSON, DAVID E.; OLES, FRANK J.
Owner: IBM CORP