Creation of structured data from plain text

A structured data and text technology, applied in the field of creating structured data from plain text. It addresses the problems that Program2 does not quite work to recognize certain phrases, that NML is not suited to printing output, and that grammars which exploit this freedom are computationally intractable, so as to reduce developer effort and augment the functionality available to users.

Status: Inactive; Publication Date: 2008-05-29
ARIBA INC
5 Cites, 40 Cited by

AI Technical Summary

Benefits of technology

"The present invention provides a system, method, and architecture for receiving unstructured text and converting it to structured data. This is done by mapping the grammatical parse of a sentence into an instance tree of application domain objects. The system is portable across different application domains and can be used to create structured data from plain text that can then be stored efficiently in a database. The system provides a natural language interface: users enter ordinary English sentences, which the program then executes. The system also includes a parsing algorithm and a mapping algorithm that extract object representations of the sentence. The reduced form of the Natural Markup Language (NML) object description is created as an instance of a Domain Markup Language (“DML”) and is passed to the application program for execution. The technical effects of the invention include improved efficiency in creating structured data from unstructured text and enhanced functionality of applications."

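The summary states that the reduced form of the NML object description becomes a DML instance, but not how the reduction works. As a rough illustration only, the sketch below assumes that an instance tree is a nested `Instance` node, that "reduction" means discarding nodes that only record phrasing (their children are promoted to the parent), and that a nested Python dict can stand in for a DML instance. `Instance`, `reduce_to_dml`, and the set of phrasing types are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set, Union


@dataclass
class Instance:
    """One node of a (hypothetical) instance tree of domain objects."""
    type_name: str
    value: Optional[str] = None
    children: List["Instance"] = field(default_factory=list)


def _kept_children(node: Instance, phrasing_types: Set[str]) -> Iterator[Instance]:
    """Yield children, splicing out phrasing-only nodes and promoting their children."""
    for child in node.children:
        if child.type_name in phrasing_types:
            yield from _kept_children(child, phrasing_types)
        else:
            yield child


def _reduce_body(node: Instance, phrasing_types: Set[str]) -> Union[str, dict, None]:
    """Leaves keep their value; interior nodes become dicts (their own value is dropped)."""
    kept = list(_kept_children(node, phrasing_types))
    if not kept:
        return node.value
    # One key per surviving child; a later child of the same type overwrites an
    # earlier one (an assumption of this sketch; a real reducer would need lists).
    return {child.type_name: _reduce_body(child, phrasing_types) for child in kept}


def reduce_to_dml(node: Instance, phrasing_types: Set[str]) -> dict:
    """Reduce an instance tree to a nested dict standing in for a DML instance."""
    return {node.type_name: _reduce_body(node, phrasing_types)}
```

For example, an `Order` tree whose children are a politeness marker ("please") and an item phrase wrapping `Commodity` and `Quantity` reduces to `{"Order": {"Commodity": "resistor", "Quantity": "5"}}` when both wrapper types are treated as phrasing-only, which is the kind of compact form an application could consume directly.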
Problems solved by technology

However, grammars which exploit this freedom are computationally intractable.
Unlike most programming languages, however, NML isn't good for printing “hello, world”; rather, it's good for recognizing “hello, world”.
However, Program2 does not quite work to recognize these phrases.
Unlike types in programming languages, however, an object in NML has no real implementation.
The NML document produced by the mapper 220 can, however, be too cumbersome for easy processing.
For such sets, the use of IDENTIFIERs is unwieldy: the NML file would be very large and in a state of constant update.

Embodiment Construction

I. System Architecture

[0060] FIG. 1 illustrates an overview of the architecture of a system in accordance with one embodiment of the present invention. The system comprises a content engine 110, an online dictionary 120, a domain dictionary 130, a Natural Markup Language (“NML”) module 140, a vertical domain concepts module 150, a custom client specifications module 160, a grammar storage 170, and a client data module 182.

[0061] The content engine 110 receives plain text as input, parses it, and maps the parses into instance trees. As can be seen from FIG. 1, in one embodiment of the present invention, the content engine 110 receives input from both the online dictionary 120 (which includes words in a natural language) and the domain dictionary 130 (which includes terms specific to a domain).

[0062] In addition, the content engine 110 receives input from the NML module 140, which contains an NML model specific to the application or domain for which the system is being used. The applicati...
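Paragraphs [0061] and [0062] name the inputs the content engine 110 draws on but not how it combines them. The sketch below shows one plausible lexing step under stated assumptions: the online dictionary maps words to parts of speech, the domain dictionary maps (possibly multi-word) terms to object types, a domain term is accepted only if the NML model declares its type, and the longest domain match takes precedence over the general dictionary. `ContentEngine`, `lex`, and the dictionary formats are illustrative, not the patent's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str, str]  # (surface text, source dictionary, category or type)


@dataclass
class ContentEngine:
    """Hypothetical stand-in for content engine 110 and the inputs it consults."""
    online_dictionary: Dict[str, str]  # general word -> part of speech (assumed format)
    domain_dictionary: Dict[str, str]  # domain term (possibly multi-word) -> object type
    nml_types: Set[str]                # object types declared by the NML model

    def lex(self, text: str) -> List[Token]:
        """Split text into tokens, preferring the longest matching domain term."""
        words = text.lower().split()
        longest = max((len(term.split()) for term in self.domain_dictionary), default=1)
        tokens: List[Token] = []
        i = 0
        while i < len(words):
            for n in range(min(longest, len(words) - i), 0, -1):
                phrase = " ".join(words[i:i + n])
                obj_type = self.domain_dictionary.get(phrase)
                # Accept a domain term only if the NML model knows its type.
                if obj_type is not None and obj_type in self.nml_types:
                    tokens.append((phrase, "domain", obj_type))
                    i += n
                    break
            else:
                # No domain term starts here: fall back to the general dictionary.
                word = words[i]
                tokens.append((word, "general", self.online_dictionary.get(word, "UNKNOWN")))
                i += 1
        return tokens
```

With "carbon film resistor" in the domain dictionary, `lex("I need a carbon film resistor")` keeps that three-word term as a single domain token, while "i", "need", and "a" are looked up in the general dictionary. Preferring the longest domain match is a design choice of this sketch; a parser working from the grammar could instead keep all alternatives and let the mapping step decide.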

Abstract

A method and system for converting plain text into structured data. Parse trees for the plain text are generated based on the grammar of a natural language, and the parse trees are mapped onto instance trees generated based on an application-specific model. The best map is chosen, and its instance tree is passed to an application for execution. The method and system can be used for populating a database, for retrieving data from a database based on a query, or both.
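The abstract describes generating several parse trees, mapping each onto an instance tree, and choosing the best map, without saying here how maps are scored. The sketch below illustrates that loop under an assumed scoring rule: each candidate parse is a head word with modifier phrases (prepositions such as "with" are omitted), a hypothetical application model lists which attribute types each object type may own, and a map earns +1 for every attachment the model accepts and -1 for every one it rejects. The word list, type names, and scoring rule are assumptions for illustration, not the patent's algorithm.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class ParseNode:
    """A phrase in one candidate parse: a head word plus its modifier phrases."""
    head: str
    modifiers: List["ParseNode"] = field(default_factory=list)


@dataclass
class InstanceNode:
    """A domain object built from a phrase, with the attributes the model accepted."""
    type_name: str
    word: str
    attributes: List["InstanceNode"] = field(default_factory=list)


# Hypothetical application model: which object type each word denotes, and
# which attribute types each object type may own.
WORD_TYPES = {"ship": "ShipOrder", "resistors": "Resistor", "leads": "Lead"}
ALLOWED = {"ShipOrder": {"Resistor"}, "Resistor": {"Lead"}, "Lead": set()}


def map_parse(node: ParseNode) -> Tuple[InstanceNode, int]:
    """Map one parse onto an instance tree; +1 per accepted attachment, -1 per rejected one."""
    type_name = WORD_TYPES[node.head]  # unknown words raise KeyError in this sketch
    inst = InstanceNode(type_name, node.head)
    score = 0
    for mod in node.modifiers:
        child, child_score = map_parse(mod)
        score += child_score
        if child.type_name in ALLOWED[type_name]:
            inst.attributes.append(child)
            score += 1
        else:
            score -= 1  # the model cannot represent this attachment; drop it
    return inst, score


def best_map(parses: Iterable[ParseNode]) -> Tuple[InstanceNode, int]:
    """Choose the highest-scoring map; on a tie the earlier parse wins."""
    return max((map_parse(p) for p in parses), key=lambda pair: pair[1])
```

For the ambiguous request "ship resistors with leads", the parse that attaches "leads" to "resistors" scores 2, while the parse that attaches it to "ship" scores 0, so `best_map` returns the first; that instance tree is what would then be handed to the application.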

Description

[0001] A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

REFERENCE TO A COMPUTER PROGRAM LISTING APPENDIX

[0002] A computer program listing appendix is included in the attached CD-R created on Dec. 12, 2000, labeled “Creation of Structured Data from Plain Text,” and including the following files: CommodityProperty.nml (13 KB), DefaultSeg14Result.xml (2 KB), ElectricalProperty.nml (16 KB), Example.txt, Grammar.txt, INML.xml (5 KB), MeasurementProperty.nml (22 KB), Output.txt (3 KB), PeriodProperty.nml (6 KB), PhysicalProperty.nml (36 KB), ReservedNameProperty.nml (6 KB), Seg14.nml (30 KB), Seg14Phrasing.nml (71 KB), UsageProperty.nml (7 KB), and Utility.nml (6 KB). These fil...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/27, G06F17/30, G06F40/143
CPC: G06F17/2229, G06F17/2247, G06F17/271, Y10S707/99942, G06F17/30569, Y10S707/99943, Y10S707/99936, G06F17/2785, G06F16/258, G06F40/131, G06F40/211, G06F40/30, G06F40/143
Inventors: SALDANHA, ALEXANDER; MCGEER, PATRICK C.; CARIONL, LUCA
Owner: ARIBA INC