Creation of structured data from plain text

A technology for creating structured data from plain text. It addresses several problems: grammars that exploit this freedom are computationally intractable, and the NML document produced by the mapper can be too cumbersome for easy processing. The approach reduces developer effort and augments the functionality available to users.

Inactive Publication Date: 2008-05-29
ARIBA INC
5 Cites, 40 Cited by

AI Technical Summary

Benefits of technology

[0040] A system in accordance with the present invention can be used for creating structured data from plain text, to allow for efficient storage of this structured data in a database. For example, from the free text description of a number of products, the structured data (which could be an extracted object and its attributes) can be used to create individual entries in a product database, and thus create content for an ecommerce website or web market. Alternatively, or in addition, such a system can be used for creating structured data from a plain text query, and for using this structured data to retrieve relevant data from a database. For example, a user's free text query can be converted to a database query that corresponds to the objects of the database and their attributes. Such a system overcomes the limitations of conventional search engines by accepting free form text and mapping it accurately into a structured search query.
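The idea of turning a free text product description into "an extracted object and its attributes" can be illustrated with a minimal Python sketch. This is not the patent's parser-based method: it is a toy keyword and pattern lookup, and the names `ProductRecord`, `extract_product`, `COLORS`, and `UNIT_PATTERN` are all hypothetical, introduced only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical attribute vocabulary for a toy "product" domain.
COLORS = {"black", "blue", "red"}
UNIT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm|kg)\b")


@dataclass
class ProductRecord:
    """A structured entry: one object plus its attributes."""
    name: str
    attributes: dict = field(default_factory=dict)


def extract_product(description: str, known_products: set) -> Optional[ProductRecord]:
    """Turn a free text description into a record by simple lookup."""
    text = description.lower()
    words = re.findall(r"[a-z]+", text)

    # The object is the first word that names a known product.
    name = next((w for w in words if w in known_products), None)
    if name is None:
        return None

    attrs = {}
    color = next((w for w in words if w in COLORS), None)
    if color is not None:
        attrs["color"] = color

    # A number followed by a unit becomes a (value, unit) attribute.
    match = UNIT_PATTERN.search(text)
    if match:
        attrs["measure"] = (float(match.group(1)), match.group(2))
    return ProductRecord(name=name, attributes=attrs)


record = extract_product("Black ballpoint pen, 0.7 mm tip", {"pen", "stapler"})
```

A record of this shape could then be written as one row of a product database, or, for a query, used to build the filter conditions of a database lookup.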
[0043] In one embodiment, the present invention transforms an English sentence into a set of software objects that are subsequently passed to the given application for execution. One of the advantages of this approach is the ability to attach a natural language interface to any software application with minimal developer effort. The objects of the application domain are captured, in one embodiment, by using the Natural Markup Language (“NML”). The resulting interface is robust and intuitive, as the user now interacts with an application by entering normal English sentences, which are then executed by the program. In addition, an application enhanced with the present invention significantly augments the functionality available to a user.
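The hand-off described in this paragraph, where a sentence becomes software objects that the application then executes, might look roughly like the following Python sketch. The application (`MusicApp`), its object types (`SetVolume`, `PlayTrack`), and the word-matching in `sentence_to_objects` are all assumptions made for illustration; the word-matching is only a stand-in for the parse-and-map step and does not use NML.

```python
from dataclasses import dataclass


@dataclass
class SetVolume:
    """Hypothetical application object: change the playback volume."""
    level: int


@dataclass
class PlayTrack:
    """Hypothetical application object: play a named track."""
    title: str


class MusicApp:
    """Toy application that executes the objects it receives."""

    def __init__(self):
        self.log = []

    def execute(self, command):
        if isinstance(command, SetVolume):
            self.log.append(f"volume={command.level}")
        elif isinstance(command, PlayTrack):
            self.log.append(f"playing={command.title}")


def sentence_to_objects(sentence: str) -> list:
    """Very small stand-in for the parse-and-map step."""
    objects = []
    words = sentence.lower().rstrip(".").split()
    if "volume" in words:
        digits = [w for w in words if w.isdigit()]
        if digits:
            objects.append(SetVolume(level=int(digits[0])))
    if "play" in words:
        title = " ".join(words[words.index("play") + 1:])
        if title:
            objects.append(PlayTrack(title=title))
    return objects


app = MusicApp()
for obj in sentence_to_objects("Set the volume to 7."):
    app.execute(obj)
```

The point of the design is that the application only ever sees typed objects, so the language front end can be attached without changing how the application executes commands.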

Problems solved by technology

However, grammars which exploit this freedom are computationally intractable.
Unlike most programming languages, however, NML isn't good for printing “hello, world”; rather, it's good for recognizing “hello, world”.
However, Program2 does not quite work to recognize these phrases.
Unlike types in programming languages, however, an object in NML has no real implementation.
The NML document produced by the mapper 220 can, however, be too cumbersome for easy processing.
For such sets, the use of IDENTIFIERs is unwieldy: the NML file would be very large and in a state of constant update.

Examples

Embodiment Construction

I. System Architecture

[0060] FIG. 1 illustrates an overview of the architecture of a system in accordance with one embodiment of the present invention. The system comprises a content engine 110, an online dictionary 120, a domain dictionary 130, a Natural Markup Language (“NML”) module 140, a vertical domain concepts module 150, a custom client specifications module 160, a grammar storage 170, and a client data module 182.

[0061] The content engine 110 receives plain text as input, parses it, and maps the parses into instance trees. As can be seen from FIG. 1, in one embodiment of the present invention, the content engine 110 receives input from both the online dictionary 120 (which includes words in a natural language) and the domain dictionary 130 (which includes terms specific to a domain).
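How a content engine might consult two dictionaries while reading its input can be sketched as a token-tagging step. The dictionary contents below are invented, the function `tag_tokens` is hypothetical, and giving the domain dictionary precedence over the general one is an assumption of this sketch rather than something the excerpt states.

```python
# Hypothetical dictionary contents; the excerpt does not list actual entries.
ONLINE_DICTIONARY = {
    "a": "determiner",
    "with": "preposition",
    "red": "adjective",
    "cable": "noun",
}
DOMAIN_DICTIONARY = {
    "cat5": "cable_category",
    "rj45": "connector_type",
}


def tag_tokens(text: str) -> list:
    """Tag each token with the dictionary that recognized it.

    Assumption: a domain-specific entry wins over a general-language entry.
    """
    tagged = []
    for token in text.lower().split():
        if token in DOMAIN_DICTIONARY:
            tagged.append((token, "domain", DOMAIN_DICTIONARY[token]))
        elif token in ONLINE_DICTIONARY:
            tagged.append((token, "general", ONLINE_DICTIONARY[token]))
        else:
            tagged.append((token, "unknown", None))
    return tagged


tags = tag_tokens("a red cat5 cable with rj45 plug")
```

Tokens tagged this way would be the input to parsing; unknown tokens such as `plug` show where either dictionary would need to be extended.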

[0062] In addition, the content engine 110 receives input from the NML module 140, which contains an NML model specific to the application or domain for which the system is being used. The applicati...

Abstract

A method and system for converting plain text into structured data. Parse trees for the plain text are generated based on the grammar of a natural language, and the parse trees are mapped onto instance trees generated based on an application-specific model. The best map is chosen, and the instance tree is passed to an application for execution. The method and system can be used for populating a database, for retrieving data from a database based on a query, or both.
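The "map candidate parses, then choose the best map" step can be illustrated with a small Python sketch. The tree type `Node`, the toy application model `MODEL`, and the scoring rule (count how many child nodes the model accepts) are all assumptions made for this example; the abstract does not say how the best map is scored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Minimal tree node used for both parse trees and instance trees."""
    label: str
    children: list = field(default_factory=list)


# Hypothetical application-specific model: object type -> allowed attributes.
MODEL = {"printer": {"color", "speed"}, "paper": {"size"}}


def map_parse(parse: Node) -> Optional[tuple]:
    """Map one parse tree to (instance_tree, score).

    Returns None when the root is not an object of the model.
    The score is an assumed heuristic: the number of children kept.
    """
    allowed = MODEL.get(parse.label)
    if allowed is None:
        return None
    kept = [child for child in parse.children if child.label in allowed]
    return Node(parse.label, kept), len(kept)


def best_instance(parses: list) -> Optional[Node]:
    """Map every candidate parse and return the highest-scoring instance tree."""
    mapped = [m for m in (map_parse(p) for p in parses) if m is not None]
    if not mapped:
        return None
    return max(mapped, key=lambda pair: pair[1])[0]


candidates = [
    Node("paper", [Node("color")]),
    Node("printer", [Node("color"), Node("speed")]),
    Node("desk"),
]
best = best_instance(candidates)
```

The chosen instance tree is what would be handed to the application, either to create a database entry or to drive a database query.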

Description

[0001] A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

REFERENCE TO A COMPUTER PROGRAM LISTING APPENDIX

[0002] A computer program listing appendix is included in the attached CD-R created on Dec. 12, 2000, labeled “Creation of Structured Data from Plain Text,” and including the following files: CommodityProperty.nml (13 KB), DefaultSeg14Result.xml (2 KB), ElectricalProperty.nml (16 KB), Example.txt, Grammar.txt, INML.xml (5 KB), MeasurementProperty.nml (22 KB), Output.txt (3 KB), PeriodProperty.nml (6 KB), PhysicalProperty.nml (36 KB), ReservedNameProperty.nml (6 KB), Seg14.nml (30 KB), Seg14Phrasing.nml (71 KB), UsageProperty.nml (7 KB), and Utility.nml (6 KB). These fil...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/27, G06F17/30, G06F40/143
CPC: G06F17/2229, G06F17/2247, G06F17/271, Y10S707/99942, G06F17/30569, Y10S707/99943, Y10S707/99936, G06F17/2785, G06F16/258, G06F40/131, G06F40/211, G06F40/30, G06F40/143
Inventor: SALDANHA, ALEXANDER; MCGEER, PATRICK C.; CARIONL, LUCA
Owner: ARIBA INC