XML Database Mixed Structural-Textual Classification System

A classification system and database technology in the field of XML database mixed structural-textual classification, addressing the inability of existing techniques to properly analyze certain data and the frequent need to determine qualities of data.

Status: Inactive. Publication date: 2007-06-14
MARKLOGIC CORP
82 cites; cited by 11

AI Technical Summary

Problems solved by technology

One problem with data analysis is that qualities of data often need to be determined for classification, comparison or other analytical purposes.
While many comparison and similarity-measuring techniques have been developed, most are unsuitable for properly analyzing certain data, such as the structured text found in an XML document.



Embodiment Construction

[0037] This detailed description illustrates some embodiments of the invention and variations thereof, but should not be taken as a limitation on the scope of the invention. In this description, structured documents are described, along with their processing, storage and use, with XML being the primary example. However, it should be understood that the invention might find applicability in systems other than XML systems, whether they are later-developed evolutions of XML or entirely different approaches to structuring data.

Overview

[0038] Systems for generating and managing XML databases are described in Lindblad I-A. The nodes may be of any type, such as element nodes, attribute nodes, text nodes, processing instruction nodes or comment nodes. The notation u(n) is used herein to indicate an update operation u applied to the node n. “Elements” are generally understood in the context of XML documents, but would also apply where the data being manipulated is other than XML documents...
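The node types listed above are the usual kinds of the XML data model. As an illustration only (not part of the described system), the following Python 3.8+ sketch uses the standard library's `xml.etree.ElementTree` to count element, attribute, text, processing-instruction and comment nodes in a fragment. The helper name `node_kinds` is hypothetical. ElementTree keeps comments and processing instructions only when they appear inside the root element, and it stores text as `.text`/`.tail` strings rather than as separate node objects, so the text count here approximates a node-based model.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def node_kinds(xml_text):
    """Count the node kinds in an XML fragment (hypothetical illustrative helper)."""
    # Keep comments and processing instructions that occur inside the root element.
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    parser = ET.XMLParser(target=builder)
    parser.feed(xml_text)
    root = parser.close()

    kinds = Counter()
    for node in root.iter():
        if node.tag is ET.Comment:
            kinds["comment"] += 1
        elif node.tag is ET.ProcessingInstruction:
            kinds["processing-instruction"] += 1
        else:
            kinds["element"] += 1
            kinds["attribute"] += len(node.attrib)
            # Non-whitespace text before the first child acts as a text node.
            if node.text and node.text.strip():
                kinds["text"] += 1
        # Text after a node (its "tail") belongs to the parent as another text node.
        if node.tail and node.tail.strip():
            kinds["text"] += 1
    return kinds


if __name__ == "__main__":
    print(node_kinds("<doc a='1'><!-- note --><?pi data?><p>hi</p>tail</doc>"))
```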


Abstract

One aspect of the present invention is a system for classifying element nodes in a subtree-structured XML database. The XQE structural-textual classification system is sensitive to both the textual resemblance and the structural resemblance between document elements. The XQE structural-textual classification system might use the XQE parent-child index described in Lindblad II-A to form vectors of “terms” that encode both the structural and the textual content of XML elements. A classifier processes the element vectors to create class prototype vectors, which can then be used to classify elements as they are added to the database.
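The abstract describes the pipeline only in outline: encode each element as a vector of structural and textual terms, derive class prototype vectors, then classify new elements against those prototypes. The Python sketch below is one hedged illustration of that idea, not MarkLogic's XQE implementation. Its term scheme (`struct:` parent-child tag pairs, `text:` words, `mixed:` words paired with their enclosing tag), the use of normalized centroids as prototypes (a Rocchio-style classifier), and cosine similarity as the matching measure are all assumptions. The function names are hypothetical, and parent-child pairs are read directly from an `ElementTree` rather than from the parent-child index of Lindblad II-A.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def element_terms(elem):
    """Hypothetical term scheme mixing structure and text for one element subtree."""
    terms = Counter()
    for node in elem.iter():
        # Structural terms: one per parent-child tag pair.
        for child in node:
            terms[f"struct:{node.tag}/{child.tag}"] += 1
        # Text directly inside this node: leading text plus each child's tail.
        chunks = [node.text or ""] + [child.tail or "" for child in node]
        for word in " ".join(chunks).lower().split():
            terms[f"text:{word}"] += 1             # purely textual term
            terms[f"mixed:{node.tag}/{word}"] += 1  # word tied to its enclosing tag
    return terms


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length (empty if all zero)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def train_prototypes(labeled):
    """Build one prototype per class as the normalized centroid of its members."""
    sums = defaultdict(Counter)
    for elem, label in labeled:
        for t, v in normalize(element_terms(elem)).items():
            sums[label][t] += v
    return {label: normalize(vec) for label, vec in sums.items()}


def classify(elem, prototypes):
    """Assign the class whose prototype is most cosine-similar to the element."""
    vec = normalize(element_terms(elem))
    return max(prototypes, key=lambda label: dot(vec, prototypes[label]))


training = [
    (ET.fromstring("<recipe><title>Pancakes</title><step>mix flour and eggs</step></recipe>"), "recipe"),
    (ET.fromstring("<recipe><title>Omelette</title><step>whisk eggs</step></recipe>"), "recipe"),
    (ET.fromstring("<memo><to>Staff</to><body>meeting moved to friday</body></memo>"), "memo"),
]
prototypes = train_prototypes(training)
new = ET.fromstring("<recipe><title>Crepes</title><step>mix eggs and milk</step></recipe>")
print(classify(new, prototypes))  # prints: recipe
```

Because the `mixed:` terms pair each word with its tag, two elements that share vocabulary but place it under different tags score lower than elements that match in both respects. This is one simple way to approximate the joint structural and textual sensitivity the abstract describes.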

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. Patent No. ______ (filed as U.S. patent application Ser. No. 10/461,935 (Attorney Docket No. 021512-000410)), which claims the benefit of U.S. Provisional Application No. 60/388,714, filed Jun. 13, 2002, entitled “XML DATABASE MIXED STRUCTURAL-TEXTUAL CLASSIFICATION SYSTEM.” The entire disclosures of these applications are incorporated herein by reference for all purposes.

[0002] The present disclosure is related to the following commonly assigned co-pending U.S. Patent Applications:

[0003] Application Ser. No. 10/462,100 (Attorney Docket No. 021512-000110US), filed on the same date as the present application, entitled “A SUBTREE STRUCTURED XML DATABASE” (hereinafter “Lindblad I-A”);

[0004] Application Ser. No. 10/462,019 (Attorney Docket No. 021512-000210US), filed on the same date as the present application, entitled “PARENT-CHILD QUERY INDEXING FOR XML DATABASES” (hereinafter “Lindblad II-A”); ...


Application Information

Patent Type & Authority: Applications (United States)
IPC (8): G06F17/30; G06F7/00
CPC: G06F17/3071; G06F17/30923; G06F17/30929; Y10S707/99943; G06F16/355; G06F16/835; G06F16/83
Inventors: LINDBLAD, CHRISTOPHER; PEDERSEN, PAUL
Owner: MARKLOGIC CORP