825 results about "Semi-structured data indexing" patented technology

Visual and interactive wrapper generation, automated information extraction from Web pages, and translation into XML

A method and a system for information extraction from Web pages formatted with markup languages such as HTML [8]. A method and system for interactively and visually describing information patterns of interest based on visualized sample Web pages [5,6,16-29]. A method and data structure for representing and storing these patterns [1]. A method and system for extracting information corresponding to a set of previously defined patterns from Web pages [2], and a method for transforming the extracted data into XML are described. Each pattern is defined via the (interactive) specification of one or more filters. Two or more filters for the same pattern contribute disjunctively to the pattern definition [3], that is, an actual pattern describes the set of all targets specified by any of its filters. A method and system for extracting relevant elements from Web pages by interpreting and executing a previously defined wrapper program of the above form on an input Web page [9-14] and producing as output the extracted elements represented in a suitable data structure. A method and system for automatically translating said output into XML format by exploiting the hierarchical structure of the patterns and by using pattern names as XML tags is described.
Owner:LIXTO SOFTWARE
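
The disjunctive role of filters and the use of pattern names as XML tags can be illustrated with a minimal Python sketch. It is not the patented wrapper language: the dictionary layout, the helper names (`make_pattern`, `extract`, `to_xml`) and the predicate-style filters are assumptions made for illustration, and the output is flat rather than following a pattern hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical representation: a pattern is a name plus a list of filters,
# and each filter is a predicate over candidate text fragments.
def make_pattern(name, filters):
    return {"name": name, "filters": filters}

def extract(pattern, candidates):
    # Filters contribute disjunctively: a candidate is a target
    # if ANY filter of the pattern accepts it.
    return [c for c in candidates if any(f(c) for f in pattern["filters"])]

def to_xml(root_name, patterns, candidates):
    # Each extracted target becomes an element tagged with its pattern name.
    root = ET.Element(root_name)
    for p in patterns:
        for target in extract(p, candidates):
            ET.SubElement(root, p["name"]).text = target
    return ET.tostring(root, encoding="unicode")

price = make_pattern(
    "price",
    [lambda s: s.startswith("$"), lambda s: s.endswith("EUR")],
)
xml_out = to_xml("offer", [price], ["$10", "12 EUR", "blue"])
print(xml_out)
```

Here both "$10" and "12 EUR" are targets of the `price` pattern because each satisfies one of its two filters, while "blue" satisfies neither.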

Network based classified information systems

A system for automatically creating databases containing industry, service, product and subject classification data, contact data, geographic location data (CCG-data) and links to web pages from HTML, XML or SGML encoded web pages posted on computer networks such as the Internet or Intranets. The web pages containing HTML, XML or SGML encoded CCG-data, database update controls and web browser display controls are created and modified by using simple text editors, HTML, XML or SGML editors or purpose built editors. The CCG databases may be searched for references (URLs) to web pages by use of enquiries which reference one or more of the items of the CCG-data. Alternatively, enquiries referencing the CCG-data in the databases may supply contact data without web page references. Data duplication and coordination is reduced by including in the web page CCG-data display controls which are used by web browsers to format for display the same data that is used to automatically update the databases.
Owner:HANGER SOLUTIONS LLC +1

Indexing XML datatype content system and method

Storing and querying XML data in a primary table or document utilizes an index of XML data and includes creating a primary table structure, creating a primary XML index commensurate with the primary table structure, populating the primary table and the primary XML index, and running a query on the XML data in a primary table by utilizing the XML index. The XML index can be implemented as a node table. The node table may have a B+-tree structure and be populated by shredding the XML values in the primary table. The XML data may be stored as binary large objects in an XML column of the primary table. Secondary XML indexes may be created to assist in the search and retrieval of XML data stored in the primary table. Both the primary XML index and the secondary XML index tables may be created using data definition language statements.
Owner:MICROSOFT TECH LICENSING LLC
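
As a rough illustration of "shredding" XML values into a node table, the following Python sketch emits one row per element, keyed by the primary key of the owning row and an ordinal path. The row layout and the `shred` helper are assumptions, and a sorted list stands in for the B+-tree ordering mentioned in the abstract.

```python
import xml.etree.ElementTree as ET

def shred(pk, xml_text):
    """Return node-table rows (pk, ordinal_path, tag, value) for one XML value."""
    rows = []

    def walk(elem, path):
        text = (elem.text or "").strip()
        rows.append((pk, path, elem.tag, text or None))
        for i, child in enumerate(elem, start=1):
            walk(child, path + (i,))

    walk(ET.fromstring(xml_text), (1,))
    return rows

# Hypothetical primary table: primary key -> XML column value.
primary_table = {1: "<book><title>XML</title><year>2004</year></book>"}

# Sorting by (pk, ordinal_path) keeps rows in document order per XML value.
node_table = sorted(
    row for pk, doc in primary_table.items() for row in shred(pk, doc)
)

# A query over the XML data can then be answered from the node table alone.
titles = [value for (_, _, tag, value) in node_table if tag == "title"]
```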

Data collection system having reconfigurable data collection terminal

There is provided in one embodiment a data collection system including a data collection terminal having an encoded information reader device and a computer spaced apart from the data collection terminal. The data collection terminal in one embodiment can be configured to be responsive to configuration data expressed in an extensible markup language. The computer in one embodiment can use an existing extensible markup language document to create a data entry screen to receive desired parameter settings for the data collection terminal within data entry fields of the data entry screen. The computer can further combine the extensible markup language document with the desired parameter settings to create configuration data and can initiate a transfer of the configuration data to the data collection terminal. The computer in one embodiment can be used to create for transfer to the data collection terminal a data package including file data corresponding to one or more selected files, together with additional data. The system provided can be used to transfer data, including but not limited to configuration data, between computers that are not data collection terminals and which are devoid of encoded information reader devices.
Owner:HAND HELD PRODS

Stubbing systems and methods in a data replication environment

Stubbing systems and methods are provided for intelligent data management in a replication environment, such as by reducing the space occupied by replication data on a destination system. In certain examples, stub files or like objects replace migrated, de-duplicated or otherwise copied data that has been moved from the destination system to secondary storage. Access is further provided to the replication data in a manner that is transparent to the user and/or without substantially impacting the base replication process. In order to distinguish stub files representing migrated replication data from replicated stub files, priority tags or like identifiers can be used. Thus, when accessing a stub file on the destination system, such as to modify replication data or perform a restore process, the tagged stub files can be used to recall archived data prior to performing the requested operation so that an accurate copy of the source data is generated.
Owner:COMMVAULT SYST INC

Mapping and query service between object oriented programming objects and deep key-value data stores

A mapping and query service for mapping between object-oriented programming objects and deep key-value data stores. The service implements a store operation that supports the storage, in a deep key-value data store, of a set of one or more objects having classes and fields written in source code of an object-oriented programming language.
Owner:SALESFORCE COM INC

Method for scalable, fast normalization of XML documents for insertion of data into a relational database

Disclosed is a method of transferring data from a hierarchical file (having a hierarchical structure, e.g., a markup language file) to a relational database structure (made up of columns and rows). Before processing the actual data, the invention first partitions the hierarchical structure into sections, where each section is dedicated to at least one node of the hierarchical structure. The partitioning process is based on the document type definition file, which is separate from, and different than the hierarchical file. After completing the partitioning, the invention then parses the actual data contained in the hierarchical data file to produce a stream of data pairs and end of section indicators. During the data parsing process, the invention loads the data pairs into corresponding “sections” (created prior to the parsing process) as the data pairs are output from the parsing process. The invention also transfers the node data from these sections to the columns and rows of the relational database structure.
Owner:IBM CORP

Parent-child query indexing for XML databases

A method for processing queries for a document of elements is provided. The document includes a plurality of subsections where each subsection includes at least a portion of elements in the document. The method comprises: receiving a query for a path of elements in the document of elements; determining a plurality of step queries from the query, each step query including at least a part of the path of elements; for each step query in the plurality of step queries, determining one or more subsections that include elements that correspond to a step query; and determining at least one subsection that includes the path of elements of the query. A result for the query is generated using the at least one subsection.
Owner:MARKLOGIC
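
The step-query idea can be sketched in Python as follows. This is only an illustration under stated assumptions: each step query is taken to be a (parent, child) pair of adjacent tags on the path, the index maps such pairs to subsection identifiers, and the function names are hypothetical. The intersection yields candidate subsections; a real system would still need to confirm that the steps connect into one path inside each candidate.

```python
from collections import defaultdict

def build_index(subsections):
    """Map each (parent, child) pair to the subsections that contain it."""
    index = defaultdict(set)
    for sid, pairs in subsections.items():
        for pair in pairs:
            index[pair].add(sid)
    return index

def step_queries(path):
    """Split '/a/b/c' into step queries [('a', 'b'), ('b', 'c')]."""
    tags = path.strip("/").split("/")
    return list(zip(tags, tags[1:]))

def candidate_subsections(index, path):
    """Intersect the subsections matched by every step query."""
    sets = [index.get(step, set()) for step in step_queries(path)]
    return set.intersection(*sets) if sets else set()

# Hypothetical subsections, each listing the parent-child pairs it contains.
subsections = {
    "s1": {("book", "chapter"), ("chapter", "title")},
    "s2": {("book", "chapter")},
    "s3": {("chapter", "title")},
}
index = build_index(subsections)
result = candidate_subsections(index, "/book/chapter/title")
```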

Index structure of metadata, method for providing indices of metadata, and metadata searching method and apparatus using the indices of metadata

An index structure of metadata provided for searching for information on contents and a method for providing indices of the metadata, and a method and an apparatus for searching for the metadata using the index structure of the metadata are provided, in which the index structure of the metadata includes values of multi-keys and identification information of the metadata corresponding to the value of the multi-key, wherein the multi-keys are structured by combination of predetermined fields of the metadata.
Owner:SAMSUNG ELECTRONICS CO LTD
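
A multi-key index of this kind can be pictured as a mapping from a combination of field values to the identifiers of the matching metadata. The Python sketch below assumes dictionary-shaped metadata records with an `id` field; the field names and the `build_multikey_index` helper are illustrative, not taken from the patent.

```python
def build_multikey_index(records, fields):
    """Index metadata by the combined values of the chosen fields."""
    index = {}
    for rec in records:
        key = tuple(rec[f] for f in fields)      # the multi-key value
        index.setdefault(key, []).append(rec["id"])
    return index

# Hypothetical content metadata.
records = [
    {"id": "m1", "genre": "news", "language": "en"},
    {"id": "m2", "genre": "news", "language": "en"},
    {"id": "m3", "genre": "drama", "language": "ko"},
]
index = build_multikey_index(records, ("genre", "language"))

# A search on both fields is a single lookup on the combined key.
hits = index.get(("news", "en"), [])
```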

Efficient type annotation of XML schema-validated XML documents without schema validation

Inactive · US20050177578A1 · Efficiently annotate either entire XML document and XML · Digital data processing details · Semi-structured data indexing · XML schema · Paper document
Type annotation record information storage for annotated automaton encoding for high-performance XML schema validation is optimized for space efficiency. Once the type annotation record information is organized, type annotation records are used for type annotation of validated XML documents, either by implementing only the annotation records and the type annotation part of an algorithm, or by skipping one or more validation steps in a full validation implementation. Given a schema context, a type annotation may be performed for a validated XML fragment as opposed to an entire document. In addition, default features such as attribute and type are supported.
Owner:IBM CORP

Parallel tree searches for matching multiple, hierarchical data structures

Methods and systems in a data-processing system for matching data contained in a hierarchical data tree structure. One or more sets of data contained within a first data tree structure can be associated with one or more sets of data contained within a second data tree structure, such that the data associated with the first data tree structure is generally utilized to process the data associated with the second data tree structure. The first data tree structure can then be compared in parallel to the second data tree structure beginning with a first root thereof to thereby identify data similarities between the first and second data tree structures based on predefined search criteria. Finally, one or more matching sets of data between the first data tree structure and the second data tree structure can be identified, in response to comparing the first data structure to the second data structure.
Owner:IBM CORP +1

Electronic information management system for abstracting and reporting document information

An electronic information management system abstracts and reports document information. The system utilizes a computer communicating with an electronic database. An input device is operatively connected to the computer for entering document information into the database. A plurality of electronic indexing tags are provided at predetermined locations within the stored document information and cooperate with a computer program to capture and retrieve selected portions of the document information. A display device is operatively connected to the computer for displaying the selected portions of document information apart from non-selected portions of document information.
Owner:AMTDIRECT HLDG LLC

Systems and methods for generating longitudinal data profiles from multiple data sources

A computer-implemented method for generating a longitudinal data profile from multiple disparate data sources is provided. The method includes storing, at a central data hub, first de-identified data received from a first data source, the first de-identified data including a plurality of data records having encrypted identifying data and an anonymous ID assigned to each record, wherein the anonymous ID is assigned based on a master list that includes a list of identifiers and corresponding anonymous IDs for each identifier. The method further includes storing second de-identified data received from a second data source, and storing third de-identified data received from a third data source. The method further includes processing the first, second, and third de-identified data to link the first, second, and third de-identified data using the anonymous ID, and generating the longitudinal data profile from the linked first, second, and third de-identified data.
Owner:ABBVIE INC

Method and apparatus for semantic search of schema repositories

Mechanisms for searching XML repositories for semantically related schemas from a variety of structured metadata sources, including web services, XSD documents and relational tables, in databases and Internet applications. A search is formulated as a problem of computing a maximum matching in pairwise bipartite graphs formed from query and repository schemas. The edges of such a bipartite graph capture the semantic similarity between corresponding attributes of the schema based on their name and type semantics. Tight upper and lower bounds are also derived on the maximum matching that can be used for fast ranking of matchings whilst still maintaining specified levels of precision and recall. Schema indexing is performed by ‘attribute hashing’, in which matching schemas of a database are found by indexing using query attributes, performing lower bound computations for maximum matching and recording peaks in the resulting histogram of hits.
Owner:IBM CORP
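
The matching formulation can be sketched in Python with a standard augmenting-path algorithm for maximum bipartite matching. The similarity test below (same type, same name ignoring case and underscores) is a deliberately crude stand-in for the name and type semantics described in the abstract, and the bounds and attribute hashing are not shown; all function names are hypothetical.

```python
def similar(a, b):
    # Hypothetical similarity test on (name, type) attribute pairs.
    norm = lambda s: s.replace("_", "").lower()
    return a[1] == b[1] and norm(a[0]) == norm(b[0])

def max_matching(query, repo):
    """Size of a maximum matching between query and repository attributes."""
    match = {}  # repo attribute index -> query attribute index

    def try_assign(q, seen):
        for r, attr in enumerate(repo):
            if r not in seen and similar(query[q], attr):
                seen.add(r)
                # Take a free attribute, or re-route its current partner.
                if r not in match or try_assign(match[r], seen):
                    match[r] = q
                    return True
        return False

    return sum(try_assign(q, set()) for q in range(len(query)))

query_schema = [("cust_id", "int"), ("name", "str"), ("zip", "str")]
repo_schema = [("CustID", "int"), ("Name", "str"), ("phone", "str")]

size = max_matching(query_schema, repo_schema)
score = size / len(query_schema)  # one possible ranking score
```

Repository schemas could then be ranked by `score`, with higher values indicating that more query attributes found a counterpart.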

Method and system for storing, retrieving, and managing data for tags

This invention relates generally to a method and system for storing, retrieving, and managing data for tags that are associated in some manner with any type of object. More particularly, the present invention writes data to these tags, reads data from these tags, and manages data that is written to and/or read from these tags. In addition, the invention accesses and/or stores data associated with tags from or into repositories, constructs and maintains data structures from these repositories and responds to queries using the data structures.

Indexing profile for efficient and scalable XML based publish and subscribe system

The present invention provides a system and method for the efficient indexing and delivery of information to interested users who have expressed an interest in or “subscribed” to information items that are continuously released or “published” by some data source in XML format. Previously, publish and subscribe systems accepted keyword-based subscription profiles and did not support subscription to XML documents according to their structures. A direct approach to implementing an XML-based publish and subscribe system, checking each user profile against an XML document, is very time consuming. The present invention, though, provides an efficient method to identify interested subscribers for each XML document by indexing queries utilizing a graphical structure of nodes. When an XML document is published, the index identifies all matched expressions in the index and delivers at least a portion of an XML document to a user who has expressed an interest in receiving this information.
Owner:THE BOEING CO
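
One way to picture indexing the subscriptions rather than scanning every profile is a trie of path expressions, walked once per published document. The Python sketch below is an assumption-laden simplification: subscriptions are plain root-anchored paths without wildcards or predicates, the nested-dictionary trie stands in for the patent's graph of nodes, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

def add_subscription(trie, path, user):
    """Insert a path such as '/news/sports' into the shared query index."""
    node = trie
    for tag in path.strip("/").split("/"):
        node = node.setdefault("children", {}).setdefault(tag, {})
    node.setdefault("users", set()).add(user)

def match(trie, xml_text):
    """Return every user whose path expression occurs in the document."""
    matched = set()

    def walk(elem, node):
        child = node.get("children", {}).get(elem.tag)
        if child is None:
            return  # no subscription continues along this branch
        matched.update(child.get("users", ()))
        for sub in elem:
            walk(sub, child)

    walk(ET.fromstring(xml_text), trie)
    return matched

trie = {}
add_subscription(trie, "/news/sports", "alice")
add_subscription(trie, "/news/finance/stocks", "bob")
add_subscription(trie, "/news", "carol")

subscribers = match(trie, "<news><sports/></news>")
```

Because subscriptions that share a prefix share trie nodes, the document is traversed once regardless of how many profiles are registered.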

Indexing and querying semi-structured data

Generating an inverted index is disclosed. Semi-structured data from a plurality of sources is parsed to extract structure from at least a portion of the semi-structured data. The inverted index is generated using the extracted structure. The inverted index includes a location identifier and a data type identifier for one or more entries of the inverted index.
Owner:VMWARE INC
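
A minimal Python sketch of such an inverted index follows. It assumes the semi-structured input is log-like text made of `key=value` fields, treats the location identifier as a (source, line number, field name) triple, and infers the data type identifier as either "int" or "str"; none of these choices come from the patent.

```python
from collections import defaultdict

def infer_type(value):
    # Hypothetical data type identifier: "int" if the value parses, else "str".
    try:
        int(value)
        return "int"
    except ValueError:
        return "str"

def build_inverted_index(sources):
    """Map each value to postings of (location identifier, data type identifier)."""
    index = defaultdict(list)
    for source, lines in sources.items():
        for lineno, line in enumerate(lines, start=1):
            for field in line.split():
                key, sep, value = field.partition("=")
                if not sep:
                    continue  # unstructured token: no key=value shape to extract
                location = (source, lineno, key)
                index[value].append((location, infer_type(value)))
    return index

sources = {
    "app.log": ["user=alice status=200", "user=bob status=404"],
    "db.log": ["user=alice rows=3"],
}
index = build_inverted_index(sources)
alice_postings = index["alice"]
```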

Structured-document search apparatus and method, recording medium storing structured-document searching program, and method of creating indexes for searching structured documents

A query in which a sibling relationship among document parts, which are elements of a structured document, can be designated as a search condition is input, and a query tree which represents the query in a tree structure is created. A query converting unit refers to a hierarchical index in which a hierarchical relationship among document parts of each structured document to be searched is expressed in a tree structure, and converts the query tree to a Boolean expression. A text-index referring unit refers to a text index in which is registered information representing a relationship between each set including a character string in text data and a part-ID of a meta part and a document-ID of a document, thereby searching a document corresponding to the Boolean expression converted from the query tree.
Owner:FUJITSU LTD

Automated tagging of objects in databases

Embodiments of the present invention provide systems and methods for automatically generating tag terms (or tags) for objects in databases of a web site. The metadata of the objects (or data) of the web site are processed and parsed to automatically generate tag terms for the corresponding objects. Information (or data, or content) downloaded from the Internet often comes with metadata, which can exist in titles, description, sources, and authors of the information, etc. The metadata of downloaded information can be processed and parsed to generate tag terms for the corresponding objects. The system can automatically generate tag terms for the data, which are stored as objects in the databases, and make the data (or objects) searchable. In addition, the automatically generated tag terms allow associated data to maintain their relationship. For example, data from the same sources, same authors, or same subjects can be identified based on the common tag terms. Automatically generated tag terms enable searching and association of data (or objects) in databases in a web site.
Owner:R2 SOLUTIONS

Keyword based evaluation expert intelligent search and recommendation method

The invention discloses a keyword-based intelligent search and recommendation method for evaluation experts. The method comprises: step 1, segmenting the main text of the expert information into substring sequences, performing word segmentation with ICTCLAS (the Chinese Academy of Sciences word segmentation system), and filtering stop words from the segmentation result to obtain the word collection; step 2, extracting feature words of the expert information according to fields; step 3, building an expert knowledge representation model based on the fields and the weights of the feature words and establishing an expert information index database; step 4, performing automatic prompting according to a search term thesaurus when a user inputs keywords, while updating the search term thesaurus in real time through a search term counter; step 5, calculating the search relevance between the keywords and the expert information based on the semantic information and the like; step 6, listing relevant experts from high to low according to the matching degree. With this method, intelligent full-text search and recommendation of expert information can be achieved through keyword input, so that experts matching a pending science and technology project can be found accurately.
Owner:HANGZHOU DIANZI UNIV

Index structure for supporting structural XML queries

The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).
Owner:IBM CORP
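
The equivalence between querying and non-contiguous subsequence matching can be illustrated with a small Python sketch. It assumes a structure-encoded sequence is the preorder list of (tag, prefix) pairs, where the prefix is the path of ancestor tags; it ignores element content and wildcards, uses a linear scan instead of the B+Tree-backed index, and does not guard against the false positives a plain subsequence test can produce on some branching queries.

```python
import xml.etree.ElementTree as ET

def encode(xml_text):
    """Preorder sequence of (tag, prefix), prefix being the ancestor path."""
    seq = []

    def walk(elem, prefix):
        seq.append((elem.tag, prefix))
        for child in elem:
            walk(child, prefix + (elem.tag,))

    walk(ET.fromstring(xml_text), ())
    return seq

def is_subsequence(query_seq, doc_seq):
    """True if query_seq occurs in doc_seq in order, not necessarily contiguously."""
    it = iter(doc_seq)
    # 'item in it' advances the iterator, so order is respected.
    return all(item in it for item in query_seq)

doc = encode("<P><S><N/><L/></S><B><L/></B></P>")
hit = is_subsequence(encode("<P><B><L/></B></P>"), doc)
miss = is_subsequence(encode("<P><B><N/></B></P>"), doc)
```

The first query matches because its three (tag, prefix) pairs appear in order within the document sequence; the second does not, since the document has no `N` element under `P/B`.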

Associating objects in databases by rate-based tagging

Embodiments of the present invention provide automatic systems and methods for associating objects in databases of a web site by rate-based tagging. The frequencies of users entering specific tag terms for objects stored in the databases of the web site are used to determine hard associations between objects and tag terms and between objects. When the frequencies of user tags exceed established thresholds, hard associations between objects and tag terms are established. When objects are identified or determined to have hard association with tag terms, the objects are determined to be more clearly associated with the corresponding tag terms. Therefore, they should be highlighted or featured in more prominent locations on web pages of the web site to increase users' confidence in content of the web site. To identify hard-associated objects, more weight can be assigned to the hard-associated objects, which makes them more likely to be selected for display in prominent locations. In addition, objects that are determined to have hard associations with tag terms can also have hard associations with one another due to the common tag terms they share. The hard association relationship between objects can be displayed through links to associated objects when an object is selected for display.
Owner:R2 SOLUTIONS
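
The threshold rule can be sketched in a few lines of Python. The event format (one (object, tag) pair per user tagging action), the single global threshold and the function names are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def hard_associations(tag_events, threshold):
    """(object, tag) pairs whose user-entry frequency exceeds the threshold."""
    counts = Counter(tag_events)
    return {pair for pair, n in counts.items() if n > threshold}

def objects_by_hard_tag(hard_pairs):
    """Objects sharing a hard-associated tag are associated with one another."""
    groups = defaultdict(set)
    for obj, tag in hard_pairs:
        groups[tag].add(obj)
    return dict(groups)

# Hypothetical tagging events entered by users.
events = (
    [("photo1", "sunset")] * 3
    + [("photo2", "sunset")] * 3
    + [("photo1", "beach")]
)
hard = hard_associations(events, threshold=2)
linked = objects_by_hard_tag(hard)
```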

Element query method and system

Methods, systems, and computer-readable media for representing and querying positional information for a hierarchical document (such as an XML document) are disclosed. In one set of embodiments, at least one word in the hierarchical document is associated with one or more word positions, and at least one element in the hierarchical document is associated with one or more word position ranges. The word positions and word position ranges are analyzed to determine whether a particular word or phrase is a direct or indirect descendant of a particular element in the hierarchical document. In various embodiments, the word positions are indexed in a first index and the word position ranges are indexed in a second index. Thus, the analysis may be efficiently performed by intersecting the first and second indexes. In further embodiments, the word position ranges may be encoded in a space efficient format for storage or transmittal.
Owner:MARKLOGIC
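
Intersecting the two indexes can be sketched as a range lookup over sorted word positions. In the Python example below, the index contents, the inclusive (start, end) range convention and the `words_in_element` helper are assumptions; the space-efficient range encoding mentioned in the abstract is not shown.

```python
import bisect

def words_in_element(word_index, range_index, word, element):
    """Positions of `word` that fall inside any range of `element`."""
    positions = word_index.get(word, [])          # sorted word positions
    hits = []
    for start, end in range_index.get(element, []):
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        hits.extend(positions[lo:hi])
    return hits

# First index: word -> sorted word positions in the document.
word_index = {"indexing": [3, 17], "xml": [5]}
# Second index: element -> inclusive word position ranges it spans.
range_index = {"title": [(1, 6)], "body": [(7, 20)]}

in_title = words_in_element(word_index, range_index, "indexing", "title")
in_body = words_in_element(word_index, range_index, "indexing", "body")
```

A non-empty result means the word is a direct or indirect descendant of the element, since any nested element's range lies within its ancestor's range.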

Automated method for detecting and repairing configuration conflicts in a content management system

Embodiments of the invention provide for detecting and (in at least some cases) repairing XML configuration conflicts in a content management system (CMS). One method allows a CMS to evaluate various configuration components and determine when those components may conflict with one another. If a conflict is detected, the CMS may be configured to notify an administrator of the problem, and in some cases, correct the problem. As a result, administrators may not have to carefully evaluate each configuration file associated with a document type definition for a given document type before creating or modifying a content processing rule.
Owner:INT BUSINESS MASCH CORP