782 results about "Structured document" patented technology

A structured document is an electronic document in which some method, such as markup or embedded coding, is used to identify the whole document and its parts as having meanings beyond their formatting. For example, a structured document might identify a certain portion as a "chapter title" (or "code sample" or "quatrain") rather than as "Helvetica bold 24" or "indented Courier". Such portions are commonly called "components" or "elements" of a document.
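
As a minimal illustration of this definition, the sketch below parses a small XML fragment in which each part is labeled by its meaning rather than its appearance. The sample document and the element names (`chapter-title`, `paragraph`, `code-sample`) are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names describe meaning, not appearance.
SAMPLE = """<document>
  <chapter-title>Getting Started</chapter-title>
  <paragraph>Install the package first.</paragraph>
  <code-sample>pip install example</code-sample>
</document>"""


def list_components(xml_text):
    """Return (element name, text) pairs for each top-level component."""
    root = ET.fromstring(xml_text)
    return [(child.tag, (child.text or "").strip()) for child in root]


components = list_components(SAMPLE)
for name, text in components:
    print(f"{name}: {text}")
```

A renderer is then free to decide that a `chapter-title` appears in bold 24-point type; the document itself records only the role of each component.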

Document management system with enhanced intelligent document recognition capabilities

Inactive | US20050289182A1 | Enhances document management quality | Improve efficiency | Character and pattern recognition | Office automation | XML | Data extraction
An intelligent document recognition-based document management system includes modules for image capture, image enhancement, image identification, optical character recognition, data extraction and quality assurance. The system captures data from electronic documents as diverse as facsimile images, scanned images and images from document management systems. It processes these images and presents the data in, for example, a standard XML format. The document management system processes both structured document images (ones which have a standard format) and unstructured document images (ones which do not have a standard format). The system can extract images directly from a facsimile machine, a scanner or a document management system for processing.
Owner:SAND HILL SYST

Enhanced transcoding of structured documents through use of annotation techniques

Methods, systems, and computer program products for improving the transcoding operations which are performed on structured documents (such as those encoded in the Hypertext Markup Language, or "HTML") through use of annotations. Source documents may be annotated according to one or more types of annotations. Representative types of annotations direct an annotation engine to perform selective clipping of document content, provide enhanced HTML form support, request node and/or attribute replacement or the insertion of HTML or other rendered markup syntax, and direct a transcoding engine to provide fine-grained transcoding preference support (such as controlling transcoding of tables on a per-row or per-column basis). The disclosed techniques may be used with statically-generated document content and with dynamically-generated content. Annotation is performed as a separate step preceding transcoding, and a modified document resulting from processing annotations may therefore be re-used for multiple different transcoding operations.
Owner:IBM CORP

Integrated retrieval scheme for retrieving semi-structured documents

An integrated retrieval scheme retrieves data contained in a plurality of semi-structured documents scattered over open networks and collects the required information item by item through a unified interface, without regard to differences in the document structures, presentation styles, and elements of the semi-structured documents. The search scheme receives a query consisting of search items and search conditions from a user. Using location data that specifies the location of each semi-structured document, it finds the location of each semi-structured document that contains all search items. If necessary, it converts the item presentation styles of the entered query into those of the located documents according to style conversion data. It then forms queries for the located documents, transmits the queries to the found locations, and obtains the documents. It extracts item data from the obtained documents according to structure data, which is used to delimit a document into items, and attribute data, which is used for conditional retrieval. Finally, it prepares a search result and converts, if necessary, the item presentation styles of the search result into those of each user according to the style conversion data.
Owner:NIPPON TELEGRAPH & TELEPHONE CORP

High-performance extensible document transformation

The present invention provides a method, system, and computer program product for applying transformations to extensible documents, enabling reductions in the processing time required to transform arbitrarily-structured documents having particular well-defined elements. Signatures for structured document types are defined, along with one or more transformations to be performed upon documents of that type. The transformations are specified using syntax elements referred to as maps. A map specifies an operation code for the transformation to be performed, and describes the input and output of the associated transformation. A special map processing engine locates an appropriate transformation object to a particular input document at run-time, and applies the transformation operation according to the map definition. This technique is preferably used for a set of predetermined core transformations, with other transformations being processed using stylesheet engines of the prior art. The input documents may be encoded in the Extensible Markup Language (XML), or in other structured notations. The techniques of the present invention are particularly well suited to use in high-volume and throughput-sensitive environments such as that encountered by business-to-business transaction servers.
Owner:IBM CORP

Accelerated compositing of fixed position elements on an electronic device

A device, system and method are provided for processing structured documents for display. Content of a first viewable portion of the structured document having a fixed position in relation to a viewport is rendered as first rendered image data. Content of a second viewable portion that does not have a fixed position is rendered as second rendered image data. The first and second rendered image data are composited, and a resultant composited image is output for display. In response to a scroll or zoom command applied to the document, and in particular to the second viewable portion, the second rendered image data is updated and composited with the first rendered image data. Compositing can be carried out by a graphics processor separate from a main processor in the electronic device. When no fixed position elements are present in the structured document, the main processor renders the entire content without compositing.
Owner:BLACKBERRY LTD

System and method for dynamically evaluating latent concepts in unstructured documents

A system and method for dynamically evaluating latent concepts in unstructured documents is disclosed. A multiplicity of concepts is extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the document set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.
Owner:NUIX NORTH AMERICA
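
The lexicon and threshold steps in the abstract above can be sketched roughly as follows. This is an illustration only, not the patented method: it assumes that a "concept" is simply a lowercase word, and the function names `build_lexicon` and `select_concepts` are hypothetical.

```python
from collections import Counter
import re


def build_lexicon(documents):
    """Count how often each concept (here: each lowercase word) occurs."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def select_concepts(lexicon, threshold):
    """Return (concept, frequency) pairs at or above the threshold,
    ordered from most to least frequent."""
    return [(term, n) for term, n in lexicon.most_common() if n >= threshold]


docs = ["alpha beta alpha", "beta gamma alpha"]
lexicon = build_lexicon(docs)
selected = select_concepts(lexicon, threshold=2)
print(selected)
```

The clustering and best-fit matrix steps are left out; they would operate on the selected subset rather than on the full lexicon.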

Automated extraction of semantic content and generation of a structured document from speech

Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document.
Owner:MULTIMODAL TECH INC

Interactive User Interface for Converting Unstructured Documents

An interactive interface facilitates the conversion of unstructured documents into XML-compliant documents. A document is parsed to identify fact items in the content of the document. A classifier associates initial labels with the identified fact items, and the fact items and associated initial labels are forwarded to a user for review and correction. An interface executing on a client computer presents the initial labels associated with fact items, and enables a user to correct the labels associated with the identified fact items. Upon receipt of corrected labels from the user, the classifier is trained to update probable associations of labels and fact items in accordance with the corrected labels. The interface enables the user to enter new labels and/or concepts for a taxonomy, and an extension to the taxonomy is automatically generated.
Owner:COMPSCI RESOURCES

Content Profiling to Dynamically Configure Content Processing

Some embodiments provide a method that receives an unstructured document including a number of primitive elements. The method identifies a default set of document reconstruction operations for reconstructing the unstructured document to define a structured document. The method performs at least one of the document reconstruction operations from the default set. Based on results of the performed document reconstruction operations, the method identifies a profile for the unstructured document. The method modifies the set of document reconstruction operations for reconstructing the unstructured document according to the identified profile.
Owner:APPLE INC

Remote operation system, communication apparatus remote control system and document inspection apparatus

This invention relates to a remote operation apparatus that transmits information to a terminal device having a display part for displaying received information. Its purpose is to improve operability when inspecting structured documents, such as Web pages, on a document inspection apparatus with a small screen. The apparatus is equipped with: an input part that inputs various instructions; a display part that displays various information; a communication processing part that obtains, through a network, display information shown on the display screen of the display part; an area recognition processing part that extracts the size of a rectangular area included in a window obtained by the communication processing part and displayed on the display part, together with the display information in that rectangular area; a storage part that stores size information for the display screen of the terminal device's display part; an area change processing part that modifies the size of the rectangular area to the stored display screen size and obtains the display information in the rectangular area; and control means that controls the communication processing part so as to transmit the display information modified by the area change processing part to the terminal device.
Owner:PANASONIC CORP

System and method for efficiently generating cluster groupings in a multi-dimensional concept space

A system and method for efficiently generating cluster groupings in a multi-dimensional concept space is described. A plurality of terms is extracted from each document in a collection of stored unstructured documents. A concept space is built over the document collection. Terms substantially correlated between a plurality of documents within the document collection are identified. Each correlated term is expressed as a vector mapped along an angle θ originating from a common axis in the concept space. A difference between the angle θ for each document and an angle σ for each cluster within the concept space is determined. Each such cluster is populated with those documents for which the difference between the angle θ for the document and the angle σ for the cluster falls within a predetermined variance. A new cluster is created within the concept space for those documents for which this difference falls outside the predetermined variance.
Owner:NUIX NORTH AMERICA
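
The angle-based assignment described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: each document is reduced to a single vector, θ is measured from the first coordinate axis, and each cluster's σ is fixed at the angle of its first member. The names `angle_from_axis` and `assign_clusters` are hypothetical.

```python
import math


def angle_from_axis(vector):
    """Angle in radians between a vector and the first coordinate axis."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("zero vector has no direction")
    cosine = max(-1.0, min(1.0, vector[0] / norm))
    return math.acos(cosine)


def assign_clusters(doc_vectors, variance):
    """Place each document in the first cluster whose angle sigma lies
    within `variance` of the document's angle theta; otherwise open a
    new cluster seeded with that document."""
    clusters = []  # each entry: {"sigma": float, "members": [doc ids]}
    for doc_id, vector in doc_vectors.items():
        theta = angle_from_axis(vector)
        for cluster in clusters:
            if abs(theta - cluster["sigma"]) <= variance:
                cluster["members"].append(doc_id)
                break
        else:
            clusters.append({"sigma": theta, "members": [doc_id]})
    return clusters


docs = {"a": (1.0, 0.0), "b": (1.0, 0.1), "c": (0.0, 1.0)}
clusters = assign_clusters(docs, variance=0.2)
print([c["members"] for c in clusters])
```

Because σ is fixed at the first member's angle in this sketch, the result depends on document order; a fuller implementation might recompute σ as members are added.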

System and method for dynamic generation of structured documents

A method and apparatus for representing complex data schemas and generating type validated output documents in a markup language. The methods apply to transforming document type definitions into extensible markup language coded information that can readily accommodate logical constraints imposed by recursion or repetition within the DTD structure. Furthermore, non-determinism arising from repetition or recursion in a data schema is resolved by traversal path coding using a matrix representation of the data schema.
Owner:STAPEL KEVIN +1

Method, system, program, and data structures for managing structured documents in a database

Provided is a method, system, program and data structures for managing structured documents. Each structured document has at least one element in common and each element is capable of having one defined data object. At least one table is generated based on a schema of elements in the managed structured documents. Further, at least one table is designed to include entries for each element instance in the managed structured documents and at least one object for one element instance. For each element instance in the managed structured documents, one entry is added to at least one table including information on an element identifier for the element instance, the data object for the element instance, and a document identifier for the structured document including the element instance. The at least one table provides an association of the element instance, the at least one data object for the element instance, and the document identifier of the structured document including the element instance.
Owner:IBM CORP
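
A table of the kind described in the abstract above, with one entry per element instance holding an element identifier, a data object, and a document identifier, might look like the following sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the table name, column names, and sample documents are assumptions made for illustration, and the element tag stands in for the element identifier.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_documents(conn, documents):
    """Add one row per element instance: document id, element id, data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS element_instances ("
        "document_id TEXT, element_id TEXT, data_object TEXT)"
    )
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            conn.execute(
                "INSERT INTO element_instances VALUES (?, ?, ?)",
                (doc_id, elem.tag, (elem.text or "").strip()),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_documents(conn, {
    "doc1": "<order><item>pen</item><qty>2</qty></order>",
    "doc2": "<order><item>ink</item></order>",
})
rows = conn.execute(
    "SELECT document_id, data_object FROM element_instances "
    "WHERE element_id = 'item' ORDER BY document_id"
).fetchall()
print(rows)
```

A single query on `element_id` then reaches the same element across every stored document, which is the association the abstract describes.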

Apparatus, method, and program for retrieving structured documents

A system retrieves structured documents based on a first desired concept item, which has first concept items classified hierarchically and subordinated to it, and a second desired concept item, which has second concept items classified hierarchically and subordinated to it. The system generates a table displaying retrieval results and associates groups, each consisting of a desired component, one of the first items classified as first concept items immediately below the first desired concept item, and the desired second concept item, with the column index cells of the table respectively. When a display area in which one of the column index cells is displayed is designated, the system acquires the group associated with the designated area and retrieves, based on the acquired group, the structured documents that each include the desired component containing a value in which one of the first concept items subordinated to that first item and one of the second concept items are included.
Owner:KK TOSHIBA

LDAP-based distributed cache technology for XML

The design, internal data representation, and query model of the invention, a hierarchical distributed caching system for semi-structured documents based on LDAP technology, are presented. The system brings the semi-structured data model and the LDAP data model together to provide the ideal characteristics for efficient processing of XPath queries over XML documents. Transformation algorithms and experimental results are also shown that demonstrate the feasibility of the invention as a distributed caching system especially tailored for semi-structured data.
Owner:MARRON PEDRO JOSE +1

Method, system, and program for preprocessing a document to render on an output device

Provided is a method, program and system for processing a source document in a structured document format including elements providing content to render. A source document and page layout data structure providing formatting properties specifying a layout and format of the content output are received. The source document and the page layout data structure are processed to determine page divisions and formatting properties for the content in the source document. Multiple page objects are generated, wherein each page object includes content and formatting properties for at least one page. The page objects are transmitted to a rasterizer to transform into renderable information capable of being generated by an output device.
Owner:INFOPRINT SOLUTIONS COMPANY LLC

Structured document converting method and data converting method

A technique is provided to decrease the resources required for operations on a structured document, reduce the amount of memory used, and increase processing speed when the structured document is processed. Elements constituting a structured document to be converted are separated into key elements and nonkey elements, and a new element with a predetermined tag name and a predetermined attribute name is created. Tag name conversion creates a tag name character string and describes it as the attribute value corresponding to the predetermined attribute name in the new element. Content conversion creates a content character string that includes the contents of the nonkey elements and describes it as the content of the new element. The key elements are described unchanged in the converted structured document. The method is applied to systems that handle structured documents such as XML.
Owner:FUJITSU LTD

Scalable data extraction techniques for transforming electronic documents into queriable archives

A method for extracting an attribute occurrence from a template-generated semi-structured document comprising multi-attribute data records comprises identifying a first set of attribute occurrences in the template-generated semi-structured document using an ontology. The method further comprises determining a boundary of each multi-attribute data record in the template-generated semi-structured document, learning a pattern for an attribute corresponding to an identified attribute occurrence of the first set in the template-generated semi-structured document, and applying the pattern within the boundary of each multi-attribute data record in the template-generated semi-structured document to extract a second set of attribute occurrences.
Owner:THE RES FOUND OF STATE UNIV OF NEW YORK

Method for extracting, interpreting and standardizing tabular data from unstructured documents

A system, method, and computer program for automatically identifying, parsing, and interpreting tabular data from unstructured documents stored in various formats such as ASCII text, Unicode text, HTML, PDF text, and PDF image format is provided. A set of table identification, parsing/tokenizing, and interpreting/mapping rules are developed with grammar descriptors. These rules are then applied to a set of documents to identify a table, parse the content of the table, and interpret the parsed content, if required, thereby standardizing the tabular data.
Owner:RAGE FRAMEWORKS +1

System and method for applying dynamic contextual grammars and language models to improve automatic speech recognition accuracy

The invention involves the loading and unloading of dynamic section grammars and language models in a speech recognition system. The values of the sections of the structured document are either determined in advance from a collection of documents of the same domain, document type, and speaker; or collected incrementally from documents of the same domain, document type, and speaker; or added incrementally to an already existing set of values. Speech recognition in the context of the given field is constrained to the contents of these dynamic values. If speech recognition fails or produces a poor match within this grammar or section language model, speech recognition against a larger, more general vocabulary that is not constrained to the given section is performed.
Owner:NUANCE COMM INC

Comparing and merging structured documents syntactically and semantically

A method of performing a three-way merge includes receiving first, second, and third versions of a structured document containing first, second, and third pluralities of elements respectively; deserializing the first, second, and third versions to generate first, second, and third tree-structured data models respectively representing the first, second, and third versions; generating an identifier for each node of each data model that is unique within the data model by applying identifier determination rules to a context describing the element corresponding to the node; comparing each identifier in the first data model with each identifier in the second data model to identify each node in the first data model not having matching identifiers with any node in the second data model and to link each pair of nodes having matching identifiers; and applying comparison rules to the contexts of each linked pair of nodes to identify differences therebetween.
Owner:IBM CORP
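
The identifier-based comparison between two versions described in the abstract above can be sketched as follows. This is a rough illustration rather than the patented method: it assumes the identifier for each node is a path built from tag names and sibling positions, and it compares only element text and attributes. The names `index_nodes` and `compare` are hypothetical, and the third version and the merge step itself are not covered.

```python
import xml.etree.ElementTree as ET


def index_nodes(xml_text):
    """Deserialize a document and map a path-like identifier, unique
    within the tree, to each element."""
    root = ET.fromstring(xml_text)
    nodes = {}

    def visit(elem, path):
        nodes[path] = elem
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    visit(root, f"/{root.tag}")
    return nodes


def compare(first_xml, second_xml):
    """Return (identifiers only in the first version,
    identifiers whose linked nodes differ in text or attributes)."""
    first, second = index_nodes(first_xml), index_nodes(second_xml)
    unmatched = sorted(set(first) - set(second))
    changed = sorted(
        ident for ident in set(first) & set(second)
        if (first[ident].text or "").strip() != (second[ident].text or "").strip()
        or first[ident].attrib != second[ident].attrib
    )
    return unmatched, changed


unmatched, changed = compare(
    "<doc><title>A</title><p>one</p><p>two</p></doc>",
    "<doc><title>B</title><p>one</p></doc>",
)
print(unmatched, changed)
```

Position-based identifiers are fragile when elements are reordered; rules that derive identifiers from an element's context, as the abstract describes, would be more robust.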

Automatic generation of a search engine for a structured document

We describe a search engine generator that automates the process of creating a search engine for a particular structured document written in a natural language such as English. The search engine allows more convenient and flexible analysis of information stored in natural language documents than is currently available with World Wide Web search engines or portal builders. Specifically, it displays matching records in a tabular format for easy comparison; this may include information calculated with data from the document. Further, the search engine's graphical user interface (GUI) is available in different natural languages to facilitate searches by international users, and the GUI has a customizable graphic design.
Owner:ABRAIDO FANDINO LEONOR MARIA

Integrated retrieval method for structured data and non-structured data

The invention discloses a method for integrated searching of structured data and non-structured data, comprising the following steps: a database for storing the structured data is extended; different types of documents containing non-structured data are filtered; the corresponding interface processing types are implemented according to the document suffixes, and the corresponding configuration is completed; each type of document is parsed and an index is built, so that documents of that type can be queried; and the database and the documents are searched by keyword, and the search results are displayed. With this method, whether the search source is an ordinary database storing structured data or an FTP source storing non-structured documents, the search results can be presented to users intuitively and accurately.
Owner:TSINGHUA UNIV

Dynamic data migration for structured markup language schema changes

Techniques are disclosed for programmatically migrating structured documents created according to one version of a schema such that those structured documents may adhere to a revised version of the schema (or schema equivalent, alternatively). A "schema change document" is used to record changes that have been made to the schema. This schema change document provides a single point of access for implementing programmatic revisions for a single source file or for an entire set of source files that may have become out of alignment with its schema. The source file(s), or a copy thereof, can then be changed programmatically in view of the recorded schema changes, without having to manually search for and change all of the source files that are dependent on a changed schema.
Owner:IBM CORP
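
The idea of driving migrations from a recorded list of schema changes can be sketched as follows. The change record format (a list of `rename` and `remove` operations) and the function name `migrate` are hypothetical, invented for this example; the abstract does not specify how the schema change document is encoded.

```python
import xml.etree.ElementTree as ET

# Hypothetical change record standing in for a "schema change document".
SCHEMA_CHANGES = [
    {"op": "rename", "old": "surname", "new": "family-name"},
    {"op": "remove", "name": "fax"},
]


def migrate(xml_text, changes):
    """Apply each recorded schema change to one source document."""
    root = ET.fromstring(xml_text)
    for change in changes:
        if change["op"] == "rename":
            # Snapshot the matches before editing tags.
            for elem in list(root.iter(change["old"])):
                elem.tag = change["new"]
        elif change["op"] == "remove":
            # Snapshot parents and children before removing nodes.
            for parent in list(root.iter()):
                for child in [c for c in parent if c.tag == change["name"]]:
                    parent.remove(child)
    return ET.tostring(root, encoding="unicode")


migrated = migrate(
    "<person><surname>Lee</surname><fax>123</fax></person>",
    SCHEMA_CHANGES,
)
print(migrated)
```

Running the same `migrate` call over every source file applies the recorded changes uniformly, which is the single point of access the abstract refers to.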