
313 results for "Document analysis" patented technology

Document analysis determines requirements by examining existing documents. The process also identifies which types of information are important to those requirements. In project management, many kinds of documents are analyzed in this way to draw out the important requirements.

System and method for analysis and clustering of documents for search engine

A system and method for searching documents in a data source, and more particularly for analyzing and clustering documents for a search engine. The system and method include analyzing and processing documents to establish the infrastructure and standards for optimal document processing. By incorporating Computational Intelligence (CI) and statistical methods, the document information is analyzed and clustered using novel techniques for knowledge extraction. A comprehensive dictionary is built from the keywords that these techniques identify in the entire text of each document. The text is parsed for keywords, their number of occurrences, and the context in which each word appears. Each document is then characterized by the knowledge represented in its contents. Based on the knowledge extracted from all the documents, the documents are clustered into meaningful groups in a catalog tree. The results of the document analysis and the clustering information are stored in a database.
Owner:NUTECH SOLUTIONS

Category based, extensible and interactive system for document retrieval

In information retrieval (IR) systems with high-speed access, especially search engines that retrieve accessible documents from Internet and/or corporate intranet domains, automatic text categorization techniques are used to support the presentation of search query results within high-speed network environments. An integrated, automatic and open information retrieval system (100) comprises a hybrid method based on linguistic and mathematical approaches for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requester, the system (100) retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requester, and the requester designates the relevant topics. The requester is then granted access only to documents assigned to relevant topics. A knowledge database (1408) linking search terms to documents and documents to topics is established and maintained to speed up future searches. Additionally, new strategies are presented for handling the different update frequencies of changed Web sites.
Owner:COGISUM INTERMEDIA
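
The retrieval and topic-assignment flow in the abstract above can be illustrated with a minimal Python sketch. It assumes that a word-pair pattern is simply two adjacent tokens and that the topic database is a hypothetical in-memory dict (`TOPICS`) mapping each topic to a set of known pairs; the patent's actual pattern definition, self-learning category hierarchy, and knowledge database (1408) are not modeled.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def word_pairs(text):
    """Count adjacent token pairs (an assumed definition of a word-pair pattern)."""
    toks = tokenize(text)
    return Counter(zip(toks, toks[1:]))

def assign_topics(text, topic_patterns, min_hits=1):
    """Return every topic whose known word pairs occur at least min_hits times."""
    pairs = word_pairs(text)
    return sorted(topic for topic, patterns in topic_patterns.items()
                  if sum(pairs[p] for p in patterns) >= min_hits)

def retrieve(query_word, docs, topic_patterns):
    """Retrieve documents containing the query word and tag each one with topics."""
    word = query_word.lower()
    return {doc_id: assign_topics(text, topic_patterns)
            for doc_id, text in docs.items() if word in tokenize(text)}

# Hypothetical topic database: topic name -> set of word pairs.
TOPICS = {
    "finance": {("interest", "rate"), ("central", "bank")},
    "biology": {("cell", "membrane"), ("gene", "expression")},
}
docs = {
    "d1": "The central bank raised the interest rate again.",
    "d2": "Gene expression near the cell membrane varies with the growth rate.",
}
hits = retrieve("rate", docs, TOPICS)
# More than one topic among the hits, so the requester would be asked to choose.
print(sorted({t for topics in hits.values() for t in topics}))
```

Because the hits for "rate" span two topics, this is the case in which the requester would be shown the topic list and asked to designate the relevant ones.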

System and method for document analysis, processing and information extraction

A method and system for retrieving information in response to an information retrieval request comprises extracting additional information from a first corpus of data elements based on the request. The request is modified based on the additional information to refine the scope of information to be retrieved from a second corpus of data elements. The information is retrieved from the second corpus of data elements based on the modified request.
Owner:PLAIN SIGHT SYST
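
The abstract leaves the extraction step open. One familiar way to realize it is pseudo-relevance feedback, sketched below under that assumption: terms that frequently co-occur with the query in the first corpus are added to the request, and the modified request is then run against the second corpus. The stopword list, the `k` parameter, and the overlap scoring are illustrative choices, not details taken from the patent.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}

def tokens(text):
    """Lowercase word tokens, minus a small illustrative stopword list."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def expand_query(query, first_corpus, k=2):
    """Add the k terms that most often co-occur with the query in the first corpus."""
    q = set(tokens(query))
    counts = Counter()
    for doc in first_corpus:
        toks = tokens(doc)
        if q & set(toks):  # the document mentions at least one query term
            counts.update(t for t in toks if t not in q)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return q | {t for t, _ in ranked[:k]}

def search(terms, second_corpus):
    """Indices of second-corpus documents, ranked by how many request terms they contain."""
    scored = [(len(terms & set(tokens(doc))), i) for i, doc in enumerate(second_corpus)]
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]

first = ["jaguar speed cat predator", "jaguar cat habitat rainforest", "stock market news"]
second = ["the cat sleeps in its habitat", "jaguar cars are fast", "bond yields rise"]
request = expand_query("jaguar", first)
print(sorted(request), search({"jaguar"}, second), search(request, second))
```

Here the first corpus pushes the ambiguous request "jaguar" toward the animal sense, so the habitat document outranks the car document in the second corpus.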

Semantic document profiling

A method of semantic profiling of documents comprises: receiving a document to be profiled, the document comprising a plurality of terms; for each of at least a portion of the plurality of terms in the document, determining a part of speech and a grammatical function of the term, obtaining senses of the term, selecting a sense as the most likely meaning of the term, and calculating an information value of the term; and generating a semantic profile of the document comprising at least some of the calculated information values.
Owner:GRAMMARLY

Systems and methods for document processing using machine learning

Disclosed herein are embodiments of systems, devices, and methods for automated document analysis and processing using machine learning techniques. In one embodiment, systems and methods are disclosed for automatically classifying documents. In another embodiment, systems and methods are disclosed for identifying new tags for untagged documents. In a further embodiment, systems and methods are disclosed for identifying documents related to a target document.
Owner:NOVABASE SGPS SA

Category based, extensible and interactive system for document retrieval

An integrated, automatic and open information retrieval system comprises a hybrid method based on linguistic and mathematical approaches for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requestor, the system retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requestor, and the requestor designates the relevant topics. The requestor is then granted access only to documents assigned to relevant topics. A knowledge database linking search terms to documents and documents to topics is established and maintained to speed up future searches. Additionally, new strategies are presented for handling the different update frequencies of changed Web sites.
Owner:COGISUM INTERMEDIA

Secure search of private documents in an enterprise content management system

An enterprise content management system, such as an electronic contract system, manages a large number of secure documents for many organizations. Searching these private documents on behalf of different organizational users with role-based access control is a challenging task. A content-based, extensible markup language (XML)-annotated secure-index search mechanism provides effective search and retrieval of private documents with document-level security. The search mechanism includes a document analysis framework for text analysis and annotation, a search indexer that builds document access control information directly into the search index, an XML-based search engine, and a compound query generation technique that joins user role and organization information into the search query. By incorporating document access information directly into the search index and combining user information in the search query, private contract documents can be searched and retrieved effectively, securely, and with high performance.
Owner:IBM CORP
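
A toy version of the idea of building access-control information into the index and joining the user's role and organization into the query is sketched below. The `IndexedDoc` record and the in-memory inverted index are hypothetical stand-ins; the patent describes an XML-annotated index and an XML-based search engine, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field

@dataclass
class IndexedDoc:
    doc_id: str
    terms: set
    org: str                                  # owning organization, stored in the index
    roles: set = field(default_factory=set)   # roles allowed to read the document

def build_index(docs):
    """Inverted index: term -> list of postings that carry access-control data."""
    index = {}
    for doc in docs:
        for term in doc.terms:
            index.setdefault(term, []).append(doc)
    return index

def secure_search(index, query_terms, user_org, user_role):
    """Compound query: every term AND org == user_org AND user_role in roles."""
    if not query_terms:
        return []
    matches = None
    for term in query_terms:
        ids = {d.doc_id: d for d in index.get(term, [])}
        matches = ids if matches is None else {k: v for k, v in matches.items() if k in ids}
    return sorted(k for k, d in matches.items()
                  if d.org == user_org and user_role in d.roles)

docs = [
    IndexedDoc("c1", {"contract", "renewal"}, "acme", {"legal", "admin"}),
    IndexedDoc("c2", {"contract", "renewal"}, "globex", {"legal"}),
    IndexedDoc("c3", {"contract"}, "acme", {"admin"}),
]
index = build_index(docs)
print(secure_search(index, ["contract", "renewal"], "acme", "legal"))
```

Because the organization and role conditions are evaluated inside the query, documents the user may not read never enter the result list at all, rather than being filtered out after ranking.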

System and method for defining characteristic data of a scanned document

A system and a method for providing characteristic data associated with a scanned document are provided. The characteristic data of the document may include a title, a creation date, a scan date, an author, a subject matter, a total page count, a starting page number, an ending page number, a color type, a document type, a language, and/or a document direction. The method includes analyzing a bitmapped image file of a document, determining at least one item of characteristic data of the document based on that analysis, and linking the characteristic data to the bitmapped image file, wherein the characteristic data is usable by a document management system to identify the document in a search. Analyzing the bitmapped image of the document may include a natural language analysis technique, an optical character recognition analysis technique, an image layout analysis technique, and/or a color analysis technique.
Owner:KK TOSHIBA +1

Similar document detection and electronic discovery

Systems and methods are disclosed for performing duplicate-document analyses to identify textually identical or similar documents, which may be electronic documents stored within an electronic discovery platform. The described process represents each document, including a target document, both as a relatively large n-tuple vector and as a relatively small m-tuple vector; performs a series of one-dimensional searches on the set of m-tuple vectors to identify documents that are near-duplicates of the target document; and then filters that set of near-duplicate documents based on the distance of their n-tuple vectors from that of the target document.
Owner:STROZ FRIEDBERG
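
The two-stage search described above can be sketched as follows. The concrete choices are assumptions, not the patent's: the n-tuple vector is a sparse vector of relative term frequencies, the m-tuple vector sums those weights into `m` buckets chosen by a CRC32 hash of each term, distances are L1, and `eps` is an illustrative threshold.

```python
import bisect
import re
import zlib
from collections import Counter

def full_vector(text):
    """Large n-tuple vector: relative term frequencies, stored sparsely."""
    toks = re.findall(r"[a-z]+", text.lower())
    total = len(toks) or 1
    return {t: c / total for t, c in Counter(toks).items()}

def small_vector(vec, m=4):
    """Small m-tuple vector: term weights summed into m buckets by a stable hash."""
    out = [0.0] * m
    for term, w in vec.items():
        out[zlib.crc32(term.encode()) % m] += w
    return out

def l1(a, b):
    """L1 distance between two sparse vectors."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))

def near_duplicates(target, docs, eps=0.3, m=4):
    full = {i: full_vector(t) for i, t in docs.items()}
    small = {i: small_vector(v, m) for i, v in full.items()}
    tv = full_vector(target)
    ts = small_vector(tv, m)
    candidates = set(docs)
    for dim in range(m):
        # One-dimensional range search on coordinate `dim` over a sorted list.
        axis = sorted((small[i][dim], i) for i in candidates)
        keys = [v for v, _ in axis]
        lo = bisect.bisect_left(keys, ts[dim] - eps)
        hi = bisect.bisect_right(keys, ts[dim] + eps)
        candidates = {i for _, i in axis[lo:hi]}
    # Final filter by distance in the full n-tuple space.
    return sorted(i for i in candidates if l1(full[i], tv) <= eps)

target = "the quick brown fox jumps over the lazy dog"
docs = {
    "a": "the quick brown fox jumped over the lazy dog",
    "b": "an entirely different text about patents",
    "c": "the quick brown fox jumps over the lazy dog",
}
print(near_duplicates(target, docs))
```

With bucket sums, each small coordinate can differ by at most the full L1 distance, so the cheap one-dimensional searches never discard a document that would pass the final n-tuple filter; they only prune obvious non-matches.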

System, method, and service for automatically and dynamically composing document management applications

A document management system applies relevant document analysis, metadata extraction, and business process association algorithms and methodology to automatically and dynamically classify documents for routing, processing, and executing customized business logic. The document management system accepts documents from one or more channels, classifies each document and extracts its metadata, executes customized application profiles, and triggers the business logic associated with the process. The document management system comprises a rules engine to detect and classify unstructured as well as structured forms, where the locations of attributes and the visual layout are not fixed. The document management system provides automatic linkage between disparate systems that manage documents, enabling the complete execution of a business process.
Owner:IBM CORP

System and method for automatically assigning a filename to a scanned document

Automatic filename assignation logic automatically assigns a filename and extension to a scanned image, thus eliminating the need for the user to interactively assign the filename. The automatic filename assignation logic together with the document analysis and processing logic determines the appropriate filename under which to save the scanned document by searching a predefined region within the document for a filename and an extension. A user may preselect the region or the region may be a default region applied to all scanned documents unless the automatic filename assignation logic is otherwise instructed. Alternatively, the presence of a notation, such as a “POST-IT®” note on a page is searched for, and, if discovered, the information contained in the notation (the desired filename and extension) becomes the name under which the automatic filename assignation logic saves the document.
Owner:HEWLETT PACKARD DEV CO LP
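
The region search for a filename can be sketched as below. It assumes OCR has already produced words with page-relative positions; the `Word` record, `DEFAULT_REGION`, and the `name.ext` pattern are hypothetical, and the sticky-note fallback is not shown.

```python
import re
from typing import NamedTuple, Optional

class Word(NamedTuple):
    text: str
    x: float  # left edge as a fraction of page width (0..1)
    y: float  # top edge as a fraction of page height (0..1)

DEFAULT_REGION = (0.0, 0.0, 0.5, 0.1)  # hypothetical default: top-left strip (x0, y0, x1, y1)
FILENAME_RE = re.compile(r"^[\w-]+\.[A-Za-z0-9]{1,4}$")

def filename_from_region(words, region=DEFAULT_REGION) -> Optional[str]:
    """Return the first OCR'd word inside `region` that looks like name.ext, else None."""
    x0, y0, x1, y1 = region
    inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
    for w in sorted(inside, key=lambda w: (w.y, w.x)):  # top-to-bottom, left-to-right
        if FILENAME_RE.match(w.text):
            return w.text
    return None

words = [Word("Invoice", 0.10, 0.05), Word("q3_report.pdf", 0.30, 0.06),
         Word("budget.xls", 0.70, 0.50)]
print(filename_from_region(words), filename_from_region(words, (0.5, 0.4, 1.0, 0.6)))
```

Passing a different `region` corresponds to the user preselecting where the filename is written; leaving it out applies the default region.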

Document analysis

A particular computer-implemented method includes generating a plurality of intent maps based on a plurality of documents. The plurality of intent maps includes a first intent map based on a first document and a second intent map based on a second document. Each intent map of the plurality of intent maps corresponds to a document of the plurality of documents and includes a set of event structures. Each event structure includes data descriptive of an actor and an action described in the document that corresponds to the intent map. The method also includes performing a comparison of event structures of the first intent map and event structures of the second intent map. The method further includes determining, based on the comparison, whether at least a portion of the first document is duplicative of at least a portion of the second document.
Owner:THE BOEING CO
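
A minimal sketch of the intent-map comparison follows. It assumes the (actor, action) event structures have already been extracted (the natural-language step is not shown) and uses Jaccard overlap with an illustrative threshold as the comparison; the patent does not say which similarity measure it uses.

```python
from typing import NamedTuple

class Event(NamedTuple):
    actor: str
    action: str

def intent_map(events):
    """Intent map as a set of normalized (actor, action) event structures."""
    return {Event(actor.strip().lower(), action.strip().lower()) for actor, action in events}

def overlap(map_a, map_b):
    """Jaccard similarity of two intent maps' event structures."""
    union = map_a | map_b
    return len(map_a & map_b) / len(union) if union else 0.0

def is_duplicative(map_a, map_b, threshold=0.5):
    """Flag a pair of documents whose event structures largely coincide."""
    return overlap(map_a, map_b) >= threshold

req1 = intent_map([("Pilot", "selects mode"), ("System", "displays alert"),
                   ("System", "logs event")])
req2 = intent_map([("pilot", "selects mode"), ("system", "displays alert")])
req3 = intent_map([("Operator", "replaces filter")])
print(overlap(req1, req2), is_duplicative(req1, req2), is_duplicative(req1, req3))
```

To detect that only a portion of one document duplicates another, a containment ratio such as `len(a & b) / len(b)` may suit better than Jaccard, which penalizes size differences.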

A decentralized identifier management system based on an Ethereum block chain

The embodiment of the invention discloses a decentralized identifier management system based on the Ethereum blockchain. The system comprises a smart contract on the Ethereum blockchain and an off-chain decentralized identifier document analysis module. The representation form of the decentralized identifier is specified in the smart contract, and attribute resolution for the decentralized identifier is implemented by the decentralized identifier document analysis module. Users can autonomously carry out key rotation and management through the publicly available smart contract; third-party service provider websites and other users can verify the authenticity of a user's digital signature off-chain by querying the data in the decentralized identifier document, and thereby determine the user's identity; and an entity can freely use any shared root of trust to manage its own decentralized identifiers. As a result, there is no centralized authority and no single point of failure, and information management offers high security and privacy.
Owner:领信智链(北京)科技有限公司

Granular knowledge based search engine

The application borrows terminology from data mining, association rule learning, and topology. A geometric structure represents a collection of concepts in a document set. Within this structure, a set of high-frequency keywords that closely co-occur represents a concept in the document set. Document analysis seeks to automate the understanding of the knowledge that represents the author's ideas. Granular computing theory deals with rough sets and fuzzy sets. One of the key insights of rough set research is that selecting different sets of features or variables will yield different concept granulations. Here, as in elementary rough set theory, by "concept" we mean a set of entities that are indistinguishable or indiscernible to the observer (i.e., a simple concept), or a set of entities composed from such simple concepts (i.e., a complex concept).
Owner:WANG ANDREW CHIEN CHUNG +2

Document Analysis, Commenting, and Reporting System

A document analysis, commenting, and reporting system provides tools that automate quality assurance analysis tailored to specific document types. As one example, the specific document type may be a requirements specification, and the system may tag different parts of requirements, including actors, entities, modes, and a remainder. However, the flexibility of the system permits analysis of any other document type, such as instruction manuals and best practices guides. By flagging non-standard terms, ambiguous language, conflicts between document sections, incomplete or inaccurate descriptions, excessive size and complexity, and other issues, the system helps avoid confusion over the document when it is delivered.
Owner:ACCENTURE GLOBAL SERVICES LTD

Enhanced mechanism for automatically generating a transformation document

A transformation document generation mechanism (TDGM) for automatically generating a transformation document given a source document and a target document is disclosed. The TDGM analyzes each document and builds a pattern dictionary for each that records the patterns found in that document. Thereafter, the TDGM processes the pattern dictionaries to automatically generate the transformation document. In doing so, the TDGM automatically generates pattern creation templates in the transformation document. These templates (when invoked by a transformation processor at a later time while processing a source document with the transformation document) will cause particular patterns to be created in a result document. In addition, the TDGM generates zero or more copy templates in the transformation document to copy identical elements, if any, from the source document to the result document. Once that is done, the transformation document is created and may be refined by a user. By performing much of the underlying document analysis for the user, and by generating an initial transformation document, the TDGM simplifies the transformation document creation process.
Owner:ORACLE INT CORP

Method of vector analysis for a document

The invention provides a document representation method and a document analysis method, including extraction of important sentences from a given document and/or determination of the similarity between two documents. The method detects the terms that occur in the input document, segments the input document into appropriately sized chunks (document segments), and generates a document segment vector for each segment, whose elements are values based on the occurrence frequencies of the terms in that segment. The method further calculates the eigenvalues and eigenvectors of the square sum matrix of the document segment vectors, where R denotes the rank, and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining importance. Then, for each document segment vector, the method calculates a weighted sum of its squared projections onto the selected eigenvectors, and selects the document segments of high importance based on these weighted sums.
Owner:MICRO FOCUS LLC
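
The importance computation can be sketched in plain Python. Assumptions: segments are already given as strings, segment vectors use raw term counts, the weights in the weighted sum are the eigenvalues, and a textbook cyclic Jacobi eigenvalue routine stands in for whatever solver the patent uses. With segment vectors d_n, the sketch computes S = sum over n of d_n d_n^T and scores segment n as the sum, over the L selected eigenpairs, of lambda_l * (phi_l . d_n)^2.

```python
import math
import re
from collections import Counter

def segment_vectors(segments):
    """Term-frequency vector for each segment over a shared, sorted vocabulary."""
    counts = [Counter(re.findall(r"[a-z]+", s.lower())) for s in segments]
    vocab = sorted(set().union(*counts))
    return [[float(c[t]) for t in vocab] for c in counts]

def square_sum_matrix(vectors):
    """S = sum over segments of d d^T (symmetric, positive semidefinite)."""
    m = len(vectors[0])
    return [[sum(d[i] * d[j] for d in vectors) for j in range(m)] for i in range(m)]

def jacobi_eigen(S, sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates eigenvectors as columns
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [A[i][i] for i in range(n)]
    vectors = [[V[k][i] for k in range(n)] for i in range(n)]
    return values, vectors

def segment_importance(segments, L=1):
    """Eigenvalue-weighted sum of squared projections onto the top-L eigenvectors."""
    seg_vecs = segment_vectors(segments)
    values, eigvecs = jacobi_eigen(square_sum_matrix(seg_vecs))
    top = sorted(range(len(values)), key=lambda i: -values[i])[:L]
    return [sum(values[i] * sum(x * y for x, y in zip(eigvecs[i], d)) ** 2 for i in top)
            for d in seg_vecs]

segments = ["patent search patent", "patent search engine", "weather report"]
print(segment_importance(segments, L=1))
```

With L=1 only the dominant "patent search" direction counts, so the off-topic weather segment scores zero; raising L lets weaker themes contribute.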

Document analyzer and metadata generation and use

A document analyzer receives a collection of text-based terms associated with a document. The document analyzer performs a statistical analysis on the text-based terms to identify their distribution (where they appear in the document) and their relative frequency (how often they appear in the document). The document analyzer uses the distribution and relative frequency information derived from the statistical analysis to rank multiple themes associated with the document. For example, a received listing of multiple themes may not be in any useful order, although the themes in the listing can be assumed to be present in the document. By applying the distribution and relative frequency information derived from the analysis, the document analyzer can identify which themes are most relevant to the document as a whole and/or which themes correspond to different portions (e.g., pages or sections) of the document.
Owner:ADOBE INC
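
A minimal sketch of ranking themes by distribution and relative frequency. It assumes single-word themes and combines the two signals by simple multiplication (relative frequency times the fraction of sections containing the theme); the patent does not state how the two are combined.

```python
import re
from collections import Counter

def _counts(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def rank_themes(sections, themes):
    """Rank single-word themes by relative frequency x spread across sections."""
    counts = [_counts(s) for s in sections]
    total = sum(sum(c.values()) for c in counts) or 1
    def score(theme):
        t = theme.lower()
        freq = sum(c[t] for c in counts) / total                 # relative frequency
        spread = sum(1 for c in counts if c[t]) / len(sections)  # distribution
        return freq * spread
    return sorted(themes, key=lambda th: (-score(th), th))

def theme_per_section(sections, themes):
    """Most frequent theme in each section, or None when no theme occurs there."""
    result = []
    for s in sections:
        c = _counts(s)
        best = min(themes, key=lambda th: (-c[th.lower()], th))
        result.append(best if c[best.lower()] else None)
    return result

sections = ["battery cells and battery packs",
            "battery charging unit",
            "circuit layout and circuit board circuit design"]
themes = ["circuit", "battery", "design"]
print(rank_themes(sections, themes), theme_per_section(sections, themes))
```

"battery" and "circuit" each occur three times, but "battery" is spread across two sections instead of one, so it ranks first for the document as a whole while "circuit" still wins its own section.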

Display apparatus and method for summarizing of document

A display apparatus includes: a communicator configured to perform data communication with a content server and to receive at least one of a main document and a sub document related to the main document; a document analyzer configured to extract keywords with a high frequency of occurrence from the main document and to determine, with reference to the received sub document, a head keyword from the extracted keywords for generating a summarized document; and a processor configured to determine the reliability of each sentence of the main document based on the head keyword, extract sentences that match a predetermined condition with reference to the determined reliability, and analyze the structural format of each extracted sentence so as to reconfigure the words that form it and generate a summarized sentence, thereby generating a summarized document that retains information and logical cohesion.
Owner:SAMSUNG ELECTRONICS CO LTD

Document analysis, commenting, and reporting system

A document analysis, commenting, and reporting system provides tools that automate quality assurance analysis tailored to specific document types. As one example, the specific document type may be a requirements specification and the system may tag different parts of requirements, including actors, entities, modes, and a remainder. The system also includes tools for visualizing the relationships between entities in a requirements specification and for identifying whether the requirements specification provides for attributes specified by a non-functional attribute glossary. The system facilitates the visualization of interactions of individual entities, of a system of entities, or entities identified for a specific use. The different types of visualizations distinguish between interacting and non-interacting entities, and highlight where a set of requirements may be deficient with respect to the non-interacting entities. However, the flexibility of the system permits analysis of any other document type, such as instruction manuals and best practices guides.
Owner:ACCENTURE GLOBAL SERVICES LTD

Document Analysis System and Document Adaptation System

The present invention provides a document analysis system that can execute the layout analysis intended by the document provider and an exhaustive title analysis, and can output an analysis result usable by a third party. The input unit (11) obtains a structured or semi-structured document and renders it. The basic layout analysis unit (14) obtains the rendering result and analyzes the layout by grouping document description elements juxtaposed in a given direction, with reference to the arrangement of the document description elements. The title analysis unit (15) obtains the rendering result and a title analysis rule from the title analysis rule storing unit (23), and analyzes titles by comparing the name, attribute, style, or content of the document description elements with the title analysis rule. The layout analysis unit (16) obtains the layout components, their hierarchical relationship, and the titles, and generates a new layout by grouping the layout components. The output unit (13) obtains the layout components, their hierarchical relationship, and the relationships between the components and the titles, shapes them into a format whose expressions reference the document description elements, and outputs them.
Owner:NEC CORP

Method and system for document presentation and analysis

A document analysis system receives multiple concepts along with multiple reference documents and generates sensory indicators that assist a researcher in assessing the relevance of each of the documents to the concepts. In one exemplary aspect, the document analysis system displays a table of keywords separated into blocks, each block of keywords corresponding to one of the concepts. Each block is colored according to the prevalence of any keyword within a given keyword group. The color of a block thus indicates the relative presence of a concept in the document. The document analysis system also determines a unique color for each block of keywords for highlighting in the text of the document. In this manner a researcher can quickly identify passages that contain multiple concepts. Additionally, the researcher is provided the ability to quickly locate reference characters, figure numbers and patent numbers in the document.
Owner:WALSH PATRICK SANDER

Systems and methods to automatically classify electronic documents using extracted image and text features and using a machine learning subsystem

A document analysis system that automatically classifies documents by recognizing distinctive features in each document comprises a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs, each containing at least one electronic document. The document feature recognition system automatically extracts image and text features from each received document. The document classification system automatically classifies recognized electronic documents by finding the best match between the extracted features of each document and the feature sets associated with each category of document. The document recognition training system automatically trains the feature set for each category of documents: using features extracted from unrecognized documents, it automatically modifies the feature set for a document category. The job organization system automatically organizes each job according to the document categories it contains.
Owner:GRUNTWORX

Systems and methods for automatically extracting data from electronic documents using multiple character recognition engines

In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically extracting data from each received electronic document using a plurality of character recognition engines is provided. The method includes: automatically processing each received electronic document page with each of the plurality of recognition engines to extract data; comparing the quality of the data extracted by each recognition engine to assign a confidence score to the extracted data; and selecting the extracted data with the highest confidence score as the correct extracted data.
Owner:GRUNTWORX
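
The abstract says the quality of each engine's output is compared to assign a confidence score. This sketch assumes the simplest such measure, agreement among engines after whitespace normalization; the engines themselves are hypothetical callables returning `{field: text}`, and real OCR engines would usually also report their own per-character confidences.

```python
from collections import Counter

def pick_by_consensus(outputs):
    """outputs: {engine_name: extracted_text}. Returns (best_text, confidence),
    where confidence is the share of engines that agree after normalization."""
    normalized = [" ".join(text.split()) for text in outputs.values()]
    votes = Counter(normalized)
    best, count = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return best, count / len(normalized)

def extract_fields(page, engines):
    """Run every engine on the page and pick, per field, the best-supported value."""
    per_engine = {name: engine(page) for name, engine in engines.items()}
    fields = sorted(set().union(*per_engine.values()))
    return {f: pick_by_consensus({n: d[f] for n, d in per_engine.items() if f in d})
            for f in fields}

# Hypothetical engines: each returns {field: text} for a page.
engines = {
    "engine_a": lambda page: {"total": "1,250.00", "date": "2020-01-15"},
    "engine_b": lambda page: {"total": "1,250.00", "date": "2020-01-1S"},
    "engine_c": lambda page: {"total": "1.250.00", "date": "2020-01-15"},
}
print(extract_fields("page-1", engines))
```

Each engine misreads a different field, yet the majority value wins for both fields, which is the practical benefit of running several engines rather than one.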

Systems and methods for automatically processing electronic documents using multiple image transformation algorithms

Active, US20110255782A1 (keywords: improve subsequent data extraction, character recognition, electronic document, document analysis)
In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of automatically pre-processing each received electronic document using a plurality of image transformation algorithms to improve subsequent data extraction from the document is provided. The method includes: electronically partitioning each received electronic document page into pieces; automatically processing each piece of the page with each of a plurality of image pre-processing algorithms to produce a plurality of image variations of each piece; and analyzing the outputs of subsequent processing and data extraction on each image variation to determine which of the outputs for each piece is best.
Owner:GRUNTWORX

Systems and methods for enabling manual classification of unrecognized documents to complete workflow for electronic jobs and to assist machine learning of a recognition system using automatically extracted features of unrecognized documents

A method in a document analysis system automatically extracts image and text features from each received electronic document and compares the extracted features with feature sets associated with each category of document to determine whether the document is recognizable as belonging to a document category. If an electronic document is recognized as belonging to one of the document categories, the method classifies the electronic document as belonging to that document category. If, however, an electronic document is unrecognized, the method submits the unrecognized document to a learning phase, in which the unrecognized document is presented to a human trainer for manual classification of the unrecognized electronic document into a document category, and automatically modifies at least one of the features and the weights of the feature set of the document category corresponding to the manually-classified electronic document using the automatically extracted features of the manually-classified document.
Owner:GRUNTWORX

Systems and methods for automatically extracting data from electronic document page including multiple copies of a form

In a document analysis system that receives and processes jobs from a plurality of users, in which each job may contain multiple electronic documents, to extract data from the electronic documents, a method of extracting data from a received electronic document page that includes multiple copies of a form is provided. The method comprises: automatically processing the page to group the multiple copies into a corresponding number of records; automatically extracting data from each copy of the form and saving the extracted data into the corresponding record; automatically comparing the extracted data in the records to determine which copy of the extracted data to select; if all extracted data instances are identical, assigning a high confidence score to the extracted data; and, if the extracted data instances are not all identical, flagging the extracted data for further processing.
Owner:GRUNTWORX
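
The compare-and-flag step can be sketched as below, assuming the copies on the page have already been separated and extracted into one dict per copy. `HIGH_CONFIDENCE` is a hypothetical value, and keeping the most common value for a flagged field is an illustrative choice the abstract does not specify.

```python
from collections import Counter

HIGH_CONFIDENCE = 0.99  # hypothetical score for "all copies agree"

def reconcile_copies(records):
    """records: one {field: value} dict per copy of the form found on the page.
    Returns {field: (value, confidence, flagged_for_review)}."""
    result = {}
    for key in sorted(set().union(*records)):
        values = [r.get(key) for r in records]
        if all(v == values[0] for v in values):
            result[key] = (values[0], HIGH_CONFIDENCE, False)
        else:
            # Copies disagree: keep the most common value (first seen on ties) and flag it.
            value, count = Counter(values).most_common(1)[0]
            result[key] = (value, count / len(values), True)
    return result

copies = [
    {"name": "J SMITH", "wages": "50000"},
    {"name": "J SMITH", "wages": "50000"},
    {"name": "J SMITH", "wages": "5000O"},
]
print(reconcile_copies(copies))
```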

System and methods of an expense management system based upon business document analysis

The disclosure relates to business content analysis. In particular, it relates to systems and methods for an expense management system operable to perform automatic content analysis of business documents in order to generate business reports for automated value-added tax (VAT) reclaim, travel and expenses (T&E) management, import/export management, and the like. The system is further operable to provide various organizational expense management functions for the corporate finance department and the business traveler based on stored data. Additionally, the system uses a content recognition engine, configured as an enhanced OCR mechanism, to extract tagged text from invoice images, and provides a continuous learning mechanism in a structured mode that classifies invoice images by type, so that the process keeps improving over time.
Owner:WAY2VAT LTD

Systems and methods for training document analysis system for automatically extracting data from documents

A method of training a document analysis system to extract data from documents is provided. The method includes: automatically analyzing images and text features extracted from a document to associate the document with a corresponding document category; comparing the extracted text features with a set of text features associated with corresponding category of the document, in which the set of text features includes a set of characters, words, and phrases; if the extracted features are found to consist of the characters, words, and phrases belonging to the set of text features associated with the corresponding document category, storing the extracted text features as the data contained in the corresponding document; and, if the extracted text features are found to include at least one text feature that does not belong to the set of text features associated with the corresponding document category, submitting the unrecognized text features to a training phase.
Owner:GRUNTWORX
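
The training decision described above can be sketched using text features only. Assumptions: the image-based association with a category is replaced by a simple overlap count, and the category names and known-feature sets in `VOCAB` are hypothetical.

```python
def classify(features, category_vocab):
    """Pick the category whose known features overlap most (first in name order on ties)."""
    return max(sorted(category_vocab), key=lambda c: len(features & category_vocab[c]))

def process_document(features, category_vocab, training_queue):
    """Store the features as data if all are known for the category; otherwise
    queue the unrecognized ones for the training phase."""
    category = classify(features, category_vocab)
    unknown = features - category_vocab[category]
    if not unknown:
        return category, sorted(features)
    training_queue.append((category, sorted(unknown)))
    return category, None

# Hypothetical categories and their known text features.
VOCAB = {
    "w2": {"wages", "employer", "federal", "withholding"},
    "1099-int": {"interest", "payer", "income"},
}
queue = []
print(process_document({"wages", "employer"}, VOCAB, queue))
print(process_document({"interest", "payer", "penalty"}, VOCAB, queue), queue)
```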