
119 results about "Analysis Documentation" patented technology

A Subcategory name for the eTMF domain used to classify documentation related to the analysis of clinical trial data.

Automatic, personalized online information and product services

A method for providing automatic, personalized information services to a computer user includes the following steps: transparently monitoring user interactions with data during normal use of the computer; updating user-specific data files including a set of user-related documents; estimating parameters of a learning machine that define a User Model specific to the user, using the user-specific data files; analyzing a document to identify its properties; estimating the probability that the user is interested in the document by applying the document properties to the parameters of the User Model; and providing personalized services based on the estimated probability. Personalized services include personalized searches that return only documents of interest to the user; personalized crawling for maintaining an index of documents of interest to the user; personalized navigation that recommends interesting documents that are hyperlinked to documents currently being viewed; and personalized news, in which a third-party server customizes its interaction with the user. The User Model includes continually updated measures of user interest in words or phrases, web sites, topics, products, and product features. The measures are updated based on both positive examples, such as documents the user bookmarks, and negative examples, such as search results that the user does not follow. Users are clustered into groups of similar users by calculating the distance between User Models.
Owner:PERSONALIZED USER MODEL PUM
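
The User Model described above can be illustrated with a minimal sketch, assuming a bag-of-words model with one additive interest weight per word, a logistic link from summed weights to a probability, and Euclidean distance between models for clustering users. The function names, the learning rate, and the scoring are illustrative assumptions, not the patented learning machine.

```python
import math
from collections import Counter


def update_interest(weights, doc_words, positive, rate=0.1):
    """Raise per-word interest for a positive example (e.g. a bookmarked page)
    and lower it for a negative one (e.g. a search result the user skipped)."""
    sign = 1.0 if positive else -1.0
    for word, count in Counter(doc_words).items():
        weights[word] = weights.get(word, 0.0) + sign * rate * count
    return weights


def interest_probability(weights, doc_words, bias=0.0):
    """Map the summed word weights of a document to a probability (logistic link)."""
    score = bias + sum(weights.get(w, 0.0) for w in doc_words)
    return 1.0 / (1.0 + math.exp(-score))


def model_distance(a, b):
    """Euclidean distance between two users' word-weight models,
    usable as input to any standard clustering of users."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


weights = {}
update_interest(weights, "python tutorial python".split(), positive=True)
update_interest(weights, "celebrity gossip".split(), positive=False)
print(interest_probability(weights, ["python", "tutorial"]) > 0.5)  # True
```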

Computer-implemented system and method for text-based document processing

A computer-implemented system and method for processing text-based documents. A frequency of terms data set is generated for the terms appearing in the documents. Singular value decomposition is performed upon the frequency of terms data set in order to form projections of the terms and documents into a reduced dimensional subspace. The projections are normalized, and the normalized projections are used to analyze the documents.
Owner:SAS INSTITUTE

System and method for automatically generating XML schema for validating XML input documents

Techniques, systems and apparatus for automatically generating a schema from an initial document constructed in an XML-compatible format are disclosed. A method involves providing an initial XML document, analyzing the XML document to identify its XML data structures, and generating a data framework that corresponds to the data structure of the XML document. The data items of the initial XML document are analyzed to determine data constraints. A schema is then generated based on the data framework and the data constraints determined from the raw XML data. These principles can be implemented as software operating on a computer system, as a computer module, as a computer program product, and as a series of related devices and products.
Owner:SUN MICROSYSTEMS INC

Principles and methods for personalizing newsfeeds via an analysis of information novelty and dynamics

A system and methodology are provided for filtering temporal streams of information, such as news stories, by statistical measures of information novelty. Various techniques can be applied to custom-tailor news feeds or other types of information based on information that a user has already reviewed. Methods for analyzing information novelty are provided along with a system that personalizes and filters information for users by identifying the novelty of stories in the context of stories they have already reviewed. The system employs novelty-analysis algorithms that represent articles as a bag of words and named entities. The algorithms analyze inter- and intra-document dynamics by considering how information evolves over time from article to article, as well as within individual articles.
Owner:MICROSOFT TECH LICENSING LLC
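
As a rough illustration of novelty filtering, the sketch below assumes each story is a plain bag of words and defines novelty as one minus the highest cosine similarity to any story the user has already read. The threshold and function names are hypothetical; named entities and intra-document dynamics, which the abstract also mentions, are not modeled.

```python
import math
from collections import Counter


def bag_of_words(text):
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def novelty(story, already_read):
    """1.0 means nothing like this has been read; about 0.0 means a repeat."""
    if not already_read:
        return 1.0
    vec = bag_of_words(story)
    return 1.0 - max(cosine(vec, bag_of_words(s)) for s in already_read)


def personalize(feed, already_read, threshold=0.5):
    """Keep only stories that are novel enough relative to what was read."""
    return [s for s in feed if novelty(s, already_read) > threshold]
```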

Methods and systems for analyzing XML documents

Methods and systems for analyzing XML documents. The system scans an XML document, identifies different dimensions that span the XML document and detects scoping relationships amongst them. The system uses the dimensional information to create a logical hierarchical scoped dimension analysis model, maps the logical XML tree to this model, and then implements the analytical method over the logical model. The logical model allows both structural features and numeric / non-numeric data to be used for analysis. The analytical method allows users to query irregular structural properties of the XML documents using the XPath navigational API.
Owner:IBM CORP

Integrating external related phrase information into a phrase-based indexing information retrieval system

An information retrieval system uses phrases to index, retrieve, organize and describe documents, analyzing documents and storing the results of the analysis as phrase data. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Changes to existing phrase data about a document collection submitted by a user are captured and analyzed, and the existing phrase data is updated to reflect the additional knowledge gained through the analysis.
Owner:GOOGLE LLC

System and method for defining characteristic data of a scanned document

A system and a method for providing characteristic data associated with a scanned document is provided. The characteristic data of the document may include a title, a creation date, a scan date, an author, a subject matter, a total page count, a starting page number, an ending page number, a color type, a document type, a language, and / or a document direction. The method includes analyzing a bitmapped image file of a document, determining at least one characteristic data of the document based on the analysis of the bitmapped image file, and linking the characteristic data to the bitmapped image file, wherein the characteristic data is useable by a document management system to identify the document in a search. Analyzing the bitmapped image of the document may include a natural language analysis technique, an optical character recognition analysis technique, an image layout analysis technique, and / or a color analysis technique.
Owner:KK TOSHIBA +1

Method and system for automatically detecting a background type of a scanned document utilizing a leadedge histogram thereof

A method and system automatically determines proper background values for a document. A histogram of the document is generated and it is determined whether a background characteristic of the document is uniform. If the background is uniform, a white background value is set equal to a first calculated white peak value. If the background is non-uniform, the white background value is set equal to a second calculated white peak value. A foreground value of the document is analyzed for validity when it is determined that a background characteristic of the document is uniform. A black background value is set equal to a black peak value calculated from the histogram when it is determined that a foreground value of the document is valid.
Owner:XEROX CORP

Dynamic access control for documents in electronic communications within a cloud computing environment

The present invention provides a solution to manage and control document transmission and electronic communication. Specifically, the present invention solves the problem of having control over data (documents, image files, and attachments, hereafter referred to as “documents”) that are associated with multiple types of data communication. Along these lines, the present invention provides a hub-and-spoke communication model in order to achieve multiple benefits in terms of effectiveness, efficiency, flexibility, and control. This type of granular control is critical for information sharing within a cloud computing environment. This approach is also useful for collaboration tools and can be augmented by the creation and management of access control lists (ACLs) for the hub-and-spoke system. To this extent, the present invention solves the problem of being able to automatically update ACLs as documents are forwarded or otherwise communicated between multiple people. These ACLs are kept up to date through the analysis of to whom (and where) a document has been sent.
Owner:KYNDRYL INC
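
A minimal sketch of ACLs that update as documents are forwarded, assuming a single in-memory hub through which every share passes. The `Hub` class and its methods are hypothetical and ignore authentication, persistence, and cloud storage.

```python
from collections import defaultdict


class Hub:
    """Hypothetical hub: documents stay on the hub and messages carry only
    references, so every share passes through one place that maintains the ACL."""

    def __init__(self):
        self.docs = {}
        self.acl = defaultdict(set)  # doc_id -> users allowed to read it

    def upload(self, doc_id, content, owner):
        self.docs[doc_id] = content
        self.acl[doc_id].add(owner)

    def send(self, doc_id, sender, recipients):
        """Sharing (or forwarding) a document extends its ACL to the recipients."""
        if sender not in self.acl[doc_id]:
            raise PermissionError(f"{sender} may not share {doc_id}")
        self.acl[doc_id].update(recipients)

    def revoke(self, doc_id, user):
        self.acl[doc_id].discard(user)

    def read(self, doc_id, user):
        if user not in self.acl[doc_id]:
            raise PermissionError(f"{user} may not read {doc_id}")
        return self.docs[doc_id]
```

Because recipients are added to the ACL at send time, a forward by `bob` succeeds only after `alice` has shared the document with him.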

Document animation system

Inactive | US20060197764A1 | Expand their vocabulary, Enhance vocabulary quickly, Animation, Speech synthesis, Paper document
An animating system converts a text-based document into a sequence of animated pictures to help a user understand it better and faster. First, the system provides interfaces for a user to build various object models, specify default rules for these object models, and construct references for meanings and actions. Second, the system analyzes the document, extracts the desired information, identifies various objects, and organizes the information. The system then creates objects from the corresponding object models and provides interfaces to modify default values and default rules and to define specific values and specific rules. Next, the system identifies the meanings of words and phrases. It then identifies, interpolates, synchronizes, and dispatches events. Finally, the system provides an interface for the user to track events and particular objects.
Owner:YANG GEORGE L

Method for content mining of semi-structured documents

Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type specific format such as HTML or PDF, to a document-type independent format such as XML. The document formatting, which contains basic level information about the document's structure, is then analyzed by a series of modules to develop a higher level understanding of the document's structure. These modules append information to the document describing the features which collectively comprise the higher level document structure. The appended information facilitates finding specified information within the document when content mining is performed.
Owner:MICRO FOCUS LLC

Techniques for comparing and clustering documents

Certain example embodiments relate to techniques for analyzing documents. A plurality of documents / document portions are imported into a database, with at least some of the documents / document portions being structured and at least some being unstructured. The imported documents / document portions are organized into one or more collections. A selection of at least one of the one or more collections is made. An index of words and / or groups of words is built (and optionally refined in accordance with one or more predefined rules) based on each document or document portion in each selection. A document-word matrix is built (and optionally weighted using a semantic approach), with the matrix including a value indicative of the number of times each word and / or group of words in the index appears in each document / document portion. One or more clusters of documents are generated using the document-word matrix.
Owner:SOFTWARE AG
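
The index, document-word matrix, and clustering steps can be sketched as follows, assuming whitespace tokenization, a small stopword list as the "predefined rule", raw counts instead of semantic weighting, and a greedy cosine-threshold clustering. All names and the threshold are hypothetical; the abstract does not specify a clustering algorithm.

```python
import math
from collections import Counter

STOPWORDS = frozenset({"the", "a", "an", "of", "and"})  # example refinement rule


def build_index(docs):
    """Vocabulary of all non-stopword words across the selected documents."""
    return sorted({w for d in docs for w in d.lower().split() if w not in STOPWORDS})


def doc_word_matrix(docs, vocab):
    """Row i, column j = number of times vocab[j] appears in docs[i]."""
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[w] for w in vocab])
    return rows


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(matrix, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed row is similar
    enough, otherwise start a new cluster. Returns lists of row indices."""
    clusters = []
    for i, row in enumerate(matrix):
        for members in clusters:
            if cosine(matrix[members[0]], row) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```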

Malware detection using file names

Descriptions of files detected at endpoints are submitted to a security server. The descriptions describe the names of the files and unique identifiers of the files. The security server uses the unique identifiers to identify files having different names at different endpoints. For a given file having multiple names, the names are processed to account for name differences unlikely to have been caused by malware. The processed names for the file are analyzed to determine the amount of dissimilarity among the names. This analysis is used to generate a score indicating a confidence that the computer file contains malicious software, where a greater amount of dissimilarity among the names generally indicates a greater confidence that the computer file contains malicious software. The score is weighted based on file name frequency, the age of the file, and the prevalence of the file. The weighted score is used to determine whether the computer file contains malicious software.
Owner:CA TECH INC
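
The core dissimilarity measure can be sketched with Python's standard `difflib`, assuming that letter case and copy suffixes such as " (1)" are the benign differences to discard. The normalization rule and function names are hypothetical, and the weighting by name frequency, file age, and prevalence is omitted.

```python
import difflib
import itertools
import re


def normalize(name):
    """Remove differences unlikely to be caused by malware (assumed here:
    letter case and ' (N)' copy suffixes before the extension)."""
    name = name.lower()
    return re.sub(r"\s*\(\d+\)(?=\.[^.]+$|$)", "", name)


def name_dissimilarity(names):
    """Mean pairwise dissimilarity (1 - SequenceMatcher ratio) of the
    distinct normalized names reported for one file."""
    unique = sorted({normalize(n) for n in names})
    if len(unique) < 2:
        return 0.0
    pairs = list(itertools.combinations(unique, 2))
    return sum(1 - difflib.SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)
```

A higher value means the same file (by unique identifier) appears under more unrelated names, which the abstract treats as a signal of malicious software.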

Methods and apparatus for automatic translation of a computer program language code

Embodiments of the methods and apparatus for automatic cross language program code translation are provided. One or more characters of a source programming language code are tokenized to generate a list of tokens. Thereafter, the list of tokens is parsed to generate a grammatical data structure comprising one or more data nodes. The grammatical data structure may be an abstract syntax tree. The one or more data nodes of the grammatical data structure are processed to generate a document object model comprising one or more portable data nodes. Subsequently, the one or more portable data nodes in the document object model are analyzed to generate one or more characters of a target programming language code.
Owner:XENOGENIC DEV LLC

Method and a device for ranking linked documents

A method of determining a ranking for a number of linked documents. The method comprises the following steps: a) analyzing the documents for documenting links to and from each of the documents, b) virtually adding a link to each of the documents from a virtual document, c) virtually adding a link to the virtual document from each of the plurality of documents, and d) assigning rankings to each of the plurality of documents based on the links and the virtual links.
Owner:APMATH
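
Steps a) to d) resemble a PageRank-style computation; a minimal sketch follows, assuming power iteration with a damping factor of 0.85 over the link graph augmented by one virtual document. The damping factor, iteration count, and function name are assumptions, not details from the abstract.

```python
def rank_documents(links, damping=0.85, iterations=50):
    """links: dict mapping each document to the documents it links to
    (every target must also be a key). A virtual document links to every
    document and every document links back to it; a PageRank-style power
    iteration then runs on the augmented graph."""
    virtual = object()  # sentinel that cannot collide with a real document key
    graph = {doc: list(targets) + [virtual] for doc, targets in links.items()}
    graph[virtual] = list(links)
    nodes = list(graph)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_scores = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            share = damping * scores[node] / len(graph[node])  # out-degree >= 1
            for target in graph[node]:
                new_scores[target] += share
        scores = new_scores
    return {doc: scores[doc] for doc in links}  # virtual document dropped
```

The virtual links guarantee every document has at least one outgoing link, so no special handling of dangling documents is needed.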

File-system-independent malicious content detection

The present invention enables a large number of files to be processed for evidence of malicious content, independently of the file system that maintains the files. The processed files can be obtained from live data or a point-in-time copy (e.g., a snapshot) of the data, based on mapping information that maps the files to the physical storage device. In one embodiment, a method involves accessing mapping information corresponding to a set of data. The mapping information maps at least a portion of a file to a physical storage location. The portion of the file can be read from the physical storage location using the mapping information, without accessing a file system. The portion of the file can then be analyzed for evidence of malicious content.
Owner:CA TECH INC

Method and System for Suggesting Revisions to an Electronic Document

Disclosed is a method for suggesting revisions to a document-under-analysis (“DUA”) from a seed database, the seed database including a plurality of original texts each respectively associated with one of a plurality of final texts. The method includes tokenizing the DUA into a plurality of statements-under-analysis (“SUAs”), selecting a first SUA of the plurality of SUAs, generating a first similarity score for each of the plurality of the original texts, the similarity score representing a degree of similarity between the first SUA and each of the original texts, selecting a first candidate original text of the plurality of the original texts, and creating an edited SUA (“ESUA”) by modifying a copy of the first SUA consistent with a first candidate final text associated with the first candidate original text.
Owner:BLACKBOILER INC
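
A minimal sketch of the suggestion loop, assuming statements end in "." or ";", word-set Jaccard similarity as the similarity score, and a 0.5 acceptance threshold. As a simplification, the sketch substitutes the candidate final text wholesale rather than editing a copy of the statement; all names are hypothetical.

```python
import re


def split_statements(text):
    """Split a document-under-analysis into statements ending in '.' or ';'."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]


def jaccard(a, b):
    """Word-set overlap between two statements, from 0.0 to 1.0."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def suggest(dua, seed, threshold=0.5):
    """seed: list of (original_text, final_text) pairs.
    Returns (statement, suggestion, score) for each statement."""
    results = []
    for sua in split_statements(dua):
        original, final = max(seed, key=lambda pair: jaccard(sua, pair[0]))
        score = jaccard(sua, original)
        results.append((sua, final if score >= threshold else sua, score))
    return results
```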

Analyzing externally generated documents in document management system

A computer-implemented method analyzes an externally generated document for use in a document management system. The system has a Native Template database, which lists templates for one or more types of documents having common characteristics, and a Conversion Database, which lists one or more data points associated with each listed document type, one or more descriptive text entries associated with each listed data point, and proximity range information relating to the location of the data point within the descriptive text. The externally generated document is introduced into the system. The locations of words, sentences, paragraphs, and sections within the document are recorded. A document type is selected from the Native Template database that has characteristics in common with the externally generated document. A data point is selected from the template. The introduced document is searched for Possible Data Points based on the Data Type of the selected data point in the Conversion Database. Proximity range information is obtained from the Conversion Database for the Descriptive Text entries associated with the selected data point. A determination is made as to whether Possible Data Point values for the selected data point are located within the Proximity range for each Descriptive Text entry. A cumulative Evaluation Score is calculated for each Possible Data Point value based on its proximity to each Descriptive Text entry. The Possible Data Point with the highest score that has been accepted by the user is recorded. Upon user acceptance of a Possible Data Point, additional Descriptive Text entries are stored to apply to other externally generated documents. These steps are repeated until each data point has been selected. The user then reviews the recorded data, which is approved, modified or rejected.
Owner:DATIX USA INC

Method and apparatus for autonomic discovery of sensitive content

A data loss prevention (DLP) system provides a policy-based mechanism for managing how data is discovered and classified on an endpoint workstation, file server or other device within an enterprise. The technique described herein works in an automated manner by analyzing file system activity as one or more endpoint applications interact with a file system to build a statistical model of which areas of the file system are (or will be deemed to be) active or highly active. Using this information, scanning to those areas by the DLP software is then prioritized appropriately to focus compute resources on scanning and classifying preferably only those files and folders that are necessary to be scanned, i.e., the file system portions in which the user is applying the majority of his or her activity. As a result, the technique limits scanning to only those areas that have meaningful activity (thereby conserving compute resources with respect to files or folders that have not changed), improving scanning efficiency.
Owner:SAILPOINT TECH HLDG INC

Method, Program, and Device for Analyzing Document Structure

A device, a control method, and a program to increase the accuracy of voice read-out and text mining by automatically structuring a presentation file. The arrangement and practice of the invention involve an overlap grouping part for extracting overlap information between objects in a presentation file and grouping the objects as a parent-child relationship; a graph dividing grouping part for grouping the objects as a sibling relationship by representing the objects as nodes of a graph and by recursively dividing the graph so that a predefined cost between the nodes is minimized; a distance information grouping part for further grouping the objects as a sibling relationship if distance information between the objects is below a threshold determined by a predefined computation from a distribution histogram of the distance information; and a link information extraction part for extracting arrow graphics that represent a link relationship and generating link information including the link relationship and a link label. The resulting structured data is output as meta-information.
Owner:TWITTER INC

Method and system for utilizing profiles

A method and system for utilizing profiles. Browsing data received from a site may be parsed using a behavior file. The behavior file may define how the browsing data is parsed for the site to identify at least one heading and / or topic within the browsing data. The parsed browsing data may be analyzed with an analysis file to identify one or more system generated user interests. The analysis file may define how the parsed browsing data is to be analyzed to generate the system generated user interests. A user profile may be updated with the system generated user interests. The browsing data may be stored as one or more identification data structures. Each identification data structure may include an identified terms field to retain one or more identified terms that are identified when parsing the browsing data and historical data regarding identification of the identified terms within the browsing data.
Owner:EBAY INC

Information component based data storage and management

Provided are methods, apparatus and computer programs for improved data storage and management. The invention can be implemented as a replacement for, or an add-on to, existing operating system file systems. Files in a file system are separated into a set of information components, and then all information components of the file system are analyzed to identify duplication of information content. When information components with duplicate content are identified, duplicates are deleted from physical storage and indexes are generated to reflect inclusion of the retained copy of an information component in a plurality of different files. Content searching is also improved, since relevant components can be identified without retrieving whole files and since search results will include fewer duplicate results.
Owner:IBM CORP
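
The deduplication idea can be sketched by treating fixed-size byte chunks as stand-ins for "information components" and using SHA-256 digests to find duplicates. The chunk size and function names are illustrative assumptions; the patent's components are not necessarily fixed-size chunks.

```python
import hashlib


def store_files(files, chunk_size=4):
    """files: dict name -> bytes. Keeps one physical copy per distinct chunk
    and records each file as an ordered list of chunk digests (the index)."""
    store = {}  # digest -> chunk bytes (single retained copy)
    index = {}  # file name -> list of digests
    for name, data in files.items():
        refs = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # duplicates are not stored again
            refs.append(digest)
        index[name] = refs
    return store, index


def read_file(store, index, name):
    """Reassemble a file from its retained components."""
    return b"".join(store[digest] for digest in index[name])
```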

Document publishing model

Some embodiments described herein provide a content publishing tool for publishing documents to a content distribution system. The content publishing tool in some embodiments guides the application's user through different operations in preparing a document for publication. In some embodiments, these operations include one or more of the following: (1) analyzing the document for errors, (2) specifying a version number for the document, (3) creating a sample document, and (4) exporting the document for publication.
Owner:APPLE INC

Similarity analysis method, device and system

The embodiment of the invention provides a similarity analysis method, device and system. The method comprises the following steps: obtaining file fingerprint information of a file to be analyzed; sending an analysis request carrying the file fingerprint information to at least two metadata servers (MDSs) so that each MDS queries all of its local file fingerprint information sets according to the file fingerprint information; selecting at least one sub-group according to the analysis results returned by the MDSs, where each analysis result comprises the group number and the similarity of the at least one sub-group that the MDS found to have the highest similarity to the file fingerprint information; and sending the block fingerprint information of all data blocks in the file to be analyzed to the MDS to which the selected sub-group belongs, so that the MDS performs a duplicate-block query within the selected local sub-group. Because each MDS only needs to query the file fingerprint information sets in the sub-groups for which it is responsible, the amount of data retrieved is reduced, and the waiting time for reading, writing and locking the file in a database can be reduced.
Owner:HUAWEI TECH CO LTD
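
A minimal sketch of the two-stage query, assuming the file fingerprint information is a small sample of the file's block fingerprints and that similarity is the fraction of that sample already present in a sub-group. The data layout (a dict per MDS mapping group numbers to fingerprint sets) and all names are hypothetical.

```python
def overlap(sample, group):
    """Fraction of the file's sampled fingerprints already present in a sub-group."""
    return len(sample & group) / len(sample) if sample else 0.0


def pick_subgroup(sample, servers):
    """servers: dict server_id -> dict group_id -> set of block fingerprints.
    Each server reports only its best local sub-group; the client keeps the best reply."""
    replies = []
    for server_id, groups in servers.items():
        group_id = max(groups, key=lambda g: overlap(sample, groups[g]))
        replies.append((overlap(sample, groups[group_id]), server_id, group_id))
    best = max(replies, key=lambda reply: reply[0])
    return best[1], best[2]


def duplicate_blocks(block_fps, servers, server_id, group_id):
    """Query every block fingerprint against the selected sub-group only."""
    known = servers[server_id][group_id]
    return [fp for fp in block_fps if fp in known]
```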

Systems and methods for cloud data loss prevention integration

A system, method, and computer-readable medium are provided for an integrated storage system. For example, an embodiment may detect, by an enterprise computer system, an activity notification from a cloud service that stores data on behalf of an enterprise. The activity notification may specify a file name involved in an activity performed by the cloud service (e.g., creating or modifying a file). The enterprise computer system may then download a file (or contents thereof) from the cloud service using the file name specified by the activity notification. After downloading the file, the enterprise computer system may analyze the file against a data loss prevention rule. Based on an outcome from the data loss prevention rule, the enterprise computer system may communicate an action response to the cloud service. The action response may direct the cloud service to perform an action on the file stored by the cloud service.
Owner:PAYPAL INC

Apparatus, system, and method for analyzing a file system

An apparatus, system, and method are disclosed for analyzing a file system. A record module records file parameters comprising a file size, a file age, a time of last access, a file type, a recovery time objective, and an initial access time service level objective for each file in the file system. A file score module calculates a file score for each file using the file parameters. A system score module calculates the file system score as the sum of the normalized file scores. A process module processes the file system if the file system score exceeds a specified threshold.
Owner:IBM CORP
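
The record, score, and threshold steps can be sketched as follows, assuming only three of the listed parameters and a hypothetical product-based file score normalized by the largest score. The file type, recovery time objective, and service level objective are omitted, and the scoring formula is not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class FileParams:
    size_bytes: int
    age_days: float
    days_since_access: float


def file_score(p):
    """Hypothetical score: bigger, older, longer-untouched files score higher."""
    return p.size_bytes * (1 + p.age_days) * (1 + p.days_since_access)


def file_system_score(files):
    """Sum of file scores normalized by the largest score (each in [0, 1])."""
    scores = [file_score(p) for p in files]
    top = max(scores, default=0)
    return sum(s / top for s in scores) if top else 0.0


def should_process(files, threshold):
    """Process the file system only when its score exceeds the threshold."""
    return file_system_score(files) > threshold
```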

Methods and apparatus for document management

One embodiment of the invention is directed to the analysis of a document. The document may be retrieved and automatically analyzed to measure quality metrics defined for the document. A quality metric is any attribute of the document and may be, for example, a word count, a sentence count, a paragraph count, or any other suitable attribute. A set of results based on the analysis may be generated and stored, and a report may be generated, based at least in part on the set of results, that indicates measurements of the quality metrics over a period of time.
Owner:MICROSOFT TECH LICENSING LLC
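
A minimal sketch of measuring quality metrics and reporting them over time, assuming blank lines separate paragraphs and ".", "!" or "?" end sentences. The function names and the in-memory history list are hypothetical.

```python
import re
from datetime import date


def quality_metrics(text):
    """A few simple metrics: word, sentence and paragraph counts."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {"words": len(text.split()), "sentences": len(sentences),
            "paragraphs": len(paragraphs)}


def record(history, when, text):
    """Analyze a document version and store the result with its date."""
    history.append((when, quality_metrics(text)))


def report(history, metric):
    """Measurements of one metric over time, oldest first."""
    return [(when, m[metric]) for when, m in sorted(history, key=lambda h: h[0])]


history = []
record(history, date(2024, 1, 1), "Draft one.")
record(history, date(2024, 2, 1), "Draft one. Now longer.\n\nA second paragraph!")
print(report(history, "words"))
```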

Method and System for Accurately Detecting, Extracting and Representing Redacted Text Blocks in a Document

A computer-implemented method, system and computer program product are provided for automatically detecting redaction blocks in an image file document. The document is analyzed to identify any redaction block areas, and location information is then detected for each redaction block area identified in the document. Based on the location information for each redaction block area and each text fragment in the document, the redaction block areas may be mapped to any associated text fragments.
Owner:IBM CORP

Computing device and file searching method using the computing device

In a file searching method using a computing device, the computing device connects to one or more terminal devices. When a file name is input from one of the terminal devices, an electronic file is obtained from a database, and the file is analyzed to obtain its title and text content. One or more keywords are extracted from the text content of the file using a term frequency-inverse document frequency (TF-IDF) rule. One or more terms of interest are selected from the keywords according to an importance factor of each keyword. The method obtains search results from the database according to the terms of interest and ranks the files according to the degree of relevance between each file in the search results and the terms of interest. The computing device sends the ranked files to the terminal device.
Owner:GDS SOFTWARE SHENZHEN +1
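
The keyword extraction and ranking steps can be sketched as below, assuming whitespace tokenization, the smoothed IDF variant log((1 + N) / (1 + df)) + 1, and relevance measured as the number of occurrences of the selected terms. The function names and the top-N cutoff (standing in for the "importance factor") are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(doc, corpus, top_n=3):
    """Return the top_n words of doc ranked by TF-IDF against corpus."""
    words = doc.lower().split()
    tf = Counter(words)
    corpus_sets = [set(d.lower().split()) for d in corpus]
    n_docs = len(corpus_sets)

    def idf(word):
        df = sum(1 for s in corpus_sets if word in s)
        return math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF

    scores = {w: (c / len(words)) * idf(w) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


def rank_files(files, terms):
    """files: dict name -> text. Most occurrences of the terms ranks first."""
    def relevance(name):
        counts = Counter(files[name].lower().split())
        return sum(counts[t] for t in terms)
    return sorted(files, key=relevance, reverse=True)
```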