498 results about "Document classification" patented technology

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification.

Document similarity detection and classification system

A document similarity detection and classification system is presented. The system employs a case-based method of classifying electronically distributed documents in which content chunks of an unclassified document are compared to the sets of content chunks comprising each of a set of previously classified sample documents in order to determine a highest level of resemblance between an unclassified document and any of a set of previously classified documents. The sample documents have been manually reviewed and annotated to distinguish document classifications and to distinguish significant content chunks from insignificant content chunks. These annotations are used in the similarity comparison process. If a significant resemblance level exceeding a predetermined threshold is detected, the classification of the most significantly resembling sample document is assigned to the unclassified document. Sample documents may be acquired to build and maintain a repository of sample documents by detecting unclassified documents that are similar to other unclassified documents and subjecting at least some similar documents to a manual review and classification process. In a preferred embodiment the invention may be used to classify email messages in support of a message filtering or classification objective.
Owner:GLASS JEFFREY B MR
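
The threshold-based comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the three-word chunking, the containment-style resemblance measure, the 0.5 threshold, and every function name here are assumptions made for illustration. Each sample is represented only by the chunks a reviewer marked as significant.

```python
def content_chunks(text, size=3):
    """Split text into overlapping word windows (an assumed chunking scheme)."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(doc_chunks, significant_chunks):
    """Fraction of a sample's significant chunks that also occur in the document."""
    if not significant_chunks:
        return 0.0
    return len(doc_chunks & significant_chunks) / len(significant_chunks)


def classify_by_resemblance(text, samples, threshold=0.5):
    """samples: list of (label, significant_chunks) pairs from manual review.

    Returns the label of the best-matching sample if its resemblance
    exceeds the threshold, otherwise None (document stays unclassified).
    """
    doc = content_chunks(text)
    best_label, best_score = None, 0.0
    for label, significant in samples:
        score = resemblance(doc, significant)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score > threshold else None


samples = [
    ("spam", content_chunks("win a free prize now")),
    ("ham", content_chunks("meeting agenda for monday morning")),
]
print(classify_by_resemblance("hello you can win a free prize now today", samples))
```

A document that matches no sample above the threshold is left unclassified, which is the point at which the abstract's manual review step would apply.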

Document Searching Tool and Method

A method of automatically searching through a store of electronic documents comprises controlling a user interface to permit (410) a user to enter a search term, carrying out a search using the search term, categorising the documents returned by the search into a plurality of distinct categories, and controlling the user interface to present in a left-hand panel (512) the plurality of distinct categories and in a right-hand panel (514) the documents returned by the search, or references thereto, in a grouped manner such that documents, or references thereto, of a particular category are grouped together, wherein the categories are selected in dependence upon the search term.
Owner:BRITISH TELECOMM PLC

Method and system for document classification or search using discrete words

A method of operating a computerized document search system where information is matched against a database containing documents in response to user queries includes receiving a query identifying a source document that has information content related to the documents within the database. Important words within the source document are detected automatically, where at least one of the important words has been processed using at least two dictionary functions consisting of Derived Words, Acronym, Word Capitalization, and Hyphenation. An importance value is generated for important words in a processed document using a WordRatio and at least one of a selected set of values. A score is generated for a processed document based partly on the importance value of at least one important word in that document. A document list is created for identifying documents that are related to a source document.
Owner:NEXTGEN DATACOM

Document classification system and method for classifying a document according to contents of the document

A document classification system and method reflects an operator's intention in the result of classifying a document so that an accurate classification result can be achieved. The document to be classified has contents containing a plurality of items. At least one of the items contained in the document is designated. The document data is converted into converted data so that the converted data contains only data corresponding to the designated item. Classification of the document is done by using the converted data.
Owner:RICOH KK

Document categorisation system

A document categorization system, including a clusterer for generating clusters of related electronic documents based on features extracted from the documents, and a filter module for generating a filter on the basis of the clusters to categorize further documents received by the system. The system may include an editor for manually browsing and modifying the clusters. The categorization of the documents is based on n-grams, which are used to determine significant features of the documents. The system includes a trend analyzer for determining trends of changing document categories over time, and for identifying novel clusters. The system may be implemented as a plug-in module for a spreadsheet application for permitting one-off or ongoing analysis of text entries in a worksheet.
Owner:TELSTRA CORPORATION LIMITD

Template identification with differential caching

It is desirable to send documents to a user in such a way as to minimize the bandwidth and other computer resources required. To this end, a document may be categorized as (1) delta information (information that changes rapidly), (2) sub-template information (information that changes less frequently) and (3) template information, which changes very seldom. The template information and sub-template information are compressed and cached at a site remote from the requesting party. Compressing and caching both sub-template information and template information results in a significant savings of bandwidth and computing resources, such as would be required if the sub-template information were treated as delta information and were not stored in a cache as is the case in the prior art. This savings is enhanced when the compressed template and sub-template information are sent to a large number of users.
Owner:DIGITAL RIVER INC

Method and apparatus for document clustering and document sketching

A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, that automatically classifies new documents into the right clusters, and that may change the number or parameters of clusters under various circumstances. A second embodiment of the invention provides a technique for comparing two documents, in which a fingerprint or sketch of each document is computed. In particular, this embodiment of the invention uses a specific algorithm to compute the document's fingerprint. One embodiment uses a sentence in the document as a logical delimiter or window from which significant words are extracted and, thereafter, a hash is computed of all pair-wise permutations. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and the inverse document frequency.
Owner:EBRARY
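
As a rough illustration of the sketching idea in this abstract, the Python snippet below treats each sentence as a window, keeps the highest-weighted words, and hashes every ordered pair of them. The details are assumptions, not the patent's algorithm: the regex tokenizer, the `top_k` cutoff, the caller-supplied `idf` table, and the truncated SHA-1 digest are all hypothetical choices.

```python
import hashlib
import re
from collections import Counter
from itertools import permutations


def sketch(text, idf, top_k=3):
    """Return a set of hashes, one per ordered pair of significant words per sentence.

    idf: dict mapping word -> inverse document frequency (assumed to be
    precomputed elsewhere); unknown words default to a weight factor of 1.0.
    """
    term_freq = Counter(re.findall(r"[a-z]+", text.lower()))
    fingerprints = set()
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        # Rank by TF * IDF, breaking ties alphabetically for determinism.
        ranked = sorted(words, key=lambda w: (-term_freq[w] * idf.get(w, 1.0), w))
        for first, second in permutations(ranked[:top_k], 2):
            digest = hashlib.sha1(f"{first} {second}".encode("utf-8")).hexdigest()
            fingerprints.add(digest[:16])
    return fingerprints


doc_a = "The cat sat on the mat. Dogs bark loudly."
doc_b = "Dogs bark loudly. Birds sing."
shared = sketch(doc_a, {}) & sketch(doc_b, {})
print(len(shared), "fingerprints in common")
```

Two documents can then be compared by the size of the overlap between their fingerprint sets; how that overlap is turned into a similarity decision is left open here.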

System and method for adaptive text recommendation

A network system provides a real-time adaptive recommendation set of documents with a high statistical measure of relevancy to the requestor device. The recommendation set is optimized based on analyzing text of documents of the interest set, categorizing these documents into clusters, extracting keywords representing the themes or concepts of documents in the clusters, and filtering a population of eligible documents accessible to the system utilizing site and/or Internet-wide search engines. The system is either automatically or manually invoked and it develops and presents the recommendation set in real time. The recommendation set may be presented as a greeting, notification, alert, HTML fragment, fax, voicemail, or automatic classification or routing of customer e-mail, personal e-mail, job postings, and offers for sale or exchange.
Owner:NEVMANN MILIESSA +1

Personalized navigation trees

A method for constructing and maintaining a navigation tree based on external document classifiers is provided. In one embodiment, based on the returned category labels from the classifiers, a navigation tree is constructed by taking usability and user preferences into consideration. Control parameters and algorithms are provided for inserting documents into and deleting documents from the navigation tree, and for splitting and merging nodes of the navigation tree.
Owner:NEC CORP

Methods for display, notification, and interaction with prioritized messages

Prioritization of documents, such as email messages, is disclosed. In one embodiment, a computer-implemented method first receives a document. The method generates a priority of the document, based on a document classifier such as a Bayesian classifier or a support-vector machine classifier. The method then outputs the priority. In one embodiment, the method includes alerting the user based on an expected loss of non-review of the document as compared to an expected cost of alerting the user of the document, at a current time. Several methods are reviewed for display and interaction that leverage the assignment of priorities to documents, including a means for guiding visual and auditory actions by priority of incoming messages. Other aspects of the machinery include a special viewer that allows users to scope a list of email sorted by priority so that it can include varying histories of time, to annotate a list of messages with color or icons based on the automatically assigned priority, to harness the priority to control the level of detail provided in a summarization of a document, and to use a priority threshold to invoke an interaction context that lasts for some period of time that can be dictated by the priority of the incoming message.
Owner:MICROSOFT TECH LICENSING LLC

Document retrieval system with access control

An electronic document retrieval system and method for a collection of information distributed over a network having documents stored in web or document servers in which an access control list relates user identification to documents to which a user has access. No access control lists are contained in the documents themselves nor are comparisons made between lists of users, with their access levels, and the classifications of documents. Rather, by the use of URLs or pointers, it is possible to associate every document to which a user has access with the user identification number or code. URLs have a hierarchical format which allows partial URLs to indicate levels of access. HTTP protocol, FTP and CGI protocol employ URL calls for documents and can use the access control method and system of the present invention. When a search query is applied to a query server, a list of hits is returned, together with pertinent URLs. The query server consults each access control list associated with each document server, to present to the user only those URLs for which he has a proper access level. Other URLs for which the user does not have proper access are kept hidden from the user.
Owner:GOOGLE LLC
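
The filtering step in this abstract, where partial URLs stand for levels of access, might look roughly like the Python sketch below. The `acl` layout (user ID mapped to a list of partial URLs), the path-segment matching rule, the function names, and the example.com addresses are assumptions for illustration; the abstract does not specify them.

```python
def url_in_scope(url, partial_url):
    """True if url equals the partial URL or lies beneath it in the hierarchy."""
    base = partial_url.rstrip("/")
    return url == base or url.startswith(base + "/")


def filter_hits(user_id, hits, acl):
    """Keep only the hit URLs covered by one of the user's partial URLs.

    acl: dict mapping user ID -> list of partial URLs (hypothetical layout).
    Hits outside the user's scope are simply omitted, i.e. kept hidden.
    """
    allowed = acl.get(user_id, [])
    return [url for url in hits if any(url_in_scope(url, p) for p in allowed)]


acl = {
    "u1": ["http://docs.example.com/eng"],
    "u2": ["http://docs.example.com/"],
}
hits = [
    "http://docs.example.com/eng/spec.html",
    "http://docs.example.com/engineering/x.html",
    "http://docs.example.com/hr/pay.html",
]
print(filter_hits("u1", hits, acl))
```

Matching on whole path segments rather than raw string prefixes keeps `/eng` from also granting access to `/engineering`.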

System and method for automatically classifying text

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.
Owner:CONSONA CRM A WASHINGTON CORP
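
The scoring rule in this abstract (a sum of feature count times feature weight per category, followed by a threshold test) can be sketched in a few lines of Python. The whitespace tokenizer, the example weights, and the function names are hypothetical, and the categories are sorted here simply by descending score because the abstract does not define "by perspective".

```python
from collections import Counter


def category_scores(text, weights):
    """Score each category as sum(count of feature in text * feature weight).

    weights: dict mapping category -> {feature: weight} (assumed layout).
    """
    counts = Counter(text.lower().split())
    return {
        category: sum(counts[feature] * weight for feature, weight in features.items())
        for category, features in weights.items()
    }


def classify_by_weights(text, weights, threshold):
    """Return the categories whose score exceeds the threshold, best first."""
    scores = category_scores(text, weights)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [category for category, score in ranked if score > threshold]


weights = {
    "billing": {"invoice": 2.0, "refund": 1.5},
    "support": {"crash": 2.0, "error": 1.0},
}
print(classify_by_weights("invoice invoice refund error", weights, threshold=3.0))
```

In this example "billing" scores 2 x 2.0 + 1 x 1.5 = 5.5 and "support" scores 1.0, so only "billing" clears a threshold of 3.0.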

System and method for document categorization

The present invention provides methods and systems for automatic categorization of documents. More specifically, the present invention provides for the automatic assignment of a set of pre-defined topics to a set of documents.
Owner:STEICHEN TERRIL JOHN

Theme-based system and method for classifying documents

A classification system (10) having a controller (12), a document storage memory (14), and a document input (16) is used to classify documents (20). The controller (12) is programmed to generate a theme score from a plurality of source documents in a plurality of predefined source classifications. A theme score is also generated for the unclassified document. The unclassified document theme score and the theme scores for the various classes are compared and the unclassified document is classified into the classification having the nearest theme score.
Owner:FORD GLOBAL TECH LLC

System and method for automatically classifying text

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.
Owner:AVOLIN LLC

Method and apparatus for explaining categorization decisions

Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.
Owner:XEROX CORP

Text sentiment classification method facing Chinese Web comments

The invention belongs to the field of data processing technology and discloses a text sentiment classification method facing Chinese Web comments. The text sentiment classification method includes a training process and a classification process. The training process includes the steps of preprocessing the training text, selecting features, representing the text as a vector, and training to obtain a classifier. The classification process includes the steps of preprocessing the test text, selecting features, classifying with the classifier, and outputting a classification result. On the basis of an original document classification method, document frequency (DF) and information gain (IG) are used, and a sentiment dictionary of negative words, degree adverbs and dynamic sentiment words is built to distinguish the sentiment tendency of Chinese feature words, select feature words, calculate a feature weight value and build a feature vector. Moreover, a Naive Bayes classification algorithm is used for training to obtain the classifier, carrying out sentiment classification on the text, providing effective data mining for users and then carrying out analysis processing.
Owner:WUXI NANLIGONG TECH DEV +1
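
The train-then-classify flow in this abstract can be illustrated with a small multinomial Naive Bayes classifier in Python. This is a generic textbook version with add-one smoothing, not the patented method: it assumes the text is already segmented into tokens, and it leaves out the DF/IG feature selection and the sentiment dictionary the abstract describes. All names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns counts needed for prediction."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_logp = None, -math.inf
    for label in sorted(class_docs):
        logp = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # tokens never seen in training are ignored
                logp += math.log((word_counts[label][token] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


training = [
    (["good", "great"], "pos"),
    (["great", "love"], "pos"),
    (["bad", "awful"], "neg"),
    (["bad", "hate"], "neg"),
]
model = train_nb(training)
print(predict_nb(model, ["great", "good"]))
```

For Chinese comments, a word segmentation step would have to run before `train_nb` and `predict_nb`, since this sketch expects a token list as input.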

System And Method for Adaptive Text Recommendation

A network system provides a real-time adaptive recommendation set of documents with a high statistical measure of relevancy to the requestor device. The recommendation set is optimized based on analyzing text of documents of the interest set, categorizing these documents into clusters, extracting keywords representing the themes or concepts of documents in the clusters, and filtering a population of eligible documents accessible to the system utilizing site and/or Internet-wide search engines. The system is either automatically or manually invoked and it develops and presents the recommendation set in real time. The recommendation set may be presented as a greeting, notification, alert, HTML fragment, fax, voicemail, or automatic classification or routing of customer e-mail, personal e-mail, job postings, and offers for sale or exchange.
Owner:SONICWALL US HLDG INC

System and method for classification of documents

The invention provides a classification engine for classifying documents that makes use of functions included in a similarity search engine. The classification engine executes a classify command from a client that makes use of similarity search results, and rules files, classes files, and a classification profile embedded in the classification command. When the classification engine receives a classify command from a client, it retrieves a classification profile and input documents to be classified, sends extracted values from the input documents based on anchor values to an XML transformation engine to obtain a search schema, requests a similarity search by a search manager to determine the similarity between input documents and anchor values, and classifies the input documents according to the rules files, classes files, and the classification profile. The client is then notified that the classify command has been completed and the classification results are stored in a database.
Owner:FAIR ISAAC & CO INC

Personalization engine for classifying unstructured documents

Unstructured electronic documents are classified for profiling and targeting users for additional relevant content. Behavioral data is gathered from user activity, and user documents and actions are categorized. Profile information is combined with collaborative and editorial data to provide users with credible information regarding products. Author-generated document classification information is analyzed and assigned a first taxonomic noun to characterize the document. User-generated tags characterizing a portion of the document are assigned a second taxonomic noun. Search terms that resulted in the user accessing the document are identified and assigned a third taxonomic noun. Attributes related to how the document was accessed are evaluated and assigned a fourth taxonomic noun. The document is processed using pattern rules to extract a fifth taxonomic noun. The taxonomic nouns are aggregated to determine term vectors representing the document, and the document is categorized using the term vectors, the taxonomic nouns, or the author-generated classification.
Owner:CBS INTERACTIVE

Image-based indexing and classification in image databases

The invention enables the digital management of large-scale image databases, to efficiently classify and index image data independent of language. Complex processing is required only on reduced and operably small subsets of the entire collection, thereby scaling effectively to large document collections. Embodiments of the present invention provide image-based classification and retrieval of documents based on image recognition (e.g., signatures, logos, stamps, or word-spotting) in real time for large datasets, such as those containing millions of documents.
Owner:ILLINOIS INSTITUTE OF TECHNOLOGY

Differential LSI space-based probabilistic document classifier

A computerized method for automatic document classification based on a combined use of the projection and the distance of the differential document vectors to the differential latent semantics index (DLSI) spaces. The method includes the setting up of a DLSI space-based classifier to be stored in computer storage and the use of such classifier by a computer to evaluate the possibility of a document belonging to a given cluster using a posteriori probability function and to classify the document in the cluster. The classifier is effective in operating on very large numbers of documents such as with document retrieval systems over a distributed computer network.
Owner:SUNFLARE CO LTD

Browser for managing documents

A method for classifying documents to a browser is disclosed, the method comprising establishing communication with a selected network address; displaying at least one document with the browser; moving the at least one document inside the browser; and generating a document vector grouping a plurality of documents.
Owner:APPLE INC