498 results about "Document classification" patented technology

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification.

Document similarity detection and classification system

Inactive · US20050060643A1 · Natural language data processing, Data switching networks, Document similarity, Document preparation

A document similarity detection and classification system is presented. The system employs a case-based method of classifying electronically distributed documents in which content chunks of an unclassified document are compared to the sets of content chunks comprising each of a set of previously classified sample documents in order to determine a highest level of resemblance between an unclassified document and any of a set of previously classified documents. The sample documents have been manually reviewed and annotated to distinguish document classifications and to distinguish significant content chunks from insignificant content chunks. These annotations are used in the similarity comparison process. If a significant resemblance level exceeding a predetermined threshold is detected, the classification of the most significantly resembling sample document is assigned to the unclassified document. Sample documents may be acquired to build and maintain a repository of sample documents by detecting unclassified documents that are similar to other unclassified documents and subjecting at least some similar documents to a manual review and classification process. In a preferred embodiment the invention may be used to classify email messages in support of a message filtering or classification objective.

Owner:GLASS JEFFREY B MR
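
The case-based matching in the abstract above can be illustrated with a minimal Python sketch. It assumes that "content chunks" are overlapping three-word shingles and that resemblance is the share of a sample's significant chunks found in the unclassified document. The chunking scheme, the 0.5 threshold, and the names `chunks`, `resemblance`, and `classify_by_resemblance` are hypothetical, not taken from the patent.

```python
from typing import Dict, List, Optional, Set, Tuple

Chunk = Tuple[str, ...]

def chunks(text: str, size: int = 3) -> Set[Chunk]:
    """Split text into overlapping word n-grams, an assumed stand-in for 'content chunks'."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def resemblance(doc: Set[Chunk], significant: Set[Chunk]) -> float:
    """Share of a sample's significant chunks that also occur in the unclassified document."""
    return len(doc & significant) / len(significant) if significant else 0.0

def classify_by_resemblance(text: str, samples: List[Dict],
                            threshold: float = 0.5) -> Optional[str]:
    """Assign the label of the most resembling sample, or None if no sample clears the threshold."""
    doc = chunks(text)
    best_label, best_score = None, 0.0
    for sample in samples:
        # Insignificant chunks were filtered out during manual annotation, so only
        # the sample's "significant" set takes part in the comparison.
        score = resemblance(doc, sample["significant"])
        if score > best_score:
            best_label, best_score = sample["label"], score
    return best_label if best_score > threshold else None
```

For example, with a sample labelled "spam" whose significant chunks come from "claim your free prize now", the message "hello claim your free prize now today" contains all three of that sample's chunks and is labelled "spam", while an unrelated message scores zero, stays unclassified (None), and would be a candidate for the manual review step.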

Document Searching Tool and Method

Active · US20080140657A1 · Digital data information retrieval, Digital data processing details, Electronic document, User input

A method of automatically searching through a store of electronic documents comprises controlling a user interface to permit (410) a user to enter a search term, carrying out a search using the search term, categorising the documents returned by the search into a plurality of distinct categories, and controlling the user interface to present in a left-hand panel (512) the plurality of distinct categories and in a right-hand panel (514) the documents returned by the search, or references thereto, in a grouped manner such that documents, or references thereto, of a particular category are grouped together, wherein the categories are selected in dependence upon the search term.

Owner:BRITISH TELECOMM PLC

Method and system for document classification or search using discrete words

Active · US20120117082A1 · Quick match, Improve versatility, Digital data information retrieval, Digital data processing details, Document preparation, Documentation

A method of operating a computerized document search system where information is matched against a database containing documents in response to user queries includes receiving a query identifying a source document that has information content related to the documents within the database. Important words within the source document are detected automatically, where at least one of the important words has been processed using at least two dictionary functions consisting of Derived Words, Acronym, Word Capitalization, and Hyphenation. An importance value is generated for important words in a processed document using a WordRatio and at least one of a selected set of values. A score is generated for a processed document based partly on the importance value of at least one important word in that document. A document list is created for identifying documents that are related to a source document.

Owner:NEXTGEN DATACOM

Information classification paradigm

Inactive · US7529748B2 · Complex analysis, Data processing applications, Digital data information retrieval, Document preparation, Library science

A mechanism classifies source documents into one of two categories: likely to contain desired information or unlikely to contain desired information. Generally, rules-based classification is used in conjunction with deeper analysis, using advanced techniques, of difficult cases. The rules-based classification is generally good for eliminating cases from further consideration and for identifying documents of interest based on readily discernible relationships between data or on the presence or absence of data. The deeper analysis is used to uncover more complex relationships between data that may identify documents of interest. Some portions of the process may use the entire document, while others may use only part of it.

Owner:MICROSOFT TECH LICENSING LLC
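
The two-stage paradigm in the abstract above, cheap rules first and deeper analysis only for difficult cases, might look like the following sketch. The specific rules (an invoice-number pattern, a minimum length) and the placeholder deep check (two keywords within ten words of each other) are invented for illustration; the abstract does not specify either stage.

```python
import re
from typing import Callable, List, Optional

Rule = Callable[[str], Optional[bool]]

def rule_too_short(text: str) -> Optional[bool]:
    """Hypothetical elimination rule: very short documents are unlikely to be of interest."""
    return False if len(text.split()) < 5 else None

def rule_invoice_number(text: str) -> Optional[bool]:
    """Hypothetical acceptance rule: an explicit invoice number marks a document of interest."""
    return True if re.search(r"\binvoice\s+#?\d+", text, re.IGNORECASE) else None

RULES: List[Rule] = [rule_too_short, rule_invoice_number]

def deep_analysis(text: str) -> bool:
    """Placeholder for the costlier stage: 'amount' and 'due' within ten words of each other."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    amounts = [i for i, w in enumerate(words) if w == "amount"]
    dues = [i for i, w in enumerate(words) if w == "due"]
    return any(abs(a - d) <= 10 for a in amounts for d in dues)

def cascade_classify(text: str, rules: List[Rule] = RULES,
                     deep: Callable[[str], bool] = deep_analysis) -> bool:
    """True = likely to contain the desired information, False = unlikely."""
    for rule in rules:
        verdict = rule(text)
        if verdict is not None:
            return verdict      # a rule decided, so the expensive stage is skipped
    return deep(text)           # only difficult cases reach the deeper analysis
```

Each rule returns True, False, or None; the first definite verdict short-circuits the cascade, so only documents that no rule can decide pay for the costlier analysis.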

Document classification system and method for classifying a document according to contents of the document

Inactive · US7194471B1 · Eliminate the problem, The result is accurate, Data processing applications, Digital data information retrieval, Document preparation, Documentation

A document classification system and method reflects the operator's intention in the classification result so that an accurate classification can be achieved. The contents of the document to be classified contain a plurality of items, at least one of which is designated. The document data is converted so that the converted data contains only the data corresponding to the designated item, and the document is classified using the converted data.

Owner:RICOH KK

Document categorisation system

Inactive · US7971150B2 · Digital data information retrieval, Indoor games, Electronic document, Documentation procedure

A document categorization system, including a clusterer for generating clusters of related electronic documents based on features extracted from the documents, and a filter module for generating a filter on the basis of the clusters to categorize further documents received by the system. The system may include an editor for manually browsing and modifying the clusters. The categorization of the documents is based on n-grams, which are used to determine significant features of the documents. The system includes a trend analyzer for determining trends of changing document categories over time, and for identifying novel clusters. The system may be implemented as a plug-in module for a spreadsheet application for permitting one-off or ongoing analysis of text entries in a worksheet.

Owner:TELSTRA CORPORATION LIMITD
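
One way to picture the filter-generation step in the abstract above is a nearest-filter categorizer built from existing clusters. This sketch assumes the clusters have already been produced by some clusterer (not shown), uses character trigrams as the n-gram features, and scores an n-gram by how concentrated it is in one cluster; these choices and the function names are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Set

def ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram counts (trigrams assumed; the abstract only says 'n-grams')."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def build_filter(clusters: Dict[str, List[str]], top_k: int = 20) -> Dict[str, Set[str]]:
    """Keep, per cluster, the top_k n-grams that are frequent in it and rare elsewhere."""
    counts = {name: sum((ngrams(d) for d in docs), Counter()) for name, docs in clusters.items()}
    total = sum(counts.values(), Counter())
    filt = {}
    for name, c in counts.items():
        # concentration in this cluster (c / total) times frequency in this cluster
        ranked = sorted(c, key=lambda g: (c[g] / total[g]) * c[g], reverse=True)
        filt[name] = set(ranked[:top_k])
    return filt

def categorize(text: str, filt: Dict[str, Set[str]]) -> str:
    """Assign a new document to the cluster whose filter shares the most n-grams with it."""
    grams = set(ngrams(text))
    return max(filt, key=lambda name: len(grams & filt[name]))
```

With one cluster of billing complaints and one of network complaints, a new entry such as "why is my bill so high" shares several trigrams with the billing filter (for example "bil" and "ill") and none with the network filter, so it is routed to billing.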

Template identification with differential caching

Active · US7092997B1 · Save bandwidth, Save resources, Multiple digital computer combinations, Transmission, Computer resources, Documentation procedure

It is desirable to send documents to a user in a way that minimizes the bandwidth and other computer resources required. To this end, a document may be categorized into (1) delta information, which changes rapidly, (2) sub-template information, which changes less frequently, and (3) template information, which changes very seldom. The template and sub-template information are compressed and cached at a site remote from the requesting party. Compressing and caching both the sub-template and the template information yields significant savings in bandwidth and computing resources compared with the prior art, in which the sub-template information is treated as delta information and is not cached. The savings grow when the compressed template and sub-template information is sent to a large number of users.

Owner:DIGITAL RIVER INC
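
The three-way split in the abstract above can be sketched by measuring how often each section of a page changed across earlier versions. The change-rate thresholds, the section names, the dictionary standing in for the remote cache, and the CRC-based cache keys are all assumptions; the abstract does not say how information is categorized or keyed.

```python
import zlib
from typing import Dict, List, Union

Payload = Dict[str, Union[str, Dict[str, str]]]

def categorize_sections(versions: List[Dict[str, str]],
                        rapid: float = 0.5, seldom: float = 0.1) -> Dict[str, str]:
    """Label each section by the fraction of version-to-version transitions in which it changed."""
    labels = {}
    transitions = max(len(versions) - 1, 1)
    for name in versions[0]:
        changes = sum(a[name] != b[name] for a, b in zip(versions, versions[1:]))
        rate = changes / transitions
        if rate >= rapid:
            labels[name] = "delta"          # changes rapidly: always sent inline
        elif rate <= seldom:
            labels[name] = "template"       # changes very seldom
        else:
            labels[name] = "sub-template"   # changes occasionally
    return labels

def build_response(page: Dict[str, str], labels: Dict[str, str],
                   remote_cache: Dict[str, bytes]) -> Payload:
    """Send delta sections inline; compress and cache everything else, sending only a cache key."""
    payload: Payload = {}
    for name, text in page.items():
        if labels[name] == "delta":
            payload[name] = text
        else:
            key = f"{name}:{zlib.crc32(text.encode()):08x}"
            if key not in remote_cache:
                remote_cache[key] = zlib.compress(text.encode())
            payload[name] = {"cached": key}
    return payload
```

With five versions of a page in which the header never changes, the promotion changes once, and the price changes every time, the sketch labels them template, sub-template, and delta respectively; only the price travels inline, and a second response reuses the two cached entries instead of adding new ones.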

Method and apparatus for document clustering and document sketching

Active · US7433869B2 · Data processing applications, Digital data information retrieval, Document preparation, Documentation

A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, automatically assigns new documents to the appropriate clusters, and may change the number or parameters of the clusters under various circumstances. A second embodiment provides a technique for comparing two documents in which a fingerprint, or sketch, of each document is computed using a specific algorithm. In one embodiment, each sentence of the document serves as a logical delimiter, or window, from which significant words are extracted; a hash is then computed of all pairwise permutations of those words. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and inverse document frequency.

Owner:EBRARY
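
The sketching embodiment can be approximated as follows. Each sentence acts as the window, the top-weighted words in it (by a smoothed tf-idf, an assumption) are kept, every ordered pair of those words is hashed, and two sketches are compared with Jaccard resemblance. The per-sentence word count, the truncated SHA-1 hash, and the Jaccard comparison are illustrative choices, not details from the abstract.

```python
import hashlib
import math
import re
from collections import Counter
from itertools import permutations
from typing import List, Set

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def sketch(doc: str, corpus: List[str], per_sentence: int = 3) -> Set[str]:
    """Fingerprint a document: hash every ordered pair of the top-weighted words in each sentence."""
    n_docs = len(corpus)
    df = Counter(w for d in corpus for w in set(tokenize(d)))
    tf = Counter(tokenize(doc))

    def weight(word: str) -> float:
        # tf-idf with add-one smoothing on document frequency (an assumed weighting)
        return tf[word] * math.log((1 + n_docs) / (1 + df[word]))

    fingerprint: Set[str] = set()
    for sentence in re.split(r"[.!?]+", doc):   # each sentence is one window
        words = sorted(set(tokenize(sentence)), key=lambda w: (-weight(w), w))[:per_sentence]
        for a, b in permutations(words, 2):     # all ordered pairs of significant words
            fingerprint.add(hashlib.sha1(f"{a}|{b}".encode()).hexdigest()[:16])
    return fingerprint

def resemblance(s1: Set[str], s2: Set[str]) -> float:
    """Jaccard similarity of two sketches."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0
```

A document compared with itself yields a resemblance of 1.0, and a document with no shared significant words yields 0.0; near-duplicates fall in between.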

System and method for adaptive text recommendation

Inactive · US6845374B1 · Easy to measure, Data processing applications, Relational databases, Networked system, Document preparation

A network system provides the requestor device with a real-time, adaptive recommendation set of documents that have a high statistical measure of relevance. The recommendation set is optimized by analyzing the text of the documents in the interest set, categorizing these documents into clusters, extracting keywords that represent the themes or concepts of the documents in each cluster, and filtering the population of eligible documents accessible to the system using site and/or Internet-wide search engines. The system is invoked either automatically or manually, and it develops and presents the recommendation set in real time. The recommendation set may be presented as a greeting, notification, alert, HTML fragment, fax, or voicemail, or as automatic classification or routing of customer e-mail, personal e-mail, job postings, and offers for sale or exchange.

Owner:NEVMANN MILIESSA +1

Personalized navigation trees

Inactive · US6393427B1 · Minimize the number, Data processing applications, Digital data processing details, Personalization, Usability

A method for constructing and maintaining a navigation tree based on external document classifiers is provided. In one embodiment, a navigation tree is constructed from the category labels returned by the classifiers, taking usability and user preferences into consideration. Control parameters and algorithms are provided for inserting documents into and deleting documents from the navigation tree, and for splitting and merging its nodes.

Owner:NEC CORP

Methods for display, notification, and interaction with prioritized messages

Inactive · US7120865B1 · Digital computer details, Transmission, Level of detail, Support vector machine classifier

Prioritization of documents, such as email messages, is disclosed. In one embodiment, a computer-implemented method first receives a document, generates a priority for it using a document classifier such as a Bayesian classifier or a support vector machine classifier, and then outputs the priority. In one embodiment, the method decides whether to alert the user by comparing the expected loss of not reviewing the document now with the expected cost of alerting the user to it at the current time. Several methods for display and interaction that leverage the assigned priorities are reviewed, including a means of guiding visual and auditory actions by the priority of incoming messages. Other aspects include a special viewer that lets users scope a priority-sorted list of email to cover varying spans of time; annotation of a message list with colors or icons based on the automatically assigned priority; use of the priority to control the level of detail in a document summary; and a priority threshold that invokes an interaction context whose duration can be dictated by the priority of the incoming message.

Owner:MICROSOFT TECH LICENSING LLC
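
The alerting decision in the abstract above compares an expected loss from leaving a message unread with the cost of interrupting the user. The sketch below assumes the classifier already supplies the probability that a message is urgent and that delay costs grow linearly with time; the cost figures and parameter names are hypothetical.

```python
def expected_loss_of_non_review(p_urgent: float, delay_minutes: float,
                                urgent_cost_per_min: float = 1.0,
                                normal_cost_per_min: float = 0.01) -> float:
    """Expected cost of leaving the message unread for delay_minutes (linear cost model assumed)."""
    # p_urgent is assumed to come from a document classifier (e.g. Bayesian or SVM), not shown here.
    per_minute = p_urgent * urgent_cost_per_min + (1 - p_urgent) * normal_cost_per_min
    return per_minute * delay_minutes

def should_alert(p_urgent: float, delay_minutes: float, alert_cost: float) -> bool:
    """Alert only when the expected loss of non-review exceeds the expected cost of interrupting."""
    return expected_loss_of_non_review(p_urgent, delay_minutes) > alert_cost
```

With a 30-minute delay and an alert cost of 5, a message with a 0.9 urgency probability accumulates an expected loss of about 27 and triggers an alert, while one at 0.05 accumulates under 2 and does not.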

Document retrieval system with access control

Inactive · US7031954B1 · Data processing applications, Digital data information retrieval, Electronic document, Uniform resource locator

An electronic document retrieval system and method for a collection of information distributed over a network, with documents stored on web or document servers, in which an access control list relates user identification to the documents to which each user has access. No access control lists are contained in the documents themselves, and no comparisons are made between lists of users, with their access levels, and the classifications of documents. Instead, URLs or pointers are used to associate every document to which a user has access with that user's identification number or code. Because URLs have a hierarchical format, partial URLs can indicate levels of access. HTTP, FTP, and CGI all use URL calls to request documents and can use the access control method and system of the present invention. When a search query is applied to a query server, a list of hits is returned together with the pertinent URLs. The query server consults the access control list associated with each document server and presents to the user only those URLs for which the user has the proper access level; URLs for which the user lacks proper access are kept hidden.

Owner:GOOGLE LLC
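
The query-server filtering described above can be sketched with access control lists that map user IDs to permitted URL prefixes, one list per document server, where a prefix stands for a level in the URL hierarchy. The data layout, the example URLs, and the function names are assumptions for illustration.

```python
from typing import Dict, List

def _under(url: str, prefix: str) -> bool:
    """True if url is the prefix itself or lies below it in the URL hierarchy."""
    prefix = prefix.rstrip("/")
    return url == prefix or url.startswith(prefix + "/")

def filter_hits(user_id: str, hits: List[str],
                acls: List[Dict[str, List[str]]]) -> List[str]:
    """Keep only hits the user may see; acls holds one user-to-prefixes mapping per document server."""
    prefixes = [p for acl in acls for p in acl.get(user_id, [])]
    return [url for url in hits if any(_under(url, p) for p in prefixes)]
```

Matching on whole path segments rather than raw string prefixes matters here: a grant for `/eng` should not expose `/engineering-secret`.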

Method of learning associations between documents and data sets

Inactive · US20060282442A1 · More disadvantage, Digital data processing details, Character and pattern recognition, Data set, Paper document

A method of learning associations between classes of documents and one or more structured data sets comprises a step of classifying an input document into a class selected from a predefined set of classes (step 115). One or more structured data sets are displayed (step 130), wherein the displayed structured data sets are dependent on association information for the class. One or more indications of changes to the displayed structured data sets are received (steps 815, 830, 845) and the association information for the class is amended (step 850) based on the received indications.

Owner:CANON KK

System and method for automatically classifying text

Inactive · US20060143175A1 · Digital data information retrieval, Natural language analysis, Degree of association, Paper document

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.

Owner:CONSONA CRM A WASHINGTON CORP
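
The scoring rule in the abstract above, the sum over features of each feature's count in the document times its weight in the category, followed by a threshold test, is short enough to write out. The tokenizer, the example weights, and the choice to return every category above the threshold, best first, are assumptions; the abstract's "perspective" grouping is not modelled.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Weights = Dict[str, Dict[str, float]]  # category -> {feature token: weight}

def category_scores(text: str, weights: Weights) -> List[Tuple[str, float]]:
    """Score each category as sum(count(feature) * weight(feature)), highest first."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    scores = {cat: sum(counts[tok] * w for tok, w in feats.items())
              for cat, feats in weights.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def classify_text(text: str, weights: Weights, threshold: float) -> List[str]:
    """Return every category whose score exceeds the threshold, best first."""
    return [cat for cat, score in category_scores(text, weights) if score > threshold]
```

For example, with billing weights of 2.0 for "invoice", 1.5 for "refund", and 1.0 for "charge", a message that mentions each of those words once scores 2.0 + 1.5 + 1.0 = 4.5 for billing; if it contains no technical terms, only billing clears a threshold of 1.0. Taking the first element of the result gives single-label behaviour.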

System and method for document categorization

Inactive · US7496567B1 · Data processing applications, Digital data information retrieval, Text categorization, Subject matter

The present invention provides methods and systems for automatic categorization of documents. More specifically, the present invention provides for the automatic assignment of a set of pre-defined topics to a set of documents.

Owner:STEICHEN TERRIL JOHN

Theme-based system and method for classifying documents

Inactive · US7376635B1 · Data processing applications, Digital data information retrieval, Documentation, Document classification

A classification system (10) having a controller (12), a document storage memory (14), and a document input (16) is used to classify documents (20). The controller (12) is programmed to generate a theme score for each of a plurality of predefined classes from a plurality of source documents. A theme score is also generated for the unclassified document. The unclassified document's theme score is compared with the theme scores for the various classes, and the unclassified document is classified into the class having the nearest theme score.

Owner:FORD GLOBAL TECH LLC

Methods and systems for automated semantic knowledge leveraging graph theoretic analysis and the inherent structure of communication

Inactive · US7571177B2 · Digital data information retrieval, Data processing applications, Graphics, Document analysis

A system that processes a collection of one or more documents and thereby constructs a knowledge base is described. The system applies innovative graph-theoretic analysis to documents, leveraging the inherent structure in communication. Through the automatically generated knowledge base, the system is able to provide intra-document analysis such as variable summarization and indexing, identification of key document concepts, better filtering and semantic-level relevance matching, context-dependent directories, document categorization, a better basis for natural language processing, and new knowledge and information through the amalgamation of the data (collection intelligence).

Owner:2028 INC

Method and apparatus for document clustering and document sketching

Active · US20070005589A1 · Digital data information retrieval, Data processing applications, Document preparation, Documentation

A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, automatically assigns new documents to the appropriate clusters, and may change the number or parameters of the clusters under various circumstances. A second embodiment provides a technique for comparing two documents in which a fingerprint, or sketch, of each document is computed using a specific algorithm. In one embodiment, each sentence of the document serves as a logical delimiter, or window, from which significant words are extracted; a hash is then computed of all pairwise permutations of those words. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and inverse document frequency.

Owner:EBRARY

System and method for automatically classifying text

Inactive · US7028250B2 · Digital data information retrieval, Natural language analysis, Text categorization, Degree of association

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.

Owner:AVOLIN LLC

Method and apparatus for explaining categorization decisions

Active · US7457808B2 · Character and pattern recognition, Special data processing applications, Algorithm, Model parameters

Feature selection is used to determine feature influence for a given categorization decision to identify those features in a categorized document that were important in classifying the document into one or more classes. In one embodiment, model parameters of a categorization model are used to determine the features that contributed to the categorization decision of a document. In another embodiment, the model parameters of the categorization model and the features of the categorized document are used to determine the features that contributed to the categorization decision of a document.

Owner:XEROX CORP
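
For a linear categorizer, the second embodiment (using both the model parameters and the document's features) reduces to ranking each feature's contribution: its value in the document times its weight for the chosen class. This is a minimal sketch under that linear-model assumption; the weights and feature values are made up.

```python
from typing import Dict, List, Tuple

def explain(doc_features: Dict[str, float], class_weights: Dict[str, float],
            k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k features that pushed the document toward the class, largest contribution first."""
    # Contribution of a feature = its value in the document times its weight for the class.
    contributions = {f: x * class_weights.get(f, 0.0) for f, x in doc_features.items()}
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [(f, c) for f, c in ranked[:k] if c > 0]
```

Features with zero or negative contribution are dropped, so the explanation lists only what pushed the document toward the class.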

Text sentiment classification method facing Chinese Web comments

Inactive · CN103116637A · Effective data mining, Effective classification, Special data processing applications, Feature vector, Classification methods

The invention belongs to the field of data processing technology and discloses a text sentiment classification method for Chinese Web comments. The method includes a training process and a classification process. The training process comprises preprocessing the training text, selecting features, representing the text as vectors, and training a classifier. The classification process comprises preprocessing the test text, selecting features, classifying with the trained classifier, and outputting the classification result. Building on an existing document classification method, the invention uses document frequency (DF) and information gain (IG), and builds a sentiment dictionary of negative words, degree adverbs, and dynamic sentiment words, in order to distinguish the sentiment tendency of Chinese feature words, select feature words, calculate feature weights, and build feature vectors. A Naive Bayes classification algorithm is then used to train the classifier, which performs sentiment classification on the text, providing users with effective data mining for subsequent analysis and processing.

Owner:WUXI NANLIGONG TECH DEV +1
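
The training and classification processes can be sketched with document-frequency feature selection and a multinomial Naive Bayes classifier with Laplace smoothing. The sketch assumes the comments are already word-segmented (Chinese text needs a segmentation step, not shown), uses English tokens for readability, omits information gain and degree adverbs, and models the negative-word dictionary as a small hypothetical set of negation words that mark the following token.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple

NEGATIONS = {"not", "no", "never"}  # hypothetical stand-in for the negative-word dictionary

Model = Tuple[Set[str], Counter, DefaultDict[str, Counter]]

def features(tokens: List[str]) -> List[str]:
    """Drop negation words and mark the token that follows one, so 'not good' differs from 'good'."""
    out, negate = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out

def train(docs: List[Tuple[List[str], str]], min_df: int = 1) -> Model:
    """Select features by document frequency, then count feature occurrences per class."""
    df = Counter(f for toks, _ in docs for f in set(features(toks)))
    vocab = {f for f, n in df.items() if n >= min_df}
    class_docs = Counter(label for _, label in docs)
    word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for toks, label in docs:
        word_counts[label].update(f for f in features(toks) if f in vocab)
    return vocab, class_docs, word_counts

def predict(tokens: List[str], model: Model) -> str:
    """Return the class with the highest log posterior under multinomial Naive Bayes."""
    vocab, class_docs, word_counts = model
    total_docs = sum(class_docs.values())
    best_label, best_lp = "", -math.inf
    for label, n in class_docs.items():
        lp = math.log(n / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for f in features(tokens):
            if f in vocab:
                lp += math.log((word_counts[label][f] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label
```

In a small training set where "not good" occurs only in a negative comment, the phrase becomes the single feature NOT_good, so a new comment reading "not good" is classified as negative even though "good" on its own leans positive.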

System And Method for Adaptive Text Recommendation

Inactive · US20080189253A1 · Data processing applications, Relational databases, Networked system, Document preparation

A network system provides the requestor device with a real-time, adaptive recommendation set of documents that have a high statistical measure of relevance. The recommendation set is optimized by analyzing the text of the documents in the interest set, categorizing these documents into clusters, extracting keywords that represent the themes or concepts of the documents in each cluster, and filtering the population of eligible documents accessible to the system using site and/or Internet-wide search engines. The system is invoked either automatically or manually, and it develops and presents the recommendation set in real time. The recommendation set may be presented as a greeting, notification, alert, HTML fragment, fax, or voicemail, or as automatic classification or routing of customer e-mail, personal e-mail, job postings, and offers for sale or exchange.

Owner:SONICWALL US HLDG INC

Event Based Document Sorter and Method

Inactive · US20090048927A1 · Database querying, Web data indexing, Electronic systems, Subject matter

Events and/or topics are studied and classified according to their temporal qualities to determine their relative state. The content of a document relating to such an event or topic is analyzed to identify temporal components. These components can be compared with their counterparts in other documents to establish a relative temporal order. The invention can be used in environments such as automated news aggregators, search engines, and other electronic systems that compile information having temporal qualities.

Owner:JOHN NICHOLAS & KRISTIN GROSS

System and method for classification of documents

Active · US7188107B2 · Simple to define, Easy maintenance, Data processing applications, Digital data information retrieval, Client-side, Document preparation

The invention provides a classification engine for classifying documents that makes use of functions included in a similarity search engine. The classification engine executes a classify command from a client using similarity search results together with rules files, classes files, and a classification profile embedded in the classify command. When the classification engine receives a classify command from a client, it retrieves a classification profile and the input documents to be classified, sends values extracted from the input documents based on anchor values to an XML transformation engine to obtain a search schema, requests a similarity search by a search manager to determine the similarity between the input documents and the anchor values, and classifies the input documents according to the rules files, classes files, and classification profile. The client is then notified that the classify command has been completed, and the classification results are stored in a database.

Owner:FAIR ISAAC & CO INC

Personalization engine for classifying unstructured documents

Active · US20090327243A1 · Optimize business processes, Increase content, Digital data processing details, Relational databases, Personalization, Electronic document

Unstructured electronic documents are classified for profiling and targeting users for additional relevant content. Behavioral data is gathered from user activity, and user documents and actions are categorized. Profile information is combined with collaborative and editorial data to provide users with credible information regarding products. Author-generated document classification information is analyzed and assigned a first taxonomic noun to characterize the document. User-generated tags characterizing a portion of the document are assigned a second taxonomic noun. Search terms that resulted in the user accessing the document are identified and assigned a third taxonomic noun. Attributes related to how the document was accessed are evaluated and assigned a fourth taxonomic noun. The document is processed using pattern rules to extract a fifth taxonomic noun. The taxonomic nouns are aggregated to determine term vectors representing the document, and the document is categorized using the term vectors, the taxonomic nouns, or the author-generated classification.

Owner:CBS INTERACTIVE

Image-based indexing and classification in image databases

Active · US20070211964A1 · Reduce complexity, Effective classification, Character and pattern recognition, Data set, Document preparation

The invention enables the digital management of large-scale image databases, efficiently classifying and indexing image data independently of language. Complex processing is required only on reduced, operably small subsets of the entire collection, which allows the approach to scale to large document collections. Embodiments of the present invention provide image-based classification and retrieval of documents based on image recognition (e.g., signatures, logos, stamps, or word spotting) in real time for large datasets, such as collections of millions of documents.

Owner:ILLINOIS INSTITUTE OF TECHNOLOGY

Differential LSI space-based probabilistic document classifier

Inactive | US7024400B2 | Retention characteristic; Improve adaptability; Digital computer details; Chaos models; Semantics; Combined use

A computerized method for automatic document classification based on the combined use of the projection and the distance of differential document vectors to the differential latent semantic index (DLSI) spaces. The method includes setting up a DLSI space-based classifier to be stored in computer storage and using that classifier, via a computer, to evaluate the probability that a document belongs to a given cluster with an a posteriori probability function, and to classify the document into that cluster. The classifier is effective when operating on very large numbers of documents, such as those handled by document retrieval systems over a distributed computer network.
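The abstract names the ingredients (differential document vectors, their projection onto and distance from DLSI spaces, and an a posteriori probability) without giving formulas. Below is a minimal pure-Python sketch under stated assumptions: each space (intra-differential: document minus its own cluster centroid; extra-differential: document minus another cluster's centroid) is approximated by a single leading direction found by power iteration, a rank-1 stand-in for a truncated SVD; the density is Gaussian along that direction times an isotropic Gaussian on the residual distance; and the intra/extra priors default to equal. All names, the toy vectors, and these modeling choices are illustrative, not the patented method.

```python
import math
from typing import Dict, List

Vector = List[float]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def centroid(docs: List[Vector]) -> Vector:
    return [sum(col) / len(docs) for col in zip(*docs)]


def leading_direction(vectors: List[Vector], iters: int = 200) -> Vector:
    """Power iteration for the dominant left singular vector of the matrix
    whose columns are `vectors` (a rank-1 stand-in for a truncated SVD)."""
    v = max(vectors, key=lambda d: dot(d, d))  # start from the longest vector
    for _ in range(iters):
        w = [0.0] * len(v)
        for d in vectors:  # w = (sum_j d_j d_j^T) v
            p = dot(d, v)
            w = [wi + p * di for wi, di in zip(w, d)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [wi / norm for wi in w]
    return v


class DifferentialSpace:
    """Density of differential vectors: variance `lam` along the leading
    direction u, isotropic variance `rho` in the residual dimensions."""

    def __init__(self, diffs: List[Vector]) -> None:
        n, self.dim = len(diffs), len(diffs[0])
        self.u = leading_direction(diffs)
        proj = sum(dot(d, self.u) ** 2 for d in diffs)
        total = sum(dot(d, d) for d in diffs)
        self.lam = max(proj / n, 1e-9)
        self.rho = max((total - proj) / (n * (self.dim - 1)), 1e-9)

    def log_likelihood(self, x: Vector) -> float:
        y = dot(x, self.u)                    # projection onto the space
        eps_sq = max(dot(x, x) - y * y, 0.0)  # squared distance from the space
        return -0.5 * (y * y / self.lam + eps_sq / self.rho
                       + math.log(self.lam) + (self.dim - 1) * math.log(self.rho)
                       + self.dim * math.log(2 * math.pi))


class DLSIClassifier:
    def __init__(self, clusters: Dict[str, List[Vector]], prior_intra: float = 0.5) -> None:
        self.centroids = {c: centroid(docs) for c, docs in clusters.items()}
        intra = [sub(d, self.centroids[c]) for c, docs in clusters.items() for d in docs]
        extra = [sub(d, self.centroids[o]) for c, docs in clusters.items()
                 for d in docs for o in clusters if o != c]
        self.intra, self.extra = DifferentialSpace(intra), DifferentialSpace(extra)
        self.log_odds_prior = math.log((1 - prior_intra) / prior_intra)

    def posterior(self, x: Vector, label: str) -> float:
        """A posteriori probability that x - centroid(label) is intra-differential."""
        diff = sub(x, self.centroids[label])
        z = (self.extra.log_likelihood(diff) - self.intra.log_likelihood(diff)
             + self.log_odds_prior)
        if z >= 0:  # numerically stable evaluation of 1 / (1 + e^z)
            e = math.exp(-z)
            return e / (1 + e)
        return 1 / (1 + math.exp(z))

    def classify(self, x: Vector) -> str:
        return max(self.centroids, key=lambda c: self.posterior(x, c))
```

For example, with toy clusters `sports = [[5, 1, 0], [6, 0, 1], [4, 1, 1]]` and `finance = [[0, 1, 5], [1, 0, 6], [0, 2, 4]]`, `classify([5, 1, 1])` returns `"sports"`. A fuller implementation would keep more than one singular direction per space.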

Owner:SUNFLARE CO LTD

Browser for managing documents

Inactive | US20080071822A1 | Data processing applications; Digital data processing details; Network addressing; Network address

A method for classifying documents in a browser is disclosed. The method comprises establishing communication with a selected network address; displaying at least one document with the browser; moving the at least one document inside the browser; and generating a document vector grouping a plurality of documents.

Owner:APPLE INC

Document classification method and apparatus

Inactive | US7185008B2 | Small value; Little change; Data processing applications; Digital data information retrieval; Document preparation; Documentation

A document is classified into at least one document class by selecting terms for use in the classification from among terms that occur in the document. A similarity between the input document and each class is calculated using information saved for every document class. The calculated similarity to each class is corrected. The class to which the input document belongs is determined in accordance with the corrected similarity to each class.
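The four steps in this abstract (select terms, compute similarity per class, correct each similarity, pick the best corrected class) can be sketched as below. The abstract does not say how terms are selected or how the correction works, so both are assumptions here: terms are kept if they appear in a known vocabulary, similarity is cosine against per-class term profiles, and the "correction" is a hypothetical per-class bias subtracted from each score.

```python
import math
from collections import Counter
from typing import Dict, Set, Tuple


def select_terms(text: str, vocabulary: Set[str]) -> Counter:
    """Keep only the document terms the classifier knows about."""
    return Counter(tok for tok in text.lower().split() if tok in vocabulary)


def cosine(doc: Counter, profile: Dict[str, float]) -> float:
    num = sum(count * profile.get(term, 0.0) for term, count in doc.items())
    n_doc = math.sqrt(sum(c * c for c in doc.values()))
    n_prof = math.sqrt(sum(w * w for w in profile.values()))
    return num / (n_doc * n_prof) if n_doc and n_prof else 0.0


def classify(
    text: str,
    class_profiles: Dict[str, Dict[str, float]],
    vocabulary: Set[str],
    class_bias: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    terms = select_terms(text, vocabulary)
    # Hypothetical correction: subtract a per-class bias from the raw similarity.
    corrected = {
        c: cosine(terms, profile) - class_bias.get(c, 0.0)
        for c, profile in class_profiles.items()
    }
    return max(corrected, key=corrected.get), corrected
```

With profiles `billing = {"invoice": 3, "payment": 2}` and `support = {"error": 3, "crash": 2}`, the text "Payment failed on invoice" scores about 0.98 for `billing`; a large enough `billing` bias flips the decision, which is the kind of effect a correction step can have.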

Owner:HEWLETT PACKARD DEV CO LP

Automatic document classification using lexical and physical features

Active | US8503797B2 | Digital data information retrieval; Visual presentation; Text alignment; Type frequency

An automatic document classification system is described that uses lexical and physical features to assign a class $c_i \in C = \{c_1, c_2, \ldots, c_n\}$ to a document $d$. The primary lexical features are the result of a feature selection method known as Orthogonal Centroid Feature Selection (OCFS). Additional information may be gathered on character type frequencies (digits, letters, and symbols) within $d$. Physical information is assembled through image analysis to yield physical attributes such as document dimensionality, text alignment, and color distribution. The resulting lexical and physical information is combined into an input vector $X$, which is used to train a supervised neural network to perform the classification.
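A minimal sketch of building the input vector $X$ described above, assuming the standard OCFS criterion, which scores feature $i$ as $\sum_j \frac{n_j}{n}\,(m_j^i - m^i)^2$, where $m_j$ is the centroid of class $j$, $m$ is the global centroid, and $n_j$ is the class size. The physical attributes are assumed to be precomputed numbers (for example, an aspect ratio or a color fraction), and the neural network itself is left out; any supervised classifier could consume the resulting vector. Function names are hypothetical.

```python
from typing import List, Sequence


def ocfs_scores(X: List[List[float]], y: List[str]) -> List[float]:
    """OCFS score per feature: sum_j (n_j / n) * (class_mean_j - global_mean)^2."""
    n, dim = len(X), len(X[0])
    global_mean = [sum(col) / n for col in zip(*X)]
    scores = [0.0] * dim
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        class_mean = [sum(col) / len(rows) for col in zip(*rows)]
        for i in range(dim):
            scores[i] += (len(rows) / n) * (class_mean[i] - global_mean[i]) ** 2
    return scores


def select_top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def char_type_frequencies(text: str) -> List[float]:
    """Fractions of digits, letters, and symbols (non-space, non-alphanumeric)."""
    total = len(text) or 1
    digits = sum(c.isdigit() for c in text)
    letters = sum(c.isalpha() for c in text)
    symbols = sum(not c.isalnum() and not c.isspace() for c in text)
    return [digits / total, letters / total, symbols / total]


def build_input_vector(
    term_vec: Sequence[float], selected: List[int], text: str, physical: Sequence[float]
) -> List[float]:
    """Concatenate selected lexical features, character-type frequencies,
    and precomputed physical attributes into one input vector X."""
    return [term_vec[i] for i in selected] + char_type_frequencies(text) + list(physical)
```

For instance, `char_type_frequencies("Total: $12")` gives `[0.2, 0.5, 0.2]` (2 digits, 5 letters, and 2 symbols out of 10 characters).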

Owner:THE NEAT COMPANY INC DOING BUSINESS AS NEATRECEIPTS

Popular searches

Electronic mail, Annotation, Case base, Degree of similarity, Data mining, Data science, Search terms, User interface, Digital document, Word generation