Patents
Literature
Hiro is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Hiro

104 results about "Document clustering" patented technology

Document clustering (or text clustering) is the application of cluster analysis to textual documents. It has applications in automatic document organization, topic extraction and fast information retrieval or filtering.

Dynamic information extraction with self-organizing evidence construction

A data analysis system with dynamic information extraction and self-organizing evidence construction finds numerous applications in information gathering and analysis, including the extraction of targeted information from voluminous textual resources. One disclosed method involves matching text with a concept map to identify evidence relations, and organizing the evidence relations into one or more evidence structures that represent the ways in which the concept map is instantiated in the evidence relations. The text may be contained in one or more documents in electronic form, and the documents may be indexed on a paragraph level of granularity. The evidence relations may self-organize into the evidence structures, with feedback provided to the user to guide the identification of evidence relations and their self-organization into evidence structures. A method of extracting information from one or more documents in electronic form includes the steps of clustering the document into clustered text; identifying patterns in the clustered text; and matching the patterns with the concept map to identify evidence relations such that the evidence relations self-organize into evidence structures that represent the ways in which the concept map is instantiated in the evidence relations.
Owner:TECHTEAM GOVERNMENT SOLUTIONS

Chinese Web document online clustering method based on common substrings

The invention discloses a Chinese Web document online clustering method based on common substrings. As known to all, search engines are important in application of information searching and positioning with sharp increase of information on the internet. Web document clustering can automatically classify return results of the search engines according to different themes so as to assist users to reduce query range and fast position needed information. The Web document online clustering is characterized in that non-numerical and non-structured characteristics of Web documents are required to be met on the one hand, and clustering time is required to meet online search requirements of users on the other hand. According to the two characteristics, the invention provides the Chinese Web document online clustering method based on common substrings, and the method comprises steps as follows: (1) firstly, preprocessing the first n query results returned by the search engines so as to realize deleting and replacing operation of non-Chinese characters in the return results of the search engines, (2) extracting common substrings in the Web documents by utilizing GSA, (3) presenting a weighting calculation formula referring to TF*IDF according to the common substrings which are extracted and then building a document characteristic vector model, (4) computing pairwise similarity of the Web documents on the basis of the model to acquire a similarity matrix, (5) adopting an improved hierarchical clustering algorithm to achieve clustering of the Web documents on the basis of the matrix, and (6) executing clustering description and label extraction. The Chinese Web document online clustering method based on common substrings has obvious advantages on performance, clustering label generation and clustering time effects.
Owner:BEIHANG UNIV
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products