Hierarchy clustering method of successive dichotomy for document in large scale
A hierarchical clustering technique for large-scale document collections, applied in the field of text information. It addresses the slow speed of clustering and aims for fast clustering with good results.
Embodiment Construction
[0015] The basic process is to represent each text as a vector in a vector space, calculate the similarity between every pair of texts to obtain a graph, and then cluster that graph with the "successive dichotomy" ("sequential binary") hierarchical clustering algorithm.
[0016] 1. Vector space representation of text.
[0017] Assume there are n articles in which a total of m distinct words appear. Each article is then represented by an m-dimensional vector, and the n articles form an m×n matrix, denoted M. The entry $M_{ij}$ is the tf-idf value of the i-th word in the j-th article: $M_{ij} = tf_{ij} \times \log\frac{n}{df_i}$, where $tf_{ij}$ is the frequency with which the i-th word appears in the j-th article and $df_i$ is the number of articles containing the i-th word. In order to eliminate the difference in the length of the text, after the text is expressed as a vector, it is then normaliz...
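The tf-idf matrix defined above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `tfidf_matrix`, the pre-tokenized input, the use of raw counts for $tf_{ij}$, and the natural logarithm are all assumptions made for illustration, and the normalization step is left out.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build the m x n tf-idf matrix M with M[i][j] = tf_ij * log(n / df_i).

    docs is a list of n token lists (hypothetical input format).
    Assumptions: tf_ij is the raw count of word i in article j,
    and log is the natural logarithm.
    """
    n = len(docs)
    counts = [Counter(doc) for doc in docs]      # per-article word counts
    vocab = sorted(set().union(*docs))           # the m distinct words
    # df_i: number of articles that contain word i
    df = {w: sum(1 for c in counts if w in c) for w in vocab}
    # Row i is a word, column j is an article
    matrix = [
        [counts[j][w] * math.log(n / df[w]) for j in range(n)]
        for w in vocab
    ]
    return vocab, matrix


docs = [
    ["graph", "cluster", "graph"],
    ["cluster", "text"],
    ["text", "vector"],
]
vocab, M = tfidf_matrix(docs)
for word, row in zip(vocab, M):
    print(word, [round(x, 3) for x in row])
```

A word that occurs in every article gets weight log(n/n) = 0 under this formula, so it contributes nothing to the similarity between articles.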