A Topic Visualization Method for Chinese Document Collection
This document-collection and topic technology, applied in the fields of text visualization and topic analysis, addresses problems such as existing methods being inapplicable to Chinese documents, the lack of topic visualization technology for Chinese documents, and user misunderstandings.
Examples
Embodiment 1
[0072] Embodiment 1: Taking data from the journal "Journal of Software" as an example, a method for visualizing the topics of a Chinese document collection is described below with reference to Figure 1.
[0073] Step 1, classify the document set by topic: suppose the document set has n topics l_j, j = 0, 1, 2, ..., n-1. Classify all documents in the document set by topic to obtain n document subsets D_j, j = 0, 1, 2, ..., n-1, where D_j is the document subset corresponding to topic l_j. Specifically: input the paper data from the first through ninth issues of "Journal of Software". The document set is classified under five topics (system software and software engineering, database technology, computer network and information security, pattern recognition and artificial intelligence, and operating system), yielding five document subsets.
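The grouping in Step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each document already carries a topic label, since the text does not say how the labels are assigned. The function name `classify_by_topic`, the dictionary fields, and the sample papers are hypothetical; only the five topic names come from the embodiment.

```python
# Topic labels l_0..l_4 as listed in the embodiment.
TOPICS = [
    "system software and software engineering",
    "database technology",
    "computer network and information security",
    "pattern recognition and artificial intelligence",
    "operating system",
]


def classify_by_topic(documents, topics):
    """Return subsets D_0..D_{n-1}; D_j holds the documents labelled topics[j].

    Assumption: every document is a dict with a "topic" field whose value
    is one of the entries in `topics`.
    """
    index = {label: j for j, label in enumerate(topics)}
    subsets = [[] for _ in topics]
    for doc in documents:
        subsets[index[doc["topic"]]].append(doc)
    return subsets


# Invented sample data, for illustration only.
documents = [
    {"title": "paper A", "topic": "database technology"},
    {"title": "paper B", "topic": "operating system"},
    {"title": "paper C", "topic": "database technology"},
]
subsets = classify_by_topic(documents, TOPICS)
```

Topics with no matching documents simply yield an empty subset, so the result always has n entries in the same order as the topic list.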
[0074] Step 2, divide the document set into time periods: set the start time of the document se...
Embodiment 2
[0082] Embodiment 2: In the above method for visualizing the topics of a Chinese document collection, the topics are arranged in random order. When the topic streams are generated, if the intensity of a topic varies too much, the shapes of adjacent topics are distorted, which makes the result unsightly and makes the relative intensities of the topics difficult to discern. Distorted topics also affect word cloud placement. In addition, among all the topics in a document set, users tend to care most about the specific content of the topic with the highest intensity. Therefore, the present invention further improves the topic-sorting step of Embodiment 1 by designing a sorting method based on topic frequency and geometric complementarity. This sorting method is described below with reference to Figure 3:
[0083] Step 1: let the start time of topic l_j be OT_j; when v_{j,0} is not equal to zero, take the start time t of the document set ...
Embodiment 3
[0106] Embodiment 3: To address the problems with word cloud shape and the unstable layout in the TIARA technology, the present invention also improves the word cloud. Each topic is first divided into several sub-regions; a scalable algorithm (from the article "Tag Cloud++-Scalable Tag Clouds for Arbitrary Layouts") then represents each region as a set of horizontal line segments, and keywords are placed in sequence to generate the word cloud. The visual features are as follows: 1) the greater a keyword's weight, the larger its font; 2) the greater a keyword's weight, the closer it is to the center of the region. This is described below with reference to Figures 5 and 6:
[0107] Step 1: On the topic flow map, select the region G_j corresponding to topic l_j, whose start time and end time equal the start time t_start and end time t_end of the document set, respectively. The time period [t_start, t_end] of region G_j is divided equally into m-1 segments, and the length of each time ...
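The equal division of [t_start, t_end] and the two visual features listed in paragraph [0106] can be sketched as follows. This is a hedged illustration only: it is not the Tag Cloud++ algorithm and does not model the horizontal line segments. The function names, the linear font scaling, the point-size range, the one-dimensional slot layout, and the sample keywords are all assumptions made for the example; the text specifies only that larger weights get larger fonts and sit closer to the center.

```python
def split_time_period(t_start, t_end, m):
    """Return the m boundary points that cut [t_start, t_end] into m-1 equal segments."""
    if m < 2:
        raise ValueError("m must be at least 2")
    step = (t_end - t_start) / (m - 1)  # length of each time segment
    return [t_start + i * step for i in range(m)]


def font_sizes(weights, min_pt=10.0, max_pt=40.0):
    """Map keyword weights linearly onto [min_pt, max_pt] (assumed scaling)."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All weights equal: give every keyword the same size.
        return {word: max_pt for word in weights}
    scale = (max_pt - min_pt) / (hi - lo)
    return {word: min_pt + (w - lo) * scale for word, w in weights.items()}


def center_out_slots(weights):
    """Assign slot offsets 0, +1, -1, +2, -2, ... in order of decreasing weight.

    Slot 0 is the center of the region, so heavier keywords land nearer to it.
    """
    ordered = sorted(weights, key=weights.get, reverse=True)
    slots = {}
    for rank, word in enumerate(ordered):
        distance = (rank + 1) // 2
        slots[word] = distance if rank % 2 == 1 else -distance
    return slots


# Invented sample values, for illustration only.
boundaries = split_time_period(1.0, 9.0, 5)
weights = {"software": 9.0, "network": 5.0, "kernel": 1.0}
sizes = font_sizes(weights)
slots = center_out_slots(weights)
```

A real layout would place keywords along the line segments of each sub-region; the slot offsets here only show the ordering principle of visual feature 2.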