Subject visualization method for Chinese document set
A topic visualization technology for document collections, applied in the field of text visualization and topic analysis. It addresses problems such as existing visualization techniques being inapplicable to Chinese documents, the lack of generality of those techniques, and the absence of a topic visualization technology for Chinese documents.
Examples
Embodiment 1
[0072] Embodiment 1: Taking the paper data of the "Journal of Software" as an example, the method for visualizing the topics of a Chinese document set is illustrated with reference to Figure 1.
[0073] Step 1, classify the document set by topic: suppose the document set has n topics l_j, j=0,1,2,...,n-1. Classify all documents in the document set by topic to obtain n document subsets D_j, j=0,1,2,...,n-1, where the document subset corresponding to topic l_j is D_j. Specifically: input the paper data from the 1st to 9th issues of the Journal of Software. The document set is classified according to five topics (system software and software engineering, database technology, computer network and information security, pattern recognition and artificial intelligence, and operating system), and five document subsets are obtained.
[0074] Step 2, divide the time period of the document set: let the start time of the document set be t_start and the end time be t_end. For the document set time perio...
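The classification in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each document already carries a topic label, and the record layout (`title`, `topic`), the function name `classify_by_topic`, and the sample records are hypothetical. Only the five topic names come from the text above.

```python
# The five topics l_0..l_4 named in Step 1.
TOPICS = [
    "system software and software engineering",
    "database technology",
    "computer network and information security",
    "pattern recognition and artificial intelligence",
    "operating system",
]


def classify_by_topic(documents, topics):
    """Return the subsets D_0..D_{n-1}, where D_j holds the documents of topic l_j."""
    index = {topic: j for j, topic in enumerate(topics)}
    subsets = [[] for _ in topics]
    for doc in documents:
        j = index[doc["topic"]]  # raises KeyError if the topic is unknown
        subsets[j].append(doc)
    return subsets


# Made-up records, for illustration only (assumed to be pre-labelled by topic).
documents = [
    {"title": "paper A", "topic": "database technology"},
    {"title": "paper B", "topic": "operating system"},
    {"title": "paper C", "topic": "database technology"},
]
subsets = classify_by_topic(documents, TOPICS)
```

Keeping the subsets in a list indexed by j mirrors the D_j notation, so later steps can refer to a topic and its subset by the same index.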
Embodiment 2
[0082] Embodiment 2: In the above method for visualizing the topics of a Chinese document set, the topics are arranged in random order. When the topic stream is generated, a large change in the intensity of one topic distorts the shape of the adjacent topics, which makes the result unattractive and makes the relative strength of the topics hard to identify. The distorted topics also affect the placement of the word cloud. In addition, among all topics in a document set, users tend to care most about the specific content of the topic with the greatest strength. The present invention therefore further improves the topic sorting step of Embodiment 1 and designs a sorting method based on topic frequency and geometric complementarity. The sorting method is explained in detail below with reference to Figure 3:
[0083] Step 1, let the start time of topic l_j be OT_j; when v_{j,0} is not equal to zero, take the start time t of the...
Embodiment 3
[0106] Embodiment 3: In view of the unstable shape and layout of the word cloud in the TIARA technology, the present invention also improves the word cloud. First, the topic region is divided into several sub-areas. A scalable algorithm (from the article "Tag Cloud++ - Scalable Tag Clouds for Arbitrary Layouts") is then used to express each area as a set of horizontal line segments, and keywords are placed in sequence to generate the word cloud. The visual characteristics are as follows: 1) the greater the weight of a keyword, the larger its font; 2) the greater the weight of a keyword, the closer it is to the center of the area. A detailed description follows with reference to Figure 5 and Figure 6:
[0107] Step 1: On the topic flow chart, select the region G_j corresponding to topic l_j. Its start time and end time are equal to the start time t_start and end time t_end of the document set, respectively. The time period [t_start, t_end] of region G_j is divided equally into m-1 segments, and the length of each time segment is...
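The two visual characteristics stated in Embodiment 3 (heavier keywords get larger fonts and sit closer to the center of the area) can be sketched as follows. This is a deliberately simplified illustration and not the Tag Cloud++ algorithm or the patent's placement procedure: it assumes the area is reduced to a fixed number of horizontal rows, puts at most one keyword per row until the rows run out, and scales the font linearly with weight. The function name `layout_order` and the parameters `num_rows`, `min_font`, and `max_font` are hypothetical.

```python
def layout_order(keywords, num_rows, min_font=10.0, max_font=40.0):
    """Assign a font size and a row to each (word, weight) pair.

    Heavier keywords get a larger font and a row closer to the middle row.
    Returns a list of (word, font_size, row), heaviest keyword first.
    """
    if not keywords or num_rows < 1:
        return []
    # Place keywords in sequence, from the largest weight to the smallest.
    ranked = sorted(keywords, key=lambda kw: kw[1], reverse=True)
    weights = [weight for _, weight in ranked]
    lo, hi = min(weights), max(weights)
    span = hi - lo
    # Rows ordered from the center outward (ties broken by row index).
    center = (num_rows - 1) / 2
    rows = sorted(range(num_rows), key=lambda r: (abs(r - center), r))
    placed = []
    for i, (word, weight) in enumerate(ranked):
        # Linear font scaling (an assumption); equal weights all get max_font.
        scale = (weight - lo) / span if span else 1.0
        font = min_font + scale * (max_font - min_font)
        # Once rows run out, remaining keywords share the outermost row.
        row = rows[min(i, len(rows) - 1)]
        placed.append((word, font, row))
    return placed


# Made-up keywords and weights, for illustration only.
placed = layout_order([("cloud", 5), ("topic", 10), ("word", 1)], num_rows=3)
```

A real layout would also have to fit each keyword's width inside the horizontal line segments of the area, which this sketch ignores.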