Chapter content tiering method and device, and article content tiering method and device

A chapter content tiering technology, applied in fields such as special data processing applications, instruments, and electronic digital data processing, that addresses problems such as ignoring the structural characteristics of a document and the resulting difficulty of managing document information, and achieves the effect of saving processing time.

Inactive Publication Date: 2013-04-10
HITACHI CHINA RES & DEV CORP
3 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

Although the above method involves knowledge classification, the patent application uses only the similarity of vocabulary frequency distributions for layering, clustering, and concept extraction, and does not consider the structural characteristics of the document itself. In addition, that patent application considers only concept extraction across multiple documents and is not based on the structure of the document itself, so it is difficult to manage document information effectively.



Examples


Embodiment 1

[0031] The analysis process of the present invention is illustrated below through a simple embodiment. For example, in a document there are the following chapters, which are paragraphs with headings.

[0032]

[0033] (1) The hypothesis space of the ID3 algorithm includes all decision trees, so the search space is a complete hypothesis space. Because every finite discrete-valued function can be represented by some decision tree, the algorithm avoids the risk that the hypothesis space might not contain the target function.

[0034] (2) The ID3 algorithm uses all the current training samples at each step of the search, and decides how to refine the current hypothesis based on the information gain criterion. An advantage of using a statistical property such as information gain is that it greatly reduces sensitivity to errors in individual training samples, so the algorithm can be easily extended, with modification, to handle noisy training samples.

[0035] (3) The ID3...
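The information gain criterion mentioned in item (2) of the sample chapter can be sketched as follows. This is only a minimal illustration of the standard entropy-based definition; the function names and the toy data are assumptions and do not come from the patent.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(samples, labels, attribute):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    total = len(labels)
    partitions = {}
    for sample, label in zip(samples, labels):
        # Group the class labels by the value the sample takes for the attribute.
        partitions.setdefault(sample[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data, invented for illustration only.
samples = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

# ID3 would split on the attribute with the highest information gain.
best = max(["outlook", "windy"],
           key=lambda a: information_gain(samples, labels, a))
```

Because the gain is computed over all samples that reach a node, a single mislabeled sample shifts the statistic only slightly, which is the robustness the passage refers to.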

Embodiment 2

[0049] Embodiment 1 gave a simple example of the hierarchical processing of chapter content. Using this method, different chapters in an article can be analyzed to obtain multiple subtree merge graphs, as shown in Figures 5A and 5B. For different subtree merge graphs, whether vocabulary at the same level is related can be judged from the associated vocabulary list. If it is, the subtree merge graphs are connected through their corresponding associated vocabulary to generate a higher-level tree merge graph (see Figure 5C). For example, the associated word list shows that the core words "C4.5 algorithm" and "ID3 algorithm" are both related to "decision tree", so both are placed under the hypernym "decision tree", forming the structural hierarchy diagram shown in Figure 5C.
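The merging step in paragraph [0049] can be sketched as follows. This is a minimal illustration under stated assumptions: the associated word list is modeled as a plain mapping from a core word to its hypernym, each subtree is a dictionary with `word` and `children` keys, and the name `merge_subtrees` is hypothetical. The patent does not specify these data structures.

```python
# Assumed shape of the associated word list: core word -> hypernym.
ASSOCIATED_WORDS = {
    "ID3 algorithm": "decision tree",
    "C4.5 algorithm": "decision tree",
}


def merge_subtrees(subtrees, associated_words):
    """Place subtrees whose root words share a hypernym under one parent node.

    Each subtree is a dict of the form {"word": str, "children": list}.
    Subtrees with no entry in the associated word list are left unmerged.
    """
    parents = {}
    result = []
    for tree in subtrees:
        hypernym = associated_words.get(tree["word"])
        if hypernym is None:
            result.append(tree)
            continue
        if hypernym not in parents:
            # First subtree seen for this hypernym: create the higher-level node.
            parents[hypernym] = {"word": hypernym, "children": []}
            result.append(parents[hypernym])
        parents[hypernym]["children"].append(tree)
    return result


id3 = {"word": "ID3 algorithm", "children": []}
c45 = {"word": "C4.5 algorithm", "children": []}
other = {"word": "unrelated topic", "children": []}
merged = merge_subtrees([id3, c45, other], ASSOCIATED_WORDS)
```

With this input, the two algorithm subtrees end up as children of a single "decision tree" node, while the unrelated subtree stays at the top level.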

[0050] In addition, for different subtree merge graphs, Figure 6A The existing...

Embodiment 3

[0052] For chapters without titles, a tree merge graph is formed in the following manner.

[0053] First, when a chapter input through the input unit is judged to have no title, each input sentence is divided into words, and the words are ranked by their frequency of occurrence in the chapter. Then, according to the associated vocabulary list, the word most associated with the second level of the multiple subtree merge graphs is found, and the sentences containing that word are placed under the second level as the third level, forming a structural hierarchy diagram.
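The procedure in paragraph [0053] can be sketched as follows. Several parts are assumptions made for illustration: the regular-expression tokenizer stands in for real word segmentation, the associated vocabulary list is modeled as a mapping from a word to a second-level node label, and "most associated" is approximated by taking the most frequent word that has such an entry. The function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Stand-in for word segmentation: lowercase alphanumeric tokens."""
    return re.findall(r"\w+", sentence.lower())


def attach_untitled_chapter(sentences, second_level, associated_words):
    """Attach sentences of an untitled chapter under a second-level node.

    second_level: dict mapping a second-level label to its third-level sentences.
    associated_words: dict mapping a word to a second-level label (assumed shape).
    Returns the (word, label) pair used, or None if no word is associated.
    """
    # Rank words by how often they occur in the chapter.
    freq = Counter(w for s in sentences for w in tokenize(s))
    for word, _count in freq.most_common():
        label = associated_words.get(word)
        if label in second_level:
            # Sentences containing the chosen word become the third level.
            for s in sentences:
                if word in tokenize(s):
                    second_level[label].append(s)
            return word, label
    return None


sentences = [
    "Pruning reduces overfitting.",
    "Pruning removes weak branches.",
    "Trees grow greedily.",
]
second_level = {"ID3 algorithm": [], "C4.5 algorithm": []}
associated = {"pruning": "C4.5 algorithm"}  # hypothetical entry
chosen = attach_untitled_chapter(sentences, second_level, associated)
```

Here the two sentences containing "pruning" are placed under the "C4.5 algorithm" node, and the third sentence is left unattached.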

[0054] Similarly, the tree merge graphs of different chapters can also be merged to form an article information merge graph.


Abstract

The invention provides a chapter content tiering method, a chapter content tiering device, an article content tiering method and an article content tiering device. A tiered structure is formed by taking the structural information of an article (such as the information in the title of each level in a document) fully into account and using the frequency of occurrence of selected words in the article to tier its contents. The structural level relationship of the contents of the article can therefore be reflected effectively. Cross-document contents can be combined effectively by performing structural processing on the contents of different chapters and different articles, so that cross-document information can be managed effectively and a user can acquire the required information conveniently and quickly.

Description

Technical field

[0001] The present invention relates to a method and device for stratifying article content, in particular to a method and device for stratifying chapter and article content according to structural information of chapter and article content.

Background technique

[0002] In recent years, with the development of information technology, the ability to collect and store information has grown rapidly. The advancement of data management technology has promoted the informatization of business and government affairs, and produced a large amount of data information. Especially with the development of Internet technology, the information on the Internet has grown at an exponential rate, and most of the information is in the form of software files. To manage these data, large databases are being widely used in fields such as business and scientific engineering.

[0003] However, although the progress of database technology has made the collection and storage of infor...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 刘宏建, 周泉, 邓攀, 小林义行
Owner: HITACHI CHINA RES & DEV CORP