Apparatus, method and program for document classification

A document classification technology, applied in text database clustering/classification, unstructured text data retrieval, data processing input/output, and related fields. It addresses problems such as the difficulty of presenting classification results clearly.

Active Publication Date: 2013-05-22
KK TOSHIBA +1
Cites: 7; Cited by: 10

AI Technical Summary

Problems solved by technology

It is therefore difficult to present document classification results clearly.

Examples

First Embodiment

[0051] Figure 1 is a block diagram showing the document classification device of the first embodiment. As shown in Figure 1, the document classification device of the first embodiment includes a storage device 1, a data processing device 2, and an input/output device 3. The storage device 1, the data processing device 2, and the input/output device 3 are connected by wire or wirelessly so that they can exchange information with one another. The storage device 1, the data processing device 2, and the input/output device 3 may also be implemented as a single information processing device.

[0052] The storage device 1 includes a document storage unit 101, an intention dictionary storage unit 102, and a thesaurus storage unit 103.

[0053] The document storage unit 101 stores the set of documents to be classified.

[0054] Figure 2 is a diagram showing an example of the document set stored in the document storage unit 101. The documents included in the document set ...

Second Embodiment

[0113] Figure 14 is a block diagram showing the document classification device of the second embodiment. Components common to the first embodiment are given the same reference numerals. As shown in Figure 14, the document classification device of the second embodiment includes a storage device 1a, a data processing device 2a, and an input/output device 3a. The storage device 1a, the data processing device 2a, and the input/output device 3a are connected by wire or wirelessly so that they can exchange information with one another. The storage device 1a, the data processing device 2a, and the input/output device 3a may also be implemented as a single information processing device.

[0114] The storage device 1a includes a designated document storage unit 104 in addition to the document storage unit 101, the intention dictionary storage unit 102, and the thesaurus storage unit 103.

[0115] The designated document st...

Third Embodiment

[0132] Figure 19 is a block diagram showing the document classification device of the third embodiment. Components common to the first embodiment are given the same reference numerals. As shown in Figure 19, the document classification device of the third embodiment includes a storage device 1b, a data processing device 2b, and an input/output device 3b. The storage device 1b, the data processing device 2b, and the input/output device 3b are connected by wire or wirelessly so that they can exchange information with one another. The storage device 1b, the data processing device 2b, and the input/output device 3b may also be implemented as a single information processing device.

[0133] The storage device 1b includes a viewpoint dictionary storage unit 105 in addition to the document storage unit 101, the intention dictionary storage unit 102, and the thesaurus storage unit 103.

[0134] The viewpoint dictionary s...
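The three storage devices differ only in which units they hold: 1a and 1b each add one unit to the units 101 to 103 of storage device 1. The Python sketch below models that relationship, assuming each unit can be represented as a plain in-memory container. The field names and container types are hypothetical, and the contents of units 104 and 105 are placeholders only, because their descriptions are truncated here.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class StorageDevice:
    """Storage device 1 (first embodiment). Field names are hypothetical."""
    documents: dict[str, str] = field(default_factory=dict)          # unit 101: doc id -> text
    intention_dictionary: set[str] = field(default_factory=set)      # unit 102
    thesaurus: dict[str, str | None] = field(default_factory=dict)   # unit 103: child -> parent


@dataclass
class StorageDevice1a(StorageDevice):
    """Storage device 1a (second embodiment): units 101-103 plus unit 104."""
    designated_documents: set[str] = field(default_factory=set)      # unit 104 (assumed: doc ids)


@dataclass
class StorageDevice1b(StorageDevice):
    """Storage device 1b (third embodiment): units 101-103 plus unit 105."""
    viewpoint_dictionary: set[str] = field(default_factory=set)      # unit 105 (assumed: words)


def unit_names(device: StorageDevice) -> list[str]:
    """List the storage units a device holds, base-class units first."""
    return [f.name for f in fields(device)]


print(unit_names(StorageDevice1a()))
# ['documents', 'intention_dictionary', 'thesaurus', 'designated_documents']
```

Inheritance mirrors the "in addition to" wording of [0114] and [0133]; composition would serve equally well.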

Abstract

A feature word extraction means (201) extracts feature words from the documents included in a document set. A feature word clustering means (202) clusters the extracted feature words into a plurality of clusters, each constituting a subtree of a tree-structured thesaurus, such that the difference between the number of documents in which the feature words of one cluster appear and the number of documents in which the feature words of another cluster appear does not exceed a predetermined reference value. A document classification means (203) classifies each document in the document set into the clusters to which the feature words appearing in that document belong. A classification label assignment means (204) assigns to each cluster a classification label, which is a word representing the feature words belonging to that cluster. A presentation means (302) presents the document classification results in association with the classification labels assigned to the clusters.
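The abstract describes a pipeline of means 201 to 204 followed by presentation (302). The Python sketch below shows one way those stages could fit together; it is not the patented procedure. Assumptions: the thesaurus is a hypothetical child-to-parent map; extraction (201) simply keeps tokens that occur in the thesaurus; clustering (202) uses a greedy heuristic of our own that repeatedly splits the largest cluster's subtree into its child subtrees until the document counts of all clusters differ by at most `reference`; and the classification label (204) is taken to be the root word of each cluster's subtree. `PARENT`, `DOCS`, and every function name are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical tree-structured thesaurus: child word -> parent word (None marks the root).
PARENT = {
    "product": None,
    "hardware": "product", "software": "product",
    "battery": "hardware", "screen": "hardware",
    "app": "software", "update": "software",
}

# Hypothetical document set.
DOCS = {
    "d1": "The battery drains fast.",
    "d2": "Battery and screen are great.",
    "d3": "The screen cracked.",
    "d4": "The app crashes after the update.",
    "d5": "Love the new app.",
}


def children(word: str) -> list[str]:
    return [w for w, p in PARENT.items() if p == word]


def subtree(root: str) -> set[str]:
    """All thesaurus words in the subtree rooted at `root`, including `root`."""
    out, stack = set(), [root]
    while stack:
        w = stack.pop()
        out.add(w)
        stack.extend(children(w))
    return out


def extract_feature_words(text: str) -> set[str]:
    """Stand-in for means 201: keep lowercase tokens that occur in the thesaurus."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t in PARENT}


def cluster_feature_words(doc_words: dict[str, set[str]], reference: int) -> dict[str, set[str]]:
    """Greedy stand-in for means 202. Returns {subtree root: feature words in that subtree}."""
    all_words = set().union(*doc_words.values())
    if not all_words:
        return {}

    def words_of(root: str) -> set[str]:
        return subtree(root) & all_words

    def doc_count(root: str) -> int:
        ws = words_of(root)
        return sum(1 for words in doc_words.values() if words & ws)

    clusters = [w for w, p in PARENT.items() if p is None]  # start from the thesaurus root
    while True:
        counts = {c: doc_count(c) for c in clusters}
        if len(clusters) > 1 and max(counts.values()) - min(counts.values()) <= reference:
            break  # several clusters whose document counts are within `reference`
        # A root that is itself a feature word is never split, so no feature word is dropped.
        splittable = [c for c in clusters if c not in all_words and children(c)]
        if not splittable:
            break  # tree exhausted: return the finest split reached
        largest = max(splittable, key=lambda c: counts[c])
        clusters.remove(largest)
        clusters += [ch for ch in children(largest) if words_of(ch)]
    return {c: words_of(c) for c in clusters}


def classify(doc_words: dict[str, set[str]], clusters: dict[str, set[str]]) -> dict[str, list[str]]:
    """Means 203: put each document into every cluster its feature words belong to."""
    result = defaultdict(list)
    for doc_id in sorted(doc_words):
        for label, cluster_words in clusters.items():
            if doc_words[doc_id] & cluster_words:
                result[label].append(doc_id)
    return dict(result)


if __name__ == "__main__":
    doc_words = {d: extract_feature_words(t) for d, t in DOCS.items()}
    clusters = cluster_feature_words(doc_words, reference=1)
    # Stand-in for means 204 and 302: the subtree root is printed as the label.
    for label, ids in classify(doc_words, clusters).items():
        print(f"{label}: {', '.join(ids)}")
```

With `reference=1` the sketch stops at two clusters, `hardware` (d1, d2, d3) and `software` (d4, d5). With `reference=0` it splits `hardware` further into `battery` and `screen`, and d2 then appears in both clusters, because a document is classified into every cluster its feature words belong to. The greedy loop can exhaust the tree without meeting the reference value; in that case it returns the finest split it reached.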

Description

Technical Field

[0001] The present invention relates to a document classification device, method, and program.

Background Art

[0002] As one technique for analyzing documents, judgment analysis is known, which analyzes judgments about things based on the intention expressions in a document. Judgment analysis does not simply judge whether a thing is good or bad; it judges good or bad separately for each viewpoint from which the thing is evaluated. Conventional judgment analysis therefore requires, in addition to a dictionary of intention expressions, a dictionary of the viewpoints that those intention expressions target. The former, the intention expression dictionary, does not depend on a specific field, so it is versatile and can be used across fields. The latter, the viewpoint dictionary, depends strongly on a specific field, lacks versatility, and must be created for each field.

[0003] On the other han...
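The conventional judgment analysis of [0002] needs two dictionaries. A toy Python sketch of it follows, assuming that a viewpoint and an intention expression are linked when they occur in the same sentence; that linking rule is our assumption, since the text does not say how they are paired. Both dictionaries and all of their entries are hypothetical.

```python
import re

# Hypothetical field-independent intention expression dictionary: expression -> polarity.
INTENTION_DICT = {"great": "good", "excellent": "good", "slow": "bad", "broken": "bad"}

# Hypothetical field-dependent viewpoint dictionary (here, for smartphone reviews).
VIEWPOINT_DICT = {"battery", "screen", "camera"}


def judgment_analysis(text: str) -> list[tuple[str, str]]:
    """Pair every viewpoint with every intention expression in the same sentence."""
    pairs = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        viewpoints = [t for t in tokens if t in VIEWPOINT_DICT]
        polarities = [INTENTION_DICT[t] for t in tokens if t in INTENTION_DICT]
        pairs += [(v, p) for v in viewpoints for p in polarities]
    return pairs


print(judgment_analysis("The screen is great. The camera is slow."))
# [('screen', 'good'), ('camera', 'bad')]
print(judgment_analysis("The engine is great."))
# []
```

The second call shows the limitation described in [0002]: applied to car reviews, `INTENTION_DICT` still works, but nothing is found until a new viewpoint dictionary for that field is built.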

Application Information

IPC (8): G06F17/30; G06F17/27
CPC: G06F17/30; G06F3/048; G06F17/30705; G06F16/353; G06F16/35
Inventors: 稻叶真纯, 真锅俊彦, 国分智晴, 仲野亘
Owner: KK TOSHIBA