Visual text information discovery method and device based on multi-level co-occurrence relationship word graph

A text information discovery method, applied in the field of visual text information discovery methods and systems. It addresses the time-consuming manual browsing of documents, the inability to quickly locate target information, and the lack of filtering for documents that have already been checked, so as to avoid repeated inspections, reduce the number of inspections, and improve inspection efficiency.

Inactive Publication Date: 2018-08-17
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0004] At present, "exploratory" information discovery methods have three problems. First, manual inspection of search results is inefficient: manually browsing documents (search results) is very time-consuming, and the target information cannot be located quickly. Second, the process as a whole lacks global control over the target document collection, so users often lose track of where they came from and where to go next during discovery, and the state of an inspection cannot be restored and reused effectively in the next inspection. Third, documents that have already been checked are not filtered out, so repeated checking is difficult to avoid.



Examples


Embodiment Construction

[0045] In order to make the above-mentioned features and advantages of the present invention more comprehensible, the following specific embodiments are described in detail in conjunction with the accompanying drawings.

[0046] This embodiment provides a visual text information discovery method based on a multi-level co-occurrence relationship word graph and performs information discovery on a document collection containing two documents. As shown in Figure 1, the method steps include:

[0047] 1. Document preprocessing:

[0048] For each document in the document set, preprocessing outputs its segmented text fragments. The specific processing includes: (1) parsing the format of the document to extract the valid text content; (2) segmenting the text content, where the resulting text fragments generally correspond to meaningful semantic units. The following two methods can be used for segmentation: (a) segmentation by symbols, where the symbols are specified by the user. These symbols includ...
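Symbol-based segmentation as described in (a) can be illustrated with a minimal Python sketch. The function name `segment_by_symbols` and the sample symbols are assumptions made for illustration; the source does not list the symbols a user would specify, and this is not the patent's implementation.

```python
import re


def segment_by_symbols(text, symbols):
    """Split text into fragments at any of the user-specified symbols.

    Hypothetical helper: `symbols` is a list of delimiter strings chosen
    by the user (an assumption, the source does not enumerate them).
    """
    if not symbols:
        # With no delimiters, the whole text is a single fragment.
        stripped = text.strip()
        return [stripped] if stripped else []
    # Escape each symbol so characters such as '.' or '?' are matched literally.
    pattern = "|".join(re.escape(s) for s in symbols)
    # Discard empty fragments and trim surrounding whitespace.
    return [frag.strip() for frag in re.split(pattern, text) if frag.strip()]


fragments = segment_by_symbols("Graphs link words. Words link documents! Done", [".", "!"])
print(fragments)
```

Escaping the delimiters keeps user-supplied punctuation from being interpreted as regular-expression syntax, which is the main pitfall with this approach.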


Abstract

The invention provides a visual text information discovery method based on a multi-level co-occurrence relationship word graph. The method comprises the following steps: extracting the text content of a document and segmenting it to obtain text segments; performing word segmentation on the text segments, extracting keywords, and labeling each keyword with a word category tag; constructing the multi-level co-occurrence relationship word graph according to the co-occurrence of keywords in the text segments, wherein nodes in the graph correspond to keywords and edges correspond to keyword co-occurrence; constructing a word-document inverted index for each keyword in the graph, for retrieving the documents that contain the keyword; and obtaining the visual text information through the co-occurrence relationship word graph. The invention also provides a visual text information discovery system based on the multi-level co-occurrence relationship word graph. The system comprises a document preprocessing module, a keyword extraction module, a multi-level word graph construction module, a word-document index construction module, and a visual information discovery module.
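The graph and index construction steps in the abstract can be sketched in Python under some stated assumptions: keywords are taken as already extracted for each text segment, edge weights are plain co-occurrence counts (the source does not say how edges are weighted), and only a single level of the graph is built, since the multi-level structure is not detailed in this excerpt. The name `build_word_graph` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(doc_segments):
    """Build a co-occurrence graph and a word-document inverted index.

    doc_segments maps a document ID to a list of segments, where each
    segment is a list of keywords (assumed to be extracted beforehand).
    """
    edges = defaultdict(int)           # (word_a, word_b) -> co-occurrence count
    inverted_index = defaultdict(set)  # keyword -> IDs of documents containing it
    for doc_id, segments in doc_segments.items():
        for keywords in segments:
            # Sort so each unordered pair has one canonical key.
            unique = sorted(set(keywords))
            for word in unique:
                inverted_index[word].add(doc_id)
            # Every pair of keywords in the same segment co-occurs once.
            for a, b in combinations(unique, 2):
                edges[(a, b)] += 1
    return edges, inverted_index


# Illustrative data only, not taken from the patent.
docs = {
    "d1": [["graph", "keyword"], ["keyword", "index"]],
    "d2": [["graph", "keyword", "index"]],
}
edges, inverted_index = build_word_graph(docs)
print(edges[("graph", "keyword")])      # co-occurrence count for one edge
print(sorted(inverted_index["index"]))  # documents containing "index"
```

Selecting a node in such a graph gives a keyword, and the inverted index then returns the documents to inspect, which matches the retrieval role the abstract assigns to the index.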

Description

Technical field

[0001] The invention belongs to the fields of text mining and natural language processing, and relates to a method and system for discovering visual text information based on a multi-level co-occurrence relationship word graph.

Background

[0002] With the development of the Internet and electronic office work, text information has grown explosively, and the amount of text generated has surpassed that of any previous era. On the one hand, texts contain a great deal of valuable information; on the other hand, massive volumes of text significantly increase the cost of discovering useful information. For the vast majority of applications (such as publishing, research, and supervision), users cannot read every document in a collection to find useful information. How to use computers to assist in mining valuable information from massive texts (text mining) has become an important and urgent problem.

[0003] According to th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/289, G06F40/30
Inventors: 李鹏, 王斌, 郭莉, 梅钰
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI