Method for extracting milestone event from mass texts

A milestone event extraction technique for text. It addresses the problems that milestone node information for one event is scattered, cannot be aggregated, and therefore cannot be extracted as complete event milestone information. It improves clustering results and the accuracy and completeness of extraction.

Active Publication Date: 2019-09-17
GUIZHOU POWER GRID CO LTD

AI Technical Summary

Problems solved by technology

[0002] Existing information extraction methods can extract events and times from text. With massive data, however, multiple documents may describe the same event; if the event and time are directly e



Examples


Example Embodiment

[0011] Example 1: As shown in Figure 1, a method for extracting milestone events from a large amount of text includes the following steps:

[0012] (1) Extract the folder-level association information of files from a large amount of text, using file names and folders as nodes and hierarchical relationships as edges, and store the data in a tree structure;
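Step (1) can be illustrated with a minimal Python sketch. The source does not specify a data structure beyond "tree", so the nested-dictionary representation, the function names `build_tree` and `edges`, and the `<root>` label are all assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def build_tree(paths):
    """Build a nested dict: folder name -> subtree, file name -> None."""
    root = {}
    for p in paths:
        parts = PurePosixPath(p).parts
        node = root
        # Walk (and create) one dict level per enclosing folder.
        for folder in parts[:-1]:
            node = node.setdefault(folder, {})
        node[parts[-1]] = None  # a leaf marks a file
    return root


def edges(tree, parent="<root>"):
    """List (parent, child) pairs, i.e. the hierarchical edges of the tree."""
    out = []
    for name, child in tree.items():
        out.append((parent, name))
        if child is not None:  # folders have a subtree, files do not
            out.extend(edges(child, name))
    return out
```

In this sketch nodes are identified by name only; a real corpus with repeated folder names would need full paths as node identifiers.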

[0013] (2) Concatenate the file name and file path name as the text of the current file, and use the K-Means clustering algorithm to calculate the tree distance of each file. (Each file is a node of the tree and each path is a branch. For example, with multiple layers of nested folders containing many files, the files stored directly in the first folder form the first level, and the other folders stored in the first folder extend further down.) Group files with the same hierarchical relationship together as initial clusters, and determine the initial cluster size of the K-Means clustering algorit...
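The tree distance and initial grouping in step (2) might look like the following Python sketch. Two points are assumptions rather than details from the source: the distance is taken as the number of edges between two files through their deepest shared folder, and "the same hierarchical relationship" is read as "the same parent folder". The function names are hypothetical, and the K-Means run itself is not shown.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def tree_distance(a, b):
    """Count the edges between two file paths via their deepest shared folder."""
    pa, pb = PurePosixPath(a).parts, PurePosixPath(b).parts
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def initial_clusters(paths):
    """Group files by parent folder (assumed meaning of 'same hierarchy')."""
    groups = defaultdict(list)
    for p in paths:
        groups[str(PurePosixPath(p).parent)].append(p)
    return dict(groups)
```

Under these assumptions, the number of groups returned by `initial_clusters` could serve as the initial number of clusters passed to a K-Means implementation.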



Abstract

The invention discloses a method for extracting milestone events from massive texts, which comprises the following steps: (1) extracting the folder-level association information of files from the massive texts, and storing the data in a tree structure; (2) concatenating the file name and the path name of each file as the text of that file, using the K-Means clustering algorithm to calculate the tree distance of each file, grouping files with the same hierarchical relationship together as initial clusters, and determining the initial cluster size of the K-Means clustering algorithm; and (3) extracting milestone events and time nodes under each cluster, and screening the extraction results to form a milestone node list for each event. Because the milestone events and time nodes are extracted from each cluster after clustering, the method avoids the problem that one event is extracted as several sub-events that cannot be merged, and it also improves the accuracy and completeness of extraction.
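Step (3) of the abstract can be sketched in Python as follows. The source does not say how events and time nodes are extracted or screened, so everything here is an assumption for illustration: dates are matched only in `YYYY-MM-DD` form, each sentence containing a date is treated as one milestone, and "screening" is reduced to dropping duplicates and sorting by date. The name `extract_milestones` is hypothetical.

```python
import re
from datetime import date

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_milestones(texts):
    """Return a date-sorted, de-duplicated list of (date, event) pairs."""
    seen = set()
    milestones = []
    for text in texts:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            m = DATE_RE.search(sentence)
            if not m:
                continue
            try:
                when = date(int(m[1]), int(m[2]), int(m[3]))
            except ValueError:
                continue  # skip strings that look like dates but are invalid
            event = DATE_RE.sub("", sentence).strip(" .,:;")
            key = (when, event.lower())
            if event and key not in seen:
                seen.add(key)
                milestones.append((when, event))
    return sorted(milestones)
```

Running this over the texts of one cluster at a time keeps mentions of the same event from different documents in a single list, which is the aggregation effect the abstract describes.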

Description

Technical field

[0001] The invention belongs to the technical field of milestone event extraction, and relates to a method for extracting milestone events from massive texts.

Background

[0002] Existing information extraction methods can extract events and times from text. With massive data, however, multiple documents may describe the same event. If the event and time are extracted directly from each document, the milestone node information of the same event may be scattered across multiple events and cannot be aggregated, so complete event milestone information cannot be extracted.

Contents of the invention

[0003] The technical problem to be solved by the present invention is to provide a method for extracting milestone events from massive texts, so as to solve the problems existing in the prior art.

[0004] The technical solution adopted by the present invention is: a method for extracting milestone events from ma...


Application Information

IPC(8): G06F16/31, G06F16/35, G06K9/62
CPC: G06F16/322, G06F16/35, G06F18/23213
Inventors: 王鹏宇, 吴漾, 罗念华, 孔庆波, 缪新萍, 李文科
Owner: GUIZHOU POWER GRID CO LTD