Text hotspot extraction method and device

A text extraction technology, applied in the field of data processing, that addresses problems such as keyword-based extraction failing to accurately reflect the overall content of a text.

Active Publication Date: 2020-10-23
Beijing Zhongke Wenge Technology Co., Ltd. (北京中科闻歌科技股份有限公司)
Cites: 3; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] However, keyword extraction can only identify the most representative fragments or words for a particular event or topic in a document; it cannot accurately reflect the overall content of the text.




Embodiment Construction

[0047] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in these embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0048] In order to facilitate the understanding of the embodiments of the present invention, further explanations will be given below with specific embodiments in conjunction with the accompanying drawings, which are not intended to limit the embodiments of the present invention.

[0049] Figure 1 is a schematic flow diagram of a method for ext...



Abstract

The embodiments of the present invention relate to a method and device for extracting text hotspots. The method includes: using a regular expression to segment at least one piece of input text data according to set rules to obtain a plurality of first short text data; using a dependency syntax analysis algorithm to extract and generate corresponding fourth short text data from the second short text data; vectorizing the third short text data and the fourth short text data to obtain a plurality of corresponding text vectors; determining the similarity between any two text vectors based on a similarity algorithm; and merging any two text vectors whose similarity is greater than a similarity threshold. The short sentences formed by syntactic analysis and extraction of related words improve the observability and accuracy of information extraction, allowing users to better understand the text content and thus obtain its core key information points. Word2vec is used to vectorize the short sentences for similarity comparison, which retains the semantic information between words, thereby ensuring the accuracy of deduplication and avoiding redundant hotspot information as far as possible.
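The segmentation, vectorization, similarity, and merging steps of the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the summary does not confirm: the regular expression in `segment` is a hypothetical splitting rule, a sentence vector is taken as the mean of its Word2vec word vectors, cosine similarity stands in for the unnamed "similarity algorithm", and similar texts are merged transitively with union-find. The dependency-parsing step is omitted, and a plain dictionary of word vectors stands in for a trained Word2vec model (for example one trained with gensim). All function names are illustrative.

```python
from __future__ import annotations

import math
import re

# Assumption: hypothetical "set rules" that split on Chinese and ASCII
# clause/sentence punctuation. The patent's actual rules are not given.
SPLIT_PATTERN = re.compile(r"[。！？；，!?;,]+")


def segment(text: str) -> list[str]:
    """Split raw text into non-empty short texts."""
    return [part.strip() for part in SPLIT_PATTERN.split(text) if part.strip()]


def sentence_vector(tokens: list[str], embeddings: dict[str, list[float]]) -> list[float] | None:
    """Assumption: sentence vector = mean of in-vocabulary word vectors.

    Returns None when no token has a vector.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    """Assumption: cosine similarity as the similarity measure.

    Returns 0.0 when either vector has zero length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_similar(texts: list[str], vectors: list[list[float] | None], threshold: float) -> list[list[str]]:
    """Group texts whose pairwise similarity exceeds threshold (transitively, via union-find)."""
    parent = list(range(len(texts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if vectors[i] is None or vectors[j] is None:
                continue  # texts without a vector are never merged
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(i)] = find(j)

    # Collect texts by their union-find root, preserving first-seen order.
    groups: dict[int, list[str]] = {}
    for i, text in enumerate(texts):
        groups.setdefault(find(i), []).append(text)
    return list(groups.values())
```

For example, with toy 2-D vectors in which finance words point along one axis and weather words along the other, a threshold of 0.9 groups "stock market rises sharply" with "markets rise sharply today" and leaves "heavy rain floods city" on its own. Keeping one text per group (for instance the first) would then yield a deduplicated hotspot list. Chinese input would additionally need a word segmenter before `sentence_vector`, since it has no whitespace between words.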

Description

Technical Field

[0001] The embodiments of the present invention relate to the field of data processing, and in particular to a text hotspot extraction method and device.

Background Art

[0002] Hotspot extraction extracts core summary sentences from known texts to serve as classification categories, enabling users to quickly find event topics of interest on an application platform and obtain related information. To improve the accuracy of information extraction and make the extraction results easier to understand, this scheme proposes using a method based on dependency syntactic analysis to extract semantically meaningful short texts, and merging the extraction results using similarity techniques.

[0003] At present, many information extraction tasks are based on keyword extraction technology. A keyword can be a single word or a phrase composed of several words, and is the smallest unit that expresses the meanin...
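The dependency-based extraction of short texts described above can be sketched in Python under assumptions the summary does not confirm: the sentence has already been parsed by a dependency parser, the labels follow the LTP scheme (HED for the root, SBV for subject, VOB for object, ATT for attributive modifiers), and the extraction rule keeps only the root predicate, its subject and object, and their direct attributive modifiers. `Token` and `extract_core_phrase` are hypothetical names, not part of any parser's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    word: str
    head: int      # 1-based index of the head token; 0 marks the root
    relation: str  # assumed LTP-style label such as "HED", "SBV", "VOB", "ATT"


def extract_core_phrase(tokens: list[Token], sep: str = " ") -> str | None:
    """Keep the root predicate, its subject/object, and their attributive modifiers.

    Hypothetical extraction rule; returns None when no root (HED) token exists.
    Words are emitted in their original sentence order.
    """
    root = next((i for i, t in enumerate(tokens, start=1) if t.relation == "HED"), None)
    if root is None:
        return None

    # Direct subject (SBV) and object (VOB) of the root predicate.
    args = {i for i, t in enumerate(tokens, start=1)
            if t.head == root and t.relation in ("SBV", "VOB")}
    # Attributive modifiers (ATT) attached directly to the subject or object.
    mods = {i for i, t in enumerate(tokens, start=1)
            if t.relation == "ATT" and t.head in args}

    keep = {root} | args | mods
    return sep.join(tokens[i - 1].word for i in sorted(keep))
```

For instance, given a hand-written parse of "central bank sharply cut interest rates on Monday" with `cut` as HED, `bank` as SBV, and `rates` as VOB, the function returns "central bank cut interest rates", dropping the adverbial modifiers. For Chinese text, `sep=""` joins the kept words without spaces. A real system would likely need further rules (coordination, prepositional objects, negation) that this sketch does not attempt.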


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/211; G06F40/289; G06F16/951; G06F16/9535; G06F16/35
CPC: G06F16/951; G06F16/9535; G06F16/355; G06F40/211; G06F40/289
Inventors: 王宇琪, 孔庆超, 黄秋曼, 方省, 曹家, 罗引, 王磊, 赵菲菲, 张西娜
Owner: Beijing Zhongke Wenge Technology Co., Ltd. (北京中科闻歌科技股份有限公司)