Topic segmentation method and device based on target area fusion, equipment and medium

A target-area fusion technique, applied in the field of topic detection, that addresses problems such as text being cut at region boundaries, incomplete mining of information, and insensitivity to text line distinctions.

Pending Publication Date: 2020-09-11
GUANGDONG XIAOTIANCAI TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] At present, there are end-to-end topic segmentation algorithms based on deep learning that can divide documents such as test papers and exercise books into topic areas. However, owing to the black-box nature of deep learning, the results are not ideal. The problems are as follows:
[0003] 1. The segmented areas are not accurate enough, and text is often cut off, which loses information and affects subsequent use.
[0004] 2. The algorithm is insensitive to the distinction between text lines, cannot obtain the specific text, and cannot fully mine the information it contains.



Examples


Embodiment 1

[0086] See Figure 1, which is a schematic flowchart of a topic segmentation method disclosed in an embodiment of the present invention. As shown in Figure 1, the topic segmentation method includes the following steps:

[0087] 110. Acquire the target picture, and obtain the topic area mask and text line information of the target picture.

[0088] The target picture is an image input by the user through the electronic device. For example, the target picture may be an image sent to the smart device after the user photographs a document with an image acquisition device, or an image the user has downloaded from the Internet. Before the target picture is recognized, it may be preprocessed; the preprocessing includes but is not limited to image enhancement and image correction.

[0089] The topic area mask of the target picture can be obtained through any end-to-end instance segmentation algorithm based on deep learning,...
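
The excerpt does not say in what form the text line information arrives. As a minimal sketch, assuming each text line is reported as an axis-aligned pixel box (x0, y0, x1, y1), the boxes could be rasterized into boolean masks of the same size as the topic area mask. The function name boxes_to_masks and the box format are hypothetical, chosen only for illustration.

```python
import numpy as np

def boxes_to_masks(boxes, height, width):
    """Rasterize text line boxes (x0, y0, x1, y1) into boolean masks.

    Assumption: boxes are axis-aligned, in pixel coordinates, with
    exclusive right and bottom edges. The source does not specify this.
    """
    masks = []
    for x0, y0, x1, y1 in boxes:
        mask = np.zeros((height, width), dtype=bool)
        # Clip to the image so out-of-range boxes neither fail nor wrap.
        x0c, x1c = max(0, x0), min(width, x1)
        y0c, y1c = max(0, y0), min(height, y1)
        if x1c > x0c and y1c > y0c:
            mask[y0c:y1c, x0c:x1c] = True
        masks.append(mask)
    return masks

# Two hypothetical text lines on a 4 x 5 image; the second box
# extends past the left edge and is clipped.
line_masks = boxes_to_masks([(1, 1, 4, 3), (-2, 0, 2, 1)], height=4, width=5)
```

Rotated or polygonal text lines would need a different rasterization step; the box form is used here only because it keeps the sketch short.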

Embodiment 2

[0106] See Figure 5, which is a schematic flowchart of another topic segmentation method disclosed in an embodiment of the present invention. As shown in Figure 5, the topic segmentation method includes the following steps:

[0107] 210. Acquire the target picture, and obtain the topic area mask and text line information of the target picture.

[0108] 220. Determine a target text line mask according to the target topic area mask, and calculate a first intersection area between the target text line mask and the target topic area.

[0109] 230. Determine the first proportion of the target text line in the target topic area according to the first intersection area and the area of the target text line; when the first proportion is greater than or equal to a first preset threshold, expand the target topic area with the target text line mask to obtain the expanded target topic area.

[0110] 240. Expand the expanded target topic area again through the cha...
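
Steps 220 and 230 can be sketched with boolean masks as follows. This is an illustration under stated assumptions, not the patented implementation: the function name fuse_text_line is hypothetical, "expanding" is interpreted here as a pixel-wise union of the two masks, and the default threshold of 0.5 is a placeholder because the source only refers to a "first preset threshold". Step 240 is truncated in this excerpt and is not covered.

```python
import numpy as np

def fuse_text_line(topic_mask, line_mask, threshold=0.5):
    """Expand a topic area mask with a text line mask when they overlap enough.

    Returns (new_topic_mask, first_proportion).
    Assumptions: both masks are boolean arrays of identical shape, and
    expansion is modelled as a pixel-wise union.
    """
    line_area = int(line_mask.sum())
    if line_area == 0:
        # An empty text line cannot contribute anything.
        return topic_mask.copy(), 0.0
    # Step 220: first intersection area between text line and topic area.
    intersection = int(np.logical_and(topic_mask, line_mask).sum())
    # Step 230: share of the text line that lies inside the topic area.
    proportion = intersection / line_area
    if proportion >= threshold:
        return np.logical_or(topic_mask, line_mask), proportion
    return topic_mask.copy(), proportion

# Hypothetical 6 x 8 image: the topic area covers 12 pixels, and the
# text line covers 6 pixels, 3 of which lie inside the topic area.
topic = np.zeros((6, 8), dtype=bool)
topic[1:4, 1:5] = True
line = np.zeros((6, 8), dtype=bool)
line[2:3, 2:8] = True

expanded, ratio = fuse_text_line(topic, line, threshold=0.5)
unchanged, _ = fuse_text_line(topic, line, threshold=0.6)
```

With the threshold at 0.5 the text line is absorbed whole, so the topic boundary no longer cuts through it; with a stricter threshold of 0.6 the topic area is left as it was.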

Embodiment 3

[0129] See Figure 7, which is a structural schematic diagram of a topic segmentation device disclosed in an embodiment of the present invention. As shown in Figure 7, the topic segmentation device may include:

[0130] An acquisition unit 310, configured to acquire a target picture, and obtain the topic area mask and text line information of the target picture;

[0131] A calculation unit 320, configured to determine the target text line mask according to the target topic area mask, and calculate the first intersection area between the target text line mask and the target topic area;

[0132] A judging unit 330, configured to determine a first proportion of the target text line in the target topic area according to the first intersection area and the area of the target text line; and, when the first proportion is greater than or equal to a first preset threshold, to expand the target topic area by using the target text line mask to obtain the expanded target topic are...


Abstract

The embodiment of the invention relates to the technical field of topic detection, and discloses a topic segmentation method and device based on target area fusion, equipment and a medium. The method comprises the steps of: obtaining a target picture, and obtaining a topic area mask and text line information of the target picture; determining a target text line mask according to the target topic area mask, and calculating a first intersection area of the target text line mask and the target topic area; determining a first proportion of the target text line in the target topic area according to the first intersection area and the area of the target text line; and, when the first proportion is greater than or equal to a first preset threshold, expanding the target topic area through the target text line mask to obtain an expanded target topic area. By implementing the method of the invention, topic segmentation can be supplemented by character recognition, so that the fused boundary does not cut through text lines; at the same time, the character information is merged into the topic detection area, so that the topic detection result is more complete.
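
The decision rule described in the abstract can be written compactly. The symbols below are introduced here for illustration and do not appear in the source: $M_T$ is the target topic area mask, $M_L$ the target text line mask, $|\cdot|$ the pixel count, $r_1$ the first proportion, and $t_1$ the first preset threshold. Reading "expanding" as a union of masks is an assumption.

```latex
r_1 = \frac{\lvert M_L \cap M_T \rvert}{\lvert M_L \rvert},
\qquad
M_T' =
\begin{cases}
M_T \cup M_L, & r_1 \ge t_1,\\
M_T, & r_1 < t_1.
\end{cases}
```

As a worked example with made-up numbers: if a text line covers 200 pixels, of which 150 fall inside the topic area, then $r_1 = 150 / 200 = 0.75$. With a hypothetical threshold $t_1 = 0.5$, the condition $r_1 \ge t_1$ holds and the whole text line is merged into the topic area.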

Description

Technical field

[0001] The present invention relates to the technical field of topic detection, in particular to a topic segmentation method, device, electronic equipment and storage medium based on target area fusion.

Background art

[0002] At present, there are end-to-end topic segmentation algorithms based on deep learning that can divide documents such as test papers and exercise books into topic areas. However, owing to the black-box nature of deep learning, the results are not ideal. The problems are as follows:

[0003] 1. The segmented areas are not accurate enough, and text is often cut off, which loses information and affects subsequent use.

[0004] 2. The algorithm is insensitive to the distinction between text lines, cannot obtain the specific text, and cannot fully mine the information it contains.

Summary of the invention

[0005] In view of the above defects, the embodiment of the present invention discloses...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06K9/00, G06K9/20, G06K9/34, G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06V30/414, G06V10/22, G06V10/267, G06N3/045, G06F18/23
Inventor: 邓小兵, 许多, 张春雨
Owner: GUANGDONG XIAOTIANCAI TECH CO LTD