Text segmentation method, apparatus, computer equipment and readable storage medium

A text segmentation technology, applied in the field of natural language processing, that addresses the problem of inaccurate segmentation results.

Active Publication Date: 2021-06-08
BEIJING BAIDU NETCOM SCI & TECH CO LTD
9 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, in practical application, the applicant found that the results produced by this prior-art segmentation method are often not accurate enough.

Embodiment Construction

[0029] Embodiments of the present application are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary, and are intended to explain the present application, and should not be construed as limiting the present application.

[0030] In the prior art, the cosine similarity of adjacent sentences is computed from word frequencies, with the word as the unit of granularity, or the article is segmented using a graph-cut technique based on word similarity. Specifically, the article is mainly segmented through the following steps:

[0031] The first step is to calculate the pairwise similarity of the sentences in the article, for example by using the KM (Kuhn-Munkres) algorithm to align words and then calculating the weighted cosine similarity of the two sentences;

[0032]...
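
The prior-art idea of scoring adjacent sentences by word-frequency cosine similarity can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the method of the cited prior art: words are split on whitespace, every word has weight 1, and the KM word-alignment step is omitted. The function names `cosine_similarity` and `adjacent_similarities` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a: str, sentence_b: str) -> float:
    """Cosine similarity of two sentences over raw word-frequency vectors.

    Assumption: whitespace tokenization and unit word weights; the
    KM alignment and weighting described in the text are not modeled.
    """
    freq_a = Counter(sentence_a.lower().split())
    freq_b = Counter(sentence_b.lower().split())
    # Only words present in both sentences contribute to the dot product.
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def adjacent_similarities(sentences: list[str]) -> list[float]:
    """Similarity of each sentence to the one that follows it."""
    return [cosine_similarity(a, b) for a, b in zip(sentences, sentences[1:])]
```

A low score between two neighboring sentences would be a candidate topic boundary. Because the score depends only on surface word overlap, two sentences on the same topic that share no words score 0, which is one way such a method can produce inaccurate segments.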

Abstract

The present application proposes a text segmentation method, apparatus, computer equipment, and readable storage medium. The method includes: using a sliding window to divide the text to be segmented into multiple recognition units; performing topic feature extraction on the recognition units; recognizing, according to the topic features of each recognition unit, the topic relationship between that recognition unit and its adjacent recognition units; and segmenting the text according to these topic relationships. The method thus segments the text according to topic relationships, so that each resulting paragraph belongs to a single topic, which improves the accuracy of the segmentation result.
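
The four steps in the abstract can be sketched in Python as follows. This is only an illustration under loud assumptions, not the patented implementation: the abstract does not say how topic features are extracted or how the topic relationship is recognized, so the sketch substitutes a bag-of-words count for the topic feature and a cosine threshold for the relationship. The window is advanced by its own length so that units do not overlap, which keeps the mapping back to the text simple. All names (`topic_features`, `same_topic`, `segment`) and the default `threshold` are hypothetical.

```python
import math
from collections import Counter


def topic_features(unit):
    """Stand-in topic feature (assumption): word counts over the unit."""
    return Counter(w for sentence in unit for w in sentence.lower().split())


def same_topic(features_a, features_b, threshold=0.2):
    """Stand-in topic relationship (assumption): cosine similarity >= threshold."""
    dot = sum(features_a[w] * features_b[w]
              for w in features_a.keys() & features_b.keys())
    norm = (math.sqrt(sum(c * c for c in features_a.values()))
            * math.sqrt(sum(c * c for c in features_b.values())))
    return norm > 0 and dot / norm >= threshold


def segment(sentences, window_size=2, threshold=0.2):
    """Split sentences into topic segments, one list of sentences per segment."""
    # Step 1: slide a window over the sentences to form recognition units.
    units = [sentences[i:i + window_size]
             for i in range(0, len(sentences), window_size)]
    if not units:
        return []
    # Step 2: extract a topic feature for every recognition unit.
    features = [topic_features(unit) for unit in units]
    # Steps 3 and 4: compare each unit with its predecessor and either
    # extend the current segment or start a new one.
    segments = [list(units[0])]
    for previous, current, unit in zip(features, features[1:], units[1:]):
        if same_topic(previous, current, threshold):
            segments[-1].extend(unit)
        else:
            segments.append(list(unit))
    return segments
```

For example, `segment(["cats purr softly", "cats chase mice", "stocks fell today", "stocks rose later"])` yields two segments, because the two recognition units share no words. In a real system the placeholder feature would presumably be replaced by a learned topic representation, which is where the accuracy gain claimed in the abstract would have to come from.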

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, and in particular to a text segmentation method, apparatus, computer equipment and readable storage medium.

Background

[0002] In the current self-media era, anyone can publish articles on the Internet. However, the quality of published articles varies widely, and some authors produce shoddy work simply to increase the number of articles they publish. For example, an author may pile up or combine chapters from different articles to generate a new article, which often results in a single article containing multiple topics. Because each part of such a suspected cheating article comes from a normal article, reviewers often find it difficult to identify it as a cheating article. Therefore, segmenting an article into paragraphs of different topics, so as to facilitate review, is very important.

[0003] In the prior art, in order to segment an article into paragraphs of different topics, the...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F40/30, G06F40/279
Inventor: 杨宇鸿, 付志宏, 袁德璋, 何径舟
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD