Text sentence segmentation method and device, electronic device and storage medium

A text sentence segmentation technology, applied in the field of natural language processing, addresses the time-consuming and laborious manual segmentation of subtitle text. It achieves accurate text segmentation and avoids the associated labor and time costs.

Active Publication Date: 2020-01-17
IFLYTEK CO LTD
4 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide a text segmentation method, device, electronic equipment, and storage medium to solve the problem that existing subtitle text segmentation relies on manual work, which is time-consuming and laborious.


Embodiment Construction

[0050] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0051] The traditional subtitle generation method requires a person to listen to the audio and video and manually transcribe the speech into subtitle text, and then to segment the subtitle text so that it meets the subtitle requirements of the audio and video itself. Assuming that the audio and video requirements for the number of subtitles ...

Abstract

Embodiments of the invention provide a text sentence segmentation method and device, an electronic device and a storage medium. The method comprises the steps of determining a character feature vector of each character in a text; inputting the character feature vector of each character into a sentence segmentation model to obtain a sentence segmentation probability of each character output by the sentence segmentation model, wherein the sentence segmentation model is obtained by training based on a sample word feature vector and a sentence segmentation identifier of a sample word in the sample text; determining a plurality of candidate sentence segmentation results based on the sentence segmentation probability of each character; and determining a sentence segmentation result based on a preset word number threshold and the plurality of candidate sentence segmentation results. According to the method and device, the electronic device and the storage medium provided by the embodiment of the invention, the sentence segmentation result that the length of each clause is smaller than or equal to the preset word number threshold value is obtained while the local semantics are not cut off, so that efficient and accurate text sentence segmentation is realized, and the loss of labor cost and time cost is avoided.
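
The last two steps of the abstract (building candidate segmentations from per-character probabilities, then choosing one against a length threshold) can be illustrated with a minimal sketch. It is not the patented procedure: the source does not say how candidates are generated or ranked, so the probability cutoffs, the "first candidate that fits" rule, and all function names below are assumptions made for illustration. The probabilities are supplied by hand here; in the described method they would come from the trained sentence segmentation model.

```python
from typing import List, Optional, Sequence


def split_at(text: str, breaks: Sequence[int]) -> List[str]:
    """Split text after each character index listed in breaks (ascending)."""
    clauses, start = [], 0
    for i in breaks:
        clauses.append(text[start:i + 1])
        start = i + 1
    if start < len(text):
        clauses.append(text[start:])
    return clauses


def candidate_segmentations(
    text: str,
    probs: Sequence[float],
    cutoffs: Sequence[float] = (0.9, 0.7, 0.5, 0.3),  # assumed values
) -> List[List[str]]:
    """Build one candidate per cutoff (assumption: the source does not
    specify this rule). A break is placed after every character whose
    segmentation probability is at least the cutoff."""
    candidates = []
    for cutoff in cutoffs:
        breaks = [i for i, p in enumerate(probs) if p >= cutoff]
        candidates.append(split_at(text, breaks))
    return candidates


def choose_segmentation(
    candidates: Sequence[List[str]], max_chars: int
) -> Optional[List[str]]:
    """Return the first candidate (strictest cutoff first) in which every
    clause has at most max_chars characters, or None if none qualifies."""
    for clauses in candidates:
        if all(len(clause) <= max_chars for clause in clauses):
            return clauses
    return None


# Toy input: hand-written probabilities standing in for model output.
text = "helloworldagain"
probs = [0.05] * len(text)
probs[4] = 0.6   # after "hello"
probs[9] = 0.95  # after "world"

result = choose_segmentation(candidate_segmentations(text, probs), max_chars=6)
print(result)
```

Trying the strictest cutoff first keeps only the most confident break points, and lower cutoffs are used only when a clause would otherwise exceed the threshold. This is one simple way to respect the length limit while cutting at the positions the model considers most natural.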

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a text segmentation method, device, electronic equipment and storage medium.

Background technique

[0002] Subtitles refer to the explanatory text displayed on the playback interface when the audio and video are playing, which can help viewers understand the audio and video content.

[0003] At present, sentence segmentation of subtitle texts is mostly done manually, which consumes a lot of human resources and time. Although the rapid development of natural language processing technology has made the text segmentation technology more and more mature, there are usually specific text segmentation requirements when segmenting subtitle texts, and the general text segmentation technology cannot meet the segmentation requirements of subtitle texts.

Contents of the invention

[0004] Embodiments of the present invention provide a text segmentation method, ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211
Inventor: 孔常青, 高建清, 刘聪, 胡国平, 胡郁
Owner: IFLYTEK CO LTD