Sentence segmentation model establishing method and device

A technique for establishing a sentence segmentation model, applied in the field of natural language processing, that addresses the difficulty of reading and understanding recognized text that lacks sentence-break marks.

Active Publication Date: 2020-05-15
BEIJING ORION STAR TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, when voice information is recognized to obtain the corresponding sentence (whether or not its meaning is complete), the sentence has no sentence-break marks, which hinders reading and understanding. When the voice information is relatively long and the recognized sentence contains many characters, it becomes very difficult for users to read. Researchers therefore began to study how to segment sentences that lack sentence-break marks, so as to improve user experience.

Examples

Embodiment approach

[0124] In a possible implementation manner, the preprocessing module 301 is specifically configured to obtain corpus sentences according to the following steps:

[0125] Obtaining a preset number of sample sentences, each of which ends with a sentence-break mark;

[0126] Splicing some or all of the sample sentences;

[0127] Segmenting each spliced sample sentence, and taking the resulting segments as the corpus sentences.

[0128] In a possible implementation manner, the preprocessing module 301 is specifically configured to:

[0129] Segment each spliced sample sentence according to a set step size; or

[0130] Randomly split each spliced sample sentence.
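The corpus-construction steps in [0125] to [0130] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the `group_size` parameter, the choice of characters as the unit for the step size, and the `min_len`/`max_len` bounds for random splitting are all assumptions made for the example.

```python
import random

def splice(samples, group_size, sep=""):
    """Join consecutive sample sentences (each ending in a break mark)
    into longer strings. group_size and sep are illustrative assumptions."""
    return [sep.join(samples[i:i + group_size])
            for i in range(0, len(samples), group_size)]

def split_fixed(text, step):
    """Cut a spliced sentence every `step` characters (assumed unit)."""
    return [text[i:i + step] for i in range(0, len(text), step)]

def split_random(text, min_len, max_len, rng):
    """Cut a spliced sentence into pieces of random length drawn from
    [min_len, max_len]; the final piece may be shorter than min_len."""
    pieces, i = [], 0
    while i < len(text):
        n = rng.randint(min_len, max_len)
        pieces.append(text[i:i + n])
        i += n
    return pieces

samples = ["今天天气很好。", "我们去公园吧。", "好的。"]
spliced = splice(samples, group_size=2)
corpus_fixed = [p for s in spliced for p in split_fixed(s, step=10)]
corpus_random = [p for s in spliced
                 for p in split_random(s, 4, 12, random.Random(0))]
```

Because the cuts ignore where the original break marks fall, a resulting corpus sentence may start or end mid-sentence, which matches the background's remark that recognized sentences may be complete or incomplete in meaning.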

[0131] In a possible implementation manner, the subword segmentation algorithm is the byte pair encoding (BPE) algorithm.
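For readers unfamiliar with BPE, the following is a minimal sketch of the general algorithm: learn merge rules from word frequencies, then apply them to split an unseen or rare word into subword units. It illustrates the textbook technique only; the function names, the toy vocabulary, and the number of merges are assumptions, and the patent text shown here does not specify these details.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges merge rules from a {word: frequency} dict."""
    # Start with each word represented as a tuple of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges

def segment(word, merges):
    """Split a word into subword units by replaying the learned merges."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

# Toy vocabulary (hypothetical): frequent words drive the merges.
merges = learn_bpe({"low": 5, "lower": 2, "lowest": 1}, num_merges=2)
print(segment("lowly", merges))
```

A rare word is thereby expressed through units that the model has seen often, rather than being mapped to a single unknown token.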

[0132] In a possible implementation manner, the tagging module 302 is specifically configured to control the deep learning model to tag each word in the word seque...

Abstract

The invention discloses a method and device for establishing a sentence segmentation model. The method comprises: performing word segmentation on each obtained corpus sentence to determine the words it contains; identifying rare words among those words and splitting them with a subword segmentation algorithm; inputting the word sequence formed by the words obtained from word segmentation and rare-word splitting into a deep learning model for sentence segmentation labeling; and adjusting the parameters of the deep learning model according to the original sentence segmentation identifiers of each corpus sentence and the sentence segmentation labels that the model outputs for that sentence, thereby establishing the sentence segmentation model. Sentences without sentence segmentation identifiers can then be segmented with the established model, so that users no longer see very long sentences lacking such identifiers; this improves readability and comprehensibility and can improve user experience.
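One way to picture how the original sentence segmentation identifiers can serve as training targets is sketched below: the break marks are stripped from the token sequence and recorded as a per-token label, which the model's output labels can then be compared against. The binary label scheme, the `BREAK_MARKS` set, and the function name are assumptions for illustration; the abstract does not state how labels are encoded.

```python
# Assumed set of sentence-break marks; the source does not enumerate them.
BREAK_MARKS = {"。", "，", "？", "！", ".", ",", "?", "!"}

def to_training_pair(tokens):
    """Turn a tokenized corpus sentence into (words, labels).

    labels[i] is 1 if a break mark followed words[i] in the original
    sentence and 0 otherwise (hypothetical encoding).
    """
    words, labels = [], []
    for tok in tokens:
        if tok in BREAK_MARKS:
            if labels:            # ignore a mark that precedes any word
                labels[-1] = 1
        else:
            words.append(tok)
            labels.append(0)
    return words, labels

words, labels = to_training_pair(["how", "are", "you", "?", "fine", "."])
```

The model would receive `words` as input, and the mismatch between its predicted labels and `labels` would drive the parameter adjustment described in the abstract.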

Description

Technical field

[0001] The present application relates to the technical field of natural language processing, and in particular to a method and device for establishing a sentence segmentation model.

Background

[0002] In recent years, with the rapid development of speech recognition technology, speech recognition has found more and more applications, such as sending voice messages, voice memos, and simultaneous interpretation.

[0003] However, when voice information is recognized to obtain the corresponding sentence (whether or not its meaning is complete), the sentence has no sentence-break marks, which hinders reading and understanding. When the voice information is relatively long and the recognized sentence contains many characters, it becomes very difficult for users to read. Researchers therefore began to study how to segment sentences that lack sentence-break marks, so as to improve us...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/289, G06F40/117
CPC: Y02D10/00
Inventor: 李晓普, 王阳阳
Owner: BEIJING ORION STAR TECH CO LTD