
Corpus recognition method and device

A corpus recognition technology in the computer field. It addresses the low accuracy of proper noun recognition, the difficulty of covering all proper nouns in the training corpus, and the insufficient training of language models on proper nouns. Its effects are to overcome incomplete coverage, expand the breadth of the proper noun corpus, and improve recognition accuracy.

Active Publication Date: 2020-08-14
BEIJING SINOVOICE TECH CO LTD
10 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, some proper nouns are rare and seldom used, so it is difficult to cover all relevant proper nouns when selecting the training corpus. As a result, the language model is insufficiently trained on proper nouns, and their recognition accuracy is low.



Embodiment Construction

[0030] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present invention and to fully convey its scope to those skilled in the art.

[0031] Figure 1 is a flow chart of the steps of a corpus recognition method provided by an embodiment of the present invention. As shown in Figure 1, the method may include:

[0032] Step 101: for each preset proper noun category, train the sub-language model corresponding to that category.

[0033] In this embodiment of the present invention, the proper noun categories may include names of people, places, organization...
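The excerpt does not say what form each sub-language model takes. As a minimal sketch, assuming a sub-language model is simply a unigram distribution P(noun | category) estimated from a list of proper nouns belonging to that category, Step 101 could look like the following. The function name `train_sub_language_models`, the category labels, and the sample names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def train_sub_language_models(
    nouns_by_category: Dict[str, List[str]],
) -> Dict[str, Dict[str, float]]:
    """Train one unigram sub-language model per proper noun category.

    Assumption: each model maps a proper noun to its relative frequency,
    i.e. an estimate of P(noun | category).
    """
    models: Dict[str, Dict[str, float]] = {}
    for category, nouns in nouns_by_category.items():
        counts = Counter(nouns)
        total = sum(counts.values())
        # An empty category yields an empty model (no division is performed).
        models[category] = {noun: n / total for noun, n in counts.items()}
    return models


if __name__ == "__main__":
    # Hypothetical training data; categories and names are illustrative only.
    data = {
        "PERSON": ["Zhang San", "Li Si", "Zhang San"],
        "PLACE": ["Beijing", "Shanghai"],
    }
    models = train_sub_language_models(data)
    print(models["PLACE"])
```

Because each category gets its own model, adding a newly coined name only requires retraining the small sub-model for its category, not the main model.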


Abstract

The invention provides a corpus recognition method and device, relating to the technical field of computers. The method comprises the following steps: labeling each proper noun in the corpus data with a class mark according to the proper noun category it belongs to; replacing the proper nouns in the corpus data with these class marks as placeholders to obtain first training data; training on the first training data to obtain a main language model; and combining the main language model, according to the class marks, with the corresponding sub-language models, each sub-language model being trained on the training data of the proper noun category corresponding to its class mark. In this way, the embodiment of the invention replaces proper nouns in the corpus data with class marks that serve as proper noun placeholders, and in subsequent model construction expands each placeholder with the proper nouns of the corresponding category through the sub-language model. This broadens the proper noun coverage of the target language model, solves the incomplete context coverage of proper nouns in traditional methods, and improves the recognition accuracy of proper nouns.
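The pipeline in the abstract resembles a class-based language model. The sketch below assumes that reading and is not the patent's actual procedure, which this excerpt does not give. It replaces known proper nouns with class marks (written here as `<PERSON>`, a hypothetical format) to build the first training data, trains an unsmoothed maximum-likelihood bigram model as the main language model, and scores a sentence by multiplying the main-model probability of its class-marked sequence by P(noun | class) from the matching sub-language model. All function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sentence = List[str]
Bigram = Tuple[str, str]


def replace_with_class_marks(
    corpus: List[Sentence], noun_to_class: Dict[str, str]
) -> List[Sentence]:
    """Replace each known proper noun with its class mark (first training data)."""
    return [
        [f"<{noun_to_class[tok]}>" if tok in noun_to_class else tok for tok in sentence]
        for sentence in corpus
    ]


def train_bigram(corpus: List[Sentence]) -> Dict[Bigram, float]:
    """Unsmoothed maximum-likelihood bigram model, used here as the main model."""
    pair_counts: Counter = Counter()
    history_counts: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            pair_counts[(prev, cur)] += 1
            history_counts[prev] += 1
    return {pair: n / history_counts[pair[0]] for pair, n in pair_counts.items()}


def sentence_prob(
    sentence: Sentence,
    main_lm: Dict[Bigram, float],
    sub_lms: Dict[str, Dict[str, float]],
) -> float:
    """P_main(class-marked sequence) times the product of P_sub(noun | class)."""
    # Map every noun known to a sub-language model to its class (first match wins).
    noun_class: Dict[str, str] = {}
    for cls, model in sub_lms.items():
        for noun in model:
            noun_class.setdefault(noun, cls)
    prob = 1.0
    marked: Sentence = []
    for tok in sentence:
        cls = noun_class.get(tok)
        if cls is None:
            marked.append(tok)
        else:
            prob *= sub_lms[cls][tok]  # expand the placeholder via the sub-model
            marked.append(f"<{cls}>")
    tokens = ["<s>"] + marked + ["</s>"]
    for pair in zip(tokens, tokens[1:]):
        prob *= main_lm.get(pair, 0.0)  # unseen bigram -> 0 (no smoothing)
    return prob


if __name__ == "__main__":
    corpus = [["call", "Zhang San", "now"]]
    first_training_data = replace_with_class_marks(corpus, {"Zhang San": "PERSON"})
    main_lm = train_bigram(first_training_data)
    sub_lms = {"PERSON": {"Zhang San": 0.5, "Wang Wu": 0.5}}
    # "Wang Wu" never appears in the corpus but is still scored via the PERSON sub-model.
    print(sentence_prob(["call", "Wang Wu", "now"], main_lm, sub_lms))  # 0.5
```

Here the first training data contains only "call <PERSON> now", yet a name absent from the corpus still receives a nonzero score through the sub-model, which is the coverage expansion the abstract describes. A practical system would add smoothing and might compile the combination into a decoding graph instead of rescoring whole sentences.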

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a corpus recognition method and device.

Background

[0002] In the recognition of speech text, that is, corpus, the recognition effect usually depends on the performance of the corresponding language model, and the performance of the language model depends on the coverage and depth of the training corpus.

[0003] However, some proper nouns are rare and seldom used, so it is difficult to cover all relevant proper nouns when selecting the training corpus. As a result, the language model is insufficiently trained on proper nouns, and their recognition accuracy is low.

Contents of the invention

[0004] In view of the above problems, the present invention provides a corpus recognition method and device that overcome the above problems or at least partially solve them.

[0005] Accor...


Application Information

IPC (8): G10L15/06, G10L15/26, G10L15/22
CPC: G10L15/063, G10L15/22, G10L2015/0631, G10L2015/0635
Inventors: 吴帅 (Wu Shuai), 李健 (Li Jian), 武卫东 (Wu Weidong)
Owner: BEIJING SINOVOICE TECH CO LTD