A corpus recognition method and device

A corpus recognition technology, applied in the computer field, that addresses insufficient training of language models on proper nouns, difficulty in covering proper nouns, and low accuracy in identifying proper nouns. It resolves incomplete coverage, improves the accuracy rate, and expands the breadth of the proper noun corpus.
CN111540343B · Active · Publication Date: 2021-02-05 · BEIJING SINOVOICE TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING SINOVOICE TECH CO LTD
Publication Date
2021-02-05

Abstract

The invention provides a corpus recognition method and device, and relates to the technical field of computers. The method comprises the following steps: marking each proper noun with a class mark according to the proper noun class to which it belongs; replacing the proper nouns in the corpus data with the class marks as placeholders to obtain first training data; performing training according to the first training data to obtain a main language model; and combining the main language model with the corresponding sub-language models according to the class marks, each sub-language model being obtained by training on the training data of the proper noun class corresponding to its class mark. Thus, in the embodiment of the invention, the proper nouns in the corpus data are replaced with class marks serving as proper noun placeholders, and in subsequent model construction each class mark is expanded into the proper nouns of the corresponding class according to the sub-language model. This expands the breadth of the proper noun corpus in the target language model, solves the incomplete context coverage of proper nouns in the traditional method, and improves the recognition accuracy of the proper noun corpus.
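
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say what kind of language model is used, so the sketch assumes a maximum-likelihood bigram model for the main model and a unigram model for each sub-language model. The lexicon, the `<CITY>` class mark, the toy corpus, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon mapping proper nouns to class marks (illustrative only).
LEXICON = {"Beijing": "<CITY>", "Shanghai": "<CITY>"}

def replace_proper_nouns(tokens, lexicon):
    """Replace each known proper noun with its class mark (the placeholder)."""
    return [lexicon.get(tok, tok) for tok in tokens]

def train_bigram(sentences):
    """Count bigrams; an assumed stand-in for training the main language model."""
    counts = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            counts[prev][word] += 1
    return counts

def bigram_prob(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    row = counts.get(prev, Counter())
    total = sum(row.values())
    return row[word] / total if total else 0.0

def train_sub_model(members):
    """Unigram model over one proper noun class; an assumed sub-language model."""
    freq = Counter(members)
    total = sum(freq.values())
    return {word: n / total for word, n in freq.items()}

def combined_prob(main_counts, sub_models, prev, word):
    """For a proper noun: P_main(class mark | prev) * P_sub(word | class mark)."""
    for mark, sub in sub_models.items():
        if word in sub:
            return bigram_prob(main_counts, prev, mark) * sub[word]
    return bigram_prob(main_counts, prev, word)

# Toy corpus: only two city names ever occur in context.
corpus = [["fly", "to", "Beijing"], ["fly", "to", "Shanghai"], ["drive", "to", "Beijing"]]
first_training_data = [replace_proper_nouns(s, LEXICON) for s in corpus]
main_model = train_bigram(first_training_data)

# Class-specific training data may contain names absent from the corpus.
sub_models = {"<CITY>": train_sub_model(["Beijing", "Shanghai", "Shenzhen", "Chengdu"])}

# "Shenzhen" never appears in the corpus, yet gets a non-zero probability after "to".
print(combined_prob(main_model, sub_models, "to", "Shenzhen"))
```

Under these assumptions, every name in the class data inherits the contexts observed for the placeholder, which is the coverage effect the abstract describes.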

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a method and device for identifying corpus.

Background technique

[0002] In recognizing speech text (that is, corpus), recognition quality usually depends on the performance of the corresponding language model, which in turn depends on the coverage and depth of the training corpus.

[0003] However, some proper nouns are rare and seldom used, so it is difficult to cover all relevant proper nouns when selecting the training corpus. As a result, the language model is insufficiently trained on proper nouns, and the accuracy of proper noun recognition is low.

Contents of the invention

[0004] In view of the above problems, the present invention provides a corpus recognition method and device that overcome, or at least partially solve, those problems.

[0005] Accor...

Claims
