Corpus selection and processing method, device, equipment and computer-readable storage medium
A corpus selection processing method, applied in the field of corpus screening, which solves problems such as corpus selection results that do not meet the sentence length distribution requirements.
Examples
Embodiment 1
[0031] Figure 1 is a flow chart of the corpus selection processing method provided by Embodiment 1 of the present invention; Figure 2 is a schematic diagram of the sentence length distribution of the corpus provided by an embodiment of the present invention. In corpora selected by existing corpus selection methods, the sentence length distribution deviates far from that of the real corpus, so the selection does not meet the sentence length distribution requirements of corpus design. To address this problem, the embodiment of the present invention provides a corpus selection processing method.
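To make "far from the sentence length distribution of the real corpus" concrete, the following is a minimal sketch that computes a corpus's sentence length distribution and measures how far two such distributions are apart. Two assumptions are not in the source: sentence length is counted in characters, and total variation distance is used as the mismatch measure; the function names `length_distribution` and `total_variation` are hypothetical.

```python
from collections import Counter


def length_distribution(sentences):
    """Return {sentence_length: proportion} for a list of sentences.

    Assumption: length is measured in characters, not words.
    """
    counts = Counter(len(s) for s in sentences)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two length distributions.

    Assumed mismatch measure: 0 means identical, 1 means disjoint.
    """
    lengths = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in lengths)


real = ["ab", "abc", "abc", "abcd"]
selected = ["abcd", "abcd"]
print(total_variation(length_distribution(real), length_distribution(selected)))
```

Here the hypothetical selected corpus contains only length-4 sentences, so its distance from the real corpus is large (0.75), which is the kind of mismatch the method is meant to avoid.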
[0032] The method in this embodiment is applied to a terminal device, which may be a mobile terminal such as a smartphone or a smart speaker, or may be a server, etc. In other embodiments, the method may also be applied to other devices. In this embodiment, a terminal device is taken as an example for schematic il...
Embodiment 2
[0047] Figure 3 is a flow chart of the corpus selection processing method provided by Embodiment 2 of the present invention. On the basis of Embodiment 1, in this embodiment the original sentence length distribution may be the sentence length distribution of the original corpus. Selecting from the original corpus, according to the original sentence length distribution, a corpus that meets the sentence number requirement and the sentence length requirement and whose sentence length distribution matches the original sentence length distribution, as the initial sentence length distribution model, includes: obtaining the ratio of the target number of sentences to the number of sentences in the original corpus; calculating, according to this ratio, the target number of sentences for each sentence length; according to the number of sentences for each target sentence length ...
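The ratio step described above can be sketched as follows: each sentence length bucket of the original corpus is scaled by the ratio of the target sentence count to the original sentence count, and sentences are then taken per bucket up to that quota. This is a minimal sketch under stated assumptions: length is counted in characters, quotas are rounded with Python's `round`, sentences are taken in their original order within each bucket (the source does not say how sentences are chosen within a length), and the sentence length requirement is not modeled. The names `target_counts_per_length` and `select_initial_corpus` are hypothetical.

```python
from collections import Counter, defaultdict


def target_counts_per_length(original_sentences, target_total):
    """Scale each length bucket by ratio = target_total / len(original)."""
    if not original_sentences:
        raise ValueError("original corpus is empty")
    counts = Counter(len(s) for s in original_sentences)
    ratio = target_total / len(original_sentences)
    # Rounding means the quotas may not sum exactly to target_total.
    return {length: round(n * ratio) for length, n in counts.items()}


def select_initial_corpus(original_sentences, target_total):
    """Pick sentences bucket by bucket until each length quota is filled.

    Assumption: sentences are taken in their original order.
    """
    quotas = target_counts_per_length(original_sentences, target_total)
    taken = defaultdict(int)
    selected = []
    for s in original_sentences:
        length = len(s)
        if taken[length] < quotas[length]:
            selected.append(s)
            taken[length] += 1
    return selected


corpus = ["aa", "aa", "aa", "aa", "bbb", "bbb", "cccc", "cccc"]
print(target_counts_per_length(corpus, 4))
print(select_initial_corpus(corpus, 4))
```

With 8 original sentences and a target of 4, the ratio is 0.5, so the length-2 bucket keeps 2 sentences and the length-3 and length-4 buckets keep 1 each, preserving the original proportions. Because of per-bucket rounding, the selected total can differ slightly from the target in other cases; the source text shown here does not say how that residue is handled.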
Embodiment 3
[0143] Figure 5 is a schematic structural diagram of a corpus selection processing device provided in Embodiment 3 of the present invention. The corpus selection processing device provided in this embodiment can execute the processing flow provided in the embodiment of the corpus selection processing method. As shown in Figure 5, the corpus selection processing device 30 includes an initial selection module 301 and a correction module 302.
[0144] Specifically, the initial selection module 301 is used to select from the original corpus, according to the original sentence length distribution, a corpus that meets the sentence number and sentence length requirements and whose sentence length distribution matches the original sentence length distribution, as the initial sentence length distribution model.
[0145] The correction module 302 is used to correct the initial sentence length distribution model to obtain a final sentence length distribution model that meets the requirements...
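The two-module structure of device 30 can be sketched as a simple pipeline in which module 301 produces the initial model and module 302 corrects it. This is a hypothetical sketch: it represents the "sentence length distribution model" as a plain list of selected sentences, and because the correction logic is not given in the text shown here, both modules are injected as callables rather than implemented. The class name `CorpusSelectionDevice` and the example modules are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List

Sentences = List[str]


@dataclass
class CorpusSelectionDevice:
    """Hypothetical mirror of device 30.

    initial_selection_module stands in for module 301 and
    correction_module for module 302; both are supplied by the caller.
    """

    initial_selection_module: Callable[[Sentences, int], Sentences]
    correction_module: Callable[[Sentences, int], Sentences]

    def run(self, original: Sentences, target_total: int) -> Sentences:
        # Module 301: build the initial sentence length distribution model.
        initial_model = self.initial_selection_module(original, target_total)
        # Module 302: correct it into the final model.
        return self.correction_module(initial_model, target_total)


# Placeholder modules for illustration only; not the patented logic.
def take_first(sentences: Sentences, n: int) -> Sentences:
    return sentences[:n]


def no_correction(sentences: Sentences, n: int) -> Sentences:
    return list(sentences)


device = CorpusSelectionDevice(take_first, no_correction)
print(device.run(["a", "bb", "ccc"], 2))
```

Injecting the modules keeps the device's control flow (select, then correct) separate from the selection and correction rules, so a real correction step could be substituted without changing the pipeline.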