Model training method and device, corpus processing method and device and computer equipment

A model training and corpus processing technology, applied in the field of text processing, addresses the problems of low corpus processing efficiency, high labor and time costs, and the inability to screen out abnormal corpus, and improves corpus processing efficiency and accuracy.

Pending Publication Date: 2020-12-08
大众问问(北京)信息科技有限公司

AI Technical Summary

Problems solved by technology

[0004] At present, abnormal corpus is mostly screened manually, which requires substantial labor and time, and screening sentence by sentence makes corpus processing inefficient.
Although the prior art already includes models for judging the dependency relationships of a corpus, such models analyze the dependency relationships of normal corpus and abnormal corpus alike. They cannot determine whether the dependency analysis results are correct, nor can they screen out abnormal corpus.


Examples

Embodiment 1

[0044] Figure 1 is a flow chart of a model training method provided in Embodiment 1 of the present invention. This embodiment is applicable to performing dependency analysis training on corpus sample data to obtain a dependency analysis model. The method can be executed by a model training device, which can be implemented in software and/or hardware and can generally be integrated in computer equipment. As shown in Figure 1, the method includes the following operations:

[0045] S110. Acquire corpus sample data in which there is a dependency relationship between word segmentation samples.

[0046] The corpus sample data may be standard corpus data used as sample data for model training. The word segmentation samples may be word segmentation data obtained by performing word segmentation on each corpus sample in the corpus sample data. There is a normal and reasonable dependency relationship among the word segmentat...
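One way to picture a corpus sample whose word segmentation samples carry a "normal and reasonable" dependency relationship is sketched below. This is a minimal illustration, not the patent's data format: the `DependencySample` class, the head-index encoding (1-based, with 0 marking the root), the relation labels, and the `is_well_formed` check are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DependencySample:
    """Hypothetical container for one segmented corpus sample."""
    tokens: List[str]     # word segmentation samples
    heads: List[int]      # assumed encoding: 1-based head index, 0 = root
    relations: List[str]  # assumed dependency label for each token


def is_well_formed(sample: DependencySample) -> bool:
    """Check that the arcs form a single-rooted tree with no cycles."""
    n = len(sample.tokens)
    if n == 0 or len(sample.heads) != n or len(sample.relations) != n:
        return False
    if any(h < 0 or h > n for h in sample.heads):
        return False
    if sample.heads.count(0) != 1:
        return False
    # Walk from every token towards the root; revisiting a node means a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = sample.heads[node - 1]
    return True


sample = DependencySample(
    tokens=["I", "like", "music"],
    heads=[2, 0, 2],
    relations=["nsubj", "root", "obj"],
)
print(is_well_formed(sample))
```

A check of this kind could be used to confirm that training samples are structurally valid before they are fed to a model. Whether the patent applies any such validation is not stated in the excerpt.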

Embodiment 2

[0075] Figure 3 is a flow chart of a corpus processing method provided in Embodiment 2 of the present invention. This embodiment is applicable to screening out abnormal corpus from a corpus to be processed. The method can be executed by a corpus processing device, which can be implemented in software and/or hardware and can generally be integrated in computer equipment. As shown in Figure 3, the method includes the following operations:

[0076] S210. Acquire the corpus to be processed, and input the corpus to be processed into the dependency analysis model.

[0077] The corpus to be processed may be original corpus data from which abnormal corpus needs to be filtered out, and the corpus data may be, for example, voice data collected in real time. It can be understood that the corpus to be processed may include multiple sentences.

[0078] In the embodiment of the present invention, the obtained corpus to be processed may be input into the dependen...
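The overall flow of this embodiment (feed each sentence to a dependency analysis model, then separate abnormal corpus based on the analysis result) can be sketched as follows. The excerpt does not say how the analysis result is turned into a decision, so this sketch assumes the model returns a score in [0, 1] and applies a hypothetical `threshold`. The names `screen_corpus` and `toy_model` are illustrative, and `toy_model` is a stand-in for demonstration, not a dependency parser.

```python
from typing import Callable, List, Tuple

# Assumed model interface: sentence -> plausibility score in [0, 1].
DependencyModel = Callable[[str], float]


def screen_corpus(
    sentences: List[str],
    model: DependencyModel,
    threshold: float = 0.5,  # hypothetical cut-off, not from the patent
) -> Tuple[List[str], List[str]]:
    """Split the corpus to be processed into (normal, abnormal) sentences."""
    normal: List[str] = []
    abnormal: List[str] = []
    for sentence in sentences:
        if model(sentence) >= threshold:
            normal.append(sentence)
        else:
            abnormal.append(sentence)
    return normal, abnormal


def toy_model(sentence: str) -> float:
    """Stand-in scorer: fraction of distinct words. Not a real parser."""
    words = sentence.split()
    return len(set(words)) / len(words) if words else 0.0


normal, abnormal = screen_corpus(
    ["play some music", "the the the the"], toy_model
)
print(normal)    # ['play some music']
print(abnormal)  # ['the the the the']
```

Passing the model in as a callable keeps the screening step independent of how the dependency analysis model is trained or implemented.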

Embodiment 3

[0101] Figure 9 is a schematic diagram of a model training device provided in Embodiment 3 of the present invention. As shown in Figure 9, the device includes a corpus sample data acquisition module 310 and a dependency analysis training module 320, wherein:

[0102] The corpus sample data acquisition module 310 is used to acquire corpus sample data in which there is a dependency relationship between word segmentation samples;

[0103] The dependency analysis training module 320 is used to input the corpus sample data into a preset machine learning model for dependency analysis training to obtain a dependency analysis model; the dependency analysis model is used to screen out abnormal corpus.

[0104] In the technical solution of this embodiment, the dependency analysis model can be obtained by performing dependency analysis training on the word segmentation sample data in the corpus sample data, and the obtained dependency analysis model can ...
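The two-module structure of the device can be sketched in code as below. This is only an illustration of how modules 310 and 320 might be composed. The class names, the `(tokens, heads)` sample format, and `toy_learner` (which merely records observed head and dependent word pairs) are assumptions, and the "preset machine learning model" of the patent is not specified in the excerpt.

```python
from typing import Callable, List, Set, Tuple

# Assumed sample format: (tokens, heads); heads are 1-based, 0 marks the root.
Sample = Tuple[List[str], List[int]]
Model = Set[Tuple[str, str]]


class CorpusSampleDataAcquisitionModule:
    """Stands in for module 310: supplies the corpus sample data."""

    def __init__(self, samples: List[Sample]) -> None:
        self._samples = samples

    def acquire(self) -> List[Sample]:
        return list(self._samples)


class DependencyAnalysisTrainingModule:
    """Stands in for module 320: trains a model from the samples."""

    def __init__(self, learner: Callable[[List[Sample]], Model]) -> None:
        self._learner = learner

    def train(self, samples: List[Sample]) -> Model:
        return self._learner(samples)


class ModelTrainingDevice:
    """Wires the acquisition module to the training module."""

    def __init__(
        self,
        acquisition: CorpusSampleDataAcquisitionModule,
        training: DependencyAnalysisTrainingModule,
    ) -> None:
        self.acquisition = acquisition
        self.training = training

    def run(self) -> Model:
        return self.training.train(self.acquisition.acquire())


def toy_learner(samples: List[Sample]) -> Model:
    """Placeholder learner: records every (head word, dependent word) pair."""
    pairs: Model = set()
    for tokens, heads in samples:
        for i, head in enumerate(heads):
            head_word = "ROOT" if head == 0 else tokens[head - 1]
            pairs.add((head_word, tokens[i]))
    return pairs


device = ModelTrainingDevice(
    CorpusSampleDataAcquisitionModule([(["play", "music"], [0, 1])]),
    DependencyAnalysisTrainingModule(toy_learner),
)
model = device.run()
print(sorted(model))
```

Keeping acquisition and training behind separate classes mirrors the module split described for the device, so either side could be replaced without touching the other.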

Abstract

The embodiments of the invention disclose a model training method and device, a corpus processing method and device, and computer equipment. The corpus processing method comprises: obtaining a to-be-processed corpus and inputting it into a dependency relationship analysis model; performing dependency relationship analysis on the to-be-processed corpus through the dependency relationship analysis model; and determining an abnormal corpus from the to-be-processed corpus according to an analysis result of the dependency relationship analysis model, wherein the dependency relationship analysis model is a model obtained through model training. According to the technical scheme of the embodiments of the invention, corpus processing efficiency and accuracy can be improved.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of text processing, and in particular to a model training method and device, a corpus processing method and device, and computer equipment.

Background technique

[0002] Corpus processing is widely used in the field of text processing, and the efficiency of corpus processing directly affects text processing time.

[0003] Collected corpora inevitably include a large amount of abnormal corpus, that is, corpus with no practical meaning; such abnormal corpus can also be called dirty corpus. The syntactic relationships of abnormal corpus are chaotic, and the dependency relationships between its words are wrong. When processing a collected corpus, it is often necessary to filter out and delete the abnormal corpus with chaotic syntactic relationships to ensure that all the corpus in the corpus to be...

Application Information

IPC(8): G06F40/289, G06F40/253, G06F40/30, G06N20/00
CPC: G06N20/00, G06F40/253, G06F40/289, G06F40/30
Inventor 张文瑜
Owner 大众问问(北京)信息科技有限公司