Text extraction method, system and medium based on fusion pre-training

A text extraction technology based on fusion pre-training, applied in the computer field. It addresses insufficient text extraction accuracy and achieves the effects of avoiding blurred boundaries, enhancing semantic learning ability, and improving accuracy.

Pending Publication Date: 2022-04-26
北京快确信息科技有限公司

AI Technical Summary

Problems solved by technology

However, in the financial field, especially in the field of cash bonds, the existing text extraction methods still have certain boundary problems, such as "1Y 000001 3.0975 4000 5.29+0A Fund TO B Fund", for...

Embodiment Construction

[0044] To make the object, technical solution and effect of the present invention clearer, the present invention is described in further detail below. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. Embodiments of the present invention are described below with reference to the accompanying drawings.

[0045] Figure 1 is a flowchart of an embodiment of the text extraction method based on fusion pre-training provided by the present invention. The method provided in this embodiment is applicable to automatically identifying the counterparty in a transaction process. As shown in Figure 1, the method specifically includes the following steps:

[0046] S100. Obtain text to be extracted.

[0047] In this embodiment, the text to be extracted may be the tr...

Abstract

The invention discloses a text extraction method, system and medium based on fusion pre-training. The method comprises: obtaining the text to be extracted; encoding the text with a pre-training model to obtain the corresponding character vectors; selecting at least part of the character vectors, performing semantic extraction over adjacent text, and splicing the results to obtain semantic feature vectors; performing feature selection and fusion on the semantic feature vectors to obtain effective word feature vectors; and decoding the effective word feature vectors in separate branches to obtain a word segmentation result and an entity recognition result respectively. Because the character vectors come from the pre-training model framework and at least part of them are fused for semantic extraction over adjacent text, the model learns the semantic information of the text more effectively. As a result, the final word segmentation result effectively avoids the fuzzy boundary problem, and the accuracy of text extraction is improved.
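The data flow in the abstract can be sketched in a few lines of Python. This is only an illustration of the order of the steps, not the patented implementation: every function name, the vector width `DIM`, the window widths 3 and 5, the hash-based encoder standing in for the pre-training model, and the sigmoid self-gating standing in for feature selection are assumptions made for the example. All components are untrained, and the sketch uses every character vector where the abstract allows selecting only part of them.

```python
import hashlib
import math

DIM = 4  # hypothetical vector width, chosen only to keep the example small


def encode_chars(text):
    """Stand-in for the pre-trained encoder: one deterministic vector per character."""
    vectors = []
    for ch in text:
        digest = hashlib.sha256(ch.encode("utf-8")).digest()
        vectors.append([byte / 255.0 for byte in digest[:DIM]])
    return vectors


def adjacent_context(vectors, width):
    """Mean of each vector and its neighbours in a centred window of `width` characters."""
    half = width // 2
    out = []
    for i in range(len(vectors)):
        window = vectors[max(0, i - half): i + half + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def splice(*feature_sets):
    """Concatenate the per-character features produced by several extractors."""
    return [[x for feats in rows for x in feats] for rows in zip(*feature_sets)]


def select_and_fuse(features):
    """Gate every feature with a sigmoid of itself (assumed stand-in for feature selection)."""
    return [[x / (1.0 + math.exp(-x)) for x in row] for row in features]


def decode_branches(features):
    """Two separate heads read the same fused features; both are untrained placeholders."""
    seg_scores = [sum(row[0::2]) for row in features]     # word-segmentation branch
    entity_scores = [sum(row[1::2]) for row in features]  # entity-recognition branch
    return seg_scores, entity_scores


text = "3.0975 4000"
chars = encode_chars(text)
semantic = splice(adjacent_context(chars, 3), adjacent_context(chars, 5))
fused = select_and_fuse(semantic)
seg_scores, entity_scores = decode_branches(fused)
print(len(chars), len(semantic[0]), len(seg_scores), len(entity_scores))  # 11 8 11 11
```

Each character keeps its own position through every stage, so both decoding branches produce one score per character. In a real system the placeholder heads would be trained sequence decoders.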

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a text extraction method, system and medium based on fusion pre-training.

Background

[0002] Text information extraction is a relatively mature technology in the field of deep learning and has been successfully applied in various business scenarios. However, in the financial field, especially in the field of cash bonds, existing text extraction methods still have certain boundary problems. For example, in "1Y 000001 3.0975 4000 5.29+0A Fund TO B Fund", the numeric text "3.0975" may be extracted only as "3.09", or the numeric text "4000" only as "400", so the accuracy of text extraction is not high enough.

[0003] Therefore, the prior art still needs to be improved and developed.

Summary of the invention

[0004] In view of the above deficiencies in the prior art, the purpose of the p...
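The boundary problem described in the background can be made concrete with a small check. The sketch below is not part of the patent: the helper `is_boundary_aligned` is a hypothetical name, and it assumes that fields in a quote are separated by whitespace, which holds for the example string but may not hold for all bond quotes.

```python
def is_boundary_aligned(text, start, end):
    """Return True if text[start:end] starts and ends on whitespace-delimited boundaries."""
    if not (0 <= start < end <= len(text)):
        raise ValueError("span out of range")
    starts_clean = start == 0 or text[start - 1].isspace()
    ends_clean = end == len(text) or text[end].isspace()
    return starts_clean and ends_clean


quote = "1Y 000001 3.0975 4000 5.29+0A Fund TO B Fund"

yield_start = quote.index("3.0975")
print(is_boundary_aligned(quote, yield_start, yield_start + len("3.0975")))  # True
print(is_boundary_aligned(quote, yield_start, yield_start + len("3.09")))    # False

amount_start = quote.index("4000")
print(is_boundary_aligned(quote, amount_start, amount_start + len("400")))   # False
```

A check like this only detects a truncated span after the fact. The method in the source aims to avoid producing such spans in the first place.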

Application Information

IPC (8): G06F40/126, G06F40/30, G06F40/279, G06N3/04, G06N3/08
CPC: G06F40/126, G06F40/30, G06F40/279, G06N3/08, G06N3/044, G06N3/045
Inventors: 林远平, 甘伟超, 喻广博, 邹鸿岳, 周靖宇
Owner: 北京快确信息科技有限公司