Text element extraction method

A text element extraction technique in the field of data processing. It addresses the low processing efficiency and large data requirements of existing extraction methods, and it improves recognition quality, reduces manual labeling cost, and shortens the time needed for business implementation.

Pending Publication Date: 2021-06-18
杭州云嘉云计算有限公司

AI Technical Summary

Problems solved by technology

[0006] To address the low processing efficiency and large data requirements of prior-art element extraction methods, the present invention provides a text element extraction method. Working from limited text data, it splits, combines, and reuses the generated character sequences and vector sets, which improves data utilization and reduces the amount of data required.


Examples

Embodiment

[0017] A method for extracting text elements, comprising the following steps: segmenting the text and converting it into a character sequence; assigning several representation modes to each character and merging them to obtain a vector set; dividing the vector set into multiple subsets and training the element extraction model several times to obtain the final model; and using the final model to extract text elements according to the matching rules. Because each character has multiple representations, the resulting vector set is large. Splitting it into subsets that verify and optimize one another makes full use of the available data, which reduces the amount of data required and improves efficiency.
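The first two steps of [0017] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not say which representation modes are used, so the three shown here (character type, relative position, scaled code point) and all function names are hypothetical.

```python
from typing import List


def to_char_sequence(text: str) -> List[str]:
    """Segment text into a character sequence, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def char_type_repr(ch: str) -> List[float]:
    """Hypothetical representation 1: one-hot over (digit, ASCII letter, other)."""
    if ch.isdigit():
        return [1.0, 0.0, 0.0]
    if ch.isascii() and ch.isalpha():
        return [0.0, 1.0, 0.0]
    return [0.0, 0.0, 1.0]


def position_repr(index: int, length: int) -> List[float]:
    """Hypothetical representation 2: relative position in the sequence, in [0, 1]."""
    return [index / max(length - 1, 1)]


def codepoint_repr(ch: str) -> List[float]:
    """Hypothetical representation 3: Unicode code point scaled to [0, 1]."""
    return [ord(ch) / 0x10FFFF]


def build_vector_set(text: str) -> List[List[float]]:
    """Merge all representations of each character into one vector per character."""
    chars = to_char_sequence(text)
    n = len(chars)
    return [
        char_type_repr(ch) + position_repr(i, n) + codepoint_repr(ch)
        for i, ch in enumerate(chars)
    ]


vectors = build_vector_set("AB 12")
```

Each character yields one vector whose length is the sum of the individual representation lengths, so adding a representation mode widens every vector without requiring more source text.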

[0018] The subsets include the test set, the training set, and the tuning set. The training set and the tuning set are each divided into several non-overlapping data packets. The data packets of the training set are used in turn as training objects,...
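The subset split described in [0018] might look like the sketch below. The split percentages, the seed, the packet count, and the names `split_subsets` and `to_packets` are assumptions made for illustration; the source text is truncated and does not specify them, nor how the packets are consumed during training.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_subsets(
    items: Sequence[T], train_pct: int = 70, tuning_pct: int = 15, seed: int = 0
) -> Dict[str, List[T]]:
    """Shuffle items and split them into training, tuning, and test subsets.

    The percentages are illustrative assumptions; the remainder goes to test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_tuning = len(shuffled) * tuning_pct // 100
    return {
        "train": shuffled[:n_train],
        "tuning": shuffled[n_train:n_train + n_tuning],
        "test": shuffled[n_train + n_tuning:],
    }


def to_packets(subset: Sequence[T], n_packets: int) -> List[List[T]]:
    """Divide a subset into n_packets non-overlapping data packets."""
    return [list(subset[i::n_packets]) for i in range(n_packets)]


subsets = split_subsets(list(range(20)))
train_packets = to_packets(subsets["train"], 3)
tuning_packets = to_packets(subsets["tuning"], 3)
```

Striding with `subset[i::n_packets]` guarantees that every item lands in exactly one packet, which is the non-overlap property the paragraph requires.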

Abstract

The invention discloses a text element extraction method comprising the following steps: segmenting a text and converting the segmented text into a character sequence; assigning a plurality of representation modes to each character and combining them to obtain a vector set; dividing the vector set into a plurality of subsets and training the element extraction model a plurality of times to obtain a final model; and extracting text elements with the final model according to the matching rule. Because each character has multiple representation modes, the vector set is large. Splitting it into subsets that verify and optimize one another makes full use of the data, reduces the amount of data required, and improves efficiency. The substantive effects are that the data acquisition problem for newly added elements with an irregular pattern is solved, manual labeling cost is reduced, newly added elements with a regular pattern can be added directly to an existing model to form a new optimized model, business implementation time is shortened, and recognition quality is improved.

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a method for extracting text elements.

Background

[0002] Named entity recognition aims to extract named entities from text and classify them into different categories. Common categories include personal names, place names, and organization names. For example, the invention with granted publication number CN102750390B discloses a method for automatically extracting news webpage elements, which can automatically identify news elements. Named entity recognition is one of the most important tasks in natural language processing. It is also a prerequisite for many downstream applications, or a preliminary task that improves their accuracy, such as entity linking, both of which are very important for building large-scale knowledge graphs.

[0003] Machine learning refers to the process of using certain algorithms to guide the computer to use known ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/205, G06F40/284, G06F40/295, G06K9/62
CPC: G06F40/205, G06F40/295, G06F40/284, G06F18/214
Inventor: 朱宇
Owner: 杭州云嘉云计算有限公司