Text element extraction method

What is AI technical title?
AI technical title is built by the PatSnap AI team. It summarizes the technical point description of the patent document.
An element extraction and text technology, applied in the field of data processing, that addresses the low processing efficiency and large data volume requirements of existing methods. It enhances the recognition effect, reduces the cost of manual labeling, and shortens the time needed for business implementation.

Pending Publication Date: 2021-06-18

杭州云嘉云计算有限公司


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

[0006] To address the low processing efficiency and large data volume requirements of element extraction methods in the prior art, the present invention provides a text element extraction method. It reuses limited text data by splitting and combining the generated character sequences and vector sets, which improves data utilization efficiency and reduces data volume requirements.

Method used

Embodiment

[0017] A method for extracting text elements, comprising the following steps: segmenting the text and converting it into a character sequence; assigning several representation modes to each character and merging them to obtain a vector set; dividing the vector set into multiple subsets and performing several rounds of element extraction model training to obtain the final model; and using the final model to extract text elements according to the matching rules. Because each character has multiple representation modes, the vector set is large. At the same time, splitting it into subsets that verify and optimize one another makes full use of data resources, reduces data volume requirements, and improves efficiency.
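The first two steps of paragraph [0017] can be illustrated with a minimal Python sketch. The patent text shown here does not specify which representation modes are used, so the two below (a one-hot identity vector and a character-class vector) and all function names are assumptions chosen only for illustration; merging is done by simple concatenation.

```python
from typing import Dict, List


def to_char_sequence(text: str) -> List[str]:
    """Segment text into a sequence of non-whitespace characters."""
    return [ch for ch in text if not ch.isspace()]


def identity_repr(ch: str, vocab: Dict[str, int]) -> List[float]:
    """Hypothetical representation 1: one-hot over a vocabulary.

    The last slot is reserved for characters not in the vocabulary.
    """
    vec = [0.0] * (len(vocab) + 1)
    vec[vocab.get(ch, len(vocab))] = 1.0
    return vec


def type_repr(ch: str) -> List[float]:
    """Hypothetical representation 2: [is_digit, is_alpha, is_other]."""
    if ch.isdigit():
        return [1.0, 0.0, 0.0]
    if ch.isalpha():
        return [0.0, 1.0, 0.0]
    return [0.0, 0.0, 1.0]


def build_vector_set(text: str, vocab: Dict[str, int]) -> List[List[float]]:
    """Merge both representations of every character by concatenation."""
    return [identity_repr(ch, vocab) + type_repr(ch)
            for ch in to_char_sequence(text)]


# Toy vocabulary built from the example text itself (an assumption).
vocab = {ch: i for i, ch in enumerate(sorted(set("order2021")))}
vectors = build_vector_set("order 2021", vocab)
print(len(vectors), len(vectors[0]))
```

Each character ends up with one merged vector, so adding further representation modes widens every vector without requiring more source text.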

[0018] The subsets include a test set, a training set, and a tuning set. The training set and the tuning set are each divided into several non-overlapping data packets. The data packets of the training set are used in sequence as the training objects,...
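The partitioning described in paragraph [0018] could be sketched as follows. The split percentages, the packet counts, and the function names are assumptions for illustration; the excerpt does not state them, and the training procedure itself is truncated in the source, so it is not reproduced here.

```python
from typing import Dict, List, Sequence


def split_subsets(items: Sequence, train_pct: int = 60,
                  tune_pct: int = 20) -> Dict[str, List]:
    """Split items into training, tuning, and test subsets.

    Percentages are hypothetical; integer arithmetic avoids rounding issues.
    """
    n = len(items)
    n_train = n * train_pct // 100
    n_tune = n * tune_pct // 100
    return {
        "training": list(items[:n_train]),
        "tuning": list(items[n_train:n_train + n_tune]),
        "test": list(items[n_train + n_tune:]),
    }


def make_packets(items: Sequence, n_packets: int) -> List[List]:
    """Divide items into n_packets contiguous packets that share no items."""
    size, extra = divmod(len(items), n_packets)
    packets, start = [], 0
    for i in range(n_packets):
        # The first `extra` packets take one additional item each.
        end = start + size + (1 if i < extra else 0)
        packets.append(list(items[start:end]))
        start = end
    return packets


samples = list(range(20))  # stand-ins for the vectors in the vector set
subsets = split_subsets(samples)
training_packets = make_packets(subsets["training"], 3)
tuning_packets = make_packets(subsets["tuning"], 2)
print([len(p) for p in training_packets], [len(p) for p in tuning_packets])
```

Contiguous slicing guarantees that no sample appears in two packets; shuffling before the split is omitted here to keep the sketch deterministic.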

Abstract

The invention discloses a text element extraction method comprising the following steps: segmenting a text and converting the segmented text into a character sequence; assigning a plurality of representation modes to each character and combining them to obtain a vector set; dividing the vector set into a plurality of subsets and carrying out element extraction model training several times to obtain a final model; and extracting text elements with the final model according to the matching rule. Because each character has multiple representation modes, the vector set is large; at the same time, splitting it into subsets that verify and optimize one another makes full use of data resources, reduces the data volume requirement, and improves efficiency. The substantive effects are that the data acquisition problem for newly added elements with irregular patterns is solved, the manual labeling cost is reduced, newly added elements with regular patterns can be added directly to an existing model to form a new optimized model, the service implementation time is shortened, and the recognition effect is enhanced.

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a method for extracting text elements.

Background technique

[0002] Named entity recognition aims to extract named entities from text and classify them into different categories. Common categories include personal names, place names, and organization names. For example, the invention with authorized announcement number CN102750390B discloses a method for automatically extracting news webpage elements, which can automatically identify news elements. Named entity recognition is one of the most important tasks in natural language processing. It is also a prerequisite for many downstream application tasks, or a preliminary task for improving their accuracy, such as entity linking; these tasks are very important for building large-scale knowledge graphs.

[0003] Machine learning refers to the process of using certain algorithms to guide the computer to use known ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F40/205, G06F40/284, G06F40/295, G06K9/62

CPC: G06F40/205, G06F40/295, G06F40/284, G06F18/214

Inventor: 朱宇

Owner: 杭州云嘉云计算有限公司
