Named entity recognition and extraction using genetic programming

Genetic operations and pattern-program techniques, applied to named entity recognition and extraction using genetic programming, address the low efficiency and slow speed of manual programming. The effects are reduced manual input and fewer errors, lower use of computing resources, and fewer iterations of genetic operations.

Pending Publication Date: 2020-10-09
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

Generating pattern programs for named entity recognition usually involves extensive expert programming effort, which is inefficient and slow.
In the era of big data and cloud-based services, service providers and platforms must handle entity recognition tasks across a large variety of data stream categories, more than manual programming can cover.



Embodiment Construction

[0021] This document describes techniques for generating pattern programs using genetic algorithms. The genetic algorithm operates on example data strings that represent the categories of data to be identified or extracted by named entity recognition. Such a string is called a "positive example" data string. The genetic algorithm can also operate on negative example data strings, which are data strings that are not positive examples, e.g., strings that are not the target of the named entity recognition task. In an initialization phase, an initial pattern program is generated from the example data strings that represent the categories of data to be identified or extracted. In some embodiments, byte pair encoding is used to extract frequent substrings from the example data strings, and each extracted frequent substring is treated as a single expression unit when generating the initial pattern program. Starting from the initial pattern program, geneti...
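The frequent-substring step mentioned in [0021] can be illustrated with a minimal sketch of byte pair encoding style merging. This is an assumption-laden illustration, not the method disclosed in the application: the function name `learn_bpe_units` and the parameters `num_merges` and `min_count` are hypothetical, and the sketch works on characters rather than bytes.

```python
from collections import Counter


def learn_bpe_units(samples, num_merges=10, min_count=2):
    """Repeatedly merge the most frequent adjacent pair of units.

    Returns the merged units (frequent substrings) in the order they
    were learned, plus the samples re-segmented into those units.
    All names and defaults here are illustrative assumptions.
    """
    # Each sample starts as a list of single-character units.
    sequences = [list(s) for s in samples]
    merged_units = []
    for _ in range(num_merges):
        # Count every adjacent pair of units across all samples.
        pair_counts = Counter()
        for seq in sequences:
            for left, right in zip(seq, seq[1:]):
                pair_counts[(left, right)] += 1
        if not pair_counts:
            break
        (left, right), count = pair_counts.most_common(1)[0]
        if count < min_count:
            break  # nothing left that is frequent enough to merge
        merged = left + right
        merged_units.append(merged)
        # Rewrite each sequence, replacing the pair with the merged unit.
        rewritten = []
        for seq in sequences:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            rewritten.append(out)
        sequences = rewritten
    return merged_units, sequences
```

For example, calling `learn_bpe_units(["a@x.com", "b@x.com", "c@x.com"])` collapses the shared suffix into one unit, so that a later pattern-generation step could treat it as a single expression unit instead of six separate characters.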


Abstract

Disclosed herein are methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a pattern program using a genetic algorithm. The genetic algorithm operates on example data strings that represent the data categories to be recognized or extracted through named entity recognition. In the initialization stage, the initial pattern programs are generated based on example data strings that represent the data categories to be recognized or extracted through named entity recognition. Starting from the initial pattern programs, genetic operations are iteratively conducted to generate generations of offspring pattern programs. In each round of the genetic operation, offspring pattern programs are generated through the crossover operation and the mutation operation.
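The loop described in the abstract (initial pattern programs, then repeated crossover and mutation) can be sketched as follows. This is only a sketch under stated assumptions: the text above does not specify the program representation, the fitness function, or any rates, so this example assumes a pattern program is a list of regular-expression units, scores it by full matches on positive examples minus full matches on negative examples, and keeps parents alongside offspring. The names `UNITS`, `initial_program`, `fitness`, `crossover`, `mutate`, and `evolve` are all hypothetical, and the inputs are assumed to be ASCII strings.

```python
import random
import re

# Hypothetical unit vocabulary used by mutation; not taken from the source.
UNITS = [r"\d", r"[a-z]", r"[A-Z]", r"\.", r"-", r"\d+", r"[a-z]+"]


def initial_program(example):
    """Generalize one positive example into a list of regex units."""
    units = []
    for ch in example:
        if ch.isdigit():
            units.append(r"\d")
        elif ch.islower():
            units.append("[a-z]")
        elif ch.isupper():
            units.append("[A-Z]")
        else:
            units.append(re.escape(ch))  # keep punctuation literal
    return units


def fitness(program, positives, negatives):
    """Positives matched in full, minus negatives matched in full."""
    try:
        regex = re.compile("".join(program))
    except re.error:
        return float("-inf")  # defensive: an invalid program ranks last
    hits = sum(1 for s in positives if regex.fullmatch(s))
    false_hits = sum(1 for s in negatives if regex.fullmatch(s))
    return hits - false_hits


def crossover(a, b, rng):
    """Join a prefix of one parent to a suffix of the other."""
    return a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]


def mutate(program, rng):
    """Replace one randomly chosen unit with a unit from UNITS."""
    if not program:
        return [rng.choice(UNITS)]
    child = list(program)
    child[rng.randrange(len(child))] = rng.choice(UNITS)
    return child


def evolve(positives, negatives, generations=20, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [initial_program(s) for s in positives]

    def score(program):
        return fitness(program, positives, negatives)

    for _ in range(generations):
        offspring = []
        while len(offspring) < population_size:
            child = crossover(rng.choice(population), rng.choice(population), rng)
            if rng.random() < 0.3:  # assumed mutation rate
                child = mutate(child, rng)
            offspring.append(child)
        # Parents compete with offspring, so the best score never drops.
        population = sorted(population + offspring, key=score, reverse=True)
        population = population[:population_size]
    return population[0]
```

A call such as `evolve(["123-45", "987-65"], ["abc-de", "12345"])` returns the highest-scoring unit list found, and `"".join(...)` of that list gives a regular expression that can be applied to a data stream. Keeping parents in the selection pool is a design choice made here for simplicity; it guarantees the best program is never lost between generations.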

Description

Background

[0001] Advances in network and storage subsystem design continue to enable the processing of ever larger data flows between and within computer systems. At the same time, the content of such data streams has come under increasing scrutiny. For example, the collection, analysis and storage of personal data is subject to scrutiny and regulation. Organizations must ensure that personal data is collected legally under strict conditions. Organizations that collect and manage personal data have an obligation to protect it from misuse and unlawful use, and an obligation to respect the rights of data owners. Personal data or other sensitive data includes, but is not limited to, name, date of birth, place of birth, ID number, home address, credit card number, phone number, email address, URL, IP address, bank account number, etc.

[0002] Classifying and extracting personal or other sensitive data from data streams involves named entity recognition. ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06N3/12, G06F16/903
CPC: G06F40/295, G06N3/126, G06F16/90344, G06F40/279, G06N5/01
Inventor: 王德胜, 刘佳伟, 章鹏
Owner: ALIPAY (HANGZHOU) INFORMATION TECH CO LTD