Internet text entity recognition method and system, electronic equipment and storage medium

An entity recognition technology for Internet text, applicable to network data query, network data retrieval, electronic digital data processing, and related fields, which addresses the high operating cost of entity recognition.

Pending Publication Date: 2021-06-29
北京智慧星光信息技术有限公司

AI Technical Summary

Problems solved by technology

[0006] In view of this, embodiments of the present invention provide an Internet text entity recognition method, system, electronic equipment and storage medium that address the high operating cost of entity recognition in the prior art.



Embodiment Construction

[0032] The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0033] At present, the common practice for entity recognition in the industry is to manually mark a large amount of text, and then use the marked corpus to train an NER model built from neural network models such as Bert / BiLSTM / TextCNN together with a CRF algorithm. The text to be recognized is then fed to the model, and the recognition result is output.
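To make the "marked corpus" and the model output more concrete, here is a minimal sketch of how entities are commonly read off per-token tags in sequence-labeling NER. It assumes the BIO tagging scheme, which the source does not name; the function name `decode_bio` and the tag types `PER` and `LOC` are illustrative only.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from BIO tags (assumed scheme)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    entities: List[Tuple[str, str]] = []
    current: List[str] = []   # tokens of the entity being built
    current_type = ""         # type of the entity being built
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush the previous one, if any.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continuation of the entity currently being built.
            current.append(token)
        else:
            # "O" tag, or an "I-" tag with no matching open entity (dropped here).
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], ""
    if current:
        entities.append(("".join(current), current_type))
    return entities


# Hypothetical hand-labeled example: one tag per character.
example_tokens = list("张三在北京")
example_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
example_entities = decode_bio(example_tokens, example_tags)
```

Every character of the training text needs a tag like this, which gives a sense of the labeling effort the common practice requires.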

[0034] The entity recognition methods in the prior art have the following problems.

[0035] (1) Labor cost is too high

[0036] The common practice in the industry not only needs to use a lot ...


Abstract

The invention discloses an Internet text entity recognition method and system, electronic equipment and a storage medium. The method comprises the following steps: inputting historical Internet text into an entity recognition AI model to obtain an initialized full quantizer table; constructing a full quantizer dictionary tree from the initialized full quantizer table; performing recognition processing on real-time sampled Internet text, using the entity recognition AI model and the full quantizer dictionary tree, to obtain a selected word list; constructing a selected word dictionary tree from the selected word list; splitting the real-time Internet text to be recognized at preset Chinese sentence segmentation symbols to obtain split sub-sentences; matching the split sub-sentences against the selected word dictionary tree to obtain matched sub-sentences; and splicing the matched sub-sentences in a preset sequence, inputting the spliced text into the entity recognition AI model to obtain an entity recognition result, and outputting the result by entity category. Because the real-time Internet text to be recognized is screened sentence by sentence against the selected word list, only sentences that may contain entities are kept, so the amount of text the model must process is greatly reduced and the operating cost is lowered.
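The screening steps in the abstract (build a selected word dictionary tree, split the text into sub-sentences, keep only the sub-sentences that match, splice them back together) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the delimiter set, the choice of original order as the "preset sequence", the "。" joiner, and all names (`Trie`, `split_sentences`, `screen_text`) are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Assumed delimiter set; the source only says "preset Chinese sentence
# segmentation symbols" without listing them.
SENTENCE_SPLIT_PATTERN = re.compile(r"[。！？；!?;\n]+")

END = "$end"  # end-of-word marker; longer than one character, so it never clashes


class Trie:
    """Minimal character-level dictionary tree."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root: Dict = {}
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        if not word:
            return
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True

    def occurs_in(self, text: str) -> bool:
        """Return True if any stored word appears as a substring of text."""
        for start in range(len(text)):
            node = self.root
            for i in range(start, len(text)):
                node = node.get(text[i])
                if node is None:
                    break
                if END in node:
                    return True
        return False


def split_sentences(text: str) -> List[str]:
    """Split text into non-empty sub-sentences at the assumed delimiters."""
    return [s.strip() for s in SENTENCE_SPLIT_PATTERN.split(text) if s.strip()]


def screen_text(text: str, selected_words: Trie) -> str:
    """Keep sub-sentences containing a selected word; splice in original order."""
    kept = [s for s in split_sentences(text) if selected_words.occurs_in(s)]
    return "。".join(kept)


# Hypothetical selected word list and input text.
selected = Trie(["北京", "上海"])
sample = "今天天气很好。张三访问了北京。会议明天开始！李四在上海工作"
screened = screen_text(sample, selected)
```

Only the screened text would then be passed to the entity recognition model, which is where the reduction in computed text comes from. The substring scan here is quadratic in the worst case; a production system would more likely use a multi-pattern matcher.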

Description

Technical field

[0001] The invention relates to the field of text data processing, in particular to an Internet text entity recognition method, system, electronic equipment and storage medium.

Background technique

[0002] Entity recognition is a very important part of text sequence labeling tasks. Its full name is "Named Entity Recognition", abbreviated in English as "NER". It involves the identification and extraction of names and other information from text.

[0003] With the development of the Internet, the volume of text data carried by the Internet is increasing rapidly, and the demand for entity recognition is becoming urgent in more and more scenarios. This places higher demands on both the recognition quality and the computing performance of entity recognition models.

[0004] At present, the common practice in the industry for entity recognition is to manually mark a large amount of text, and then rely on the marked corpus to use Bert / BiLSTM / TextCNN and other...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F16/953, G06F40/242
CPC: G06F16/953, G06F40/242, G06F40/295
Inventor: 李涛, 赵冲, 骆飞, 李青龙
Owner: 北京智慧星光信息技术有限公司