Named entity identification method oriented to field of network security

A named entity recognition technology for network security text, applied in the field of named entity recognition. It addresses problems of existing approaches such as poor applicability and transferability, weak generality, and performance that depends on training-set size.

Active Publication Date: 2020-09-25
XI AN JIAOTONG UNIV
4 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

Rule- and dictionary-based methods achieve high recognition accuracy, but building domain rules and dictionaries is labor-intensive, and the results apply and transfer poorly to other settings. Machine-learning methods rely on hand-crafted features; because network security text contains many specialized terms, mixed Chinese-English terms, polysemous words, and out-of-vocabulary words, those features do not generalize, and performance depends on the size of the training set. Deep-learning methods can extract features automatically, but traditional deep neural networks cannot fully extract effective features and also require a large labeled corpus as training data. Labeling network security text takes considerable money and manpower, so the training cost is extremely high.


Examples

Embodiment Construction

[0027] Embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings and examples. It should be understood that the specific embodiments are for better explaining the present invention, and all technologies realized based on the content of the present invention belong to the scope of the present invention.

[0028] Referring to Figures 1 to 3, a named entity recognition method oriented to the field of network security comprises the following steps:

[0029] Step 1: design a crawler program to obtain relevant network security text data from information sources such as vulnerability databases, security forums, and enterprise emergency response centers, and store the data in a database. In the embodiment, network security text data is crawled from the China National Vulnerability Database of Information Security (CNNVD), the FreeBuf security forum, and the Sangfor Security Center.
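The crawl-and-store flow of Step 1 might be sketched as follows. This is a minimal illustration only, not the patent's crawler: the patent does not name a database or an HTTP client, so SQLite and `urllib` are assumptions, and the table name `documents`, the helper names `fetch_html`, `init_db`, and `crawl`, and any seed URLs are hypothetical. HTML parsing and link following are left out.

```python
import sqlite3
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download one page and return its text (assumed UTF-8)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def init_db(path=":memory:"):
    """Create the (hypothetical) table that holds crawled text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "url TEXT PRIMARY KEY, source TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def crawl(conn, seeds, fetch=fetch_html):
    """Fetch each (source, url) pair and store it; return rows newly stored.

    `fetch` is injectable so the storage logic can be exercised offline.
    """
    stored = 0
    for source, url in seeds:
        try:
            body = fetch(url)
        except OSError:
            # Network errors (URLError is an OSError) skip this page.
            continue
        cur = conn.execute(
            "INSERT OR IGNORE INTO documents (url, source, body) VALUES (?, ?, ?)",
            (url, source, body),
        )
        stored += cur.rowcount  # 0 when the URL was already stored
    conn.commit()
    return stored
```

Keying the table on the URL means a page reached twice is stored once, which keeps repeated crawls of the same source from duplicating text in the corpus.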

[0030] Step 2, use ...


Abstract

The invention discloses a named entity recognition method oriented to the field of network security. The method comprises two stages: model training and sample selection. In the first stage, the pre-trained language model ALBERT produces dynamic initial character vectors that carry semantic information; these vectors are fed into a Bi-LSTM + CRF network for training, and the network outputs the label sequence with the highest probability for the input text sequence. In the second stage, based on the model trained in the first stage, a combination of active learning and self-learning selects network security text data that has labeling value and training value for manual and machine labeling; the selected data is added to the existing labeled text data and the model is trained iteratively. The method significantly improves the accuracy of network security entity recognition and effectively alleviates problems in the network security field such as the lack of labeled corpora and high labeling cost.
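The two stages can be illustrated with a small sketch. It assumes that the per-character label scores (`emissions`) and label-to-label scores (`transitions`) have already been produced by a trained Bi-LSTM + CRF network. Viterbi decoding is the standard way to find the highest-probability label sequence in a linear-chain CRF. The selection rule is an assumption, since the abstract does not state the criterion: sequences whose best path has low probability go to manual labeling (active learning), and those with high probability keep the machine label (self-learning). The thresholds `low` and `high` and all function names are hypothetical.

```python
import math


def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for a linear-chain CRF."""
    n_labels = len(transitions)
    score = list(emissions[0])
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_labels):
            # Best previous label i for arriving at label j.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptrs.append(ptrs)
    last = max(range(n_labels), key=lambda i: score[i])
    best_score = score[last]
    path = [last]
    for ptrs in reversed(backptrs):
        last = ptrs[last]
        path.append(last)
    path.reverse()
    return path, best_score


def _logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed scores of all label paths."""
    n_labels = len(transitions)
    alpha = list(emissions[0])
    for emit in emissions[1:]:
        alpha = [
            _logsumexp([alpha[i] + transitions[i][j] for i in range(n_labels)]) + emit[j]
            for j in range(n_labels)
        ]
    return _logsumexp(alpha)


def split_by_confidence(samples, transitions, low=0.5, high=0.9):
    """Route unlabeled samples by the probability of their best label path.

    Returns (manual, machine): names to label by hand, and (name, path)
    pairs whose predicted labels are kept as machine labels.
    """
    manual, machine = [], []
    for name, emissions in samples:
        path, best = viterbi_decode(emissions, transitions)
        prob = math.exp(best - log_partition(emissions, transitions))
        if prob < low:
            manual.append(name)
        elif prob > high:
            machine.append((name, path))
    return manual, machine
```

Samples that fall between the two thresholds are left in the unlabeled pool for a later iteration; that is one possible design, and other uncertainty measures could be substituted for the path probability.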

Description

Technical field

[0001] The invention relates to the field of natural language processing of network security text data, and in particular to a named entity recognition method oriented to the field of network security.

Background

[0002] With the rapid development and wide application of Internet technology and artificial intelligence technology, the amount of information on the Internet has grown explosively, and society has entered the era of informatization and big data. As network information technology has developed, the network environment has become increasingly complex. A large number of illegal organizations and individuals use viruses or vulnerabilities to launch extensive and sustained network attacks over the Internet on targets in many fields, in order to steal confidential information or cause damage. At present, people's production and life are increasingly dependent on network information, and the number o...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/30, G06F40/126, G06F40/117, G06F40/216, G06F16/951, G06N3/04
CPC: G06F40/295, G06F40/30, G06F40/126, G06F40/117, G06F40/216, G06F16/951, G06N3/049, G06N3/045
Inventor: 秦涛, 李致远, 王平辉, 管晓宏
Owner XI AN JIAOTONG UNIV