Entity identification method and device, equipment and storage medium

An entity recognition technology, applied in the field of information security, that can solve problems such as low accuracy of entity recognition results, long text, and unbalanced distribution of data labels.

Active Publication Date: 2021-06-08
CAPITAL NORMAL UNIVERSITY +1

AI Technical Summary

Problems solved by technology

However, threat intelligence text differs from ordinary text data: it is usually longer overall, and each sentence is far longer than a sentence of ordinary text. Two entities of the target types are often very far apart in the text, which results in a serious imbalance in the distribution of data labels in the sample data.
As a result, when an existing entity recognition model is applied directly, this serious label imbalance lowers the accuracy of its entity recognition results on threat intelligence.
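The imbalance described above can be made concrete with a small sketch. It counts the share of each tag in one long sentence where two entities sit far apart. The tag names `MALWARE`, `VULNERABILITY` and `O` (non-entity) and the sentence length of 40 tokens are illustrative assumptions, not values taken from the patent.

```python
from collections import Counter

def label_distribution(tags):
    """Return the fraction of tokens carrying each tag."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: count / total for tag, count in counts.items()}

# Hypothetical tag sequence for one long threat intelligence sentence:
# two entity tokens separated by 38 non-entity ("O") tokens.
tags = ["MALWARE"] + ["O"] * 38 + ["VULNERABILITY"]
dist = label_distribution(tags)

for tag, share in sorted(dist.items()):
    print(f"{tag}: {share:.3f}")
```

In this toy case the non-entity tag accounts for almost all tokens, which is the kind of skew a model trained with an ordinary loss may struggle with.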

Method used


Examples

Embodiment 1

[0110] Figure 1 shows a schematic flowchart of an entity recognition method provided by an embodiment of the present application. The method includes steps S101-S104, specifically:

[0111] S101. Obtain an original threat intelligence text.

[0112] Specifically, existing entity recognition models are mainly used to identify common types of entities, such as person names, place names, and times, in ordinary text data. The entities that need to be identified in the threat intelligence field involve a large amount of specialized vocabulary, and the field lacks open source data sets. Therefore, it is necessary to first construct a sample data set for training the entity recognition model.

[0113] In this embodiment of the present application, as an optional embodiment, text data related to threat intelligence, such as articles, blogs, and paper reports, may be crawled from security websites as the original threa...

Embodiment 2

[0241] Figure 4 shows a schematic structural diagram of an entity recognition device provided by an embodiment of the present application. The device includes:

[0242] A data collection module 401, configured to obtain an original threat intelligence text;

[0243] A word segmentation marking module 402, configured to, for each original threat intelligence text, mark each segmented word in the original threat intelligence text according to the entity type of the entity to which the segmented word belongs, to obtain a training sample, wherein the entity types include at least a threat intelligence type and a non-threat intelligence type, and each segmented word in the training sample corresponds to an entity tag;

[0244] A model training module 403, configured to, for each training sample, input the training sample into the entity recognition model and train the entity recognition model by using each segmented word in the training sample and the entity tag corres...
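As a rough illustration of what the word segmentation marking module produces, the sketch below pairs every segmented word with an entity tag. The source does not say how the marking is carried out, so the lexicon lookup here only stands in for that step. The whitespace tokenizer, the function name `tag_tokens`, the tag names, and the entity strings `ExampleBot` and `CVE-0000-0000` are all hypothetical.

```python
def tag_tokens(tokens, entity_lexicon):
    """Pair each token with the tag of the entity it belongs to.

    Tokens not found in the lexicon get the non-threat-intelligence tag "O".
    """
    return [(tok, entity_lexicon.get(tok.lower(), "O")) for tok in tokens]

# Hypothetical lexicon mapping lower-cased entity strings to entity types.
lexicon = {
    "examplebot": "MALWARE",
    "cve-0000-0000": "VULNERABILITY",
}

# Whitespace splitting is an assumption; real segmentation may differ.
tokens = "The ExampleBot loader exploits CVE-0000-0000 in documents".split()
sample = tag_tokens(tokens, lexicon)

for token, tag in sample:
    print(f"{token}\t{tag}")
```

Each (word, tag) pair corresponds to one segmented word and its entity tag in a training sample, which is the form of input the model training module is described as consuming.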

Embodiment 3

[0287] As shown in Figure 5, an embodiment of the present application provides a computer device 500 for executing the entity recognition method of the present application. The device includes a memory 501, a processor 502, and a computer program stored in the memory 501 and runnable on the processor 502, wherein, when the processor 502 executes the computer program, the steps of the above-mentioned entity recognition method are realized.

[0288] Specifically, the above-mentioned memory 501 and processor 502 may be general-purpose memory and processor, which are not specifically limited here. When the processor 502 runs the computer program stored in the memory 501, it can execute the above-mentioned entity recognition method.

[0289] Corresponding to the entity recognition method in this application, the embodiment of this application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is run by a processor, the above-mentioned entity recognition is pe...

Abstract

The invention provides an entity recognition method and device, equipment and a storage medium. The method comprises the steps of obtaining original threat intelligence texts; for each original threat intelligence text, marking each segmented word in the original threat intelligence text according to the entity type of the entity to which the segmented word belongs to obtain a training sample; and inputting the training sample into an entity recognition model, and training the entity recognition model by using each segmented word in the training sample and the entity mark corresponding to the segmented word to obtain a trained entity recognition model, wherein a loss function used by the entity recognition model in the training process is used for reducing the spatial distance between segmented words with the same entity mark and increasing the spatial distance between segmented words with different entity marks; and inputting to-be-recognized threat intelligence texts into the trained entity recognition model to obtain an entity recognition result. The recognition accuracy of specific types of entities in the threat intelligence field can be improved.
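The abstract states only the goal of the loss function: pull segmented words with the same entity tag together and push those with different tags apart. One well-known loss with that behavior is a pairwise contrastive loss with a margin. The sketch below uses it purely as an illustration; the formula, the `margin` parameter, the function names, and the toy 2-D embeddings are assumptions and may differ from the loss actually defined in the patent.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_distance_loss(embeddings, tags, margin=1.0):
    """Mean contrastive loss over all token pairs.

    Same-tag pairs contribute d^2, which shrinks as they move together.
    Different-tag pairs contribute max(0, margin - d)^2, which shrinks
    as they move at least `margin` apart.
    """
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = euclidean(embeddings[i], embeddings[j])
            if tags[i] == tags[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0

tags = ["O", "O", "MALWARE"]  # hypothetical entity tags

# Toy embeddings: the two "O" tokens close together, "MALWARE" far away.
well_separated = [[0.0, 0.0], [0.0, 0.1], [2.0, 2.0]]
# Toy embeddings: an "O" token far from its peer and next to "MALWARE".
mixed_up = [[0.0, 0.0], [2.0, 2.0], [0.0, 0.1]]

loss_good = pairwise_distance_loss(well_separated, tags)
loss_bad = pairwise_distance_loss(mixed_up, tags)
print(loss_good < loss_bad)
```

Minimizing a loss of this shape rewards embeddings in which tokens cluster by tag, which is one way to counteract a tag distribution dominated by non-entity tokens.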

Description

Technical field

[0001] The present invention relates to the technical field of information security, and in particular to an entity recognition method, device, equipment and storage medium.

Background

[0002] In the field of information security, an APT (Advanced Persistent Threat) attack is one in which the attacker uses long-term intelligence collection, information monitoring and similar means to carry out cyber-attacks against demanding technology industry departments. Because APT attacks are highly concealed, different technology industry departments need to share the threat intelligence they collect in order to deal with such attacks better. Threat intelligence here refers to data sets collected about attackers, malware, vulnerabilities, and other indicators of attack. In this way, exchanging "space" for "time" through threat intelligence sharing helps different technology industry departments adopt a coordinated approach to jointly respond to APT attacks an...

Claims

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/295, G06F40/30, G06N3/08, G06N3/045
Inventor: 王旭仁, 熊子晗, 刘润时, 何松恒, 姜政伟, 施智平, 江钧, 凌志婷, 李小萌, 刘宝旭, 熊梦博, 朱新帅, 张小庆, 陈蓉
Owner: CAPITAL NORMAL UNIVERSITY