An entity recognition method, device, equipment and storage medium

An entity recognition technology, applied in the field of information security, that addresses the unbalanced distribution of data labels and the long text length of threat intelligence, both of which lower the accuracy of entity recognition results, and thereby improves recognition accuracy.

Active Publication Date: 2022-05-20
CAPITAL NORMAL UNIVERSITY +1
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Unlike common text data, threat intelligence text is usually long, and each sentence is far longer than a sentence in ordinary text. Two entities of the target types are often very far apart in the text, which leads to a serious imbalance in the distribution of data labels in the sample data.
As a result, when an existing entity recognition model is applied directly to threat intelligence, this imbalance lowers the accuracy of its entity recognition results.
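
The imbalance described above can be made concrete with a small sketch. The tag names `THREAT` and `O` and the 40-token sentence are hypothetical, chosen only to illustrate how a few distant entities in a long sentence skew the label distribution; they are not taken from the patent.

```python
from collections import Counter


def tag_distribution(tags):
    """Return the fraction of tokens carrying each tag."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: count / total for tag, count in counts.items()}


# Hypothetical long sentence: 40 tokens, only 2 of which are entities,
# separated by 22 non-entity tokens.
tags = ["O"] * 15 + ["THREAT"] + ["O"] * 22 + ["THREAT"] + ["O"]
dist = tag_distribution(tags)
print(dist)
```

With labels this skewed, a model that always predicts the non-entity tag already scores well on per-token accuracy, which is one way to read the accuracy problem the text describes.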


Examples

Embodiment 1

[0110] Figure 1 shows a schematic flowchart of an entity recognition method provided by an embodiment of the present application. The method includes steps S101-S104, specifically:

[0111] S101: obtain an original threat intelligence text.

[0112] Specifically, existing entity recognition models are mainly used to recognize common entity types, such as person names, place names, and times, in ordinary text data. Entities in the threat intelligence field involve a large number of specialized terms, and open-source datasets in this field are scarce, so a sample dataset for training the entity recognition model must be built first.

[0113] In the embodiment of the present application, as an optional embodiment, text data such as articles, blogs, and thesis reports related to threat intelligence may be crawled from a secure website ...

Embodiment 2

[0241] Figure 4 shows a schematic structural diagram of an entity recognition apparatus provided by an embodiment of the present application. The apparatus includes:

[0242] A data collection module 401, used for acquiring an original threat intelligence text;

[0243] A word segment labeling module 402, used for labeling, for each original threat intelligence text, each word segment in that text according to the entity type of the entity to which the word segment belongs, to obtain a training sample, wherein the entity types include at least a threat intelligence type and a non-threat intelligence type, and each word segment in the training sample corresponds to an entity tag;

[0244] A model training module 403, used for inputting each training sample into the entity recognition model and training the entity recognition model by using each word segment in the training sample and the entity tag correspon...
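
The labeling performed by module 402 can be pictured with a minimal sketch. The excerpt does not say how tags are assigned, so the lookup set `THREAT_TERMS`, the function `label_segments`, the tag names, and the placeholder tokens below are all hypothetical stand-ins for whatever annotation process is actually used.

```python
# Hypothetical stand-in for an annotation source; the placeholder
# names below are invented and do not refer to real malware or CVEs.
THREAT_TERMS = {"examplerat", "cve-0000-0000"}


def label_segments(segments, threat_terms=THREAT_TERMS):
    """Pair each word segment with a threat / non-threat entity tag."""
    return [
        (seg, "THREAT" if seg.lower() in threat_terms else "O")
        for seg in segments
    ]


sample = label_segments(["The", "ExampleRAT", "sample", "exploits", "CVE-0000-0000"])
for segment, tag in sample:
    print(segment, tag)
```

Each training sample is then a sequence of (word segment, entity tag) pairs, matching the one-tag-per-segment requirement stated for module 402.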

Embodiment 3

[0287] As shown in Figure 5, an embodiment of the present application provides a computer device 500 for executing the entity recognition method of the present application. The device includes a memory 501, a processor 502, and a computer program stored on the memory 501 and runnable on the processor 502, wherein the processor 502 implements the steps of the entity recognition method when executing the computer program.

[0288] Specifically, the above-mentioned memory 501 and processor 502 may be general-purpose memory and processor, which are not specifically limited here. When the processor 502 runs the computer program stored in the memory 501, it can execute the above-mentioned entity identification method.

[0289] Corresponding to the entity identification method in the present application, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the com...

Abstract

The present application provides an entity recognition method, apparatus, device, and storage medium. The method includes: obtaining an original threat intelligence text; labeling each word segment in the original threat intelligence text to obtain a training sample; inputting the training sample into an entity recognition model and training the model with each word segment in the training sample and its corresponding entity tag to obtain a trained entity recognition model, wherein the loss function used during training reduces the spatial distance between word segments with the same entity tag and increases the spatial distance between word segments with different entity tags; and inputting a threat intelligence text to be recognized into the trained entity recognition model to obtain an entity recognition result. This can improve the accuracy of recognizing certain types of entities in the threat intelligence domain.
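
The excerpt does not give the formula for this loss. As a rough illustration of the stated goal only, the sketch below uses a generic margin-based pairwise contrastive loss, which is a swapped-in technique and not necessarily the patent's loss; the function name, the `margin` parameter, and the toy 2-D embeddings are assumptions.

```python
import math
from itertools import combinations


def pairwise_contrastive_loss(embeddings, tags, margin=1.0):
    """Average pairwise loss: pull same-tag vectors together and push
    different-tag vectors at least `margin` apart."""
    total, pairs = 0.0, 0
    for (e1, t1), (e2, t2) in combinations(zip(embeddings, tags), 2):
        d = math.dist(e1, e2)  # Euclidean distance between embeddings
        if t1 == t2:
            total += d ** 2  # same tag: penalize any distance
        else:
            total += max(0.0, margin - d) ** 2  # different tag: penalize closeness
        pairs += 1
    return total / pairs if pairs else 0.0


# Toy 2-D embeddings forming two tight clusters.
points = [(0.0, 0.0), (0.0, 0.1), (2.0, 0.0), (2.0, 0.1)]
clustered = pairwise_contrastive_loss(points, ["THREAT", "THREAT", "O", "O"])
mixed = pairwise_contrastive_loss(points, ["THREAT", "O", "THREAT", "O"])
print(clustered, mixed)
```

When the tags line up with the clusters the loss is small, and when the same points carry interleaved tags it is much larger, so minimizing it moves same-tag word segments closer and different-tag word segments apart.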

Description

Technical field

[0001] The present invention relates to the technical field of information security, and in particular to an entity recognition method, apparatus, device, and storage medium.

Background

[0002] In the field of information security, an APT (Advanced Persistent Threat) attack is one in which the attacker uses long-term intelligence collection, information monitoring, and similar means to carry out cyberattacks on demanding technology industry sectors. Because APT attacks are highly concealed, different technology industry departments need to share the threat intelligence they have collected in order to deal with such attacks better. Threat intelligence here covers data sets collected on attack indicators such as attackers, malware, and vulnerabilities. Through threat intelligence sharing, "space" is exchanged for "time", which helps different technology industry departments adopt a coordinated approach to jointly respond to APT attacks and prote...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/295, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/295, G06F40/30, G06N3/08, G06N3/045
Inventor: 王旭仁, 刘润时, 何松恒, 熊子晗, 姜政伟, 施智平, 江钧, 凌志婷, 李小萌
Owner: CAPITAL NORMAL UNIVERSITY