Threat intelligence oriented entity identification method and system

An entity recognition technology for threat intelligence, applied in the field of threat intelligence-oriented entity recognition. It addresses the unsatisfactory performance of existing entity extraction on Chinese text, which has no upper/lower-case markers or word-form inflection features, and achieves the effect of improving the recognition level.

Inactive Publication Date: 2019-06-07
INST OF INFORMATION ENG CHINESE ACAD OF SCI
5 Cites, 47 Cited by

AI Technical Summary

Problems solved by technology

However, although entity extraction methods differ widely in technical implementation, their effectiveness often depends heavily on specific resources (hand-built vocabularies or manually segmented data). As a result, existing entity extraction methods perform well in various open evaluations but remain unsatisfactory in the field of network threat intelligence, where corpus resources are scarce. That is, current technology cannot meet the high standards (mainly precision and recall) expected of IOC input. In the threat intelligence field in particular, experiments show that the F1 value of entity extraction is only 0%-30%, so there is still considerable room for entity extraction research in specialized fields.
Abroad, named entity recognition technology is also in a golden age of development. Chinese sentences, however, have their own particularity and complexity: unlike English, words are not separated by spaces, so word segmentation cannot be done directly, and there are no upper/lower-case markers or word-form inflection features. Chinese threat intelligence entity identification can therefore only draw on foreign threat intelligence entity identification tools, not apply them directly.
[0006] To sum up, manual entity extraction for threat intelligence still requires experienced analysts to spend a great deal of effort, which cannot meet demand. Although there have been some preliminary applications of automated analysis, most are at a very basic stage and often depend heavily on specific resources. At present, there is no mature entity extraction technology in the field of threat intelligence in China, which is one of the obstacles to making emergency judgments on domestic network security threats.
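
The precision, recall, and F1 figures referred to above are usually computed at the entity level. The following is a minimal sketch of that calculation, assuming entities are represented as (start, end, label) spans and that only exact matches count; the function name `entity_prf` and the sample spans are hypothetical and do not come from the patent.

```python
def entity_prf(gold, predicted):
    """Entity-level precision, recall and F1 over (start, end, label) spans.

    Assumption: a prediction is correct only if span and label both match.
    """
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: (start_token, end_token, label)
gold = [(0, 1, "MALWARE"), (4, 5, "IP"), (7, 9, "ORG")]
pred = [(0, 1, "MALWARE"), (4, 5, "DOMAIN")]
p, r, f1 = entity_prf(gold, pred)
```

Here one of two predictions is correct and one of three gold entities is found, so a mislabelled span counts against both precision and recall.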


Examples

Embodiment Construction

[0043] To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the objectives, features and advantages of the present invention clearer, the technical core of the present invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

[0044] The present invention provides a threat intelligence-oriented entity identification method. The idea of this method is to first use ready-made entity recognition tools to perform an initial coarse segmentation of the threat intelligence text S, generating a word list L in which each entry carries a word-form attribute word_i and a part-of-speech attribute pos_i, then perform dictionary matching and rule matching on the word se...
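
The dictionary matching and rule matching over the coarse word list can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the dictionary entries, the regular-expression rules, the label names, the part-of-speech tags, and the choice to let dictionary hits take priority over rules are all hypothetical.

```python
import re

# Hypothetical dictionary of common threat-intelligence entity words.
ENTITY_DICT = {"wannacry": "MALWARE", "apt28": "ACTOR"}

# Hypothetical rules: regular expressions for entities with a fixed shape.
ENTITY_RULES = [
    ("IP", re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")),
    ("MD5", re.compile(r"^[0-9a-fA-F]{32}$")),
    ("CVE", re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)),
]


def match_entities(word_list):
    """Attach a tentative entity label to each (word, pos) pair.

    Assumed priority: dictionary hit first, then the first matching rule;
    words that match nothing are labelled "O".
    """
    labelled = []
    for word, pos in word_list:
        label = ENTITY_DICT.get(word.lower())
        if label is None:
            for name, pattern in ENTITY_RULES:
                if pattern.match(word):
                    label = name
                    break
        labelled.append((word, pos, label or "O"))
    return labelled


# Word list L as (word_i, pos_i) pairs, as a coarse segmenter might emit.
# "利用" means "exploits" and "连接" means "connects to"; tags are made up.
L = [("WannaCry", "nx"), ("利用", "v"), ("CVE-2017-0144", "nx"),
     ("连接", "v"), ("192.168.1.10", "m")]
result = match_entities(L)
```

Fixed-shape indicators such as IP addresses and hashes suit rules, while open-ended names such as malware families suit a dictionary, which is one reason to combine the two before any model is involved.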


Abstract

The invention relates to a threat intelligence-oriented entity identification method and system. The method comprises the following steps: 1) performing coarse word segmentation on threat intelligence text serving as a training corpus; 2) constructing a dictionary library of common threat intelligence entity words and a rule library, and performing dictionary matching and rule matching on the coarse word segmentation result; 3) marking each word with an entity label based on the matching result to form a training set; 4) constructing a feature template, establishing an indicator word bank to refine the screening form of the feature template, generating context features for the training set by using the feature template, screening them, and inputting the screened features into a machine learning model for iterative parameter training; and 5) performing coarse word segmentation, dictionary matching and rule matching on the threat intelligence text to be identified, and performing entity identification by using the trained machine learning model. The method completes threat intelligence entity extraction by combining rules, a dictionary and a model, so that the entity identification precision for threat intelligence is remarkably improved.
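
Step 4 above, generating context features from a feature template and screening them with an indicator word bank, might look like the sketch below. The abstract does not specify the template, the indicator words, or the screening criterion, so everything here is an assumption: the offsets in `TEMPLATE`, the contents of `INDICATOR_WORDS`, and the rule that a context feature is kept only when its window contains an indicator word (the current-word feature is always kept).

```python
# Hypothetical feature template: each entry is a tuple of relative offsets.
# (0,) is the current word; (-1, 0) is the previous and current word together.
TEMPLATE = [(-1,), (0,), (1,), (-1, 0), (0, 1)]

# Hypothetical indicator words that tend to appear next to entities.
INDICATOR_WORDS = {"malware", "trojan", "address", "vulnerability"}


def word_at(words, i):
    """Return the word at position i, or a padding token outside the sentence."""
    return words[i] if 0 <= i < len(words) else "<PAD>"


def screened_features(words, i, template=TEMPLATE):
    """Expand the template at position i and keep only the screened features.

    Assumed screening: always keep the current-word feature; keep any other
    feature only if one of the words it covers is an indicator word.
    """
    kept = []
    for offsets in template:
        covered = [word_at(words, i + o) for o in offsets]
        has_indicator = any(w.lower() in INDICATOR_WORDS for w in covered)
        if offsets == (0,) or has_indicator:
            key = "/".join(str(o) for o in offsets)
            value = "/".join(covered)
            kept.append(f"w[{key}]={value}")
    return kept


words = ["the", "trojan", "Emotet", "spreads", "quickly"]
features = screened_features(words, 2)
```

Feature strings of this kind, paired with the entity labels from step 3, would form the training input to whichever machine learning model is used.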

Description

Technical field

[0001] The present invention proposes a threat intelligence-oriented entity identification method and system. It draws on language standards in the threat intelligence field and covers natural language processing rule extraction, dictionary extraction and machine learning methods. A total of 28 related entities can be extracted. The invention belongs to the intersection of computer science and network security.

Background technique

[0002] At present, the number of internet users in China has reached 772 million. At the same time, China is constantly suffering serious cyber attacks, and outbreaks of large-scale security incidents have severely endangered the security of cyberspace. To adapt to the rapid evolution of cyber threats, cyber security analysts in various countries are actively collecting Indicators of Compromise (IOC) (e.g. malicious information) from public sources of threat intelligence (such a...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 王璐, 姜波, 杜翔宇, 姜政伟, 卢志刚
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI