Attack surface management cybersecurity platform and methods

An AI-driven cyber security system with an ASM cloud platform and appliance continuously identifies and reports network vulnerabilities using automated penetration testing, enhancing vulnerability detection and remediation efficiency.

WO2026136829A1 · PCT designated stage · Publication Date: 2026-06-25 · DARKTRACE INC

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
DARKTRACE INC
Filing Date
2025-12-19
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Current cyber security methods rely on human penetration tests, which are not continuous and may not identify all vulnerabilities, while automated services are not effective in detecting and reporting potential vulnerabilities without damaging the network infrastructure.

Method used

An AI-based cyber security system using an ASM cloud platform and appliance for continuous automated penetration testing, employing a vulnerability scanner that selects and runs CVE test templates to identify and report vulnerabilities, with AI classifiers to verify assets and map the attack surface.

Benefits of technology

Provides continuous, automated vulnerability detection and reporting, reducing false positives and focusing remediation efforts on critical vulnerabilities, while maintaining network integrity.

✦ Generated by Eureka AI based on patent content.


Abstract

An exploit assessment and pentesting program tests assets of a network being protected by an ASM cloud platform against known CVEs. The vulnerability scanner performs pentesting by selecting CVE test templates from the database of CVE test templates and running the CVE test templates to conduct one or more common vulnerabilities and exposures tests on an asset to see whether the asset is actually vulnerable or not. The vulnerability scanner can cooperate with at least one of a user interface, a display, and a report generator to both i) report the results of the augmented pentesting on the assets of the network as well as ii) present an attack surface of the assets of the network being protected by the ASM cloud platform detected by the exploit assessment and pentesting program in order to show actual CVE risks present in the assets in the network.

Description

Attorney Docket No.: 034306-0031PCT1

ATTACK SURFACE MANAGEMENT CYBERSECURITY PLATFORM AND METHODS

NOTICE OF COPYRIGHT

[0001] A portion of this disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the material subject to copyright protection as it appears in the United States Patent & Trademark Office's patent file or records, but otherwise reserves all copyright rights whatsoever.

RELATED APPLICATION

[0002] This application claims priority under 35 USC 119 to US provisional patent application SN 63/736,488, titled ‘CYBERSECURITY COMPONENTS,’ filed on December 19, 2024, which is incorporated herein by reference in its entirety.

FIELD

[0003] Cyber security and, in an embodiment, use of Artificial Intelligence in cyber security.

BACKGROUND

[0004] A company typically hires an external cyber company to have human cyber experts perform a penetration test on its network to satisfy insurance and security needs, rather than using an automated service that performs a cyber based penetration test on a continuous basis and that identifies and reports the assets that are potentially vulnerable while being non-damaging to the network infrastructure.

SUMMARY

[0005] Methods, systems, and apparatus are disclosed for an Artificial Intelligence-based cyber security system that uses an ASM cloud platform and / or a cyber security appliance. In an embodiment, an assessment and testing module can implement an exploit assessment and pentesting program and a scheduler to trigger the exploit assessment and pentesting program to test assets of a network being protected by the ASM cloud platform against known common vulnerabilities and exposures (CVE). The vulnerability scanner of the exploit assessment and pentesting program can cooperate with a database of CVE test templates. The vulnerability scanner of the exploit assessment and pentesting program performs pentesting by selecting one or more CVE test templates from the database of CVE test templates and running the one or more CVE test templates to conduct a particular common vulnerabilities and exposures test on an asset, under analysis, in the network being protected by the ASM cloud platform to see whether the asset in the network being protected by the ASM cloud platform is actually vulnerable or not. The vulnerability scanner of the exploit assessment and pentesting program can cooperate with at least one of a user interface, a display, and a report generator to both i) report the results of the augmented pentesting on the assets of the network as well as ii) present an attack surface of the assets of the network being protected by the ASM cloud platform detected by the exploit assessment and pentesting program to show actual CVE risks present in the assets in the network.

[0006] These and other features of the design provided herein can be better understood with reference to the drawings, description, and claims, all of which form the disclosure of this patent application.

DRAWINGS

[0007] The drawings refer to some embodiments of the design provided herein in which:

[0008] Figure 1 illustrates a block diagram of an embodiment of an example exploit assessment and pentesting program that cooperates with a set of ASM classifiers in an ASM cloud platform to discover and test assets of a network.

[0009] Figure 2 illustrates a block diagram of an embodiment of the cyber security appliance cooperating with the attack surface management cloud platform to execute a CVE test template on an asset of the network that, when executed, is non-damaging to an operation of the asset being tested but does confirm a compromisable status of the asset being tested, by capturing non-damaging concrete proof of the asset’s vulnerability.

[0010] Figure 3 illustrates a block diagram of an embodiment of the exploit assessment and pentesting program in the ASM cloud platform assessing and discovering information about the assets in the network being protected by passively collecting information on what web assets and externally facing assets are being implemented in the network being protected, via a set of classifiers and web crawler tools, to map out an attack surface of that network being protected.

[0011] Figure 4 illustrates a flow diagram of an embodiment of an exploit assessment and pentesting program cooperating with other components to automatically determine and test assets of a network.

[0012] Figure 5 illustrates a block diagram of an embodiment of the AI-based cyber security appliance with example components that protects a system, including but not limited to a network / domain, from cyber threats.

[0013] Figure 6 illustrates a graph of an embodiment of an example chain of unusual behavior for, in this example, the email activities and IT network activities deviating from a normal pattern of life in connection with the rest of the system / network under analysis.

[0014] Figure 7 illustrates a block diagram of an embodiment of one or more computing devices that can be a part of the Artificial Intelligence-based cyber security system including the cyber security appliance and the ASM cloud platform discussed herein.

[0015] While the design is subject to various modifications, equivalents, and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will now be described in detail. It should be understood that the design is not limited to the particular embodiments disclosed, but - on the contrary - the intention is to cover all modifications, equivalents, and alternative forms using the specific embodiments.

DESCRIPTION

[0016] In the following description, numerous specific details are set forth, such as examples of specific data signals, named components, number of servers in a system, etc., in order to provide a thorough understanding of the present design. It will be apparent, however, to one of ordinary skill in the art that the present design can be practiced without these specific details. In other instances, well known components or methods have not been described in detail but rather in a block diagram in order to avoid unnecessarily obscuring the present design. Further, specific numeric references, such as a first server, can be made. However, the specific numeric reference should not be interpreted as a literal sequential order but rather interpreted that the first server is different than a second server. Thus, the specific details set forth are merely exemplary. Also, the features implemented in one embodiment may be implemented in another embodiment where logically possible. The specific details can be varied from and still be contemplated to be within the spirit and scope of the present design.

[0017] An exploit assessment and pentesting program tests assets of a network being protected by an ASM cloud platform against known CVEs. The vulnerability scanner performs pentesting by selecting CVE test templates from the database of CVE test templates and running the CVE test templates to conduct one or more common vulnerabilities and exposures tests on an asset to see whether the asset is actually vulnerable or not. The vulnerability scanner can cooperate with at least one of a user interface, a display, and a report generator to both i) report the results of the augmented pentesting on the assets of the network as well as ii) present an attack surface of the assets of the network being protected by the ASM cloud platform detected by the exploit assessment and pentesting program in order to show actual CVE risks present in the assets in the network.

[0018] Figure 1 illustrates a block diagram of an embodiment of an example exploit assessment and pentesting program that cooperates with a set of Attack Surface Management (ASM) classifiers in an ASM cloud platform to discover and test assets of a network. The ASM cloud platform 201 architecture can include example components such as follows.

[0019] One or more servers with an assessment and testing module 207 configured to implement an exploit assessment and pentesting program 211 and a scheduler 255 for the exploit assessment and pentesting program 211. The exploit assessment and pentesting program 211 determines what technology is on each asset in a network, such as a website, and extracts that technology, and then compares that to a database of known common vulnerabilities and exposures (CVEs) 219.

[0020] A vulnerability scanner of the exploit assessment and pentesting program 211 performs the augmented pentesting by selecting and running a CVE template from a database of CVE test templates 217 to conduct that particular common vulnerabilities test on a corresponding asset to see if each asset in a network is actually vulnerable or not. The vulnerability scanner performs the augmented pentesting for each of the possible CVEs on that asset according to the technology being implemented on that asset.

[0021] A vulnerability scanner of the exploit assessment and pentesting program 211 cooperates with at least one of a user interface 241, a display, and a report generator 243 (and in an embodiment all three of the user interface 241, the display, and the report generator 243) to both i) report the results of the augmented pentesting on the assets of the network as well as ii) present an attack surface of the assets of the network being protected by the ASM cloud platform 201 detected by the exploit assessment and pentesting program 211 in order to show actual CVE risks present in the assets in the network.

[0022] The user interface 241 can cooperate with a trained risk rating model 231 to provide an AI-driven risk contextualization and prioritization to evaluate identified assets to determine their criticality and assign a risk score to help focus remediation efforts on the most significant vulnerabilities.

[0023] A CVE search module 221 (e.g. a CVE Search Processor) monitors for newly published CVEs for each particular technology and updates a database of CVEs 219 to store known CVEs associated with each particular technology to be a single source of truth for all CVE information for the ASM service in the ASM cloud platform 201.

[0024] An assessment and testing module 207 with the exploit prediction assessment portion of the exploit assessment and pentesting program 211 cooperates with a set of ASM classifiers (e.g. an image classifier 223, a fuzzy hash classifier 225, a domain string classifier 227, and / or a HTML classifier 229). The set of ASM classifiers cooperate with web crawler tools 223 to detect and then verify what assets are definitely associated with the network being protected by the ASM cloud platform 201 to prevent randomly pentesting assets that do not belong to the network being protected. The exploit assessment and pentesting program 211 and the set of ASM classifiers form an ASM assessment and discovery tool hosted on this cloud platform 201 to cooperate to find new assets that potentially belong to a customer in a network and then test those assets to show a network’s risks.

[0025] One or more processors form a processor framework.

[0026] One or more non-transitory machine readable mediums, such as databases, datastores, memories, etc. When software instructions form part of the exploit assessment and pentesting program 211, the user interface 241, the report generator 243, the assessment and testing module 207, the trained risk rating model 231, and other components of the cloud platform 201, then those instructions are stored in an executable format in the one or more non-transitory machine readable mediums to be executed by the one or more processors.

[0027] The assessment and testing module 207 in the ASM cloud platform 201 can implement an exploit assessment and pentesting program 211 to perform an automated cyber based penetration test on a continuous regular basis set by a scheduler 255. The exploit assessment and pentesting program 211 is coded to discover assets in the network being protected, highlight the attack surface of that network and the technology that is running on its assets, and identify the assets in the network that are potentially vulnerable. The scheduler 255 in the assessment and testing module 207 triggers the exploit assessment and pentesting program 211 to test assets of a network being protected by an ASM service hosted on the ASM cloud platform 201 against known common vulnerabilities and exposures (CVE). The vulnerability scanner of the exploit assessment and pentesting program 211 cooperates with a database / data store of actual CVE test templates 217. The vulnerability scanner of the exploit assessment and pentesting program 211 performs the augmented pentesting by selecting one or more CVE test templates from the database of CVE test templates 217 and running the one or more CVE test templates to conduct each particular common vulnerabilities and exposures test on a corresponding asset, under analysis, in the network being protected by the ASM cloud platform 201 to see whether each asset in the network being protected by the ASM cloud platform 201 is actually vulnerable or not. The vulnerability scanner chooses each of the one or more CVE test templates to verify that specific CVE vulnerability on that specific asset (e.g. server) rather than attempting to validate all servers for all possible software and hardware CVEs, which helps avoid wasting processing power and causing traffic congestion within a network’s infrastructure.

[0028] Note, a common vulnerabilities and exposures (CVE) can be a standardized identifier for publicly known cybersecurity weaknesses for a specific flaw in software or hardware, allowing cyber security professionals worldwide to track, discuss, and fix issues. The flaw in code or design allows cyber attackers to exploit this flaw to do malicious acts such as gain unauthorized access, steal data, disrupt services, or install malware, which impacts confidentiality, integrity, and / or availability. Generally, vendors get around to sending a patch or update to fix the flaw in the code or hardware design.

[0029] The exploit assessment and pentesting program 211 is also configured to utilize various other tools, including a set of artificial intelligence ASM classifiers, to assist in the independent asset discovery process by doing an image classification, a fuzzy hash classification, a domain string classification, an HTML classification, and / or combining results from these classifications to detect and then verify what assets are definitely associated with the network being protected by the ASM cloud platform 201 (e.g. customer), versus randomly pentesting or looking at anybody out there. The exploit assessment and pentesting program 211 cooperates with a set of classifiers and web crawler tools 223 to verify that they map out assets that are definitely the customer's assets. The exploit assessment and pentesting program 211 and the set of ASM classifiers (e.g. image, fuzzy hash, domain string classification, etc.) cooperate to determine whether an asset, such as a website, is in a network being protected and extract the technology implemented on that asset, and then compare that to known CVEs, and then run one or more CVE test templates to conduct each particular common vulnerabilities test to determine and verify whether each asset in the network is actually vulnerable or not. The web crawler tools 223 cooperate with a set of ASM classifiers (e.g. an image classification, a fuzzy hash classification, a domain string classification, and / or an HTML classification) to find new assets that potentially belong to a customer in a network being protected by the ASM cloud platform 201 to continuously determine the attack surface of assets of the network, and then the vulnerability scanner of the exploit assessment and pentesting program 211 performs the augmented pentesting of those assets of the network to show a network’s risks.
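The combining of classification results described above can be illustrated with a minimal sketch. The disclosure does not state how the classifier outputs are combined, so the weighted average, the weights, the threshold, and all names below (`ClassifierScores`, `ownership_score`, `is_confirmed_asset`) are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifierScores:
    """Hypothetical per-asset confidence scores in [0, 1], one per ASM classifier."""
    image: float
    fuzzy_hash: float
    domain_string: float
    html: float


# Illustrative weights and threshold only; the source does not specify them.
WEIGHTS = {"image": 0.25, "fuzzy_hash": 0.25, "domain_string": 0.30, "html": 0.20}
OWNERSHIP_THRESHOLD = 0.7


def ownership_score(scores: ClassifierScores) -> float:
    """Combine the individual classifier scores into one weighted score."""
    return sum(weight * getattr(scores, name) for name, weight in WEIGHTS.items())


def is_confirmed_asset(scores: ClassifierScores) -> bool:
    """Only assets at or above the threshold would be passed on to pentesting."""
    return ownership_score(scores) >= OWNERSHIP_THRESHOLD
```

In this sketch an asset that scores below the threshold is simply never handed to the vulnerability scanner, which mirrors the stated aim of not pentesting assets that do not belong to the network being protected.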

[0030] Figure 2 illustrates a block diagram of an embodiment of the cyber security appliance cooperating with the attack surface management cloud platform to execute a CVE test template on an asset of the network that, when executed, is non-damaging to an operation of the asset being tested but does confirm a compromisable status of the asset being tested, by capturing non-damaging concrete proof of the asset’s vulnerability. The assets of the network can include a firewall, servers, load balancers, databases, and other components in the network.

[0031] The vulnerability scanner of the exploit assessment and pentesting program 211 selects and executes a CVE test template that, when executed, is non-damaging to an operation of the asset being tested (e.g. does not shut down the device, remove permissions, lock or encrypt data, or otherwise restrict the normal operation of that device) but does confirm a compromisable status of each asset being tested by capturing non-damaging concrete proof of the asset’s vulnerability (such as copying a set of several files on that device that should be unique to that device being tested, a screen shot clearly proving that the device was compromised by the CVE test template, etc.).

[0032] The ASM cloud platform 201 goes beyond just known vulnerabilities by considering all potential entry points a hacker could use into a network, including misconfigurations, exposed assets, and third-party services.

[0033] Referring back to Figure 1, the ASM cloud platform 201 uses components, such as a set of ASM classifiers and an exploit assessment and pentesting program 211, to continuously discover and assess assets in a network to stay up to date on emerging vulnerabilities and changes in the attack surface of the network being protected. The Attack Surface Management cloud platform 201 provides continuous, tailored detection of externally exposed assets to help a network being protected by the ASM cloud platform 201 identify, prevent, and remediate digital risks to the brands and assets of that network.

[0034] CVE TESTING

[0035] Next, an assessment and testing module 207 can implement an exploit assessment and pentesting program 211 and a scheduler 255 to trigger the exploit assessment and pentesting program 211 to test assets of a network being protected by an ASM cloud platform 201 against known common vulnerabilities and exposures (CVE). For each asset discovered to be in the network, the exploit assessment and pentesting program 211 is configured to look up which common vulnerabilities in a database of common vulnerabilities and exposures 219 apply to the technology implemented in the customer's network, then run templates of actual CVE tests to see if the technology implemented in the web assets and externally facing assets of the customer's network is vulnerable (e.g. run the common vulnerability test to see if that asset is truly vulnerable or not), and then print out or otherwise notify the customer of the results of this pentesting assessment program.

[0036] A vulnerability scanner of the exploit assessment and pentesting program 211 cooperates with a database of CVE test templates 217. The vulnerability scanner of the exploit assessment and pentesting program 211 performs the augmented pentesting by selecting one or more CVE test templates from the database of CVE test templates 217 and running the one or more CVE test templates to conduct each particular common vulnerabilities and exposures test on a particular asset, under analysis, in the network being protected by the ASM cloud platform 201 to see whether that particular asset in the network being protected by the ASM cloud platform 201 is actually vulnerable or not. The exploit assessment and pentesting program 211 repeats the selection of CVE test templates corresponding to the technology implemented on that asset for each of the assets in the network being protected until all of the technology has been tested.
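The selection step described above can be sketched as follows. This is a minimal illustration, assuming each template record names the technology and the versions it applies to; the record layout, the names `Technology`, `CveTemplate`, and `select_templates`, and the placeholder CVE identifiers are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technology:
    """A technology detected on an asset (hypothetical record layout)."""
    name: str
    version: str


@dataclass(frozen=True)
class CveTemplate:
    """One entry in a database of CVE test templates (hypothetical layout)."""
    cve_id: str
    technology: str
    affected_versions: frozenset
    path: str


def select_templates(asset_technologies, template_db):
    """Return only the templates that match a technology and version on the asset.

    Templates for technologies the asset does not run are never selected,
    so the asset is not tested against every possible CVE.
    """
    selected = []
    for tech in asset_technologies:
        for template in template_db:
            if (template.technology == tech.name
                    and tech.version in template.affected_versions):
                selected.append(template)
    return selected
```

Repeating `select_templates` for each confirmed asset would cover all of the technology detected in the network, one asset at a time.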

[0037] The exploit assessment and pentesting program 211 uses a vulnerability scanner to scan for vulnerabilities. The vulnerability scanner of the exploit assessment and pentesting program 211 cooperates with a database / datastore of templates of actual CVE tests 217 to see if the technology implemented in the network (e.g. web assets and externally facing assets of customer's network) is vulnerable and reports back the found vulnerabilities with evidence. The database / datastore of CVE test templates 217 can be a repository of templates available for the vulnerability scanner. This can consist of templates created by the community as well as by internal developers. CVEs can be grouped per technology implemented. For example, different versions of a CVE test are implemented for different possible software versions. For each CVE, there is an indicator to show if validation is possible. The vulnerability scanner can run CVE tests written as YAML templates. A big library of these is available from the community and is stored in the datastore of templates of actual CVE tests. The vulnerability scanner selects a particular CVE test template depending on the technology being implemented on that asset. Each CVE test template can be a written Python implementation of the exploit code for that particular technology implemented on the asset. Analysts can test CVE templates and revise / update them as needed. Thus, the templates of the exploit assessment and pentesting program 211 can be created and curated by the analyst team. The database of CVE test templates 217 can be populated with available public CVE tests and custom CVE test templates manually or autonomously made by the ASM cloud platform 201.

[0038] The ASM cloud platform 201 architecture can use a trained LLM or manual coding to make the set of CVE templates to test that CVE on an asset potentially vulnerable to that corresponding CVE. The set of CVE templates consists of two or more templates to test a particular CVE versus an asset potentially vulnerable to that corresponding CVE from the database of common vulnerabilities 219 known and associated with each particular technology.

[0039] To be able to validate more dependencies, the vulnerability scanner of the exploit assessment and pentesting program 211 can make use of a server, such as an Interactsh server. The server running the exploit assessment and pentesting program 211 can, for example: make calls to a target asset; receive call-backs on responses from a target asset; perform tasks to carry out the CVE test; and catch evidence that call-backs have taken place.

[0040] The server can create a subdomain for each CVE test. When this subdomain is called, it is known that the target asset is vulnerable. The vulnerability scanner can directly catch the results from an interactive server.
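The per-test subdomain idea can be sketched as a small bookkeeping structure. This is an illustration only and not the Interactsh API: the `CallbackRegistry` class, its methods, and the `oob.example.test` domain are hypothetical, and a real deployment would feed `record_interaction` from the callback server's own logs.

```python
import secrets

# Hypothetical callback domain controlled by the testing infrastructure.
CALLBACK_DOMAIN = "oob.example.test"


class CallbackRegistry:
    """Tracks one unique subdomain per (asset, CVE test) pair."""

    def __init__(self):
        self._pending = {}       # token -> (asset, cve_id)
        self._confirmed = set()  # (asset, cve_id) pairs that called back

    def register(self, asset: str, cve_id: str) -> str:
        """Create an unguessable subdomain to embed in the test payload."""
        token = secrets.token_hex(8)
        self._pending[token] = (asset, cve_id)
        return f"{token}.{CALLBACK_DOMAIN}"

    def record_interaction(self, hostname: str) -> bool:
        """Record a callback seen by the server; True if it matches a test."""
        hostname = hostname.lower().rstrip(".")
        token = hostname.split(".", 1)[0]
        if hostname == f"{token}.{CALLBACK_DOMAIN}" and token in self._pending:
            self._confirmed.add(self._pending[token])
            return True
        return False

    def is_vulnerable(self, asset: str, cve_id: str) -> bool:
        """An asset counts as vulnerable once its test subdomain was called."""
        return (asset, cve_id) in self._confirmed
```

Because each subdomain is unique to one test, a single observed lookup or request for it is enough to attribute the callback to a specific asset and CVE.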

[0041] In an example, based on the input message that the exploit assessment and pentesting program 211 receives about the type and version of software and hardware, the exploit assessment and pentesting program 211 will select the correct one or more CVE test templates to run. The inputs will provide: Technologies and their versions found on each asset; and CVEs that are currently added to the asset risk.

[0042] The exploit assessment and pentesting program 211 can increase the scope of vulnerabilities and misconfigurations that ASM is capable of identifying.

[0043] The vulnerability scanner may not offer Python bindings. As such, the vulnerability scanner may need to be invoked as a new process and even invoked on a separate infrastructure. After selecting the CVE test templates, the exploit assessment and pentesting program 211 may do the following: Construct the command to run the vulnerability scanner with the selected CVE test templates; Spawn a process that runs the command; Run the exploits written for these CVEs on the target assets; Obtain the results and evidence from the vulnerability scanner and the server; Remove certain information from the evidence if demanded by the template, such as PII information; Parse the results and add them on the result queue; Cooperate with the user interface 241 to display the vulnerability status and, if applicable, provide a link to evidence of compromise, such that the evidence is not generically visible in the user interface 241; Also, provide a suggested solution and a mitigation button; and Create a log of successful and failed tests, including the target asset, the test date, the specific CVE test template that was run, and, for failed tests, the cause and the point of failure. The vulnerability scanner of the exploit assessment and pentesting program 211 in the cloud platform 201 performs the augmented pentesting by selecting the one or more CVE test templates from the database of CVE test templates 217 corresponding to the technology implemented on that asset and running each of the one or more CVE test templates to conduct each particular common vulnerabilities and exposures test on a corresponding asset in the network to see whether each asset in the network is actually vulnerable or not, and then coordinates with the user interface 241 to display on the user interface 241 a date of the test and present one or more CVE vulnerabilities that the asset is actually vulnerable to.
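The first steps of this sequence (construct the command, spawn the process, collect and clean the results) can be sketched as follows. The scanner itself is not named in this passage, so the `--target` and `--template` flags, the one-JSON-object-per-line output format, and the function names are all assumptions; the e-mail pattern stands in for whatever personal information a template asks to have removed.

```python
import json
import re
import subprocess


def build_command(scanner_argv, target, template_paths):
    """Construct the scanner command line.

    The flag names are hypothetical; a real scanner defines its own CLI.
    """
    cmd = list(scanner_argv) + ["--target", target]
    for path in template_paths:
        cmd += ["--template", path]
    return cmd


# Simplified stand-in for removing personal information from evidence.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(evidence: str) -> str:
    """Replace e-mail addresses in captured evidence with a marker."""
    return EMAIL_RE.sub("[REDACTED]", evidence)


def run_scan(cmd, timeout=300):
    """Spawn the scanner as a separate process and parse its findings.

    Assumes the scanner prints one JSON object per line on stdout;
    lines that are not JSON objects are skipped.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=False)
    findings = []
    for line in proc.stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            finding = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(finding, dict):
            continue
        finding["evidence"] = redact(str(finding.get("evidence", "")))
        findings.append(finding)
    return findings
```

Passing the command as a list rather than a shell string avoids shell interpretation of target names, and the parsed findings could then be placed on a result queue for display and logging.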

[0044] When running CVE test templates, the traffic created can be of a suspicious nature. There are multiple ways that the traffic can be flagged as suspicious. While the use of the exploit assessment and pentesting program 211 is opt-in, meaning customers are aware that this traffic could be generated, there is still risk of abuse reports being sent due to the amount of generated data. As running all possible CVE test templates can put a strain on the customer infrastructure, it is important to be selective about the number of CVE test templates to run and to disperse when these CVE tests are run on each asset being tested. Accordingly, the ASM cloud platform prevents randomly pentesting assets that do not belong to the network being protected.

[0045] To prevent scanning targets not owned by the client, a request / verification step can be sent to clients to confirm which assets are or are not theirs. Currently, the exploit assessment and pentesting program 211 may, in an example, have ownership of a domain proven by an entity associated with the network being protected setting a DNS TXT record to a defined, uncommon value. A unique hash can be made with the DNS TXT records and the asset target domain or well-known URI used. By setting this DNS record and its value, ownership of the domain can be verified and applications in the domain can be scanned. The exploit assessment and pentesting program 211 may run on target assets that have a defined TXT record in their respective DNS table.
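A minimal sketch of such a check is shown below. The disclosure does not specify how the unique hash is built, so the keyed HMAC-SHA256 over the domain name, the `asm-verification=` prefix, and the function names are assumptions. The TXT records are passed in as a list so the sketch stays independent of any particular DNS resolver library.

```python
import hashlib
import hmac


def expected_txt_value(domain: str, customer_secret: str) -> str:
    """Derive the uncommon TXT value a customer would publish for a domain.

    Assumption: an HMAC-SHA256 keyed with a per-customer secret.
    """
    digest = hmac.new(customer_secret.encode("utf-8"),
                      domain.lower().encode("utf-8"),
                      hashlib.sha256).hexdigest()
    return f"asm-verification={digest}"


def owns_domain(domain: str, customer_secret: str, txt_records) -> bool:
    """Return True if any TXT record on the domain carries the expected value."""
    expected = expected_txt_value(domain, customer_secret).encode("utf-8")
    for record in txt_records:
        # TXT values are often returned wrapped in double quotes.
        candidate = record.strip().strip('"').encode("utf-8")
        if hmac.compare_digest(expected, candidate):
            return True
    return False
```

Only domains for which `owns_domain` returns True would be eligible for scanning in this sketch; unrelated TXT records on the same domain are ignored.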

[0046] The scheduler 255 for the exploit assessment and pentesting program 211 can be triggered i) manually, ii) automatically off of the newsroom feed, or iii) by something else, like a vulnerability scanning trigger, and then the CVE test template is selected for the identified technology and its known CVE. The scheduler 255 can run the exploit assessment and pentesting program 211 as follows: Run the exploit assessment and pentesting program 211 weekly on all confirmed applications; Distribute the scheduled attacks over time to avoid taking down customer's infrastructure; When new templates become available, run on all applicable confirmed applications, only for these templates; When there are updates to the technologies or CVE risks of an application, directly run the exploit assessment and pentesting program 211 on that asset; Rerun on a specific application when an end-user manually triggers it; Couple the run frequency of tests to the rate of changes on the assets and to the addition of new templates; and / or When an asset has not changed its technology being implemented, a CVE test template will not run a second time automatically, but the CVE test template can always be triggered manually by the user.
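Two of these scheduling rules, skipping unchanged assets and spreading runs over time, can be sketched as follows. The `AssetState` record, the technology fingerprint string, and the even spacing of runs across a window are assumptions chosen for illustration, not the disclosed scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AssetState:
    """Hypothetical bookkeeping for one confirmed asset."""
    asset_id: str
    tech_fingerprint: str                        # label of detected technologies
    last_tested_fingerprint: Optional[str] = None


def should_run(asset, template_id, tested_templates, manual_trigger=False):
    """Decide whether a CVE test template should run on an asset.

    Runs on a manual trigger, when the asset's technology has changed since
    the last test, or when the template has not been run on this asset yet.
    """
    if manual_trigger:
        return True
    if asset.tech_fingerprint != asset.last_tested_fingerprint:
        return True
    return template_id not in tested_templates


def spread_over_window(asset_ids, window_start, window):
    """Space test start times evenly across a window (e.g. one week)."""
    if not asset_ids:
        return {}
    step = window / len(asset_ids)
    return {asset_id: window_start + i * step
            for i, asset_id in enumerate(asset_ids)}
```

For example, seven assets spread over a seven-day window would each start one day apart, which keeps the generated traffic from landing on the customer's infrastructure all at once.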

[0047] A goal of the exploit assessment and pentesting program 211 can be to augment penetration testing by running an exploit code contained in a CVE test template against a customer network infrastructure that is known to be potentially vulnerable. The exploit code of the exploit assessment and pentesting program 211 aims at continuously and autonomously verifying whether the infrastructure is vulnerable. The exploit code itself is designed to penetrate and spread but do no lasting harm or trigger adverse effects, and thus, is harmless for the targeted systems. The exploit code is designed to extract and expose internal data as concrete proof of the system’s vulnerability. This concrete proof of the system’s vulnerability could typically be a snippet of a file retrieved during the exploit.

[0048] The results from the vulnerability scanner of the exploit assessment and pentesting program 211 conducting one or more CVE test templates can be evaluated by a trained risk rating model 231 to provide an AI-driven risk contextualization and prioritization to evaluate identified assets to determine their criticality and assign a risk score to help focus remediation efforts on the most significant vulnerabilities.

[0049] The exploit assessment and pentesting program 211 can improve the prioritization of risks and mitigation actions compared to a generic vulnerability scan, and provide a reduced rate of false positive detections.

[0050] DETERMINE WHAT TECHNOLOGY IS BEING IMPLEMENTED ON EACH ASSET

[0051] Next, the exploit assessment and pentesting program 211 can determine what technology (e.g. software including its type and version and hardware including its type and version) is being implemented on each asset (e.g. web assets and externally facing assets of the customer's network, such as a website) in the network, and then extract that technology to compare the technology implemented on each asset, under analysis, to a database of known common vulnerabilities and exposures (CVEs) 219.

[0052] The exploit assessment and pentesting program 211 is coded to use Artificial Intelligence tools, including the AI-based ASM classifiers, and other tools, such as web crawlers, to crawl across the Internet and its assets, such as a webpage, and passively collect technology information, in order to scan and identify what technology is being used, including what versions of software and so forth are being implemented on that webpage and / or other asset in a network. Next, once the exploit assessment and pentesting program 211 identifies the technology implemented in the network asset, the vulnerability scanner of the exploit assessment and pentesting program 211 can look up the common vulnerabilities and exposures known for that set of identified technology and its current set of software and hardware.

[0053] The exploit assessment and pentesting program 211 can improve the reliability of software and vulnerability detections (using a fingerprinting partial and other techniques to passively identify or validate the software version running on a machine (e.g. an asset)).

[0054] MAP OUT THE ATTACK SURFACE

[0055] Figure 3 illustrates a block diagram of an embodiment of the exploit assessment and pentesting program in the ASM cloud platform assessing and discovering information about the assets in the network being protected by passively collecting information on what web assets and externally facing assets are being implemented in the network being protected, via a set of classifiers and web crawler tools, to map out an attack surface of that network being protected. An important part of the Attack Surface Management cloud platform 201 is identifying all Internet-facing assets, like web applications, cloud services, and servers, in a network to understand the full scope of the attack surface. The ASM cloud platform 201 uses AI classifiers, including image recognition, to understand what makes an asset reflective of the network being protected.

[0056] The Attack Surface Management (ASM) cloud platform 201 utilizes the exploit assessment and pentesting program 211 to assess and discover information about the assets in the network being protected by the ASM cloud platform 201 by passively collecting information on what web assets and externally facing assets are being implemented in the network being protected, via a set of classifiers and web crawler tools 223, to map out an attack surface of that network being protected and then present the attack surface on the user interface 241 to show CVE risks present in the assets in the network, without a need for a human to supply what assets make up the attack surface of the network being protected.

[0057] Thus, the exploit assessment and pentesting program 211 can be coded to assess and discover by passively collecting information about the assets in the network being protected by the ASM cloud platform 201 with the ASM classifiers, the web crawlers, and the cyber security appliance 100. This is a continuous loop of discovery because when a user adds new assets to their network, those assets become discoverable assets. The discovery program aspect of the exploit assessment and pentesting program 211 cooperating with the user interface 241 allows users to discover and then present the full scope of their external attack surface.

[0058] The assessment and testing module 207 with an exploit prediction assessment portion of the exploit assessment and pentesting program 211 is configured to cooperate with a set of ASM classifiers that allows both the discovery of assets and the validation of the CVE risk in those assets in the network in an automated flow, instead of having humans do it, so the algorithms continuously learn from those newly confirmed assets which assets have been discovered already and which CVEs have been tested by the AI models used in the AI classifiers. In addition, the assessment and testing module 207 with an exploit prediction assessment portion of the exploit assessment and pentesting program 211 is configured to cooperate with a set of ASM classifiers to allow a user to assess the CVE risk and to verify vulnerabilities on that scope.

[0059] Referring back to Figure 1, the assessment and testing module 207 with an exploit prediction assessment portion of the exploit assessment and pentesting program 211 is configured to cooperate with a set of ASM classifiers. The set of ASM classifiers (e.g., an image classifier 223, a fuzzy hash classifier 225, a domain string classifier 227, and / or an HTML classifier 229) detects assets by scraping information on the Internet and then verifies what assets are definitely associated with the network being protected by the ASM service in the ASM cloud platform 201 (e.g. customer), to prevent randomly pentesting just anybody’s assets and / or looking at a non-relevant network; and thus, prevent randomly pentesting assets that do not belong to the network being protected. The set of ASM classifiers (e.g. image, fuzzy hash, domain string, and HTML classifiers) can be trained to relate to the customer’s / network’s images / domain / URL hash and, via classification, are used to find similar assets in a data set of billions of host names that are all scraped for data. Because the ASM classifiers work fully automated and autonomously, the ASM cloud platform 201 via these ASM classifiers can analyze those billions of host names and their data to find the relevant ones.

[0060] The set of ASM classifiers and crawler discovery tools form an automated asset discovery system, which uses a multitude of different classifications in the combined set of ASM classifiers to potentially combine their outputs to automatically discover, without human input, a series of web assets and externally facing assets that belong to a network being protected (e.g. business). The exploit prediction assessment of the exploit assessment and pentesting program 211 can then test those discovered web assets and externally facing assets, which are known to be susceptible to running software vulnerable to particular CVEs.

[0061] ASM Classifiers related to the customer images / Domain / URL hash / HTML features

[0062] The ASM cloud platform 201 uses a set of AI classifiers with image recognition and OCR technologies for tasks (e.g. an image classification, a fuzzy hash classification, a domain string classification, and / or an HTML classification) to understand what makes an asset a part of, and belong to, a network being protected.

[0063] In general, the attack surface management (ASM) components on the cloud platform 201 include the assessment and testing module 207 with an exploit prediction assessment portion of the exploit assessment and pentesting program 211 cooperating with a set of ASM classifiers to provide AI-driven continuous asset discovery, risk contextualization, prioritization, and remediation / monitoring to identify and secure all digital, physical, and human attack points. The exploit prediction assessment portion of the exploit assessment and pentesting program 211 cooperates with the set of ASM classifiers to discover assets like a network’s cloud services and endpoints, evaluate their vulnerabilities, and prioritize based on risk, and cooperates with a CVE search module 221 to continuously monitor for new CVE threats.

[0064] The exploit prediction assessment portion of the exploit assessment and pentesting program 211 cooperates with the set of ASM classifiers to provide continuous, tailored detection of externally exposed assets and potential risk. Using AI techniques in the set of ASM classifiers, the Attack Surface Management cloud platform 201 identifies Internet-facing assets unique to the network and organization being protected and cooperates with the exploit assessment and pentesting program 211 to provide a comprehensive view of the network and organization’s external attack surface in real time.

[0065] The Attack Surface Management cloud platform 201 uses AI-driven analysis from the set of ASM classifiers cooperating with the exploit assessment and pentesting program 211 to give a cyber security team comprehensive insights into the attack surface and digital risks of the network. The set of ASM classifiers of the Attack Surface Management cloud platform 201 can use a brand-centric approach, in which the exploit prediction assessment portion of the exploit assessment and pentesting program 211 cooperates with the set of ASM classifiers to identify Internet-facing assets unique to the network and organization being protected, ensuring essentially a zero-scope, zero-touch implementation. The set of ASM classifiers and web crawling tools can draw from a diverse set of information sources. The set of ASM classifiers detects potential threats beyond known servers and networks. The set of ASM classifiers cooperating with the exploit assessment and pentesting program 211 can provide continuous, tailored detection of externally exposed assets, allowing for immediate detection and response to emerging threats, and providing a dynamic and proactive approach to managing the network’s digital security.

[0066] The set of ASM classifiers can use AI-driven continuous asset discovery components to identify all known, unknown, and "shadow IT" assets like cloud services, endpoints, and APIs in the network being protected to provide a complete view of the attack surface. Unlike traditional methods that provide a static snapshot or update on a weekly / monthly cadence, the exploit assessment and pentesting program 211 and a scheduler 255 in the assessment and testing module 207 continuously monitor the digital estate of the network - identifying risks, high-impact vulnerabilities, and external threats quickly. The continual crawling tools used by the cloud platform 201 cooperate with the set of ASM classifiers to crawl the Internet in general and the network itself to ensure any new or unconfirmed assets are accounted for, and any potential new risks or vulnerabilities are detected.

[0067] The Attack Surface Management cloud platform 201 uses a range of AI techniques, including natural language processing (NLP) in the set of ASM classifiers and the report generator 243, to understand what makes an external asset belong to the network and organization being protected - searching beyond known servers, networks, and IPs, to discover more assets than the cyber security team in the organization may realize it has. Unlike other solutions, the exploit prediction assessment portion of the exploit assessment and pentesting program 211 and the set of ASM classifiers in the Attack Surface Management cloud platform 201 allow for manual input from the cyber security team but require no technical input from the cyber security team, for example, no IP ranges and no other parameters - the brand name of the network and organization being protected is all that is needed. Drawing from a wide array of information sources, the set of ASM classifiers cooperating with the crawling tools in the Attack Surface Management cloud platform 201 can uncover assets that either have a technical link with the network’s core infrastructure or can be associated with the network’s brand based on publicly available information - ensuring nothing gets missed.

[0068] The crawling tools of the cloud platform 201 monitor the network’s digital assets, such as Web applications, APIs, cloud resources, servers, operating systems, and databases, as well as physical assets, such as endpoint devices like laptops, workstations, and servers, and the physical infrastructure connecting them.

[0069] The exploit prediction assessment portion of the exploit assessment and pentesting program 211 is configured to cooperate with a set of ASM classifiers to speed up discovery, assessment, and remediation processes, enabling a faster and more accurate response to CVE threats. Again, the results from multiple classifiers in the set can be analysed to confirm whether an asset belongs to, or does not belong to, the network being protected.

[0070] Image classification

[0071] Why

[0072] The image classifier 223 can use image recognition technology as well as Optical Character Recognition. One of the unique features of a brand can be a logo. This logo is usually a very distinct visual by which the brand can be recognized through image classification. For classifying whether webpages belong to / try to impersonate / can be relevant to a brand, there is therefore a lot of value in finding brand logos.

[0073] The image classifier 223 in the set of ASM classifiers uses image recognition technology that detects image similarity based on keypoints found in compared images, scaled and rotated all in three dimensions, to do an image assessment on visual similarities between the keypoints of an image under analysis compared to the keypoints of images known to belong to the network being protected.

[0074] How

[0075] The Keypoint Image Classification Model can use various image classification techniques, filters, and other algorithms to compare images, scaled and rotated all in three dimensions, to do an image assessment on visual similarities between the keypoints of an image under analysis compared to the keypoints of images known to belong to the network being protected. The image classifier 223 can do an image assessment to look at similarities between the objects (e.g., keypoints) of the image, under analysis, compared to the objects of the images known to belong to the network being protected. The image classifier 223 can use an AI trained model. In an embodiment, the image classifier 223 is trained on image logos known to belong to the network being protected. The cloud platform 201 can also have a database of image logos 215 known to belong to the network being protected, so the image classifier 223 can see the similarities, compare them to the current object under analysis, and determine how distinctive they are. The cloud platform 201 uses a tool to scrape as many images as possible off the Internet to get all the images that the cloud platform 201 needs to find and build a database with images 215 known to be associated with the network being protected, images already compared and found not to be associated with the network being protected, and random images. The image classifier 223 analyzes in all three dimensions to look at how many points line up when the system rotates the image, under analysis, when determining the best match to the angle of known images to make the similarity determination.

[0076] In an embodiment, the image classifier 223 can use image recognition technology (such as a trained Keypoint Image Classification Model that detects image similarity based on identical keypoints found in both images) to do an image assessment to look at, at least, visual similarities (and possibly OCR similarities), scaled and rotated all in three dimensions, between the objects of the image under analysis compared to the objects of the images known to belong to the network being protected. A homography transformation can be used to find outliers that are removed / filtered out from analysis.

[0077] The image classifier 223 can i) use the database on brand-specific and / or other objects of the images known to belong to the network being protected, ii) be trained on those brand-specific and / or other objects of the images known to belong to the network being protected, and iii) any combination of both. In an example, the Keypoint Image Classification process can utilize some of the points found in the research paper: Distinctive Image Features from Scale-Invariant Keypoints (2004, D. Lowe, University of British Columbia). The Keypoint Image Classification process extracts distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive in the sense that a single feature can be correctly matched with high probability against a large database of features from many images 215. The Keypoint Image Classification process uses these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects 215 to identify clusters belonging to a single object, and finally performing verification using, for example, an algorithm built around the SIFT (Scale-Invariant Feature Transform) algorithm: the Keypoint Image Classification Model with an OCR feature. This approach to image recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance. The SIFT algorithm is merely used to return a list of keypoints found in an image, together with data describing each keypoint, such as a vector describing some characteristics of the keypoint and the angle and size of the keypoint, so that bad keypoint matches can be filtered out.

[0078] The approach suggested here is to determine 'keypoints' (unique points that define the image) in both images and compare the similarity distance between all keypoints in both images, but with improvements to the simple comparison. If a sufficiently large number of keypoint matches (= similar enough) is found, equal to or above a set threshold, then the approach looks for similar keypoints in image1 that can be found in image2.

[0079] An additional algorithm can be built around the SIFT algorithm to improve the Keypoint Image Classification Model. The Keypoint Image Classification Model has proven to rapidly automatically perform image detection with an extremely high accuracy.

[0080] Homography

[0081] After filtering on angle and size, a homography transformation can be used to find outliers. This function will try to map the keypoints of the predicted image to the keypoints of the input image. If a majority of those keypoints follow the same transformation, the keypoints that do not are labelled as outliers. The outliers are removed.

[0082] Filter out double matches

[0083] It is possible that two or more keypoints point to the same keypoint in the predicted image. It can also happen that one keypoint points to two or more keypoints in the predicted image. This can be because there is a repeating image in one of the images (this can be a brand logo that appears multiple times in the same image). The model merely wants to keep the relation of 1 keypoint to 1 keypoint. If there is a many to 1 relation, the model may merely keep the relation with the lowest KNN distance.
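
As a non-limiting illustration, the double-match filter can be sketched in Python. The function name keep_one_to_one and the tuple layout (query keypoint index, predicted-image keypoint index, KNN distance) are assumptions made for this sketch only. It applies the lowest-distance rule in both directions using two greedy passes, which is a simple stand-in and not necessarily an optimal one-to-one assignment.

```python
def keep_one_to_one(matches):
    """Reduce keypoint matches to a 1-to-1 relation, keeping the lowest distance.

    matches: list of (query_idx, target_idx, distance) tuples.
    """
    # Pass 1: for every target keypoint keep only its closest query keypoint.
    best_by_target = {}
    for q, t, d in matches:
        if t not in best_by_target or d < best_by_target[t][2]:
            best_by_target[t] = (q, t, d)

    # Pass 2: for every query keypoint keep only its closest remaining target.
    best_by_query = {}
    for q, t, d in best_by_target.values():
        if q not in best_by_query or d < best_by_query[q][2]:
            best_by_query[q] = (q, t, d)

    return sorted(best_by_query.values())
```

For example, when two query keypoints both point to the same keypoint in the predicted image, only the pair with the smaller distance survives the first pass.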

[0084] What

[0085] The keypoint model requires only a set of brand logos as input. Note, a more diverse set of brand logos increases the chance of detecting a diverse set of logos. An individual loop per brand logo (given each logo needs to be matched against the prediction image) will proceed as follows:

[0086] For performance reasons, each of the brand logos is pixelated down to various sizes, and the original logo image sizes are removed. When the highest resolution version of a logo does not find enough keypoint matches in the predicted image, the lower resolution versions of this logo will not be matched against the predicted image.
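
As a non-limiting illustration, the per-logo loop with its early exit can be sketched in Python. The function match_logo, the callable count_matches, and the threshold min_matches are hypothetical. The sketch assumes that, when the highest resolution version passes the threshold, the best match count over all resolution versions is kept; the description above does not fix that choice.

```python
def match_logo(versions, predicted_image, count_matches, min_matches=10):
    """Match one brand logo, given as versions ordered from highest to lowest resolution.

    count_matches(version, predicted_image) is a caller-supplied keypoint matcher
    that returns the number of keypoint matches found.
    """
    best = 0
    for index, version in enumerate(versions):
        found = count_matches(version, predicted_image)
        if index == 0 and found < min_matches:
            return 0          # highest resolution failed, so skip the smaller versions
        best = max(best, found)
    return best
```

The early return is what saves time: most predicted images do not contain a given logo, so only one matching pass per logo is spent on them.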

[0087] A drawback of the keypoint image classifier 223 is that the keypoint image classifier 223 looks for visual similarities in images. That means that the keypoint image classifier 223 is also very good at finding matching words in a similar font. For detecting brand logos, this can be challenging when brand logos consist mainly of a word in a particular font. For this reason, the keypoint classification model is extended with an OCR feature that can recognize whether text is available in the brand logo and take this into consideration when coming up with the final classification score.

[0088] Domain string classifier

[0089] The domain string classifier 227 in the set of ASM classifiers is configured to use Optical Character Recognition and to cooperate with the assessment and testing module 207, which uses at least a web crawler tool 223 to collect a plurality of (e.g. every) domain strings (including host names, potentially all the way to the Fully Qualified Domain Name, etc.) being used on the Internet (e.g. in the whole world), in order to analyze at least a domain string, under analysis, and an association / relevance of the domain string, under analysis, to the network being protected by the ASM cloud platform 201. The domain string classifier 227 uses a human language-agnostic process to break up the words / parts in the domain string, under analysis, and then interpret and classify the words / parts for relevance to a specific network being protected by the ASM cloud platform 201. After splitting the domain string, under analysis, (e.g., the URL string) into its parts, the algorithm in the domain string classifier 227 checks the separated words in the different parts and tries to determine the one with the highest score for customer relevancy and key words in light of the likely human language being utilized.

[0090] A “domain string” on the Internet can refer to, for example, a domain name, which is the human-readable text (e.g., google.com) that identifies, for example, a website, acting as an easy-to-remember alias for complex numerical IP addresses (like 142.250.190.78), with the Domain Name System (DNS) translating these names into IP addresses so users can find resources without memorizing numbers. These domain strings are structured hierarchically (e.g., www.example.com), with the rightmost part / suffix after the last dot being the top-level domain (e.g., .com, .org, .net, .gov, etc.) and the part to its left being the second-level domain (the unique name, like “example”, that you register in “example.com”). Note, subdomains (third-level domains) can be included as optional prefixes that create sub-sections (e.g., “blog.example.com”). A domain string can include the core identifier (e.g., “example.com”). The hostname portion can be the specific name for a machine (e.g., www, en, mail, etc.). The FQDN (Fully Qualified Domain Name) or URL (Uniform Resource Locator) can be the full address, which includes all of the parts: the protocol, domain, and specific page path (e.g., “https://www.example.com/path/to/page”), which uniquely identifies a host. The domain strings analyzed by the domain string classifier 227 can include host names and, in some examples, the Fully Qualified Domain Name.
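
As a non-limiting illustration, the hierarchical parts of a domain string can be separated with the Python standard library. The function name domain_parts is hypothetical, and the sketch naively treats the last label as the top-level domain; multi-label public suffixes such as "co.uk" would need a public suffix list, which is outside this sketch.

```python
from urllib.parse import urlsplit


def domain_parts(value):
    """Split a host name or URL into top-level, second-level, and subdomain labels."""
    # urlsplit only recognises a host when the string carries a "//" authority marker.
    host = urlsplit(value if "//" in value else "//" + value).hostname or ""
    labels = host.split(".")
    return {
        "fqdn": host,
        "top_level": labels[-1] if labels else "",
        "second_level": labels[-2] if len(labels) > 1 else "",
        "subdomains": labels[:-2],
    }
```

Calling domain_parts on a full URL discards the protocol and page path, which matches the embodiment in which only the host name is analyzed.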

[0091] The domain string classifier 227 can cooperate with a database / training storing brand-specific keywords, and a word / part 215 that is associated with a customer and or the industry they operate in.

[0092] Domain string classification

[0093] What:

[0094] The domain string classifier 227 uses a domain string classification algorithm to see if a newly discovered domain string, for example, a URL, might potentially be relevant to a customer. The way the domain string classification works is by splitting the domain string being evaluated into separate words / parts. In an example, www.thebestwebsiteintheworld.org could be split up into www, the, best, website, in, the, world, .org. Thus, when deployed in operation, the domain string classifier 227 divvies up the domain string, under analysis. Again, the domain string, under analysis, may be, for example, a hostname, but it can be a little bit longer, such as a domain name, an entire URL, etc. After splitting, the classifier checks the separated words for their customer relevance. A relevant word can be a keyword, or a word that is associated with a customer. The challenge with a URL is that it consists of a sequence of concatenated words, for example, https://www.darktrace.com/proactive-exposure-management. In an embodiment, the domain string classifier 227 just looks at the host name. The domain string classifier 227 looks at how many host names are relevant to the customer. Though rather simple for the human eye to interpret, the meaning of this concatenated word sequence is challenging for a computer program to interpret, especially given the volume of URLs and the multitude of different human languages present on the Internet. The domain string classifier 227 uses a human language-agnostic process to break up the words in the domain string, under analysis, (e.g. URL) and then interprets and classifies the parts / words for relevance to a specific network being protected by the ASM cloud platform 201 (e.g., customer). Next, once the classifier divvies that domain string up, the classifier tries to identify keywords / phrases and see how the identified keywords / phrases compare to keywords and phrases associated with the customer who operates the network being protected. The challenge with breaking up a concatenated sequence of words is that the algorithm first tries to determine the specific human language from which the words derive. The challenge with determining the human language in which a piece of text is written is that it really helps to have a set of separate words, which a URL does not have. The way the domain string classification works is by splitting the string, under analysis, into separate words in multiple possible human languages, scoring each possibility, and going with the most likely language / word combination. After splitting the domain string, under analysis, (e.g. the URL string), the algorithm in the domain string classifier 227 checks the separated words for the one with the highest score for customer relevancy and key words considering the likely human language being utilized. Thus, the domain string classifier also tries to determine the different human languages that the identified keywords / phrases from the parts of the domain string could possibly be associated with and then comes up with a highest score for the identified keywords / phrases plus (+) a most likely specific human spoken language. Next, a relevance determination can be made by a comparison to a database of brand-specific keywords, or other words / phrases 215, associated with a customer and / or the industry they operate in.

[0095] As discussed, the domain string classifier 227 splits the string into multiple words, and after that, selects a logical next step for each of the splits. The domain string classifier requires a string search algorithm as input. The domain string classifier 227 splits a string into separate words. The domain string classifier 227 does this using the string search algorithm and a recursive function to find (all) combinations of words that combined form the input string. To do so, a system is compiled using the dictionary of words of interest. This dictionary may be a unigram, a dictionary for several languages, a list, or some other equivalent format. The system is compiled when the classifier is initiated. The domain string classifier 227 can use NLP to identify words that occur in the input string. The domain string classifier 227 can also use the recursive split function. This recursive split function can be used to do the word split. This recursive split function makes different combinations of words that combined are the input string. The recursive split function internally tracks how many combinations are made, and if this exceeds a certain value, stops. In an embodiment, when the maximum depth is reached, the possibility is very high that the best combination is already found. This limits the time the model can spend on a single domain string under analysis.
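
As a non-limiting illustration, the recursive split function with its combination limit can be sketched in Python. The name split_words and the parameter max_combinations are hypothetical, and a plain set lookup stands in for the string search algorithm and compiled dictionary described above.

```python
def split_words(text, vocabulary, max_combinations=1000):
    """Return ways to write text as a sequence of vocabulary words.

    The search stops once max_combinations complete combinations were produced,
    which bounds the time spent on a single domain string.
    """
    results = []
    remaining = [max_combinations]        # mutable counter shared by the recursion

    def recurse(rest, words):
        if remaining[0] <= 0:
            return                        # budget exhausted, stop exploring
        if not rest:
            results.append(words)         # the whole input string has been consumed
            remaining[0] -= 1
            return
        # Try the longest candidate word first, then shorter ones.
        for end in range(len(rest), 0, -1):
            head = rest[:end]
            if head in vocabulary:
                recurse(rest[end:], words + [head])

    recurse(text, [])
    return results
```

Because longer words are tried first, the earliest combinations tend to use fewer, longer words, which is why stopping early usually still leaves a good candidate in the result list.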

[0096] After the recursive split function completes its operations, multiple word combinations are possible, but only one is correct. Identifying the right combination is done using the score function. Each valid combination will get a score; the combination with the highest score is the best combination of words. The score is calculated using features of the combinations, including but not limited to: the word score of each individual word in the combination; a penalty per word if this word has only 1 character; the sum of all individual word scores; and a penalty depending on the number of words in the combination, where more words give a higher penalty.
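
As a non-limiting illustration, a score function using the listed features can be sketched in Python. The function names and the penalty weights single_char_penalty and per_word_penalty are assumptions chosen only for this sketch; the actual weights are not given in the description.

```python
def score_combination(words, word_scores, single_char_penalty=5.0, per_word_penalty=1.0):
    """Score one word combination; a higher score means a more plausible split."""
    total = sum(word_scores.get(word, 0.0) for word in words)    # sum of word scores
    total -= single_char_penalty * sum(1 for word in words if len(word) == 1)
    total -= per_word_penalty * len(words)                        # more words, more penalty
    return total


def best_combination(combinations, word_scores):
    """Pick the combination with the highest score."""
    return max(combinations, key=lambda words: score_combination(words, word_scores))
```

With equal summed word scores, the per-word penalty makes a split into fewer words win, so "website" is preferred over "web" followed by "site".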

[0097] It is also necessary to determine the best human language when the language of the input is unknown. First, the process needs to determine which human language fits this input the best. This can be done by doing word splitting with a very low depth for every language. The human language with the highest score will be used to do the real split. Processes described above are repeated for every language that the model runs on.
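
As a non-limiting illustration, the language selection step can be sketched in Python. The names quick_score and pick_language are hypothetical, and a single greedy longest-match pass is used here as a cheap stand-in for the very low depth word splitting described above; it scores each language by the fraction of characters covered by known words.

```python
def quick_score(text, vocabulary):
    """Greedy longest-match pass: fraction of characters covered by known words."""
    covered, i = 0, 0
    while i < len(text):
        for end in range(len(text), i, -1):
            if text[i:end] in vocabulary:
                covered += end - i
                i = end
                break
        else:
            i += 1                         # no word starts here, skip one character
    return covered / max(len(text), 1)


def pick_language(text, vocabularies):
    """Return the language whose vocabulary explains the input best."""
    return max(vocabularies, key=lambda lang: quick_score(text, vocabularies[lang]))
```

The language returned by pick_language would then be the one used for the full, deeper split.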

[0098] Keep relevant words

[0099] The domain string classifier 227 may also remove parts of the string that are not relevant to classify the string. If a part (string is split between hyphens and dots) does not contain the keyword or any other word that can be relevant (e.g., brand relevant words, industry keywords) it is removed.
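
As a non-limiting illustration, the removal of non-relevant parts can be sketched in Python. The function name keep_relevant_parts and the example keywords are hypothetical; the sketch splits on hyphens and dots as described and keeps a part when it contains any relevant word as a substring.

```python
import re


def keep_relevant_parts(domain_string, relevant_words):
    """Split on dots and hyphens and keep only parts containing a relevant word."""
    parts = [part for part in re.split(r"[.-]", domain_string.lower()) if part]
    return [part for part in parts if any(word in part for word in relevant_words)]
```

Only the surviving parts would then be passed to the word splitter, which shortens the strings it has to process.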

[0100] To make the model more efficient, the domain string classifier will look at the parts that contain a relevant word for the model. The domain string classifier 227 may also include a speed parameter to determine how many word combinations the splitter will try. This prevents the splitter from becoming 'stuck' on very long string inputs.

[0101] The domain string classifier 227 will find (in general) the longest possible word that fits. For some words, this may introduce problems with splitting; for example, when a keyword is also part of a longer word, which may be more frequent in specific languages (e.g. Dutch and German). The classifier may therefore contain a self-correction mechanism to prevent these cases.

[0102] The domain string classifier 227 may also contain a mechanism to check the format of input for invalid, non-standard, or malicious FQDN structure (for example, Punycode).
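
As a non-limiting illustration, a basic structure check with Punycode decoding can be sketched in Python using the standard "punycode" codec. The function name inspect_labels is hypothetical, and the length limits used (63 characters per label, 253 per name) are the usual DNS limits, applied here as an assumption about what counts as a valid structure.

```python
def inspect_labels(fqdn):
    """Return (ok, decoded_labels); ok is False for an invalid label structure."""
    name = fqdn.rstrip(".")
    decoded = []
    for label in name.split("."):
        if not label or len(label) > 63:
            return False, []                      # empty or over-long label
        if label.lower().startswith("xn--"):
            try:
                # Punycode labels carry Unicode text behind the "xn--" prefix.
                decoded.append(label[4:].encode("ascii").decode("punycode"))
            except UnicodeError:
                return False, []                  # malformed Punycode
        else:
            decoded.append(label)
    return len(name) <= 253, decoded
```

The decoded labels can then be compared against brand keywords, so that a look-alike name written with non-ASCII characters is not missed.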

[0103] The domain string classifier 227 may also contain a mechanism to confirm if a keyword exists in a subdomain, domain or top-level domain, then check if a relevant word is a separate word in the string split.

[0104] HTML Model / Classifier

[0105] The HTML classifier 229 in the set of ASM classifiers compares HTML feature datapoints linked to hostnames that belong to the network being protected by the ASM cloud platform 201 to a neutral dataset. Note, a set of HTML feature datapoints not unique to the network being protected by the ASM cloud platform 201 will likely occur at a high frequency in the neutral dataset, while another set of HTML feature datapoints that are unique to the network being protected by the ASM cloud platform 201 occurs at a frequency lower than the high frequency in the neutral dataset and mainly occurs on hostnames associated with the network being protected by the ASM cloud platform 201.

[0106] The HTML classifier 229 works with web crawling tools of the cloud platform 201 to scrape (e.g. the whole Internet), and once the cloud platform 201 has scraped all of the Internet, the HTML classifier 229 runs its logic looking at the frequencies.

[0107] The HTML classifier 229 compares HTML feature datapoints linked to hostnames that belong to a customer to a neutral dataset of HTML feature datapoints that occur at a high frequency throughout the set of hostnames found on the Internet. The score can be calculated by looking at the ratio of neutral occurrences vs occurrences known to occur in the network being protected by the ASM cloud platform 201, filtered with a list of HTML feature data that is confirmed not to be associated with the network being protected.

[0108] The HTML Machine Learning classifier

[0109] The HTML datapoints linked to hostnames that belong to a network being protected can be anything that could be unique for a customer. The HTML datapoints linked to hostnames throughout the Internet form a very big dataset. The comparison assumes that anything that is not unique for a customer will occur many times in the HTML neutral dataset and anything that is unique for the customer occurs only / mainly on the customer related hostnames. Example: facebook.com / darktrace should only be found on hostnames that are related to the network being protected - darktrace, so HTML features found with that hostname should not be found that often in the neutral dataset, whereas facebook.com will likely occur many times in the neutral set.

[0110] The model can be retrained, for example, every day on the most recent neutral dataset and the most recent customer data. Retraining this Machine Learning model means calculating a uniqueness score for each datapoint found for the customer-related hostnames.

[0111] In an example, facebook.com / darktrace occurs 10 times in HTML data known to belong to the customer and occurs 11 times in the entire neutral set. The resulting score would be a very high score. In the example, facebook.com / wordpress occurs a single time in the HTML data known to belong to the customer but occurs 1000s of times in the neutral set. The resulting score would be a very low score, such as 0. The score can be calculated by looking at the ratio of neutral occurrences vs customer occurrences, applying some score penalties when neutral occurrence is too high and some feature dependent logic.

[0112] Note, the ratio gives a score between 0 and 100, where 100 means it is very likely to be unique for a customer. All the scores will be combined in the total score model that is separate from this model.
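
The ratio-based scoring in the example above can be illustrated with a short sketch. The exact formula, the penalty rule, and the feature-dependent logic are not given in the text, so everything below is an assumption: the score is taken as the customer share of the neutral occurrences scaled to 0 to 100, and a hypothetical `neutral_penalty_threshold` forces the score to 0 when a datapoint is very common in the neutral dataset.

```python
def uniqueness_score(customer_count, neutral_count, neutral_penalty_threshold=1000):
    """Return a 0-100 score; 100 means very likely unique to the customer.

    Assumptions (not from the source): the neutral dataset also contains the
    customer's hostnames, and the penalty is a hard cut-off at the threshold.
    """
    if customer_count <= 0 or neutral_count <= 0:
        return 0.0
    if neutral_count >= neutral_penalty_threshold:
        # Penalty: datapoint is too common on the wider Internet.
        return 0.0
    return min(100.0, 100.0 * customer_count / neutral_count)


# facebook.com / darktrace: 10 customer occurrences, 11 neutral occurrences.
high = uniqueness_score(10, 11)
# facebook.com / wordpress: 1 customer occurrence, thousands of neutral ones.
low = uniqueness_score(1, 5000)
```

Under these assumptions the first datapoint scores close to 91 and the second scores 0, which matches the very high and very low outcomes described in the example.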

[0113] HTML Features

[0114] The goal of this part of the HTML model is to find a piece of the HTML that is only relevant for this customer. When this string is found in other HTML content, there is no doubt it will be relevant for this customer. The HTML classifier 229 can use validation of the input data to increase the accuracy during training. If the customer confirms something that is or is not theirs, the HTML model will match and add hostnames, as well as filter / note hostnames that do not belong to the customer.

[0115] The HTML model may also analyse links, unique words, and email addresses identified within the HTML.

[0116] Fuzzy hash classification

[0117] The fuzzy hash classifier 225 is trained to predict from HTML content how similar it is to another HTML page. The fuzzy hash classifier 225 provides another classifier that can produce a relative scoring on similarity between owned ('Confirmed') domains and discovered Fully Qualified Domain Names (FQDNs) (e.g. the hostname, domain, and top-level domain, uniquely identifying its location in the DNS hierarchy). The fuzzy hash classifier 225 in the set of ASM classifiers predicts, from various features of HTML content, how similar an HTML page, under analysis, is to confirmed features of each of the HTML pages of the network being protected by the ASM cloud platform 201. The fuzzy hash classifier 225 compares a fuzzy hash created on the various features of the HTML content of the HTML page, under analysis, to another fuzzy hash on the confirmed features of the one or more HTML pages of the network being protected by the ASM cloud platform 201, and then determines whether the fuzzy hashes compared meet or exceed a threshold amount of similarity.

[0118] How

[0119] The fuzzy hash classifier 225 works with the web crawler tools 223 in the cloud platform 201 to scrape, again, the whole Internet and then do the fuzzy matching. The fuzzy matching algorithm produces scores and points per feature and then totals them.

[0120] The HTML content is cut up into features and Fuzzy hashes are created from these features. The fuzzy hashes are used to compare files and identify similarities, even if the files have been slightly modified, which is useful in digital forensics and malware analysis. The fuzzy hash classifier 225 can create fuzzy hashes of any string and compare another fuzzy hash for the degree of similarity.
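
The idea of hashing a feature string and comparing two hashes for their degree of similarity can be sketched as below. The text does not name a fuzzy hashing algorithm, so this sketch swaps in a SimHash-style fingerprint over character shingles as a stand-in; the shingle size, the 64-bit width, the function names, and the 85 percent threshold are all assumptions made for illustration.

```python
import hashlib


def simhash(text, shingle_size=4, bits=64):
    """SimHash-style fingerprint: similar strings give similar bit patterns."""
    text = " ".join(text.lower().split())
    count = max(1, len(text) - shingle_size + 1)
    shingles = {text[i:i + shingle_size] for i in range(count)}
    weights = [0] * bits
    for shingle in shingles:
        digest = hashlib.sha256(shingle.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for bit in range(bits):
            # Each shingle votes +1 or -1 on every bit position.
            weights[bit] += 1 if (value >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(bits):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def similarity(hash_a, hash_b, bits=64):
    """Percentage of matching bits between two fingerprints (0 to 100)."""
    differing = bin(hash_a ^ hash_b).count("1")
    return 100.0 * (bits - differing) / bits


def is_similar(feature_a, feature_b, threshold=85.0):
    """Hypothetical threshold check on two HTML feature strings."""
    return similarity(simhash(feature_a), simhash(feature_b)) >= threshold
```

A slightly edited copy of a confirmed page feature keeps most of its shingles and therefore most of its fingerprint bits, while an unrelated page shares few shingles and lands near chance level.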

[0121] Training for the AI classifiers

[0122] In an embodiment, the image model (e.g. the keypoint model) takes one or more images of a customer at different viewing angles and compares them to another image under analysis. The image model takes one or more images of a customer and trains on a training set of images that the process knows belong to a customer. The image model can be trained with a one-shot learning process together with a continuous, periodic update training process. Training to fine-tune the algorithm(s) inside the image model can occur on examples of images being correctly recognized as well as examples of images being wrongly recognized. The image model can be trained with all unlabeled data and then use supervised training to reinforce correct and wrong determinations.

[0123] The domain string classifier 227 can be trained to break up a URL string into different word / phrase segments and then compare each segment to different human spoken languages and keywords. The domain classifier can be trained on recognizing and understanding parts of a domain string in different human languages. The domain string classifier 227 can start off as a base trained language model on different human languages and then be fine-tuned in its weights and coefficients with algorithm(s) inside the model to train on examples of words in domain strings being correctly recognized as well as examples of words in domain strings being wrongly recognized. The domain string classifier 227 can be trained with all unlabeled data and then use supervised training to reinforce correct and wrong determinations.

[0124] The fuzzy hash classifier 225 can be trained to predict from HTML content how similar it is to another HTML page. The fuzzy hash classifier 225 can be trained with a set of labeled data on Domain and URL features as well as those turned into hashes and the classifier learns space on the data. All the HTML features on these HTML page combinations are hashed and then compared using fuzzy hash comparisons to end up with a set of fuzzy hash compare feature scores per combination.

[0125] These HTML features can be used as input data for the fuzzy hash classifier 225 to compare in its machine learning model.

[0126] The HTML model for the HTML classifier 229 can be trained by fine-tuning the algorithm(s) to train on examples of HTML being correctly recognized as well as examples of HTML being wrongly recognized. The HTML classifier 229 can be trained with all unlabeled data and then use supervised training to reinforce correct and wrong determinations.

[0127] CVE search module and Newsroom

[0128] The CVE search module 221 (e.g. a CVE Search Processor) can monitor for newly published CVEs for each particular technology and update a database of CVEs 219 to store known CVEs associated with each particular technology to be a single source of truth for all CVE information for the ASM cloud platform 201. The CVE search module 221 cooperates with the database of common vulnerabilities 219 and the exploit assessment and pentesting program 211 to provide an up to date source of CVEs for the technology being implemented on the assets. The CVE search module 221 regularly and routinely adds known CVEs to possible technology implemented by an asset. As the CVEs available on an asset are an input for the exploit assessment and pentesting program 211, there is a dependency. In case the CVE search module 221 finds a new CVE risk, the CVE search module 221 can enrich the risk with CVE information. To support the exploit assessment and pentesting program 211, the CVE search module 221 provides subsequent extended CVE lookups and version ranges to bring additional information about the CVEs to be available for the manual creation and / or LLM creation of CVE test templates.

[0129] The CVE search module 221 can use information from the cyber security appliance 100 (e.g. Detect) and the cyber threat analyst module (e.g. AI Analyst) as a way to input and obtain the CVEs. The CVE search module 221 also monitors the news and Internet sources, such as the Newsroom feed and other CVE data sources, to obtain the CVEs.
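
A lookup of known CVEs by technology and version range, as described above, might look like the following sketch. The record layout, the `candidate_cves` function, the dotted-integer version format, the technology name `examplecms`, and the CVE identifiers are all hypothetical placeholders; the actual schema of the database of CVEs 219 is not described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CveRecord:
    cve_id: str            # placeholder identifier, not a real CVE
    technology: str        # technology name as detected on the asset
    first_affected: tuple  # inclusive lower bound of the version range
    last_affected: tuple   # inclusive upper bound of the version range


def parse_version(text):
    """Turn '2.4.1' into (2, 4, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def candidate_cves(technology, version, records):
    """Return CVE ids whose version range covers the detected version."""
    detected = parse_version(version)
    return [
        record.cve_id
        for record in records
        if record.technology == technology
        and record.first_affected <= detected <= record.last_affected
    ]


RECORDS = [
    CveRecord("CVE-0000-0001", "examplecms", (2, 0), (2, 4, 9)),
    CveRecord("CVE-0000-0002", "examplecms", (3, 0), (3, 1)),
]
```

The returned list is only a set of candidates; whether the asset is actually vulnerable is left to the CVE test templates run by the vulnerability scanner.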

[0130] Next, the exploit assessment and pentesting program 211 can be scheduled / triggered by a scheduler 255 on an automated basis, as well as possibly triggered by one or more manual triggers, or alternative inputs, to identify whether a particular web asset and / or externally facing asset happens to be vulnerable, and the ASM cloud platform 201 can run on a regular schedule to ensure that the exploit assessment and pentesting program 211 captures newly emerging vulnerabilities as well as be triggered off a routine schedule when the CVE search module 221 discovers a new CVE vulnerability is published / detected.

[0131] The CVE search module 221 provides the continuous monitoring of CVE risks from multiple sources to ensure potential risks and high-impact vulnerabilities relative to the network and organization being protected are discovered and then cooperates with the scheduler 255 of the exploit assessment and pentesting program 211 to trigger the program as well as the vulnerability scanner to create new CVE test templates, as well as to eliminate gaps and blind spots in the attack surface of the network, while cooperating with the user interface 241 to present risk scoring and vulnerability mapping that allow a cyber security team to prioritize mitigating risk on exposed critical assets.

[0132] The CVE search module 221 provides a security threat intelligence component to provide information on emerging threats and vulnerabilities to proactively inform ASM efforts. For example, the CVE search module 221 monitors the Newsroom threat feeds and information sources to deliver immediate threat context and instant updates on high-impact vulnerabilities. For the user interface 241, the exploit assessment and pentesting program 211 can link to Newsroom posts. When a Newsroom article is being published for technology such as software ‘X’, then the vulnerability scanner can directly create the related CVE test template to be able to expose on very short notice whether there are applications on assets being protected relevant to the Newsroom post that are actually vulnerable. The user interface 241 also sends a notice of the new CVE to the user of the network being protected.

[0133] Thus, the Attack Surface Management service on the cloud platform 201 can come with Newsroom to help a user quickly understand the potential impact of new vulnerabilities. Newsroom continuously monitors open-source intelligence sources for new vulnerabilities, such as CVEs, including misconfigurations, assesses your organization’s exposure, and reveals all assets on your external attack surface that could be affected by a new critical vulnerability, allowing the network’s security team to focus on preventative measures rather than manual monitoring and response management. The Attack Surface Management service via Newsroom monitors threat feeds and information sources to deliver immediate threat context and instant updates on high-impact vulnerabilities. The Attack Surface Management service on the cloud platform 201 can via the exploit assessment and pentesting program 211 cooperating with the database of known CVEs 217 reveal all assets on the network’s external attack surface that could potentially be affected by a new critical vulnerability, providing actionable insights. This lets the network’s team focus on preventative measures, rather than having to spend time manually monitoring intelligence sources and news feeds or managing a vulnerability response process.

[0134] The vulnerability scanner of the exploit assessment and pentesting program 211 cooperates with at least one of a user interface 241, a display, and a report generator 243 to both i) report the results of the augmented pentesting on the assets of the network as well as ii) present an attack surface of the assets of the network being protected by the ASM cloud platform 201 detected by the exploit assessment and pentesting program 211 in order to show actual CVE risks present in the assets in the network being protected.

[0135] The validation status of each asset displayed can be presented on the user interface 241: Validated, Available, Unavailable, Validation date of the risk, with CVE risks split into separate risks, which will have an impact on the statistics displayed on the dashboard.

[0136] The user interface 241 can cooperate with the exploit assessment and pentesting program 211 to offer visibility into essential risk metrics, such as all of the assets currently making up the network’s attack surface as well as the number of critical vulnerabilities on the network’s attack surface. These deep contextual insights provided visually on the user interface 241 enable security teams to prioritize and make effective context-based decisions. A report generator 243 can generate and provide custom reports for specific use cases, and reports can also be created to deliver tailored insights based on the network’s business needs and priorities. The report generator 243 can provide suggested remediation steps based upon the results of the CVE test templates to assist in actively fixing identified vulnerabilities found from the CVE search module 221 continuously monitoring the environment for new risks as systems evolve and threats emerge.

[0137] The user interface 241 cooperates with the exploit assessment and pentesting program 211 to discover what technology is being implemented on each asset in the network to allow a user to provide network segmentation based on technology to divide an organization's network into smaller, isolated segments to limit the potential impact of an attack.

[0138] The user interface 241 and the vulnerability scanner of the exploit assessment and pentesting program 211 also cooperate to allow a user of the network to view the captured non-damaging concrete proof of the asset’s vulnerability. When the user reviews the non-damaging concrete proof of the asset’s vulnerability, such as a copied set of files, the proof should show the user that this set of files does exist on the compromised device, and the combination of files proves that those copied files could have only come from the compromised device.

[0139] Figure 3 illustrates a diagram of an embodiment of the cyber security appliance cooperating with the attack surface management cloud platform to execute a CVE test template on an asset of the network that, when executed, is non-damaging to an operation of the asset being tested but does confirm a comprisable status of the first asset being tested, by capturing non-damaging concrete proof of the first asset’s vulnerability.

[0140] The attack surface management cloud platform 201 can use an exploit assessment and pentesting program 211 (EPA) that is coded in software stored in an executable form in non-transitory machine-readable mediums in the cloud platform 201 and executed by one or more processors, and / or is implemented in electronic circuit hardware, and / or is partially implemented with electronic circuit hardware software instructions.

[0141] The Attack Surface Management cloud platform 201 and the cyber security appliance 100 (e.g. Detect) cooperate to help protect a network from cyber security threats. The exploit assessment and pentesting program 211 in the attack surface management cloud platform 201 implements a cybersecurity strategy that involves continuously identifying, analysing, and mitigating potential attack vectors across an organization's entire digital footprint, essentially looking at all possible entry points a hacker could use to compromise systems and prioritizing their remediation to minimize cyber risks; attack surface management aims to provide comprehensive visibility into an organization's assets, both internal and external, to proactively identify and address vulnerabilities before attackers can exploit them.

[0142] The Attack Surface Management cloud platform 201 and the cyber security appliance 100 cooperate to survey the network estate from the inside out and factor this into, for example, proactive exposure management. The Attack Surface Management cloud platform 201 and the cyber security appliance 100 cooperate to monitor human / social components: employees, contractors, and third-party vendors, as human interaction points which can be exploited through social engineering. The Attack Surface Management cloud platform 201 and the cyber security appliance 100 cooperate to monitor network assets like firewalls and routers, external network interfaces, and the ports on those devices. The Attack Surface Management cloud platform 201 can integrate and cooperate with the AI driven cyber security appliance 100 to provide enhanced proactive attack surface management across a network and an organization’s security stack. Through integration with the AI driven cyber security appliance 100, the Attack Surface Management cloud platform 201 offers unified visibility and high-fidelity coverage, linking externally identified assets with internal observations for end-to-end threat mitigation. For example, the Attack Surface Management cloud platform 201 seamlessly amplifies the Email protections of the AI driven cyber security appliance 100 by pre-emptively forewarning against spoofed domains impersonating the organization being protected. The Attack Surface Management cloud platform 201 also enhances detection and response mechanisms at the endpoint and network layer while providing the cyber threat analyst module (e.g. AI Analyst) in the AI driven cyber security appliance 100 with external data to enhance its investigations.

[0143] The cloud platform 201 can include digital assets of Web applications, APIs, cloud resources, servers, operating systems, and databases.

[0144] Prioritization and remediation:

[0145] Once vulnerabilities are identified, the ASM cloud platform 201 prioritizes them based on their potential impact and takes steps to remediate them effectively. Vulnerability risk scoring and asset mapping enable you to quickly and accurately find the most critical exposures in your digital estate. The ASM cloud platform 201 provides key risk metrics and prioritization recommendations, enabling teams to act swiftly to prioritize risk remediation in real-time on the user interface 241 . The ASM cloud platform 201 helps your cyber security team identify the most critical vulnerabilities relative to your business, enabling quick prioritization of patching, updating, and management of the network’s Internet facing assets. By providing asset context and vulnerability risk intelligence across the detection and response systems on the user interface 241 , the Attack Surface Management cloud platform 201 facilitates rapid decision making and enables security teams to address the most critical threats faster.

[0146] The Attack Surface Management cloud platform 201 effectively identifies exposed assets from the perspective of potential adversaries, creating a comprehensive risk profile of your digital estate. The Attack Surface Management cloud platform 201 uncovers a wide array of vulnerabilities including supply chain risks, potential phishing domains, misconfigurations, brand abuse and third-party risks. The Attack Surface Management cloud platform 201 can uniquely identify complex use cases, such as risks from network routing issues and shadow IT domain registrations, which most vendors typically do not do. The Attack Surface Management cloud platform 201 uses a trained risk rating model 231 and a risk open time feature to enable the identification and prioritization of the most critical vulnerabilities relative to your network and business, facilitating efficient patching, updating, and management of Internet-facing assets.

[0147] The vulnerability scanner of the exploit assessment and pentesting program 211 enhances security testing by conducting safe test-attacks on assets with potential security vulnerabilities, with the objective of confirming compromised systems. A functionality of the exploit assessment and pentesting program 211 aims to expand detection capabilities and improve the prioritization of security risks along with the measures taken to mitigate them. The exploit assessment and pentesting program 211 verifies vulnerabilities to confirm which risks are present on the attack surface of the network being protected and then shows them on the display, via the user interface 241, via a shield icon. The exploit assessment and pentesting program 211 provides real-time clarity by promptly testing assets (servers and other network devices) in the network for new threats without waiting for vendor updates or patches to come from third party vendors by use of a set of CVE testing templates. The exploit assessment and pentesting program 211 also augments a penetration test with surgical targeted CVE test attacks on the selected particular asset(s) to assess for potential security vulnerabilities while still performing a safe penetration test to not harm the network being protected. The exploit assessment and pentesting program 211 provides evidence of the vulnerability on this user interface 241 display as well as the date that the vulnerability test was conducted and thus the network checked for vulnerabilities as of that date. The exploit code is designed to extract and expose, for example, internal data as concrete proof of the system’s vulnerability. The exploit assessment and pentesting program 211 also assigns a rating to an Internet facing asset and the rating indicates if the Internet facing asset is verified to be vulnerable or safe and the rating will go up or down based upon the vulnerability.

[0148] Figure 4 illustrates a flow diagram of an embodiment of an exploit assessment and pentesting program cooperating with other components to automatically determine and test assets of a network.

[0149] In step 502, the assessment and testing module 207 implements an exploit assessment and pentesting program 211 and a scheduler 255 to trigger the exploit assessment and pentesting program 211 to test assets of a network being protected by an ASM cloud platform 201 against known common vulnerabilities and exposures (CVE).

[0150] In step 504, the vulnerability scanner of the exploit assessment and pentesting program 211 cooperates with a database of CVE test templates 217. The vulnerability scanner of the exploit assessment and pentesting program 211 performs the augmented pentesting by selecting one or more CVE test templates from the database of CVE test templates 217 and running the one or more CVE test templates to conduct each particular common vulnerabilities and exposures test on a first asset, under analysis, in the network being protected by the ASM cloud platform 201 to see whether the first asset in the network being protected by the ASM cloud platform is actually vulnerable or not. The vulnerability scanner of the exploit assessment and pentesting program 211 selects and executes a CVE test template of the one or more CVE test templates that, when executed, is non-damaging to an operation of the asset being tested but does confirm a comprisable status of the asset. The CVE test template captures non-damaging concrete proof of the asset’s vulnerability. The user interface 241 and the vulnerability scanner of the exploit assessment and pentesting program 211 cooperate to allow a user of the network to view the captured non-damaging concrete proof of the asset’s vulnerability.

[0151] In step 506, the vulnerability scanner of the exploit assessment and pentesting program 211 cooperates with at least one of a user interface 241, a display, and a report generator 243 to both i) report the results of the augmented pentesting on the assets of the network as well as ii) present an attack surface of the assets of the network being protected by the ASM cloud platform 201 detected by the exploit assessment and pentesting program 211 in order to show actual CVE risks present in the assets in the network.
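
The select-and-run flow of step 504 can be sketched in outline. This is an illustrative skeleton only: the `CveTestTemplate` structure, the `run_templates` function, the asset dictionary fields, and the finding fields are assumptions, and each probe is a stub that inspects data already collected about the asset rather than contacting any system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class CveTestTemplate:
    cve_id: str
    # Read-only probe: returns proof text if the asset is vulnerable,
    # otherwise None. It must never modify the asset.
    probe: Callable[[dict], Optional[str]]


def run_templates(asset, candidate_cve_ids, templates, today):
    """Run only the templates that match the asset's candidate CVEs."""
    findings = []
    for template in templates:
        if template.cve_id not in candidate_cve_ids:
            continue
        proof = template.probe(asset)
        findings.append({
            "asset": asset["hostname"],
            "cve_id": template.cve_id,
            "vulnerable": proof is not None,
            "proof": proof,                    # shown to the user as evidence
            "tested_on": today.isoformat(),    # date of the test
        })
    return findings
```

Each finding carries the proof and the test date, which correspond to the evidence and validation date that the user interface 241 is described as displaying.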

[0152] In step 508, the exploit assessment and pentesting program 211 determines what technology is being implemented on each asset in the network, and then extracts that technology to compare that technology implemented on an asset, under analysis, to a database of the known CVEs 217, and then the vulnerability scanner of the exploit assessment and pentesting program 211 performs the augmented pentesting with one or more CVE test templates and then coordinates with the user interface 241 to display on the user interface 241 a date of the test and present one or more CVE vulnerabilities that the asset is actually vulnerable to.

[0153] In step 510, the assessment and testing module 207 with an exploit prediction assessment portion of the exploit assessment and pentesting program 211 cooperates with a set of ASM classifiers. The set of ASM classifiers detect and then verify what assets are definitely associated with the network being protected by the ASM cloud platform 201 to prevent randomly pentesting assets that do not belong to the network being protected. A domain string classifier 227 can use Optical Character Recognition and cooperate with the assessment and testing module 207 to use at least a crawler source to collect a plurality of domain strings being used on the Internet to analyze at least a domain string, under analysis, and an association of the domain string, under analysis, to the network being protected by the ASM cloud platform 201. The image classifier 223 in the set of ASM classifiers uses image recognition technology that detects image similarity based on keypoints found in compared images, scaled and rotated all in three dimensions, to do an image assessment on visual similarities between the keypoints of an image under analysis compared to the keypoints of images known to belong to the network being protected. The fuzzy hash classifier 225 in the set of ASM classifiers predicts, from various features of HTML content, how similar an HTML page, under analysis, is to confirmed features of one or more HTML pages of the network being protected by the ASM cloud platform 201. The fuzzy hash classifier 225 compares a first fuzzy hash created on the various features of the HTML content of the HTML page, under analysis, to a second fuzzy hash on the confirmed features of the one or more HTML pages of the network being protected by the ASM cloud platform 201, and then determines whether the compared fuzzy hashes meet or exceed a threshold amount of similarity. The HTML classifier 229 in the set of ASM classifiers compares HTML feature datapoints linked to hostnames that belong to the network being protected by the ASM cloud platform 201 to a neutral dataset. The set of HTML feature datapoints not unique to the network being protected by the ASM cloud platform 201 will likely occur at a high frequency, such as greater than a median, in the neutral dataset. Another set of HTML feature datapoints that are unique to the network being protected by the ASM cloud platform 201 occur at a frequency less than the high frequency, and less than the median, in the neutral dataset as well as mainly occur on hostnames associated with the network being protected by the ASM cloud platform 201.

[0154] Additional Details

[0155] The following text discusses how some of the other components in the cyber security system operate; and thus, how these components respond to the commands, requests, and communications with the ASM cloud platform 201.

[0156] Figure 5 illustrates a block diagram of an embodiment of the AI-based cyber security appliance with example components making up a detection engine that protects a system, including but not limited to a network / domain, from cyber threats. Various Artificial Intelligence models and modules of the cyber security appliance 100 cooperate to protect a system, such as one or more networks / domains under analysis, from cyber threats. In an embodiment, the AI-based cyber security appliance 100 may include a trigger module, a gather module 110, an analyzer module 115, a cyber threat analyst module 120, an assessment module 125, a user interface and formatting module 130, a data store 135, an autonomous response engine 140 and / or an interface to an autonomous response engine 140, an Information Technology network domain module 145, an email domain module 150, and a coordinator module 155, one or more AI models 160 (hereinafter, “AI model(s)”), and / or other modules. The AI model(s) 160 may be trained i) with machine learning on a normal pattern of life for entities in the network(s) / domain(s) under analysis, ii) with machine learning on cyber threat hypotheses to form and investigate a cyber threat, iii) on what are a possible set of cyber threats and their characteristics, symptoms, remediations, etc., an interface to a restoration engine 190, an interface to a cyber-attack simulator 105, and other similar components.

[0157] The cyber security appliance 100 can host the cyber threat detection engine and other components. The cyber security appliance 100 includes a set of modules cooperating with one or more Artificial Intelligence models configured to perform a machine-learned task of detecting a cyber threat incident. The detection engine uses the set of modules cooperating with the one or more Artificial Intelligence models in the cyber security appliance 100 to prevent a cyber threat from compromising the nodes (e.g. devices, end users, etc.) and / or spreading through the nodes of the network being protected by the cyber security appliance 100.

[0158] The cyber security appliance 100 with the Artificial Intelligence (AI)-based cyber security system may protect a network / domain from a cyber threat (insider attack, malicious files, malicious emails, etc.). The cyber security appliance 100 can protect all of the devices on the network(s) / domain(s) being monitored. For example, the IT network domain module (e.g., first domain module 145) may communicate with network sensors to monitor network traffic going to and from the computing devices on the network as well as receive secure communications from software agents embedded in host computing devices / containers. Other domain modules such as the email domain module 150 and a cloud domain module operate similarly with their domain. The steps below will detail the activities and functions of several of the components in the cyber security appliance 100.

[0159] The gather module 110 may be configured with one or more process identifier classifiers. Each process identifier classifier may be configured to identify and track one or more processes and / or devices in the network, under analysis, making communication connections. The data store 135 cooperates with the process identifier classifier to collect and maintain historical data of processes and their connections, which is updated over time as the network is in operation. In an example, the process identifier classifier can identify each process running on a given device along with its endpoint connections, which are stored in the data store 135. In addition, a feature classifier can examine and determine features in the data being analyzed into different categories.

[0160] The analyzer module 115 can cooperate with the AI model(s) 160 or other modules in the cyber security appliance 100 to confirm a presence of a cyber threat in a cyberattack against one or more domains in an enterprise’s system (e.g., see system / enterprise network 791, 792, and 747 of Figure 3). A process identifier in the analyzer module 115 can cooperate with the gather module 110 to collect any additional data and metrics to support a possible cyber threat hypothesis. Similarly, the cyber threat analyst module 120 can cooperate with the internal data sources as well as external data sources to collect data in its investigation. More specifically, the cyber threat analyst module 120 can cooperate with the other modules and the AI model(s) 160 in the cyber security appliance 100 to conduct a long-term investigation and / or a more in-depth investigation of potential and emerging cyber threats directed to one or more domains in an enterprise’s system. Herein, the cyber threat analyst module 120 and / or the analyzer module 115 can also monitor for other anomalies, such as model breaches, including, for example, deviations from a normal behavior of an entity, and other techniques discussed herein. The analyzer module 115 and / or the cyber threat analyst module 120 can cooperate with the AI model(s) 160 trained on potential cyber threats in order to assist in examining and factoring these additional data points that have occurred over a given timeframe to see if a correlation exists between 1) a series of two or more anomalies occurring within that time frame and 2) possible known and unknown cyber threats.

[0161] The cyber threat analyst module 120 allows two levels of investigations of a cyber threat that may suggest a potential impending cyberattack. In a first level of investigation, the analyzer module 115 and AI model(s) 160 can rapidly detect, and then the autonomous response engine 140 will autonomously respond to, overt and obvious cyberattacks (generally indicated by high scores of 80 or more; see Figure 6). However, thousands to millions of low-level anomalies occur in a domain under analysis all of the time; and thus, most other systems need to set the threshold for detecting a cyberattack at a level, such as a score of 80 or more, that is higher than the low-level anomalies examined by the cyber threat analyst module 120, both to avoid too many false positive indications of a cyberattack when one is not actually occurring and to avoid overwhelming a human cyber security analyst with so many notifications of low-level anomalies that they just start tuning out those alerts. However, advanced persistent threats attempt to avoid detection by making these low-level anomalies in the system over time during their cyberattack before making their final coup de grace / ultimate mortal blow against the system (e.g., domain) being protected. The cyber threat analyst module 120 also conducts a second level of investigation over time with the assistance of the AI model(s) 160 trained with machine learning on how to form cyber threat hypotheses and how to conduct investigations for a cyber threat hypothesis. This second level can detect these advanced persistent cyber threats actively trying to avoid detection by looking at one or more of these low-level anomalies combined with other anomalies and factors as part of a chain of linked information (see Figure 6).
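
In an illustrative, non-limiting example, the routing between the two levels of investigation described above may be sketched as follows. The threshold of 80 comes from the text; the 0 to 100 score scale, the `Anomaly` record, and the function names `route_anomaly` and `triage` are assumptions introduced only for illustration and are not part of the disclosed appliance.

```python
from dataclasses import dataclass

# The text cites "high scores of 80 or more" for overt and obvious cyberattacks.
OVERT_THRESHOLD = 80


@dataclass
class Anomaly:
    entity: str  # device or user account on which the anomaly was observed
    score: int   # threat / anomaly score on an assumed 0-100 scale


def route_anomaly(anomaly):
    """Return the name of the investigation level that handles the anomaly."""
    if anomaly.score >= OVERT_THRESHOLD:
        # First level: rapid detection followed by autonomous response.
        return "autonomous_response"
    # Second level: low-level anomalies are kept for longer-term investigation.
    return "analyst_investigation"


def triage(anomalies):
    """Group anomalies by the investigation level that handles them."""
    routed = {"autonomous_response": [], "analyst_investigation": []}
    for anomaly in anomalies:
        routed[route_anomaly(anomaly)].append(anomaly)
    return routed
```

In this sketch, low-scoring anomalies are not discarded; they are retained so that a later investigation can link them into a chain.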

[0162] The artificial intelligence-based cyber security analyst tool can use the cyber threat analyst module 120 and its interaction with the other modules and AI models 160 in the cyber security appliance 100. The artificial intelligence-based cyber security analyst tool’s investigations into potential cyber-attacks from potential cyber threats allow customers to review the outcomes of those investigations at the hypothesis level (hypothesis steps taken, investigation steps taken, and then the conclusion), along with a human readable summary of why the system took those hypothesis steps and investigation steps and reached its conclusion. For every compatible DETECT alert (e.g. model breach), the artificial intelligence-based cyber security analyst tool investigates a series of possible relevant hypotheses and tries to find data and activity that meet the criteria of those hypotheses. When the artificial intelligence-based cyber security analyst tool finds activity and / or data that meets the criteria and actually supports a likelihood of a hypothesis, then that hypothesis is worth surfacing to an operator. The artificial intelligence-based cyber security analyst tool presents the most salient information to the end user. The artificial intelligence-based cyber security analyst autonomously investigates alerts, streamlines investigations and prioritizes incidents, thus reducing workload and alert fatigue. The artificial intelligence-based cyber security analyst uses various forms of machine learning, including unsupervised, supervised, and deep learning, combined with human intuition and trade craft from hundreds of world-class human cyber analysts across thousands of customer deployments. The artificial intelligence-based cyber security analyst relieves a human cyber analyst from spending anywhere between half an hour and half a day investigating a single suspicious security incident. 
The artificial intelligence-based cyber security analyst looks for patterns, forms hypotheses, reaches conclusions about how to mitigate the threat, and shares the findings with the rest of the business. The artificial intelligence-based cyber security analyst continuously conducts investigations behind the scenes and operates at a speed and scale beyond human capabilities. The artificial intelligence-based cyber security analyst tool, as a large language model (LLM), is built to incorporate cyber threat knowledge from external data stores, external data sources, as well as from a network’s own cyber security appliance. The artificial intelligence-based cyber security analyst tool uses threat intelligence to understand a cyber threat adversary’s tactics and motivations. The effectiveness of the artificial intelligence-based cyber security analyst tool lies in its ability to access and integrate diverse data sources. The artificial intelligence-based cyber security analyst can tap into external data stores, such as threat intelligence platforms and vulnerability databases, to enrich its understanding of the threat landscape.

[0163] The cyber threat analyst module 120, in conjunction with the AI model(s) 160 trained with machine learning on how to form cyber threat hypotheses and how to conduct investigations for a cyber threat hypothesis, forms and investigates hypotheses on what are a possible set of cyber threats. The cyber threat analyst module 120 can also cooperate with the analyzer module 115 with its one or more data analysis processes to conduct an investigation on a possible set of cyber threats hypotheses that would include an anomaly of at least one of i) the abnormal behavior, ii) the suspicious activity, and iii) any combination of both, identified through cooperation with, for example, the AI model(s) 160 trained with machine learning on the normal pattern of life of entities in the system. For example, as shown in Figure 6, the cyber threat analyst module 120 may perform several additional rounds 220 of gathering additional information, including abnormal behavior, over a period of time, in this example, examining data over a 7-day period to determine causal links between the information. The cyber threat analyst module 120 may check and recheck various combinations / a chain of potentially related information, including abnormal behavior of a device / user account under analysis for example, until each of the one or more hypotheses on potential cyber threats is one of 1) refuted, 2) supported, or 3) included in a report that includes details of activities assessed to be relevant to the anomaly of interest to the user and that also conveys that at least this particular hypothesis was neither supported nor refuted. For this embodiment, a human cyber security analyst is then needed to further investigate the anomaly (and / or anomalies) of interest included in the chain of potentially related information.
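
In an illustrative, non-limiting example, the repeated rounds of data gathering and the three possible outcomes for a hypothesis may be sketched as follows. The indicator names, the `needed` count of supporting indicators, and the rule that any refuting indicator ends the investigation are hypothetical simplifications; the disclosure does not specify how the trained AI model(s) 160 weigh evidence.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    SUPPORTED = "supported"
    REFUTED = "refuted"
    INCONCLUSIVE = "inconclusive"  # reported for further human investigation


@dataclass
class Hypothesis:
    name: str
    supporting: set  # indicator names that support the hypothesis (hypothetical)
    refuting: set    # indicator names that refute the hypothesis (hypothetical)
    needed: int = 2  # supporting indicators required; an assumed rule
    seen: set = field(default_factory=set)


def investigate(hypothesis, rounds):
    """Evaluate one hypothesis over successive rounds of gathered indicators."""
    for gathered in rounds:
        # Each round adds newly gathered indicators to the chain of evidence.
        hypothesis.seen |= set(gathered)
        if hypothesis.seen & hypothesis.refuting:
            return Outcome.REFUTED
        if len(hypothesis.seen & hypothesis.supporting) >= hypothesis.needed:
            return Outcome.SUPPORTED
    # All rounds are exhausted without a decision either way.
    return Outcome.INCONCLUSIVE
```

For instance, with hypothetical indicators, `investigate(Hypothesis("data exfiltration", {"odd_hours_login", "large_upload"}, {"approved_backup_job"}), [["odd_hours_login"], ["large_upload"]])` evaluates to `Outcome.SUPPORTED` after the second round.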

[0164] Returning to Figure 5, an input from the cyber threat analyst module 120 of a supported hypothesis of a potential cyber threat will trigger the analyzer module 115 and / or assessment module 125 to compare, confirm, and send a signal to act upon and mitigate that cyber threat. In contrast, the cyber threat analyst module 120 investigates subtle indicators and / or initially seemingly isolated unusual or suspicious activity, such as a worker logging in after their normal working hours or a simple system misconfiguration. Most of the investigations conducted by the cyber threat analyst module 120 cooperating with the AI model(s) 160 trained with machine learning on how to form cyber threat hypotheses and how to conduct investigations for a cyber threat hypothesis on unusual or suspicious activities / behavior may not result in a cyber threat hypothesis that is supported; rather, most cyber threat hypotheses are refuted or simply not supported. Typically, during the investigations, several rounds of data gathering to support or refute the long list of potential cyber threat hypotheses formed by the cyber threat analyst module 120 will occur before the algorithms in the cyber threat analyst module 120 will determine whether a particular cyber threat hypothesis is supported, refuted, or needs further investigation by a human. The rounds of data gathering will build chains of linked low-level indicators of unusual activity along with potential activities that could be within a normal pattern of life for that entity to evaluate the whole chain of activities to support or refute each potential cyber threat hypothesis formed. (See again, for example, Figure 6 and a chain of linked low-level indicators, including abnormal behavior compared to the normal pattern of life for that entity, all under a score of 50 on a threat indicator score). 
The investigations by the cyber threat analyst module 120 can happen over a relatively long period of time (e.g. a week or longer) and be far more in depth than the analyzer module 115 which will work with the other modules and AI model(s) 160 to confirm that a cyber threat has in fact been detected by the presence of an anomaly with a score of 75 or more and / or the occurrence of a specific event deemed a serious cyber threat in itself occurring.

[0165] The gather module 110 cooperates with the cyber threat analyst module 120 and / or analyzer module 115 to collect data to support or to refute each of the one or more possible cyber threat hypotheses that could include this abnormal behavior or suspicious activity by cooperating with one or more of the cyber threat hypotheses mechanisms to form and investigate hypotheses on what are a possible set of cyber threats.

[0166] Thus, the cyber threat analyst module 120 is configured to cooperate with the AI model(s) 160 trained with machine learning on how to form cyber threat hypotheses and how to conduct investigations for a cyber threat hypothesis to form and investigate hypotheses on what are a possible set of cyber threats and then can cooperate with the analyzer module 115 with the one or more data analysis processes to confirm the results of the investigation on the possible set of cyber threats hypotheses that would include the at least one of i) the abnormal behavior, ii) the suspicious activity, and iii) any combination of both, identified through cooperation with the AI model(s) 160 trained with machine learning on the normal pattern of life / normal behavior of entities in the domains under analysis.

[0167] Note, in the first level of threat detection, the gather module 110 and the analyzer module 115 cooperate to supply any data and / or metrics requested by the analyzer module 115 cooperating with the AI model(s) 160 trained on possible cyber threats to support or rebut each possible type of cyber threat. Generally, the presence of an anomaly with a high threat / anomaly score and / or the occurrence of a specific event deemed a serious cyber threat in itself will cause the analyzer module 115 to send a signal and this information to the autonomous response engine 140. Again, the analyzer module 115 can cooperate with the AI model(s) 160 and / or other modules to rapidly detect and then cooperate with the autonomous response engine 140 to autonomously respond to overt and obvious cyberattacks (including ones found to be supported by the cyber threat analyst module 120).

[0168] As a starting point, the AI-based cyber security appliance 100 can use multiple modules, each capable of identifying abnormal behavior and / or suspicious activity against the AI model(s) 160 trained on a normal pattern of life for the entities in the network / domain under analysis, which is supplied to the analyzer module 115 and / or the cyber threat analyst module 120. The analyzer module 115 and / or the cyber threat analyst module 120 may also receive other inputs, such as AI model breaches, AI classifier breaches, etc., or a trigger to start an investigation from an external source.

[0169] Many other model breaches of the AI model(s) 160 trained with machine learning on the normal behavior of the system can send an input into the cyber threat analyst module 120 and / or the trigger module to trigger an investigation to start the formation of one or more hypotheses on what are a possible set of cyber threats that could include the initially identified abnormal behavior and / or suspicious activity.

[0170] The cyber threat analyst module 120, which forms and investigates hypotheses on what are the possible set of cyber threats, can use hypotheses mechanisms including any of 1) one or more of the AI model(s) 160 trained on how human cyber security analysts form cyber threat hypotheses and how to conduct investigations for a cyber threat hypothesis that would include at least an anomaly of interest, 2) one or more scripts outlining how to conduct an investigation on a possible set of cyber threats hypotheses that would include at least the anomaly of interest, 3) one or more rules-based models on how to conduct an investigation on a possible set of cyber threats hypotheses and how to form a possible set of cyber threats hypotheses that would include at least the anomaly of interest, and 4) any combination of these. Again, the AI model(s) 160 trained on ‘how to form cyber threat hypotheses and how to conduct investigations for a cyber threat hypothesis’ may use supervised machine learning on human-led cyber threat investigations and the steps, data, metrics, and metadata on how to support or to refute a plurality of the possible cyber threat hypotheses, and then the scripts and rules-based models will include the steps, data, metrics, and metadata on how to support or to refute the plurality of the possible cyber threat hypotheses. The cyber threat analyst module 120 and / or the analyzer module 115 can feed the cyber threat details to the assessment module 125 to generate a threat risk score that indicates a level of severity of the cyber threat.

[0171] Training of AI pre-deployment and then during deployment

[0172] In step 1, an initial training of the Artificial Intelligence model trained on cyber threats can occur using unsupervised learning and / or supervised learning on characteristics and attributes of known potential cyber threats including malware, insider threats, and other kinds of cyber threats that can occur within that domain. Each Artificial Intelligence model (e.g. neural network, decision tree, etc.) can be programmed and configured with the background information to understand and handle particulars, including different types of data, protocols used, types of devices, user accounts, etc. of the system being protected. The Artificial Intelligence models can all be trained pre-deployment on the specific machine learning task that they will perform when put into deployment. For example, the AI model, such as AI model(s) 160 for example (hereinafter “AI model(s) 160”), trained on identifying a specific cyber threat learns at least both in the pre-deployment training i) the characteristics and attributes of known potential cyber threats as well as ii) a set of characteristics and attributes of each category of potential cyber threats and their weights assigned on how indicative certain characteristics and attributes correlate to potential cyber threats of that category of threats. In this example, one of the AI models 160 trained on identifying a specific cyber threat can be trained with machine learning such as Linear Regression, Regression Trees, Non-Linear Regression, Bayesian Linear Regression, Deep learning, etc. to learn and understand the characteristics and attributes in that category of cyber threats. Later, when in deployment in a domain / network being protected by the cyber security appliance 100, the AI model trained on cyber threats can determine whether a potentially unknown threat has been detected via a number of techniques including an overlap of some of the same characteristics and attributes in that category of cyber threats. 
The AI model may use unsupervised learning when deployed to better learn newer and updated characteristics of cyberattacks.

[0173] In an embodiment, one or more of the AI models 160 trained on a normal pattern of life of entities in the system are self-learning AI models using unsupervised machine learning and machine learning algorithms to analyze patterns and 'learn' what is the 'normal behavior' of the network by analyzing data on the activity at, for example, the network level, the device level, and the employee level. The self-learning AI model using unsupervised machine learning understands the system under analysis’ normal patterns of life in, for example, a week of being deployed on that system, and grows more bespoke with every passing minute. The AI unsupervised learning model learns patterns from the features in the day-to-day dataset and detects abnormal data which would not have fallen into the category (cluster) of normal behavior. The self-learning AI model using unsupervised machine learning can simply be placed into an observation mode for an initial week or two when first deployed on a network / domain in order to establish an initial normal behavior for entities in the network / domain under analysis.

[0174] Thus, a deployed Artificial Intelligence model 160 trained on a normal behavior of entities in the system can be configured to observe the nodes in the system being protected. Training on a normal behavior of entities in the system can occur while monitoring for the first week or two until enough data has been observed to establish a statistically reliable set of normal operations for each node (e.g., user account, device, etc.). Initial training of one or more Artificial Intelligence models 160 trained with machine learning on a normal behavior of the pattern of life of the entities in the network / domain can occur where each type of network and / or domain will generally have some common typical behavior with each model trained specifically to understand components / devices, protocols, activity level, etc. to that type of network / system / domain. Alternatively, pre-deployment machine learning training of one or more Artificial Intelligence models trained on a normal pattern of life of entities in the system can occur. What is the normal behavior of each entity within that system can be established either prior to the deployment and then adjusted during deployment or alternatively the model can simply be placed into an observation mode for an initial week or two when first deployed on a network / domain in order to establish an initial normal behavior for entities in the network / domain under analysis. 
During the deployment of the model, what is considered normal behavior will change as each different entity’s behavior changes and will be reflected through the use of unsupervised learning in the model such as various Bayesian techniques, clustering, etc. Again, the AI models 160 can be implemented with various mechanisms, such as neural networks, decision trees, etc. and combinations of these. Likewise, one or more supervised machine learning AI models 160 may be trained to create possible hypotheses and perform cyber threat investigations on agnostic examples of past historical incidents of detecting a multitude of possible types of cyber threat hypotheses previously analyzed by human cyber security analysts.
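
In an illustrative, non-limiting example, an initial observation period followed by continual adjustment of what is considered normal may be sketched with a running mean and variance per entity and per metric. Welford's online update is used here as a stand-in chosen for illustration; the disclosure names Bayesian techniques and clustering rather than this method. The class name `Baseline`, the `min_samples` observation length, and the multiplier `k` are assumptions.

```python
import math


class Baseline:
    """Running mean and variance of one metric for one entity (Welford's method)."""

    def __init__(self, min_samples=7):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.min_samples = min_samples  # assumed length of the observation period

    def update(self, value):
        """Fold one new observation into the baseline."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Sample standard deviation of the observations seen so far."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_unusual(self, value, k=3.0):
        """True if value lies more than k standard deviations from the mean."""
        if self.n < self.min_samples or self.std == 0.0:
            return False  # still in observation mode, or no spread observed yet
        return abs(value - self.mean) > k * self.std
```

Because `update` can keep being called after the observation period ends, the baseline in this sketch drifts with the entity's behavior rather than staying fixed at its initial estimate.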

[0175] At their core, the self-learning AI models 160 that model the normal behavior (e.g. a normal pattern of life) of entities in the network mathematically characterize what constitutes ‘normal’ behavior, based on the analysis of a large number of different measures of a device’s network behavior - packet traffic and network activity / processes including server access, data volumes, timings of events, credential use, connection type, volume, and directionality of, for example, uploads / downloads into the network, file type, packet intention, admin activity, resource and information requests, commands sent, etc.

[0176] Clustering Methods

[0177] In order to model what should be considered as normal for a device or cloud container, its behavior can be analyzed in the context of other similar entities on the network. The AI models (e.g., AI model(s) 160) can use unsupervised machine learning to algorithmically identify significant groupings, a task which is virtually impossible to do manually. To create a holistic image of the relationships within the network, the AI models and AI classifiers employ a number of different clustering methods, including matrix-based clustering, density-based clustering, and hierarchical clustering techniques. The resulting clusters can then be used, for example, to inform the modeling of the normative behaviors and / or similar groupings.
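
In an illustrative, non-limiting example of density-based clustering, one of the clustering families named above, devices described by numeric behavior features can be grouped with a minimal DBSCAN-style procedure. DBSCAN is a well-known density-based algorithm selected here for illustration; the disclosure does not identify a specific algorithm, and the feature choice and the parameters `eps` and `min_pts` are assumptions.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels
```

Devices that land in the same cluster can then be modeled against a shared notion of normal behavior, while a device labeled as noise resembles no peer group.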

[0178] The AI models and AI classifiers can employ a large-scale computational approach to understand sparse structure in models of network connectivity based on applying L1-regularization techniques (the lasso method). This allows the artificial intelligence to discover true associations between different elements of a network which can be cast as efficiently solvable convex optimization problems and yield parsimonious models. Various mathematical approaches assist.
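
In an illustrative, non-limiting example, the lasso may be solved by cyclic coordinate descent with soft-thresholding, a standard solver for this convex problem. The solver choice, the objective scaling of (1/2n)·||y − Xw||² + alpha·||w||₁, and the function names are assumptions; the disclosure does not state how the L1-regularized problem is solved.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm else 0.0
    return w
```

If each column of `X` holds the activity of one network element and `y` the activity of another, the non-zero weights indicate which elements remain associated with `y` after weak associations have been shrunk to exactly zero, which is what yields a parsimonious model.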

[0179] Next, one or more supervised machine learning AI models are trained on how to create possible hypotheses and how to perform cyber threat investigations on agnostic examples of past historical incidents of detecting a multitude of possible types of cyber threat hypotheses previously analyzed by human cyber threat analysts. AI models 160 trained on forming and investigating hypotheses on what are a possible set of cyber threats can be trained initially with supervised learning. Thus, these AI models 160 can be trained on how to form and investigate hypotheses on what are a possible set of cyber threats and steps to take in supporting or refuting hypotheses. The AI models trained on forming and investigating hypotheses are updated with unsupervised machine learning algorithms when correctly supporting or refuting the hypotheses, including what additional collected data proved to be the most useful. More on the training of the AI models that are trained to create one or more possible hypotheses and perform cyber threat investigations will be discussed later.

[0180] Next, the various Artificial Intelligence models and AI classifiers combine the use of unsupervised and supervised machine learning to learn ‘on the job’; they do not depend solely upon knowledge of previous cyber threat attacks. The Artificial Intelligence models and classifiers that combine unsupervised and supervised machine learning constantly revise assumptions about behavior, using probabilistic mathematics, so that they are always up to date on what the current normal behavior is and are not solely reliant on human input. This combined use of unsupervised and supervised machine learning on cyber security is capable of seeing hitherto undiscovered cyber events, from a variety of threat sources, which would otherwise have gone unnoticed. These cyber threats can include, for example: insider threats (malicious or accidental); zero-day attacks (previously unseen, novel exploits); latent vulnerabilities; machine-speed attacks (ransomware and other automated attacks that propagate and / or mutate very quickly); cloud and SaaS-based attacks; other silent and stealthy attacks such as advanced persistent threats; advanced spear-phishing; etc.

[0181] Ranking the Cyber Threat

[0182] The assessment module 125 and / or cyber threat analyst module 120 of Figure 5 can cooperate with the AI model(s) 160 trained on possible cyber threats to use AI algorithms to account for ambiguities by distinguishing between the subtly differing levels of evidence that characterize network data.

[0183] More on the operation of the cyber security appliance

[0184] As discussed in more detail below, the analyzer module 115 and / or cyber threat analyst module 120 can cooperate with the one or more unsupervised AI (machine learning) models 160 trained on the normal pattern of life / normal behavior in order to perform anomaly detection against the actual normal pattern of life for that system to determine whether an anomaly (e.g., the identified abnormal behavior and / or suspicious activity) is malicious or benign. In the operation of the cyber security appliance 100, the emerging cyber threat can be previously unknown, but the emerging threat landscape data 170 representative of the emerging cyber threat shares enough (or does not share enough) in common with the traits from the AI models 160 trained on cyber threats to now be identified as malicious or benign. Note, if later confirmed as malicious, then the AI models 160 trained with machine learning on possible cyber threats can update their training. Likewise, as the cyber security appliance 100 continues to operate, the one or more AI models trained on a normal pattern of life for each of the entities in the system can be updated and trained with unsupervised machine learning algorithms. The analyzer module 115 can use any number of data analysis processes (discussed in more detail below and including the agent analyzer data analysis process here) to help obtain system data points so that this data can be fed to and compared against the one or more AI models trained on a normal pattern of life, as well as the one or more machine learning models trained on potential cyber threats, and to create and store data points with the connection fingerprints.

[0185] All of the above AI models 160 can continually learn and train with unsupervised machine learning algorithms on an ongoing basis when deployed in the system that the cyber security appliance 100 is protecting. Thus, the models keep learning and training on what is normal behavior for each user, each device, and the system overall, lowering the threshold of what is an anomaly.

[0186] Anomaly detection / deviations

[0187] Anomaly detection can discover unusual data points in a dataset. Anomaly can be a synonym for the word ‘outlier’. Anomaly detection (or outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. Anomalous activities can be linked to some kind of problem or rare event. Since there are many ways to induce a particular cyber-attack, it is very difficult to have information about all these attacks beforehand in a dataset. But, since the majority of the user activity and device activity in the system under analysis is normal, the system over time captures almost all of the ways which indicate normal behavior. And from the inclusion-exclusion principle, if an activity under scrutiny does not give indications of normal activity, the self-learning AI model using unsupervised machine learning can predict with high confidence that the given activity is anomalous / unusual. The AI unsupervised learning model learns patterns from the features in the day-to-day dataset and detects abnormal data which would not have fallen into the category (cluster) of normal behavior. The goal of the anomaly detection algorithm, through the data fed to it, is to learn the patterns of normal activity so that when an anomalous activity occurs, the modules can flag the anomalies through the inclusion-exclusion principle. The cyber threat module can perform its two-level analysis on anomalous behavior and determine correlations.

[0188] In an example, 95% of data in a normal distribution lies within two standard deviations from the mean. Since the likelihood of anomalies in general is very low, the modules cooperating with the AI model of normal behavior can say with high confidence that data points spread near the mean value are non-anomalous. And since the probability distribution values between the mean and two standard deviations are large enough, the modules cooperating with the AI model of normal behavior can set a value in this example range as a threshold (a parameter that can be tuned over time through the self-learning), where feature values with probability larger than this threshold indicate that the given feature’s values are non-anomalous; otherwise they are anomalous. Note, this anomaly detection can determine that a data point is anomalous / non-anomalous on the basis of a particular feature. In reality, the cyber security appliance 100 should not flag a data point as an anomaly based on a single feature. Only when a combination of all the probability values for all features for a given data point is calculated can the modules cooperating with the AI model of normal behavior say with high confidence whether a data point is an anomaly or not.
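
In an illustrative, non-limiting example, the per-feature probability values and their combination across all features may be sketched as follows. The sketch assumes that each feature is independently normally distributed and that the threshold is the joint density of a point lying `k` standard deviations from the mean on every feature; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))  # about 0.954 for k = 2


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def joint_density(point, means, stds):
    """Product of per-feature densities, assuming independent Gaussian features."""
    density = 1.0
    for x, m, s in zip(point, means, stds):
        density *= gaussian_pdf(x, m, s)
    return density


def is_anomalous(point, means, stds, k=2.0):
    """Flag a point whose joint density falls below the tunable threshold."""
    # Threshold: density of a point k standard deviations out on every feature.
    boundary = [m + k * s for m, s in zip(means, stds)]
    return joint_density(point, means, stds) < joint_density(boundary, means, stds)
```

With two features, a point that is 2.5 standard deviations out on one feature but exactly at the mean on the other is not flagged in this sketch, whereas a point far out on both features is, which reflects the statement that a single feature should not decide the outcome.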

[0189] Again, the AI models trained on a normal pattern of life of entities in a network (e.g., domain) under analysis may perform the cyber threat detection through a probabilistic change in a normal behavior through the application of, for example, an unsupervised Bayesian mathematical model to detect the behavioral change in computers and computer networks. The Bayesian probabilistic approach can determine periodicity in multiple time series data and identify changes across single and multiple time series data for the purpose of anomalous behavior detection. Please reference US patent 10,701,093, granted June 30th, 2020, titled “Anomaly alert system for cyber threat detection” for an example Bayesian probabilistic approach, which is incorporated by reference in its entirety. In addition, please reference US patent publication number US2021273958A1, filed February 26, 2021, titled “Multi-stage anomaly detection for process chains in multi-host environments” for another example anomalous behavior detector using a recurrent neural network and a bidirectional long short-term memory (LSTM), which is incorporated by reference in its entirety. In addition, please reference US patent publication number US2020244673A1, filed April 23, 2019, titled “Multivariate network structure anomaly detector,” which is incorporated by reference in its entirety, for another example anomalous behavior detector with a Multivariate Network and Artificial Intelligence classifiers.

[0190] Additional module interactions

[0191] Referring back to Figure 5, the gather module 110 cooperates with the data store 135. The data store 135 stores comprehensive logs for network traffic observed, email activity, cloud activity, etc. Each domain can keep its long-term data in the data store. These logs can be filtered with complex logical queries and each, for example, IP packet can be interrogated on a vast number of metrics in the network information stored in the data store. The gather module 110 pulls data relevant for each possible hypothesis from the data store as well as from additional external and internal sources. In an example, the data store 135 can store the metrics and previous threat alerts associated with network traffic for a period of time, which is, by default, at least 27 days. This corpus of data is fully searchable. The cyber security appliance 100 works with network probes to monitor network traffic and store and record the data and metadata associated with the network traffic in the data store.
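
In an illustrative, non-limiting example, a searchable store with a retention window and logically combined filters may be sketched as follows. The 27-day figure comes from the text, where it is a minimum; the in-memory list, the `Record` fields, and the predicate-based `query` interface are assumptions and do not describe the actual data store 135.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=27)  # default minimum retention stated in the text


@dataclass
class Record:
    timestamp: datetime
    src_ip: str
    dst_port: int
    bytes_out: int


class DataStore:
    """In-memory stand-in for a store of network traffic metadata."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def purge(self, now):
        """Drop records older than the retention window."""
        cutoff = now - RETENTION
        self._records = [r for r in self._records if r.timestamp >= cutoff]

    def query(self, *predicates):
        """Return records satisfying every predicate (a logical AND)."""
        return [r for r in self._records if all(p(r) for p in predicates)]
```

A gather step could then pull evidence for one hypothesis with a call such as `store.query(lambda r: r.src_ip == "10.0.0.5", lambda r: r.bytes_out > 1_000_000)`, where the address and byte count are made-up values.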

[0192] The gather module 110 may have a process identifier classifier. The process identifier classifier can identify and track each process and device in the network under analysis that makes communication connections. The data store 135 cooperates with the process identifier classifier to collect and maintain historical data of processes and their connections, which is updated over time as the network is in operation. In an example, the process identifier classifier can identify each process running on a given device along with its endpoint connections, which are stored in the data store. Similarly, data from any of the domains under analysis may be collected and compared. Examples of domains / networks under analysis being protected can include any of i) an Information Technology network, ii) an Operational Technology network, iii) a Cloud service, iv) a SaaS service, v) an endpoint device, vi) an email domain, and vii) any combination of these.
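
As a non-limiting illustration, the historical record of processes and their endpoint connections described above could be kept in a structure along the following lines. This is a minimal sketch only; the names Connection, ConnectionHistory, record, and endpoints are hypothetical and are not taken from the figures, and a real data store 135 would persist this information rather than hold it in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Connection:
    """One observed connection made by a process on a device (hypothetical record)."""
    device: str
    process: str
    endpoint: str
    seen_at: datetime


class ConnectionHistory:
    """Keeps the historical connections of each (device, process) pair."""

    def __init__(self) -> None:
        self._by_process = defaultdict(list)

    def record(self, conn: Connection) -> None:
        # History is appended to over time as the network operates.
        self._by_process[(conn.device, conn.process)].append(conn)

    def endpoints(self, device: str, process: str,
                  since: Optional[datetime] = None) -> set:
        """Return endpoints contacted by a process, optionally only since a given time."""
        history = self._by_process.get((device, process), [])
        return {c.endpoint for c in history if since is None or c.seen_at >= since}
```

Keying the history on the (device, process) pair keeps the lookup for "which endpoints has this process on this device contacted" a single dictionary access, which is the query the example above relies on.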

[0193] A domain module is constructed and coded to interact with and understand a specific domain. For instance, the IT network domain module 145 may receive information from and send information to, in this example, IT network-based sensors (i.e., probes, taps, etc.). The IT network domain module 145 also has algorithms and components configured to understand, in this example, IT network parameters, IT network protocols, IT network activity, and other IT network characteristics of the network under analysis. The second domain module 150 is, in this example, an email module. The email domain module 150 can receive information from and send information to, in this example, email-based sensors (i.e., probes, taps, etc.). The email domain module 150 also has algorithms and components configured to understand, in this example, email parameters, email protocols and formats, email activity, and other email characteristics of the network under analysis. Additional domain modules, such as a cloud domain module, can also collect domain data from another respective domain.

[0194] Determination of whether something is likely malicious

[0195] In the following examples, the analyzer module 115 and / or cyber threat analyst module 120 can use multiple factors in determining whether a process, event, object, entity, etc. is likely malicious.

[0196] In an example, the analyzer module 115 and / or cyber threat analyst module 120 can cooperate with one or more of the AI model(s) 160 trained on certain cyber threats to detect whether the anomalous activity detected, such as suspicious email messages, exhibits traits that may suggest a malicious intent, such as phishing links, scam language, being sent from suspicious domains, etc. The analyzer module 115 and / or cyber threat analyst module 120 can also cooperate with one or more of the AI model(s) 160 trained on potential IT based cyber threats to detect whether the anomalous activity detected, such as suspicious IT links, URLs, domains, user activity, etc., may suggest a malicious intent as indicated by the AI models trained on potential IT based cyber threats.

[0197] In the above example, the analyzer module 115 and / or the cyber threat analyst module 120 can cooperate with the one or more AI models 160 trained with machine learning on the normal pattern of life for entities in an email domain under analysis to detect, in this example, anomalous emails which are detected as outside of the usual pattern of life for each entity, such as a user, email server, etc., of the email network / domain. Likewise, the analyzer module 115 and / or the cyber threat analyst module 120 can cooperate with the one or more AI models trained with machine learning on the normal pattern of life for entities in a second domain under analysis (in this example, an IT network) to detect, in this example, anomalous network activity by users and / or devices in the network, which is detected as outside of the usual pattern of life (e.g., abnormal) for each entity, such as a user or a device, of the second domain’s network under analysis.

[0198] Thus, the analyzer module 115 and / or the cyber threat analyst module 120 can be configured with one or more data analysis processes to cooperate with the one or more of the AI model(s) 160 trained with machine learning on the normal pattern of life in the system, to identify an anomaly of at least one of i) the abnormal behavior, ii) the suspicious activity, and iii) the combination of both, from one or more entities in the system. Note, other sources, such as other model breaches, can also identify at least one of i) the abnormal behavior, ii) the suspicious activity, and iii) the combination of both to trigger the investigation.

[0199] The analyzer module 115 and / or the cyber threat analyst module 120 may use the agent analyzer data analysis process, which detects a potentially malicious agent previously unknown to the system, to start an investigation on one or more possible cyber threat hypotheses. The output of this step is the set of possible cyber threats that can include, or be indicated by, the abnormal behavior and / or suspicious activity identified by the agent analyzer data analysis process.

[0200] In an example, the cyber threat analyst module 120 can use the agent analyzer data analysis process and the AI model(s) trained on forming and investigating hypotheses on what are a possible set of cyber threats to use the machine learning and / or set scripts to aid in forming one or more hypotheses and in supporting or refuting each hypothesis. The cyber threat analyst module 120 can cooperate with the AI models trained on forming and investigating hypotheses to form an initial set of possible hypotheses, which needs to be intelligently filtered down. The cyber threat analyst module 120 can be configured to use the one or more supervised machine learning models trained on i) agnostic examples of a past history of detection of a multitude of possible types of cyber threat hypotheses previously analyzed by a human cyber security professional, ii) a behavior and input of how a plurality of human cyber security analysts make a decision and analyze a risk level regarding, and a probability of, a potential cyber threat, iii) steps to take to conduct an investigation, starting with an anomaly, via learning how expert humans tackle investigations into specific real and synthesized cyber threats and then the steps taken by the human cyber security professional to narrow down and identify a potential cyber threat, and iv) what type of data and metrics were helpful to further support or refute each of the types of cyber threats, in order to determine a likelihood of whether the abnormal behavior and / or suspicious activity is either i) malicious or ii) benign.

[0201] The cyber threat analyst module 120, using AI models, scripts and / or rules based modules, is configured to conduct initial investigations regarding the anomaly of interest, collect additional information to form a chain of potentially related / linked information under analysis, then form one or more hypotheses that could explain this chain of potentially related / linked information, and then gather additional information in order to refute or support each of the one or more hypotheses.

[0202] In an example, a behavioural pattern analysis of what are the unusual behaviours of the network / system / device / user under analysis by the machine learning models may be as follows. The coordinator module can tie the alerts, activities, and events from, in this example, the email domain to the alerts, activities, and events from the IT network domain. Figure 6 illustrates a graph 220 of an embodiment of an example chain of unusual behaviour for, in this example, the email activities and IT network activities deviating from a normal pattern of life in connection with the rest of the system / network under analysis. The cyber threat analyst module and / or analyzer module can cooperate with one or more machine learning models. The one or more machine learning models are trained and otherwise configured with mathematical algorithms to infer, for the cyber-threat analysis, ‘what is possibly happening with the chain of distinct alerts, activities, and / or events, which came from the unusual pattern,’ and then assign a threat risk associated with that distinct item of the chain of alerts and / or events forming the unusual pattern. The unusual pattern can be determined by initially examining which activities / events / alerts do not fall within the window of what is the normal pattern of life for that network / system / device / user under analysis; those activities can then be analysed to determine whether they are unusual or suspicious. A chain of related activity that can include both unusual activity and activity within a pattern of normal life for that entity can be formed and checked against individual cyber threat hypotheses to determine whether that pattern is indicative of a behaviour of a malicious actor - human, program, or other threat. The cyber threat analyst module can go back and pull in some of the normal activities to help support or refute a possible hypothesis of whether that pattern is indicative of a behavior of a malicious actor.
An example behavioral pattern included in the chain is shown in the graph over a time frame of, for example, 7 days. The cyber threat analyst module detects a chain of anomalous behavior consisting of unusual data transfers three times and unusual characteristics in emails in the monitored system three times, which seem to have some causal link to the unusual data transfers. Likewise, twice unusual credentials attempted the unusual behavior of trying to gain access to sensitive areas or malicious IP addresses, and the user associated with the unusual credentials trying unusual behavior has a causal link to at least one of those three emails with unusual characteristics. Again, the cyber security appliance 100 can go back and pull in some of the normal activities to help support or refute a possible hypothesis of whether that pattern is indicative of a behaviour of a malicious actor. The analyser module can cooperate with one or more models trained on cyber threats and their behaviour to try to determine if a potential cyber threat is causing these unusual behaviours. The cyber threat analyst module can put data and entities into 1) a directed graph, in which nodes that are overlapping or close in distance have a good possibility of being related in some manner, 2) a vector diagram, 3) a relational database, and 4) other relational techniques that will at least be examined to assist in creating the chain of related activity connected by causal links, such as similar time, similar entity and / or type of entity involved, similar activity, etc., under analysis. If the pattern of behaviours under analysis is believed to be indicative of a malicious actor, then a score of how confident the system is in this assessment of identifying whether the unusual pattern was caused by a malicious actor is created. Next, a threat level score or probability indicative of what level of threat this malicious actor poses is also assigned.
Lastly, the cyber security appliance 100 is configurable in a user interface, by a user, to set what type of automatic response actions, if any, the cyber security appliance 100 may take when different types of cyber threats, indicated by the pattern of behaviours under analysis, are equal to or above a configurable level of threat posed by this malicious actor. The chain of the individual alerts, activities, and events that form the pattern, including one or more unusual or suspicious activities, is formed into a distinct item for cyber-threat analysis of that chain of distinct alerts, activities, and / or events. The cyber-threat module may reference the one or more machine learning models trained on, in this example, e-mail threats to identify similar characteristics from the individual alerts and / or events forming the distinct item made up of the chain of alerts and / or events forming the unusual pattern.
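
As a non-limiting illustration of forming a chain of related activity connected by causal links, the sketch below groups alerts into chains whenever two alerts share an entity and occur close together in time. The Alert record, the 24-hour window, and the use of a union-find grouping are assumptions chosen for illustration; the description above equally contemplates directed graphs, vector diagrams, and relational databases, and other link criteria such as similar activity.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; `hour` is hours since the start of the window."""
    alert_id: str
    entity: str
    hour: float
    domain: str  # e.g. "email" or "network"


def linked(a: Alert, b: Alert, max_gap_hours: float = 24.0) -> bool:
    # Assumed causal-link test: same entity and similar time.
    return a.entity == b.entity and abs(a.hour - b.hour) <= max_gap_hours


def build_chains(alerts: List[Alert], max_gap_hours: float = 24.0) -> List[List[str]]:
    """Group alerts into chains of transitively linked activity (union-find)."""
    parent: Dict[str, str] = {a.alert_id: a.alert_id for a in alerts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(alerts, 2):
        if linked(a, b, max_gap_hours):
            parent[find(a.alert_id)] = find(b.alert_id)

    chains: Dict[str, List[str]] = {}
    for a in sorted(alerts, key=lambda alert: alert.hour):
        chains.setdefault(find(a.alert_id), []).append(a.alert_id)
    # Longest chains first; each chain is ordered by time.
    return sorted(chains.values(), key=len, reverse=True)
```

Because the grouping is transitive, an email alert and a later network alert for the same user can end up in one chain even when they are only connected through an intermediate alert, which mirrors the cross-domain chain of email and IT network activity described above.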

[0203] The autonomous response engine 140 of the cyber security system is configured to take one or more autonomous mitigation actions to mitigate the cyber threat during the cyberattack by the cyber threat. The autonomous response engine 140 is configured to reference an Artificial Intelligence model trained to track a normal pattern of life for each node of the protected system to perform an autonomous act of restricting a potentially compromised node, which i) has an actual indication of compromise and / or ii) is merely adjacent to a known compromised node, to take only actions that are within that node’s normal pattern of life in order to mitigate the cyber threat. Similarly named components in the cyber security restoration engine 190 can operate and function similarly to those described for the detection engine.
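
As a non-limiting illustration of restricting a node to its normal pattern of life, the sketch below marks compromised nodes and their directly adjacent nodes as restricted, after which only connections to previously normal peers are allowed. The NodePolicy record, the representation of a pattern of life as a set of normal peers, and the adjacency mapping are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class NodePolicy:
    """Hypothetical per-node policy derived from a learned pattern of life."""
    node: str
    normal_peers: frozenset
    restricted: bool = False

    def allows(self, peer: str) -> bool:
        # An unrestricted node may connect anywhere; a restricted node may
        # only perform actions within its normal pattern of life.
        return (not self.restricted) or peer in self.normal_peers


def apply_autonomous_response(policies: Dict[str, NodePolicy],
                              compromised: Set[str],
                              adjacency: Dict[str, Set[str]]) -> Set[str]:
    """Restrict compromised nodes and nodes adjacent to them; return those restricted."""
    to_restrict = set(compromised)
    for node in compromised:
        to_restrict |= adjacency.get(node, set())
    for node in to_restrict:
        if node in policies:
            policies[node].restricted = True
    return to_restrict
```

Restricting rather than isolating a node lets its usual business activity continue while novel connections, which are the ones a cyber threat would need, are blocked.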

[0204] An assessment of the cyber threat in order to determine appropriate autonomous actions, for example, those by the autonomous response engine

[0205] In the next step, the analyzer module 115 and / or cyber threat analyst module 120 generates one or more supported possible cyber threat hypotheses from the possible set of cyber threat hypotheses. The analyzer module generates the supporting data and details of why each individual hypothesis is supported or not. The analyzer module can also generate one or more possible cyber threat hypotheses and the supporting data and details of why they were refuted.

[0206] In general, the analyzer module 115 cooperates with the following three sources. The analyzer module 115 cooperates with the AI models trained on cyber threats to determine whether an anomaly such as the abnormal behavior and / or suspicious activity is either 1) malicious or 2) benign when the potential cyber threat under analysis is previously unknown to the cyber security appliance 100. The analyzer module cooperates with the AI models trained on a normal behavior of entities in the network under analysis. The analyzer module cooperates with various AI-trained classifiers. When these sources input information that indicates a potential cyber threat that is i) severe enough to cause real harm to the network under analysis and / or ii) a close match to known cyber threats, then the analyzer module can make a final determination to confirm that a cyber threat likely exists and send that cyber threat to the assessment module to assess the threat score associated with that cyber threat. Certain model breaches will always trigger a potential cyber threat, which the analyzer will compare and confirm.

[0207] In the next step, an assessment module with the AI classifiers is configured to cooperate with the analyzer module. The analyzer module supplies the identity of the supported possible cyber threat hypotheses from the possible set of cyber threat hypotheses to the assessment module. The assessment module with the AI classifiers cooperates with the AI model trained on possible cyber threats to make a determination on whether a cyber threat exists and what level of severity is associated with that cyber threat. The assessment module with the AI classifiers cooperates with the one or more AI models trained on possible cyber threats in order to assign a numerical assessment of a given cyber threat hypothesis that was found likely to be supported by the analyzer module with the one or more data analysis processes, via the abnormal behavior, the suspicious activity, or the collection of system data points. The output of the assessment module with the AI classifiers can be a score (ranked number system, probability, etc.) that a given identified process is likely a malicious process.

[0208] The assessment module with the AI classifiers can be configured to assign a numerical assessment, such as a probability, of a given cyber threat hypothesis that is supported and a threat level posed by that cyber threat hypothesis which was found likely to be supported by the analyzer module, which includes the abnormal behavior or suspicious activity as well as one or more of the collection of system data points, with the one or more AI models trained on possible cyber threats.

[0209] The cyber threat analyst module 120 in the AI-based cyber security appliance 100 component provides an advantage over competitors’ products as it reduces the time taken for cybersecurity investigations, provides an alternative to manpower for small organizations, and improves detection (and remediation) capabilities within the cyber security platform.

[0210] The AI-based cyber threat analyst module 120 performs its own computation of threat and identifies interesting network events with one or more processors. These methods of detection and identification of threat all add to the above capabilities that make the AI-based cyber threat analyst module a desirable part of the cyber security appliance 100. The AI-based cyber threat analyst module 120 offers a method of prioritizing that does not simply treat the highest-scoring alert for an event evaluated by itself as the worst, and prevents more complex attacks from being missed because their composite parts / individual threats only produced low-level alerts.

[0211] The AI classifiers can be part of the assessment component, which scores the input data being compared. The AI classifier can be coded to take in multiple pieces of information about an entity, object, and / or thing and, based on its training, output a prediction about the entity, object, or thing. Given one or more inputs, the AI classifier model will try to predict the value of one or more outcomes. The AI classifiers cooperate with the range of data analysis processes that produce features for the AI classifiers. The various techniques cooperating here allow anomaly detection and assessment of a cyber threat level posed by a given anomaly; but more importantly, an overall cyber threat level posed by a series / chain of correlated anomalies under analysis.
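
As a non-limiting illustration of why an overall threat level for a chain can exceed that of any single alert in it, the sketch below combines per-alert scores with a noisy-OR rule. The noisy-OR rule, its independence assumption, and the function name chain_threat_score are assumptions made for illustration only; the scoring actually used by the AI classifiers is not specified here.

```python
import math
from typing import Iterable


def chain_threat_score(alert_scores: Iterable[float]) -> float:
    """Combine per-alert probabilities into one chain-level probability.

    Noisy-OR: the chain is benign only if every alert in it is benign,
    treating the alerts as independent (an illustrative simplification).
    """
    scores = list(alert_scores)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each alert score must be a probability in [0, 1]")
    return 1.0 - math.prod(1.0 - s for s in scores)


# Five correlated low-level alerts versus one isolated medium alert.
low_level_chain = chain_threat_score([0.2] * 5)
single_alert = chain_threat_score([0.5])
```

Under these assumptions the chain of five low-level alerts scores higher than the single medium alert, so ranking by the chain-level score surfaces a multi-step attack that ranking by the highest individual alert would miss.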

[0212] The assessment and testing module improves the analysis and formalized report generation, with less repetition and fewer CPU cycles consumed, and with greater efficiency than humans repetitively going through these steps and duplicating steps to filter and analyze the CVE threats.

[0213] Again, Figure 2 illustrates a block diagram of an embodiment of the AI-based cyber security appliance 100 with the ASM cloud platform 201 and other Artificial Intelligence-based engines plugging in as an appliance platform to protect a system. The probes and detectors monitor, in this example, email activity and IT network activity and feed this data to their respective modules, which are configured and trained to understand that domain’s information, in order to determine what is occurring in each domain individually as well as to correlate causal links between these activities in these domains, and supply this input to the modules of the cyber security appliance 100. The network can include various computing devices such as desktop units, laptop units, smart phones, firewalls, network switches, routers, servers, databases, Internet gateways, etc.

[0214] Referring back to Figure 5, a computer system within a building can use the cyber security appliance 100 to detect and thereby attempt to prevent threats to computing devices within its bounds. In this exemplary embodiment, the cyber security appliance 100 with the multiple Artificial Intelligence-based engines is implemented on a computer. The computer has the electronic hardware, modules, models, and various software processes of the cyber security appliance 100 and, therefore, runs threat detection for detecting threats to the first computer system. As such, the computer system includes one or more processors arranged to run the steps of the process described herein, memory storage components required to store information related to the running of the process, as well as a network interface for collecting the required information for the probes and other sensors collecting data from the network under analysis.

[0215] The cyber security appliance 100 in the computer builds and maintains a dynamic, ever-changing model of the 'normal behavior' of each user and machine within the system. The approach is based on Bayesian mathematics, and monitors all interactions, events, and communications within the system - which computer is talking to which, files that have been created, networks that are being accessed.

[0216] For example, a second computer is based in a company's San Francisco office and operated by a marketing employee who regularly accesses the marketing network, usually communicates with machines in the company's U.K. office in second computer system 40 between 9:30 AM and midday, and is active from about 8:30 AM until 6 PM.

[0217] The same employee virtually never accesses the employee time sheets, very rarely connects to the company's Atlanta network and has no dealings in South-East Asia. The security appliance takes all the information that is available relating to this employee and establishes a 'pattern of life' for that person and the devices used by that person in that system, which is dynamically updated as more information is gathered. The model of the normal pattern of life for an entity in the network under analysis is used as a moving benchmark, allowing the cyber security appliance 100 to spot behavior on a system that seems to fall outside of this normal pattern of life, and flags this behavior as anomalous, requiring further investigation and / or autonomous action.
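
As a non-limiting illustration of a pattern of life used as a moving benchmark, the sketch below counts which resources an entity accesses and at which hours, and flags an access as unusual when either the resource or the hour accounts for only a small share of what has been observed. The frequency-count model, the 5% threshold, and the names PatternOfLife, observe, and is_unusual are assumptions made for illustration; the models described herein are Bayesian and considerably richer.

```python
from collections import Counter


class PatternOfLife:
    """Hypothetical per-entity baseline that is updated with every observation."""

    def __init__(self, rare_share: float = 0.05) -> None:
        self.resource_counts = Counter()
        self.hour_counts = Counter()
        self.total = 0
        self.rare_share = rare_share  # assumed threshold for "rarely seen"

    def observe(self, resource: str, hour: int) -> None:
        # The benchmark moves as more information is gathered.
        self.resource_counts[resource] += 1
        self.hour_counts[hour] += 1
        self.total += 1

    def is_unusual(self, resource: str, hour: int) -> bool:
        if self.total == 0:
            return False  # nothing learned yet, so nothing can be judged unusual
        resource_share = self.resource_counts[resource] / self.total
        hour_share = self.hour_counts[hour] / self.total
        return resource_share < self.rare_share or hour_share < self.rare_share
```

For the marketing employee in the example above, accesses to the marketing network during working hours would fall inside the baseline, whereas an access to the employee time sheets, or any access at 3 AM, would be flagged for further investigation.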

[0218] The cyber security appliance 100 is built to deal with the fact that today's attackers are getting stealthier and an attacker / malicious agent may be 'hiding' in a system to ensure that they avoid raising suspicion in an end user, such as by slowing their machine down. The Artificial Intelligence model(s) in the cyber security appliance 100 builds a sophisticated ‘pattern of life’ - that understands what represents normality for every person, device, and network activity in the system being protected by the cyber security appliance 100.

[0219] The self-learning algorithms in the AI can, for example, understand the normal patterns of life of each node (user account, device, etc.) in an organization in about a week, and grow more bespoke with every passing minute. Conventional AI typically relies solely on identifying threats based on historical attack data and reported techniques, requiring data to be cleansed, labelled, and moved to a centralized repository. The detection engine self-learning AI can learn "on the job" from real-world data occurring in the system and constantly evolves its understanding as the system’s environment changes. The Artificial Intelligence can use machine learning algorithms to analyze patterns and 'learn' what is the 'normal behavior' of the network by analyzing data on the activity on the network at the device and employee level. The unsupervised machine learning does not need humans to supervise the learning in the model but rather discovers hidden patterns or data groupings without the need for human intervention. The unsupervised machine learning discovers the patterns and related information using the unlabeled data monitored in the system itself. Unsupervised learning algorithms can include clustering, anomaly detection, neural networks, etc. Unsupervised learning can break down features of what it is analyzing (e.g., a network node of a device or user account), which can be useful for categorization, and then identify what else has similar or overlapping feature sets matching what it is analyzing.

[0220] The cyber security appliance 100 can use unsupervised machine learning to work things out without pre-defined labels. In the case of sorting a series of different entities, such as animals, the system analyzes the information and works out the different classes of animals. This allows the system to handle the unexpected and embrace uncertainty when new entities and classes are examined. The modules and models of the cyber security appliance 100 do not always know what they are looking for, but can independently classify data and detect compelling patterns.

[0221] The cyber security appliance 100’s unsupervised machine learning methods do not require training data with pre-defined labels. Instead, they are able to identify key patterns and trends in the data, without the need for human input. The advantage of unsupervised learning in this system is that it allows computers to go beyond what their programmers already know and discover previously unknown relationships. The unsupervised machine learning methods can use a probabilistic approach based on a Bayesian framework. The machine learning allows the cyber security appliance 100 to integrate a huge number of indicators of potentially anomalous network behavior, each of which is weak / has a low threat value by itself, to produce a single clear overall measure of these correlated anomalies to determine how likely a network device is to be compromised. This probabilistic mathematical approach provides an ability to understand important information amid the noise of the network - even when it does not know what it is looking for.

[0222] The machine learning models can use Recursive Bayesian Estimation to combine these multiple analyses of different measures of network behavior to generate a single overall / comprehensive picture of the state of each device. The cyber security appliance 100 and ASM cloud platform can take advantage of the power of Recursive Bayesian Estimation (RBE) via an implementation of the Bayes filter.

[0223] Using RBE, the machine learning models are able to constantly adapt themselves, in a computationally efficient manner, as new information becomes available to the system. The cyber security appliance 100’s AI models continually recalculate threat levels in the light of new evidence, identifying changing attack behaviors where conventional signature-based methods fall down.
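
As a non-limiting illustration of Recursive Bayesian Estimation, the sketch below applies one predict-and-update step of a discrete Bayes filter to a two-state belief about a device (compromised or clean). The two-state model, the transition probability, the assumption that a compromised device stays compromised between steps, and the likelihood values in the usage lines are all assumptions made for illustration.

```python
def bayes_filter_step(prior: float,
                      p_obs_given_compromised: float,
                      p_obs_given_clean: float,
                      p_become_compromised: float = 0.001) -> float:
    """Return the updated probability that a device is compromised.

    prior: belief before this observation.
    p_obs_given_*: likelihood of the new observation under each state.
    p_become_compromised: assumed chance a clean device becomes compromised
    between observations.
    """
    # Predict: propagate the belief through the assumed state transition.
    predicted = prior + (1.0 - prior) * p_become_compromised
    # Update: weight each state by how well it explains the observation.
    numerator = p_obs_given_compromised * predicted
    evidence = numerator + p_obs_given_clean * (1.0 - predicted)
    if evidence == 0.0:
        raise ValueError("observation is impossible under both states")
    return numerator / evidence


# Six weak indicators, each only three times likelier on a compromised device.
belief = 0.01
for _ in range(6):
    belief = bayes_filter_step(belief, 0.3, 0.1)
```

Each step only needs the previous belief and the newest observation, which is what makes the recursion computationally efficient: no history has to be reprocessed when new evidence arrives, and several weak indicators accumulate into a strong overall belief.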

[0224] Training a model can be accomplished by having the model learn good values for all of the weights and the bias from labeled examples created by the system, in this case starting with no labels initially. A goal of the training of the model can be to find a set of weights and biases that have low loss, on average, across all examples.
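
As a non-limiting illustration of finding a weight and a bias that have low loss on average across all examples, the sketch below fits a one-feature linear model by gradient descent on the mean squared loss. The linear model, the squared loss, the learning rate, and the function names are assumptions made for illustration and are not a statement of how the models described herein are trained.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (feature x, label y)


def mean_squared_loss(examples: List[Example], w: float, b: float) -> float:
    """Average squared error of the prediction w * x + b across all examples."""
    return sum((w * x + b - y) ** 2 for x, y in examples) / len(examples)


def train_linear_model(examples: List[Example],
                       learning_rate: float = 0.05,
                       epochs: int = 2000) -> Tuple[float, float]:
    """Learn a weight and a bias by repeatedly stepping down the loss gradient."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(epochs):
        # Gradients of the mean squared loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Labeled examples drawn from y = 2x + 1.
examples = [(float(x), 2.0 * x + 1.0) for x in range(5)]
weight, bias = train_linear_model(examples)
```

On these examples the learned weight and bias approach 2 and 1 respectively, and the loss falls well below its starting value at zero weight and bias.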

[0225] The AI classifiers can be trained with supervised machine learning on a labeled data set to learn to perform their tasks as discussed herein. An anomaly detection technique that can be used is supervised anomaly detection, which requires a data set that has been labeled as "normal" and "abnormal" and involves training a classifier. Another anomaly detection technique that can be used is unsupervised anomaly detection, which detects anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal, by looking for instances that seem to fit least to the remainder of the data set. A model representing normal behavior, built from a given normal training data set, can detect anomalies by establishing the normal pattern and then testing the likelihood that a test instance under analysis was generated by the model. Anomaly detection can identify rare items, events or observations which raise suspicions by differing significantly from the majority of the data, which includes rare objects as well as things like unexpected bursts in activity.
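
As a non-limiting illustration of establishing a normal pattern and then testing how well a new instance fits it, the sketch below fits a mean and standard deviation to observations assumed to be mostly normal and scores a test instance by its distance from the mean in standard deviations. The single-feature Gaussian model, the threshold of three standard deviations, and the function names are assumptions made for illustration.

```python
import statistics
from typing import Sequence, Tuple

NormalModel = Tuple[float, float]  # (mean, standard deviation)


def fit_normal_model(values: Sequence[float]) -> NormalModel:
    """Summarize mostly-normal observations as a mean and a sample deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    if std == 0:
        raise ValueError("observations have no spread; cannot score deviations")
    return mean, std


def anomaly_score(x: float, model: NormalModel) -> float:
    """Distance of x from the normal mean, in standard deviations."""
    mean, std = model
    return abs(x - mean) / std


def is_anomalous(x: float, model: NormalModel, threshold: float = 3.0) -> bool:
    # Instances far from the bulk of the data are the least likely to have
    # been generated by the model of normal behavior.
    return anomaly_score(x, model) > threshold


# Hypothetical hourly connection counts for one device.
model = fit_normal_model([100, 102, 98, 101, 99, 100, 103, 97])
```

With this baseline, a reading of 101 sits well inside the normal range, whereas a burst to 130 lies far outside it and would be flagged as an unexpected burst in activity.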

[0226] The methods and systems shown in the Figures and discussed in the text herein can be coded to be performed, at least in part, by one or more processing components with any portions of software stored in an executable format on a computer readable medium. Thus, any portions of the method, apparatus and system implemented as software can be stored in one or more non-transitory storage devices in an executable format to be executed by one or more processors. The computer readable storage medium may be non-transitory and does not include radio or other carrier waves. The computer readable storage medium could be, for example, a physical computer readable storage medium such as semiconductor memory or solid-state memory, magnetic tape, a removable computer diskette, a random-access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R / W or DVD. The various methods described above may also be implemented by a computer program product. The computer program product may include computer code arranged to instruct a computer to perform the functions of one or more of the various methods described above. The computer program and / or the code for performing such methods may be provided to an apparatus, such as a computer, on a computer readable medium or computer program product. For the computer program product, a transitory computer readable medium may include radio or other carrier waves.

[0227] A computing system can be, wholly or partially, part of one or more of the server or client computing devices in accordance with some embodiments. Components of the computing system can include, but are not limited to, a processing unit having one or more processing cores, a system memory, and a system bus that couples various system components including the system memory to the processing unit.

[0228] Computing devices

[0229] Figure 7 illustrates a block diagram of an embodiment of one or more computing devices that can be a part of the Artificial Intelligence-based cyber security system including the multiple Artificial Intelligence-based engines and the ASM cloud platform 201 discussed herein.

[0230] The computing device may include one or more processors or processing units 620 to execute instructions, one or more memories 630-632 to store information, one or more data input components 660-663 to receive data input from a user of the computing device 600, one or more modules that include the management module, a network interface communication circuit 670 to establish a communication link to communicate with other computing devices external to the computing device, one or more sensors where an output from the sensors is used for sensing a specific triggering condition and then correspondingly generating one or more preprogrammed actions, a display screen 691 to display at least some of the information stored in the one or more memories 630-632, and other components. Note, portions of this design implemented in software 644, 645, 646 are stored in the one or more memories 630-632 and are executed by the one or more processors 620. The processing unit 620 may have one or more processing cores, which couple to a system bus 621 that couples various system components including the system memory 630. The system bus 621 may be any of several types of bus structures selected from a memory bus, an interconnect fabric, a peripheral bus, and a local bus using any of a variety of bus architectures.

[0231] Computing device 602 typically includes a variety of computing machine-readable media. Machine-readable media can be any available media that can be accessed by computing device 602 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computing machine-readable media use includes storage of information, such as computer-readable instructions, data structures, other executable software, or other data. Computer-storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other tangible medium which can be used to store the desired information, and which can be accessed by the computing device 602. Transitory media such as wireless channels are not included in the machine-readable media. Machine-readable media typically embody computer readable instructions, data structures, and other executable software. In an example, a volatile memory drive 641 is illustrated for storing portions of the operating system 644, application programs 645, other executable software 646, and program data 647.

[0232] A user may enter commands and information into the computing device 602 through input devices such as a keyboard, touchscreen, or software or hardware input buttons 662, a microphone 663, a pointing device and / or scrolling input component, such as a mouse, trackball, or touch pad 661. The microphone 663 can cooperate with speech recognition software. These and other input devices are often connected to the processing unit 620 through a user input interface 660 that is coupled to the system bus 621, but can be connected by other interface and bus structures, such as a Lightning port, game port, or a universal serial bus (USB). A display monitor 691 or other type of display screen device is also connected to the system bus 621 via an interface, such as a display interface 690. In addition to the monitor 691, computing devices may also include other peripheral output devices such as speakers 697, a vibration device 699, and other output devices, which may be connected through an output peripheral interface 695.

[0233] The computing device 602 can operate in a networked environment using logical connections to one or more remote computers / client devices, such as a remote computing system 680. The remote computing system 680 can be a personal computer, a mobile computing device, a server, a router, a network PC, a peer device, or other common network node, and typically includes many or all of the elements described above relative to the computing device 602. The logical connections can include a personal area network (PAN) 672 (e.g., Bluetooth®), a local area network (LAN) 671 (e.g., Wi-Fi), and a wide area network (WAN) 673 (e.g., cellular network). Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. A browser application and / or one or more local apps may be resident on the computing device and stored in the memory.

[0234] When used in a LAN networking environment, the computing device 602 is connected to the LAN 671 through a network interface 670, which can be, for example, a Bluetooth® or Wi-Fi adapter. When used in a WAN networking environment (e.g., Internet), the computing device 602 typically includes some means for establishing communications over the WAN 673. With respect to mobile telecommunication technologies, for example, a radio interface, which can be internal or external, can be connected to the system bus 621 via the network interface 670, or other appropriate mechanism. In a networked environment, other software depicted relative to the computing device 602, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, remote application programs 685 may reside on remote computing device 680. It will be appreciated that the network connections shown are examples and that other means of establishing a communications link between the computing devices may be used. It should be noted that the present design can be carried out on a single computing device or on a distributed system in which different portions of the present design are carried out on different parts of the distributed computing system.

[0235] Note, an application described herein includes but is not limited to software applications, mobile applications, and programs, routines, objects, widgets, and plug-ins that are part of an operating system application. Some portions of this description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. These algorithms can be written in a number of different software programming languages such as Python, C, C++, Java, HTTP, or other similar languages. Also, an algorithm can be implemented with lines of code in software, configured logic gates in hardware, or a combination of both. In an embodiment, the logic consists of electronic circuits that follow the rules of Boolean Logic, software that contains patterns of instructions, or any combination of both. A module may be implemented in hardware electronic components, software components, or a combination of both. A machine learning model is a core component of a complex system consisting of hardware and software that is capable of performing its function discretely from other portions of the entire complex system but designed to interact with the other portions of the entire complex system. A software engine is a core component of a complex system consisting of hardware and software that is capable of performing its function discretely from other portions of the entire complex system but designed to interact with the other portions of the entire complex system. The systems and methods described herein can be implemented with these algorithms discussed herein.

[0236] Unless specifically stated otherwise as apparent from the above discussions, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers, or other such information storage, transmission or display devices.

[0237] While the foregoing design and embodiments thereof have been provided in considerable detail, it is not the intention of the applicant(s) for the design and embodiments provided herein to be limiting. Additional adaptations and / or modifications are possible, and, in broader aspects, these adaptations and / or modifications are also encompassed. Accordingly, departures may be made from the foregoing design and embodiments without departing from the scope afforded by the following claims, which scope is only limited by the claims when appropriately construed.

Claims

1. An apparatus, comprising: an assessment and testing module configured to implement an exploit assessment and pentesting program and a scheduler to trigger the exploit assessment and pentesting program to test assets of a network being protected by an ASM cloud platform against known common vulnerabilities and exposures (CVE); a vulnerability scanner of the exploit assessment and pentesting program is configured to cooperate with a database of CVE test templates, where the vulnerability scanner of the exploit assessment and pentesting program is configured to perform pentesting by selecting one or more CVE test templates from the database of CVE test templates and running the one or more CVE test templates to conduct a particular common vulnerabilities and exposures test on a first asset, under analysis, in the network being protected by the ASM cloud platform to see whether the first asset in the network being protected by the ASM cloud platform is actually vulnerable or not; where the vulnerability scanner of the exploit assessment and pentesting program is configured to cooperate with at least one of a user interface, a display, and a report generator to both i) report results of the augmented pentesting on the assets of the network as well as ii) present an attack surface of the assets of the network being protected by the ASM cloud platform detected by the exploit assessment and pentesting program in order to show actual CVE risks present in the assets in the network; and one or more processors and one or more non-transitory machine readable mediums, where when software instructions form part of the exploit assessment and pentesting program, the user interface, the report generator, the assessment and testing module, then those instructions are stored in an executable format in the one or more non-transitory machine readable mediums to be executed by the one or more processors.

2. The apparatus of claim 1, further comprising: where the assessment and testing module with an exploit prediction assessment portion of the exploit assessment and pentesting program is configured to cooperate with a set of ASM classifiers, where the set of ASM classifiers are configured to detect and then verify what assets are definitely associated with the network being protected by the ASM cloud platform to prevent randomly pentesting assets that do not belong to the network being protected.

3. The apparatus of claim 2, further comprising: a domain string classifier in the set of ASM classifiers that is configured to use Optical Character Recognition and cooperate with the assessment and testing module to use at least a crawler source to collect a plurality of domain strings being used on the Internet to analyze at least a domain string, under analysis, and an association of the domain string, under analysis, to the network being protected by the ASM cloud platform.

4. The apparatus of claim 2, further comprising: an image classifier in the set of ASM classifiers that is configured to use image recognition technology that detects image similarity based on keypoints found in compared images, scaled and rotated all in three dimensions, to do an image assessment on visual similarities between the keypoints of an image under analysis compared to the keypoints of images known to belong to the network being protected.

5. The apparatus of claim 2, further comprising: a fuzzy hash classifier in the set of ASM classifiers that is configured to predict, from various features of HTML content on, how similar an HTML page, under analysis, is to confirmed features of one or more HTML pages of the network being protected by the ASM cloud platform, where the fuzzy hash classifier is configured to analyze a first fuzzy hash created on the various features of the HTML content of the HTML page, under analysis, to a second fuzzy hash on the confirmed features of the one or more HTML pages of the network being protected by the ASM cloud platform, and then compared to meet or exceed a threshold amount of similarity.

6. The apparatus of claim 2, further comprising: an HTML classifier in the set of ASM classifiers that is configured to compare HTML feature datapoints linked to hostnames that belong to the network being protected by the ASM cloud platform to a neutral dataset, where a first set of HTML feature datapoints not unique to the network being protected by the ASM cloud platform is likely to occur at a high frequency in the neutral dataset and a second set of HTML feature datapoints that are unique to the network being protected by the ASM cloud platform is likely to occur at a frequency less than the high frequency in the neutral dataset but do occur on hostnames associated with the network being protected by the ASM cloud platform.

7. The apparatus of claim 1, wherein the exploit assessment and pentesting program is configured to determine what technology is being implemented on each asset in the network, and then to extract that technology to compare that technology implemented on the first asset, under analysis, to a database of the known CVEs, and then the vulnerability scanner of the exploit assessment and pentesting program is configured to perform the augmented pentesting by selecting the one or more CVE test templates from the database of CVE test templates corresponding to the technology implemented of the first asset and running each of the one or more CVE test templates to conduct a particular common vulnerabilities and exposures test on the first asset in the network to see whether the first asset in the network is actually vulnerable or not, and then coordinate with the user interface to display on the user interface a date of the test and present one or more CVE vulnerabilities that the first asset is actually vulnerable to.

8. The apparatus of claim 1, further comprising: a CVE search module configured to monitor for newly published CVEs for each particular technology and update a database of CVEs to store known CVEs associated with each particular technology to be a source of truth for CVE information for the ASM cloud platform.

9. The apparatus of claim 1, where the vulnerability scanner of the exploit assessment and pentesting program is configured to select and execute a first CVE test template of the one or more CVE test templates that, when executed, is non-damaging to an operation of the first asset being tested but does confirm a compromisable status of the first asset being tested, by capturing non-damaging concrete proof of the first asset’s vulnerability, where the user interface and the vulnerability scanner of the exploit assessment and pentesting program are configured to allow a user of the network to view the captured non-damaging concrete proof of the asset’s vulnerability.

10. The apparatus of claim 1, further comprising: where the exploit assessment and pentesting program is configured to assess and discover information about the assets in the network being protected by the ASM cloud platform by passively collecting information on what web assets and externally facing assets that are being implemented in the network being protected, via a set of classifiers and web crawler tools, to map out an attack surface of that network being protected and then present the attack surface on the user interface to show CVE risks present in the assets in the network, without a need for a human to supply what assets make up the attack surface of the network being protected.

11. A method to perform attack surface management, comprising: triggering an exploit assessment and pentesting program to test assets of a network being protected by an ASM cloud platform against known common vulnerabilities and exposures (CVE); using a vulnerability scanner of the exploit assessment and pentesting program to cooperate with a database of CVE test templates, using the vulnerability scanner of the exploit assessment and pentesting program to perform pentesting by selecting one or more CVE test templates from the database of CVE test templates and running the one or more CVE test templates to conduct a particular common vulnerabilities and exposures test on a first asset, under analysis, in the network being protected by the ASM cloud platform to see whether the first asset in the network being protected by the ASM cloud platform is actually vulnerable or not; and using the vulnerability scanner of the exploit assessment and pentesting program to cooperate with at least one of a user interface, a display, and a report generator to both i) report results of the augmented pentesting on the assets of the network as well as ii) present an attack surface of the assets of the network being protected by the ASM cloud platform detected by the exploit assessment and pentesting program in order to show actual CVE risks present in the assets in the network.

12. The method of claim 11, further comprising: using the exploit assessment and pentesting program to cooperate with a set of ASM classifiers, and using the set of ASM classifiers to detect and then verify what assets are definitely associated with the network being protected by the ASM cloud platform to prevent randomly pentesting assets that do not belong to the network being protected.

13. The method of claim 12, further comprising: using a domain string classifier in the set of ASM classifiers to use Optical Character Recognition and at least a crawler source to collect a plurality of domain strings being used on the Internet to analyze at least a domain string, under analysis, and an association of the domain string, under analysis, to the network being protected by the ASM cloud platform.

14. The method of claim 12, further comprising: using an image classifier in the set of ASM classifiers to use image recognition technology that detects image similarity based on keypoints found in compared images, scaled and rotated all in three dimensions, to do an image assessment on visual similarities between the keypoints of an image under analysis compared to the keypoints of images known to belong to the network being protected.

15. The method of claim 12, further comprising: using a fuzzy hash classifier in the set of ASM classifiers to predict, from various features of HTML content on, how similar an HTML page, under analysis, is to confirmed features of one or more HTML pages of the network being protected by the ASM cloud platform, and using the fuzzy hash classifier to analyze a first fuzzy hash created on the various features of the HTML content of the HTML page, under analysis, to a second fuzzy hash on the confirmed features of the one or more HTML pages of the network being protected by the ASM cloud platform, and then compared to meet or exceed a threshold amount of similarity.

16. The method of claim 12, further comprising: using an HTML classifier in the set of ASM classifiers to compare HTML feature datapoints linked to hostnames that belong to the network being protected by the ASM cloud platform to a neutral dataset, where a first set of HTML feature datapoints not unique to the network being protected by the ASM cloud platform is likely to occur at a high frequency in the neutral dataset and a second set of HTML feature datapoints that are unique to the network being protected by the ASM cloud platform are likely to occur at a frequency less than the high frequency in the neutral dataset as well as mainly occur on hostnames associated with the network being protected by the ASM cloud platform.

17. The method of claim 11, further comprising: using the exploit assessment and pentesting program to determine what technology is being implemented on each asset in the network, and then to extract that technology to compare that technology implemented on the first asset, under analysis, to a database of the known CVEs, and then a vulnerability scanner of the exploit assessment and pentesting program is configured to perform the augmented pentesting by selecting the one or more CVE test templates from the database of CVE test templates corresponding to the technology implemented of the first asset and running each of the one or more CVE test templates to conduct a particular common vulnerabilities and exposures test on the first asset in the network to see whether the first asset in the network is actually vulnerable or not, and then coordinate with the user interface to display on the user interface a date of the test and present one or more CVE vulnerabilities that the first asset is actually vulnerable to.

18. The method of claim 11, further comprising: using a CVE search module to monitor for newly published CVEs for each particular technology and update a database of CVEs to store known CVEs associated with each particular technology to be a source of truth for CVE information for the ASM cloud platform.

19. The method of claim 11, further comprising: using the vulnerability scanner of the exploit assessment and pentesting program to select and execute a first CVE test template of the one or more CVE test templates that, when executed, is non-damaging to an operation of the first asset being tested but does confirm a compromisable status of the first asset being tested, by capturing non-damaging concrete proof of the first asset’s vulnerability, where the user interface and the vulnerability scanner of the exploit assessment and pentesting program are configured to allow a user of the network to view the captured non-damaging concrete proof of the asset’s vulnerability.

20. The method of claim 11, further comprising: using the exploit assessment and pentesting program to assess and discover information about the assets in the network being protected by the ASM cloud platform by passively collecting information on what web assets and externally facing assets that are being implemented in the network being protected, via a set of classifiers and web crawler tools, to map out an attack surface of that network being protected and then present the attack surface on the user interface to show CVE risks present in the assets in the network, without a need for a human to supply what assets make up the attack surface of the network being protected.
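
By way of illustration only, and not limitation, the template selection flow recited in claims 7 and 17 might be sketched in Python as follows. The sketch assumes that each CVE test template is tagged with the technology it targets and exposes a non-damaging check, that each asset record already carries a verification flag from the ASM classifiers, and that the detected technologies are available as a simple mapping. Every name here (`CveTemplate`, `Finding`, `select_templates`, `scan_asset`, the `CVE-0000-000x` identifiers, and the example technologies) is hypothetical and does not come from the specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class CveTemplate:
    """Hypothetical CVE test template: one non-damaging check per CVE."""
    cve_id: str
    technology: str
    is_vulnerable: Callable[[dict], bool]  # returns True when the asset is actually vulnerable


@dataclass(frozen=True)
class Finding:
    """One confirmed vulnerability, with the date of the test for display."""
    host: str
    cve_id: str
    tested_on: date


def select_templates(technologies: List[str], templates: List[CveTemplate]) -> List[CveTemplate]:
    """Keep only the templates that match a technology detected on the asset."""
    wanted = set(technologies)
    return [t for t in templates if t.technology in wanted]


def scan_asset(asset: dict, templates: List[CveTemplate], today: date) -> List[Finding]:
    """Run the matching templates against one asset and collect confirmed findings."""
    # Assumption: assets not verified as belonging to the protected network are never tested.
    if not asset.get("verified", False):
        return []
    selected = select_templates(list(asset["versions"]), templates)
    return [Finding(asset["host"], t.cve_id, today) for t in selected if t.is_vulnerable(asset)]


# Placeholder template database; the identifiers and checks are illustrative, not real CVEs.
TEMPLATES = [
    CveTemplate("CVE-0000-0001", "examplecms", lambda a: a["versions"]["examplecms"] == "1.0"),
    CveTemplate("CVE-0000-0002", "examplecms", lambda a: a["versions"]["examplecms"] == "0.9"),
    CveTemplate("CVE-0000-0003", "exampledb", lambda a: True),
]

asset = {"host": "www.example.com", "verified": True, "versions": {"examplecms": "1.0"}}
findings = scan_asset(asset, TEMPLATES, date(2025, 1, 1))
for f in findings:
    print(f.host, f.cve_id, f.tested_on.isoformat())
```

In this sketch the `exampledb` template is never executed against the asset because that technology was not detected on it, and an unverified asset yields no findings at all. A real template would probe the asset over the network rather than inspect a local record.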