Method and apparatus for matching based on AC automaton
By splitting the data to be matched into segments that are matched in parallel, and using a hierarchical arrangement of state IDs to decide when matching across a segment boundary can stop, the inefficiency of existing serial AC algorithms is addressed, enabling efficient parallel matching and improving CPU utilization and cache hit rate.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: NEW H3C TECH CO LTD
- Filing Date: 2026-05-18
- Publication Date: 2026-06-19
AI Technical Summary
Existing AC automaton matching algorithms are serial processes, resulting in low matching efficiency and failing to fully utilize the parallel processing capabilities of the CPU.
The data to be matched is divided into multiple segments, and all segments are matched in parallel, each using the initial state in the state mapping relationship as the state of its first data unit. The state ID hierarchical structure and an extended matching end event are used to monitor the boundary conditions of parallel matching, optimizing parallel processing in the AC algorithm.
By combining parallel processing with a state ID hierarchical structure, the CPU cache hit rate and throughput are significantly improved, memory loading operations are reduced, and matching efficiency is enhanced.
Figure CN122240892A_ABST (abstract drawing; image not reproduced)
Abstract
Description
Technical Field
[0001] This application relates to network technology, and in particular to a matching method and apparatus based on AC automata.
Background Technology
[0002] The Aho-Corasick automaton (AC automaton) is a multi-pattern string/word matching algorithm used in fields such as network security (e.g., intrusion detection and virus scanning), text processing (e.g., sensitive-word filtering), and bioinformatics. An AC automaton is built on a trie (also known as a prefix tree), referred to here as an AC tree.
[0003] An AC tree is built from the keywords of a specific domain, for example all keywords used in network security applications such as intrusion detection and virus scanning (e.g., "he", "she", "his"). The root node is empty, contains no character/word, and serves as the starting point. Every other node in the AC tree represents a single character/word. For ease of description, characters and words are collectively referred to as data units.
[0004] Each node in the AC tree has a failure link, which points to the node representing the longest proper suffix of the current node's string that is also a path in the AC tree. When a match fails at a node, the matching state jumps along this link instead of restarting from the root node. For example, for the word "she", after "s" and then "h" have been found in the AC tree, suppose the next character is "x" instead of "e", so the match fails. Instead of backtracking to the root node and rematching, the failure link of "h" jumps to another possible matching position (such as the "h" in "he"), and matching continues from there. This ensures that the text pointer only moves forward and never backward.
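To make the failure-link mechanism concrete, the following is a minimal C sketch of a classic AC tree, not the implementation of this application. It assumes byte-sized data units (ALPHA = 256) and a small fixed node limit; all names (ac_reset, ac_insert, ac_build_failure_links, ac_count, fail_link) are hypothetical. Failure links are computed level by level, and on a mismatch the matcher follows them instead of restarting at the root, so the text pointer never moves backward.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 64   /* hypothetical capacity of the toy AC tree */
#define ALPHA 256      /* assumption: one data unit = one byte */

static int child[MAX_NODES][ALPHA];   /* tree edges, -1 = no edge */
static int fail_link[MAX_NODES];      /* failure link of every node */
static int n_out[MAX_NODES];          /* patterns reported at a node, incl. via failure links */
static int n_nodes = 1;               /* node 0 is the empty root */

/* Must be called before inserting patterns. */
static void ac_reset(void) {
    memset(child[0], -1, sizeof child[0]);
    memset(n_out, 0, sizeof n_out);
    n_nodes = 1;
}

static void ac_insert(const char *p) {
    int s = 0;
    for (; *p; p++) {
        unsigned char c = (unsigned char)*p;
        if (child[s][c] < 0) {
            assert(n_nodes < MAX_NODES);
            memset(child[n_nodes], -1, sizeof child[n_nodes]);
            child[s][c] = n_nodes++;
        }
        s = child[s][c];
    }
    n_out[s]++;
}

/* BFS: the failure link of v points to the node for the longest proper
   suffix of v's string that is also a path in the tree. */
static void ac_build_failure_links(void) {
    int queue[MAX_NODES], head = 0, tail = 0;
    fail_link[0] = 0;
    for (int c = 0; c < ALPHA; c++)
        if (child[0][c] >= 0) { fail_link[child[0][c]] = 0; queue[tail++] = child[0][c]; }
    while (head < tail) {
        int u = queue[head++];
        for (int c = 0; c < ALPHA; c++) {
            int v = child[u][c];
            if (v < 0) continue;
            int f = fail_link[u];
            while (f != 0 && child[f][c] < 0) f = fail_link[f];   /* walk failure links */
            fail_link[v] = (child[f][c] >= 0) ? child[f][c] : 0;
            n_out[v] += n_out[fail_link[v]];                     /* inherit suffix matches */
            queue[tail++] = v;
        }
    }
}

/* Serial matching: counts every pattern occurrence in text. */
static int ac_count(const char *text) {
    int s = 0, total = 0;
    for (; *text; text++) {
        unsigned char c = (unsigned char)*text;
        while (s != 0 && child[s][c] < 0) s = fail_link[s];   /* backtrack on mismatch */
        s = (child[s][c] >= 0) ? child[s][c] : 0;
        total += n_out[s];
    }
    return total;
}
```

For example, after ac_reset() and inserting "he", "she", "his", and "hers" (the last keyword is added here only for illustration), ac_count("ushers") returns 3 (she, he, hers), and the failure link of the "she" node points to the "he" node. The `while` loop inside ac_count is the backtracking that [0018] sets out to eliminate.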
[0005] The current AC matching algorithm is essentially a strictly serial process: matching must proceed sequentially from the beginning of the data to the end. In character matching, for example, the algorithm reads one character to be matched, updates the state, and only then reads the next character. This serial matching significantly limits matching efficiency.
Summary of the Invention
[0006] This application provides a matching method and apparatus based on AC automata to avoid the technical problem of low matching efficiency caused by existing serial matching that follows a front-to-back order.
[0007] This embodiment provides a matching method based on the AC automaton. The method includes: dividing the data to be matched into m segments, using the starting state in the established state mapping relationship as the state of the first data unit in each segment, and starting the matching operation for each segment, where the state mapping relationship records the jump result of each data unit in each state of the AC tree; and, for any segment other than the last segment, if the state after the jump for the last data unit of the current segment is not the starting state, performing an extended matching operation on the current segment. The extended matching operation means: using the state after the jump for the last data unit of the current segment as the state of the first data unit in the next segment, to start a matching operation on that next segment, while monitoring for an extended matching end event; if an extended matching end event is detected, the extended matching operation for the current segment ends. The extended matching end event is defined as: the jump result of a data unit in a first state contains a second state, and, according to the established state ID hierarchy, the state ID of the second state is less than or equal to the maximum state ID of the level (the first level) at which the first state is located. The state ID hierarchy is obtained by arranging the state IDs of the states in the AC tree by level.
[0008] A matching apparatus based on an AC automaton, the apparatus comprising: a first matching unit, configured to divide the data to be matched into m segments and to start the matching operation for each segment using the starting state in the established state mapping relationship as the state of the first data unit in each segment, where the state mapping relationship records the jump result of each data unit in each state of the AC tree; and a second matching unit, configured to perform, for any segment other than the last segment, an extended matching operation on the current segment if the state after the jump in the jump result of the last data unit of the current segment is not the starting state. The extended matching operation means using the state after the jump in the jump result of the last data unit of the current segment as the state of the first data unit in the next segment, so as to start a matching operation on that next segment; the extended matching operation ends when an extended matching end event is detected during the matching process. The extended matching end event is defined as follows: the jump result of a data unit in a first state contains a second state, and, according to the established state ID hierarchy, the state ID of the second state is less than or equal to the maximum state ID of the level (the first level) at which the first state is located. The state ID hierarchy is obtained by arranging the state IDs of the states in the AC tree by level.
[0009] An electronic device, comprising a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions executable by the processor, and the processor is configured to execute the machine-executable instructions to implement the steps of the above method.
[0010] As can be seen from the above technical solutions, this application divides the data to be matched, such as text, into segments and starts the matching operation of every segment in parallel from the starting state in the state mapping relationship. Then, for any segment, if the state after the jump for the last data unit (such as a character) of the current segment is not the starting state, the extended matching operation is performed on the current segment, and the extended matching end event is monitored during matching; once the event is detected, the extended matching operation of that segment is terminated. Compared with existing serial matching from front to back, this makes full use of the CPU's multiple load/store units to match the segments in parallel, so that the CPU can load the states for multiple data units, such as characters, in parallel. This overcomes the bottleneck of the traditional AC algorithm, which must process the input strictly one data unit after another.
[0011] Furthermore, during the extended matching operation, this application uses the maximum state ID of each level in the state ID hierarchy to check whether the state ID of the state after a jump is less than or equal to the maximum state ID of the level of the state before the jump, and thereby decides whether to terminate the extended matching operation. This makes it feasible to start the matching operations of all segments in parallel, and significantly improves the CPU cache hit rate while reducing expensive memory load operations.
[0012] Furthermore, this embodiment uses the state mapping relationship so that the state after a jump is obtained directly from the data unit and the address of the current state. Combined with segmented parallel processing, this space-for-time trade-off is fully exploited, solving the problem of insufficient throughput in large-scale pattern matching.
Attached Figure Description
[0013] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure.
[0014] Figure 1 is a schematic diagram of an existing AC tree;
Figure 2 is an example structural diagram of the state mapping relationship provided in an embodiment of this application;
Figure 3 is a diagram of the state ID hierarchical structure provided in an embodiment of this application;
Figure 4 is a flowchart of the method provided in an embodiment of this application;
Figure 5 is a specific implementation flowchart provided in an embodiment of this application;
Figure 6 is another specific implementation flowchart provided in an embodiment of this application;
Figure 7 is a structural diagram of the apparatus provided in an embodiment of this application;
Figure 8 is a hardware structure diagram of the apparatus shown in Figure 7, provided in an embodiment of this application.
Detailed Implementation
[0015] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
[0016] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms "a," "an," and "the" used in this application and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise.
[0017] Before the method provided in this embodiment is described, the state mapping relationship and the state ID hierarchical structure introduced in this application are described first. Both are built from the AC tree. Figure 1 shows an example of an AC tree. In the AC tree shown in Figure 1, the root node (i.e., the initial state) is denoted 0; state nodes such as node 1 and node 3 (black circles) represent states, and pattern-string matching nodes such as node 2 and node 7 (white circles) represent pattern strings. Solid arrows indicate state transitions on a successful match, and dashed arrows indicate state transitions on a failed match. If the text to be matched is "mshis", the state transition process based on the AC tree is: 0 -> node 0 (m) -> node 3 (s) -> node 4 (h) -> node 1 (i - failed) -> node 6 (i) -> node 7 (s).
[0018] In this embodiment, although the AC tree introduces a failure function to avoid restarting the matching from the root node, backtracking along the failure function is still necessary when a match fails, leading to numerous random memory accesses and branch prediction failures. To eliminate this overhead, this embodiment transforms the jump relationships between nodes in the AC tree into a deterministic state mapping relationship (e.g., a state mapping table, denoted as pTrie). The state mapping relationship describes the jump relationships between different nodes in the AC tree.
[0019] Optionally, in this embodiment, the state mapping relationship records, for each state, the jump result corresponding to the ASCII code of each data unit, such as a character. Taking a two-dimensional array as an example, the first subscript is the state ID and the second subscript is the ASCII code (or Unicode code) of the data unit. The array records the jump result pTrie[S][c] for any state S and the ASCII code c of any data unit; pTrie[S][c] is the ID of the state reached when c is read in state S. In other words, in this embodiment a single lookup in the state mapping relationship yields the next state for any data unit in any state, turning complex logical decisions into a simple access such as an array access. Figure 2 shows an example of the state mapping relationship, listing all possible data units, such as characters, under each state, such as S0 and S1. For example, S0[0] denotes the entry for the character with code 0 in state S0.
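As an illustration of such a state mapping relationship, the sketch below (an assumption-based example, not the application's code) builds pTrie as a full 256-column table from a keyword list, numbering states in BFS order so that the same numbering can also serve the state ID hierarchy. Missing entries are filled from the row of the failure state, which is the usual way of turning an AC tree into a deterministic table. The helper names, the limit MAX_STATES, and the per-state match counter out_cnt are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 64   /* hypothetical capacity */
#define ALPHA 256       /* assumption: data unit = byte, indexed by its code */

static int trie[MAX_STATES][ALPHA];   /* tree edges in insertion order, -1 = none */
static int trie_out[MAX_STATES];      /* patterns ending at each tree node */
static int n_trie;

int pTrie[MAX_STATES][ALPHA];         /* state mapping relationship: pTrie[S][c] = next state */
int out_cnt[MAX_STATES];              /* matches reported when a state is entered */
int depth_of[MAX_STATES];             /* level of each state in the AC tree */

static void insert_pattern(const char *p) {
    int s = 0;
    for (; *p; p++) {
        unsigned char c = (unsigned char)*p;
        if (trie[s][c] < 0) {
            assert(n_trie < MAX_STATES);
            memset(trie[n_trie], -1, sizeof trie[n_trie]);
            trie[s][c] = n_trie++;
        }
        s = trie[s][c];
    }
    trie_out[s]++;
}

/* Builds pTrie with state IDs assigned in BFS order; returns the number of states. */
int build_state_map(const char *const *pats, int npat) {
    int order[MAX_STATES], new_id[MAX_STATES], fail[MAX_STATES];
    memset(trie[0], -1, sizeof trie[0]);
    memset(trie_out, 0, sizeof trie_out);
    n_trie = 1;
    for (int i = 0; i < npat; i++) insert_pattern(pats[i]);

    /* BFS renumbering: children are visited in code order, so IDs ascend level by level. */
    int head = 0, tail = 0;
    order[tail++] = 0;
    new_id[0] = 0;
    while (head < tail) {
        int u = order[head++];
        for (int c = 0; c < ALPHA; c++)
            if (trie[u][c] >= 0) { new_id[trie[u][c]] = tail; order[tail++] = trie[u][c]; }
    }

    /* Fill rows in ID order; fail[s] is shallower, so its row is already complete. */
    fail[0] = 0;
    depth_of[0] = 0;
    for (int s = 0; s < n_trie; s++) out_cnt[s] = trie_out[order[s]];
    for (int s = 0; s < n_trie; s++) {
        for (int c = 0; c < ALPHA; c++) {
            int t = trie[order[s]][c] >= 0 ? new_id[trie[order[s]][c]] : -1;
            int via_fail = (s == 0) ? 0 : pTrie[fail[s]][c];
            if (t >= 0) {                     /* real tree edge */
                fail[t] = via_fail;
                out_cnt[t] += out_cnt[fail[t]];
                depth_of[t] = depth_of[s] + 1;
                pTrie[s][c] = t;
            } else {
                pTrie[s][c] = via_fail;       /* failure jump precomputed into the table */
            }
        }
    }
    return n_trie;
}

/* One table lookup per data unit; no failure-link loop at match time. */
int count_matches(const char *text) {
    int s = 0, total = 0;
    for (; *text; text++) {
        s = pTrie[s][(unsigned char)*text];
        total += out_cnt[s];
    }
    return total;
}
```

With the keywords "he", "she", "his", and "hers", build_state_map returns 10 states and count_matches("ushers") returns 3, using exactly one pTrie lookup per character.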
[0020] In addition, this embodiment also establishes a state ID hierarchical arrangement structure. The state ID hierarchical arrangement structure is used to describe the hierarchical arrangement of each state in the AC tree, that is, the state ID hierarchical arrangement structure is obtained by hierarchically arranging the state IDs of each state in the AC tree.
[0021] It should be noted that, when the AC tree is constructed, state IDs are typically allocated in breadth-first search (BFS) order. With this allocation, this embodiment ensures that, in the state ID hierarchical structure, state IDs within the same level are arranged in ascending order, and the largest state ID of a lower level is smaller than the smallest state ID of the next higher level. Lower state IDs are thus allocated to lower levels (closer to the root) and higher state IDs to higher levels, which improves the cache hit rate. Figure 3 shows an example of the state ID hierarchical structure: the maximum ID of level 0 (the root node) is S0, the maximum ID of level 1 is L1, the maximum ID of level 2 is L2, and so on.
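Because IDs are assigned in BFS order, the per-level maxima S0, L1, L2, ... can be read off in a single pass. The following hedged C sketch assumes the depth of every state is known (for example, recorded while building the table); build_layer_max and depth_at_most are hypothetical names. The second function shows why the hierarchy is useful: comparing a state ID with layer_max[d] tells whether the state lies on level d or a shallower level, without loading a separate depth value.

```c
#include <assert.h>

/* Computes layer_max[d] = largest state ID on level d, assuming state IDs
   were assigned in BFS order (depth[] is non-decreasing in the state ID).
   Returns the number of levels. */
int build_layer_max(const int *depth, int n_states, int *layer_max) {
    int levels = 0;
    for (int s = 0; s < n_states; s++) {
        assert(s == 0 || depth[s] >= depth[s - 1]);   /* hierarchy property of BFS IDs */
        layer_max[depth[s]] = s;                       /* IDs ascend, so the last one is the maximum */
        if (depth[s] + 1 > levels) levels = depth[s] + 1;
    }
    return levels;
}

/* With BFS IDs, state <= layer_max[level] holds exactly when the state is on
   `level` or a shallower level. */
int depth_at_most(int state, int level, const int *layer_max, int levels) {
    if (level >= levels) level = levels - 1;           /* deeper than the tree: every state qualifies */
    return state <= layer_max[level];
}
```

For the ten states of the "he"/"she"/"his"/"hers" example numbered in BFS order (depths 0, 1, 1, 2, 2, 2, 3, 3, 3, 4), the maxima are 0, 2, 5, 8, and 9. These values are illustrative and are not taken from Figure 3.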
[0022] Based on the above, this embodiment uses the state mapping relationship and the state ID hierarchical structure to compute the state boundaries of each level of the AC tree, which makes segmented AC matching feasible. This makes full use of the CPU's multiple load/store units to match the segments in parallel, optimizing the AC matching process and improving its efficiency. The method is described below with reference to Figure 4, which is a flowchart of a method provided in an embodiment of this application. The method may be applied to an electronic device such as a computer; this embodiment does not specifically limit this.
[0023] As shown in Figure 4, the process may include the following steps:
Step 401: Divide the data to be matched into m segments.
[0024] The data to be matched is a long text, denoted TEXT, of length Len. To exploit the parallel computing power of multi-core CPUs, this embodiment divides TEXT into m segments, denoted TEXT(0), TEXT(1), ..., TEXT(m-1).
[0025] Optionally, in this embodiment, the segmentation follows the principle of dividing as equally as possible, to balance the load among threads. If Len is not divisible by m, the remaining data units (such as characters) are allocated to the last segment TEXT(m-1), making it slightly longer than the other segments; alternatively, the split can be adjusted dynamically for the specific hardware architecture. That is, in this embodiment, the m segments are of equal length, or all of the m segments except the last one are of equal length.
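The equal-division rule, with the remainder going to the last segment, can be sketched as follows; split_segments is a hypothetical helper, and segment k covers the half-open range [begin[k], end[k]).

```c
#include <assert.h>
#include <stddef.h>

/* Splits [0, len) into m segments: each of the first m-1 segments gets
   len / m data units, and the last one also takes the remainder. */
void split_segments(size_t len, size_t m, size_t *begin, size_t *end) {
    assert(m > 0);
    size_t base = len / m;
    for (size_t k = 0; k < m; k++) {
        begin[k] = k * base;
        end[k] = (k + 1 == m) ? len : (k + 1) * base;   /* last segment absorbs len % m */
    }
}
```

For Len = 10 and m = 3, the segments are [0, 3), [3, 6), and [6, 10). If Len < m, the first m-1 segments are empty and the last segment receives all data units, so a practical implementation would likely cap m.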
[0026] Step 402: Using the starting state in the established state mapping relationship as the state of the first data unit in each segment, start the matching operation for each segment.
[0027] Optionally, in this embodiment, m parallel threads (or processing units) can be started, each responsible for matching one segment. The key point is that all threads start from the initial state in the state mapping relationship; for example, the root-node state S0 of the AC tree shown in Figure 2 is used as the state of the first data unit (such as a character) of every segment. The matching itself proceeds as usual: for example, when the text "mshis" is matched, the state sequence 0->0(m)->3(s)->4(h)->1(i, failure)->6(i)->7(s) is produced; reaching pattern node 7 indicates that a pattern string has been matched, and matching then continues with the following data units.
[0028] Specifically, this embodiment may perform matching according to the process shown in Figure 5:
Step 501: For each segment, take the starting state in the established state mapping relationship as the current state, and take the ASCII code of the first data unit in the segment as the current ASCII code.
[0029] Step 502: Using the current ASCII code and the current state as indices, find the state corresponding to the jump in the state mapping relationship.
[0030] Step 503: If the current data unit is not the last data unit of the segment, take the state after the jump as the current state, take the ASCII code of the next data unit as the current ASCII code, and return to step 502. If the current data unit is the last data unit of the segment, end the data unit matching operation of the segment.
[0031] Steps 501 to 503 are used to perform a matching operation on any segment.
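Steps 501 to 503 amount to one table lookup per data unit. The C sketch below is illustrative only: it uses a hand-built, hypothetical state mapping relationship for the single pattern "ab" (state 0 is the starting state, state 1 means "a" has been read, state 2 means "ab" has been matched) and a hypothetical match_segment function that returns the state after the jump for the last data unit, which is the value step 403 inspects.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES 3
#define ALPHA 256

/* Hypothetical toy state mapping relationship for the single pattern "ab". */
static int pTrie[N_STATES][ALPHA];

static void init_toy_table(void) {
    for (int s = 0; s < N_STATES; s++) {
        for (int c = 0; c < ALPHA; c++) pTrie[s][c] = 0;   /* default: back to the starting state */
        pTrie[s]['a'] = 1;                                 /* an 'a' always begins a new "a" */
    }
    pTrie[1]['b'] = 2;                                     /* "a" followed by 'b' gives "ab" */
}

/* Steps 501-503 for one segment of n data units, starting in `state`.
   Returns the state after the jump for the last data unit. */
static int match_segment(const unsigned char *seg, size_t n, int state, int *matches) {
    for (size_t k = 0; k < n; k++) {
        state = pTrie[state][seg[k]];   /* step 502: look up the state after the jump */
        if (state == 2) (*matches)++;   /* toy output: only state 2 reports a match */
    }
    return state;                        /* step 503: stop after the last data unit */
}
```

Running match_segment on "xabab" from state 0 reports two matches and returns state 2; because 2 is not the starting state, such a segment would go on to extended matching if it were not the last one.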
[0032] In this embodiment, the iteration pointers (pIter0, pIter1, ..., pIterN) and state variables (State0, State1, ..., StateN) of the segments are kept in registers or in a cache-resident structure. The matching operation shown in Figure 5 can then be expressed by the following code logic:

```c
/* All segments advance in lockstep; the loop bound is taken from the longest
   segment (the last one) or, equivalently, the global end pointer pTextEnd. */
while (pIterN != pTextEnd) {
    State0 = pTrie[State0][*pIter0];   /* query the state mapping relationship */
    State1 = pTrie[State1][*pIter1];   /* query the state mapping relationship */
    /* ... one lookup per segment ... */
    StateN = pTrie[StateN][*pIterN];   /* query the state mapping relationship */
    pIter0++; pIter1++; /* ... */ pIterN++;   /* advance every pointer by one data unit */
}
```

In the above code logic, the pTrie lookup instructions are placed close together, so that the relevant data blocks of pTrie are loaded efficiently into the L1/L2 cache, and control-flow divergence between the segments is eliminated. The CPU can therefore execute these pTrie lookups in parallel, which significantly improves the throughput of AC matching.
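A runnable variant of the lockstep loop is sketched below under stated assumptions: instead of separately named variables State0 ... StateN it uses arrays over at most MAX_LANES segments, reuses the hypothetical "ab" toy table, runs all segments for the common length Len / m, and then finishes the remainder of the last segment on its own. Whether the CPU actually overlaps the independent loads depends on the hardware; the sketch only makes them independent.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES 3
#define ALPHA 256
#define MAX_LANES 16   /* hypothetical upper bound on m */

/* Hypothetical toy state mapping relationship for the single pattern "ab". */
static int pTrie[N_STATES][ALPHA];

static void init_toy_table(void) {
    for (int s = 0; s < N_STATES; s++) {
        for (int c = 0; c < ALPHA; c++) pTrie[s][c] = 0;
        pTrie[s]['a'] = 1;
    }
    pTrie[1]['b'] = 2;
}

/* Matches m segments in lockstep; end_state[k] receives the state after the
   jump for the last data unit of segment k. */
static void lockstep_match(const unsigned char *text, size_t len, size_t m, int *end_state) {
    assert(m >= 1 && m <= MAX_LANES);
    size_t base = len / m;
    const unsigned char *it[MAX_LANES];            /* pIter0 ... pIterN */
    for (size_t k = 0; k < m; k++) { it[k] = text + k * base; end_state[k] = 0; }
    for (size_t step = 0; step < base; step++)      /* common length of all segments */
        for (size_t k = 0; k < m; k++)              /* independent lookups, one per segment */
            end_state[k] = pTrie[end_state[k]][*it[k]++];
    for (const unsigned char *p = it[m - 1]; p < text + len; p++)   /* remainder of the last segment */
        end_state[m - 1] = pTrie[end_state[m - 1]][*p];
}
```

For the text "xaababa" split into 3 segments ("xa", "ab", "aba"), the end states are 1, 2, and 1.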
[0033] Step 403: For any segment, if the state after the jump in the jump result of the last data unit in the current segment is not the starting state, then perform an extended matching operation on the current segment. Here, performing an extended matching operation on the current segment means: taking the state after the jump in the jump result of the last data unit in the current segment as the state of the first data unit in the next segment of the current segment, so as to start the matching operation on the next segment, and monitoring the extended matching end event during the matching process. If the extended matching end event is detected, then the extended matching operation on the current segment ends.
[0034] Optionally, each thread stops after the last character of its segment has been matched. At this point, the state after the jump for the last character of some segments may be the initial state, such as S0 (no partial match is pending), while for other segments it may not be (a prefix of a pattern string has been matched, and this prefix crosses the segment boundary). For a segment TEXT(k), if the state after the jump for its last character, State_end, is the initial state, such as S0, matching of that segment is complete and no further operation is required. If State_end is not the initial state, an "extended matching" operation must be started to capture the complete pattern string that crosses the boundary.
[0035] Optionally, matching of the next segment may be started according to the process shown in Figure 6:
Step 601: Take the state after the jump contained in the jump result of the last data unit in the current segment as the current state, and take the ASCII code of the first data unit in the next segment as the current ASCII code.
[0036] Step 602: Using the current ASCII code and the current state as indices, find the state corresponding to the jump in the state mapping relationship.
[0037] Step 603: If the extended matching end event is determined to occur based on the current state and the state after the jump, then the extended matching operation of this segment ends; otherwise, proceed to step 604.
[0038] Step 604: If the current data unit is not the last data unit of the segment being matched (the next segment), take the state after the jump as the current state, take the ASCII code of the next data unit as the current ASCII code, and return to step 602, until the current data unit is the last data unit of that segment.
[0039] Optionally, when the current data unit is the last data unit of that segment, if the state after the jump is not the starting state, extended matching continues into the following segment in the same way as the extended matching operation described above; if the state after the jump is the starting state, the extended matching operation for the current segment ends.
[0040] Optionally, in this embodiment, the extended matching end event refers to: the jump result of a data unit in a first state contains a second state, and, according to the established state ID hierarchy, the state ID of the second state is less than or equal to the maximum state ID of the level (the first level) at which the first state is located.
[0041] In summary, extended matching in the process shown in Figure 6 proceeds as follows. If the state after the jump for the last character of segment TEXT(k), State_end, is not the initial state, State_end is used as the state of the first character of segment TEXT(k+1) to start matching TEXT(k+1). Unlike ordinary matching, this process includes a check for the extended matching end event: during matching, the state ID after each jump, denoted State(i), and Layer(i) are monitored in real time, where i is the number of bytes read beyond the current segment TEXT(k) (i.e., the number of extension steps), and Layer(i) is the maximum state ID at the i-th level. Once State(i) <= Layer(i), an extended matching end event has occurred and the extended matching of TEXT(k) ends. Because state IDs are allocated level by level, State(i) <= Layer(i) means the current state lies on level i or a shallower level, so the string it represents falls entirely within the i data units read from TEXT(k+1); from that point on, extended matching would pass through the same states as the matching that TEXT(k+1) performs on its own. The extended matching end event therefore guarantees matching accuracy (no cross-segment pattern string is missed) while minimizing redundant computation, achieving precise truncation of the extended matching.
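The end event State(i) <= Layer(i) can be checked with one comparison per extension step. The following C sketch is a hedged illustration built on the same hypothetical "ab" toy table, whose BFS-numbered states give Layer = {0, 1, 2}; extend_match and segmented_count are hypothetical names. To avoid counting a match twice, the extension only reports patterns longer than i, i.e., patterns that began before the segment boundary; shorter ones are found by the next segment itself. This double-counting rule, and the assumption that each state reports at most one pattern, are simplifications of this sketch rather than details given in the description.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES 3
#define ALPHA 256
#define DEEPEST 2   /* deepest level of the toy AC tree */

/* Hypothetical toy AC tree for "ab", IDs in BFS order:
   0 = root (level 0), 1 = "a" (level 1), 2 = "ab" (level 2, reports "ab"). */
static int pTrie[N_STATES][ALPHA];
static const int layer_max[DEEPEST + 1] = {0, 1, 2};   /* Layer(i) */
static const size_t pat_len[N_STATES] = {0, 0, 2};     /* length of the pattern reported, 0 = none */

static void init_toy_table(void) {
    for (int s = 0; s < N_STATES; s++) {
        for (int c = 0; c < ALPHA; c++) pTrie[s][c] = 0;
        pTrie[s]['a'] = 1;
    }
    pTrie[1]['b'] = 2;
}

/* Extended matching for a segment ending at `end` with State_end = state != 0.
   i counts data units read beyond the segment; the loop may run through
   several following segments and stops at State(i) <= Layer(i) or at the end of the text. */
static int extend_match(const unsigned char *text, size_t len, size_t end, int state) {
    int found = 0;
    for (size_t i = 1; end + i <= len; i++) {
        state = pTrie[state][text[end + i - 1]];
        if (state <= layer_max[i < DEEPEST ? i : DEEPEST]) break;   /* end event */
        if (pat_len[state] > i) found++;   /* pattern began before the boundary */
    }
    return found;
}

/* Counts occurrences of "ab" using m segments plus extended matching. */
static int segmented_count(const unsigned char *text, size_t len, size_t m) {
    size_t base = len / m;
    int total = 0;
    for (size_t k = 0; k < m; k++) {
        size_t b = k * base, e = (k + 1 == m) ? len : (k + 1) * base;
        int state = 0;                               /* every segment starts from S0 */
        for (size_t p = b; p < e; p++) {
            state = pTrie[state][text[p]];
            if (pat_len[state] > 0) total++;         /* match inside the segment */
        }
        if (k + 1 < m && state != 0)                 /* State_end is not the starting state */
            total += extend_match(text, len, e, state);
    }
    return total;
}
```

For "abxabab", segmented_count returns 3 for every m from 1 to 7, the same as serial matching (m = 1). With m = 3, for example, the occurrence of "ab" straddling the boundary between "xa" and "bab" is found by the extension of the middle segment, which then reads one more data unit and stops.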
[0042] In summary, this embodiment eliminates backtracking overhead through the state mapping relationship, exploits the CPU's parallelism by placing the table lookups of different segments adjacent to one another, and solves the boundary problem introduced by segmented matching through the state ID hierarchical structure and the extended matching end event, thereby achieving efficient, low-latency matching of data units.
[0043] The method provided in the embodiments of this application has been described above. The apparatus provided in the embodiments of this application is described below. Figure 7 is a structural diagram of an apparatus provided in an embodiment of this application. The apparatus includes: a first matching unit, configured to divide the data to be matched into m segments and to start the matching operation for each segment using the starting state in the established state mapping relationship as the state of the first data unit in each segment, where the state mapping relationship records the jump result of each data unit in each state of the AC tree; and a second matching unit, configured to perform, for any segment other than the last segment, an extended matching operation on the current segment if the state after the jump in the jump result of the last data unit of the current segment is not the starting state. The extended matching operation means using the state after the jump in the jump result of the last data unit of the current segment as the state of the first data unit in the next segment, so as to start a matching operation on that next segment; the extended matching operation ends when an extended matching end event is detected during the matching process. The extended matching end event is defined as follows: the jump result of a data unit in a first state contains a second state, and, according to the established state ID hierarchy, the state ID of the second state is less than or equal to the maximum state ID of the level (the first level) at which the first state is located. The state ID hierarchy is obtained by arranging the state IDs of the states in the AC tree by level.
[0044] Optionally, using the starting state in the established state mapping relationship as the state of the first data unit in each segment and starting the matching operation for each segment includes: for each segment, taking the starting state in the established state mapping relationship as the current state and the ASCII code of the first data unit in the segment as the current ASCII code; using the current ASCII code and the current state as indices to look up the state after the jump in the state mapping relationship; if the current data unit is not the last data unit of the segment, taking the state after the jump as the current state and the ASCII code of the next data unit as the current ASCII code, and returning to the lookup step; and, if the current data unit is the last data unit of the segment, ending the matching operation for that segment; and/or, using the state after the jump in the jump result of the last data unit of the current segment as the state of the first data unit in the next segment to start a matching operation on the next segment includes: taking the state after the jump in the jump result of the last data unit of the current segment as the current state and the ASCII code of the first data unit of the next segment as the current ASCII code; using the current ASCII code and the current state as indices to look up the state after the jump in the state mapping relationship; if the extended matching end event is determined to occur based on the current state and the state after the jump, ending the extended matching operation for this segment; otherwise, if the current data unit is not the last data unit of the segment being matched, taking the state after the jump as the current state and the ASCII code of the next data unit as the current ASCII code, and returning to the lookup step until the current data unit is the last data unit of that segment; and/or, when the current data unit is the last data unit of that segment, the second matching unit further returns to the step of performing the extended matching operation on the current segment if the state after the jump is not the starting state, and ends the extended matching operation of the current segment if the state after the jump is the starting state; and/or, in the state ID hierarchy, for any two adjacent levels, the maximum state ID of the lower level is less than the minimum state ID of the higher level, and state IDs within the same level are arranged in ascending order; and/or, starting the matching operation for each segment includes starting the matching of all segments simultaneously and in parallel; and/or, the m segments are of equal length, or all of the m segments except the last one are of equal length.
[0045] This concludes the structural description of the apparatus shown in Figure 7.
[0046] The embodiments of this application also provide a hardware structure diagram of the apparatus shown in Figure 7. As shown in Figure 8, the electronic device includes a processor and a machine-readable storage medium; the machine-readable storage medium stores machine-executable instructions executable by the processor, and the processor is configured to execute the machine-executable instructions to implement the steps of the above method.
[0047] Based on the same application concept as the above method, this application embodiment also provides a machine-readable storage medium storing a plurality of computer instructions, which, when executed by a processor, can implement the method disclosed in the above examples of this application.
[0048] For example, the aforementioned machine-readable storage medium can be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, messages, etc. For instance, machine-readable storage media can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, storage drives (such as hard disk drives), solid-state drives, any type of storage disk (such as optical discs, DVDs, etc.), or similar storage media, or combinations thereof.
[0049] The systems, devices, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which can take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email sending and receiving device, game console, tablet computer, wearable device, or any combination of these devices.
[0050] For ease of description, the above apparatus is described as being divided by function into various units. Of course, when this application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
[0051] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0052] This application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable message processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable message processing apparatus produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
[0053] Furthermore, these computer program instructions can also be stored in a computer-readable storage medium that can direct a computer or other programmable message processing device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
[0054] These computer program instructions can also be loaded onto a computer or other programmable message processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
[0055] The above description is merely an embodiment of this application and is not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.
Claims
1. A matching method based on an Aho-Corasick (AC) automaton, characterized in that the method includes: dividing the data to be matched into m segments, taking the starting state in an established state mapping relationship as the state of the first data unit in each segment, and starting a matching operation for each segment, where the state mapping relationship records, for each state in the AC tree, the jump result of each data unit; and, for any segment other than the last segment, if the state after the jump in the jump result of the last data unit in the current segment is not the starting state, performing an extended matching operation on the current segment. Performing an extended matching operation on the current segment means: taking the state after the jump in the jump result of the last data unit in the current segment as the state of the first data unit in the next segment, so as to start a matching operation on the next segment, and ending the extended matching operation on the current segment when an extended matching end event is detected during the matching process. The extended matching end event is defined as follows: in the jump result of any data unit under a first state, the state after the jump is a second state, and the established state ID hierarchy shows that the state ID of the second state is less than or equal to the maximum state ID in a first level, the first level being the level where the first state is located. The state ID hierarchy is obtained by arranging the state IDs of the states in the AC tree into levels.
2. The method according to claim 1, characterized in that taking the starting state in the established state mapping relationship as the state of the first data unit in each segment and starting the matching operation for each segment includes: for each segment, taking the starting state in the established state mapping relationship as the current state and the ASCII code of the first data unit in the segment as the current ASCII code; using the current ASCII code and the current state as indexes, looking up the state after the jump in the state mapping relationship; if the current data unit is not the last data unit of the segment, taking the state after the jump as the current state and the ASCII code of the next data unit as the current ASCII code, and returning to the step of looking up the state after the jump in the state mapping relationship using the current ASCII code and the current state as indexes; and if the current data unit is the last data unit of the segment, ending the matching operation for the segment.
3. The method according to claim 1, characterized in that taking the state after the jump in the jump result of the last data unit in the current segment as the state of the first data unit in the next segment to start a matching operation on the next segment includes: taking the state after the jump in the jump result of the last data unit in the current segment as the current state, and the ASCII code of the first data unit in the next segment as the current ASCII code; using the current ASCII code and the current state as indexes, looking up the state after the jump in the state mapping relationship; if it is determined from the current state and the state after the jump that the extended matching end event has occurred, ending the extended matching operation of the current segment; otherwise, if the current data unit is not the last data unit of the current segment, taking the state after the jump as the current state and the ASCII code of the next data unit as the current ASCII code, and returning to the step of looking up the state after the jump in the state mapping relationship using the current ASCII code and the current state as indexes, until the current data unit is the last data unit of the current segment.
4. The method according to claim 3, characterized in that the method further includes: if the current data unit is the last data unit of the current segment and the state after the jump is not the starting state, returning to the step of performing the extended matching operation on the current segment; and if the state after the jump is the starting state, ending the extended matching operation of the current segment.
5. The method according to claim 1, characterized in that, in the state ID hierarchy, for any two adjacent levels, the maximum state ID of the lower level is less than the minimum state ID of the higher level, and the state IDs within the same level are arranged in ascending order.
6. The method according to claim 1, characterized in that starting the matching operation for each segment includes: starting the matching operations for all segments simultaneously, in parallel.
7. The method according to claim 1, characterized in that the m segments are of equal length, or all of the m segments except the last one are of equal length.
8. A matching apparatus based on an AC automaton, characterized in that the apparatus includes: a first matching unit, configured to divide the data to be matched into m segments, take the starting state in an established state mapping relationship as the state of the first data unit in each segment, and start a matching operation for each segment, where the state mapping relationship records, for each state in the AC tree, the jump result of each data unit; and a second matching unit, configured to, for any segment other than the last segment, perform an extended matching operation on the current segment if the state after the jump in the jump result of the last data unit in the current segment is not the starting state. The extended matching operation means taking the state after the jump in the jump result of the last data unit in the current segment as the state of the first data unit in the next segment, so as to start a matching operation on the next segment, and ending the extended matching operation when an extended matching end event is detected during the matching process. The extended matching end event is defined as follows: in the jump result of any data unit under a first state, the state after the jump is a second state, and the established state ID hierarchy shows that the state ID of the second state is less than or equal to the maximum state ID in a first level, the first level being the level where the first state is located. The state ID hierarchy is obtained by arranging the state IDs of the states in the AC tree into levels.
9. The apparatus according to claim 8, characterized in that taking the starting state in the established state mapping relationship as the state of the first data unit in each segment and starting the matching operation for each segment includes: for each segment, taking the starting state in the established state mapping relationship as the current state and the ASCII code of the first data unit in the segment as the current ASCII code; using the current ASCII code and the current state as indexes, looking up the state after the jump in the state mapping relationship; if the current data unit is not the last data unit of the segment, taking the state after the jump as the current state and the ASCII code of the next data unit as the current ASCII code, and returning to the step of looking up the state after the jump in the state mapping relationship using the current ASCII code and the current state as indexes; and if the current data unit is the last data unit of the segment, ending the matching operation for the segment; and/or, taking the state after the jump in the jump result of the last data unit in the current segment as the state of the first data unit in the next segment to start a matching operation on the next segment includes: taking the state after the jump in the jump result of the last data unit in the current segment as the current state, and the ASCII code of the first data unit in the next segment as the current ASCII code; using the current ASCII code and the current state as indexes, looking up the state after the jump in the state mapping relationship; if it is determined from the current state and the state after the jump that the extended matching end event has occurred, ending the extended matching operation of the current segment; otherwise, if the current data unit is not the last data unit of the current segment, taking the state after the jump as the current state and the ASCII code of the next data unit as the current ASCII code, and returning to the step of looking up the state after the jump in the state mapping relationship using the current ASCII code and the current state as indexes, until the current data unit is the last data unit of the current segment; and/or, the second matching unit is further configured to, when the current data unit is the last data unit of the current segment, return to the step of performing the extended matching operation on the current segment if the state after the jump is not the starting state, and end the extended matching operation of the current segment if the state after the jump is the starting state; and/or, in the state ID hierarchy, for any two adjacent levels, the maximum state ID of the lower level is less than the minimum state ID of the higher level, and the state IDs within the same level are arranged in ascending order; and/or, starting the matching operation for each segment includes starting the matching operations for all segments simultaneously, in parallel; and/or, the m segments are of equal length, or all of the m segments except the last one are of equal length.
10. An electronic device, comprising a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor is configured to execute the machine-executable instructions to implement the steps of the method according to any one of claims 1 to 7.
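Claims 1 to 7 describe the matching procedure step by step, and one possible reading of those steps can be sketched in Python as below. This is a minimal illustrative sketch, not the applicant's implementation. The following are all assumptions: breadth-first ID assignment as the way to satisfy the hierarchy of claim 5, 256-entry transition rows indexed by ASCII code as the "state mapping relationship", the (start offset, pattern) result format, set-based merging of duplicate hits, and every function name (`build_automaton`, `match_segment`, `extend_match`, `split`, `parallel_match`, `serial_match`). The end-event test follows the wording of claim 1 as interpreted here: the second state's ID is at most the largest ID on the first state's level. The sketch does not establish that this stopping rule recovers every cross-boundary match a serial scan would report, so `serial_match` is included for comparison.

```python
from collections import deque, namedtuple
from concurrent.futures import ThreadPoolExecutor

ROOT = 0  # assumption: the starting state always receives state ID 0
# delta[state][ascii_code] -> state after the jump (the state mapping relationship);
# outputs[state] -> patterns recognised in that state; depth[state] -> its level;
# level_max[d] -> largest state ID on level d of the state ID hierarchy.
Automaton = namedtuple("Automaton", "delta outputs depth level_max")

def build_automaton(patterns):
    # 1) Trie with temporary node numbers, edges keyed by ASCII code.
    children, words = [{}], [set()]
    for p in patterns:
        node = 0
        for code in p.encode("ascii"):
            if code not in children[node]:
                children[node][code] = len(children)
                children.append({})
                words.append(set())
            node = children[node][code]
        words[node].add(p)
    # 2) Breadth-first renumbering: IDs grow level by level, so every ID on
    #    level d is smaller than every ID on level d + 1.
    order, queue = [], deque([0])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node][c] for c in sorted(children[node]))
    new_id = {old: new for new, old in enumerate(order)}
    # 3) Full transition table filled in ID order; a failure state is always
    #    shallower (smaller ID), so its row is complete before it is read.
    n = len(order)
    delta = [[ROOT] * 256 for _ in range(n)]
    fail, depth, outputs = [ROOT] * n, [0] * n, [set() for _ in range(n)]
    for s, old in enumerate(order):
        outputs[s] = words[old] | outputs[fail[s]]
        for code in range(256):
            child = children[old].get(code)
            if child is not None:
                c = new_id[child]
                delta[s][code] = c
                fail[c] = delta[fail[s]][code] if s != ROOT else ROOT
                depth[c] = depth[s] + 1
            elif s != ROOT:
                delta[s][code] = delta[fail[s]][code]
    level_max = [0] * (max(depth) + 1)
    for s in range(n):
        level_max[depth[s]] = s  # IDs ascend within a level; the last one is the max
    return Automaton(delta, outputs, depth, level_max)

def match_segment(ac, data, start, end, state=ROOT):
    """Claim 2: index the mapping by (current state, ASCII code) for each unit."""
    hits = set()
    for i in range(start, end):
        state = ac.delta[state][data[i]]
        for p in ac.outputs[state]:
            hits.add((i - len(p) + 1, p))  # (start offset, pattern)
    return state, hits

def extend_match(ac, data, bounds, k, state):
    """Claims 1, 3, 4: carry segment k's final state into later segments."""
    hits = set()
    for start, end in bounds[k + 1:]:
        for i in range(start, end):
            nxt = ac.delta[state][data[i]]
            # End event: the second state's ID is <= the largest ID on the
            # first state's level, i.e. the jump did not go one level deeper.
            if nxt <= ac.level_max[ac.depth[state]]:
                return hits
            state = nxt
            for p in ac.outputs[state]:
                hits.add((i - len(p) + 1, p))
        if state == ROOT:  # claim 4: stop once back at the starting state
            return hits
    return hits

def split(n, m):
    """Claim 7: at most m segments, all equal except possibly the last."""
    size = max(1, -(-n // m))  # ceiling division
    return [(i, min(i + size, n)) for i in range(0, n, size)]

def parallel_match(ac, text, m):
    data = text.encode("ascii")
    bounds = split(len(data), m)

    def run(k):
        state, hits = match_segment(ac, data, *bounds[k])  # each segment starts at ROOT
        if k < len(bounds) - 1 and state != ROOT:
            hits |= extend_match(ac, data, bounds, k, state)
        return hits

    # Claim 6: launch all segments together. Threads only show the structure;
    # CPython's GIL prevents a real speed-up for this pure-Python loop.
    with ThreadPoolExecutor() as pool:
        return sorted(set().union(*pool.map(run, range(len(bounds)))))

def serial_match(ac, text):
    data = text.encode("ascii")
    return sorted(match_segment(ac, data, 0, len(data))[1])
```

For example, with the patterns `he`, `she`, `his` and `hers`, `parallel_match(build_automaton(["he", "she", "his", "hers"]), "ushe", 2)` splits the text into `us` and `he`. The extended matching carried over from the first segment finds `she` across the boundary, and the result `[(1, 'she'), (2, 'he')]` equals the serial scan. In a native implementation the segments would run on separate cores; the breadth-first numbering is what lets the end-event test reduce to a single integer comparison against `level_max`.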