A drug-related case-oriented criminal behavior sequence visualization method and system

By constructing similar node trees and using sequence pattern mining technology, the problem of visualizing criminal behavior sequences in drug-related cases has been solved, achieving efficient and intuitive visualization analysis and helping judicial personnel understand the patterns and characteristics of cases.

CN115510858B · Active · Publication Date: 2026-06-30 · GUIZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUIZHOU UNIV
Filing Date
2022-09-27
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing technologies are insufficient to effectively visualize the sequence of criminal acts in drug-related cases, making it difficult for judicial personnel to intuitively understand the patterns and characteristics of these cases.

Method used

A similar node tree is constructed, semantically similar nodes are merged, the minimum description length is used as the optimization objective to mine sequence patterns, and key points are visualized through a question-answering system. Sequence pattern mining techniques are used to reduce visual interference and discover the focus in the sequence.

Benefits of technology

It enables the representation of more information within a small scope, reduces visual interference, discovers case patterns, reduces the cognitive burden on judicial personnel, and provides efficient and intuitive sequence visualization analysis.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention proposes a method for visualizing criminal behavior sequences in drug-related cases. First, pre-trained word vectors are generated using the case details of the drug-related case. A tree structure of behavioral words is constructed based on the similarity of word vectors at sequence nodes. Then, nodes in the criminal behavior sequence are merged according to the tree structure of the behavioral words. Sequence patterns are mined, and all sequences are clustered based on these patterns. The sequences are represented as sequence patterns and sequence complements. Finally, the focal points in the sequence patterns are extracted for focused visualization. This invention provides assistance for case analysis in the judicial field, offers reference for judicial personnel's decision-making, and promotes the development of visualization analysis in the judicial field.

Description

Technical Field

[0001] This invention belongs to the field of sequence visualization technology; specifically, it relates to a method and system for visualizing the sequence of criminal behavior in drug-related cases.

Background Technology

[0002] In 2022, with the continuous development and improvement of smart courts, technologies such as artificial intelligence, big data, data mining, and visualization analysis were increasingly integrated with the judiciary to help improve court efficiency. Data mining technology provides new analytical perspectives on case details and improves analytical efficiency, while visualization provides judicial personnel with a more intuitive and convenient way to understand patterns in cases.

[0003] Drug-related crimes are among the most serious crimes in judicial cases. They not only cause immense harm to people's physical and mental health but also often trigger other serious criminal offenses such as robbery and theft, creating a series of social problems and posing a significant threat to economic development and social harmony and stability. This has led to increasing research into drug-related cases. Studying the sequences of criminal behavior in drug-related cases can reveal the characteristics of offenders' behavior and help police predict their actions. A criminal behavior sequence is a sequence constructed by extracting key behaviors from drug-related cases, with each case forming a separate sequence. Visualizing offender behavior helps judicial personnel understand the patterns in cases more intuitively, revealing the behavioral characteristics of each case and making the case content easier to comprehend.

Summary of the Invention

[0004] This invention proposes a method and system for visualizing criminal behavior sequences in drug-related cases. Based on constructing a similar node tree from semantically similar nodes in the sequence, similar nodes are merged. The system uses minimum description length as an optimization objective to mine sequence patterns and then performs sequence visualization layout. It identifies focal points within these patterns and visualizes these focal points. This invention can fully uncover the patterns and focal points in criminal behavior, providing judicial personnel with an efficient and intuitive sequence visualization method.

[0005] This invention is achieved through the following technical solution:

[0006] A method for visualizing the sequence of criminal behaviors in drug-related cases:

[0007] The method specifically includes the following steps:

[0008] Step 1: Data preprocessing. Use word segmentation tools to extract behavioral words from the case text of drug-related cases. Based on the semantic similarity of behavioral words, construct a similar node tree for similar behavioral words in the sequence of criminal behaviors.

[0009] Step 2: Select nodes from the similar node tree constructed in Step 1 and merge the behavioral words in the criminal behavior sequence;

[0010] Step 3: Mine the sequence generated after merging nodes in Step 2 to find the sequence pattern. Divide all sequences into clusters according to the sequence pattern, and represent the sequence as sequence pattern, sequence supplement and original sequence for preliminary visualization.

[0011] Step 4: Use a question-and-answer system to extract the focus from the sequence patterns in Step 3 and visualize it; finally, visualize a large number of sequences as sequence patterns, focus, sequence supplements and original sequences.

[0012] Furthermore, in step one,

[0013] Using the case details of drug-related cases, pre-trained word vectors are generated using word2vec. By searching the pre-trained word vectors, the word vectors corresponding to the behavioral words in the criminal behavior sequence are obtained.

[0014] Based on the similarity between the word vectors of the behavioral words, a similarity node tree is constructed for the behavioral words.
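The tree construction in step one can be sketched as follows. This is a minimal illustration, not the patented procedure: the source only states that the tree is built from word-vector similarity, so the cosine measure, the `threshold` value, the shortest-word-first ordering, and the name `build_similar_node_tree` are all assumptions. The toy vectors stand in for vectors that would normally be looked up from a word2vec model trained on the case details.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_similar_node_tree(vectors, threshold=0.8):
    """Return {word: parent word}; root words map to None.

    Assumption: words are visited shortest-first so that a shorter, more
    general word becomes the parent of its longer variants. Each word is
    attached to the most similar already-placed word whose similarity
    reaches `threshold`.
    """
    parent = {}
    placed = []
    for word in sorted(vectors, key=lambda w: (len(w), w)):
        best, best_sim = None, threshold
        for other in placed:
            sim = cosine(vectors[word], vectors[other])
            if sim >= best_sim:
                best, best_sim = other, sim
        parent[word] = best
        placed.append(word)
    return parent

# Hypothetical toy vectors standing in for word2vec output.
toy_vectors = {
    "drive": [0.9, 0.1, 0.0],
    "drive to": [0.88, 0.15, 0.02],
    "sell": [0.0, 0.9, 0.3],
    "resell": [0.05, 0.85, 0.35],
}
tree = build_similar_node_tree(toy_vectors)
```

With these toy vectors, "drive to" is attached under "drive" and "resell" under "sell", while "drive" and "sell" remain roots.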

[0015] Furthermore, in step two,

[0016] Each node in the similar node tree generates a vector Vector_A(s1,s2,…,sm).

[0017] Where m is the number of cases, and Vector_A represents whether the behavior or a sub-behavior of the behavior occurs in the behavior sequence of each case or in the similar node tree;

[0018] Each legal provision generates a vector Vector_B(s1,s2,…,sm), which represents whether the legal provision is used in each case;

[0019] The correlation X² between the two vectors is calculated using the chi-square test. Combining the various legal provisions, the information metric of a node in the similar node tree is obtained as X²/L, where L is the number of legal provisions;

[0020] Node merging is performed based on the information metric X²/L of each node in the similar node tree: all child nodes of the selected node in the criminal behavior sequence are replaced with that node, thereby reducing the number of similar nodes.
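The node metric in step two can be illustrated with a short sketch. It assumes the standard Pearson chi-square statistic for a 2×2 contingency table built from two binary vectors, and it reads "X²/L" as the sum of the per-provision statistics divided by L; the source does not spell out either detail. The function names and the toy vectors are hypothetical.

```python
def chi_square_2x2(vec_a, vec_b):
    """Pearson chi-square statistic for two binary vectors of equal length."""
    m = len(vec_a)
    a = sum(1 for x, y in zip(vec_a, vec_b) if x and y)        # both present
    b = sum(1 for x, y in zip(vec_a, vec_b) if x and not y)    # behavior only
    c = sum(1 for x, y in zip(vec_a, vec_b) if not x and y)    # provision only
    d = m - a - b - c                                          # neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant vector carries no association
    return m * (a * d - b * c) ** 2 / denom

def node_metric(vector_a, provision_vectors):
    """Assumed reading of X^2 / L: average statistic over the L provisions."""
    L = len(provision_vectors)
    return sum(chi_square_2x2(vector_a, vb) for vb in provision_vectors) / L

# Six hypothetical cases: 1 means the behavior (or provision) appears in the case.
vector_a = [1, 1, 0, 0, 1, 0]
provisions = [
    [1, 1, 0, 0, 1, 0],  # provision used in exactly the cases with the behavior
    [1, 0, 1, 0, 1, 0],  # provision only loosely associated with the behavior
]
metric = node_metric(vector_a, provisions)
```

A node whose occurrence pattern tracks the legal provisions closely receives a higher metric, which is what makes it a candidate for selection.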

[0021] Furthermore, in step three,

[0022] The sequence generated after merging nodes is subjected to sequence pattern extraction. Based on the description length between two sequences as the optimization target, the sequence patterns of the two sequences with the smallest description length are extracted and the two sequences are merged into the same cluster. This process is iterated until all sequences are classified into clusters. The sequences in the same cluster are visualized in the form of sequence patterns, sequence supplements and original sequences.
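A rough sketch of description-length-driven clustering is given below. The source does not define the description length, so this example makes several assumptions: the pattern of two sequences is taken to be their longest common subsequence, the description length of a cluster is the pattern length plus the number of events each member has outside the pattern, and pairs are merged greedily while a merge shortens the total description. All function names are hypothetical.

```python
from itertools import combinations

def lcs(a, b):
    """Longest common subsequence, used here as the shared sequence pattern."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def complement(sequence, pattern):
    """Events of `sequence` not covered by `pattern` (the sequence supplement)."""
    rest, k = [], 0
    for event in sequence:
        if k < len(pattern) and event == pattern[k]:
            k += 1
        else:
            rest.append(event)
    return rest

def description_length(pattern, members):
    """Assumed cost: the pattern itself plus every member's leftover events."""
    return len(pattern) + sum(len(s) - len(pattern) for s in members)

def cluster(sequences):
    clusters = [{"pattern": list(s), "members": [list(s)]} for s in sequences]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            cx, cy = clusters[x], clusters[y]
            pattern = lcs(cx["pattern"], cy["pattern"])
            members = cx["members"] + cy["members"]
            separate = (description_length(cx["pattern"], cx["members"])
                        + description_length(cy["pattern"], cy["members"]))
            gain = separate - description_length(pattern, members)
            if pattern and gain > 0 and (best is None or gain > best[0]):
                best = (gain, x, y, pattern, members)
        if best is None:
            break  # no merge shortens the description any further
        _, x, y, pattern, members = best
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append({"pattern": pattern, "members": members})
    return clusters

sequences = [
    ["buy", "drive", "sell"],
    ["buy", "drive", "hide", "sell"],
    ["plant", "harvest"],
    ["plant", "dry", "harvest"],
]
result = cluster(sequences)
```

Each resulting cluster can then be drawn as its pattern once, with `complement` giving the per-sequence supplement that is shown alongside it.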

[0023] Furthermore, in step four,

[0024] The relevant legal provisions are used as questions and input into the QA system. The system searches for the case sentences most relevant to the legal provisions, thereby finding the sequence pattern nodes corresponding to the case sentences. The identified sequence pattern nodes are then visualized.
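The focus-extraction idea in step four can be approximated as a retrieval problem. The source does not say which question-and-answer system is used, so the sketch below substitutes a simple word-overlap (Jaccard) score for the QA model purely for illustration; a real system would likely use a trained reading-comprehension or retrieval model. The provision text, the case sentences, and all function names are hypothetical.

```python
def tokenize(text):
    """Lowercase, strip basic punctuation, and split into a set of words."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())

def relevance(question, sentence):
    """Jaccard overlap, a stand-in for the QA system's relevance score."""
    q, s = tokenize(question), tokenize(sentence)
    union = q | s
    return len(q & s) / len(union) if union else 0.0

def find_focus_nodes(provision, node_sentences):
    """Return the pattern nodes whose case sentence best matches the provision.

    node_sentences maps each sequence-pattern node to the case sentence
    from which that behavioral word was extracted.
    """
    scores = {node: relevance(provision, s) for node, s in node_sentences.items()}
    best = max(scores.values(), default=0.0)
    if best == 0.0:
        return []
    return [node for node, score in scores.items() if score == best]

# Hypothetical inputs, not taken from any real statute or case file.
provision = "Whoever sells drugs shall be punished"
node_sentences = {
    "buy": "The defendant bought the goods in another city",
    "drive": "The defendant drove to the meeting place",
    "sell": "The defendant sells drugs to a buyer",
}
focus = find_focus_nodes(provision, node_sentences)
```

The returned nodes are the ones that would be highlighted as focal points in the pattern view.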

[0025] A sequence visualization system for drug-related cases:

[0026] The system includes a data preprocessing module, a behavior word merging module, a sequence mining module, and a focus visualization module;

[0027] The data preprocessing module uses word segmentation tools to extract behavioral words from the case text of drug-related cases. Based on the semantic similarity of behavioral words, similar behavioral words in the sequence of criminal behaviors are constructed into a similar node tree.

[0028] The behavior word merging module is used to select nodes from the similar node tree constructed by the data preprocessing module and merge behavior words in the criminal behavior sequence;

[0029] The sequence mining module is used to mine the sequences generated after merging nodes, mine the sequence patterns, divide all sequences into clusters according to the sequence patterns, represent the sequences as sequence patterns and sequence complements, and perform preliminary visualization.

[0030] The focus visualization module is used to extract the focus in the sequence pattern using a question-and-answer system for key visualization; ultimately, a large number of sequences are visualized as sequence patterns, focus, sequence supplements, and original sequences.

[0031] An electronic device includes a memory and a processor, the memory storing a computer program, the processor executing the computer program to implement the steps of the above method.

[0032] A computer-readable storage medium for storing computer instructions that, when executed by a processor, implement the steps of the above-described method.

[0033] Beneficial effects of the invention

[0034] This invention, based on the semantic information of drug-related cases, merges nodes using the number of similar nodes, introduces sequence pattern mining technology to uncover patterns within sequences, and employs a question-and-answer system to discover focal points within these patterns. It visualizes a large number of sequences as sequence patterns, focal points, sequence supplements, and the original sequence, enabling sequence visualization to represent as much information as possible within a small scope. Furthermore, it reduces the interference of similar nodes on sequence analysis, discovers patterns within sequences, and highlights important information for visualization, resulting in better sequence visualization effects. This will assist in case analysis in the judicial field, provide reference for judicial personnel's decision-making, and promote the development of visualization analysis in the judicial field.

[0035] This invention is based on constructing a similar node tree from semantically similar nodes in a sequence. Similar nodes in the sequence are merged to reduce visual interference, providing a solid foundation for sequence pattern mining and visualization. Using sequence pattern mining, a large number of high-dimensional sequences are visualized. This allows for the classification, sorting, combination, and display of each dimension of case attributes in criminal cases. Multiple attributes or variables representing the data of an object or event can be seen, enabling the display of as much sequence data as possible with a limited layout and the discovery of patterns within cases. The introduction of a question-and-answer system helps identify key information in the sequence, effectively recognizing focal points. The visual layout reduces visual burden and also helps judicial personnel expand the amount of information they can remember during work, facilitating the analysis of criminal behavior sequences, reducing cognitive burden, and expanding cognitive abilities.

Attached Figure Description

[0036] Figure 1 is the overall flowchart of the present invention;

[0037] Figure 2 is a schematic diagram of similar node merging according to the present invention;

[0038] Figure 3 is a schematic diagram of pattern recognition in the present invention.

Detailed Implementation

[0039] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0040] The following description refers to Figures 1 to 3.

[0041] A method for visualizing the sequence of criminal behaviors in drug-related cases:

[0042] The method specifically includes the following steps:

[0043] Step 1: Data preprocessing. Use word segmentation tools to extract behavioral words from the case text of drug-related cases. Based on the semantic similarity of behavioral words, construct a similar node tree for similar behavioral words in the sequence of criminal behaviors.

[0044] Step 2: Select nodes from the similar node tree constructed in Step 1 and merge the behavioral words in the criminal behavior sequence;

[0045] Step 3: Mine the sequence generated after merging nodes in Step 2 to find the sequence pattern. Divide all sequences into clusters according to the sequence pattern, and represent the sequence as sequence pattern, sequence supplement and original sequence for preliminary visualization.

[0046] Step 4: Use a question-and-answer system to extract the focus from the sequence patterns in Step 3 and visualize it; finally, visualize a large number of sequences as sequence patterns, focus, sequence supplements and original sequences.

[0047] In step one,

[0048] Using the case details of drug-related cases, pre-trained word vectors are generated using word2vec. By searching the pre-trained word vectors, the word vectors corresponding to the behavioral words in the criminal behavior sequence are obtained.

[0049] Based on the similarity between the word vectors of the behavioral words, a similarity node tree is constructed for the behavioral words.

[0050] In step two,

[0051] Each node in the similar node tree generates a vector Vector_A(s1,s2,…,sm).

[0052] Where m is the number of cases, and Vector_A represents whether the behavior or a sub-behavior of the behavior occurs in the behavior sequence of each case or in the similar node tree;

[0053] Each legal provision generates a vector Vector_B(s1,s2,…,sm), which represents whether the legal provision is used in each case;

[0054] The correlation X² between the two vectors is calculated using the chi-square test. Combining the various legal provisions, the information metric of a node in the similar node tree is obtained as X²/L, where L is the number of legal provisions;

[0055] Node merging is performed based on the information metric X²/L of each node in the similar node tree: all child nodes of the selected node in the criminal behavior sequence are replaced with that node, thereby reducing the number of similar nodes and reducing visual burden.
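For reference, the chi-square statistic for two binary vectors can be written in closed form. The expression below assumes the standard Pearson statistic on a 2×2 contingency table, where a is the number of cases in which both the behavior and the legal provision occur, b the number with the behavior only, c the number with the provision only, d the number with neither, and m = a + b + c + d. Reading the node metric as an average over the L provision vectors B_1, …, B_L is an interpretation of "X²/L", not something the source states explicitly.

```latex
X^2(A, B) = \frac{m\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)},
\qquad
\text{metric}(\text{node}) = \frac{1}{L}\sum_{l=1}^{L} X^2\!\left(A, B_l\right)
```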

[0056] In step three,

[0057] The sequence generated after merging nodes is subjected to sequence pattern extraction. Based on the distance between two sequences, the sequence pattern between the two closest sequences is extracted and the two sequences are merged into the same cluster. This process is iterated until all sequences are classified into clusters. Sequences within the same cluster are visualized in the form of sequence patterns and sequence complements. This allows for the display of as much content as possible within a relatively small scope while preserving as much original information as possible, providing users with a more intuitive visual display.

[0058] In step four,

[0059] The relevant legal provisions are used as questions and input into the QA system. The system searches for the case sentences most relevant to the legal provisions, thereby finding the sequence pattern nodes corresponding to the case sentences. The identified sequence pattern nodes are then visualized.

[0060] A sequence visualization system for drug-related cases:

[0061] The system includes a data preprocessing module, a behavior word merging module, a sequence mining module, and a focus visualization module;

[0062] The data preprocessing module uses word segmentation tools to extract behavioral words from the case text of drug-related cases. Based on the semantic similarity of behavioral words, similar behavioral words in the sequence of criminal behaviors are constructed into a similar node tree.

[0063] The behavior word merging module is used to select nodes from the similar node tree constructed by the data preprocessing module and merge behavior words in the criminal behavior sequence;

[0064] The sequence mining module is used to mine the sequences generated after merging nodes, mine the sequence patterns, divide all sequences into clusters according to the sequence patterns, represent the sequences as sequence patterns and sequence complements, and perform preliminary visualization.

[0065] The focus visualization module is used to extract the focus in the sequence pattern using a question-and-answer system for key visualization; ultimately, a large number of sequences are visualized as sequence patterns, focus, sequence supplements, and original sequences.

[0066] Example:

[0067] First, perform step one: using the original case details, generate pre-trained word vectors using word2vec. Then, obtain the word vectors corresponding to the behavioral words in the criminal behavior sequence by searching the pre-trained word vectors. Based on the similarity between the word vectors of the behavioral words, construct a similarity node tree for the behavioral words.

[0068] Then, proceed to step two: select behavioral words from the similar node tree and merge behavioral words in the criminal behavior sequences. For example, the criminal behavior sequences may contain similar behavioral words such as "driving," "driving to," and "driving to"; because their meanings are nearly identical, they make sequence visualization difficult. Therefore, the behavioral words are organized into a tree, with each behavioral word node vector (Vector_A) representing whether the node occurs in each case, and each legal provision vector (Vector_B) representing whether the legal provision is used in each case. The metric of each behavioral word node in the similar node tree is calculated using the chi-square test, and nodes in the tree are selected based on this metric. In all criminal behavior sequences, the child nodes of the selected node ("driving to," "driving to") are replaced with that node ("driving").
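The replacement described in this example can be sketched in a few lines. The tree is assumed to be stored as a child-to-parent mapping and the selected nodes as a set; both representations, the function name `merge_nodes`, and the toy sequences are hypothetical choices made for illustration.

```python
def merge_nodes(sequences, parent, selected):
    """Replace every descendant of a selected node with that node.

    parent:   {word: parent word or None} describing the similar node tree
    selected: set of tree nodes chosen by their metric
    """
    def resolve(word):
        node = word
        while node is not None:
            if node in selected:
                return node          # nearest selected ancestor (or itself)
            node = parent.get(node)
        return word                  # no selected ancestor: keep the word

    return [[resolve(word) for word in seq] for seq in sequences]

# Toy tree in which "driving to" is a child of "driving".
parent = {"driving": None, "driving to": "driving", "sell": None}
sequences = [
    ["buy", "driving to", "sell"],
    ["driving", "sell"],
]
merged = merge_nodes(sequences, parent, selected={"driving"})
```

After merging, both sequences use the single node "driving", so they can share a pattern in the later mining step.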

[0069] Step three groups multiple sequences into the same cluster based on their similarity and mines the sequence pattern of each cluster. The sequences are then visualized as sequence patterns and sequence supplements. In step four, the case sentence corresponding to each node in the sequence pattern is input into the question-and-answer system as a candidate answer, and the corresponding legal provision is input as the question.

[0070] The system finds the case sentences that match the legal provisions, thereby identifying the corresponding behavioral node words, which are then visualized.

[0071] An electronic device includes a memory and a processor, the memory storing a computer program, the processor executing the computer program to implement the steps of the above method.

[0072] A computer-readable storage medium for storing computer instructions that, when executed by a processor, implement the steps of the above-described method.

[0073] The memory in this application embodiment can be volatile memory or non-volatile memory, or it can include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous linked dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory used in the methods described in this invention is intended to include, but is not limited to, these and any other suitable types of memory.

[0074] In the above embodiments, implementation can be achieved, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented in software, it can be implemented, in whole or in part, as a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium accessible to a computer or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (DVDs)), or semiconductor media (e.g., solid-state disks (SSDs)).

[0075] In implementation, each step of the above method can be completed by integrated logic circuits in the processor's hardware or by instructions in software. The steps of the method disclosed in the embodiments of this application can be directly implemented by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules can reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory, and the processor reads information from the memory and, in conjunction with its hardware, completes the steps of the above method. To avoid repetition, detailed descriptions are omitted here.

[0076] It should be noted that the processor in the embodiments of this application can be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method embodiments can be completed by the integrated logic circuits in the processor's hardware or by instructions in software form. The processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules can be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other mature storage media in the art. This storage medium is located in memory, and the processor reads the information in the memory and, in conjunction with its hardware, completes the steps of the above methods.

[0077] The foregoing has provided a detailed description of the method and system for visualizing criminal behavior sequences in drug-related cases, and has elucidated the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A method for visualizing the sequence of criminal behaviors in drug-related cases, characterized in that the method specifically includes the following steps: Step 1: Data preprocessing. Use word segmentation tools to extract behavioral words from the case text of drug-related cases. Based on the semantic similarity of behavioral words, construct a similar node tree for similar behavioral words in the sequence of criminal behaviors. Step 2: Select nodes from the similar node tree constructed in Step 1 and merge the behavioral words in the criminal behavior sequence; each node in the similar node tree generates a vector Vector_A(s1,s2,…,sm), where m is the number of cases, and Vector_A represents whether the behavior or a sub-behavior of the behavior occurs in the behavior sequence of each case or in the similar node tree; each legal provision generates a vector Vector_B(s1,s2,…,sm), which represents whether the legal provision is used in each case; the correlation X² between the two vectors is calculated using the chi-square test; by combining the various legal provisions, the node metric in the similar node tree is obtained as X²/L, where L is the number of legal provisions; node merging is performed based on the information metric X²/L of each node in the similar node tree, replacing all child nodes of the selected node in the criminal behavior sequence with that node, thereby reducing similar nodes; Step 3: Mine the sequence generated after merging nodes in Step 2 to find the sequence pattern. Divide all sequences into clusters according to the sequence pattern, and represent the sequence as sequence pattern, sequence supplement and original sequence for preliminary visualization. Step 4: Use a question-and-answer system to extract the focus from the sequence patterns in Step 3 and visualize it; finally, visualize a large number of sequences as sequence patterns, focus, sequence supplements and original sequences.

2. The method according to claim 1, characterized in that: In step one, Use word segmentation tools to extract behavioral words from the case texts of drug-related cases; Using the case details of drug-related cases, pre-trained word vectors are generated using word2vec. By searching the pre-trained word vectors, the word vectors corresponding to the behavioral words in the criminal behavior sequence are obtained. Based on the similarity between the word vectors of the behavioral words, a similarity node tree is constructed for the behavioral words.

3. The method according to claim 2, characterized in that: In step three, The sequence generated after merging nodes is subjected to sequence pattern extraction. Based on the minimum description length between two sequences as the optimization objective, the sequence pattern between the two sequences with the smallest description length is extracted and the two sequences are merged into the same cluster. This process is iterated until all sequences are classified into clusters. The sequences in the same cluster are visualized in the form of sequence pattern, sequence supplement, and original sequence.

4. The method according to claim 3, characterized in that: In step four, The relevant legal provisions are used as questions and input into the QA system. The system searches for the case sentences most relevant to the legal provisions, thereby finding the sequence pattern nodes corresponding to the case sentences. The identified sequence pattern nodes are then visualized.

5. A sequence visualization system for drug-related cases, characterized in that: the system is based on the sequence visualization method for drug-related cases as described in any one of claims 1 to 4; the system includes a data preprocessing module, a behavior word merging module, a sequence mining module, and a focus visualization module; the data preprocessing module uses word segmentation tools to extract behavioral words from the case text of drug-related cases, and, based on the semantic similarity of behavioral words, similar behavioral words in the sequence of criminal behaviors are constructed into a similar node tree; the behavior word merging module is used to select nodes from the similar node tree constructed by the data preprocessing module and merge behavior words in the criminal behavior sequence; each node in the similar node tree generates a vector Vector_A(s1,s2,…,sm), where m is the number of cases, and Vector_A represents whether the behavior or a sub-behavior of the behavior occurs in the behavior sequence of each case or in the similar node tree; each legal provision generates a vector Vector_B(s1,s2,…,sm), which represents whether the legal provision is used in each case; the correlation X² between the two vectors is calculated using the chi-square test; by combining the various legal provisions, the node metric in the similar node tree is obtained as X²/L, where L is the number of legal provisions; node merging is performed based on the information metric X²/L of each node in the similar node tree, replacing all child nodes of the selected node in the criminal behavior sequence with that node, thereby reducing similar nodes; the sequence mining module is used to mine the sequences generated after merging nodes, mine the sequence patterns, divide all sequences into clusters according to the sequence patterns, and represent the sequences in the form of sequence patterns, sequence supplements and original sequences for preliminary visualization; the focus visualization module is used to extract the focus in the sequence pattern using a question-and-answer system for key visualization; ultimately, a large number of sequences are visualized as sequence patterns, focus, sequence supplements, and original sequences.

6. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that, When the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 4.

7. A computer-readable storage medium for storing computer instructions, characterized in that, When the computer instructions are executed by the processor, they implement the steps of the method according to any one of claims 1 to 4.