Method, medium, and device for predicting criminal case judgments based on argument generation.
By generating arguments for conviction and sentencing, and combining hierarchical sequence models and topological attention mechanisms, this method addresses the shortcomings in interpretability and operability of existing judgment prediction methods, achieving more efficient judgment prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZHEJIANG UNIV
- Filing Date
- 2022-12-01
- Publication Date
- 2026-06-30
AI Technical Summary
Existing methods for predicting criminal case judgments lack interpretability and operability, failing to meet judges' interactive needs in the prediction process and leading to amplified errors.
A method based on argument generation is adopted to simulate the judge's judgment process. By generating arguments for conviction and sentencing, a hierarchical sequence model and a topological attention mechanism are used to perform feature weighted aggregation and make judgment predictions in stages.
It improves the interpretability and accuracy of judgment predictions, meets judges' interactive needs in the prediction process, and enhances prediction effectiveness.

Figure CN116258375B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of natural language processing, and in particular to a method, medium, and device for predicting criminal case judgments based on argument generation.
Background Technology
[0002] With the increasing number of cases year by year, grassroots courts face a serious problem of "too many cases and too few judges." Therefore, utilizing computer technology for intelligent judicial assistance is of practical significance. Deep learning methods, especially natural language processing technology, have improved many judicial assistance tasks, such as identifying points of contention and generating courtroom dialogues. As the most important part of judicial decision-making, the judgment prediction task has been studied for decades. The purpose of the judgment prediction task is to predict the legal provisions, charges, and sentences based on a given description of facts. In judicial practice, judges usually obtain results step by step: they first summarize the arguments based on the facts, and then arrive at a judgment. However, existing technologies focus more on how to extract effective features through end-to-end methods, neglecting the interactivity and interpretability of the system in real-world applications.
[0003] To improve the practicality and efficiency of judgment prediction methods, two challenges urgently need to be addressed. First, the prediction process does not allow human intervention. Judicial tasks require high accuracy, but existing methods are end-to-end, which makes the prediction process a "black box": in practical applications, judges cannot intervene during prediction, and errors may therefore be amplified. Second, the models lack interpretability. This is a common problem with existing deep learning models, but it matters particularly for judicial applications, because every judgment needs corresponding explanation and justification to be convincing.
[0004] By simulating the actual judgment process of judges, this method incorporates argument generation into the model. This effectively makes up for the shortcomings of existing methods in intervenability and interpretability.
Summary of the Invention
[0005] The purpose of this invention is to overcome the shortcomings of existing technologies and provide a method for predicting criminal case judgments based on argument generation. Compared with general criminal case judgment prediction algorithms, this invention fully considers the interpretability and operability of judgment results in real-world scenarios, and effectively compensates for the deficiencies of existing methods by generating arguments for conviction and sentencing.
[0006] To achieve the above-mentioned objectives, the technical solution adopted by this invention is as follows:
[0007] In a first aspect, the present invention provides a method for predicting criminal case judgments based on argument generation, which includes the following steps:
[0008] S1: Obtain a training set containing factual descriptions, conviction arguments, sentencing arguments, and judgment results for each individual sample; the judgment results include applicable legal provisions, charges, and sentences.
[0009] S2: Using the argument generation module trained on the training set, generate conviction and sentencing arguments from the factual description of the target criminal case;
[0010] S3: Use a hierarchical sequence model to extract features from the fact description and from the conviction and sentencing arguments generated by the argument generation module. Then use a topological attention mechanism to perform weighted aggregation of the corresponding features according to the logical topological relationship among the judgment prediction subtasks (legal provisions, charges, and sentences).
[0011] S4: Using the judgment prediction module trained on the training set, based on the weighted aggregated feature information in the hierarchical sequence model, the applicable legal provisions, charges, and sentences for the target criminal case are predicted by three predictors respectively.
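Steps S2 to S4 form a two-stage data flow. The Python sketch below wires them together with plain functions. Several names are illustrative assumptions and are not part of the described method: `predict_judgment`, the feature keys `"article"`, `"charge"`, and `"sentence"`, and the optional `review` hook. The hook is a hypothetical point at which a judge could inspect or correct the generated arguments before any judgment is predicted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tokens = List[str]


def predict_judgment(
    fact: Tokens,
    generate: Callable[[Tokens], Tuple[Tokens, Tokens]],            # S2: f -> (cr, pr)
    encode: Callable[[Tokens, Tokens, Tokens], Dict[str, object]],  # S3: features per subtask
    predictors: Dict[str, Callable[[object], object]],               # S4: one head per subtask
    review: Optional[Callable[[Tokens, Tokens], Tuple[Tokens, Tokens]]] = None,
) -> Dict[str, object]:
    """Hypothetical two-stage flow: generate arguments, then predict from them."""
    cr, pr = generate(fact)
    if review is not None:
        # Hypothetical intervention point: the generated arguments are exposed
        # and may be corrected before the judgment is predicted.
        cr, pr = review(cr, pr)
    features = encode(fact, cr, pr)
    result: Dict[str, object] = {"conviction_argument": cr, "sentencing_argument": pr}
    for task in ("article", "charge", "sentence"):
        result[task] = predictors[task](features[task])
    return result
```

The generated arguments are returned alongside the three predictions. Each prediction therefore comes with the stated basis it was derived from.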
[0012] Preferably, the specific steps of S1 include:
[0013] S101: Extract factual descriptions, court opinions, and judgment results from judicial documents, and use keywords to extract arguments for conviction and sentencing from the court opinions;
[0014] The factual description is a descriptive statement containing the facts ascertained and established by the court. It is expressed as $f = \{w^f_1, w^f_2, \ldots, w^f_{l_f}\}$, where $w^f_t$ denotes the t-th word in the factual description and $l_f$ denotes the total number of words in the factual description;
[0015] The conviction argument and the sentencing argument are the bases for the conviction and sentencing judgments in a criminal case, respectively. They are denoted as $cr = \{w^{cr}_1, \ldots, w^{cr}_{l_{cr}}\}$ and $pr = \{w^{pr}_1, \ldots, w^{pr}_{l_{pr}}\}$, where:
- $w^{cr}_t$ denotes the t-th word in the conviction argument;
- $l_{cr}$ denotes the total number of words in the conviction argument;
- $w^{pr}_t$ denotes the t-th word in the sentencing argument;
- $l_{pr}$ denotes the total number of words in the sentencing argument;
[0016] The judgment includes the legal provisions, charges, and sentences determined by the facts and arguments, denoted as a, c, and p, respectively.
[0017] S102: Represent the information extracted from the judgment documents in S101 as training samples in the form of a six-tuple (f,cr,pr,a,c,p).
[0018] Preferably, in S2, the argument generation module is obtained by training the text generation model through the training set, and its input is the fact description f, and its output is the generated conviction argument cr and sentencing argument pr.
[0019] The text generation model is either the Transformer-based generation model BART or the LSTM-based generation model PGN; and the conviction argument cr and sentencing argument pr are generated through a joint generation mode or an independent generation mode. The joint generation mode uses a shared encoder to generate two arguments, and the independent generation mode uses two separate encoders to generate two different arguments respectively.
[0020] Preferably, in step S3, the specific steps for obtaining the weighted aggregated feature information in the hierarchical sequence model include:
[0021] S301: The inputs are three word sequences: f (the factual description), cr (the conviction argument), and pr (the sentencing argument). Each word is first mapped to a 300-dimensional word embedding. The three embedding sequences are then passed through an encoder to obtain three hidden-state sequences $h^f$, $h^{cr}$, and $h^{pr}$;
[0022] S302: A topological attention mechanism aggregates features from the factual description and the generated conviction and sentencing arguments. This yields two aggregated representations of the factual description and the arguments:
[0023]
[0024]
[0025] Here $\bar{h}^f$ denotes the element-wise mean of the sequence $h^f$, and Att(·) denotes the attention mechanism. For any sequence x and value y, the attention distribution q of Att(x, y) over the elements of x is calculated as follows:
[0026] $e_i = v^\top \tanh(W_x x_i + W_y y + b)$
[0027] $q = \mathrm{softmax}(e)$
[0028] Here $v$, $W_x$, $W_y$, and $b$ are learnable parameters. The attention distribution q gives the importance of each word in the argument with respect to a given word of the fact description. The representation after aggregation by the attention mechanism is:
[0029] $h_{topatt} = \sum_i q_i x_i$
[0030] Here $q_i$ and $x_i$ are the i-th elements of q and x, respectively.
[0031] Preferably, in step S4, the judgment prediction module includes three predictors, which predict legal provisions, charges, and sentences, respectively. The specific prediction methods are as follows:
[0032] S401: The legal provision predictor takes $h^f$ from the hierarchical sequence model as input. It passes this input through a fully connected layer and a softmax layer to obtain the probability distribution $P_a$ over the relevant legal provisions. The legal provision prediction result $a_{pred}$ is obtained as follows:
[0033] $P_a = \mathrm{softmax}(\mathrm{FC}(h^f))$
[0034] $a_{pred} = \arg\max P_a$
[0035] S402: The charge predictor takes the corresponding aggregated representation from the hierarchical sequence model as input. It passes this input through a fully connected layer and a softmax layer to obtain the probability distribution $P_c$ over charges. The charge prediction result $c_{pred}$ is obtained as follows:
[0036]
[0037] $c_{pred} = \arg\max P_c$
[0038] S403: The sentence predictor takes the corresponding aggregated representation from the hierarchical sequence model as input. It obtains the sentence prediction result $p_{pred}$ through a fully connected layer and a rounding operation, as follows:
[0039]
[0040] Here round(·) denotes the rounding function and FC denotes a fully connected layer.
[0041] Preferably, the argument generation module, the hierarchical sequence model, and the judgment prediction module form a single prediction framework. The argument generation module and the judgment prediction module are trained together on the training set before being used for actual prediction tasks. The total loss function of the framework is the sum of the losses of the two modules.
[0042] Preferably, when training the argument generation module, the conviction and sentencing arguments are generated word by word. When predicting step t, the ground-truth word is fed in as the output of the previous step (teacher forcing).
[0043] When generating the conviction argument, the loss at step t and the loss over all T1 steps are:
[0044]
[0045]
[0046] Here the probability term denotes the predicted probability distribution for the t-th word of the conviction argument;
[0047] When generating the sentencing argument, the loss at step t and the loss over all T2 steps are:
[0048]
[0049]
[0050] Here the probability term denotes the predicted probability distribution for the t-th word of the sentencing argument;
[0051] The overall loss function of the argument generation module is:
[0052]
[0053] Preferably, when training the judgment prediction module, the loss of each of the three predictors is calculated separately:
[0054] The loss function of the legal provision predictor is:
[0055] $\mathcal{L}_a = -\sum_{i=1}^{n_a} y_i \log P_a(i)$
[0056] The symbols are defined as follows:
- $n_a$ is the number of candidate legal provisions;
- $y_i$ is a 0-1 label: $y_i = 1$ if $i = a$ and $y_i = 0$ otherwise, where a is the legal provision actually applied in the sample;
- $P_a(i)$ is the i-th element of the legal provision probability distribution $P_a$;
[0057] The loss function of the charge predictor is:
[0058] $\mathcal{L}_c = -\sum_{j=1}^{n_c} y_j \log P_c(j)$
[0059] The symbols are defined as follows:
- $n_c$ is the number of candidate charges;
- $y_j$ is a 0-1 label: $y_j = 1$ if $j = c$ and $y_j = 0$ otherwise, where c is the charge actually convicted in the sample;
- $P_c(j)$ is the j-th element of the charge probability distribution $P_c$;
[0060] The loss function of the sentence predictor is:
[0061]
[0062] In the formula: p represents the actual sentence in the sample;
[0063] The total loss function of the judgment prediction module is:
[0064]
[0065] In a second aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the criminal case judgment prediction method based on argument generation as described in any of the solutions of the first aspect above.
[0066] In a third aspect, the present invention provides a computer electronic device, which includes a memory and a processor;
[0067] The memory is used to store computer programs;
[0068] The processor is configured to, when executing the computer program, implement the criminal case judgment prediction method based on argument generation as described in any of the solutions of the first aspect above.
[0069] Compared with the prior art, the present invention has the following beneficial effects:
[0070] This invention proposes a novel method for predicting criminal case judgments based on argument generation, taking full account of interactivity and interpretability. Following the actual trial logic of judges, the method divides the previously end-to-end prediction into two stages:
- First, it generates conviction and sentencing arguments from factual descriptions.
- Then, it aggregates the factual descriptions and generated arguments through a topological attention mechanism for judgment prediction.

Description of the Drawings
[0071] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0072] Figure 1 is a schematic diagram of the criminal case judgment prediction method based on argument generation provided in an embodiment of the present invention.
[0073] Figure 2 is a schematic diagram of case segmentation provided in an embodiment of the present invention.
[0074] Figure 3 is a logic diagram of the criminal case judgment prediction method based on argument generation provided in an embodiment of the present invention.
[0075] Figure 4 is a schematic diagram illustrating the effect of the criminal case judgment prediction method based on argument generation provided in an embodiment of the present invention.
Detailed Implementation
[0076] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0077] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0078] To address the problems in the prior art, an embodiment of the present invention provides a method for predicting criminal case judgments based on argument generation. As shown in Figure 1, the method includes the following steps:
[0079] S1: Construct a training set containing factual descriptions, conviction arguments, sentencing arguments, and judgment results for each individual sample; the judgment results include applicable legal provisions, charges, and sentences.
[0080] In this embodiment, the specific steps for constructing the training set include:
[0081] S101: Extract factual descriptions, court opinions, and judgment results from judicial documents, and use keywords to extract conviction and sentencing arguments from the court opinions.
[0082] The factual description is a descriptive statement containing the facts ascertained and established by the court. It is expressed as $f = \{w^f_1, w^f_2, \ldots, w^f_{l_f}\}$, where $w^f_t$ denotes the t-th word in the factual description and $l_f$ denotes the total number of words in the factual description;
[0083] The conviction argument and the sentencing argument are the bases for conviction and sentencing in a criminal case, respectively. They are denoted as $cr = \{w^{cr}_1, \ldots, w^{cr}_{l_{cr}}\}$ and $pr = \{w^{pr}_1, \ldots, w^{pr}_{l_{pr}}\}$, where:
- $w^{cr}_t$ denotes the t-th word in the conviction argument;
- $l_{cr}$ denotes the total number of words in the conviction argument;
- $w^{pr}_t$ denotes the t-th word in the sentencing argument;
- $l_{pr}$ denotes the total number of words in the sentencing argument;
[0084] The judgment includes the legal provisions, charges, and sentences determined by the facts and arguments, denoted as a, c, and p, respectively.
[0085] S102: Represent the information extracted from the judgment documents in S101 as training samples in the form of six-tuples (f,cr,pr,a,c,p), thereby constructing a training set for training the prediction module.
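A minimal Python sketch of S101-S102 follows. The `JudgmentSample` class mirrors the six-tuple (f, cr, pr, a, c, p). The keyword routing in `split_opinion` is an assumption: the source says keywords are used but does not list them. `SENTENCING_CUES` therefore holds hypothetical English cue words; a real system would use cue phrases in the documents' own language. The unit of the sentence `p` is likewise assumed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JudgmentSample:
    """One training sample, mirroring the six-tuple (f, cr, pr, a, c, p)."""
    f: List[str]    # factual description (word sequence)
    cr: List[str]   # conviction argument (word sequence)
    pr: List[str]   # sentencing argument (word sequence)
    a: int          # index of the applicable legal provision
    c: int          # index of the charge
    p: int          # sentence (unit, e.g. months, is an assumption)


# Hypothetical cue words; the source does not publish its keyword list.
SENTENCING_CUES = ("mitigating", "aggravating", "surrender", "confess", "recidivist")


def split_opinion(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Route court-opinion sentences to the conviction or sentencing argument."""
    conviction: List[str] = []
    sentencing: List[str] = []
    for s in sentences:
        is_sentencing = any(cue in s.lower() for cue in SENTENCING_CUES)
        (sentencing if is_sentencing else conviction).append(s)
    return conviction, sentencing


conv, sent = split_opinion([
    "The defendant secretly took property worth 5,000 yuan.",
    "The defendant confessed truthfully, a mitigating circumstance.",
])
sample = JudgmentSample(f="the defendant entered the shop".split(),
                        cr=" ".join(conv).split(), pr=" ".join(sent).split(),
                        a=0, c=0, p=8)
```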
[0086] S2: Using the argument generation module trained on the above training set, generate conviction and sentencing arguments from the factual description of the target criminal case.
[0087] In this embodiment, the argument generation module is obtained by training the text generation model through the training set. Its input is the fact description f, and its output is the generated conviction argument cr and sentencing argument pr.
[0088] Furthermore, the argument generation module can use either of two text generation models and either of two generation modes:
- Models: the Transformer-based generation model BART, or the LSTM-based generation model PGN.
- Modes: a joint generation mode, which uses a shared encoder to generate both arguments, or an independent generation mode, which uses two separate encoders to generate the two arguments.

In the subsequent example of this invention, experiments showed that the LSTM-based PGN model with the joint generation mode achieved better prediction results, so this combination was adopted as the preferred approach.
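The sketch below makes the difference between the two generation modes concrete by showing only the encoder wiring. `ToyEncoder` is a hypothetical one-layer stand-in for the BART or PGN encoder, and the argument decoders are omitted. In joint mode, both arguments read from one shared encoder object; independent mode builds a separate encoder per argument.

```python
import numpy as np


class ToyEncoder:
    """Hypothetical stand-in encoder: one tanh projection of word embeddings."""

    def __init__(self, dim: int, rng: np.random.Generator):
        self.W = rng.normal(scale=0.1, size=(dim, dim))

    def __call__(self, emb: np.ndarray) -> np.ndarray:
        return np.tanh(emb @ self.W)


class ArgumentGenerator:
    """Encoder wiring for the two generation modes (decoders omitted)."""

    def __init__(self, mode: str, dim: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        if mode == "joint":          # one shared encoder feeds both decoders
            shared = ToyEncoder(dim, rng)
            self.encoders = {"cr": shared, "pr": shared}
        elif mode == "independent":  # a separate encoder for each argument
            self.encoders = {"cr": ToyEncoder(dim, rng), "pr": ToyEncoder(dim, rng)}
        else:
            raise ValueError(f"unknown mode: {mode}")

    def encode(self, fact_emb: np.ndarray) -> dict:
        # Each argument decoder would attend over the states from its own encoder.
        return {name: enc(fact_emb) for name, enc in self.encoders.items()}
```

Sharing the encoder halves the encoder parameters. It also forces one fact representation to serve both arguments, which is one plausible reason the joint mode generalized better in the reported experiments.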
[0089] S3: Use a hierarchical sequence model to extract features from the fact description and from the conviction and sentencing arguments generated by the argument generation module. Then use a topological attention mechanism to perform weighted aggregation of the corresponding features according to the logical topological relationships among the judgment prediction subtasks (legal provisions, charges, and sentences).
[0090] In embodiments of the present invention, the specific steps for obtaining the weighted aggregated feature information in the above-mentioned hierarchical sequence model include:
[0091] S301: The inputs are three word sequences: f (the factual description), cr (the conviction argument), and pr (the sentencing argument). Each word is first mapped to a 300-dimensional word embedding. The three embedding sequences are then passed through an encoder to obtain three hidden-state sequences $h^f$, $h^{cr}$, and $h^{pr}$;
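The numpy sketch below illustrates S301. Only the 300-dimensional embedding size comes from the source. The encoder type is not specified there, so a plain tanh recurrent network is assumed. `HID_DIM`, the random embedding table, and the on-the-fly toy vocabulary are hypothetical placeholders.

```python
import numpy as np

EMB_DIM = 300    # word-embedding size stated in S301
HID_DIM = 64     # hidden size: an assumption, not given in the source

rng = np.random.default_rng(0)
vocab = {"<unk>": 0}                                   # toy vocabulary, grown on the fly
E = rng.normal(scale=0.1, size=(1000, EMB_DIM))        # toy embedding table (1000 rows)
W_in = rng.normal(scale=0.1, size=(EMB_DIM, HID_DIM))
W_hh = rng.normal(scale=0.1, size=(HID_DIM, HID_DIM))


def ids(words):
    """Map words to integer ids, adding unseen words to the toy vocabulary."""
    return [vocab.setdefault(w, len(vocab)) for w in words]


def encode(words):
    """Embed a word sequence and run a plain tanh RNN; returns (len, HID_DIM)."""
    x = E[ids(words)]                                  # (len, 300) embeddings
    h, states = np.zeros(HID_DIM), []
    for x_t in x:
        h = np.tanh(x_t @ W_in + h @ W_hh)             # one recurrent step
        states.append(h)
    return np.stack(states)


h_f = encode("the defendant entered the shop at night".split())
h_cr = encode("the defendant committed theft".split())
h_pr = encode("the defendant confessed".split())
```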
[0092] S302: A topological attention mechanism aggregates features from the factual description and the generated conviction and sentencing arguments. This yields two aggregated representations of the factual description and the arguments:
[0093]
[0094]
[0095] Here $\bar{h}^f$ denotes the element-wise mean of the sequence $h^f$, and Att(·) denotes the attention mechanism.
[0096] To describe the attention mechanism Att(·) concisely, a generic sequence x and value y are introduced to stand for the inputs of the Att(·) terms in the two aggregation formulas above. In the attention computation, x denotes an argument hidden-state sequence ($h^{cr}$ or $h^{pr}$), and y denotes the fact-description term paired with it. For any sequence x and value y, the attention distribution q of Att(x, y) over the elements of x is calculated as follows:
[0097] $e_i = v^\top \tanh(W_x x_i + W_y y + b)$
[0098] $q = \mathrm{softmax}(e)$
[0099] Here $v$, $W_x$, $W_y$, and $b$ are learnable parameters. The attention distribution q gives the importance of each word in the argument with respect to a given word of the fact description. The representation after aggregation by the attention mechanism is:
[0100] $h_{topatt} = \sum_i q_i x_i$
[0101] Here $q_i$ and $x_i$ are the i-th elements of q and x, respectively, and $h_{topatt}$ is the attention output that enters the two aggregated representations above.
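The additive attention formulas above translate directly into numpy. This is a sketch under two assumptions: the parameter shapes are chosen here for illustration, and in the usage, y is taken as the element-wise mean of $h^f$, reading the fact term paired with an argument sequence as the mean of the fact hidden states. The hidden states are random toy values.

```python
import numpy as np


def att(x, y, v, W_x, W_y, b):
    """Additive attention Att(x, y).

    x: (n, d) sequence, e.g. an argument's hidden states h^cr
    y: (d,) value paired with it, e.g. a fact-description vector
    v: (k,); W_x, W_y: (k, d); b: (k,) are the learnable parameters
    Returns (h_topatt, q).
    """
    e = np.tanh(x @ W_x.T + W_y @ y + b) @ v    # e_i = v^T tanh(W_x x_i + W_y y + b)
    q = np.exp(e - e.max())                      # subtract max for numerical stability
    q /= q.sum()                                 # q = softmax(e)
    return q @ x, q                              # h_topatt = sum_i q_i x_i


rng = np.random.default_rng(0)
d, k = 64, 32
h_f = rng.normal(size=(9, d))     # fact hidden states (toy values)
h_cr = rng.normal(size=(5, d))    # conviction-argument hidden states (toy values)
params = (rng.normal(size=k), rng.normal(size=(k, d)),
          rng.normal(size=(k, d)), np.zeros(k))
h_topatt, q = att(h_cr, h_f.mean(axis=0), *params)
```

Because q is a probability distribution, $h_{topatt}$ is a convex combination of the argument states. If every $x_i$ is identical, the output equals that shared vector.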
[0102] S4: Using the judgment prediction module trained on the training set, based on the weighted aggregated feature information in the hierarchical sequence model, the applicable legal provisions, charges, and sentences for the target criminal case are predicted by three predictors respectively.
[0103] In an embodiment of the present invention, in S4 above, the judgment prediction module includes three predictors, which predict legal provisions, charges, and sentences, respectively. The specific prediction methods are as follows:
[0104] S401: The legal provision predictor takes $h^f$ from the hierarchical sequence model as input. It passes this input through a fully connected layer and a softmax layer to obtain the probability distribution $P_a$ over the relevant legal provisions. The legal provision prediction result $a_{pred}$ is obtained as follows:
[0105] $P_a = \mathrm{softmax}(\mathrm{FC}(h^f))$
[0106] $a_{pred} = \arg\max P_a$
[0107] S402: The charge predictor takes the corresponding aggregated representation from the hierarchical sequence model as input. It passes this input through a fully connected layer and a softmax layer to obtain the probability distribution $P_c$ over charges. The charge prediction result $c_{pred}$ is obtained as follows:
[0108]
[0109] $c_{pred} = \arg\max P_c$
[0110] S403: The sentence predictor takes the corresponding aggregated representation from the hierarchical sequence model as input. It obtains the sentence prediction result $p_{pred}$ through a fully connected layer and a rounding operation, as follows:
[0111]
[0112] Here round(·) denotes the rounding function and FC denotes a fully connected layer.
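Each of the three predictor heads in S401-S403 is a fully connected layer, followed either by softmax and argmax (legal provisions, charges) or by rounding (sentence). In this sketch, the inputs `h_a`, `h_c`, and `h_p` are assumed to be already pooled into fixed-length vectors. The layout of the `params` dictionary is hypothetical.

```python
import numpy as np


def fc(x, W, b):
    """Fully connected layer: W x + b."""
    return W @ x + b


def softmax(z):
    z = np.exp(z - z.max())   # shift by the max for numerical stability
    return z / z.sum()


def predict(h_a, h_c, h_p, params):
    """Heads for legal provision, charge and sentence (inputs assumed pooled)."""
    P_a = softmax(fc(h_a, *params["article"]))
    P_c = softmax(fc(h_c, *params["charge"]))
    # Scalar regression followed by rounding (np.round rounds halves to even).
    p_pred = int(np.round(fc(h_p, *params["sentence"])).item())
    return int(P_a.argmax()), int(P_c.argmax()), p_pred, P_a, P_c


rng = np.random.default_rng(1)
d, n_a, n_c = 16, 5, 4
params = {"article": (rng.normal(size=(n_a, d)), np.zeros(n_a)),
          "charge": (rng.normal(size=(n_c, d)), np.zeros(n_c)),
          "sentence": (np.full((1, d), 0.5), np.array([0.2]))}
h = np.ones(d)
a_pred, c_pred, p_pred, P_a, P_c = predict(h, h, h, params)
```

With these toy sentence weights, the regression output is 0.5 × 16 + 0.2 = 8.2, which rounds to 8.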
[0113] In this invention, the argument generation module in S2, the hierarchical sequence model in S3, and the judgment prediction module in S4 together form a single prediction framework for the judgment prediction task. Both the argument generation module and the judgment prediction module must be trained on the training set before being used for actual prediction. This can be done by training the whole framework jointly. In this embodiment, the total loss function of the framework is the sum of the losses of the two modules. The loss functions of the argument generation module and the judgment prediction module are described in detail below.
[0114] The argument generation module uses a text generation model, so it generates the arguments word by word. During training, the conviction and sentencing arguments are generated step by step. When predicting the argument word at the current step t, the ground-truth word is fed in as the output of the previous step (teacher forcing).
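Teacher forcing can be sketched independently of any particular generator. `step_fn` is a hypothetical callback that returns the model's vocabulary distribution for step t, given the previous word. The per-step loss is assumed to be the negative log-probability of the gold word. Averaging over the T steps is also an assumption; summing is an equally common choice.

```python
from typing import Callable, List, Tuple

import numpy as np


def teacher_forced_nll(step_fn: Callable[[int, int], np.ndarray],
                       target_ids: List[int],
                       bos_id: int = 0) -> Tuple[List[float], float]:
    """Per-step losses and their mean for one target sequence under teacher forcing."""
    prev, step_losses = bos_id, []
    for t, gold in enumerate(target_ids):
        probs = step_fn(prev, t)                                  # distribution at step t
        step_losses.append(float(-np.log(probs[gold] + 1e-12)))   # loss at step t
        prev = gold                                               # feed the gold word, not the guess
    return step_losses, float(np.mean(step_losses))              # mean over T steps (assumed)
```

Under the same assumptions, the conviction and sentencing arguments would each be scored with this function, and their losses then combined into the module's overall loss.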
[0115] When generating the conviction argument, the loss at step t and the loss over all T1 steps are:
[0116]
[0117]
[0118] Here the probability term denotes the predicted probability distribution for the t-th word of the conviction argument;
[0119] When generating the sentencing argument, the loss at step t and the loss over all T2 steps are:
[0120]
[0121]
[0122] Here the probability term denotes the predicted probability distribution for the t-th word of the sentencing argument;
[0123] The overall loss function of the argument generation module is:
[0124]
[0125] When training the judgment prediction module, the loss of each of the three predictors is calculated separately:
[0126] The loss function of the legal provision predictor is:
[0127] $\mathcal{L}_a = -\sum_{i=1}^{n_a} y_i \log P_a(i)$
[0128] The symbols are defined as follows:
- $n_a$ is the number of candidate legal provisions;
- $y_i$ is a 0-1 label: $y_i = 1$ if $i = a$ and $y_i = 0$ otherwise, where a is the legal provision actually applied in the sample;
- $P_a(i)$ is the i-th element of the legal provision probability distribution $P_a$;
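Given the 0-1 labels $y_i$ and the distribution $P_a$ defined above, the legal provision loss can be read as a standard cross-entropy. The sketch assumes that form, and the same function applies to the charge predictor with $y_j$ and $P_c$. The small epsilon guarding against log(0) is an implementation choice, not part of the source.

```python
import numpy as np


def classification_loss(P: np.ndarray, gold: int) -> float:
    """-sum_i y_i log P(i), with y the 0-1 label vector for the gold class."""
    y = np.zeros_like(P)
    y[gold] = 1.0                                   # y_i = 1 iff i is the gold index
    return float(-(y * np.log(P + 1e-12)).sum())


P_a = np.array([0.7, 0.2, 0.1])                     # toy distribution over n_a = 3 provisions
loss_a = classification_loss(P_a, gold=0)           # approximately -log(0.7)
```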
[0129] The loss function of the charge predictor is:
[0130] $\mathcal{L}_c = -\sum_{j=1}^{n_c} y_j \log P_c(j)$
[0131] The symbols are defined as follows:
- $n_c$ is the number of candidate charges;
- $y_j$ is a 0-1 label: $y_j = 1$ if $j = c$ and $y_j = 0$ otherwise, where c is the charge actually convicted in the sample;
- $P_c(j)$ is the j-th element of the charge probability distribution $P_c$;
[0132] The loss function of the sentence predictor is:
[0133]
[0134] In the formula: p represents the actual sentence in the sample;
[0135] The total loss function of the judgment prediction module is:
[0136]
[0137] The specific method of training the model based on the overall framework's total loss function is existing technology and will not be described in detail here. This training process needs to be completed before performing any prediction tasks or tests.
[0138] Compared with conventional judgment prediction methods, this invention models real-world judgment scenarios and incorporates an argument generation module. This improves prediction accuracy while preserving the model's interactivity and interpretability. The following example demonstrates the effect of the method in a concrete application. The specific steps are as described in S1-S4 and are not repeated here; the focus is on the results.
[0139] Example
[0140] This embodiment is used for training and testing on a dataset of criminal case judgment documents.
[0141] The first step is to process the court judgment dataset using the following steps.
[0142] 1) Using keywords, extract the following from each judgment document: the factual description, the conviction argument, the sentencing argument, the relevant legal provisions, the charge, and the sentence. Figure 2 illustrates an exemplary segmentation of a judgment document.
[0143] 2) The dataset was divided into training set, validation set and test set in a ratio of 8:1:1.
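A reproducible 8:1:1 split might look like the sketch below. The source states only the ratio, so the shuffle and the fixed seed are assumptions. Integer arithmetic keeps the sizes exact and assigns any remainder to the test set.

```python
import random


def split_811(samples, seed=42):
    """Shuffle and split samples into train/validation/test at 8:1:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # fixed seed for reproducibility (assumed)
    n = len(items)
    n_train, n_val = n * 8 // 10, n // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])     # remainder goes to the test set


train, val, test = split_811(range(100))
```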
[0144] To objectively evaluate the performance of the method of the present invention, the following method is used for evaluation:
[0145] 1) Performance of the argument generation module. ROUGE and BLEU are used as evaluation metrics. ROUGE compares the generated results with reference results; the official ROUGE script was used, and ROUGE-1, ROUGE-2, and ROUGE-L scores are reported. BLEU is an automatic metric for text generation quality that correlates highly with human judgment.
[0146] 2) Accuracy of the judgment prediction module. For charge and legal provision prediction, micro-F1 (Mi-F1) and macro-F1 (Ma-F1) are used as evaluation metrics, calculated with functions from the scikit-learn library. For sentence prediction, LogDis and Acc25 are used:
- LogDis is calculated as $\log(|p_{pred} - p| + 1)$, where $p_{pred}$ is the predicted sentence and p is the actual sentence.
- Under Acc25, a prediction is considered correct if the error between the predicted and actual values is within 25%.
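The sentence metrics and the F1 variants can be spelled out in a few lines of Python. The source computes Mi-F1 and Ma-F1 with scikit-learn (`sklearn.metrics.f1_score` with `average="micro"` or `average="macro"`); the pure-Python versions below only make those definitions explicit. Two further points are assumptions: LogDis is averaged over the test set, and "within 25%" is read as relative to the actual sentence.

```python
import math


def log_dis(pred, gold):
    """Mean of log(|p_pred - p| + 1) over samples (averaging is assumed)."""
    return sum(math.log(abs(pp - p) + 1) for pp, p in zip(pred, gold)) / len(gold)


def acc25(pred, gold):
    """Share of predictions with |p_pred - p| <= 0.25 * p (assumed reading)."""
    return sum(abs(pp - p) <= 0.25 * p for pp, p in zip(pred, gold)) / len(gold)


def micro_f1(pred, gold):
    """For single-label multi-class prediction, micro-F1 equals accuracy."""
    return sum(pp == g for pp, g in zip(pred, gold)) / len(gold)


def macro_f1(pred, gold):
    """Unweighted mean of per-class F1 over classes seen in gold or pred."""
    scores = []
    for k in set(gold) | set(pred):
        tp = sum(1 for pp, g in zip(pred, gold) if pp == k and g == k)
        fp = sum(1 for pp, g in zip(pred, gold) if pp == k and g != k)
        fn = sum(1 for pp, g in zip(pred, gold) if pp != k and g == k)
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)
```

For single-label data, micro-F1 collapses to accuracy. Macro-F1 weights rare charges and provisions equally with common ones, which is why both are reported.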
[0147] The experimental results are shown in Tables 1, 2, and 3. Tables 1 and 2 show that the arguments generated by the method of this invention are of high text quality. Table 3 shows that the judgment prediction method achieves high accuracy.
[0148] Table 1 Quality assessment of generated conviction arguments
[0149]
| ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-1 | BLEU-2 | BLEU-N |
| --- | --- | --- | --- | --- | --- |
| 78.2 | 68.4 | 76.1 | 77.5 | 73.1 | 70.1 |
[0150] Table 2 Quality assessment of generated sentencing arguments
[0151]
| ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-1 | BLEU-2 | BLEU-N |
| --- | --- | --- | --- | --- | --- |
| 62.3 | 45.4 | 56.8 | 58.3 | 51.6 | 46.0 |
[0152] Table 3 Judgment prediction quality assessment
[0153]
[0154] In addition, Figure 4 compares true and predicted values for an example of the criminal case judgment prediction method based on argument generation. The comparison shows that the present invention can meet the needs of criminal case judgment prediction.
[0155] Furthermore, it should be noted that the above-described steps in this invention can be implemented as software functional units in the form of logical instructions in memory. When this software is sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this invention, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this invention.
[0156] Similarly, based on the same inventive concept, another preferred embodiment of the present invention also provides a computer-readable storage medium corresponding to the criminal case judgment prediction method based on argument generation provided in the above embodiments. The storage medium stores a computer program, which, when executed by a processor, implements the above-mentioned criminal case judgment prediction method based on argument generation.
[0157] Similarly, based on the same inventive concept, another preferred embodiment of the present invention also provides a computer electronic device corresponding to the criminal case judgment prediction method based on argument generation provided in the above embodiments, which includes a memory and a processor;
[0158] The memory is used to store computer programs;
[0159] The processor is configured to implement the above-described method for predicting criminal case judgments based on argument generation when executing the computer program.
[0160] It is understood that the aforementioned storage medium and memory can be random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device. Furthermore, the storage medium can also be any medium capable of storing program code, such as a USB flash drive, external hard drive, magnetic disk, or optical disk.
[0161] It is understood that the processors mentioned above can be general-purpose processors, including central processing units (CPUs), network processors (NPs), etc.; they can also be digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
[0162] It should also be noted that, as those skilled in the art will understand, for convenience and brevity of description, the specific working process of the device described above is the same as the corresponding process in the foregoing method embodiments and is not repeated here. In the embodiments provided in this application, the division of steps or modules in the device and method is merely a logical functional division, and other divisions are possible in an actual implementation. For example, multiple modules or steps may be combined or integrated, and a single module or step may be split.
[0163] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the invention. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, all technical solutions obtained through equivalent substitution or transformation fall within the protection scope of the present invention.
Claims
1. A criminal case judgment prediction method based on argument generation, characterized in that it comprises the following steps:
S1: Obtain a training set in which each sample contains a fact description, a conviction argument, a sentencing argument, and a judgment result; the judgment result includes the applicable legal provision, the charge, and the sentence.
S2: Using the argument generation module trained on the training set, generate a conviction argument and a sentencing argument from the fact description of the target criminal case.
S3: Using a hierarchical sequence model, extract features from the fact description and from the conviction and sentencing arguments generated by the argument generation module, and use a topological attention mechanism to perform weighted aggregation of the corresponding features according to the logical topological relationship among the judgment prediction tasks.
S4: Using the judgment prediction module trained on the training set, predict the applicable legal provision, the charge, and the sentence of the target criminal case with three separate predictors, based on the weighted aggregated feature information from the hierarchical sequence model.
The specific steps of S1 are:
S101: Extract the fact description, the court opinion, and the judgment result from judicial documents, and use keywords to extract the conviction argument and the sentencing argument from the court opinion.
The fact description is a descriptive statement of the facts ascertained and established by the court, expressed as $F = \{f_1, f_2, \ldots, f_{l_F}\}$, where $f_t$ is the $t$-th word of the fact description and $l_F$ is the total number of words in the fact description.
The conviction argument and the sentencing argument are the bases for the conviction decision and the sentencing decision in a criminal case, respectively, and are denoted $C = \{c_1, c_2, \ldots, c_{l_C}\}$ and $S = \{s_1, s_2, \ldots, s_{l_S}\}$, where $c_t$ is the $t$-th word of the conviction argument, $l_C$ is the total number of words in the conviction argument, $s_t$ is the $t$-th word of the sentencing argument, and $l_S$ is the total number of words in the sentencing argument.
The judgment result comprises the legal provision $y^a$, the charge $y^c$, and the sentence $y^p$ determined from the facts and the arguments.
S102: Represent the information extracted from each judicial document in S101 as a training sample in the form of the six-tuple $(F, C, S, y^a, y^c, y^p)$.
In step S2, the argument generation module is obtained by training a text generation model on the training set; its input is the fact description $F$, and its outputs are the generated conviction argument $\hat{C}$ and sentencing argument $\hat{S}$. The text generation model is either the Transformer-based generative model BART or the LSTM-based generative model PGN. The conviction argument $\hat{C}$ and the sentencing argument $\hat{S}$ are generated in either a joint generation mode or an independent generation mode: the joint generation mode uses a shared encoder to generate both arguments, whereas the independent generation mode uses two separate encoders to generate the two arguments.
In step S3, the weighted aggregated feature information in the hierarchical sequence model is obtained as follows:
S301: For the three input word sequences, namely the fact description $F$, the conviction argument $\hat{C}$, and the sentencing argument $\hat{S}$, first map each word to a word embedding represented by a 300-dimensional vector, and then pass the three word embedding sequences through an encoder to obtain three hidden state sequences $H^F$, $H^C$, and $H^S$.
S302: Use the topological attention mechanism to aggregate features from the fact description and the generated conviction and sentencing arguments, obtaining two aggregated representations of the fact description and the arguments, $\tilde{h}^C$ and $\tilde{h}^S$. They are computed from $H^F$, $H^C$, and $H^S$ using $\mathrm{mean}(\cdot)$, which denotes the element-wise mean of a sequence, and $\mathrm{Att}(\cdot)$, which denotes the attention mechanism.
For any sequences $x$ and $y$, the attention mechanism $\mathrm{Att}(x, y)$ computes, for each element $x_i$ of $x$, an attention distribution $\alpha_i$ over the elements of $y$: an attention score $e_{ij}$ between $x_i$ and $y_j$ is computed with parameters that are all learned during training, and $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})}$. The attention distribution $\alpha_i$ represents the importance of each word in the argument relative to a given word of the fact description. The representation after aggregation by the attention mechanism is $\hat{x}_i = \sum_{j} \alpha_{ij} y_j$, where $x_i$ and $\hat{x}_i$ are vectors, namely the $i$-th elements of $x$ and of $\mathrm{Att}(x, y)$, respectively.
In S4, the judgment prediction module includes three predictors, for the legal provision, the charge, and the sentence respectively, which predict as follows:
S401: The legal provision predictor takes the aggregated representation $v^a$ supplied to it by the hierarchical sequence model as input and obtains the probability distribution $P^a$ over legal provisions through a fully connected layer and a softmax layer. The legal provision prediction $\hat{y}^a$ is obtained as $P^a = \mathrm{softmax}(W^a v^a + b^a)$ and $\hat{y}^a = \arg\max_i P^a_i$, where $W^a$ and $b^a$ are the parameters of the fully connected layer.
S402: The charge predictor takes the aggregated representation $v^c$ supplied to it by the hierarchical sequence model as input and obtains the probability distribution $P^c$ over charges through a fully connected layer and a softmax layer. The charge prediction $\hat{y}^c$ is obtained as $P^c = \mathrm{softmax}(W^c v^c + b^c)$ and $\hat{y}^c = \arg\max_j P^c_j$, where $W^c$ and $b^c$ are the parameters of the fully connected layer.
S403: The sentence predictor takes the aggregated representation $v^p$ supplied to it by the hierarchical sequence model as input and obtains the sentence through a fully connected layer and a rounding operation. The sentence prediction $\hat{y}^p$ is obtained as $\hat{y}^p = \mathrm{round}(\mathrm{FC}(v^p))$, where $\mathrm{round}(\cdot)$ is the rounding function and $\mathrm{FC}(\cdot)$ is a fully connected layer.
2. The method for predicting criminal case judgments based on argument generation according to claim 1, characterized in that the argument generation module, the hierarchical sequence model, and the judgment prediction module together form an overall prediction framework; the argument generation module and the judgment prediction module are trained jointly on the training set before being used for actual prediction tasks, and the total loss function of the overall prediction framework is the sum of the losses of the two modules.
3. The method for predicting criminal case judgments based on argument generation according to claim 2, characterized in that, when training the argument generation module, the conviction argument and the sentencing argument are generated step by step, and the prediction at the current step $t$ uses the correct (ground-truth) word as the output of the previous step.
When generating the conviction argument, the loss at any step $t$ is $\mathcal{L}^C_t = -\log P_t(c_t)$, where $P_t(c_t)$ is the probability that the predicted distribution at step $t$ assigns to the conviction-argument word $c_t$; the loss $\mathcal{L}^C$ over all steps aggregates $\mathcal{L}^C_t$ for $t = 1, \ldots, l_C$.
When generating the sentencing argument, the loss at any step $t$ is $\mathcal{L}^S_t = -\log P_t(s_t)$, where $P_t(s_t)$ is the probability that the predicted distribution at step $t$ assigns to the sentencing-argument word $s_t$; the loss $\mathcal{L}^S$ over all steps aggregates $\mathcal{L}^S_t$ for $t = 1, \ldots, l_S$.
The overall loss function $\mathcal{L}_{gen}$ of the argument generation module combines $\mathcal{L}^C$ and $\mathcal{L}^S$.
4. The method for predicting criminal case judgments based on argument generation according to claim 3, characterized in that, for training the judgment prediction module, the loss of each of the three predictors is calculated separately:
The loss function of the legal provision predictor is $\mathcal{L}_a = -\sum_{i=1}^{N_a} z^a_i \log P^a_i$, where $N_a$ is the number of applicable legal provisions; $z^a_i$ is a 0-1 label with $z^a_i = 1$ if $i = y^a$ and $z^a_i = 0$ otherwise; $y^a$ is the actual applicable legal provision of the sample; and $P^a_i$ is the $i$-th element of the legal provision probability distribution $P^a$.
The loss function of the charge predictor is $\mathcal{L}_c = -\sum_{j=1}^{N_c} z^c_j \log P^c_j$, where $N_c$ is the number of charges; $z^c_j$ is a 0-1 label with $z^c_j = 1$ if $j = y^c$ and $z^c_j = 0$ otherwise; $y^c$ is the actual charge of the sample; and $P^c_j$ is the $j$-th element of the charge probability distribution $P^c$.
The loss function $\mathcal{L}_p$ of the sentence predictor is computed between the predicted sentence and $y^p$, the actual sentence of the sample.
The total loss function $\mathcal{L}_{pred}$ of the judgment prediction module combines $\mathcal{L}_a$, $\mathcal{L}_c$, and $\mathcal{L}_p$.
5. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the criminal case judgment prediction method based on argument generation according to any one of claims 1 to 4.
6. A computer electronic device, characterized in that it comprises a memory and a processor; the memory is used to store a computer program; and the processor is configured to implement, when executing the computer program, the criminal case judgment prediction method based on argument generation according to any one of claims 1 to 4.
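The computations named in claims 1 to 4 can be illustrated with a minimal pure-Python sketch on toy vectors, assuming the encoder hidden states are already available. It shows an attention step that builds a softmax distribution over the argument elements for each fact element and returns the weighted sum, the element-wise mean, a fully connected layer followed by softmax and argmax for the legal provision and charge predictors, a single-output fully connected layer followed by rounding for the sentence predictor, and the teacher-forced negative log-likelihood and one-hot cross-entropy losses. This is not the claimed implementation: the parameter-free dot-product score (the claims use a learned score), the averaging over generation steps, the summation inside each module's loss, and every function name (`att`, `fc`, `predict_label`, `framework_loss`, and so on) are hypothetical. The sentence-predictor loss is omitted because its form is not given above.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def att(x: List[Vector], y: List[Vector]) -> List[Vector]:
    """For each x_i: alpha_i = softmax over y of score(x_i, y_j), then
    x_hat_i = sum_j alpha_ij * y_j.
    Assumption: a parameter-free dot product stands in for the learned score."""
    dim = len(y[0])
    out = []
    for x_i in x:
        alpha_i = softmax([dot(x_i, y_j) for y_j in y])
        out.append([sum(a * y_j[k] for a, y_j in zip(alpha_i, y)) for k in range(dim)])
    return out


def mean(seq: List[Vector]) -> Vector:
    """Element-wise mean of a sequence of vectors."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def fc(h: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer W h + b (one row of W per output unit)."""
    return [dot(w, h) + b for w, b in zip(weights, bias)]


def predict_label(h: Vector, weights: List[Vector], bias: Vector):
    """Legal provision / charge predictor: FC + softmax, then argmax."""
    probs = softmax(fc(h, weights, bias))
    return max(range(len(probs)), key=probs.__getitem__), probs


def predict_sentence(h: Vector, weights: Vector, bias: float) -> int:
    """Sentence predictor: single-output FC, then rounding.
    Note: Python's round() sends exact halves to the nearest even integer."""
    return round(fc(h, [weights], [bias])[0])


def generation_loss(step_probs: Sequence[Sequence[float]], gold_ids: Sequence[int]) -> float:
    """Teacher-forced NLL of one argument: -log P_t(gold word at step t).
    Averaging over the steps is an assumption."""
    losses = [-math.log(p[w]) for p, w in zip(step_probs, gold_ids)]
    return sum(losses) / len(losses)


def cross_entropy(probs: Sequence[float], gold: int) -> float:
    """-sum_i z_i log p_i with a one-hot label z (z_i = 1 iff i == gold).
    Terms with z_i = 0 are skipped so a zero probability there is harmless."""
    z = [1.0 if i == gold else 0.0 for i in range(len(probs))]
    return -sum(z_i * math.log(p_i) for z_i, p_i in zip(z, probs) if z_i)


def framework_loss(l_conv: float, l_sent: float, l_law: float,
                   l_charge: float, l_term: float) -> float:
    """Total loss = generation-module loss + prediction-module loss (claim 2).
    Summing the terms inside each module is an assumption."""
    return (l_conv + l_sent) + (l_law + l_charge + l_term)
```

As a usage example, `mean(att(H_F, H_C))` with toy lists of fact and argument vectors yields one aggregated vector that can be passed to `predict_label`; in a trained system the weights would come from the jointly trained modules rather than being set by hand.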