A case search method, device, equipment and medium

By constructing an event graph and comparing feature vectors, the problems of slow speed and low accuracy in existing case retrieval technologies are solved, achieving faster and more accurate case retrieval.

CN116501839B · Active · Publication Date: 2026-06-16 · SOUTHWEST UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SOUTHWEST UNIV
Filing Date
2023-05-08
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing case retrieval technologies suffer from slow retrieval speed and low accuracy, mainly due to the large number of cases and the lack of targeted information extraction methods.

Method used

By acquiring a dataset of legal case samples, events are extracted based on preset candidate event types and arguments, an event graph is constructed, and case retrieval is performed through branching and feature vector comparison of the event graph, with information extracted for different crimes.

Benefits of technology

It improved the retrieval speed and accuracy of similar cases, and enhanced the precision of extracting key case information.

Abstract

The application relates to the technical field of artificial intelligence, and provides a case search method and device, equipment and a medium. The method comprises the following steps: acquiring a legal case sample data set and to-be-searched legal case data; performing event extraction on the legal case sample data set according to a plurality of preset candidate event types and a plurality of candidate argument types corresponding to each candidate event type, to obtain first event information, wherein the candidate event type is used to indicate a charge of the legal case; constructing an event graph according to the first event information; performing event extraction on the to-be-searched legal case data to obtain second event information; comparing branches and feature vectors of the event graph according to the second event information, and calling similar legal cases in the legal case sample data set corresponding to the to-be-searched legal case data according to a comparison result. Through the event type used to indicate the charge and the branches and feature vectors of the event graph, the accuracy of case search is improved.

Description

Technical Field

[0001] This application relates to the field of artificial intelligence technology, specifically to a case retrieval method, apparatus, device, and medium.

Background Technology

[0002] Case retrieval refers to the search and research of similar cases for legal cases. Similar cases are those that are similar to the pending case in terms of basic facts, points of contention, and applicable laws, and have already been adjudicated by a people's court. With the continuous development of the internet, big data, artificial intelligence, and the public availability of court judgments and trial videos, various case retrieval systems allow parties and judges to find cases roughly similar to their own and compare the judgments.

[0003] In existing case retrieval technologies, key information corresponding to cases with different charges is extracted using a uniform information extraction method, and then this key information is stored in a database for retrieval. However, due to the large number of cases, too much key information will lead to slow retrieval speed and low retrieval accuracy.

Summary of the Invention

[0004] In view of the shortcomings of the prior art described above, the purpose of this application is to provide a case retrieval method, apparatus, device and medium to solve the problems of slow case retrieval speed and low retrieval accuracy in the prior art.

[0005] To achieve the above and other related objectives, this application provides a case retrieval method, the method comprising:

[0006] Obtain a sample dataset of legal cases and the data of legal cases to be retrieved;

[0007] Based on a set of multiple candidate event types and multiple candidate arguments corresponding to each candidate event type, event extraction is performed on the legal case sample dataset to obtain first event information, wherein the candidate event type is used to indicate the crime in the legal case;

[0008] Based on the first event information, construct an event graph;

[0009] Event extraction is performed on the legal case data to be retrieved to obtain second event information;

[0010] The branches and feature vectors of the event graph are compared based on the second event information, and similar legal cases corresponding to the legal case data to be retrieved are called from the legal case sample dataset based on the comparison results.

[0011] In one embodiment of this application, after obtaining the legal case sample dataset and the legal case data to be retrieved, the method further includes:

[0012] Obtain multiple charges from the legal case sample dataset and define each charge as a candidate event type;

[0013] The candidate arguments are obtained based on the event occurrence type corresponding to each of the candidate event types.

[0014] In one embodiment of this application, the step of extracting events from the legal case sample dataset according to a preset plurality of candidate event types and a plurality of candidate arguments corresponding to each candidate event type to obtain first event information includes:

[0015] The legal case sample dataset is transmitted to a pre-trained event extraction model;

[0016] Using the pre-trained event extraction model, based on the multiple candidate event types and the multiple candidate arguments corresponding to each candidate event type, events are extracted from the legal case sample dataset to obtain multiple first event types and multiple first argument information corresponding to each first event type, wherein the multiple first event types and the multiple first argument information corresponding to each first event type are included in the first event information.

[0017] In one embodiment of this application, event extraction is performed on the legal case data to be retrieved to obtain second event information, including:

[0018] The legal case data to be retrieved is transmitted to the pre-trained event extraction model;

[0019] Using the pre-trained event extraction model, based on the multiple candidate event types and the multiple candidate arguments corresponding to each candidate event type, event extraction is performed on the legal case to be retrieved to obtain a second event type and multiple second argument information corresponding to the second event type. The second event type and the multiple second argument information corresponding to the second event type are included in the second event information.

[0020] In one embodiment of this application, before transmitting the legal case sample dataset to the pre-trained event extraction model, the process includes:

[0021] Obtain a dataset of training samples of legal cases with labels, the labels being used to indicate the event type and argument type of the training samples of legal cases;

[0022] The training sample dataset of the legal cases is transmitted to a pre-built event extraction model to obtain multiple third event types and multiple third argument information corresponding to each third event type;

[0023] Based on the third event type and the event type in the label, a first loss value is obtained;

[0024] The second loss value is obtained based on the third argument information and the argument type in the label;

[0025] Based on the first loss value and the second loss value, the model parameters of the pre-built event extraction model are updated to obtain the trained event extraction model.
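The two-loss update in [0023] to [0025] can be sketched as follows. This is a minimal illustration only: the source does not state which loss functions are used or how the two values are combined, so cross-entropy and a weighted sum are assumptions, and the names `cross_entropy`, `joint_loss`, and `arg_weight` are hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the labeled class (assumed loss form)."""
    return -math.log(probs[target_index])


def joint_loss(event_probs, event_label, arg_probs_list, arg_labels, arg_weight=1.0):
    # First loss value: predicted event type vs. the event type in the label.
    first_loss = cross_entropy(event_probs, event_label)
    # Second loss value: predicted argument types vs. the argument types in
    # the label, averaged over all argument positions.
    second_loss = sum(
        cross_entropy(p, y) for p, y in zip(arg_probs_list, arg_labels)
    ) / len(arg_labels)
    # Assumed combination: a weighted sum that would drive the parameter update.
    return first_loss + arg_weight * second_loss, first_loss, second_loss


total, first, second = joint_loss(
    event_probs=[0.7, 0.2, 0.1], event_label=0,
    arg_probs_list=[[0.9, 0.1], [0.4, 0.6]], arg_labels=[0, 1],
)
```

In a real training loop the combined value would be passed to an optimizer; that step is omitted here because the source does not describe the model architecture.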

[0026] In one embodiment of this application, constructing an event graph based on the first event information includes:

[0027] Based on the multiple first event types, construct multiple branches of the event graph;

[0028] The event graph is obtained by taking multiple legal case samples corresponding to each of the first event types as multiple nodes of each branch and generating a first feature vector based on the multiple first argument information corresponding to each node.
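The construction in [0027] and [0028] can be pictured as a mapping from event type (branch) to a list of case nodes, each carrying a feature vector. The sketch below is an assumption-laden illustration: the source does not say how the first feature vector is computed, so a toy "category filled or not" encoding is used, and `argument_vector`, `build_event_graph`, and the sample `schema` are hypothetical names.

```python
from collections import defaultdict


def argument_vector(arguments, categories):
    """Toy feature vector: 1.0 if the argument category is filled, else 0.0.
    A real system would more likely embed the argument content."""
    return [1.0 if arguments.get(c) else 0.0 for c in categories]


def build_event_graph(first_event_info, schema):
    """Map each event type (branch) to a list of (case_id, vector) nodes."""
    graph = defaultdict(list)
    for case_id, event_type, arguments in first_event_info:
        vector = argument_vector(arguments, schema[event_type])
        graph[event_type].append((case_id, vector))
    return dict(graph)


schema = {
    "arson": ["arsonist", "victim", "property_loss"],
    "traffic_accident": ["defendant", "victim", "location"],
}
first_event_info = [
    ("case-1", "arson", {"arsonist": "Zhao", "victim": "Chen"}),
    ("case-2", "traffic_accident",
     {"defendant": "Lou", "victim": "Chu", "location": "intersection"}),
    ("case-3", "arson", {"arsonist": "Zhang San", "property_loss": "1000 yuan"}),
]
graph = build_event_graph(first_event_info, schema)
```

Grouping nodes by branch is what allows a later query to skip every case whose charge differs from the query's charge.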

[0029] In one embodiment of this application, the branches and feature vectors of the event graph are compared based on the second event information, and similar legal cases corresponding to the legal case data to be retrieved are called from the legal case sample dataset based on the comparison results, including:

[0030] The search path is obtained by calling the branch in the event graph that matches the second event type;

[0031] Determine the vector similarity between the first feature vector corresponding to each candidate node in the retrieval path and the second feature vector corresponding to the second argument information. If the vector similarity is greater than or equal to a preset vector similarity threshold, then the candidate node is taken as the target node.

[0032] The similar legal cases are obtained by calling the legal cases corresponding to the target node in the legal case sample dataset.
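The retrieval steps in [0030] to [0032] can be sketched as below. Cosine similarity is an assumption, since the source only speaks of "vector similarity"; the function names, the sample graph, and the threshold value are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_similar(graph, second_event_type, second_vector, threshold):
    # Retrieval path: only the branch that matches the query's event type.
    path = graph.get(second_event_type, [])
    # Target nodes: candidates whose similarity meets the preset threshold.
    return [
        case_id
        for case_id, first_vector in path
        if cosine_similarity(first_vector, second_vector) >= threshold
    ]


graph = {
    "arson": [("case-1", [1.0, 1.0, 0.0]), ("case-3", [0.0, 0.0, 1.0])],
    "traffic_accident": [("case-2", [1.0, 1.0, 1.0])],
}
similar = retrieve_similar(graph, "arson", [1.0, 1.0, 0.0], threshold=0.8)
```

Because only one branch is scanned, the number of vector comparisons depends on the number of cases with the same charge rather than on the size of the whole dataset.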

[0033] In one embodiment of this application, a case retrieval device is also provided, the device comprising:

[0034] The data acquisition module is used to acquire sample datasets of legal cases and data of legal cases to be retrieved;

[0035] The first event extraction module is used to extract events from the legal case sample dataset according to a preset number of candidate event types and a number of candidate arguments corresponding to each candidate event type, so as to obtain first event information, wherein the candidate event types are used to indicate the crime in the legal case;

[0036] The event graph construction module is used to construct an event graph based on the first event information;

[0037] The second event extraction module is used to extract events from the legal case data to be retrieved to obtain second event information.

[0038] The case retrieval module is used to compare the branches and feature vectors of the event graph based on the second event information, and to call similar legal cases in the legal case sample dataset that correspond to the legal case data to be retrieved based on the comparison results.

[0039] In one embodiment of this application, an electronic device is also provided, the electronic device comprising:

[0040] One or more processors;

[0041] A storage device for storing one or more programs, which, when executed by one or more processors, cause the electronic device to implement the case retrieval method as described above.

[0042] In one embodiment of this application, a computer-readable storage medium is also provided, on which a computer program is stored, which, when executed by a computer's processor, causes the computer to perform the case retrieval method as described above.

[0043] The beneficial effects of this invention are:

[0044] First, a sample dataset of legal cases and data of legal cases to be retrieved are acquired. Then, based on multiple preset candidate event types and multiple candidate arguments corresponding to each candidate event type, events are extracted from the sample dataset of legal cases to obtain first event information, wherein the candidate event types are used to indicate the charges in the legal cases. Next, an event graph is constructed based on the first event information. Then, events are extracted from the data of legal cases to be retrieved to obtain second event information. Finally, the branches and feature vectors of the event graph are compared based on the second event information, and similar legal cases corresponding to the data of legal cases to be retrieved are retrieved from the sample dataset of legal cases based on the comparison results. In this invention, after acquiring the event information corresponding to the sample data of legal cases, an event graph is constructed. Then, by using the branches and feature vectors of the event graph to perform case retrieval of the event information corresponding to the data of legal cases to be retrieved, the retrieval speed and accuracy of case retrieval can be improved. In addition, by extracting events using event types that indicate the charges in the legal cases and arguments corresponding to each event type, arguments corresponding to different charges can be obtained, improving the accuracy of extracting key case information.

[0045] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.

Attached Figure Description

[0046] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application. It is obvious that the drawings described below are merely some embodiments of this application, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort. In the drawings:

[0047] Figure 1 is a schematic diagram illustrating the implementation environment of the case retrieval method in an exemplary embodiment of this application;

[0048] Figure 2 is a schematic flowchart illustrating a case retrieval method according to an exemplary embodiment of this application;

[0049] Figure 3 is a schematic diagram of an event graph structure shown in an exemplary embodiment of this application;

[0050] Figure 4 is a block diagram illustrating a case retrieval device in an exemplary embodiment of this application;

[0051] Figure 5 is a schematic diagram of the structure of a computer system suitable for implementing an electronic device according to an embodiment of this application.

Detailed Implementation

[0052] The embodiments of the present invention will be described below with reference to the accompanying drawings and preferred embodiments. Those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other different specific embodiments, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present invention. It should be understood that the preferred embodiments are only for illustrating the present invention and not for limiting the scope of protection of the present invention.

[0053] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Therefore, the drawings only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the form, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.

[0054] In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the invention. However, it will be apparent to those skilled in the art that embodiments of the invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail to avoid obscuring embodiments of the invention.

[0055] First, it should be noted that when judges or parties involved in legal cases need to make detailed and accurate judgments on current cases, they need to retrieve similar cases from the list of adjudicated cases, and then use these similar cases to determine the charges, compensation, settlements, and other relevant aspects. Existing case retrieval systems often employ text similarity calculations, machine learning, and natural language processing for this purpose.

[0056] Text similarity calculation is a widely used technique in case retrieval. It identifies similar cases by calculating the similarity between two legal cases. Commonly used text similarity calculation methods include string-based distance algorithms, such as Levenshtein Distance (edit distance) and Jaccard similarity coefficient (used to compare the similarity and differences between finite sample sets). Algorithms such as the Dice coefficient (used to calculate similarity between simple sets) can calculate the similarity between two texts and are commonly used for text matching and similarity calculation. Semantic-based text similarity calculation methods, such as Word2Vec (a language model), FastText (a word vector calculation and text classification tool), and BERT (Bidirectional Encoder Representation from Transformers, a pre-trained language representation model), perform semantic modeling of texts to calculate similarity between them and are commonly used for tasks such as natural language processing and text classification.
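For reference, the three string-based and set-based measures named above can be computed as in the following sketch. These are the standard textbook definitions; the whitespace tokenization and the sample sentences are illustrative assumptions, not taken from the source.

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def jaccard(x, y):
    """|X ∩ Y| / |X ∪ Y| over the two token sets."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y) if x | y else 1.0


def dice(x, y):
    """2 |X ∩ Y| / (|X| + |Y|) over the two token sets."""
    x, y = set(x), set(y)
    return 2 * len(x & y) / (len(x) + len(y)) if x or y else 1.0


tokens_a = "defendant set fire to the house".split()
tokens_b = "defendant set fire to a bungalow".split()
jaccard_score = jaccard(tokens_a, tokens_b)
dice_score = dice(tokens_a, tokens_b)
```

All three operate on surface form only, which is why the semantic methods listed in the same paragraph are used when wording differs but meaning does not.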

[0057] Machine learning is also a commonly used technique for case retrieval. It can classify and cluster legal cases to identify similar cases. Machine learning techniques can be trained using features from legal cases to build models that identify similar cases. Commonly used machine learning algorithms include decision trees, support vector machines, Naive Bayes, and neural networks.

[0058] Natural Language Processing (NLP) is a technology used to process natural language text. It can analyze legal case texts to identify relevant information and case relationships. NLP techniques include text segmentation, part-of-speech tagging, named entity recognition, syntactic analysis, and semantic analysis.

[0059] When performing case retrieval using the aforementioned methods of text similarity calculation, machine learning, and natural language processing, all case information is often processed directly without extracting information specific to different crimes, resulting in low accuracy in information extraction. Furthermore, after extracting case information, it is often directly stored in a database for retrieval. However, due to the sheer volume and complexity of legal cases, this leads to slow retrieval speeds and low accuracy.

[0060] Chinese patent CN114547245A proposes a method for case retrieval by extracting key case information and calculating its similarity. However, the method used in this patent to extract legal elements is a regular expression. It uses a uniform approach to extract legal elements for cases with different charges, including charges, criminal acts, types of persons, criminal consequences, compensation acts, and settlement situations. This method may not achieve satisfactory matching results for complex cases with different charges.

[0061] The following explains the technical terms used in this application:

[0062] Event extraction: Event extraction is a task that extracts event information from text. Its goal is to identify information such as event type, participants, time, and location in the text. Event extraction relies on entity extraction and relation extraction, and it is more difficult than entity extraction and relation extraction.

[0063] An event graph is a method for representing events graphically. It represents factors such as participants, actions, and time in an event as nodes and edges in a graph, thereby enabling a deep understanding and analysis of the event. Unlike traditional knowledge graphs, event graphs primarily focus on the process and dynamic characteristics of events. By abstracting various factors in an event into nodes and edges, they form a dynamically changing graphical structure that better reflects the essence of the event.

[0064] Figure 1 is a schematic diagram illustrating the implementation environment of the case retrieval method in an exemplary embodiment of this application.

[0065] As shown in Figure 1, the implementation environment may include a case retrieval terminal 101, a server 102, and an interactive terminal 103. The technical solution provided in this embodiment can be applied to the case retrieval terminal 101. The case retrieval terminal 101 obtains a sample dataset of legal cases from the server 102 to construct an event graph. It then obtains, from the interactive terminal 103, the legal case data to be retrieved as input by the judge or a party to the legal case, and completes the case retrieval. Finally, it returns the retrieved similar legal cases to the judge or party through the interactive terminal 103, or stores them in the server 102.

[0066] The case retrieval terminal 101 shown in Figure 1 can be any terminal device that supports data acquisition and data processing, such as, but not limited to, a smartphone, in-vehicle computer, tablet computer, or laptop computer. The server 102 shown in Figure 1 can be, for example, a standalone physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms; no limitation is placed on this here. The case retrieval terminal 101 can communicate with the interactive terminal 103 and the server 102 via wireless networks such as 3G (third-generation mobile communication technology), 4G (fourth-generation mobile communication technology), and 5G (fifth-generation mobile communication technology); no limitation is placed on this either.

[0067] In one embodiment of this application, a case retrieval terminal 101 acquires a legal case sample dataset and legal case data to be retrieved; based on multiple preset candidate event types and multiple candidate arguments corresponding to each candidate event type, it extracts events from the legal case sample dataset to obtain first event information, wherein the candidate event types are used to indicate the crime of the legal case; based on the first event information, it constructs an event graph; it extracts events from the legal case data to be retrieved to obtain second event information; it compares the branches and feature vectors of the event graph based on the second event information, and calls similar legal cases in the legal case sample dataset corresponding to the legal case data to be retrieved based on the comparison results. In this embodiment, after acquiring the event information corresponding to the legal case sample data, an event graph is constructed, and then case retrieval is performed on the event information corresponding to the legal case data to be retrieved through the branches and feature vectors of the event graph, which can improve the retrieval speed and accuracy of case retrieval; in addition, by extracting events through event types used to indicate the crime of the legal case and arguments corresponding to each event type, arguments corresponding to different crimes can be obtained, improving the accuracy of extracting key case information.

[0068] The above sections introduced exemplary implementation environments for applying the technical solutions of this application. Next, we will continue to introduce the case retrieval method of this application.

[0069] To address the problems of slow search speed and low search accuracy in existing technologies, embodiments of this application propose a method for searching similar cases, a device for searching similar cases, an electronic device, a computer-readable storage medium, and a computer program product. These embodiments will be described in detail below.

[0070] Please refer to Figure 2, which is a schematic flowchart illustrating a case retrieval method according to an exemplary embodiment of this application. This method can be applied to the implementation environment shown in Figure 1. It should be understood that this method can also be applied to other exemplary implementation environments and specifically executed by devices in other implementation environments. This embodiment does not limit the implementation environment to which the method is applicable.

[0071] As shown in Figure 2, in an exemplary embodiment, the case retrieval method includes at least steps S210 to S250, which are described in detail below:

[0072] In step S210, a legal case sample dataset and legal case data to be retrieved are obtained.

[0073] For example, a legal case sample dataset pre-stored on the server 102 shown in Figure 1 is obtained. In this embodiment, the sample dataset includes multiple legal cases corresponding to various crimes. Additionally, the legal case data to be retrieved is obtained from the interactive terminal 103. In this embodiment, the legal case data to be retrieved may be, for example, legal case data corresponding to one crime, or legal case data corresponding to multiple crimes; this application does not limit this.

[0074] In step S220, events are extracted from the legal case sample dataset according to a number of preset candidate event types and a number of candidate arguments corresponding to each candidate event type to obtain first event information, wherein the candidate event type is used to indicate the crime in the legal case.

[0075] It should be noted that the event type is used to indicate the crime in a legal case, such as arson, traffic accident, or intentional injury. Arguments are the basic components of an event and can be understood as key information about the event. For example, if an event type is arson, its corresponding arguments include the arsonist, the victim, property damage, motive, time of arson, location of arson, and arson weapon. In this embodiment, by extracting events from a sample dataset of legal cases, the corresponding first event information can be obtained. The first event information includes the first event types of multiple legal cases and the first argument information corresponding to each event type. The first argument information may include argument category and argument content.

[0076] For example, by extracting events from a legal case sample dataset using candidate event types and candidate arguments, the first event types include arson and traffic accident. Arson corresponds to three argument categories: arsonist, victim, and property damage. The arsonist argument category includes arguments such as "Zhang San," "Li Si," and "Wang Wu," while the property damage argument category includes arguments such as "1000 yuan," "2000 yuan," and "3000 yuan." In this embodiment, different crimes in legal cases are associated with event types, and then events are extracted from legal cases using multiple arguments corresponding to each event type. This makes the arguments more targeted to the crimes, improving the accuracy of extracting key information from legal cases.
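One way to hold the event information described in [0075] and [0076] (an event type plus argument information made up of argument category and argument content) is sketched below. The class and field names are hypothetical; the source does not prescribe a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    category: str  # argument category, e.g. "arsonist"
    content: str   # argument content, e.g. "Zhang San"


@dataclass
class EventInfo:
    case_id: str
    event_type: str  # the charge, e.g. "arson"
    arguments: list = field(default_factory=list)

    def by_category(self, category):
        """Return the contents of all arguments in the given category."""
        return [a.content for a in self.arguments if a.category == category]


event = EventInfo(
    case_id="sample-1",
    event_type="arson",
    arguments=[
        Argument("arsonist", "Zhang San"),
        Argument("property damage", "1000 yuan"),
    ],
)
```

Keeping category and content separate means the same record type can serve both the first event information (sample cases) and the second event information (the query case).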

[0077] In step S230, an event graph is constructed based on the first event information.

[0078] For example, based on the event types of multiple legal cases in the first event information and the argument information corresponding to each event type, the branches, nodes and feature vectors of the event graph corresponding to the legal case sample dataset are constructed. This event graph is used for subsequent case retrieval of the legal cases to be retrieved.

[0079] In step S240, event extraction is performed on the legal case data to be retrieved to obtain second event information.

[0080] For example, when performing event extraction on the legal case to be retrieved, if the legal case to be retrieved is a single case, then the second event type of the case and multiple second argument information corresponding to the event type are obtained; if the legal case to be retrieved is multiple cases, then multiple second event types of the multiple cases and multiple second argument information corresponding to each event type are obtained. The second argument information includes argument category and argument content.

[0081] In step S250, the branches and feature vectors of the event graph are compared according to the second event information, and similar legal cases corresponding to the legal case data to be retrieved are called from the legal case sample dataset according to the comparison results.

[0082] For example, based on the second event type in the second event information, multiple legal case sample data with the same event type as the legal case to be retrieved are selected from the event graph. Then, the feature vector formed by the second argument information is compared with the feature vectors in the event graph, and similar legal cases are selected from the multiple legal case sample data based on the vector similarity.

[0083] As can be seen from steps S210 to S250 above, the solution proposed in this embodiment constructs an event graph after obtaining the event information corresponding to the sample data of legal cases, and then performs case retrieval on the event information corresponding to the legal case data to be retrieved through the branches and feature vectors of the event graph, which can improve the retrieval speed and accuracy of case retrieval; in addition, by extracting events through event types used to indicate the crimes of legal cases and the arguments corresponding to each event type, arguments corresponding to different crimes can be obtained, which improves the accuracy of extracting key information of cases.

[0084] In one embodiment of this application, after the legal case sample dataset and the legal case data to be retrieved are obtained in step S210 shown in Figure 2, the method further includes:

[0085] Obtain multiple charges from the legal case sample dataset and define each charge as a candidate event type;

[0086] The candidate arguments are obtained based on the event occurrence type corresponding to each of the candidate event types.

[0087] For example, after obtaining a dataset of legal case samples, the charges for each legal case sample are selected, such as arson, intentional injury, and traffic accident. The multiple charges are defined as candidate event types, and a set of candidate event types T = {t1, t2, t3, ..., tn} is generated. For each candidate event type, the key event information is defined as candidate arguments, such as people, time, and amount.

[0088] For example, a candidate event type set T = {t1, t2, t3, ..., tn} is obtained based on multiple charges in a legal case sample dataset, where t1 is the crime of arson, and the set of arguments corresponding to the crime of arson is A1 = {a1 (arsonist), a2 (victim), a3 (property loss), a4 (motive), a5 (time of arson), a6 (location of arson), a7 (arson tool)}.
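The candidate set T and the per-type argument set A1 could be held in a simple lookup table, as in the sketch below. The arson entry follows the list given above; the traffic accident entry and all identifiers (`CANDIDATE_SCHEMA`, `candidate_event_types`, `candidate_arguments`) are illustrative assumptions.

```python
# Each key is a candidate event type t_i; each value is its argument set A_i.
CANDIDATE_SCHEMA = {
    "arson": [
        "arsonist", "victim", "property loss", "motive",
        "time of arson", "location of arson", "arson tool",
    ],
    # Assumed example entry; the source does not define this set explicitly.
    "traffic accident": [
        "defendant", "victim", "time of accident", "location of accident",
    ],
}


def candidate_event_types(schema):
    """Return the set T of candidate event types, in definition order."""
    return list(schema)


def candidate_arguments(schema, event_type):
    """Return the argument set for one event type; KeyError if it is unknown."""
    return schema[event_type]


arson_arguments = candidate_arguments(CANDIDATE_SCHEMA, "arson")
```

Adding a new charge then amounts to adding one entry, without touching the extraction or retrieval logic.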

[0089] In one embodiment of this application, event extraction is performed on the legal case sample dataset according to a plurality of preset candidate event types and a plurality of candidate arguments corresponding to each candidate event type to obtain first event information, including:

[0090] The legal case sample dataset is transmitted to a pre-trained event extraction model;

[0091] Using the pre-trained event extraction model, based on the multiple candidate event types and the multiple candidate arguments corresponding to each candidate event type, events are extracted from the legal case sample dataset to obtain multiple first event types and multiple first argument information corresponding to each first event type, wherein the multiple first event types and the multiple first argument information corresponding to each first event type are included in the first event information.

[0092] For example, using a pre-trained event extraction model, the first event type of the first legal case sample data is arson. The argument information for this arson case includes: arsonist – Zhao; victim – Chen; property damage – some items inside the house; motive – emotional dispute; time of arson – March 25, 2016; location of arson – Chen's bungalow; arson tool – lighter. Using the same pre-trained event extraction model, the first event type of the second legal case sample data is traffic accident. The argument information for this traffic accident case includes: defendant – Lou; victim – Chu; time of accident – approximately 3:40 PM on August 28, 2010; location of accident – intersection of South Fourth Ring Road and Tiesanguanmiao Road in this city; degree of injury – second-degree serious injury; method of arrest – surrender; expenses paid – 287,431.2 yuan; vehicle type – small sedan. In this embodiment, event types are generated for different crimes, and arguments are then extracted for each event type. Compared with existing technology that extracts uniform information for cases with different crimes, matching arguments to each event type improves the accuracy of extracting key information in legal cases.
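
The two extraction results above can be written down as records to show how the argument roles differ by event type. This is an illustrative sketch only: the `EventInfo` class and the `shared_roles` helper are hypothetical names, and the values are copied from the example rather than produced by a model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EventInfo:
    """Hypothetical record for the event information extracted from one case."""
    event_type: str  # the charge, e.g. "arson"
    arguments: Dict[str, str] = field(default_factory=dict)  # role -> extracted text


# Values copied from the two examples in the text.
arson_case = EventInfo(
    event_type="arson",
    arguments={
        "arsonist": "Zhao",
        "victim": "Chen",
        "property damage": "some items inside the house",
        "motive": "emotional dispute",
        "time of arson": "March 25, 2016",
        "location of arson": "Chen's bungalow",
        "arson tool": "lighter",
    },
)

traffic_case = EventInfo(
    event_type="traffic accident",
    arguments={
        "defendant": "Lou",
        "victim": "Chu",
        "time of accident": "approximately 3:40 PM on August 28, 2010",
        "location of accident": "intersection of South Fourth Ring Road "
                                "and Tiesanguanmiao Road",
        "degree of injury": "second-degree serious injury",
        "method of arrest": "surrender",
        "expenses paid": "287,431.2 yuan",
        "vehicle type": "small sedan",
    },
)


def shared_roles(a, b):
    """Return the argument roles that two extracted events have in common."""
    return sorted(set(a.arguments) & set(b.arguments))
```

In this example the two cases share only the `victim` role, which illustrates why a single uniform extraction template would miss most of the key information for each crime.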

[0093] In one embodiment of this application, event extraction is performed on the legal case data to be retrieved to obtain second event information, including:

[0094] The legal case data to be retrieved is transmitted to the pre-trained event extraction model;

[0095] Using the pre-trained event extraction model, based on the multiple candidate event types and the multiple candidate arguments corresponding to each candidate event type, event extraction is performed on the legal case to be retrieved to obtain a second event type and multiple second argument information corresponding to the second event type. The second event type and the multiple second argument information corresponding to the second event type are included in the second event information.

[0096] For example, the legal case data to be retrieved is transmitted to the pre-trained event extraction model to obtain the event type t_q and the argument set A_q corresponding to the legal case data to be retrieved.

[0097] In one embodiment of this application, before transmitting the legal case sample dataset to a pre-trained event extraction model, the following steps are included:

[0098] Obtain a dataset of training samples of legal cases with labels, the labels being used to indicate the event type and argument type of the training samples of legal cases;

[0099] The training sample dataset of the legal cases is transmitted to a pre-built event extraction model to obtain multiple third event types and multiple third argument information corresponding to each third event type;

[0100] Based on the third event type and the event type in the label, a first loss value is obtained;

[0101] The second loss value is obtained based on the third argument information and the argument type in the label;

[0102] Based on the first loss value and the second loss value, the model parameters of the pre-built event extraction model are updated to obtain the trained event extraction model.
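
The training steps above can be sketched as a two-part loss. The application states only that the model parameters are updated based on the first and second loss values; the sketch below assumes, for illustration, that both losses are cross-entropy losses and that they are combined as a weighted sum. The function names and the `arg_weight` parameter are hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the labelled class under predicted probabilities."""
    return -math.log(probs[target_index])


def joint_loss(event_probs, event_label, arg_probs, arg_labels, arg_weight=1.0):
    """Combine the event-type loss and the argument loss (assumed weighted sum).

    event_probs: predicted distribution over candidate event types
    event_label: index of the labelled event type
    arg_probs:   one predicted distribution per argument position
    arg_labels:  index of the labelled argument type per position
    """
    # First loss value: predicted (third) event type vs. event type in the label.
    first_loss = cross_entropy(event_probs, event_label)
    # Second loss value: predicted (third) argument information vs. argument
    # types in the label, averaged over positions.
    second_loss = sum(
        cross_entropy(p, y) for p, y in zip(arg_probs, arg_labels)
    ) / len(arg_labels)
    total = first_loss + arg_weight * second_loss
    return first_loss, second_loss, total
```

In an actual training loop, the combined value would be minimized with a gradient-based optimizer; that part depends on the model architecture, which the application does not specify.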

[0103] In one embodiment of this application, constructing an event graph based on the first event information includes:

[0104] Based on the multiple first event types, construct multiple branches of the event graph;

[0105] The event graph is obtained by taking multiple legal case samples corresponding to each of the first event types as multiple nodes of each branch and generating a first feature vector based on the multiple first argument information corresponding to each node.

[0106] For example, see Figure 3, which is a schematic diagram of an event graph structure according to an exemplary embodiment of this application. The dataset of legal case samples is classified by crime, and multiple candidate event types are defined. For example, arson and bombing can be considered branches of crimes endangering public safety, and theft and fraud can be considered branches of crimes against property. For arson, the corresponding legal case samples are used as multiple nodes of a branch, and a first feature vector is generated based on the multiple first argument information corresponding to arson. For example, the first-instance criminal judgment of Zhao for arson is used as a node of the arson branch, and the first feature vector corresponding to that node is obtained based on the arson time, arson tool, arson location, arsonist, victim, property damage, and motive.
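
The branch-and-node structure described above can be sketched as a dictionary keyed by event type. The application does not specify how the first feature vector is computed, so the sketch substitutes a toy hashed bag-of-words embedding purely as a stand-in; `feature_vector`, `build_event_graph`, and the `dim` parameter are hypothetical names.

```python
import zlib
from collections import defaultdict


def feature_vector(arguments, dim=16):
    """Toy stand-in for the first feature vector.

    Hashes each token of every role and value into one of `dim` buckets.
    This is an assumption for illustration, not the method of the source.
    """
    vec = [0.0] * dim
    for role, value in arguments.items():
        for token in f"{role} {value}".lower().split():
            vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def build_event_graph(events):
    """Build {event_type: [node, ...]} from (case_id, event_type, arguments) tuples.

    Each event type becomes a branch; each case becomes a node on the branch
    of its event type and carries its own feature vector.
    """
    graph = defaultdict(list)
    for case_id, event_type, arguments in events:
        graph[event_type].append(
            {"case_id": case_id, "vector": feature_vector(arguments)}
        )
    return dict(graph)
```

A real system would likely replace the hashed embedding with a learned text encoder; the graph structure itself does not depend on that choice.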

[0107] It should be noted that in this embodiment of the application, each case in the knowledge graph is represented as a vector, which can retain more semantic information about entities and relationships.

[0108] In one embodiment of this application, the branches and feature vectors of the event graph are compared based on the second event information, and similar legal cases corresponding to the legal case data to be retrieved are called from the legal case sample dataset based on the comparison results, including:

[0109] The search path is obtained by calling the branch in the event graph that matches the second event type;

[0110] Determine the vector similarity between the first feature vector corresponding to each candidate node in the retrieval path and the second feature vector corresponding to the second argument information. If the vector similarity is greater than or equal to a preset vector similarity threshold, then the candidate node is taken as the target node.

[0111] The similar legal cases are obtained by calling the legal cases corresponding to the target node in the legal case sample dataset.

[0112] For example, if the event type t_q corresponding to the legal case data to be retrieved is arson, the arson branch shown in Figure 3 is selected as the search path, and the vector similarity between the second feature vector corresponding to the argument set A_q of t_q and the first feature vector of each candidate node in the search path is then determined. If there are k nodes in this retrieval path whose first feature vectors have a vector similarity with the second feature vector of the argument set A_q greater than the preset vector similarity threshold, the legal case samples corresponding to these k nodes are regarded as similar legal cases and are sorted by the magnitude of vector similarity.
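
The retrieval procedure above can be sketched as a branch lookup followed by a thresholded similarity ranking. The application refers only to "vector similarity"; cosine similarity is assumed here for illustration. The function names, the node layout (`case_id`, `vector`), and the default threshold are hypothetical, and the comparison uses "greater than or equal to" as in paragraph [0110].

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_similar(graph, query_type, query_vector, threshold=0.8):
    """Return (case_id, similarity) pairs from the matching branch, best first.

    graph:        {event_type: [{"case_id": ..., "vector": [...]}, ...]}
    query_type:   second event type t_q of the case to be retrieved
    query_vector: second feature vector derived from the argument set A_q
    """
    # The search path is the branch matching the query's event type;
    # nodes on other branches are never compared.
    search_path = graph.get(query_type, [])
    targets = []
    for node in search_path:
        similarity = cosine_similarity(node["vector"], query_vector)
        if similarity >= threshold:
            targets.append((node["case_id"], similarity))
    # Sort target nodes by descending similarity.
    return sorted(targets, key=lambda item: item[1], reverse=True)
```

Because only one branch is scanned, the number of similarity computations is bounded by the size of that branch rather than the size of the whole sample dataset.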

[0113] In this embodiment of the application, the branch with the same event type as the case data to be retrieved is first found through the event graph, and then the feature vector similarity is judged in the branch. It is not necessary to judge the feature vector similarity in the entire database, which not only reduces the cost of similar case retrieval, but also further improves the accuracy of retrieval.

[0114] Figure 4 is a block diagram illustrating a case retrieval device according to an exemplary embodiment of this application. The device can be applied to the implementation environment shown in Figure 1. The device can also be applied to other exemplary implementation environments and specifically configured in other devices; this embodiment does not limit the implementation environment to which the device is applicable.

[0115] As shown in Figure 4, the exemplary case retrieval device includes:

[0116] The data acquisition module 401 is used to acquire a legal case sample dataset and legal case data to be retrieved;

[0117] The first event extraction module 402 is used to extract events from the legal case sample dataset according to multiple preset candidate event types and multiple candidate arguments corresponding to each candidate event type, to obtain first event information, wherein the candidate event types are used to indicate the crime in the legal case;

[0118] The event graph construction module 403 is used to construct an event graph based on the first event information;

[0119] The second event extraction module 404 is used to extract events from the legal case data to be retrieved to obtain second event information;

[0120] The case retrieval module 405 is used to compare the branches and feature vectors of the event graph according to the second event information, and to call similar legal cases in the legal case sample dataset that correspond to the legal case data to be retrieved according to the comparison results.

[0121] In this exemplary case retrieval device, after acquiring event information corresponding to legal case sample data, an event graph is constructed. Then, case retrieval is performed on the event information corresponding to the legal case data to be retrieved through the branches and feature vectors of the event graph, which can improve the retrieval speed and accuracy of case retrieval. In addition, by extracting events using event types that indicate the crimes of legal cases and the arguments corresponding to each event type, arguments corresponding to different crimes can be obtained, which improves the accuracy of extracting key case information.

[0122] It should be noted that the case retrieval device and the case retrieval method provided in the above embodiments belong to the same concept. The specific operation of each module and unit has been described in detail in the method embodiments and will not be repeated here. In practical applications, the functions of the case retrieval device provided in the above embodiments can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. No limitation is imposed here.

[0123] Embodiments of this application also provide an electronic device, including: one or more processors; and a storage device for storing one or more programs, which, when executed by the one or more processors, cause the electronic device to implement the case retrieval methods provided in the above embodiments.

[0124] Figure 5 shows a schematic diagram of a computer system suitable for implementing an electronic device according to an embodiment of this application. It should be noted that the computer system 500 of the electronic device shown in Figure 5 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0125] As shown in Figure 5, the computer system 500 includes a Central Processing Unit (CPU) 501, which can perform various appropriate actions and processes, such as the methods described in the above embodiments, based on programs stored in Read-Only Memory (ROM) 502 or programs loaded from the storage section 508 into Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data required for system operation. The CPU 501, ROM 502, and RAM 503 are interconnected via a bus 504. An Input/Output (I/O) interface 505 is also connected to the bus 504.

[0126] The following components are connected to the I/O interface 505: an input section 506 including a keyboard, mouse, etc.; an output section 507 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN (Local Area Network) card, modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. Removable media 511, such as a magnetic disk, optical disk, magneto-optical disk, semiconductor memory, etc., are installed on the drive 510 as needed so that computer programs read from them can be installed into the storage section 508 as needed.

[0127] Specifically, according to embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program including program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, it performs the various functions defined in the system of this application.

[0128] It should be noted that the computer-readable medium shown in the embodiments of this application can be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying a computer-readable computer program. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media can also be any computer-readable medium other than computer-readable storage media, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to wireless, wired, etc., or any suitable combination thereof.

[0129] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. Each block in a flowchart or block diagram may represent a module, segment, or portion of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0130] The units described in the embodiments of this application can be implemented in software or hardware, and the described units can also be located in a processor. The names of these units do not necessarily limit the specific unit itself.

[0131] Another aspect of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor of a computer, causes the computer to perform the case retrieval method described above. This computer-readable storage medium may be included in the electronic device described in the above embodiments, or it may exist independently without being assembled into the electronic device.

[0132] Another aspect of this application provides a computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the case retrieval method provided in the various embodiments described above.

[0133] The above embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or alter the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or alterations made by those skilled in the art without departing from the spirit and technical concept disclosed in the present invention should still be covered by the claims of the present invention.

Claims

1. A method for retrieving similar cases, characterized in that the method includes: obtaining a legal case sample dataset and legal case data to be retrieved; obtaining multiple charges from the legal case sample dataset and defining each charge as a candidate event type; obtaining candidate arguments based on the event occurrence type corresponding to each candidate event type; performing event extraction on the legal case sample dataset based on multiple preset candidate event types and multiple candidate arguments corresponding to each candidate event type to obtain first event information, wherein the candidate event type is used to indicate the crime in the legal case; constructing an event graph based on the first event information, wherein constructing the event graph based on the first event information includes: constructing multiple branches of the event graph based on multiple first event types; and taking multiple legal case samples corresponding to each first event type as multiple nodes of each branch, and generating a first feature vector based on multiple first argument information corresponding to each node, so as to obtain the event graph based on the multiple branches, multiple nodes and multiple first feature vectors; performing event extraction on the legal case data to be retrieved to obtain second event information; and comparing the branches and feature vectors of the event graph based on the second event information, and calling similar legal cases corresponding to the legal case data to be retrieved from the legal case sample dataset based on the comparison results; wherein comparing the branches and feature vectors of the event graph based on the second event information, and calling similar legal cases corresponding to the legal case data to be retrieved from the legal case sample dataset based on the comparison results, includes: obtaining a search path by calling the branch in the event graph that matches the second event type; determining the vector similarity between the first feature vector corresponding to each candidate node in the search path and the second feature vector corresponding to the second argument information, and, if the vector similarity is greater than or equal to a preset vector similarity threshold, taking the candidate node as a target node; and obtaining the similar legal cases by calling the legal cases corresponding to the target node in the legal case sample dataset.

2. The case retrieval method according to claim 1, characterized in that performing event extraction on the legal case sample dataset based on multiple preset candidate event types and multiple candidate arguments corresponding to each candidate event type to obtain first event information includes: transmitting the legal case sample dataset to a pre-trained event extraction model; and using the pre-trained event extraction model, based on the multiple candidate event types and the multiple candidate arguments corresponding to each candidate event type, to extract events from the legal case sample dataset to obtain multiple first event types and multiple first argument information corresponding to each first event type, wherein the multiple first event types and the multiple first argument information corresponding to each first event type are included in the first event information.

3. The case retrieval method according to claim 2, characterized in that performing event extraction on the legal case data to be retrieved to obtain second event information includes: transmitting the legal case data to be retrieved to the pre-trained event extraction model; and using the pre-trained event extraction model, based on the multiple candidate event types and the multiple candidate arguments corresponding to each candidate event type, to perform event extraction on the legal case data to be retrieved to obtain a second event type and multiple second argument information corresponding to the second event type, wherein the second event type and the multiple second argument information corresponding to the second event type are included in the second event information.

4. The case retrieval method according to claim 2, characterized in that, before transmitting the legal case sample dataset to the pre-trained event extraction model, the method includes: obtaining a labeled training sample dataset of legal cases, the labels being used to indicate the event type and argument type of the legal case training samples; transmitting the training sample dataset of legal cases to a pre-built event extraction model to obtain multiple third event types and multiple third argument information corresponding to each third event type; obtaining a first loss value based on the third event type and the event type in the label; obtaining a second loss value based on the third argument information and the argument type in the label; and updating the model parameters of the pre-built event extraction model based on the first loss value and the second loss value to obtain the trained event extraction model.

5. A case retrieval device based on the case retrieval method according to any one of claims 1 to 4, characterized in that the device includes: a data acquisition module, used to acquire a legal case sample dataset and legal case data to be retrieved; a first event extraction module, used to extract events from the legal case sample dataset according to multiple preset candidate event types and multiple candidate arguments corresponding to each candidate event type, so as to obtain first event information, wherein the candidate event types are used to indicate the crime in the legal case; an event graph construction module, used to construct an event graph based on the first event information; a second event extraction module, used to extract events from the legal case data to be retrieved to obtain second event information; and a case retrieval module, used to compare the branches and feature vectors of the event graph based on the second event information, and to call similar legal cases in the legal case sample dataset that correspond to the legal case data to be retrieved based on the comparison results.

6. An electronic device, characterized in that the electronic device includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the case retrieval method according to any one of claims 1 to 4.

7. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor of a computer, causes the computer to perform the case retrieval method according to any one of claims 1 to 4.