A multi-level agent-based drug dictionary intelligent matching method and system
By combining dedicated vector knowledge bases with a large language model, the method matches drug dictionaries efficiently and accurately, resolving the matching difficulties caused by heterogeneous drug information, reducing the cost of manual intervention, and improving system scalability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHAN DONG MSUN HEALTH TECH GRP CO LTD
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for drug dictionary matching suffer from low matching accuracy, high manual intervention costs, and poor system scalability. In particular, when faced with heterogeneous drug information and inconsistent classifications among different medical institutions, it is difficult to achieve efficient and accurate automated matching.
The method uses multi-level intelligent agents and builds dedicated vector knowledge bases for Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicine. A text embedding model generates semantic vectors, and a large language model performs candidate set optimization and multi-dimensional verification to achieve semantic matching of drugs.
It significantly improves matching accuracy, reduces the cost of manual intervention, and has good scalability, enabling it to quickly adapt to the access needs of new types of drugs, and provides interpretable matching results and traceability.
Description
Technical Field
[0001] This invention relates to the field of medical information technology, specifically to a method and system for intelligent matching of drug dictionaries based on multi-level intelligent agents.
Background Technology
[0002] Drug dictionary matching refers to the process of matching local drug names in different medical institution information systems with a standard drug database. Due to significant heterogeneity in drug information records across medical institutions, drug dictionary matching faces numerous technical challenges. Specifically, the arbitrariness of drug naming leads to multiple representations of the same generic drug in different systems, such as mixing brand names and generic names, missing dosage form information, or abbreviations; the specification field also lacks standardized formatting, with significant differences in unit expressions (e.g., "g" vs. "gram"), packaging specifications (e.g., "×24 tablets" vs. "24 tablets"), and numerical expressions; manufacturer information often appears in abbreviations, full names, or aliases, lacking a standardized manufacturer name database; furthermore, the lack of a unified drug classification system further exacerbates the matching difficulty, as there is overlap in clinical applications for categories such as Western medicine, traditional Chinese medicine, granules, and herbal medicines, and different institutions may classify the same drug inconsistently.
[0003] To address the aforementioned issues, existing technologies primarily rely on three types of methods for drug dictionary matching. The first is rule-based exact matching, which compares fields such as drug name and specifications using complete string matching or regular expressions. This method has extremely high requirements for data standardization and cannot adapt to naming variations and expression differences present in practical applications. The second is keyword-based fuzzy matching, which uses algorithms such as TF-IDF or BM25 to calculate text similarity. While this can alleviate the naming difference problem to some extent, it is limited by the semantic understanding capabilities of the bag-of-words model, making it susceptible to interference from irrelevant keywords and difficult to accurately identify semantic similarities. The third is rule-based classification matching, which classifies drugs using a predefined rule base before matching. This method is not only costly to maintain and unable to cover all drug variations, but also poorly adaptable to the rapid iteration of new drugs.
[0004] In summary, existing technologies have significant shortcomings in terms of matching accuracy, cost of manual intervention, and system scalability, making it difficult to meet the automated matching needs of large-scale, multi-type drug dictionaries.
Summary of the Invention
[0005] To address the shortcomings of existing technologies, this invention proposes a drug dictionary intelligent matching method and system based on multi-level intelligent agents. Dedicated vector knowledge bases and matching logic are designed for Western medicines / traditional Chinese medicine preparations, granules, and traditional Chinese herbs: Western medicines / traditional Chinese medicine preparations are verified based on name, specifications, and manufacturer; granules are verified based on name and manufacturer; and traditional Chinese herbs are verified only by name. Drug semantic vectors are generated through a text embedding model. A candidate set is obtained through vector retrieval and reordering. A large language model then performs candidate set optimization and multi-dimensional parallel or serial verification. Finally, a structured result containing matching identifiers and failure reasons is output, achieving semantic-level intelligent drug matching. This significantly improves matching accuracy and robustness, reduces manual intervention costs, and possesses good scalability, enabling rapid adaptation to the access needs of new drug types.
[0006] In one aspect, a drug dictionary intelligent matching method based on multi-level intelligent agents is provided, including:
Constructing a dedicated vector knowledge base for each type of drug, wherein the types of drugs include at least Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicine;
Receiving information on the drug to be matched, and determining the corresponding matching dimensions based on the drug type of the drug to be matched, wherein the matching dimensions include at least one of drug name, manufacturer, and specifications;
Converting the information of the drug to be matched into a semantic vector using a text embedding model, performing vector retrieval in the dedicated vector knowledge base of the corresponding drug type to obtain an initial candidate drug set, and reordering the initial candidate drug set to generate a candidate drug set;
Using a large language model to select, from the candidate drug set, the standard drug data that is most similar to the drug to be matched;
Performing dimension matching verification between the drug to be matched and the most similar standard drug data according to the matching dimensions; if all matching dimensions pass the verification, the drug to be matched and the standard drug data are determined to be the same drug, and the matching result is output.
[0007] Furthermore, the corresponding matching dimensions are determined based on the type of drug to be matched, specifically:
If the drug type is Western medicine / Chinese patent medicine, the matching dimensions include three dimensions: drug name, manufacturer, and specifications, and the dimension matching verification is performed in parallel.
If the drug type is formula granules, the matching dimensions include two dimensions: drug name and manufacturer, and the dimension matching verification is performed serially.
If the drug type is Chinese herbal medicine, the matching dimensions include one dimension: the drug name.
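The type-to-dimension rules above lend themselves to a small lookup table. The following is a minimal sketch, not the applicant's implementation: the keys (`western_or_patent`, `formula_granule`, `herbal`), the field names, and the `MatchPolicy` type are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MatchPolicy:
    """Which fields must pass verification, and how the checks are scheduled."""
    dimensions: Tuple[str, ...]
    mode: str  # "parallel", "serial", or "single"


# Hypothetical keys and field names; only the dimension sets and the
# parallel/serial distinction come from the text.
MATCH_POLICIES = {
    "western_or_patent": MatchPolicy(("name", "manufacturer", "spec"), "parallel"),
    "formula_granule": MatchPolicy(("name", "manufacturer"), "serial"),
    "herbal": MatchPolicy(("name",), "single"),
}


def policy_for(drug_type: str) -> MatchPolicy:
    """Look up the verification policy for a drug type, rejecting unknown types."""
    try:
        return MATCH_POLICIES[drug_type]
    except KeyError:
        raise ValueError(f"unsupported drug type: {drug_type!r}") from None
```

Under this layout, supporting a further drug type would amount to adding one table entry alongside its knowledge base.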
[0008] Furthermore, the construction of the vector knowledge base also includes: Standardized drug data is cleaned and structured to retain the core attributes of the drugs; A text embedding model is used to concatenate the core fields of each drug data into a unified text string and then transform it into a high-dimensional dense vector; The generated vector data is written into the vector knowledge base in batches, establishing an association mapping between the vectors and the original drug attribute information, and configuring an index type that is compatible with cosine similarity calculation for the vector data; The cosine similarity algorithm is used as the metric for vector similarity.
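For reference, the cosine similarity metric named above can be written out in a few lines. This is a plain-Python sketch for illustration only; in the described system the vector database computes the metric internally, and the function name here is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the metric depends only on direction, two drug vectors that differ in magnitude but point the same way score as identical.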
[0009] Furthermore, the text embedding model is the qianwen3-embedding-8b model, and the large language model is the Moonshot-Kimi-K2-Instruct model.
[0010] Furthermore, based on the aforementioned vector knowledge base, the semantic vectors are subjected to similarity retrieval to generate a candidate drug set, specifically including: The semantic vectors are input into the Milvus vector database. The top N semantic vectors with the highest similarity are retrieved based on the cosine similarity algorithm. Based on the association mapping between the vectors and the original attributes, the complete information of the candidate drugs is traced back to form an initial candidate drug set. The reordering model is used to perform fine similarity calculations on each drug data in the initial candidate drug set, and the top M drugs with the highest similarity are selected as the final candidate drug set.
[0011] Furthermore, the parallel verification is implemented through a parallel workflow built using LangGraph, which simultaneously triggers verification of three dimensions: drug name, specifications, and manufacturer, and summarizes the verification results of each dimension.
[0012] Furthermore, the serial verification is implemented through a serial workflow built using LangGraph, which sequentially completes candidate set optimization and dimension verification.
[0013] In another aspect, a drug dictionary intelligent matching system based on multi-level intelligent agents is provided, including:
A knowledge base construction module, configured to construct a dedicated vector knowledge base for each type of drug, wherein the types of drugs include at least Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicine;
An input module, configured to receive information about the drug to be matched and determine the corresponding matching dimensions based on the drug type of the drug to be matched, wherein the matching dimensions include at least one of drug name, manufacturer, and specifications;
A semantic retrieval module, configured to convert the information of the drug to be matched into a semantic vector based on a text embedding model, perform vector retrieval in the dedicated vector knowledge base of the corresponding drug type to obtain an initial candidate drug set, and reorder the initial candidate drug set to generate a candidate drug set;
A large model decision module, configured to use a large language model to select, from the candidate drug set, the standard drug data that is most similar to the drug to be matched;
A matching verification module, configured to perform dimension matching verification between the drug to be matched and the standard drug data according to the matching dimensions; if all matching dimensions pass the verification, the drug to be matched and the standard drug data are determined to be the same drug, and the matching result is output.
[0014] In another aspect, a computer device is also provided, including a computer-readable storage medium, a processor, and a computer program stored on the computer-readable storage medium and executable on the processor, characterized in that, when the processor executes the program, it performs the method described in the first aspect.
[0015] In another aspect, a computer-readable storage medium is also provided, on which a computer program is stored, which, when executed by a processor, performs the method described in the first aspect.
[0016] The above technical solution has the following advantages or beneficial effects: (1) Significantly improved matching accuracy and robustness. This invention adopts a technical architecture of "embedded representation - semantic retrieval - multi-dimensional LLM judgment". Through the text embedding model, drug information is transformed into high-dimensional semantic vectors to achieve semantic similarity understanding, effectively overcoming the limitations of traditional string matching in handling naming variations, specification differences and manufacturer information confusion. On this basis, parallel verification of name, manufacturer and specification is adopted for Western medicine / Chinese patent medicine to ensure the accuracy of matching results; two-dimensional and single-dimensional verification are adopted for formula granules and Chinese herbal medicines respectively, taking into account the attribute characteristics of different drug categories, and the overall matching accuracy is greatly improved.
[0017] (2) Differentiated intelligent matching of multiple types of drugs has been achieved. This invention constructs dedicated vector knowledge bases for three categories of drugs: Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicine, and designs differentiated matching dimensions and verification logic, solving the matching chaos problem caused by the inconsistent drug classification system in traditional methods. Western medicine / Chinese patent medicine verifies the three dimensions of name, specification, and manufacturer; formula granules focus on the name and manufacturer dimensions; and Chinese herbal medicine only verifies the name dimension, achieving accurate adaptation of the attribute characteristics of different categories of drugs.
[0018] (3) Reduced manual intervention costs and improved matching efficiency. Through the collaborative work of vector retrieval and large language model, this invention achieves fully automated processing of drug dictionary matching. Candidate drug set retrieval and multi-dimensional verification are both completed automatically by the model. Only the prompt word template needs to be configured during the system deployment stage, which greatly reduces the workload of manual comparison and rule maintenance. At the same time, the structured output includes matching identifiers and failure reasons, which facilitates the rapid location and review of abnormal results, further improving the overall processing efficiency.
[0019] (4) The matching results are interpretable and traceable. This invention records the matching status of each dimension in the multi-dimensional verification process. When a matching fails, the reason for the failure is clearly output (such as "name judgment failed" or "specification judgment failed"), making the matching process transparent and the results verifiable. Compared with the "black box" matching of traditional methods, this solution provides medical institutions with traceable matching quality assurance and meets the compliance requirements of medical insurance docking and clinical drug management.
[0020] (5) The architecture is scalable and adaptable to the rapid integration of new drugs. This invention adopts a modular design, decoupling the vector knowledge base from the large model decision layer. When a new type of drug needs to be integrated, only the corresponding dedicated knowledge base needs to be built and the matching dimension rules need to be configured. There is no need to modify the core algorithm process, which effectively supports the dynamic updating of the drug catalog and the needs of business expansion.
Attached Figure Description
[0021] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0022] Figure 1 is a flowchart of the method according to Embodiment 1 of the present invention; Figure 2 is a schematic diagram of the large model verification process in Embodiment 1 of the present invention; Figure 3 is a schematic diagram of the system structure of Embodiment 2 of the present invention; Figure 4 is a schematic diagram of the structure of a computer device according to Embodiment 3 of the present invention.
Detailed Implementation
[0023] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be described in further detail below with reference to the accompanying drawings. Those skilled in the art should understand that the specific embodiments described herein are for illustrative purposes only and are not intended to limit the scope of the invention.
[0024] It should be noted that the following detailed description is illustrative and intended to provide further explanation of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0025] Embodiment 1
This embodiment provides a drug dictionary intelligent matching method based on multi-level intelligent agents. Figure 1 is a flowchart illustrating the overall method of an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:
S101: Construct a dedicated vector knowledge base for each type of drug, where the types of drugs include at least Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicine;
S102: Receive information on the drug to be matched, and determine the corresponding matching dimensions based on the drug type of the drug to be matched. The matching dimensions include at least one of the following: drug name, manufacturer, and specifications;
S103: Based on the text embedding model, convert the information of the drug to be matched into a semantic vector, and perform vector retrieval in the dedicated vector knowledge base of the corresponding drug type to obtain an initial candidate drug set. The initial candidate drug set is then reordered to generate a candidate drug set;
S104: Use a large language model to select, from the candidate drug set, the standard drug data that is most similar to the drug to be matched;
S105: Based on the matching dimensions, perform dimension matching verification between the drug to be matched and the most similar standard drug data. If all matching dimensions pass verification, the drug to be matched and the standard drug data are determined to be the same drug, and the matching result is output.
[0026] In step S101, to ensure the accuracy and efficiency of subsequent searches, an appropriate knowledge base is constructed for the field characteristics of various types of drugs. Different types of drugs include at least Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicine.
[0027] Specifically, the Western medicine / Chinese patent medicine knowledge base is a dedicated vector knowledge base built from a standardized drug dataset containing four core dimensions: drug name, manufacturer, specifications, and unique code. The specific implementation process is as follows: First, data preprocessing is performed to clean and structure the standardized drug data, removing invalid characters and unifying field formats. For example, the abbreviation and full name of a manufacturer are normalized, and the units of specification values are standardized (e.g., "g" is unified as "gram"). The drug name, manufacturer, specification, and unique code are retained as core attributes to form well-organized basic data entries.
[0028] Then, vector transformation is performed using the qianwen3-embedding-8b text embedding model (output dimension is 4096). The core fields of each drug data are concatenated into a unified text string in the order of "drug name + manufacturer + specification", and then transformed into a high-dimensional dense vector to ensure that all drug vectors are in the same feature space.
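The preprocessing and concatenation steps described so far can be sketched as follows. This is an illustration under stated assumptions rather than the applicant's code: the alias table, the space separator, the dictionary field names, and both function names are hypothetical; only the "g" to "gram" example and the "drug name + manufacturer + specification" order come from the text.

```python
import re
from typing import Dict, Sequence

# Illustrative alias table; the text gives only the "g" -> "gram" example.
UNIT_ALIASES = {"g": "gram"}


def normalize_spec(spec: str) -> str:
    """Collapse whitespace and rewrite known unit aliases that follow a number.

    The space between a number and its unit is dropped as part of the
    canonical form (an arbitrary choice made for this sketch).
    """
    spec = re.sub(r"\s+", " ", spec).strip()

    def rewrite(match: "re.Match[str]") -> str:
        number, unit = match.group(1), match.group(2)
        return f"{number}{UNIT_ALIASES.get(unit.lower(), unit)}"

    return re.sub(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)\b", rewrite, spec)


def build_embedding_text(
    record: Dict[str, str],
    fields: Sequence[str] = ("name", "manufacturer", "spec"),
) -> str:
    """Join the non-empty core fields in a fixed order into one text string."""
    values = (record.get(field, "").strip() for field in fields)
    return " ".join(value for value in values if value)
```

Passing `fields=("name", "manufacturer")` or `fields=("name",)` would cover the formula granule and Chinese herbal medicine cases, where fewer fields take part in vector generation.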
[0029] Finally, the knowledge base is stored by batch writing the generated drug vector data into the Milvus vector database and establishing an association mapping between the vectors and the original drug attribute information (name, manufacturer, specifications, unique code). At the same time, the Milvus database is configured with an index type suited to cosine similarity calculation; in this embodiment, IVF_FLAT is selected to optimize subsequent retrieval efficiency. The result is a structured knowledge base dedicated to Western medicine / Chinese patent medicine that supports efficient vector retrieval and allows the original attributes to be traced back.
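The storage settings described above might be captured in a configuration like the one below. It is a hedged sketch: the dictionary follows the general shape Milvus uses for index parameters, IVF_FLAT, cosine similarity, and the 4096 dimension come from the text, while the field name `embedding`, the `nlist` value, and the `make_entry` helper are assumptions made for illustration.

```python
from typing import Dict, List

EMBEDDING_DIM = 4096  # output dimension stated for the embedding model

# Assumed field name and nlist; index type and metric are taken from the text.
INDEX_CONFIG = {
    "field_name": "embedding",
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 128},
}


def make_entry(vector: List[float], attributes: Dict[str, str]) -> Dict[str, object]:
    """Pair a vector with its source attributes so results can be traced back."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    return {"embedding": vector, **attributes}
```

Storing the attributes alongside the vector is one simple way to realize the association mapping, since each search hit then carries its own name, manufacturer, specification, and unique code.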
[0030] Formula Granule Knowledge Base Construction: A dedicated formula granule vector knowledge base is built based on a standardized dataset containing three core dimensions: drug name, manufacturer, and unique code. During data preprocessing, only the drug name, manufacturer, and unique code are retained. During vector conversion, "formula granule name + manufacturer" is concatenated into a text string, and a 4096-dimensional vector is generated using the qianwen3-embedding-8b model. Specification fields are not involved in vector generation. The knowledge base is stored in a Milvus database, with an associated mapping established and an IVF_FLAT index configured. The underlying cosine similarity algorithm is used by default to measure vector similarity, with values ranging from [0,1]. Values closer to 1 indicate a higher semantic match between the name and manufacturer dimensions.
[0031] Chinese Herbal Medicine Knowledge Base Construction: A dedicated Chinese herbal medicine vector knowledge base is built based on a standardized dataset containing drug names and unique codes. After data preprocessing, only drug names and unique codes are retained; during vector transformation, only drug names are converted into high-dimensional dense vectors; the data is stored in a Milvus database and indexed with IVF_FLAT. The underlying layer uses the cosine similarity algorithm, measuring vector similarity only along the drug name dimension.
[0032] In step S102: the corresponding matching dimensions are determined based on the drug type of the drug to be matched, specifically as follows: Matching Western medicines / Chinese patent medicines is the most complex case, requiring verification of three core dimensions: drug name, specifications, and manufacturer. Therefore, for Western medicines / Chinese patent medicines, the matching dimensions include drug name, manufacturer, and specifications, with the matching priority being name > manufacturer > specifications. The matching logic is as follows: first, a candidate drug set is obtained through vector retrieval; then, a large language model (LLM) selects the most similar entry from the candidate set; finally, the three dimensions are verified in parallel, and the drug is determined to be "the same drug" only when all dimensions match.
[0033] For formula granules, the matching dimensions include two dimensions: drug name and manufacturer. The matching priority is name > manufacturer. The matching logic is as follows: first, a candidate drug set is obtained through vector retrieval; then, the LLM selects the most similar entry from the candidate set; finally, the name and manufacturer dimensions are checked sequentially. Only when both dimensions match is the drug determined to be "the same drug".
[0034] For Chinese herbal medicine, the matching dimensions include only the drug name. The matching logic is as follows: first, the name is matched literally against the standard names. If the match succeeds, the result is output directly. If it fails, vector retrieval and LLM selection are performed, and the name dimension is then verified. If the verification succeeds, the drug is determined to be "the same drug".
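The two-stage logic for Chinese herbal medicine, a literal lookup followed by a semantic fallback, can be sketched as below. The function and parameter names are hypothetical, and the fallback is passed in as a callable so that the retrieval and LLM steps, which are not reproduced here, can be stubbed.

```python
from typing import Callable, Dict, Optional


def match_herbal(
    name: str,
    standard_names: Dict[str, str],
    semantic_fallback: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the unique code of the matched standard entry, or None.

    standard_names maps a standard drug name to its unique code.
    semantic_fallback stands in for vector retrieval plus LLM verification.
    """
    key = name.strip()
    if key in standard_names:  # stage 1: literal match, no model call needed
        return standard_names[key]
    return semantic_fallback(key)  # stage 2: semantic path for name variants
```

Trying the literal lookup first avoids any model call for names that already agree with the standard dictionary.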
[0035] In step S103: For Western medicine / Chinese patent medicine, the complete information of the drug to be matched is concatenated into the search text in the fixed format "name + manufacturer + specification", so that the search text is aligned with the rules used to generate the knowledge base vectors. The embedding method is called to convert the search text into a 4096-dimensional vector through the qianwen3-embedding-8b model, ensuring that the dimension and feature space are consistent with the vectors in the knowledge base. The generated vector is input into the Milvus vector database, and the 40 drug vectors with the highest similarity are retrieved based on the cosine similarity algorithm. Based on the association mapping between the vectors and the original attributes, the complete information of the candidate drugs (name, manufacturer, specification, and unique code) is traced back to form an initial candidate drug set.
[0036] To improve the accuracy of the candidate set, the initial candidate set is re-ranked using the gte-rerank-v2 fine-ranking model. The re-ranking model performs a fine-grained similarity calculation on each drug entry in the initial candidate set (still using the text organized as name + manufacturer + specification) and selects the 10 entries with the highest similarity as the final candidate drug set for subsequent selection by the large language model.
[0037] For formula granules, based on the name and manufacturer information of the granules to be matched, the search text is concatenated in the fixed format "name + manufacturer" to ensure alignment with the knowledge base vector generation rules. The embedding method is called, and the qianwen3-embedding-8b model converts the search text into a 4096-dimensional vector, ensuring consistency with the knowledge base vector dimension and feature space. The generated vector is then input into the Milvus database, and the 40 drug vectors with the highest cosine similarity are retrieved. The original attributes associated with the vectors are traced back to form an initial candidate drug set. The gte-rerank-v2 fine-ranking model re-ranks the initial candidate set (based on the "name + manufacturer" text of the formula granules) with fine-grained similarity calculations, and the 10 most similar entries are selected as the final candidate set for LLM selection.
[0038] For Chinese herbal medicine, the search text consists only of the name field of the drug to be matched, ensuring alignment with the knowledge base vector generation rules. The embedding method is called, and the name text is converted into a 4096-dimensional vector using the qianwen3-embedding-8b model. This vector is then input into the Milvus database, and the 20 drug vectors with the highest cosine similarity are retrieved. The original associated attributes are traced back to form an initial candidate drug set. The gte-rerank-v2 model then performs fine-grained similarity calculations based solely on the drug name, and the 5 most similar entries are selected as the final candidate set for LLM selection.
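The coarse-retrieval-then-rerank pattern described for all three drug types can be sketched in memory as follows. This is not the Milvus or gte-rerank-v2 API: both scoring functions are passed in as stand-in callables, and the dictionary keys and entry layout are assumptions. Only the (N, M) sizes are taken from the text.

```python
from typing import Callable, Dict, List, Sequence

# (top N retrieved, top M kept after reranking) per drug type, as stated above.
RETRIEVAL_SIZES = {
    "western_or_patent": (40, 10),
    "formula_granule": (40, 10),
    "herbal": (20, 5),
}


def retrieve_and_rerank(
    query_vector: Sequence[float],
    query_text: str,
    knowledge_base: List[Dict],
    vector_score: Callable[[Sequence[float], Sequence[float]], float],
    rerank_score: Callable[[str, str], float],
    top_n: int,
    top_m: int,
) -> List[Dict]:
    """Two-stage selection: cheap vector scoring first, finer text scoring second.

    Each knowledge base entry is assumed to carry "vector" and "text" keys
    plus whatever attributes need to be traced back (code, name, ...).
    """
    coarse = sorted(
        knowledge_base,
        key=lambda entry: vector_score(query_vector, entry["vector"]),
        reverse=True,
    )[:top_n]
    return sorted(
        coarse,
        key=lambda entry: rerank_score(query_text, entry["text"]),
        reverse=True,
    )[:top_m]
```

The design keeps the expensive reranker off the full knowledge base: it only ever sees the N entries that survive the vector stage.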
[0039] In step S104: In this embodiment, the candidate set selection process uses Moonshot-Kimi-K2-Instruct as the large language model.
[0040] Specifically: For Western medicines / Chinese patent medicines, the reordered candidate drug set (including name, manufacturer, specification and unique code) is formatted into standardized text (e.g., drug code: 1, drug name: dequinoline chloride tablets, manufacturer name: Shantou Special Economic Zone Meiji Pharmaceutical Co., Ltd., specification: 24 tablets / box, 0.25mg per tablet), and the search text of the drug to be matched is concatenated in the same format.
[0041] Then, LLM matching is performed. The formatted input text and candidate set are passed to the LLM. Matching is performed based on the preset large model prompts (for Western medicine / Chinese patent medicine, the matching degree of name, manufacturer and specification is considered, with name and manufacturer having higher priority than specification). The model selects the single most similar complete drug entry from the candidate set and outputs its unique code (drugCode).
[0042] Finally, the results are cleaned by performing structured parsing on the LLM return results, removing redundant text, verifying the legality of the JSON format, and finally extracting the unique code and the corresponding complete drug attributes (name, manufacturer, specifications) as "standard drug data" for subsequent multi-dimensional verification and judgment.
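The formatting and result-cleaning steps around the LLM call can be sketched as below. The LLM call itself is omitted. The label wording follows the example in the text and `drugCode` is the identifier the text names, but the JSON shape `{"drugCode": ...}`, the dictionary keys, and both function names are assumptions.

```python
import json
import re
from typing import Dict, List, Optional


def format_candidate(candidate: Dict[str, str]) -> str:
    """Render one candidate as standardized text, skipping absent fields."""
    parts = [f"drug code: {candidate['code']}", f"drug name: {candidate['name']}"]
    if candidate.get("manufacturer"):
        parts.append(f"manufacturer name: {candidate['manufacturer']}")
    if candidate.get("spec"):
        parts.append(f"specification: {candidate['spec']}")
    return ", ".join(parts)


def parse_llm_choice(raw: str, candidates: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Strip text around the JSON object, validate it, and look up the chosen entry.

    Returns None when no valid JSON is found or the code is not in the candidate set.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    code = str(payload.get("drugCode", ""))
    by_code = {candidate["code"]: candidate for candidate in candidates}
    return by_code.get(code)
```

Looking the code up in the candidate set, rather than trusting the returned attributes, guards against the model inventing an entry that was never retrieved.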
[0043] For formula granules, the reordered candidate set (including name, manufacturer, and unique code) is formatted into standardized text (e.g., drug code: 1, drug name: licorice, manufacturer name: XX Traditional Chinese Medicine Formula Granules Co., Ltd.). The search text for the formula granules to be matched is organized in the same format. Then, the formatted input text and candidate set are passed to the LLM. Matching is performed based on preset prompts (only the matching degree of name and manufacturer is considered, with name having higher priority than manufacturer, and exactly one most similar entry must be returned). The Moonshot-Kimi-K2-Instruct model is called to select the single most similar drug entry overall, and the complete entry is returned as the preferred result.
[0044] For Chinese herbal medicine, the reordered candidate set (including drug name and unique code) is formatted into standardized text (e.g., drug code: 1, drug name: licorice). Then, LLM matching is performed. The formatted input text and candidate set are passed to the LLM, which matches based on preset prompts (only the name is considered, and exactly one most similar entry must be returned). The Moonshot-Kimi-K2-Instruct model is called to select the single most similar drug entry overall, and the complete entry is returned as the preferred result.
[0045] Through the above process, the LLM model and the differentiated prompt word strategy effectively optimize the candidate sets of different types of drugs, providing accurate standard drug data for the subsequent dimensional verification process.
[0046] In step S105: Different verification strategies are adopted for different drug types, as shown in Figure 2, which is a schematic diagram of the large model verification process in an embodiment of the present invention. The large model verification nodes are arranged in parallel or in series using the LangGraph workflow framework to achieve multi-dimensional matching verification.
[0047] Specifically, for Western medicines / Chinese patent medicines, the standard drug data selected in step S104 is used as a benchmark. Parallel matching and verification are performed between the drug to be matched and the standard drug in three dimensions: name, manufacturer, and specifications. The drug is determined to be the same drug only when all dimensions match. The specific process is as follows: First, the name, manufacturer, and specifications of the drug to be matched are paired with the corresponding fields of the standard drug. A CnsWesJudgeState state object is then constructed as the data source for multi-dimensional validation. The CnsWesJudgeState state definition is as follows:

    from typing import Annotated

    from pydantic import BaseModel, Field  # assumed source of BaseModel and Field


    # ASSUMPTION: overwrite_reducer is referenced but not defined in the source.
    # It is presumably a reducer that keeps the most recently written value.
    def overwrite_reducer(old: str, new: str) -> str:
        return new


    class CnsWesJudgeState(BaseModel):
        data_drug_name: str = ""      # Name of the drug to be matched
        data_spec: str = ""           # Specification of the drug to be matched
        data_manufacturer: str = ""   # Manufacturer of the drug to be matched
        std_drug_name: str = ""       # Standard drug name
        std_spec: str = ""            # Standard drug specification
        std_manufacturer: str = ""    # Standard drug manufacturer
        # Per-dimension LLM verdicts, written by the verification nodes
        llm_name_output: Annotated[str, overwrite_reducer] = ""
        llm_spec_output: Annotated[str, overwrite_reducer] = ""
        llm_manufacturer_output: Annotated[str, overwrite_reducer] = ""
        summary_res: dict = Field(default_factory=dict)
        isSameDrug_final: str = ""    # Final judgment result
        failure_reason: str = ""      # Reason for failure

A parallel workflow is built with LangGraph, and a large language model independently validates the drug name, manufacturer, and specifications.
1) Drug name verification: Based on the name matching prompt, only the names of the drug to be matched and the standard drug are compared (ignoring interference such as specifications, manufacturers, and special characters). The Moonshot-Kimi-K2-Instruct model determines whether they are the same drug and returns "1 (yes) / 0 (no)".
[0048] 2) Manufacturer verification: Based on the manufacturer matching prompt, the core manufacturer information of the drug to be matched and of the standard drug is compared to determine whether it matches, and "1 / 0" is returned.
[0049] 3) Specification verification: Based on the specification matching prompt, only the specifications of the drug to be matched and of the standard drug are compared. The core specification values are matched in a coarse-grained manner (ignoring packaging units, character order, and special symbols) to determine whether they match, and "1 / 0" is returned.
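The parallel fan-out of the three checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: Python's standard `concurrent.futures` stands in for the LangGraph parallel workflow, and the hypothetical `fake_judge` function stands in for the large language model call. Its normalization rule is invented purely for demonstration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict


@dataclass
class DrugRecord:
    """Hypothetical container for the three compared fields."""
    name: str
    spec: str
    manufacturer: str


def fake_judge(dimension: str, data_value: str, std_value: str) -> str:
    """Stand-in for the LLM call (assumption): returns "1" or "0".

    A real node would send a dimension-specific prompt to the model.
    """
    def normalize(text: str) -> str:
        # Keep only letters and digits, case-insensitively.
        return "".join(ch for ch in text.lower() if ch.isalnum())

    return "1" if normalize(data_value) == normalize(std_value) else "0"


def verify_in_parallel(data: DrugRecord, std: DrugRecord) -> Dict[str, str]:
    """Run the name, specification, and manufacturer checks concurrently."""
    pairs = {
        "name": (data.name, std.name),
        "spec": (data.spec, std.spec),
        "manufacturer": (data.manufacturer, std.manufacturer),
    }
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            dim: pool.submit(fake_judge, dim, a, b)
            for dim, (a, b) in pairs.items()
        }
        # Each check is independent, so results are simply collected per dimension.
        return {dim: fut.result() for dim, fut in futures.items()}
```

Because each dimension is judged independently, a slow or failed check in one dimension does not change the verdict in another; only the later aggregation step combines them.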
[0050] The `summary_and_statistics` method is then called to aggregate the validation results of the three dimensions. Only when the validation results for name, manufacturer, and specification are all "1" is isSameDrug_final set to 1, meaning that the drug to be matched is the same drug as the standard drug, and the unique code of the standard drug is returned. If any validation result is "0", isSameDrug_final is set to 0 (not the same drug), and the reason for failure is recorded (such as name validation failure or specification validation failure).
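The aggregation rule described above can be sketched as a small function. The name `summarize_checks`, its signature, the `std_code` key, and the wording of the failure messages are assumptions made for illustration; the text names a `summary_and_statistics` method but does not give its signature. The sketch also assumes that a missing dimension counts as a failure.

```python
from typing import Dict, Optional


def summarize_checks(outputs: Dict[str, str], std_code: str) -> Dict[str, Optional[str]]:
    """Combine per-dimension "1"/"0" verdicts into a final judgment.

    outputs maps "name", "manufacturer", and "spec" to "1" or "0".
    A missing dimension is treated as a failure (assumption).
    """
    failed = [
        dim
        for dim in ("name", "manufacturer", "spec")
        if outputs.get(dim, "").strip() != "1"
    ]
    if not failed:
        # All three dimensions passed: same drug, return the standard code.
        return {"isSameDrug_final": "1", "std_code": std_code, "failure_reason": ""}
    reason = "; ".join(f"{dim} validation failed" for dim in failed)
    return {"isSameDrug_final": "0", "std_code": None, "failure_reason": reason}
```

Recording every failed dimension, rather than only the first, keeps the failure attribution complete when more than one check fails.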
[0051] For formula granules, a serial workflow is built using LangGraph. Taking the most similar standard drug data selected in step S104 as the benchmark, a large model determines whether the drug to be matched is the same as the standard drug in two dimensions: name and manufacturer. If both name and manufacturer match, isSameDrug=1 (same drug), and the unique code of the standard drug is returned. If either dimension does not match, isSameDrug=0 (not the same drug), and the reason for failure is recorded (e.g., "drug name mismatch" or "manufacturer mismatch"). Finally, the returned result is parsed into a structured form: redundant text is removed, the validity of the JSON format is verified, and the unique code and the reason for failure (if verification fails) are extracted.
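The final parsing step (removing redundant text, validating the JSON, extracting the code and failure reason) might look like the sketch below. The JSON keys `unique_code` and `failure_reason`, the function name, and the error messages are hypothetical; only the `isSameDrug` flag is named in the text.

```python
import json
import re
from typing import Optional, Tuple


def parse_llm_result(raw: str) -> Tuple[str, Optional[str], str]:
    """Return (isSameDrug, unique_code, failure_reason) from a raw model reply."""
    # Drop any text around the outermost {...} span.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return "0", None, "no JSON object found in model output"
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return "0", None, "model output is not valid JSON"

    is_same = str(payload.get("isSameDrug", "0"))
    if is_same == "1":
        return "1", payload.get("unique_code"), ""
    return "0", None, str(payload.get("failure_reason", "unspecified"))
```

Treating unparseable output as a non-match with an explicit reason keeps malformed model replies from being mistaken for successful matches.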
[0052] For Chinese herbal medicines, the system determines whether the medicine to be matched and the standard medicine are the same medicine based solely on the name of the medicine. If the names match, isSameDrug = 1 (same medicine) and the unique code of the standard medicine is returned. If the names do not match, isSameDrug = 0 (not the same medicine) and the reason for failure is recorded, i.e., the medicine names do not match and they are not the same medicine.
[0053] Through the aforementioned differentiated large model verification strategy, this invention achieves accurate matching of different types of drugs, ensuring matching accuracy while improving processing efficiency through parallel or serial node design.
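The type-dependent choice of dimensions and execution mode can be summarized as a lookup table. The sketch below is illustrative only: the type keys, the `judge` callable, and the decision to stop a serial check at the first failure are assumptions. For brevity the "parallel" mode is shown as evaluating every dimension independently in a loop rather than concurrently.

```python
from typing import Callable, Dict, Tuple

# Drug type -> (dimensions to verify, execution mode). Keys are hypothetical labels.
STRATEGIES: Dict[str, Tuple[Tuple[str, ...], str]] = {
    "western_or_chinese_patent": (("name", "manufacturer", "spec"), "parallel"),
    "formula_granule": (("name", "manufacturer"), "serial"),
    "chinese_herbal": (("name",), "single"),
}


def verify(
    drug_type: str,
    data: Dict[str, str],
    std: Dict[str, str],
    judge: Callable[[str, str, str], str],
) -> Dict[str, str]:
    """Apply the verification strategy for drug_type; judge returns "1" or "0"."""
    dimensions, mode = STRATEGIES[drug_type]
    failures = []
    for dim in dimensions:
        if judge(dim, data.get(dim, ""), std.get(dim, "")) != "1":
            failures.append(f"{dim} mismatch")
            if mode == "serial":
                break  # assumption: a serial chain stops at the first failure
    return {
        "isSameDrug": "0" if failures else "1",
        "failure_reason": "; ".join(failures),
    }
```

Adding a new drug type then amounts to adding one entry to the table, which matches the scalability goal stated for the method.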
[0054] System deployment instructions: The method in this embodiment can be deployed on the following technology stack: the backend uses Java to provide the interface layer, and Python implements the large model inference layer; the large model used is Moonshot-Kimi-K2-Instruct; the workflow framework is LangGraph (Python). The deployment process includes: deploying Java interface services to provide entry points for drug matching (such as DrugDictMappingService, WesternChineseService, ChineseHubService, GranuleService); deploying Python large model inference services that encapsulate the LangGraph workflow and expose external calling interfaces; configuring prompt templates and adjusting matching rules according to business needs; integrating the large model API and configuring calling keys and parameters; and testing matching scenarios for each drug type to verify both the matching accuracy and the accuracy of failure attribution.
[0055] In summary, this embodiment achieves efficient and accurate matching of the drug dictionary by constructing a dedicated vector knowledge base, intelligently determining drug types, generating candidate sets through semantic retrieval, refining the candidate set with a large language model, and performing multi-dimensional verification. It significantly reduces the cost of manual intervention and has good scalability.
[0056] Embodiment 2: This embodiment provides a drug dictionary intelligent matching system based on multi-level intelligent agents which, as shown in Figure 3, includes: a knowledge base construction module configured to build a dedicated vector knowledge base for different types of drugs, wherein the different types of drugs include at least Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicines; an input module configured to receive information about the drug to be matched and determine the corresponding matching dimensions based on the drug type of the drug to be matched, wherein the matching dimensions include at least one of drug name, manufacturer, and specifications; a semantic retrieval module configured to convert the information of the drug to be matched into a semantic vector based on the text embedding model, perform vector retrieval in the dedicated vector knowledge base of the corresponding drug type to obtain an initial candidate drug set, and reorder the initial candidate drug set to generate a candidate drug set; a large model decision module configured to use a large language model to select, from the candidate drug set, the standard drug data that is most similar to the drug to be matched; and a matching and verification module configured to perform dimension matching verification between the drug to be matched and the standard drug data according to the matching dimensions, and, if all matching dimensions pass the verification, determine that the drug to be matched and the standard drug data are the same drug and output the matching result.
[0057] It should be noted that each module in this embodiment corresponds one-to-one with each step in Embodiment 1, and their specific implementation processes are the same, so they will not be repeated here.
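The two-stage candidate generation performed by the semantic retrieval module (vector retrieval followed by reordering) can be sketched with toy data. This is a hedged illustration: an in-memory dictionary stands in for the vector knowledge base, the `rerank` callable stands in for the reordering model, and the function and parameter names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(
    query_vec: Sequence[float],
    knowledge_base: Dict[str, Sequence[float]],
    rerank: Callable[[str], float],
    top_n: int,
    top_m: int,
) -> List[str]:
    """Return the top_m drug codes after vector retrieval and reordering."""
    # Stage 1: keep the top_n entries by cosine similarity to the query.
    scored = sorted(
        knowledge_base.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    initial = [code for code, _ in scored[:top_n]]
    # Stage 2: reorder the initial set with the finer-grained score.
    return sorted(initial, key=rerank, reverse=True)[:top_m]
```

The first stage is cheap and recall-oriented, while the second stage applies a more expensive score to only a handful of candidates before the large model makes its selection.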
[0058] Embodiment 3: This embodiment provides a computer device which, as shown in Figure 4, includes a computer-readable storage medium 1003, a processor 1001, a communication interface 1002, and a computer program stored on the computer-readable storage medium 1003 and executable on the processor 1001. The processor 1001, communication interface 1002, and computer-readable storage medium 1003 can be connected via a bus or other means. The communication interface 1002 is used to receive and send data. When the processor 1001 executes the program, it implements the steps of the drug dictionary intelligent matching method based on multi-level intelligent agents described in Embodiment 1 above.
[0059] Those skilled in the art will recognize that the units and algorithm steps described in connection with the various examples of this embodiment can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this invention.
[0060] Embodiment 4: This embodiment also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, carry out the method described in Embodiment 1.
[0061] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A drug dictionary intelligent matching method based on multi-level intelligent agents, characterized in that it comprises: constructing a dedicated vector knowledge base for different types of drugs, wherein the different types of drugs include at least Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicines; receiving information on the drug to be matched, and determining the corresponding matching dimensions based on the drug type of the drug to be matched, wherein the matching dimensions include at least one of drug name, manufacturer, and specifications; converting the information of the drug to be matched into a semantic vector based on a text embedding model, performing vector retrieval in the dedicated vector knowledge base of the corresponding drug type to obtain an initial candidate drug set, and reordering the initial candidate drug set to generate a candidate drug set; using a large language model to select, from the candidate drug set, the standard drug data that is most similar to the drug to be matched; and performing dimension matching verification between the drug to be matched and the most similar standard drug data according to the matching dimensions, and, if all matching dimensions pass the verification, determining that the drug to be matched and the standard drug data are the same drug and outputting the matching result.
2. The method according to claim 1, characterized in that the matching dimensions are determined based on the type of the drug to be matched, specifically: if the drug type is Western medicine / Chinese patent medicine, the matching dimensions include three dimensions: drug name, manufacturer, and specifications, and the dimension matching verification is performed in parallel; if the drug type is formula granules, the matching dimensions include two dimensions: drug name and manufacturer, and the dimension matching verification is performed serially; if the drug type is Chinese herbal medicine, the matching dimensions include one dimension: drug name.
3. The method according to claim 1, characterized in that the construction of the vector knowledge base further includes: cleaning and structuring the standard drug data to retain the core attributes of the drugs; concatenating the core fields of each drug record into a unified text string and transforming it into a high-dimensional dense vector using the text embedding model; writing the generated vector data into the vector knowledge base in batches, establishing an association mapping between the vectors and the original drug attribute information, and configuring for the vector data an index type suitable for cosine similarity calculation; and using cosine similarity as the metric for vector similarity.
4. The method according to claim 1, characterized in that the text embedding model is the qianwen3-embedding-8b model, and the large language model is the Moonshot-Kimi-K2-Instruct model.
5. The method according to claim 1, characterized in that similarity retrieval is performed on the semantic vector based on the vector knowledge base to generate the candidate drug set, specifically including: inputting the semantic vector into the Milvus vector database; retrieving the top N vectors with the highest similarity based on the cosine similarity algorithm; tracing back the complete information of the candidate drugs based on the association mapping between the vectors and the original attributes to form the initial candidate drug set; and using a reordering model to perform a fine-grained similarity calculation on each drug record in the initial candidate drug set, and selecting the top M drugs with the highest similarity as the final candidate drug set.
6. The method according to claim 2, characterized in that the parallel verification is implemented through a parallel workflow built using LangGraph, which simultaneously triggers verification of three dimensions: drug name, specifications, and manufacturer, and summarizes the verification results of each dimension.
7. The method according to claim 2, characterized in that the serial verification is implemented through a serial workflow built using LangGraph, which sequentially completes candidate set optimization and dimension verification.
8. A drug dictionary intelligent matching system based on multi-level intelligent agents, characterized in that it comprises: a knowledge base construction module configured to construct a dedicated vector knowledge base for different types of drugs, wherein the different types of drugs include at least Western medicine / Chinese patent medicine, formula granules, and Chinese herbal medicines; an input module configured to receive information about the drug to be matched and determine the corresponding matching dimensions based on the drug type of the drug to be matched, wherein the matching dimensions include at least one of drug name, manufacturer, and specifications; a semantic retrieval module configured to convert the information of the drug to be matched into a semantic vector based on a text embedding model, perform vector retrieval in the dedicated vector knowledge base of the corresponding drug type to obtain an initial candidate drug set, and reorder the initial candidate drug set to generate a candidate drug set; a large model decision module configured to use a large language model to select, from the candidate drug set, the standard drug data that is most similar to the drug to be matched; and a matching verification module configured to perform dimension matching verification between the drug to be matched and the standard drug data according to the matching dimensions, and, if all matching dimensions pass the verification, determine that the drug to be matched and the standard drug data are the same drug and output the matching result.
9. A computer device comprising a computer-readable storage medium, a processor, and a computer program stored on the computer-readable storage medium and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the drug dictionary intelligent matching method based on multi-level intelligent agents as described in any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, the program being executed by a processor to implement the steps of the intelligent matching method for a drug dictionary based on a multi-level intelligent agent as described in any one of claims 1-7.