A method, storage medium, device and equipment for generating an image diagnosis report based on an audio input

By receiving voice data, converting it into text, and using an attention-based coding model for error correction and entity recognition, the problem of inaccurate image diagnostic report template search was solved, achieving high accuracy and stable generation of image diagnostic reports.

CN122245589A · Pending · Publication Date: 2026-06-19 · AIR FORCE MEDICAL CENT PLA

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: AIR FORCE MEDICAL CENT PLA
Filing Date: 2026-03-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

The accuracy of image diagnostic report template search in the existing technology is poor. The keywords input by voice may contain errors after being converted into text, resulting in inaccurate templates. Furthermore, simple semantic matching is insufficient to accurately search for image diagnostic report templates.

Method used

By receiving voice data and converting it into initial text data, errors are reduced using recognition confidence and error correction processing. The word weights of medical terms are enhanced by combining an attention-based encoding model. Named entity recognition and entity verification are then performed to determine standardized text and match target templates.

Benefits of technology

It improves the accuracy and stability of image diagnostic report generation. Error correction triggered by a recognition confidence threshold reduces speech-to-text errors, and matching degree calculation rules make template retrieval controllable and consistent, thereby improving the accuracy of report generation.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to the field of medical information technology, specifically to a method, storage medium, apparatus, and device for generating image diagnostic reports based on audio input. The method includes: receiving voice data sent by a terminal device; converting the voice data into corresponding initial text data; determining words corresponding to the text data through initial word segmentation and error-corrected word segmentation; determining a first semantic feature corresponding to the text data using an attention-based encoding model; determining medical entities contained in the text data based on the first semantic feature; determining standardized text corresponding to the text data; determining a target template matching the standardized text from candidate templates; and receiving an image diagnostic report corresponding to the target template from the terminal device. Through error correction triggered by a recognition confidence threshold, this invention reduces speech-to-text errors, improves the accuracy and stability of image diagnostic report generation, and solves the problem of low template matching accuracy in existing technologies.

Description

Technical Field

[0001] This invention relates to the field of medical information technology, and in particular to a method, storage medium, apparatus, and device for generating image diagnostic reports based on audio input.

Background Technology

[0002] Currently, the generation of imaging diagnostic reports usually involves physicians manually entering or voice-entering keywords to search for the required imaging diagnostic report template based on the imaging results. Physicians then modify the imaging diagnostic report template themselves to obtain a complete imaging diagnostic report.

[0003] In existing technologies, image diagnostic report generation systems typically use semantic matching algorithms to determine image diagnostic report templates semantically similar to keywords manually or verbally input by the user, and then provide the corresponding image diagnostic report templates to doctors. However, existing technologies have the following problems: the converted text of voice-input keywords may contain errors that do not match the doctor's original meaning, leading to inaccurate image diagnostic report templates; and simple semantic matching is insufficient to accurately retrieve image diagnostic report templates. Therefore, it is urgent to introduce quantifiable recognition quality evaluation and error correction mechanisms into the speech-to-text process to reduce the impact of speech recognition errors on the accuracy of subsequent template retrieval and report generation.

[0004] No effective solution has yet been proposed for the technical problem of poor accuracy in searching image diagnostic report templates in the existing technology. Therefore, this invention provides a method, storage medium, apparatus, and device for generating image diagnostic reports based on audio input.

Summary of the Invention

[0005] In view of the above-mentioned technical problems, the present invention provides a method, storage medium, apparatus and device for generating image diagnostic reports based on audio input, so as to at least solve the technical problem of poor accuracy of image diagnostic report template search in the prior art.

[0006] A first aspect of the present invention provides a method for generating an image diagnostic report based on audio input, comprising: receiving voice data sent by a terminal device, wherein the voice data is entered by a user through the terminal device; converting the voice data into initial text data corresponding to the voice data, and obtaining a recognition confidence level corresponding to the initial text data; wherein, if the recognition confidence level is not less than a preset threshold, performing initial word segmentation directly on the initial text data; wherein, if the recognition confidence level is less than the preset threshold, performing error correction processing on the initial text data to obtain error-corrected text data, and performing error-corrected word segmentation on the error-corrected text data; determining words corresponding to the text data through the initial word segmentation and the error-corrected word segmentation; determining a first semantic feature corresponding to the text data using an attention-based encoding model, wherein the encoding model determines the weights corresponding to the key vectors of the words according to pre-defined medical terms; performing entity recognition on the text data according to the first semantic feature to determine the medical entities contained in the text data; determining standardized text corresponding to the text data according to the medical entities contained in the text data; determining a target template matching the standardized text from candidate templates, sending the target template to the terminal device, and receiving an image diagnostic report corresponding to the target template from the terminal device.

[0007] A second aspect of the present invention also provides a storage medium comprising a stored program, wherein the method described above is executed by a processor when the program runs.

[0008] A third aspect of the present invention also provides an apparatus for generating an image diagnostic report based on audio input, comprising: a receiving module for receiving voice data sent by a terminal device, the voice data being entered by a user on the terminal device; an audio-to-text module for converting the voice data into initial text data corresponding to the voice data, obtaining the recognition confidence level corresponding to the initial text data, and determining words corresponding to the text data; an attention encoding module for determining a first semantic feature corresponding to the text data using an attention-based encoding model, wherein the encoding model determines the weights corresponding to the key vectors of the words based on pre-defined medical terms; an entity recognition module for performing entity recognition on the text data based on the first semantic feature, and determining the medical entities contained in the text data; a normalized output module for determining normalized text corresponding to the text data based on the medical entities contained in the text data; and a sending module for determining a target template matching the normalized text from candidate templates, sending the target template to the terminal device, and receiving an image diagnostic report corresponding to the target template from the terminal device.

[0009] A fourth aspect of the present invention also provides an apparatus for generating an image diagnostic report based on audio input, comprising: a processor; and a memory connected to the processor, for providing the processor with instructions to process the following steps: receiving voice data sent by a terminal device, the voice data being entered by a user on the terminal device; converting the voice data into initial text data corresponding to the voice data, obtaining a recognition confidence level corresponding to the initial text data, and determining words corresponding to the text data; using an attention-based encoding model to determine a first semantic feature corresponding to the text data, wherein the encoding model determines the weights corresponding to the key vectors of the words based on pre-defined medical terms; performing entity recognition on the text data based on the first semantic feature to determine the medical entities contained in the text data; determining standardized text corresponding to the text data based on the medical entities contained in the text data; determining a target template matching the standardized text from candidate templates, sending the target template to the terminal device, and receiving an image diagnostic report corresponding to the target template from the terminal device.

[0010] Compared with existing technologies, the beneficial effects of the technical solution of this invention are as follows:

[0011] After acquiring the user's voice data used to search for image report templates, the technical solution of this invention converts the voice data into corresponding text data, thus realizing the conversion from voice to text. Based on an encoding model, the weight of words matching medical terms in the attention key vector is increased to obtain the first semantic feature. Then, based on this first semantic feature, named entity recognition is performed on the text data, which strengthens the role of words already identified as medical terms during named entity recognition. After the medical entities contained in the text data are obtained through named entity recognition, the text data is converted into standardized text. Through entity verification, accurate conversion from text data to standardized text is achieved. Furthermore, by matching the standardized text with a target template, the image diagnostic report template matching the user's voice input is determined more accurately. Specifically, error correction triggered by a confidence threshold reduces speech-to-text errors, and the controllability and consistency of template retrieval are improved through matching degree calculation rules, thereby improving the accuracy and stability of image diagnostic report generation and solving the technical problem of low template matching accuracy in existing technologies.

Attached Figure Description

[0012] The accompanying drawings, which are included to provide a further understanding of this disclosure and form part of this application, illustrate exemplary embodiments of this disclosure and are used to explain this disclosure, but do not constitute an undue limitation of this disclosure. In the drawings:

[0013] Figure 1 is a hardware structure block diagram of the computing device in Embodiment 1;

[0014] Figure 2 is a schematic diagram of the image diagnostic report generation system based on audio input in Embodiment 1;

[0015] Figure 3 is a flowchart of the image diagnostic report generation method based on audio input according to the first aspect of Embodiment 1;

[0016] Figure 4 is a schematic diagram of the edges between nodes in the first aspect of Embodiment 1;

[0017] Figure 5 is a schematic diagram of the image diagnostic report generation device based on audio input in Embodiment 2;

[0018] Figure 6 is a schematic diagram of the image diagnostic report generation device based on audio input in Embodiment 3.

Detailed Implementation

[0019] To enable those skilled in the art to better understand the technical solutions of this disclosure, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of this disclosure, and not all embodiments. Based on the embodiments of this disclosure, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this disclosure.

[0020] Embodiment 1: Embodiment 1 is a specific implementation of the audio-input-based image diagnostic report generation method of the present invention, and describes in detail the process from audio input to diagnostic report.

[0021] Regarding the method for generating an image diagnostic report based on audio input provided in this embodiment, it should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.

[0022] The method embodiments provided in this example can be executed on a computer terminal, server, or similar computing device. Figure 1 shows a hardware structure block diagram of a computing device for implementing the audio-input-based image diagnostic report generation method. As shown in Figure 1, the computing device may include one or more processors (processors may include, but are not limited to, microprocessors such as MCUs or programmable logic devices such as FPGAs), a memory for storing data, a transmission device for communication functions, and an input / output interface. The memory, transmission device, and input / output interface are connected to the processor via a bus. In addition, it may also include a display, keyboard, and cursor control device connected to the input / output interface. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, a computing device may also include more or fewer components than those shown in Figure 1, or have a configuration different from that shown in Figure 1.

[0023] It should be noted that the aforementioned one or more processors and / or other data processing circuits are generally referred to herein as "data processing circuits". These data processing circuits may be embodied, in whole or in part, in software, hardware, firmware, or any other combination thereof. Furthermore, the data processing circuits may be a single, independent processing module, or may be integrated, in whole or in part, into any other element in a computing device. As involved in the embodiments of this disclosure, the data processing circuits serve as processor control (e.g., selection of a variable resistor termination path connected to an interface).

[0024] The memory can be used to store software programs and modules of application software, such as the program instructions / data storage device corresponding to the image diagnostic report generation method in this embodiment of the present disclosure. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby realizing the image diagnostic report generation method of the aforementioned application. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory may further include memory remotely located relative to the processor, and these remote memories can be connected to the computing device via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0025] The transmission device is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by the computing device's communication provider. In one example, the transmission device includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device may be a Radio Frequency (RF) module used for wireless communication with the Internet.

[0026] The display can be, for example, a touchscreen liquid crystal display (LCD), which allows users to interact with the user interface of the computing device.

[0027] It should be noted here that, in some optional embodiments, the computing device shown in Figure 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that Figure 1 is only one specific example, and is intended to illustrate the types of components that may exist in the aforementioned computing devices.

[0028] Figure 2 is a schematic diagram of the audio-input-based image diagnostic report generation system according to this embodiment. As shown in Figure 2, the system includes a terminal device 100 and a server 200. The terminal device 100 is connected to the server 200 via a network. A radiologist can input voice data on the terminal device 100 to search for a desired medical image diagnostic report template. The terminal device 100 sends the voice data to the server 200. The server 200 converts the voice data into text, determines the target template, vectorizes the standardized text and each candidate template, and calculates the text similarity between the standardized text and each candidate template. If the text similarity is greater than a preset similarity threshold, the candidate template with the highest text similarity is selected as the target template. The target template is then sent to the terminal device 100, and the physician completes the image diagnostic report based on the target template using the terminal device 100.

[0029] It should be noted that the terminal device 100 and server 200 in the system can both use the hardware structure described above.

[0030] Under the aforementioned operating environment, according to the first aspect of this embodiment, a method for generating image diagnostic reports based on audio input is provided. This method is implemented by the server 200 shown in Figure 2. Figure 3 shows a flowchart of the method. As shown in Figure 3, the method includes:

[0031] S302: Receive voice data sent by the terminal device, which is recorded by the user on the terminal device;

[0032] S304: Convert the speech data into initial text data corresponding to the speech data, and obtain the recognition confidence level corresponding to the initial text data. If the recognition confidence level is not less than a preset threshold, directly perform initial word segmentation on the initial text data; if the recognition confidence level is less than the preset threshold, perform error correction processing on the initial text data to obtain corrected text data, and perform corrected word segmentation on the corrected text data; determine the words corresponding to the text data through the initial word segmentation and the corrected word segmentation.

[0033] Specifically, when the recognition confidence level is less than a preset threshold, the initial text data is subjected to error correction processing to obtain corrected text data. The error correction processing specifically includes identifying suspected erroneous words from the initial text data based on a preset medical terminology list, and generating at least one candidate error correction word for each suspected erroneous word; determining the score of each candidate error correction word in the context of the initial text data based on the first semantic feature, and selecting the candidate error correction word with the highest score to replace the suspected erroneous word to obtain corrected text data.

[0034] S306: Using an attention-based encoding model, determine the first semantic features corresponding to the text data, wherein the encoding model determines the weights corresponding to the key vectors of words based on pre-defined medical terms;

[0035] S308: Based on the first semantic feature, perform entity recognition on the text data to determine the medical entities contained in the text data;

[0036] S310: Based on the medical entities contained in the text data, determine the standardized text corresponding to the text data;

[0037] S312: Determine the target template that matches the standardized text from the candidate templates, wherein the text similarity between the standardized text and each candidate template is calculated as the matching degree, and the candidate template with the highest matching degree and greater than the preset matching threshold is selected as the target template, the target template is sent to the terminal device, and the image diagnosis report corresponding to the target template is received from the terminal device.
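The selection rule in step S312 can be sketched as follows. This is only an illustrative sketch: the specification does not fix the vectorization or similarity measure, so bag-of-words cosine similarity is assumed here, and the names `cosine_similarity`, `select_target_template`, the template texts, and the threshold value 0.5 are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts (assumed measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def select_target_template(standardized_text, candidates, match_threshold=0.5):
    """Return the name of the candidate with the highest matching degree,
    provided that degree is greater than the threshold; otherwise None."""
    best_name, best_score = None, 0.0
    for name, template_text in candidates.items():
        score = cosine_similarity(standardized_text, template_text)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > match_threshold else None


# Hypothetical candidate templates, reduced to a few keywords each.
candidate_templates = {
    "renal_ct": "kidney nodule ct findings impression",
    "chest_xray": "lung field heart shadow chest radiograph",
}

print(select_target_template("kidney nodule", candidate_templates))  # renal_ct
print(select_target_template("brain mri", candidate_templates))      # None
```

Returning None when no candidate exceeds the threshold keeps the retrieval controllable: a weak best match is not forwarded to the terminal device as if it were a confident result.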

[0038] Specifically, in this embodiment, the user (a physician) speaks a query for the required image diagnostic report template into the terminal device 100, so that the terminal device 100 obtains the voice data input by the user. The terminal device 100 then sends the voice data to the server 200. The server 200 receives the voice data sent by the terminal device 100 (S302), converts the voice data into initial text data corresponding to the voice data, and obtains the recognition confidence level corresponding to the initial text data. If the recognition confidence level is not less than a preset threshold, initial word segmentation is performed directly on the initial text data. If the recognition confidence level is less than the preset threshold, error correction processing is performed on the initial text data to obtain corrected text data, and error-corrected word segmentation is performed on the corrected text data. The words corresponding to the text data are determined through the initial word segmentation and the error-corrected word segmentation.
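The confidence-gated branch of step S304 can be sketched as a small routing function. This is a minimal sketch under stated assumptions: the function name `words_from_recognition`, the threshold 0.8, the toy corrector, and whitespace splitting as the word segmenter are all hypothetical placeholders for the recognizer, corrector, and segmenter of the embodiment.

```python
from typing import Callable, List, Tuple


def words_from_recognition(
    initial_text: str,
    confidence: float,
    threshold: float,
    correct: Callable[[str], str],
    segment: Callable[[str], List[str]],
) -> Tuple[List[str], bool]:
    """Return (words, was_corrected) following the confidence threshold rule."""
    if confidence >= threshold:
        # Confidence not less than the threshold: segment the initial text directly.
        return segment(initial_text), False
    # Confidence below the threshold: correct first, then segment the corrected text.
    corrected_text = correct(initial_text)
    return segment(corrected_text), True


def toy_correct(text: str) -> str:
    """Hypothetical corrector that fixes a single known misrecognition."""
    return text.replace("kidny", "kidney")


print(words_from_recognition("kidney nodule", 0.95, 0.8, toy_correct, str.split))
print(words_from_recognition("kidny nodule", 0.60, 0.8, toy_correct, str.split))
```

Passing the corrector and segmenter as parameters keeps the threshold rule separate from whichever models actually perform those two steps.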

[0039] The error correction process can be implemented based on medical terminology constraints and contextual semantic scoring. Specifically, server 200 can identify suspected erroneous words from the text data based on a preset medical terminology list (e.g., RadLex, SNOMED CT terminology set, or a self-built medical terminology database), and generate a set of candidate correction words for each suspected erroneous word. These candidate correction words can be generated based on homophone / near-phoneme rules, spelling similarity, or edit distance. Subsequently, server 200 can score each candidate correction word in conjunction with the semantic features of the context in which the suspected erroneous word is located (e.g., the first semantic feature obtained in subsequent step S306), and select the candidate correction word with the highest score to replace the suspected erroneous word, thereby obtaining the corrected text data. Furthermore, when the score difference of the candidate correction words is less than a preset threshold, the set of candidate correction words can be sent to terminal device 100 for physician confirmation and selection, or the physician can be prompted to repeat the corresponding speech segment and trigger re-recognition to improve the accuracy of speech input. By segmenting the corrected text data, the words corresponding to the text data can be determined (S304). That is, the words contained in the text data can be determined by segmentation.
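The error correction flow described above can be sketched roughly as follows. The sketch assumes edit distance for candidate generation (one of the options named in the text) and replaces the contextual semantic scoring with a trivial distance-based score, since the scoring model is not specified; `correct_text`, `distance_score`, `known_words`, `max_dist`, and `margin` are hypothetical names and values.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def distance_score(candidate, words, idx):
    """Stand-in for contextual semantic scoring: closer spelling scores higher."""
    return -float(edit_distance(candidate, words[idx]))


def correct_text(words, terms, known_words, score, max_dist=2, margin=0.1):
    """Replace suspected erroneous words with the best-scoring term.

    Returns (corrected_words, pending), where pending maps a word index to the
    candidate list that should be sent to the physician for confirmation.
    """
    corrected, pending = [], {}
    for idx, word in enumerate(words):
        if word in terms or word in known_words:
            corrected.append(word)  # not suspected
            continue
        candidates = [t for t in terms if edit_distance(word, t) <= max_dist]
        if not candidates:
            corrected.append(word)  # nothing close enough to suggest
            continue
        ranked = sorted(candidates, key=lambda c: score(c, words, idx), reverse=True)
        too_close = (
            len(ranked) > 1
            and score(ranked[0], words, idx) - score(ranked[1], words, idx) < margin
        )
        if too_close:
            pending[idx] = ranked   # ambiguous: defer to the physician
            corrected.append(word)
        else:
            corrected.append(ranked[0])
    return corrected, pending


terms = ["kidney", "nodule", "ileum", "ilium"]
print(correct_text(["left", "kidny", "nodule"], terms, {"left"}, distance_score))
print(correct_text(["ilum"], terms, set(), distance_score))
```

In the second call, "ilum" is equally close to "ileum" and "ilium", so the word is left unchanged and both candidates are returned for confirmation, mirroring the case where the score difference falls below the preset threshold.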

[0040] One approach is to employ an audio-to-text model based on a Conformer-Transformer hybrid architecture to convert speech data into text data. The Conformer model combines the local perception capabilities of a Convolutional Neural Network (CNN) with the global attention mechanism of a Transformer, specifically designed for processing sequential data. It effectively captures local features and long-range dependencies in speech signals; in this embodiment, the Conformer is used as the encoder. The Transformer is used as the decoder, receiving the hidden representations from the Conformer encoder and converting them into a text sequence in the target language. Furthermore, for this audio-to-text model, adversarial training methods can be used to perform medical-domain transfer learning on a general model pre-trained on the LibriSpeech dataset (using medically relevant training samples for transfer learning), thereby obtaining an audio-to-text model well-suited for the medical field.

[0041] It should be noted that this specification mainly describes the processing of voice data with the server as the execution entity. Of course, the terminal device 100 can also perform some processing on the voice data (including speech-to-text, confidence assessment and error correction, and subsequent entity recognition operations) to obtain standardized text, and then send the standardized text to the server 200. The server 200 then matches the required medical image diagnosis report template for the user based on the final standardized text.

[0042] Then, server 200 can use an attention-based encoding model to determine the first semantic feature corresponding to the text data. The encoding model determines the weights corresponding to the key vectors of words based on pre-defined medical terms (S306). The purpose of step S306 is to increase the weight of the key vectors corresponding to words belonging to medical terms, thereby strengthening the influence of medical terms in the text data on named entity recognition. How to determine the first semantic feature will be explained below.

[0043] Then, server 200 can perform entity recognition on the text data based on the first semantic feature to determine the medical entities contained in the text data (S308). The medical entities mentioned here can include entities under various medical-related entity types, such as diseases, findings, procedures, drugs, and treatment plans, as described below. In other words, this step maps each word in the text data to medical entities, thereby facilitating the standardization of the text data and the search for target templates in subsequent processes.

[0044] Then, server 200 can determine the standardized text corresponding to the text data based on the medical entities contained in the text data (S310). The standardized text mentioned here refers to converting each word in the text data into medical terms as much as possible and mapping it to the encoding in the disease classification standard (such as ICD-11) (i.e., the standard encoding mentioned below). Then, it determines the target template that matches the standardized text from the candidate templates, sends the target template to the terminal device, and receives the image diagnosis report corresponding to the target template from the terminal device (S312).

[0045] As mentioned in the background section, in existing technologies, image diagnostic report generation systems typically use semantic matching algorithms to match semantically similar image diagnostic report templates from a template library based on keywords manually input by the user or input via voice, thereby providing the corresponding image diagnostic report templates to doctors. However, existing technologies have the following problems: the keywords input via voice may contain errors after being converted into text, which may not match the doctor's original meaning, resulting in inaccurate image diagnostic report templates. Furthermore, simple semantic matching is insufficient to accurately retrieve image diagnostic report templates.

[0046] In view of this, according to the technical solution of this embodiment, after obtaining the voice data used by the user to search for image report templates, the voice data is converted into corresponding initial text data, and error correction processing is triggered based on the recognition confidence and a preset threshold to obtain corrected text data and reduce speech-to-text errors. Based on the encoding model, the weight of words matching medical terms in the attention key vector is increased to obtain a first semantic feature. Then, based on this first semantic feature, named entity recognition is performed on the text data, thereby strengthening the role of words already identified as medical terms in named entity recognition. After obtaining the medical entities contained in the text data through named entity recognition, the text data is converted into standardized text, and then candidate template screening and target template determination are completed based on matching degree calculation rules. This allows for more accurate determination of image diagnosis report templates that match the user's voice input.

[0047] Optionally, the operation of entity recognition on the text data based on the first semantic feature includes: determining entity information in the text data corresponding to each entity type based on the first semantic feature, so as to obtain the medical entities contained in the text data, wherein the entity types include: disease, finding, procedure, drug, and treatment plan.

[0048] Entity recognition is performed on text data to obtain medical entities. Specifically, this can be based on the SNOMED CT system (a health and medical service terminology system) to identify entities under various entity types contained in the text data. Each entity type corresponds to a class of entities related to medicine. Entity types can include diseases, findings, procedures, drugs, locations, and treatment plans. Entity information can refer to specific words in the text data that belong to a particular entity type. For example, assuming the text data contains words such as "kidney" and "nodule," if the entity type identified for the word "kidney" is location, then the text data contains the entity information "kidney" under the location entity type. If the entity type identified for the word "nodule" is disease, then the text data contains the entity information "nodule" under the disease entity type.
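The relationship between entity types and entity information in the example above can be illustrated with a small data structure. This sketch uses a plain dictionary lookup as a stand-in; it is not the model-based recognition driven by the first semantic feature, and the `lexicon` contents and the name `recognize_entities` are hypothetical.

```python
ENTITY_TYPES = ("disease", "finding", "procedure", "drug", "location", "treatment plan")


def recognize_entities(words, lexicon):
    """Group words by entity type using a word-to-type lexicon (lookup stand-in)."""
    entities = {entity_type: [] for entity_type in ENTITY_TYPES}
    for word in words:
        entity_type = lexicon.get(word)
        if entity_type in entities:
            entities[entity_type].append(word)
    # Keep only the entity types that actually occur in the text data.
    return {t: ws for t, ws in entities.items() if ws}


# Hypothetical lexicon reproducing the "kidney" / "nodule" example from the text.
lexicon = {"kidney": "location", "nodule": "disease"}
print(recognize_entities(["kidney", "nodule", "seen"], lexicon))
```

The result maps each entity type to its entity information, so "kidney" appears under the location type and "nodule" under the disease type, while unrecognized words are left out.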

[0049] Optionally, the operation of determining the first semantic feature corresponding to the text data using an attention-based encoding model includes: determining the first weight corresponding to each word, wherein the first weight of words belonging to medical terms is greater than the first weight of words not belonging to medical terms; inputting each word corresponding to the text data into the attention-based encoding model to determine the first key vector, first query vector, and first value vector corresponding to each word, and weighting the key vectors corresponding to the corresponding words according to the first weights to obtain the weighted first key vector; determining the attention weight corresponding to each word according to the first query vector and the weighted first key vector; determining the first word vector corresponding to each word according to the attention weight and the first value vector; and determining the first semantic feature corresponding to the text data according to the first word vector.

[0050] In other words, before entity recognition, the semantic features of the text data (i.e., the first semantic features mentioned above) are determined using terminology enhancement within the attention mechanism. First, it is determined whether each word in the text data belongs to a medical term. Specifically, this can be done using the RadLex 3.14 terminology database (containing 21,793 medical entities). Based on this, a first weight is determined for each word, which is used for terminology enhancement. Therefore, the first weight corresponding to a word belonging to a medical term can be higher than the first weight corresponding to a word not belonging to a medical term. Then, using the attention mechanism, the key vector, query vector, and value vector (i.e., the first key vector, the first query vector, and the first value vector) corresponding to each word are determined. Then, the key vectors of the corresponding words are weighted according to their first weights to obtain the weighted key vectors corresponding to each word.

[0051] The weighted key vector is determined using the following formula:

[0052] $$\tilde{k}_j = w_j \, k_j$$

[0053] where $\tilde{k}_j$ is the weighted key vector of the j-th word, $w_j$ is the first weight corresponding to the j-th word, and $k_j$ is the key vector of the j-th word.

[0054] Then, server 200 continues to use the original self-attention method to determine the first semantic feature of the text data based on the weighted key vector, query vector, and value vector corresponding to each word. Specifically, it determines the attention weight corresponding to each word based on the query vector and weighted key vector; it determines the word vector (i.e., the first word vector) corresponding to each word based on the attention weight and value vector; and it determines the first semantic feature corresponding to the text data based on the word vector, as shown in the following formula:

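[0055] $$e_{i,j} = q_i \cdot \tilde{k}_j$$

[0056] $$a_{i,j} = \frac{\exp(e_{i,j})}{\sum_{t=1}^{m} \exp(e_{i,t})}$$

[0057] $$f_i = \sum_{j=1}^{m} a_{i,j} \, v_j$$

[0058] where m is the number of words in the text data, $\tilde{k}_j$ is the weighted key vector corresponding to the j-th word, $q_i$ is the query vector corresponding to the i-th word, $e_{i,j}$ is the unnormalized score between the i-th word and the j-th word, $a_{i,j}$ is the attention weight between the i-th word and the j-th word, $v_j$ is the value vector corresponding to the j-th word, and $f_i$ is the word vector corresponding to the i-th word (i.e., the first word vector).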

[0059] This strengthens the influence of medical terms on named entity recognition in text data by increasing the weight of the key vectors corresponding to medical terms.
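The effect of terminology enhancement can be illustrated with a small pure-Python sketch. It assumes that the weighting is a simple multiplication of each key vector by its first weight and that plain (unscaled) dot-product attention with softmax is used; the function names, toy vectors and weight values are illustrative assumptions, not values from this application.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def term_enhanced_attention(queries, keys, values, first_weights):
    """Return (word_vectors, attention) with keys scaled by the first weights."""
    # Assumed weighting: multiply each key vector by the first weight of its word.
    weighted_keys = [[w * x for x in key] for key, w in zip(keys, first_weights)]
    dim = len(values[0])
    attention, word_vectors = [], []
    for q in queries:
        row = softmax([dot(q, k) for k in weighted_keys])
        attention.append(row)
        word_vectors.append(
            [sum(a * v[d] for a, v in zip(row, values)) for d in range(dim)]
        )
    return word_vectors, attention

# Toy input: word 0 is an ordinary word, word 1 is a medical term.
queries = [[1.0, 0.0], [1.0, 0.0]]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
plain_vectors, plain_attention = term_enhanced_attention(
    queries, keys, values, [1.0, 1.0])
boosted_vectors, boosted_attention = term_enhanced_attention(
    queries, keys, values, [1.0, 2.0])
```

With equal first weights both words receive the same attention; raising the first weight of the medical term shifts attention toward it, which is the behaviour the paragraph above describes. Note that with this multiplicative form the shift is toward the term only when the query-key dot product is positive.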

[0060] It should be noted that the encoding model includes a semantic extraction model and a self-attention module. The self-attention module mainly implements the steps described above, from determining the key vector, query vector and value vector corresponding to each word through to determining the first semantic feature. Before the self-attention module is applied, the semantic extraction model can first determine the word vector of each word in the text data; the self-attention module then determines the key vector, query vector and value vector corresponding to each word from these word vectors.

[0061] Optionally, the operation of entity recognition of text data based on the first semantic feature further includes: generating a second semantic feature corresponding to the first semantic feature using a second encoding model based on an attention mechanism, based on a first knowledge graph related to medical terms, wherein node pairs with relationships in the first knowledge graph have a second weight related to the magnitude of the relationship between the node pairs; and determining entity information corresponding to each entity type in the text data based on the second semantic feature.

[0062] Specifically, in this application, although the first semantic features described above can be directly used for subsequent operations to determine the standardized text, considering that the subsequent operations to determine the standardized text also require the assistance of a knowledge graph (i.e., the second knowledge graph described later), it is further possible to generate second semantic features corresponding to the first semantic features based on the first semantic features, using a second encoding model based on an attention mechanism according to the pre-deployed first knowledge graph, and then determine the entity information corresponding to each entity in the text data based on the second semantic features.

[0063] The first knowledge graph can be the same as the second knowledge graph described later, or it can be a separately deployed knowledge graph. It can be generated from literature containing medical terminology through named entity recognition and relation extraction, or it can be obtained through manual annotation. In the first knowledge graph, each pair of related nodes (a node pair) has a weight (i.e., a second weight) reflecting the strength of the relationship between the two nodes. This weight can, for example, lie between 0 and 1; the larger the weight, the stronger the relationship between the two nodes.

[0064] As shown in Figure 4, in a knowledge graph, the node "nodule" can be associated with either the node "lung" or the node "breast." Therefore, "nodule" and "lung" form one node pair; "nodule" and "breast" form another node pair. Furthermore, the weight between nodes "nodule" and "lung" in the knowledge graph could be, for example, 0.6; the weight between nodes "breast" and "nodule" could be, for example, 0.4, and so on.
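One possible in-memory representation of such weighted node pairs is sketched below. It assumes that the weights are symmetric (the same for "nodule"–"lung" and "lung"–"nodule") and that unrelated nodes have weight 0; the class name `WeightedKnowledgeGraph` and its methods are hypothetical.

```python
class WeightedKnowledgeGraph:
    """Stores a second weight in [0, 1] for each related node pair."""

    def __init__(self):
        self._weights = {}

    def add_pair(self, node_a, node_b, weight):
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie between 0 and 1")
        # frozenset makes the pair order-independent (assumed symmetric weights).
        self._weights[frozenset((node_a, node_b))] = weight

    def weight(self, node_a, node_b):
        """Second weight of the pair, or 0.0 if the nodes are not related."""
        return self._weights.get(frozenset((node_a, node_b)), 0.0)

    def associated_nodes(self, node):
        """All nodes that form a node pair with the given node."""
        return sorted(
            other
            for pair in self._weights if node in pair
            for other in pair if other != node
        )

# The node pairs and example weights from the paragraph above.
graph = WeightedKnowledgeGraph()
graph.add_pair("nodule", "lung", 0.6)
graph.add_pair("breast", "nodule", 0.4)
```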

[0065] How to use the attention-based second encoding model to generate the second semantic features corresponding to the first semantic features will be explained in detail below.

[0066] In this way, since the second semantic feature is generated using an attention-based encoding model based on the relationships between nodes in the knowledge graph, it can more accurately match the knowledge graph in subsequent operations that use the knowledge graph to determine standardized text, thereby using the knowledge graph to resolve ambiguities in the text data.

[0067] Further, based on the first knowledge graph, the operation of generating second semantic features corresponding to the first semantic features using a second encoding model based on an attention mechanism includes: inputting the first word vector into the second encoding model based on an attention mechanism to determine the second key vector, second query vector, and second value vector corresponding to each word; determining the target word to be encoded from the words in the text data; determining the target node and the associated node corresponding to the target word in the first knowledge graph; determining the associated words corresponding to the associated nodes in the words in the text data; determining the second key vector of each word relative to the target word based on the second weight between the target node and the associated node; determining the second attention weight of each word relative to the target word based on the second query vector and the weighted second key vector of the target word; determining the second word vector corresponding to the target word based on the second attention weight and the second value vector corresponding to each word; and determining the second semantic features corresponding to the text data based on the second word vectors corresponding to each word.

[0068] Specifically, in the process of generating the second semantic feature, the word vector $f_j$ (i.e., the first word vector) is first input into the second encoding model to generate the query vector $Q_j$ (i.e., the second query vector), the key vector $K_j$ (i.e., the second key vector) and the value vector $V_j$ (i.e., the second value vector) corresponding to the word vector $f_j$.

[0069] For the i-th word (i.e. the target word), in the first knowledge graph, the node corresponding to the i-th word (i.e. the target node) is matched, and the associated nodes that have a relationship with the target node are determined.

[0070] Then, among the words in the text data, the associated words that match the associated nodes are identified. That is, an associated word is a word that has a relationship with the target word (i.e., the i-th word).

[0071] Then, based on the second weight between the associated nodes and the target node, the weighted second key vector $K_{i,j}$ of each word relative to the target word (i.e., the i-th word) is determined:

[0072] $$K_{i,j} = (r_{i,j} + b)\, K_j$$

[0073] where $K_{i,j}$ is the weighted key vector (i.e., the weighted second key vector) of the j-th word relative to the target word (i.e., the i-th word). As the target word changes, the weighted key vector $K_{i,j}$ of the j-th word relative to the target word also changes. $r_{i,j}$ is the weight between the j-th word and the target word (i.e., the i-th word), and is determined based on the weight (i.e., the second weight) between the node corresponding to the j-th word in the first knowledge graph and the node corresponding to the target word. Specifically, when the j-th word is an associated word of the target word, $r_{i,j}$ equals the second weight between these two nodes; when i = j, $r_{i,j}$ equals 1; and when the j-th word is not an associated word of the i-th word, $r_{i,j}$ equals 0. b is a preset constant, which is set to 1 in this application.

[0074] Then, based on the query vector $Q_i$ of the target word (i.e., the i-th word) and the weighted second key vector $K_{i,j}$, the second attention weight $W_{i,j}$ between each word and the target word (i.e., the i-th word) is determined:

[0075] $$E_{i,j} = Q_i \cdot K_{i,j}$$

[0076] $$W_{i,j} = \frac{\exp(E_{i,j})}{\sum_{t=1}^{m} \exp(E_{i,t})}$$

[0077] Then, based on the second attention weight $W_{i,j}$ corresponding to each word and the second value vector $V_j$ corresponding to each word, the second word vector $F_i$ corresponding to the target word (i.e., the i-th word) is determined:

[0078] $$F_i = \sum_{j=1}^{m} W_{i,j}\, V_j$$

[0079] Therefore, for each word, the corresponding second word vector is calculated in the above manner to determine the second semantic feature corresponding to the text data.

[0080] Therefore, the second semantic feature calculated in the above way is generated by an attention-based encoding model based on the relationship between nodes in the first knowledge graph. As a result, it can match the knowledge graph more accurately in the subsequent operation of determining standardized text using the knowledge graph, thereby using the knowledge graph to resolve ambiguities of words in the text data.
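The knowledge-graph-weighted attention described in paragraphs [0068] to [0079] can be sketched as follows. Because the formulas themselves are not legible in the text, the sketch assumes that each key vector is scaled by (r + b), where r is the relation weight defined in [0073] and b = 1, and that plain softmax attention is applied afterwards. The function names and toy vectors are hypothetical.

```python
import math

def relation_weight(i, j, words, kg_weights):
    """r for word j relative to target word i, following the rules in [0073]."""
    if i == j:
        return 1.0
    # Second weight of the node pair, or 0.0 when the words are not associated.
    return kg_weights.get(frozenset((words[i], words[j])), 0.0)

def kg_attention(words, queries, keys, values, kg_weights, b=1.0):
    """Return (second_word_vectors, second_attention_weights)."""
    m, dim = len(words), len(values[0])
    attention, word_vectors = [], []
    for i in range(m):  # each word takes its turn as the target word
        scores = []
        for j in range(m):
            # Assumed form of the weighting: key scaled by (r + b).
            factor = relation_weight(i, j, words, kg_weights) + b
            scores.append(sum(q * factor * k for q, k in zip(queries[i], keys[j])))
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        attention.append(row)
        word_vectors.append(
            [sum(row[j] * values[j][d] for j in range(m)) for d in range(dim)]
        )
    return word_vectors, attention

words = ["lung", "nodule", "small"]
kg_weights = {frozenset(("nodule", "lung")): 0.6}  # example weight from Figure 4
vectors = [[1.0, 0.0]] * 3  # identical toy vectors isolate the effect of r
second_vectors, second_attention = kg_attention(
    words, vectors, vectors, vectors, kg_weights)
```

With "nodule" as the target word, the word itself receives the largest attention weight, the associated word "lung" the next largest, and the unrelated word "small" the smallest.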

[0081] Optionally, the operation of determining the standardized text corresponding to the text data based on the medical entities contained in the text data includes: determining the target medical term that matches the entity information corresponding to the medical entity and the similarity between the entity information and the target medical term; if the similarity is less than a preset value, correcting the medical entity according to a preset second knowledge graph to obtain a corrected entity, and determining the standard code corresponding to the corrected entity; and if the similarity is not less than a preset value, determining the standard code corresponding to the medical entity contained in the text data, wherein the standard code is a code in the disease classification standard; and determining the standardized text corresponding to the text data based on the corrected entity and the standard code corresponding to the corrected entity, and / or the medical entity and the standard code corresponding to the medical entity.

[0082] In other words, after the server 200 identifies the various medical entities contained in the text data, it verifies and corrects each medical entity to make every word in the text data as medically relevant as possible and accurately maps it to the code of the disease classification standard, thereby obtaining standardized text. Then, when matching the standardized text with each candidate template, it accurately matches the image diagnosis report template required by the physician.

[0083] First, server 200 determines the target medical term that matches the entity information corresponding to the medical entity and the similarity between the entity information and the target medical term. Specifically, for a medical entity, the similarity between the entity information corresponding to the medical entity and each medical term in the SNOMED CT system can be determined, and the medical term with the highest similarity is taken as the target medical term. Thus, based on the similarity between the entity information corresponding to the medical entity and the target medical term, it is determined whether the medical entity needs to be corrected.

[0084] If the similarity is not less than a preset value (e.g., 0.8), no correction is needed for the medical entity. If the similarity is less than the preset value, the medical entity is corrected using a preset second knowledge graph. This preset knowledge graph contains nodes corresponding to each medical term, and edges between nodes representing the relationships between the medical terms. How the preset knowledge graph is used to correct medical entities will be explained below.

[0085] In other words, some medical entities in the text data do not need to be corrected, while others do. Ultimately, the text data may contain some medical entities initially identified through entity recognition, as well as some corrected entities. The process involves mapping the medical entities (those that do not need correction) and the corrected entities in the text data to the standard codes corresponding to their respective medical terms, i.e., the codes in the International Classification of Diseases (ICD-11). Thus, the standardized text contains medical entities that do not need correction, as well as corrected entities, and the standard code corresponding to each entity (including both the medical entities that do not need correction and the corrected entities).
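The threshold-triggered branching and code mapping can be sketched as below. This is a minimal illustration: the `correct` callable stands in for the knowledge-graph correction described later, the misrecognized word "lunk" is invented for the example, and the codes are placeholders rather than real ICD-11 codes.

```python
PRESET_VALUE = 0.8

# Placeholder codes for illustration only; these are NOT real ICD-11 codes.
PLACEHOLDER_CODES = {"lung": "CODE-A", "nodule": "CODE-B"}

def standardize(matches, correct, codes, preset=PRESET_VALUE):
    """Build standardized text entries from (entity_info, target_term, similarity)."""
    standardized = []
    for entity_info, target_term, similarity in matches:
        if similarity >= preset:
            # No correction needed: keep the entity, use its target term's code.
            entity, term = entity_info, target_term
        else:
            # Similarity below the preset value triggers correction.
            entity = term = correct(entity_info)
        standardized.append({"entity": entity, "code": codes.get(term)})
    return standardized

# Hypothetical input: "lunk" is a misrecognized word with similarity 0.75.
matches = [("lunk", "lung", 0.75), ("nodule", "nodule", 0.9)]
result = standardize(matches, lambda info: "lung", PLACEHOLDER_CODES)
print(result)
```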

[0086] Optionally, when the similarity is less than a preset value, the operation of correcting the medical entities according to the preset second knowledge graph to obtain the corrected entities includes: taking the medical entities with similarity less than the preset value as the first entity, and taking the medical entities with similarity not less than the preset value as the second entity; determining the first relationship between the first entity and the second entity; and determining the second relationship between the first node corresponding to the first entity and the second node corresponding to the second entity in the preset knowledge graph; and determining the corrected entity corresponding to the first entity according to the first relationship and the second relationship.

[0087] In other words, when server 200 needs to correct a certain medical entity, server 200 can use that medical entity as the first entity, and other medical entities in the text data that do not need correction as the second entity. Combining the second entity with a preset second knowledge graph, server 200 can verify and correct the first entity. Specifically, the nodes corresponding to the first entity and the second entity, as well as the relationships between the nodes (second relationships), can be determined from the preset second knowledge graph.

[0088] Then, server 200 can determine the relationship between the first entity and the second entity based on the text data. That is, by means of relation extraction, the relationship between the first entity and the second entity (the first relation) is determined based on the text data, and whether the first relation and the second relation are consistent.

[0089] If they match, the first entity is directly output as the corrected entity. The relationship between the first and second entities can be identified using a pre-trained relation extraction model. Table 1 shows examples of relationships between entities, and Table 2 below shows examples of some corresponding entities.

[0090] If there is a discrepancy, the server 200, based on the first relationship, determines from the preset second knowledge graph whether there is an edge corresponding to the first relationship among the edges connecting the nodes corresponding to the first entity (i.e., whether the edges connecting the nodes corresponding to the first entity represent the first relationship). If so, the server 200 can query the nodes connected to the edges corresponding to the first relationship, determine the medical term corresponding to the node, and determine whether the similarity between the medical term and the entity information of the first entity is not less than a preset value. If so, the server 200 corrects the first entity based on the medical term to obtain the corrected entity corresponding to the first entity.

[0091] Unlike existing technologies that rely solely on single similarity or keywords for entity normalization, this embodiment employs a multi-dimensional correction mechanism combining "similarity threshold triggering + preset knowledge graph relationship consistency verification" to achieve entity correction. Specifically, when the similarity is not less than a preset value, direct output reduces unnecessary correction overhead; when the similarity is less than the preset value, the candidate correction path is constrained by verifying the consistency between the first relationship extracted from the corrected text data and the second relationship between corresponding nodes in the knowledge graph. This avoids random replacement, mismatch, or entity discarding, improving the accuracy and reliability of medical entity standardization. Furthermore, when standardized text generated based on more accurate entities and their standard codes is used for template matching, it can improve the distinguishability of candidate template matching, thereby enhancing the accuracy and stability of image diagnostic report template determination.

[0092] It should be noted that, to facilitate entity verification using the preset knowledge graph, the node types and edge types can be made to correspond to the entity types and the relationships between entities when the preset knowledge graph is constructed. That is, the preset knowledge graph stores nodes corresponding to various medical terms and edges representing the relationships between these terms. The node types corresponding to the medical terms can include types that correspond one-to-one with the entity types. For example, node types can include types corresponding to the entity types in the above examples: disease node type, location node type, finding node type, drug node type, treatment plan node type, etc. Furthermore, edge types can be preset to correspond to the relationships between entities. For example, edges between nodes can include edges representing "disease relationships," edges representing "state description relationships," and edges representing "parent-child site relationships," corresponding to the relationship types in Table 1 below.

[0093] The following provides examples of entity verification and correction.

[0094] For example, server 200 identifies the word "lung" in the text data as an entity belonging to the entity type "location," with the corresponding entity information being "lung." It also identifies the word "nodule" as an entity belonging to the entity type "disease," with the corresponding entity information being "nodule." Then, it determines the similarity between each word and the various medical terms in the SNOMED CT system (for example, by extracting word vectors for the words and the medical terms using language models such as Word2Vec and BERT and computing the similarity between them), and identifies, for each of the two words, the medical term with the highest similarity. Assuming the similarity between "lung" and its most similar target medical term is 0.75, and the similarity between "nodule" and its most similar target medical term is 0.9, then with a preset value of 0.8, the entity recognition result for "lung" needs to enter the correction stage, while the entity recognition result for "nodule" does not.
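The similarity check in this example can be sketched with cosine similarity over word vectors. The text does not fix a similarity measure, so cosine similarity is an assumption here, and the two-dimensional vectors are made-up stand-ins for embeddings from a model such as Word2Vec or BERT, chosen so that the resulting similarities are close to the 0.75 and 0.9 of the example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def best_term(entity_vector, term_vectors):
    """Return (target_term, similarity) for the most similar medical term."""
    scored = ((term, cosine(entity_vector, vec)) for term, vec in term_vectors.items())
    return max(scored, key=lambda pair: pair[1])

PRESET_VALUE = 0.8

# Made-up vectors standing in for terminology embeddings.
term_vectors = {"lung": [1.0, 0.0], "nodule": [0.0, 1.0]}
lung_term, lung_similarity = best_term([0.75, 0.66], term_vectors)
nodule_term, nodule_similarity = best_term([0.1, 0.9], term_vectors)

needs_correction = {
    "lung": lung_similarity < PRESET_VALUE,
    "nodule": nodule_similarity < PRESET_VALUE,
}
```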

[0095] Continuing with the example above, since the term "lung" needs to be included in the correction process while the term "nodule" does not, we can treat "lung" as the first entity and "nodule" as the second entity, and extract the relationship between them. Let's assume the extracted relationship is "disease relationship" (first relationship). Server 200 queries the preset knowledge graph and identifies the node corresponding to the term "lung" as the "lung" node and the node corresponding to the term "nodule" as the "nodule" node. It then determines the edge between the two nodes as representing the "disease relationship," and uses this edge as the second relationship. Since the second relationship is consistent with the first relationship, the first entity can be directly output as the corrected entity.

[0096] If the first relation and the second relation are inconsistent (or if the node corresponding to the first entity is not found by directly querying the semantic matching between entity information and the corresponding medical terms of the node, but the node corresponding to the second entity is found), then the node corresponding to the second entity can be determined, and the edge connected to that node can be determined.

[0097] Continuing with the previous example, assuming (purely for illustration) that the determined second relationship is inconsistent with the first relationship, each edge connected to the "nodule" node can be determined, and it can be checked whether any of the relationships represented by these edges is a "disease relationship." Figure 4 is a schematic diagram of the edges between nodes according to the first aspect of Embodiment 1 of this disclosure. In Figure 4, the edges connected to the "nodule" node are assumed to include two edges representing a "disease relationship," which connect to the "lung" node and the "breast" node respectively. The terms "lung" and "breast" are then matched in turn against the word "lung" from the text data; if the similarity between the term "lung" and the word "lung" is higher than a preset value (e.g., 0.7), the entity corresponding to the term "lung" can be used as the corrected entity.
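The correction flow of this worked example can be sketched as follows. The sketch follows the example by searching the edges connected to the second entity's node; it uses `difflib.SequenceMatcher` string similarity purely as a stand-in for the embedding-based similarity, and the misrecognized word "lunk", the edge list and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Edges of a toy preset knowledge graph: (node, node, relationship).
EDGES = [
    ("lung", "nodule", "disease relationship"),
    ("breast", "nodule", "disease relationship"),
]

def edge_relation(node_a, node_b, edges):
    """Second relationship: the relation on the edge between two nodes, if any."""
    for x, y, relation in edges:
        if {x, y} == {node_a, node_b}:
            return relation
    return None

def neighbours(node, relation, edges):
    """Nodes connected to `node` by an edge carrying `relation`."""
    return [y if x == node else x
            for x, y, rel in edges if rel == relation and node in (x, y)]

def string_similarity(a, b):
    """Stand-in similarity; the text suggests embedding-based similarity."""
    return SequenceMatcher(None, a, b).ratio()

def correct_entity(first, second, first_relation, edges,
                   similarity=string_similarity, preset=0.7):
    """Return the corrected entity for `first`, or None if none is found."""
    # Consistent relationships: output the first entity unchanged.
    if edge_relation(first, second, edges) == first_relation:
        return first
    # Otherwise try the terms linked to the second entity by the first relationship.
    for candidate in neighbours(second, first_relation, edges):
        if similarity(first, candidate) >= preset:
            return candidate
    return None

consistent = correct_entity("lung", "nodule", "disease relationship", EDGES)
repaired = correct_entity("lunk", "nodule", "disease relationship", EDGES)
unresolved = correct_entity("xyz", "nodule", "disease relationship", EDGES)
```

Constraining the candidates to terms that share the extracted relationship with a trusted entity is what keeps the replacement from being arbitrary: "breast" is also linked to "nodule", but it is rejected because it is not similar to the misrecognized word.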

[0098] Table 1. Examples of relationships between the first entity and the second entity.

[0099] Table 2. Example table of entities between the first entity and the second entity.

[0100] After obtaining the standardized text through the above steps, server 200 identifies the target template that matches the standardized text from the candidate templates. By sending the target template to terminal 100, the user completes the corresponding image diagnostic report based on the target template.

[0101] In addition, referring to Figure 1, according to a second aspect of this embodiment, a storage medium is provided. The storage medium includes a stored program, wherein, when the program is executed, a processor performs any of the methods described above.

[0102] According to the technical solution of this embodiment, the physician's speech is converted into corrected text, and after a series of processing steps, the text data is converted into standardized text that contains, as far as possible, only medical terminology. First, named entity recognition is performed using an attention mechanism combined with terminology enhancement to obtain each medical entity in the text data. Then, the medical entities are verified and corrected. Medical entities with high similarity to medical terminology are directly output, while those with low similarity are corrected based on a preset knowledge graph (a knowledge graph related to medical terminology). Finally, each medical entity and each corrected entity is mapped to its ICD-11 code, and the standardized text is matched against each candidate template, which improves the accuracy of finding the template actually needed by the physician.

[0103] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, because according to the present invention, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the present invention.

[0104] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present invention.

[0105] Example 2: Example 2 is a manifestation of the method of Example 1 at the device level; specifically, it is a device for performing the steps of the method of Example 1.

[0106] As shown in Figure 5, the image diagnostic report generation device 500 according to this embodiment corresponds to the method described according to the first aspect of Embodiment 1. The device 500 includes: a receiving module 510, used to receive voice data sent by a terminal device, the voice data being entered by a user on the terminal device; an audio-to-text module 520, used to convert the voice data into initial text data corresponding to the voice data and obtain the recognition confidence level corresponding to the initial text data, to directly perform initial word segmentation on the initial text data if the recognition confidence level is not less than a preset threshold, to use the initial text data as corrected text data and perform corrected word segmentation on the corrected text data if the recognition confidence level is less than the preset threshold, and to determine the words corresponding to the text data through the initial word segmentation and the corrected word segmentation; an attention encoding module 530, used to determine the first semantic feature corresponding to the text data using an attention-based encoding model, wherein the encoding model determines the weights corresponding to the key vectors of words according to pre-defined medical terms; an entity recognition module 540, used to perform entity recognition on the text data according to the first semantic feature to determine the medical entities contained in the text data; a standardization output module 550, used to determine the standardized text corresponding to the text data according to the medical entities contained in the text data; and a sending module 560, used to determine the target template that matches the standardized text from the candidate templates, send the target template to the terminal device, and receive the image diagnosis report corresponding to the target template from the terminal device.
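The confidence-gated branching performed by the audio-to-text module 520 can be sketched as below. The sketch assumes that the low-confidence branch applies an error-correction step before the corrected word segmentation; the threshold value 0.9, the toy corrector and the use of whitespace splitting as the segmenter are illustrative assumptions only.

```python
def words_for_text(initial_text, confidence, threshold,
                   initial_segment, correct_text, corrected_segment):
    """Choose the segmentation path according to the recognition confidence."""
    if confidence >= threshold:
        # Confidence not less than the threshold: segment the initial text directly.
        return initial_segment(initial_text)
    # Confidence below the threshold: obtain corrected text, then segment it.
    corrected_text = correct_text(initial_text)
    return corrected_segment(corrected_text)

def toy_correct(text):
    """Hypothetical corrector that repairs one known misrecognition."""
    return text.replace("lunk", "lung")

THRESHOLD = 0.9  # assumed value; the text only says "preset threshold"
high = words_for_text("lung nodule", 0.95, THRESHOLD, str.split, toy_correct, str.split)
low = words_for_text("lunk nodule", 0.60, THRESHOLD, str.split, toy_correct, str.split)
```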

[0107] Optionally, the entity recognition module 540 is specifically used to determine the entity information corresponding to each entity type in the text data based on the first semantic feature, so as to obtain the medical entities contained in the text data. The entity types include: disease, finding, procedure, drug and treatment plan.

[0108] Optionally, the attention encoding module 530 is specifically used to: determine a first weight corresponding to a word, wherein the first weight of a word belonging to medical terminology is greater than the first weight of a word not belonging to medical terminology; input each word corresponding to the text data into the encoding model, determine the key vector, query vector, and value vector corresponding to each word, and weight the key vector corresponding to each word according to the first weight corresponding to each word to obtain a weighted key vector; determine the attention weight corresponding to each word according to the query vector and the weighted key vector corresponding to each word; determine the word vector corresponding to each word according to the attention weight and the value vector corresponding to each word; and determine the first semantic feature corresponding to the text data according to the word vector corresponding to each word.

[0109] Optionally, the attention encoding module 530 is further configured to: generate a second semantic feature corresponding to the first semantic feature using a second encoding model based on an attention mechanism, based on a first knowledge graph related to medical terms, wherein node pairs with relationships in the first knowledge graph have a second weight related to the magnitude of the relationship between the node pairs; and determine entity information corresponding to each entity type in the text data based on the second semantic feature.

[0110] Optionally, the operation of generating second semantic features corresponding to the first semantic features using a second encoding model based on an attention mechanism, based on the first knowledge graph, includes: inputting the first word vector into the second encoding model based on an attention mechanism to determine the second key vector, second query vector, and second value vector corresponding to each word; determining the target word to be encoded from the words in the text data; determining the target node and the associated node corresponding to the target word in the first knowledge graph; determining the associated words corresponding to the associated nodes in the words in the text data; determining the second key vector of each word relative to the target word based on the second weight between the target node and the associated node; determining the second attention weight of each word relative to the target word based on the second query vector and the weighted second key vector of the target word; determining the second word vector corresponding to the target word based on the second attention weight and the second value vector corresponding to each word; and determining the second semantic features corresponding to the text data based on the second word vectors corresponding to each word.

[0111] Optionally, the standardization output module 550 is specifically used to: determine the target medical term that matches the entity information corresponding to the medical entity and the similarity between the entity information and the target medical term; if the similarity is less than a preset value, correct the medical entity according to a preset knowledge graph to obtain a corrected entity, and determine the standard code corresponding to the corrected entity; and if the similarity is not less than a preset value, determine the standard code corresponding to the medical entity contained in the text data, wherein the standard code is a code in the disease classification standard; and determine the standardized text corresponding to the text data according to the corrected entity and the standard code corresponding to the corrected entity, and / or the medical entity and the standard code corresponding to the medical entity.

[0112] Optionally, the operation by which the standardized output module 550 corrects the medical entities according to the preset knowledge graph when the similarity is less than the preset value, to obtain the corrected entities, includes: taking medical entities with similarity less than the preset value as first entities and medical entities with similarity not less than the preset value as second entities; determining a first relationship between the first entity and the second entity; determining a second relationship between a first node corresponding to the first entity and a second node corresponding to the second entity in the preset knowledge graph; and determining the corrected entity corresponding to the first entity based on the first relationship and the second relationship.
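As a non-limiting illustration, one possible reading of this relationship-based correction is sketched below. The sketch assumes that the preset knowledge graph is available as (head, relation, tail) triples, that the first relationship has already been extracted from the text as a relation label, and that candidate nodes are ranked by character-level similarity to the first entity; the function name `correct_entity`, the relation label `located_in`, and the toy triples are hypothetical.

```python
from difflib import SequenceMatcher

def correct_entity(first_entity, second_entity, first_relation, triples):
    """Pick the graph node that best replaces a low-similarity entity.

    Candidates are nodes whose relationship to the second entity's node
    (the second relationship) agrees with the first relationship.
    """
    candidates = [head for head, relation, tail in triples
                  if tail == second_entity and relation == first_relation]
    if not candidates:
        # No consistent node in the graph: leave the entity unchanged.
        return first_entity
    return max(candidates,
               key=lambda node: SequenceMatcher(None, first_entity, node).ratio())

# Hypothetical toy knowledge graph.
triples = [
    ("pulmonary nodule", "located_in", "lung"),
    ("pulmonary embolism", "located_in", "lung"),
    ("pleural effusion", "located_in", "pleura"),
]
corrected = correct_entity("pulmonary nodul", "lung", "located_in", triples)
```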

[0113] According to the technical solution of this embodiment, the physician's speech is converted into text, and after a series of processing steps the text data is converted into standardized text that, as far as possible, contains only medical terminology. First, named entity recognition is performed using an attention mechanism combined with terminology enhancement to obtain each medical entity in the text data. Then, the medical entities are verified and corrected: medical entities with high similarity to medical terminology are output directly, while those with low similarity are corrected based on a preset knowledge graph (a knowledge graph related to medical terminology). Finally, each medical entity and corrected entity is mapped to an ICD-11 code, and the standardized text is matched against each candidate template, which improves the accuracy of finding the template the physician actually needs.

[0114] Embodiment 3: Embodiment 3 is a hardware implementation of the method of Embodiment 1. Specifically, this embodiment is a related device.

[0115] As shown in Figure 6, the image diagnostic report generation apparatus 600 according to this embodiment corresponds to the method described according to the first aspect of Embodiment 1. The apparatus 600 includes: a processor 610; and a memory 620 connected to the processor 610 and used to provide the processor 610 with instructions for the following processing steps: receiving voice data sent by a terminal device, the voice data being entered by a user on the terminal device; converting the voice data into initial text data corresponding to the voice data, and obtaining the recognition confidence level corresponding to the initial text data; if the recognition confidence level is not less than a preset threshold, directly performing initial word segmentation on the initial text data; if the recognition confidence level is less than the preset threshold, using the initial text data as corrected text data, and performing error correction word segmentation on the corrected text data; determining, through the initial word segmentation and the error correction word segmentation, the words corresponding to the text data; determining, using an attention-based encoding model, a first semantic feature corresponding to the text data, wherein the encoding model determines the weights corresponding to the key vectors of the words based on pre-defined medical terms; performing entity recognition on the text data based on the first semantic feature to determine the medical entities contained in the text data; determining, based on the medical entities contained in the text data, the standardized text corresponding to the text data; and determining, from the candidate templates, a target template matching the standardized text, sending the target template to the terminal device, and receiving from the terminal device an image diagnosis report corresponding to the target template.
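As a non-limiting illustration, the confidence-gated choice between initial word segmentation and error correction word segmentation may be sketched as follows. The sketch assumes whitespace splitting as the segmenter and a small replacement table as the corrector; the function names, the threshold of 0.8, and the replacement table are hypothetical placeholders for whichever segmenter and corrector an implementation actually uses.

```python
def segment_with_gate(initial_text, confidence, threshold, segment, correct):
    """Segment directly when confidence is high, otherwise correct first."""
    if confidence >= threshold:
        # Recognition confidence not less than the preset threshold.
        return segment(initial_text)
    # Recognition confidence below the threshold: error correction path.
    return segment(correct(initial_text))

# Hypothetical corrector backed by a tiny replacement table.
REPLACEMENTS = {"long": "lung"}

def fix_terms(text):
    return " ".join(REPLACEMENTS.get(word, word) for word in text.split())

high = segment_with_gate("left lung nodule", 0.95, 0.8, str.split, fix_terms)
low = segment_with_gate("left long nodule", 0.40, 0.8, str.split, fix_terms)
```

Both paths end in a list of words, so the downstream encoding model does not need to know which branch produced them.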

[0116] Optionally, the operation of performing entity recognition on the text data based on the first semantic feature includes: determining, based on the first semantic feature, the entity information in the text data corresponding to each entity type, so as to obtain the medical entities contained in the text data, wherein the entity types include: disease, site, operation, drug, and treatment plan.

[0117] Optionally, the operation of determining the first semantic feature corresponding to the text data using an attention-based encoding model includes: determining the first weight corresponding to each word, wherein the first weight of words belonging to medical terms is greater than the first weight of words not belonging to medical terms; inputting each word corresponding to the text data into the encoding model to determine the key vector, query vector, and value vector corresponding to each word, and weighting the key vector corresponding to each word according to the first weight corresponding to each word to obtain a weighted key vector; determining the attention weight corresponding to each word according to the query vector and the weighted key vector corresponding to each word; determining the word vector corresponding to each word according to the attention weight and the value vector corresponding to each word; and determining the first semantic feature corresponding to the text data according to the word vector corresponding to each word.
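As a non-limiting illustration, attention with term-weighted key vectors may be sketched as follows. The sketch assumes single-head scaled dot-product attention with a softmax, query, key, and value vectors supplied directly rather than produced by learned projections, and first weights of 2.0 for medical terms and 1.0 for other words; the function names and all numeric values are hypothetical.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def term_weighted_attention(queries, keys, values, first_weights):
    """Return (attention rows, word vectors) using weighted key vectors."""
    dim = len(keys[0])
    # Scale each word's key vector by that word's first weight.
    weighted_keys = [[w * k for k in key]
                     for key, w in zip(keys, first_weights)]
    attention, word_vectors = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, wk)) / math.sqrt(dim)
                  for wk in weighted_keys]
        row = softmax(scores)
        attention.append(row)
        word_vectors.append([sum(p * v[j] for p, v in zip(row, values))
                             for j in range(len(values[0]))])
    return attention, word_vectors

# Hypothetical toy input: two words, the second one a medical term.
vectors = [[1.0, 0.0], [0.0, 1.0]]
queries = [[1.0, 1.0], [1.0, 1.0]]
attention, word_vectors = term_weighted_attention(
    queries, vectors, vectors, [1.0, 2.0])
```

With equal first weights the two words would receive equal attention for these queries; raising the first weight of the medical term shifts attention toward it, which is the effect the weighting is intended to produce.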

[0118] Optionally, the operation of entity recognition of text data based on the first semantic feature further includes: generating a second semantic feature corresponding to the first semantic feature using a second encoding model based on an attention mechanism, based on a first knowledge graph related to medical terms, wherein node pairs with relationships in the first knowledge graph have a second weight related to the magnitude of the relationship between the node pairs; and determining entity information corresponding to each entity type in the text data based on the second semantic feature.

[0119] Optionally, the operation of generating second semantic features corresponding to the first semantic features using a second encoding model based on an attention mechanism, based on the first knowledge graph, includes: inputting the first word vector into the second encoding model based on an attention mechanism to determine the second key vector, second query vector, and second value vector corresponding to each word; determining the target word to be encoded from the words in the text data; determining the target node and the associated node corresponding to the target word in the first knowledge graph; determining the associated words corresponding to the associated nodes in the words in the text data; determining the second key vector of each word relative to the target word based on the second weight between the target node and the associated node; determining the second attention weight of each word relative to the target word based on the second query vector and the weighted second key vector of the target word; determining the second word vector corresponding to the target word based on the second attention weight and the second value vector corresponding to each word; and determining the second semantic features corresponding to the text data based on the second word vectors corresponding to each word.

[0120] Optionally, the operation of determining the standardized text corresponding to the text data based on the medical entities contained in the text data includes: determining the target medical term that matches the entity information corresponding to the medical entity and the similarity between the entity information and the target medical term; if the similarity is less than a preset value, correcting the medical entity according to a preset knowledge graph to obtain a corrected entity, and determining the standard code corresponding to the corrected entity; and if the similarity is not less than a preset value, determining the standard code corresponding to the medical entity contained in the text data, wherein the standard code is a code in the disease classification standard; and determining the standardized text corresponding to the text data based on the corrected entity and the standard code corresponding to the corrected entity, and / or the medical entity and the standard code corresponding to the medical entity.

[0121] Optionally, when the similarity is less than a preset value, the operation of correcting the medical entities according to the preset knowledge graph to obtain the corrected entities includes: taking the medical entities with similarity less than the preset value as the first entity, and taking the medical entities with similarity not less than the preset value as the second entity; determining the first relationship between the first entity and the second entity; and determining the second relationship between the first node corresponding to the first entity and the second node corresponding to the second entity in the preset knowledge graph; and determining the corrected entity corresponding to the first entity according to the first relationship and the second relationship.

[0122] According to the technical solution of this embodiment, the physician's speech is converted into text, and after a series of processing steps the text data is converted into standardized text that, as far as possible, contains only medical terminology. First, named entity recognition is performed using an attention mechanism combined with terminology enhancement to obtain each medical entity in the text data. Then, the medical entities are verified and corrected: medical entities with high similarity to medical terminology are output directly, while those with low similarity are corrected based on a preset knowledge graph (a knowledge graph related to medical terminology). Finally, each medical entity and corrected entity is mapped to an ICD-11 code, and the standardized text is matched against each candidate template, which improves the accuracy of finding the template the physician actually needs.
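As a non-limiting illustration, matching the standardized text against candidate templates may be sketched as follows. This passage does not fix a particular matching rule, so the sketch assumes that each candidate template is described by a set of standard codes and that the matching degree is the Jaccard overlap between code sets; the function name `match_template`, the template names, and the placeholder codes are hypothetical.

```python
def match_template(standard_codes, candidate_templates):
    """Return the name of the candidate template with the highest overlap.

    candidate_templates maps a template name to the set of standard codes
    that the template covers (assumed representation).
    """
    codes = set(standard_codes)

    def jaccard(other):
        union = codes | other
        return len(codes & other) / len(union) if union else 0.0

    return max(candidate_templates,
               key=lambda name: jaccard(candidate_templates[name]))

# Placeholder templates and codes, not real classification codes.
templates = {
    "chest_ct": {"CODE-A", "CODE-B"},
    "head_mri": {"CODE-C"},
}
target = match_template(["CODE-A"], templates)
```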

[0123] The sequence numbers of the above embodiments of the present invention are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.

[0124] In the above embodiments of the present invention, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0125] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual coupling, direct coupling, or communication connection may be through some interfaces; the indirect coupling or communication connection between units or modules may be electrical or other forms.

[0126] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0127] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0128] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0129] The above description is only a preferred embodiment of the present invention. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.

Claims

1. A method for generating image diagnostic reports based on audio input, characterized in that the method includes: receiving voice data sent by a terminal device, the voice data being recorded by the user through the terminal device; converting the voice data into corresponding initial text data, and obtaining the recognition confidence level corresponding to the initial text data; if the recognition confidence level is not less than a preset threshold, directly performing initial word segmentation on the initial text data; if the recognition confidence level is less than the preset threshold, using the initial text data as corrected text data, and performing error correction word segmentation on the corrected text data; determining the words corresponding to the text data through the initial word segmentation and the error correction word segmentation; determining, using an attention-based encoding model, a first semantic feature corresponding to the text data, wherein the encoding model determines the weights corresponding to the key vectors of the words based on pre-defined medical terms; performing entity recognition on the text data based on the first semantic feature to determine the medical entities contained in the text data; determining, based on the medical entities contained in the text data, the standardized text corresponding to the text data; and determining, from the candidate templates, the target template that matches the standardized text, sending the target template to the terminal device, and receiving from the terminal device the image diagnosis report corresponding to the target template.

2. The method according to claim 1, characterized in that the operation of performing entity recognition on the text data based on the first semantic feature includes: determining, based on the first semantic feature, the entity information corresponding to each entity type in the text data as the medical entities contained in the text data, wherein the entity types include: disease, site, operation, drug, and treatment plan.

3. The method according to claim 2, characterized in that, The operation of determining the first semantic feature corresponding to the text data using an attention-based encoding model includes: A first weight is determined for each word, wherein the first weight of a word belonging to medical terminology is greater than the first weight of a word not belonging to medical terminology; Each word corresponding to the text data is input into the first encoding model based on the attention mechanism to determine the first key vector, the first query vector, and the first value vector corresponding to each word. Based on the first weight corresponding to each word, the key vectors corresponding to the corresponding words are weighted to obtain the weighted first key vector. Based on the first query vector corresponding to each word and the weighted first key vector, determine the attention weight corresponding to each word; Based on the attention weights corresponding to each word and the first value vectors corresponding to each word, determine the first word vectors corresponding to each word; Based on the first word vectors corresponding to each word, the first semantic feature corresponding to the text data is determined.

4. The method according to claim 3, characterized in that, The operation of entity recognition of the text data based on the first semantic feature further includes: Based on a first knowledge graph related to the medical term, a second semantic feature corresponding to the first semantic feature is generated using a second encoding model based on an attention mechanism, wherein the node pairs with relationships in the first knowledge graph have a second weight related to the magnitude of the relationship between the node pairs; Based on the second semantic feature, the entity information corresponding to each entity type in the text data is determined.

5. The method according to claim 4, characterized in that, The operation of generating second semantic features corresponding to the first semantic features using a second encoding model based on an attention mechanism, based on the first knowledge graph, includes: The first word vector is input into the second encoding model based on the attention mechanism to determine the second key vector, the second query vector, and the second value vector corresponding to each word; Determine the target words to be encoded from the words in the text data; In the first knowledge graph, determine the target node corresponding to the target word and the associated node corresponding to the target node; Identify the associated words corresponding to the associated nodes from the words in the text data; Based on the second weight between the target node and the associated node, determine the second key vector of each word relative to the target word after weighting; Based on the second query vector of the target word and the weighted second key vector, determine the second attention weight of each word relative to the target word; Based on the second attention weight corresponding to each word and the second value vector corresponding to each word, determine the second word vector corresponding to the target word; Based on the second word vectors corresponding to each word, the second semantic features corresponding to the text data are determined.

6. The method according to claim 1, characterized in that, The operation of determining the standardized text corresponding to the text data based on the medical entities contained in the text data includes: Determine the target medical term that matches the entity information corresponding to the medical entity and the similarity between the entity information and the target medical term; If the similarity is less than a preset value, the medical entity is corrected according to the preset second knowledge graph to obtain the corrected entity, and the standard code corresponding to the corrected entity is determined. If the similarity is not less than a preset value, a standard code corresponding to the medical entity contained in the text data is determined, and the standard code is a code in the disease classification standard. Based on the corrected entity and the standard code corresponding to the corrected entity, and / or the medical entity and the standard code corresponding to the medical entity, determine the standardized text corresponding to the text data.

7. The method according to claim 6, characterized in that the operation of correcting the medical entity according to the preset second knowledge graph to obtain the corrected entity, when the similarity is less than the preset value, includes: designating medical entities with a similarity less than the preset value as first entities, and medical entities with a similarity not less than the preset value as second entities; determining a first relationship between the first entity and the second entity; determining a second relationship between a first node in the second knowledge graph corresponding to the first entity and a second node in the second knowledge graph corresponding to the second entity; and determining, based on the first relationship and the second relationship, the corrected entity corresponding to the first entity.

8. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program is executed, the method described in any one of claims 1 to 7 is performed by a processor.

9. An image diagnostic report generation device based on audio input, characterized in that the device includes: a receiving module, used to receive voice data sent by a terminal device, wherein the voice data is entered by the user on the terminal device; an audio-to-text module, used to convert the voice data into initial text data corresponding to the voice data, obtain the recognition confidence level corresponding to the initial text data, and determine the words corresponding to the text data; an attention encoding module, used to determine a first semantic feature corresponding to the text data using an attention-based encoding model, wherein the encoding model determines the weights corresponding to the key vectors of the words based on pre-defined medical terms; an entity recognition module, used to perform entity recognition on the text data based on the first semantic feature and to determine the medical entities contained in the text data; a standardized output module, used to determine the standardized text corresponding to the text data based on the medical entities contained in the text data; and a sending module, used to determine the target template that matches the standardized text from the candidate templates, send the target template to the terminal device, and receive the image diagnosis report corresponding to the target template from the terminal device.

10. An image diagnostic report generation device based on audio input, characterized in that the device is used for implementing the image diagnostic report generation method based on audio input as described in any one of claims 1 to 7, and the device comprises: a processor; and a memory, connected to the processor, for providing the processor with instructions to perform the following processing steps: receiving voice data sent by a terminal device, wherein the voice data is recorded by the user on the terminal device; converting the voice data into initial text data corresponding to the voice data, obtaining the recognition confidence level corresponding to the initial text data, and determining the words corresponding to the text data; determining, using an attention-based encoding model, a first semantic feature corresponding to the text data, wherein the encoding model determines the weights corresponding to the key vectors of the words based on pre-defined medical terms; performing entity recognition on the text data based on the first semantic feature to determine the medical entities contained in the text data; determining, based on the medical entities contained in the text data, the standardized text corresponding to the text data; and determining, from the candidate templates, the target template that matches the standardized text, sending the target template to the terminal device, and receiving from the terminal device the image diagnosis report corresponding to the target template.