A department recommendation and triage method based on deep learning

By combining BigBird and an improved RGCN model, the problem of insufficient semantic understanding and logical integration in traditional intelligent triage methods is solved. This enables accurate department recommendations for complex medical descriptions and improves logical interpretability, thereby increasing the efficiency and accuracy of triage decisions.

CN122245844A · Pending · Publication Date: 2026-06-19 · FENGSHANG TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: FENGSHANG TECHNOLOGY CO LTD
Filing Date: 2026-03-27
Publication Date: 2026-06-19

Application Information

Patent Timeline

27 Mar 2026: Application
19 Jun 2026: Publication (CN122245844A)

IPC: G16H80/00; G16H10/40; G16H50/20; G06F40/295; G06F40/30; G06N3/0464; G06N3/047; G06N5/04

AI Tagging

Application Domain

Medical communication; Semantic analysis

AI Technical Summary

Technical Problem

Traditional intelligent triage and recommendation methods struggle to achieve accurate recommendations when faced with complex medical descriptions and multiple symptoms. Furthermore, they lack semantic understanding and logical integration, resulting in significant biases in departmental recommendations, high referral rates, and an inability to effectively utilize spatiotemporal epidemiological knowledge and relationships between medical entities.

Method used

By combining the BigBird model with the improved RGCN model, and through block sparse attention mechanism, random structure perturbation and relation weight transformation, we can adaptively process long sequence text, capture the correlation between cross-modal features, integrate seasonal disease patterns and deterministic medical logic, generate logical path labels and align semantic graphs with logical subgraphs.

Benefits of technology

It improves the accuracy and logical interpretability of departmental recommendations, realizes efficient and accurate intelligent triage decision support, and enhances the logical consistency and diagnostic accuracy of recommendation results.

✦ Generated by Eureka AI based on patent content.

[Figure: abstract drawing, CN122245844A_ABST]

Abstract

This invention discloses a deep learning-based department recommendation and triage method, belonging to the field of intelligent medical technology, comprising the following steps: S1, generating a spatiotemporal feature vector containing seasonal disease patterns and regional prevalence trends; S2, generating a text embedding matrix integrating spatiotemporal context information; S3, generating a token interaction strength matrix and a high-dimensional semantic feature vector; S4, performing ensemble inference with random feature selection and random threshold splitting, and outputting logical path labels; S5, outputting target prediction probabilities through an improved RGCN model; S6, generating visual heatmap data containing the highlighted distribution of key features; S7, outputting intelligent triage recommendation results. This invention overcomes the limitations of traditional intelligent triage methods, namely reliance on simple keyword matching, shallow semantic understanding, and difficulty in fusing logic, providing an efficient, accurate, and visualized solution for triage services in intelligent medical systems.

Description

Technical Field

[0001] This invention relates to the field of intelligent medical technology, and in particular to a department recommendation and triage method based on deep learning.

Background Technology

[0002] As the construction of smart healthcare systems deepens, traditional intelligent triage and recommendation methods are facing increasing challenges. In modern hospital management, accurate recommendations for patient departments are not only crucial for optimizing the allocation of medical resources but also a key factor in improving patient experience and treatment efficiency. However, most current intelligent triage and recommendation methods rely on simple keyword matching, manual triage, or classification models based on shallow machine learning, resulting in rigid processes and limited recommendation accuracy. While these traditional methods can meet basic triage needs, their lack of deep semantic understanding and complex logical reasoning capabilities regarding triage request text makes them ill-suited to the diverse descriptions, numerous implicit symptoms, and high demands for medical knowledge required in real-world medical settings.

[0003] The main limitations of traditional intelligent triage and recommendation methods lie in their weak semantic representation capabilities and insufficient logical integration depth. Existing methods typically rely on pre-set rule bases, keyword statistics, or ordinary word vector models to determine patient intent, exhibiting poor generalization ability and difficulty in accurately parsing complex medical expressions in natural language. When triage request texts exhibit characteristics such as long sequences, semantic ambiguity, non-standardized terminology, and the presence of a large amount of redundant information, the predictive robustness and context-awareness of traditional methods are severely limited. In particular, when faced with lengthy patient descriptions, descriptions of multiple concurrent symptoms, and seasonal or regional epidemic characteristics, traditional single-text analysis methods struggle to efficiently and accurately extract key diagnostic features, leading to significant department recommendation bias, high referral rates, and severely impacting the accuracy of initial diagnosis and the efficiency of medical resource utilization.

[0004] Furthermore, traditional methods often neglect the effective alignment and deep integration between textual semantic representations and deterministic medical logic rules in the triage decision-making process, making it difficult to make full use of spatiotemporal epidemiological background knowledge and the topological dependencies between medical entities. For example, in complex medical diagnostic logic, traditional purely data-driven methods cannot effectively integrate deterministic logical paths such as "fever accompanied by rash" with external environmental features such as "flu season." As a result, their recommendations lack interpretability, contain logical conflicts, or fail to reflect epidemic trends. Even when some methods employ general deep learning models, they fail to uncover the potential alignment relationships between semantic features and logical rules, making it difficult to achieve efficient, accurate, and logically supported intelligent department recommendations and decision support.

[0005] Therefore, how to provide a department recommendation and triage method based on deep learning is a problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0006] This invention proposes a deep learning-based method for department recommendation and triage. By combining the BigBird model with an improved RGCN model, it predicts and recommends target departments and triage paths more accurately. The method automatically extracts deep semantic features and token interaction strength from the triage request text, and combines spatiotemporal epidemiological knowledge with medical entity logic rules to generate logical path labels through extremely randomized tree (Extra-Trees) inference. It then uses a Sinkhorn optimal transport alignment mechanism to fuse semantic graphs and logical subgraphs, significantly improving the accuracy and logical interpretability of intelligent triage recommendations. By introducing a block sparse attention mechanism, random structure perturbation, and a relation weight transformation strategy, it improves on conventional feature extraction and graph neural network propagation, enabling the model to process long-sequence text adaptively and to capture latent correlations between cross-modal features. It also integrates seasonal disease patterns and deterministic medical logic in real time, thereby achieving accurate department recommendations and decision support for patients. This invention overcomes the limitations of traditional intelligent triage methods, namely reliance on simple keyword matching, shallow semantic understanding, and difficulty in fusing logic. It provides an efficient, accurate, and visually supported solution for triage services in smart healthcare systems.

[0007] A department recommendation and triage method based on deep learning according to an embodiment of the present invention includes the following steps:
S1. Collect the patient triage request text, extract a spatiotemporal tuple through timestamp parsing and geographic coordinate positioning, input the spatiotemporal tuple into a spatiotemporal epidemiology knowledge base to perform feature index retrieval, and output a spatiotemporal feature vector containing seasonal disease patterns and regional epidemic trends.
S2. Perform word segmentation and vector mapping on the triage request text to construct a text token sequence. Perform feature-level linear weighted fusion of the spatiotemporal feature vector and the text token sequence to generate a text embedding matrix that integrates spatiotemporal context information.
S3. Input the text embedding matrix into the BigBird model, which uses a block sparse attention mechanism that integrates sliding-window local modeling and global token information transmission to calculate the semantic dependencies within the token sequence, and generate the token interaction strength matrix and the high-dimensional semantic feature vector.
S4. Perform medical named entity recognition on the triage request text to extract structured symptom keywords, input the structured symptom keywords into an extremely randomized trees model to perform ensemble inference with random feature selection and random threshold splitting, and output logical path labels.
S5. Construct a semantic graph and a logical subgraph from the high-dimensional semantic feature vector and the logical path labels, input them into the improved RGCN model, guide cross-graph message passing between semantic and logical nodes through Sinkhorn optimal transport alignment, random structure perturbation, and a relation weight transformation mechanism, generate graph alignment feature vectors, and output the target prediction probability through Softmax regression.
S6. Map the token interaction strength matrix to the numerical domain of a color space, apply color-coded rendering to the tokens in the triage request text, and generate visual heatmap data containing the highlighted distribution of key features.
S7. Select the target department identifier with the highest confidence based on the target prediction probability, and package the target department identifier, visual heatmap data, and logical path labels into a structured form to output the intelligent triage recommendation result.

[0008] Optionally, S1 specifically includes:
S11. Collect the patient's triage request text and parse its format to extract the text generation timestamp and the patient's terminal IP address. Resolve the IP address to geographic coordinates through an IP address geolocation database, and combine the text generation timestamp with the resolved geographic coordinates to construct a spatiotemporal tuple.
S12. Input the spatiotemporal tuple into the preset spatiotemporal epidemiology knowledge base, match the target administrative division based on the geographic coordinates, and match the historical epidemiological data curve for the same period based on the text generation timestamp to extract the disease activity index of the target administrative division in the current time window.
S13. Apply a linear transformation to the disease activity index to generate a spatiotemporal feature vector. Fuse this spatiotemporal feature vector with the word vector matrix of the triage request text using a feature-level weighted method to generate a text embedding matrix that incorporates spatiotemporal background information.

Optionally, S2 specifically includes:
S21. Use the byte pair encoding algorithm to perform sub-word segmentation on the triage request text and generate a text token index sequence, then input the text token index sequence into the pre-trained word embedding layer to perform lookup mapping and output the initial word embedding matrix.
S22. Construct a fully connected linear mapping layer to project the spatiotemporal feature vector into the same semantic feature space as the initial word embedding matrix, generating a spatiotemporally aligned feature tensor.
S23. Based on the length of the text token index sequence, copy and concatenate the spatiotemporally aligned feature tensor to generate a spatiotemporal context matrix with the same dimensions as the initial word embedding matrix.
S24. Perform positional encoding superposition and residual connection calculation on the initial word embedding matrix and the spatiotemporal context matrix to generate a text embedding matrix that integrates spatiotemporal context information.

[0009] Optionally, the BigBird model includes a sequence position encoding layer, a block sparse attention layer, and a position-wise feedforward network layer:
The sequence position encoding layer calculates the sequence position information of each token based on the sine function, and adds the generated position encoding vector to the text embedding matrix element-wise, using broadcasting, to generate the input tensor.
The block sparse attention layer constructs a sparse attention pattern consisting of local window attention, global attention, and random attention. The input tensor is projected into query, key, and value matrices, and the attention score matrix between tokens is calculated according to the sparse attention pattern. A softmax (normalized exponential) operation is applied over the adjacent tokens and global tokens within a preset context neighborhood radius to generate the token interaction strength matrix, and the value matrix is weighted and summed using the token interaction strength matrix to output a context semantic representation matrix.
The position-wise feedforward network layer feeds the context semantic representation matrix into a network of two linear transformation layers with a GELU nonlinear activation between them, which performs nonlinear mapping and dimension expansion on the feature dimension and outputs the high-dimensional semantic feature vector.
The model outputs the token interaction strength matrix and the high-dimensional semantic feature vector.
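The sparse attention pattern described for the block sparse attention layer can be sketched in NumPy as follows. This is a minimal token-level illustration, not the patent's implementation: BigBird applies the pattern to blocks of tokens, whereas the dense boolean mask used here is only practical for short sequences. The function names, the choice of the first tokens as global tokens, and the `window`, `n_global`, and `n_random` parameters are assumptions made for the example.

```python
import numpy as np


def sparse_attention_mask(seq_len, window=1, n_global=1, n_random=1, seed=0):
    """Boolean mask: True where query i is allowed to attend to key j."""
    rng = np.random.default_rng(seed)
    idx = np.arange(seq_len)
    # Local window attention: neighbours within `window` positions.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    # Global attention: the first n_global tokens see, and are seen by, all.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    # Random attention: each query also attends to a few random keys.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=n_random, replace=False)] = True
    return mask


def sparse_attention(x, w_q, w_k, w_v, mask):
    """Masked scaled dot-product attention over the allowed pairs only."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)      # disallowed pairs get weight 0
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights, weights @ v
```

In this reading, the returned `weights` matrix plays the role of the token interaction strength matrix, and `weights @ v` that of the context semantic representation matrix.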

[0010] Optionally, the extremely randomized trees model includes a feature encoding mapping layer, a multidimensional feature concatenation layer, a random splitting ensemble layer, a path index generation layer, and a probability distribution output layer:
The feature encoding mapping layer maps structured symptom keywords to discrete index identifiers in a predefined medical thesaurus. Based on the index identifiers, a row vector lookup is performed in the predefined word embedding matrix to obtain an initial word embedding tensor. A linear transformation is then applied to the initial word embedding tensor to map the discrete semantic symbols to a symptom feature tensor in a low-dimensional continuous vector space.
The multidimensional feature concatenation layer calculates the product of the term frequency weight and the inverse document frequency weight of each structured symptom keyword to obtain its term frequency-inverse document frequency (TF-IDF) feature value, and applies one-hot encoding to the part-of-speech tag of each keyword to generate a text statistical feature tensor. The text statistical feature tensor and the symptom feature tensor are concatenated along the feature dimension to construct a multidimensional hybrid feature matrix.
The random splitting ensemble layer initializes the topology of multiple decision trees. For the multidimensional hybrid feature matrix at each node, a random subset of features is sampled to determine the candidate splitting feature dimensions, and the splitting threshold is drawn uniformly at random from within the value range of each candidate feature. Based on the candidate splitting feature dimension and the splitting threshold, a binary decision function is constructed to recursively assign the multidimensional hybrid feature matrix to the left or right child node until the number of samples at a node falls below the preset threshold or the tree depth reaches the maximum depth limit, thereby generating an ensemble of extremely randomized decision trees.
The path index generation layer tracks the traversal trajectory of the multidimensional hybrid feature matrix through the ensemble of extremely randomized decision trees, extracts the node identifier sequence from the root node to the terminal node in each tree, combines and encodes the node identifier sequences of all decision trees, and outputs the logical path labels.
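The random splitting ensemble layer and the path index generation layer can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions: each node draws a random feature subset and a uniform threshold as described, but then picks one candidate at random instead of scoring the candidates, and the string encoding of the path label (`-` within a tree, `|` between trees) is invented for the example. All function names are hypothetical.

```python
import random


def build_tree(rows, rng, depth=0, max_depth=4, min_samples=2, k=2, counter=None):
    """Grow one randomized tree; every node gets a sequential integer id."""
    counter = [0] if counter is None else counter
    node = {"id": counter[0]}
    counter[0] += 1
    if depth >= max_depth or len(rows) < min_samples:
        return node  # stopping rule: depth limit or too few samples
    n_features = len(rows[0])
    # Random feature subset, keeping only features that vary at this node.
    subset = rng.sample(range(n_features), min(k, n_features))
    candidates = [f for f in subset
                  if min(r[f] for r in rows) < max(r[f] for r in rows)]
    if not candidates:
        return node
    feature = rng.choice(candidates)
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    threshold = rng.uniform(lo, hi)  # uniform draw inside the value range
    left = [r for r in rows if r[feature] <= threshold]
    right = [r for r in rows if r[feature] > threshold]
    if not left or not right:
        return node
    node["feature"], node["threshold"] = feature, threshold
    node["left"] = build_tree(left, rng, depth + 1, max_depth, min_samples, k, counter)
    node["right"] = build_tree(right, rng, depth + 1, max_depth, min_samples, k, counter)
    return node


def decision_path(tree, row):
    """Node ids visited from the root to the terminal node for one sample."""
    node, path = tree, [tree["id"]]
    while "feature" in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
        path.append(node["id"])
    return path


def logical_path_label(forest, row):
    """Combine the per-tree paths into one label string."""
    return "|".join("-".join(map(str, decision_path(t, row))) for t in forest)
```

A forest is then a list such as `[build_tree(rows, random.Random(seed)) for seed in range(n_trees)]`, where `rows` stands in for the rows of the multidimensional hybrid feature matrix.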

[0011] Optionally, the improved RGCN model includes an optimal transport alignment layer, a random structure perturbation layer, a multi-relation weight transformation layer, a message passing aggregation layer, and a prediction probability mapping layer:
The optimal transport alignment layer receives the high-dimensional semantic feature vectors and the logical path labels, which are mapped to a semantic graph node distribution matrix and a logical graph node distribution matrix, respectively. A cost matrix based on node feature similarity is constructed, an entropy regularization term is introduced, and the optimal transport plan is solved iteratively with the Sinkhorn algorithm. The optimal transport plan is used as a soft-assignment weight matrix to calculate the alignment similarity matrix between semantic graph nodes and logical graph nodes, and the initial cross-graph connection weights are output.
The random structure perturbation layer defines the heterogeneous topology formed by the semantic graph nodes and logical graph nodes as the original heterogeneous graph. According to a preset drop probability, edges in the original heterogeneous graph are randomly removed to generate a perturbed subgraph structure. The perturbed subgraph structure is input into the random RGCN processing unit, Gaussian noise is injected into the node hidden states during message passing as a random perturbation, and a node hidden state tensor containing noise features is output.
The multi-relation weight transformation layer identifies the different types of edge relations in the heterogeneous topology and assigns an independent learnable weight matrix to each relation type. Taking the node hidden state tensor as input, it applies a linear transformation through the weight matrix specific to each relation type to generate relation-specific feature tensors. Batch normalization is applied to the relation-specific feature tensors to generate the transformed feature tensors.
The message passing aggregation layer computes the Hadamard product of the transformed feature tensor and the initial cross-graph connection weights to obtain weighted neighbor features. Based on the alignment similarity matrix, it dynamically adjusts the strength of information transmission between semantic nodes and logical nodes, and aggregates the weighted neighbor features along the feature dimension to generate an aggregated feature tensor.
The prediction probability mapping layer feeds the aggregated feature tensor into a fully connected layer that performs dimensionality reduction and linear transformation to generate a logits vector. Softmax normalization is applied to the logits vector to map its values to the probability interval and output the target prediction probability.
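The entropy-regularized transport step of the alignment layer can be sketched as below. This is a generic Sinkhorn iteration under assumptions the text does not state: uniform mass on the nodes of both graphs, a cosine-distance cost matrix, and illustrative values for the regularization strength `eps` and the iteration count.

```python
import numpy as np


def cosine_cost(sem_nodes, logic_nodes):
    """Cost matrix: 1 - cosine similarity between node feature vectors."""
    a = sem_nodes / np.linalg.norm(sem_nodes, axis=1, keepdims=True)
    b = logic_nodes / np.linalg.norm(logic_nodes, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def sinkhorn_plan(cost, eps=0.5, n_iter=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = cost.shape
    row_mass = np.full(n, 1.0 / n)   # assumed uniform mass on semantic nodes
    col_mass = np.full(m, 1.0 / m)   # assumed uniform mass on logical nodes
    kernel = np.exp(-cost / eps)     # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = row_mass / (kernel @ v)      # rescale rows
        v = col_mass / (kernel.T @ u)    # rescale columns
    return u[:, None] * kernel * v[None, :]
```

Each entry of the returned plan can be read as a soft-assignment weight between one semantic node and one logical node. A smaller `eps` gives a sharper assignment but needs more iterations and is more prone to numerical underflow.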

[0012] Optionally, S6 specifically includes:
S61. Analyze the token interaction strength matrix, extract the attention scalar corresponding to each token, normalize it and linearly map it to the numerical domain of the target color space, and construct a discrete mapping between the token sequence index and the color channel vector value.
S62. Based on the discrete mapping, perform tensor operations on the tokens in the triage request text, inject the color channel vector values into the text representation matrix, and calculate a text rendering matrix with color attribute features.
S63. Perform dimensional transformation and format encapsulation on the text rendering matrix to generate standardized visual heatmap data.
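A minimal sketch of the S61 to S63 mapping follows. The text does not say how the per-token attention scalar is derived or which color space is used, so two assumptions are made here: the scalar is the total attention a token receives (a column sum), and the color is an RGB ramp from white to red. The function name and the output record layout are hypothetical.

```python
def attention_to_heatmap(tokens, interaction):
    """Map each token's attention scalar to a white-to-red RGB value."""
    # Assumed scalar: total attention received by token j (column sum).
    scores = [sum(row[j] for row in interaction) for j in range(len(tokens))]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores match
    heatmap = []
    for token, score in zip(tokens, scores):
        level = (score - lo) / span            # min-max normalization to [0, 1]
        fade = round(255 * (1.0 - level))      # 255 = white, 0 = full red
        heatmap.append({"token": token,
                        "weight": round(level, 4),
                        "rgb": (255, fade, fade)})
    return heatmap
```

The resulting list of records is one possible form of the "standardized visual heatmap data"; a front end could render each token with its `rgb` value as a background color.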

[0013] Optionally, S7 specifically includes:
S71. Perform maximum likelihood estimation on the target prediction probability vector and locate the index coordinates corresponding to the maximum probability value.
S72. Perform a hash lookup in the preset department code dictionary based on the index coordinates to resolve the target department identifier.
S73. Input the target department identifier, visual heatmap data, and logical path labels into a data serialization interface to perform key-value pair mapping and binary encapsulation, building the structured intelligent triage recommendation result.
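Steps S71 to S73 amount to an argmax, a dictionary lookup, and serialization. The sketch below reads S71 as selecting the index of the largest probability and uses JSON encoded as UTF-8 bytes as a stand-in for the unspecified "binary encapsulation". The department codes and field names are hypothetical.

```python
import json

# Hypothetical department code dictionary (index -> department identifier).
DEPARTMENT_CODES = {
    0: "DEPT_RESPIRATORY",
    1: "DEPT_DERMATOLOGY",
    2: "DEPT_PEDIATRICS",
}


def build_recommendation(probs, heatmap, path_label, codes=DEPARTMENT_CODES):
    """Pick the most probable department and package the result as bytes."""
    # S71: index of the maximum probability value.
    best = max(range(len(probs)), key=probs.__getitem__)
    # S72: dictionary (hash) lookup of the department identifier.
    payload = {
        "department": codes[best],
        "confidence": probs[best],
        "heatmap": heatmap,
        "logical_path": path_label,
    }
    # S73: key-value mapping serialized to a byte string.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```

A consumer can recover the structure with `json.loads(blob)`. A compact binary format could replace JSON without changing the three steps.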

[0014] The beneficial effects of this invention are as follows. By employing an improved RGCN model, this invention addresses the technical challenge of heterogeneous fusion of semantic features and medical logic rules. Traditional prediction methods often struggle to deeply integrate deterministic medical logic with deep learning semantic representations, so their recommendation results lack logical constraints. The improved RGCN model constructs semantic graphs and logical subgraphs, and uses the Sinkhorn optimal transport algorithm to calculate the alignment similarity matrix between nodes, achieving precise feature alignment during cross-graph message passing. This mechanism adaptively adjusts the strength of information transmission between semantic and logical nodes, integrating the logical path labels generated by the extremely randomized trees with the high-dimensional semantic features. It overcomes the limitations of single-modality feature expression, significantly improving the logical consistency and diagnostic accuracy of department recommendation results, and providing efficient, accurate, and interpretable technical support for intelligent triage decision-making.

Description of the Drawings

[0015] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:
Figure 1 is an overall flowchart of the deep learning-based department recommendation and triage method proposed in this invention;
Figure 2 is a flowchart illustrating the working principle of the improved RGCN model in the deep learning-based department recommendation and triage method proposed in this invention.

Detailed Implementation

[0016] The invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.

[0017] Referring to Figure 1 and Figure 2, a department recommendation and triage method based on deep learning includes the following steps:
S1. Collect the patient triage request text, extract a spatiotemporal tuple through timestamp parsing and geographic coordinate positioning, input the spatiotemporal tuple into a spatiotemporal epidemiology knowledge base to perform feature index retrieval, and output a spatiotemporal feature vector containing seasonal disease patterns and regional epidemic trends.
S2. Perform word segmentation and vector mapping on the triage request text to construct a text token sequence. Perform feature-level linear weighted fusion of the spatiotemporal feature vector and the text token sequence to generate a text embedding matrix that integrates spatiotemporal context information.
S3. Input the text embedding matrix into the BigBird model, which uses a block sparse attention mechanism that integrates sliding-window local modeling and global token information transmission to calculate the semantic dependencies within the token sequence, and generate the token interaction strength matrix and the high-dimensional semantic feature vector.
S4. Perform medical named entity recognition on the triage request text to extract structured symptom keywords, input the structured symptom keywords into an extremely randomized trees model to perform ensemble inference with random feature selection and random threshold splitting, and output logical path labels.
S5. Construct a semantic graph and a logical subgraph from the high-dimensional semantic feature vector and the logical path labels, input them into the improved RGCN model, guide cross-graph message passing between semantic and logical nodes through Sinkhorn optimal transport alignment, random structure perturbation, and a relation weight transformation mechanism, generate graph alignment feature vectors, and output the target prediction probability through Softmax regression.
S6. Map the token interaction strength matrix to the numerical domain of a color space, apply color-coded rendering to the tokens in the triage request text, and generate visual heatmap data containing the highlighted distribution of key features.
S7. Select the target department identifier with the highest confidence based on the target prediction probability, and package the target department identifier, visual heatmap data, and logical path labels into a structured form to output the intelligent triage recommendation result.

[0018] In this embodiment, S1 specifically includes:
S11. Collect the patient's triage request text and parse its format to extract the text generation timestamp and the patient's terminal IP address. Resolve the address using the IP address geolocation database, and combine the text generation timestamp with the resolved geographic coordinates to construct a spatiotemporal tuple. Specifically, the format parsing operation uses regular expressions to match the "Date" and "X-Forwarded-For" fields in the request header, converts the extracted time string to Unix timestamp format, and converts the IP address string to a 32-bit unsigned integer. The address resolution operation uses the IP address in 32-bit unsigned integer form as the input key, queries the pre-loaded GeoLite2 database index table to obtain the corresponding city node number, and then retrieves the location coordinates from the pre-stored latitude and longitude lookup table based on the city node number. If the lookup succeeds, a high-precision latitude and longitude tuple is returned; if it fails, the default coordinates of the center point of the province are returned. Finally, the Unix timestamp and the latitude and longitude tuple are combined in sequence to construct a spatiotemporal tuple with the structure {timestamp, longitude, latitude}.
S12. Input the spatiotemporal tuple into the preset spatiotemporal epidemiology knowledge base, match the target administrative division based on the geographic coordinates, and match the historical epidemiological data curve for the same period based on the text generation timestamp to extract the disease activity index of the target administrative division in the current time window. Specifically, matching the target administrative division uses the point-in-polygon algorithm to determine the geofence that contains the latitude and longitude coordinates and obtain the corresponding standard administrative division code. Matching the historical data curve for the same period converts the Unix timestamp into a year index and a month-day index, retrieves from the spatiotemporal epidemiology knowledge base the daily incidence data sequence of the administrative division for the same period in the past three years, and calculates the moving average over the 7 days before the current time point. Extracting the disease activity index compares the moving average with the annual baseline value of the region stored in the knowledge base and calculates a normalized activity score from the difference; a positive difference is flagged as indicating a period of high incidence. Finally, the disease activity index is output as a value in the range of 0 to 1.
S13. Apply a linear transformation to the disease activity index to generate a spatiotemporal feature vector, then fuse the spatiotemporal feature vector with the word vector matrix of the triage request text to generate a text embedding matrix that incorporates spatiotemporal background information. Specifically, the linear transformation constructs a fully connected layer with an input dimension of 1 and an output dimension of 768, mapping the scalar disease activity index to a 768-dimensional spatiotemporal feature vector. The feature-level weighted fusion passes the spatiotemporal feature vector through the Sigmoid activation function to generate attention weight coefficients with the same dimension as the word vectors. The attention weight coefficients are then multiplied element-wise with the word vector matrix of the triage request text to dynamically adjust the feature values of each word vector. Finally, the weighted word vector matrix is combined with the original word vector matrix through a residual connection to generate a text embedding matrix that incorporates spatiotemporal background information, with dimensions of sequence length by 768.
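The header parsing in S11, the activity score in S12, and the gated fusion in S13 can be sketched together as follows. Several details are assumptions rather than statements from the text: the header is treated as plain `Name: value` lines, the first address in `X-Forwarded-For` is taken as the patient's terminal, the normalization to the 0 to 1 range uses a logistic function of the relative difference, and the fully connected layer is passed in as explicit `weight` and `bias` arrays. The GeoLite2 and point-in-polygon lookups are omitted.

```python
import ipaddress
import math
import re
from email.utils import parsedate_to_datetime

import numpy as np


def parse_request_header(header_text):
    """S11: extract a Unix timestamp and a 32-bit integer IPv4 address."""
    date = re.search(r"^Date:\s*(.+)$", header_text, re.MULTILINE).group(1).strip()
    ip = re.search(r"^X-Forwarded-For:\s*([\d.]+)", header_text, re.MULTILINE).group(1)
    timestamp = int(parsedate_to_datetime(date).timestamp())
    return timestamp, int(ipaddress.IPv4Address(ip))


def disease_activity_index(daily_incidence, annual_baseline):
    """S12: compare the 7-day moving average with the annual baseline."""
    moving_avg = sum(daily_incidence[-7:]) / 7
    relative_diff = (moving_avg - annual_baseline) / annual_baseline
    # Assumed normalization: logistic squashing, so 0.5 means "at baseline".
    return 1.0 / (1.0 + math.exp(-relative_diff))


def fuse_spatiotemporal(word_vectors, activity_index, weight, bias):
    """S13: 1 -> d linear layer, sigmoid gate, element-wise product, residual."""
    st_vector = activity_index * weight + bias    # shape (d,)
    gate = 1.0 / (1.0 + np.exp(-st_vector))       # weights in (0, 1)
    return word_vectors + word_vectors * gate     # shape (seq_len, d)
```

With this normalization an index above 0.5 corresponds to the "positive difference" case that the text flags as a high-incidence period.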

[0019] In this embodiment, S2 specifically includes: S21. A byte-pair encoding algorithm is used to perform sub-word segmentation on the triage request text to generate a text token index sequence. The text token index sequence is then input into a pre-trained word embedding layer to perform lookup mapping and output an initial word embedding matrix. Specifically, the sub-word segmentation operation loads a preset byte-pair encoding vocabulary containing 50,000 merge rules, scans the triage request text as a byte stream, and segments the text into a token index sequence of the smallest semantic units by iteratively applying the merge rules for the most frequent character pairs. The pre-trained word embedding layer is a lookup table of 50,000 rows, each holding 768 feature parameters. Through the lookup operation, each token index is mapped to the corresponding 768-dimensional dense vector. Finally, the vectors of all tokens in the sequence are stacked to generate an initial word embedding matrix with dimensions of sequence length by 768.
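The merge-rule application and the embedding lookup in S21 can be sketched as below. For readability the example works on characters rather than raw bytes, and the merge rules, vocabulary, and embedding table are tiny hypothetical stand-ins for the 50,000-entry resources described in the text.

```python
import numpy as np


def bpe_segment(word, merges):
    """Apply ordered merge rules until no adjacent pair can be merged."""
    ranks = {pair: i for i, pair in enumerate(merges)}  # lower rank merges first
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; unknown pairs get an infinite rank.
        pairs = [(ranks.get(pair, float("inf")), i)
                 for i, pair in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no applicable merge rule is left
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def embed(tokens, vocab, table):
    """Look up one embedding row per token and stack them into a matrix."""
    ids = [vocab[token] for token in tokens]
    return table[ids]  # shape (len(tokens), embedding_dim)
```

For example, with the hypothetical rules `[("f", "e"), ("fe", "v"), ("e", "r"), ("fev", "er")]`, the word `fevers` is segmented into `fever` and `s`, and `embed` then returns a two-row matrix.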

[0020] S22. Construct a fully connected linear mapping layer to project the spatiotemporal feature vectors onto the same feature semantic space as the initial word embedding matrix, generating a spatiotemporally aligned feature tensor. The input dimension of the fully connected linear mapping layer is set to 768 dimensions, and the output dimension is also set to 768 dimensions. The weight matrix is initialized using a Xavier normal distribution, and the bias term is initialized to zero. Specifically, the projection operation involves performing matrix multiplication on the 768-dimensional spatiotemporal feature vectors and the weight matrix, and then superimposing the bias term. This maps the numerical distribution of the spatiotemporal feature data to a high-dimensional semantic space consistent with the word embedding vectors, thereby eliminating feature modality differences and generating a spatiotemporally aligned feature tensor of shape 1 by 768.

[0021] S23. Based on the length of the text token index sequence, perform a copy and concatenation operation on the spatiotemporal alignment feature tensor to generate a spatiotemporal context matrix with the same dimension as the initial word embedding matrix. First, obtain the actual length N of the token index sequence generated in step S21. If N is less than the preset maximum sequence length of 512, pad the end of the sequence with zeros to 512; otherwise, truncate it to 512. Then, copy the spatiotemporal alignment feature tensor of shape 1 by 768 along the sequence dimension 512 times to expand it into an intermediate tensor of shape 512 by 768. Finally, perform a concatenation or direct replacement operation on the intermediate tensor and the zero-padded text token sequence of shape 512 by 768 to ensure that the dimension of the finally generated spatiotemporal context matrix is strictly maintained at 512 by 768.

[0022] S24. Perform positional encoding overlay and residual connection calculation on the initial word embedding matrix and the spatiotemporal context matrix to generate a text embedding matrix that integrates spatiotemporal context information. Specifically, the positional encoding overlay operation is to generate a sinusoidal positional encoding matrix with a preset dimension of 512 x 768, and perform element-wise addition operation on this matrix and the spatiotemporal context matrix generated in step S23 to inject sequence position information. The residual connection calculation operation is to add the spatiotemporal context matrix after positional encoding overlay to the initial word embedding matrix generated in step S21, so that the spatiotemporal background information is directly injected into the original text feature stream. Finally, perform layer normalization processing on the addition result to generate a text embedding matrix with a dimension of 512 x 768 that integrates spatiotemporal context information.
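Steps S22 to S24 can be sketched in a few lines of NumPy. This is a simplified reading, not the embodiment's implementation: the function names are hypothetical, the "copy and concatenation" of S23 is interpreted as tiling the 1 by 768 tensor along the sequence axis, and the layer normalization omits learnable gain and bias.

```python
import numpy as np

MAX_LEN, D_MODEL = 512, 768

def sinusoidal_encoding(length, dim):
    """Sine on even columns, cosine on odd columns, base constant 10000."""
    pos = np.arange(length)[:, None]
    i = np.arange(0, dim, 2)[None, :]
    angles = pos / np.power(10000.0, i / dim)
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def fuse(word_emb, st_vector, w_proj, b_proj):
    n = min(word_emb.shape[0], MAX_LEN)
    padded = np.zeros((MAX_LEN, D_MODEL))
    padded[:n] = word_emb[:n]                    # zero-pad or truncate to 512
    aligned = st_vector @ w_proj + b_proj        # S22: shape (1, 768)
    context = np.tile(aligned, (MAX_LEN, 1))     # S23: shape (512, 768)
    context = context + sinusoidal_encoding(MAX_LEN, D_MODEL)  # S24: positions
    return layer_norm(padded + context)          # S24: residual + layer norm

rng = np.random.default_rng(0)
# Xavier normal initialization: std = sqrt(2 / (fan_in + fan_out)).
w_proj = rng.normal(scale=np.sqrt(2.0 / (2 * D_MODEL)), size=(D_MODEL, D_MODEL))
b_proj = np.zeros(D_MODEL)
st_vector = rng.normal(size=(1, D_MODEL))
out_short = fuse(rng.normal(size=(10, D_MODEL)), st_vector, w_proj, b_proj)
out_long = fuse(rng.normal(size=(600, D_MODEL)), st_vector, w_proj, b_proj)
```

Both a 10-token and a 600-token input yield a 512 by 768 matrix, which matches the fixed output dimension stated in S23 and S24.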

[0023] In this embodiment, the BigBird model includes a sequence position encoding layer, a block sparse attention layer, and a position-by-position feedforward network layer: The sequence position encoding layer is used to calculate the sequence position information of the token based on the sine function. It performs element-wise addition based on the broadcast mechanism on the generated position encoding vector and the text embedding matrix to generate an input tensor. Specifically, the operation of calculating the sequence position information is to calculate vector elements of different dimensions for each position in the sequence using the sine and cosine functions respectively, with the frequency parameter of the sine function set to 10000. The element-wise addition based on the broadcast mechanism specifically refers to directly adding the generated position encoding matrix with the text embedding matrix of the same dimension, and using the broadcast mechanism to automatically align the tensor shape, thereby injecting the position information into the text features and outputting an input tensor with the dimension unchanged.

[0024] The block sparse attention layer is used to construct a sparse attention pattern consisting of local window attention, global attention, and random attention. The input tensor is projected into a query, key, and value matrix. An attention score matrix between tokens is calculated based on the sparse attention pattern. A normalized exponential function is performed on adjacent tokens and global markers within a preset context neighborhood radius to generate a token interaction strength matrix. The value matrix is then weighted and summed using the token interaction strength matrix to output a context semantic representation matrix. Specifically, the sparse attention pattern is constructed by defining a local window of size 128, several global markers, and random connections with a sampling probability of 0.05 to form a sparse mask matrix. When calculating the attention score matrix, this mask matrix is used to mask the attention weights of invalid regions. A Softmax normalized exponential function is performed only on the regions corresponding to the preset context neighborhood radius and global markers to generate a token interaction strength matrix. Finally, this matrix is weighted and summed on the value matrix to output a context semantic representation matrix with the same dimension as the input feature matrix.
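The sparse attention pattern described above can be illustrated with a dense mask on a toy sequence. The sketch below is for exposition only: it uses a 16-token sequence, a window of 4, and one global token instead of the stated window of 128, and it builds the full score matrix, whereas an efficient block sparse implementation would avoid doing so. All names and the random projection weights are assumptions.

```python
import numpy as np

def sparse_mask(seq_len, window, n_global, p_random, rng):
    """Boolean mask: True where a query token may attend to a key token."""
    idx = np.arange(seq_len)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window // 2  # local window
    mask[:n_global, :] = True                                   # global rows
    mask[:, :n_global] = True                                   # global columns
    mask |= rng.random((seq_len, seq_len)) < p_random           # random links
    return mask

def masked_attention(x, wq, wk, wv, mask):
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # mask out invalid regions
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # Softmax over allowed keys
    return weights, weights @ v

rng = np.random.default_rng(0)
seq_len, d = 16, 8
x = rng.normal(size=(seq_len, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
mask = sparse_mask(seq_len, window=4, n_global=1, p_random=0.05, rng=rng)
weights, context = masked_attention(x, wq, wk, wv, mask)
```

Here `weights` plays the role of the token interaction strength matrix and `context` the role of the context semantic representation matrix.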

[0025] The position-wise feedforward network layer is used to input the context semantic representation matrix into a neural network containing two linear transformation layers, with the GeLU nonlinear activation function applied between them, to perform nonlinear mapping and dimensionality reduction on the feature dimension and output a high-dimensional semantic feature vector. The two linear transformation layers are constructed as an up-projection layer (from the input dimension to four times the input dimension) and a down-projection layer (from four times the input dimension back to the input dimension). The nonlinear mapping involves inputting the context semantic representation matrix into the up-projection layer and applying the GeLU nonlinear activation function to the result, which enhances the model's nonlinear expressive power. The dimensionality reduction involves inputting the activated feature matrix into the down-projection layer, restoring the feature dimension from four times the input dimension to the input dimension, thus integrating the high-dimensional features and ultimately outputting a high-dimensional semantic feature vector with the same dimension as the input feature matrix.
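A minimal sketch of such a position-wise feedforward block is given below. It uses the common tanh approximation of GeLU and randomly initialized weights; the names, the toy dimension of 8, and the initialization scale are assumptions for illustration.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, w_up, b_up, w_down, b_down):
    hidden = gelu(x @ w_up + b_up)     # input dimension -> 4x input dimension
    return hidden @ w_down + b_down    # 4x input dimension -> input dimension

rng = np.random.default_rng(0)
seq_len, d = 6, 8
x = rng.normal(size=(seq_len, d))
w_up, b_up = rng.normal(scale=0.1, size=(d, 4 * d)), np.zeros(4 * d)
w_down, b_down = rng.normal(scale=0.1, size=(4 * d, d)), np.zeros(d)
y = feed_forward(x, w_up, b_up, w_down, b_down)
```

The same two weight matrices are applied to every sequence position independently, which is why the output keeps the shape of the input.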

[0026] Output the token interaction strength matrix and high-dimensional semantic feature vector.

[0027] In this embodiment, the extremely random tree model includes a feature encoding mapping layer, a multidimensional feature concatenation layer, a random splitting ensemble layer, a path index generation layer, and a probability distribution output layer: The feature encoding mapping layer is used to map structured symptom keywords to discrete index identifiers in a predefined medical thesaurus. Based on the index identifiers, a row vector query is performed in the predefined word embedding matrix to obtain an initial word embedding tensor. The initial word embedding tensor is then linearly transformed to map discrete semantic symbols into a symptom feature tensor in a low-dimensional continuous vector space. Specifically, the mapping to discrete index identifiers involves traversing the structured symptom keywords and matching them against the predefined medical thesaurus using a hash lookup table to obtain a unique integer index for each keyword. The row vector query is performed by using the integer index as the row number to extract the corresponding fixed-length vector from the pre-stored word embedding matrix. The linear transformation is performed by multiplying the fixed-length vector by a preset weight matrix and adding a bias term to project the discrete high-dimensional sparse features into a low-dimensional continuous vector, outputting the symptom feature tensor.

[0028] The multidimensional feature concatenation layer is used to calculate the product of the word frequency weight and the inverse document frequency weight of the structured symptom keywords, obtain the word frequency-inverse document frequency feature value, and perform one-hot encoding mapping on the part-of-speech tags to which the keywords belong to generate a text statistical feature tensor. The text statistical feature tensor and the symptom feature tensor are concatenated along the feature dimensions to construct a multidimensional hybrid feature matrix. Specifically, the operation of calculating the word frequency-inverse document frequency feature value is as follows: the frequency of a single keyword in the current text is counted as the word frequency weight, and the inverse document frequency weight of the keyword in the entire medical text corpus is calculated as the inverse document frequency weight. The two are multiplied to obtain the numerical feature. Specifically, the operation of performing one-hot encoding mapping is as follows: a zero vector with a dimension equal to the total number of part-of-speech categories is constructed, and the dimension position corresponding to the part of speech of the keyword is set to 1. Specifically, the operation of performing tensor concatenation is as follows: the generated numerical feature vector, the one-hot encoded vector and the symptom feature tensor obtained in the previous step are concatenated end to end in the column direction to form a multidimensional hybrid feature matrix containing multi-source information.
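The feature construction in this layer can be sketched as follows. The tag set, the toy corpus, the smoothed form of the inverse document frequency, and the 4-dimensional symptom vector are hypothetical; the embodiment does not state which TF-IDF variant or part-of-speech inventory is used.

```python
import math
import numpy as np

POS_TAGS = ["noun", "verb", "adjective"]   # hypothetical part-of-speech set

def tf_idf(keyword, document, corpus):
    tf = document.count(keyword) / len(document)
    df = sum(1 for doc in corpus if keyword in doc)
    idf = math.log(len(corpus) / (1 + df)) + 1.0   # smoothed IDF (assumed variant)
    return tf * idf

def one_hot(tag, tags):
    vec = np.zeros(len(tags))
    vec[tags.index(tag)] = 1.0
    return vec

def hybrid_features(keyword, tag, document, corpus, symptom_vector):
    """Concatenate [TF-IDF value | one-hot POS tag | symptom feature vector]."""
    stats = np.concatenate([[tf_idf(keyword, document, corpus)], one_hot(tag, POS_TAGS)])
    return np.concatenate([stats, symptom_vector])

corpus = [["fever", "cough"], ["fever", "headache"], ["rash", "fever"]]
document = ["fever", "rash"]
symptom_vector = np.array([0.1, -0.3, 0.5, 0.2])   # stand-in symptom feature tensor
row = hybrid_features("rash", "noun", document, corpus, symptom_vector)
```

Stacking one such row per keyword gives the multidimensional hybrid feature matrix. In this toy corpus "rash" is rarer than "fever", so it receives the larger TF-IDF value.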

[0029] The random splitting ensemble layer is used to initialize the topology of multiple decision trees. For the multidimensional mixed feature matrix of each node, random sampling of feature subsets is performed to determine the candidate splitting feature dimension, and uniform random sampling is performed within the numerical distribution range of each candidate feature to obtain the splitting threshold. Based on the candidate splitting feature dimension and the splitting threshold, a binary decision function is constructed to recursively allocate the multidimensional mixed feature matrix to the left or right child node until the number of node samples is lower than a preset threshold or the decision tree depth reaches the maximum depth limit, generating an extremely random decision tree cluster. Specifically, the operation of random sampling of feature subsets is to randomly select a fixed number of features from the full feature dimension, without considering the ranking of feature importance. The operation of uniform random sampling is to randomly select a value between the minimum and maximum values of the selected candidate features as the splitting threshold. The operation of the binary decision function is to compare the value of the sample in the candidate feature dimension with the size of the splitting threshold. If it is less than the threshold, it is allocated to the left child node; otherwise, it is allocated to the right child node. This process is repeated until the termination condition is met, thereby constructing an extremely random decision tree cluster.

[0030] The path index generation layer is used to track the traversal trajectory of the multidimensional mixed feature matrix in the extremely random decision tree cluster, extract the node identifier sequence from the root node to the terminal node in each tree, and combine and encode the node identifier sequences of all decision trees to output logical path labels. Specifically, the traversal trajectory is tracked by inputting the multidimensional mixed feature matrix into the extremely random decision tree cluster and recording the index number of the current node according to the decision result at each layer of nodes. The node identifier sequence is extracted by arranging, in order, all node index numbers recorded from the root node, through the intermediate nodes, to the terminal node, forming the path sequence of the sample in the current tree. The combination encoding is performed by concatenating the paths of all decision trees end to end, or by mapping them into binary codes, to generate a logical path label that uniquely represents the feature distribution of the sample.
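The random splitting and path tracking described in the two preceding paragraphs can be sketched with the standard library alone. The sketch is unsupervised and makes two assumptions the text leaves open: among the sampled candidate features, it splits on the first one whose values are not all identical, and it encodes the label by joining node ids with "-" inside a tree and "|" between trees. All function names are hypothetical.

```python
import random

def build_tree(rows, rng, n_candidates=2, max_depth=4, min_samples=2,
               depth=0, counter=None):
    """Grow one extremely random tree; each node gets a sequential integer id."""
    counter = counter if counter is not None else [0]
    node = {"id": counter[0]}
    counter[0] += 1
    if depth >= max_depth or len(rows) < min_samples:
        return node                                    # terminal node
    # Random feature subset, ignoring any feature-importance ranking.
    candidates = rng.sample(range(len(rows[0])), n_candidates)
    usable = [d for d in candidates
              if min(r[d] for r in rows) < max(r[d] for r in rows)]
    if not usable:
        return node
    dim = usable[0]                                    # assumed selection rule
    lo, hi = min(r[dim] for r in rows), max(r[dim] for r in rows)
    threshold = rng.uniform(lo, hi)                    # uniform random threshold
    left = [r for r in rows if r[dim] < threshold]
    right = [r for r in rows if r[dim] >= threshold]
    if not left or not right:
        return node
    args = (rng, n_candidates, max_depth, min_samples, depth + 1, counter)
    node.update(dim=dim, threshold=threshold,
                left=build_tree(left, *args), right=build_tree(right, *args))
    return node

def trace_path(tree, sample):
    """Record node ids from the root to the terminal node reached by the sample."""
    path, node = [tree["id"]], tree
    while "dim" in node:
        go_left = sample[node["dim"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
        path.append(node["id"])
    return path

def logic_path_label(forest, sample):
    return "|".join("-".join(str(i) for i in trace_path(t, sample)) for t in forest)

rng = random.Random(0)
rows = [[rng.random() for _ in range(3)] for _ in range(20)]
forest = [build_tree(rows, rng) for _ in range(3)]
label = logic_path_label(forest, rows[0])
```

Because the thresholds are fixed once the trees are built, tracing the same sample again always reproduces the same label.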

[0031] In this embodiment, the improved RGCN model includes an optimal transport alignment layer, a random structure perturbation layer, a multi-relation weight transformation layer, a message passing aggregation layer, and a prediction probability mapping layer: The optimal transport alignment layer receives the high-dimensional semantic feature vectors and logical path labels and converts them into probability distributions using a normalized exponential function, thereby constructing a semantic graph node distribution matrix and a logical graph node distribution matrix. The normalized exponential function maps the value of each dimension of the input vector through an exponential function with the natural constant e as the base, sums all mapping results, and divides the mapping result of each element by this sum, resulting in a probability distribution vector in which all elements are positive and sum to 1. Based on this, the Euclidean distance between the node feature vectors in the two distribution matrices is calculated to construct a cost matrix. Subsequently, an entropy regularization constraint term is introduced, and the Sinkhorn algorithm iteratively calculates the optimal transport plan under the cost matrix constraint. The Sinkhorn algorithm first generates a kernel matrix by taking the exponential of the negative of the cost matrix divided by the entropy regularization coefficient. An alternating iterative loop is then performed: first, the kernel matrix is normalized in the row direction so that the sum of each row's elements satisfies the source distribution constraint; then, the kernel matrix is normalized in the column direction so that the sum of each column's elements satisfies the target distribution constraint. This process is repeated until the matrix values converge to within a preset error range, thus outputting the optimal transport plan. Finally, this optimal transport plan is used as a soft-assignment weight matrix, with its values used as connection weights. The alignment degree between semantic graph nodes and logical graph nodes is calculated, constructing an alignment similarity matrix representing node alignment relationships, and from this, the initial cross-graph connection weights are output.
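The alternating row and column normalization described above is the standard Sinkhorn iteration, which can be written with two scaling vectors. The sketch below assumes uniform source and target distributions, a regularization coefficient of 0.5, and randomly generated node features; none of these values are given in the text.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sinkhorn(cost, source, target, eps=0.5, n_iter=500, tol=1e-10):
    kernel = np.exp(-cost / eps)                 # K = exp(-C / eps)
    u, v = np.ones_like(source), np.ones_like(target)
    for _ in range(n_iter):
        u_new = source / (kernel @ v)            # enforce row sums (source)
        v = target / (kernel.T @ u_new)          # enforce column sums (target)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    return u[:, None] * kernel * v[None, :]      # transport plan

rng = np.random.default_rng(0)
semantic = softmax(rng.normal(size=(4, 8)))      # stand-in semantic graph nodes
logical = softmax(rng.normal(size=(3, 8)))       # stand-in logical graph nodes
cost = np.linalg.norm(semantic[:, None, :] - logical[None, :, :], axis=-1)
source = np.full(4, 1 / 4)
target = np.full(3, 1 / 3)
plan = sinkhorn(cost, source, target)
```

Each entry `plan[i, j]` can then be read as the soft-assignment weight between semantic node `i` and logical node `j`.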

[0032] The random structure perturbation layer is used to define the heterogeneous topology structure constructed by semantic graph nodes and logical graph nodes as the original heterogeneous graph, and to perform random removal operations on the edges in the original heterogeneous graph according to a preset discard probability. Specifically, the random removal operation involves traversing all edges of the original heterogeneous graph, generating random numbers and comparing them with the preset discard probability. If the random number is less than the discard probability, the edge is removed from the graph. Then, the adjacency relationship is reconstructed based on the edge set after the removal operation, generating a perturbation subgraph structure with randomly missing connections. The perturbation subgraph structure is input into the random RGCN processing unit to perform message passing, and Gaussian noise is injected into the node hidden state for random perturbation during this process. Specifically, the Gaussian noise injection operation involves superimposing Gaussian distributed random numbers with a mean of 0 and a preset variance onto the node hidden state. Finally, a node hidden state tensor containing noise features is output.
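The two perturbations in this layer, random edge removal and Gaussian noise injection, can be sketched as follows. The edge list, the discard probability of 0.2, and the noise standard deviation of 0.1 are illustrative assumptions, and the graph is treated as undirected for simplicity.

```python
import numpy as np

def drop_edges(edges, drop_prob, rng):
    """Remove each edge whose random draw falls below the discard probability."""
    keep = rng.random(len(edges)) >= drop_prob
    return [edge for edge, kept in zip(edges, keep) if kept]

def adjacency(edges, n_nodes):
    """Rebuild a symmetric adjacency matrix from the remaining edge set."""
    adj = np.zeros((n_nodes, n_nodes))
    for src, dst in edges:
        adj[src, dst] = adj[dst, src] = 1.0
    return adj

def add_noise(hidden, std, rng):
    """Superimpose zero-mean Gaussian noise on the node hidden states."""
    return hidden + rng.normal(0.0, std, size=hidden.shape)

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)]
kept = drop_edges(edges, drop_prob=0.2, rng=rng)
adj = adjacency(kept, n_nodes=5)
hidden = rng.normal(size=(5, 8))
noisy = add_noise(hidden, std=0.1, rng=rng)
```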

[0033] The multi-relation weight transformation layer is used to identify different types of edge relationships in heterogeneous topologies and assign an independent learnable weight matrix to each relationship type. The hidden state tensor of a node is used as input, and a linear transformation is performed through the specific weight matrix corresponding to each relationship type to generate relationship-specific feature tensors. Batch normalization is performed on the relationship-specific feature tensors to generate transformed feature tensors. Specifically, identifying edge relationships involves parsing metadata in the heterogeneous graph to distinguish between different types such as semantic graph internal connections, logical graph internal connections, and cross-graph connections. Assigning independent weight matrices involves initializing and maintaining an independent parameter matrix for each relationship type to specifically capture the feature transformation patterns under that relationship. Performing batch normalization involves calculating the mean and variance of a small batch of the transformed feature tensors and performing standardization to accelerate model convergence.

[0034] The message passing aggregation layer is used to perform a Hadamard product operation on the transformed feature tensor and the initial cross-graph connection weights to obtain weighted neighbor features; based on the alignment similarity matrix, it dynamically adjusts the information transmission intensity between semantic nodes and logical nodes, and performs an aggregation operation on the weighted neighbor features along the feature dimension to generate an aggregated feature tensor; the specific operation of performing the Hadamard product operation is to multiply the transformed feature tensor with the corresponding elements of the initial cross-graph connection weight matrix to achieve channel-based weighting; the specific operation of dynamically adjusting the information transmission intensity is to use the alignment similarity matrix as coefficients to amplify or suppress the feature information from neighbor nodes; the specific operation of performing the aggregation operation is to sum or average the weighted neighbor feature tensor along the feature dimension to fuse local neighborhood information.
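The relation-specific transformation and the weighted aggregation of the two preceding paragraphs can be combined into one toy layer. This is a simplified reading: each neighbor message is scaled by a scalar connection weight and summed at the destination node, and batch normalization has no learnable parameters. The relation names, edge lists, and random weights are hypothetical.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def rgcn_layer(hidden, edges_by_relation, weights, edge_weight):
    """One message-passing step with an independent weight matrix per relation."""
    d_out = next(iter(weights.values())).shape[1]
    out = np.zeros((hidden.shape[0], d_out))
    for relation, edges in edges_by_relation.items():
        transformed = batch_norm(hidden @ weights[relation])  # relation-specific
        for src, dst in edges:
            # Scale the neighbor's features by the connection weight, then sum.
            out[dst] += edge_weight[src, dst] * transformed[src]
    return out

rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 4))
edges_by_relation = {
    "semantic": [(0, 1)],          # edges inside the semantic graph
    "logical": [(2, 3)],           # edges inside the logical graph
    "cross": [(0, 2), (1, 3)],     # cross-graph edges
}
weights = {name: rng.normal(size=(4, 3)) for name in edges_by_relation}
edge_weight = rng.random((5, 5))   # stand-in for the cross-graph connection weights
out = rgcn_layer(hidden, edges_by_relation, weights, edge_weight)
```

Nodes without incoming edges, such as node 4 here, receive no messages and keep a zero output in this sketch.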

[0035] The prediction probability mapping layer is used to input the aggregated feature tensor into the fully connected layer to perform dimensionality reduction and linear transformation, generating a logits vector. Softmax normalization is then performed on the logits vector to map the values to a probability interval, outputting the target prediction probability. Specifically, the dimensionality reduction and linear transformation map the high-dimensional features, through the fully connected layer, to a space whose dimension equals the number of categories. The Softmax normalization takes the exponential function value of each element of the logits vector and divides it by the sum of all exponential values. Finally, the target prediction probability is output by using the normalized value as the model's prediction confidence for each category, and the sum of all prediction probabilities equals 1.

[0036] In this embodiment, S6 specifically includes: S61. Analyze the Token interaction intensity matrix, extract the attention scalar corresponding to each Token, perform numerical normalization processing and linearly map it to the target color space numerical domain, and construct a discrete mapping relationship between the Token sequence index and the color channel vector value. Specifically, the operation of extracting the attention scalar is to calculate the average attention weight value of each Token position along the feature dimension of the attention matrix to obtain a real number sequence. The specific operation of performing numerical normalization processing is to scale the values in the real number sequence to the range of 0 to 1 through Min-Max processing. The specific operation of performing linear mapping to the target color space is to use a preset color mapping lookup table to convert the values in the range of 0 to 1 into color vector values of the red, green, and blue channels. The specific operation of constructing the discrete mapping relationship is to establish a one-to-one correspondence dictionary between the position index of the Token in the sequence and the calculated color vector value to form a coloring reference table.
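Step S61 can be illustrated with a small attention matrix. The sketch assumes that the per-token scalar is the average weight each token receives across all query rows, and it uses a white-to-red ramp as the color mapping; both choices, like the function name, are assumptions, since the text only refers to a preset color lookup table.

```python
import numpy as np

def token_colors(attention):
    """Map each token index to an (R, G, B) tuple based on received attention."""
    scores = attention.mean(axis=0)                 # one scalar per token
    span = scores.max() - scores.min()
    norm = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    # Assumed colormap: white for low attention, pure red for high attention.
    fade = 255.0 * (1.0 - norm)
    rgb = np.stack([np.full_like(norm, 255.0), fade, fade], axis=1).round()
    return {i: tuple(int(c) for c in row) for i, row in enumerate(rgb)}

attention = np.array([[0.7, 0.2, 0.1],
                      [0.6, 0.3, 0.1],
                      [0.5, 0.4, 0.1]])
colors = token_colors(attention)
```

The returned dictionary corresponds to the coloring reference table: the token that receives the most attention maps to red and the one that receives the least maps to white.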

[0037] S62. Based on discrete mapping relationships, tensor operations are performed on the tokens in the triage request text, and color channel vector values are injected into the text representation matrix to calculate a text rendering matrix with color attribute features. Specifically, the tensor operation is performed by copying and concatenating the corresponding color vector values to the feature vectors corresponding to each token in the text representation matrix according to the coloring reference table, or by superimposing color weights on the feature dimensions. Specifically, the color channel vector values are injected by fusing color information into the original text features so that the feature representation of each token contains the original semantic information and the corresponding visual color information. Specifically, the text rendering matrix is calculated by generating a three-dimensional tensor, where the first two dimensions correspond to the row and column layout of the text sequence, and the third dimension stores the color channel data, forming a numerical matrix that can be directly displayed.

[0038] S63. Perform dimensional transformation and format encapsulation on the text rendering matrix to generate standardized visual heatmap data. Specifically, the dimensional transformation involves converting the data type of the text rendering matrix into an integer format required for image processing and adjusting the channel dimensions to the standard image storage order. The format encapsulation involves adding image file header information, resolution parameters, and color mode descriptors to the converted matrix and encapsulating it into a binary data stream in bitmap or network image format. The final output of the generated visual heatmap data is a complete image file containing text characters and highlight color information, which intuitively displays the importance distribution of tokens.

[0039] In this embodiment, S7 specifically includes: S71. Perform maximum likelihood estimation on the target prediction probability vector to locate the index coordinates corresponding to the maximum probability value. Specifically, the maximum likelihood estimation operation involves iterating through all element values in the target prediction probability vector and using the Argmax function to search for and lock the node with the largest value. The index coordinates are located by obtaining the array index position of the element with the maximum probability in the vector and outputting this index as the preliminary index coordinates of the prediction result to identify the best-matching department category.

[0040] S72. Perform a hash search in the preset department code dictionary based on the index coordinates to parse out the target department identifier; the specific operation of performing the hash search is to input the index coordinates as the input key value into the preloaded department code dictionary hash table; the specific operation of parsing the target department identifier is to directly locate and extract the corresponding value range data through the hash table, obtain the standard department name or department code that precisely matches the index coordinates, and complete the conversion from numerical index to human-readable department information.

[0041] S73. Input the target department identifier, visualization heatmap data, and logical path label into the data serialization interface, perform key-value pair mapping and binary encapsulation, and construct a structured intelligent triage recommendation result. Specifically, the key-value pair mapping operation maps the target department identifier to the "department" field, the visualization heatmap data to the "heatmap" field, and the logical path label to the "logic_path" field, thus constructing a standardized attribute dictionary. The binary encapsulation operation uses a serialization protocol to convert the attribute dictionary into a binary data stream in JSON or Protobuf format, thus constructing a complete intelligent triage recommendation result containing diagnostic suggestions, visualization basis, and decision path.
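Steps S71 to S73 amount to an argmax, a dictionary lookup, and a serialization call. The sketch below uses JSON encoded as UTF-8 bytes; the department dictionary, the sample logits, and the placeholder heatmap and path values are hypothetical.

```python
import json
import math

DEPARTMENTS = {0: "Cardiology", 1: "Respiratory Medicine", 2: "Dermatology"}  # hypothetical

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def recommend(logits, heatmap, logic_path):
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)   # S71: argmax
    result = {
        "department": DEPARTMENTS[index],                   # S72: hash lookup
        "heatmap": heatmap,
        "logic_path": logic_path,
    }
    return json.dumps(result, ensure_ascii=False).encode("utf-8")  # S73

payload = recommend([0.2, 2.5, -1.0], heatmap="<image bytes placeholder>",
                    logic_path="0-1-3|0-2-5")
decoded = json.loads(payload.decode("utf-8"))
```

A Protobuf encoding would follow the same field mapping, with the three keys declared in a message schema instead of a dictionary.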

[0042] Example 1: To verify the feasibility of this invention in smart healthcare services, the method was applied to the intelligent triage service platform of a large-scale comprehensive tertiary hospital in a province (hereinafter referred to as "Hospital H"). Traditional hospital triage systems typically employ rule engines based on keyword matching or ordinary text classification algorithms. These methods not only struggle to accurately capture core disease characteristics when patient descriptions are complex, colloquial, and symptom expressions are vague, but also fail to explain the specific decision-making basis for recommending departments, resulting in limited triage accuracy and insufficient patient trust. To address these issues, Hospital H decided to adopt the intelligent triage recommendation method proposed in this invention, based on multimodal feature fusion and optimal transmission alignment.

[0043] During implementation, Hospital H used its in-hospital self-service registration terminals and mobile applications to collect patients' triage request texts. A deep semantic extraction network was constructed using the sequence position encoding layer and block sparse attention layer to transform the input text into a high-dimensional semantic feature vector. Simultaneously, Hospital H's medical expert team performed refined part-of-speech tagging and path indexing on structured symptom keywords, training an extremely random decision tree cluster to generate logical path labels. Furthermore, the system generated visualized heatmap data based on the token interaction strength matrix, providing interpretability support for model decision-making.

[0044] Hospital H maps discrete symptom keywords into low-dimensional continuous vectors through the feature encoding mapping layer and the multi-dimensional feature concatenation layer, and concatenates them with textual statistical feature tensors to construct a multi-dimensional hybrid feature matrix. Through the optimal transport alignment layer, the optimal transport plane between the high-dimensional semantic feature vector and the logical path label is calculated, outputting the initial cross-graph connection weights, effectively solving the alignment problem between semantic and logical modalities. Subsequently, a random structure perturbation layer and a multi-relation weight transformation layer are introduced to inject Gaussian noise into the hidden states of nodes in the heterogeneous graph neural network and perform relation-specific linear transformations, enhancing the model's robustness and feature representation ability. Using a message passing aggregation layer and a prediction probability mapping layer, the target prediction probability is output, and the target department identifier is parsed using maximum likelihood estimation.

[0045] In the core recommendation and visualization stage, the method of this invention performs a hash search in a preset department coding dictionary based on index coordinates to accurately locate the target department, while generating a text rendering matrix with color attribute features. Subsequently, the target department identifier, visualization heatmap data, and logical path labels are input into a data serialization interface, and binary encapsulation is performed to construct a structured intelligent triage recommendation result containing department suggestions, highlighted heatmaps, and decision paths, realizing a closed loop from text analysis to visualization decision support.

[0046] During implementation, the technical team at Hospital H discovered that, compared to traditional manual triage and rule-based guidance methods, the method of this invention significantly improves the accuracy and interpretability of guidance recommendations. Traditional methods cannot handle ambiguous descriptions and lack decision-making path visualization, while the method of this invention effectively achieves precise departmental mapping for complex conditions and transparency in the decision-making process through block sparse attention mechanisms, optimal transport cross-graph alignment, and visual heatmap feedback.

[0047] To further verify the actual performance of the method of the present invention, Hospital H conducted a detailed comparative test between the method of the present invention and the traditional method. The specific performance data are shown in Table 1.

Table 1. Performance Comparison of Hospital Intelligent Triage Recommendation Methods in the Field of Intelligent Medical Technology

[0048] As shown in Table 1, the performance of the intelligent triage and recommendation system was comprehensively improved after applying the method of this invention. The accuracy rate of department recommendations increased from 84.2% with traditional methods to 96.5%, and the accuracy rate of handling complex descriptions improved from 76.8% to 94.2%, significantly enhancing the system's ability to understand descriptions of difficult and complicated cases. The department mis-assignment rate decreased significantly from 12.5% to 2.1%, effectively alleviating the pressure on outpatient departments. The average response time for triage was shortened from 1.5 seconds to 0.4 seconds, significantly enhancing the system's interactive experience. In addition, the rate of manual triage intervention decreased from 25.0% to 5.5%, and the doctor-patient matching rate increased from 85.0% to 97.5%, significantly optimizing the allocation of medical resources. The user trust score also improved significantly, from 7.2 points to 9.1 points.

[0049] Through the method of this invention, Hospital H has successfully realized the intelligent and precise guidance service, effectively solving the pain point of patients "knowing their disease but not their department", ensuring the efficient operation of outpatient order, greatly improving the digital level of medical services and patient satisfaction, reducing the workload of nurses at the guidance desk, enhancing the stability and interpretability of the guidance system, and providing strong technical support for the construction of smart hospitals.

[0050] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. A department recommendation and triage method based on deep learning, characterized in that it includes the following steps:
S1. Collect the patient triage request text, extract spatiotemporal tuples through timestamp parsing and geographic coordinate positioning, input the spatiotemporal tuples into a spatiotemporal epidemiology knowledge base to perform feature index retrieval, and output spatiotemporal feature vectors containing seasonal disease patterns and regional epidemic trends;
S2. Perform word segmentation and vector mapping processing on the triage request text to construct a text token sequence, and perform feature-level linear weighted fusion on the spatiotemporal feature vector and the text token sequence to generate a text embedding matrix that integrates spatiotemporal context information;
S3. Input the text embedding matrix into the BigBird model, use the block sparse attention mechanism, integrate sliding window local modeling and global token information transmission, calculate the semantic dependencies within the token sequence, and generate the token interaction strength matrix and high-dimensional semantic feature vector;
S4. Perform medical named entity recognition on the triage request text to extract structured symptom keywords, input the structured symptom keywords into an extremely random tree model to perform integrated reasoning with random feature selection and random threshold segmentation, and output logical path labels;
S5. Construct semantic graphs and logical subgraphs based on the high-dimensional semantic feature vectors and logical path labels, input them into the improved RGCN model, guide the cross-graph message transmission between semantic and logical nodes through Sinkhorn optimal transport alignment, random structure perturbation, and a relation weight transformation mechanism, generate graph alignment feature vectors, and output the target prediction probability through Softmax regression;
S6. Map the token interaction strength matrix to the color space numerical domain, perform color encoding rendering on the tokens in the triage request text, and generate visual heatmap data containing the highlighted distribution of key features;
S7. Select the target department identifier with the highest confidence based on the target prediction probability, and perform structured encapsulation of the target department identifier, visual heatmap data, and logical path label to output intelligent triage recommendation results.

2. The department recommendation and triage method based on deep learning according to claim 1, characterized in that S1 specifically includes:
S11. Collect the patient's triage request text and parse its format; extract the text generation timestamp and the patient's terminal IP address; resolve the IP address to geographic coordinates through an IP address geolocation database; and combine the text generation timestamp with the resolved geographic coordinates to construct a spatiotemporal tuple;
S12. Input the spatiotemporal tuple into the preset spatiotemporal epidemiology knowledge base, match the target administrative division based on the geographic coordinates, and match the historical epidemiological data curve of the same period based on the text generation timestamp, so as to extract the disease activity index of the target administrative division in the current time window;
S13. Generate a spatiotemporal feature vector by linear transformation of the disease activity index, and perform feature-level weighted fusion of the spatiotemporal feature vector with the word vector matrix of the triage request text to generate a text embedding matrix that integrates spatiotemporal context information.
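
As a non-limiting illustration of S11 to S13, the following minimal Python sketch builds a spatiotemporal tuple, looks up a disease activity index, and applies a linear transformation. It assumes the IP address has already been resolved to coordinates. The knowledge base contents, the bounding-box region matching, the ISO-week time window, and all names (`KNOWLEDGE_BASE`, `REGIONS`, `build_tuple`, `activity_index`, `linear_transform`) are hypothetical and are not taken from the application.

```python
from datetime import datetime, timezone

# Hypothetical toy knowledge base: (region, ISO week) -> disease activity indices.
KNOWLEDGE_BASE = {
    ("region_a", 3): {"influenza": 0.82, "gastroenteritis": 0.15},
    ("region_a", 30): {"influenza": 0.05, "gastroenteritis": 0.61},
}
DISEASES = ["influenza", "gastroenteritis"]

# Hypothetical bounding boxes standing in for administrative-division matching:
# (lat_min, lat_max, lon_min, lon_max).
REGIONS = {"region_a": (30.0, 32.0, 120.0, 122.0)}


def build_tuple(timestamp, lat, lon):
    """S11 (assumed form): combine the timestamp with resolved coordinates."""
    return (datetime.fromtimestamp(timestamp, tz=timezone.utc), lat, lon)


def match_region(lat, lon):
    """Return the first region whose bounding box contains the point, else None."""
    for name, (lat0, lat1, lon0, lon1) in REGIONS.items():
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            return name
    return None


def activity_index(st_tuple):
    """S12 (assumed form): look up the activity index for region and ISO week."""
    when, lat, lon = st_tuple
    week = when.isocalendar()[1]
    record = KNOWLEDGE_BASE.get((match_region(lat, lon), week), {})
    return [record.get(name, 0.0) for name in DISEASES]


def linear_transform(index, weights, bias):
    """S13 (first half): y = W x + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, index)) + b
            for row, b in zip(weights, bias)]
```

In this sketch an unknown region or week falls back to an all-zero index; how the method handles missing knowledge base entries is not stated in the claims.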

3. The department recommendation and triage method based on deep learning according to claim 1, characterized in that S2 specifically includes:
S21. Use the byte pair encoding algorithm to perform sub-word segmentation on the triage request text and generate a text token index sequence; input the text token index sequence into the pre-trained word embedding layer to perform lookup mapping, and output the initial word embedding matrix;
S22. Construct a fully connected linear mapping layer to project the spatiotemporal feature vector into the same feature semantic space as the initial word embedding matrix, generating a spatiotemporal alignment feature tensor;
S23. Based on the length of the text token index sequence, perform a copy and concatenation operation on the spatiotemporal alignment feature tensor to generate a spatiotemporal context matrix with the same dimensions as the initial word embedding matrix;
S24. Perform positional encoding superposition and residual connection calculation on the initial word embedding matrix and the spatiotemporal context matrix to generate a text embedding matrix that integrates spatiotemporal context information.
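
As a non-limiting illustration of S22 to S24, the sketch below projects a spatiotemporal feature vector to the embedding width, copies it once per token, and adds it to the word embeddings together with a sinusoidal positional encoding. Byte pair encoding and the pre-trained embedding lookup of S21 are omitted, and the word embeddings are taken as given. Reading the "residual connection" as a plain element-wise sum is an assumption, and the function names (`project`, `tile`, `sinusoidal_position`, `fuse`) are hypothetical.

```python
import math


def project(vec, weights, bias):
    """S22 (assumed form): fully connected layer y = W x + b."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def tile(vec, seq_len):
    """S23: copy the projected vector once per token position."""
    return [list(vec) for _ in range(seq_len)]


def sinusoidal_position(pos, dim):
    """Standard sinusoidal encoding: sine on even indices, cosine on odd."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def fuse(word_emb, st_vec, weights, bias):
    """S24 (assumed form): word embedding + context + positional encoding."""
    dim = len(word_emb[0])
    context = tile(project(st_vec, weights, bias), len(word_emb))
    return [
        [w + c + p for w, c, p in zip(word_row, ctx_row,
                                      sinusoidal_position(pos, dim))]
        for pos, (word_row, ctx_row) in enumerate(zip(word_emb, context))
    ]
```

The output has the same shape as the word embedding matrix, so it can be passed to the encoder unchanged.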

4. The department recommendation and triage method based on deep learning according to claim 1, characterized in that the BigBird model includes a sequence position encoding layer, a block sparse attention layer, and a position-wise feedforward network layer:
The sequence position encoding layer calculates the sequence position information of each token based on the sine function, and adds the generated position encoding vector to the text embedding matrix element-wise, using the broadcast mechanism, to generate the input tensor;
The block sparse attention layer constructs a sparse attention pattern consisting of local window attention, global attention, and random attention; projects the input tensor into query, key, and value matrices; calculates the attention score matrix between tokens based on the sparse attention pattern; applies the normalized exponential function (Softmax) over the adjacent tokens within a preset context neighborhood radius and over the global tokens to generate the token interaction strength matrix; and uses the token interaction strength matrix to perform a weighted summation of the value matrix, outputting a context semantic representation matrix;
The position-wise feedforward network layer inputs the context semantic representation matrix into a neural network containing two linear transformation layers with a GELU nonlinear activation function between them, performs nonlinear mapping and dimension expansion on the feature dimension, and outputs the high-dimensional semantic feature vector;
The BigBird model outputs the token interaction strength matrix and the high-dimensional semantic feature vector.
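
As a non-limiting illustration of the sparse attention pattern described above, the sketch below builds a mask from local window, global, and random attention, then applies a masked Softmax to obtain a token interaction strength matrix and the weighted sum of values. It works at the level of individual tokens rather than blocks, which is a simplification of block sparse attention. The names `sparse_mask` and `masked_attention` and the default parameter values are hypothetical.

```python
import math
import random


def sparse_mask(n, window=1, n_global=1, n_random=1, seed=0):
    """Boolean n x n mask: True where token i may attend to token j."""
    rng = random.Random(seed)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            local = abs(i - j) <= window               # sliding window
            is_global = i < n_global or j < n_global   # global tokens
            mask[i][j] = local or is_global
        # Random attention: add a few extra keys not already allowed.
        candidates = [j for j in range(n) if not mask[i][j]]
        for j in rng.sample(candidates, min(n_random, len(candidates))):
            mask[i][j] = True
    return mask


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to the allowed positions."""
    scale = math.sqrt(len(q[0]))
    strengths, outputs = [], []
    for i, q_row in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / scale
            if mask[i][j] else None
            for j, k_row in enumerate(k)
        ]
        peak = max(s for s in scores if s is not None)
        exps = [math.exp(s - peak) if s is not None else 0.0 for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]   # Softmax over allowed keys only
        strengths.append(row)
        outputs.append([sum(w * v_row[c] for w, v_row in zip(row, v))
                        for c in range(len(v[0]))])
    return strengths, outputs
```

Each row of the returned strength matrix sums to one and is exactly zero at masked positions, which is the property the heatmap step relies on.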

5. The department recommendation and triage method based on deep learning according to claim 1, characterized in that the extremely randomized trees model includes a feature encoding mapping layer, a multidimensional feature concatenation layer, a random splitting ensemble layer, a path index generation layer, and a probability distribution output layer:
The feature encoding mapping layer maps the structured symptom keywords to discrete index identifiers in a predefined medical thesaurus; performs a row vector query in the predefined word embedding matrix based on the index identifiers to obtain an initial word embedding tensor; and applies a linear transformation to the initial word embedding tensor to map the discrete semantic symbols to a symptom feature tensor in a low-dimensional continuous vector space;
The multidimensional feature concatenation layer calculates the product of the term frequency weight and the inverse document frequency weight of the structured symptom keywords to obtain the term frequency-inverse document frequency (TF-IDF) feature value; performs one-hot encoding of the part-of-speech tags of the keywords to generate a text statistical feature tensor; and concatenates the text statistical feature tensor and the symptom feature tensor along the feature dimension to construct a multidimensional hybrid feature matrix;
The random splitting ensemble layer initializes the topology of multiple decision trees; for the multidimensional hybrid feature matrix at each node, performs random sampling of feature subsets to determine the candidate splitting feature dimensions, and performs uniform random sampling within the value range of each candidate feature to obtain the splitting threshold; and, based on the candidate splitting feature dimension and the splitting threshold, constructs a binary decision function that recursively allocates the multidimensional hybrid feature matrix to the left child node or the right child node until the number of node samples falls below the preset threshold or the decision tree depth reaches the maximum depth limit, thereby generating an ensemble of extremely randomized decision trees;
The path index generation layer tracks the traversal trajectory of the multidimensional hybrid feature matrix in the ensemble of extremely randomized decision trees; extracts the node identifier sequence from the root node to the terminal node in each tree; and combines and encodes the node identifier sequences corresponding to each decision tree to output the logical path labels.
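
As a non-limiting illustration of the random splitting ensemble layer and the path index generation layer, the sketch below grows trees with a randomly chosen feature and a uniformly sampled threshold at each node, then encodes the root-to-leaf node identifiers of every tree as one label string. It samples a single candidate feature per node and uses no split-quality criterion, which is a simplification. The names `build_tree`, `trace_path`, `logical_path_label`, and the `-` and `|` separators are hypothetical.

```python
import random


def build_tree(rows, rng, depth=0, max_depth=3, min_samples=2, counter=None):
    """Grow one randomized tree; nodes are dicts numbered in creation order."""
    counter = [0] if counter is None else counter
    node = {"id": counter[0]}
    counter[0] += 1
    # Stop when too few samples remain or the depth limit is reached.
    if depth >= max_depth or len(rows) < min_samples:
        return node
    feature = rng.randrange(len(rows[0]))              # random feature
    values = [row[feature] for row in rows]
    threshold = rng.uniform(min(values), max(values))  # random threshold
    left = [row for row in rows if row[feature] <= threshold]
    right = [row for row in rows if row[feature] > threshold]
    if not left or not right:                          # degenerate split
        return node
    node.update(
        feature=feature,
        threshold=threshold,
        left=build_tree(left, rng, depth + 1, max_depth, min_samples, counter),
        right=build_tree(right, rng, depth + 1, max_depth, min_samples, counter),
    )
    return node


def trace_path(node, row):
    """Node identifiers visited from the root to the terminal node."""
    path = [node["id"]]
    while "feature" in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
        path.append(node["id"])
    return path


def logical_path_label(forest, row):
    """Join per-tree paths, e.g. 'a-b-c|a-d', into one label string."""
    return "|".join("-".join(str(i) for i in trace_path(tree, row))
                    for tree in forest)
```

Seeding each tree with its own `random.Random` instance keeps the label reproducible for a given feature row, which matters if the label is later shown to a clinician as an explanation.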

6. The department recommendation and triage method based on deep learning according to claim 1, characterized in that the improved RGCN model includes an optimal transport alignment layer, a random structure perturbation layer, a multi-relation weight transformation layer, a message passing aggregation layer, and a prediction probability mapping layer:
The optimal transport alignment layer receives the high-dimensional semantic feature vector and the logical path labels, and maps them to a semantic graph node distribution matrix and a logical graph node distribution matrix, respectively; constructs a cost matrix based on node feature similarity, introduces an entropy regularization constraint term, and solves for the optimal transport plan iteratively through the Sinkhorn algorithm; and uses the optimal transport plan as a soft-assignment weight matrix to calculate the alignment similarity matrix between semantic graph nodes and logical graph nodes, outputting the initial cross-graph connection weights;
The random structure perturbation layer defines the heterogeneous topology constructed from the semantic graph nodes and the logical graph nodes as the original heterogeneous graph; randomly removes edges from the original heterogeneous graph according to a preset drop probability to generate a perturbed subgraph structure; inputs the perturbed subgraph structure into the stochastic RGCN processing unit; and injects Gaussian noise into the node hidden states during message passing to perform random perturbation, outputting a node hidden state tensor containing noise features;
The multi-relation weight transformation layer identifies the different types of edge relations in the heterogeneous topology and assigns an independent learnable weight matrix to each relation type; takes the node hidden state tensor as input and performs a linear transformation through the weight matrix corresponding to each relation type to generate relation-specific feature tensors; and performs batch normalization on the relation-specific feature tensors to generate transformed feature tensors;
The message passing aggregation layer performs a Hadamard product operation on the transformed feature tensor and the initial cross-graph connection weights to obtain weighted neighbor features; dynamically adjusts the information transmission strength between semantic nodes and logical nodes based on the alignment similarity matrix; and aggregates the weighted neighbor features along the feature dimension to generate an aggregated feature tensor;
The prediction probability mapping layer inputs the aggregated feature tensor into a fully connected layer to perform dimensionality reduction and linear transformation, generating a logits vector; and performs Softmax normalization on the logits vector to map its values to the probability interval and output the target prediction probability.
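
As a non-limiting illustration of the optimal transport alignment layer, the sketch below builds a cost matrix from cosine dissimilarity between semantic and logical node features and solves the entropy-regularized transport problem with Sinkhorn iterations. Uniform node weights, cosine cost, the regularization strength `epsilon`, and the names `cosine_cost` and `sinkhorn` are assumptions made for the example; the claims do not fix these choices.

```python
import math


def cosine_cost(x, y):
    """Cost = 1 - cosine similarity between two feature vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return 1.0 - dot / norm


def sinkhorn(cost, epsilon=0.1, iterations=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n          # assumed uniform mass on semantic nodes
    b = [1.0 / m] * m          # assumed uniform mass on logical nodes
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def align(semantic_nodes, logical_nodes):
    """Transport plan used here as the soft-assignment weight matrix."""
    cost = [[cosine_cost(s, l) for l in logical_nodes] for s in semantic_nodes]
    return sinkhorn(cost)
```

A smaller `epsilon` gives a sharper, closer to one-to-one alignment but can underflow in the exponential for large costs; a log-domain implementation would be the usual remedy and is not shown here.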

7. The department recommendation and triage method based on deep learning according to claim 1, characterized in that S6 includes the following steps:
S61. Analyze the token interaction strength matrix, extract the attention scalar corresponding to each token, normalize the values and linearly map them to the target color space numerical domain, and construct a discrete mapping relationship between the token sequence index and the color channel vector value;
S62. Based on the discrete mapping relationship, perform tensor operations on the tokens in the triage request text, inject the color channel vector values into the text representation matrix, and calculate a text rendering matrix with color attribute features;
S63. Perform dimensional transformation and format encapsulation on the text rendering matrix to generate standardized visual heatmap data.
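
As a non-limiting illustration of S61 to S63, the sketch below reduces the token interaction strength matrix to one scalar per token, normalizes the scalars to the unit interval, and maps them linearly onto a white-to-red RGB scale. Taking the column mean as the per-token attention scalar, the min-max normalization, the color scale, and the names `token_scalars`, `normalize`, `to_rgb`, and `heatmap` are all assumptions made for the example.

```python
def token_scalars(strength):
    """Assumed scalar per token: mean attention it receives (column mean)."""
    n = len(strength)
    return [sum(row[j] for row in strength) / n
            for j in range(len(strength[0]))]


def normalize(values):
    """Min-max normalization to [0, 1]; a constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def to_rgb(level):
    """Linear map from [0, 1] to a white (0.0) to red (1.0) color."""
    fade = round(255 * (1.0 - level))
    return (255, fade, fade)


def heatmap(tokens, strength):
    """One record per token: sequence index, text, and color vector."""
    levels = normalize(token_scalars(strength))
    return [{"index": i, "token": token, "rgb": to_rgb(level)}
            for i, (token, level) in enumerate(zip(tokens, levels))]
```

The list of records stands in for the "standardized visual heatmap data"; a front end could render each token with its `rgb` value as the background color.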

8. The department recommendation and triage method based on deep learning according to claim 1, characterized in that S7 includes the following steps:
S71. Perform maximum likelihood estimation on the target prediction probability vector and locate the index coordinates corresponding to the maximum probability value;
S72. Perform a hash lookup in the preset department code dictionary based on the index coordinates to resolve the target department identifier;
S73. Input the target department identifier, the visual heatmap data, and the logical path labels into a data serialization interface to perform key-value pair mapping and binary encapsulation, building the structured intelligent triage recommendation result.
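
As a non-limiting illustration of S71 to S73, the sketch below locates the index of the largest probability, resolves it through a dictionary lookup, and serializes the result to bytes. It reads the maximum likelihood step of S71 as an argmax over the probability vector, which is an interpretation. The department codes, the JSON key names, the use of UTF-8 encoded JSON as the "binary encapsulation", and the names `DEPARTMENT_CODES`, `select_department`, and `package_result` are hypothetical.

```python
import json

# Hypothetical department code dictionary: class index -> department identifier.
DEPARTMENT_CODES = {0: "CARDIOLOGY", 1: "GASTROENTEROLOGY", 2: "RESPIRATORY"}


def select_department(probabilities):
    """S71 and S72: argmax over the probabilities, then dictionary lookup."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return DEPARTMENT_CODES[index], probabilities[index]


def package_result(probabilities, heatmap_data, path_label):
    """S73 (assumed form): key-value mapping serialized to UTF-8 JSON bytes."""
    department, confidence = select_department(probabilities)
    payload = {
        "department": department,
        "confidence": confidence,
        "heatmap": heatmap_data,
        "logical_path": path_label,
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")
```

JSON is used only because it needs no extra dependencies; a compact binary format could replace it without changing the surrounding steps.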