System for automated delineation of tumor margins in radiological images using hybrid convolutional transformer networks

The hybrid convolutional transformer network addresses the limitations of existing methods by integrating local and global feature representations for precise tumor boundary delineation, enhancing accuracy and efficiency in clinical settings.

DE202026102269U1 · Undetermined · Publication Date: 2026-07-02 · EASWARI ENGINEERING COLLEGE TAMIL NADU +3

Patent Information

Authority / Receiving Office
DE · DE
Patent Type
Utility models
Current Assignee / Owner
EASWARI ENGINEERING COLLEGE TAMIL NADU
Filing Date
2026-04-22
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

Existing automated and semi-automated tumor segmentation methods struggle with capturing high-level contextual information, are computationally intensive, and lack generalizability across different clinical scenarios, leading to inconsistent and inaccurate delineation of tumor boundaries, particularly in heterogeneous tumors and low-contrast images.

Method used

A hybrid convolutional transformer network that integrates convolutional layers for local feature extraction with transformer-based attention mechanisms to capture global context, using cross-attention feature fusion and edge-sensitive loss functions for precise tumor margin delineation.

Benefits of technology

Enables robust, efficient, and accurate tumor margin delineation across various imaging conditions and datasets, supporting real-time clinical applications with reduced variability and improved contour accuracy.

✦ Generated by Eureka AI based on patent content.


Abstract

A system for automated tumor margin determination in radiological images, comprising: an image acquisition interface configured to receive radiological image data from one or more imaging modalities; a preprocessing unit operationally coupled to the image acquisition interface and configured to perform intensity normalization, spatial resampling, and noise reduction on the received image data to generate standardized image inputs; a feature extraction unit comprising a plurality of convolutional layers arranged in a hierarchical structure and configured to extract spatial features at multiple scales from the standardized image inputs; a transformer coding unit operationally coupled to the feature extraction unit and configured to generate context-sensitive feature representations by applying self-attention operations over spatial regions of the extracted features; a fusion unit operationally coupled to both the feature extraction unit and the transformer coding unit and configured to combine convolution-derived features and transformer-derived context representations into a unified feature map; a decoding unit operationally coupled to the fusion unit and configured to generate a segmentation map corresponding to the tumor regions by incrementally increasing and reconstructing the spatial resolution; a boundary refinement unit configured to improve the delineation of tumor margins in the segmentation map; a processing unit that is operationally coupled to the preprocessing unit, the feature extraction unit, the transformer coding unit, the fusion unit, the decoding unit, and the boundary refinement unit, wherein the processing unit executes instructions stored in a memory unit to perform automated tumor boundary delineation; and a display interface configured to overlay the delineated tumor boundaries onto the radiological image data.

Description

Technical field of the invention

The present disclosure relates generally to the field of medical image processing and computer-aided diagnosis, and in particular to a system and associated device architecture for the automated delineation of tumor boundaries in radiological images using a hybrid deep learning framework that integrates convolutional neural networks and transformer-based attention mechanisms.

Background of the invention

Precise tumor delineation in radiological imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) is crucial for the diagnosis, treatment planning, and monitoring of oncological diseases. Tumor segmentation is typically performed manually by radiologists. This process is time-consuming, subjective, and prone to error, particularly in cases of heterogeneous tumor morphology, low contrast, or indistinct borders. Variability between treating physicians further contributes to inconsistencies in treatment planning, especially in radiation therapy, where precise tumor boundaries are essential. Existing automated and semi-automated segmentation methods include thresholding, region growing, level-set methods, and atlas-based approaches. However, these methods often fail in complex clinical scenarios because of their limited ability to capture high-level contextual information and to adapt to varying tumor shapes and textures. Newer approaches based on convolutional neural networks (CNNs), such as U-Net architectures, have demonstrated improved performance by learning hierarchical spatial features from image data. Nevertheless, CNN-based methods are inherently limited by their localized receptive fields and struggle to capture the long-range dependencies and global context that are essential for accurately delineating irregular tumor margins.
Transformer-based architectures, originally developed for natural language processing, have recently been adapted for image processing tasks and offer superior capabilities for modeling long-range relationships through self-attention mechanisms. However, pure transformer models require large datasets and are computationally intensive, making them less suitable for medical imaging tasks where annotated data is often limited. Furthermore, transformer architectures may lack the fine spatial resolution required for precise boundary definition. Therefore, there is a need for an advanced system that combines the strengths of convolutional architectures in capturing local spatial features with the global context modeling capabilities of transformer networks, thereby enabling robust and precise delineation of tumor boundaries under various radiological imaging conditions.

The precise delineation of tumor margins in radiological images has long been a fundamental challenge in medical image analysis, particularly with modalities such as magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET). These imaging techniques provide crucial structural and functional information for diagnosis, staging, and treatment planning in oncology. However, the inherent complexity of tumor morphology, the variability of imaging protocols, and the presence of noise and artifacts significantly complicate the accurate determination of tumor margins. Traditionally, tumor margin delineation has relied on manual annotation by experienced radiologists who visually assess the image slices and mark the relevant areas. Although manual delineation is considered the clinical gold standard, it has significant limitations, including variability between examiners, inconsistency within a single examiner, high time expenditure, and susceptibility to fatigue-related errors.
These challenges are particularly evident in high-volume hospitals and in cases with complex tumor structures such as gliomas, metastases, or infiltrative tumors, where the boundaries are ambiguous and poorly defined. To overcome these limitations, early computer-aided approaches to tumor segmentation were developed on the basis of classical image processing techniques. Thresholding methods, for example, attempt to separate tumor tissue from healthy tissue by applying intensity-based thresholds. Although simple and computationally efficient, thresholding methods are highly sensitive to intensity variations and often fail with heterogeneous tumors or low-contrast images. Region-growing methods extend the segmentation outward from seed points based on similarity criteria, but their performance is highly dependent on the choice of seed points, and they tend to leak into adjacent non-tumor areas where intensity gradients are weak. Edge-based methods, including gradient operators and active contour models, attempt to detect tumor boundaries by identifying intensity discontinuities. However, these methods struggle with noise and blurred edges, which are common in medical imaging, and often require careful parameter tuning. Model-based approaches such as level-set methods and deformable models have been introduced to improve segmentation accuracy by incorporating geometric constraints and prior shape information. These techniques iteratively evolve contours to fit object boundaries, thus enabling flexible representation of complex shapes. While they offer greater robustness than basic methods, level-set approaches are computationally intensive and sensitive to initialization. Furthermore, they often require hand-crafted energy functions and may not converge correctly in images with weak or missing boundary information.
Atlas-based segmentation methods represent another class of techniques in which pre-labeled anatomical templates are registered onto patient images to guide segmentation. Although useful in certain anatomical regions with consistent structure, atlas-based methods are less effective for tumor segmentation because of the high variability in tumor size, shape, and location. Registration errors and anatomical differences further impair their performance. With the advent of machine learning, supervised classification methods such as Support Vector Machines (SVMs), Random Forests, and k-Nearest Neighbors were introduced for tumor segmentation. These methods use hand-crafted features extracted from images, including intensity, texture, and shape descriptors, to classify pixels or regions into tumor and non-tumor categories. While these approaches offer improvements over purely heuristic methods, their performance is highly dependent on the quality and relevance of the hand-crafted features. Feature engineering is time-consuming and often fails to capture the complex patterns inherent in medical images. Furthermore, these models lack scalability and generalizability when applied to different datasets or imaging modalities. With the rapid advancement of deep learning, convolutional neural networks (CNNs) have established themselves as the dominant paradigm for medical image segmentation. Architectures such as fully convolutional networks (FCNs) and encoder-decoder models, particularly U-Net and its variants, have achieved significant success in delineating tumor regions. CNNs automatically learn hierarchical feature representations directly from the data, eliminating the need for manual feature engineering. The use of skip connections in U-Net allows low-level spatial information to be combined with high-level semantic features, thus improving localization accuracy.
Despite these advantages, CNN-based methods have inherent limitations due to their localized receptive fields. Convolutional operations process information only within a limited neighborhood, restricting the model's ability to capture long-range dependencies and global context. This limitation becomes critical in tumor segmentation, as contextual information from distant image regions may be required to distinguish tumor tissue from normal anatomical structures. Efforts to overcome the limitations of CNNs led to the development of multi-scale and dilated convolution techniques, which expand the receptive field without significantly increasing computational costs. While these methods offer partial improvements, they still rely on fixed convolution kernels and do not fully capture dynamic relationships between distant pixels. Attention mechanisms have also been incorporated into CNN architectures to improve feature representation by weighting relevant regions. Although attention-based CNNs improve performance, they often operate within limited spatial areas and may not effectively model global dependencies. More recently, transformer-based architectures have been introduced into computer vision, offering a fundamentally different approach to feature modeling. Vision Transformers (ViTs) and their variants use self-attention mechanisms to compute relationships between all pairs of image patches, thereby capturing long-range dependencies and global contextual information. This capability is particularly beneficial for tumor segmentation, as understanding the broader anatomical context is essential for accurate delineation. However, transformer models also present significant challenges.
They typically require large, annotated datasets for effective training, which are often unavailable in medical imaging because of the high costs and expertise required for annotation. Furthermore, transformers lack inherent inductive biases such as locality and translation invariance, which are naturally encoded in CNNs. Therefore, pure transformer models may struggle to capture fine spatial details and may produce coarse segmentation results. To leverage the complementary strengths of CNNs and transformer architectures, hybrid approaches have been proposed. In such systems, convolutional layers are used for local feature extraction, while transformer layers model global relationships. Although promising, existing hybrid models often exhibit suboptimal integration strategies, in which the interaction between convolutional and transformer features is not fully optimized. This can lead to redundancy, increased computational complexity, and inefficient use of learned representations. Furthermore, many existing models do not adequately address the issue of edge accuracy, which is critical for clinical applications such as surgical planning and radiotherapy. Another important challenge in tumor segmentation is the variability and imbalance in medical datasets. Tumor regions often occupy only a small portion of the image, leading to class imbalance and biasing models toward background regions. Conventional loss functions such as cross-entropy are poorly suited to such scenarios and exhibit low sensitivity to small tumor regions. Although specialized loss functions such as the Dice coefficient and focal loss have been proposed, these may not adequately account for contour accuracy and shape consistency. Furthermore, variations in imaging protocols, scanner types, and patient populations lead to domain shifts that can negatively impact model performance when applied to unseen data. Computational efficiency and real-time capability remain crucial challenges.
Many advanced deep learning models require significant computing resources, limiting their use in resource-constrained clinical environments. High memory consumption and long inference times hinder integration into real-time diagnostic workflows. Furthermore, the lack of interpretability in deep learning models poses a challenge for clinical applications, as medical professionals require transparent and comprehensible results to trust automated systems. In summary, while significant progress has been made in automated tumor segmentation, existing solutions suffer from several technical limitations. These include inadequate modeling of the global context, insufficient delineation accuracy, reliance on large annotated datasets, computational inefficiency, and limited generalizability across different clinical scenarios. These drawbacks highlight the need for an improved system that effectively integrates local and global feature representations, increases delineation accuracy, and operates efficiently under practical clinical conditions.

Summary of the invention

This disclosure describes a system and device for automated tumor margin delineation in radiological images using a hybrid convolutional transformer network. The system integrates a convolutional feature extraction pipeline with a transformer-based context coding mechanism and a multi-scale decoding structure for precise segmentation. The device includes specialized hardware components, such as image data acquisition interfaces, a high-performance processing unit, memory, and a display interface for real-time visualization of the delineated tumor margins. The system employs a hybrid architecture in which convolutional layers extract low- and mid-level spatial features, while transformer-encoder layers capture dependencies across larger distances in the image.
A cross-attention feature fusion mechanism integrates features from both domains, enabling improved edge detection and robustness against noise and artifacts. Segmentation is then refined using edge-sensitive loss functions and post-processing steps to ensure clinical-grade accuracy. The present invention aims to provide a system for automated tumor margin determination in radiological images that overcomes the limitations of manual segmentation and conventional computer-aided methods by enabling precise, consistent, and reproducible identification of tumor regions across different imaging modalities. The invention aims to reduce reliance on subjective clinical interpretation and minimize variability between and within examiners by introducing a data-driven, technically robust approach to delineating complex tumor margins. A further objective of the invention is the development of a hybrid computing architecture that integrates convolutional neural networks with transformer-based attention mechanisms to simultaneously capture fine local spatial features and global context dependencies in radiological images. The combination of these complementary modeling paradigms is intended to improve segmentation accuracy, particularly in heterogeneous tumors, indistinct borders, and low-contrast imaging conditions. A further objective of the invention is to provide a device-based implementation for real-time or near-real-time processing of high-resolution radiological data. This enables seamless integration into clinical workflows such as diagnostic evaluation, surgical planning, and radiotherapy planning. The invention aims to ensure highly efficient system operation with clinically acceptable latency and resource utilization. A further objective of the invention is to improve contour accuracy through the use of advanced feature fusion strategies and edge-based optimization methods. 
This enables precise delineation of tumor margins even in the presence of noise, artifacts, and anatomical variability. The system is designed to generate segmentation results with high spatial accuracy, which is crucial for treatment planning and predicting treatment success. A further objective of the invention is to improve the generalizability across different datasets, imaging techniques, and patient populations through the integration of adaptive learning mechanisms and robust preprocessing techniques. This ensures that the system remains effective even in real clinical environments with variable imaging protocols and devices. A further objective of the invention is to overcome class imbalances and detect small tumor regions through the use of specialized training strategies and loss functions that consider both region-based accuracy and boundary integrity. The system aims to achieve high sensitivity and specificity in the detection of tumor tissue, including small or early-stage lesions that are often difficult to identify. A further objective of the invention is to provide an interactive and user-friendly output interface that enables clinicians to visualize, validate, and, if necessary, refine the automatically generated tumor margins. This strengthens user confidence and facilitates clinical application by ensuring that the system supports, rather than replaces, expert decision-making. A further objective of the invention is to provide a scalable and extensible framework that can be adapted to different tumor types and imaging techniques without significant redesign, thus enabling broader applicability in medical imaging and oncology. The invention also aims to support integration with existing hospital information systems and picture archiving and communication systems (PACS) for efficient data processing and optimized workflow management. 
Overall, the invention aims to provide a technically advanced, reliable, and clinically applicable solution for automated tumor margin delineation, which significantly improves accuracy, efficiency, and ease of use compared to existing methods.

Brief description of the drawing

These and other features, aspects, and advantages of the present invention will be better understood when the following detailed description is read with reference to the accompanying drawing, in which like reference symbols denote like parts: Fig. 1 shows a block diagram of a system for the automated delineation of tumor boundaries in radiological images. Furthermore, those skilled in the art will recognize that the elements in the drawing are simplified and not necessarily drawn to scale. For example, the flowcharts illustrate the process by highlighting the main steps to facilitate understanding of the present disclosure. With regard to the construction of the device, one or more components may be represented in the drawing by conventional symbols. The drawing may show only those specific details relevant to understanding the embodiments of the present disclosure, so as not to clutter the drawing with details that are already apparent to those skilled in the art from the description contained herein.

Detailed description of the invention

To facilitate understanding of the principles of the invention, reference is made below to the embodiment shown in the drawing, which is described using specific terms. It is understood, however, that this does not limit the scope of protection of the invention. Rather, modifications and further developments of the depicted system, as well as further applications of the inventive principles shown therein, are conceivable, insofar as they would normally occur to a person skilled in the art in the field of the invention.
It will be clear to those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not to be understood as a limitation of it. References to "an aspect", "another aspect", or similar phrases in this description mean that a particular feature, structure, or property described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, phrases such as "in one embodiment", "in another embodiment", and similar expressions in this description may, but do not necessarily, all refer to the same embodiment. The terms "comprises", "comprising", or similar expressions denote non-exclusive inclusion. Thus, a procedure or method comprising a list of steps does not include only those steps but may also include further steps not explicitly listed or inherent in the procedure or method. Likewise, the statement "comprises ..." for one or more devices, subsystems, elements, structures, or components does not, without further limitations, preclude the existence of other devices, subsystems, elements, structures, or components. Unless otherwise defined, all technical and scientific terms used herein have the meanings generally understood by those skilled in the art in the field to which this invention belongs. The systems, methods, and examples described herein serve only for illustration and are not to be understood as limiting. Embodiments of the present disclosure are described in detail below with reference to the attached drawing. Fig. 1 shows a block diagram of a system for automated tumor margin determination in radiological images.
The system (100) comprises: an image acquisition interface (102) configured to receive radiological image data from one or more imaging modalities; a preprocessing unit (104) operationally coupled to the image acquisition interface and configured to perform intensity normalization, spatial resampling, and noise reduction on the received image data to generate standardized image inputs; a feature extraction unit (106) comprising multiple convolutional layers in a hierarchical structure and configured to extract multiscale spatial features from the standardized image inputs; a transformer coding unit (108) operationally coupled to the feature extraction unit and configured to generate context-sensitive feature representations by applying self-attention operations to spatial regions of the extracted features; a fusion unit (110) operationally coupled to both the feature extraction unit and the transformer coding unit and configured to combine convolution-derived features and transformer-derived context representations into a unified feature map; a decoding unit (112) operationally coupled to the fusion unit and configured to generate a segmentation map corresponding to the tumor regions by stepwise upscaling and reconstructing the spatial resolution; a tumor margin refinement unit (114) configured to improve the delineation of the tumor margins in the segmentation map; a processing unit (116) that is operationally coupled to the preprocessing unit, the feature extraction unit, the transformer coding unit, the fusion unit, the decoding unit, and the tumor margin refinement unit, wherein the processing unit executes instructions stored in a memory unit to perform automatic tumor margin delineation; and a display interface (118) configured to display the delineated tumor margins superimposed on the radiological image data.
In one embodiment, the preprocessing unit (104) is further configured to perform modality-specific intensity standardization by mapping the pixel intensity distributions to a predefined statistical range and applying histogram equalization to improve the contrast between tumor and surrounding tissue. In one embodiment, the feature extraction unit (106) comprises residual convolution layers with skip connections configured to preserve spatial information across multiple resolution levels while mitigating gradient degradation during training and inference. In one embodiment, the feature extraction unit (106) is configured to generate feature maps at multiple scales by using convolution operations with different kernel sizes and strides, thereby enabling the representation of both fine-grained and coarse structural features of tumor regions. In one embodiment, the transformer coding unit (108) comprises a plurality of attention layers configured to compute query representations, key representations, and value representations from flattened feature maps and to generate attention-weighted feature outputs based on similarity measures between spatial positions. In one embodiment, the transformer coding unit (108) further comprises a positional encoding unit configured to embed spatial position information into the flattened feature maps before attention operations are applied, thereby preserving spatial relationships within the image. In one embodiment, the fusion unit (110) is configured to perform cross-attention between convolution-derived and transformer-derived features, calculating attention weights to selectively highlight relevant tumor-related regions while suppressing background information. In one embodiment, the fusion unit (110) further comprises channel-wise weighting means and spatial weighting means configured to adaptively adjust the importance of features both across the channel dimension and across the spatial dimension.
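The attention-related embodiments above can be illustrated with a minimal NumPy sketch. It is not the claimed implementation: the sinusoidal position codes, the random projection matrices, the tensor sizes, and every function and variable name are assumptions chosen for illustration, since the disclosure does not specify them. The same scaled dot-product routine is used first as self-attention over the flattened feature map and then as cross-attention, with convolution-derived tokens querying the transformer-derived context, followed by a simple concatenation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_positions(n_tokens, dim):
    """Fixed sinusoidal position codes (an assumed choice), one row per spatial position."""
    pos = np.arange(n_tokens)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def attention(query_src, context_src, w_q, w_k, w_v):
    """Scaled dot-product attention; self-attention when both sources are the same."""
    q = query_src @ w_q          # query representations
    k = context_src @ w_k        # key representations
    v = context_src @ w_v        # value representations
    scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity between spatial positions
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
channels, height, width, dim = 8, 4, 4, 16    # hypothetical sizes

# Flatten a (C, H, W) feature map into H*W tokens of length C and add position codes.
feature_map = rng.normal(size=(channels, height, width))
tokens = feature_map.reshape(channels, height * width).T
tokens = tokens + sinusoidal_positions(height * width, channels)

# Self-attention over the flattened convolutional features.
w_q, w_k, w_v = (rng.normal(size=(channels, dim)) * 0.1 for _ in range(3))
context, self_weights = attention(tokens, tokens, w_q, w_k, w_v)

# Cross-attention: convolutional tokens query the transformer context.
w_q2 = rng.normal(size=(channels, dim)) * 0.1
w_k2, w_v2 = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(2))
fused, cross_weights = attention(tokens, context, w_q2, w_k2, w_v2)

# Concatenate local and context-weighted features into one unified token matrix.
unified = np.concatenate([tokens, fused], axis=1)
```

In a trained network the projection matrices would be learned parameters, and several attention layers would be stacked; the random matrices here only demonstrate the shapes and the data flow.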
In one embodiment, the decoding unit (112) comprises a plurality of upsampling layers and convolution layers arranged hierarchically, with each upsampling layer increasing the spatial resolution and combining corresponding high-resolution features from the feature extraction unit via skip connections. In one embodiment, the decoding unit (112) is configured to generate a probabilistic segmentation map in which each pixel or voxel is assigned a probability value corresponding to the presence of a tumor. The described system is implemented using physical components that work together in a coordinated manner to determine tumor boundaries. The image acquisition interface includes dedicated input circuits and communication interfaces for receiving radiological data from imaging devices. The preprocessing unit consists of signal conditioning circuits, including analog-to-digital converters, normalization circuits, and filter modules that physically process image signals to standardize intensity, resolution, and noise characteristics. The feature extraction unit consists of multi-layered processing hardware, such as parallel multiply-accumulate arrays and convolution engines, arranged in a hierarchical architecture to extract spatial features at various scales. The transformer coding unit is implemented in specialized matrix computation hardware and attention processing circuitry that generates weighted feature relationships across spatial regions. The fusion unit comprises dedicated cross-correlation and weighting circuitry that physically combines multiple feature streams into a unified representation. The decoding unit is implemented using upsampling hardware and reconstruction circuitry that incrementally restores spatial resolution to generate segmentation results. The tumor margin refinement unit includes an edge sharpening circuit that sharpens and delineates tumor boundaries.
These components are interconnected via high-speed data buses and controlled by a processing unit comprising one or more processors and associated memory elements that persistently store executable instructions. This enables the coordinated operation of the hardware modules. The present invention provides a system for automated tumor margin delineation in radiological images. The workflow is controlled by a sequence of transformation steps performed by a processing unit in coordination with several interconnected units. The method begins with the acquisition of radiological image data via an image acquisition interface capable of receiving two- or three-dimensional image volumes from imaging techniques such as magnetic resonance imaging (MRI), computed tomography (CT), or positron emission tomography (PET). Upon acquisition, the image data is transferred to a preprocessing unit, which performs a series of normalization and standardization operations to reduce variability between scans and improve feature consistency. Specifically, the preprocessing unit applies intensity normalization by mapping raw pixel values into a standardized intensity range using statistical measures derived from the distribution of the input image. Spatial resampling is then performed to ensure a uniform voxel spacing across different datasets. This is followed by noise reduction using filter techniques such as Gaussian filtering or anisotropic diffusion, which preserves structural edges and reduces high-frequency noise artifacts. After preprocessing, the standardized image data is passed to a feature extraction unit (FEU), which consists of several hierarchically arranged convolutional layers. The FEU uses successive convolution operations with learned kernels to generate feature maps that capture low- and mid-level spatial features, including edges, gradients, and texture variations in tumor regions.
Each convolutional layer is followed by a nonlinear activation function and a normalization step to stabilize the feature distributions and accelerate convergence. The hierarchical arrangement allows for a stepwise abstraction of the image features: early layers encode fine spatial details, while deeper layers represent more complex structural patterns. Residual connections in the convolution layers ensure the preservation of spatial information and efficient gradient propagation, thus preventing a loss of quality in deeper representations. The feature maps generated by the feature extraction unit are then passed to the transformer coding unit, which flattens the spatial feature maps into a sequence of feature vectors. Before the attention mechanisms are calculated, positional information corresponding to the spatial coordinates is embedded into the feature sequence to preserve geometric relationships. The transformer coding unit then computes context-sensitive representations using attention operations. Query, key, and value representations are derived from the input feature sequence through linear transformations. The attention mechanism evaluates the similarity between different spatial positions and assigns adaptive weights. This allows the system to capture long-range dependencies and global contextual relationships across the entire image. Multiple attention layers are applied sequentially to refine these context representations and ensure that relevant tumor-related regions are highlighted even when they are spatially distant. The outputs of the convolution and transformer coding units are subsequently integrated in a fusion unit, which performs hybrid feature combination. The fusion unit aligns feature dimensions and calculates cross-attention weights that quantify the relevance of the spatial features obtained from the convolution relative to the contextual information obtained from the transformer.
These weighted representations are then combined by concatenation and a subsequent transformation to generate a unified feature map. Additional weighting mechanisms are applied at both the spatial and channel levels to suppress irrelevant background features and emphasize tumor-specific properties. This hybrid representation allows the system to utilize both local structural details and global contextual information for improved delineation accuracy.
The unified feature map is then processed by a decoding unit that reconstructs a high-resolution segmentation. The decoding unit performs progressive upsampling of the fused feature representation using interpolation or learned upsampling operations, followed by convolutional refinement to restore spatial detail. In each upsampling step, corresponding high-resolution features from the feature extraction unit are integrated via skip connections, ensuring that fine-grained spatial information is preserved throughout the reconstruction process. This multi-scale decoding strategy enables precise localization of tumor margins while maintaining contextual coherence. The final output of the decoding unit is a probabilistic segmentation map in which each pixel or voxel is assigned a probability value that indicates the presence of tumor tissue.
To further improve the accuracy of the tumor border, the segmentation map is processed by a boundary refinement unit that applies edge-sensitive optimization techniques. During training, the unit uses loss functions that emphasize contour accuracy by penalizing deviations along tumor borders more severely than deviations in homogeneous areas. This encourages the model to learn sharper and more accurate boundaries. During inference, the boundary refinement unit applies post-processing operations such as morphological filtering and contour smoothing to eliminate isolated artifacts and produce continuous and clinically meaningful border representations.
These operations ensure that the final tumor regions exhibit both geometric consistency and anatomical plausibility.
The processing unit coordinates the execution of the entire procedure by retrieving trained model parameters and configuration data from memory. The procedure supports both two-dimensional, slice-based processing and three-dimensional volumetric processing. Convolution and attention operations are extended across the volumetric dimensions to capture dependencies between slices. The processing unit's parallel computing capabilities, including graphics processing units (GPUs) or tensor processing units (TPUs), are used to accelerate both training and inference, enabling near real-time performance suitable for clinical applications.
The final tumor margins are displayed via a user interface, with the segmentation results overlaid on the original radiological image data to provide intuitive visualization for clinical users. The system also allows for interactive adjustments, incorporating user input to refine the segmentation results. This enables a collaborative workflow between automated calculation and expert validation.
Through the coordinated execution of preprocessing, hierarchical feature extraction, transformer-based contextual modeling, hybrid feature fusion, multiscale decoding, and margin refinement, the presented method achieves robust, accurate, and efficient tumor margin delineation under various imaging conditions and for different tumor types.
In one embodiment, the described system is implemented as a medical imaging device comprising an image acquisition interface for receiving radiological images from one or more imaging modalities such as MRI, CT, and PET scanners. The device further comprises a preprocessing unit connected to the acquisition interface, which performs image normalization, intensity standardization, spatial resampling, and noise filtering to generate standardized input data for further processing.
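The three preprocessing steps named above (intensity normalization, spatial resampling, noise filtering) can be illustrated with a minimal NumPy sketch. It is not the implementation of the described system: the choice of z-score normalization, nearest-neighbour resampling, and a separable Gaussian filter, as well as all function names and parameters (`normalize_intensity`, `resample_nearest`, `gaussian_smooth`, `preprocess`, `sigma`), are assumptions made for illustration only.

```python
import numpy as np

def normalize_intensity(volume):
    """Z-score normalization using statistics of the input image itself (assumed scheme)."""
    volume = volume.astype(np.float64)
    std = volume.std()
    if std == 0:
        return np.zeros_like(volume)
    return (volume - volume.mean()) / std

def resample_nearest(volume, spacing, target_spacing):
    """Resample to a target voxel spacing with nearest-neighbour lookup."""
    spacing = np.asarray(spacing, dtype=float)
    target = np.asarray(target_spacing, dtype=float)
    new_shape = np.maximum(1, np.round(np.array(volume.shape) * spacing / target)).astype(int)
    index_per_axis = []
    for n_old, n_new in zip(volume.shape, new_shape):
        # Map the centre of each output voxel back to a source index.
        src = np.floor((np.arange(n_new) + 0.5) * n_old / n_new).astype(int)
        index_per_axis.append(np.clip(src, 0, n_old - 1))
    return volume[np.ix_(*index_per_axis)]

def gaussian_smooth(volume, sigma=1.0):
    """Separable Gaussian filter applied along each axis with edge padding."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    out = volume.astype(np.float64)
    for axis in range(out.ndim):
        pad = [(radius, radius) if a == axis else (0, 0) for a in range(out.ndim)]
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return out

def preprocess(volume, spacing, target_spacing=(1.0, 1.0, 1.0), sigma=1.0):
    """Normalization, resampling, and noise reduction in the order given in the text."""
    volume = normalize_intensity(volume)
    volume = resample_nearest(volume, spacing, target_spacing)
    return gaussian_smooth(volume, sigma)

# Example: a 4 x 4 x 2 volume with 2 mm slice spacing resampled to 1 mm isotropic.
example = np.arange(32, dtype=float).reshape(4, 4, 2)
standardized = preprocess(example, spacing=(1.0, 1.0, 2.0))
```

In practice, linear or spline interpolation and an edge-preserving filter such as anisotropic diffusion could replace the simple choices made here; the sketch only shows how the steps chain together.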
A feature extraction unit consists of several hierarchically arranged convolutional layers. These convolutional layers extract spatial features at different resolutions. Each layer applies a series of learnable filters to capture edges, textures, and structural patterns in tumor regions. The feature extraction unit may include residual connections and batch normalization layers to improve gradient propagation and training stability.
The system further includes a transformer coding unit that is operationally coupled to the feature extraction unit. The transformer coding unit comprises several self-attention layers configured to process flattened feature maps and model dependencies over larger spatial distances. Each self-attention layer computes query, key, and value representations, enabling the system to assign context-related weights to different image areas. Positional encoding ensures the preservation of spatial information during the transformation process.
In a preferred embodiment, a hybrid fusion unit is provided that integrates convolutional features and transformer-encoded representations. This hybrid fusion unit utilizes cross-attention mechanisms and feature concatenation strategies to combine local and global information, thereby improving the system's ability to distinguish tumor margins from surrounding tissue. The fusion process can include channel-wise and spatial attention modules to selectively highlight relevant features.
A decoding unit is further provided, incorporating a multi-scale upsampling architecture for reconstructing high-resolution segmentation maps from the fused feature representations. This decoding unit can utilize skip connections between corresponding encoder and decoder layers to preserve fine spatial details. The output of the decoding unit is a probability map indicating the probability that each pixel or voxel belongs to a tumor region.
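The self-attention and cross-attention fusion steps described above for the transformer coding unit and the hybrid fusion unit can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the network of the described system: it uses a single attention head, sinusoidal positional encoding over the flattened token index, and random projection matrices standing in for trained weights. The names `positional_encoding`, `attention`, `random_projection`, and `hybrid_fuse` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def positional_encoding(n_tokens, dim):
    """Sinusoidal encoding over the flattened token index (an assumed choice)."""
    pos = np.arange(n_tokens)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def attention(queries, keys, values):
    """Scaled dot-product attention; returns the outputs and the attention weights."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

def random_projection(rng, n_in, n_out):
    """Random matrix standing in for a trained linear layer (illustration only)."""
    return rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)

def hybrid_fuse(conv_map, rng):
    """Flatten an (H, W, C) feature map, add context, and fuse both representations."""
    h, w, c = conv_map.shape
    tokens = conv_map.reshape(h * w, c)

    # Transformer-style step: self-attention over position-encoded tokens.
    x = tokens + positional_encoding(h * w, c)
    context, _ = attention(
        x @ random_projection(rng, c, c),
        x @ random_projection(rng, c, c),
        x @ random_projection(rng, c, c),
    )

    # Fusion step: convolutional tokens act as queries against the context tokens.
    attended, weights = attention(
        tokens @ random_projection(rng, c, c),
        context @ random_projection(rng, c, c),
        context @ random_projection(rng, c, c),
    )

    # Concatenate local and attended features, then project back to C channels.
    fused = np.concatenate([tokens, attended], axis=-1) @ random_projection(rng, 2 * c, c)
    return fused.reshape(h, w, c), weights

rng = np.random.default_rng(0)
conv_features = rng.standard_normal((4, 4, 8))
fused_map, cross_weights = hybrid_fuse(conv_features, rng)
```

Each row of `cross_weights` sums to one and indicates how strongly one spatial position of the convolutional feature map draws on each context token. The channel-wise and spatial attention modules mentioned in the text are omitted from this sketch.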
The system also includes a boundary refinement unit that improves the delineation of tumor margins. During training, this unit uses edge-based loss functions such as a boundary loss or a Dice loss with contour emphasis, and during inference, it can incorporate morphological operations or conditional random fields to smooth and sharpen segmentation boundaries.
A processing unit, which may include one or more graphics processing units (GPUs) or tensor processing units (TPUs), is operationally coupled to all functional units and configured to execute the hybrid convolutional transformer network. The processing unit accesses a memory that stores trained model parameters, training datasets, and intermediate feature representations.
During operation, radiological images are received via the acquisition interface and processed in the preprocessing unit. The processed images are then passed through the convolutional feature extraction unit to generate hierarchical feature maps. These feature maps are then transformed by the transformer coding unit to incorporate global contextual information. The hybrid fusion unit integrates these representations, which are subsequently decoded into segmentation maps by the decoding unit. The boundary refinement unit improves the precision of the tumor margins, and the final segmented result is displayed via a visualization interface.
In an alternative embodiment, the system can be integrated into a clinical workstation or provided as a cloud-based service, enabling remote access and real-time processing. The device can also include user interfaces that allow clinicians to review, adjust, and validate the automatically generated tumor margins.
The presented system thus offers a technically advanced and computationally efficient solution for automated tumor segmentation by combining the advantages of convolutional and transformer-based architectures to achieve superior accuracy, robustness, and clinical applicability.
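The idea of an edge-emphasizing training loss for the refinement unit can be illustrated with a short NumPy sketch of a Dice loss in which pixels on the tumor contour receive a higher weight. The text does not specify the loss formulation, so the weighting scheme, the 4-neighbour contour definition, the 2-D restriction, and the names `boundary_mask`, `edge_weighted_dice_loss`, and `edge_weight` are all assumptions.

```python
import numpy as np

def boundary_mask(mask):
    """True where a pixel's label differs from at least one 4-neighbour (2-D only)."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, mode="edge")
    centre = padded[1:-1, 1:-1]
    diff = np.zeros_like(m)
    neighbours = (
        padded[:-2, 1:-1],   # pixel above
        padded[2:, 1:-1],    # pixel below
        padded[1:-1, :-2],   # pixel to the left
        padded[1:-1, 2:],    # pixel to the right
    )
    for shifted in neighbours:
        diff |= centre != shifted
    return diff

def edge_weighted_dice_loss(prob, target, edge_weight=5.0, eps=1e-6):
    """Dice loss in which contour pixels count edge_weight times as much as others."""
    target = target.astype(np.float64)
    weights = np.where(boundary_mask(target > 0.5), edge_weight, 1.0)
    intersection = (weights * prob * target).sum()
    denominator = (weights * prob).sum() + (weights * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

# Ground truth: a 4 x 4 square tumor region in an 8 x 8 image.
truth = np.zeros((8, 8))
truth[2:6, 2:6] = 1.0

# Two predictions that each miss exactly one tumor pixel.
miss_on_contour = truth.copy()
miss_on_contour[2, 2] = 0.0
miss_in_interior = truth.copy()
miss_in_interior[3, 3] = 0.0

loss_contour = edge_weighted_dice_loss(miss_on_contour, truth)
loss_interior = edge_weighted_dice_loss(miss_in_interior, truth)
```

With these inputs the prediction that misses a contour pixel receives the larger loss, which is the behavior the text describes: deviations along tumor borders are penalized more severely than deviations in homogeneous areas.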
The present invention relates to medical image analysis and computer-aided diagnostic systems, in particular a system and a device for automated tumor margin determination in radiological images. The invention specifically relates to advanced image processing methods that utilize hybrid deep learning architectures and integrate convolution-based feature extraction with attention-based context modeling for the precise segmentation of tumor regions in various imaging modalities.
The drawing and the preceding description illustrate embodiments. Those skilled in the art will recognize that one or more of the described elements can be combined to form a single functional element. Alternatively, certain elements can be divided into several functional elements. Elements of one embodiment can be added to another. For example, the process flows described here can be modified and are not limited to the manner described herein. Furthermore, the actions of a flowchart need not be performed in the sequence shown; nor do all actions necessarily need to be carried out. Actions that do not depend on other actions can be performed in parallel with the other actions. The scope of protection of the embodiments is in no way limited by these specific examples. Numerous variations, whether explicitly stated in the description or not, such as differences in structure, dimensions, and materials, are possible. The scope of protection of the embodiments is at least as comprehensive as described by the following claims.
The advantages, other benefits, and problem solutions have been described above with reference to specific embodiments. However, the advantages, benefits, problem solutions, and any components that can effect or enhance an advantage, benefit, or solution are not to be construed as critical, necessary, or essential features or components of the claims.

REFERENCES

100 System for the automatic delineation of tumor margins in radiological images
102 Image acquisition interface
104 Preprocessing unit
106 Feature extraction unit
108 Transformer coding unit
110 Fusion unit
112 Decoding unit
114 Margin refinement unit
116 Processing unit
118 Display interface

Claims

1. A system for automated tumor margin determination in radiological images, comprising: an image acquisition interface configured to receive radiological image data from one or more imaging modalities; a preprocessing unit operationally coupled to the image acquisition interface and configured to perform intensity normalization, spatial resampling, and noise reduction on the received image data to generate standardized image inputs; a feature extraction unit comprising a plurality of convolutional layers arranged in a hierarchical structure and configured to extract spatial features at different scales from the standardized image inputs; a transformer coding unit operationally coupled to the feature extraction unit and configured to generate context-sensitive feature representations by applying self-attention operations over spatial areas of the extracted features; a fusion unit operationally coupled to both the feature extraction unit and the transformer coding unit and configured to combine features derived from convolutions and context representations derived from transformers into a unified feature map; a decoding unit operationally coupled to the fusion unit and configured to generate a segmentation map corresponding to the tumor regions by incrementally increasing and reconstructing the spatial resolution; a boundary refinement unit configured to improve the delineation of tumor margins in the segmentation map; a processing unit that is operationally coupled with the preprocessing unit, the feature extraction unit, the transformer coding unit, the fusion unit, the decoding unit, and the boundary refinement unit, wherein the processing unit executes instructions stored in a memory unit to perform automated tumor boundary delineation; and a display interface configured to overlay the delineated tumor boundaries onto the radiological image data.
2. The system according to claim 1, wherein the preprocessing unit is further configured to perform modality-specific intensity standardization by mapping pixel intensity distributions to a predefined statistical range and applying histogram equalization to improve the contrast between tumor and surrounding tissue.
3. The system according to claim 1, wherein the feature extraction unit comprises residual convolution layers with skip connections configured to preserve spatial information across multiple resolution levels while mitigating gradient degradation during training and inference.
4. The system according to claim 1, wherein the feature extraction unit is configured to generate feature maps on multiple scales by applying convolution operations with varying kernel sizes and strides, thereby enabling the representation of both fine-grained and coarse structural features of tumor regions.
5. The system according to claim 1, wherein the transformer coding unit comprises a plurality of attention layers configured to compute query representations, key representations, and value representations from flattened feature maps and to generate attention-weighted feature outputs based on similarity measures between spatial positions.
6. The system according to claim 5, wherein the transformer coding unit further comprises a positional encoding device configured to embed spatial position information into the flattened feature maps before attention operations are applied, thereby preserving spatial relationships within the image.
7. The system according to claim 1, wherein the fusion unit is configured to perform cross-attention between convolution-derived and transformer-derived features, calculating attention weights to selectively highlight relevant tumor-related regions while suppressing background information.
8. The system according to claim 7, wherein the fusion unit further comprises means for channel-wise weighting and means for spatial weighting configured to adaptively adjust the importance of features both across the channel dimension and across the spatial dimension.
9. The system according to claim 1, wherein the decoding unit comprises a plurality of upsampling layers and convolution layers arranged hierarchically, each upsampling layer increasing the spatial resolution and combining corresponding high-resolution features from the feature extraction unit via skip connections.
10. The system according to claim 9, wherein the decoding unit is configured to generate a probabilistic segmentation map in which each pixel or voxel is assigned a probability value corresponding to the presence of a tumor.