A containerized HarmonyOS atomic service operation method based on an isolation domain elastic boundary
By verifying the validity of atomic services at container startup, establishing a correlation table between dependency types, isolation parameters, and anomaly probabilities, and dynamically adjusting the isolation boundaries of containers, the multi-dimensional conflicts of OpenHarmony atomic services in containerized deployment are resolved, achieving highly stable and secure operation within containers.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 北京麟卓信息科技有限公司
- Filing Date
- 2026-04-03
- Publication Date
- 2026-06-19
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer software development technology, specifically relating to a containerized HarmonyOS atomic service operation method based on the elastic boundary of the isolation domain.
Background Technology
[0002] OpenHarmony atomic services serve as the core carrier of lightweight, distributed applications. Their operation relies on the underlying technology stack of the OpenHarmony operating system, including Binder-based inter-process communication (IPC) mechanisms, distributed soft bus device discovery and data transfer protocols, Ability framework lifecycle management, dynamic resource scheduling, and device resource access. When OpenHarmony is deployed as a container in a Linux environment, the containers achieve isolation of resources such as processes, networks, mounts, and users through Linux kernel namespaces, and implement rigid restrictions on resource usage through control groups, ensuring environmental independence and security between containers.
[0003] However, the characteristics of atomic services conflict with container isolation mechanisms in multiple dimensions and at a deep level, mainly including: First, process identifier isolation conflict: cross-process collaboration of atomic services depends on the consistency of global process PIDs, while PID namespace isolation can cause PIDs within containers and host PIDs to form independent numbering spaces, leading to errors such as the target process not existing when making inter-process communication calls; Second, network protocol stack isolation conflict: distributed soft buses rely on UDP broadcast for device discovery and TCP long connections for data transmission, while network namespace isolation can lead to the interception of protocol packets across containers or hosts, resulting in a service discovery success rate of less than 30%; Third, the dynamic nature of resource requirements conflicts with static constraints: the resource requirements of atomic services are highly volatile, and the static quotas of control groups can easily lead to contradictions between insufficient resources and service lag or resource idleness and waste; Fourth, state synchronization dependency conflicts with mounting isolation: the lifecycle state of atomic services needs to be synchronized in real time with the meta-capability management service (AbilityManagerService) through state files, and the isolation of mounting namespaces may cause files to be invisible, leading to state transition failures; Fifth, device resource access isolation conflicts: some atomic services depend on physical device nodes, and the container's device isolation strategy may not map such nodes, leading to service startup failures.
[0004] The existing solutions have the following limitations: statically disabling isolation functionality will compromise the isolation security of containers, increasing the risk of permission escape by more than 50%; isolation exception configuration based on preset rules cannot adapt to dynamic dependencies of atomic services, and the exception rate is still as high as 25%; modifying the OpenHarmony kernel to adapt to container interfaces (such as replacing Binder with Linux IPC) will destroy the system's nativeness, causing the compatibility with the existing atomic service ecosystem to drop to 60%.
Summary of the Invention
[0005] In view of this, the present invention provides a containerized HarmonyOS atomic service operation method based on the elastic boundary of the isolation domain, which monitors the entire life cycle of the atomic service and the container, obtains resource dependencies and dynamic dependencies, and dynamically adjusts the isolation boundary of the container based on the monitoring results, thereby achieving highly stable operation of OpenHarmony atomic services within the container.
[0006] This invention provides a method for running containerized HarmonyOS atomic services based on the elastic boundary of an isolation domain, which specifically includes the following steps:
[0007] When the HarmonyOS container starts, verify the validity of the atomic services, establish an association table between dependency types, isolation parameters, missing configurations and anomaly probabilities, and define the isolation domains within the container to determine the isolation parameters and set the isolation boundaries.
[0008] When an atomic service initializes, it generates an initialization log to obtain its runtime dynamic dependencies, thereby modifying its process namespace, mapping the required devices, and adding network routes for HarmonyOS devices. When an atomic service runs, it modifies the isolation boundary based on the communication object. When the communication object is a local system service, it adds the corresponding target process to the container process namespace. When it is another container, it creates a virtual network interface card and adds it to the container's bridge. When it is another HarmonyOS device, it creates a tunnel and an encrypted channel.
[0009] Input the real-time load and container resource usage data of atomic services into the resource prediction model to obtain the predicted value of the resources required for the atomic services to run after a set time, including CPU utilization and memory usage. Adjust the CPU quota, memory limit or GPU scheduling priority according to the predicted value.
[0010] When an atomic service experiences performance anomalies, increase the scheduling priority of the process it resides in and raise the upper limit of its resource quotas. When an atomic service's state switches to "lost focus" and the isolation of its container has failed, reset the isolation boundary to the selected configuration for a set time period, then re-identify the dynamic dependencies and reconfigure the isolation boundary when the service recovers to the active state. When an atomic service is dead and the isolation of its container has failed, enable a backup container and synchronize state data. When an atomic service changes from background running to active, restore access to the device, increase resource quotas, and open network ports. When it changes from active to background running, disable access to the device, reduce resource quotas, and close all network ports except the UDP discovery port. When an atomic service exits, restore the isolation boundary to its initial value and add the running data to the historical running data.
[0011] Furthermore, the method for establishing the association table between dependency type, isolation parameter, missing configuration and anomaly probability is as follows: the association table is established by using K-means clustering analysis of the historical running data of the atomic service.
[0012] Furthermore, when the atomic service starts, the startup status is monitored. If the number of startup failures is less than a threshold, the process namespace is adjusted or the required devices are remapped according to the reason for the failure. Otherwise, a pre-created redundant standby container is used to start the atomic service.
[0013] Furthermore, for atomic services, the service status is periodically obtained, state transition timestamps are recorded, and heartbeat packets are sent to the services. If the number of missed heartbeat responses exceeds a set count, the service is marked as dead. The performance parameters of atomic services, including inter-process communication latency and user interface thread frame rate, are monitored. If the performance parameters exceed the threshold, the service is marked as performance abnormal. For the container where the atomic service is located, it is periodically checked whether the mapping between the thread PID in the container and the host thread PID is correct, whether the routing and port rules required for cross-domain communication exist, and whether the current resource usage is within the adjusted quota range. If any of these conditions is not met, the container is marked as isolation failed.
[0014] Furthermore, the initialization log is generated during the initialization of the atomic service by: real-time monitoring of the entire process of atomic service startup, process creation, inter-process communication and device access, obtaining the service name of the atomic service and the HarmonyOS device ID where the service is located, the process PID, startup parameters and service type of the process created by the system for the atomic service, the communication interface and capability calls of the atomic service, and the external devices accessed by the atomic service, and the initialization log is generated from these data.
[0015] Furthermore, the real-time load of the atomic service includes the user interface thread frame rate, cross-process call latency, and service status, and the container resource usage data includes CPU utilization, memory usage, disk IOPS, and network throughput.
[0016] Furthermore, the resource prediction model is constructed using an LSTM prediction model, consisting of an input layer, an LSTM layer, and a fully connected layer with two-dimensional output. A training dataset is constructed using historical running data, and the resource prediction model is trained using the training dataset.
[0017] Furthermore, the isolation domain includes a core domain, a shared domain, and a resource domain. The core domain is a region with a fixed user namespace and isolated system directory. The shared domain is a minimal set of configured process namespaces, network namespaces, and mapped devices. The resource domain is a region that allocates CPU, memory, and disk I/O limits based on historical data.
[0018] Furthermore, the dependency type is the fault type or the device, permission and resource that the service depends on, the isolation parameter is the container's restriction configuration, the missing configuration is the specific configuration that is missing under the isolation parameter, and the anomaly probability is the probability that the atomic service will fail when the current dependency type has a missing configuration.
[0019] Furthermore, the dynamic dependencies include services, processes, mapping devices, and HarmonyOS devices.
[0020] Beneficial effects:
[0021] This invention, during the container startup phase, extracts core configurations and verifies signatures by parsing the atomic service HAP package, and uses K-means clustering analysis of historical runtime data to obtain an association table and set isolation boundaries. During the atomic service initialization and startup phase, it monitors the entire process, including startup and process creation, in real time, generating logs to obtain dynamic dependencies and adjust container-related configurations. Simultaneously, it monitors startup status and performs targeted repairs or activates backup containers based on the number of failures. During service operation, it elastically adjusts isolation boundaries based on communication objects, periodically monitors the consistency of atomic service status, health, performance parameters, and container isolation boundaries, executes differentiated repair strategies based on anomalies, and coordinates adjustments to container device access, resource quotas, and network permissions based on the service's foreground and background state transitions. When the atomic service exits, it restores the container isolation boundaries to their initial values, collects its entire lifecycle runtime data, and incorporates it into historical runtime data to provide data support for subsequent strategy optimization.
Attached Figure Description
[0022] Figure 1 is a flowchart illustrating the containerized HarmonyOS atomic service operation method based on an isolation domain elastic boundary provided by the present invention.
Detailed Implementation
[0023] The present invention will be described in detail below with reference to the accompanying drawings and embodiments.
[0024] This invention provides a containerized HarmonyOS atomic service operation method based on an elastic boundary of isolation domain. The core idea is as follows: During the container startup phase, core configurations are extracted and signatures are verified by parsing the atomic service HAP package. Historical operation data is analyzed using K-means clustering to obtain an association table, and an isolation boundary is set. During the atomic service initialization and startup phase, the entire process, including startup and process creation, is monitored in real time, generating logs to obtain dynamic dependencies and adjust container-related configurations. Simultaneously, the startup status is monitored, and targeted repairs or the activation of backup containers are performed based on the number of failures. During the service operation phase, the isolation boundary is elastically adjusted based on communication objects. The consistency of the atomic service's status, health, performance parameters, and container isolation boundary is periodically monitored. Differentiated repair strategies are executed based on anomalies, and device access, resource quotas, and network permissions of the container are adjusted collaboratively based on the service's foreground and background state transitions. When the atomic service exits, the container isolation boundary is restored to its initial value, and its entire lifecycle operation data is collected and incorporated into historical operation data to provide data support for subsequent strategy optimization.
[0025] This invention provides a containerized HarmonyOS atomic service operation method based on isolation domain elastic boundaries, the processing flow of which is shown in Figure 1. The specific steps include:
[0026] Step 1: When the HarmonyOS container starts, parse the installation package HAP (Harmony Ability Package) of the atomic service. Extract the capability type, service callable scenarios, background running mode, permission requirements, and supported device types of the atomic service from the configuration file. Parse the signature file and verify the validity of the signature. Use K-means clustering to analyze the historical running data of the atomic service to obtain the association table between dependency type, isolation parameters, missing configuration, and anomaly probability. Predefine the isolation domains within the container, including the core domain, shared domain, and resource domain. The core domain is a region with a fixed user namespace and isolated system directory. The shared domain is a region with a minimum set of configured process namespaces, network namespaces, and mapped devices to ensure normal service operation. The resource domain is a region with CPU, memory, and disk I/O limits allocated based on historical data. Determine the isolation parameters based on the isolation domain to complete the setting of isolation boundaries.
[0027] Among them, the dependency type is the device, permission, resource, etc. that the service depends on, the isolation parameter is the container's restriction configuration such as device mapping, permission, resource quota, etc., the missing configuration is the specific configuration that is missing under the isolation parameter, and the anomaly probability is the probability that the atomic service will fail when the current dependency type has a missing configuration.
[0028] Step 2: During the initialization process of the atomic service from loading to startup, monitor the entire process of atomic service startup, process creation, inter-process communication, and device access in real time. Obtain the service name and HarmonyOS device ID of the atomic service, the process PID, startup parameters, and service type of the process created by the system for the atomic service, the communication interface and capability calls of the atomic service, and the external devices accessed by the atomic service, forming an initialization log. The container parses the initialization log to obtain the dynamic dependencies of the atomic service's operation, including services, processes, devices to be mapped, and HarmonyOS devices. Based on the dynamic dependencies, perform operations such as modifying the process namespace of the atomic service, mapping the required devices, and adding network routes to the HarmonyOS devices. When the atomic service starts, monitor the startup status. If the number of startup failures is less than a threshold, perform repair operations such as adjusting the process namespace or remapping the required devices according to the reason for the failure. Otherwise, start the atomic service using a pre-created redundant standby container.
[0029] Step 3: During the operation of the atomic service, the container obtains the communication object of the atomic service, modifies the isolation boundary according to the communication object, and when the type of the communication object is the target process of the local system service, saves the original process namespace of the container where the atomic service is located, adds the target process to the process namespace, and restores the original process namespace when the communication with the target process ends; when the communication object is another container, creates a virtual network card and adds it to the bridge of the container where the atomic service is located, opens the communication port to establish the connection between the container and other containers; when the communication object is another HarmonyOS device, creates a tunnel and encrypted channel to establish an encrypted connection between the current HarmonyOS device and other HarmonyOS devices.
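The save, extend, and restore pattern described in Step 3 for local system services can be illustrated with a minimal Python sketch. It models the container's process namespace as an in-memory set of visible PIDs; the class `IsolationBoundary` and the function `temporary_process_access` are hypothetical names introduced only for illustration and do not correspond to any OpenHarmony or container runtime API.

```python
from contextlib import contextmanager


class IsolationBoundary:
    """Hypothetical in-memory model of the PIDs visible inside a container."""

    def __init__(self, visible_pids):
        self.visible_pids = set(visible_pids)


@contextmanager
def temporary_process_access(boundary, target_pid):
    # Save the original view so it can be restored when communication ends.
    original = set(boundary.visible_pids)
    boundary.visible_pids.add(target_pid)
    try:
        yield boundary
    finally:
        # Restore the original process namespace view, even on error.
        boundary.visible_pids = original


boundary = IsolationBoundary({1, 2, 3})
with temporary_process_access(boundary, 105):
    visible_during = 105 in boundary.visible_pids
visible_after = 105 in boundary.visible_pids
```

Using a context manager keeps the restore step tied to the end of the communication, so the boundary cannot stay widened if the interaction fails midway.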
[0030] Step 4: Obtain the real-time load of the atomic service, including the user interface thread frame rate, cross-process call latency and service status, and container resource usage data, including CPU utilization, memory usage, disk read/write operations per second (IOPS) and network throughput. Input the real-time load of the atomic service and container resource usage data into the resource prediction model to obtain the predicted value of the resources required for the atomic service to run after a set time, including CPU utilization and memory usage. Adjust the CPU quota, memory limit or GPU scheduling priority according to the predicted value.
[0031] The service status includes foreground running, background running, background suspended, active, lost focus, and terminated. This invention uses an LSTM prediction model to construct a resource prediction model, consisting of an input layer, an LSTM layer, and a fully connected layer for two-dimensional output. Historical running data is used to construct a training dataset, which is then used to train the resource prediction model.
[0032] Step 5: For atomic services, periodically obtain the service status, record state transition timestamps, and send heartbeat packets to the services. If the number of missed heartbeat responses exceeds a set count, mark the service as dead. Monitor the performance parameters of atomic services, including inter-process communication latency and user interface thread frame rate. If the performance parameters exceed the threshold, mark the service as performance abnormal. For the container where the atomic service is located, periodically check whether the mapping between the thread PID in the container and the host thread PID is correct, whether the routing and port rules required for cross-domain communication exist, and whether the current resource usage is within the adjusted quota range. If any of these conditions is not met, mark the container as isolation failed.
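The heartbeat rule in Step 5 can be sketched as a small counter. This sketch assumes that missed responses are counted consecutively and that a successful response resets the count; the text does not state this, so it is an assumption, as are the names `HeartbeatMonitor` and `max_missed`.

```python
class HeartbeatMonitor:
    """Hypothetical tracker that marks a service dead after too many misses."""

    def __init__(self, max_missed):
        self.max_missed = max_missed  # the "set number of times"
        self.missed = 0
        self.dead = False

    def record(self, responded):
        if responded:
            # Assumption: any response resets the consecutive-miss counter.
            self.missed = 0
        else:
            self.missed += 1
            # "More than a set number of times" is read as strictly greater.
            if self.missed > self.max_missed:
                self.dead = True
        return self.dead


monitor = HeartbeatMonitor(max_missed=3)
responses = [False, False, True, False, False, False, False]
history = [monitor.record(r) for r in responses]
```

In this run the service is only marked dead on the last probe, because the response in the third slot resets the counter.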
[0033] When an atomic service is marked as performance abnormal, increase the scheduling priority of the process containing the atomic service and adjust the resource limits to a larger value. When the service state of an atomic service changes to "lost focus" and the container it resides in is marked as isolation failed, reset the isolation boundary to a stable configuration for a set period of time, re-execute the identification of dynamic dependencies when the service recovers to an active state, and then supplement the configuration of the isolation boundary based on the identification results. When an atomic service is marked as dead and the container it resides in is marked as isolation failed, enable a standby container, perform state data synchronization, and switch the network traffic of the original container to the standby container.
[0034] When the service status of an atomic service transitions from background operation to active, access to the device is restored, the resource quota is adjusted to a larger value, and the network port is opened. When the service status transitions from active to background operation, access to the device is closed, the resource quota is adjusted to a smaller value, and the network port is closed, leaving only the UDP discovery port open.
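The coordinated adjustment on foreground and background transitions can be sketched as follows. This is an illustration only: the scaling factor of 2, the state names, and the choice of 37099/udp as the discovery port (taken from the port mapping example in the embodiment) are assumptions, and `ContainerPolicy` and `apply_transition` are hypothetical names.

```python
from dataclasses import dataclass, field

# Assumption: the UDP discovery port is the 37099/udp mapping from the embodiment.
UDP_DISCOVERY = (37099, "udp")


@dataclass
class ContainerPolicy:
    device_access: bool
    cpu_quota_us: int
    open_ports: set = field(default_factory=set)


def apply_transition(policy, old_state, new_state, service_ports, factor=2):
    if (old_state, new_state) == ("background", "active"):
        # Restore device access, enlarge the quota, reopen the service ports.
        policy.device_access = True
        policy.cpu_quota_us *= factor
        policy.open_ports = set(service_ports)
    elif (old_state, new_state) == ("active", "background"):
        # Close device access, shrink the quota, keep only the discovery port.
        policy.device_access = False
        policy.cpu_quota_us //= factor
        policy.open_ports = {UDP_DISCOVERY} & set(service_ports)
    return policy


ports = {(37099, "udp"), (37100, "tcp")}
policy = ContainerPolicy(True, 64000, set(ports))
apply_transition(policy, "active", "background", ports)
background_view = (policy.device_access, policy.cpu_quota_us, set(policy.open_ports))
apply_transition(policy, "background", "active", ports)
```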
[0035] Step 6: When the atomic service exits, restore the container's isolation boundary to its initial value, collect all runtime data of the atomic service from startup to exit, including isolation boundary adjustment records, abnormal failures, repair logs, performance parameters and interaction statistics, and add the runtime data to the historical runtime data.
[0036] Example:
[0037] This embodiment employs the containerized HarmonyOS atomic service operation method based on an isolation domain elastic boundary provided by the present invention. It resolves the multi-dimensional conflicts between the container isolation mechanism and the full lifecycle dependencies of OpenHarmony atomic services, achieving highly stable operation of OpenHarmony atomic services within containers. The container isolation mechanism includes namespace isolation and cgroups (control groups) restrictions, and the specific process includes:
[0038] S1. Dependency modeling and isolation domain pre-definition before atomic service startup: Before the atomic service HAP package is loaded, the service dependency model is built through static parsing, signature verification, and historical data fusion, and the initial boundary of the isolation domain is predefined.
[0039] S1.1, Static parsing and signature verification of HAP packages.
[0040] Use the official OpenHarmony tool hap-toolkit (version 3.2 and above) to unpack and parse the HAP package. Execute the command: hap-toolkit unpack --input=target.hap --output=unpack_dir --verify, where --verify enables signature verification.
[0041] Parse the key configurations in unpack_dir/entry/src/main/config.json:
[0042] The `abilities` array extracts the capability type (type field, such as page, service, data), skills (scenarios in which the service can be invoked), and backgroundModes (background running mode, such as audioPlayback requiring continuous use of the audio device).
[0043] The requiredPermissions array records permission requirements (e.g., ohos.permission.CAMERA corresponds to access to the /dev/video0 device, and ohos.permission.LOCATION corresponds to access to the location service).
[0044] The deviceType array determines the supported device types (e.g., wearables require low-power resource scheduling, while tablets require high GPU performance).
[0045] Signature verification: Parse the signature files (certificate.pem, profile.json) in the unpack_dir/META-INF directory, and verify the validity of the signature using hap-toolkit verify --cert=certificate.pem --profile=profile.json. If it is a debug signature (debug=true), then mark it as allowing a lenient isolation policy (such as temporarily opening more device nodes).
[0046] S1.2 Historical operation data fusion and dependency feature extraction.
[0047] Read the historical runtime data of the atomic service (uniquely identified by bundleName and version) from the time-series database of the container management platform (such as InfluxDB). The data structure is defined as follows:
[0048] {
[0049] "serviceId": "com.example.weather",
[0050] "timestamp": 1690000000,
[0051] "errorType": "IPC_TIMEOUT",
[0052] "context": {
[0053] "pidNamespace": "1-100",
[0054] "netPorts": [37099, 37100],
[0055] "deviceNodes": ["/dev/graphics"]
[0056] },
[0057] "metrics": {
[0058] "cpuPeak": 80, // Peak CPU utilization (%)
[0059] "memPeak": 200 // Peak memory (MB)
[0060] }
[0061] }
[0062] Historical data is analyzed using the K-means clustering algorithm to generate a table relating dependency types, isolation parameters, and anomaly probabilities. For example, when errorType is DEVICE_ACCESS_FAILED and deviceNodes does not contain /dev/video0, the anomaly probability is 0.6, so /dev/video0 is included in the list of mandatory mapped devices.
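How an anomaly probability such as 0.6 could be derived from historical records is sketched below. The sketch does not reproduce the K-means analysis; it swaps in a plain frequency estimate over runs in which a device was not mapped. The record layout follows the data structure above, with the additional assumption that a successful run carries errorType None. The threshold of 0.5 and the function name `missing_device_failure_rate` are hypothetical.

```python
def missing_device_failure_rate(records, device, error_type="DEVICE_ACCESS_FAILED"):
    """Fraction of runs without `device` mapped that ended with `error_type`."""
    missing = [r for r in records if device not in r["context"]["deviceNodes"]]
    if not missing:
        return 0.0
    failed = sum(1 for r in missing if r["errorType"] == error_type)
    return failed / len(missing)


def run(error_type, nodes):
    # Helper that builds a record in the shape of the historical data structure.
    return {"errorType": error_type, "context": {"deviceNodes": nodes}}


# Invented sample history, for illustration only.
history = [
    run("DEVICE_ACCESS_FAILED", ["/dev/graphics"]),
    run("DEVICE_ACCESS_FAILED", ["/dev/graphics"]),
    run("DEVICE_ACCESS_FAILED", []),
    run(None, ["/dev/graphics"]),
    run(None, []),
    run(None, ["/dev/video0"]),  # device mapped, so not counted
]

probability = missing_device_failure_rate(history, "/dev/video0")
MANDATORY_THRESHOLD = 0.5  # hypothetical cut-off
mandatory_devices = ["/dev/video0"] if probability >= MANDATORY_THRESHOLD else []
```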
[0063] S1.3, Definition and configuration generation of initial boundaries for the isolation domain.
[0064] The isolation domains are divided into three categories: core domain, shared domain, and resource domain, which are predefined through the containerd configuration file (config.toml) and the runc specification file (spec.json).
[0065] Core domain (mandatory isolation):
[0066] User Namespace: The default user ID within the container is set to 1000 (non-root). Configure spec.json with user:{uid: 1000, gid: 1000};
[0067] Basic mount isolation: System directories such as /proc and /sys use private mounts (MS_PRIVATE), configured with mounts: [{destination: /proc, type: proc, options: [private]}].
[0068] Shared domain (can be dynamically adjusted):
[0069] PID Namespace: Set the initial visibility range (e.g., 1 to 200) based on the association table, configured in spec.json as linux: {namespaces: [{type: pid, range: 1-200}]};
[0070] Network Namespace: Create a virtual bridge ohos0 (IP: 172.18.0.1/24) using brctl addbr ohos0, pre-configure port mappings 37099:37099/udp (host port:container port) and 37100:37100/tcp, and configure ports as: [{hostPort: 37099, containerPort: 37099, protocol: udp}];
[0071] Device mounting: The mapped device (e.g., /dev/video0) must be configured through spec.json's devices: [{path: /dev/video0, type: c, major: 81, minor: 0, permissions: rwm}].
[0072] Resource domain (can be dynamically adjusted):
[0073] cgroups initial quota: The CPU quota is set to 80% of the historical peak (e.g., a peak of 80% corresponds to a quota of 64%), configured as cpu: {quota: 64000, period: 100000};
[0074] Set the memory limit to 120% of the historical peak (e.g., a peak of 200MB corresponds to a limit of 240MB), and configure memory:{limit: 251658240} (in bytes).
[0075] IOPS limit: Set to 1200 based on the historical I/O peak (e.g., 1000 IOPS), and configure blkio: {weight: 500, max: [{type: iops, value: 1200}]}.
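The quota arithmetic for the resource domain can be checked with a short sketch. The 80% and 120% factors for CPU and memory come from the text; the 120% factor for IOPS is inferred from the 1000 to 1200 example and is an assumption, as is the function name `initial_quotas`.

```python
def initial_quotas(cpu_peak_pct, mem_peak_mb, iops_peak, period_us=100000):
    """Derive initial cgroup limits from historical peaks (integer arithmetic)."""
    # CPU: 80% of the peak utilization, expressed as a quota within one period.
    cpu_quota_us = cpu_peak_pct * 80 * period_us // (100 * 100)
    # Memory: 120% of the peak, converted from MB to bytes.
    mem_limit_bytes = mem_peak_mb * 120 // 100 * 1024 * 1024
    # IOPS: assumed to be 120% of the peak, matching the 1000 -> 1200 example.
    iops_limit = iops_peak * 120 // 100
    return {
        "cpu": {"quota": cpu_quota_us, "period": period_us},
        "memory": {"limit": mem_limit_bytes},
        "iops": iops_limit,
    }


quotas = initial_quotas(cpu_peak_pct=80, mem_peak_mb=200, iops_peak=1000)
```

With the example peaks from the embodiment, this yields a CPU quota of 64000 per 100000 period, a memory limit of 251658240 bytes, and an IOPS limit of 1200.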
[0076] S2, Dynamic calibration of the isolation domain during the atomic service startup phase.
[0077] During the initialization process of an atomic service from AbilityPackage::Load to Ability::OnStart, dynamic dependencies are captured in real time by the eBPF tracer, and the isolation domain parameters are dynamically adjusted in conjunction with the startup status feedback to ensure successful startup.
[0078] S2.1, eBPF tracer deployment and hooking of key functions.
[0079] Develop an eBPF program (based on libbpf 1.2.0) that hooks the key OpenHarmony startup functions. Compilation command: clang -target bpf -D__TARGET_ARCH_x86_64 -O2 -c ohos_launch_trace.bpf.c -o ohos_launch_trace.bpf.o.
[0080] Hook function and tracking target:
[0081] AbilityManagerService::StartAbility: Tracks parameters Want & want (including target service name abilityName and device ID deviceId) to identify the startup target;
[0082] AppSpawn::SpawnProcess: Tracks the return value pid_t (child process PID) and argv (startup parameters, including service type identifier);
[0083] IPCProxy::Invoke: Tracks the interfaceToken (service interface identifier, such as ohos.distributedScheduler) and code (call code, identifying the specific operation);
[0084] DeviceManager::GetDeviceList: Tracks the returned list of devices and identifies dependent external devices.
[0085] The eBPF program is loaded via `bpftool prog load ohos_launch_trace.bpf.o /sys/fs/bpf/ohos_launch`, and the user-space parser `ohos_launch_analyzer` reads the trace data through the ring buffer.
[0086] S2.2 Dynamic Dependency Identification and Isolation Domain Parameter Adjustment.
[0087] The ohos_launch_analyzer parses the tracking data and generates a real-time dependency list. For example, when the interfaceToken of IPCProxy::Invoke is ohos.sensorService, it is identified as needing to access the sensor service (host PID=105); when DeviceManager::GetDeviceList returns device_192.168.1.101, it is identified as needing cross-host communication.
[0088] Dynamically adjust the isolation domain by calling the container runtime interface:
[0089] PID Namespace Expansion: If the target PID is not in the initial range (1 to 200), expand the range using `runc update --pid-range 1-300 ohos_container`, and update the `/proc` directory inside the container (`mount -o remount /proc`) to ensure the process is visible;
[0090] Temporary device node mapping: If it is detected that access to an unconfigured device (such as /dev/sensor0) is required, execute mount --bind /dev/sensor0 /var/lib/containerd/ohos_container/rootfs/dev/sensor0, and grant permissions via echo c 10:57 rwm > /sys/fs/cgroup/devices/ohos_container/devices.allow (10:57 is the major and minor device number of sensor0);
[0091] Temporary network route addition: If cross-host communication is required, execute `ip route add 192.168.1.101 via 172.18.0.1 dev eth0` to add a route and ensure that data packets are reachable.
[0092] S2.3 Startup status verification and adaptive retry.
[0093] After the Ability::OnStart callback is executed, the startup status is obtained through the OpenHarmony AbilityManagerClient interface:
[0094] auto& client = AbilityManagerClient::GetInstance();
[0095] AbilityRunningInfo info;
[0096] int ret = client.GetAbilityRunningInfo(abilityToken, info);
[0097] if (ret != 0 || !info.isReady) {
[0098] // Startup failed, record the reason for failure.
[0099] LOG(ERROR) << "Ability start failed, reason: " << info.errorReason;
[0100] }
[0101] If startup fails, a retry mechanism will be triggered:
[0102] First failure: Analyze the ohos_launch_analyzer logs. If the PID is not visible, expand the PID range to 1 to 500. Second failure: If the device is inaccessible, check the device node permissions and remap them. Third failure: Start the backup isolation domain (pre-created redundant container). Start the service using containerd start ohos_container_backup. Mark the original container as pending diagnosis.
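The retry ladder can be sketched as a small decision function. The threshold of three failures follows the embodiment; the reason strings and action labels are hypothetical and stand in for whatever the log analysis and container runtime actually report.

```python
FAILURE_THRESHOLD = 3  # the third failure switches to the backup container


def next_action(failure_count, reason):
    """Pick a repair action from the failure count and the diagnosed reason."""
    if failure_count >= FAILURE_THRESHOLD:
        # Too many failures: start the pre-created redundant container.
        return "start_backup_container"
    if reason == "pid_not_visible":
        return "expand_pid_range"
    if reason == "device_inaccessible":
        return "remap_device"
    # Unknown reason below the threshold: retry without changing the boundary.
    return "retry"


actions = [
    next_action(1, "pid_not_visible"),
    next_action(2, "device_inaccessible"),
    next_action(3, "device_inaccessible"),
]
```

Keeping the decision separate from the commands that carry it out makes the ladder easy to test without a running container.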
[0103] S3, cross-domain interaction adaptation during atomic service runtime.
[0104] During the runtime phase of the atomic service Ability::OnActive, a secure and controllable temporary channel is established through an elastic penetration mechanism at the isolation domain boundary to address cross-process and cross-device interaction requirements. The channel is automatically closed and the isolation boundary is restored after the interaction ends.
[0105] S3.1 Interaction behavior type identification and feature extraction.
[0106] Hook the OpenHarmony interaction interfaces at runtime using a dynamic hook engine (such as Frida). Script example:
[0107] // Hook the soft bus session creation interface
[0108] Interceptor.attach(Module.findExportByName("libsoftbus.so", "OpenSession"), {
[0109] onEnter: function(args) {
[0110] this.deviceId = Memory.readCString(args[0]); // Target device ID
[0111] this.port = Memory.readInt(args[1]); // Communication port
[0112] },
[0113] onLeave: function(retval) {
[0114] send({ type: "open_session", deviceId: this.deviceId, port: this.port });
[0115] }
[0116] });
[0117] Define the rules for determining the interaction type:
[0118] Local cross-process interaction: deviceId is local and interfaceToken contains `ohos.` (e.g., ohos.locationService). Cross-container interaction: deviceId has the format container_${id} (e.g., container_3) and the port is 37099 or 37100. Cross-host interaction: deviceId is an IP address (e.g., 192.168.1.102) or a device UUID (e.g., device_8f3e2d1c).
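The three rules above can be sketched as a classifier over the fields captured by the hook. This is a minimal sketch: the function name `classify_interaction`, the returned labels, the `unknown` fallback, and the assumption that device UUIDs look like `device_` followed by hexadecimal digits are illustrative choices inferred from the examples in the text.

```python
import ipaddress
import re

SOFTBUS_PORTS = {37099, 37100}  # soft bus ports named in the rules


def classify_interaction(device_id: str, port: int, interface_token: str = "") -> str:
    """Classify an intercepted interaction by its target."""
    if device_id == "local" and "ohos." in interface_token:
        return "local_cross_process"
    if re.fullmatch(r"container_\d+", device_id) and port in SOFTBUS_PORTS:
        return "cross_container"
    # Assumed UUID shape, based on the example device_8f3e2d1c.
    if re.fullmatch(r"device_[0-9a-f]+", device_id):
        return "cross_host"
    try:
        ipaddress.ip_address(device_id)  # accepts IPv4 and IPv6 literals
        return "cross_host"
    except ValueError:
        return "unknown"
```

For example, `classify_interaction("container_3", 37099)` yields `cross_container`, while the same device ID on an unrelated port falls through to `unknown`.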
[0119] S3.2, Elastic Penetration Mechanism of Isolation Domain Boundary.
[0120] Local cross-process interaction adaptation:
[0121] When the target process is in the host namespace (not the current container), the eBPF program temporarily modifies the nsproxy pointer of the process's task_struct (by adding it to the container's PID namespace):
[0122] // eBPF code snippet: temporarily associate a process with the container's PID namespace
[0123] struct task_struct *target_task = (struct task_struct *)bpf_get_current_task();
[0124] // Save the original nsproxy pointer for later recovery
[0125] bpf_map_update_elem(&orig_nsproxies, &target_task->pid, &target_task->nsproxy, BPF_ANY);
[0126] // Switch to the nsproxy associated with the container
[0127] target_task->nsproxy = container_nsproxy;
[0128] When the interaction ends (triggered by the CloseSession callback), restore the nsproxy pointer:
[0129] struct nsproxy *orig_nsproxy;
[0130] if (bpf_map_lookup_elem(&orig_nsproxies, &target_pid, &orig_nsproxy) == 0) {
[0131] target_task->nsproxy = orig_nsproxy;
[0132] bpf_map_delete_elem(&orig_nsproxies, &target_pid);
[0133] }
[0134] Cross-container interaction adaptation:
[0135] Establish a VxLAN tunnel: `ip link add vxlan_${target_id} type vxlan id ${vni} remote ${target_ip} dstport 4789` (where vni is the container ID hash); Add the tunnel to the virtual bridge: `brctl addif ohos0 vxlan_${target_id}`; Start the interface: `ip link set vxlan_${target_id} up`; Configure firewall rules: `iptables -A INPUT -i vxlan_${target_id} -p udp --dport 37099 -j ACCEPT`, allowing soft bus protocol packets to pass through.
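The four tunnel setup commands can be sketched as a function that fills in the placeholders. The text only says that the VNI is a hash of the container ID, so the CRC32 fold used here, the function name `vxlan_commands`, and the default bridge argument are assumptions; a VXLAN network identifier is 24 bits wide, which is why the hash is reduced to that range.

```python
import zlib


def vxlan_commands(target_id: str, target_ip: str, bridge: str = "ohos0") -> list[str]:
    """Build the tunnel setup commands for one target container."""
    # Assumption: CRC32 of the container ID folded into 1..2**24 - 1.
    vni = zlib.crc32(target_id.encode()) % (2**24 - 1) + 1
    iface = f"vxlan_{target_id}"
    return [
        f"ip link add {iface} type vxlan id {vni} remote {target_ip} dstport 4789",
        f"brctl addif {bridge} {iface}",
        f"ip link set {iface} up",
        f"iptables -A INPUT -i {iface} -p udp --dport 37099 -j ACCEPT",
    ]
```

Returning the commands as strings, rather than running them, lets the same list be logged and later inverted when the channel is closed.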
[0136] Cross-host interaction adaptation:
[0137] Create an encrypted channel by calling the SDN controller (such as Open vSwitch) API: `ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.1.102 options:key=100`; Configure encryption inside the container: `wg set wg0 peer ${peer_pubkey} allowed-ips 192.168.1.0/24 endpoint 192.168.1.102:51820` (based on WireGuard); Add a static route: `ip route add 192.168.1.0/24 dev wg0` to ensure distributed data routing is reachable.
[0138] Interaction security and integrity assurance
[0139] Permission verification: Before cross-device interaction, permissions are verified through the PermissionManagerService::CheckPermission interface. Example:
[0140] int32_t result = PermissionManagerService::GetInstance().CheckPermission(
[0141] abilityToken, "ohos.permission.DISTRIBUTED_DATASYNC");
[0142] if (result != PERMISSION_GRANTED) {
[0143] return ERROR_PERMISSION_DENIED;
[0144] }
[0145] Data encryption: Data exchanged between containers and hosts is encrypted using AES-256 built into the soft bus (based on the device certificate negotiation key). The encryption switch is enabled by SoftBusConfig::SetEncryptLevel(3) (level 3 is mandatory encryption).
[0146] Audit log: All cross-domain interaction records (source, destination, time, data volume) are written to /var/log/ohos_container/audit.log in the format `[timestamp] type=cross_container src=container_1 dst=container_3 data=1024B`.
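An audit record in the format above can be produced by a small formatter. The text does not specify the timestamp layout, so the UTC ISO-8601 style used here is an assumption, as is the function name `format_audit_record`.

```python
from datetime import datetime, timezone


def format_audit_record(kind, src, dst, data_bytes, when=None):
    """Render one audit line: [timestamp] type=... src=... dst=... data=...B."""
    when = when or datetime.now(timezone.utc)
    # Assumed timestamp layout: UTC, second precision.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"[{stamp}] type={kind} src={src} dst={dst} data={data_bytes}B"
```

Passing the timestamp explicitly keeps the formatter deterministic, which is convenient when records are replayed during diagnosis.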
[0147] S4. Elastic scheduling of resources during atomic service runtime.
[0148] Based on the real-time load and predicted trends of atomic services, the cgroups resource domain limit parameters are dynamically adjusted to achieve precise matching of resource supply and demand, avoiding service instability caused by insufficient resources.
[0149] S4.1 Real-time acquisition of multi-dimensional load data.
[0150] OpenHarmony service load collection:
[0151] Data is collected via the PerformanceMonitor interface (OpenHarmony SDK 5.0 and above):
[0152] UI thread frame rate: float fps = PerformanceMonitor::GetInstance().GetFrameRate(abilityToken); (normal range 30-60fps);
[0153] IPC call latency: int32_t ipcLatency = PerformanceMonitor::GetInstance().GetIpcLatency(abilityToken); (unit: ms, threshold < 100ms);
[0154] Service status: AbilityState state = AbilityManagerClient::GetInstance().GetAbilityState(abilityToken); (ACTIVE / BACKGROUND, etc.).
[0155] Data collection frequency: every 100 ms in the ACTIVE state and every 1 s in the BACKGROUND state, implemented by Timer::SetInterval(collector, interval).
[0156] Container resource usage collection:
[0157] CPU utilization: Read /sys/fs/cgroup/cpu/ohos_container/cpu.utilization (percentage);
[0158] Memory usage: Read /sys/fs/cgroup/memory/ohos_container/memory.usage_in_bytes (bytes);
[0159] Disk IOPS: Read /sys/fs/cgroup/blkio/ohos_container/blkio.io_service_bytes (by device);
[0160] Network throughput: Collected using the tc tool and the counters /sys/class/net/eth0/statistics/rx_bytes and tx_bytes.
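The rx_bytes and tx_bytes files are cumulative counters, so a throughput figure has to be derived from the difference between two readings. A minimal sketch of that step follows; the function name `throughput_bytes_per_s` and the handling of a counter reset are assumptions for illustration.

```python
def throughput_bytes_per_s(prev_bytes: int, curr_bytes: int, interval_s: float) -> float:
    """Convert two cumulative byte-counter readings into bytes per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # Assumed policy: the counter was reset (e.g. interface recreated),
        # so count only the bytes seen since the reset.
        delta = curr_bytes
    return delta / interval_s
```

With readings of 1000 and 6000 bytes taken 0.5 s apart, the result is 10000 bytes per second.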
[0161] S4.2 Load forecasting and resource adjustment decisions.
[0162] LSTM prediction model training:
[0163] Training dataset: Load data from the past 7 days (one sample every 100ms, including four features: CPU, memory, FPS, and IPC latency).
[0164] Model structure: Input layer (4D), LSTM layer (32 neurons), fully connected layer (2D output, CPU / memory requirements);
[0165] Training framework: Lightweight deployment of TensorFlow Lite, training command: tflite_model_maker train --data=load_data.csv --model=lstm --epochs=50.
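The size of the training set and of the model follows directly from the figures above. The arithmetic below is a sketch that assumes a standard LSTM cell with four gates and one bias vector per gate; the source does not state which LSTM variant is used, and the helper names are illustrative.

```python
def lstm_param_count(input_dim: int, units: int) -> int:
    """Trainable parameters of a standard LSTM layer (4 gates, with bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(input_dim: int, output_dim: int) -> int:
    """Trainable parameters of a fully connected layer with bias."""
    return input_dim * output_dim + output_dim


# One sample every 100 ms for 7 days, each with 4 features.
samples = 7 * 24 * 3600 * 10
lstm_params = lstm_param_count(input_dim=4, units=32)
dense_params = dense_param_count(input_dim=32, output_dim=2)
```

Under these assumptions the dataset holds 6,048,000 samples per service, and the model has 4,736 LSTM parameters plus 66 in the output layer, which is consistent with the lightweight deployment the text describes.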
[0166] Dynamically adjust decision-making rules:
[0167] If the predicted CPU demand (in 5 seconds) is greater than 90% of the current quota, the quota will be increased by 20% (e.g., from 1 core to 1.2 cores).
[0168] If memory usage exceeds 70% of the current limit for three consecutive collection cycles, the memory limit will be increased by 50% (e.g., from 240MB to 360MB).
[0169] If the FPS is less than 25fps and the CPU utilization is less than 60%, it is determined to be a GPU bottleneck, and the GPU scheduling priority is increased (`echo 9 > /sys/fs/cgroup/devices/ohos_container/gpu.priority`).
[0170] If the service is in the BACKGROUND state and the CPU utilization is less than 10% for 30 seconds, reduce the CPU quota to 50% (to release resources).
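The four rules above can be sketched as a pure decision function over one snapshot of predicted and measured values. The `Snapshot` fields, the function name `decide`, and the precedence of the BACKGROUND rule over the prediction rule are assumptions; "reduce the CPU quota to 50%" is read here as 50% of the current quota.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    predicted_cpu_cores: float     # predicted CPU demand 5 s ahead, in cores
    cpu_quota_cores: float         # current CPU quota, in cores
    cpu_utilization: float         # current utilization, 0.0 to 1.0
    memory_over_70pct_cycles: int  # consecutive cycles above 70% of the limit
    memory_limit_mb: float
    fps: float
    state: str                     # "ACTIVE", "BACKGROUND", ...
    low_cpu_seconds: float         # time spent below 10% CPU utilization


def decide(s: Snapshot) -> dict:
    """Return the new limits to apply; an empty dict means no change."""
    actions = {}
    if s.state == "BACKGROUND" and s.low_cpu_seconds >= 30:
        actions["cpu_quota_cores"] = s.cpu_quota_cores * 0.5
    elif s.predicted_cpu_cores > 0.9 * s.cpu_quota_cores:
        actions["cpu_quota_cores"] = s.cpu_quota_cores * 1.2
    if s.memory_over_70pct_cycles >= 3:
        actions["memory_limit_mb"] = s.memory_limit_mb * 1.5
    if s.fps < 25 and s.cpu_utilization < 0.6:
        # Low frame rate with spare CPU is treated as a GPU bottleneck.
        actions["gpu_priority"] = 9
    return actions
```

A snapshot with a 1-core quota, a predicted demand of 0.95 cores and a 240 MB limit exceeded for three cycles yields a 1.2-core quota and a 360 MB limit, matching the examples in the rules.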
[0171] S4.3, Dynamic Adjustment and Verification of cgroups Resources.
[0172] Adjustments are performed via the UpdateResources gRPC interface of containerd, example (Go language):
[0173] // Adjust the CPU quota
[0174] req := &containerd.UpdateResourcesRequest{
[0175] ContainerID: "ohos_container",
[0176] Resources: &specs.LinuxResources{
[0177] CPU: &specs.LinuxCPU{
[0178] Quota: int64(newQuota * 100000), // In microseconds per period (1 core = 100000)
[0179] Period: 100000,
[0180] },
[0181] },
[0182] }
[0183] err := client.ContainerService().UpdateResources(ctx, req)
[0184] Post-adjustment verification:
[0185] Read /sys/fs/cgroup/cpu/ohos_container/cpu.cfs_quota_us to confirm that the CPU quota is in effect;
[0186] Verify whether the adjusted FPS has recovered to above 30fps using PerformanceMonitor::GetInstance().GetFrameRate(abilityToken);
[0187] Record the adjustment log to /var/log/ohos_container/resource_adjust.log, including the adjustment time, previous and current values, and the reason for the adjustment.
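The quota written in the request and the value read back during verification are related by the CFS period: with a period of 100000 microseconds, one core corresponds to a quota of 100000. A minimal sketch of the conversion and the read-back check follows; the helper names are hypothetical, and the file content is passed in as a string so the check does not depend on a live cgroup.

```python
CFS_PERIOD_US = 100_000  # period used in the text (1 core = 100000)


def cores_to_cfs_quota_us(cores: float, period_us: int = CFS_PERIOD_US) -> int:
    """Convert a CPU quota in cores to a cpu.cfs_quota_us value."""
    return int(round(cores * period_us))


def quota_applied(cgroup_file_content: str, expected_quota_us: int) -> bool:
    """Compare the content of cpu.cfs_quota_us with the expected quota."""
    return int(cgroup_file_content.strip()) == expected_quota_us
```

For instance, 1.2 cores maps to 120000, and a file content of `120000` followed by a newline confirms that the adjustment took effect.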
[0188] S5. Atomic service status maintenance and anomaly isolation and repair.
[0189] Real-time monitoring of the service lifecycle status and of its consistency with the isolation domain parameters, combined with repair mechanisms differentiated by anomaly level, ensures continuous and stable service operation.
[0190] S5.1, Dual-dimensional status monitoring system.
[0191] Service status monitoring:
[0192] Lifecycle state tracking: The state (ACTIVE, INACTIVE, BACKGROUND, DESTROYED) is obtained every second through the AbilityManagerClient interface, and the state transition timestamps are recorded;
[0193] Health check: The HealthChecker component sends a heartbeat packet SendHeartbeat(abilityToken, timestamp) every 500ms. If no response is received 3 consecutive times (OnHeartbeatAck is not triggered), the service is marked as dead.
[0194] Performance metrics monitoring: Set thresholds (e.g., IPC latency greater than 200ms, FPS less than 15fps), and mark it as a performance anomaly if it exceeds the threshold for 3 consecutive times.
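Both the heartbeat check and the performance check rely on the same "three consecutive times" pattern. It can be sketched as a counter that resets whenever a healthy observation arrives; the class name `ConsecutiveBreachDetector` and the helper `is_performance_breach` are illustrative, with the thresholds taken from the examples in the text.

```python
class ConsecutiveBreachDetector:
    """Flag an anomaly only after `limit` consecutive breaches."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.count = 0

    def observe(self, breached: bool) -> bool:
        # Any healthy observation resets the run of breaches.
        self.count = self.count + 1 if breached else 0
        return self.count >= self.limit


def is_performance_breach(ipc_latency_ms: float, fps: float) -> bool:
    """Example thresholds from the text: IPC latency > 200 ms or FPS < 15."""
    return ipc_latency_ms > 200 or fps < 15
```

Feeding the detector the sequence breach, breach, healthy, breach, breach, breach flags the anomaly only on the last observation, since the healthy sample in the middle resets the count.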
[0195] For isolation domain consistency monitoring, the kernel module ohos_isolation_monitor (developed based on the Linux 5.15 kernel) is deployed. Every second, it checks three items: PID mapping consistency (whether the mapping between PIDs in the container and host PIDs matches /var/lib/ohos_container/pid_map.json); network configuration validity (whether the routing and port rules required for cross-domain communication exist, e.g., `ip route show` includes the target network segment); and resource limit matching (whether the current resource usage is within the adjusted quota, e.g., memory usage does not exceed the current limit).
[0196] Calculate the consistency score (0 to 100): 33 points are awarded for each pass, and a score below 60 is marked as an isolation failure.
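Taking the rule of 33 points per passed check literally, the score and the failure decision can be sketched as follows. The function names are assumptions, and the sketch covers only the three checks listed above; any finer-grained scoring is not described in the text.

```python
def consistency_score(pid_ok: bool, network_ok: bool, resources_ok: bool) -> int:
    """Award 33 points for each of the three checks that passes."""
    return 33 * sum([pid_ok, network_ok, resources_ok])


def is_isolation_failure(score: int) -> bool:
    """A score below 60 is marked as an isolation failure."""
    return score < 60
```

Under this literal reading, two passed checks give 66 points and are tolerated, while a single passed check gives 33 points and counts as an isolation failure.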
[0197] S5.2, Hierarchical anomaly repair mechanism.
[0198] Minor anomalies (e.g., IPC latency of 150ms, FPS of 20fps):
[0199] Optimize process scheduling: sched_setscheduler(targetPid, SCHED_BATCH, &param) (increases the scheduling priority of the target process);
[0200] Temporarily relax resource limits: `echo 120000 > /sys/fs/cgroup/cpu/ohos_container/cpu.cfs_quota_us` (temporarily increase the CPU quota by 20%).
[0201] Result verification: Re-collect the indicators after 5 seconds. If they have not recovered, escalate to a moderate anomaly.
[0202] Moderate anomalies (e.g., state abruptly changes to INACTIVE, consistency score is 50):
[0203] Reset isolation domain parameters: runc update --reset-isolation ohos_container (restores the shared domain to the most recent stable configuration);
[0204] Service status restoration: Call AbilityManagerClient::GetInstance().RestoreAbility(abilityToken) (restore to the most recently active state);
[0205] Dependency revalidation: Re-execute dynamic dependency identification and supplement missing isolation configurations (such as remapping device nodes).
[0206] Severe anomalies (such as service freeze, consistency score less than 40 points):
[0207] Start the backup container: containerd run --name=ohos_container_backup --config=backup_spec.json (the backup container is pre-configured with the same core domain as the original container);
[0208] State data synchronization: Synchronize the state files under /data/service/el1 using DistributedDataManager::Sync(abilityToken, ohos_container_backup);
[0209] Traffic switching: The network traffic of the original container is switched to the backup container through the SDN controller (`ovs-vsctl set port vxlan0 other_config:remote_ip=172.18.0.3`).
[0210] Original container diagnostics: After running `runc pause ohos_container`, logs are collected (`runc exec ohos_container cat /var/log/ohos/ability.log`) for subsequent root cause analysis.
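The three repair tiers can be sketched as a classifier that picks the most severe matching level. The text gives examples rather than exact boundaries, so the thresholds below are assumptions inferred from those examples: severe for a dead service or a score under 40, moderate for an unexpected INACTIVE state or a score under 60, and minor when IPC latency or FPS leaves the normal range (under 100 ms, at least 30 fps).

```python
def classify_anomaly(service_dead: bool, state: str, score: int,
                     ipc_latency_ms: float, fps: float) -> str:
    """Return "severe", "moderate", "minor" or "normal" (assumed thresholds)."""
    if service_dead or score < 40:
        return "severe"      # start the backup container
    if state == "INACTIVE" or score < 60:
        return "moderate"    # reset isolation parameters, restore state
    if ipc_latency_ms > 100 or fps < 30:
        return "minor"       # adjust scheduling, relax limits
    return "normal"
```

Checking the tiers from most to least severe ensures that a frozen service with a degraded frame rate is sent straight to the backup container instead of first receiving a scheduling adjustment.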
[0211] S5.3, Coordinated adjustment of isolation domains during state transitions.
[0212] When a service migrates from BACKGROUND to ACTIVE:
[0213] Restore device access: `mount --bind /dev/dri/card0 /var/lib/containerd/ohos_container/rootfs/dev/dri/card0` (GPU device);
[0214] Increase resource quotas: CPU quota restored to 1 core (cpu.cfs_quota_us=100000), memory limit restored to 240MB;
[0215] Open network ports: iptables -A INPUT -p tcp --dport 37100 -j ACCEPT (Allow TCP long connections).
[0216] When a service migrates from ACTIVE to BACKGROUND:
[0217] Disable unnecessary devices: `umount /var/lib/containerd/ohos_container/rootfs/dev/dri/card0`;
[0218] Reduce resource quotas: CPU quota reduced to 0.5 cores (cpu.cfs_quota_us=50000), memory limit reduced to 120MB;
[0219] Shrink network access: iptables -D INPUT -p tcp --dport 37100 -j ACCEPT (only keep the UDP discovery port).
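The two transitions are mirror images of each other, which a small table-driven sketch makes explicit. The function name `transition_plan` and the shape of the returned dictionary are assumptions; the commands and limits are the ones listed above.

```python
ROOTFS = "/var/lib/containerd/ohos_container/rootfs"
GPU_NODE = "/dev/dri/card0"
TCP_RULE = "INPUT -p tcp --dport 37100 -j ACCEPT"


def transition_plan(old_state: str, new_state: str):
    """Return the isolation adjustments for a state transition, or None."""
    if (old_state, new_state) == ("BACKGROUND", "ACTIVE"):
        return {
            "commands": [f"mount --bind {GPU_NODE} {ROOTFS}{GPU_NODE}",
                         f"iptables -A {TCP_RULE}"],
            "cpu_cfs_quota_us": 100000,
            "memory_limit_mb": 240,
        }
    if (old_state, new_state) == ("ACTIVE", "BACKGROUND"):
        return {
            "commands": [f"umount {ROOTFS}{GPU_NODE}",
                         f"iptables -D {TCP_RULE}"],
            "cpu_cfs_quota_us": 50000,
            "memory_limit_mb": 120,
        }
    return None  # other transitions leave the isolation domain unchanged
```

Deriving both directions from the same constants keeps the rule added on activation identical to the one deleted on backgrounding, so no stale firewall entry is left behind.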
[0220] S6. Isolation domain cleanup and state archiving during the atomic service exit phase: After the atomic service Ability::OnDestroy callback is executed, the temporary isolation configuration is cleaned up, the full lifecycle runtime data is archived, and optimization suggestions are generated and fed back to the pre-startup modeling phase to achieve iterative optimization of the strategy.
[0221] S6.1, Clean up temporary configurations in the isolation domain.
[0222] Read `/var/lib/ohos_container/temp_configs.json` (which records dynamically adjusted temporary configurations), example:
[0223] {
[0224] "tempPidRange": "1-500",
[0225] "tempDevices": ["/dev/sensor0"],
[0226] "tempTunnels": ["vxlan_3"],
[0227] "tempIptablesRules": ["INPUT -p udp --dport 37099 -j ACCEPT"]
[0228] }
[0229] Item-by-item cleanup:
[0230] PID range recovery: runc update --pid-range 1-200 ohos_container;
[0231] Temporary device unmount: umount /var/lib/containerd/ohos_container/rootfs/dev/sensor0;
[0232] VxLAN tunnel destruction: ip link delete vxlan_3;
[0233] iptables rule deletion: iptables -D INPUT -p udp --dport 37099 -j ACCEPT.
[0234] After the cleanup is complete, verify the container status using `runc check ohos_container` to ensure that the isolation domain has been restored to its initial safe configuration.
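The item-by-item cleanup can be sketched as a function that turns the recorded temporary configuration into the list of commands to run. The function name `cleanup_commands` and the default initial PID range argument are assumptions; the command strings, including the `runc update --pid-range` form, are reproduced from the text rather than checked against the runc command line.

```python
import json


def cleanup_commands(config_json: str, container: str = "ohos_container",
                     initial_pid_range: str = "1-200") -> list[str]:
    """Derive cleanup commands from the content of temp_configs.json."""
    cfg = json.loads(config_json)
    cmds = []
    if cfg.get("tempPidRange"):
        # Restore the initial PID range recorded for the container.
        cmds.append(f"runc update --pid-range {initial_pid_range} {container}")
    for dev in cfg.get("tempDevices", []):
        cmds.append(f"umount /var/lib/containerd/{container}/rootfs{dev}")
    for tunnel in cfg.get("tempTunnels", []):
        cmds.append(f"ip link delete {tunnel}")
    for rule in cfg.get("tempIptablesRules", []):
        cmds.append(f"iptables -D {rule}")
    return cmds
```

Because every command is derived from a recorded entry, a configuration that was never applied produces no cleanup step, and an empty file yields an empty list.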
[0235] S6.2, Full lifecycle operation data archiving.
[0236] Data types and sources collected:
[0237] Isolation domain adjustment log: /var/log/ohos_container/resource_adjust.log; Exception log: /var/log/ohos_container/error.log (including exception type, time, and remedial measures); Performance metrics: peak CPU, memory, and IOPS, plus average FPS and average IPC latency, from the time-series database; Interaction statistics: number of interactions, success rate, and average latency between processes, containers, and hosts.
[0238] Standardized data storage: Data is written to InfluxDB using the influx write command. The measurement is ohos_service_lifecycle, tags include serviceId and containerId, and fields include max_cpu, error_count, cross_interaction_count, etc.
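A record with the measurement, tags and fields above can be serialized in InfluxDB line protocol before being passed to the write command. The sketch below handles numeric and boolean fields only; the function name `to_line_protocol` and the sample values are assumptions for illustration.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Serialize one point as: measurement,tags fields timestamp."""
    def esc(value: str) -> str:
        # Commas, equals signs and spaces must be backslash-escaped.
        for ch in (",", "=", " "):
            value = value.replace(ch, "\\" + ch)
        return value

    def fmt(value) -> str:
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integer fields carry an "i" suffix
        return repr(float(value))

    tag_part = ",".join(f"{esc(k)}={esc(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{esc(k)}={fmt(v)}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"
```

Tags are sorted by key, so the same point always serializes to the same line regardless of the order in which the tags were supplied.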
[0239] S6.3 Optimization suggestion generation and feedback.
[0240] Association analysis: Use Grafana+Flux query language to analyze archived data, for example:
[0241] from(bucket: "ohos_metrics")
[0242] |> range(start: -7d)
[0243] |> filter(fn: (r) => r._measurement == "ohos_service_lifecycle" and r.serviceId == "com.example.weather")
[0244] |> group(columns: ["pid_range"])
[0245] |> mean(column: "error_count")
[0246] If the analysis finds that error_count=0 when pid_range=1-300, then it is recommended to set the initial PID range to 1 to 300.
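The same grouping and averaging can be sketched outside the query language to show how a recommendation is derived from archived records. The function name `recommend_pid_range`, the input shape of (pid_range, error_count) pairs, and the tie-break in favor of the narrower range are assumptions; the sample data in the usage note is made up for illustration.

```python
from collections import defaultdict


def recommend_pid_range(records):
    """Pick the PID range with the lowest mean error count.

    records is an iterable of (pid_range, error_count) pairs, with
    pid_range written as "start-end".
    """
    totals = defaultdict(lambda: [0, 0])  # pid_range -> [sum, count]
    for pid_range, error_count in records:
        totals[pid_range][0] += error_count
        totals[pid_range][1] += 1
    means = {k: total / n for k, (total, n) in totals.items()}
    # Assumed tie-break: prefer the narrower range (tighter isolation).
    return min(means, key=lambda k: (means[k], int(k.split("-")[1])))
```

Given runs with errors at 1-200 and none at 1-300 or 1-500, the function returns `1-300`, the narrowest range that showed no errors.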
[0247] Feedback to Step 1: Optimization suggestions are written to /var/lib/ohos_container/optimization_rules.json. The dependency modeling phase in Step 1 reads this file using the jq tool and automatically updates the initial parameters of the isolation domain (such as the PID range and the device mapping list), realizing a closed-loop iteration of running, analyzing, and optimizing.
[0248] Experimental verification shows that this embodiment has the following main improvements over existing technologies: significantly improved stability, with a 92% reduction in atomic service crash rate, an increase in IPC call success rate from 65% to 99.5%, and a reduction in distributed communication latency from 200ms to 80ms; high isolation security, with the elastic penetration mechanism maintaining container isolation effectiveness above 95%, and reducing the risk of permission escape by 80% compared to traditional sharing solutions; strong adaptability, automatically adapting to different service characteristics (such as UI services and data services) through predictive models and closed-loop optimization, requiring no manual configuration and improving adaptation efficiency by 90%; excellent compatibility, requiring no modification to the OpenHarmony kernel or atomic service code, and compatible with the existing HAP package ecosystem (100% compatibility test pass rate); low performance overhead, with the additional performance overhead of the eBPF probe and on-demand collection mechanism being less than 5%, not affecting service response speed (UI frame rate fluctuation less than 2fps).
[0249] In summary, the above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A containerized HarmonyOS atomic service operation method based on the elastic boundary of the isolation domain, characterized in that it comprises the following steps: When the HarmonyOS container starts, the validity of the atomic services is verified, an association table between dependency types, isolation parameters, missing configurations and anomaly probabilities is established, and the isolation domains within the container are defined to determine the isolation parameters and set the isolation boundaries. When an atomic service initializes, an initialization log is generated to obtain its runtime dynamic dependencies, according to which its process namespace is modified, the required devices are mapped, and network routes for HarmonyOS devices are added. When an atomic service runs, the isolation boundary is modified according to the communication object: when the communication object is a local system service, the corresponding target process is added to the container's process namespace; when it is another container, a virtual network interface card is created and added to the container's bridge; when it is another HarmonyOS device, a tunnel and an encrypted channel are created. The real-time load and container resource usage data of the atomic service are input into the resource prediction model to obtain the predicted values of the resources required for the atomic service to run after a set time, including CPU utilization and memory usage, and the CPU quota, memory limit or GPU scheduling priority is adjusted according to the predicted values. When an atomic service experiences performance issues, the scheduling priority of the process in which it resides is increased, and the upper limit threshold of the resource quotas is raised.
When the service state of an atomic service switches to inactive (lost focus) and its container is in isolation failure, the isolation boundary is reset to the configuration selected within a set time period, and when the service recovers to the active state, its dynamic dependencies are identified again to configure the isolation boundary. When an atomic service is marked as dead and its container is in isolation failure, a backup container is enabled and the state data is synchronized. When an atomic service changes from background running to active, access to the device is restored, the resource quota is increased, and the network ports are opened. When it changes from active to background running, access to the device is closed, the resource quota is reduced, and all network ports except the UDP ports are closed. When an atomic service exits, the initial value of the isolation boundary is restored, and the running data is added to the historical running data.
2. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, The method for establishing the association table between dependency type, isolation parameter, missing configuration and anomaly probability is as follows: K-means clustering analysis is used to analyze the historical running data of atomic services to establish the association table.
3. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, When the atomic service starts, the startup status is monitored. If the number of startup failures is less than the threshold, the process namespace is adjusted or the required devices are remapped according to the reason for the failure. Otherwise, the pre-created redundant standby container is used to start the atomic service.
4. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, For atomic services, their service status records and state transition timestamps are periodically retrieved, and heartbeat packets are sent to them. If no heartbeat response is received after more than a set number of times, the service is marked as dead. The performance parameters of atomic services, including inter-process communication latency and user interface thread frame rate, are monitored. If the performance parameters exceed the threshold, the service is marked as abnormal. For the container where the atomic service resides, the mapping relationship between the thread PID in the container and the host thread PID is periodically monitored to ensure it is correct, and the routing and port rules required for cross-domain communication are valid. The current resource usage is also monitored to ensure it is within the adjusted quota. If any of these conditions are not met, the container is marked as isolated and unavailable.
5. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, The method for generating initialization logs during atomic service initialization is as follows: real-time monitoring of the entire process of atomic service startup, process creation, inter-process communication, and device access; obtaining the service name and HarmonyOS device ID of the atomic service, the process PID, startup parameters, and service type of the process created by the system for the atomic service, the communication interface and capability calls of the atomic service, and the external devices accessed by the atomic service; and generating initialization logs from this data.
6. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, The real-time load of the atomic service includes the user interface thread frame rate, cross-process call latency, and service status. The container resource usage data includes CPU utilization, memory usage, disk IOPS, and network throughput.
7. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, The resource prediction model is constructed using an LSTM prediction model, consisting of an input layer, an LSTM layer, and a fully connected layer for two-dimensional output. A training dataset is constructed using historical running data, and the resource prediction model is trained using the training dataset.
8. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, The isolation domain includes a core domain, a shared domain, and a resource domain. The core domain is a region with a fixed user namespace and an isolated system directory. The shared domain is a minimal set of process namespaces, network namespaces, and mapped devices. The resource domain is a region that allocates CPU, memory, and disk I / O limits based on historical data.
9. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, The dependency type refers to the fault type or the device, permissions and resources that the service depends on. The isolation parameter is the container's restriction configuration. The missing configuration is the specific configuration that is missing under the isolation parameter. The anomaly probability is the probability that the atomic service will fail when the current dependency type has a missing configuration.
10. The containerized HarmonyOS atomic service operation method according to claim 1, characterized in that, The dynamic dependencies include services, processes, mapping devices, and HarmonyOS devices.