System exception capturing method, main system, shadow system and intelligent equipment

A main-system and shadow-system technology, applied in the field of system exception capture and intelligent equipment. It addresses problems such as the inability to capture valid information, PCI bus hangs, and tasks that cannot be scheduled, and improves maintainability

Status: Inactive | Publication Date: 2015-12-30
ZTE CORP
Cites: 3 | Cited by: 10

AI Technical Summary

Problems solved by technology

[0003] There are some existing methods for locating operating system crashes. For example, in the Linux operating system, kdump can capture kernel software exceptions, nmi_watchdog can capture kernel interrupt-deadlock exceptions, and the watchdog can capture kernel scheduling exceptions. However, none of them can capture valid information for crashes caused by the following:
[0004] 1. A CPU hardware failure causes the operating system to hang
In this case, the CPU hardware hangs outright, so the operating system running on it hangs as well and no valid information can be recorded.
[0005] 2. A memory hardware failure causes the operating system to hang
In this case, the memory failure hangs the operating system outright, so no valid information can be recorded.
[0006] 3. A PCI (Peripheral Component Interconnect) device hardware or firmware failure hangs the PCI bus, and eventually the operating system hangs
In this case, valid information cannot be recorded.
[0007] 4. A hard disk hardware or firmware failure causes the operating system to hang
In this case, system I/O (input/output) hangs because of the disk failure, so the log cannot be written.
[0008] 5. The system load is too heavy (for example, memory is exhausted) and the operating system hangs
In this case, the operating system cannot perform the operations needed to record exception information.
[0009] 6. High-priority tasks continuously occupy the CPU, so low-priority tasks are never scheduled, and eventually the operating system hangs
In this case, the system can only schedule the high-priority tasks; the low-priority processes responsible for recording exception information are never scheduled, so no valid information can be recorded.
[0010] 7. A deadlock occurs during soft interrupt processing, which prevents other tasks from being scheduled and eventually causes the operating system to hang
…
However, this kind of solution is not applicable because it requires additional monitoring equipment to be configured.


Examples


Embodiment 1

[0118] Step A11, the shadow system detects and reads, from the shared memory, the physical CPU information used by the main system. The main system saved this physical CPU information to the shared memory beforehand.

[0119] Step A12, the shadow system stops the main system by sending interrupt requests to all CPUs used by the main system.

[0120] Step A13, if the main system can still respond to the interrupt request at this point, it stops running, stops accessing memory, and synchronizes state among its CPUs. If the main system cannot respond to the interrupt request (because it has already hung), this step is skipped.
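Steps A11 to A13 can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not ZTE's implementation: the shared memory is modeled as a Python dict with a hypothetical `main_cpus` key, and the hypothetical `MainCpu.deliver_stop_ipi` method stands in for sending a real inter-processor interrupt. The cross-CPU state synchronization in step A13 is not modeled.

```python
from dataclasses import dataclass


@dataclass
class MainCpu:
    """Hypothetical model of one CPU used by the main system."""
    cpu_id: int
    hung: bool = False     # True if the CPU can no longer take interrupts
    stopped: bool = False  # set once the CPU has handled the stop request

    def deliver_stop_ipi(self) -> bool:
        """Deliver a stop interrupt; return True if the CPU handled it."""
        if self.hung:
            return False   # a hung CPU cannot respond, so step A13 is skipped
        self.stopped = True  # stop running and stop touching memory
        return True


def stop_main_system(shared_mem: dict, cpus: dict) -> dict:
    """Steps A11-A13: read the main system's CPU list, then interrupt each CPU."""
    cpu_ids = shared_mem["main_cpus"]  # A11: list saved earlier by the main system
    result = {}
    for cpu_id in cpu_ids:             # A12: send an interrupt to every main-system CPU
        result[cpu_id] = cpus[cpu_id].deliver_stop_ipi()
    return result                      # maps cpu_id -> whether that CPU responded
```

A CPU that has already hung simply reports `False`, matching the rule that step A13 is skipped when the main system cannot respond.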

[0121] Step A14, the shadow system reads the physical memory address previously saved by the main system from the shared memory.

[0122] Step A15, the shadow system passes the main system's physical memory address to the query kernel as a boot parameter, and then loads the query kernel. In this way, the shadow system can access the physic...
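Step A15 hands the main system's memory location to the query kernel as a boot parameter. The sketch below assumes the addresses read in step A14 are available as (start, size) pairs and shows one way to build such a command line. The parameter name `main_mem=` and its `size@start` format are hypothetical, loosely modeled on the Linux `memmap=nn@ss` syntax; the source specifies neither.

```python
def build_query_kernel_cmdline(base_cmdline: str, regions: list[tuple[int, int]]) -> str:
    """Append the main system's physical memory regions as boot parameters.

    `regions` holds (start, size) pairs read from shared memory in step A14.
    The `main_mem=` parameter name is hypothetical, not taken from the source.
    """
    params = [f"main_mem={size:#x}@{start:#x}" for start, size in regions]
    # Join the existing command line and the new parameters with single spaces.
    return " ".join([base_cmdline.strip(), *params]).strip()
```

For example, `build_query_kernel_cmdline("console=ttyS0", [(0x100000, 0x4000000)])` returns `"console=ttyS0 main_mem=0x4000000@0x100000"`.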

Embodiment 2

[0131] In the second embodiment, the shadow system performs anomaly monitoring based on the status of the network card resources occupied by the main system, specifically including the steps shown in Figure 3:

[0132] Step B11, after the shadow system starts, it obtains the network card information of the main system from the shared memory and periodically sends a heartbeat request message to the designated network card of the main system through the network card resources allocated to itself. Because both systems share the same hardware environment, if the main system has network card resources, the shadow system can also be allocated network card resources.

[0133] Step B12, the main system receives the heartbeat request message sent by the shadow system through its own network card, and replies with a corresponding heartbeat response message.

[0134] Step B13, if the shadow system detects that a specified number of heartbeat responses are lost within a specified time, then it is determ...
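Step B13 amounts to a sliding-window loss counter. A minimal sketch, assuming each heartbeat outcome is reported with a timestamp in seconds; `max_lost` and `window` stand in for the unspecified "specified number" and "specified time". Because the source text is truncated, it is assumed here that reaching the threshold means the main system is judged abnormal.

```python
from collections import deque


class HeartbeatMonitor:
    """Flags the main system as abnormal when `max_lost` heartbeat responses
    are lost within `window` seconds (both values are hypothetical settings)."""

    def __init__(self, max_lost: int, window: float):
        self.max_lost = max_lost
        self.window = window
        self._losses = deque()  # timestamps of lost responses, oldest first

    def record(self, now: float, response_received: bool) -> bool:
        """Record the outcome of one heartbeat; return True if abnormal."""
        if not response_received:
            self._losses.append(now)
        # Forget losses that have fallen outside the sliding time window.
        while self._losses and now - self._losses[0] > self.window:
            self._losses.popleft()
        return len(self._losses) >= self.max_lost
```

With `HeartbeatMonitor(max_lost=3, window=10.0)`, losses at t=0, 2 and 3 trip the detector on the third loss, whereas losses spaced more than 10 s apart never accumulate.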

Embodiment 3

[0136] In the third embodiment, the shadow system performs exception capture based on the CPU resources occupied by the main system, specifically including the steps shown in Figure 4:

[0137] Step C11, after the shadow system starts, it reads the CPU resource information used by the main system from the shared memory.

[0138] Step C12, the shadow system periodically sends an inter-core interrupt to the CPU used by the main system as a heartbeat request message; this step is applicable to a multi-core CPU chip, and the shadow system and the main system can run on different CPUs.

[0139] Step C13, after receiving the inter-core interrupt from the shadow system through the shared memory, the main system replies to the CPU used by the shadow system with an inter-core interrupt response, that is, a heartbeat response message. Note that after the CPU processes the received core information, it continues to run normally, and the interrupt processing is very fast, whi...
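Steps C11 to C13 can be modeled with per-CPU sequence numbers. In this sketch, which is an assumption rather than the patent's mechanism, the shadow system writes an incrementing sequence number for each main-system CPU before raising the inter-core interrupt, and the main CPU's interrupt handler acknowledges by echoing that number back; the echo stands in for the reply interrupt of step C13. A CPU that misses `max_misses` consecutive rounds (a hypothetical threshold) is reported as hung.

```python
class IpiHeartbeat:
    """Sketch of steps C11-C13 using per-CPU sequence numbers in shared memory.

    The mailbox layout and the `max_misses` threshold are assumptions.
    """

    def __init__(self, main_cpus: list[int], max_misses: int):
        self.request = {cpu: 0 for cpu in main_cpus}  # written by the shadow system
        self.ack = {cpu: 0 for cpu in main_cpus}      # written by the main system
        self.misses = {cpu: 0 for cpu in main_cpus}   # consecutive unanswered rounds
        self.max_misses = max_misses

    def send_round(self) -> None:
        """Shadow side (C12): bump each CPU's sequence number before the IPI."""
        for cpu in self.request:
            self.request[cpu] += 1
            # A real system would send the inter-core interrupt to `cpu` here.

    def main_cpu_handler(self, cpu: int) -> None:
        """Main side (C13): the interrupt handler echoes the sequence number back."""
        self.ack[cpu] = self.request[cpu]

    def check_round(self) -> list[int]:
        """Shadow side: update miss counts and return the CPUs deemed hung."""
        hung = []
        for cpu in self.request:
            if self.ack[cpu] == self.request[cpu]:
                self.misses[cpu] = 0
            else:
                self.misses[cpu] += 1
            if self.misses[cpu] >= self.max_misses:
                hung.append(cpu)
        return hung
```

Tracking misses per CPU lets the shadow system identify which specific core stopped responding, which a single end-to-end heartbeat would not reveal.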


Abstract

The invention provides a system exception capturing method, a main system, a shadow system and an intelligent device. On the main-system side, the method includes the following steps. The main system starts, on a second hardware resource of the hardware environment, a shadow system that performs exception detection on the main system; the second hardware resource is different from the first hardware resource on which the main system itself runs in that hardware environment. The main system dynamically stores its own operating-state information in a shared memory, from which the shadow system obtains it when monitoring for main-system exceptions. The main system also stores its own physical memory addresses in the shared memory, so that, when monitoring for main-system exceptions, the shadow system can access the main system's physical memory through those addresses and obtain the information in the physical memory used by the main system. With this scheme, exception information can be captured even when the operating system (that is, the main system) crashes.
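The abstract has the main system publish its operating state and physical memory addresses in shared memory for the shadow system to read. One way to picture such a record is a fixed binary layout; the field set, field widths, byte order, and `MAGIC` value below are all hypothetical, chosen only to illustrate the publish/read split between the two systems.

```python
import struct

# Hypothetical fixed layout of the shared-memory record (little-endian, no padding):
# magic (u32), run state (u32), CPU bitmask (u64), memory start (u64), memory size (u64)
RECORD = struct.Struct("<IIQQQ")
MAGIC = 0x53484457  # arbitrary marker ("SHDW") so the shadow system can spot a valid record


def publish_state(buf: bytearray, state: int, cpus: list[int],
                  mem_start: int, mem_size: int) -> None:
    """Main-system side: write the current state into the shared buffer."""
    mask = 0
    for cpu in cpus:  # CPU ids must be below 64 to fit the u64 bitmask
        mask |= 1 << cpu
    RECORD.pack_into(buf, 0, MAGIC, state, mask, mem_start, mem_size)


def read_state(buf: bytes) -> dict:
    """Shadow-system side: decode the record, rejecting an uninitialized buffer."""
    magic, state, mask, mem_start, mem_size = RECORD.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("shared memory holds no valid main-system record")
    cpus = [i for i in range(64) if (mask >> i) & 1]
    return {"state": state, "cpus": cpus, "mem_start": mem_start, "mem_size": mem_size}
```

The magic check lets the shadow system distinguish a record the main system has actually written from uninitialized memory; the 64-bit CPU bitmask limits this sketch to 64 CPUs.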

Description

Technical field
[0001] The invention relates to the technical field of computer operating systems, and in particular to a system abnormality capturing method, a main system, a shadow system and an intelligent device.

Background
[0002] With the rapid development of computer software and hardware technology, the hardware environments and business programs of operating systems are becoming increasingly complex. In practical applications, systems often crash. Possible manifestations include: the keyboard and mouse do not respond, the system cannot be pinged, the display cannot be turned on or cannot show exception information, and the system log does not record useful fault information. At this point the environment may be completely unresponsive and inoperable. Analyzing and locating such problems has long been a major challenge in the industry.

[0003] There are some methods for locating crashes in the existing operating syste...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F11/34
CPC: G06F11/34
Inventor: 蒋彪
Owner: ZTE CORP