A method for edge-side AI dialogue interaction adaptation and inference optimization for children's smart hardware
By constructing an age semantic resource state graph, the method resolves the inconsistency between age adaptation and edge-side inference resource allocation in children's smart hardware. It keeps children's dialogue output age-appropriate and consistent under resource constraints and achieves stable dialogue interaction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- LINTONG VISION (SHENZHEN) TECHNOLOGY CO LTD
- Filing Date
- 2026-04-01
- Publication Date
- 2026-06-30
AI Technical Summary
In the edge-side dialogue link of children's smart hardware, existing technologies fail to bind the child's age adaptation results to the allocation of edge-side inference resources, model call paths and degradation rules. As a result, the age-appropriateness and consistency of the child's dialogue output drift under extreme resource constraints.
By constructing an age semantic resource state graph, the child's age adaptation results, semantic carrying requirements, and edge terminal resource states are mapped together. Candidate inference trajectories are then screened and verified to select a target trajectory. Children's expressive features and device resource states are thus computed in a unified manner, and a stable inference path is formed.
The technique applies to edge AI dialogue interaction adaptation and inference optimization for children's smart hardware. There, it integrates children's expressive characteristics and device resource status under resource constraints.
Figure CN122309671A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of edge AI dialogue interaction adaptation and inference optimization, and more specifically to an edge AI dialogue interaction adaptation and inference optimization method for children's smart hardware.
Background Technology
[0002] In the field of edge AI dialogue interaction for children's smart hardware, mainstream industry practice focuses on basic local speech recognition, semantic understanding and dialogue response on resource-constrained terminals. The usual approach has three parts:
- First, a model is selected based on the user's age or a preset level.
- Each dialogue turn then passes through a serial pipeline of speech preprocessing, text normalization, model loading, inference generation and result output.
- Quantization compression, block loading and power consumption control are added to ease the pressure of edge deployment.

Consider, for example, personal companion robots or early-education story machines for children aged 3 to 12. These devices often must respond continuously to multi-turn dialogue requests in noisy environments such as indoors, playgrounds and vehicles. They do so under hard constraints: ARM Cortex-A class processors, no more than 512 MB of memory, no more than 1 GB of storage, small battery capacity, and unstable or even offline networks. Under these constraints, the mainstream approach consistently exposes an observable and verifiable defect. The front end may already have fixed the age-appropriate interaction target for a round of child requests through age recognition, child voice adaptation and text normalization. Even so, the back end may temporarily switch inference paths, shrink the context, or trigger uniform degradation. This happens because of changes in the memory pool level, weight cache mismatches, jitter in model block loading, or the power budget reaching its limit. As a result, the final output drifts away from the front end's age adaptation result in response length, tone, knowledge depth and interaction stability.
The root cause is that existing solutions still treat child age adaptation separately from edge-side inference resource scheduling. The age adaptation result therefore does not determine which inference path can actually execute on the edge side. The technical problem this application aims to solve is how to bind age adaptation results consistently to edge-side inference resource allocation, model call paths, and degradation rules in the edge-side dialogue link of children's smart hardware. The goal is to maintain the age-appropriateness, consistency, and real-time performance of children's dialogue output under extreme resource constraints.
Summary of the Invention
[0003] To overcome the above deficiencies of the prior art, embodiments of the present invention provide an edge-side AI dialogue interaction adaptation and inference optimization method for children's smart hardware. The method maps the child's age adaptation results, semantic carrying requirements, and edge terminal resource status together into an age semantic resource state graph. Candidate inference trajectories in this graph are screened and perturbation-tested to select a target inference trajectory. Local inference and out-of-bounds rollback are then performed along that trajectory, which solves the problems described in the background art.
[0004] To achieve the above objectives, the present invention provides the following technical solution: a method for adapting and optimizing edge AI dialogue interaction for children's smart hardware, comprising:

S1. On the edge computing terminal, obtain the current round of child speech, the corresponding text, user age information, the historical interaction sequence, remaining memory, processor usage, remaining battery power and network connectivity status. Perform child semantic regularization on the current round of child speech and corresponding text. Extract sentence length, word difficulty, intent category, emotion category and context depth to form a joint state value.

S2. Read the age nodes, semantic nodes, context nodes, model nodes, quantization nodes, cache nodes, and power consumption nodes. Determine candidate transition edges under the following conditions:
- age consistency;
- term difficulty matching;
- sufficient context capacity;
- non-negative remaining memory after deduction;
- enough available processor load for execution;
- enough remaining battery power for operation;
- when the network is disconnected, only connections to local models.
Output the age semantic resource state graph.

S3. Search the age semantic resource state graph for candidate links that reach the response node. Tally the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage amount and power consumption amount of each link, and output the candidate inference track set.

S4. Inject five perturbations into each candidate inference track: child expression, environmental noise, context growth, memory decrease and power consumption fluctuation. Retain the tracks whose age offset, semantic missing amount and memory out-of-bounds amount are all zero. Sort them in ascending order by latency growth amount, then power consumption offset amount, then context truncation amount. Take the first-ranked track as the target inference track.
[0005] In a preferred embodiment, the method further includes: S5. On the edge computing terminal, load the models and perform local dialogue inference according to the target inference trajectory to generate the target response result. If no boundary is violated, output the result directly. If a boundary is violated, replace the model node, quantization node, context node and cache node with adjacent low-occupancy nodes, re-run inference, and output the final interaction result.
[0006] In a preferred embodiment, S1 includes:
- S1-1. Perform syllable segmentation and pause localization on the current round of child speech. Extract the duration, segment interval, stress position and intonation fluctuation of each speech segment. Segment the corresponding text at the same pause positions, and output the speech-text aligned segment set.
- S1-2. For each segment in the speech-text aligned segment set, calculate the word appearance order, repetition count, reference jump count, emotional word proportion and question-answer pointing position. Then generate, in segment order, segment feature sequences for sentence length, word difficulty, intent category, emotion category and context depth.
- S1-3. Write the user age information, historical interaction sequence, remaining memory, processor usage, remaining battery power, and network connectivity status into the segment feature sequence. Merge cumulatively in segment order, and output the joint state value.
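The pause localization and per-segment measurements of S1-1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation:
- It assumes a front end already provides per-frame energy, per-frame fundamental frequency (F0) and word timestamps.
- The names `align_segments`, `find_pauses`, `low_energy` and `min_pause_s` are hypothetical.
- The stress position is approximated by the frame of peak energy rather than a true syllable index.
- The punctuation-based fallback for masked pauses is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start_s: float
    end_s: float
    stress_offset_s: float         # energy-peak time relative to segment start (stand-in for stress position)
    pitch_fluctuation_hz: float    # max F0 minus min F0 over voiced frames (intonation fluctuation)
    words: list = field(default_factory=list)
    gap_before_s: float = 0.0      # segment interval: this start minus the previous kept segment's end

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def find_pauses(energy, frame_s, low_energy, min_pause_s):
    """Return (start, end) frame ranges of low-energy runs lasting at least min_pause_s."""
    pauses, run_start = [], None
    for i, e in enumerate(list(energy) + [float("inf")]):  # sentinel closes a trailing run
        if e < low_energy:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * frame_s >= min_pause_s:
                pauses.append((run_start, i))
            run_start = None
    return pauses


def align_segments(energy, f0, words, frame_s=0.01, low_energy=0.5, min_pause_s=0.2):
    """Cut speech at pauses, measure each segment, and attach timed words to it."""
    spans, prev = [], 0
    for p_start, p_end in find_pauses(energy, frame_s, low_energy, min_pause_s):
        if p_start > prev:
            spans.append((prev, p_start))
        prev = p_end
    if prev < len(energy):
        spans.append((prev, len(energy)))
    if not spans:
        return []

    segments = []
    for a, b in spans:
        peak = max(range(a, b), key=lambda i: energy[i])
        voiced = [f for f in f0[a:b] if f > 0]
        segments.append(Segment(
            start_s=a * frame_s,
            end_s=b * frame_s,
            stress_offset_s=(peak - a) * frame_s,
            pitch_fluctuation_hz=(max(voiced) - min(voiced)) if voiced else 0.0,
        ))

    # Each (word, time) goes to the last segment starting at or before it, so a
    # word timestamped inside a pause joins the preceding segment.
    for text, t in words:
        target = segments[0]
        for seg in segments:
            if seg.start_s <= t:
                target = seg
        target.words.append(text)

    # Empty text segments are dropped; list position acts as the renumbered index.
    kept = [s for s in segments if s.words]
    for before, seg in zip(kept, kept[1:]):
        seg.gap_before_s = seg.start_s - before.end_s
    return kept
```

The trailing sentinel lets a pause that runs to the end of the recording be closed without a special case. The threshold values are placeholders for the terminal-side child interaction configuration table.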
[0007] In a preferred embodiment, S2 includes:
- S2-1. Expand combinations of age nodes, semantic nodes, context nodes, model nodes, quantization nodes, cache nodes, and power consumption nodes to generate node combination groups. For each group, calculate the age difference, term difficulty difference, context margin, memory margin, processor margin, and battery life margin. The initial feasible edge set consists of the groups that meet all of these conditions: the age difference is zero; the term difficulty difference is not greater than zero; the context, memory, processor and battery life margins are all not less than zero; and, when the network is disconnected, the model node is a local model.
- S2-2. For each edge in the initial feasible edge set, calculate the semantic retention cost, loading latency cost, cache occupancy cost, and power consumption cost, and sum them to form the total edge cost. Sort the edges by total edge cost in ascending order, then by context space in descending order, then by memory space in descending order. Perform a bounded search from the age node to the model node and output the candidate edge chain set.
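A minimal sketch of the S2-1 feasibility test and the S2-2 ordering follows. It uses the margin definitions given later in the detailed description of S2-1. The assumptions are:
- Each expanded node combination is flattened into one hypothetical `Combo` record.
- The four per-edge cost values are supplied by the caller.
- "Context space" and "memory space" in S2-2 are read as the context and memory margins.
- The bounded search from age node to model node is not shown.

```python
from dataclasses import dataclass


@dataclass
class JointState:
    age_segment: int        # ordinal of the mapped age segment
    term_difficulty: int
    context_depth: int
    free_memory_mb: float
    available_cpu: float    # total processor load minus current processor usage
    battery_minutes: float  # remaining charge converted to run time (conversion table assumed)
    offline: bool


@dataclass
class Combo:
    """One expanded age-semantic-context-model-quantization-cache-power combination (hypothetical fields)."""
    age_node: int
    difficulty_cap: int
    context_capacity: int
    model_mb: float
    model_cpu: float
    model_is_local: bool
    cache_mb: float
    power_minutes: float
    costs: tuple = (0.0, 0.0, 0.0, 0.0)  # semantic retention, loading latency, cache occupancy, power


def margins(s: JointState, c: Combo) -> dict:
    """S2-1 differences and margins, following the definitions in the detailed description."""
    return {
        "age_diff": abs(s.age_segment - c.age_node),
        "difficulty_diff": s.term_difficulty - c.difficulty_cap,
        "context": c.context_capacity - s.context_depth,
        "memory": s.free_memory_mb - c.model_mb - c.cache_mb,
        "cpu": s.available_cpu - c.model_cpu,
        "battery": s.battery_minutes - c.power_minutes,
    }


def feasible(s: JointState, c: Combo) -> bool:
    m = margins(s, c)
    within = (m["age_diff"] == 0 and m["difficulty_diff"] <= 0 and m["context"] >= 0
              and m["memory"] >= 0 and m["cpu"] >= 0 and m["battery"] >= 0)
    return within and (c.model_is_local or not s.offline)  # offline: local models only


def rank_edges(s: JointState, combos: list) -> list:
    """S2-2 order: total cost ascending, then context and memory margins descending."""
    kept = [c for c in combos if feasible(s, c)]
    return sorted(kept, key=lambda c: (sum(c.costs),
                                       -margins(s, c)["context"],
                                       -margins(s, c)["memory"]))
```

Negating the margins inside one sort key gives the mixed ascending/descending order in a single stable sort.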
[0008] In a preferred embodiment, S2 further includes:
- S2-3. Inject term growth, context growth, memory decrease and processor fluctuation into each candidate edge chain in the candidate edge chain set. Recalculate how the semantic retention cost, loading latency cost, cache occupancy cost and power consumption cost change before and after injection. The candidate edge chains whose cost changes are not greater than the corresponding margins form the stable edge chain set. Write each edge of the stable edge chains into its corresponding node pair to form pending connection relationships.
- S2-4. Where several edges in the pending connection relationships point to the same model node or the same cache node, compare their total edge cost, semantic retention cost, and loading latency cost in turn. Keep the edge with the highest total edge cost. If the total edge costs are equal, keep the edge with the highest semantic retention cost. If the semantic retention costs are also equal, keep the edge with the highest loading latency cost. Delete the remaining edges and output the age semantic resource state graph.
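The S2-3 stability check and the S2-4 conflict rule can be sketched as below. This is a sketch under stated assumptions, not the patented implementation:
- The `Edge` record, node ids and cost values are hypothetical.
- "Cost change not greater than the corresponding margin" is read as the increase in each cost compared against a caller-supplied margin paired with that cost. The source does not state the pairing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str            # id of the model or cache node the edge points to
    semantic_cost: float
    latency_cost: float
    cache_cost: float
    power_cost: float

    @property
    def total_cost(self) -> float:
        return self.semantic_cost + self.latency_cost + self.cache_cost + self.power_cost


def is_stable(costs_before, costs_after, allowed) -> bool:
    """S2-3 (assumed reading): each cost may grow by at most its paired margin."""
    return all(after - before <= m for before, after, m in zip(costs_before, costs_after, allowed))


def resolve_conflicts(edges):
    """S2-4: keep one edge per target node, comparing total, then semantic, then latency cost."""
    by_target = defaultdict(list)
    for e in edges:
        by_target[e.target].append(e)
    return {t: max(group, key=lambda e: (e.total_cost, e.semantic_cost, e.latency_cost))
            for t, group in by_target.items()}
```

Comparing a tuple key with `max` applies the three tie-breaking levels of S2-4 in one pass.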
[0009] In a preferred embodiment, S3 includes:
- S3-1. Read the starting node, the response node and the connecting edges of each node in the age semantic resource state graph. Expand layer by layer along the connecting edges to generate multiple candidate links from the starting node to the response node. For each candidate link, record the sequence of age, semantic, context, model, quantization, cache and power consumption node values by edge position. Output the candidate link sequence set.
- S3-2. For each candidate link in the candidate link sequence set, calculate:
  - the age offset between the age node and the age segment mapping result in the joint state value;
  - the semantic missing amount between the semantic node and the term difficulty and intent category in the joint state value;
  - the context truncation amount between the context node and the context depth in the joint state value;
  - the model loading amount of the model node;
  - the cache usage amount of the cache node;
  - the power consumption amount of the power consumption node.
  Accumulate these by edge position and output a link evaluation group for each candidate link.
- S3-3. For each link evaluation group, compare the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage and power consumption. Retain the candidate links that have zero age offset, zero semantic missing amount, and cache usage and power consumption within the remaining memory and battery power limits of the joint state value. Write the retained links into the candidate inference track set.
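A minimal sketch of the S3-2 accumulation and the S3-3 screen follows, assuming each candidate link has already been turned into a list of per-node contributions. `StepEval`, `evaluate_link` and `screen_links` are hypothetical names. How each contribution is derived from the joint state value is left to the caller.

```python
from dataclasses import dataclass, fields


@dataclass
class StepEval:
    """Quantities contributed by one node on a candidate link (hypothetical per-node values)."""
    age_offset: float = 0.0
    semantic_missing: float = 0.0
    context_truncation: float = 0.0
    model_load: float = 0.0
    cache_use: float = 0.0
    power: float = 0.0


def evaluate_link(steps):
    """S3-2: accumulate each quantity in edge order into one link evaluation group."""
    totals = {f.name: 0.0 for f in fields(StepEval)}
    for step in steps:
        for name in totals:
            totals[name] += getattr(step, name)
    return totals


def screen_links(links, free_memory, battery):
    """S3-3: keep links with no age offset or semantic loss that fit memory and battery."""
    kept = {}
    for name, steps in links.items():
        ev = evaluate_link(steps)
        if (ev["age_offset"] == 0 and ev["semantic_missing"] == 0
                and ev["cache_use"] <= free_memory and ev["power"] <= battery):
            kept[name] = ev
    return kept
```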
[0010] In a preferred embodiment, S4 includes: S4-1. For each candidate inference track in the candidate inference track set, inject child expression perturbation, environmental noise perturbation, context growth perturbation, memory decrease perturbation and power consumption fluctuation perturbation respectively. Recalculate the age offset, semantic missing amount, context truncation amount, latency growth amount, memory out-of-bounds amount and power consumption offset for each perturbation, and write them into the perturbation evaluation sequence according to the perturbation type. S4-2. For each candidate inference track's perturbation evaluation sequence, calculate the age offset consistency value, semantic missing consistency value, memory out-of-bounds consistency value, cumulative latency growth value, cumulative power consumption offset value, and cumulative context truncation value. Identify the candidate inference tracks with zero age offset consistency value, zero semantic missing consistency value, and zero memory out-of-bounds consistency value as constraint-preserving tracks, and delete the remaining candidate inference tracks from the candidate inference track set.
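The S4-1 perturbation pass and the S4-2 constraint-preserving filter can be sketched as follows. This is an illustration, not the patented procedure:
- The source does not define a "consistency value". Here it is assumed to be the largest magnitude of the quantity across the five perturbations, so it is zero only if every perturbation leaves that quantity at zero.
- `evaluate` is a hypothetical callback standing in for re-running the track under one perturbation.

```python
PERTURBATIONS = ("child_expression", "ambient_noise", "context_growth",
                 "memory_drop", "power_fluctuation")


def perturbation_sequence(track, evaluate):
    """S4-1: one evaluation per perturbation type, kept in a fixed order.

    `evaluate(track, kind)` is a hypothetical callback returning a dict with keys
    age_offset, semantic_missing, context_truncation, latency_growth,
    memory_overrun and power_offset.
    """
    return [(kind, evaluate(track, kind)) for kind in PERTURBATIONS]


def summarize(sequence):
    """S4-2: consistency values (assumed: largest magnitude) plus cumulative totals."""
    evals = [ev for _, ev in sequence]
    return {
        "age_consistency": max(abs(e["age_offset"]) for e in evals),
        "semantic_consistency": max(abs(e["semantic_missing"]) for e in evals),
        "memory_consistency": max(abs(e["memory_overrun"]) for e in evals),
        "latency_total": sum(e["latency_growth"] for e in evals),
        "power_total": sum(e["power_offset"] for e in evals),
        "truncation_total": sum(e["context_truncation"] for e in evals),
    }


def constraint_preserving(tracks, evaluate):
    """Keep only tracks whose three consistency values are zero under every perturbation."""
    kept = {}
    for name, track in tracks.items():
        summary = summarize(perturbation_sequence(track, evaluate))
        if (summary["age_consistency"] == 0 and summary["semantic_consistency"] == 0
                and summary["memory_consistency"] == 0):
            kept[name] = summary
    return kept
```

The cumulative totals are kept alongside each retained track so that the later ranking step does not need to re-run the perturbations.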
[0011] In a preferred embodiment, S4 further includes:
- S4-3. For each constraint-preserving track, calculate the latency growth difference, power consumption offset difference, and context truncation difference between consecutive perturbations, in perturbation order. A track whose three differences stay unchanged across two consecutive calculations is a convergent track. A track that has not yet reached this unchanged state is injected with the five types of perturbation again, and the calculation is repeated until it does.
- S4-4. For the convergent tracks, compare the cumulative latency growth, cumulative power consumption offset, and cumulative context truncation values. Sort the tracks by cumulative latency growth in ascending order. Break ties by cumulative power consumption offset in ascending order, and then by cumulative context truncation in ascending order. Take the first-ranked convergent track as the target inference track.
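The S4-3 convergence loop and the S4-4 lexicographic ranking can be sketched as below. The assumptions are:
- `run_round` is a hypothetical callback that re-injects the five perturbations and returns one (latency growth, power offset, context truncation) tuple per perturbation.
- The cumulative values are taken from the converged round.
- `max_rounds` is an added safeguard so the loop always terminates; the source simply repeats until the differences stop changing.

```python
def step_differences(round_evals):
    """Differences between consecutive perturbations as (latency, power, truncation) tuples."""
    return [tuple(b[i] - a[i] for i in range(3)) for a, b in zip(round_evals, round_evals[1:])]


def converge(track, run_round, max_rounds=20):
    """S4-3: re-inject the five perturbations until the difference pattern repeats.

    run_round(track, r) is a hypothetical callback returning five
    (latency_growth, power_offset, context_truncation) tuples for round r.
    Returns the converged round's evaluations, or None if no convergence.
    """
    previous = None
    for r in range(max_rounds):
        evals = run_round(track, r)
        current = step_differences(evals)
        if current == previous:
            return evals
        previous = current
    return None


def select_target(tracks, run_round):
    """S4-4: sort convergent tracks by cumulative latency, then power, then truncation."""
    ranked = []
    for name, track in tracks.items():
        evals = converge(track, run_round)
        if evals is None:
            continue  # not convergent within the safeguard; excluded
        totals = tuple(sum(e[i] for e in evals) for i in range(3))
        ranked.append((totals, name))
    ranked.sort()  # lexicographic: latency first, ties broken by power, then truncation
    return ranked[0][1] if ranked else None
```

Sorting `(totals, name)` tuples applies the three-level tie-breaking of S4-4 directly. Tracks that tie on all three totals fall back to name order, which is an arbitrary but deterministic choice.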
[0012] In a preferred embodiment, S5 includes: S5-1: Read the model nodes, quantization nodes, context nodes, and cache nodes in the target inference track. On the edge computing terminal, perform weight loading according to the model block order corresponding to the model node, perform parameter expansion according to the quantization format corresponding to the quantization node, write the historical interaction sequence according to the context capacity corresponding to the context node, allocate the inference cache area according to the cache quota corresponding to the cache node, and output the target inference environment.
[0013] In a preferred embodiment, S5 further includes:
- S5-2. In the target inference environment, perform local dialogue inference on the current round of child speech and corresponding text to generate the target response result. At the same time, output the sentence length deviation, word difficulty deviation, latency out-of-bounds, memory out-of-bounds and power consumption out-of-bounds amounts as the inference result group.
- S5-3. Perform an out-of-bounds judgment on the inference result group. If the sentence length deviation, word difficulty deviation, latency out-of-bounds, memory out-of-bounds, and power consumption out-of-bounds amounts are all zero, output the target response result as the final interaction result. If any item is non-zero, replace the model node, quantization node, context node, and cache node in the target inference track with the corresponding adjacent low-occupancy nodes. Then re-execute model loading and local dialogue inference, and output the final interaction result.
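The S5-2/S5-3 out-of-bounds check and single fallback can be sketched as follows. The assumptions are:
- Each node kind has a hypothetical "ladder" of node ids ordered by occupancy, and the "adjacent low-occupancy node" is the next one down.
- `infer` is a hypothetical callback that stands in for loading and running the local model.
- Following the text, the fallback is re-run once and its result is output.

```python
NODE_KINDS = ("model", "quantization", "context", "cache")


def step_down(track, ladders):
    """Swap each node for its adjacent lower-occupancy neighbour, if one exists.

    ladders[kind] lists hypothetical node ids ordered from lowest to highest occupancy.
    """
    lowered = dict(track)
    for kind in NODE_KINDS:
        ladder = ladders[kind]
        i = ladder.index(track[kind])
        lowered[kind] = ladder[max(i - 1, 0)]
    return lowered


def respond(track, ladders, infer):
    """S5-2/S5-3: infer once; if any out-of-bounds item is non-zero, step down and infer again.

    infer(track) is a hypothetical callback returning (response, overruns), where
    overruns maps each checked item (sentence length, word difficulty, latency,
    memory, power) to its deviation.
    """
    response, overruns = infer(track)
    if all(v == 0 for v in overruns.values()):
        return response, track
    fallback = step_down(track, ladders)
    response, _ = infer(fallback)  # the source re-runs once and outputs this result
    return response, fallback
```

A node that is already at the bottom of its ladder stays where it is, so the fallback never indexes past the lowest-occupancy option.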
[0014] The technical effects and advantages of this invention are as follows.
- The solution writes the age segment mapping results, term difficulty, context depth, model nodes, cache nodes, and power consumption nodes into the age semantic resource state graph. It then constrains the subsequent loading and degradation paths with the target inference track. The age adaptation result therefore determines the edge-side execution path in advance, which helps suppress drift in response length, tone and knowledge depth.
- Sentence length, intent category, emotion category and context depth are extracted segment by segment from the speech-text aligned segment set to form a joint state value. Node combination and link evaluation are based on this value. Children's expressive features and device resource status thus enter the same computation, which improves consistency between edge-side input parsing and resource scheduling.
- The semantic retention cost, loading latency cost, cache occupancy cost, and power consumption cost are calculated for candidate edge chains. The cost changes are recalculated after perturbation injection, and only stable edge chains whose changes stay within the margins are retained. This screens out paths that cannot complete within the resource constraints before inference begins, reducing the likelihood of memory mismatch and loading jitter during inference.
- The candidate inference tracks are injected with child expression, environmental noise, context growth, memory decrease and power consumption fluctuation perturbations. The target inference track is selected through the consistency values and the convergence determination. Dynamic stability verification is thus completed before formal inference, improving response stability in offline, noisy and multi-turn interaction scenarios.
Weights are loaded in model block order, parameters are expanded according to the quantization node, the historical interaction sequence is written according to the context node, and the inference cache area is allocated according to the cache node. Together these steps form a target inference environment consistent with the target inference trajectory. This reduces switching of execution objects around inference, which helps shorten edge-side response preparation time and improve real-time responsiveness.
Attached Figure Description
[0015] Figure 1 is a flowchart of the method steps of the present invention.
Detailed Implementation
[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0017] Referring to Figure 1 of the accompanying drawings, the present invention provides a method for adapting and optimizing edge AI dialogue interaction for children's smart hardware, comprising: S1. On the edge computing terminal, obtain the current round of child speech, the corresponding text, user age information, the historical interaction sequence, remaining memory, processor usage, remaining battery power and network connectivity status. Perform child semantic regularization on the current round of child speech and corresponding text, and extract sentence length, word difficulty, intent category, emotion category and context depth to form a joint state value.

In this embodiment, S1 organizes the current round of child speech, the corresponding text, and the device-side operating status on the edge computing terminal into a joint state value that later state graph construction can use directly. It works in three stages. First, it obtains stable segment boundaries by aligning speech and text. Second, it extracts the child's expressive features within those boundaries. Third, it merges age information, the historical interaction sequence, and device resource information into the segment feature results in a fixed writing order. This gives the later construction of age nodes, semantic nodes, and resource nodes a unified input standard. The implementation includes the following steps.

The purpose of S1-1 is to form a set of speech-text aligned segments that can be processed segment by segment. Segments are first delimited by measurable pause boundaries in the speech, and the corresponding text is then cut at the same boundaries. Speech features and text features therefore share the same segment positions. The input is the current round of child speech and the corresponding text.
The processing actions are as follows:
- Pauses are located in the current round of child speech from continuous low-energy intervals. An interval whose low-energy duration reaches a preset duration threshold is taken as a pause position. The threshold is read from the terminal-side child interaction configuration table.
- Syllable segmentation is performed between adjacent pause positions to form speech segments.
- For each speech segment, four measurements are taken. The duration is the difference between its last and first time points. The segment interval is the difference between the start of the next speech segment and the end of the previous one. The stress position is the position of the syllable containing the energy peak within the segment. The intonation fluctuation is the difference between the maximum and minimum fundamental frequency in the segment.
- The corresponding text is then aligned and segmented using the timestamps of the pause positions. Text falling within the same pause interval is written into the corresponding speech segment, forming one-to-one speech-text segment records.

The output is the speech-text aligned segment set, which is written into the segment buffer for S1-2 to read. If noise in the child's speech masks a pause position, the positions of commas, periods, question marks, and interjections in the corresponding text are used to fill in the missing pause boundary. If text segmentation leaves empty segments, those segments are deleted and the numbering of later segments is shifted forward. The purpose of S1-2 is to transform the speech-text aligned segment set into a segment feature sequence that reflects the child's expression patterns and semantic needs.
For each segment, S1-2 counts both the word structure on the text side and the emotional cues on the speech side, and fixes them in segment order as pre-features for the later joint state value. The input is the speech-text aligned segment set. The processing actions are as follows:
- The effective words of each segment are extracted in order of appearance, and that order is recorded.
- Adjacent words that appear twice with the same literal form are recorded as repeated words, and the repetitions are counted.
- Pronouns in the current segment are matched back to nouns or forms of address in the previous segment. The number of times the reference target changes gives the reference jump count.
- The emotional words in the current segment are counted against the emotional word list, and that count is divided by the number of effective words to give the emotional word proportion. The emotional word list is taken from the terminal's local children's vocabulary database.
- The question-answer pointing position is determined from the positions of question words, imperative words, and request words in the segment.

Segment feature sequences for sentence length, word difficulty, intent category, emotion category, and context depth are then generated in segment order.
The five features are defined as follows:
- Sentence length is the number of effective words.
- Word difficulty is the age vocabulary level that accounts for the largest share of words in the segment. When two levels tie, the higher level is taken.
- The intent category is determined by the question-answer pointing position and the position of the action word.
- The emotion category is determined by the emotional word proportion and the intonation fluctuation.
- Context depth is the number of historical rounds the current segment must reach back to complete its meaning.

The output is the segment feature sequence, which is written in segment order into the feature buffer for S1-3 to read. If a segment contains no explicit emotional words, the emotion category is determined directly from the intonation fluctuation of the corresponding speech segment. If a segment contains only modal particles or filler pause words, it is merged into the previous segment and recalculated.

The purpose of S1-3 is to unify the child's expressive characteristics with the user-side, history-side, and device-side states in a single result object. This object is the sole input for later state graph construction and track search. S1-3 accumulates and merges segment by segment in a fixed field order, so that the child's adaptation information and the edge computing resource information form a unified description. The inputs are the segment feature sequence, user age information, historical interaction sequence, remaining memory, processor usage, remaining battery power, and network connectivity status. The processing actions begin by mapping the user age information to the corresponding age segment identifier.
The age segment boundaries are taken from the age configuration table in the terminal registration information. If user age information is missing, a temporary age segment identifier is back-derived from the word difficulty and sentence length of the most recent preset number of rounds in the historical interaction sequence. The remaining processing actions are as follows:
- The most recent rounds of interaction are read from the historical interaction sequence. The number of historical rounds, the number of historical entities, and the most recent main intent category are counted to form the historical interaction field.
- The remaining memory, processor usage, remaining battery power, and network connectivity status are read to form the device resource field.
- The age segment identifier, sentence length, word difficulty, intent category, emotion category, context depth, historical interaction field, and device resource field are written into the segment feature sequence in a fixed order and merged cumulatively in segment order.
- The intent category and emotion category of each previous segment are written into the following segment as its preorder reference field.
- The context depth is compared cumulatively between consecutive segments, and the larger value is kept, forming the joint state value.

The output is the joint state value, which is written into the joint state buffer for S2 to read. If the historical interaction sequence is empty, the historical round count and historical entity count are recorded as zero, the recent main intent category is recorded as null, and merging continues. If the network connectivity status is offline, a network disconnection flag is written at the same time, for direct use in the later local-model connection filtering.
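The S1-3 merge can be sketched as follows. This is a minimal illustration, not the patented implementation. The assumptions are:
- The age table boundaries in `AGE_BOUNDS` are hypothetical.
- The history-derived fallback segment is passed in by the caller rather than computed.
- Segment features and history turns are plain dicts with illustrative keys.
- The preorder reference is written into the following segment, as described above.

```python
from bisect import bisect_left

AGE_BOUNDS = [5, 8, 12]  # hypothetical age table: <=5 -> 0, 6-8 -> 1, 9 and above -> 2


def age_segment(age, fallback):
    """Map an age to a segment ordinal; use a history-derived fallback when age is missing."""
    if age is None:
        return fallback
    return min(bisect_left(AGE_BOUNDS, age), len(AGE_BOUNDS) - 1)


def joint_state(segments, age, history, device, fallback_segment=0):
    """S1-3: merge segment features, history and device status in a fixed field order."""
    entities = {e for turn in history for e in turn.get("entities", [])}
    history_field = {
        "rounds": len(history),                                   # zero when history is empty
        "entities": len(entities),
        "recent_intent": history[-1].get("intent") if history else None,
    }
    merged, prev, depth = [], None, 0
    for seg in segments:
        depth = max(depth, seg["context_depth"])  # keep the larger depth as segments accumulate
        merged.append({
            "age_segment": age_segment(age, fallback_segment),
            "sentence_length": seg["sentence_length"],
            "word_difficulty": seg["word_difficulty"],
            "intent": seg["intent"],
            "emotion": seg["emotion"],
            "context_depth": depth,
            "prev_intent": prev["intent"] if prev else None,    # preorder reference field
            "prev_emotion": prev["emotion"] if prev else None,
        })
        prev = seg
    return {
        "segments": merged,
        "history": history_field,
        "device": dict(device, offline=not device["network_up"]),  # disconnection flag for S2
    }
```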
Through the above process, the current round of child speech and corresponding text are organized into a joint state value with unified field definitions, segment order, and resource writing rules. When the age semantic resource state graph is later constructed, the age segment identifier, sentence length, word difficulty, intent category, emotion category, context depth, and device resource fields can be read directly. This avoids unclear field sources, inconsistent values, and values that cannot be compared directly. The process also resolves edge cases in advance, such as missing age, missing pauses, empty text segments, and empty interaction history. Later state graph connection, track search, and inference execution therefore have a stable input.

In practical applications: suppose an offline child companion robot receives the child's voice message "I want to continue listening to the story of the little rabbit. Why did it cry later?"
- The terminal first splits the speech into segments at its pauses, then splits the corresponding text at the same pause positions.
- It analyzes the order in which words such as "little rabbit", "later", and "why it cried" appear, the question-answer focus, and the proportion of emotional words.
- From this it identifies the main intent of the round as a story inquiry, the emotion category as concern, and the context depth as requiring the previous story round to be read again.
- It then combines the age segment identifier mapped from the user's age information, the recent story interaction records, the current remaining memory, processor usage, remaining battery power, and the network disconnection status into a joint state value.

This joint state value can be used directly to compute the connections among the later age nodes, semantic nodes, context nodes, and model nodes.
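The S1-2 statistics applied in this example can be sketched for a single segment. This is a minimal illustration, not the patented implementation. The assumptions are:
- The vocabulary levels, emotion word list, filler list, pitch threshold and emotion labels ("emotional", "lively", "calm") are all hypothetical stand-ins for the terminal's local children's vocabulary database.
- Unknown words default to level 1.
- Reference-jump counting and filler-only segment merging are not shown.

```python
from collections import Counter

# Hypothetical stand-ins for the terminal's local children's vocabulary database.
VOCAB_LEVEL = {"i": 1, "want": 1, "story": 1, "why": 1, "rabbit": 2, "cry": 2, "later": 2, "continue": 3}
EMOTION_WORDS = {"cry", "happy", "sad", "scared"}
FILLERS = {"um", "uh", "ah"}


def segment_features(words, pitch_fluctuation_hz, lively_above_hz=60.0):
    """S1-2 features for one segment (category labels and thresholds are illustrative)."""
    effective = [w.lower() for w in words if w.lower() not in FILLERS]
    repeats = sum(1 for a, b in zip(effective, effective[1:]) if a == b)
    emotion_ratio = sum(w in EMOTION_WORDS for w in effective) / len(effective) if effective else 0.0
    # Unknown words default to level 1 (an assumption).
    levels = Counter(VOCAB_LEVEL.get(w, 1) for w in effective)
    # Most frequent vocabulary level; on a tie, the higher level wins.
    difficulty = max(levels, key=lambda lvl: (levels[lvl], lvl)) if levels else 0
    if emotion_ratio > 0:
        emotion = "emotional"
    else:
        # No explicit emotion words: fall back to the segment's intonation fluctuation.
        emotion = "lively" if pitch_fluctuation_hz > lively_above_hz else "calm"
    return {"sentence_length": len(effective), "repeats": repeats,
            "emotion_ratio": emotion_ratio, "word_difficulty": difficulty, "emotion": emotion}
```

The `(count, level)` key in `max` encodes "largest share, higher level on a tie" in one comparison.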
[0018] S2. Read the age nodes, semantic nodes, context nodes, model nodes, quantization nodes, cache nodes, and power consumption nodes. Determine candidate transition edges under the following conditions: age consistency, term difficulty matching, sufficient context capacity, non-negative remaining memory after deduction, enough available processor load for execution, enough remaining battery power for operation, and, when the network is disconnected, only connections to local models. Output the age semantic resource state graph.

In this implementation, S2 maps the joint state value to an age semantic resource state graph that later track searches can use directly. It places three things in one connection structure: the child's age adaptation requirements, the semantic carrying requirements, and the model loading capacity, cache capacity, and battery life of the edge computing terminal. The later candidate link search then runs on node connections that have already been filtered, verified, and conflict-resolved, rather than on scattered individual checks. The process has four stages:
- First, node combinations are expanded along the dependencies between nodes, and the initial feasible edges that satisfy the age, semantic, context, memory, processor, and battery life requirements are filtered out.
- Then several types of cost are computed for the initial feasible edges, and a bounded search forms candidate edge chains.
- Next, runtime perturbations are injected into the candidate edge chains and the cost changes are verified.
- Finally, conflicts among the pending connection relationships are resolved so that a single edge is kept for each conflict, forming the age semantic resource state graph.

The implementation includes the following steps. The purpose of S2-1 is to form an initial feasible edge set that meets the basic connectivity conditions.
Its mechanism is to first expand the seven types of nodes according to a fixed hierarchy, then compare the reachability of each node combination item by item against the child adaptation fields and device resource fields in the joint state value, and delete node combinations that do not meet the conditions. The inputs are the joint state value, the age node set, semantic node set, context node set, model node set, quantization node set, cache node set, and power consumption node set. The processing actions are as follows. First, the age group mapping result, term difficulty, context depth, remaining memory, processor usage, remaining battery power, and network connectivity status are read from the joint state value. The available processor load is obtained by subtracting the processor usage from the total processor load. The remaining battery power is converted into a corresponding runtime duration using the terminal's local power consumption conversion table, which is derived from the terminal's factory calibration results. Then, node combinations are expanded layer by layer in the order age node to semantic node, semantic node to context node, context node to model node, model node to quantization node, quantization node to cache node, and cache node to power consumption node, generating node combination groups. Subsequently, for each node combination, the age difference, term difficulty difference, context margin, memory margin, processor margin, and battery life margin are calculated. The age difference is the absolute value of the difference between the ordinal position of the age segment in the joint state value and the ordinal position of the age node. The term difficulty difference is the term difficulty level in the joint state value minus the term difficulty upper limit of the semantic node. The context margin is the context capacity of the context node minus the context depth in the joint state value.
The memory margin is the remaining memory minus the occupancy of the model node and the occupancy of the cache node. The processor margin is the available processor load minus the execution load of the model node. The battery life margin is the runtime duration corresponding to the remaining battery power minus the duration of the power consumption node. Node combination groups with an age difference of zero, a term difficulty difference no greater than zero, and context, memory, processor, and battery life margins no less than zero are then retained as accessible combinations. When the network connectivity status is disconnected, the storage location field of each model node is further checked, and only node combination groups whose storage location field is marked as the terminal's local storage area are retained. The output is the initial feasible edge set, which is written to the edge filtering cache for S2-2 to read. When a layer has no successor nodes after expansion, expansion of that node combination group stops and the group is deleted. When the user age information is a temporary back-calculated result, age nodes are expanded only within the subset corresponding to the temporary age segment. The purpose of S2-2 is to generate a set of candidate edge chains from the initial feasible edge set for subsequent stability verification. Its mechanism is to first calculate four types of cost for each edge (semantic, latency, cache, and power consumption) and then, based on the cost ranking, perform a bounded link search to obtain executable connection chains from age nodes to model nodes. The inputs are the initial feasible edge set and the joint state value.
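A minimal sketch of the S2-1 filter follows. It assumes nodes are plain dictionaries with hypothetical keys (`ordinal`, `max_difficulty`, `capacity`, `memory`, `load`, `location`, `duration`), replaces the factory-calibrated conversion table with a hypothetical linear `minutes_per_pct` factor, and expands all seven layers with `itertools.product` instead of following per-node dependencies. It illustrates the six margin checks and the offline rule, not the claimed implementation.

```python
import itertools

# Fixed expansion hierarchy (age -> semantic -> ... -> power consumption).
LAYER_ORDER = ("age", "semantic", "context", "model", "quant", "cache", "power")


def initial_feasible_set(state, layers, total_load=1.0, minutes_per_pct=3.0):
    """Keep node combinations whose six margins satisfy the S2-1 conditions.

    minutes_per_pct is a hypothetical stand-in for the power consumption
    conversion table (assumed linear here).
    """
    available_load = total_load - state["cpu_usage"]
    battery_runtime = state["battery_pct"] * minutes_per_pct
    feasible = []
    # An empty layer yields no combinations, matching "no successor -> delete".
    for combo in itertools.product(*(layers[name] for name in LAYER_ORDER)):
        age, sem, ctx, model, _quant, cache, power = combo
        age_diff = abs(state["age_group"] - age["ordinal"])
        term_diff = state["term_difficulty"] - sem["max_difficulty"]
        ctx_margin = ctx["capacity"] - state["context_depth"]
        mem_margin = state["remaining_memory"] - model["memory"] - cache["memory"]
        cpu_margin = available_load - model["load"]
        battery_margin = battery_runtime - power["duration"]
        if age_diff != 0 or term_diff > 0:
            continue
        if min(ctx_margin, mem_margin, cpu_margin, battery_margin) < 0:
            continue
        # When offline, only models in the terminal's local storage area survive.
        if state["offline"] and model["location"] != "local":
            continue
        feasible.append(combo)
    return feasible
```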
The processing actions are as follows. For each edge in the initial feasible edge set, the semantic preservation cost, loading latency cost, cache crowding cost, and power consumption cost are calculated. The semantic preservation cost is calculated based on the semantic node and the connection...; it is determined by summing, relative to the joint state value, the number of uncovered intent categories, uncovered emotion categories, and terms exceeding the term difficulty upper limit. The loading latency cost is the sum of the number of model blocks of the model node, the number of parameter expansions of the quantization node, and the length of the model block reading order. The cache crowding cost is the positive difference between the occupancy value of the cache node and the current free cache value. The power consumption cost is the operating power consumption value of the power consumption node multiplied by its duration. The four costs are summed to form the total edge cost. The edges are then sorted in ascending order of total edge cost; when total edge costs are equal, edges with a larger context margin come first, and when context margins are also equal, edges with a larger memory margin come first. A bounded search is then performed from the age node layer toward the model nodes. The search boundary is jointly defined by the number of context node layers, the number of model nodes, and the number of cache nodes, and reaching a model node is the stopping condition for a single edge chain. In addition, if the cumulative total edge cost of the current edge chain exceeds the cumulative value of the most recently retained edge chain, expansion of the current edge chain stops.
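The cost ranking and bounded search in S2-2 can be sketched as a depth-first search that visits each node's outgoing edges in ranked order and prunes any partial chain whose cumulative cost already exceeds that of the most recently retained chain. In this sketch the `Edge` fields hold precomputed per-edge costs, and `max_chains` is a hypothetical stand-in for the search boundary, which the description defines only as jointly set by the numbers of context node layers, model nodes, and cache nodes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    semantic: int      # uncovered intents + uncovered emotions + over-limit terms
    latency: int       # model blocks + parameter expansions + block read-order length
    cache: int         # positive part of (cache occupancy - current free cache)
    power: float       # operating power x duration
    ctx_margin: int
    mem_margin: int

    @property
    def total(self) -> float:
        return self.semantic + self.latency + self.cache + self.power


def sort_key(e: Edge):
    # Ascending total cost; ties broken by larger context margin, then larger memory margin.
    return (e.total, -e.ctx_margin, -e.mem_margin)


def bounded_chain_search(edges: List[Edge], start: str, is_model: Callable[[str], bool],
                         max_chains: int) -> List[Tuple[Tuple[Edge, ...], float]]:
    out: Dict[str, List[Edge]] = {}
    for e in edges:
        out.setdefault(e.src, []).append(e)
    for successors in out.values():
        successors.sort(key=sort_key)
    kept: List[Tuple[Tuple[Edge, ...], float]] = []

    def dfs(node: str, chain: List[Edge], cost: float) -> None:
        if len(kept) >= max_chains:
            return  # search boundary reached
        if kept and cost > kept[-1][1]:
            return  # prune: costlier than the last retained chain
        if is_model(node):
            kept.append((tuple(chain), cost))  # reaching a model node ends the chain
            return
        for e in out.get(node, []):
            chain.append(e)
            dfs(e.dst, chain, cost + e.total)
            chain.pop()
        # Chains that never reach a model node are not written.

    dfs(start, [], 0.0)
    return kept
```

Because pruning compares against the most recently retained chain, the retained costs are non-increasing, which is one literal reading of the stopping rule above.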
The output is the candidate edge chain set, which is written to the edge chain cache area for S2-3 to read. When multiple age nodes in the initial feasible edge set meet the conditions at the same time, the search proceeds in the order of the age node positions that match the age segment mapping result in the joint state value. Edge chains that do not reach a model node during the search are not written into the candidate edge chain set. Taking the offline storytelling scenario of a children's companion robot as an example, when the joint state value corresponds to the intermediate age group, a story-inquiry intent, a high context depth, and decreasing remaining memory, model nodes with many blocks accumulate a high loading latency cost and cache nodes with high occupancy accumulate a high cache crowding cost. The search therefore preferentially retains edge chains that can carry the story continuation and can still be loaded within the current device memory. The purpose of S2-3 is to verify that candidate edge chains can maintain executable connections as input grows and resources fluctuate. Its mechanism is to incorporate changes in the child's expression and in terminal resources into a single perturbation process, recalculate each cost change, and retain only edge chains whose cost changes remain covered by their margins. The inputs are the candidate edge chain set, the joint state value, and the original cost value of each edge in the initial feasible edge set. The processing actions are as follows. Term growth, context growth, memory decrease, and processor fluctuation are injected in sequence into each candidate edge chain in the candidate edge chain set. Term growth is determined by the number of newly added valid terms in the current round of text; context growth is determined by the number of newly added reference rounds and newly added entities in the historical interaction sequence; memory decrease is determined by the reduction in headroom between the current remaining memory and the reserved safe memory, where the reserved safe memory is taken from the terminal-side memory protection configuration table; and processor fluctuation is the difference between the maximum and minimum processor usage within a continuous sampling window, whose length is taken from the terminal operation monitoring configuration table. After each injection, the changes in the semantic preservation cost, loading latency cost, cache crowding cost, and power consumption cost of each edge in each candidate edge chain are recalculated and compared item by item with the context margin, memory margin, processor margin, and battery life margin of that edge chain. Candidate edge chains whose semantic preservation cost change is no higher than the context margin, whose loading latency cost change is no higher than the processor margin, whose cache crowding cost change is no higher than the memory margin, and whose power consumption cost change is no higher than the battery life margin form the stable edge chain set. Each edge in the stable edge chains is then written into the node connection temporary table as a pair of predecessor and successor nodes, forming pending connection relationships. The output is the pending connection relationships, which are written into the connection temporary area for S2-4 to read. When any cost change of a candidate edge chain exceeds the corresponding margin after any perturbation injection, no further perturbations are injected into that chain and the chain is deleted.
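The S2-3 check can be sketched as follows, assuming a caller-supplied (hypothetical) `recompute_delta(edge, perturbation)` that returns the four cost changes of an edge after a perturbation, and a hypothetical `margins_of(chain)` that returns the chain's four margins. The pairing of each cost change with its margin follows the description; how each cost change is actually recomputed is left to the caller.

```python
from typing import Callable, Dict, List, Sequence, Tuple

EdgeId = Tuple[str, str]  # (predecessor node, successor node)

# Each cost change is checked against the margin paired with it in S2-3.
COST_TO_MARGIN = {"semantic": "context", "latency": "processor",
                  "cache": "memory", "power": "battery"}
PERTURBATIONS = ("term_growth", "context_growth", "memory_decrease", "processor_fluctuation")


def stable_chains(chains: Sequence[List[EdgeId]],
                  recompute_delta: Callable[[EdgeId, str], Dict[str, float]],
                  margins_of: Callable[[List[EdgeId]], Dict[str, float]]) -> List[List[EdgeId]]:
    stable = []
    for chain in chains:
        margins = margins_of(chain)
        survived = True
        for perturbation in PERTURBATIONS:  # injected in a fixed order
            for edge in chain:
                delta = recompute_delta(edge, perturbation)
                if any(delta[cost] > margins[m] for cost, m in COST_TO_MARGIN.items()):
                    survived = False
                    break
            if not survived:
                break  # stop injecting further perturbations and drop the chain
        if survived:
            stable.append(list(chain))
    return stable


def pending_connections(stable: Sequence[List[EdgeId]]) -> List[EdgeId]:
    # Write every edge of every stable chain as a (predecessor, successor) pair.
    return [edge for chain in stable for edge in chain]
```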
When an edge in a candidate edge chain has no corresponding successor edge after recalculation, the edge is deleted and writing of that candidate edge chain is terminated. The purpose of S2-4 is to eliminate duplicate connections pointing to the same resource target from the pending connection relationships, forming a unique, directly searchable age semantic resource state graph. Its mechanism is to compare, layer by layer, the total edge cost, semantic preservation cost, and loading latency cost of concurrent connections to the same model node or the same cache node, retaining only the top-ranked connection edge and deleting the remaining duplicates. The inputs are the pending connection relationships and the total edge cost, semantic preservation cost, and loading latency cost of each edge. The processing actions are as follows. Among multiple edges pointing to the same model node in the pending connection relationships, the total edge costs are compared first and the top-ranked edge, that is, the one with the lowest total edge cost, is retained; when the total edge costs rank equally, the semantic preservation costs are compared and the top-ranked edge is retained; when those also rank equally, the loading latency costs are compared and the top-ranked edge is retained; and if the loading latency costs still rank equally, the edge written first to the connection temporary storage area is retained. The remaining edges are deleted. The same comparison steps are applied to multiple edges pointing to the same cache node.
All remaining node connections are then written into the state graph adjacency list in the hierarchical order of age node, semantic node, context node, model node, quantization node, cache node, and power consumption node, forming the age semantic resource state graph. The output is the age semantic resource state graph, which is written into the state graph area for S3 to read. When a node loses all of its successor edges after duplicate edges are deleted, the incoming edges of that node are also deleted and the adjacency list is updated accordingly, repeating until every retained node in the state graph either has a successor connection or is determined to be a termination node. Through the above implementation process, the joint state value is transformed into an age semantic resource state graph with explicit node fields, connection conditions, cost comparison rules, and conflict resolution rules. Subsequent candidate link searches can run directly on a connection structure that has already passed resource constraint screening and perturbation verification, which reduces redundant judgments and avoids handling basic connection conflicts again in the track search stage. This process also provides a single, well-defined execution criterion for network outage scenarios, local model identification, edge chain boundaries, cost changes under perturbation, and duplicate connection deletion, making state graph generation executable and verifiable.
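The duplicate-edge resolution and dead-end pruning of S2-4 can be sketched as below. The sketch assumes that "top-ranked" means lowest cost (consistent with the ascending ranking in S2-2), treats the list order of `edges` as the write order into the connection temporary storage area, and uses hypothetical predicates `is_resource_target` (model or cache node) and `is_terminal`.

```python
from typing import Callable, Dict, List

Edge = Dict[str, object]  # hypothetical keys: src, dst, total, semantic, latency


def dedupe(edges: List[Edge], is_resource_target: Callable[[str], bool]) -> List[Edge]:
    best: Dict[str, tuple] = {}
    for order, e in enumerate(edges):
        if is_resource_target(e["dst"]):
            # Lower total, then semantic, then latency cost; the earliest write wins ties.
            key = (e["total"], e["semantic"], e["latency"], order)
            if e["dst"] not in best or key < best[e["dst"]]:
                best[e["dst"]] = key
    return [e for order, e in enumerate(edges)
            if not is_resource_target(e["dst"]) or best[e["dst"]][3] == order]


def prune_dead_ends(edges: List[Edge], is_terminal: Callable[[str], bool]) -> List[Edge]:
    # Repeatedly drop incoming edges of non-terminal nodes that have no successor.
    while True:
        has_successor = {e["src"] for e in edges}
        dead = {e["dst"] for e in edges
                if e["dst"] not in has_successor and not is_terminal(e["dst"])}
        if not dead:
            return edges
        edges = [e for e in edges if e["dst"] not in dead]


def adjacency(edges: List[Edge]) -> Dict[str, List[str]]:
    # State graph adjacency list: node -> successor nodes, in write order.
    adj: Dict[str, List[str]] = {}
    for e in edges:
        adj.setdefault(e["src"], []).append(e["dst"])
    return adj
```

The pruning loop repeats because removing one dead end can leave its predecessor without successors, which matches the "until every retained node has a successor or is a termination node" rule.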
In practical applications: when the edge computing terminal is deployed in a children's learning tablet and the current round's joint state value corresponds to a young age group, a question-and-answer companion intent, medium context depth, low remaining memory, and a network-disconnected state, the system first expands node combinations layer by layer from the age node to the power consumption node, deletes all combinations involving cloud model nodes, and retains the initial feasible edges that satisfy age consistency, the word difficulty upper limit, and a memory margin not less than zero. It then calculates the semantic preservation cost, loading latency cost, cache crowding cost, and power consumption cost of each edge and searches for several candidate edge chains leading to the local lightweight model node. Next, it injects word growth, context growth, memory decrease, and processor fluctuation, deletes edge chains whose cost changes exceed their margins, retains only the stable edge chains, and writes the pending connection relationships. Finally, it ranks and compares duplicate connections pointing to the same lightweight model node or the same cache node, retains a unique connection edge for each, and forms the age semantic resource state graph for direct use in subsequent inference track searches.
[0019] S3. Search the age semantic resource state graph for candidate links that reach the response node, compute the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage amount, and power consumption amount, and output the candidate inference track set. In this embodiment, S3 forms, on the age semantic resource state graph, a set of candidate inference tracks that subsequent perturbation verification and track selection can call directly. Its overall purpose is to transform the state graph, whose connections have already been selected, into link objects that start from the current joint state value, reach a response node, and carry explicit evaluation results, so that the subsequent determination of the target inference track rests on link evaluations that can be compared item by item. The principle is as follows: first, the starting node and response node are determined in the age semantic resource state graph, and the graph is expanded layer by layer according to a fixed hierarchy to form multiple candidate links; then, the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage, and power consumption are calculated edge by edge for each candidate link and accumulated in edge order to form a link evaluation group; finally, the candidate links that satisfy the age, semantic, and resource constraints in the joint state value are written into the candidate inference track set. This implementation process includes the following steps. The purpose of S3-1 is to form a set of candidate link sequences that can reach the response node from the current joint state value.
Its mechanism is to first map the joint state value to a starting node in the state graph and then expand layer by layer along each node's connection edges with the response node as the endpoint, turning the connection relationships in the state graph into link objects that can be evaluated edge by edge. The inputs are the age semantic resource state graph and the joint state value. The processing actions are as follows. First, all nodes, all node connection edges, and the hierarchical position of each edge in the age semantic resource state graph are read. Then, using the age segment mapping result, term difficulty, context depth, and network connectivity status in the joint state value, the age node in the age node layer that is consistent with the age segment mapping result and still has a successor connection is located as the starting node. Next, the terminating node in the state graph that contains values for the model node, quantization node, cache node, and power consumption node and carries a response generation capability marker is identified as the response node; the response generation capability marker comes from the local inference configuration table after the model node and quantization node are paired. Starting from the starting node, expansion proceeds layer by layer in the order age node to semantic node, semantic node to context node, context node to model node, model node to quantization node, quantization node to cache node, and cache node to power consumption node. For each node connection edge, the start node, end node, and edge order of the current edge are recorded. When a link reaches a response node, its expansion ends and the link is saved.
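Locating the starting node and expanding toward response nodes in S3-1 amounts to enumerating all paths from the start to any response node in a layered directed acyclic graph. The sketch assumes the state graph is given as an adjacency dictionary, that `age_nodes` maps each age node to its ordinal, and that a hypothetical `is_response` predicate stands in for the response generation capability marker. Links that dead-end before a response node are simply not saved.

```python
from typing import Callable, Dict, List, Tuple


def enumerate_links(adjacency: Dict[str, List[str]], age_nodes: Dict[str, int],
                    age_group: int, is_response: Callable[[str], bool]) -> List[Tuple[str, ...]]:
    # Starting nodes: age nodes matching the mapped age group that still have successors.
    starts = [n for n, ordinal in age_nodes.items()
              if ordinal == age_group and adjacency.get(n)]
    links: List[Tuple[str, ...]] = []

    def walk(node: str, path: List[str]) -> None:
        if is_response(node):
            links.append(tuple(path))  # reaching a response node ends this link
            return
        for nxt in adjacency.get(node, []):
            walk(nxt, path + [nxt])
        # A node without successors is a dead end: nothing is saved.

    for start in starts:
        walk(start, [start])
    return links
```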
Then, for each saved link, the value sequences of the age node, semantic node, context node, model node, quantization node, cache node, and power consumption node are recorded by edge order position, forming the candidate link sequence set. The output is the candidate link sequence set, which is written to the link buffer for S3-2 to read. When a starting node has no successor connection edge, the starting node is deleted. When an expanded link loses its successor connection edge in an intermediate layer, the link is terminated and not written to the candidate link sequence set. When there are multiple response nodes, all of them are retained and each is expanded as an endpoint. The purpose of S3-2 is to transform the candidate link sequence set into link evaluation groups that can be compared uniformly. Its mechanism is to calculate the child adaptation deviation and edge resource usage edge by edge for each candidate link and accumulate these values in edge order, forming a quantitative evaluation result for each link. The inputs are the candidate link sequence set and the joint state value. The processing actions are as follows. For each candidate link in the candidate link sequence set, the values of the age node, semantic node, context node, model node, cache node, and power consumption node are read in edge order, and the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage, and power consumption are calculated edge by edge, where: the age offset is the absolute value of the difference between the age node ordinal and the ordinal of the age segment mapping result in the joint state value; the semantic missing amount is the sum of the number of intent categories not covered by the semantic node, the number of emotion categories not covered, and the number of terms above the semantic node's term difficulty limit; the context truncation amount is the positive difference between the context depth in the joint state value and the context capacity of the context node; the model loading amount is the sum of the number of model blocks of the model node; the cache usage is the sum of the cache usage values of the cache node; and the power consumption is the sum of the products of the power consumption value and the duration of the power consumption node. Then, the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage, and power consumption of each edge of the same candidate link are summed cumulatively by edge order position, forming the link evaluation group for that candidate link. The output is the link evaluation group of each candidate link, which is written to the evaluation cache area for S3-3 to read. When a node value is missing in a candidate link, the corresponding node field in the age semantic resource state graph is read back to fill in the missing value and the calculation is repeated; if the value still cannot be filled in, the candidate link is deleted. When the historical interaction field in the joint state value is empty so that the context depth is zero, the context truncation amount of each edge is recorded directly as zero and accumulation continues.
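The per-edge metrics and their accumulation in S3-2 can be sketched as follows. Each link is represented, by assumption, as the list of nodes it visits in edge order, each tagged with a hypothetical `layer` key; the joint state value carries hypothetical `intents`, `emotions`, `term_levels` (difficulty of each term in the current text), `context_depth`, and `age_group` keys. Quantization nodes contribute nothing here because S3-2 reads only the other six node types.

```python
from typing import Dict, List

METRICS = ("age_offset", "semantic_missing", "context_truncation",
           "model_loading", "cache_usage", "power")


def evaluate_link(nodes: List[dict], state: dict) -> Dict[str, float]:
    group = dict.fromkeys(METRICS, 0)
    for node in nodes:  # nodes in edge order
        layer = node["layer"]
        if layer == "age":
            group["age_offset"] += abs(node["ordinal"] - state["age_group"])
        elif layer == "semantic":
            group["semantic_missing"] += (
                len(state["intents"] - node["intents"])
                + len(state["emotions"] - node["emotions"])
                + sum(1 for level in state["term_levels"] if level > node["max_difficulty"]))
        elif layer == "context":
            # Zero when the history is empty, since context_depth is then zero.
            group["context_truncation"] += max(0, state["context_depth"] - node["capacity"])
        elif layer == "model":
            group["model_loading"] += node["blocks"]
        elif layer == "cache":
            group["cache_usage"] += node["usage"]
        elif layer == "power":
            group["power"] += node["watts"] * node["duration"]
    return group
```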
The purpose of S3-3 is to select, from all link evaluation groups, the candidate links that keep child adaptation consistent and satisfy the edge resource constraints. Its mechanism uses age offset and semantic missing amount as adaptation screening criteria, and cache usage and power consumption as resource screening criteria, comparing each link evaluation group step by step to form the candidate inference track set. The inputs are the link evaluation groups and the remaining memory and remaining battery power in the joint state value. The processing actions compare the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage, and power consumption of each link evaluation group. First, link evaluation groups with zero age offset and zero semantic missing amount are retained. Then, the cache usage of each retained link is compared with the remaining memory in the joint state value, and links whose cache usage does not exceed the remaining memory are retained. Next, the power consumption of the remaining links is compared with the allowed consumption value corresponding to the remaining battery power; the allowed consumption value is calculated from the remaining battery power and the estimated session duration of the current interaction round, which is taken from the terminal-side child dialogue session configuration table. Links whose power consumption does not exceed the allowed consumption value are retained. The retained links are then written into the candidate inference track set in ascending order of age offset, semantic missing amount, context truncation amount, model loading amount, cache usage, and power consumption; when an earlier sort key ties, the next key is compared, and when all keys tie, links are written in the order they were generated in the candidate link sequence set.
The output is the candidate inference track set, which is written into the track cache for S4 to read. When no link meets the conditions, the links with zero age offset and zero semantic missing amount are read back from the link evaluation groups, the link ranked first by cache usage and power consumption is selected and written into the candidate inference track set, and a resource-limited mark is written at the same time for direct use in subsequent perturbation verification. Through the above implementation process, the age semantic resource state graph is transformed into a set of candidate inference tracks with explicit start points, end points, edge order positions, and evaluation results. Subsequent perturbation injection and target inference track screening no longer need to return to the original state graph for repeated expansion and can instead verify and sort the candidate links track by track. This process also specifies the starting node determination, response node identification, link evaluation group formation, resource constraint comparison, and the fallback write path when no satisfying link is found, eliminating ambiguity in link search and link retention. In practical applications: when the edge computing terminal is deployed in a children's story machine and the current joint state value corresponds to an older age group, encyclopedia-style follow-up questions, a context depth of three rounds, a medium level of remaining memory, and a low level of remaining battery power, the system first selects the age node consistent with the older age group as the starting node in the age semantic resource state graph and uses the terminating node with local question-and-answer generation capability as the response node. Multiple candidate links are formed by expanding layer by layer along the connection edges of each node.
Then, for each candidate link, the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage, and power consumption are calculated edge by edge, and accumulated according to the edge order to form a link evaluation group. Finally, links with non-zero age offset, non-zero semantic missing amount, cache usage higher than remaining memory, or power consumption higher than the allowed consumption value are deleted. Only the links that meet the conditions are written into the candidate inference track set in a predetermined order for direct use in subsequent perturbation verification.
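A sketch of the S3-3 screening, lexicographic ordering, and fallback follows. The allowed consumption value is passed in directly because the description does not give its formula, and the fallback assumes that "ranked first by cache usage and power consumption" means the lowest cache usage, then the lowest power consumption. Python's `sorted` is stable and `min` returns the first of several minimal items, so full ties keep their generation order.

```python
from typing import Dict, List, Tuple

METRICS = ("age_offset", "semantic_missing", "context_truncation",
           "model_loading", "cache_usage", "power")


def screen_links(groups: List[Dict[str, float]], remaining_memory: float,
                 allowed_power: float) -> Tuple[List[Dict[str, float]], bool]:
    """Return (candidate inference tracks, resource_limited_mark); groups are in generation order."""
    fit = [g for g in groups if g["age_offset"] == 0 and g["semantic_missing"] == 0]
    within = [g for g in fit
              if g["cache_usage"] <= remaining_memory and g["power"] <= allowed_power]
    if within:
        # Lexicographic ascending order over all six metrics.
        return sorted(within, key=lambda g: tuple(g[m] for m in METRICS)), False
    if not fit:
        return [], False
    # Fallback: assumed to mean lowest cache usage, then lowest power consumption.
    return [min(fit, key=lambda g: (g["cache_usage"], g["power"]))], True
```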
[0020] S4. Inject child expression perturbation, environmental noise perturbation, context growth perturbation, memory decrease perturbation, and power consumption fluctuation perturbation into each candidate inference track; retain the tracks whose age offset, semantic missing amount, and memory out-of-bounds amount are all zero; sort them in ascending order of latency increase, power consumption offset, and context truncation amount; and take the first track as the target inference track. In this embodiment, S4 identifies, within the candidate inference track set, the target inference track that maintains consistent child adaptation and stays within resource limits as the child's expression, environmental noise, context load, and edge computing terminal resources change. The overall purpose is to move the candidate inference tracks from statically feasible to dynamically stable. The principle is as follows: first, five types of perturbation are injected into each candidate inference track in a fixed order, and the evaluation values are recalculated after each perturbation, forming a perturbation evaluation sequence per perturbation type. Then, consistency values and cumulative values are calculated from each perturbation evaluation sequence, and tracks that show age offset, semantic missing, or memory out-of-bounds under any perturbation are deleted. Next, perturbation iterations are repeated on the retained tracks and the differences between adjacent rounds are compared to identify converged tracks that have reached an invariant state. Finally, all converged tracks are sorted by latency increase, then power consumption offset, then context truncation, and a unique target inference track is output.
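The selection rule stated for S4 reduces to a filter followed by a lexicographic sort. The sketch below assumes each track is summarized by a dictionary with hypothetical keys for the three hard constraints and the three sort keys (for example, the cumulative values recorded in S4-2); returning `None` when nothing survives is an assumption, since fallbacks are handled in the individual sub-steps.

```python
from typing import Dict, Optional


def select_target_track(tracks: Dict[str, Dict[str, float]]) -> Optional[str]:
    # Hard constraints: zero age offset, semantic missing amount, and memory out-of-bounds.
    kept = [(track_id, t) for track_id, t in tracks.items()
            if t["age_offset"] == 0 and t["semantic_missing"] == 0 and t["memory_oob"] == 0]
    if not kept:
        return None
    # Ascending latency increase, then power consumption offset, then context truncation.
    kept.sort(key=lambda item: (item[1]["latency_increase"],
                                item[1]["power_offset"],
                                item[1]["context_truncation"]))
    return kept[0][0]
```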
This implementation process includes the following steps. The purpose of S4-1 is to form a perturbation evaluation sequence that reflects how each candidate inference track responds to multiple perturbations. Its mechanism is to write changes in the child's expression, the environment, context growth, and device resources into the same candidate inference track in sequence, reusing the link evaluation group formed in S3 as the recalculation baseline after each write, thereby capturing the complete change process of the same track under continuous perturbation. The inputs are the candidate inference track set, the joint state value, and the link evaluation group of each candidate inference track. The processing actions are as follows. Child expression perturbation, environmental noise perturbation, context growth perturbation, memory decrease perturbation, and power consumption fluctuation perturbation are injected one at a time, in this fixed order, into each candidate inference track in the candidate inference track set. The child expression perturbation is determined by the number of newly added repeated words, the number of omitted words in the child's speech, and the number of pronoun substitutions in the current round of text. The environmental noise perturbation is determined by the increase in low-energy segments, the increase in pause intervals, and the offset of reread positions in the current round of the child's speech. The context growth perturbation is determined by the number of newly added reference rounds and the number of newly added entities in the historical interaction sequence. The memory decrease perturbation is determined by the decrease of the current remaining memory relative to the reserved safe memory, which is taken from the terminal-side memory protection configuration table.
The power consumption fluctuation perturbation is determined by the change in power consumption per unit time within a continuous sampling window, whose length is taken from the terminal-side runtime sampling configuration table. After each injection, the link evaluation group of the candidate inference track is used as the starting point for recalculation, and the age offset, semantic missing amount, context truncation amount, latency increase, memory out-of-bounds amount, and power consumption offset are recalculated item by item. The latency increase is the perturbed loading latency minus the original loading latency of the link evaluation group. The memory out-of-bounds amount is the positive difference between the perturbed cache usage and the remaining memory. The power consumption offset is the perturbed power consumption minus the original power consumption. The perturbation type, the six recalculated results, and the track identifier are then written into the perturbation evaluation sequence. The output is the perturbation evaluation sequence of each candidate inference track, which is written to the perturbation buffer for S4-2 to read. When a candidate inference track has missing node fields after any perturbation injection, the link evaluation group of that track is read back to fill in the fields and the recalculation is repeated; if the fields still cannot be filled in, the candidate inference track is deleted. Taking the offline chat scenario of a children's watch as an example, when a child repeatedly asks "Why hasn't it gone home yet?"
and bus station noise is superimposed on the environment, the child expression perturbation increases the number of repeated words and pronoun substitutions, and the environmental noise perturbation increases the low-energy segments and pause intervals; the system then repeatedly recalculates the age offset, semantic missing amount, and latency increase on the same candidate inference track. The purpose of S4-2 is to select, from all candidate inference tracks, the constraint-preserved tracks that maintain age consistency and semantic consistency and have no memory out-of-bounds under every perturbation. Its mechanism is to convert the per-perturbation recalculation results into consistency values and cumulative values, using the consistency values as hard deletion conditions. The input is the perturbation evaluation sequence of each candidate inference track. The processing actions are as follows. For the perturbation evaluation sequence of each candidate inference track, the number of non-zero age offsets is counted as the age offset consistency value, the number of non-zero semantic missing amounts as the semantic missing consistency value, and the number of non-zero memory out-of-bounds amounts as the memory out-of-bounds consistency value. Next, the latency increases are accumulated in perturbation order to form the cumulative latency increase value, the power consumption offsets are accumulated to form the cumulative power consumption offset value, and the context truncation amounts are accumulated to form the cumulative context truncation value.
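S4-1 and S4-2 can be sketched together: one function builds a track's perturbation evaluation sequence from a baseline and a caller-supplied recalculation, and another converts that sequence into consistency values and cumulative values. The `recalc` callback, the baseline keys `load_latency` and `power`, and all other key names are hypothetical; the sketch only shows how the latency increase, memory out-of-bounds amount, and power consumption offset are derived and tallied.

```python
from typing import Callable, Dict, List

PERTURBATION_ORDER = ("child_expression", "environmental_noise", "context_growth",
                      "memory_decrease", "power_fluctuation")


def perturbation_sequence(base: Dict[str, float], recalc: Callable[[str], Dict[str, float]],
                          remaining_memory: float) -> List[dict]:
    seq = []
    for p in PERTURBATION_ORDER:  # fixed injection order
        r = recalc(p)  # raw values after injecting perturbation p
        seq.append({
            "perturbation": p,
            "age_offset": r["age_offset"],
            "semantic_missing": r["semantic_missing"],
            "context_truncation": r["context_truncation"],
            "latency_increase": r["load_latency"] - base["load_latency"],
            "memory_oob": max(0, r["cache_usage"] - remaining_memory),
            "power_offset": r["power"] - base["power"],
        })
    return seq


def summarize(seq: List[dict]) -> Dict[str, float]:
    def nonzero(key: str) -> int:
        return sum(1 for s in seq if s[key] != 0)
    return {
        "age_consistency": nonzero("age_offset"),
        "semantic_consistency": nonzero("semantic_missing"),
        "memory_consistency": nonzero("memory_oob"),
        "latency_cum": sum(s["latency_increase"] for s in seq),
        "power_cum": sum(s["power_offset"] for s in seq),
        "context_cum": sum(s["context_truncation"] for s in seq),
    }


def is_constraint_preserved(summary: Dict[str, float]) -> bool:
    # Consistency values act as hard deletion conditions.
    return (summary["age_consistency"] == 0 and summary["semantic_consistency"] == 0
            and summary["memory_consistency"] == 0)
```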
The candidate inference tracks whose age offset consistency value, semantic missing consistency value, and memory out-of-bounds consistency value are all zero are then determined as constraint-preserving tracks, and their cumulative latency increase value, cumulative power consumption offset value, and cumulative context truncation value are written into the constraint-preserving track record; the remaining candidate inference tracks are deleted from the candidate inference track set. The output is the constraint-preserving track set, which is written to the reserved track buffer for S4-3 to read. If every candidate inference track has been deleted, the tracks whose age offset consistency value and semantic missing consistency value are both zero are read back, the one with the lowest memory out-of-bounds consistency value is written into the constraint-preserving track set, and a memory-limited flag is written for subsequent calculation. The purpose of S4-3 is to identify, among the constraint-preserving tracks, the convergent tracks that have stabilized after repeated perturbations. Its mechanism is to compare the differences between two adjacent perturbation iterations and to stop when those differences remain unchanged over two consecutive iterations, which distinguishes sustained stability from short-term accidental stability. The input is the constraint-preserving track set and the corresponding perturbation evaluation sequences. The processing actions are as follows: for each constraint-preserving track, the latency increase difference, power consumption offset difference, and context truncation difference between each pair of consecutive perturbations are calculated in perturbation order.
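A minimal sketch of the constraint-preserving filter and its fallback, operating on summary dictionaries keyed by track identifier (the key names are hypothetical). It reads the fallback as keeping the track with the fewest memory out-of-bounds occurrences and returning a memory-limited flag; returning an empty set when no track is even age- and semantics-consistent is an assumption, since the text does not cover that case.

```python
from typing import Dict, Tuple

Summary = Dict[str, float]


def select_constraint_preserving(
        summaries: Dict[str, Summary]) -> Tuple[Dict[str, Summary], bool]:
    """Return (constraint-preserving tracks, memory_limited_flag)."""
    # Hard deletion conditions: all three consistency values must be zero.
    preserved = {
        tid: s for tid, s in summaries.items()
        if s["age_offset_consistency"] == 0
        and s["semantic_missing_consistency"] == 0
        and s["memory_oob_consistency"] == 0
    }
    if preserved:
        return preserved, False
    # Fallback: relax only the memory condition.
    relaxed = {
        tid: s for tid, s in summaries.items()
        if s["age_offset_consistency"] == 0 and s["semantic_missing_consistency"] == 0
    }
    if not relaxed:
        return {}, False  # assumption: nothing can be kept
    # Keep the track with the fewest memory out-of-bounds occurrences (assumed reading).
    best = min(relaxed, key=lambda tid: relaxed[tid]["memory_oob_consistency"])
    return {best: relaxed[best]}, True
```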
The latency increase difference is the latency increase of the subsequent perturbation minus that of the previous perturbation; the power consumption offset difference is the power consumption offset of the subsequent perturbation minus that of the previous perturbation; and the context truncation difference is the context truncation amount of the subsequent perturbation minus that of the previous perturbation. The three differences obtained in the current round are compared with the corresponding differences from the previous round. If all three are unchanged, the constraint-preserving track is determined as a convergent track. If any of them has changed, the perturbations are re-injected into the constraint-preserving track in the order child expression perturbation, environmental noise perturbation, context growth perturbation, memory decrease perturbation, and power consumption fluctuation perturbation, and the differences are recalculated. The repetition stops when the differences are unchanged or when the maximum number of rounds is reached; the maximum number of rounds is determined by the number of node connection edges in the candidate inference track. The output is the convergent track set, which is written to the convergent track buffer for S4-4 to read. A constraint-preserving track that reaches the maximum number of rounds without stabilizing is deleted and does not take part in the subsequent ranking. The purpose of S4-4 is to determine a unique target inference track from all convergent tracks.
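The S4-3 stopping rule can be sketched as below, assuming a hypothetical callback `run_round(i)` that re-injects the five perturbations in the stated order and returns the (latency increase, power consumption offset, context truncation) triples for round `i`. A track converges when its list of adjacent differences is identical in two consecutive rounds; `max_rounds` stands in for the edge-count limit described above.

```python
from typing import Callable, List, Sequence, Tuple

# (latency increase, power consumption offset, context truncation) for one perturbation
Triple = Tuple[float, float, float]


def adjacent_differences(sequence: Sequence[Triple]) -> List[Triple]:
    """Subsequent perturbation minus previous perturbation, component by component."""
    return [tuple(nxt[i] - prev[i] for i in range(3))
            for prev, nxt in zip(sequence, sequence[1:])]


def converges(run_round: Callable[[int], Sequence[Triple]], max_rounds: int) -> bool:
    """True if two consecutive rounds yield identical differences within max_rounds."""
    previous = None
    for round_index in range(max_rounds):
        current = adjacent_differences(run_round(round_index))
        if previous is not None and current == previous:
            return True  # unchanged over two consecutive rounds: convergent track
        previous = current
    return False  # limit reached without stabilizing: the track is deleted
```

Requiring two matching rounds, rather than one small difference, is what separates sustained stability from a single lucky round.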
Its mechanism is to compare the cumulative latency increase value, cumulative power consumption offset value, and cumulative context truncation value level by level in a fixed order and output the track ranked first. The input is the convergent track set together with the cumulative latency increase value, cumulative power consumption offset value, and cumulative context truncation value of each convergent track. The processing actions are as follows: the convergent tracks are sorted in ascending order of cumulative latency increase value; ties are broken by ascending cumulative power consumption offset value, then by ascending cumulative context truncation value, then by ascending model load, and then by ascending cache usage; if the cache usage is also equal, the track that was written to the convergent track buffer first is ranked first. The convergent track ranked first is taken as the target inference track, and its model node, quantization node, context node, cache node, and power consumption node are written into the target track buffer. The output is the target inference track, which is written into the target track buffer for S5 to read; when the convergent track set contains only one track, that track is written into the target track buffer directly. Through the above implementation process, the candidate inference track set is narrowed to a target inference track that maintains age adaptation consistency, semantic load consistency, and memory closure under multiple types of input and resource perturbations.
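The S4-4 ranking amounts to a lexicographic minimum. In this sketch (field names are hypothetical), Python's built-in `min()` returns the first of several equal minima, so passing the tracks in buffer write order reproduces the final tie-break, and a single-track set is handled by the same call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConvergentTrack:
    track_id: str
    cumulative_latency_increase: float
    cumulative_power_offset: float
    cumulative_context_truncation: float
    model_load: float
    cache_usage: float


def rank_key(t: ConvergentTrack):
    # Fixed comparison order from S4-4; every level is ascending.
    return (t.cumulative_latency_increase, t.cumulative_power_offset,
            t.cumulative_context_truncation, t.model_load, t.cache_usage)


def select_target_track(tracks_in_write_order: List[ConvergentTrack]) -> Optional[ConvergentTrack]:
    """min() keeps the first minimal element, so full ties go to the earliest-written track."""
    if not tracks_in_write_order:
        return None
    return min(tracks_in_write_order, key=rank_key)
```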
Subsequently, when the edge computing terminal performs model loading and local dialogue inference, it can directly call the target track that has passed dynamic stability verification, rather than handling path oscillation under perturbation during the inference stage. This process also specifies the sources of the five types of perturbations, the injection order, how the consistency values and cumulative values are formed, the convergence criterion, the rule for stopping repeated calculation, and the final ranking and tie-breaking rules, thereby removing ambiguity in determining the target track. In practical applications, when an offline children's learning machine receives a follow-up question such as "Why did this dinosaur fly again later? I didn't understand just now," the system first injects child expression, environmental noise, context growth, memory decrease, and power consumption fluctuation perturbations into each candidate inference track in sequence, and recalculates the age offset, semantic missing amount, context truncation amount, latency increase, memory out-of-bounds amount, and power consumption offset. Tracks that show a non-zero age offset, semantic missing amount, or memory out-of-bounds amount under any perturbation are then deleted, leaving only the constraint-preserving tracks. These tracks undergo repeated perturbation rounds, and the differences between adjacent rounds are compared to identify the convergent tracks that have reached an unchanged state. Finally, the track ranked first by cumulative latency increase value, cumulative power consumption offset value, and cumulative context truncation value is determined as the target inference track and can be called directly for subsequent local model loading and response generation.
[0021] S5. On the edge computing terminal, model loading and local dialogue inference are performed according to the target inference track to generate the target response result, which is output directly when nothing is out of bounds. When something is out of bounds, the model node, quantization node, context node, and cache node are replaced with adjacent low-occupancy nodes, inference is performed again, and the final interaction result is output. In this embodiment, S5 builds the target inference environment on the edge computing terminal according to the target inference track, executes local dialogue inference, and handles out-of-bounds fallback. Its overall purpose is to place the target inference track determined in the previous steps into the actual edge-side execution chain and to ensure that the child adaptation results stay within the limits on sentence length, term difficulty, latency, memory, and power consumption during actual inference. The principle flow is as follows: first, the target inference environment is built from the model node, quantization node, context node, and cache node of the target inference track; then, local dialogue inference is performed on the current round of child speech and its corresponding text in that environment, and the inference result group is generated at the same time; finally, a decision is made based on the five types of deviation and out-of-bounds results in the inference result group. If none is out of bounds, the target response result is output directly; otherwise, the system moves along the age semantic resource state graph to adjacent low-occupancy nodes and re-executes model loading and local dialogue inference.
This implementation process includes the following steps. The purpose of S5-1 is to form, according to the target inference track, a target inference environment in which local dialogue inference can be executed directly. Its mechanism is to perform model loading, quantization unpacking, context writing, and cache allocation on the edge computing terminal in a fixed order, so that subsequent local dialogue inference has definite parameters and resource boundaries. The input is the model node, quantization node, context node, and cache node of the target inference track, together with the historical interaction sequence. The processing actions are as follows. First, the model number, model block numbers, number of blocks, and local storage address are read from the model node, and the weights are loaded in ascending order of model block number: each block is read from the local storage address and written into the weight cache, and the next block is loaded only after the previous one has finished. Next, the quantization bit width and parameter unpacking mode are read from the quantization node, the loaded compressed parameters are restored into inference-readable parameter blocks according to the quantization bit width, and these are written into the inference buffer in model block order. Then, the context capacity is read from the context node, and the number of interaction rounds corresponding to that capacity is taken backward from the end of the historical interaction sequence; the user input text, system response text, entity reference results, and intent category of each round are written into the context cache in chronological order. Finally, the cache quota is read from the cache node and allocated to the inference cache in the order weight cache, context cache, and intermediate result cache, where the minimum reserved capacity of each cache is taken from the terminal-side cache configuration table.
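Two parts of S5-1 lend themselves to a short sketch: taking the most recent rounds for the context cache, and reserving cache minimums in the fixed order. The sketch assumes the context capacity is expressed in interaction rounds and that the quota and reservations share one integer unit; because the text does not say how any surplus quota is distributed, it is returned unassigned. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

CACHE_ORDER = ("weight", "context", "intermediate")  # allocation order from S5-1


def select_context(history: Sequence[dict], capacity_rounds: int) -> List[dict]:
    """Take the last `capacity_rounds` rounds, kept in chronological order.

    If the history is shorter than the capacity, all of it is returned.
    """
    if capacity_rounds <= 0:
        return []  # guard: history[-0:] would otherwise return everything
    return list(history[-capacity_rounds:])


def allocate_cache(quota: int, minimum_reserved: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Reserve each cache's minimum in the fixed order; return (allocation, unassigned surplus)."""
    allocation: Dict[str, int] = {}
    remaining = quota
    for name in CACHE_ORDER:
        need = minimum_reserved[name]
        if need > remaining:
            raise ValueError(f"cache quota exhausted before reserving the {name} cache")
        allocation[name] = need
        remaining -= need
    return allocation, remaining
```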
The output is the target inference environment, which is written to the inference environment buffer for S5-2 to read. If reading a model block fails, the corresponding local storage address is read again; if it still fails, a model loading exception flag is written and construction of the current target inference environment stops. If the historical interaction sequence is shorter than the context capacity, the entire existing sequence is written and the unused capacity is recorded as zero. The purpose of S5-2 is to generate the target response result in the target inference environment and, at the same time, form an inference result group for the subsequent out-of-bounds judgment. Its mechanism is to count the child adaptation deviations and device resource out-of-bounds results during the same round of local dialogue inference, avoiding a separate verification pass after the target response result has been output. The input includes the target inference environment, the current round of child speech, its corresponding text, and the age group mapping result in the joint state value. The processing actions are as follows: first, local speech recognition is performed on the current round of child speech in the target inference environment to obtain the current-round recognized text, which is checked for consistency against the corresponding text. If they agree, the corresponding text is used directly; if not, whichever text has more overlapping terms is used as the current-round input text. The current-round input text and the historical interaction content written in the context cache are then sent to the local dialogue inference chain, and local inference is performed with the model node and quantization node of the target inference track to generate the target response result.
The output sentence length deviation, term difficulty deviation, latency out-of-bounds, memory out-of-bounds, and power consumption out-of-bounds are then determined. The sentence length deviation depends on whether the number of valid terms in the target response result falls outside the sentence length range allowed for the age node. The term difficulty deviation depends on whether the term difficulty in the target response result exceeds the upper difficulty limit allowed for the age node. The latency out-of-bounds depends on whether the local inference time exceeds the latency upper limit of the target inference track, which is the sum of the model node loading latency and the quantization unpacking latency in the target inference track. The memory out-of-bounds depends on whether the actual memory occupied by local inference exceeds the sum of the memory occupied by the model node and by the cache node. The power consumption out-of-bounds depends on whether the actual power consumed in this round of inference exceeds the allowable power consumption of the power consumption node, where the actual power consumption is the remaining battery power at the start of the round minus the remaining battery power at the end of the round. The target response result and these five results are then combined into the inference result group, which is output and written to the inference result buffer for S5-3 to read. If local speech recognition fails, the corresponding text is used directly as the current-round input text and local dialogue inference continues.
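The five S5-2 checks can be expressed as 0/1 flags, as in the following sketch. The record fields are hypothetical, the term list is assumed to contain only valid terms, the difficulty check is read as "any term above the age node's limit", and the battery readings are assumed to use the same unit as the allowable power value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgeNodeLimits:
    min_terms: int               # lower bound of the allowed sentence length (valid terms)
    max_terms: int               # upper bound of the allowed sentence length
    max_term_difficulty: float   # upper difficulty limit for the age node


@dataclass
class RoundObservation:
    response_terms: List[str]
    term_difficulties: List[float]
    inference_ms: float
    model_loading_ms: float
    quant_unpack_ms: float
    actual_memory_mb: float
    model_memory_mb: float
    cache_memory_mb: float
    battery_start: float
    battery_end: float
    allowed_power: float


def inference_result_group(obs: RoundObservation, limits: AgeNodeLimits) -> Dict[str, int]:
    """Five 0/1 flags as described in S5-2; 1 means a deviation or an out-of-bounds result."""
    n_terms = len(obs.response_terms)
    return {
        # An empty response is recorded as a non-zero sentence length deviation.
        "sentence_length": int(n_terms == 0 or not limits.min_terms <= n_terms <= limits.max_terms),
        "term_difficulty": int(any(d > limits.max_term_difficulty for d in obs.term_difficulties)),
        # Latency limit = model loading latency + quantization unpacking latency.
        "latency": int(obs.inference_ms > obs.model_loading_ms + obs.quant_unpack_ms),
        # Memory limit = model node memory + cache node memory.
        "memory": int(obs.actual_memory_mb > obs.model_memory_mb + obs.cache_memory_mb),
        # Actual power = battery at start of round - battery at end of round.
        "power": int(obs.battery_start - obs.battery_end > obs.allowed_power),
    }
```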
If the target response result is empty, an empty response flag is written and the output sentence length deviation is recorded as non-zero. The purpose of S5-3 is to output the final interaction result, or to backtrack and re-infer, based on the inference result group. Its mechanism is to use the five types of deviation and out-of-bounds results as a single decision criterion and, when any of them is non-zero, to select adjacent low-occupancy nodes along the age semantic resource state graph to replace the corresponding nodes of the target inference track, thereby reducing resource occupation and re-executing inference while keeping child adaptation as the primary constraint. The inputs are the inference result group, the target inference track, and the age semantic resource state graph. The processing actions are as follows: the output sentence length deviation, term difficulty deviation, latency out-of-bounds, memory out-of-bounds, and power consumption out-of-bounds are first read from the inference result group. When all five are zero, the target response result is written into the interaction result area and output as the final interaction result. When any of them is non-zero, the age semantic resource state graph is read back, and the nodes that are directly connected to the current model node, quantization node, context node, or cache node and whose model load, cache usage, or power consumption is lower than that of the current node are identified as adjacent low-occupancy nodes. When several adjacent low-occupancy nodes exist, the one ranked first in ascending order of model load, then cache usage, then power consumption is selected. The model node, quantization node, context node, and cache node of the target inference track are then replaced with the corresponding adjacent low-occupancy nodes, and S5-1 and S5-2 are executed again.
A new inference result group is then read. When all five results are zero, the new target response result is output; if any of them is still non-zero, replacement with adjacent low-occupancy nodes continues until all five results are zero or no adjacent low-occupancy node remains. When no adjacent low-occupancy node remains, the safe response result for the current round is written into the interaction result area as the final interaction result; the safe response is obtained by calling the template that corresponds to the current age group mapping result from the terminal-side age group response template library. The output is the final interaction result, which is written into the interaction result area for the external broadcast or display module to read. Taking an offline children's learning tablet telling a story as an example, when the target response result produced by the first inference on the target inference track exceeds the sentence length range allowed for the young age group and the latency also exceeds its limit, the system moves along the age semantic resource state graph to replace the current model node with a node of lower model load, the context node with a node of lower context capacity, and the cache node with a node of lower cache occupancy, and then reloads and re-infers until it produces a final interaction result that meets the sentence length requirement of the young age group and stays within the latency limit. Through the above implementation process, the target inference track becomes an executable inference chain on the edge computing terminal, and by counting the five types of deviation and out-of-bounds results together, the child adaptation requirements and the edge resource constraints are closed within the same result group.
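The S5-3 fallback loop can be sketched as follows, assuming a hypothetical `infer(track)` callback that re-runs S5-1 and S5-2 and returns the response together with the five flags, and assuming each adjacency list in `graph` holds same-type alternatives for a node. Occupancy is a (model load, cache usage, power consumption) tuple, so comparing tuples gives the stated ascending selection order. The `tried` set is an added safeguard rather than part of the described method: because "lower" only requires one of the three measures to drop, two nodes could otherwise replace each other indefinitely.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Occupancy = Tuple[float, float, float]  # (model load, cache usage, power consumption)
SLOTS = ("model", "quantization", "context", "cache")


def adjacent_low_occupancy(node: str, graph: Dict[str, List[str]],
                           occupancy: Dict[str, Occupancy]) -> Optional[str]:
    """Directly connected node with at least one lower measure; best by ascending tuple."""
    neighbours = graph.get(node, [])
    if not neighbours:
        return None
    current = occupancy[node]
    lower = [n for n in neighbours if any(o < c for o, c in zip(occupancy[n], current))]
    return min(lower, key=lambda n: occupancy[n]) if lower else None


def infer_with_fallback(track: Dict[str, str], graph: Dict[str, List[str]],
                        occupancy: Dict[str, Occupancy],
                        infer: Callable[[Dict[str, str]], Tuple[str, Sequence[int]]],
                        safe_response: str) -> str:
    tried = set()
    while True:
        response, flags = infer(track)
        if not any(flags):
            return response  # all five results are zero
        tried.add(tuple(track[s] for s in SLOTS))
        replaced = dict(track)
        for slot in SLOTS:
            nxt = adjacent_low_occupancy(track[slot], graph, occupancy)
            if nxt is not None:
                replaced[slot] = nxt
        if tuple(replaced[s] for s in SLOTS) in tried:
            return safe_response  # no (new) adjacent low-occupancy node: use the age template
        track = replaced
```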
This process also specifies the model block loading order, quantization parameter unpacking, context writing rules, cache allocation rules, the criteria for the five types of deviation and out-of-bounds results, the rule for identifying adjacent low-occupancy nodes, and the rule for ending re-inference, thereby removing ambiguity in actual implementation. In practical applications, when an offline child companion robot receives the input "Continue telling the story of the little bear going home," the system first loads the local model blocks according to the target inference track, unpacks the quantization parameters, writes the most recent rounds of story interaction, and allocates the inference cache to form the target inference environment. It then performs local dialogue inference in that environment, generates the target response result, and at the same time outputs the sentence length deviation, term difficulty deviation, latency out-of-bounds, memory out-of-bounds, and power consumption out-of-bounds. If all five are zero, the target response result is broadcast directly. If memory goes out of bounds or the sentence length deviates, the system switches to adjacent low-occupancy nodes along the age semantic resource state graph, rebuilds the target inference environment, and infers again, until it outputs a final interaction result that meets the requirements of the current age group and fits within the resource boundaries of the edge computing terminal.
[0022] The working principle of this solution is as follows. First, the edge computing terminal obtains the current round of child speech, its corresponding text, user age information, the historical interaction sequence, remaining memory, processor usage, remaining battery power, and network connectivity status. Semantic regularization is applied to the child speech and corresponding text to extract sentence length, word difficulty, intent category, emotion category, and context depth, which together form a joint state value. Next, based on the joint state value, an age semantic resource state graph is constructed over age nodes, semantic nodes, context nodes, model nodes, quantization nodes, cache nodes, and power consumption nodes. Multiple candidate inference tracks are searched, and child expression, environmental noise, context growth, memory decrease, and power consumption fluctuation perturbations are injected into each one to select a target inference track that still maintains age adaptation consistency and stays within resource limits under perturbation. Finally, the edge computing terminal performs model loading, local dialogue inference, and result verification according to the target inference track. When the output sentence length, word difficulty, latency, memory, and power consumption are all within limits, the response is output directly; when any of them is exceeded, the system automatically switches to adjacent low-occupancy nodes and re-infers, so that child adaptation requirements and edge resource constraints are handled in one closed process.
For example, if a child tells an offline children's companion robot, "Continue telling the story of the little bear going home. Why did it cry later?", the terminal first analyzes this sentence together with the previous rounds of the story to determine the user's age group, the difficulty of the sentence, the intent type, the emotional tendency, and how much context needs to be used. Then, taking into account the device's current memory, battery level, and processor load, it selects, from the available combinations of local models, quantization methods, context capacities, and cache allocation schemes, an inference track that can both continue the story and run smoothly under the current device conditions. If during inference the response turns out to be too long, the vocabulary too difficult, the latency too high, or the memory usage too high, the system does not simply fail; it automatically switches to an adjacent track with lower resource occupancy and regenerates the response, until it outputs a final dialogue result that is both age-appropriate and suited to the device's current resources.
[0023] The above description is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware, characterized by comprising: S1. obtaining, on the edge computing terminal, the current round of child speech, the corresponding text, user age information, the historical interaction sequence, remaining memory, processor usage, remaining battery power, and network connectivity status; performing child semantic regularization on the current round of child speech and the corresponding text; and extracting sentence length, word difficulty, intent category, emotion category, and context depth to form a joint state value; S2. reading age nodes, semantic nodes, context nodes, model nodes, quantization nodes, cache nodes, and power consumption nodes; determining candidate transition edges according to the conditions of age consistency, term difficulty matching, sufficient context capacity, remaining memory after deduction not less than zero, available processor load sufficient for execution, remaining battery power sufficient for operation, and retaining only local model connections when the network is disconnected; and outputting the age semantic resource state graph; S3. searching the age semantic resource state graph for candidate links leading to the response node, counting the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage amount, and power consumption amount, and outputting the candidate inference track set; S4. injecting child expression perturbation, environmental noise perturbation, context growth perturbation, memory decrease perturbation, and power consumption fluctuation perturbation into each candidate inference track; retaining the tracks whose age offset, semantic missing amount, and memory out-of-bounds amount are zero; sorting them in ascending order of latency increase, power consumption offset, and context truncation amount; and taking the first track as the target inference track.
2. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 1, characterized by further comprising: S5. performing, on the edge computing terminal, model loading and local dialogue inference according to the target inference track to generate the target response result; outputting the target response result directly when nothing is out of bounds; and, when something is out of bounds, replacing the model node, quantization node, context node, and cache node with adjacent low-occupancy nodes, performing inference again, and outputting the final interaction result.
3. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 2, characterized in that S1 comprises: S1-1. performing syllable segmentation and pause localization on the current round of child speech, extracting the duration, segment interval, stress position, and intonation fluctuation of each speech segment, segmenting the corresponding text into aligned segments according to the pause positions, and outputting the speech-text aligned segment set; S1-2. calculating, for each segment of the speech-text aligned segment set, the word occurrence order, repetition count, reference jump count, emotional word proportion, and question-answer position, and generating, in segment order, segment feature sequences corresponding to sentence length, word difficulty, intent category, emotion category, and context depth; S1-3. writing the user age information, historical interaction sequence, remaining memory, processor usage, remaining battery power, and network connectivity status into the segment feature sequences, merging them cumulatively in segment order, and outputting the joint state value.
4. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 3, characterized in that S2 comprises: S2-1. performing combinatorial expansion of the age nodes, semantic nodes, context nodes, model nodes, quantization nodes, cache nodes, and power consumption nodes to generate node combination groups; calculating, for each node combination group, the age difference, term difficulty difference, context margin, memory margin, processor margin, and battery life margin; and determining as the initial feasible edge set the node combination groups whose age difference is zero, whose term difficulty difference is not greater than zero, whose context margin, memory margin, processor margin, and battery life margin are each not less than zero, and whose model node is a local model when the network connectivity status is disconnected; S2-2. calculating the semantic retention cost, loading latency cost, cache occupancy cost, and power consumption cost of each edge in the initial feasible edge set and summing them to form the total edge cost; sorting the edges in ascending order of total edge cost, then descending order of context margin, then descending order of memory margin; and performing a bounded search from the age node to the model node to output the candidate edge chain set.
5. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 4, characterized in that S2 further comprises: S2-3. injecting term growth, context growth, memory decrease, and processor fluctuation into each candidate edge chain in the candidate edge chain set; recalculating the changes in semantic retention cost, loading latency cost, cache occupancy cost, and power consumption cost before and after injection; determining the candidate edge chains whose cost changes are not greater than the corresponding margins as the stable edge chain set; and writing each edge of the stable edge chains into the corresponding node pair to form pending connection relationships; S2-4. for multiple edges in the pending connection relationships that point to the same model node or the same cache node, comparing the total edge cost, semantic retention cost, and loading latency cost in turn: retaining the edge with the highest total edge cost; if the total edge costs are equal, retaining the edge with the highest semantic retention cost; if the semantic retention costs are equal, retaining the edge with the highest loading latency cost; deleting the remaining edges; and outputting the age semantic resource state graph.
6. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 5, characterized in that S3 comprises: S3-1. reading the starting node, the response node, and the connecting edges of each node in the age semantic resource state graph; expanding layer by layer along the connecting edges to generate multiple candidate links from the starting node to the response node; recording, for each candidate link, the sequence of age node, semantic node, context node, model node, quantization node, cache node, and power consumption node values at each edge position; and outputting the candidate link sequence set; S3-2. calculating, for each candidate link in the candidate link sequence set, the age offset between the age node and the age group mapping result in the joint state value, the semantic missing amount between the semantic node and the term difficulty and intent category in the joint state value, the context truncation amount between the context node and the context depth in the joint state value, the model loading amount of the model node, the cache usage amount of the cache node, and the power consumption amount of the power consumption node; summing them cumulatively by edge position; and outputting the link evaluation group of each candidate link; S3-3. comparing, for each link evaluation group, the age offset, semantic missing amount, context truncation amount, model loading amount, cache usage amount, and power consumption amount; retaining the candidate links whose age offset and semantic missing amount are zero and whose cache usage and power consumption do not exceed the remaining memory and remaining battery power in the joint state value; and writing the retained candidate links into the candidate inference track set.
7. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 6, characterized in that S4 includes:
S4-1. For each candidate inference track in the candidate inference track set, separately inject a child expression perturbation, an environmental noise perturbation, a context growth perturbation, a memory decrease perturbation and a power consumption fluctuation perturbation. For each perturbation, recalculate the age offset, semantic missing amount, context truncation amount, latency growth amount, memory out-of-bounds amount and power consumption offset, and write them into the perturbation evaluation sequence by perturbation type.
S4-2. From the perturbation evaluation sequence of each candidate inference track, calculate the age offset consistency value, semantic missing consistency value, memory out-of-bounds consistency value, cumulative latency growth value, cumulative power consumption offset value and cumulative context truncation value. Identify the candidate inference tracks whose age offset consistency value, semantic missing consistency value and memory out-of-bounds consistency value are all zero as constraint-preserving tracks, and delete the remaining candidate inference tracks from the candidate inference track set.
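S4-1 and S4-2 can be sketched as a per-track evaluation loop. The claim does not define how a consistency value is computed; this sketch assumes it is the worst-case absolute value across the five perturbations (so zero means every perturbation kept that quantity at zero) and that cumulative values are plain sums. The `evaluate` callback, standing in for re-running the metrics under one perturbation, is hypothetical, as are the dictionary keys.

```python
from typing import Callable, Dict, List, Sequence

PERTURBATIONS = ("child_expression", "environment_noise", "context_growth",
                 "memory_decrease", "power_fluctuation")
# Keys: age_offset, semantic_missing, context_truncation,
#       latency_growth, memory_overflow, power_offset
Evaluation = Dict[str, float]
Evaluator = Callable[[str, str], Evaluation]  # hypothetical: (track id, perturbation) -> metrics


def perturbation_sequence(track: str, evaluate: Evaluator) -> List[Evaluation]:
    """S4-1: re-evaluate the track once per perturbation type, in a fixed order."""
    return [evaluate(track, p) for p in PERTURBATIONS]


def summarize(seq: Sequence[Evaluation]) -> Dict[str, float]:
    """S4-2: consistency values (assumed worst case) and cumulative values (sums)."""
    return {
        "age_consistency": max(abs(e["age_offset"]) for e in seq),
        "semantic_consistency": max(abs(e["semantic_missing"]) for e in seq),
        "memory_consistency": max(abs(e["memory_overflow"]) for e in seq),
        "latency_total": sum(e["latency_growth"] for e in seq),
        "power_total": sum(e["power_offset"] for e in seq),
        "truncation_total": sum(e["context_truncation"] for e in seq),
    }


def constraint_preserving(tracks: List[str], evaluate: Evaluator) -> List[str]:
    """S4-2: keep tracks whose three consistency values are all zero."""
    kept = []
    for track in tracks:
        s = summarize(perturbation_sequence(track, evaluate))
        if (s["age_consistency"] == 0 and s["semantic_consistency"] == 0
                and s["memory_consistency"] == 0):
            kept.append(track)
    return kept
```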
8. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 7, characterized in that S4 further includes:
S4-3. For each constraint-preserving track, calculate, in perturbation order, the latency growth difference, power consumption offset difference and context truncation difference between each perturbation and the one before it. A constraint-preserving track whose latency growth difference, power consumption offset difference and context truncation difference remain unchanged across two consecutive calculations is determined to be a convergent track. A constraint-preserving track that has not yet reached this unchanged state is injected again with the five types of perturbation, and the calculation is repeated until it does.
S4-4. For the convergent tracks, compare the cumulative latency growth value, cumulative power consumption offset value and cumulative context truncation value. Sort the convergent tracks in ascending order of cumulative latency growth value; break ties by ascending cumulative power consumption offset value, and any remaining ties by ascending cumulative context truncation value. Take the first convergent track in the sorted order as the target inference track.
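S4-3 and S4-4 combine a fixed-point check with a lexicographic sort. In this sketch, `run_round` is a hypothetical callback that injects the five perturbations for a given round and returns one (latency growth, power consumption offset, context truncation) tuple per perturbation; the cumulative values are assumed to be the sums over the converged round; and `max_rounds` is an added safety cap that the claim does not mention, since "repeat until unchanged" has no bound of its own.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Metrics = Tuple[float, float, float]  # (latency growth, power offset, context truncation)
RoundFn = Callable[[str, int], Sequence[Metrics]]  # hypothetical: (track, round) -> one tuple per perturbation


def step_differences(seq: Sequence[Metrics]) -> Tuple[float, ...]:
    """S4-3: differences between each perturbation and the previous one, in order."""
    diffs: List[float] = []
    for prev, cur in zip(seq, seq[1:]):
        diffs.extend(c - p for p, c in zip(prev, cur))
    return tuple(diffs)


def converge(track: str, run_round: RoundFn, max_rounds: int = 20) -> Optional[Metrics]:
    """Repeat rounds until two consecutive rounds give identical differences.

    Returns the cumulative (latency, power, truncation) of the converged round,
    or None if the added safety cap is reached first.
    """
    previous = None
    for k in range(max_rounds):
        seq = run_round(track, k)
        diffs = step_differences(seq)
        if diffs == previous:
            return tuple(sum(col) for col in zip(*seq))
        previous = diffs
    return None


def select_target(tracks: List[str], run_round: RoundFn, max_rounds: int = 20) -> Optional[str]:
    """S4-4: ascending lexicographic sort on (latency, power, truncation); take the first."""
    scored = []
    for track in tracks:
        cumulative = converge(track, run_round, max_rounds)
        if cumulative is not None:  # only convergent tracks are ranked
            scored.append((cumulative, track))
    scored.sort(key=lambda item: item[0])  # tuple keys compare lexicographically
    return scored[0][1] if scored else None
```

Because Python's sort is stable, tracks tied on all three cumulative values keep their input order, which is one reasonable reading of "first position" when the claim leaves full ties open.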
9. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 8, characterized in that S5 includes:
S5-1. Read the model node, quantization node, context node and cache node in the target inference track. On the edge computing terminal, load the weights in the model block order corresponding to the model node, expand the parameters according to the quantization format corresponding to the quantization node, write the historical interaction sequence according to the context capacity corresponding to the context node, allocate the inference cache area according to the cache quota corresponding to the cache node, and output the target inference environment.
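A minimal sketch of S5-1, assuming a hypothetical `TargetTrack` record that exposes the block order, quantization format, context capacity (counted here in history turns) and cache quota (in bytes); both units are assumptions. `load_block` is a hypothetical callback standing in for reading one weight block and expanding it according to the quantization format; no real runtime API is implied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TargetTrack:
    """Hypothetical view of the four nodes S5-1 reads from the target inference track."""
    model_blocks: List[str]  # model node: weight blocks in load order
    quant_format: str        # quantization node, e.g. "int8"
    context_capacity: int    # context node: history turns to keep (assumed unit)
    cache_quota: int         # cache node: inference cache size in bytes (assumed unit)


@dataclass
class InferenceEnv:
    weights: List[bytes]
    history: List[str]
    cache: bytearray


def build_env(track: TargetTrack, history: List[str],
              load_block: Callable[[str, str], bytes]) -> InferenceEnv:
    # Load weight blocks in the model node's order, expanding each per the quantization format.
    weights = [load_block(name, track.quant_format) for name in track.model_blocks]
    # Write only the most recent turns that fit the context capacity
    # (guard needed because history[-0:] would return the whole list).
    recent = history[-track.context_capacity:] if track.context_capacity > 0 else []
    # Reserve the inference cache area at the cache node's quota.
    return InferenceEnv(weights, recent, bytearray(track.cache_quota))
```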
10. The method for edge AI dialogue interaction adaptation and inference optimization for children's smart hardware according to claim 9, characterized in that S5 further includes:
S5-2. In the target inference environment, perform local dialogue inference on the current round of child speech and its corresponding text to generate the target response result, and at the same time output the sentence length deviation, term difficulty deviation, latency out-of-bounds amount, memory out-of-bounds amount and power consumption out-of-bounds amount, as the inference result group.
S5-3. Perform an out-of-bounds check on the inference result group. If the sentence length deviation, term difficulty deviation, latency out-of-bounds amount, memory out-of-bounds amount and power consumption out-of-bounds amount are all zero, output the target response result as the final interaction result. If any of them is non-zero, replace the model node, quantization node, context node and cache node in the target inference track with their respective adjacent lower-occupancy nodes, then re-execute model loading and local dialogue inference to output the final interaction result.
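S5-2 and S5-3 can be sketched as a single check followed by one fallback attempt, as the claim describes. The `infer` callback (model loading plus local dialogue inference), the `lower_neighbor` lookup that returns the adjacent lower-occupancy node, and the string node identifiers are all hypothetical.

```python
from typing import Callable, Dict, Tuple

DEVIATIONS = ("sentence_length", "term_difficulty", "latency", "memory", "power")
Track = Dict[str, str]  # hypothetical: node kind -> node id
InferFn = Callable[[Track], Tuple[str, Dict[str, float]]]  # (response, out-of-bounds amounts)


def interact(track: Track, infer: InferFn,
             lower_neighbor: Callable[[str, str], str]) -> str:
    """S5-2/S5-3: run once; if any amount is non-zero, fall back once and rerun."""
    response, amounts = infer(track)
    if all(amounts[k] == 0 for k in DEVIATIONS):
        return response
    # Replace each resource node with its adjacent lower-occupancy node.
    fallback = dict(track)
    for kind in ("model", "quant", "context", "cache"):
        fallback[kind] = lower_neighbor(kind, track[kind])
    # Reload and rerun once; the claim outputs this result directly.
    response, _ = infer(fallback)
    return response
```

The sketch does not re-check the fallback result, matching the single re-execution in S5-3; looping the check would go beyond what the claim states.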