A method and device for SIP media codec intelligent negotiation and cross-terminal seamless continuation

Using three-dimensional capability set cross-computation combined with session state awareness, the invention addresses problems found in traditional SIP engines in multi-terminal environments, such as codec degradation, video downgrade, poor IPv4/IPv6 compatibility, and non-unique WebRTC media stream identifiers, and achieves seamless cross-terminal media continuity and high-quality calls.

CN122247972A · Pending · Publication Date: 2026-06-19 · XIAMEN XINGZONG DIGITAL TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
XIAMEN XINGZONG DIGITAL TECH CO LTD
Filing Date
2026-04-13
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In modern enterprise unified communications and multi-terminal hybrid deployments, traditional SIP engines handle codec negotiation and media stream management poorly: codecs degrade on reINVITE, video degrades during call handover, IPv4/IPv6 compatibility is weak, SSRCs of WebRTC media streams are not guaranteed to be unique, and NAT traversal fails. The result is lower call quality and a poor user experience.

Method used

The method combines three-dimensional capability set cross-operation with state awareness. It maintains codec capability sets for the endpoint, the peer, and the channel, and selects codecs through a multi-priority decision. It also protects the H.264 profile-level-id, detects the IPv6 address family, detects SSRC conflicts within WebRTC Bundle groups, and performs NAT traversal for calls placed to FQDN domain names, so that media streams continue seamlessly.

Benefits of technology

It achieves seamless media continuity across terminals, avoids codec degradation, maintains video quality, improves IPv4/IPv6 compatibility, eliminates WebRTC media stream decoding chaos and video black screen issues, and ensures call quality and user experience continuity.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a method and apparatus for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs. In this method, a three-dimensional capability set is maintained for each SIP session; initial negotiation is distinguished from reINVITE, and the current codec is preferentially preserved during reINVITE; a multi-priority selector performs a four-level decision and two-dimensional verification; during call handover, the original context is cloned and the H.264 profile-level-id is protected; the iLBC frame length is negotiated in compliance with the RFC; RTP instances are created adaptively for dual-stack networks; SSRC conflicts within a WebRTC Bundle group are detected and the SSRC is regenerated; and CNG packets punch holes through NAT during FQDN calls. Through cross-operation of the three-dimensional capability set and state awareness, this invention eliminates codec degradation on reINVITE, achieves seamless cross-terminal media continuation, and improves IPv4 / IPv6 compatibility and the uniqueness of WebRTC media streams.

Description

Technical Field

[0001] This invention relates to the field of Voice over IP (VoIP) communication technology, and in particular to a method and apparatus for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs.

Background Technology

[0002] In the field of IP multimedia communication, the SIP protocol is the core signaling protocol of VoIP systems. The SDP (Session Description Protocol) Offer / Answer model is the standard mechanism for SIP media negotiation, used by communicating parties to exchange parameters such as encoding / decoding capabilities, transmission addresses, and media formats. As enterprise unified communications evolves towards multi-terminal, multi-network, and multi-vendor hybrid deployments, the intelligence and continuity of SDP negotiation directly determine call quality and user experience.

[0003] In modern enterprise unified communications and multi-terminal hybrid deployment environments, IPPBX systems face the following core pain points. First, traditional SIP engines only perform a two-dimensional cross-operation between the codecs configured on the endpoint and the codecs offered by the peer. When a reINVITE request arrives (for example on call hold resumption, transfer, or conference joining), the engine may select a codec different from the one on the currently active media stream, which leads to repeated creation and destruction of transcoders and a sharp drop in audio and video quality. Second, during multi-terminal call switching, traditional engines completely discard the codec context of the original call. In particular, for H.264 video calls, a change to the profile-level-id parameter in a single reINVITE can degrade the resolution from high definition to low resolution. Third, in IPv4 / IPv6 dual-stack environments, traditional engines rely on static endpoint configuration to determine the address family of the RTP instance and cannot adapt to the address type actually carried in the SDP, which leads to media stream establishment failures. Fourth, in WebRTC multi-stream Bundle mode, traditional engines do not check for SSRC conflicts within the group. If two streams have the same SSRC, the receiving end cannot distinguish audio packets from video packets, and decoding breaks down. Fifth, when a call is placed to an FQDN domain name, traditional solutions do not actively establish a NAT mapping, so key video frames are lost and the calling end sees a prolonged black screen.

[0004] Therefore, how to achieve seamless media continuity across terminals and improve IPv4 / IPv6 compatibility and WebRTC media stream uniqueness is a problem that needs to be solved by those skilled in the art.

Summary of the Invention

[0005] This invention provides a method and apparatus for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs, which can solve the problems of reINVITE codec degradation, video degradation during call switching, dual-stack adaptation failure, SSRC conflict and NAT traversal in the prior art.

[0006] The first aspect of this invention provides a method for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs, comprising:

[0007] For each SIP session, maintain an endpoint configuration capability set, a peer format set, and a channel local format set. Check whether the channel local format set is empty to distinguish the initial INVITE negotiation mode from the reINVITE negotiation mode. In reINVITE mode, intersect the channel local format set with the peer format set to generate a session-level joint set; if the session-level joint set is not empty, use it preferentially; if it is empty, fall back to the endpoint-level joint set generated by intersecting the endpoint configuration capability set with the peer format set. Implement a multi-priority optimal codec selector that performs a four-level priority cascade decision on the codec entries in the SDP media description line and performs a two-dimensional payload type verification on each candidate codec to determine its validity. When a SIP session object carries a call switching identifier, extract and clone the original channel local format set from the original call media channel associated with that identifier and use it as the negotiation benchmark, give priority to keeping the same codec combination as the original call, and apply a profile-level-id protection mechanism to the H.264 video codec. Negotiate the frame length mode for the iLBC codec: when the fmtp attribute of the peer SDP does not carry the mode parameter, default to a 30-millisecond frame length; when the peer carries mode 20 and the endpoint is configured with a 20-millisecond frame length, accept it; otherwise fall back to a 30-millisecond frame length; then inject the determined frame length mode into the attributes of the codec format object. During the inbound SDP negotiation phase, parse the address field in the SDP connection information line to identify an IPv6 address. During the outbound SDP generation phase, perform a three-level cascaded address family detection to determine the address family of the RTP instance. When creating a WebRTC Bundle group media stream, traverse all media streams already created in the same group in the current SDP and compare the SSRC value of each stream; if a conflict is detected, regenerate the SSRC and reset the traversal index to the starting position to detect again, until the SSRCs of all media streams in the group are unique. When a call is relayed via an FQDN domain name resolution server and the called party returns a 200 OK response, send comfort noise packets synchronously on the audio and video RTP ports and repeat the sending at preset intervals via the scheduler; once the remote media stream has been received, automatically stop the timed sending and release the scheduler resources.
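The SSRC conflict check in the paragraph above can be illustrated with a minimal Python sketch. The function name `assign_unique_ssrc`, the list-of-integers representation of the group, and the injectable `new_ssrc` generator are assumptions made for illustration; the source does not describe the engine's actual data structures.

```python
import random


def assign_unique_ssrc(group_ssrcs, candidate, new_ssrc=lambda: random.getrandbits(32)):
    """Return an SSRC for a new stream that differs from every SSRC in the group.

    group_ssrcs: SSRCs of the streams already created in the same Bundle group
                 (hypothetical representation).
    candidate:   SSRC initially proposed for the new stream.
    new_ssrc:    callable producing a replacement 32-bit SSRC.
    """
    i = 0
    while i < len(group_ssrcs):
        if group_ssrcs[i] == candidate:
            # Conflict: regenerate, then restart the scan from index 0 because
            # the new value may collide with a stream that was already checked.
            candidate = new_ssrc()
            i = 0
        else:
            i += 1
    return candidate
```

Resetting the index matters because a regenerated value is only known to differ from the stream that triggered the regeneration, not from the streams scanned earlier.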

[0008] Optionally, the endpoint configuration capability set, peer format set, and channel local format set adopt a dual-index architecture, which includes a direct-address array with codec identifiers as keys and a preference sequence vector arranged in insertion order.

[0009] Optionally, the four-level priority cascade decision includes: The first priority is to find codecs that are compatible with both the channel's local format set and the peer's format set; The second priority is to select the first supported video codec on the called end; The third priority selects the first audio codec compatible with the local format set of the channel on the calling end; The fourth priority selects the first valid codec as a fallback.

[0010] Optionally, the two-dimensional payload type verification performed for each candidate codec includes: forward verification, which uses the rtpmap attribute to find the codec format name corresponding to the payload type; and reverse verification, which uses the codec format name to look up the expected payload type number. The codec is considered valid when the results of forward and reverse verification match.

[0011] Optionally, a profile-level-id protection mechanism is implemented for H.264 video encoding and decoding, including: Extract the profile-level-id parameter provided by the peer from the SDP fmtp attribute, and perform a numerical comparison with the profile-level-id of the active H.264 instance in the current channel local format set; If the comparison results are inconsistent, the codec is marked as incompatible and forced to skip. During the call switching path, an instance that completely matches the original call profile-level-id is searched from the H.264 candidates of the peer SDP, and the currently selected codec is replaced with the matching instance through the codec replacement mechanism.

[0012] Optionally, a three-level cascaded address family detection is performed, including: The first level dynamically detects whether the transport layer address field of the session contact object contains IPv6 characteristics; The second level degrades to the IPv6 flag of the contact URI when the transport layer address is empty; The third level ultimately falls back to the endpoint static configuration when the contact object is unavailable.
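A minimal Python sketch of the three-level cascade follows. The `Contact` class, its field names, and the treatment of a non-literal transport address (a host name) as IPv4 are assumptions for illustration only; the text does not specify them.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    """Hypothetical stand-in for the session contact object."""
    transport_addr: str = ""    # e.g. "2001:db8::10" or "192.0.2.5"
    uri_is_ipv6: bool = False   # IPv6 flag taken from the contact URI


def is_ipv6_literal(addr: str) -> bool:
    """True when addr is an IPv6 literal, with or without square brackets."""
    try:
        return ipaddress.ip_address(addr.strip("[]")).version == 6
    except ValueError:
        return False


def detect_rtp_family(contact: Optional[Contact], endpoint_ipv6: bool) -> str:
    # Level 3: no contact object, fall back to the endpoint's static configuration.
    if contact is None:
        return "IPv6" if endpoint_ipv6 else "IPv4"
    # Level 1: inspect the transport-layer address when one is present.
    if contact.transport_addr:
        return "IPv6" if is_ipv6_literal(contact.transport_addr) else "IPv4"
    # Level 2: transport address empty, use the IPv6 flag of the contact URI.
    return "IPv6" if contact.uri_is_ipv6 else "IPv4"
```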

[0013] Optionally, the repeated sending at preset intervals via the scheduler includes: comfort noise packets are repeatedly sent at 200-millisecond intervals via the scheduler; before each transmission, the most recent transmission timestamp of the RTP transport instance is checked; when that timestamp is non-zero, it is determined that the remote media stream has been received, the timed sending is automatically stopped, and the scheduler resources are released.
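The stop condition can be sketched as a scheduler callback that either returns the next delay or signals cancellation. This is a deterministic simulation, not a real timer: `RtpInstance`, `cng_tick`, and `simulate` are hypothetical names, and the sketch assumes the timestamp checked is the time at which the last remote RTP packet was seen, with zero meaning that nothing has arrived yet.

```python
from dataclasses import dataclass, field

CNG_INTERVAL_MS = 200  # interval stated in the text


@dataclass
class RtpInstance:
    """Hypothetical RTP transport instance."""
    last_rx_ms: int = 0                       # 0 = no remote media seen yet (assumption)
    sent: list = field(default_factory=list)  # times at which CNG packets were sent

    def send_cng(self, now_ms: int) -> None:
        self.sent.append(now_ms)


def cng_tick(rtp: RtpInstance, now_ms: int):
    """One scheduler callback: return the next delay in ms, or None to stop."""
    if rtp.last_rx_ms != 0:
        return None                # remote media received: cancel and free the task
    rtp.send_cng(now_ms)
    return CNG_INTERVAL_MS


def simulate(rtp: RtpInstance, media_arrives_at_ms: int, limit_ms: int = 5000):
    """Drive cng_tick with a fake clock; media_arrives_at_ms must be > 0."""
    now = 0
    while now <= limit_ms:
        if now >= media_arrives_at_ms:
            rtp.last_rx_ms = media_arrives_at_ms
        delay = cng_tick(rtp, now)
        if delay is None:
            return now             # time at which the task stopped itself
        now += delay
    return None
```

With media arriving at 500 ms, the simulation sends at 0, 200, and 400 ms and stops on the 600 ms tick.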

[0014] A second aspect of the present invention provides a device for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs, comprising the following units. The capability set maintenance unit is used to maintain an endpoint configuration capability set, a peer format set, and a channel local format set for each SIP session. The negotiation mode differentiation unit is used to detect whether the channel local format set is empty in order to distinguish the initial INVITE negotiation mode from the reINVITE negotiation mode; in reINVITE mode, it intersects the channel local format set with the peer format set to generate a session-level joint set, uses the session-level joint set preferentially when it is not empty, and falls back to the endpoint-level joint set generated by intersecting the endpoint configuration capability set with the peer format set when it is empty. The codec selection unit is used to implement a multi-priority optimal codec selector that performs a four-level priority cascade decision on the codec entries in the SDP media description line and performs a two-dimensional payload type verification on each candidate codec to determine its validity. The call handover processing unit is used, when the SIP session object carries a call handover identifier, to extract and clone the original channel local format set from the original call media channel associated with that identifier as the negotiation benchmark, to give priority to keeping the codec combination consistent with the original call, and to apply the profile-level-id protection mechanism to the H.264 video codec. The iLBC negotiation unit is used to negotiate the frame length mode for the iLBC codec: when the mode parameter is not carried in the fmtp attribute of the peer SDP, the default frame length is 30 milliseconds; when the peer carries mode 20 and the endpoint is configured with a 20-millisecond frame length, it is accepted; otherwise the unit falls back to a 30-millisecond frame length; the determined frame length mode is then injected into the attributes of the codec format object. The address family detection unit is used to parse the address field in the SDP connection information line to identify an IPv6 address during the inbound SDP negotiation phase, and to perform three-level cascaded address family detection during the outbound SDP generation phase to determine the address family of the RTP instance. The SSRC conflict detection unit is used, when creating a WebRTC Bundle group media stream, to traverse all media streams already created in the same group in the current SDP, compare the SSRC value of each stream, regenerate the SSRC when a conflict is detected, and reset the traversal index to the starting position to detect again until the SSRCs of all media streams in the group are unique. The NAT traversal unit is used to send comfort noise packets synchronously on the audio and video RTP ports when a call is relayed through an FQDN domain name resolution server and the called party returns a 200 OK response, to repeat the sending at preset intervals via the scheduler, and to automatically stop the timed sending and release the scheduler resources once the remote media stream has been received.

[0015] A third aspect of the present invention provides a device for intelligent negotiation and seamless cross-terminal continuation of SIP media codec, comprising: One or more processors; A memory on which one or more programs are stored; When the one or more programs are executed by the one or more processors, the one or more processors implement the SIP media codec intelligent negotiation and cross-terminal seamless continuation method as described in any of the above.

[0016] A fourth aspect of the present invention provides a computer storage medium for storing a program, which, when executed, is used to implement the SIP media codec intelligent negotiation and cross-terminal seamless continuation method as described in any of the preceding claims.

[0017] Beneficial effects: Through cross-operation of the three-dimensional capability sets and state awareness, this invention ensures that a reINVITE keeps the current codec, avoiding transcoder reconstruction; it clones the codec context and protects the profile-level-id during call switching, achieving seamless cross-terminal continuation of 1080p image quality; dual-stack adaptive creation of RTP instances eliminates media establishment failures caused by IPv4 / IPv6 mismatch; SSRC conflict detection and regeneration ensure the uniqueness of streams in WebRTC multiplexing; and CNG packets actively punch holes through NAT, eliminating video black screens.

Attached Figure Description

[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0019] Figure 1 is a flowchart of a method for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs provided in an embodiment of the present invention; Figure 2 is a schematic structural diagram of a device for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs provided in an embodiment of the present invention; Figure 3 is a schematic structural diagram of a device provided in an embodiment of the present invention.

Detailed Implementation

[0020] This invention provides a method and apparatus for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs. It achieves a complete media negotiation closed loop within a self-developed IPPBX core communication routing engine, encompassing intelligent codec determination, call handover format preservation, dual-stack adaptation, and Bundle conflict resolution. This method is a SIP Session Description Protocol (SDP) media codec negotiation approach for a self-developed IPPBX core communication routing engine—specifically, how to intelligently determine the optimal codec combination and ensure media continuity during cross-terminal call handovers based on endpoint capabilities, peer provision, current session state, and service scenario during SIP Offer / Answer interactions.

[0021] See Figure 1, which is a flowchart of a method for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs provided in an embodiment of the present invention. The method provided in this embodiment can be implemented, for example, through the following steps S101-S107.

[0022] S101: Maintain an endpoint configuration capability set, a peer format set, and a channel local format set for each SIP session. Detect whether the channel local format set is empty to distinguish the initial INVITE negotiation mode from the reINVITE negotiation mode. In reINVITE mode, intersect the channel local format set with the peer format set to generate a session-level joint set. If the session-level joint set is not empty, use it preferentially. If it is empty, fall back to the endpoint-level joint set generated by intersecting the endpoint configuration capability set with the peer format set.

[0023] In this embodiment of the invention, the endpoint configuration capability set, the peer format set, and the channel local format set adopt a dual-index architecture, which includes a direct addressing array with codec identifiers as keys and a preference sequence vector arranged in insertion order.

[0024] Specifically, a multi-stage codec decision engine based on three-dimensional capability set cross-operation and reINVITE state awareness is used. Three independent codec capability set containers are maintained for each SIP session: an endpoint configuration capability set (all available codecs configured by the administrator for that endpoint), a peer format set (the peer-supported codecs parsed from the SDP Offer / Answer), and a channel local format set (the codecs already negotiated and fixed on the currently active media channel). Each capability set container adopts a dual-index architecture. The first index is a direct-addressed array keyed by codec identifier; each slot carries a singly linked list of the parameter variants of the same codec and supports O(1) codec existence checks. The second index is a preference sequence vector arranged in insertion order, which supports priority traversal. The container itself is managed through reference counting, and callers protect concurrent access with channel-level mutex locks.
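The dual-index container can be sketched in Python as follows. The class name `CapabilitySet`, the small integer codec identifiers, and the fixed table size are assumptions for illustration; a Python list stands in for the singly linked list of variants, and reference counting and locking are left out.

```python
class CapabilitySet:
    """Dual-index codec container: a slot table for O(1) existence checks
    plus an insertion-ordered preference list for priority traversal."""

    def __init__(self, max_codec_id=64):
        self._slots = [None] * max_codec_id  # index 1: direct-addressed by codec id
        self._order = []                     # index 2: preference (insertion) order

    def add(self, codec_id, params=None):
        variant = (codec_id, dict(params or {}))
        if self._slots[codec_id] is None:
            self._slots[codec_id] = []
        self._slots[codec_id].append(variant)  # variants of the same codec share a slot
        self._order.append(variant)

    def has(self, codec_id):
        """O(1) existence check through the direct-addressed table."""
        return 0 <= codec_id < len(self._slots) and self._slots[codec_id] is not None

    def variants(self, codec_id):
        return list(self._slots[codec_id]) if self.has(codec_id) else []

    def by_preference(self):
        return list(self._order)


# Hypothetical codec identifiers used only in this example.
PCMU, PCMA, H264 = 0, 1, 2

caps = CapabilitySet()
caps.add(H264, {"profile-level-id": "42e01f"})
caps.add(PCMA)
caps.add(H264, {"profile-level-id": "640028"})
```

Keeping both indexes lets the engine answer "is this codec present" without a scan while still iterating in the order the administrator or the peer listed the codecs.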

[0025] When an SDP negotiation request arrives, the codec capability set decision engine first checks whether the local format set on the media channel object is empty, which distinguishes the two negotiation modes. In the initial INVITE negotiation mode (local format set empty), the engine performs a two-dimensional cross operation: it intersects the endpoint configuration capability set with the peer format set to generate an endpoint-level joint set (endpoint_joint), from which the available codecs are determined.

[0026] In the reINVITE negotiation mode (local format set not empty), the engine performs a three-dimensional cross operation in the main negotiation path, which produces a pool of available codecs. It first intersects the channel's local format set with the peer's format set to generate a session-level joint set. If the session-level joint set is not empty, this set is used preferentially so that the reINVITE negotiation keeps the currently active codecs unchanged as far as possible; if it is empty (the peer has removed the current codec), the engine automatically falls back to the endpoint-level joint set.

[0027] For video media streams, the engine performs an additional selection step: it compares the number of codecs in the session-level joint set with the number in the endpoint-level joint set. When the endpoint-level joint set contains more codec options (meaning that the peer has added video codec support), the engine switches to the endpoint-level joint set to obtain a richer selection of video codecs.
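The mode detection, the fallback, and the video rule above can be combined into a short Python sketch. Codecs are represented by plain strings, and the result keeps the order of the first argument of each intersection; both choices, like the function names, are assumptions not stated in the text.

```python
def intersect(preferred, other):
    """Codecs of `preferred` that also appear in `other`, in `preferred` order."""
    other_set = set(other)
    return [codec for codec in preferred if codec in other_set]


def candidate_pool(endpoint_caps, peer_formats, channel_formats, is_video=False):
    """Return the pool of available codecs for one media stream."""
    endpoint_joint = intersect(endpoint_caps, peer_formats)
    if not channel_formats:
        # Initial INVITE: nothing negotiated yet, two-dimensional cross operation.
        return endpoint_joint
    # reINVITE: prefer what the channel is already using.
    session_joint = intersect(channel_formats, peer_formats)
    if not session_joint:
        return endpoint_joint      # peer removed the active codec, fall back
    if is_video and len(endpoint_joint) > len(session_joint):
        return endpoint_joint      # peer added video codecs, widen the choice
    return session_joint
```

For example, with endpoint codecs `["opus", "pcmu", "pcma"]`, a peer offering `["pcmu", "pcma"]`, and a channel currently on `["pcma"]`, the pool is `["pcma"]`, so the reINVITE keeps the active codec.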

[0028] After the decision is made, the engine assigns read and write formats differently depending on the media type (audio or video) and the call direction (calling or called party). For an audio stream, the calling party's write format is the preferred codec of the joint set and its read format is the best match for the channel's current read format; for the called party the assignment is reversed. This differentiated assignment keeps the codec directions of the two bridged parties consistent.

[0029] In one implementation of this invention, codec preference learning and prediction are based on historical negotiation records. A codec negotiation history statistics engine is introduced to maintain a sliding window of codec selection frequency statistics for each pair of communication endpoints. When a new SDP negotiation arrives, if multiple candidate codecs have similar scores, the codec with the highest historical success rate is prioritized. This mechanism is particularly suitable for optimizing codec selection between high-frequency call pairs within an enterprise, reducing the number of negotiation probes and lowering the initial media establishment delay.
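One way to picture the history-based tie-break is the Python sketch below. The class `NegotiationHistory`, the window size, the score tolerance that defines "similar scores", and the way candidates are scored are all assumptions; the text names the mechanism without giving these details.

```python
from collections import deque


class NegotiationHistory:
    """Sliding window of negotiation outcomes per endpoint pair (hypothetical)."""

    def __init__(self, window=50):
        self._window = window
        self._records = {}  # (endpoint_a, endpoint_b) -> deque of (codec, success)

    def record(self, pair, codec, success):
        # deque(maxlen=...) drops the oldest record once the window is full.
        self._records.setdefault(pair, deque(maxlen=self._window)).append((codec, success))

    def success_rate(self, pair, codec):
        outcomes = [ok for c, ok in self._records.get(pair, ()) if c == codec]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0


def pick_with_history(scored, history, pair, tolerance=0.05):
    """scored: list of (codec, score). Among candidates whose score is within
    `tolerance` of the best, return the one with the highest success rate."""
    best = max(score for _, score in scored)
    close = [codec for codec, score in scored if best - score <= tolerance]
    return max(close, key=lambda codec: history.success_rate(pair, codec))
```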

[0030] In one implementation of this invention, dynamic codec switching is based on network quality awareness. Network quality is continuously monitored using metrics such as packet loss rate, jitter, and round-trip latency from RTCP receiver reports. When network quality deteriorates beyond a threshold, the engine proactively initiates a reINVITE to downgrade the codec from a high-bandwidth one (e.g., G.711) to a low-bandwidth one (e.g., G.729 / Opus); when the network recovers, it automatically upgrades back to the high-bandwidth codec. The three-dimensional negotiation framework inherently supports this type of dynamic switching, and the session-level joint set ensures that the codec switched to remains within the range supported by the peer, so no full renegotiation is needed.

[0031] S102: Implement a multi-priority optimal codec selector that performs a four-level priority cascade decision on the codec entries in the SDP media description line and performs a two-dimensional payload type verification on each candidate codec to determine its validity.

[0032] In this embodiment of the invention, the first priority is to find a codec that is compatible with both the local format set of the channel and the format set of the peer; the second priority is to select the first supported video codec of the called end; the third priority is to select the first audio codec of the calling end that is compatible with the local format set of the channel; and the fourth priority is to select the first valid codec as a fallback. Forward verification uses the rtpmap attribute to find the codec format name corresponding to the payload type; reverse verification uses the codec format name to look up the expected payload type number; when the matching results of forward and reverse verification are consistent, the codec is considered valid.

[0033] Specifically, an optimal codec decision algorithm based on multi-priority cascaded selection and dual-dimensional verification of payload type is proposed. A multi-priority optimal codec selector is implemented, performing a four-level priority cascaded decision for each codec entry in the SDP media description line. The multi-priority cascaded selection is performed within the SDP response processing path, searching the entire set of peer SDP formats. The first priority is a dual-hit condition: "simultaneously existing in both the channel's local format set and the peer's format set," not searching within the joint set.

[0034] First priority (reINVITE best hold): Traverse the codec list in the peer's SDP and find the first codec that is compatible with both the channel's local format set and the peer's format set. This codec represents "peer preferred and currently in use in the call," and a short-circuit return is made upon finding it.

[0035] Second priority (video called party priority): For video media streams, select the first supported codec on the called party to ensure that the called party's codec preferences are respected during video calls.

[0036] Third priority (Audio caller priority): For audio media streams, select the first codec compatible with the local format set on the calling end.

[0037] Fourth priority (fallback selection): If none of the above three levels are met, the first valid codec is selected as the final fallback.
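The four levels can be sketched as a cascade that returns at the first level that produces a codec. In this Python sketch a codec entry is a `(name, fmtp)` tuple, level 1 requires an exact entry match, and "supported" or "compatible" at levels 2 and 3 is interpreted as a match on codec name only. These interpretations, and all names in the code, are assumptions; the text does not define compatibility precisely.

```python
def select_codec(sdp_entries, peer_formats, channel_formats, endpoint_names,
                 media, is_caller, is_valid=lambda entry: True):
    """Return (level, entry) for the chosen codec, or (0, None) if none is valid.

    sdp_entries:     (name, fmtp) tuples in the order of the peer's SDP m-line
    peer_formats:    entries of the peer format set
    channel_formats: entries of the channel local format set
    endpoint_names:  codec names the local endpoint supports (hypothetical)
    """
    valid = [entry for entry in sdp_entries if is_valid(entry)]
    channel_names = {name for name, _ in channel_formats}

    # Level 1: in use on the channel and offered by the peer; short-circuit.
    for entry in valid:
        if entry in channel_formats and entry in peer_formats:
            return 1, entry
    # Level 2: video, called party: first codec the endpoint supports.
    if media == "video" and not is_caller:
        for entry in valid:
            if entry[0] in endpoint_names:
                return 2, entry
    # Level 3: audio, calling party: first codec compatible with the channel set.
    if media == "audio" and is_caller:
        for entry in valid:
            if entry[0] in channel_names:
                return 3, entry
    # Level 4: fallback to the first valid codec.
    return (4, valid[0]) if valid else (0, None)
```

The `is_valid` hook is where the payload type verification of each candidate would plug in.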

[0038] The selector performs two-dimensional payload type verification for each candidate codec: forward verification uses the rtpmap attribute to find the codec format name corresponding to the payload type; reverse verification uses the codec format name to find the expected payload type number. The codec is considered valid only when both sides match, preventing malicious or misconfigured SDP from mapping incorrect formats to standard payload types.
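A small Python sketch of the forward and reverse checks follows. The static table lists a few assignments from RFC 3551. Treating any codec name without a static assignment as valid only in the dynamic range 96 to 127, and accepting a static payload type that has no rtpmap line, are assumptions of this sketch rather than rules stated in the text.

```python
import re

# A few static payload type assignments from RFC 3551.
STATIC_PT = {"PCMU": 0, "GSM": 3, "PCMA": 8, "G722": 9, "G729": 18}

RTPMAP = re.compile(r"^a=rtpmap:(\d+)\s+([^/\s]+)/(\d+)", re.IGNORECASE)


def parse_rtpmap(sdp_lines):
    """Build {payload_type: CODEC_NAME} from a=rtpmap lines."""
    mapping = {}
    for line in sdp_lines:
        match = RTPMAP.match(line.strip())
        if match:
            mapping[int(match.group(1))] = match.group(2).upper()
    return mapping


def payload_is_valid(pt, rtpmap):
    name = rtpmap.get(pt)                  # forward: payload type -> format name
    if name is None:
        # No rtpmap line: only acceptable for a known static payload type.
        return pt in STATIC_PT.values()
    expected = STATIC_PT.get(name)         # reverse: format name -> expected number
    if expected is None:
        return 96 <= pt <= 127             # dynamic codecs must use the dynamic range
    return expected == pt


lines = [
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMU/8000",        # wrong: PCMU is statically 0, 8 is PCMA
    "a=rtpmap:111 opus/48000/2",
    "a=rtpmap:9 opus/48000/2",     # wrong: 9 is the static number for G722
]
table = parse_rtpmap(lines)
```

Here payload types 0 and 111 pass, while 8 and 9 are rejected because the name and the number disagree.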

[0039] S103: When the SIP session object carries a call switching identifier, the original channel local format set is extracted and cloned from the original call media channel associated with the call switching identifier as the negotiation benchmark. Priority is given to maintaining the same codec combination as the original call, and the profile-level-id protection mechanism is implemented for H.264 video codec.

[0040] In this embodiment of the invention, the profile-level-id parameter provided by the peer is extracted from the SDP fmtp attribute, and a numerical comparison is performed with the profile-level-id of the active H.264 instances in the current channel local format set. When the comparison results are inconsistent, the codec is marked as incompatible and forcibly skipped. In the call handover path, an instance that completely matches the original call profile-level-id is searched from the peer SDP's H.264 candidates, and the currently selected codec is replaced with the matching instance through the codec replacement mechanism.

[0041] Specifically, this involves a call handover scenario-aware codec context preservation and H.264 profile-level-id protection mechanism. When a SIP session object carries a call handover identifier, the codec decision engine initiates a special context preservation process: extracting and cloning the original local format set from the original call media channel associated with the call handover identifier, and using it as the benchmark for three-dimensional cross-operation. Even if the target terminal's codec capability set differs from the original terminal's, the negotiation result still prioritizes maintaining the codec combination consistent with the original call.

[0042] For H.264 video codecs, the engine implements a profile-level-id protection mechanism. In reINVITE scenarios, the H.264 profile-level-id parameter provided by the peer is extracted from the SDP fmtp attribute and compared with the profile-level-id of the active H.264 instance in the current local format set. If the two are inconsistent (meaning the peer is attempting to change the resolution level), the codec is marked as incompatible in the selection bitmap and forcibly skipped, which prevents unexpected video resolution degradation caused by a reINVITE. For the comparison, the hexadecimal profile-level-id value is parsed into an integer and shifted right by 16 bits to obtain the 8-bit profile_idc (Baseline=0x42 / Main=0x4D / High=0x64), which is then tested for equality as an unsigned integer. This naturally avoids problems caused by case, spaces, leading zeros, and other string format differences. This is an intentionally lenient strategy: only the profile category is checked, so different levels within the same profile can still be fine-tuned. It prevents catastrophic profile degradation while preserving negotiation flexibility at the level granularity.
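The profile_idc comparison can be written in a few lines of Python. The function names are illustrative, and the 8-bit mask is a defensive addition of this sketch.

```python
def profile_idc(profile_level_id: str) -> int:
    """Parse a hexadecimal profile-level-id and return its top byte (profile_idc)."""
    return (int(profile_level_id.strip(), 16) >> 16) & 0xFF


def same_profile(offered: str, active: str) -> bool:
    """True when both profile-level-id values belong to the same H.264 profile."""
    return profile_idc(offered) == profile_idc(active)
```

For instance, `same_profile("64001f", "640028")` holds because both values are High profile (0x64) and differ only in level, while `same_profile("640028", "42e01f")` does not, because the second value is Baseline (0x42).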

[0043] In the dedicated path for call handover, the engine searches all H.264 candidates in the peer SDP for an instance whose profile-level-id exactly matches that of the original call. If a match is found, the currently selected codec is replaced with the matching instance through a codec replacement mechanism, ensuring seamless continuation of video resolution across terminals. When none of the target terminal's H.264 candidates match the original call's profile-level-id, a three-level graceful degradation chain is adopted. First, all H.264 entries are skipped and the engine continues traversing the non-H.264 video codecs (VP8 / VP9, etc.) in the peer SDP, attempting to replace the video codec. Second, if there are no compatible non-H.264 entries, and the video intersection results of both the session-level joint set and the endpoint-level joint set in the main negotiation path are empty, the engine clears the video capability and this reINVITE abandons the video stream. Third, audio is unaffected, and the call continues in audio-only mode.
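The handover path and its degradation chain can be sketched as a single decision function. The tuple layout, the case-insensitive string comparison used for the exact match, the `joint_video` argument standing for the video codecs left in the joint sets, and the returned action labels are assumptions made for this Python illustration.

```python
def handover_video_choice(peer_video, original_plid, joint_video):
    """Decide what happens to the video stream on call handover.

    peer_video:    (codec_name, profile_level_id_or_None) tuples in peer SDP order
    original_plid: profile-level-id of the original call
    joint_video:   video codec names still available in the joint sets (hypothetical)
    """
    # Exact match: keep H.264 with the original profile-level-id.
    for name, plid in peer_video:
        if name == "h264" and plid is not None and plid.lower() == original_plid.lower():
            return "keep_h264", (name, plid)
    # Degradation 1: skip every H.264 entry and try other video codecs.
    for name, plid in peer_video:
        if name != "h264" and name in joint_video:
            return "switch_codec", (name, plid)
    # Degradation 2 and 3: drop video for this reINVITE; the call continues audio-only.
    return "audio_only", None
```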

[0044] In one implementation of this invention, H.264 profile-level-id adaptive selection is based on AI video quality assessment. A real-time video quality assessment module is introduced to analyze the PSNR / SSIM quality indicators of the current video frame and the actual rendering capabilities of the terminal. In call handover scenarios, the optimal profile-level-id within the actual capabilities of the target terminal is intelligently selected. If the target terminal's screen resolution is lower than the original call resolution, it is proactively downgraded to reduce bandwidth consumption; if the target terminal has stronger capabilities, it attempts to upgrade to improve image quality.

[0045] S104: Perform frame length mode negotiation for iLBC encoding and decoding. When the mode parameter is not carried in the fmtp attribute of the peer SDP, the default frame length is 30 milliseconds. When the peer carries mode as 20 and the endpoint is configured with a frame length of 20 milliseconds, accept it; otherwise, back off to a frame length of 30 milliseconds and inject the determined frame length mode into the attribute of the encoding and decoding format object.

[0046] In this embodiment of the invention, iLBC frame length mode RFC compliance negotiation and cross-channel mode propagation are implemented. Strict RFC 3952 compliance negotiation is achieved for iLBC encoding and decoding. When the peer SDP's fmtp attribute does not carry a mode parameter, the engine follows the RFC specification and defaults to mode=30 (30 millisecond frame length), instead of silently using the endpoint's local configuration value. When the peer carries mode=20, the engine only accepts it if the endpoint is also configured with mode=20; otherwise, it falls back to mode=30.
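The mode decision above reduces to a small function. The following Python sketch is illustrative only; `negotiate_ilbc_mode` and its parameters are hypothetical names, and the fmtp parsing assumes a simple semicolon-separated `key=value` parameter string.

```python
from typing import Optional


def negotiate_ilbc_mode(peer_fmtp: Optional[str], endpoint_mode: int) -> int:
    """Return the negotiated iLBC frame length in milliseconds (20 or 30)."""
    peer_mode = None
    for param in (peer_fmtp or "").split(";"):
        key, _, value = param.partition("=")
        if key.strip().lower() == "mode" and value.strip().isdigit():
            peer_mode = int(value.strip())
    if peer_mode is None:
        return 30  # no mode parameter offered: default to 30 ms
    if peer_mode == 20 and endpoint_mode == 20:
        return 20  # accepted only when both sides agree on 20 ms
    return 30      # every other combination falls back to 30 ms
```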

[0047] After the mode determination is completed, the engine injects the determined iLBC frame length mode into the properties of the codec format object and simultaneously sets the frame interval parameter of the capability set container to ensure that the iLBC frame lengths of the local and remote ends are completely consistent. In bridging scenarios, the negotiated mode is synchronized to the bridging peer through the format set propagation mechanism of the peer media channel to ensure that the transcoder obtains the correct parameters when processing frame length conversion.

[0048] S105: During the inbound SDP negotiation phase, the address field in the SDP connection information line is parsed to identify IPv6 addresses. During the outbound SDP generation phase, a three-level cascaded address family probe is performed to determine the address family of the RTP instance.

[0049] In this embodiment of the invention, the first level dynamically detects whether the transport layer address field of the session contact object contains IPv6 features; the second level degrades to the IPv6 flag bit of the contact URI when the transport layer address is empty; and the third level ultimately falls back to the static endpoint configuration when the contact object is unavailable.

[0050] Specifically, dual-stack adaptive RTP instance creation is based on contact address detection and IPv4-mapped IPv6 identification. During the inbound SDP negotiation phase, the address field in the SDP connection information line is parsed. By detecting colon characters in the string and excluding IPv4-mapped IPv6 prefixes (mixed addresses of the form ::ffff:a.b.c.d), true IPv6 addresses are accurately identified, and the correct address family is selected to create the RTP transport instance.
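A minimal Python sketch of the inbound check follows. It uses the standard `ipaddress` module instead of raw prefix matching, which is a substitution made for this example rather than the engine's own method; the function name `sdp_address_family` is hypothetical, and treating an unparsable string as IPv4 is an assumption.

```python
import ipaddress


def sdp_address_family(connection_address: str) -> str:
    """Classify the address from an SDP c= line as "IP4" or "IP6"."""
    if ":" not in connection_address:
        return "IP4"  # no colon: cannot be an IPv6 literal
    try:
        addr = ipaddress.IPv6Address(connection_address)
    except ValueError:
        return "IP4"  # assumption: unparsable input falls back to IPv4
    if addr.ipv4_mapped is not None:
        return "IP4"  # ::ffff:a.b.c.d is really an IPv4 peer
    return "IP6"
```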

[0051] During the outbound SDP generation phase, instead of relying on static IPv6 configuration at the endpoint, a three-level cascaded address family detection is implemented: the first level dynamically detects whether the transport layer address field of the session contact object contains IPv6 characteristics; the second level, when the transport layer address is empty, degrades to the IPv6 flag bit of the contact URI; and the third level, when the contact object is unavailable, ultimately falls back to the static endpoint configuration. This three-level cascading ensures that the address family in the SDP connection information line is strictly consistent with the network layer actually used by the peer.
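The three-level cascade can be sketched as follows. The `Contact` type and its fields are hypothetical stand-ins for the session contact object, and the sketch assumes the transport layer address is stored as a bare host without a port, so that a colon can only come from an IPv6 literal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    transport_addr: str = ""   # host only, no port (assumption)
    uri_is_ipv6: bool = False  # IPv6 flag carried by the contact URI


def outbound_uses_ipv6(contact: Optional[Contact], endpoint_ipv6: bool) -> bool:
    if contact is None:
        return endpoint_ipv6               # level 3: static endpoint configuration
    if contact.transport_addr:
        return ":" in contact.transport_addr  # level 1: transport layer address
    return contact.uri_is_ipv6             # level 2: contact URI flag
```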

[0052] S106: When creating a WebRTC Bundle media stream, the engine iterates through all created media streams in the same group in the current SDP and compares the SSRC values of each stream; if a conflict is detected, it regenerates the SSRC and resets the traversal index to the starting position to re-detect, until the SSRCs of all media streams in the group are globally unique.

[0053] In this embodiment of the invention, the uniqueness of WebRTC media streams is guaranteed by traversal-based SSRC conflict detection within the Bundle group, combined with index reset and retry. When creating an outbound SDP media stream, if it is detected that the current media stream belongs to a WebRTC Bundle group and has not yet been bound, the SSRC conflict detection process is initiated: all created media streams in the same group in the current SDP are traversed, and the SSRC values of each stream are compared one by one. If a conflict is detected (two streams have the same SSRC), the source change interface of the RTP transmission instance is immediately called to regenerate the SSRC, and the traversal index is reset to the starting position to restart the conflict detection. This "detect-regenerate-redetect" loop ensures that the SSRCs of all media streams in the final Bundle group are globally unique, meeting the strict multiplexing requirements of WebRTC.
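The "detect-regenerate-redetect" loop can be sketched in Python as below. The function `ensure_unique_ssrc` and the `regenerate` callback are hypothetical; in the engine, regeneration is performed by the source change interface of the RTP instance, which this sketch replaces with a random 32-bit value by default.

```python
import random
from typing import Callable, Sequence


def ensure_unique_ssrc(new_ssrc: int,
                       group_ssrcs: Sequence[int],
                       regenerate: Callable[[], int] = lambda: random.getrandbits(32)) -> int:
    """Return an SSRC that differs from every SSRC already in the Bundle group."""
    index = 0
    while index < len(group_ssrcs):
        if group_ssrcs[index] == new_ssrc:
            new_ssrc = regenerate()  # conflict: pick a new SSRC
            index = 0                # restart the scan from the first stream
            continue
        index += 1
    return new_ssrc
```

Resetting the index matters because a regenerated value may collide with a stream that was already passed earlier in the scan.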

[0054] S107: When a call is relayed through an FQDN domain name resolution server and the called party returns a 200 OK response, comfort noise data packets are sent synchronously on the audio and video RTP ports. The packets are repeatedly sent at preset intervals by the scheduler. When the remote media stream has been received, the timed sending is automatically stopped and the scheduler resources are released.

[0055] In this embodiment of the invention, the scheduler repeatedly sends comfort noise packets at 200-millisecond intervals; before each transmission, the most recent transmission timestamp of the RTP transmission instance is checked; when the most recent transmission timestamp is not zero, it is determined that the remote media stream has been received, the timed transmission is automatically stopped and the scheduler resources are released.

[0056] Specifically, this is CNG-based timed NAT hole punching for FQDN domain name calls, with adaptive stopping. When a call is relayed via an FQDN domain name resolution server, the called party's 200 OK response triggers the engine to send Comfort Noise (CNG) packets on both the audio and video RTP ports simultaneously. The scheduler repeatedly sends CNG packets at 200-millisecond intervals and checks the most recent transmission timestamp of the RTP transmission instance before each transmission; if the remote media stream has been received (timestamp non-zero), the timed transmission stops automatically and the scheduler resources are released. This mechanism ensures that the NAT mapping is established in both directions as soon as media negotiation completes, preventing the loss of video keyframes (I-frames) caused by a missing NAT mapping, which could otherwise leave the calling party with a prolonged black screen.
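The send-and-check cycle can be sketched without real timers as follows. The names `run_hole_punch`, `send_cng`, and `remote_media_seen` are hypothetical; `remote_media_seen` stands in for the non-zero timestamp check on the RTP instance, and the `max_ticks` bound is an assumption added so the sketch always terminates.

```python
from typing import Callable

CNG_INTERVAL_MS = 200  # interval stated in the description


def run_hole_punch(send_cng: Callable[[str], None],
                   remote_media_seen: Callable[[], bool],
                   max_ticks: int = 50) -> int:
    """Send CNG on both ports each tick until remote media arrives.

    Returns the number of ticks on which packets were sent.
    """
    ticks = 0
    while ticks < max_ticks:
        if remote_media_seen():
            break              # adaptive stop: release the scheduler entry
        send_cng("audio")
        send_cng("video")
        ticks += 1
        # A real scheduler would wait CNG_INTERVAL_MS before the next tick.
    return ticks
```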

[0057] Beneficial effects: The three-dimensional capability set cross-operation eliminates the risk of codec degradation in reINVITE scenarios. Traditional solutions only perform a two-dimensional cross-operation between the endpoint configuration and the peer-provided codecs during each SDP negotiation, ignoring the codec context already established in the current call. This invention introduces the channel local format set as a third dimension, prioritizing the preservation of currently active codecs during reINVITE negotiation. Degradation to endpoint-level negotiation occurs only if the current codec has been removed by the peer, which avoids unnecessary transcoder creation and destruction and media stream interruptions. This significantly improves media continuity in high-frequency scenarios such as call hold and resume, call transfer, and conference joining.
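The difference between the two-dimensional and three-dimensional operation can be shown with a short Python sketch. Codec names are plain strings here, `negotiate` and `intersect` are hypothetical helpers, and keeping the first argument's preference order in the result is an assumption of the example.

```python
from typing import List


def intersect(preferred: List[str], offered: List[str]) -> List[str]:
    """Keep the codecs present in both lists, in the order of `preferred`."""
    offered_set = set(offered)
    return [codec for codec in preferred if codec in offered_set]


def negotiate(endpoint_caps: List[str],
              peer_formats: List[str],
              channel_local: List[str]) -> List[str]:
    if channel_local:  # non-empty channel local set means reINVITE mode
        session_level = intersect(channel_local, peer_formats)
        if session_level:
            return session_level  # keep the codecs already in use
    # First INVITE, or the peer removed every active codec.
    return intersect(endpoint_caps, peer_formats)
```

With an active call on `pcmu`, a reINVITE that still offers `pcmu` keeps it even when the endpoint would otherwise prefer `opus`, so no transcoder needs to be rebuilt.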

[0058] This invention achieves seamless media continuity across devices by preserving the codec context during call handover and protecting the H.264 profile-level-id. Traditional solutions completely discard the original call codec state during device handover. This invention automatically clones the codec context from the original channel using a call handover identifier as a negotiation benchmark, and a precise H.264 profile-level-id matching and replacement mechanism prevents unexpected video resolution degradation. Users can maintain 1080p video quality even when switching from a mobile phone to a desktop client, achieving true seamless media continuity across devices.

[0059] Dual-stack adaptive RTP instance creation eliminates media establishment failures in IPv4 / IPv6 hybrid environments. Traditional solutions rely on static IPv6 configuration of endpoints, which frequently results in address family mismatches due to the complexity of real-world network environments. This invention dynamically determines the optimal address family through a three-level cascaded address family probing (transport layer address → URI flags → endpoint configuration), supplemented by precise IPv4-mapped IPv6 identification and filtering, ensuring that the address family of the RTP instance is strictly consistent with the actual network layer of the peer, significantly improving the media establishment success rate in IPv4 / IPv6 hybrid deployment environments.

[0060] Automatic SSRC conflict detection and regeneration within Bundle groups eliminates decoding chaos in WebRTC multimedia streams. Traditional solutions do not perform conflict detection on SSRCs within a Bundle group. This invention ensures the global uniqueness of the SSRC for all streams in the same group through traversal detection and index-based retrying, meeting the strict requirements of the WebRTC specification for multiplexing and fundamentally eliminating audio and video decoding chaos caused by SSRC conflicts.

[0061] FQDN NAT traversal and hole punching eliminates the video black screen problem caused by the loss of the first frame in a domain name call. Traditional solutions do not actively establish NAT mapping in domain name relay call scenarios. This invention synchronously sends CNG packets to the audio and video ports when a 200 OK response is triggered, continuously traversing at 200 millisecond intervals until the remote media stream is received, ensuring that the video I-frame is delivered as soon as NAT is established, eliminating the long black screen issue for the calling end.

[0062] Based on the methods provided in the above embodiments, this invention also provides a SIP media codec intelligent negotiation and cross-terminal seamless continuation device. The SIP media codec intelligent negotiation and cross-terminal seamless continuation device is described below with reference to the accompanying drawings.

[0063] See Figure 2, which is a schematic diagram of the structure of a SIP media codec intelligent negotiation and cross-terminal seamless continuation device provided in an embodiment of the present invention.

[0064] The SIP media codec intelligent negotiation and cross-terminal seamless continuation device 200 provided in this embodiment of the invention includes: a capability set maintenance unit 201, a negotiation mode differentiation unit 202, a codec selection unit 203, a call handover processing unit 204, an iLBC negotiation unit 205, an address family detection unit 206, an SSRC conflict detection unit 207, and a NAT traversal unit 208.

[0065] The capability set maintenance unit 201 is used to maintain an endpoint configuration capability set, a peer format set, and a channel local format set for each SIP session. The negotiation mode differentiation unit 202 is used to detect whether the channel local format set is empty in order to distinguish between the first INVITE negotiation mode and the reINVITE negotiation mode. In the reINVITE mode, the channel local format set and the peer format set are intersected to generate a session-level union set. When the session-level union set is not empty, it is used preferentially; when it is empty, negotiation is downgraded to the endpoint-level union set generated by intersecting the endpoint configuration capability set and the peer format set. The codec selection unit 203 is used to implement a multi-priority optimal codec selector, perform a four-level priority cascaded decision on the codec entries in the SDP media description line, and perform a two-dimensional payload type verification for each candidate codec to determine the validity of the codec. The call handover processing unit 204 is used, when the SIP session object carries a call handover identifier, to extract and clone the original channel local format set from the original call media channel associated with that identifier as the negotiation benchmark, to prioritize a codec combination consistent with the original call, and to apply the profile-level-id protection mechanism to the H.264 video codec. The iLBC negotiation unit 205 is used to perform frame length mode negotiation for the iLBC codec: when the mode parameter is not carried in the fmtp attribute of the peer SDP, the default frame length is 30 milliseconds; when the peer carries mode 20 and the endpoint is configured with a frame length of 20 milliseconds, it is accepted; otherwise, the unit falls back to a frame length of 30 milliseconds, and the determined frame length mode is injected into the attributes of the codec format object. The address family detection unit 206 is used to parse the address field in the SDP connection information line to identify IPv6 addresses during the inbound SDP negotiation phase, and to perform three-level cascaded address family detection during the outbound SDP generation phase to determine the address family of the RTP instance. The SSRC conflict detection unit 207 is used, when creating a WebRTC Bundle group media stream, to traverse all created media streams in the same group in the current SDP, compare the SSRC values of each stream, regenerate the SSRC when a conflict is detected, and reset the traversal index to the starting position to re-detect until the SSRCs of all media streams in the group are globally unique. The NAT traversal unit 208 is used to send comfort noise data packets synchronously on the audio and video RTP ports when a call is relayed through an FQDN domain name resolution server and the called party returns a 200 OK response. The packets are sent repeatedly at preset intervals by the scheduler; when the remote media stream has been received, the timed sending is automatically stopped and the scheduler resources are released.

[0066] In one possible implementation, the endpoint configuration capability set, the peer format set, and the channel local format set adopt a dual-index architecture, which includes a direct-address array with codec identifiers as keys and a preference sequence vector arranged in insertion order.
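A minimal Python sketch of such a dual-index container follows. The class name `FormatSet`, its methods, and the fixed slot count of 64 are assumptions made for illustration: the array gives constant-time membership tests by codec identifier, while the order vector remembers preference.

```python
from typing import List, Optional


class FormatSet:
    def __init__(self, max_codec_id: int = 64) -> None:
        # Direct-address array: slot i holds the codec with identifier i.
        self._slots: List[Optional[str]] = [None] * max_codec_id
        # Preference vector: codec identifiers in insertion order.
        self._order: List[int] = []

    def add(self, codec_id: int, name: str) -> None:
        if self._slots[codec_id] is None:
            self._order.append(codec_id)
        self._slots[codec_id] = name

    def contains(self, codec_id: int) -> bool:
        return 0 <= codec_id < len(self._slots) and self._slots[codec_id] is not None

    def preference_order(self) -> List[Optional[str]]:
        return [self._slots[i] for i in self._order]

    def intersect(self, other: "FormatSet") -> "FormatSet":
        """Codecs present in both sets, in this set's preference order."""
        result = FormatSet(len(self._slots))
        for codec_id in self._order:
            if other.contains(codec_id):
                result.add(codec_id, self._slots[codec_id])
        return result
```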

[0067] In one possible implementation, the codec selection unit 203 is specifically used for the following: the first priority is to find a codec that is compatible with both the channel local format set and the peer format set; the second priority is to select the first supported video codec on the called side; the third priority is to select the first audio codec on the calling side that is compatible with the channel local format set; the fourth priority is to select the first valid codec as a fallback.

[0068] In one possible implementation, the codec selection unit 203 is specifically used for the following: forward verification uses the rtpmap attribute to find the codec format name corresponding to the payload type; reverse verification looks up the expected payload type number using the codec format name; the codec is considered valid when the results of forward and reverse verification match.
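The forward and reverse checks can be sketched as below. How the engine derives the "expected" payload type is not specified in the description, so the `name_to_pt` table passed in here is an assumption; the example values are the static assignments for PCMU, PCMA, and G722 from RFC 3551. The function name `is_valid_codec` is hypothetical.

```python
from typing import Dict


def is_valid_codec(payload_type: int,
                   rtpmap: Dict[int, str],
                   name_to_pt: Dict[str, int]) -> bool:
    # Forward verification: payload type -> codec name via a=rtpmap.
    entry = rtpmap.get(payload_type)
    if entry is None:
        return False
    name = entry.split("/", 1)[0].upper()  # "PCMU/8000" -> "PCMU"
    # Reverse verification: codec name -> expected payload type.
    expected = name_to_pt.get(name)
    return expected == payload_type
```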

[0069] In one possible implementation, the call handover processing unit 204 is specifically used for the following: extracting the profile-level-id parameter provided by the peer from the SDP fmtp attribute and performing a numerical comparison with the profile-level-id of the active H.264 instance in the current channel local format set; if the comparison results are inconsistent, marking the codec as incompatible and forcibly skipping it; and, in the call handover path, searching the H.264 candidates of the peer SDP for an instance that exactly matches the original call's profile-level-id and replacing the currently selected codec with the matching instance through the codec replacement mechanism.

[0070] In one possible implementation, the address family detection unit 206 is specifically used for the following: the first level dynamically detects whether the transport layer address field of the session contact object contains IPv6 characteristics; the second level degrades to the IPv6 flag of the contact URI when the transport layer address is empty; the third level ultimately falls back to the static endpoint configuration when the contact object is unavailable.

[0071] In one possible implementation, the NAT traversal unit 208 is specifically used for the following: comfort noise packets are sent repeatedly at 200-millisecond intervals via the scheduler; before each transmission, the most recent transmission timestamp of the RTP transport instance is checked; when the most recent transmission timestamp is non-zero, it is determined that the remote media stream has been received, the timed sending is automatically stopped, and the scheduler resources are released.

[0072] Since the SIP media codec intelligent negotiation and cross-terminal seamless continuation device 200 is a device corresponding to the SIP media codec intelligent negotiation and cross-terminal seamless continuation method provided in the above method embodiments, the specific implementation of each unit of the SIP media codec intelligent negotiation and cross-terminal seamless continuation device 200 is based on the same concept as in the above method embodiments. Therefore, for the specific implementation of each unit of the SIP media codec intelligent negotiation and cross-terminal seamless continuation device 200, please refer to the description of the SIP media codec intelligent negotiation and cross-terminal seamless continuation method in the above method embodiments, and will not be repeated here.

[0073] This invention also provides a SIP media codec intelligent negotiation and cross-terminal seamless continuation device, the device comprising a processor and a memory. The memory is used to store instructions; the processor is used to execute the instructions in the memory to perform the SIP media codec intelligent negotiation and cross-terminal seamless continuation method mentioned in the above embodiments.

[0074] It should be noted that the hardware structure of the SIP media codec intelligent negotiation and cross-terminal seamless continuation device provided in the embodiments of the present invention can be the structure shown in Figure 3, which is a schematic diagram of the structure of a device provided in an embodiment of the present invention.

[0075] As shown in Figure 3, device 300 includes a processor 310, a communication interface 320, and a memory 330. The number of processors 310 in device 300 can be one or more; Figure 3 takes one processor as an example. In this embodiment of the invention, the processor 310, communication interface 320, and memory 330 can be connected via a bus system or other means; Figure 3 takes connection via bus system 340 as an example.

[0076] Processor 310 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. Processor 310 may further include hardware chips. These hardware chips may be application-specific integrated circuits (ASICs), programmable logic devices (PLDs), or combinations thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

[0077] The memory 330 may include volatile memory, such as random-access memory (RAM); the memory 330 may also include non-volatile memory, such as flash memory, hard disk drive (HDD) or solid-state drive (SSD); the memory 330 may also include a combination of the above types of memory.

[0078] Optionally, the memory 330 stores an operating system and programs, executable modules, or data structures, or subsets thereof, or extended sets thereof. The programs may include various operation instructions for implementing various operations. The operating system may include various system programs for implementing various basic services and handling hardware-based tasks. The processor 310 can read the programs in the memory 330 to implement the SIP media codec intelligent negotiation and cross-terminal seamless continuation method provided in this embodiment of the invention.

[0079] The bus system 340 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The bus system 340 can be divided into address bus, data bus, control bus, etc. For ease of representation, the bus in Figure 3 is represented by a single thick line, but this does not mean that there is only one bus or one type of bus.

[0080] This invention also provides a computer-readable storage medium including instructions that, when executed on a computer, cause the computer to perform the SIP media codec intelligent negotiation and cross-terminal seamless continuation method mentioned in the above embodiments.

[0081] This invention also provides a computer program product containing instructions that, when run on a computer, cause the computer to execute the SIP media codec intelligent negotiation and cross-terminal seamless continuation method mentioned in the above embodiments.

Claims

1. A method for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs, characterized in that the method comprises: for each SIP session, maintaining an endpoint configuration capability set, a peer format set, and a channel local format set; detecting whether the channel local format set is empty to distinguish between the initial INVITE negotiation mode and the reINVITE negotiation mode; in the reINVITE mode, intersecting the channel local format set with the peer format set to generate a session-level union set, using the session-level union set preferentially when it is not empty, and, when it is empty, downgrading to the endpoint-level union set generated by intersecting the endpoint configuration capability set with the peer format set; implementing a multi-priority optimal codec selector that performs a four-level priority cascaded decision on the codec entries in the SDP media description line and performs a two-dimensional payload type verification for each candidate codec to determine the validity of the codec; when a SIP session object carries a call handover identifier, extracting and cloning the original channel local format set associated with the call handover identifier as a negotiation benchmark, prioritizing the same codec combination as the original call, and applying a profile-level-id protection mechanism to the H.264 video codec; performing frame length mode negotiation for the iLBC codec, wherein the default frame length is 30 milliseconds when the fmtp attribute of the peer SDP does not carry the mode parameter, mode 20 is accepted when the peer carries mode 20 and the endpoint is configured with a frame length of 20 milliseconds, and otherwise the frame length falls back to 30 milliseconds, the determined frame length mode being injected into the attributes of the codec format object; during the inbound SDP negotiation phase, parsing the address field in the SDP connection information line to identify IPv6 addresses, and, during the outbound SDP generation phase, performing a three-level cascaded address family probe to determine the address family of the RTP instance; when creating a WebRTC Bundle media stream, iterating through all created media streams in the same group in the current SDP, comparing the SSRC values of each stream, and, if a conflict is detected, regenerating the SSRC and resetting the traversal index to the starting position to re-detect until the SSRCs of all media streams in the group are globally unique; and, when a call is relayed via an FQDN domain name resolution server and the called party returns a 200 OK response, sending comfort noise data packets synchronously on the audio and video RTP ports, sending them repeatedly at preset intervals via the scheduler, and, once the remote media stream has been received, automatically stopping the timed sending and releasing the scheduler resources.

2. The method according to claim 1, characterized in that the endpoint configuration capability set, the peer format set, and the channel local format set adopt a dual-index architecture, which includes a direct-address array with codec identifiers as keys and a preference sequence vector arranged in insertion order.

3. The method according to claim 1, characterized in that the four-level priority cascaded decision includes: the first priority is to find a codec that is compatible with both the channel local format set and the peer format set; the second priority is to select the first supported video codec on the called side; the third priority is to select the first audio codec on the calling side that is compatible with the channel local format set; the fourth priority is to select the first valid codec as a fallback.

4. The method according to claim 1, characterized in that the two-dimensional payload type verification for each candidate codec includes: forward verification, which uses the rtpmap attribute to find the codec format name corresponding to the payload type; and reverse verification, which looks up the expected payload type number using the codec format name; the codec being considered valid when the results of forward and reverse verification match.

5. The method according to claim 1, characterized in that applying the profile-level-id protection mechanism to the H.264 video codec includes: extracting the profile-level-id parameter provided by the peer from the SDP fmtp attribute and performing a numerical comparison with the profile-level-id of the active H.264 instance in the current channel local format set; if the comparison results are inconsistent, marking the codec as incompatible and forcibly skipping it; and, in the call handover path, searching the H.264 candidates of the peer SDP for an instance that exactly matches the original call's profile-level-id and replacing the currently selected codec with the matching instance through the codec replacement mechanism.

6. The method according to claim 1, characterized in that performing the three-level cascaded address family probe includes: the first level dynamically detects whether the transport layer address field of the session contact object contains IPv6 characteristics; the second level degrades to the IPv6 flag of the contact URI when the transport layer address is empty; the third level ultimately falls back to the static endpoint configuration when the contact object is unavailable.

7. The method according to claim 1, characterized in that sending repeatedly at preset intervals via the scheduler includes: sending comfort noise packets repeatedly at 200-millisecond intervals via the scheduler; before each transmission, checking the most recent transmission timestamp of the RTP transport instance; and, when the most recent transmission timestamp is not zero, determining that the remote media stream has been received, automatically stopping the timed sending, and releasing the scheduler resources.

8. A device for intelligent negotiation and seamless cross-terminal continuation of SIP media codecs, characterized in that the device comprises: a capability set maintenance unit, used to maintain an endpoint configuration capability set, a peer format set, and a channel local format set for each SIP session; a negotiation mode differentiation unit, used to detect whether the channel local format set is empty to distinguish between the initial INVITE negotiation mode and the reINVITE negotiation mode, wherein, in the reINVITE mode, the channel local format set and the peer format set are intersected to generate a session-level union set, the session-level union set is used preferentially when it is not empty, and, when it is empty, negotiation is downgraded to the endpoint-level union set generated by intersecting the endpoint configuration capability set and the peer format set; a codec selection unit, used to implement a multi-priority optimal codec selector that performs a four-level priority cascaded decision on the codec entries in the SDP media description line and performs a two-dimensional payload type verification for each candidate codec to determine the validity of the codec; a call handover processing unit, used, when the SIP session object carries a call handover identifier, to extract and clone the original channel local format set from the original call media channel associated with the call handover identifier as the negotiation benchmark, to prioritize a codec combination consistent with the original call, and to apply a profile-level-id protection mechanism to the H.264 video codec; an iLBC negotiation unit, used to negotiate the frame length mode for the iLBC codec, wherein the default frame length is 30 milliseconds when the mode parameter is not carried in the fmtp attribute of the peer SDP, mode 20 is accepted when the peer carries mode 20 and the endpoint is configured with a frame length of 20 milliseconds, and otherwise the frame length falls back to 30 milliseconds, the determined frame length mode being injected into the attributes of the codec format object; an address family detection unit, used to parse the address field in the SDP connection information line to identify IPv6 addresses during the inbound SDP negotiation phase, and to perform three-level cascaded address family detection during the outbound SDP generation phase to determine the address family of the RTP instance; an SSRC conflict detection unit, used, when creating a WebRTC Bundle group media stream, to traverse all created media streams in the same group in the current SDP, compare the SSRC values of each stream, regenerate the SSRC when a conflict is detected, and reset the traversal index to the starting position to re-detect until the SSRCs of all media streams in the group are globally unique; and a NAT traversal unit, used to send comfort noise data packets synchronously on the audio and video RTP ports when a call is relayed through an FQDN domain name resolution server and the called party returns a 200 OK response, the packets being sent repeatedly at preset intervals by the scheduler, and, when the remote media stream has been received, the timed sending being automatically stopped and the scheduler resources released.

9. An electronic device, characterized in that the device includes a processor and a memory; the memory is used to store instructions; the processor is configured to execute the instructions in the memory to perform the method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that it includes instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1-7.