[0039] To further explain the technical means adopted by the present invention to achieve its intended purpose, and the effects obtained, the specific implementation, methods, steps and effects of the crowdsourced content distribution network proposed by the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments.
[0040] The foregoing and other technical contents, features and effects of the present invention will become clear from the following detailed description of the preferred embodiments with reference to the drawings. The description of the specific embodiments allows a deeper and more concrete understanding of the technical means adopted by the present invention and of the effects obtained. The accompanying drawings, however, are provided only for reference and illustration and are not intended to limit the present invention.
[0041] Unless otherwise specified, throughout the specification and claims, "include" and "comprise" mean "including but not limited to". "Connected" or its variants denotes a direct or indirect connection between two or more elements, modules or systems, which may be physical, logical, or a combination thereof. "/" means "or" and covers the following interpretations: any item in the list, all items in the list, and any combination of the items in the list. Words in the singular may also refer to the plural, and words in the plural may also refer to the singular.
[0042] One starting point of the present invention is to make rational use of the idle bandwidth and storage resources of devices near the user side (at zero-hop distance in the network topology). Such devices are usually not service devices that users operate directly, so users do not directly perceive this use, or at least are not sensitive to it, as long as task granularity, the distribution of bandwidth and I/O, and load are properly controlled. Another starting point is to separate signaling or metadata from (mass) data and to place more of the data on devices that are zero hops away from the user, so that storage, content distribution and probabilistically available services that would otherwise run on servers are hosted on the user side. Although the cost of bandwidth and storage per bit or byte has been declining slowly in recent years, data volumes are still growing super-linearly, so the cloud model that many Internet services have relied on for years will not remain sustainable. Many large providers have already seen the operating expenses driven by data scale grow faster than revenue. After the cloud, more local processing is therefore needed. The fog mode that the present invention essentially discloses can also be called a local cloud or crowd-cloud, and the Information Centric Network will play an important role in reducing cost and improving performance in most scenarios. A further starting point is to design an incentive mechanism, to discover more business models and business methods behind it, and to return part of the revenue from CDN and other services to the corresponding users, which in turn helps the system, method and equipment of the present invention to better cover the edge of the network.
[0043] Referring to FIG. 1, which is a working schematic diagram of a crowdsourced content distribution network according to an embodiment of the present invention, the network includes an edge portion 101 and a central portion 106. The edge portion 101 contains the working nodes, that is, devices or modules at zero topological distance from the owner's network, including household or commercial broadband routers, IPTV set-top boxes, network attached storage (NAS) devices with Internet access, companion robots with Internet access, and the like. In the scenario shown in FIG. 1, 102 and 104 are smart Wi-Fi routers with additional storage, and 103 and 105 are Web browsers or browser plug-ins that support the WebRTC protocol or the RTMFP protocol. The central portion 106 contains the coordinator module that schedules the tasks and traffic of the entire network. The coordinator further includes a STUN/TURN/Trickle ICE sub-module 107 and a smart DNS or dynamic DNS sub-module 108, which assist edge nodes in establishing P2P connections; a global load balancing (GLB) sub-module 109, built into the DNS module, which serves certain scenarios; and an ALTO sub-module 110 for accelerating the discovery, selection or search of nodes and/or resources. It may also include an indexing/routing sub-module that indexes some types of resources, maintains node relationships in some scenarios, and maintains some of the optimal search or transmission paths, distribution trees or graphs.
[0044] In other embodiments, 107, 108, and 109 may be deployed in both the central portion and the edge portion.
[0045] A management module manages the LAN-WAN access capable devices or modules at zero-hop topological distance from the owner's network, as well as the functional modules deployed on them. For example, the program deployed on devices 102 and 104 registers with 106 each time the device starts and/or at regular intervals, and sends the current state of the device, such as currently available bandwidth, available memory and disk storage, CPU and storage usage, CPU load and I/O load. In interactions that meet certain conditions, the message can carry the version numbers of specific or all edge processing modules deployed on the device. If the conditions are met, the corresponding sub-module of 106 returns a signaling message instructing the device to upgrade the corresponding edge processing module or to download and install a new processing module. Upgrades can use incremental update methods, such as binary differential updates or the Courgette algorithm.
[0046] The above management module can also notify devices 102 and 104 to update the operating system or system modules, to update the configuration of system programs and applications, and to restart the device. All of the above signaling must be encrypted in transit with a negotiated key (for example, one negotiated through IKE) to ensure security.
[0047] Web browsers or browser plug-ins such as 103 and 105 register with the corresponding sub-module of 106 each time the browser starts, the page opens, or the plug-in runs. During interaction with 106 and with any peer node, both parties monitor the communication status. As soon as a peer becomes unreachable, this is reported to the corresponding sub-module of 106, which deletes the node's record from the online node list it maintains or changes the state attribute of that record.
[0048] In the few scenarios with high service quality requirements, the corresponding modules of 103 or 105 can additionally send heartbeat signaling at short intervals (for example, every 10 to 20 s) as a safeguard.
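The registration, unreachability report and heartbeat behavior described above can be sketched as a small online node list. This is only an illustration: the class name `OnlineNodeList`, the 30 s timeout and the injectable clock are assumptions, not details given in the specification.

```python
import time


class OnlineNodeList:
    """Hypothetical online node list kept by a sub-module of 106."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self._timeout_s = timeout_s   # assumed: a few missed heartbeats
        self._clock = clock           # injectable so the logic can be tested
        self._last_seen = {}          # node id -> time of last contact

    def register(self, node_id):
        """Called when a browser starts, a page opens, or a plug-in runs."""
        self._last_seen[node_id] = self._clock()

    def heartbeat(self, node_id):
        """Called for each heartbeat signaling message (e.g. every 10 to 20 s)."""
        self._last_seen[node_id] = self._clock()

    def report_unreachable(self, node_id):
        """Called when a peer reports that the node cannot be reached."""
        self._last_seen.pop(node_id, None)

    def online_nodes(self):
        """Drop nodes whose last contact is older than the timeout."""
        now = self._clock()
        expired = [n for n, t in self._last_seen.items()
                   if now - t > self._timeout_s]
        for node_id in expired:
            del self._last_seen[node_id]
        return sorted(self._last_seen)
```

A record is removed either on an explicit unreachability report or when heartbeats stop arriving; changing a state attribute instead of deleting the record, as the text also allows, would be an equally valid variant.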
[0049] The working mode of the functional module deployed on a device or module, and the locations whose resources it indexes, are determined from the physical characteristics of the device or module and the distribution of its historical online duration. In general, a device with additional storage prefetches the resources likely to be needed in the next time slot, and a device without it does not prefetch. A device with large memory and a history of mostly long online sessions indexes not only its own resources but also the state information and resources of adjacent devices; otherwise it indexes only its own resources.
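The decision rule in this paragraph can be written down as a short function. The thresholds below (`min_memory_mb`, `long_session_hours`, and "mostly" meaning more than half of the recorded sessions) are assumptions chosen for illustration; the specification gives no numeric values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WorkingMode:
    prefetch: bool          # prefetch resources for the next time slot
    index_neighbors: bool   # also index adjacent devices' state and resources


def choose_working_mode(has_extra_storage: bool,
                        memory_mb: int,
                        session_hours: Sequence[float],
                        min_memory_mb: int = 512,
                        long_session_hours: float = 12.0) -> WorkingMode:
    """Pick a working mode from device traits and online history."""
    long_sessions = sum(1 for h in session_hours if h >= long_session_hours)
    # Assumed reading of "mostly long": more than half of the sessions.
    mostly_long = len(session_hours) > 0 and long_sessions * 2 > len(session_hours)
    return WorkingMode(
        prefetch=has_extra_storage,
        index_neighbors=memory_mb >= min_memory_mb and mostly_long,
    )
```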
[0050] When the service being served is a static resource, the edge nodes index it with a distributed hash table (DHT). The hash value is generally a hash of the full content of the file or a hash of the URL string. For the indexing method itself, one feasible embodiment is the Kademlia method based on XOR distance. When the service being served is live streaming media, the stream is divided into sub-streams with multiple distribution trees, using a method such as FastMesh or an approximation of it. When the distribution tree of each sub-stream is built, the heuristic used to select a parent node for each node j takes the form $\mathrm{Power}_{ij} = [\min(r_j, s)]^m / (d_{ij} + D_i)^n$, where $r_j$ is the remaining available bandwidth of node j, $s$ is the average bit rate of the sub-stream, $d_{ij}$ is the distance from node i to node j (usually measured by connection delay, and optionally a function of RTT and packet loss that expresses connectivity and throughput), $D_i$ is the longest distance from the source to node i, and m and n are positive real numbers used to adjust the dimension. Each time, the node with the largest value is selected as the parent node.
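The parent selection heuristic can be evaluated directly from its definition. In the sketch below the function names and the dictionary layout are hypothetical; each candidate simply carries the three measured quantities r_j, d_ij and D_i in the sense defined above, and the candidate with the largest Power value wins.

```python
def power(r_j, s, d_ij, big_d_i, m=1.0, n=1.0):
    """Power_ij = [min(r_j, s)]^m / (d_ij + D_i)^n."""
    total_distance = d_ij + big_d_i
    if total_distance <= 0:
        raise ValueError("d_ij + D_i must be positive")
    return min(r_j, s) ** m / total_distance ** n


def pick_parent(candidates, s, m=1.0, n=1.0):
    """Return the key of the candidate with the largest Power value.

    candidates maps a label to a tuple (r_j, d_ij, D_i); this layout is an
    assumption made for the example.
    """
    return max(
        candidates,
        key=lambda name: power(candidates[name][0], s,
                               candidates[name][1], candidates[name][2], m, n),
    )
```

Raising m favors candidates with more spare bandwidth, while raising n favors candidates with a shorter total distance, which is how the two exponents trade throughput against delay.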
[0051] Depending on the business scenario, resources such as Web pages are generally served over HTTP or HTTPS after redirection. For resources such as streaming media, transmission between intermediate nodes generally uses a UDP-based protocol, and the last segment that serves the user can use an HTTP-based protocol such as DASH or HLS.
[0052] In static resource and dynamic acceleration scenarios, the edge processing module can compress content to save further traffic, and can use dynamic dictionary compression to exploit the temporal locality of a specific service. In signaling acceleration scenarios, the edge processing module can compress the transmitted data and encrypt it with SSL. In streaming media scenarios, the edge processing module can multiplex and demultiplex, transcode and repackage media streams or media files, merge sub-streams, and split them into slices, frames or GoPs.
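As one possible illustration of dictionary-based compression, the sketch below uses the preset dictionary feature of Python's standard `zlib` module. This is a stand-in chosen for the example, not the compression scheme of the specification. The dictionary would be refreshed from recently seen traffic so that it tracks the temporal locality of the service; here it is just a recent message.

```python
import zlib


def compress_with_dictionary(payload: bytes, dictionary: bytes) -> bytes:
    """Deflate payload using a preset dictionary shared by both ends."""
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Inverse of compress_with_dictionary; needs the same dictionary."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


# Example: a recent status message serves as the dictionary for the next one.
recent = b'{"status": "ok", "node": "102", "bandwidth_kbps": 2048, "cpu_load": 0.40}'
message = b'{"status": "ok", "node": "104", "bandwidth_kbps": 1024, "cpu_load": 0.35}'
blob = compress_with_dictionary(message, recent)
```

Both ends must hold the same dictionary, so in practice the dictionary version would need to be negotiated in signaling before it is used.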
[0053] When the service being served is a static resource, indexing uses a distributed hash table (DHT), and the hash value is generally a hash of the full content of the file or a hash of the URL (uniform resource locator) string. For the indexing method itself, one feasible embodiment is the Kademlia method based on XOR distance.
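The XOR metric at the core of Kademlia is easy to show in a few lines. The sketch assumes 160-bit identifiers derived with SHA-1, as in the original Kademlia design; the helper names are hypothetical, and a real node would keep k-buckets rather than sort a flat list.

```python
import hashlib


def key_id(text: str) -> int:
    """160-bit identifier for a URL string (or any other key)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Index of the k-bucket for other_id: position of the highest differing bit."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("identifiers are identical")
    return distance.bit_length() - 1


def closest_nodes(target: int, node_ids, k: int = 3):
    """The k known nodes closest to target under the XOR metric."""
    return sorted(node_ids, key=lambda node: node ^ target)[:k]
```

A resource is stored on, and looked up from, the nodes whose identifiers are closest to `key_id(url)`.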
[0054] Resource search in time-insensitive scenarios can use the DHT. Resource search in time-sensitive scenarios can query the local and neighbor indexes, the DHT, and the central index in parallel.
[0055] Devices 102 and 104 are deployed with modules that estimate the end-to-end available bandwidth from the RTT and packet loss rate of a connection, and that periodically measure CPU usage, CPU load, available memory and available disk storage and report them to module 106.
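The specification does not say which estimator maps RTT and packet loss to bandwidth. One well-known model is the Mathis approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) · C / √p with C ≈ √(3/2). The sketch below uses it purely as an assumed example; the default MSS of 1460 bytes is also an assumption.

```python
import math


def estimate_bandwidth_bps(rtt_s: float, loss_rate: float,
                           mss_bytes: int = 1460,
                           c: float = math.sqrt(1.5)) -> float:
    """Mathis-style throughput estimate in bits per second."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)
```

With a 100 ms RTT and 1% loss this gives roughly 1.4 Mbit/s. The model applies to loss-limited TCP flows, so a deployment over UDP-based transports would need a different or calibrated estimator.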
[0056] In operation, the edge module of 101 introduces a random node selection strategy with a certain probability. After each end-to-end content distribution completes, the connectivity of the transmission is measured and reported to 106. Module 106 includes a sub-module that automatically clusters and splits the online edge nodes. It consults an IP database that maps global IP segments to geolocation and ISP, and uses these triplets (IP segment, geolocation, ISP) as initial information. During service, it iterates and re-clusters continuously with the E-M (expectation maximization) algorithm on the connectivity data reported by 101, and introduces an automatic splitting mechanism to maintain a dynamic node group database. It makes appropriate use of MDS, GeoHash, and Z-ordering methods based on Peano or Hilbert curves to form a virtual network location system that supports fast, highly concurrent kNN queries.
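To show why a space-filling curve helps kNN queries, the sketch below computes a Z-order (Morton) key from a latitude and longitude by interleaving the bits of the two quantized coordinates. Using raw geographic coordinates and 16 bits per axis are assumptions for the example; the system described here would use coordinates from its own virtual network location space.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def quantize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    ratio = (value - lo) / (hi - lo)
    return min(cells, max(0, int(ratio * cells)))


def z_order_key(lat: float, lon: float, bits: int = 16) -> int:
    """One-dimensional key whose sort order roughly preserves 2-D locality."""
    return interleave_bits(quantize(lon, -180.0, 180.0, bits),
                           quantize(lat, -90.0, 90.0, bits), bits)
```

Nodes stored in a sorted index under this key can be range-scanned around a query key to collect kNN candidates, which are then ranked by true distance, since the curve preserves locality only approximately.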
[0057] The above virtual network location system is an important foundation for building the smart/dynamic DNS sub-module 108 and the ALTO service sub-module 110. Sub-module 110 is mainly used by edge nodes to query their neighbor lists, so that edge nodes such as those in 101 can build a membership table with a gossip protocol. Sub-module 108, on the one hand, redirects requests to an adjacent node or to any node and, on the other hand, provides information to the GLB sub-module 109 built on it, which balances the global load as far as possible while guaranteeing service quality.
[0058] Each time a node goes online, the edge ALTO module of 101 requests sub-module 107 in 106 for its network location and information about its neighboring nodes. After obtaining this information, it indexes it into its gossip membership table and/or into the preferentially retained entries of its DHT table. In some cases it then communicates with its neighbor nodes and obtains their corresponding index tables. In general, the nodes represented by this information are limited to those within 3 hops.
[0059] The indexing/routing sub-module 111 indexes resource information for the super nodes or stable nodes that have been selected or identified statistically, and maintains the optimal routing table for interconnecting these nodes as well as the distribution networks built for certain services such as live streaming.
[0060] To make resource prefetching more efficient, raise the hit rate, and speed up the search for resources and nodes, sub-module 111 also maintains indexes over multiple domains such as the user domain, interest domain, resource domain and network topology domain. The edge devices or modules of 101 maintain distributed hash sub-tables for the different types of resources and the different domains above, and use a heterogeneous hashing method that maps different data types into Hamming spaces to support similarity search across different domains, so as to optimize resource distribution and search paths. Searches are accelerated with the HmSearch method.
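Once items from different domains have been hashed into a common Hamming space, similarity search reduces to finding codes within a small Hamming distance of the query. The sketch below is a brute-force baseline, not HmSearch itself; the function names and the dictionary of 8-bit codes are made up for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def search_within(query: int, codes: dict, radius: int) -> list:
    """Names of all codes within `radius` bits of query, nearest first."""
    hits = sorted((hamming_distance(query, code), name)
                  for name, code in codes.items())
    return [name for distance, name in hits if distance <= radius]
```

The linear scan costs one XOR and one popcount per stored code; index structures built for Hamming space exist to avoid visiting every code when the table is large.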
[0061] When an edge device or module of 101 prefetches a resource, or performs a fuzzy search for a resource or node, it can decide whether to search through the cross-domain DHT or to query the center, according to the load of module 111 and the priority of the service.
[0062] In another embodiment, resource prefetching is implemented by analyzing users' historical time series data and by collaborative filtering among users. The resources to prefetch can also be specified explicitly by the content provider (CP).
[0063] In a preferred embodiment, if a social network service provider cooperates, the social relationship chain or relationship graph, or a social media propagation model, can be used to pre-distribute a specific resource to the edge devices or modules closest to the users likely to access it, according to the first- and second-degree relationships of the resource owner or initial sharer, or according to the nodes the propagation is predicted to reach, and to synchronize the central index of sub-module 111. This method greatly improves the distribution performance of UGC and other types of social media resources.
[0064] For storing cached and prefetched resources, devices 102 and 104 can use a shared memory cache or a local file system such as NTFS or ext, and can form distributed storage within groups divided by geographic location, ISP, interest domain, resource attribute domain and the like. Modules 103 and 105 can use in-memory objects, LocalStorage, IndexedDB and WebSQL.
[0065] When 101 is a browser page, it calls APIs that let protocols such as WebRTC keep running silently in the browser background, to prevent the loss of connections and/or cached resources. When a resource is added, replaced or evicted, it broadcasts the corresponding signaling to its neighbors.
[0066] To assist edge devices or modules in establishing P2P connections, the Trickle ICE method, including STUN and UPnP, is preferred. Module 107 collects the SDP triplet information of the edge modules of 101, and connectivity checks are performed between the edges. If they cannot connect directly, the TURN method is used to establish a relayed connection. In a preferred embodiment, the relay edge node or server is selected according to the heuristics of the ALTO sub-module 110 and/or the smart DNS sub-module 108, in order to reduce connection latency, increase throughput, and reduce traffic that crosses regions, crosses ISPs, or passes through ISP backbones.
[0067] To raise the P2P connection success rate, devices such as 102 and 104 periodically probe all available ports between 1024 and 65535 and store and maintain the results.
[0068] If device 102 or 104 has an available mapping from the external network to the internal network, it registers with the coordinator module so that more nodes can reach it from the external network and establish P2P connections. When the HTTP protocol type is available, it registers with the smart DNS/DDNS sub-module 108, and re-registers or reports any change, so that as many HTTP CDN requests as possible can be redirected, directly at the application layer, to the processing module of a device that already holds the requested resource.
[0069] Devices 102 and 104 run daemons that listen simultaneously on a port a of the internal network and a port b of the external network, for example 192.168.0.1:8888 and 123.456.789.123:9999, so that resources that can be hit within 1 hop are redirected quickly.
[0070] The service can use redirection such as HTTP 302, and can also embed tasks such as crowd-mining tagging to help distribute resources more efficiently.
[0071] In a preferred embodiment, if 102 and 104 are devices with additional storage and larger memory, they manage not only the indexes or metadata of their own cached resources but also an index of the resources held by other devices and by the active pages of modules in the region (which can be stored as in-memory objects or in LocalStorage, IndexedDB, WebSQL and so on).
[0072] Devices 102 and 103 are also deployed with a module for detecting IP multicast islands. It periodically sends IP multicast probe messages to the Internet and reports the nodes that respond to the scalable IP multicast coordination sub-module of the coordinator module 106. The coordination sub-module maintains all IP multicast islands and the information of all nodes contained in each.
[0073] In scenarios such as live streaming or prefetching of extremely popular resources, data must be transmitted to many different nodes that may lie in one IP multicast island. The coordinator module 106 merges all nodes in the same IP multicast island into one multicast domain, so that only one representative node needs to be selected as the entry point of the domain, which greatly compresses the content distribution tree.
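The merging step can be sketched as a grouping operation over the node list. Picking the member with the most available bandwidth as the representative is an assumption made for this example; the specification only states that one representative is chosen per domain.

```python
def merge_multicast_islands(node_islands: dict, bandwidth: dict):
    """Group nodes by multicast island and pick one entry node per island.

    node_islands maps node id -> island id (None if the node is in no island).
    bandwidth maps node id -> available bandwidth (missing entries count as 0).
    Returns (representatives, singles): island id -> representative node,
    and the list of nodes that must still be reached individually.
    """
    domains = {}
    singles = []
    for node, island in node_islands.items():
        if island is None:
            singles.append(node)
        else:
            domains.setdefault(island, []).append(node)
    representatives = {
        island: max(members, key=lambda n: bandwidth.get(n, 0.0))
        for island, members in domains.items()
    }
    return representatives, singles
```

The distribution tree then only needs one leaf per island plus the ungrouped nodes; inside an island the representative forwards the data by IP multicast.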
[0074] Referring to FIG. 1 and FIG. 2 together, the devices 102 and 103 shown in FIG. 1 contain distributed storage modules. Users contribute part of their storage space to store other users' resources and the assets distributed by the content provider (CP) shown in FIG. 2. Encrypted storage must be used here, and the user's own resources must be logically isolated from other resources, or access rights must be separated through the maintained account system. DDNS, or DHT plus a central index, makes the user's own resources reachable from anywhere on the network by that user's account.
[0075] FIG. 3 shows an embodiment of the present invention in the simple but special scenario of improving the availability of content distribution during a local network failure. The user of edge module (also called fog module) A accesses a web server (usually a CP customer in embodiments of the present invention), but because of a local network failure the connection cannot reach the server or the requested resource cannot be returned. After retries fail, A asks the coordinator module for other nodes that can help obtain the resource. Based on connectivity, load and other information, the coordinator returns a fog module B that can communicate with both A and the web server and that is close to A and/or the web server. The client and server modules for STUN and similar protocols, distributed across A, B and the coordinator, help A and B establish a P2P connection. B then constructs a corresponding request, obtains the required resource on behalf of A, and returns it to A over this connection. This process can be carried out in stages, progressively or asynchronously, and the relay can also be multi-hop. FIG. 3 shows the signaling exchanged between the relevant parties, annotated in Chinese and English.
[0076] For most large-scale CPs, CDNs and ISPs usually set prices based on the peak, or a quantile, of the total bandwidth sampled over a specific period. However, most of a CP's users are regional, and the access times of people in a given region overlap heavily, which often makes the peak at 10 to 11 p.m. significantly higher than the rest of the day. Module 106 therefore uses prefetching suited to the characteristics of the scenario to provide a specific CP with an additional peak-shaving and valley-filling service that helps the CP reduce costs.
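The effect of peak shaving on a quantile-based bill can be illustrated with a short calculation. The 95th percentile and the nearest-rank method used below are assumptions; they reflect a common billing convention, not a value stated in the specification.

```python
def billable_bandwidth(samples, percent: int = 95):
    """Nearest-rank percentile of the sampled bandwidth values."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of percent * len / 100, avoiding float rounding.
    rank = -(-percent * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]


# 24 hourly samples: 20 quiet hours at 10 units and a 4-hour evening peak at 50.
before = [10] * 20 + [50] * 4
# Same total traffic after prefetching moves the peak load into the quiet hours.
after = [18] * 20 + [10] * 4
```

In this made-up day the billable value drops from 50 to 18 units although the total traffic (400 units) is unchanged, which is the saving that prefetching ahead of the evening peak aims for.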
[0077] Referring to FIG. 3 and FIG. 4 together, where a user account system is maintained, a reasonable pricing formula is applied each month based on the user's peak bandwidth contribution, total distributed bandwidth and storage space contribution, and part of the content distribution revenue of the system's service is returned to the user. This can take the form of cash, checks, vouchers, coupons, or other value-added services.
[0078] With the user's consent, an interest mining module is deployed in 101 to mine information such as keywords of the user's interests from unencrypted communication data, using either traditional data mining or crowd-mining methods. The information obtained serves three purposes. First, it provides interest domain information for optimizing resource distribution. Second, it enables accurate targeting in voucher or coupon scenarios. Third, it can form an intermediary service that helps vendors who want to promote their products or services deliver trial products or advertising (which can be printed on the back of the above bill) quickly to target or potential users.
[0079] Those skilled in the relevant art can understand the description and drawings of the present invention and can make various modifications and variations from the disclosed examples. Numerous details are described to provide a thorough understanding of the present disclosure. For example, most of the above scenarios are described in terms of the application layer protocol HTTP, which is the most common and accounts for the largest share of Internet traffic, but other protocols such as FTP, RTP, SRTP, SCTP and UDP can also be used. In some embodiments, however, details that are well known or common to those skilled in the art are not described, to avoid obscuring the description or making it unduly long.
[0080] It should be noted that the scope of the system, method and device of the crowdsourcing content distribution network of the present invention includes, but is not limited to, any combination of the above components.
[0081] The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above through preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make changes or modifications that yield equivalent embodiments. Any simple modifications, equivalent changes and refinements made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still fall within the scope of the technical solution of the present invention.