Method for collecting sample data and apparatus for collecting sample data

By tagging users and subscribing to sample data based on those tags, and using AI models to train and correct the identification results, the problem of accuracy in identifying encrypted and fraudulent traffic in existing business perception methods has been solved, achieving more efficient business traffic identification.

WO2026138309A1 · PCT designated stage · Publication Date: 2026-07-02 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HUAWEI TECH CO LTD
Filing Date: 2025-11-24
Publication Date: 2026-07-02

IPC: G06F18/24

AI Tagging

Technology Topics

Data mining · Sample Label

Abstract

The present application relates to the technical field of communications. Disclosed are a method for collecting sample data and an apparatus for collecting sample data. The method comprises: receiving first traffic data of a first user, wherein a user category of the first user is a first user category, and historical user traffic data of the first user category satisfies a first feature; and training an awareness model on the basis of the first traffic data, wherein the awareness model is used for recognizing a service category of traffic data, a sample label of the first traffic data comprises a service category of the first traffic data, and there is an association relationship between the service category of the first traffic data and the first user category. The method in the embodiments of the present application can improve the recognition accuracy of the awareness model.

Description

Methods and apparatus for collecting sample data

[0001] This application claims priority to Chinese Patent Application No. 202411933758.0, filed on December 24, 2024, entitled “Method and Apparatus for Collecting Sample Data”, the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This application relates to the field of communication technology, and specifically to a method and apparatus for collecting sample data.

Background Technology

[0003] With the development of communication technology, some communication systems have introduced service awareness (SA) technology to sense different services within the transmission channel, thereby enabling different policy controls (such as service control and billing control) for different services. However, existing service awareness methods still have some problems.

Summary of the Invention

[0004] This application provides a method and apparatus for collecting sample data, which can improve the recognition accuracy of the perception model.

[0005] In a first aspect, a method for collecting sample data is provided, the method being applied to a first network element or a component within the first network element (e.g., a processor, chip, chip system, circuit, or a functional module, etc.), the method comprising:

[0006] receiving first traffic data of a first user, where the user category of the first user is a first user category, and the historical traffic data of users in the first user category satisfies a first feature; and

[0007] training a perception model based on the first traffic data, where the perception model is used to identify the business category of traffic data, the sample label of the first traffic data includes the business category of the first traffic data, and there is an association between the business category of the first traffic data and the first user category.

[0008] In this embodiment, the historical traffic data of the first user category satisfies the first feature, the business category of the first traffic data is related to the first user category, and the sample label of the first traffic data includes the business category of the first traffic data. In this way, it is equivalent to adding a sample label to the first traffic data based on the user category of the first user, which can improve the sample quality of the training samples. At this time, training the perception model based on the first traffic data can improve the training effect of the model, thereby improving the recognition accuracy of the perception model.
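The labeling idea in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the `FlowSample` type, the `CATEGORY_TO_SERVICE_LABEL` mapping, and the label strings are hypothetical names introduced here, and the application does not specify how the association between user category and business category is represented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical association between a user category and the business-category
# label attached to that user's traffic samples (assumption, not from the source).
CATEGORY_TO_SERVICE_LABEL: Dict[str, str] = {
    "control": "blocked_app",
    "fraud": "fraudulent",
    "friendly": "mainstream_app",
}


@dataclass
class FlowSample:
    user_id: str
    features: List[float]
    label: Optional[str] = None


def label_samples(samples: List[FlowSample],
                  user_category: Dict[str, str]) -> List[FlowSample]:
    """Attach to each sample the label associated with its user's category."""
    labeled = []
    for s in samples:
        category = user_category.get(s.user_id)
        if category is None:
            continue  # users without a category contribute no training samples
        labeled.append(
            FlowSample(s.user_id, s.features, CATEGORY_TO_SERVICE_LABEL[category])
        )
    return labeled
```

In this sketch, traffic from uncategorized users is simply dropped, which reflects the stated goal of keeping mislabeled traffic out of the training set.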

[0009] In some possible implementations, the first user category includes a control category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0010] When sorting the K historical traffic data corresponding to K users in descending order, the historical traffic data of the controlled user is one of the first M historical traffic data, and the controlled user is one of the N users among the K users who have the most blocked services, where K is a positive integer, and M and N are both positive integers less than or equal to K.

[0011] In this embodiment of the application, the first user category includes the control category. Thus, by training the perception model based on the first traffic data, the accuracy of the perception model in identifying the traffic data (business category) of the control category users can be improved.
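The control-category condition above can be read as the intersection of two rankings. The sketch below assumes that each user's historical traffic data is summarized as a byte count and that blocked services are summarized as a count per user; the function names and dictionary inputs are hypothetical.

```python
from typing import Dict, Set


def top_ids(values: Dict[str, int], count: int) -> Set[str]:
    """Return the ids of the `count` largest entries (ties keep input order)."""
    ranked = sorted(values, key=lambda uid: values[uid], reverse=True)
    return set(ranked[:count])


def control_users(traffic_bytes: Dict[str, int],
                  blocked_counts: Dict[str, int],
                  m: int, n: int) -> Set[str]:
    """Users in the top M by historical traffic AND among the N most-blocked users."""
    return top_ids(traffic_bytes, m) & top_ids(blocked_counts, n)


traffic = {"a": 900, "b": 800, "c": 100, "d": 50}
blocked = {"a": 1, "b": 9, "c": 7, "d": 0}
print(control_users(traffic, blocked, m=2, n=2))  # {'b'}
```

Here K is the number of keys in the dictionaries, and M and N are passed as `m` and `n`.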

[0012] In some possible implementations, the first user category includes a fraud category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0013] Within a first time period, the historical traffic of the fraud-category user continuously contains free-rate traffic, the proportion of free-rate traffic exceeds a first threshold, and service awareness (SA) identifies that historical traffic as free-rate traffic whereas the perception model identifies it as fraudulent traffic.

[0014] In this embodiment of the application, the first user category includes fraud. Thus, by training the perception model based on the first traffic data, the accuracy of the perception model in identifying the traffic data (business category) of fraud users can be improved.
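One possible reading of the fraud-category condition is sketched below. It assumes the first time period is divided into intervals, that "continuously contains" means every interval has some free-rate traffic, and that disagreement between SA and the perception model is recorded as a byte count per interval. The `IntervalStats` type and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntervalStats:
    total_bytes: int
    free_rate_bytes: int    # bytes the rule-based SA identified as free-rate
    model_fraud_bytes: int  # of those, bytes the perception model flags as fraudulent


def is_fraud_user(intervals: List[IntervalStats], ratio_threshold: float) -> bool:
    """Check the three conditions over the intervals of the first time period."""
    total = sum(i.total_bytes for i in intervals)
    if not intervals or total == 0:
        return False
    free = sum(i.free_rate_bytes for i in intervals)
    continuous = all(i.free_rate_bytes > 0 for i in intervals)
    disagreement = any(i.model_fraud_bytes > 0 for i in intervals)
    return continuous and (free / total > ratio_threshold) and disagreement
```

The threshold value and the interval length are left to the caller, since the application does not fix them.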

[0015] In some possible implementations, the first user category includes a friendly category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0016] When the K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of the friendly user is among the first M pieces, the friendly user is one of the P users among the K users with the fewest blocked services, and the historical traffic data of the friendly user contains no fraudulent traffic, where K is a positive integer, and M and P are both positive integers less than or equal to K.

[0017] In this embodiment of the application, the first user category includes a friendly category. Thus, by training the perception model based on the first traffic data, the accuracy of the perception model in identifying the traffic data (business category) of friendly users can be improved.
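The friendly-category condition combines a top-M traffic ranking, a fewest-blocked ranking, and an absence-of-fraud check. The sketch below assumes per-user byte counts, blocked-service counts, and a count of flows already identified as fraudulent; these inputs and the function name are hypothetical.

```python
from typing import Dict, Set


def friendly_users(traffic_bytes: Dict[str, int],
                   blocked_counts: Dict[str, int],
                   fraud_flow_counts: Dict[str, int],
                   m: int, p: int) -> Set[str]:
    """Top-M users by traffic that are among the P least-blocked users
    and have no fraudulent traffic in their history."""
    by_traffic = sorted(traffic_bytes, key=traffic_bytes.get, reverse=True)[:m]
    least_blocked = set(sorted(blocked_counts, key=blocked_counts.get)[:p])
    return {
        u for u in by_traffic
        if u in least_blocked and fraud_flow_counts.get(u, 0) == 0
    }
```

Users missing from `fraud_flow_counts` are treated as having no fraudulent traffic, which is a simplifying assumption of this sketch.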

[0018] In some possible implementations, before receiving the first user's first traffic data, the method further includes: sending first information, the first information being used to request the first traffic data.

[0019] In some possible implementations, the first information is used to indicate the first user category, or the first information is used to indicate the first user.

[0020] In this embodiment of the application, the first information is used to indicate the first user category, or the first information is used to indicate the first user. In this way, sample data can be subscribed based on a specific user group, thereby improving the efficiency of sample data collection and reducing the transmission of invalid sample data between network elements.
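The first information described above can be modeled as a small subscription message. The sketch assumes the message carries exactly one of the two indications and that the receiving side can look up a user's category; the `SampleSubscription` type, its fields, and the `category_of` mapping are hypothetical and do not correspond to any standardized message format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SampleSubscription:
    user_category: Optional[str] = None  # indicates the first user category, or
    user_ids: Tuple[str, ...] = ()       # indicates the first user(s) directly

    def __post_init__(self):
        # Assumption of this sketch: exactly one of the two indications is set.
        if (self.user_category is None) == (not self.user_ids):
            raise ValueError("indicate either a user category or specific users")

    def matches(self, user_id: str, category_of: Dict[str, str]) -> bool:
        """Decide whether traffic of `user_id` falls under this subscription."""
        if self.user_category is not None:
            return category_of.get(user_id) == self.user_category
        return user_id in self.user_ids
```

Filtering on the collecting side with `matches` is what keeps samples from unrelated users from being transmitted between network elements.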

[0021] In some possible implementations, before receiving the first user's first traffic data, the method further includes: determining the user category of the first user as the first user category based on the first user's historical traffic data.

[0022] In this embodiment of the application, the user category of the first user is determined based on the historical traffic data of the first user. In this way, sample data can be collected based on a specific user group, which improves the efficiency of sample data collection and thus improves the training effect of the model and the recognition accuracy of the perception model.

[0023] In a second aspect, a method for collecting sample data is provided, the method being applied to a second network element or a component within the second network element (e.g., a processor, chip, chip system, circuit, or a functional module, etc.), the method comprising:

[0024] collecting first traffic data of a first user, where the user category of the first user is a first user category, and the historical traffic data of users in the first user category satisfies a first feature; and

[0025] sending the first traffic data, where the first traffic data is used to train a perception model, the perception model is used to identify the service category of traffic data, the sample label of the first traffic data includes the service category of the first traffic data, and there is an association between the service category of the first traffic data and the first user category.

[0026] In this embodiment of the application, the user category of the first user is the first user category, and the historical traffic data of the users of the first user category satisfies the first feature. In this way, sample data can be collected based on a specific user group, improving the efficiency of sample data collection, thereby improving the training effect of the model and improving the recognition accuracy of the perception model.

[0027] Meanwhile, there is a correlation between the service category of the first traffic data and the first user category. The sample label of the first traffic data includes the service category of the first traffic data. In this way, it is equivalent to adding a sample label to the first traffic data based on the user category of the first user, which can improve the sample quality of the training samples. At this time, sending the first traffic data helps the first network element to train the perception model based on the first traffic data, which helps to improve the training effect of the model, thereby helping to improve the recognition accuracy of the perception model.

[0028] In some possible implementations, the first user category includes a control category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0029] When sorting the K historical traffic data corresponding to K users in descending order, the historical traffic data of the controlled user is one of the first M historical traffic data, and the controlled user is one of the N users among the K users who have the most blocked services, where K is a positive integer, and M and N are both positive integers less than or equal to K.

[0030] In this embodiment of the application, the first user category includes the control category. Thus, by training the perception model based on the first traffic data, the accuracy of the perception model in identifying the traffic data (business category) of the control category users can be improved.

[0031] In some possible implementations, the first user category includes a fraud category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0032] Within a first time period, the historical traffic of the fraud-category user continuously contains free-rate traffic, the proportion of free-rate traffic exceeds a first threshold, and service awareness (SA) identifies that historical traffic as free-rate traffic whereas the perception model identifies it as fraudulent traffic.

[0033] In this embodiment of the application, the first user category includes fraud. Thus, by training the perception model based on the first traffic data, the accuracy of the perception model in identifying the traffic data (business category) of fraud users can be improved.

[0034] In some possible implementations, the first user category includes a friendly category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0035] When the K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of the friendly user is among the first M pieces, the friendly user is one of the P users among the K users with the fewest blocked services, and the historical traffic data of the friendly user contains no fraudulent traffic, where K is a positive integer, and M and P are both positive integers less than or equal to K.

[0036] In this embodiment of the application, the first user category includes a friendly category. Thus, by training the perception model based on the first traffic data, the accuracy of the perception model in identifying the traffic data (business category) of friendly users can be improved.

[0037] In some possible implementations, before collecting the first user's first traffic data, the method further includes: receiving first information, the first information being used to request the first traffic data.

[0038] In some possible implementations, the first information is used to indicate the first user category, or the first information is used to indicate the first user.

[0039] In this embodiment of the application, the first information is used to indicate the first user category, or the first information is used to indicate the first user. In this way, sample data can be subscribed based on a specific user group, thereby improving the efficiency of sample data collection and reducing the transmission of invalid sample data between network elements.

[0040] In a third aspect, an apparatus for collecting sample data is provided. The apparatus can be used in the first network element of the first aspect. It can be the first network element itself, a device in the first network element (e.g., a chip, a chip system, a circuit, or a processor), a device that can be used in conjunction with the first network element, or a logic module or software that can implement all or part of the functions of the first network element.

[0041] The device for collecting sample data includes modules that perform the methods / operations / steps / actions described in the first aspect or any possible implementation of the first aspect. These modules can be hardware circuits, software, or a combination of hardware circuits and software.

[0042] In a fourth aspect, a device for collecting sample data is provided. The device can be used in the second network element of the second aspect. It can be the second network element itself, a device in the second network element (e.g., a chip, a chip system, a circuit, or a processor), a device that can be used in conjunction with the second network element, or a logic module or software that can implement all or part of the functions of the second network element.

[0043] The device for collecting sample data includes modules that perform the methods / operations / steps / actions described in the second aspect or any possible implementation of the second aspect. These modules can be hardware circuits, software, or a combination of hardware circuits and software.

[0044] In a fifth aspect, an apparatus for collecting sample data is provided, comprising a processor and a memory. The processor is coupled to the memory, the memory is used to store a computer program (also referred to as code or instructions), and the computer program, when executed by the processor, causes the apparatus to perform the method of the first aspect or any possible implementation thereof.

[0045] In some possible implementations, the device also includes a memory coupled to the processor.

[0046] In some possible implementations, there are one or more processors, and / or one or more memories.

[0047] In some possible implementations, the memory can be integrated with the processor, or the memory can be set up separately from the processor.

[0048] In a sixth aspect, an apparatus for collecting sample data is provided, comprising a processor and a memory. The processor is coupled to the memory, the memory is used to store a computer program (also referred to as code or instructions), and the computer program, when executed by the processor, causes the apparatus to perform the method of the second aspect or any possible implementation thereof.

[0049] In some possible implementations, the device also includes a memory coupled to the processor.

[0050] In some possible implementations, there are one or more processors, and / or one or more memories.

[0051] In some possible implementations, the memory can be integrated with the processor, or the memory can be set up separately from the processor.

[0052] In a seventh aspect, a computer-readable storage medium is provided, on which a computer program (also referred to as code or instructions) is stored, which, when executed on a computer, causes the computer to perform the methods of any of the above aspects or any possible implementations thereof.

[0053] In an eighth aspect, a computer program product is provided, comprising a computer program (also referred to as code or instructions) that, when run on a computer, causes the computer to perform the method in any of the above aspects or any possible implementation thereof.

[0054] In a ninth aspect, a chip is provided, comprising a processor and a memory, where the memory is used to store a computer program (also referred to as code or instructions), and the processor is used to call and run the computer program stored in the memory, so that an apparatus or device on which the chip is mounted performs the method of any of the above aspects or any possible implementation thereof.

Brief Description of the Drawings

[0055] Figure 1 is a schematic block diagram of an intelligent SA scheme in an embodiment of this application.

[0056] Figure 2 is a schematic block diagram of a user classification method according to an embodiment of this application.

[0057] Figure 3 is a schematic block diagram of a wireless communication system applicable to this application.

[0058] Figure 4 is a schematic flowchart of a method for collecting sample data provided in an embodiment of this application.

[0059] Figure 5 is a schematic flowchart of a method for collecting sample data provided in another embodiment of this application.

[0060] Figure 6 is a schematic flowchart of a method for collecting sample data provided in another embodiment of this application.

[0061] Figure 7 is a schematic structural diagram of a device for collecting sample data according to an embodiment of this application.

[0062] Figure 8 is a schematic structural diagram of a device for collecting sample data provided in another embodiment of this application.

[0063] Figure 9 is a schematic structural diagram of an apparatus provided in one embodiment of this application.

Detailed Description

[0064] The technical solutions in the embodiments of this application will now be described with reference to the accompanying drawings.

[0065] In the description of this application, unless otherwise stated, "/" indicates an "or" relationship between the objects before and after it. For example, A/B can represent A or B. "And/or" in this application merely describes a relationship between related objects and indicates that three relationships can exist. For example, A and/or B can represent: A alone, both A and B, or B alone, where A and B can be singular or plural. Furthermore, unless otherwise stated, "multiple" refers to two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c can represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c can each be single or multiple. Additionally, to facilitate a clear description of the technical solutions, the terms "first" and "second" are used in the embodiments of this application to distinguish identical or similar items with essentially the same function and effect. Those skilled in the art will understand that the terms "first," "second," etc., do not limit quantity or order of execution, and do not necessarily imply that the items are different. It should be understood that in this application, expressions such as "in the case of," "if," and "when" can be used interchangeably.

[0066] With the development of communication technology, some communication systems have introduced service awareness (SA) technology to sense different services within the transmission channel, thereby enabling different policy controls (such as service control, billing control, etc.) for different services.

[0067] The primary purpose of operator-implemented service awareness (SA) technology is to directly increase service revenue through differentiated billing based on service content; to promote the effective utilization of network resources and enrich service content by managing bandwidth and ensuring transmission quality for certain services; and to provide network planning and management references by generating service reports, thereby optimizing network quality, improving customer experience, and promoting business development. To achieve these goals, we first need to implement service content awareness on the service traffic within the transmission pipeline, and then apply different policy controls (service control and billing control) to different services.

[0068] Current common business awareness methods are plaintext recognition methods based on plaintext rules in a rule base. However, with the increasing amount of encrypted traffic in the live network, manually extracting plaintext rules is becoming increasingly difficult, and frequent application updates require continuous manpower to extract new recognition rules. To address these issues, an intelligent SA solution can be introduced into the network data analytics function (NWDAF) and user plane function (UPF) network elements. This solution aims to leverage the rapid self-learning capabilities of artificial intelligence (AI) to replace the complex work of protocol experts in analyzing packets and designing rules, thereby reducing labor costs, shortening SLA time, and solving the pain points and recognition problems of the current SA knowledge base.

[0069] For example, as shown in Figure 1, the NWDAF can send sample subscription messages for sampled users to the UPF. The UPF can then randomly sample users based on the sampling rate in the sample subscription messages sent by the NWDAF, collect the business flow characteristics of the sampled users, and report the collected sample data of the sampled users to the NWDAF. The NWDAF can then collect and manage the sample data reported by the UPF. After collecting samples for a period of time, the training and evaluation of the AI model are triggered. After training is completed, the AI model can be sent to the UPF. Correspondingly, the UPF (such as the traffic prediction unit (TPU) in the UPF) can load the AI model and use the AI model to perform AI inference, so as to enhance the recognition capabilities of the existing SA (such as the SA recognition of the interface service unit (ISU) in the UPF) by utilizing the AI recognition results, thereby improving the effectiveness of the matching strategy.
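For contrast with the category-based collection proposed later, the baseline random sampling in Figure 1 can be sketched as below. The sketch assumes the sampling rate is a probability applied independently to each user; the function name and the `seed` parameter are hypothetical and only make the example reproducible.

```python
import random
from typing import List, Optional


def sample_users(user_ids: List[str], sampling_rate: float,
                 seed: Optional[int] = None) -> List[str]:
    """Keep each user independently with probability `sampling_rate`."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    rng = random.Random(seed)
    return [u for u in user_ids if rng.random() < sampling_rate]
```

Because the selection ignores user behavior, users who generate spoofed traffic are sampled at the same rate as everyone else.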

[0070] However, the aforementioned business awareness methods still have some problems. For example, fraudulent / spoofed traffic is prevalent in the current network, often used to evade application (APP) blocking or traffic billing. For instance, when accessing services using XVPN (a virtual private network, VPN) APP, users can customize and select different domain names for spoofing, thus deceiving traditional knowledge base identification. Intelligent SA's AI model training is a supervised learning process that requires labeled sample data. However, the current network contains a large amount of fraudulent traffic. If the traditional SA knowledge base identification results are used as labels, randomly sampling users to collect training data will result in the mixing of fraudulent traffic, causing the AI model to learn incorrectly and thus failing to accurately identify the business traffic (business category), leading to misidentification of business traffic.

[0071] To address one or more of the aforementioned technical problems, this application proposes a method and apparatus for collecting sample data. By tagging users based on their historical traffic data and collecting sample data and training samples based on these user tags, the training effect of the model is improved, thereby enhancing the recognition accuracy of the perception model.

[0072] The following example, using user categories including control, fraud, and friendly categories, illustrates the solution in this application.

[0073] Step 1: The first network element can profile users by using the user's historical access call detail records reported by the second network element, and then label users according to the user profile results. Finally, it collects sample data for specific scenarios based on the labeling.

[0074] Category tags can include categories such as control, fraud, and friendliness. The embodiments in Figure 4 will provide a detailed explanation of these categories.

[0075] For example, as shown in Figure 2, for the users ranked in the top M by business traffic (M is a positive integer), user profiles can be created based on historical access behavior, the number of blocked services, traffic information, protocol results, and so on. Category labels can then be assigned to users based on the profiling results, and sample data for specific scenarios can be collected based on those labels, such as traffic from blocked apps, fraudulent traffic, and sample data and negative samples from mainstream applications.
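The profiling step can be illustrated by aggregating call detail records into per-user profiles and keeping the top M users by traffic. The record layout `(user_id, num_bytes, was_blocked)` and the `UserProfile` fields are assumptions made for this sketch; a real profile would carry more attributes, such as protocol results.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class UserProfile:
    total_bytes: int = 0
    blocked_count: int = 0


def build_profiles(records: Iterable[Tuple[str, int, bool]],
                   m: int) -> Dict[str, UserProfile]:
    """Aggregate records per user and keep the top-M users by traffic volume."""
    profiles: Dict[str, UserProfile] = defaultdict(UserProfile)
    for user_id, num_bytes, was_blocked in records:
        profile = profiles[user_id]
        profile.total_bytes += num_bytes
        profile.blocked_count += int(was_blocked)
    top = sorted(profiles, key=lambda u: profiles[u].total_bytes, reverse=True)[:m]
    return {u: profiles[u] for u in top}
```

The returned dictionary is ordered from the highest-traffic user downward, so it can feed directly into category rules that rank users.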

[0076] Step 2: The first network element can subscribe to sample data of a specific user group based on the user profile results.

[0077] Step 3: The second network element can collect sample data for specific user groups based on the subscription messages of the first network element and report it.

[0078] Step 4: The first network element can collect and manage sample data based on the user profile results.

[0079] Step 5: Train an AI model that supports protocol recognition on the first network element.

[0080] Step 6: Load the AI model onto the second network element to supplement / correct the existing recognition results and support policy matching based on the AI recognition results.

[0081] Step 7: The first network element can present a report on the effect of AI-enhanced recognition.

[0082] It should be noted that the steps or the execution order of the steps included in the above embodiments are merely examples and not limitations. The embodiments of this application may include more or fewer steps, or may include other steps. At the same time, the above steps may be executed in other orders, and the embodiments of this application are not limited in this regard.
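Steps 1 to 6 above can be pictured as a loop between the two network elements. The sketch below is illustrative only: the class names are hypothetical, each flow is reduced to a single string signature, and the "model" is a lookup table built by majority vote, which stands in for the AI model that the application leaves unspecified.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Flow = Tuple[str, str]  # (user_id, flow_signature), a hypothetical layout


class SecondNetworkElement:
    """Collects samples and later runs inference with the loaded model."""

    def __init__(self, flows: List[Flow]):
        self.flows = flows
        self.model: Dict[str, str] = {}

    def collect(self, subscribed_users: Set[str]) -> List[Flow]:  # Step 3
        return [f for f in self.flows if f[0] in subscribed_users]

    def load_model(self, model: Dict[str, str]) -> None:  # Step 6
        self.model = model

    def recognize(self, signature: str, sa_result: str) -> str:  # Step 6
        # The model result corrects the SA result when the model knows the flow.
        return self.model.get(signature, sa_result)


class FirstNetworkElement:
    """Holds user labels from profiling and trains the stand-in model."""

    def __init__(self, user_labels: Dict[str, str]):
        self.user_labels = user_labels  # Step 1 output: user_id -> sample label

    def subscribe(self) -> Set[str]:  # Step 2
        return set(self.user_labels)

    def train(self, samples: List[Flow]) -> Dict[str, str]:  # Steps 4 and 5
        votes: Dict[str, Counter] = defaultdict(Counter)
        for user_id, signature in samples:
            votes[signature][self.user_labels[user_id]] += 1
        return {sig: c.most_common(1)[0][0] for sig, c in votes.items()}


first = FirstNetworkElement({"u1": "fraudulent", "u2": "mainstream_app"})
second = SecondNetworkElement([("u1", "sigA"), ("u2", "sigB"), ("u9", "sigC")])
second.load_model(first.train(second.collect(first.subscribe())))
print(second.recognize("sigA", "free_rate"))  # fraudulent
```

Flows from the unsubscribed user `u9` never reach training, so the existing SA result is kept for them at inference time.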

[0083] The technical solutions of this application can be applied to various wireless communication systems. These systems can be various wireless communication systems, such as 5th generation (5G) systems, new radio (NR), long term evolution (LTE) systems, LTE frequency division duplex (FDD) systems, LTE time division duplex (TDD) systems, satellite and other non-terrestrial communication systems, and communication systems that integrate terrestrial and non-terrestrial communication. The technical solutions provided in this application can also be applied to future communication systems.

[0084] In subsequent embodiments, a wireless communication system is used as an example to describe the technical solutions of the embodiments of this application in detail. The wireless communication system applicable to the embodiments of this application is described below with reference to Figure 3.

[0085] Figure 3 is a schematic architecture diagram of a wireless communication system applicable to an embodiment of this application. The wireless communication system 100 may include network slice selection function (NSSF) network elements, authentication server function (AUSF) network elements, policy control function (PCF) network elements, unified data management (UDM) network elements, network repository function (NRF) network elements, access and mobility management function (AMF) network elements, session management function (SMF) network elements, network data analytics function (NWDAF) network elements, a radio access network (RAN) or access network (AN), and user plane function (UPF) network elements, etc.

[0086] In the aforementioned wireless communication system, the portion excluding the radio access network can be referred to as the core network. The core network can include control plane (CP) network elements and user plane (UP) network elements. Specifically, user plane network elements can include UPF, and control plane network elements can include AMF, SMF, and PCF.

[0087] The following section introduces the various network elements of the core network.

[0088] PCF network elements support a unified policy framework to manage network behavior and provide policy rules for network entities to implement.

[0089] UDM network elements are responsible for the management of user identifiers, contract data, authentication data, and the registration and management of user service network elements.

[0090] UPF network elements are modules in the core network that process data. Their main functions include: routing and forwarding data from the base station to the network, quality of service (QoS) control, and billing information statistics.

[0091] The AMF network element is responsible for UE authentication, authorization, registration, mobility management and connection management. For example, the AMF can interact with the RAN and UE through the N2 and N1 interfaces to complete functions such as registration, session establishment and mobility management.

[0092] The SMF network element is mainly responsible for session management, managing the creation and deletion of user PDU sessions, maintaining PDU session context and user plane forwarding pipeline information, allocating addresses to terminals, and managing various channels between terminals and the core network. For example, the SMF can control the UPF through the N4 interface.

[0093] The user plane function (UPF) entity's main functions include packet routing and forwarding, serving as a session anchor, acting as an uplink classifier to support routing traffic to the local data network, and serving as a branch point to support multi-homed PDU sessions.

[0094] The NWDAF network element can be responsible for security-related analysis. It should be noted that the security analysis function can also be performed by other network elements, and this embodiment of the application is not limited to this.

[0095] The data network (DN) provides services such as operator services, internet access, or third-party services.

[0096] The devices, network elements, or entities mentioned in this application can be interchanged in some scenarios.

[0097] It is understood that Figure 3 exemplarily illustrates the architecture of a communication system to which the methods provided in the embodiments of this application are applicable. The communication system to which the methods provided in the embodiments of this application are applicable may include other network elements or network entities, and this is not limited in the embodiments of this application.

[0098] In this embodiment, the first network element can be the NWDAF network element in Figure 3, and the second network element can be the UPF network element in Figure 3. As shown in Figure 3, the NWDAF network element can perform user profiling and AI training, while the UPF network element can perform sample collection and AI inference. It should be noted that the first network element and / or the second network element in this embodiment can also be other network elements, and this embodiment is not limited to them.

[0099] The method for collecting sample data in the embodiments of this application will be illustrated in detail below with reference to Figure 4.

[0100] Figure 4 is a schematic flowchart of a method for collecting sample data according to an embodiment of this application. The method 400 shown in Figure 4 may include steps S410, S420 and S430, as follows:

[0101] S410, the second network element collects the first traffic data of the first user.

[0102] The user category of the first user can be the first user category, and the historical traffic data of users in the first user category can satisfy the first feature.

[0103] In some embodiments, the first user category may include a control category, and the historical traffic data of the users in the first user category satisfies a first characteristic, which may include:

[0104] When sorting the K historical traffic data corresponding to K users in descending order, the historical traffic data of the controlled users is one of the top M historical traffic data, and the controlled users are one of the N users among the K users who have the most blocked services. K is a positive integer, and M and N are both positive integers less than or equal to K.

[0105] In other words, users subject to control can refer to those who rank in the top M for historical traffic and in the top N for blocking services.
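As a non-limiting illustration, the control-category condition above can be sketched as the intersection of two rankings. The record type `UserStats`, its field names, and the function name are hypothetical and are not defined in this application; ties are broken by input order, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStats:
    user_id: str           # hypothetical identifier, e.g. an IMSI
    traffic_bytes: int     # total historical traffic of the user
    blocked_services: int  # number of services blocked for the user


def control_category_users(users: list[UserStats], m: int, n: int) -> set[str]:
    """Return users that are in the top M by traffic AND the top N by blocked services."""
    k = len(users)
    if not (1 <= m <= k and 1 <= n <= k):
        raise ValueError("M and N must be positive integers no greater than K")
    by_traffic = sorted(users, key=lambda u: u.traffic_bytes, reverse=True)
    by_blocked = sorted(users, key=lambda u: u.blocked_services, reverse=True)
    top_traffic = {u.user_id for u in by_traffic[:m]}
    top_blocked = {u.user_id for u in by_blocked[:n]}
    return top_traffic & top_blocked


users = [
    UserStats("u1", 900, 12),
    UserStats("u2", 800, 0),
    UserStats("u3", 100, 15),
    UserStats("u4", 700, 9),
]
print(control_category_users(users, m=3, n=2))  # {'u1'}
```

In this example, K = 4, M = 3, and N = 2. Only `u1` is in both the top three by traffic and the top two by blocked services.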

[0106] In this embodiment, the first user category includes a control category. Therefore, training the perception model based on the first traffic data can improve the accuracy of the perception model in identifying the traffic data (business category) of control category users. For example, sample data of control category users can be collected, filtered, and used as positive samples for training and data mining to solve the identification problem of control applications.

[0107] In some embodiments, the first user category may include a fraud category, and the historical traffic data of the users in the first user category satisfy a first characteristic, which may include:

[0108] The historical traffic of a fraudulent user continuously contains free-rate traffic within a first time period, the proportion of free-rate traffic within the first time period exceeds a first threshold, and service awareness (SA) identifies the historical traffic of the fraudulent user as free-rate traffic while the perception model identifies it as fraudulent traffic. The first time period and the first threshold can be set according to actual conditions, and the specific values are not limited in this embodiment. For example, the first threshold can be 80% or 98%.

[0109] In other words, fraudulent users can refer to users who continuously have free-rate traffic within a first time period, and whose free-rate traffic accounts for more than a first threshold of their traffic within that first time period. SA identifies their traffic as traffic of free-rate applications, but the AI model identifies it as fraudulent traffic.
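As a non-limiting illustration, the three conditions for the fraud category can be checked as follows. This application does not define how "continuously" is measured; the sketch assumes that the first time period is divided into fixed slots and that every slot must contain free-rate traffic. The type `FlowRecord`, the label strings, and the parameter `slot_seconds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    timestamp: int     # seconds since an arbitrary origin
    volume_bytes: int  # traffic volume of the flow
    sa_label: str      # rate assigned by SA: "free" or "billed"
    model_label: str   # perception model result: "fraud" or "normal"


def is_fraud_candidate(flows, window_start, window_end, slot_seconds, first_threshold):
    """Check the fraud-category conditions over [window_start, window_end)."""
    window = [f for f in flows if window_start <= f.timestamp < window_end]
    free = [f for f in window if f.sa_label == "free"]
    total_volume = sum(f.volume_bytes for f in window)
    if total_volume == 0:
        return False
    # Condition 1 (assumed reading): every slot contains free-rate traffic.
    covered_slots = {(f.timestamp - window_start) // slot_seconds for f in free}
    slot_count = -(-(window_end - window_start) // slot_seconds)  # ceiling division
    persistent = len(covered_slots) == slot_count
    # Condition 2: the free-rate share of the volume exceeds the first threshold.
    free_share = sum(f.volume_bytes for f in free) / total_volume
    # Condition 3: SA says free-rate, but the model says fraudulent.
    mismatch = any(f.model_label == "fraud" for f in free)
    return persistent and free_share > first_threshold and mismatch


flows = [
    FlowRecord(100, 5000, "free", "fraud"),
    FlowRecord(1300, 4000, "free", "fraud"),
    FlowRecord(2500, 900, "free", "normal"),
    FlowRecord(3000, 100, "billed", "normal"),
]
print(is_fraud_candidate(flows, 0, 3600, 1200, first_threshold=0.98))  # True
```

Here the free-rate share is 9900 / 10000 = 99%, which exceeds the example threshold of 98%, and each of the three 1200-second slots contains free-rate traffic.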

[0110] In this embodiment, the first user category includes a fraud category. Therefore, training the perception model based on the first traffic data can improve the accuracy of the perception model in identifying the traffic data (business category) of fraudulent users. For example, the free-rate traffic of fraudulent users can be collected as sample data, filtered, and used as samples for training and mining to solve the problem of identifying fraudulent applications.

[0111] In some embodiments, the first user category may include a friendly category, and the historical traffic data of the users in the first user category satisfies a first characteristic, which may include:

[0112] When the K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of the friendly user is one of the top M pieces of historical traffic data, the friendly user is one of the P users among the K users who have the fewest blocked services, and there is no fraudulent traffic in the historical traffic data of the friendly user. K is a positive integer, and M and P are both positive integers less than or equal to K.

[0113] In other words, friendly users can refer to users who rank in the top M for historical traffic, rank in the bottom P for blocked services, and have no fraudulent service flows.
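As a non-limiting illustration, the friendly-category condition adds a "fewest blocked services" ranking and a fraud exclusion. The dictionary-based inputs and the function name below are hypothetical, and ties are broken by insertion order as an assumption.

```python
def friendly_category_users(traffic, blocked, has_fraud, m, p):
    """Return users in the top M by traffic and bottom P by blocked services, with no fraud.

    traffic:   dict mapping user_id -> historical traffic volume
    blocked:   dict mapping user_id -> number of blocked services
    has_fraud: dict mapping user_id -> True if fraudulent traffic was seen
    """
    top_traffic = set(sorted(traffic, key=traffic.get, reverse=True)[:m])
    least_blocked = set(sorted(blocked, key=blocked.get)[:p])
    return {
        user
        for user in top_traffic & least_blocked
        if not has_fraud.get(user, False)
    }


traffic = {"u1": 900, "u2": 800, "u3": 100, "u4": 700}
blocked = {"u1": 12, "u2": 0, "u3": 15, "u4": 1}
has_fraud = {"u4": True}
print(friendly_category_users(traffic, blocked, has_fraud, m=3, p=2))  # {'u2'}
```

In this example, `u2` and `u4` both satisfy the two ranking conditions, but `u4` is excluded because fraudulent traffic was observed for that user.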

[0114] In this embodiment, the first user category includes a friendly category. Therefore, training the perception model based on the first traffic data can improve the accuracy of the perception model in identifying the traffic data (business category) of friendly users. For example, the business traffic of friendly users can be collected as negative samples and sample data from mainstream applications to solve the problem of identifying friendly applications.

[0115] It should be noted that the first user category may also include other categories, and this application embodiment does not limit this.

[0116] In some embodiments, prior to S410, method 400 may further include step S402, as follows:

[0117] S402, the first network element determines the user category of the first user based on the historical traffic data of the first user.

[0118] The historical traffic data of the first user can be reported by the second network element to the first network element. Optionally, the historical traffic data can be carried in the full call detail record (UFDR). For example, the historical traffic data can include the five-tuple information of the traffic data, traffic information, whether it was blocked, time, and other information elements.

[0119] For example, the first network element can profile users based on the historical access service detail records reported by the second network element. The profile can be based on historical access behavior, the number of blocked flows, traffic information, protocol recognition results, etc. Based on the user profile results, users can be categorized and tagged. Then, based on the category tags, sample data for specific scenarios can be collected, such as traffic from blocked apps, fraudulent traffic, and sample data and negative samples from mainstream applications.
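As a non-limiting illustration, the reported detail records can be aggregated into a per-user profile before categorization. The record layout below follows the information elements listed above (five-tuple, traffic information, blocked flag, time), but the type names, field names, and profile keys are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass(frozen=True)
class DetailRecord:
    user_id: str           # hypothetical key, e.g. an IMSI
    five_tuple: FiveTuple
    volume_bytes: int
    blocked: bool
    timestamp: int


def build_profiles(records):
    """Aggregate detail records into per-user traffic and blocked-flow counts."""
    profiles = defaultdict(lambda: {"traffic_bytes": 0, "blocked_flows": 0})
    for record in records:
        profile = profiles[record.user_id]
        profile["traffic_bytes"] += record.volume_bytes
        profile["blocked_flows"] += int(record.blocked)
    return dict(profiles)


flow = FiveTuple("10.0.0.1", "203.0.113.5", 40000, 443, "TCP")
records = [
    DetailRecord("u1", flow, 1500, False, 1000),
    DetailRecord("u1", flow, 500, True, 1010),
    DetailRecord("u2", flow, 300, False, 1020),
]
print(build_profiles(records))
```

The resulting per-user totals can then feed the ranking conditions of the control, fraud, and friendly categories.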

[0120] In this embodiment of the application, the user category of the first user is determined based on the historical traffic data of the first user. In this way, sample data can be collected based on a specific user group, which improves the efficiency of sample data collection and thus improves the training effect of the model and the recognition accuracy of the perception model.

[0121] In some embodiments, prior to S410, method 400 may further include step S404, as follows:

[0122] S404, the first network element sends the first information to the second network element.

[0123] The first information can be used to request the first traffic data.

[0124] In some embodiments, the first information may be used to indicate the first user category, the first user, or both. For example, the first information may indicate the first user category; or the first information may indicate the identifier of the first user, such as the first user's International Mobile Subscriber Identity (IMSI) or Mobile Subscriber ISDN Number (MSISDN); or the first information may indicate both the first user category and the identifier of the first user.

[0125] In this embodiment of the application, the first information is used to indicate the first user category, or the first information is used to indicate the first user. In this way, sample data can be subscribed based on a specific user group, thereby improving the efficiency of sample data collection and reducing the transmission of invalid sample data between network elements.
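As a non-limiting illustration, the first information can be modeled as a request that carries a user category, a set of user identifiers, or both, and that the second network element uses to filter which traffic data to report. The class name, field names, and the `matches` method are hypothetical and do not represent a message format defined in this application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirstInformation:
    user_category: Optional[str] = None  # e.g. "control", "fraud", "friendly"
    user_ids: frozenset = frozenset()    # e.g. IMSI or MSISDN strings

    def __post_init__(self):
        # The request must indicate a user category, a user, or both.
        if self.user_category is None and not self.user_ids:
            raise ValueError("indicate a user category, a user identifier, or both")

    def matches(self, user_id: str, user_category: str) -> bool:
        """Return True if traffic of this user should be reported as sample data."""
        category_ok = self.user_category is None or self.user_category == user_category
        user_ok = not self.user_ids or user_id in self.user_ids
        return category_ok and user_ok


by_category = FirstInformation(user_category="control")
by_user = FirstInformation(user_ids=frozenset({"460001234567890"}))
print(by_category.matches("460009999999999", "control"))  # True
print(by_user.matches("460009999999999", "control"))      # False
```

Filtering at the second network element in this way is what keeps traffic data of unrelated users from being transmitted between network elements.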

[0126] S420, the second network element sends the first user's first traffic data to the first network element.

[0127] In some embodiments, the second network element may also send a user category information element to the first network element to identify whether the sample (i.e. the first traffic data) is a sample of a controlled user, a sample of a fraudulent user, or a sample of a friendly user.

[0128] S430, the first network element trains a perception model based on the first traffic data.

[0129] The perception model can be used to identify the business category of traffic data. The sample label of the first traffic data can include the business category of the first traffic data, and there is a correlation between the business category of the first traffic data and the first user category.

[0130] In this embodiment, the historical traffic data of the first user category satisfies the first feature, the business category of the first traffic data is related to the first user category, and the sample label of the first traffic data includes the business category of the first traffic data. In this way, it is equivalent to adding a sample label to the first traffic data based on the user category of the first user, which can improve the sample quality of the training samples. At this time, training the perception model based on the first traffic data can improve the training effect of the model, thereby improving the recognition accuracy of the perception model.
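As a non-limiting illustration, adding a sample label based on the user category can be sketched as a lookup from category to service-category label. The label strings and the feature layout are hypothetical; this application only states that an association exists between the service category and the user category.

```python
# Hypothetical association between user category and service-category label.
CATEGORY_TO_LABEL = {
    "control": "controlled_service",
    "fraud": "fraudulent_service",
    "friendly": "mainstream_service",
}


def label_samples(traffic_samples, user_category):
    """Attach the label associated with the user category to each traffic sample."""
    if user_category not in CATEGORY_TO_LABEL:
        raise ValueError(f"unknown user category: {user_category}")
    label = CATEGORY_TO_LABEL[user_category]
    return [(features, label) for features in traffic_samples]


# Each sample is a hypothetical feature vector extracted from a flow.
first_traffic_data = [[1200.0, 80.0], [1100.0, 90.0]]
print(label_samples(first_traffic_data, "control"))
```

Because the label is derived from the category of the user who generated the traffic, no per-flow manual annotation is needed in this sketch.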

[0131] In some embodiments, the first network element can distribute the trained perception model to the second network element. Correspondingly, the second network element can load the perception model and perform AI model inference, protocol recognition, and other tasks.

[0132] Furthermore, the second network element can report the identification results to the first network element. Correspondingly, the first network element can present a blocking report to demonstrate the enhanced effect of the identification results on application blocking, and can also present a fraud report to demonstrate the enhanced effect of the identification results on fraudulent traffic identification.

[0133] The following, with reference to Figure 5, takes as an example the case where the first network element is NWDAF, the second network element is UPF, and the user categories include the control category and the friendly category, to illustrate in detail the method for collecting sample data in this application embodiment.

[0134] Figure 5 is a schematic flowchart of a method for collecting sample data according to an embodiment of this application. The method 500 shown in Figure 5 may include steps S501 to S512, as follows:

[0135] S501, UPF collects information from users' historical traffic data streams.

[0136] Historical traffic data streams can include information such as five-tuples, traffic, packet count, matching rules, and protocols.

[0137] S502, UPF reports the full UFDR call detail record to NWDAF.

[0138] UFDR full call detail records can carry historical traffic data streams, such as the five-tuple information of the data stream, traffic information, whether it was blocked, time, and other information elements.

[0139] S503, NWDAF creates user profiles.

[0140] NWDAF can profile a user based on historical access behavior, number of blocked flows, traffic information, and protocol results, and label the user as "friendly" or "controlled". For example, a user who ranks last in blocked services but ranks in the top M in terms of service traffic can be defined as a friendly user, while a user who ranks in the top N in blocked services but ranks in the top M in terms of service traffic can be defined as a controlled user.

[0141] S504, NWDAF sends the user's sample subscription message to UPF.

[0142] For example, NWDAF can carry "User Category" and "User List" information elements in sample subscription messages. The "User Category" information element can indicate the user's category attribute, such as friendly or controlled, while the "User List" information element can indicate the IMSI or MSISDN of users under each user category.

[0143] For example, NWDAF can also carry a "user identifier" information element in the subscription message. For instance, it can subscribe to non-blocking sample data of friendly users, or it can subscribe to blocking sample data of controlled users.

[0144] S505, UPF reports the user's sample data to NWDAF.

[0145] UPF can include a "user category" information element in the reported message to identify whether the sample data is from a friendly user or a controlled user.

[0146] S506, NWDAF collects sample data based on user type.

[0147] For example, NWDAF can collect positive samples based on controlled users and negative samples based on friendly users.

[0148] S507, NWDAF trains and evaluates AI models (such as perception models) based on training samples.
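As a non-limiting illustration of steps S506 and S507, the sketch below treats samples from controlled users as positive samples (label 1) and samples from friendly users as negative samples (label 0), then trains and evaluates a classifier. This application does not specify the model type; a nearest-centroid classifier is substituted here purely for brevity, and the two features (mean packet size, packets per second) are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def train(samples):
    """samples: list of (features, label), label 1 = controlled, 0 = friendly."""
    positives = [x for x, y in samples if y == 1]
    negatives = [x for x, y in samples if y == 0]
    if not positives or not negatives:
        raise ValueError("both positive and negative samples are required")
    return {1: centroid(positives), 0: centroid(negatives)}


def predict(model, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, samples):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(predict(model, x) == y for x, y in samples) / len(samples)


training = [
    ([1200.0, 80.0], 1), ([1100.0, 90.0], 1),  # from controlled users
    ([300.0, 10.0], 0), ([400.0, 20.0], 0),    # from friendly users
]
held_out = [([1000.0, 70.0], 1), ([350.0, 15.0], 0)]
model = train(training)
print(accuracy(model, held_out))  # 1.0
```

Any classifier that accepts labeled feature vectors could take the place of the centroid model in this sketch.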

[0149] S508, NWDAF sends the trained AI model to UPF.

[0150] S509, UPF loads AI models, performs AI model inference, and identifies protocols.

[0151] S510, UPF supplements / corrects the existing knowledge base recognition results based on the AI recognition results, and performs policy matching based on the AI recognition results to execute the blocking policy.
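As a non-limiting illustration of step S510, the sketch below lets the AI recognition result supplement a missing knowledge base result or correct a differing one, and then matches the resolved service against a blocking policy. The rule that the AI result always takes precedence when present is an assumption, as are the function names and service names.

```python
from typing import Optional


def resolve_service(kb_result: Optional[str], ai_result: Optional[str]) -> Optional[str]:
    """Supplement (knowledge base has no result) or correct (results differ) with the AI result."""
    if ai_result is not None:
        return ai_result
    return kb_result


def match_policy(service: Optional[str], blocked_services: set) -> str:
    """Return the action for a flow: "block" if its service is under a blocking policy."""
    if service is not None and service in blocked_services:
        return "block"
    return "forward"


blocked_services = {"vpn_app"}  # hypothetical blocking policy

# Supplement: the knowledge base could not recognize the flow.
print(match_policy(resolve_service(None, "vpn_app"), blocked_services))   # block
# Correct: the knowledge base recognized the flow as generic web traffic.
print(match_policy(resolve_service("web", "vpn_app"), blocked_services))  # block
# No AI result: the knowledge base result is kept.
print(match_policy(resolve_service("web", None), blocked_services))       # forward
```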

[0152] S511, UPF reports the AI recognition results to NWDAF via the full UFDR call detail record.

[0153] S512, NWDAF presents a blocking report.

[0154] The blocking report can demonstrate the enhanced effect of AI recognition results on application blocking.

[0155] The following, with reference to Figure 6, uses the example of a first user whose user categories include fraudulent and friendly categories to illustrate the method for collecting sample data in this application embodiment.

[0156] Figure 6 is a schematic flowchart of a method for collecting sample data according to an embodiment of this application. The method 600 shown in Figure 6 may include steps S601 to S612, as follows:

[0157] S601, UPF collects information from users' historical traffic data streams.

[0158] Historical traffic data streams can include information such as 5-tuples, traffic, protocols, rates, and Domain Name System (DNS) flows.

[0159] S602, UPF reports the full UFDR call detail record to NWDAF.

[0160] UFDR full call detail records can carry historical traffic data streams, such as the five-tuple information of the data stream, rate, traffic information, protocol information, start and end time, AI recognition results, and other information elements.

[0161] S603, NWDAF creates user profiles.

[0162] NWDAF can profile users based on metrics such as business traffic, rates, and DNS traffic, labeling users as "friendly" or "fraudulent." For example, NWDAF can identify users whose free rates last for a certain duration and infer whether their behavior matches that of a fraudulent user based on the number of free services, traffic distribution, the number of non-free services, traffic distribution, DNS-related requests, SA, and AI identification results during that time period.

[0163] S604, NWDAF sends the user's sample subscription message to UPF.

[0164] For example, NWDAF can carry "User Category" and "User List" information elements in sample subscription messages. The "User Category" information element can indicate the user's category attribute, such as friendly or fraudulent, while the "User List" information element can indicate the IMSI or MSISDN of users under each user category.

[0165] For example, NWDAF can also carry a "user identifier" element in the subscription message. For instance, it can subscribe to sample data of free rates for fraudulent users, or it can subscribe to full sample data of friendly users.

[0166] S605, UPF reports the user's sample data to NWDAF.

[0167] UPF can include a "user category" information element in the reported message to identify whether the sample data is from friendly users or fraudulent users.

[0168] S606, NWDAF collects sample data based on user type.

[0169] For example, NWDAF can collect fraudulent traffic based on fraudulent users and collect traffic from free / unrestricted apps based on friendly users.

[0170] S607, NWDAF trains and evaluates AI models (such as perception models) based on training samples.

[0171] S608, NWDAF sends the trained AI model to UPF.

[0172] S609, UPF loads AI models, performs AI model inference, and identifies protocols.

[0173] For example, the AI model can be loaded onto UPF, and the traffic of "fraudulent users" as well as the traffic identified by SA as free-rate traffic can be sent to the AI model for inference and protocol identification.

[0174] S610, UPF confirms rates based on AI recognition results.

[0175] For example, if the AI model identifies a data stream as fraudulent traffic (such as traffic from fraudulent users), the rate for that data stream will be changed from a free rate to a billed rate. If it identifies the data stream as traffic from other users (such as traffic from friendly users), the AI identification result will only be reported to NWDAF to provide fraud information for subsequent user profiling.
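As a non-limiting illustration of step S610, the rate confirmation can be sketched as a function that changes a free rate to a billed rate only when the AI model identifies the flow as fraudulent, and that always produces a result to be reported to NWDAF. The label strings, the function name, and the report layout are hypothetical.

```python
def confirm_rate(current_rate: str, ai_label: str):
    """Return (confirmed_rate, report) for one data stream.

    current_rate: "free" or "billed", as assigned before AI inference
    ai_label:     "fraud" or any other label produced by the AI model
    """
    if ai_label == "fraud" and current_rate == "free":
        confirmed = "billed"   # fraudulent traffic loses its free rate
    else:
        confirmed = current_rate  # other traffic keeps its rate
    # The AI result is reported in both cases for subsequent user profiling.
    report = {"ai_label": ai_label, "rate_changed": confirmed != current_rate}
    return confirmed, report


print(confirm_rate("free", "fraud"))   # ('billed', {'ai_label': 'fraud', 'rate_changed': True})
print(confirm_rate("free", "normal"))  # ('free', {'ai_label': 'normal', 'rate_changed': False})
```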

[0176] S611, UPF reports the AI recognition results to NWDAF via the full UFDR call detail record.

[0177] S612, NWDAF presents a fraud report.

[0178] Fraud reports can demonstrate the enhanced effectiveness of AI in identifying fraudulent traffic.

[0179] The method embodiments of this application have been described in detail above with reference to Figures 1 to 6. The apparatus embodiments of this application will be described in detail below with reference to Figures 7 to 9. It should be understood that the descriptions of the method embodiments correspond to the descriptions of the apparatus embodiments; therefore, for any parts not described in detail, refer to the preceding method embodiments.

[0180] Figure 7 is a schematic structural diagram of a sample data acquisition device provided in an embodiment of this application. The sample data acquisition device 700 shown in Figure 7 can be used in the first network element in the foregoing embodiments. The sample data acquisition device 700 can be the first network element, or a device in the first network element (e.g., a processor, chip, chip system, circuit, or a functional module, etc.), or a device that can be matched and used with the first network element, or a logic module or software that can implement all or part of the first network element.

[0181] As shown in Figure 7, the device 700 for collecting sample data includes a receiving unit 710 and a training unit 720, as detailed below:

[0182] The receiving unit 710 is used to receive the first traffic data of the first user, wherein the user category of the first user is a first user category, and the historical traffic data of the first user category satisfies a first feature;

[0183] Training unit 720 is used to train a perception model based on the first traffic data. The perception model is used to identify the service category of the traffic data. The sample label of the first traffic data includes the service category of the first traffic data. There is a correlation between the service category of the first traffic data and the first user category.

[0184] In some possible implementations, the first user category includes a control category, wherein the historical traffic data of the first user category satisfies a first characteristic, including: when sorting the K historical traffic data corresponding to K users in descending order, the historical traffic data of the control category user is one of the top M historical traffic data, and the control category user is one of the N users among the K users who have the most blocked services, where K is a positive integer, and M and N are both positive integers less than or equal to K.

[0185] In some possible implementations, the first user category includes a fraud category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0186] If the historical traffic of fraudulent users consistently contains free-rate traffic within a first time period, and the proportion of free-rate traffic exceeds a first threshold within the first time period, the Service Awareness (SA) classifies the historical traffic of fraudulent users as free-rate traffic, but the perception model classifies it as fraudulent traffic.

[0187] In some possible implementations, the first user category includes a friendly category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0188] When the K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of the friendly user is one of the top M pieces of historical traffic data, the friendly user is one of the P users among the K users who have the fewest blocked services, and there is no fraudulent traffic in the historical traffic data of the friendly user, where K is a positive integer, and M and P are both positive integers less than or equal to K.

[0189] In some possible implementations, the apparatus 700 further includes a sending unit 730, which, before receiving the first traffic data from the first user, is configured to: send first information, the first information being used to request the first traffic data.

[0190] In some possible implementations, the first information is used to indicate the first user category, or the first information is used to indicate the first user.

[0191] In some possible implementations, the apparatus 700 further includes a determining unit 740, which, before receiving the first user's first traffic data, determines the user category of the first user as the first user category based on the first user's historical traffic data.

[0192] Figure 8 is a schematic structural diagram of a device for collecting sample data according to an embodiment of this application. The device 800 for collecting sample data shown in Figure 8 can be used in the second network element in the foregoing embodiments. The device 800 for collecting sample data can be the second network element, or a device in the second network element (e.g., a processor, chip, chip system, circuit, or a functional module, etc.), or a device that can be used in conjunction with the second network element, or a logic module or software that can implement all or part of the second network element.

[0193] As shown in Figure 8, the device 800 for collecting sample data includes an acquisition unit 810 and a sending unit 820, as detailed below:

[0194] The acquisition unit 810 is used to acquire the first traffic data of the first user, wherein the user category of the first user is a first user category, and the historical traffic data of the first user category satisfies a first feature.

[0195] The sending unit 820 is used to send the first traffic data to train a perception model based on the first traffic data. The perception model is used to identify the service category of the traffic data. The sample label of the first traffic data includes the service category of the first traffic data. There is a correlation between the service category of the first traffic data and the first user category.

[0196] In some possible implementations, the first user category includes a control category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0197] When sorting the K historical traffic data corresponding to K users in descending order, the historical traffic data of the controlled user is one of the first M historical traffic data, and the controlled user is one of the N users among the K users who have the most blocked services, where K is a positive integer, and M and N are both positive integers less than or equal to K.

[0198] In some possible implementations, the first user category includes a fraud category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0199] If the historical traffic of fraudulent users consistently contains free-rate traffic within a first time period, and the proportion of free-rate traffic exceeds a first threshold within the first time period, the Service Awareness (SA) classifies the historical traffic of fraudulent users as free-rate traffic, but the perception model classifies it as fraudulent traffic.

[0200] In some possible implementations, the first user category includes a friendly category, wherein the historical traffic data of the users in the first user category satisfies a first characteristic, including:

[0201] When the K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of the friendly user is one of the top M pieces of historical traffic data, the friendly user is one of the P users among the K users who have the fewest blocked services, and there is no fraudulent traffic in the historical traffic data of the friendly user, where K is a positive integer, and M and P are both positive integers less than or equal to K.

[0202] In some possible implementations, the device 800 further includes a receiving unit 830, which, before collecting the first traffic data of the first user, is configured to: receive first information, the first information being used to request the first traffic data.

[0203] In some possible implementations, the first information is used to indicate the first user category, or the first information is used to indicate the first user.

[0204] Figure 9 is a schematic structural diagram of an apparatus provided in an embodiment of this application. The dashed lines in Figure 9 indicate that the unit or module is optional. This apparatus 900 can be used to implement the methods described in the above method embodiments. The apparatus 900 can be a chip or a device for acquiring sample data.

[0205] The device 900 may include one or more processors 910. The processor 910 may support the device 900 in implementing the methods described in the preceding method embodiments. The processor 910 may be a general-purpose processor or a special-purpose processor. For example, the processor may be a central processing unit (CPU). Alternatively, the processor may be other general-purpose processors, microprocessor units (MPUs), microcontroller units (MCUs), graphics processing units (GPUs), artificial intelligence processors (AI processors) or neural processing units (NPUs), digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor.

[0206] The device 900 may further include one or more memories 920. The memories 920 store a program that can be executed by the processor 910, causing the processor 910 to perform the methods described in the preceding method embodiments. The memories 920 may be independent of the processor 910 or integrated within the processor 910. In this embodiment, the memories 920 may include, but are not limited to, cache, read-only memory (ROM), random access memory (RAM), synchronous dynamic random access memory (SDRAM), hard disk drive (HDD) or solid-state drive (SSD), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), etc.

[0207] The device 900 may also include a transceiver 930. The processor 910 can communicate with other devices or chips via the transceiver 930. For example, the processor 910 can send and receive data with other devices or chips via the transceiver 930.

[0208] It should be noted that the information interaction and execution process between the above-mentioned devices / units are based on the same concept as the method embodiments of this application. For details on their specific functions and technical effects, please refer to the method embodiments section, and they will not be repeated here.

[0209] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this application. The specific working process of the units and modules in the above system can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0210] This application also provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the steps described in the various method embodiments above.

[0211] This application also provides a computer program product, which includes a computer program that, when run on a computer, causes the computer to perform the steps described in the various method embodiments above.

[0212] This application also provides a chip, which includes a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so that a device or apparatus (such as a device for collecting sample data) with the chip installed performs the steps in the above-described method embodiments.

[0213] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the various method embodiments described above. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or some intermediate form. The computer-readable storage medium can include at least: any entity or device capable of carrying computer program code to a device / app, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium. Examples include USB flash drives, portable hard drives, magnetic disks, or optical disks. In some possible implementations, the computer-readable storage medium may not be an electrical carrier signal or a telecommunication signal.

[0214] In the above embodiments, the description of each embodiment has its own focus. For parts that are not described in detail in a given embodiment, refer to the relevant descriptions of the other embodiments.

[0215] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0216] In the embodiments provided in this application, it should be understood that the disclosed apparatuses, devices, and methods can be implemented in other ways. The apparatus and device embodiments described above are merely illustrative. For instance, the division into modules or units is only a logical functional division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

[0217] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0218] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.

Claims

1. A method for collecting sample data, characterized in that the method is applied to a first network element and comprises: receiving first traffic data of a first user, wherein a user category of the first user is a first user category, and historical traffic data of the first user category satisfies a first feature; and training a perception model based on the first traffic data, wherein the perception model is used to identify a service category of traffic data, a sample label of the first traffic data comprises the service category of the first traffic data, and there is a correlation between the service category of the first traffic data and the first user category.
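
As a minimal, non-limiting sketch of how the sample label in claim 1 could be derived from the user category, the following Python fragment attaches a service-category label to each traffic record. The mapping `CATEGORY_TO_LABEL`, the label strings, and the `TrafficRecord` type are hypothetical names introduced only for illustration; the claims do not fix any of them, and the training step itself is omitted.

```python
from dataclasses import dataclass

# Hypothetical association between user category and service category.
# The concrete label strings are assumptions, not taken from the claims.
CATEGORY_TO_LABEL = {
    "control": "blocked_service",
    "fraud": "fraudulent",
    "friendly": "normal",
}


@dataclass(frozen=True)
class TrafficRecord:
    user_id: str
    features: tuple  # numeric features extracted from the traffic data


def label_samples(records, user_categories):
    """Build (features, label) pairs, labelling each record by its user's category."""
    samples = []
    for rec in records:
        category = user_categories.get(rec.user_id)
        label = CATEGORY_TO_LABEL.get(category)
        if label is None:
            continue  # the user has no known category, so no label can be derived
        samples.append((rec.features, label))
    return samples
```

The resulting pairs could then be passed to whatever training procedure the perception model uses.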

2. The method according to claim 1, characterized in that the first user category comprises a control category, and that the historical traffic data of the first user category satisfies the first feature comprises: when K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of a controlled user is among the first M pieces of historical traffic data, and the controlled user is one of the N users, among the K users, with the most blocked services, wherein K is a positive integer, and M and N are both positive integers less than or equal to K.
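
The control-category feature of claim 2 can be read as the intersection of two rankings over the same K users. The sketch below illustrates that reading under stated assumptions: the function name, the dictionary inputs, and the treatment of a user missing from `blocked_by_user` as having zero blocked services are all hypothetical, and ties are broken by input order.

```python
def select_control_users(traffic_by_user, blocked_by_user, m, n):
    """Return users in both the top-M by traffic volume and the top-N by blocked services.

    traffic_by_user: maps each of the K users to a historical traffic volume.
    blocked_by_user: maps users to a count of blocked services (missing means 0).
    """
    # Rank the K users by historical traffic, largest first, and keep the first M.
    by_traffic = sorted(traffic_by_user, key=traffic_by_user.get, reverse=True)
    top_m = set(by_traffic[:m])

    # Rank the same K users by blocked-service count, largest first, and keep the first N.
    by_blocked = sorted(
        traffic_by_user,
        key=lambda user: blocked_by_user.get(user, 0),
        reverse=True,
    )
    top_n = set(by_blocked[:n])

    return top_m & top_n
```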

3. The method according to claim 1 or 2, characterized in that the first user category comprises a fraud category, and that the historical traffic data of the first user category satisfies the first feature comprises: the historical traffic of a fraudulent user continuously contains free-rate traffic within a first time period, the proportion of free-rate traffic within the first time period exceeds a first threshold, service awareness (SA) identifies the historical traffic of the fraudulent user as free-rate traffic, and the perception model identifies that historical traffic as fraudulent traffic.
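
One possible way to evaluate the fraud-category feature of claim 3 is sketched below. It assumes the first time period is divided into observation intervals, that "consistently contains" means every interval has some free-rate traffic, and that a single interval in which the perception model disagrees with SA is enough. These choices, along with the `Interval` type and its field names, are assumptions made for illustration and are not specified in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    total_bytes: int        # all traffic observed in this interval
    free_rate_bytes: int    # traffic that SA identified as free-rate
    model_says_fraud: bool  # perception model flags that free-rate traffic as fraudulent


def is_fraud_user(intervals, threshold):
    """Check the fraud-category feature over the intervals of the first time period."""
    if not intervals:
        return False
    # Free-rate traffic must be present in every interval of the period.
    if any(iv.free_rate_bytes == 0 for iv in intervals):
        return False
    total = sum(iv.total_bytes for iv in intervals)
    free = sum(iv.free_rate_bytes for iv in intervals)
    # The free-rate share over the whole period must exceed the first threshold.
    if total == 0 or free / total <= threshold:
        return False
    # SA says free-rate, but the perception model says fraudulent (assumed: at least once).
    return any(iv.model_says_fraud for iv in intervals)
```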

4. The method according to any one of claims 1 to 3, characterized in that the first user category comprises a friendly category, and that the historical traffic data of the first user category satisfies the first feature comprises: when K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of a controlled user is among the first M pieces of historical traffic data, the controlled user is one of the P users, among the K users, with the fewest blocked services, and there is no fraudulent traffic in the historical traffic data of the controlled user, wherein K is a positive integer, and M and P are both positive integers less than or equal to K.
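
The friendly-category feature of claim 4 combines a top-M traffic ranking, a fewest-blocked ranking, and an absence of fraudulent traffic. The following sketch shows one way to combine the three conditions. The function name, the input shapes, and the use of a precomputed set `fraud_users` to represent "has fraudulent traffic in its history" are hypothetical simplifications.

```python
def select_friendly_users(traffic_by_user, blocked_by_user, fraud_users, m, p):
    """Return users that satisfy all three friendly-category conditions.

    traffic_by_user: maps each of the K users to a historical traffic volume.
    blocked_by_user: maps users to a count of blocked services (missing means 0).
    fraud_users: users whose historical traffic contains fraudulent traffic.
    """
    # Condition 1: among the first M users when sorted by traffic, largest first.
    by_traffic = sorted(traffic_by_user, key=traffic_by_user.get, reverse=True)
    top_m = set(by_traffic[:m])

    # Condition 2: among the P users with the fewest blocked services.
    by_blocked = sorted(traffic_by_user, key=lambda user: blocked_by_user.get(user, 0))
    fewest_p = set(by_blocked[:p])

    # Condition 3: no fraudulent traffic in the user's history.
    return (top_m & fewest_p) - set(fraud_users)
```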

5. The method according to any one of claims 1 to 4, characterized in that, before the receiving of the first traffic data of the first user, the method further comprises: sending first information, wherein the first information is used to request the first traffic data.

6. The method according to claim 5, characterized in that the first information is used to indicate the first user category, or the first information is used to indicate the first user.
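
The first information of claims 5 and 6 could be modelled as a small request structure that names either a user category or a specific user. The sketch below assumes that exactly one of the two is indicated per request; the claims only say "or", so this exclusivity, like the `FirstInformation` type, its field names, and the `users_to_collect` helper, is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirstInformation:
    """Hypothetical request for traffic data, indicating a category or a single user."""
    user_category: Optional[str] = None
    user_id: Optional[str] = None

    def __post_init__(self):
        # Assumed rule: exactly one of the two indications must be present.
        if (self.user_category is None) == (self.user_id is None):
            raise ValueError("indicate exactly one of user_category or user_id")


def users_to_collect(info, user_categories):
    """Resolve the request to the set of users whose traffic should be collected."""
    if info.user_id is not None:
        return {info.user_id}
    return {user for user, cat in user_categories.items() if cat == info.user_category}
```

With this shape, a request by category fans out to every user currently tagged with that category, while a request by user targets one subscriber directly.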

7. The method according to any one of claims 1 to 6, characterized in that, before the receiving of the first traffic data of the first user, the method further comprises: determining, based on historical traffic data of the first user, that the user category of the first user is the first user category.

8. A method for collecting sample data, characterized in that the method is applied to a second network element and comprises: collecting first traffic data of a first user, wherein a user category of the first user is a first user category, and historical traffic data of the first user category satisfies a first feature; and sending the first traffic data, wherein the first traffic data is used to train a perception model, the perception model is used to identify a service category of traffic data, a sample label of the first traffic data comprises the service category of the first traffic data, and there is a correlation between the service category of the first traffic data and the first user category.

9. The method according to claim 8, characterized in that the first user category comprises a control category, and that the historical traffic data of the first user category satisfies the first feature comprises: when K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of a controlled user is among the first M pieces of historical traffic data, and the controlled user is one of the N users, among the K users, with the most blocked services, wherein K is a positive integer, and M and N are both positive integers less than or equal to K.

10. The method according to claim 8 or 9, characterized in that the first user category comprises a fraud category, and that the historical traffic data of the first user category satisfies the first feature comprises: the historical traffic of a fraudulent user continuously contains free-rate traffic within a first time period, the proportion of free-rate traffic within the first time period exceeds a first threshold, service awareness (SA) identifies the historical traffic of the fraudulent user as free-rate traffic, and the perception model identifies that historical traffic as fraudulent traffic.

11. The method according to any one of claims 8 to 10, characterized in that the first user category comprises a friendly category, and that the historical traffic data of the first user category satisfies the first feature comprises: when K pieces of historical traffic data corresponding to K users are sorted in descending order, the historical traffic data of a controlled user is among the first M pieces of historical traffic data, the controlled user is one of the P users, among the K users, with the fewest blocked services, and there is no fraudulent traffic in the historical traffic data of the controlled user, wherein K is a positive integer, and M and P are both positive integers less than or equal to K.

12. The method according to any one of claims 8 to 11, characterized in that, before the collecting of the first traffic data of the first user, the method further comprises: receiving first information, wherein the first information is used to request the first traffic data.

13. The method according to claim 12, characterized in that the first information is used to indicate the first user category, or the first information is used to indicate the first user.

14. An apparatus for collecting sample data, characterized by comprising: a module or unit configured to perform the method according to any one of claims 1 to 13.

15. An apparatus for collecting sample data, characterized by comprising: a processor and a memory, wherein the processor is coupled to the memory, and the memory is used to store a computer program which, when executed by the processor, causes the apparatus to perform the method according to any one of claims 1 to 13.

16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 13.

17. A computer program product, characterized by comprising: a computer program that, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 13.

18. A chip, characterized by comprising: a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so that a device or apparatus in which the chip is installed performs the method according to any one of claims 1 to 13.