A network fingerprint-based identity recognition method and device
By constructing a user relationship network and utilizing network fingerprinting and clustering analysis, the problem of identity recognition in the case of users with the same name was solved, and the effect of accurately locating real users from massive amounts of data was achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ADVANCED NEW TECHNOLOGIES CO LTD
- Filing Date
- 2018-09-21
- Publication Date
- 2026-06-23
AI Technical Summary
How to accurately locate real users among internet users when they have the same name, especially to identify the true identity of users with the same name in a massive database?
A user relationship network is constructed, and network fingerprints are used to filter and cluster the user relationship network. By calculating network fingerprint indicators and trust indicators, a unique user identifier is determined, and the most likely real users are selected.
By constructing a user relationship network and performing network fingerprinting and clustering analysis, it is possible to accurately locate real users from massive user data, provide more accurate user information, and provide a data foundation for subsequent business.
Figure CN116595397B_ABST
Description
Technical Field
[0001] This application relates to the field of information technology, and in particular to a method and apparatus for identity recognition based on network fingerprints.
Background Technology
[0002] Network fingerprinting refers to a unique identifier for a local network, based on certain attributes of its nodes and edges. The clustering coefficient of nodes measures the tightness of connections between adjacent nodes. A network-like connected graph can be constructed using the connections between nodes and edges to analyze the relationships between them. With a large number of internet users and a high prevalence of duplicate names, accurately identifying the real user among several users with the same name using computing devices remains a significant technological challenge.
[0003] Summary of the Application
[0004] One objective of this application is to provide a method and apparatus for identity recognition based on network fingerprints to solve the problem of accurately identifying real users when they have the same name.
[0005] According to a first aspect of this application, a method for identity recognition based on network fingerprints is provided, comprising:
[0006] Construct a user relationship network; wherein the user relationship network uses user identifiers as nodes and the relationships between users as edges;
[0007] The user relationship network is filtered using network fingerprints;
[0008] Clustering is performed on the filtered user relationship network to determine the unique identifier of each user.
[0009] Furthermore, in the method described in this application, filtering the user relationship network using network fingerprints specifically includes:
[0010] Calculate the network fingerprint index for each node in the user relationship network;
[0011] Delete the nodes whose network fingerprint index is less than a preset threshold.
[0012] Furthermore, in the method described in this application, the network fingerprint index is calculated based on the clustering coefficient and the number of neighboring nodes of the node.
[0013] Furthermore, in the method described in this application, the clustering coefficient is calculated based on the number of neighboring nodes of the node and the number of edges between the neighboring nodes.
[0014] Furthermore, in the method described in this application, clustering the filtered user relationship network to determine the unique user identifier specifically includes:
[0015] The user relationship network is clustered to obtain subclasses; wherein, the subclasses include the nodes;
[0016] Calculate the credibility index for each of the subclasses;
[0017] The user identifier corresponding to the node with the largest network fingerprint index in the subclass with the largest credibility index is determined as the user's unique identifier.
[0018] Furthermore, in the method described in this application, the credibility index is calculated based on the variance of the subclass and the mean of the subclass.
[0019] Furthermore, prior to constructing the user relationship network, the method described in this application further includes:
[0020] Collect first user information;
[0021] Match the first user information with the second user information;
[0022] The second user information is real information and includes the relationship between users; both the first user information and the second user information include the user identifier.
[0023] According to a second aspect of this application, a network fingerprint-based identity recognition device is provided, comprising:
[0024] A network construction module is used to construct a user relationship network; wherein, the user relationship network uses user identifiers as nodes and the association relationships between users as edges;
[0025] The filtering module is used to filter the user relationship network using network fingerprints;
[0026] The determination module is used to perform clustering processing on the filtered user relationship network to determine the unique identifier of the user.
[0027] Furthermore, in the apparatus described in this application, the filtering module is specifically used for:
[0028] Calculate the network fingerprint index for each node in the user relationship network;
[0029] Delete the nodes whose network fingerprint index is less than a preset threshold.
[0030] Furthermore, in the apparatus described in this application, the network fingerprint index is calculated based on the clustering coefficient and the number of neighboring nodes of the node.
[0031] Furthermore, in the apparatus described in this application, the clustering coefficient is calculated based on the number of neighboring nodes of the node and the number of edges between the neighboring nodes.
[0032] Furthermore, in the apparatus described in this application, the determination module is specifically used for:
[0033] The user relationship network is clustered to obtain subclasses; wherein, the subclasses include the nodes;
[0034] Calculate the credibility index for each of the subclasses;
[0035] The user identifier corresponding to the node with the largest network fingerprint index in the subclass with the largest credibility index is determined as the user's unique identifier.
[0036] Furthermore, in the apparatus described in this application, the credibility index is calculated based on the variance of the subclass and the mean of the subclass.
[0037] Furthermore, the apparatus described in this application also includes:
[0038] The matching module is used to: collect first user information; and match the first user information with second user information.
[0039] The second user information is real information and includes the relationship between users; both the first user information and the second user information include the user identifier.
[0040] According to a third aspect of this application, a storage device is provided that stores computer program instructions which, when executed, perform the method described in this application.
[0041] According to a fourth aspect of this application, a computing device includes: a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein when the computer program instructions are executed by the processor, the computing device is triggered to execute the method described in this application.
[0042] The identity recognition method and apparatus based on network fingerprinting provided in this application construct a user relationship network based on the mutual acquaintance between users, then perform preliminary deduplication screening through network fingerprint indicators, and finally analyze the real users based on the degree of mutual acquaintance between users through clustering processing, providing a data foundation for subsequent processing.
Attached Figure Description
[0043] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
[0044] Figure 1 is a flowchart illustrating the identity recognition method based on network fingerprints according to Embodiment 1 of this application;
[0045] Figure 2 is a flowchart illustrating the identity recognition method based on network fingerprints according to Embodiment 2 of this application;
[0046] Figure 3 is a schematic diagram of the user relationship network in this application;
[0047] Figure 4 is a schematic diagram of the user relationship network after the filtering process in this application;
[0048] Figure 5 is a schematic diagram of the network fingerprint-based identity recognition device according to Embodiment 3 of this application;
[0049] Figure 6 is a schematic diagram of the network fingerprint-based identity recognition device according to Embodiment 4 of this application;
[0050] The same or similar reference numerals in the accompanying drawings represent the same or similar parts.
Detailed Implementation
[0051] The present application will now be described in further detail with reference to the accompanying drawings.
[0052] In a typical configuration of this application, the terminal and the service network devices each include one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
[0053] Memory may include non-persistent storage in computer-readable media, such as random access memory (RAM) and / or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0054] Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. Information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, read-only optical disc (CD-ROM), digital versatile optical disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
[0055] When using big data in existing technologies, the problem of accurately locating real users arises. For example, based on publicly available information about criminal gangs from the public security department, a list of real names is obtained, including Zhang Xiao A, Wang Da B, Li You C, and Yuan Wu D, who are suspected of involvement in a major gang fraud case. Meanwhile, within the massive database of big data, there are tens of thousands of users named Zhang Xiao A. Therefore, accurately locating the Zhang Xiao A suspected of involvement in a major gang fraud within this vast database presents a significant technical challenge for existing engineers.
[0056] This application primarily uses the relationships between users as the logical basis for calculations. For example, Zhang Xiao A, suspected of major gang fraud, must know Wang Da B, Li You C, and Yuan Wu D. However, in a massive database, there can generally only be one Zhang Xiao A who knows all three. Based on real user information obtained online and combined with massive user data from a large database, a user relationship network can be constructed, with users as nodes and the relationships between users as edges. Then, using network fingerprinting methods and considering the degree of mutual acquaintance between users, real users can be filtered from the massive user data. For example, based on a list of criminal gangs provided by the public security department, Zhang Xiao A, who knows all three, can be accurately located from the massive user data to obtain real user data, providing more accurate information for subsequent business operations.
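The intuition in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the "knows all three" reasoning, not the method described in the embodiments; the function name `candidates_knowing_all`, the account labels, and the `knows` mapping are hypothetical.

```python
def candidates_knowing_all(candidates, associates, knows):
    """Return the candidates who know every listed associate.

    candidates: account labels that share the same name (hypothetical data)
    associates: names of the other people on the trusted list
    knows:      mapping from account label to the set of names it knows
    """
    required = set(associates)
    # A candidate qualifies only if the required set is a subset of
    # the names that candidate is known to be acquainted with.
    return [c for c in candidates if required <= knows.get(c, set())]


same_name_accounts = ["acct1", "acct2", "acct3"]
listed_associates = ["Wang Da B", "Li You C", "Yuan Wu D"]
acquaintances = {
    "acct1": {"Wang Da B"},
    "acct2": {"Wang Da B", "Li You C", "Yuan Wu D"},
    "acct3": set(),
}
result = candidates_knowing_all(same_name_accounts, listed_associates, acquaintances)
print(result)
```

In this toy data only one of the three same-name accounts knows all three associates, which mirrors the assumption that there is generally only one such Zhang Xiao A.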
[0057] Figure 1 is a flowchart illustrating the network fingerprint-based identity recognition method according to Embodiment 1 of this application. As shown in Figure 1, the identity recognition method based on network fingerprints in Embodiment 1 of this application includes:
[0058] Step S101: Construct a user relationship network; wherein the user relationship network uses user identifiers as nodes and the relationships between users as edges.
[0059] The process involves obtaining a first user list and a second user list. Both lists include relationships between users who know each other. The second user list is a list of real user information. The first and second user lists are then matched, with each matching user identifier serving as a node. If users A and B know each other, an edge AB is established between nodes A and B. If users A and C do not know each other, there is no edge between nodes A and C; nodes A and C are independent and have no relationship. This results in a user relationship network, which serves as the data foundation for subsequent calculations. User identifiers include user names, nicknames, ID numbers, user accounts, and other information that identifies a user. The relationships between users are the inherent connections between them, such as mutual acquaintance.
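A minimal sketch of this construction step, assuming the matched identifiers and the acquaintance pairs are already available as plain Python collections. The function name `build_relationship_network` and the adjacency-set representation are illustrative choices, not something the application specifies.

```python
def build_relationship_network(matched_ids, acquaintance_pairs):
    """Build an undirected graph as {user_id: set_of_neighbor_ids}.

    matched_ids:        identifiers that matched between the two user lists
    acquaintance_pairs: iterable of (a, b) pairs meaning "a and b know each other"
    """
    matched = set(matched_ids)
    graph = {uid: set() for uid in matched}
    for a, b in acquaintance_pairs:
        # Keep an edge only if both endpoints are matched nodes; skip self-loops.
        if a in matched and b in matched and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


network = build_relationship_network(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("A", "D"), ("A", "E"), ("B", "D")],
)
print(sorted(network["A"]))
print(sorted(network["C"]))
```

Here users A and B know each other, so edge AB exists, while A and C do not, so node C stays isolated.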
[0060] Step S102: Use network fingerprints to filter the user relationship network.
[0061] One approach is to use network fingerprinting to calculate the network fingerprint index of each node, and then use a preset threshold as a standard to filter out a portion of the nodes, thereby removing a large amount of duplicate and redundant data.
[0062] Step S103: Perform clustering processing on the filtered user relationship network to determine the unique identifier of each user.
[0063] The process involves clustering the filtered user relationship network to determine the degree of association between users, i.e., the extent to which they know each other. A credibility index can then be calculated for each resulting subclass, and the node with the largest network fingerprint index in the subclass with the highest credibility index is selected as the most likely genuine user.
[0064] The identity recognition method based on network fingerprints in Embodiment 1 of this application can construct a user relationship network based on the mutual acquaintance and association between users, thereby calculating the most likely real users, thus identifying real users and providing a data foundation for subsequent business.
[0065] Figure 2 is a flowchart illustrating the network fingerprint-based identity recognition method according to Embodiment 2 of this application. As shown in Figure 2, the identity recognition method based on network fingerprints in Embodiment 2 of this application includes:
[0066] Step S201: Collect the first user information.
[0067] Step S202: Match the first user information with the second user information.
[0068] The second user information is authentic and includes relationships between users. Both the first and second user information include the user identifier. The user identifier includes information that identifies the user, such as name, nickname, ID number, or account number. The relationships between users refer to the inherent connections between them, such as mutual acquaintance. The second user information can be obtained through searches on highly credible websites; for example, it can be extracted from the list of defaulters published on a court website. The first user information originates from a massive big-data database and can also be collected and analyzed in real time. For example, when a user needs a credit service, such as obtaining a credit loan, their name can be obtained as the first user information from a form they fill out.
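The matching in step S202 can be sketched as a lookup on the shared identifier. This is a minimal sketch that assumes both sources are lists of dictionaries and that the name is the identifier being matched; the function name `match_user_info` and the record fields are hypothetical.

```python
def match_user_info(first_records, second_records, key="name"):
    """Return the first-source records whose identifier appears in the trusted source.

    first_records:  records collected from the large database (may contain duplicate names)
    second_records: records from the trusted, real-information source
    key:            the identifier field shared by both sources (an assumption here)
    """
    trusted_ids = {rec[key] for rec in second_records}
    return [rec for rec in first_records if rec[key] in trusted_ids]


first = [
    {"account": "u1", "name": "Zhang Xiao A"},
    {"account": "u2", "name": "Zhang Xiao A"},
    {"account": "u3", "name": "Chen E"},
]
second = [{"name": "Zhang Xiao A"}, {"name": "Wang Da B"}]
matched = match_user_info(first, second)
print([rec["account"] for rec in matched])
```

Because of duplicate names, one trusted name can match several accounts, which is why the later filtering and clustering steps are needed.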
[0069] Step S203: Construct a user relationship network. The user relationship network uses user identifiers as nodes and the relationships between users as edges.
[0070] In this process, the user identifiers of the first user information are matched with the user identifiers of the second user information, and the matched user identifiers are used as nodes. If users A and B know each other, then there is an edge AB between the corresponding nodes A and B. If users A and C do not know each other, then there is no edge between the corresponding nodes A and C, and nodes A and C are independent and have no relationship. Figure 3 is a schematic diagram of the user relationship network of this application. As shown in Figure 3, a user relationship network is constructed. Due to the existence of duplicate names, the collected user information will match a large number of user identifiers, forming a very large relationship network.
[0071] Step S204: Calculate the network fingerprint index of each node in the user relationship network.
[0072] The network fingerprint index is calculated according to the following formula (1):
[0073] y = x × log(m) (1);
[0074] Where y represents the network fingerprint index, x represents the clustering coefficient, and m represents the number of neighboring nodes of the node.
[0075] The clustering coefficient is calculated according to the following formula (2):
[0076]
[0077] Where x represents the clustering coefficient, m represents the number of neighboring nodes of the node, and k represents the number of edges between the m neighboring nodes.
[0078] For example, calculate the network fingerprint of node A. Node A has edges with nodes B, D, and E, but no edge with node C, indicating that user A knows users B, D, and E, but not user C. Nodes B, D, and E are adjacent nodes of node A, so m = 3. There are edges between adjacent nodes B and D of node A, but no edges between nodes B and E or between nodes D and E, so k = 1.
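The example above can be worked through in code. Formula (2) is not reproduced in this text, so the sketch below assumes the standard local clustering coefficient, x = 2k / (m(m − 1)), which depends only on m and k as the description requires; this is an assumption, not a quotation of the application. The base of the logarithm in formula (1) is also unspecified, so the natural logarithm is assumed. Function names are illustrative.

```python
import math


def clustering_coefficient(graph, node):
    """Assumed formula (2): x = 2k / (m * (m - 1)), the standard local clustering coefficient."""
    neighbors = graph[node]
    m = len(neighbors)
    if m < 2:
        return 0.0  # no pair of neighbors exists, so no edge between them is possible
    # Each edge between two neighbors is seen from both ends, so halve the count.
    k = sum(len(graph[u] & neighbors) for u in neighbors) // 2
    return 2 * k / (m * (m - 1))


def fingerprint_index(graph, node):
    """Formula (1): y = x * log(m); natural log is an assumption."""
    m = len(graph[node])
    if m == 0:
        return 0.0  # log(0) is undefined; treat isolated nodes as index 0
    return clustering_coefficient(graph, node) * math.log(m)


# Network from the example: A knows B, D, E; B and D know each other; C is isolated.
network = {
    "A": {"B", "D", "E"},
    "B": {"A", "D"},
    "C": set(),
    "D": {"A", "B"},
    "E": {"A"},
}
x_a = clustering_coefficient(network, "A")
y_a = fingerprint_index(network, "A")
print(x_a, y_a)
```

For node A, m = 3 and k = 1, so under the assumed formula x = 2·1 / (3·2) = 1/3 and y = (1/3) × log(3).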
[0079] Step S205: Delete the nodes whose network fingerprint index is less than a preset threshold.
[0080] The preset threshold is set to 100. According to the above formulas (1) and (2), the network fingerprint index of each node in the user relationship network can be calculated. Then, the network fingerprint indices are sorted in descending order, only the first 100 are retained, and the nodes corresponding to the remaining network fingerprint indices are deleted. Figure 4 is a schematic diagram of the user relationship network after filtering in this application. As shown in Figure 4, a large number of irrelevant nodes can be deleted from the network shown in Figure 3, which greatly reduces the computational workload of subsequent steps and yields a preliminarily filtered and deduplicated user relationship network. The network structure shown in Figure 4 is the network structure shown in Figure 3 with the dark and light parts in the upper right corner filtered out.
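A minimal sketch of this filtering step, following the reading in the paragraph above in which the nodes are ranked by network fingerprint index and only the top 100 are kept. The function name `keep_top_nodes`, the `limit` parameter, and the sample index values are illustrative assumptions.

```python
def keep_top_nodes(graph, index_by_node, limit=100):
    """Keep the `limit` nodes with the largest fingerprint index and drop the rest.

    graph:         {node: set_of_neighbors}
    index_by_node: {node: network fingerprint index y}
    """
    ranked = sorted(graph, key=lambda n: index_by_node[n], reverse=True)
    kept = set(ranked[:limit])
    # Remove edges that pointed at deleted nodes so the graph stays consistent.
    return {n: graph[n] & kept for n in kept}


network = {
    "A": {"B", "D", "E"},
    "B": {"A", "D"},
    "C": set(),
    "D": {"A", "B"},
    "E": {"A"},
}
# Hypothetical index values, used only to exercise the function.
indices = {"A": 0.37, "B": 0.69, "C": 0.0, "D": 0.69, "E": 0.0}
pruned = keep_top_nodes(network, indices, limit=3)
print(sorted(pruned))
```

With `limit=3`, nodes C and E are deleted and the edge from A to E disappears with them.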
[0081] Step S206: Perform clustering processing on the user relationship network to obtain subclasses; wherein, the subclasses include the nodes.
[0082] Step S207: Calculate the credibility index for each of the subclasses.
[0083] The credibility index is calculated according to the following formula (3):
[0084] h = d × v (3);
[0085] Where h represents the credibility index, d represents the variance of the subclass, and v represents the mean of the subclass.
[0086] After filtering, the network structure shown in Figure 4 is subjected to clustering. For example, a clustering algorithm based on connected graph partitioning is used to obtain multiple subclasses in the network structure. The mean v and variance d of each subclass are calculated based on the network fingerprint index y of the nodes, and then the credibility index h of each subclass is calculated. The credibility index h is used to measure the degree of close connection within the subclass.
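The clustering and formula (3) can be sketched as follows. The text only says "a clustering algorithm based on connected graph partitioning", so this sketch assumes plain connected components as the subclasses, and it assumes population variance for d; both are illustrative choices. The function names and sample values are hypothetical.

```python
from statistics import mean, pvariance


def connected_components(graph):
    """Split {node: set_of_neighbors} into connected components (one possible clustering)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def credibility_index(component, index_by_node):
    """Formula (3): h = d * v, with d the variance and v the mean of the y values."""
    values = [index_by_node[n] for n in component]
    return pvariance(values) * mean(values)  # population variance is an assumption


graph = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}, "E": set()}
y = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 2.0, "E": 0.5}  # hypothetical index values
subclasses = connected_components(graph)
h_values = [credibility_index(s, y) for s in subclasses]
print(subclasses, h_values)
```

For the subclass {A, B}, the y values 1.0 and 3.0 give v = 2.0 and d = 1.0, so h = 2.0; a subclass whose nodes all share the same y has d = 0 and therefore h = 0.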
[0087] Step S208: Determine the user identifier corresponding to the node with the largest network fingerprint index in the subclass with the largest credibility index as the user's unique identifier.
[0088] Specifically, the credibility indices h are sorted in descending order, and the subclass with the largest credibility index h is selected. Then, within that subclass, the network fingerprint indices y are sorted in descending order, and the single node with the largest network fingerprint index y is selected. The user identifier corresponding to that node is the most likely to belong to the real user, and the real user is thereby filtered out and determined.
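The selection in step S208 amounts to two nested maximum searches, which gives the same result as sorting in descending order and taking the first element. A minimal sketch follows; the function name `select_real_user` and the sample values are hypothetical, and ties are resolved arbitrarily here because the text does not say how to break them.

```python
def select_real_user(subclasses, credibility, index_by_node):
    """Pick the node with the largest y inside the subclass with the largest h.

    subclasses:    list of sets of node identifiers
    credibility:   list of h values, aligned with `subclasses`
    index_by_node: {node: network fingerprint index y}
    """
    best = max(range(len(subclasses)), key=lambda i: credibility[i])
    return max(subclasses[best], key=lambda n: index_by_node[n])


subclasses = [{"A", "B"}, {"C", "D"}]
credibility = [2.0, 0.0]
y = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 2.0}
chosen = select_real_user(subclasses, credibility, y)
print(chosen)
```

In this toy input the first subclass has the larger h, and within it node B has the larger y, so B is returned as the unique identifier.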
[0089] Figure 5 is a schematic diagram of the network fingerprint-based identity recognition device according to Embodiment 3 of this application. As shown in Figure 5, the identity recognition device based on network fingerprints in Embodiment 3 of this application includes: a network construction module 51, a filtering module 52, and a determination module 53.
[0090] The network construction module 51 is used to construct a user relationship network; wherein the user relationship network uses user identifiers as nodes and the association relationships between users as edges.
[0091] The filtering module 52 is used to filter the user relationship network using network fingerprints.
[0092] The determination module 53 is used to perform clustering processing on the filtered user relationship network to determine the unique identifier of the user.
[0093] The network fingerprint-based identity recognition device in Embodiment 3 of this application is the apparatus implementing the method shown in Figure 1. For its specific principles, refer to the embodiment shown in Figure 1; they are not repeated here.
[0094] Figure 6 is a schematic diagram of the network fingerprint-based identity recognition device according to Embodiment 4 of this application. As shown in Figure 6, the identity recognition device based on network fingerprints in Embodiment 4 of this application includes: a network construction module 51, a filtering module 52, a determination module 53, and a matching module 54.
[0095] The network construction module 51 is used to construct a user relationship network; wherein the user relationship network uses user identifiers as nodes and the association relationships between users as edges.
[0096] The filtering module 52 is used to filter the user relationship network using network fingerprints.
[0097] Specifically, the filtering module 52 is used for:
[0098] Calculate the network fingerprint index for each node in the user relationship network;
[0099] Delete the nodes whose network fingerprint index is less than a preset threshold.
[0100] The network fingerprint index is calculated according to the following formula (1):
[0101] y = x × log(m) (1);
[0102] Where y represents the network fingerprint index, x represents the clustering coefficient, and m represents the number of neighboring nodes of the node.
[0103] The clustering coefficient is calculated according to the following formula (2):
[0104]
[0105] Where x represents the clustering coefficient, m represents the number of neighboring nodes of the node, and k represents the number of edges between the m neighboring nodes.
[0106] The determination module 53 is used to perform clustering processing on the filtered user relationship network to determine the unique identifier of the user.
[0107] Specifically, the determination module 53 is used for:
[0108] The user relationship network is clustered to obtain subclasses; wherein, the subclasses include the nodes;
[0109] Calculate the credibility index for each of the subclasses;
[0110] The user identifier corresponding to the node with the largest network fingerprint index in the subclass with the largest credibility index is determined as the user's unique identifier.
[0111] The credibility index is calculated according to the following formula (3):
[0112] h = d × v (3);
[0113] Where h represents the credibility index, d represents the variance of the subclass, and v represents the mean of the subclass.
[0114] Matching module 54 is used for:
[0115] Collect first user information;
[0116] The first user information is matched with the second user information.
[0117] Wherein, the second user information is real information and includes the association relationships between users; both the first user information and the second user information include the user identifier.
[0118] The network fingerprint-based identity recognition device in Embodiment 4 of this application is the apparatus implementing the method shown in Figure 2. For its specific principles, refer to the embodiment shown in Figure 2; they are not repeated here.
[0119] According to an embodiment of this application, a storage device is also provided, the storage device storing computer program instructions that, when executed, perform the method shown in Figure 1 or Figure 2 of this application.
[0120] According to an embodiment of this application, a computing device is also provided, including: a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein when the computer program instructions are executed by the processor, the computing device is triggered to execute the method shown in Figure 1 or Figure 2 of this application.
[0121] Furthermore, some embodiments of this application also provide a computer-readable medium having computer program instructions stored thereon, which can be executed by a processor to implement the methods and / or technical solutions of the aforementioned embodiments of this application.
[0122] It should be noted that this application can be implemented in software and / or a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In some embodiments, the software program of this application can be executed by a processor to implement the steps or functions described above. Similarly, the software program of this application (including related data structures) can be stored in a computer-readable recording medium, such as RAM memory, magnetic or optical drives, floppy disks, and similar devices. Furthermore, some steps or functions of this application can be implemented in hardware, for example, as circuitry that cooperates with a processor to perform the various steps or functions.
[0123] It will be apparent to those skilled in the art that this application is not limited to the details of the exemplary embodiments described above, and that this application can be implemented in other specific forms without departing from the spirit or essential characteristics of this application. Therefore, the embodiments should be considered exemplary and non-limiting in all respects, and the scope of this application is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be embraced within this application. No reference numerals in the claims should be construed as limiting the scope of the claims. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in the apparatus claims may also be implemented by a single unit or device in software or hardware. The terms "first," "second," etc., are used to indicate names and do not indicate any particular order.
Claims
1. A method for identity recognition based on network fingerprints, characterized in that it comprises: constructing a user relationship network, wherein the user relationship network uses user identifiers as nodes and the relationships between users as edges, the relationship between users being that the users know each other; filtering the user relationship network using network fingerprints; and clustering the filtered user relationship network to determine a unique user identifier, wherein the unique user identifier is the user identifier corresponding to the node with the largest network fingerprint index in the subclass with the largest credibility index, the subclass with the largest credibility index is determined based on the clustered user relationship network, and the credibility index is used to measure the degree of close connection within the corresponding subclass.
2. The method according to claim 1, characterized in that filtering the user relationship network using network fingerprints specifically includes: calculating the network fingerprint index for each node in the user relationship network; and deleting the nodes whose network fingerprint index is less than a preset threshold.
3. The method according to claim 2, characterized in that the network fingerprint index is calculated based on the clustering coefficient and the number of neighboring nodes of the node, and the clustering coefficient is calculated based on the number of neighboring nodes of the node and the number of edges between the neighboring nodes.
4. The method according to claim 2, characterized in that clustering the filtered user relationship network to determine the unique user identifier specifically includes: clustering the user relationship network to obtain subclasses, wherein the subclasses include the nodes; calculating the credibility index for each of the subclasses; and determining the user identifier corresponding to the node with the largest network fingerprint index in the subclass with the largest credibility index as the unique user identifier.
5. The method according to claim 4, characterized in that the credibility index is calculated based on the variance and mean of the subclass.
6. The method according to claim 1, characterized in that constructing the user relationship network specifically includes: obtaining a first user list and a second user list, wherein both the first user list and the second user list include the relationships between users who know each other, and the second user list is a list of the users' real information; and matching the first user list with the second user list, using the user identifier as a node and the relationship between the users as an edge.
7. The method according to claim 1, characterized in that the user identifier is used to identify the user and includes one or more of the following: user name, nickname, ID number, and user account.
8. The method according to any one of claims 1 to 7, characterized in that, prior to constructing the user relationship network, the method further includes: collecting first user information; and matching the first user information with second user information, wherein the second user information is real information and includes the relationship between users, and both the first user information and the second user information include the user identifier.
9. An identity recognition device based on network fingerprints, characterized in that it comprises: a network construction module used to construct a user relationship network, wherein the user relationship network uses user identifiers as nodes and the relationships between users as edges, the relationship between users being that the users know each other; a filtering module used to filter the user relationship network using network fingerprints; and a determination module used to perform clustering processing on the filtered user relationship network to determine the unique user identifier, wherein the unique user identifier is the user identifier corresponding to the node with the largest network fingerprint index in the subclass with the largest credibility index, the subclass with the largest credibility index is determined based on the clustered user relationship network, and the credibility index is used to measure the degree of close connection within the corresponding subclass.
10. A storage device, characterized in that the storage device stores computer program instructions that, when executed, perform the method according to any one of claims 1 to 8.
11. A computing device, characterized in that it comprises: a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein when the computer program instructions are executed by the processor, the computing device is triggered to perform the method of any one of claims 1 to 8.