Big data-based internet data analysis method and system

By performing pre-analysis of Internet interactive data clusters using various analysis methods and re-analyzing current interactive tag clusters, the problem of insufficient accuracy and reliability in existing Internet data analysis technologies is solved, achieving more accurate analysis results.

CN115828012B · Active · Publication Date: 2026-06-19 · SHANGHAI YANFU INVESTMENT MANAGEMENT CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHANGHAI YANFU INVESTMENT MANAGEMENT CO LTD
Filing Date: 2022-10-18
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing big data and internet data analysis technologies have limitations in accurately and reliably analyzing internet interaction data, resulting in insufficient precision and reliability of the analysis results.

Method used

Multiple analysis methods are used to pre-analyze Internet interaction data clusters and generate various analysis results. Then, the current interaction tag clusters of the target Internet interaction data are analyzed again to finally generate accurate and reliable analysis results.

Benefits of technology

This improves the accuracy and reliability of the analysis results for Internet interaction data clusters.



Abstract

The internet data analysis method and system based on big data disclosed herein re-analyzes target internet interaction data within a cluster of internet interaction data requiring analysis and processing, generating a final analysis result. This disclosure analyzes and processes the target internet interaction data within the cluster of internet interaction data requiring analysis and processing through several analysis methods, generating several first analysis results; it optimizes these first analysis results to determine the current interaction tag cluster for each target internet interaction data, and re-analyzes the target internet interaction data within the cluster of internet interaction data requiring analysis and processing based on the current interaction tag cluster. This improves the accuracy and reliability of the analysis results for the target internet interaction data within the cluster of internet interaction data requiring analysis and processing.

Description

Technical Field

[0001] This application relates to the field of data analysis technology, and more specifically, to internet data analysis methods and systems based on big data.

Background Technology

[0002] Data analysis refers to the process of analyzing large amounts of collected data using appropriate statistical analysis methods, summarizing, understanding, and digesting the data to maximize its functionality and effectiveness. Data analysis is the process of detailed study and summarization of data to extract useful information and draw conclusions.

[0003] Currently, when big data and internet data analytics technologies are combined, the limitations of these technologies may prevent accurate and reliable analysis of internet interaction data. Therefore, it is difficult to guarantee the accuracy and reliability of the analysis results for the target internet interaction data.

Summary of the Invention

[0004] To address the technical problems existing in related technologies, this disclosure provides an internet data analysis method and system based on big data.

[0005] Firstly, a big data-based internet data analysis method is provided, applied to a data analysis system. The method includes at least: obtaining an internet interaction data cluster to be analyzed, the cluster comprising several target internet interaction data; performing pre-analysis on each of the target internet interaction data in the cluster using at least two analysis methods, generating a first analysis result corresponding to each of the at least two analysis methods; determining the current interaction tag cluster for each of the target internet interaction data by combining the first analysis results corresponding to each of the at least two analysis methods; and re-analyzing the target internet interaction data in the cluster using the current interaction tag cluster for each target internet interaction data, generating a final analysis result.

[0006] In one independently implemented embodiment, the step of pre-analyzing the target internet interaction data in the internet interaction data clusters to be analyzed and processed based on at least two analysis methods, and generating a first analysis result corresponding to each of the at least two analysis methods, includes: randomly identifying the target internet interaction data in the internet interaction data clusters to be analyzed and processed, generating at least two local interaction data clusters; and pre-analyzing the target internet interaction data in each of the local interaction data clusters according to the at least two analysis methods, generating a first analysis result corresponding to each analysis method and each local interaction data cluster.

[0007] In one independently implemented embodiment, the step of pre-analyzing the target internet interaction data in the internet interaction data cluster to be analyzed and processed based on at least two analysis methods, and generating a first analysis result corresponding to each of the at least two analysis methods, further includes: selecting knowledge inference information from the target internet interaction data in the internet interaction data cluster to be analyzed and processed, and generating first knowledge inference information for the target internet interaction data; simplifying the first knowledge inference information of each target internet interaction data to generate second knowledge inference information for each target internet interaction data; the step of pre-analyzing the target internet interaction data in the internet interaction data cluster to be analyzed and processed based on at least two analysis methods, and generating a first analysis result corresponding to each of the at least two analysis methods, includes: using at least two analysis methods, pre-analyzing the target internet interaction data in the internet interaction data cluster to be analyzed and processed based on the second knowledge inference information of each target internet interaction data, and generating a first analysis result corresponding to each of the at least two analysis methods.

[0008] In one independently implemented embodiment, the step of simplifying the first knowledge derivation information of each of the target Internet interaction data to generate the second knowledge derivation information of each of the target Internet interaction data includes: using principal component analysis to simplify the first knowledge derivation information of each of the target Internet interaction data one by one to generate the second knowledge derivation information of each of the target Internet interaction data.

[0009] In one independently implemented embodiment, determining the current interaction tag cluster of each of the target internet interaction data by combining the first analysis results corresponding to at least two analysis methods one by one includes: traversing each of the target internet interaction data in the internet interaction data cluster to be analyzed and processed; determining the neighboring internet interaction data cluster corresponding to the selected target internet interaction data based on the commonality score between the selected target internet interaction data and the remaining target internet interaction data in the internet interaction data cluster to be analyzed and processed, wherein the neighboring internet interaction data cluster includes a specified number of neighboring internet interaction data with the largest commonality score of the selected target internet interaction data; and determining the current interaction tag cluster of the target internet interaction data by combining the neighboring internet interaction data cluster of the target internet interaction data and the first analysis results corresponding to at least two analysis methods one by one, wherein the current interaction tag cluster is a staged set of the neighboring internet interaction data cluster.

[0010] In one independently implemented embodiment, determining the current interaction tag cluster of the target internet interaction data by combining the neighboring internet interaction data clusters of the target internet interaction data and the first analysis results corresponding to at least two analysis methods one by one includes: determining whether the target internet interaction data and the neighboring internet interaction data belong to the same tag by combining the credibility weights of the target internet interaction data and the neighboring internet interaction data in the first analysis results; and determining the current interaction tag cluster of the target internet interaction data by combining the neighboring internet interaction data in the neighboring internet interaction data clusters that belong to the same tag as the target internet interaction data.

[0011] In one independently implemented embodiment, the first analysis result includes the analysis method of the target internet interaction data covered in the internet interaction data cluster to be analyzed and processed; determining whether the target internet interaction data and the neighboring internet interaction data belong to the same tag by combining the credibility weights of the target internet interaction data and the neighboring internet interaction data in the first analysis result includes: filtering a neighboring internet interaction data in the neighboring internet interaction data cluster corresponding to the target internet interaction data; determining, in all the first analysis results, the number of first analysis results that simultaneously cover the target internet interaction data and the filtered neighboring internet interaction data as a first number; determining, in all the first analysis results that simultaneously cover the target internet interaction data and the filtered neighboring internet interaction data, the number of first analysis results that have the same analysis method for the target internet interaction data and the filtered neighboring internet interaction data as a second number; and determining, by combining the credibility weight of the second number in the first number, whether the target internet interaction data and the filtered neighboring internet interaction data belong to the same tag.

[0012] In one independently implemented embodiment, determining whether the target internet interaction data and the filtered neighboring internet interaction data belong to the same tag by combining the credibility weight of the second number in the first number includes: determining that the target internet interaction data and the filtered neighboring internet interaction data belong to the same tag in response to the credibility weight of the second number in the first number exceeding a specified value; and determining that the target internet interaction data and the filtered neighboring internet interaction data do not belong to the same tag in response to the credibility weight of the second number in the first number not being greater than the specified value.

[0013] In one independently implemented embodiment, the step of re-analyzing the target internet interaction data in the internet interaction data cluster to be analyzed and processed, using the current interaction tag clusters of each target internet interaction data, to generate a final analysis result includes: determining the correlation between two random target internet interaction data using the current interaction tag clusters of each target internet interaction data; identifying the target internet interaction data in the internet interaction data cluster to be analyzed and processed into at least one tag internet interaction data cluster based on the correlation between the two random target internet interaction data; the tag internet interaction data cluster includes at least two target internet interaction data whose correlation exceeds a correlation target value; and integrating the at least one tag internet interaction data cluster to generate the final analysis result of the internet interaction data cluster to be analyzed and processed.

[0014] In one independently implemented embodiment, the correlation between the two random target internet interaction data is determined by the following method: randomly selecting two target internet interaction data from the plurality of target internet interaction data, one target internet interaction data being designated as a first internet interaction data and the other as a second internet interaction data; determining a first neighbor set by combining the complementary queue of the current interaction tag cluster of the second internet interaction data with the shared characteristics of the current interaction tag cluster of the first internet interaction data; the first neighbor set being a staged set of the current interaction tag cluster of the first internet interaction data; determining a second neighbor set by combining the complementary queue of the current interaction tag cluster of the first internet interaction data with the shared characteristics of the current interaction tag cluster of the second internet interaction data; the second neighbor set being a staged set of the current interaction tag cluster of the second internet interaction data; and determining the correlation between the first internet interaction data and the second internet interaction data by combining the first neighbor set and the second neighbor set with the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data, respectively.

[0015] In one standalone embodiment, determining the correlation between the first internet interaction data and the second internet interaction data by sequentially combining the first neighbor set and the second neighbor set with the current interaction tag clusters of the first internet interaction data and the second internet interaction data includes: determining a comparison result of commonality coefficients for each corresponding element of the first internet interaction data and the second internet interaction data by sequentially combining the first neighbor set and the second neighbor set with the current interaction tag clusters of the first internet interaction data and the second internet interaction data; and determining the correlation between the first internet interaction data and the second internet interaction data based on the commonality coefficient comparison result with the largest comparison vector among the comparison results of the first internet interaction data and the second internet interaction data.

[0016] In one independently implemented embodiment, the step of determining the commonality coefficient comparison results of the first internet interaction data and the second internet interaction data by sequentially combining the first neighbor set and the second neighbor set with the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data includes: determining the corresponding correlation metric values between the first internet interaction data and the second internet interaction data and the first neighbor set and the second neighbor set by sequentially combining the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data with the first neighbor set and the second neighbor set; determining the commonality coefficient comparison result of the first internet interaction data by combining the comparison vector between the correlation metric value corresponding to the first internet interaction data and the first neighbor set and the correlation metric value corresponding to the second internet interaction data and the second neighbor set; and determining the commonality coefficient comparison result of the second internet interaction data by combining the comparison vector between the correlation metric value corresponding to the second internet interaction data and the first neighbor set and the correlation metric value corresponding to the second internet interaction data and the second neighbor set.

[0017] In one independently implemented embodiment, the final analysis result includes several key tags, each key tag including at least one target internet interaction data; the integration of the at least one tag internet interaction data cluster to generate the final analysis result of the internet interaction data cluster to be analyzed further includes: integrating the at least one tag internet interaction data cluster to generate at least one key tag based on whether two random tag internet interaction data clusters cover the same target internet interaction data; wherein, each key tag includes at least one tag internet interaction data cluster; when the key tag includes several tag internet interaction data clusters, two random tag internet interaction data clusters among the several tag internet interaction data clusters cover the same target internet interaction data; in response to two tag internet interaction data clusters covering the same target internet interaction data, all the target internet interaction data covered in the two tag internet interaction data clusters are combined to form the key tag, that is, they are treated as belonging to the same tag.

[0018] Secondly, an internet data analysis system based on big data is provided, including a processor and a memory that communicate with each other. The processor is used to read computer programs from the memory and execute them to implement the above-mentioned method.

[0019] The internet data analysis method and system based on big data provided in this disclosure obtains an internet interaction data cluster that needs to be analyzed and processed. This cluster includes several target internet interaction data sets. The method involves pre-analyzing the target internet interaction data in the cluster using at least two analysis methods, generating first analysis results corresponding to each of the two methods. Based on these first analysis results, the current interaction tag cluster for each target internet interaction data set is determined. Then, based on the current interaction tag clusters, the target internet interaction data in the cluster is re-analyzed to generate a final analysis result. This disclosure improves the accuracy and reliability of the analysis results by analyzing the target internet interaction data in the cluster using several analysis methods, generating several first analysis results, optimizing these results to determine the current interaction tag clusters for each target internet interaction data set, and re-analyzing the target internet interaction data based on these current interaction tag clusters.

Attached Figure Description

[0020] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings used in the embodiments will be briefly described below. It should be understood that the following drawings only show some embodiments of this disclosure and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0021] Figure 1 is a flowchart illustrating an internet data analysis method based on big data, provided as an embodiment of this disclosure.

[0022] Figure 2 is a block diagram of an internet data analysis device based on big data, provided as an embodiment of this disclosure.

[0023] Figure 3 is an architecture diagram of an internet data analysis system based on big data, provided as an embodiment of this disclosure.

Detailed Implementation

[0024] To better understand the above technical solutions, the technical solutions of this disclosure will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments of this disclosure and the specific features in the embodiments are detailed descriptions of the technical solutions of this disclosure, rather than limitations thereof. In the absence of conflict, the embodiments of this disclosure and the technical features in the embodiments can be combined with each other.

[0025] Referring to Figure 1, an internet data analysis method based on big data is shown, which may include the technical solutions described in Steps 11-14.

[0026] Step 11: Obtain the Internet interaction data cluster that needs to be analyzed and processed. The Internet interaction data cluster that needs to be analyzed and processed includes several target Internet interaction data.

[0027] For example, knowledge derivation information is selected from the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed, generating the first knowledge derivation information of the target internet interaction data; the first knowledge derivation information of each target internet interaction data is then simplified to generate the second knowledge derivation information of each target internet interaction data. For example, principal component analysis is used to simplify the first knowledge derivation information of each target internet interaction data one by one to generate the second knowledge derivation information of each target internet interaction data.
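The simplification in [0027] can be illustrated with a minimal sketch, assuming the first knowledge derivation information of each data item is a fixed-length numeric vector. The function name `pca_reduce`, the sample values, and the choice of one retained component are hypothetical and are not specified by the disclosure. Note that the principal directions are fitted on the whole collection and then applied to each item.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row vectors onto their top principal components."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical first knowledge derivation information: 4 items, 3 values each.
first_info = [[2.0, 0.0, 1.0], [4.0, 0.1, 2.0], [6.0, -0.1, 3.0], [8.0, 0.0, 4.0]]
# Assumed target size: 1 value per item (the "second" information).
second_info = pca_reduce(first_info, n_components=1)
```

Each row of `second_info` is the reduced representation of the corresponding row of `first_info`.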

[0028] Step 12: Based on at least two analysis methods, perform pre-analysis on the target Internet interaction data in the Internet interaction data cluster that needs to be analyzed and processed, and generate the first analysis result corresponding to at least two analysis methods.

[0029] For example, the target Internet interaction data in the Internet interaction data clusters that need to be analyzed and processed is randomly identified to generate no less than two local interaction data clusters; the target Internet interaction data in each local interaction data cluster is pre-analyzed one by one using no less than two analysis methods to generate a first analysis result that corresponds one-to-one between each analysis method and each local interaction data cluster.
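One possible reading of [0029] is sketched below: random subsets of the data items serve as local interaction data clusters, and every analysis method is run on every subset, giving one first analysis result per (local cluster, method) pair. The two toy methods `by_sign` and `by_magnitude`, the subset size, and the representation of a first analysis result as an item-to-label dictionary are all assumptions made for illustration; the disclosure does not name concrete analysis methods.

```python
import random

def random_local_clusters(item_ids, n_clusters, sample_size, seed=0):
    """Draw n_clusters random subsets of the item ids (assumed sampling scheme)."""
    rng = random.Random(seed)
    return [rng.sample(item_ids, sample_size) for _ in range(n_clusters)]

def pre_analyze(features, local_clusters, methods):
    """Run every method on every local cluster; each result maps item id -> label."""
    results = []
    for subset in local_clusters:
        for method in methods:
            labels = method([features[i] for i in subset])
            results.append(dict(zip(subset, labels)))
    return results

# Hypothetical stand-ins for "analysis methods": each returns one label per row.
def by_sign(rows):
    return [int(r[0] >= 0) for r in rows]

def by_magnitude(rows):
    return [int(abs(r[0]) > 1.0) for r in rows]

features = {0: [-2.0], 1: [-0.5], 2: [0.5], 3: [2.0], 4: [3.0]}
local_clusters = random_local_clusters(list(features), n_clusters=3, sample_size=4)
first_results = pre_analyze(features, local_clusters, [by_sign, by_magnitude])
```

Because each subset omits some items, a given first analysis result covers only part of the data, which is why the later steps count only the results that cover both items of a pair.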

[0030] Using no fewer than two analysis methods, the target Internet interaction data in the Internet interaction data clusters that need to be analyzed and processed are pre-analyzed one by one based on the second knowledge derivation information of each target Internet interaction data, and the first analysis results corresponding to no fewer than two analysis methods are generated.

[0031] Step 13: Based on the first analysis results corresponding to at least two analysis methods, determine the current interactive tag cluster of each target Internet interactive data.

[0032] For example, the process iterates through each target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed. Based on the commonality score between the selected target internet interaction data and the remaining target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed, the neighboring internet interaction data clusters corresponding to the selected target internet interaction data are determined. The neighboring internet interaction data clusters include a specified number of neighboring internet interaction data with the largest commonality score with the selected target internet interaction data. Based on the neighboring internet interaction data clusters of the target internet interaction data and the first analysis results corresponding one-to-one with no less than two analysis methods, the current interaction tag cluster of the target internet interaction data is determined. The current interaction tag cluster is a staged set of neighboring internet interaction data clusters.
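The neighbor selection in [0032] can be sketched as a top-k search over pairwise commonality scores. The disclosure does not define the commonality score; cosine similarity is used here purely as an assumed stand-in, and the names `cosine`, `neighbor_clusters`, and `k` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, used here as an assumed commonality score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def neighbor_clusters(features, k):
    """For each item, keep the k other items with the largest commonality score."""
    neighbors = {}
    for i, fi in features.items():
        scored = [(cosine(fi, fj), j) for j, fj in features.items() if j != i]
        scored.sort(key=lambda p: (-p[0], p[1]))  # highest score first, ties by id
        neighbors[i] = [j for _, j in scored[:k]]
    return neighbors

features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
neighbors = neighbor_clusters(features, k=1)
```

With these sample vectors, "a" and "b" select each other, as do "c" and "d".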

[0033] In this embodiment, based on the credibility weights of the target internet interaction data and the neighboring internet interaction data in the first analysis result, it is determined whether the target internet interaction data and the neighboring internet interaction data belong to the same label; based on the neighboring internet interaction data in the neighboring internet interaction data cluster that belongs to the same label as the target internet interaction data, the current interaction label cluster of the target internet interaction data is determined.

[0034] Step 14: Based on the current interaction tag clusters of each target Internet interaction data, re-analyze the target Internet interaction data in the Internet interaction data clusters that need to be analyzed and processed, and generate the final analysis results.

[0035] For example, a neighboring internet interaction data is selected from the neighboring internet interaction data cluster corresponding to the target internet interaction data; among all the first analysis results, the number of first analysis results that simultaneously cover the target internet interaction data and the selected neighboring internet interaction data is determined as a first number; among all the first analysis results that simultaneously cover the target internet interaction data and the selected neighboring internet interaction data, the number of first analysis results that have the same analysis method for the target internet interaction data and the selected neighboring internet interaction data is determined as a second number; based on the credibility weight of the second number in the first number, it is determined whether the target internet interaction data and the selected neighboring internet interaction data belong to the same label.

[0036] If the credibility weight of the second number in the first number exceeds the specified value, it is determined that the target Internet interaction data and the selected neighboring Internet interaction data belong to the same label; if the credibility weight of the second number in the first number is not greater than the specified value, it is determined that the target Internet interaction data and the selected neighboring Internet interaction data do not belong to the same label.
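The counting rule in [0035] and [0036] can be sketched as follows, under two assumptions that the text does not state explicitly: the "credibility weight of the second number in the first number" is read as the ratio second number / first number, and "the same analysis method" for both items is read as both items receiving the same label within one first analysis result. Each first analysis result is modeled as a dictionary from item id to label; the threshold of 0.5 is an arbitrary illustrative value.

```python
def same_tag(target, neighbor, first_results, threshold):
    """Decide whether two items share a tag from their co-assignment ratio."""
    # First number: results that cover both items.
    covering = [r for r in first_results if target in r and neighbor in r]
    first_number = len(covering)
    if first_number == 0:
        return False  # assumption: no shared coverage means no evidence
    # Second number: covering results that give both items the same label.
    second_number = sum(1 for r in covering if r[target] == r[neighbor])
    return second_number / first_number > threshold

def current_tag_cluster(target, neighbors, first_results, threshold=0.5):
    """Subset of the neighbor cluster judged to share the target's tag."""
    return {n for n in neighbors if same_tag(target, n, first_results, threshold)}

first_results = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1},
    {"a": 0, "b": 1, "c": 0},
    {"b": 0, "c": 0},
]
cluster_a = current_tag_cluster("a", ["b", "c"], first_results)
```

Here "a" and "b" agree in 2 of the 3 results covering both, so "b" is kept; "a" and "c" agree in 1 of 2, which does not exceed 0.5, so "c" is dropped.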

[0037] In this embodiment, based on the current interactive tag clusters of each target internet interaction data, the correlation between two random target internet interaction data is determined; based on the correlation between the two random target internet interaction data, the target internet interaction data in the internet interaction data clusters that need to be analyzed and processed are identified into at least one tag internet interaction data cluster; the tag internet interaction data cluster includes at least two target internet interaction data whose correlation exceeds the correlation target value; the at least one tag internet interaction data cluster is integrated to generate the final analysis result of the internet interaction data clusters that need to be analyzed and processed.
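The grouping step in [0037] can be sketched as a pairwise scan that keeps every pair whose correlation exceeds the correlation target value. The correlation measure below (Jaccard overlap of the two current interaction tag clusters, each extended with its own item) is an assumed placeholder and is not the neighbor-set procedure the disclosure describes later; the target value 0.5 is likewise illustrative.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two sets; an assumed stand-in for the correlation measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tag_clusters(current_tag_clusters, target_value):
    """Return every pair of items whose correlation exceeds the target value."""
    pairs = []
    for x, y in combinations(sorted(current_tag_clusters), 2):
        cx = current_tag_clusters[x] | {x}  # include the item itself (assumption)
        cy = current_tag_clusters[y] | {y}
        if jaccard(cx, cy) > target_value:
            pairs.append({x, y})
    return pairs

current = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}, "e": set()}
clusters = tag_clusters(current, target_value=0.5)
```

Items such as "e" that correlate with nothing end up in no tag cluster in this sketch; how the disclosure treats such items is not stated.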

[0038] In one possible implementation, a first neighbor set is determined based on the shared characteristics between the complementary queue of the current interactive tag cluster of the second Internet interactive data and the current interactive tag cluster of the first Internet interactive data; the first neighbor set is a staged set of the current interactive tag clusters of the first Internet interactive data; a second neighbor set is determined based on the shared characteristics between the complementary queue of the current interactive tag clusters of the first Internet interactive data and the current interactive tag clusters of the second Internet interactive data; the second neighbor set is a staged set of the current interactive tag clusters of the second Internet interactive data; the correlation between the first Internet interactive data and the second Internet interactive data is determined based on the first neighbor set and the second neighbor set sequentially with the current interactive tag clusters of the first Internet interactive data and the current interactive tag clusters of the second Internet interactive data.
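One tentative reading of the two neighbor sets in [0038] treats the "complementary queue" as a set complement and the "shared characteristics" as a set intersection, so that each neighbor set is a set difference. This interpretation is an assumption; it is at least consistent with the statement that each neighbor set is a subset of the corresponding current interaction tag cluster. The correlation metric computed from these sets is not sketched because the text does not define it.

```python
def neighbor_sets(cluster_first, cluster_second):
    """Assumed reading: complement of one cluster intersected with the other."""
    first_neighbor_set = cluster_first - cluster_second   # in first, not in second
    second_neighbor_set = cluster_second - cluster_first  # in second, not in first
    return first_neighbor_set, second_neighbor_set

# Hypothetical current interaction tag clusters of two data items.
cluster_first = {"b", "c", "d"}
cluster_second = {"c", "e"}
first_set, second_set = neighbor_sets(cluster_first, cluster_second)
```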

[0039] Specifically, based on the current interactive tag clusters of the first Internet interactive data and the second Internet interactive data, respectively, the commonality coefficient comparison results of the first Internet interactive data and the second Internet interactive data are determined one by one; among the commonality coefficient comparison results of the first Internet interactive data and the second Internet interactive data, the correlation between the first Internet interactive data and the second Internet interactive data is determined based on the commonality coefficient comparison result with the largest comparison vector.

[0040] For example, based on the current interaction tag cluster of the first Internet interaction data and the current interaction tag cluster of the second Internet interaction data, and sequentially with the first neighbor set and the second neighbor set, the corresponding correlation metrics between the first Internet interaction data and the second Internet interaction data and the first neighbor set and the second neighbor set are determined; based on the comparison vector between the correlation metrics between the first Internet interaction data and the first neighbor set and the correlation metrics between the first Internet interaction data and the second neighbor set, the commonality coefficient comparison result of the first Internet interaction data is determined; based on the comparison vector between the correlation metrics between the second Internet interaction data and the first neighbor set and the correlation metrics between the second Internet interaction data and the second neighbor set, the commonality coefficient comparison result of the second Internet interaction data is determined.

[0041] In this embodiment, the final analysis result includes several key tags, and each key tag includes at least one target internet interaction data. Based on whether two randomly selected tag internet interaction data clusters cover the same target internet interaction data, at least one tag internet interaction data cluster is integrated to generate at least one key tag; wherein, each key tag includes at least one tag internet interaction data cluster; when a key tag includes several tag internet interaction data clusters, two randomly selected tag internet interaction data clusters from the several tag internet interaction data clusters cover the same target internet interaction data; in response to two tag internet interaction data clusters covering the same target internet interaction data, all target internet interaction data covered in the two tag internet interaction data clusters are combined to form a key tag, that is, they are treated as belonging to the same tag.
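The integration in [0041] amounts to merging tag clusters that share at least one data item until no two remaining groups overlap. A minimal sketch follows; the function name `merge_into_key_tags` and the sample clusters are hypothetical, and the incremental merge shown is just one way to compute the result.

```python
def merge_into_key_tags(tag_clusters):
    """Merge tag clusters that share any item; the results are disjoint key tags."""
    key_tags = []  # invariant: the sets in key_tags are pairwise disjoint
    for cluster in tag_clusters:
        merged = set(cluster)
        remaining = []
        for tag in key_tags:
            if tag & merged:
                merged |= tag          # absorb every key tag that overlaps
            else:
                remaining.append(tag)  # untouched key tags are kept as-is
        remaining.append(merged)
        key_tags = remaining
    return key_tags

clusters = [{"a", "b"}, {"c", "d"}, {"b", "e"}, {"f", "g"}, {"e", "c"}]
key_tags = merge_into_key_tags(clusters)
```

In this example the cluster {"e", "c"} bridges two previously separate groups, so the items "a" through "e" end up in one key tag while {"f", "g"} stays separate.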

[0042] This embodiment provides a big data-based internet data analysis method, which includes obtaining an internet interaction data cluster to be analyzed and processed, wherein the internet interaction data cluster to be analyzed and processed includes several target internet interaction data; performing pre-analysis on the target internet interaction data in the internet interaction data cluster to be analyzed and processed one by one using at least two analysis methods, generating first analysis results corresponding to at least two analysis methods; determining the current interaction tag cluster of each target internet interaction data based on the first analysis results corresponding to at least two analysis methods; and performing a second analysis on the target internet interaction data in the internet interaction data cluster to be analyzed and processed based on the current interaction tag cluster of each target internet interaction data. This disclosure improves the accuracy and reliability of the analysis results of the target internet interaction data in the internet interaction data cluster to be analyzed and processed by using several analysis methods to analyze and process the target internet interaction data in the internet interaction data cluster to be analyzed and processed one by one, generating several first analysis results; optimizing the several first analysis results to determine the current interaction tag cluster of each target internet interaction data; and performing a second analysis on the target internet interaction data in the internet interaction data cluster to be analyzed and processed based on the current interaction tag cluster of the target internet interaction data, thereby generating a final analysis result.

[0043] This embodiment provides a big data-based internet data analysis method, which may specifically include the following steps.

[0044] Step 201: Obtain the cluster of Internet interaction data that needs to be analyzed and processed.

[0045] For example, the internet interaction data cluster requiring analysis and processing includes several target internet interaction data. These target internet interaction data include first internet interaction data, second internet interaction data, etc. Alternatively, they can be remaining internet interaction data covering the same attribute portion. In one possible implementation, X first internet interaction data constitute the internet interaction data cluster requiring analysis and processing. The X first internet interaction data can have Y differing indicators, and each first internet interaction data contains only one "first". X and Y are both integers greater than 0, and X is greater than or equal to Y.

[0046] Step 202: Select knowledge derivation information from the target Internet interaction data in the Internet interaction data cluster that needs to be analyzed and processed, and generate the first knowledge derivation information of the target Internet interaction data.

[0047] For example, a knowledge derivation information selection model is used to select knowledge derivation information from the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed, generating the first knowledge derivation information of each target internet interaction data. The first knowledge derivation information of every target internet interaction data covered in the cluster has dimension a. For instance, the first knowledge derivation information of all target internet interaction data covered in the cluster is represented as z1, z2, z3, ..., zn, and each of these first knowledge derivation information belongs to $\mathbb{R}^a$.

[0048] Step 203: Simplify the first knowledge derivation information of each target Internet interaction data to generate the second knowledge derivation information of each target Internet interaction data.

[0049] For example, in order to reduce computation time and memory usage in the analysis process, the first knowledge derivation information of the target Internet interaction data can be simplified to generate the second knowledge derivation information of the target Internet interaction data.

[0050] In this embodiment, the systematic analysis method is used to simplify the first knowledge derivation information of each target Internet interaction data one by one, and generate the second knowledge derivation information of each target Internet interaction data.

[0051] The second knowledge derivation information has dimension b, where b is less than a. For example, the second knowledge derivation information of all target Internet interaction data covered in the Internet interaction data cluster that needs to be analyzed and processed is represented as z1, z2, z3, ..., zn, and each of the second knowledge derivation information belongs to $\mathbb{R}^b$.
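The reduction from dimension a to dimension b can be illustrated with a minimal sketch. The text does not specify how the simplification is computed, so Gaussian random projection is swapped in here purely as a stand-in technique; the function name `random_projection`, the seed, and the sample vectors are all hypothetical.

```python
import random

def random_projection(vectors, b, seed=0):
    """Map each a-dimensional vector to b dimensions with one shared Gaussian matrix.

    Assumption: random projection stands in for the unspecified
    simplification method; it is not taken from the patent text.
    """
    a = len(vectors[0])
    if not 0 < b < a:
        raise ValueError("b must satisfy 0 < b < a")
    rng = random.Random(seed)
    # One a-by-b matrix, reused for every vector so results stay comparable.
    matrix = [[rng.gauss(0.0, 1.0) / b ** 0.5 for _ in range(b)] for _ in range(a)]
    return [
        [sum(v[i] * matrix[i][j] for i in range(a)) for j in range(b)]
        for v in vectors
    ]

# Hypothetical first knowledge derivation information, a = 4.
z_first = [[1.0, 0.0, 2.0, 1.0], [0.5, 1.5, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
# Second knowledge derivation information, b = 2.
z_second = random_projection(z_first, b=2)
```

Any other dimensionality-reduction technique with the same input and output shapes could be substituted without changing the later steps.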

[0052] Step 204: Randomly select target Internet interaction data from the Internet interaction data cluster that needs to be analyzed and processed, and generate no less than two local interaction data clusters.

[0053] For example, in order to improve the accuracy and reliability of the analysis results, target Internet interaction data is randomly selected from the Internet interaction data clusters that need to be analyzed and processed to determine several local interaction data clusters.

[0054] In this embodiment, each time a specified number of target internet interaction data are selected from the internet interaction data clusters that need to be analyzed and processed, they are assigned to a local interaction data cluster. The selected target internet interaction data are then returned to the internet interaction data clusters that need to be analyzed and processed, and another specified number of target internet interaction data are selected and assigned to the aforementioned local interaction data clusters. This process continues until the number of target internet interaction data in each local interaction data cluster meets the specified number, at which point one round of target internet interaction data selection ends. It is possible that some target internet interaction data in the internet interaction data clusters that need to be analyzed and processed may be selected several times, while some target internet interaction data may not be selected at all.
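The selection procedure described above behaves like sampling with replacement. A minimal sketch follows, assuming each draw picks a single item index and returns it to the pool before the next draw; the function name `draw_local_clusters` and its parameters are hypothetical.

```python
import random

def draw_local_clusters(n_items, n_clusters, cluster_size, seed=0):
    """Return n_clusters index lists, each drawn with replacement from range(n_items)."""
    if n_clusters < 2:
        raise ValueError("the method calls for no less than two local clusters")
    rng = random.Random(seed)
    return [
        [rng.randrange(n_items) for _ in range(cluster_size)]
        for _ in range(n_clusters)
    ]

local_clusters = draw_local_clusters(n_items=10, n_clusters=3, cluster_size=6)

# Because items are returned to the pool, some indices may repeat
# and others may never be drawn at all.
never_drawn = set(range(10)) - {i for cluster in local_clusters for i in cluster}
```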

[0055] Step 205: Using no less than two analysis methods, based on the second knowledge derivation information of each target Internet interaction data, pre-analyze the target Internet interaction data in the local interaction data cluster one by one, and generate the first analysis results corresponding to no less than two analysis methods one by one.

[0056] For example, to improve the accuracy of the analysis results, several analysis methods are used to pre-analyze the target Internet interaction data in the local interaction data cluster one by one, generating the first analysis result of each analysis method for the local interaction data cluster. In other words, there are H analysis methods to analyze the local interaction data cluster one by one, generating H first analysis results for the local interaction data cluster. The first analysis result includes the analysis method of the target Internet interaction data covered in the Internet interaction data cluster that needs to be analyzed and processed.
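As a rough illustration of running H analysis methods over each local interaction data cluster, the sketch below applies two toy grouping functions to every local cluster and collects one first analysis result per (cluster, method) pair. The two methods, the result format (a dict from item index to group id), and all names are assumptions made for illustration only.

```python
def group_by_sign(indices, features):
    """Toy method 1: group items by the sign of their first feature."""
    return {i: int(features[i][0] >= 0) for i in indices}

def group_by_largest_axis(indices, features):
    """Toy method 2: group items by the position of their largest absolute feature."""
    groups = {}
    for i in indices:
        row = features[i]
        groups[i] = max(range(len(row)), key=lambda j: abs(row[j]))
    return groups

def pre_analyze(local_clusters, features, methods):
    """Run every method on every local cluster; one result dict per pair."""
    results = []
    for cluster in local_clusters:
        members = sorted(set(cluster))  # repeated draws collapse to one entry
        for method in methods:
            results.append(method(members, features))
    return results

features = [[1.0, 0.2], [-1.0, 0.1], [0.3, -2.0], [-0.5, 0.4]]
local_clusters = [[0, 1, 1, 2], [1, 2, 3, 3]]
first_results = pre_analyze(local_clusters, features, [group_by_sign, group_by_largest_axis])
```

With two local clusters and H = 2 methods, `first_results` holds four result dicts, each covering only the items present in its local cluster.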

[0057] Step 206: Traverse each target Internet interaction data in the Internet interaction data cluster that needs to be analyzed and processed. Based on the commonality score between the selected target Internet interaction data and the remaining target Internet interaction data in the Internet interaction data cluster that needs to be analyzed and processed, determine the neighboring Internet interaction data clusters corresponding to the selected target Internet interaction data.

[0058] For example, a neighbor internet interaction data cluster includes a specified number of neighbor internet interaction data that have the highest commonality score with the screened target internet interaction data.

[0059] The commonality scores among the target internet interaction data are determined from the commonality scores among the first knowledge derivation information of each target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed. The commonality scores between the selected target internet interaction data and each of the remaining target internet interaction data are sorted (e.g., in ascending or descending order). The remaining target internet interaction data corresponding to the u highest commonality scores are selected as the neighboring internet interaction data of the selected target internet interaction data. These u neighboring internet interaction data form the neighboring internet interaction data cluster of the selected target internet interaction data.

[0060] Iterate through each target Internet interaction data in the Internet interaction data cluster that needs to be analyzed and processed, and generate the neighbor Internet interaction data clusters of each target Internet interaction data.
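The neighbor search in Step 206 can be sketched as a top-u selection per item. The text does not name the commonality score, so cosine similarity is assumed here; `cosine`, `neighbor_clusters`, and the sample vectors are hypothetical.

```python
import math

def cosine(p, q):
    """Cosine similarity, used here as an assumed commonality score."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

def neighbor_clusters(features, u):
    """For each item index, return the u other indices with the highest score."""
    result = {}
    for i, p in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: cosine(p, features[j]), reverse=True)
        result[i] = others[:u]
    return result

features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
neighbors = neighbor_clusters(features, u=1)
# neighbors == {0: [1], 1: [0], 2: [3], 3: [2]}
```

This brute-force version compares every pair of items; for large clusters an approximate nearest-neighbor index would likely be preferable.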

[0061] Step 207: Based on the neighboring internet interaction data clusters of the target internet interaction data and the first analysis results corresponding to at least two analysis methods, determine the current interaction tag cluster of the target internet interaction data.

[0062] For example, this step further determines whether the target internet interaction data and each neighboring internet interaction data in its neighboring internet interaction data cluster belong to the same tag, that is, whether the target internet interaction data and the neighboring internet interaction data belong to the same person.

[0063] The Internet data analysis method based on big data further explains step 207, which may include the following steps.

[0064] Step 207a: Based on the credibility weights of the target internet interaction data and the neighboring internet interaction data in the first analysis result, determine whether the target internet interaction data and the neighboring internet interaction data belong to the same label.

[0065] For example, a neighboring internet interaction data is selected from the neighboring internet interaction data cluster corresponding to the target internet interaction data. Among all the first analysis results, the number of first analysis results that cover both the target internet interaction data and the selected neighboring internet interaction data is determined as a first number. Among those first analysis results, the number that have the same analysis method for the target internet interaction data and the selected neighboring internet interaction data is determined as a second number. Based on the credibility weight of the second number in the first number, that is, the proportion of the second number to the first number, it is determined whether the target internet interaction data and the selected neighboring internet interaction data belong to the same label.

[0066] Step 207b: Based on the neighboring Internet interaction data in the neighboring Internet interaction data cluster that belongs to the same label as the target Internet interaction data, determine the current interaction label cluster of the target Internet interaction data.

[0067] For example, in response to the credibility weight of the second number in the first number exceeding a specified threshold, it is determined that the target internet interaction data and the selected neighboring internet interaction data belong to the same label. In other words, if the probability that the target internet interaction data z1 and the neighboring internet interaction data belong to the same label exceeds the specified threshold, then the neighboring internet interaction data is determined to be a current neighboring internet interaction data of the target internet interaction data z1.

[0068] If the credibility weight of the second number in the first number is not greater than the specified threshold, it is determined that the target internet interaction data and the selected neighboring internet interaction data do not belong to the same label. In other words, if the probability that the target internet interaction data z1 and the neighboring internet interaction data belong to the same label is not greater than the specified threshold, then the neighboring internet interaction data is not a current neighboring internet interaction data of the target internet interaction data z1.

[0069] The process iterates through the neighboring internet interaction data cluster of the target internet interaction data z1 and determines, for each neighboring internet interaction data, the probability that it belongs to the same label as z1. All neighboring internet interaction data in the cluster that share the same label as z1 are selected as current neighboring internet interaction data, and together they form the current interaction label cluster of z1. For example, the current interaction label cluster of z1 may include u1 current neighboring internet interaction data, where u1 denotes the number of current neighboring internet interaction data in that cluster.
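Steps 207a and 207b can be sketched as a co-occurrence count followed by a cut-off. The sketch assumes each first analysis result is a dict from item index to an assigned group id, and that two items "have the same analysis method" in a result when their group ids match; both the format and the names `same_label_ratio` and `current_label_cluster` are hypothetical.

```python
def same_label_ratio(results, i, j):
    """Return second number / first number for items i and j (0.0 if never co-covered)."""
    covering = [r for r in results if i in r and j in r]   # counted as the first number
    agreeing = [r for r in covering if r[i] == r[j]]       # counted as the second number
    return len(agreeing) / len(covering) if covering else 0.0

def current_label_cluster(results, target, neighbors, cutoff):
    """Keep only the neighbors whose ratio with the target exceeds the cut-off."""
    return [j for j in neighbors if same_label_ratio(results, target, j) > cutoff]

# Four hypothetical first analysis results over items 0..3.
results = [
    {0: 0, 1: 0, 2: 1},
    {0: 1, 1: 1, 3: 0},
    {0: 0, 2: 0, 3: 1},
    {1: 0, 2: 1, 3: 1},
]
cluster_z0 = current_label_cluster(results, target=0, neighbors=[1, 2, 3], cutoff=0.5)
# Item 1 agrees with item 0 in 2 of 2 shared results, item 2 in 1 of 2, item 3 in 0 of 2.
# cluster_z0 == [1]
```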

[0070] For the target internet interaction data z1, the current interaction tag cluster contains less interference than the neighboring internet interaction data cluster of z1. The target internet interaction data z1 and the current neighboring internet interaction data contained in the current interaction tag cluster are therefore more likely to be of the same type.

[0071] In this embodiment, two target internet interaction data are randomly selected from the several target internet interaction data. One is designated as the first internet interaction data, and the other is designated as the second internet interaction data. In some possible implementations, the internet interaction data cluster to be analyzed and processed includes at least the first internet interaction data, the second internet interaction data, and the third internet interaction data.

[0072] Step 208: Based on the current interaction tag cluster of the first Internet interaction data and the current interaction tag cluster of the second Internet interaction data, determine the corresponding correlation between the first Internet interaction data and the second Internet interaction data.

[0073] The Internet data analysis method based on big data further explains step 208, which may include the following steps.

[0074] Step 208a: Determine the first neighbor set as the intersection of the complement of the current interactive tag cluster of the second Internet interactive data with the current interactive tag cluster of the first Internet interactive data.

[0075] For example, the first neighbor set M1 is the intersection of the current interactive tag cluster M of the first Internet interactive data with the complement of the current interactive tag cluster N of the second Internet interactive data. The first neighbor set M1 therefore consists of the elements of M that are not included in N. In other words, the first neighbor set can be represented as $M_1 = M \cap \overline{N} = M \setminus N$. The first neighbor set is a subset of the current interactive tag cluster of the first Internet interactive data.

[0076] Step 208b: Determine the second neighbor set as the intersection of the complement of the current interactive tag cluster of the first Internet interaction data with the current interactive tag cluster of the second Internet interaction data.

[0077] For example, the second neighbor set N1 is the intersection of the current interactive tag cluster N of the second Internet interactive data with the complement of the current interactive tag cluster M of the first Internet interactive data. The second neighbor set N1 therefore consists of the elements of N that are not included in M. In other words, the second neighbor set can be represented as $N_1 = N \cap \overline{M} = N \setminus M$. The second neighbor set is a subset of the current interactive tag cluster of the second Internet interactive data.
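The two neighbor sets are plain set differences, which a short sketch makes concrete. The function name `neighbor_sets` and the sample clusters are hypothetical.

```python
def neighbor_sets(m, n):
    """Return (M1, N1): elements only in M, and elements only in N."""
    return m - n, n - m

M = {1, 2, 3, 4}   # current interactive tag cluster of the first data
N = {3, 4, 5}      # current interactive tag cluster of the second data
M1, N1 = neighbor_sets(M, N)
# M1 == {1, 2} and N1 == {5}
```

When the two tag clusters are identical, both neighbor sets are empty.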

[0078] Step 208c: Based on how the current interactive tag cluster of the first Internet interactive data and the current interactive tag cluster of the second Internet interactive data each relate, in turn, to the first neighbor set and the second neighbor set, determine the correlation between the first Internet interactive data and the second Internet interactive data.

[0079] For example, the correlation between the first Internet interaction data and the second Internet interaction data is determined from the relationship of each of the two current interaction tag clusters to the first neighbor set and to the second neighbor set, as follows.

[0080] The Internet data analysis method based on big data further explains step 208c, which may specifically include the following steps.

[0081] Step 208c1: For the current interactive tag cluster of the first Internet interaction data and the current interactive tag cluster of the second Internet interaction data, determine in turn the correlation metric values between each of them and the first neighbor set and the second neighbor set.

[0082] In one possible implementation, for the current interactive tag cluster M of the first Internet interactive data, the correlation metric between M and the first neighbor set M1 and the correlation metric between M and the second neighbor set N1 are calculated.

[0083] Step 208c2: Based on the current interactive tag clusters of the first Internet interactive data and the second Internet interactive data, respectively, determine the commonality coefficient comparison results of the first Internet interactive data and the second Internet interactive data.

[0084] For example, the commonality coefficient comparison result of the first Internet interaction data is determined by comparing the correlation metric between the current interactive tag cluster M of the first Internet interaction data and the first neighbor set M1 with the correlation metric between the current interactive tag cluster M of the first Internet interaction data and the second neighbor set N1.

[0085] The commonality coefficient comparison result of the second Internet interactive data is determined by comparing the correlation metric between the current interactive tag cluster N of the second Internet interactive data and the first neighbor set M1 with the correlation metric between the current interactive tag cluster N of the second Internet interactive data and the second neighbor set N1.

[0086] If the difference between the commonality coefficient comparison result StepA of the first Internet interaction data and the commonality coefficient comparison result StepB of the second Internet interaction data approaches 0, it indicates that the first Internet interaction data and the second Internet interaction data are more likely to belong to the same person.

[0087] Step 208c3: Based on the commonality coefficient comparison results of the first Internet interaction data and the second Internet interaction data, the correlation between the first Internet interaction data and the second Internet interaction data is determined.

[0088] For example, the commonality coefficient comparison result of the first Internet interaction data is determined based on the difference between the correlation metric value of the first Internet interaction data with the first neighbor set and the correlation metric value of the first Internet interaction data with the second neighbor set; the commonality coefficient comparison result of the second Internet interaction data is determined based on the difference between the correlation metric value of the second Internet interaction data with the first neighbor set and the correlation metric value of the second Internet interaction data with the second neighbor set.
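Steps 208c1 to 208c3 can be sketched end to end under several loudly stated assumptions, none of which come from the text: the correlation metric between two sets is taken to be the mean pairwise cosine similarity of their members' feature vectors, each comparison result is a simple difference of two metrics, and the final correlation is 1 / (1 + gap) so that a gap near 0 maps to a correlation near 1. All function names are hypothetical.

```python
import math

def cosine(p, q):
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

def set_metric(a, b, features):
    """Assumed correlation metric: mean pairwise cosine similarity between two index sets."""
    pairs = [(i, j) for i in a for j in b]
    if not pairs:
        return 0.0  # convention chosen here for an empty neighbor set
    return sum(cosine(features[i], features[j]) for i, j in pairs) / len(pairs)

def correlation(m, n, features):
    """Assumed mapping from the two comparison results to a score in (0, 1]."""
    m1, n1 = m - n, n - m                                                    # neighbor sets
    step_a = set_metric(m, m1, features) - set_metric(m, n1, features)      # first data
    step_b = set_metric(n, m1, features) - set_metric(n, n1, features)      # second data
    gap = abs(step_a - step_b)
    return 1.0 / (1.0 + gap)

features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
same = correlation({0, 1}, {0, 1}, features)        # identical tag clusters
different = correlation({0, 1}, {2, 3}, features)   # disjoint, dissimilar tag clusters
```

Under these assumptions identical tag clusters give a gap of 0 and therefore the maximum score, while dissimilar clusters give a lower one; the score can then be compared with a correlation target value.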

[0089] Step 209: Based on the correlation between the first Internet interaction data and the second Internet interaction data, determine whether the first Internet interaction data and the second Internet interaction data belong to the same labeled Internet interaction data cluster.

[0090] For example, the correlation between the first internet interaction data and the second internet interaction data is compared with a correlation target value. In response to the correlation exceeding the correlation target value, it is determined that the first internet interaction data and the second internet interaction data belong to the same labeled internet interaction data cluster.

[0091] Steps 208 and 209 are likewise applied to determine whether the second and third Internet interaction data belong to the same label Internet interaction data cluster, and whether the first and third Internet interaction data belong to the same label Internet interaction data cluster.

[0092] Step 210: Integrate at least one tagged Internet interaction data cluster to generate the final analysis results of the Internet interaction data cluster that needs to be analyzed and processed.

[0093] For example, at least one labeled internet interaction data cluster is integrated to generate at least one key label, based on whether two labeled internet interaction data clusters cover the same target internet interaction data. Each key label includes at least one labeled internet interaction data cluster; when a key label includes several labeled internet interaction data clusters, any two of those clusters cover the same target internet interaction data. In response to two labeled internet interaction data clusters covering the same target internet interaction data, all target internet interaction data covered by the two clusters are combined into one key label, that is, they are treated as belonging to the same label.

[0094] In one possible implementation, the first internet interaction data and the second internet interaction data belong to the same labeled internet interaction data cluster, and the second internet interaction data and the third internet interaction data belong to the same labeled internet interaction data cluster. The first internet interaction data and the third internet interaction data belong to different labels.

[0095] Based on the transitivity of the correlation between target Internet interaction data, the first Internet interaction data and the second Internet interaction data belong to the same label Internet interaction data cluster, and the third Internet interaction data and the second Internet interaction data belong to the same label Internet interaction data cluster. Even if the correlation alone places the first Internet interaction data and the third Internet interaction data in different labels, the first, second, and third Internet interaction data can all be classified into the same label Internet interaction data cluster on the basis of this transitivity.
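The integration in Step 210 amounts to merging clusters that share at least one item. A minimal sketch using a union-find structure follows; the choice of union-find, the function name `merge_label_clusters`, and the integer item ids are assumptions for illustration.

```python
def merge_label_clusters(clusters):
    """Merge clusters that share at least one item; return sorted key labels."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for cluster in clusters:
        items = list(cluster)
        for item in items:
            find(item)                          # register single-item clusters too
        for other in items[1:]:
            parent[find(other)] = find(items[0])

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())

# Items 1 and 2 share a cluster, items 2 and 3 share a cluster, item 4 stands alone.
key_labels = merge_label_clusters([{1, 2}, {2, 3}, {4}])
# key_labels == [[1, 2, 3], [4]]
```

Items 1 and 3 end up under the same key label even though no single input cluster contains both, because each shares a cluster with item 2.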

[0096] By traversing all the target internet interaction data in the internet interaction data that needs to be analyzed and processed through the above steps, the final analysis results corresponding to the internet interaction data that needs to be analyzed and processed are generated.

[0097] The big data-based internet data analysis method provided in this embodiment includes obtaining an internet interaction data cluster to be analyzed and processed, which includes several target internet interaction data; performing pre-analysis on the target internet interaction data in the internet interaction data cluster to be analyzed and processed one by one using at least two analysis methods, generating first analysis results corresponding to at least two analysis methods; determining the current interaction tag cluster of each target internet interaction data based on the first analysis results corresponding to at least two analysis methods; and re-analyzing the target internet interaction data in the internet interaction data cluster to be analyzed and processed based on the current interaction tag cluster of each target internet interaction data. This disclosure improves the accuracy and reliability of the analysis results of the target internet interaction data in the internet interaction data cluster to be analyzed and processed by analyzing and processing the target internet interaction data in the internet interaction data cluster one by one using several analysis methods, generating several first analysis results; optimizing the several first analysis results to determine the current interaction tag cluster of each target internet interaction data; and re-analyzing the target internet interaction data in the internet interaction data cluster to be analyzed and processed based on the current interaction tag cluster of the target internet interaction data.

[0098] Based on the above, referring to Figure 2, a big data-based internet data analysis device 200 is provided, which is applied to a big data-based internet data analysis cloud platform. The device includes:

[0099] The interactive data acquisition module 210 is used to acquire Internet interactive data clusters that need to be analyzed and processed, wherein the Internet interactive data clusters that need to be analyzed and processed include several target Internet interactive data.

[0100] The first result analysis module 220 is used to perform pre-analysis on the target Internet interaction data in the Internet interaction data cluster that needs to be analyzed and processed, based on at least two analysis methods, and generate a first analysis result corresponding to each of the at least two analysis methods.

[0101] The interactive tag determination module 230 is used to determine the current interactive tag cluster of each target Internet interactive data by combining the first analysis results corresponding to each of the at least two analysis methods.

[0102] The final result analysis module 240 is used to re-analyze the target Internet interaction data in the Internet interaction data cluster that needs to be analyzed and processed through the current interaction tag cluster of each target Internet interaction data, and generate the final analysis result.
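The four modules of device 200 can be read as a four-stage pipeline. The sketch below wires them together as plain callables; the class name `AnalysisDevice`, the field names, the data formats, and the stub stages are all hypothetical and only show how the stages hand data to each other.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class AnalysisDevice:
    acquire: Callable[[], Sequence]                     # acquisition module 210
    pre_analyze: Callable[[Sequence], list]             # first result analysis module 220
    determine_tags: Callable[[Sequence, list], dict]    # tag determination module 230
    final_analyze: Callable[[Sequence, dict], list]     # final result analysis module 240

    def run(self):
        data = self.acquire()
        first_results = self.pre_analyze(data)
        tag_clusters = self.determine_tags(data, first_results)
        return self.final_analyze(data, tag_clusters)

# Stub stages: group integers by parity, then report each group once.
device = AnalysisDevice(
    acquire=lambda: [1, 2, 3],
    pre_analyze=lambda data: [{x: x % 2 for x in data}],
    determine_tags=lambda data, results: {
        x: [y for y in data if y != x and results[0][y] == results[0][x]] for x in data
    },
    final_analyze=lambda data, tags: sorted({tuple(sorted([x] + tags[x])) for x in data}),
)
final_result = device.run()
# final_result == [(1, 3), (2,)]
```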

[0103] Based on the above, referring to Figure 3, an Internet data analysis system 300 based on big data is provided, comprising a processor 310 and a memory 320 that communicate with each other. The processor 310 is used to read computer programs from the memory 320 and execute them to implement the above-described method.

[0104] Based on the above, a computer-readable storage medium is also provided, on which a computer program is stored; when run, the computer program implements the above method.

[0105] In summary, based on the above scheme, an internet interaction data cluster requiring analysis and processing is obtained. This cluster includes several target internet interaction data. At least two analysis methods are used to pre-analyze each target internet interaction data within the cluster, generating first analysis results corresponding to each of the at least two analysis methods. Based on these first analysis results, the current interaction tag cluster of each target internet interaction data is determined. Based on the current interaction tag clusters, the target internet interaction data within the cluster is re-analyzed to generate the final analysis result. This disclosure improves the accuracy and reliability of the analysis results by analyzing the target internet interaction data within the cluster with several analysis methods to generate several first analysis results, optimizing these first analysis results to determine the current interaction tag cluster of each target internet interaction data, and then re-analyzing the target internet interaction data based on these current interaction tag clusters.

[0106] It should be understood that the systems and modules described above can be implemented in various ways. For example, in some embodiments, the systems and modules can be implemented by hardware, software, or a combination of both. The hardware portion can be implemented using dedicated logic; the software portion can be stored in memory and executed by an appropriate instruction execution system, such as a microprocessor or dedicated-design hardware. Those skilled in the art will understand that the methods and systems described above can be implemented using computer-executable instructions and / or included in processor control code, for example, such code provided on a carrier medium such as a disk, CD, or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The systems and modules of this disclosure can be implemented not only by hardware circuits such as very large-scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field-programmable gate arrays, programmable logic devices, etc., but also by software, for example, executed by various types of processors, or by a combination of the aforementioned hardware circuits and software (e.g., firmware).

[0107] It should be noted that different embodiments may produce different beneficial effects. In different embodiments, the beneficial effects may be any one or a combination of the above, or any other possible beneficial effects.

[0108] The basic concepts have been described above. It is obvious that the detailed disclosure above is merely illustrative and does not constitute a limitation of this disclosure. Although not explicitly stated herein, various modifications, improvements, and corrections may be made to this disclosure by those skilled in the art. Such modifications, improvements, and corrections are suggested in this disclosure and therefore remain within the spirit and scope of the exemplary embodiments of this disclosure.

[0109] Furthermore, this disclosure uses specific terms to describe embodiments of the present disclosure. For example, "an embodiment," "one embodiment," and / or "some embodiments" refer to a particular feature, structure, or characteristic associated with at least one embodiment of the present disclosure. Therefore, it should be emphasized and noted that references to "an embodiment," "one embodiment," or "an alternative embodiment" in different locations throughout this specification do not necessarily refer to the same embodiment. Moreover, certain features, structures, or characteristics in one or more embodiments of the present disclosure can be appropriately combined.

[0110] Furthermore, those skilled in the art will understand that aspects of this disclosure can be described and illustrated through several patentable types or situations, including any new and useful combination of processes, machines, products, or substances, or any new and useful improvements thereof. Accordingly, aspects of this disclosure can be implemented entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software. The aforementioned hardware or software may be referred to as a “data block,” “module,” “engine,” “unit,” “component,” or “system.” Furthermore, aspects of this disclosure may be embodied as a computer product located on one or more computer-readable media, the product including computer-readable program code.

[0111] Computer storage media may contain a propagated data signal containing computer program code, for example, on baseband or as part of a carrier wave. This propagated signal may take various forms, including electromagnetic, optical, and suitable combinations thereof. Computer storage media can be any computer-readable medium other than a computer-readable storage medium, which can be connected to an instruction execution system, apparatus, or device to enable communication, propagation, or transmission of a program for use. The program code located on the computer storage medium can be propagated through any suitable medium, including radio, cable, fiber optic cable, RF, or similar media, or any combination of the above media.

[0112] The computer program code required for the operation of each part of this disclosure can be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python, etc., conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. This program code can run entirely on the user's computer, or as a standalone software package on the user's computer, or partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer can be connected to the user's computer via any network, such as a local area network (LAN) or wide area network (WAN), or connected to an external computer (e.g., via the Internet), or in a cloud computing environment, or used as a service such as Software as a Service (SaaS).

[0113] Furthermore, unless expressly stated in the claims, the order of processing elements and sequences, the use of numbers and letters, or other names described in this disclosure are not intended to limit the order of the processes and methods of this disclosure. Although various examples have been discussed in the foregoing disclosure of some embodiments that are currently considered useful, it should be understood that such details are for illustrative purposes only, and the appended claims are not limited to the disclosed embodiments; rather, the claims are intended to cover all modifications and equivalent combinations that conform to the spirit and scope of the embodiments of this disclosure. For example, while the system components described above can be implemented by hardware devices, they can also be implemented solely by software solutions, such as installing the described system on existing servers or mobile devices.

[0114] Similarly, it should be noted that, to simplify the description of this disclosure and thus aid in the understanding of one or more embodiments of the invention, the foregoing description of embodiments may sometimes combine multiple features into a single embodiment, drawing, or description thereof. However, this method of disclosure does not imply that the subject matter of this disclosure requires more features than those recited in the claims. In fact, an embodiment may have fewer than all the features of a single embodiment disclosed above.

[0115] In some embodiments, numbers describing quantities of components and attributes are used. It should be understood that such numbers are, in some examples, modified by the terms "about," "approximately," or "generally." Unless otherwise stated, "about," "approximately," or "generally" indicates that the stated number allows some variation. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximate values, which may change depending on the desired characteristics of individual embodiments. In some embodiments, numerical parameters should take into account the specified number of significant digits and apply an ordinary rounding method. Although the numerical ranges and parameters used to establish the breadth of the range in some embodiments of this disclosure are approximate values, in specific embodiments such values are set as precisely as feasible.

[0116] For each patent, patent application, patent application publication, and other material such as articles, books, specifications, publications, and documents referenced in this disclosure, the entire contents of that publication are incorporated herein by reference. This excludes historical application documents that are inconsistent with or conflict with this disclosure, as well as documents that limit the broadest scope of the claims of this disclosure (currently or subsequently appended to this disclosure). It should be noted that in the event of any inconsistency or conflict between the descriptions, definitions, and / or terminology used in the supplementary materials to this disclosure and the content of this disclosure, the descriptions, definitions, and / or terminology used in this disclosure shall prevail.

[0117] Finally, it should be understood that the embodiments described in this disclosure are merely illustrative of the principles of the embodiments of this disclosure. Other variations may also fall within the scope of this disclosure. Therefore, alternative configurations of the embodiments of this disclosure are considered as examples and not limitations, and are regarded as consistent with the teachings of this disclosure. Accordingly, the embodiments of this disclosure are not limited to those explicitly described and illustrated herein.

[0118] The above are merely embodiments of this disclosure and are not intended to limit the scope of this disclosure. Various modifications and variations can be made to this disclosure by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the scope of the claims of this disclosure.

Claims

1. A method for internet data analysis based on big data, characterized in that the method is applied to a data analysis system and includes at least:
obtaining an internet interaction data cluster that needs to be analyzed and processed, the internet interaction data cluster that needs to be analyzed and processed including several target internet interaction data;
based on at least two analysis methods, pre-analyzing the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed one by one to generate a first analysis result corresponding to each of the at least two analysis methods;
determining the current interaction tag cluster of each target internet interaction data by combining the first analysis results corresponding one by one to the at least two analysis methods; and
using the current interaction tag cluster of each target internet interaction data, analyzing the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed again to generate the final analysis result;
wherein the step of combining the first analysis results corresponding one by one to the at least two analysis methods to determine the current interaction tag cluster of each of the target internet interaction data includes:
traversing each target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed, and determining the neighboring internet interaction data cluster corresponding to the selected target internet interaction data based on the commonality score between the selected target internet interaction data and the remaining target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed, the neighboring internet interaction data cluster including a specified number of neighboring internet interaction data having the largest commonality scores with the selected target internet interaction data; and
determining the current interaction tag cluster of the target internet interaction data by combining the neighboring internet interaction data cluster of the target internet interaction data with the first analysis results corresponding to the at least two analysis methods, the current interaction tag cluster being a staged set of the neighboring internet interaction data cluster.
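The neighbor selection recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how the commonality score is computed, so cosine similarity over numeric feature vectors is assumed here purely for illustration; the names `cosine_similarity`, `neighbor_clusters`, and the parameter `k` (the "specified number") are hypothetical and do not come from the patent.

```python
import math

def cosine_similarity(a, b):
    """Assumed commonality score: cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def neighbor_clusters(items, k, score=cosine_similarity):
    """For each item, return the indices of the k other items with the largest score."""
    neighbors = {}
    for i, item in enumerate(items):
        scored = [(score(item, other), j) for j, other in enumerate(items) if j != i]
        # Highest score first; ties are broken by the lower index for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbors[i] = [j for _, j in scored[:k]]
    return neighbors

# Hypothetical feature vectors standing in for target internet interaction data.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
result = neighbor_clusters(features, k=1)
```

With these four vectors, items 0 and 1 select each other as nearest neighbors, as do items 2 and 3. Any other score function can be passed through the `score` parameter.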

2. The internet data analysis method based on big data as described in claim 1, characterized in that the step of pre-analyzing the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed, based on at least two analysis methods, to generate a first analysis result corresponding to each of the at least two analysis methods includes:
randomly grouping the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed to generate no fewer than two local interaction data clusters; and
pre-analyzing the target internet interaction data in each of the local interaction data clusters according to the at least two analysis methods to generate first analysis results in one-to-one correspondence with each analysis method and each of the local interaction data clusters.
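As a rough illustration of claim 2, the sketch below draws several random local clusters and runs every analysis method on every local cluster, giving one first analysis result per (method, local cluster) pair. The claim does not say how the random grouping is done or what the analysis methods are; overlapping random subsamples and the two toy labelling functions `by_sign` and `by_magnitude` are assumptions for this example only.

```python
import random

def make_local_clusters(item_ids, num_clusters, sample_ratio, seed=0):
    """Draw num_clusters random subsamples (assumed form of the local clusters)."""
    rng = random.Random(seed)
    size = max(1, round(len(item_ids) * sample_ratio))
    return [sorted(rng.sample(item_ids, size)) for _ in range(num_clusters)]

def pre_analyze(values, local_clusters, methods):
    """Run every method on every local cluster; one result per combination."""
    results = []
    for cluster_index, members in enumerate(local_clusters):
        for name, method in methods.items():
            labels = {item_id: method(values[item_id]) for item_id in members}
            results.append({"method": name, "local_cluster": cluster_index, "labels": labels})
    return results

# Two hypothetical analysis methods that each assign a label to a numeric value.
def by_sign(x):
    return "positive" if x >= 0 else "negative"

def by_magnitude(x):
    return "large" if abs(x) >= 5 else "small"

values = {0: -7.0, 1: -1.0, 2: 2.0, 3: 8.0, 4: 6.0, 5: -3.0}
local_clusters = make_local_clusters(list(values), num_clusters=3, sample_ratio=0.5)
first_results = pre_analyze(values, local_clusters, {"sign": by_sign, "magnitude": by_magnitude})
```

Three local clusters and two methods yield six first analysis results, each covering only the items in its own local cluster.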

3. The internet data analysis method based on big data as described in claim 1 or 2, characterized in that the step of pre-analyzing the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed, based on at least two analysis methods, to generate a first analysis result corresponding to each of the at least two analysis methods further includes:
extracting knowledge derivation information from the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed to generate first knowledge derivation information of the target internet interaction data; and
simplifying the first knowledge derivation information of each of the target internet interaction data to generate second knowledge derivation information of each of the target internet interaction data;
wherein the step of pre-analyzing the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed one by one based on at least two analysis methods, and generating a first analysis result corresponding to each of the at least two analysis methods, includes: using the at least two analysis methods, pre-analyzing the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed one by one through the second knowledge derivation information of each of the target internet interaction data, and generating a first analysis result corresponding to each of the at least two analysis methods.

4. The internet data analysis method based on big data as described in claim 3, characterized in that, The step of simplifying the first knowledge derivation information of each of the target Internet interaction data to generate the second knowledge derivation information of each of the target Internet interaction data includes: using principal component analysis to simplify the first knowledge derivation information of each of the target Internet interaction data one by one to generate the second knowledge derivation information of each of the target Internet interaction data.
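The simplification step of claim 4 names principal component analysis but gives no further detail. The following is a minimal standard-library sketch that projects each feature vector onto the first principal component only, found by power iteration on the sample covariance matrix. Reducing to a single component, the iteration count, and the function names are assumptions made for illustration; a production system would more likely use an established numerical library.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector of the dominant covariance eigenvector)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication converges to the dominant eigenvector,
    # provided the start vector is not orthogonal to it.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # All variance is zero; keep the current unit vector.
        vec = [x / norm for x in nxt]
    return means, vec

def reduce_to_one_dimension(rows):
    """Project each centered row onto the first principal component."""
    means, component = first_principal_component(rows)
    return [sum((row[j] - means[j]) * component[j] for j in range(len(row)))
            for row in rows]

# Hypothetical first knowledge derivation information: points lying on the line y = 2x.
first_info = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
second_info = reduce_to_one_dimension(first_info)
```

Because the sample points lie on a single line, one component captures all of their variance, and the projected values preserve the original ordering of the points.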

5. The internet data analysis method based on big data as described in claim 1, characterized in that the step of determining the current interaction tag cluster of the target internet interaction data by combining the neighboring internet interaction data cluster of the target internet interaction data with the first analysis results corresponding to the at least two analysis methods includes:
determining, by combining the credibility weights of the target internet interaction data and the neighboring internet interaction data in the first analysis results, whether the target internet interaction data and the neighboring internet interaction data belong to the same tag; and
determining the current interaction tag cluster of the target internet interaction data by combining the neighboring internet interaction data in the neighboring internet interaction data cluster that belong to the same tag as the target internet interaction data.

6. The internet data analysis method based on big data as described in claim 5, characterized in that the first analysis result includes the analysis method of the target internet interaction data covered in the internet interaction data cluster that needs to be analyzed and processed; and the step of determining whether the target internet interaction data and the neighboring internet interaction data belong to the same tag by combining the credibility weights of the target internet interaction data and the neighboring internet interaction data in the first analysis results includes:
selecting one neighboring internet interaction data from the neighboring internet interaction data cluster corresponding to the target internet interaction data;
determining, as the first number, the number of first analysis results, among all the first analysis results, that simultaneously cover the target internet interaction data and the selected neighboring internet interaction data;
determining, as the second number, the number of first analysis results, among all the first analysis results that simultaneously cover the target internet interaction data and the selected neighboring internet interaction data, in which the target internet interaction data and the selected neighboring internet interaction data obtain the same analysis method; and
determining whether the target internet interaction data and the selected neighboring internet interaction data belong to the same tag by combining the credibility weight of the second number in the first number.

7. The internet data analysis method based on big data as described in claim 6, characterized in that the step of determining whether the target internet interaction data and the selected neighboring internet interaction data belong to the same tag by combining the credibility weight of the second number in the first number includes:
in response to the credibility weight of the second number in the first number exceeding a specified threshold, determining that the target internet interaction data and the selected neighboring internet interaction data belong to the same tag; and
in response to the credibility weight of the second number in the first number not exceeding the specified threshold, determining that the target internet interaction data and the selected neighboring internet interaction data do not belong to the same tag.
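Claims 5 to 7 can be read as a co-occurrence vote across the first analysis results. The sketch below follows that reading under stated assumptions: each first analysis result is represented as a dictionary mapping the items it covers to the label it assigned, the "credibility weight of the second number in the first number" is taken to be the ratio of the two counts, and the threshold of 0.5 is a hypothetical value. None of these representations is specified by the patent.

```python
def same_tag(target, neighbor, first_results, threshold):
    """Decide whether target and neighbor belong to the same tag."""
    # First number: results that cover both items.
    covering = [r for r in first_results if target in r and neighbor in r]
    first_number = len(covering)
    if first_number == 0:
        return False  # No shared evidence; treated as "not the same tag" (assumption).
    # Second number: covering results that give both items the same label.
    second_number = sum(1 for r in covering if r[target] == r[neighbor])
    # Assumed credibility weight: the ratio second_number / first_number.
    return second_number / first_number > threshold

def current_tag_cluster(target, neighbors, first_results, threshold=0.5):
    """Keep only the neighbors judged to share a tag with the target."""
    return {n for n in neighbors if same_tag(target, n, first_results, threshold)}

# Hypothetical first analysis results: item id -> label assigned by that result.
first_results = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1, "d": 0},
    {"a": 0, "b": 1, "c": 0},
    {"b": 2, "c": 2, "d": 2},
]
cluster_a = current_tag_cluster("a", ["b", "c", "d"], first_results)
```

For item "a", neighbor "b" agrees in two of three shared results (ratio 2/3) and is kept, while "c" (ratio 1/2, not above the threshold) and "d" (ratio 0) are dropped, so the resulting cluster is a subset of the neighbor list.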

8. The internet data analysis method based on big data as described in claim 1, characterized in that the step of analyzing the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed again, using the current interaction tag cluster of each target internet interaction data, to generate the final analysis result includes:
determining the correlation between two random target internet interaction data by using the current interaction tag cluster of each of the target internet interaction data;
based on the correlation between two random target internet interaction data, grouping the target internet interaction data in the internet interaction data cluster that needs to be analyzed and processed into no fewer than one tagged internet interaction data cluster, the tagged internet interaction data cluster including no fewer than two target internet interaction data whose correlation exceeds the correlation target value; and
integrating the at least one tagged internet interaction data cluster to generate the final analysis result of the internet interaction data cluster that needs to be analyzed and processed;
wherein the correlation between the two random target internet interaction data is determined as follows:
randomly selecting two target internet interaction data from the plurality of target internet interaction data, one of which is designated as the first internet interaction data and the other as the second internet interaction data;
determining the first neighbor set by combining the shared features between the complementary queue of the current interaction tag cluster of the second internet interaction data and the current interaction tag cluster of the first internet interaction data, the first neighbor set being a phased set of the current interaction tag cluster of the first internet interaction data;
determining the second neighbor set by combining the shared features between the complementary queue of the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data, the second neighbor set being a phased set of the current interaction tag cluster of the second internet interaction data; and
determining the correlation between the first internet interaction data and the second internet interaction data by combining the first neighbor set and the second neighbor set with the current interaction tag clusters of the first internet interaction data and the second internet interaction data, respectively;
wherein the step of determining the correlation between the first internet interaction data and the second internet interaction data by combining the first neighbor set and the second neighbor set with the current interaction tag clusters of the first internet interaction data and the second internet interaction data, respectively, includes:
determining, one by one, the commonality coefficient comparison results of the first internet interaction data and the second internet interaction data by combining the first neighbor set and the second neighbor set with the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data in turn; and
among the commonality coefficient comparison results of the first internet interaction data and the second internet interaction data, determining the correlation between the first internet interaction data and the second internet interaction data based on the commonality coefficient comparison result with the largest comparison value;
wherein the step of combining the first neighbor set and the second neighbor set with the current interaction tag clusters of the first internet interaction data and the second internet interaction data, respectively, to determine the commonality coefficient comparison results corresponding one-to-one to the first internet interaction data and the second internet interaction data includes:
determining, by combining the current interaction tag cluster of the first internet interaction data and the current interaction tag cluster of the second internet interaction data with the first neighbor set and the second neighbor set in sequence, the correlation metric values of the first internet interaction data and the second internet interaction data with respect to the first neighbor set and the second neighbor set in sequence;
determining the commonality coefficient comparison result of the first internet interaction data by combining the comparison value between the correlation metric value corresponding to the first internet interaction data and the first neighbor set and the correlation metric value corresponding to the first internet interaction data and the second neighbor set; and
determining the commonality coefficient comparison result of the second internet interaction data by combining the comparison value between the correlation metric value corresponding to the second internet interaction data and the first neighbor set and the correlation metric value corresponding to the second internet interaction data and the second neighbor set;
wherein the final analysis result includes several key tags, each key tag including at least one target internet interaction data, and the step of integrating the at least one tagged internet interaction data cluster to generate the final analysis result of the internet interaction data cluster that needs to be analyzed and processed further includes:
based on whether two random tagged internet interaction data clusters cover the same target internet interaction data, integrating the no fewer than one tagged internet interaction data cluster to generate no fewer than one key tag, wherein each key tag includes no fewer than one tagged internet interaction data cluster;
when the key tag includes several tagged internet interaction data clusters, two random tagged internet interaction data clusters among the several tagged internet interaction data clusters cover the same target internet interaction data; and
if two of the tagged internet interaction data clusters contain the same target internet interaction data, all the target internet interaction data contained in the two tagged internet interaction data clusters are combined to form the key tag and belong to the same tag.
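The final integration step of claim 8, in which tagged clusters that share a target item are combined into one key tag, can be sketched with a union-find structure. The claim does not prescribe a data structure, and merging transitively (clusters A and C join the same key tag when both overlap cluster B) is an assumption of this sketch. The function name and the sample clusters are hypothetical.

```python
def merge_overlapping_clusters(tagged_clusters):
    """Merge clusters that share any member; return the key tags as sorted lists."""
    parent = list(range(len(tagged_clusters)))

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # item -> index of the first cluster seen containing it
    for index, cluster in enumerate(tagged_clusters):
        for item in cluster:
            if item in owner:
                # Shared item: union this cluster with the earlier one.
                parent[find(index)] = find(owner[item])
            else:
                owner[item] = index

    key_tags = {}
    for index, cluster in enumerate(tagged_clusters):
        key_tags.setdefault(find(index), set()).update(cluster)
    return sorted(sorted(members) for members in key_tags.values())

# Hypothetical tagged internet interaction data clusters.
tagged_clusters = [{"a", "b"}, {"b", "c"}, {"d", "e"}, {"e", "f"}, {"g", "h"}]
key_tags = merge_overlapping_clusters(tagged_clusters)
```

Here the first two clusters share "b" and the next two share "e", so five tagged clusters collapse into three key tags, and every item ends up in exactly one key tag.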

9. An internet data analysis system based on big data, characterized in that the system includes a processor and a memory that communicate with each other, the processor being configured to read a computer program from the memory and execute it to implement the method of any one of claims 1-8.

Citation Information

Patent Citations

  • Data calibration method and device

    CN104391934A

  • Offline anti-cheating method and device, electronic equipment and readable storage medium

    CN112101993A