[0052] The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
[0053] To reduce computing costs, this application provides a method for determining the page churn rate. As shown in Figure 2, the method includes the following steps:
[0054] S11. Obtain, from the access log, records that each include the visiting user ID, the access URL, the access time, and the previous access URL;
[0055] In this application, each record in the access log includes a user identifier, an access URL, an access time, and a previous access URL. For example, the records in the access log may take the following form:
[0056] 1. Jack, www.alibaba.com, www.google.com, 12:00:01;
[0057] 2. Mike, www.alibaba.com, www.baidu.com, 12:00:02;
[0058] 3. Jack, www.alibaba.com/offerlist/mp3.html, www.alibaba.com, 12:01:01;
[0059] 4. Jack, www.alibaba.com/offerdetail/123.html, www.alibaba.com/offerlist/mp3.html, 12:02:02;
[0060] 5. Jack, www.alibaba.com/offerdetail/234.html, www.alibaba.com/offerlist/mp3.html, 12:03:01;
[0061] 6. Mike, www.alibaba.com/offerlist/mp3.html, www.alibaba.com, 12:04:02;
[0062] 7. Jack, community.alibaba.com/, www.alibaba.com/, 12:04:31;
[0063] 8. Mike, www.alibaba.com/offerdetail/234.html, www.alibaba.com/offerlist/mp3.html, 12:05:31;
[0064] 9. Jack, community.alibaba.com/help.html, community.alibaba.com/, 12:06:31.
[0065] Here, 1 through 9 are the identifiers of the records, and Jack and Mike are the identifiers of the visiting users. The URL immediately after the user identifier is the access URL that the user visits in that record, such as www.alibaba.com in record 1. The URL after the access URL is the previous access URL, that is, the URL from which the user jumped to the access URL in that record, such as www.google.com in record 1; in other words, the visitor jumped from the web page www.google.com to the web page www.alibaba.com. Finally, 12:00:01 in record 1 indicates that www.alibaba.com was visited at 12:00:01.
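As an illustration only, a record in the form listed above could be parsed as sketched below. The `AccessRecord` type, its field names, and the `parse_record` helper are hypothetical names introduced for this sketch and are not part of the application; the parser assumes the field order shown in the example records (identifier, user, access URL, previous access URL, access time).

```python
import re
from dataclasses import dataclass
from datetime import time

# Leading record number followed by "." or ",", then the comma-separated fields.
# A trailing ";" is optional and is dropped.
_RECORD = re.compile(r"^\s*(\d+)[.,]\s*(.*?);?\s*$")


@dataclass(frozen=True)
class AccessRecord:
    record_id: int      # record identifier, e.g. 1
    user_id: str        # visiting user identifier, e.g. "Jack"
    url: str            # access URL visited in this record
    prev_url: str       # previous access URL the user jumped from
    accessed_at: time   # access time


def parse_record(line: str) -> AccessRecord:
    """Parse one log line such as '1. Jack, url, prev_url, 12:00:01;'."""
    match = _RECORD.match(line)
    if match is None:
        raise ValueError(f"not a log record: {line!r}")
    record_id, rest = match.groups()
    user_id, url, prev_url, stamp = (part.strip() for part in rest.split(","))
    return AccessRecord(int(record_id), user_id, url, prev_url,
                        time.fromisoformat(stamp))


record = parse_record("1. Jack, www.alibaba.com, www.google.com, 12:00:01;")
print(record.url, "<-", record.prev_url)  # www.alibaba.com <- www.google.com
```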
[0066] S12. Determine a record set of records having the same user ID, and determine the record with the latest access time in the record set as the target analysis record;
[0067] To construct a URL record of the set of access URLs visited by each user during each complete visit, a record set must first be created for each user, so that each record set includes only the records of one visiting user.
[0068] In addition, a record only allows tracing backward, from its access URL to its previous access URL. The set of access URLs visited during one complete visit can therefore be constructed in full only by starting from the last record of that visit, so the record with the latest access time in the record set is taken as the target analysis record.
[0069] Specifically, the record set whose user ID is Mike can be determined from the records listed in step S11; the record with the latest access time in that record set is then determined as the target analysis record, so that record 8 (Mike, www.alibaba.com/offerdetail/234.html, www.alibaba.com/offerlist/mp3.html, 12:05:31) is determined as the target analysis record.
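Step S12 can be sketched as follows. This is a minimal illustration under stated assumptions: records are held as plain tuples, the helper names `group_by_user` and `latest_record` are hypothetical, and access times are zero-padded `HH:MM:SS` strings within one day, so that string comparison matches chronological order.

```python
from collections import defaultdict

# (record_id, user_id, access_url, previous_access_url, access_time)
RECORDS = [
    (1, "Jack", "www.alibaba.com", "www.google.com", "12:00:01"),
    (2, "Mike", "www.alibaba.com", "www.baidu.com", "12:00:02"),
    (6, "Mike", "www.alibaba.com/offerlist/mp3.html", "www.alibaba.com", "12:04:02"),
    (8, "Mike", "www.alibaba.com/offerdetail/234.html",
     "www.alibaba.com/offerlist/mp3.html", "12:05:31"),
    (9, "Jack", "community.alibaba.com/help.html", "community.alibaba.com/", "12:06:31"),
]


def group_by_user(records):
    """Build one record set per user ID."""
    record_sets = defaultdict(list)
    for record in records:
        record_sets[record[1]].append(record)
    return dict(record_sets)


def latest_record(record_set):
    """Pick the record with the latest access time as the target analysis record."""
    return max(record_set, key=lambda r: r[4])


record_sets = group_by_user(RECORDS)
target = latest_record(record_sets["Mike"])
print(target[0])  # 8
```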
[0070] S13. Determine the next target analysis record from among the upper-level records obtained according to the previous access URL in the current target analysis record; take the next target analysis record as the current target analysis record; repeat this step until the previous access URL in the current target analysis record is an invalid access URL link;
[0071] The current target analysis record includes a previous access URL. From this previous access URL, the upper-level record can be traced, that is, the record of the source URL from which the user jumped to the access URL in the current target analysis record. For example, for the current target analysis record 8 (Mike, www.alibaba.com/offerdetail/234.html, www.alibaba.com/offerlist/mp3.html, 12:05:31), the previous access URL leads to the corresponding record 6 (Mike, www.alibaba.com/offerlist/mp3.html, www.alibaba.com, 12:04:02); in other words, the upper-level record is the record whose access URL is the previous access URL of the current record.
[0072] Preferably, in this application, when there are multiple upper-level records, the upper-level record with the latest access time is determined as the next target analysis record. This is because tracing the upper-level record through the previous access URL will in many cases find multiple candidate records. Since the real upper-level record is generally the one whose access time is closest to that of its next-level record, the access time can be used to determine the real upper-level record.
[0073] Since the visiting user is likely to have visited many web pages along one complete access path in the website, this step is repeated until the previous access URL in the current target analysis record is an invalid access URL link, that is, until the record corresponding to the user's initial access URL has been traced; the initial access record no longer includes a valid access URL link. Specifically, an invalid access URL link may be one where the previous access URL field is empty or where the previous access URL link is invalid. Taking the records listed in step S11 as an example, the upper-level record of record 9 (Jack, community.alibaba.com/help.html, community.alibaba.com/, 12:06:31) is record 7 (Jack, community.alibaba.com/, www.alibaba.com/, 12:04:31), and the upper-level record of record 7 is record 2 (Mike, www.alibaba.com, www.baidu.com, 12:00:02). The previous access URL in record 2, www.baidu.com, is not included in the user's record set and is therefore an invalid URL link, so at this point the access path of one complete visit of the user has been traced.
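The backward tracing of step S13 can be sketched for Mike's record set as follows. The function name `trace_back` and the tuple layout are hypothetical. Two assumptions are made that the text does not state explicitly: URLs are compared as exact strings, and an upper-level record must have an earlier access time than the current record (which also guarantees that the loop terminates). When several candidates exist, the one with the latest access time is taken, following the preferred embodiment.

```python
# (record_id, user_id, access_url, previous_access_url, access_time)
MIKE_RECORDS = [
    (2, "Mike", "www.alibaba.com", "www.baidu.com", "12:00:02"),
    (6, "Mike", "www.alibaba.com/offerlist/mp3.html", "www.alibaba.com", "12:04:02"),
    (8, "Mike", "www.alibaba.com/offerdetail/234.html",
     "www.alibaba.com/offerlist/mp3.html", "12:05:31"),
]


def trace_back(record_set, target):
    """Follow previous-access-URL links from `target` back to the entry record."""
    chain = [target]
    current = target
    while True:
        prev_url = current[3]
        # Upper-level candidates: earlier records whose access URL is prev_url.
        candidates = [r for r in record_set
                      if r[2] == prev_url and r[4] < current[4]]
        if not candidates:
            # Empty or unmatched previous URL: treated as an invalid link.
            return chain
        # Several candidates: take the one closest in time to the current record.
        current = max(candidates, key=lambda r: r[4])
        chain.append(current)


chain = trace_back(MIKE_RECORDS, MIKE_RECORDS[-1])
print([r[0] for r in chain])  # [8, 6, 2]
```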
[0074] S14. Arrange the access URLs in the order in which the target analysis records were obtained, and construct a URL record of the set of access URLs visited by the visiting user during one visit;
[0075] In the access path of one complete visit, the sequence of visited URLs forms a URL record, which intuitively reflects the user's visiting behavior on the website. Specifically, the target analysis records obtained in step S13 can be arranged in the order of acquisition to obtain the URL record: 9, Jack, community.alibaba.com/help.html|community.alibaba.com/|www.alibaba.com/|www.baidu.com.
[0076] The above URL record contains the identifier 9 of the record with the latest access time, the visiting user identifier Jack, and the URLs at each level of the visit, namely community.alibaba.com/help.html, community.alibaba.com/, www.alibaba.com/, and www.baidu.com. The URLs at the different levels can be separated by the symbol | so that they can be identified as different URLs.
[0077] The URL record may be written in other ways, provided that it includes, in visiting order, all the web pages visited by the visiting user during one visit; the format is not limited here.
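One possible way to assemble a URL record in the `|`-separated form described above is sketched below, using the chain of target analysis records traced for Mike (records 8, 6 and 2). The helper name `build_url_record` is hypothetical; appending the final previous access URL at the end mirrors the example URL record in the text.

```python
# Chain of target analysis records in the order they were obtained:
# (record_id, user_id, access_url, previous_access_url, access_time)
CHAIN = [
    (8, "Mike", "www.alibaba.com/offerdetail/234.html",
     "www.alibaba.com/offerlist/mp3.html", "12:05:31"),
    (6, "Mike", "www.alibaba.com/offerlist/mp3.html", "www.alibaba.com", "12:04:02"),
    (2, "Mike", "www.alibaba.com", "www.baidu.com", "12:00:02"),
]


def build_url_record(chain):
    """Join the chain's access URLs, latest first, into one URL record string."""
    first = chain[0]
    urls = [record[2] for record in chain]
    entry_source = chain[-1][3]   # previous access URL of the earliest record
    if entry_source:
        urls.append(entry_source)
    return f"{first[0]}, {first[1]}, " + "|".join(urls)


print(build_url_record(CHAIN))
```

For this chain the sketch yields `8, Mike, www.alibaba.com/offerdetail/234.html|www.alibaba.com/offerlist/mp3.html|www.alibaba.com|www.baidu.com`.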
[0078] After all the records in the access log have been determined as target analysis records and the URL records have been constructed, the resulting set of URL records includes the access path of every visit of every user in the access log. Each URL record includes all the pages visited by the visiting user during a visit to the website, arranged in the order of the visit, so a URL record can serve as the access path of the visiting user. Because an access path includes the complete set of visited pages and the order in which they were visited, it carries much more information than a path pair in the prior art and directly reflects the access behavior of the visiting user. Using access paths as the basis for statistics can therefore effectively reduce the computational cost of website analysis and thereby the consumption of system resources.
[0079] Specifically, take the calculation of the web page churn rate as an example. When the prior art analyzes the churn rate of a large website that is visited by 10 million people, the log includes at least 10 million access paths, which are estimated to split into 100 million path pairs. If 1,000 paths then need to be analyzed, 100 billion comparison calculations are required. The prior-art method thus involves a huge amount of calculation and consumes system resources heavily. Under the technical solution of this application, the set of URL records is established by splicing log records. If the site is likewise visited by 10 million people and thus includes 10 million access paths, only 10 million URL records need to be established, since each URL record records the access path of a single visit of a single user to the website. Once the set of URL records has been established, the analysis can be carried out easily with query statements. Specifically, SQL statements can be used to count, for each of two linked pages, the number of occurrences in the set of URL records, and the ratio of the two counts gives the page churn rate between the two pages. This greatly reduces the computational cost of website analysis and thereby the consumption of system resources.
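The figures quoted in this comparison can be checked with a few lines of arithmetic. The variable names are illustrative only; the numbers are the estimates given in the text.

```python
visitors = 10_000_000          # visits, i.e. at least this many access paths
path_pairs = 100_000_000       # estimated path pairs after splitting (prior art)
paths_to_analyse = 1_000       # paths that need to be analysed

# Prior art: every path to analyse is compared against every path pair.
prior_art_comparisons = path_pairs * paths_to_analyse

# This application: one URL record per visit.
url_records = visitors

print(prior_art_comparisons)   # 100000000000, i.e. 100 billion
print(url_records)             # 10000000, i.e. 10 million
```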
[0080] Further, in this application, after the access URLs have been arranged in S14 in the order in which the target analysis records were obtained and the set of access URLs visited by the visiting user during one visit has been constructed, the method further includes the following steps:
[0081] S15. Among the records in the record set that have not yet been determined as target analysis records, determine the record with the latest access time as another target analysis record;
[0082] To avoid analyzing records repeatedly while constructing the URL records of the access URLs visited by each visiting user during each complete visit, the other target analysis record is chosen from among the records in the record set that have not yet been determined as target analysis records. The previous access URL of the access URL in this target analysis record is then determined from it, and another URL record is thereby established.
[0083] S16. Repeat steps S13 to S15 until all records have been determined as target analysis records;
[0084] To analyze every record in the access log and construct a URL record of the access URLs visited by each visiting user during each visit, steps S13 to S15 must be performed over every record in the access log.
[0085] In this application, since there may be multiple user IDs, record sets with the same user ID can also be determined separately, so that one record set is determined for the records of each user ID. Further, the records can be sorted by user identifier.
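The complete loop over steps S12 to S16 might look like the sketch below. It is one possible reading, not a definitive implementation: the function name `build_url_records` is hypothetical, URLs are compared as exact strings (so `www.alibaba.com/` and `www.alibaba.com` are treated as different), an upper-level record must be earlier than the current record, and tracing only considers records that have not yet been used as target analysis records.

```python
from collections import defaultdict

# (record_id, user_id, access_url, previous_access_url, access_time)
RECORDS = [
    (1, "Jack", "www.alibaba.com", "www.google.com", "12:00:01"),
    (2, "Mike", "www.alibaba.com", "www.baidu.com", "12:00:02"),
    (3, "Jack", "www.alibaba.com/offerlist/mp3.html", "www.alibaba.com", "12:01:01"),
    (4, "Jack", "www.alibaba.com/offerdetail/123.html",
     "www.alibaba.com/offerlist/mp3.html", "12:02:02"),
    (5, "Jack", "www.alibaba.com/offerdetail/234.html",
     "www.alibaba.com/offerlist/mp3.html", "12:03:01"),
    (6, "Mike", "www.alibaba.com/offerlist/mp3.html", "www.alibaba.com", "12:04:02"),
    (7, "Jack", "community.alibaba.com/", "www.alibaba.com/", "12:04:31"),
    (8, "Mike", "www.alibaba.com/offerdetail/234.html",
     "www.alibaba.com/offerlist/mp3.html", "12:05:31"),
    (9, "Jack", "community.alibaba.com/help.html", "community.alibaba.com/", "12:06:31"),
]


def build_url_records(records):
    """Return (start_record_id, user_id, path) for every traced visit."""
    by_user = defaultdict(list)                      # S12: one record set per user
    for record in records:
        by_user[record[1]].append(record)

    url_records = []
    for user_id, record_set in by_user.items():
        pending = sorted(record_set, key=lambda r: r[4])   # oldest first
        while pending:                               # S16: until all are analysed
            current = pending.pop()                  # S12/S15: latest pending record
            start_id = current[0]
            urls = [current[2]]
            while True:                              # S13: trace upper-level records
                uppers = [r for r in pending
                          if r[2] == current[3] and r[4] < current[4]]
                if not uppers:
                    break
                current = max(uppers, key=lambda r: r[4])
                pending.remove(current)
                urls.append(current[2])
            if current[3]:
                urls.append(current[3])              # source of the earliest record
            url_records.append((start_id, user_id, "|".join(urls)))   # S14
    return url_records


for url_record in build_url_records(RECORDS):
    print(url_record)
```

Under these assumptions every log record is used in exactly one URL record, and the nine example records yield four URL records, starting at records 9, 5 and 4 for Jack and record 8 for Mike.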
[0086] As shown in Figure 3, this application also provides a method for determining the page churn rate, including the following steps:
[0087] S21. Obtain, from the access log, records that each include the visiting user ID, the access URL, the access time, and the previous access URL;
[0088] S22. Determine a record set of records having the same user ID, and determine the record with the latest access time in the record set as the target analysis record;
[0089] S23. Determine the next target analysis record from among the upper-level records obtained according to the previous access URL in the current target analysis record; take the next target analysis record as the current target analysis record; repeat this step until the previous access URL in the current target analysis record is an invalid access URL link;
[0090] S24. Arrange the access URLs in the order in which the target analysis records were obtained, and construct a URL record of the set of access URLs visited by the visiting user during one visit;
[0091] S25. Among the records in the record set that have not yet been determined as target analysis records, determine the record with the latest access time as the target analysis record;
[0092] S26. Repeat steps S23 to S25 until all records have been determined as target analysis records.
[0093] Since steps S21 to S26 in this application have the same content as the corresponding steps S11 to S16 in Figure 1, and their principles and functions are also the same, they are not described again here.
[0094] S27. Obtain the number of occurrences of a first access URL and the number of occurrences of a second access URL in the set of URL records, and calculate the ratio of the first to the second to obtain the churn rate from the first access URL to the second access URL; the second access URL is an access URL that can be reached from the first access URL through at least one link jump.
[0095] When calculating the churn rate between two pages in a website, it is first necessary to confirm that a link jump is possible between the URLs of the two pages, that is, that one can be reached from the other through at least one link jump. Specifically, the two pages for which the churn rate is calculated are the first access URL and the second access URL, respectively.
[0096] Since the constructed URL records cover the access records of the website's URLs in every complete visit of every visiting user, it suffices to obtain the number of occurrences of the first access URL and of the second access URL in the set of URL records and to calculate the ratio of the two; this yields the churn rate from the first access URL to the second access URL.
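The counting in step S27 can be sketched as follows. The sample paths and the helper names `count_records_containing` and `count_ratio` are hypothetical. The text specifies only "the ratio" of the two counts, so the sketch returns the counts together with the first-to-second ratio exactly as stated; any further normalization into a percentage is left open here.

```python
# Hypothetical set of URL records (paths only), latest page first, "|"-separated.
URL_RECORD_PATHS = [
    "www.alibaba.com/offerdetail/234.html|www.alibaba.com/offerlist/mp3.html"
    "|www.alibaba.com|www.baidu.com",
    "www.alibaba.com/offerdetail/234.html|www.alibaba.com/offerlist/mp3.html"
    "|www.alibaba.com|www.google.com",
    "www.alibaba.com/offerdetail/123.html|www.alibaba.com/offerlist/mp3.html",
    "community.alibaba.com/help.html|community.alibaba.com/|www.alibaba.com/",
]


def count_records_containing(paths, url):
    """Number of URL records whose path contains `url` as a whole entry."""
    return sum(1 for path in paths if url in path.split("|"))


def count_ratio(paths, first_url, second_url):
    """Return (first_count, second_count, first_count / second_count or None)."""
    first = count_records_containing(paths, first_url)
    second = count_records_containing(paths, second_url)
    ratio = first / second if second else None
    return first, second, ratio


print(count_ratio(URL_RECORD_PATHS,
                  "www.alibaba.com/offerlist/mp3.html",
                  "www.alibaba.com/offerdetail/234.html"))  # (3, 2, 1.5)
```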
[0097] In summary, this application uses the previous access URL included in each record of the access log to trace the URLs that a visiting user visited, and the order in which they were visited, during a visit to the website, and then integrates the user's visiting behavior to construct a URL record of the set of access URLs visited during each visit. After the URL records have been constructed, the numbers of URL records in which two access URLs of the website appear can be compared directly to obtain the page churn rate between the two access URLs. Compared with the prior-art method, which must compare the two access URLs of the churn-rate calculation against every path pair in the access log, this application effectively reduces the computational cost and improves the efficiency of web page churn rate statistics.
[0098] As shown in Figure 4, this application also provides a device for determining an access path, including: a record acquisition unit 1, a target analysis record determination unit 2, a next target analysis record determination unit 3, and an access URL set acquisition unit 4, wherein:
[0099] The record acquisition unit 1 is used to obtain, from the access log, records that each include the visiting user ID, the access URL, the access time, and the previous access URL;
[0100] In this application, each record in the access log includes a user identifier, an access URL, an access time, and a previous access URL. For example, the records in the access log may take the following form:
[0101] 1. Jack, www.alibaba.com, www.google.com, 12:00:01;
[0102] 2. Mike, www.alibaba.com, www.baidu.com, 12:00:02;
[0103] 3. Jack, www.alibaba.com/offerlist/mp3.html, www.alibaba.com, 12:01:01;
[0104] 4. Jack, www.alibaba.com/offerdetail/123.html, www.alibaba.com/offerlist/mp3.html, 12:02:02;
[0105] 5. Jack, www.alibaba.com/offerdetail/234.html, www.alibaba.com/offerlist/mp3.html, 12:03:01;
[0106] 6. Mike, www.alibaba.com/offerlist/mp3.html, www.alibaba.com, 12:04:02;
[0107] 7. Jack, community.alibaba.com/, www.alibaba.com/, 12:04:31;
[0108] 8. Mike, www.alibaba.com/offerdetail/234.html, www.alibaba.com/offerlist/mp3.html, 12:05:31;
[0109] 9. Jack, community.alibaba.com/help.html, community.alibaba.com/, 12:06:31.
[0110] Here, 1 through 9 are the identifiers of the records, and Jack and Mike are the identifiers of the visiting users. The URL immediately after the user identifier is the access URL recorded by that record, such as www.alibaba.com in record 1. The URL after the access URL is the previous access URL, that is, the URL from which the user jumped to the access URL in that record, such as www.google.com in record 1; in other words, the visitor jumped from the web page www.google.com to the web page www.alibaba.com. Finally, 12:00:01 in record 1 indicates that www.alibaba.com was visited at 12:00:01.
[0111] The target analysis record determination unit 2 is used to determine record sets of records having the same user ID and, among the records in each record set that have not yet been determined as target analysis records, to determine a target analysis record starting from the record with the latest access time;
[0112] To construct a URL record of the set of access URLs visited by each user during each complete visit, the target analysis record determination unit 2 first establishes a record set for each user, so that each record set includes only the records of one visiting user.
[0113] To avoid analyzing records repeatedly while constructing the URL records of the access URLs visited by each visiting user during each complete visit, the target analysis record determination unit 2 determines, among the records in the record set that have not yet been determined as target analysis records, the record with the latest access time as the target analysis record.
[0114] In addition, a record only allows tracing backward, from its access URL to its previous access URL. The set of access URLs visited during one complete visit can therefore be constructed in full only by starting from the last record of that visit, so the record with the latest access time in the record set is taken as the target analysis record.
[0115] Specifically, the record set whose user ID is Mike can be determined from the records listed above; the record with the latest access time in that record set is then determined as the target analysis record, so that record 8 (Mike, www.alibaba.com/offerdetail/234.html, www.alibaba.com/offerlist/mp3.html, 12:05:31) is determined as the target analysis record.
[0116] In this application, the target analysis record determination unit 2 may specifically include a sorting module, which is used to sort the records by user identifier.
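A sorting module of this kind could, for example, sort the records by user identifier and then split the sorted run into per-user record sets. The sketch below uses Python's `itertools.groupby`, which requires sorted input; the function name `sort_and_group` is hypothetical, and sorting secondarily by access time is an added convenience rather than something the text requires.

```python
from itertools import groupby
from operator import itemgetter

# (record_id, user_id, access_url, previous_access_url, access_time)
RECORDS = [
    (1, "Jack", "www.alibaba.com", "www.google.com", "12:00:01"),
    (2, "Mike", "www.alibaba.com", "www.baidu.com", "12:00:02"),
    (6, "Mike", "www.alibaba.com/offerlist/mp3.html", "www.alibaba.com", "12:04:02"),
    (8, "Mike", "www.alibaba.com/offerdetail/234.html",
     "www.alibaba.com/offerlist/mp3.html", "12:05:31"),
    (9, "Jack", "community.alibaba.com/help.html", "community.alibaba.com/", "12:06:31"),
]


def sort_and_group(records):
    """Sort by user ID (then access time) and split into per-user record sets."""
    ordered = sorted(records, key=itemgetter(1, 4))
    return {user_id: list(group)
            for user_id, group in groupby(ordered, key=itemgetter(1))}


record_sets = sort_and_group(RECORDS)
# The last element of each set is that user's latest record.
print({user: records[-1][0] for user, records in record_sets.items()})
# {'Jack': 9, 'Mike': 8}
```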
[0117] The next target analysis record determination unit 3 is configured to determine the next target analysis record from among the upper-level records obtained according to the previous access URL in the current target analysis record, to take the next target analysis record as the current target analysis record, and to repeat the determination of the next target analysis record until the previous access URL in the current target analysis record is an invalid access URL link;
[0118] The current target analysis record includes a previous access URL. From this previous access URL, the upper-level record can be traced, that is, the record of the source URL from which the user jumped to the access URL in the current target analysis record. For example, for the current target analysis record 8 (Mike, www.alibaba.com/offerdetail/234.html, www.alibaba.com/offerlist/mp3.html, 12:05:31), the previous access URL leads to the corresponding record 6 (Mike, www.alibaba.com/offerlist/mp3.html, www.alibaba.com, 12:04:02); in other words, the upper-level record is the record whose access URL is the previous access URL of the current record.
[0119] Preferably, in this application, the next target analysis record determination unit 3 may specifically include a time determination module, which is used, when there are multiple upper-level records, to determine the upper-level record with the latest access time as the next target analysis record.
[0120] In many cases, tracing the upper-level record through the previous access URL will find multiple candidate records. Since the real upper-level record is generally the one whose access time is closest to that of its next-level record, the access time can be used to determine the real upper-level record.
[0121] Since the visiting user is likely to have visited many web pages along one complete access path in the website, this determination is repeated until the previous access URL in the current target analysis record is an invalid access URL link, that is, until the record corresponding to the user's initial access URL has been traced; the initial access record does not include a valid access URL link. Specifically, an invalid access URL link may be one where the previous access URL field is empty or where the previous access URL link is invalid. Taking the records listed above as an example, the upper-level record of record 9 (Jack, community.alibaba.com/help.html, community.alibaba.com/, 12:06:31) is record 7 (Jack, community.alibaba.com/, www.alibaba.com/, 12:04:31), and the upper-level record of record 7 is record 2 (Mike, www.alibaba.com, www.baidu.com, 12:00:02). The previous access URL in record 2, www.baidu.com, is not included in the user's record set and is therefore an invalid URL link, so at this point the access path of one complete visit of the user has been traced.
[0122] The access URL set acquisition unit 4 is used to arrange the access URLs in the order in which the target analysis records were obtained, so as to construct the set of access URLs visited by the visiting user during one visit.
[0123] In the access path of one complete visit, the sequence of visited URLs forms a URL record, which intuitively reflects the user's visiting behavior on the website. Specifically, the target analysis records can be arranged in the order of acquisition to obtain the URL record: 9, Jack, community.alibaba.com/help.html|community.alibaba.com/|www.alibaba.com/|www.baidu.com.
[0124] The above URL record contains the identifier 9 of the record with the latest access time, the visiting user identifier Jack, and the URLs at each level of the visit, namely community.alibaba.com/help.html, community.alibaba.com/, www.alibaba.com/, and www.baidu.com. The URLs at the different levels can be separated by the symbol | so that they can be identified as different URLs.
[0125] The URL record may be written in other ways, provided that it includes, in visiting order, all the web pages visited by the visiting user during one visit; the format is not limited here.
[0126] After all the records in the access log have been determined as target analysis records and the URL records have been constructed, the resulting set of URL records includes the access path of every visit of every user in the access log. Each URL record includes all the pages visited by the visiting user during a visit to the website, arranged in the order of the visit, so a URL record can serve as the access path of the visiting user. Because an access path includes the complete set of visited pages and the order in which they were visited, it carries much more information than a path pair in the prior art and directly reflects the access behavior of the visiting user. Using access paths as the basis for statistics can therefore effectively reduce the computational cost of website analysis and thereby the consumption of system resources.
[0127] Specifically, take the calculation of the web page churn rate as an example. When the churn rate of a large website that is visited by 10 million people is analyzed, the log includes at least 10 million paths, which are estimated to split into 100 million path pairs. If 1,000 paths then need to be analyzed, 100 billion comparison calculations are required. The prior-art method thus involves a huge amount of calculation and consumes system resources heavily. With the technical solution of this application, once the set of URL records has been established, the analysis can be carried out easily with query statements. Specifically, SQL statements can be used to count, for each of two linked pages, the number of occurrences in the set of URL records, and the ratio of the two counts gives the page churn rate between the two pages. This greatly reduces the computational cost of website analysis and thereby the consumption of system resources.
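The SQL-based counting mentioned above could look like the following sketch, run here against an in-memory SQLite database through Python's standard `sqlite3` module. The table name `url_records`, its columns, and the sample rows are assumptions made for illustration; the application does not specify a schema or a database product. The query wraps both the path and the searched URL in `|` so that only whole path entries match.

```python
import sqlite3

# Hypothetical URL records: (record_id, user_id, "|"-separated path).
ROWS = [
    (8, "Mike", "www.alibaba.com/offerdetail/234.html"
                "|www.alibaba.com/offerlist/mp3.html|www.alibaba.com|www.baidu.com"),
    (5, "Jack", "www.alibaba.com/offerdetail/234.html"
                "|www.alibaba.com/offerlist/mp3.html|www.alibaba.com|www.google.com"),
    (4, "Jack", "www.alibaba.com/offerdetail/123.html"
                "|www.alibaba.com/offerlist/mp3.html"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_records (record_id INTEGER, user_id TEXT, path TEXT)")
conn.executemany("INSERT INTO url_records VALUES (?, ?, ?)", ROWS)

# Count the URL records whose path contains the given URL as a whole entry.
COUNT_SQL = """
SELECT COUNT(*) FROM url_records
WHERE instr('|' || path || '|', '|' || ? || '|') > 0
"""


def count_paths(connection, url):
    return connection.execute(COUNT_SQL, (url,)).fetchone()[0]


first_count = count_paths(conn, "www.alibaba.com/offerlist/mp3.html")
second_count = count_paths(conn, "www.alibaba.com/offerdetail/234.html")
print(first_count, second_count)  # 3 2
```

The ratio of the two counts can then be formed in application code or in a second query, whichever suits the surrounding system.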
[0128] As shown in Figure 5, this application also provides a system for determining the page churn rate, including the device for determining the access path in the embodiment corresponding to Figure 3, and a statistical unit 5;
[0129] The statistical unit 5 is configured to obtain the number of occurrences of a first access URL and the number of occurrences of a second access URL in the set of URL records, and to calculate the ratio of the first to the second to obtain the churn rate from the first access URL to the second access URL; the second access URL is an access URL that can be reached from the first access URL through at least one link jump;
[0130] Since the device for determining the access path in this application has the same structure as the device for determining the access path corresponding to Figure 3, and their principles and functions are also the same, it is not described again here.
[0131] When calculating the churn rate between two pages in a website, it is first necessary to confirm that a link jump is possible between the URLs of the two pages, that is, that one can be reached from the other through at least one link jump. Specifically, the two pages for which the churn rate is calculated are the first access URL and the second access URL, respectively.
[0132] Since the constructed URL records cover the access records of the website's URLs in every complete visit of every visiting user, once the statistical unit 5 has obtained the number of occurrences of the first access URL and of the second access URL in the set of URL records, the ratio of the two can be calculated to obtain the churn rate from the first access URL to the second access URL.
[0133] In summary, this application uses the previous access URL included in each record of the access log to trace the URLs that a visiting user visited, and the order in which they were visited, during a visit to the website, and then integrates the user's visiting behavior to construct a URL record of the set of access URLs visited during each visit. After the URL records have been constructed, the numbers of URL records in which two access URLs of the website appear can be compared directly to obtain the page churn rate between the two access URLs. Compared with the prior-art method, which must compare the two access URLs of the churn-rate calculation against every path pair in the access log, this application effectively reduces the computational cost and improves the efficiency of web page churn rate statistics.
[0134] The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined in this document can be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, this application will not be limited to the embodiments shown in this text, but should conform to the widest scope consistent with the principles and novel features disclosed in this text.