Information push method, device and system
An information push and pre-push technology, applied in the field of communications, that addresses problems such as spam, thereby avoiding spam and improving push accuracy.
Inactive Publication Date: 2010-10-20
HUAWEI TECH CO LTD
2 Cites 80 Cited by
AI-Extracted Technical Summary
Problems solved by technology
[0008] Existing technology 1 provides users with messages and/or content associated with their location and/or location information, and such messages and/or content only rely on the analysis of the use...
Abstract
The embodiment of the invention relates to an information push method in the field of communication, which comprises the following steps: acquiring web pages in which a user is interested; acquiring user interests according to the web pages in which the user is interested; and determining information to be pushed to the user according to the user interests. The embodiment of the invention also provides an information push device and an information push system. The embodiment of the invention can acquire the user interests according to the web pages that the user browses and push information to the user according to those interests, which effectively improves push accuracy and keeps the user from receiving a large amount of unwanted junk information.
Application Domain
Special data processing applications
Technology Topic
Acquired web
Example Embodiment
[0033] The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
[0034] The embodiment of the present invention provides an information pushing method, which includes the following steps:
[0035] Obtaining web pages that the user is interested in;
[0036] Obtaining user interest based on the web pages that the user is interested in;
[0037] Determining the information to be pushed to the user according to the user interest.
[0038] The information pushing method provided by the embodiment of the present invention can push information to the user in combination with the user interest and keeps the user from receiving a large amount of spam.
[0039] Another embodiment of the present invention provides an information pushing method. As shown in Figure 1, it includes the following steps:
[0040] Step 10: Obtain web pages that the user is interested in;
[0041] Obtaining the webpages that the user is interested in includes: collecting each webpage visited by the user within a time window; determining the category to which the text content of each of those webpages belongs; counting how often the user visits webpages of each category; and determining that the webpages whose visit frequency meets the specified threshold are the webpages the user is interested in. It also includes dynamically adjusting the size of the time window according to the user's webpage browsing speed.
[0042] The statistics of the webpages visited by the user in a time window start from the user's current browsing time and use a time range that fits the user's browsing speed and habits as the basis; each webpage visited by the user within that range is analyzed. The time window should be sized so that it reflects the interests the user is concentrating on at the current time. Because different users have different browsing speeds and habits, the time window can be initialized to a fixed value and then adjusted automatically according to the user's browsing speed and habits. A method for adjusting the size of the time window provided by an embodiment of the present invention is as follows:
[0043] 1) Compute the user's historical access density, where T is a historical period of time and M is the number of web pages browsed by the user during T;
[0044] 2) Set the initial time window value, where α is an adjustable empirical value used to tune the size of the time window. After a specified period of time, the total number of views is counted and α is adjusted according to the formula;
[0045] 3) After a certain period of time, recalculate the user's access density over the new period;
[0046] 4) The adjusted time window value is:
[0047] It can be understood that the method for adjusting the time window is not limited to this. Other adjustment schemes that those skilled in the art can easily conceive of based on the description of the embodiment of the present invention fall within the protection scope of the present invention. For example, it may be specified that, when the number of webpages visited by the user within the current time window meets a prescribed threshold, the time window is reduced by a specific value, and when that number is lower than a prescribed threshold, the time window is increased by a specific value.
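The threshold-based alternative described in [0047] can be sketched as follows. This is only an illustrative sketch: the function name, the use of two separate thresholds, the step size, and the lower bound `min_window_s` are assumptions and are not specified in the embodiment.

```python
def adjust_time_window(window_s, pages_in_window, high_threshold,
                       low_threshold, step_s, min_window_s=60.0):
    """Return the new time window size in seconds.

    All parameter names and the lower bound are hypothetical; the
    embodiment only says the window shrinks or grows by "a specific value".
    """
    if pages_in_window >= high_threshold:
        # Fast browsing: a shorter window still captures the current interest.
        return max(min_window_s, window_s - step_s)
    if pages_in_window < low_threshold:
        # Slow browsing: widen the window so enough pages are observed.
        return window_s + step_s
    return window_s
```

For example, with a 600-second window, thresholds of 40 and 10 pages, and a 60-second step, a user who opened 50 pages would get a 540-second window on the next round.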
[0048] Obtain the text content of each webpage according to the Uniform Resource Locator (URL) of the webpage accessed by the user within the time window, and determine the category to which the text content belongs; then count the access frequency of each category of webpage and determine that the webpages whose access frequency meets the specified threshold are the webpages the user is interested in.
[0049] Obtaining the text content of a webpage according to the URLs of the webpages accessed by the user within the time window includes: removing the URLs of useless webpages and inaccessible webpages from the URLs accessed by the user to obtain filtered URLs; following each filtered URL; and extracting the page title and text information. Useless webpages include portal homepages and navigation-site homepages that contain no specific text content. The following shows the general distribution of the text information in a webpage source file:
[0050]
[0051] Here, link 4 and link 5 are both link information and text information.
[0052] The title information is obtained by matching, and the text and useful link information are obtained from the webpage above, such as text 1, link 4, text 2, link 5, and text 3. That is, the text content obtained through a URL includes the title and body information of the webpage corresponding to that URL.
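As a rough illustration of extracting the title and the body text (including link text) from a page source, the sketch below uses Python's standard `html.parser` module. The class and function names are hypothetical, and skipping `script` and `style` content is an assumption about what counts as non-text; the embodiment does not prescribe a parser.

```python
from html.parser import HTMLParser


class TitleAndTextParser(HTMLParser):
    """Collects the <title> text and the visible text fragments of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.body_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif self._skip_depth == 0:
            # Link text is kept because it is both link and text information.
            self.body_parts.append(text)


def extract_title_and_text(html):
    parser = TitleAndTextParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.body_parts
```

Fetching the page for each filtered URL is left out here; the sketch starts from page source that has already been downloaded.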
[0053] The determination of the category to which the text content belongs is mainly based on the obtained title and body information of the web page in comparison with a predefined subject category to determine the category of the web page.
[0054] Multiple subject categories can be predefined, such as sports, catering, IT, real estate, cars, or travel. In the embodiment of the present invention, the category of a webpage's text content can be determined from the obtained title and body information using existing technical solutions, such as a decision tree, a Support Vector Machine (SVM), or Naive Bayes. The embodiment of the present invention does not limit this.
[0055] The frequency statistics count the number of times the user visits webpages of the same category, yielding the user's visit frequency for each category of webpage; the webpages whose visit frequency meets a prescribed threshold are determined to be the webpages of interest to the user.
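The frequency statistics of step 10 can be sketched as below. The sketch assumes each visit has already been classified, so a visit is a hypothetical `(timestamp_s, url, category)` tuple; the function name and the parameter `min_visits` (the "prescribed threshold") are likewise illustrative.

```python
from collections import Counter


def interesting_pages(visits, now, window_s, min_visits):
    """Return URLs visited in the window whose category is visited often enough.

    visits: list of (timestamp_s, url, category) tuples (assumed layout).
    """
    # Keep only the visits that fall inside the time window ending at `now`.
    in_window = [v for v in visits if now - window_s <= v[0] <= now]
    # Count how many times each category was visited inside the window.
    freq = Counter(category for _, _, category in in_window)
    hot_categories = {c for c, n in freq.items() if n >= min_visits}
    return [url for _, url, category in in_window if category in hot_categories]
```

With a 250-second window and `min_visits=2`, two sports pages and one travel page in the window would yield only the two sports pages.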
[0056] Step 11: Obtain user interest based on the webpages that the user is interested in;
[0057] Obtaining user interest based on the webpages that the user is interested in includes: performing text analysis on those webpages to obtain the topic of each webpage; merging the topics that share repeated keywords into a topic group, taking the topic in which the repeated keyword has the largest weight as the core topic of the topic group, and calculating the attention degree of the topic group; and determining the topic groups whose attention degree meets the prescribed threshold as the user interest.
[0058] The topic includes information such as events, keywords, and categories. Therefore, to obtain the topic of a web page, the events, keywords, and category must be obtained first. The category has already been determined in step 10. An event includes a time, a location, a person, and an event name, so obtaining the event involves extracting each of these. Existing mature techniques can be used for the extraction and are not all listed here. The process of extracting the time, location, person, and event name provided by the embodiment of the present invention is as follows:
[0059] Person extraction: an existing Hidden Markov Model (HMM) statistical model is used. The HMM approach takes the segmented word objects as input, uses the trained model parameters and a filtering and decoding algorithm to obtain the optimal internal state sequence for the word segmentation sequence, and then identifies named entities by combining the states of that sequence; that is, the person is extracted.
[0060] Time extraction: specific methods of time extraction include:
[0061] 1) Count the most frequently occurring time feature words, such as "year, month, day", etc., and store them in a feature word table timeWords;
[0062] 2) Traverse the rough segmentation result produced by the word segmentation software; if a time feature word is encountered, recognize the time word;
[0063] 3) Repeat step 2) until the entire array is traversed. If the time word is recognized, the recognized time word array is returned, otherwise a null value is returned, and the time word recognition process ends.
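A minimal sketch of the time-word recognition loop is shown below. It assumes the segmenter has already produced a list of tokens, uses the Chinese unit characters for year, month, and day as the `timeWords` table, and treats a token as a time word when it starts with a digit and ends with one of those units; that last rule is an assumption, since the embodiment does not spell out the recognition rule.

```python
# Feature word table: year, month, day (assumed contents of timeWords).
TIME_WORDS = ("年", "月", "日")


def extract_time_words(tokens, time_words=TIME_WORDS):
    """Return the recognized time words, or None when none is found."""
    found = []
    for token in tokens:
        # Assumed rule: a digit-led token that ends with a time feature word.
        if token and token[0].isdigit() and token.endswith(time_words):
            found.append(token)
    return found or None
```

The digit check keeps ordinary words that merely end in one of the unit characters from being reported as times.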
[0064] Location extraction: two feature word lists are established, forwardWords and signWords. forwardWords holds the forward words of a location, that is, words that generally appear in front of a location word; signWords holds identifier words used to improve location recognition. The specific location extraction steps are as follows:
[0065] 1) Use word segmentation software to roughly segment the sentence, and store the segmentation result in an array of Word objects;
[0066] 2) Traverse the Word array. If a location word already separated out by the word segmentation is encountered, store it in the location array; if the word encountered belongs to the forwardWords vocabulary, trigger the location recognition rule and store the identified location in the array;
[0067] 3) Repeat step 2) until the whole sentence has been traversed. If the location array is not empty, return the array; if it is empty, that is, the rules recognized no location word and the segmentation result contains no word identified as a location, return a null value. The location identification process then ends.
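The traversal above can be sketched as follows. The `Word` class, the English sample vocabularies, and in particular the recognition rule (accept the token after a forward word when it ends with a sign word) are assumptions made for illustration; the embodiment names the two word lists but does not define the rule itself.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    is_location: bool = False  # assumed to be set by the segmenter


FORWARD_WORDS = {"in", "at", "near"}          # hypothetical forwardWords
SIGN_WORDS = ("City", "Road", "Province")     # hypothetical signWords


def extract_locations(words):
    """Return the list of recognized locations, or None when none is found."""
    locations = []
    for i, word in enumerate(words):
        if word.is_location:
            # The segmenter already separated this word out as a location.
            locations.append(word.text)
        elif word.text in FORWARD_WORDS and i + 1 < len(words):
            # Assumed rule: the next token is a location if it ends with a sign word.
            nxt = words[i + 1]
            if not nxt.is_location and nxt.text.endswith(SIGN_WORDS):
                locations.append(nxt.text)
    return locations or None
```

A location that the segmenter has already flagged is added once, when the loop reaches it, so the forward-word rule does not duplicate it.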
[0068] Event name extraction: call the event name thesaurus to extract the event name, that is, match the web page content with the event thesaurus, and extract the event name that appears in the web page.
[0069] Combine the obtained time, place, person, and event name to obtain an event on a web page.
[0070] The keyword refers to one or several words that can summarize the purpose of the article. The keyword extraction process includes:
[0071] Perform word segmentation on the full text and filter out stop words (words with little semantic content, such as function words and some high-frequency words; because stop words appear in many documents, they contribute nothing to information analysis). Extract the text title and the text content of the first, second, and last paragraphs. Determine whether the title is of the "concrete type". If it is not (that is, it is of the "abstract type"), analyze the text content to obtain keywords: for example, use the TF-IDF method to find words in the full text whose weight exceeds a certain threshold as candidate words, and then judge whether each candidate is a keyword based on its position; the higher the weight of the sentence containing the candidate, the more likely it is to be a keyword. If the title is of the "concrete type", analyze the title to obtain keywords: for example, the nouns and verbs obtained after segmenting the title are the keywords of the text. When calculating sentence weights, words in the title word list are given a larger weight scale factor.
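The candidate-word step for "abstract type" titles can be illustrated with a small pure-Python TF-IDF sketch. The function name, the smoothed IDF variant, the `title_boost` factor, and the threshold value are all assumptions; the embodiment only states that TF-IDF weights above a threshold yield candidates and that title words receive a larger scale factor.

```python
import math
from collections import Counter


def tfidf_candidates(doc_tokens, corpus, stop_words, title_tokens=(),
                     title_boost=2.0, threshold=0.3):
    """Return {term: weight} for terms whose TF-IDF weight meets the threshold.

    corpus: list of token sets, one per document (assumed to include this one).
    """
    tokens = [t for t in doc_tokens if t not in stop_words]
    if not tokens:
        return {}
    tf = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)
        # Smoothed inverse document frequency (one common variant).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        score = (count / len(tokens)) * idf
        if term in title_tokens:
            score *= title_boost  # title words get a larger scale factor
        scores[term] = score
    return {t: s for t, s in scores.items() if s >= threshold}
```

The position-based check that turns candidates into final keywords is not shown, because the embodiment does not give its weighting rule.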
[0072] Combine the extracted events, keywords, and webpage category as the topic of the webpage, and record the topic in vector form, called a topic vector. Generate topic groups from the topic vectors, calculate the user's attention to each topic group, and determine that the topic groups whose attention degree meets the prescribed threshold are the user interest. The specific method includes the following steps:
[0073] 1) Add the keywords in all topic vectors under the same category to the keyword list K of that category;
[0074] 2) Unify the repeated keywords that appear during this process, that is, record each only once in K. A repeated keyword triggers the aggregation of candidate similar topics: all the topic vectors to which the repeated keyword belongs are merged to form a topic group;
[0075] 3) For the topic group of each repeated keyword, compare the original weight of the keyword in each topic vector of the group, and take the topic vector with the largest weight as the core topic of the topic group;
[0076] 4) Calculate the user's attention to the topic group and add it, along with the core topic, to the candidate hot topic list candidateTopic;
[0077] The user's attention degree Hot(T) for the topic group T is calculated as follows:
[0078] $$\mathrm{Hot}(T) = \frac{\sum_{i=1}^{N} \mathrm{Valid}_i}{\mathrm{Max}},$$
[0079] where Hot(T) represents the user's attention to the topic group T;
[0080] $\sum_{i=1}^{N} \mathrm{Valid}_i$ represents the total effective click rate of all web pages included in the topic group T;
[0081] N represents the number of web pages contained in the topic group T;
[0082] Max represents the maximum effective click rate of web pages in the system;
[0083] $\mathrm{Valid}_i$ represents the effective click rate of the i-th webpage contained in the topic group T, calculated as follows:
[0084] $$\mathrm{Valid}_i = e^{-\mathrm{distance}/\mathrm{dampNumber}}$$
[0085] dampNumber is the damping coefficient, which can be adjusted according to the actual situation, for example set to 2;
[0086] distance is the effective time interval;
[0087] distance = (currentTime - occurTime) / (1000 * TimeNumberOfOneDay);
[0088] currentTime is the current statistical time of the system, in milliseconds;
[0089] occurTime is the click time of the webpage, in milliseconds;
[0090] TimeNumberOfOneDay is the number of seconds in a day, equal to the constant 86400;
[0091] 5) Calculate the attention degree of each topic group according to steps 1) to 4) above, and determine that the topic groups whose attention degree meets the specified threshold are the user interest.
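Steps 1) to 4) and the attention formula can be sketched as below. The dictionary layout of a topic vector (`"keywords"` mapping each keyword to its weight) and the function names are assumptions made for illustration; `max_valid` stands for Max and is passed in because the embodiment does not say how the system-wide maximum is obtained.

```python
import math
from collections import defaultdict

MS_PER_DAY = 1000 * 86400  # 1000 * TimeNumberOfOneDay


def valid(current_ms, occur_ms, damp_number=2.0):
    """Effective click rate of one page: e^(-distance / dampNumber)."""
    distance = (current_ms - occur_ms) / MS_PER_DAY  # interval in days
    return math.exp(-distance / damp_number)


def hot(click_times_ms, current_ms, max_valid, damp_number=2.0):
    """Attention degree Hot(T) of a topic group given its pages' click times."""
    total = sum(valid(current_ms, t, damp_number) for t in click_times_ms)
    return total / max_valid


def group_topics(topic_vectors):
    """Merge topic vectors of one category that share a repeated keyword."""
    by_keyword = defaultdict(list)
    for vector in topic_vectors:
        for keyword in vector["keywords"]:
            by_keyword[keyword].append(vector)
    groups = []
    for keyword, members in by_keyword.items():
        if len(members) > 1:  # a repeated keyword triggers aggregation
            # Core topic: the vector in which the keyword has the largest weight.
            core = max(members, key=lambda v: v["keywords"][keyword])
            groups.append({"keyword": keyword, "members": members, "core": core})
    return groups
```

Because distance is measured in days, a page clicked two days ago with dampNumber set to 2 contributes e^(-1), roughly 0.37, so recent clicks dominate the attention degree.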
[0092] The user interest includes: interest item, interest category, attention degree, and generation time. In a specific implementation, user interest can be expressed as a tree structure in which the upper layer represents the type of user interest and the lower layer represents the current interest subcategories or topics. The tree structure can store not only the user's interest type information but also the user's interest feature words. For example, the interest preferences of user A can be expressed as shown in Figure 2.
[0093] Step 12: Determine the information pushed to the user according to the user's interest;
[0094] Determining the information to be pushed to the user based on the user interest includes:
[0095] Determining whether the user interest fits the localized information push service, by determining whether the user interest categories include predetermined localized push content, for example any one or more of the following: weather, traffic queries, ticket booking, discounts, travel classics, and/or specialty products.
[0096] If the user interest does not fit the localized information push service, information is pushed to the user in a content-associated push mode: the user interest is matched with the corresponding pre-push information, and the pre-push information whose matching degree with the user interest meets a predetermined threshold is pushed to the user. The matching degree between user interest and pre-push information can be calculated as follows. For pre-push information that already has category information, if its classification system is consistent with the user interest classification, an existing subtree matching algorithm is used to calculate the matching degree; if it is inconsistent, the pre-push information is first corrected and the subtree matching algorithm is then applied. Correcting the pre-push information is existing technology, namely modifying the pre-push information subtree, and may specifically include: looking up the root node t-root of the pre-push information subtree in the user interest tree; if t-root is found, continuing to match the nodes of the two trees downward; if a mismatch is encountered, removing the unmatched node from the pre-push information subtree and continuing to match downward until the leaf nodes of the pre-push information subtree are reached, which forms a new pre-push information subtree; and if no node corresponding to t-root is found, setting the matching degree to 0. For pre-push information without category information, the same classification method as the user interest classification is used to classify it and form a pre-push information subtree, and the subtree matching algorithm is then applied; if the classification cannot be performed, the matching degree is 0.
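The subtree correction described in [0096] can be sketched with trees stored as nested dictionaries. The representation, the function names, and especially the scoring rule (share of pre-push nodes that survive pruning) are assumptions for illustration only; the embodiment refers to an existing subtree matching algorithm without naming it.

```python
def find_node(tree, name):
    """Depth-first search for `name`; returns its children dict or None."""
    for key, children in tree.items():
        if key == name:
            return children
        found = find_node(children, name)
        if found is not None:
            return found
    return None


def prune(push_children, interest_children):
    """Drop pre-push nodes that have no counterpart in the interest tree."""
    return {name: prune(sub, interest_children[name])
            for name, sub in push_children.items()
            if name in interest_children}


def count_nodes(children):
    return sum(1 + count_nodes(sub) for sub in children.values())


def matching_degree(push_root, push_children, interest_tree):
    """Assumed score: retained pre-push nodes divided by original nodes."""
    interest_children = find_node(interest_tree, push_root)
    if interest_children is None:
        return 0.0  # t-root not found in the user interest tree
    kept = prune(push_children, interest_children)
    return (1 + count_nodes(kept)) / (1 + count_nodes(push_children))
```

A pre-push item would then be pushed when `matching_degree(...)` meets the predetermined threshold.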
[0097] If the user interest fits the localized information push service, information is pushed to the user in a location-associated push mode. Specific steps may include: obtaining the user's browsing location information; extracting, from the pre-push information, the pre-push information associated with the browsing location information (referred to in the embodiment of the present invention as location-associated information); matching the user interest with the location-associated information; and pushing to the user the location-associated information whose matching degree with the user interest meets a predetermined threshold (the matching degree is calculated as described for the content-associated push method above). If no location-associated information has a matching degree that meets the predetermined threshold, the following are extracted from the pre-push information: the pre-push information whose matching degree with the user interest meets the prescribed threshold (calculated in the same way), and the location information corresponding to that pre-push information. A recommended route from the user's browsing location to the location corresponding to the location information is then calculated, and the location information, the pre-push information corresponding to it, and the recommended route are pushed. The recommended route can be calculated in an established road network space model (establishing such a model is existing technology) using an existing path algorithm chosen according to the user's needs (shortest distance, highway priority, etc.), such as a shortest path algorithm, the Dijkstra algorithm, or the A* algorithm, which is not specifically limited by the present invention.
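As one example of the route calculation, the sketch below applies Dijkstra's algorithm to a toy road network. The adjacency-dictionary model, the node names, and the function name are hypothetical; a real road network space model would carry far more detail, and the embodiment leaves the choice of algorithm open.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm on {node: {neighbor: distance}}.

    Returns (total_distance, [nodes on the route]) or (inf, []) if unreachable.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if goal not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the goal to rebuild the route.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```

Supporting a preference such as highway priority would amount to changing the edge weights rather than the search itself.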
[0098] The embodiments of the present invention can obtain user interest according to the webpages browsed by the user and push information to the user according to that interest, and can combine the user interest with the user's browsing location, which effectively improves the accuracy of the push and keeps the user from receiving a large amount of unwanted spam.
[0099] The embodiment of the present invention also provides an information push device. As shown in Figure 3, it includes:
[0100] The first obtaining unit 30 is configured to obtain webpages of interest to the user;
[0101] The second obtaining unit 31 is configured to obtain user interests based on the web pages that the user is interested in;
[0102] The pushing unit 32 is configured to determine the information pushed to the user according to the user's interest.
[0103] As shown in Figure 4, the first obtaining unit 30 further includes:
[0104] The time window determination and adjustment subunit 301 is used to determine the time window for obtaining the user's browsing webpage, and dynamically adjust the time window;
[0105] The first obtaining subunit 302 is configured to obtain the webpages visited by the user within the time window determined by the time window determination and adjustment subunit;
[0106] The webpage classification subunit 303 is used to determine the category to which the text content of each webpage visited by the user belongs;
[0107] The first statistics sub-unit 304 is used to count the frequency of users accessing various types of webpages;
[0108] The first determining subunit 305 is configured to determine that a webpage whose access frequency meets a prescribed threshold is a webpage of interest to the user.
[0109] As shown in Figure 5, the second obtaining unit 31 further includes:
[0110] The analysis subunit 311 is configured to perform text analysis on the webpages that the user is interested in to obtain the topic of each webpage;
[0111] The second statistical subunit 312 is configured to merge the topics that share repeated keywords into a topic group, take the topic in which the repeated keyword has the largest weight as the core topic of the topic group, and calculate the attention degree of the topic group;
[0112] The second determining subunit 313 is configured to determine that the topic groups whose attention degree meets the prescribed threshold are the user interest.
[0113] As shown in Figure 6, the pushing unit 32 further includes:
[0114] The first judging subunit 321 is configured to judge whether the user interest fits the localized information push service;
[0115] The first pushing subunit 322 is configured to push information to the user in a content-associated push mode when the first judging subunit 321 determines that the user interest does not fit the localized information push service; or the second pushing subunit 323 is configured to push information to the user in a location-associated push mode when the first judging subunit 321 determines that the user interest fits the localized information push service.
[0116] As shown in Figure 7, the first pushing subunit 322 further includes:
[0117] The first matching submodule 3221 is configured to match the user interest with corresponding pre-push information;
[0118] The first push sub-module 3222 is configured to push the pre-push information whose degree of matching with the user's interest meets a predetermined threshold to the user.
[0119] As shown in Figure 8, the second pushing subunit 323 further includes:
[0120] The location analysis submodule 3231 is used to obtain user browsing location information;
[0121] The location-related information acquisition sub-module 3232 is configured to extract location-related information associated with the browsing location information from the pre-push information;
[0122] The second matching submodule 3233 is configured to match the user interest with the location-related information;
[0123] The second pushing sub-module 3234 is configured to push to the user the location-related information whose matching degree with the user interest meets a predetermined threshold.
[0124] The second pushing subunit 323 further includes:
[0125] The third matching submodule 3235 is configured to extract, from the pre-push information, the pre-push information whose matching degree with the user interest meets a prescribed threshold, together with the location information corresponding to that pre-push information;
[0126] The calculation sub-module 3236 is used to calculate a recommended route from the user's browsing location to the location corresponding to the location information;
[0127] The third pushing submodule 3237 is configured to push the location information, the pre-push information corresponding to the location information, and the recommended route.
[0128] The information pushing device according to the embodiment of the present invention can obtain user interest according to the webpages browsed by the user and push information to the user according to that interest, and can combine the user interest with the user's browsing location, which effectively improves the accuracy of the push and keeps the user from receiving a large amount of unwanted spam.
[0129] Another embodiment of the present invention provides an information push system. As shown in Figure 9, the system includes: a first database 90, a second database 91, and the aforementioned information pushing device 92;
[0130] The second database 91 is used to store pre-push information;
[0131] The information pushing device 92 is configured to obtain user interest according to the webpage currently browsed by the user, and determine the pre-push information to be pushed to the user according to the user interest;
[0132] The first database 90 is used to store user interests.
[0133] The system may also include:
[0134] The gateway device 93 is configured to provide the webpage currently browsed by the user to the information pushing device 92.
[0135] The embodiment of the present invention can obtain user interest according to the webpages browsed by the user and push information to the user according to that interest, and can combine the user interest with the user's browsing location, which effectively improves the accuracy of the push and keeps the user from receiving a large amount of unwanted spam.
[0136] In summary, the embodiments of the present invention can obtain user interest based on the webpages the user browses and push information to the user according to that interest, and can combine the user interest with the user's browsing location, which effectively improves the accuracy of the push and keeps the user from receiving a large amount of unwanted spam.
[0137] A person of ordinary skill in the art can understand that all or part of the steps in the methods of the foregoing embodiments can be implemented by a program instructing related hardware, and the program can be stored in a computer-readable storage medium. The readable storage medium is, for example, read only memory (ROM for short), random access memory (RAM for short), magnetic disk, optical disk, etc.
[0138] The above are only preferred specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or replacements that a person skilled in the art can easily conceive of within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.