Method and device for determining web page type
A method and device for determining web page type, applied in the computer field. The technology addresses problems such as the difficulty of analyzing web page content, reduced accuracy, and low efficiency. It improves efficiency and speed and has a wide range of applications.
Examples
Embodiment 1
[0062] Analysis of user search behavior shows that, after a user submits a query, the webpages the user clicks in the search results usually reflect the user's needs; conversely, the queries that led to clicks on a webpage reflect the type of that webpage. Based on this, the method provided by the invention, as shown in Figure 1, mainly includes the following steps:
[0063] Step 101: From the search log, obtain all queries for which the webpage to be identified was clicked.
[0064] In this embodiment of the present invention, all queries that led to clicks on the webpage to be identified are collected. Because these queries reflect the type of the webpage, the feature vector of the webpage to be identified is determined from them.
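Step 101 can be illustrated with a minimal sketch. The log format here (one `(query, clicked_url)` pair per click event), the sample entries, and the function names are assumptions made for illustration; the source does not specify how the search log is stored. Keeping repeated queries is also an assumption.

```python
from collections import defaultdict

# Hypothetical log format: one (query, clicked_url) pair per click event.
SEARCH_LOG = [
    ("home-style recipes", "http://example.com/recipes"),
    ("sichuan cuisine recipes", "http://example.com/recipes"),
    ("train timetable", "http://example.com/trains"),
    ("home-style recipes", "http://example.com/recipes"),
]


def queries_by_clicked_url(log):
    """Group queries by the URL that was clicked for them, keeping repeats."""
    grouped = defaultdict(list)
    for query, url in log:
        grouped[url].append(query)
    return grouped


def queries_for_page(log, url):
    """Return every query for which `url` was clicked (Step 101)."""
    # .get() avoids creating an empty entry for URLs that never appear.
    return queries_by_clicked_url(log).get(url, [])


if __name__ == "__main__":
    print(queries_for_page(SEARCH_LOG, "http://example.com/recipes"))
```

Repeated queries are kept in this sketch so that a later step could count how often each query led to the page; a deduplicated set would work equally well if only the distinct queries matter.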
[0065] In addition, usually when a user clicks on a webpage after searching, it is largely influenced by the title of th...
Embodiment 2
[0084] Figure 2 is a flow chart of the method for obtaining the training corpus of a preset type provided in Embodiment 2 of the present invention. As shown in Figure 2, the method for obtaining the training corpus of a given type includes the following steps:
[0085] Step 201: Obtain the seed queries of the type.
[0086] The seed queries only need to adequately reflect the requirements of the type. Because only a small number of seed queries is needed, usually a few dozen, they can be configured manually.
[0087] Taking the recipe type as an example, the configured seed queries can be: recipes of home-cooked dishes, recipes of home-cooked dishes, recipes, common recipes, Sichuan cuisine recipes, and so on. For ease of understanding, the two seed queries "recipes of home-cooked dishes" and "recipes of home-cooked dishes" are used as examples below.
[0088] Step 202: Obtain the clicked url corresponding to the seed query in the search ...
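Steps 201 and 202, as far as they are stated here, can be sketched as follows. This is an illustration under assumptions: the click records are taken to be `(query, clicked_url)` pairs, the seed list is manually configured, and the sample data and the function name `clicked_urls_for_seeds` are hypothetical. The sketch stops at Step 202 because the remaining steps are not given in this excerpt.

```python
# Step 201: manually configured seed queries for the "recipe" type (sample values).
SEED_QUERIES = ["common recipes", "sichuan cuisine recipes"]

# Hypothetical click records: one (query, clicked_url) pair per click event.
CLICK_LOG = [
    ("common recipes", "http://example.com/recipes"),
    ("sichuan cuisine recipes", "http://example.com/sichuan"),
    ("common recipes", "http://example.com/recipes"),
    ("train timetable", "http://example.com/trains"),
]


def clicked_urls_for_seeds(log, seed_queries):
    """Step 202: collect the distinct URLs clicked for any seed query.

    URLs are returned in order of first appearance in the log.
    """
    seeds = set(seed_queries)
    seen = set()
    urls = []
    for query, url in log:
        if query in seeds and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    print(clicked_urls_for_seeds(CLICK_LOG, SEED_QUERIES))
```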
Embodiment 3
[0100] In this embodiment, the type of the webpage to be identified is determined by calculating the overlap rate between the feature vector of the webpage to be identified and the feature vectors of each preset type.
[0101] In this case, the feature vector of each preset type is obtained from that type's training corpus as follows: determine each n-gram of the training corpus, count the number of occurrences of each n-gram, and determine the weight of each n-gram from its number of occurrences. The weight of an n-gram may be the ratio of its number of occurrences to the total number of occurrences of all n-grams.
[0102] When determining the n-grams of the training corpus, n-grams of larger granularity, such as 3-grams or 4-grams, or even the entire query can be used to avoid the ambiguity caused by too small a granularity.
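The weighting described above (occurrences of an n-gram divided by the total occurrences of all n-grams) can be sketched as below. Several details are assumptions for illustration only: queries are split into tokens on whitespace, a query shorter than n tokens is kept as a single whole-query feature, and the names `ngrams` and `type_feature_vector` and the sample corpus are hypothetical.

```python
from collections import Counter


def ngrams(query, n):
    """Return the word-level n-grams of a query.

    Assumption: tokens are whitespace-separated words. A query with fewer
    than n tokens is returned whole, as a single feature.
    """
    tokens = query.split()
    if not tokens:
        return []
    if len(tokens) < n:
        return [" ".join(tokens)]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def type_feature_vector(corpus, n=3):
    """Map each n-gram to count(n-gram) / total count of all n-grams."""
    counts = Counter()
    for query in corpus:
        counts.update(ngrams(query, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    corpus = [
        "sichuan cuisine recipes",
        "common sichuan cuisine recipes",
        "recipes",
    ]
    for gram, weight in type_feature_vector(corpus, n=3).items():
        print(f"{gram!r}: {weight}")
```

With this sample corpus and n = 3, "sichuan cuisine recipes" occurs twice out of four n-gram occurrences in total, so its weight is 0.5, and the other two features each get 0.25. Raising `n` trades coverage for specificity, which is the granularity choice paragraph [0102] describes.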
[0103...