Calculation method and device for association degree of words and web page

The method and device apply to the field of word crawling. They address two problems in existing approaches: the information contained in the web page title is ignored, and IDF-based calculation of the association degree has low accuracy. The result is improved accuracy of the calculated association degree.

Active Publication Date: 2016-06-29
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a method and device for calculating the degree of association between words and webpages. The invention addresses three technical problems in the prior art: the calculated degree of association is easily affected by the content of the text set, the IDF-based degree of association has low calculation accuracy, and the information contained in the webpage title is ignored.



Embodiment Construction

[0034] The accompanying drawings constituting a part of this application are used to provide further understanding of the present invention, and the schematic embodiments and descriptions of the present invention are used to explain the present invention, and do not constitute an improper limitation of the present invention.

[0035] Referring to Figure 1, in one aspect the present invention provides a method for calculating the degree of association between a word and a webpage, comprising the following steps:

[0036] Step S100: Read the title and text content of the webpage, perform word segmentation and part-of-speech tagging to obtain the text word list bodyList and the title word list titleList, and then filter and preprocess bodyList and titleList respectively;

[0037] Step S200: Build the word connection set linkMap;

[0038] Step S300: Correct the word connection set linkMap according to the title word list titleLi...
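Steps S100 and S200 can be illustrated with a minimal Python sketch. It is not the patented implementation: the source does not state the filtering rules or how linkMap is built, so the part-of-speech filter (`KEEP_TAGS`), the stop-word list, and the co-occurrence window used to link words are all assumptions borrowed from common TextRank practice. The input is assumed to be already segmented and tagged, so no particular segmentation library is implied.

```python
from collections import defaultdict

# Assumption: keep only nouns, verbs and adjectives; the source does not
# specify which parts of speech survive the filtering step.
KEEP_TAGS = {"n", "v", "a"}
# Assumption: a tiny illustrative stop-word list.
STOP_WORDS = {"the", "of"}


def filter_words(tagged):
    """Step S100 (filter part): keep words with an accepted tag that are not stop words."""
    return [w for w, tag in tagged if tag in KEEP_TAGS and w not in STOP_WORDS]


def build_link_map(words, window=2):
    """Step S200 (assumed form): link each word to the words that follow it
    within `window` positions. Links are stored symmetrically."""
    link_map = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != w:
                link_map[w].add(other)
                link_map[other].add(w)
    return dict(link_map)


# Hypothetical, already segmented and tagged body text.
body_tagged = [
    ("search", "n"), ("the", "x"), ("engine", "n"),
    ("ranks", "v"), ("web", "n"), ("page", "n"),
]
body_list = filter_words(body_tagged)
link_map = build_link_map(body_list, window=2)
print(body_list)
print(sorted(link_map["search"]))
```

A wider `window` produces a denser linkMap. The value 2 is chosen only to keep the example small.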



Abstract

The invention provides a method and device for calculating the degree of association between words and a web page. Word segmentation and preprocessing are performed on the title and the text content of the web page. The title words and text words are used to construct a word connection set, from which the TextRank score of each word is calculated. The TextRank score is taken as the degree of association between the word and the web page and is saved in a database. Using the TextRank score as the degree of association effectively reflects the relationship between the words and the web page. The title words are used to correct the word connection set constructed from the text words, and the corrected set is used to calculate the TextRank scores. This takes full account of the significance of the title in the web page information and helps improve the accuracy of the association.
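The scoring described in the abstract can be sketched in Python under stated assumptions. The `textrank` function follows the standard unweighted TextRank iteration with damping factor `d = 0.85`. The source does not say how title words correct the word connection set, so `add_title_links` is a purely hypothetical correction that links every title word to every other title word. The example graph and title are invented for illustration.

```python
def add_title_links(link_map, title_list):
    """Hypothetical correction: connect all title words to each other.
    Returns a new map and leaves the input unchanged."""
    corrected = {w: set(ns) for w, ns in link_map.items()}
    for w in title_list:
        corrected.setdefault(w, set())
    for w in title_list:
        for other in title_list:
            if other != w:
                corrected[w].add(other)
    return corrected


def textrank(link_map, d=0.85, iterations=50):
    """Standard unweighted TextRank on a symmetric word graph."""
    scores = {w: 1.0 for w in link_map}
    for _ in range(iterations):
        new_scores = {}
        for w in link_map:
            # Each neighbour shares its score equally among its own links.
            rank = sum(scores[n] / len(link_map[n]) for n in link_map[w])
            new_scores[w] = (1 - d) + d * rank
        scores = new_scores
    return scores


# Invented example: a chain a - b - c - d built from body words.
link_map = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
title_list = ["a", "d"]  # hypothetical title words

base = textrank(link_map)
corrected = textrank(add_title_links(link_map, title_list))
print(round(base["a"], 3), round(corrected["a"], 3))
```

In this toy graph the title word "a" scores higher once the title links are added, which matches the abstract's intent of giving the title more weight. The actual correction rule may differ.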

Description

Technical field

[0001] The present invention relates to the technical field of word crawling, in particular to a method and device for calculating the degree of association between words and webpages.

Background

[0002] With the rapid development of the Internet, a large amount of news information is generated every day and disseminated on the Internet in the form of HTML webpage documents. This massive volume of information makes it difficult for users to retrieve and acquire information efficiently. Search engines, recommendation systems, and similar applications offer effective ways to do so, and establishing associations between words and web pages is the basis of these applications. The relationship between a word and a web page is expressed by the degree of association. Currently, the degree of association between a word and a web page is mainly represented by the...


Application Information

Patent Type & Authority Applications (China)
IPC IPC(8): G06F17/27
CPC G06F40/216
Inventor 刘忠, 陈发君, 黄金才, 朱承, 修保新, 程光权, 陈超, 冯旸赫
Owner NAT UNIV OF DEFENSE TECH