Webpage topic sentence extraction method and apparatus

A technique for extracting topic sentences from webpages, applied in the field of Internet applications. It addresses the low accuracy of extracted topic sentences by producing more accurate partial order values, which improves the accuracy of topic sentence selection.

Active Publication Date: 2016-04-13
ALIBABA (CHINA) CO LTD

AI Technical Summary

Problems solved by technology

[0005] In view of this, this application provides a method for extracting topic sentences from webpages, to solve the technical problem that topic sentences determined by existing techniques have low accuracy.


Examples


Detailed Description of Embodiments

[0024] The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, fall within the scope of protection of this application.

[0025] Refer to Figure 1, which shows the flow of Embodiment 1 of the webpage topic sentence extraction method provided by this application. As shown in Figure 1, method Embodiment 1 may specifically include steps S101 to S104.

[0026] Step S101: Obtain a webpage to be determined and a pre-built machine learning model, where the webpage to be determined contains a plurality of pre-selected candidate topic sentences, and each candidate topic sentence c...


Abstract

An embodiment of the invention provides a webpage topic sentence extraction method. First, a to-be-determined webpage is obtained; the webpage contains a plurality of candidate topic sentences, and each candidate topic sentence contains a plurality of segmented words. Second, a word feature value is determined for each segmented word and input into a preset machine learning model to obtain a partial order value for that word, and the partial order value of each candidate topic sentence is then determined from the partial order values of its segmented words. Finally, each candidate topic sentence whose partial order value is greater than a preset threshold is determined to be a target topic sentence. Because the partial order values of the candidate topic sentences are obtained with a machine learning model that can reflect the degree of correlation between a query sentence and a recalled webpage, the determined partial order values are more accurate, and the accuracy of selecting the target topic sentence is improved. The invention also provides a webpage topic sentence extraction apparatus for applying and implementing the method in practice.
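The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the two word features (term frequency on the page and a title-occurrence flag), the fixed-weight linear scorer standing in for the preset machine learning model, and the use of the mean to combine word values into a sentence value are all assumptions made for illustration, since the excerpt does not specify them. Every function and parameter name below is hypothetical.

```python
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def word_features(word: str, page_words: List[str], title_words: Set[str]) -> List[float]:
    """Hypothetical word feature values: page term frequency and a title flag."""
    tf_page = page_words.count(word) / len(page_words)
    in_title = 1.0 if word in title_words else 0.0
    return [tf_page, in_title]


def linear_model(features: Sequence[float]) -> float:
    """Stand-in for the preset machine learning model (assumed fixed weights)."""
    weights = [0.5, 0.5]
    return sum(w * f for w, f in zip(weights, features))


def sentence_score(sentence: List[str], page_words: List[str],
                   title_words: Set[str], model: Model) -> float:
    """Partial order value of a sentence, assumed here to be the mean of the
    partial order values of its segmented words."""
    word_scores = [model(word_features(w, page_words, title_words)) for w in sentence]
    return sum(word_scores) / len(word_scores)


def extract_topic_sentences(candidates: List[List[str]], page_words: List[str],
                            title_words: Set[str], model: Model,
                            threshold: float) -> List[List[str]]:
    """Keep the candidate sentences whose partial order value exceeds the threshold."""
    return [s for s in candidates
            if sentence_score(s, page_words, title_words, model) > threshold]
```

In a real system the scorer would be a trained model rather than fixed weights, and the threshold would be tuned on labeled data; the sketch only shows how word-level values feed a sentence-level value that is then compared with the threshold.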

Description

Technical field

[0001] The present application relates to the technical field of Internet applications, and more specifically, to a method and apparatus for extracting topic sentences from webpages.

Background

[0002] With the rapid development of Internet technology, the Internet has become an important channel through which people obtain information. Specifically, a user looking for information can enter a query sentence into a search engine, and the search engine recalls multiple webpages for the user to view selectively. For the user's convenience, the search engine arranges the recalled webpages in order of their relevance to the query sentence.

[0003] Here, the relevance is the similarity between the topic sentence of a recalled webpage and the query sentence. For example, the query sentence is "symptoms of hepatitis B", the topic sentence of the recalled webpage 1 is "what are the symptoms of hepatitis ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30, G06N3/08
CPC: G06F16/955, G06N20/00, G06F40/216, G06F40/284, G06F16/36, G06F16/951
Inventors: 李晨尧, 曾洪雷
Owner: ALIBABA (CHINA) CO LTD