Method for automatically obtaining short text of knowledge domain from community question-and-answer website

A technique for automatically obtaining short texts of a knowledge domain from community question-and-answer websites. It addresses the problem that crawled resources are incomplete and do not fully cover the domain, which hinders learners' use and study of the material, and thereby makes learning and use more convenient.

Active Publication Date: 2016-07-13
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

Therefore, the resources crawled by the above patents based on the URL may not be complete and cannot

Embodiment Construction

[0056] The present invention is described in further detail below with reference to specific embodiments, which illustrate rather than limit the invention.

[0057] The present invention is a method for automatically acquiring short texts of a knowledge domain from a community Q&A website. It automates the collection and organization of knowledge-domain short texts from the community Q&A website and includes the following steps.

[0058] (1) Crawling the web pages of the knowledge domain in the community Q&A website: crawl the dynamic web pages of the community Q&A website while ensuring the integrity of the data. Taking the Quora website as an example, the web pages that contain domain knowledge include topic pages, question pages, and author pages, which are crawled with a depth-first traversal algorithm. First, crawl the topic page according to the Quora topic page address, obtain the hyperlinks t...
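The depth-first traversal named in step (1) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function `crawl_depth_first`, the `get_links` callback, the page identifiers, and the in-memory `FAKE_SITE` that stands in for real page fetches. A real crawler would replace `fake_get_links` with code that downloads a page and extracts its hyperlinks.

```python
from typing import Callable, Dict, Iterable, List, Set


def crawl_depth_first(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 1000,
) -> List[str]:
    """Visit pages depth-first from start_url and return them in visit order."""
    visited: Set[str] = set()
    order: List[str] = []
    stack: List[str] = [start_url]
    while stack and len(order) < max_pages:
        url = stack.pop()
        if url in visited:
            continue  # the same author or question can be linked from many pages
        visited.add(url)
        order.append(url)
        # Push links in reverse so the first link on the page is explored first.
        for link in reversed(list(get_links(url))):
            if link not in visited:
                stack.append(link)
    return order


# Hypothetical in-memory site: topic page -> question pages -> author pages.
FAKE_SITE: Dict[str, List[str]] = {
    "topic/Data-Structures": ["question/q1", "question/q2"],
    "question/q1": ["author/alice"],
    "question/q2": ["author/alice", "author/bob"],
    "author/alice": [],
    "author/bob": [],
}


def fake_get_links(url: str) -> List[str]:
    """Stand-in for fetching a page and extracting its hyperlinks."""
    return FAKE_SITE.get(url, [])


visit_order = crawl_depth_first("topic/Data-Structures", fake_get_links)
print(visit_order)
```

The explicit stack avoids recursion limits on deep link chains, and the `visited` set keeps a page that is reachable from several questions from being fetched twice.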

Abstract

The invention provides a method for automatically obtaining short text of a knowledge domain from a community question-and-answer website. The question-and-answer web pages and author web pages of each subject in the domain corresponding to the knowledge domain are crawled from the community question-and-answer website, so that a comprehensive data set is obtained and learning and use are convenient for the user. The method comprises the following steps: (1) the web pages of the knowledge domain in the community question-and-answer website are crawled; (2) the short text of the knowledge domain is extracted from the web page data set; (3) a domain subject tree is constructed; (4) the domain subject tree is stored. With this method, the short text of the knowledge domain can be automatically extracted from the semi-structured data of the community question-and-answer website: the crawled question-and-answer web pages and author web pages form a web page data set of the knowledge domain, the short text of the knowledge domain is automatically extracted from that data set, and parent-child relationships are found, so that the domain subject tree is constructed and stored.
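Steps (3) and (4) of the abstract, building a domain subject tree from discovered parent-child relationships and storing it, can be sketched as follows. This is an illustration under stated assumptions, not the patent's procedure: the function names `build_subject_tree` and `tree_to_json`, the nested-dictionary layout, the sample subject names, and the choice of JSON as the storage format are all hypothetical.

```python
import json
from typing import Dict, FrozenSet, Iterable, List, Tuple


def build_subject_tree(root: str, parent_child_pairs: Iterable[Tuple[str, str]]) -> Dict:
    """Build a nested dict tree rooted at `root` from (parent, child) pairs."""
    children: Dict[str, List[str]] = {}
    for parent, child in parent_child_pairs:
        children.setdefault(parent, []).append(child)

    def build(node: str, path: FrozenSet[str]) -> Dict:
        # Skip children already on the current path so a cyclic link cannot recurse forever.
        kids = [c for c in children.get(node, []) if c not in path]
        return {"subject": node, "children": [build(c, path | {c}) for c in kids]}

    return build(root, frozenset({root}))


def tree_to_json(tree: Dict) -> str:
    """Serialize the tree; JSON is an assumed storage format."""
    return json.dumps(tree, ensure_ascii=False, indent=2)


# Hypothetical parent-child relationships found among subjects of one domain.
pairs = [
    ("Data Structures", "Trees"),
    ("Data Structures", "Graphs"),
    ("Trees", "Binary Trees"),
]
tree = build_subject_tree("Data Structures", pairs)
serialized = tree_to_json(tree)
print(serialized)
```

A nested dictionary keeps each subject next to its children, so the stored form can be reloaded with `json.loads` and walked without rebuilding the pair list.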

Description

Technical field

[0001] The invention relates to a method for acquiring website information, in particular to a method for automatically acquiring short texts of a knowledge domain from a community Q&A website.

Background technique

[0002] Open knowledge sources, represented by community Q&A websites, have become an important source of knowledge. These sources have an open and collaborative knowledge-sharing mechanism that effectively promotes the dissemination and application of knowledge, but at the same time it exacerbates the fragmentation of knowledge. The accumulated fragmented knowledge is scattered across different places and exists in the form of short, often repetitive, texts. Take the community Q&A website Quora as an example. Quora is an English-language community Q&A website whose body of knowledge-domain short texts is growing rapidly. The questions on the Quora website are mainly organized in the form of top...

Application Information

IPC(8): G06F17/30
CPC: G06F16/951, G06F16/955
Inventors: 魏笔凡, 郑元浩, 刘均, 郑庆华, 吴蓓, 闫彩霞, 郭朝彤, 张玲玲
Owner: XI AN JIAOTONG UNIV