Automatic special subject generating method based on book catalogue

An automatic topic generation technology based on book catalogs, applied in the fields of natural language processing and machine learning. It addresses the shortcomings of manually built knowledge bases, which are insufficient in scale, knowledge coverage, and update cycle, and whose construction consumes a great deal of manpower and material resources.

Active Publication Date: 2016-08-24
ZHEJIANG UNIV
Cites: 4; Cited by: 17

AI Technical Summary

Problems solved by technology

Although they are of high quality, they are obviously insufficient in terms of scale, knowledge coverage, update cycle, etc. In addition, ma

Examples

Embodiment

[0114] The specific steps of this example are described in detail below in conjunction with the method of the present invention:

[0115] 1). From the more than 2.5 million electronic books scanned by the CADAL digital library, 114,768 books in 11 categories, with a total of 5,719,462 catalog chapters, were selected for the experiments. The 11 categories include agricultural science, industrial technology, transportation, aerospace, environmental science and safety science, comprehensive books, and astronomy and earth sciences.

[0116] 2). Here the catalogs of two books are taken as an example to illustrate the whole process shown in Figure 1. Some chapters of the catalogs of Book 1 and Book 2 are shown in Figure 3 and Figure 4. First, regular expressions are used to filter out the serial numbers in the catalog entries. The 2000 most frequent words are then counted from the filtered catalog, and from these, meaningless words such as "answer", "overview", "introduction", ...
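The preprocessing in step 2) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the serial-number pattern `SERIAL_RE`, the helper names `strip_serial` and `top_words`, and the whitespace tokenization are all assumptions made here (the original catalogs are Chinese and would need a word segmenter instead).

```python
import re
from collections import Counter

# Hypothetical pattern for leading serial numbers such as
# "Chapter 3", "Section 2", "1.2.3", "(4)" or "5.".
# The lookahead keeps entries like "3D printing" untouched.
SERIAL_RE = re.compile(
    r"^\s*(?:(?:chapter|section|part)\s+\d+|\(?\d+(?:\.\d+)*\)?)"
    r"(?=[\s.:)\-]|$)[\s.:)\-]*",
    re.IGNORECASE,
)


def strip_serial(entry: str) -> str:
    """Remove a leading serial number from one catalog entry."""
    return SERIAL_RE.sub("", entry).strip()


def top_words(entries, n=2000):
    """Return the n most frequent words over all filtered entries.

    Tokenization by whitespace is an assumption for this sketch.
    """
    counts = Counter()
    for entry in entries:
        counts.update(strip_serial(entry).lower().split())
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    catalog = ["Chapter 1 Overview", "1.1 Overview of soil", "1.2 Soil chemistry"]
    print([strip_serial(e) for e in catalog])
    print(top_words(catalog, n=2))
```

The resulting frequency list is only a candidate pool; choosing which of those words count as meaningless is a separate step that this sketch does not attempt.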

Abstract

The invention discloses a method for automatically generating topics from book catalogs. For each book, every chapter title in the catalog is treated as a term, features of these terms are extracted, and a classifier is trained to recognize the entities in the catalog. For each pair of parent and child chapter titles that are both entities, the pairs that satisfy a hypernym-hyponym relation are extracted. A concept hierarchy is constructed for each term from these relations, and identical or similar concept hierarchies across all books are fused. For each concept term in the hierarchies, its content in web pages and books is retrieved to serve as the description of that term. Finally, the concept hierarchies and the concept descriptions are organized into the form of a topic. By exploiting the structured information of book catalogs together with machine learning algorithms, the method extracts and reorganizes knowledge, can provide a reference for compiling topics, greatly reduces the manpower cost of related work, and is highly practical.
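The hierarchy construction and fusion described in the abstract can be illustrated with a small sketch. It assumes that the extracted relations arrive as (parent, child) pairs and that two concepts are "the same" only when their names match exactly; the abstract also mentions fusing similar hierarchies, which is not modeled here. The function names `build_hierarchy` and `fuse` are hypothetical.

```python
from collections import defaultdict


def build_hierarchy(pairs):
    """Map each parent concept to the set of its child concepts."""
    tree = defaultdict(set)
    for parent, child in pairs:
        tree[parent].add(child)
    return tree


def fuse(hierarchies):
    """Merge per-book hierarchies by exact concept name (an assumption)."""
    merged = defaultdict(set)
    for tree in hierarchies:
        for parent, children in tree.items():
            merged[parent] |= children
    return merged


if __name__ == "__main__":
    book1 = build_hierarchy([("soil", "soil chemistry")])
    book2 = build_hierarchy([("soil", "soil physics"), ("soil physics", "soil water")])
    fused = fuse([book1, book2])
    for parent in sorted(fused):
        print(parent, "->", sorted(fused[parent]))
```

Exact-name matching keeps the sketch short; a real system would need some similarity measure to decide when two differently worded chapter titles denote the same concept.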

Description

Technical field

[0001] The invention relates to the fields of natural language processing and machine learning, and in particular to a method for automatically generating topics based on book catalogs.

Background art

[0002] With the rapid development of computer science and technology, network data is growing explosively. Such data comes from a wide range of sources, lacks structure and hierarchy, has complex components, and contains a great deal of noise. How to extract knowledge from it and organize and apply that knowledge has become a hot topic in natural language processing, machine learning, and information retrieval. Knowledge bases offer a feasible solution to this problem; however, constructing large-scale knowledge bases remains a challenging task. WordNet, EuroWordNet, and Cyc are all knowledge bases manually compiled by domain experts. Although they are of high quality, they are obviously insufficient in terms of scale, knowledge covera...

Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/38, G06F18/22
Inventors: 鲁伟明, 李彬, 庄越挺, 吴飞, 魏宝刚
Owner: ZHEJIANG UNIV