A Method of Automatically Generating Topics Based on Book Catalog

Book catalogs and automatic generation technology, applied in the fields of natural language processing and machine learning, address the problems of manually built knowledge bases: they consume a lot of manpower and material resources, and they are insufficient in scale, knowledge coverage, and update cycle.

Active Publication Date: 2019-02-12
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

Although manually compiled knowledge bases are of high quality, they are clearly insufficient in scale, knowledge coverage, and update cycle. In addition, manual writing requires a lot of manpower and material resources.
In the context of big data, manual database construction falls even further short.



Examples


Embodiment

[0114] The specific steps of this example are described in detail below in conjunction with the method of the present invention:

[0115] 1). From the more than 2.5 million electronic books scanned by the CADAL digital library, 114,768 books in 11 categories, with a total of 5,719,462 catalog chapters, were selected for the experiments. The 11 categories include agricultural science, industrial technology, transportation, aerospace, environmental science and safety science, comprehensive books, and astronomy and earth science.

[0116] 2). Here we take the catalogs of two books as an example to illustrate the whole process shown in Figure 1. Some chapters of the catalogs of Book 1 and Book 2 are shown in Figure 3 and Figure 4. First, use regular expressions to filter out the serial numbers in the catalog, then count word frequencies over the filtered catalog, select the 2000 words with the highest frequency, and from these select meaningless words such as "answer", "overview", "introduction", ...
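The serial-number filtering and frequency counting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the regular expression `SERIAL_RE`, the helper names `strip_serial` and `top_words`, and the whitespace-style tokenization are all assumptions made for the example (catalogs in Chinese would need a word segmenter instead).

```python
import re
from collections import Counter

# Hypothetical pattern for leading serial numbers in catalog entries.
# The source does not give the actual expressions used.
SERIAL_RE = re.compile(
    r"^\s*(?:"
    r"(?:chapter|section|part)\s+\d+"      # "Chapter 3"
    r"|\d+(?:\.\d+)*(?=[\s.:、)]|$)"        # "1", "1.2", "1.2.3."
    r"|[(\[]\d+[)\]]"                      # "(1)", "[2]"
    r")[\s.:、)]*",
    re.IGNORECASE,
)

def strip_serial(title):
    """Remove a leading serial number from one catalog entry."""
    return SERIAL_RE.sub("", title).strip()

def top_words(titles, k=2000):
    """Return the k most frequent words over the filtered catalog entries."""
    counts = Counter()
    for title in titles:
        # Assumption: simple \w+ tokenization stands in for real segmentation.
        counts.update(re.findall(r"\w+", strip_serial(title).lower()))
    return [word for word, _ in counts.most_common(k)]

catalog = [
    "1.1 Overview",
    "1.2 Soil Overview",
    "Chapter 2 Introduction",
    "(3) Soil Answer",
]
frequent = top_words(catalog, k=2)
```

Generic words such as "answer", "overview", and "introduction" would then be picked out of the frequent-word list; how they are used afterwards is not covered by this sketch.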



Abstract

The invention discloses a method for automatically generating special topics based on book catalogs. For each book, each chapter in the catalog is regarded as a word, features of these words are extracted, and a classifier is trained to identify the entities in the catalog. For each pair of upper-level and lower-level chapter words that are entities in the catalog, the chapter pairs that satisfy the hyponymy (hypernym-hyponym) relationship are extracted. A concept hierarchy is constructed for each word according to the hyponymy relationship, and identical or similar concept hierarchies across all books are fused. For each concept word in the concept hierarchy, its content in web pages and books is retrieved as the description of the word. Finally, the concept hierarchy and the content of the concept words are organized into thematic form. The invention extracts and reorganizes knowledge by using the structured information of book catalogs together with machine learning algorithms. The result can be used as a reference when writing special topics, can greatly reduce the labor cost of related work, and is highly practical.
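As a rough sketch of the hierarchy construction and fusion steps in the abstract, the code below builds a per-book concept hierarchy from (hypernym, hyponym) chapter pairs and merges hierarchies across books. It is an assumption-laden illustration: the function names `build_hierarchy` and `merge_hierarchies` are hypothetical, the sample pairs are made up for the example, and only concepts with identical names are fused (the "similar" case mentioned in the abstract is not handled).

```python
from collections import defaultdict

def build_hierarchy(pairs):
    """Map each hypernym to the set of its hyponyms.

    pairs: iterable of (hypernym, hyponym) chapter-title pairs that a
    classifier has already accepted (the classifier is not shown here).
    """
    tree = defaultdict(set)
    for parent, child in pairs:
        tree[parent].add(child)
    return dict(tree)

def merge_hierarchies(hierarchies):
    """Fuse hierarchies by taking the union of children per concept name.

    Assumption: two concepts are 'the same' only if their names match exactly.
    """
    merged = defaultdict(set)
    for hierarchy in hierarchies:
        for parent, children in hierarchy.items():
            merged[parent] |= children
    return dict(merged)

# Illustrative pairs, not taken from the patent's data.
book1 = build_hierarchy([("soil", "clay"), ("soil", "sand")])
book2 = build_hierarchy([("soil", "loam"), ("clay", "kaolin")])
merged = merge_hierarchies([book1, book2])
```

Storing children in sets keeps the merge idempotent, so a pair that appears in several books contributes only once.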

Description

Technical field

[0001] The invention relates to the fields of natural language processing and machine learning, and in particular to a method for automatically generating topics based on book catalogs.

Background

[0002] With the rapid development of computer science and technology, network data is growing explosively. This data comes from a wide range of sources, lacks structure and hierarchy, has complex composition, and is noisy. How to extract knowledge from it, and how to organize and apply that knowledge, has become a hot topic in natural language processing, machine learning, and information retrieval. Knowledge bases provide a feasible solution to this problem; however, constructing large-scale knowledge bases remains a challenging task. WordNet, EuroWordNet, and Cyc are all knowledge bases manually compiled by domain experts. Although they are of high quality, they are obviously insufficient in terms of scale, knowledge covera...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/38, G06K9/62
CPC: G06F16/38, G06F18/22
Inventor: 鲁伟明, 李彬, 庄越挺, 吴飞, 魏宝刚
Owner: ZHEJIANG UNIV