Method for automatically establishing a back-of-book index based on book contents

A technology for automatic back-of-book index construction, applied in natural language data processing, unstructured text data retrieval, special data processing applications, etc.

Active Publication Date: 2017-05-10
ZHEJIANG UNIV
Cites: 2; Cited by: 24

AI Technical Summary

Problems solved by technology

However, in China, modern editors and publishers pay little attention to the back-of-book index, and often even delete the existing index from translated books. This short-sighted behavior, which appears to save effort, has many causes, but one cannot be ignored: an insufficient understanding of the information retrieval function of the back-of-book index.

Examples

Embodiment

[0260] Figure 1 shows the general flow of the book index term generation method. Taking the generation of the index term list for the book "Practical Power Supply Technology Manual: UPS Power Supply Volume" as an example, the specific steps of this embodiment are described in detail below with reference to the method of the present invention and the accompanying drawings:

[0261] (1) As shown in Figure 2, candidate phrases are obtained from the full text of the book "Practical Power Supply Technology Manual: UPS Power Supply Volume" through part-of-speech matching;

[0262] (2) As shown in Figure 2, candidate index terms are selected from the candidate phrases by a support vector machine algorithm, according to the feature values of the candidate phrases;

[0263] (3) As shown in Figure 3, according to the candidate index terms and the title of the book "Practical Power Supply ...
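
Step (1) above can be illustrated with a minimal sketch of part-of-speech pattern matching. The tag names, the three patterns, and the pre-tagged sample sentence are all assumptions made for illustration; the excerpt does not list the patent's actual high-frequency part-of-speech rules or name the tagging tool it uses.

```python
from typing import List, Sequence, Tuple

TaggedWord = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical part-of-speech rules; the real rule set is not given in this excerpt.
PATTERNS: List[Tuple[str, ...]] = [
    ("ADJ", "NOUN", "NOUN"),
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
]

def extract_candidates(tagged: Sequence[TaggedWord],
                       patterns: Sequence[Tuple[str, ...]] = PATTERNS) -> List[str]:
    """Return phrases whose tag sequence matches a pattern, in order of first appearance."""
    found: List[str] = []
    for start in range(len(tagged)):
        for pattern in patterns:
            window = tagged[start:start + len(pattern)]
            if len(window) != len(pattern):
                continue  # pattern would run past the end of the chapter
            if all(tag == wanted for (_, tag), wanted in zip(window, pattern)):
                phrase = " ".join(word for word, _ in window)
                if phrase not in found:
                    found.append(phrase)
    return found

# Illustrative, already-tagged text standing in for one chapter of the book.
tagged_chapter: List[TaggedWord] = [
    ("uninterruptible", "ADJ"), ("power", "NOUN"), ("supply", "NOUN"),
    ("protects", "VERB"), ("the", "DET"), ("load", "NOUN"),
]

candidates = extract_candidates(tagged_chapter)
print(candidates)
```

In a full pipeline the tagged input would come from a natural language processing tool applied chapter by chapter, and the resulting phrases would then be passed to the classifier in step (2).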

Abstract

The invention discloses a method for automatically establishing a back-of-book index based on book contents. The method comprises the following steps: first, parsing the text of a digital book; taking each chapter as a unit, performing part-of-speech tagging on the text with a natural language processing tool to obtain a part-of-speech array; matching against high-frequency part-of-speech rules to extract candidate phrases; then, using semantic and grammatical features, classifying the phrases with a support vector machine algorithm to obtain candidate index terms; calculating the similarity between each candidate index term and the book's subject field as its termhood; calculating information content, term frequency, pointwise mutual information and encyclopedia key values to obtain an index degree; calculating the title offset distance, candidate index term proportion and interestingness to obtain a context weight; finally, combining the termhood, the index degree and the context weight into an index score, and obtaining the book's index terms by ranking the candidates and keeping a limited number. With this method, an index can be added to books that lack a back-of-book index, improving their readability and searchability.
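
The scoring stage described in the abstract can be sketched as follows. This is an illustrative sketch only: the pointwise mutual information formula is the standard one, but the plain weighted sum, the default weights, the `top_k` cutoff, and the sample feature values are assumptions, since the abstract does not say how the three components are combined or how the ranking is limited.

```python
import math
from typing import Dict, List, Tuple

def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Standard pointwise mutual information, log2(p(x, y) / (p(x) * p(y))), from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

def index_score(termhood: float, index_degree: float, context_weight: float,
                weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine the three components with a weighted sum (an assumed combination rule)."""
    w_t, w_i, w_c = weights
    return w_t * termhood + w_i * index_degree + w_c * context_weight

def rank_terms(features: Dict[str, Tuple[float, float, float]], top_k: int) -> List[str]:
    """Sort candidate terms by index score, highest first, and keep the first top_k."""
    ordered = sorted(features, key=lambda term: index_score(*features[term]), reverse=True)
    return ordered[:top_k]

# Hypothetical (termhood, index degree, context weight) values for three candidates.
features = {
    "inverter": (0.9, 0.8, 0.7),
    "battery": (0.6, 0.9, 0.6),
    "page": (0.1, 0.2, 0.1),
}

selected = rank_terms(features, top_k=2)
example_pmi = pmi(count_xy=8, count_x=16, count_y=32, total=1024)
print(selected, example_pmi)
```

In this sketch, pointwise mutual information would be one of several inputs to the index degree; how those inputs are merged is left out because the abstract does not specify it.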

Description

Technical field

[0001] The invention relates to generating back-of-book index terms with computer techniques such as artificial intelligence and data mining, and in particular to a method for automatically constructing a back-of-book index based on book content.

Background art

[0002] A back-of-book index generally refers to the appendix at the end of a book. It uses certain words from the book as entry points and indicates where in the book the content related to the subjects those words describe can be found. Major works, especially academic works, tend to be broad and deep in content, so an index is often attached after the main text for readers to consult.

[0003] Back-of-book indexes have long been widely valued in the West, and Western readers are accustomed to using them to find the content they need. Some countries have made clear regulations on the back-of-book index. If an academic work is not in...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/24575, G06F16/35, G06F16/90344, G06F40/289, G06F40/30
Inventor: 鲁伟明, 杨德志, 庄越挺
Owner: ZHEJIANG UNIV