
Digital publication extraction method based on word frequency

A technology for digital publications and high-frequency vocabulary, applied in electrical digital data processing, special data processing applications, natural language data processing, and related fields, with the effect of a small workload and high accuracy.

Active Publication Date: 2018-02-09
BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Some words appear thousands of times in a book and others only once, so recording the page number of every word is cumbersome, and there are no rules to follow.
In addition, a document has page number information only after typesetting, so the page number of each word must be re-determined whenever the file is revised or its format is changed, which is a huge workload.




Embodiment Construction

[0026] The vocabulary extraction service for digital publications is implemented based on the following language rules:

[0027] Research by linguists has found that although a language contains many words, commonly used words account for the vast majority of those encountered in everyday life and in books. The higher a reader's language level, the more difficult or low-frequency words that reader recognizes. If the vocabulary of a language is segmented by word frequency, readers with a high language level have fewer unfamiliar words, and these tend to fall in the low-frequency segments, while readers with a lower language level have more unfamiliar words, spread over a wider span of frequency segments. Therefore, according to these linguistic findings, different vocabulary extraction strategies can be used to provide vocabulary lists of different frequency bands to meet the needs of readers of different language levels. In this way, the following problems in people's language learning and reading can be solved at ...
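The band-based strategy described in [0027] can be sketched as follows. This is a minimal illustration, not the patented method: the rank boundaries in `BANDS`, the reader levels in `LEVEL_BANDS`, and the `rank_table` input (a word-to-frequency-rank mapping for the language) are all assumptions, since the source does not specify them.

```python
# Hypothetical band boundaries by frequency rank (1 = most frequent word).
# The source does not give concrete cutoffs; these are placeholders.
BANDS = [("high", 1, 2000), ("middle", 2001, 6000), ("low", 6001, None)]

# Hypothetical mapping from reader level to the bands offered to that reader.
# Lower-level readers get a wider span of bands, higher-level readers only
# the low-frequency band, following the reasoning in [0027].
LEVEL_BANDS = {
    "beginner": ["high", "middle", "low"],
    "intermediate": ["middle", "low"],
    "advanced": ["low"],
}


def band_of(rank):
    """Return the name of the frequency band that contains `rank`."""
    for name, first, last in BANDS:
        if rank >= first and (last is None or rank <= last):
            return name
    raise ValueError(f"rank {rank} is outside every band")


def vocabulary_list(words, rank_table, level):
    """Group `words` by band, keeping only the bands offered at `level`."""
    wanted = set(LEVEL_BANDS[level])
    result = {}
    for word in words:
        rank = rank_table.get(word)
        if rank is None:
            continue  # words missing from the rank table are skipped here
        band = band_of(rank)
        if band in wanted:
            result.setdefault(band, []).append(word)
    return result


# Usage with a toy rank table (made-up ranks, for illustration only).
toy_ranks = {"the": 1, "river": 2500, "estuary": 9000}
print(vocabulary_list(["the", "river", "estuary"], toy_ranks, "advanced"))
```

Keeping the band boundaries and the level mapping in plain tables makes it easy to try different extraction strategies without touching the grouping logic.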


Abstract

The invention discloses a digital publication extraction method based on word frequency. The method reduces the computation required to extract words from a digital publication by chapter and page number, and makes it convenient to locate page numbers quickly and accurately after the text is re-typeset, thereby improving publishing efficiency. The method comprises the following steps: determining the language of the digital publication to be published and its reading level, and loading the corresponding high-frequency, middle-frequency and low-frequency dictionaries according to the result; converting the original document of the digital publication into an XML document, obtaining page number marks by typesetting the XML document, and storing the resulting XML document, which carries chapter, paragraph and page number marks, as a copy of the document; extracting the words of the copy to generate the high-frequency dictionary, the middle-frequency dictionary, the low-frequency dictionary and the new word list; and, after the words are extracted, leaving the digital publication ready for publication.
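The extraction step in the abstract can be sketched roughly as below. This is only an illustration under stated assumptions: the XML layout (`<chapter n>`, `<p n page>`), the function name `extract_words`, and the simplification that each paragraph sits on a single page are all hypothetical, because the source does not publish its XML schema. Words are matched against three given dictionaries, and anything not found goes to the new word list.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marked-up copy of the document. The element and attribute
# names are assumptions; the source only says the copy carries chapter,
# paragraph and page number marks.
SAMPLE = """<book>
  <chapter n="1">
    <p n="1" page="1">The cat sat.</p>
    <p n="2" page="2">The zyzzyva slept.</p>
  </chapter>
</book>"""


def extract_words(xml_text, high, middle, low):
    """Map each word to its (chapter, paragraph, page) locations, per band.

    `high`, `middle` and `low` are sets of dictionary words; words found in
    none of them are collected under "new".
    """
    root = ET.fromstring(xml_text)
    lists = {"high": {}, "middle": {}, "low": {}, "new": {}}
    for chapter in root.iter("chapter"):
        for para in chapter.iter("p"):
            location = (chapter.get("n"), para.get("n"), para.get("page"))
            # Simple ASCII tokenizer, enough for this English sample.
            for word in re.findall(r"[a-z]+", (para.text or "").lower()):
                if word in high:
                    band = "high"
                elif word in middle:
                    band = "middle"
                elif word in low:
                    band = "low"
                else:
                    band = "new"
                lists[band].setdefault(word, []).append(location)
    return lists


result = extract_words(SAMPLE, high={"the"}, middle={"cat", "sat"}, low={"slept"})
print(result["new"])
```

Because each location is read from the marks in the XML copy rather than from a fixed layout, re-typesetting only requires regenerating the page marks and rerunning the extraction.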

Description

Technical field

[0001] The invention relates to a digital publication vocabulary extraction method based on word frequency, and belongs to the technical field of digital publication.

Background

[0002] Human beings have entered the era of digital publishing. The digital nature of digital publishing platforms and electronic books makes it possible to use various modern technologies to provide readers with new and effective services, including entertainment and learning-support functions. Applying advances from multimedia technology and artificial intelligence research, such as natural language processing, not only changes the information carriers, sources of reading material, and reading methods of digital publishing, but also brings many new changes to how people learn, and how effectively they learn, while reading.

[0003] Vocabulary is the foundation of language learning and reading. In advanced readin...


Application Information

IPC(8): G06F17/27, G06F17/22, G06F17/30
CPC: G06F16/3346, G06F40/151, G06F40/284
Inventor: 孙继兰
Owner: BEIJING TECHNOLOGY AND BUSINESS UNIVERSITY