Word frequency sorting and vocabulary analysis method based on NLP technology

This vocabulary technology, applied in the field of language learning, addresses three problems: memorizing words in alphabetical order wastes effort, dictionaries cannot be memorized "according to pictures", and independent high-frequency word lists lack objective indicators. It aims to improve learning efficiency and to make dictionary memorization more efficient and better targeted.

Pending Publication Date: 2022-06-10
杭州网看科技有限公司

AI Technical Summary

Problems solved by technology

[0005] (2) When memorizing words, learners usually start from "a". In alphabetical order, however, many uncommon words are mixed in, which wastes effort.
[0008] (2) Dictionaries should not be memorized by "searching by pictures";
[0009] (3) The various standalone high-frequency word lists lack objective indicators and are isolated from dictionaries.
[0010] Therefore, the traditional products and methods above do not make full use of technology to digitize and automate the learning and memorization of dictionaries.


Examples


Embodiment Construction

[0023] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0024] A method for word frequency sorting and vocabulary analysis based on NLP technology, comprising:

[0025] 1. Collect corpus data sets;

[0026] 2. Clean and format the data, then count the words in it. With frequency as the parameter, apply a sorting algorithm to obtain the word frequency ranking. The "frequency sorting" function helps learners master a large vocabulary in a focused, targeted way. There is no n...
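The cleaning, counting, and sorting in steps 1 and 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the regular-expression cleaning rule, the function names `clean_and_tokenize` and `word_frequency_ranking`, and the two-sentence corpus are all assumptions made for illustration.

```python
import re
from collections import Counter


def clean_and_tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens (an assumed, simplistic cleaning rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequency_ranking(corpus: list[str]) -> list[tuple[str, int]]:
    """Count every token in the corpus and rank the words by frequency."""
    counts = Counter()
    for document in corpus:
        counts.update(clean_and_tokenize(document))
    # Sort by descending frequency; break ties alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus standing in for the collected corpus data set.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
]
ranking = word_frequency_ranking(corpus)
for word, frequency in ranking:
    print(word, frequency)
```

Sorting on the pair (negative count, word) keeps the ranking deterministic when several words share a frequency, which matters if the rank is shown to learners as an indicator.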


Abstract

The invention belongs to the technical field of language learning and discloses a word frequency sorting and vocabulary analysis method based on NLP technology. The method comprises: collecting a corpus data set; cleaning and formatting the data, counting the words in it, and obtaining word frequency ranking information by applying a sorting algorithm with frequency as the parameter; performing word clustering analysis with NLP technology, analyzing word category attributes, and constructing a filtering tool; and training on the data set with machine learning and NLP technology to obtain a word vector model, which is used to compute the associated words and context vocabulary of every word in real time. By training on a large-scale corpus, the scheme generates the dictionary's word frequency ranking, word clusters, context vocabulary and the like, thereby building new dictionary functions that improve learning efficiency, make dictionary memorization more efficient and better targeted, and help users master a large number of related words.
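The idea of looking up associated words and context vocabulary from word vectors can be sketched as follows. The abstract describes a trained word vector model; this sketch swaps in simple count-based co-occurrence vectors with cosine similarity so that it stays self-contained. The window size, the function names, and the toy sentences are assumptions, not details from the patent.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Build a sparse vector per word: counts of neighbors within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def associated_words(word, vectors, top_n=3):
    """Rank the other words by similarity of their context vectors to `word`."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Hypothetical pre-tokenized sentences standing in for the cleaned corpus.
sentences = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
]
vectors = cooccurrence_vectors(sentences)
print(associated_words("cat", vectors))   # words whose contexts resemble those of "cat"
print(vectors["cat"].most_common())       # context vocabulary observed around "cat"
```

In this toy corpus "cat" and "dog" never co-occur, yet they rank as associated because they share the contexts "the" and "drinks"; a trained embedding model relies on the same distributional signal at a much larger scale.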

Description

Technical field

[0001] The invention relates to the technical field of language learning, in particular to a method for word frequency sorting and vocabulary analysis based on NLP technology.

Background technique

[0002] With the development of machine learning and natural language processing technology, software tools offering translation, pronunciation, reading and other functions have appeared, and they effectively help users understand foreign-language articles and words. However, even in the current era of intelligence, basic "dictionary" tools are still necessary. Natural language processing technology can make dictionary products more "intelligent", helping language learners learn and master a large number of key words efficiently.

[0003] Traditional paper dictionaries are organized for lookup in alphabetical order, which leaves users with two dilemmas:

[0004] (1) When looking up a word, the user does not know how important the word is or whether it is worth sp...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/31, G06F16/35, G06F16/36, G06F40/242, G06F40/284, G06N20/00
CPC: G06F16/31, G06F16/35, G06F16/374, G06F40/242, G06F40/284, G06N20/00
Inventor: 丁峰
Owner: 杭州网看科技有限公司