
A vocabulary list screening method

A vocabulary screening method, applied in fields such as instruments, semantic analysis, and text database querying, that addresses the problem that a TopN selection based on word frequency alone is insufficient and meaningless.

Active Publication Date: 2021-07-23
UNISOUND SHANGHAI INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

In the prior art, vocabulary screening generally assigns each word a weight based only on general statistics such as its frequency in the corpus, sorts the words by that weight, and puts the TopN words into the vocabulary list.
Screening vocabulary this way has the following problems. First, selecting the TopN by general word frequency is meaningless, because the size of N is usually determined by engineering needs.
For these reasons, word frequency alone is not a sufficient basis for screening vocabulary.
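
For reference, the frequency-only baseline criticized above can be sketched roughly as follows. This is a minimal illustration, not code from the patent: the function name `top_n_by_frequency`, the whitespace tokenization, and the toy corpus are all assumptions.

```python
from collections import Counter

def top_n_by_frequency(documents, n):
    """Baseline: rank words by raw corpus frequency and keep the first n."""
    # Whitespace tokenization is an assumption made for this sketch.
    counts = Counter(word for doc in documents for word in doc.split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]

# Hypothetical toy corpus.
corpus = [
    "the model reads the text",
    "the model writes the text",
    "entropy measures the spread",
]
baseline_vocab = top_n_by_frequency(corpus, 3)
```

Nothing in the data determines the cutoff `n` here; it is supplied from outside, which is the weakness the text describes.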



Examples


Embodiment Construction

[0063] The preferred embodiments of the present invention will be described below in conjunction with the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0064] An embodiment of the present invention provides a vocabulary screening method which, as shown in Figure 1, includes:

[0065] Step 1: Determine the frequency of the preset vocabulary in the pre-stored corpus;

[0066] Step 2: Determine the position information of the preset vocabulary in the preset segment in the pre-stored corpus, and obtain the position entropy corresponding to the preset vocabulary according to the position information;

[0067] Step 3: Calculate the weight of the preset vocabulary according to the frequency of occurrence of the determined preset vocabulary in the pre-stored corpus and the obtained position entropy corresponding to the...
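
Step 2 can be illustrated with a small sketch. The excerpt does not define how position information is encoded or how the entropy is computed, so everything below is an assumption: each segment is a whitespace-tokenized string, a word's position is its relative offset within the segment bucketed into a hypothetical `num_bins` bins, and the position entropy is the Shannon entropy (in bits) of the resulting bin distribution.

```python
import math
from collections import Counter

def position_entropy(word, segments, num_bins=4):
    """Shannon entropy (bits) of the relative positions at which `word` occurs."""
    bins = Counter()
    for segment in segments:
        tokens = segment.split()
        for index, token in enumerate(tokens):
            if token == word:
                # Map the token's relative position in [0, 1) to a bin index.
                bins[index * num_bins // len(tokens)] += 1
    total = sum(bins.values())
    if total == 0:
        return 0.0  # The word never occurs, so there is no distribution.
    return sum((c / total) * math.log2(total / c) for c in bins.values())

# Hypothetical segments: "hello" always comes first, "a" moves around.
segments = ["hello a b c", "hello b c a", "hello c a b"]
fixed = position_entropy("hello", segments)
spread = position_entropy("a", segments)
```

Under these assumptions, a word that always appears in the same place gets an entropy of zero, while a word spread evenly over several positions gets a higher value.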



Abstract

The present invention provides a vocabulary screening method, comprising: determining the frequency with which preset vocabulary appears in a pre-stored corpus; determining the position information of the preset vocabulary within a preset segment of the pre-stored corpus, and obtaining the position entropy corresponding to the preset vocabulary from that position information; calculating the weight of the preset vocabulary from its frequency in the pre-stored corpus and its position entropy; and, according to the calculated weights, screening the relevant preset vocabulary out of the pre-stored corpus to form a screened vocabulary list. The method is intended to improve the accuracy and reliability of vocabulary screening.
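
The final two steps of the abstract (weighting, then screening) might look like the sketch below. The excerpt does not give the weighting formula or the screening rule, so both are placeholders chosen for illustration: the weight is assumed to be a log-damped frequency scaled by one plus the position entropy, and screening is assumed to be a hypothetical `min_weight` threshold. The input values are made up.

```python
import math

def screen_vocabulary(frequencies, entropies, min_weight):
    """Keep words whose combined weight reaches min_weight, highest weight first."""
    weights = {
        # Placeholder combination (assumption, not the patent's formula):
        # damped frequency scaled by (1 + position entropy).
        word: math.log1p(freq) * (1.0 + entropies.get(word, 0.0))
        for word, freq in frequencies.items()
    }
    kept = [word for word, weight in weights.items() if weight >= min_weight]
    # Order by descending weight, then alphabetically for deterministic ties.
    return sorted(kept, key=lambda word: (-weights[word], word)), weights

# Hypothetical per-word statistics from steps 1 and 2.
frequencies = {"the": 50, "model": 12, "rare": 1}
entropies = {"the": 2.0, "model": 1.5, "rare": 0.0}
vocab, weights = screen_vocabulary(frequencies, entropies, min_weight=1.0)
```

With a threshold on the weight, the size of the resulting list follows from the corpus statistics rather than from a fixed N, which is one way to read the motivation stated in the problem description.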

Description

Technical field

[0001] The invention relates to the technical field of vocabulary screening, and in particular to a vocabulary screening method.

Background technique

[0002] The choice of vocabulary has a large impact on the performance of deep learning. A vocabulary that is too large degrades the performance of the online system, and an exhaustive vocabulary is in any case unrealistic: new words and compound words keep appearing in every language, so vocabulary size has no upper limit. A vocabulary that is too small causes many unregistered (out-of-vocabulary) words to appear frequently in practical applications. Choosing an appropriate vocabulary is therefore very important.

[0003] Moreover, according to Zipf's Law, most of the vocabulary lies in the long tail of the word-frequency distribution. In the prior art, vocabulary screening generally just gives each word a weight accordin...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/33, G06F40/216, G06F40/30, G06F40/284, G06K9/62
CPC: G06F16/3344, G06F40/216, G06F40/284, G06F40/30, G06F18/214
Inventor: 陈峰
Owner: UNISOUND SHANGHAI INTELLIGENT TECH CO LTD