
A new word extraction method and device

A new word extraction method and device, applied in the computer field, that address the problem of phrases being easily missed during extraction. They avoid omitting widely used phrases and improve the accuracy of new word extraction.

Active Publication Date: 2021-06-22
京华信息科技股份有限公司

AI Technical Summary

Problems solved by technology

When the speeches of different leaders are used as the given corpus, existing new word extraction methods sort candidates by total word frequency, so a phrase such as "green water and green mountains are golden mountains and silver mountains" is easily missed and not extracted.

Method used

First phrases whose degree of solidification and degree of freedom meet preset threshold conditions are kept as second phrases. Each second phrase's article frequencies are weighted by the influence of the articles in which it appears, and new words are extracted by sorting on the weighted article frequency.


Examples


Embodiment Construction

[0028] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0029] As shown in Figure 1, an embodiment of the present invention provides a new word extraction method, including:

[0030] S101: Obtain a given corpus, and perform word segmentation processing on the given corpus to obtain several first phrases; wherein, the given corpus includes several articles.

[0031] S102: Calculate the degree of solidification and the degree of freedom of each first phrase, and then extract a number of first phrases whose degree...
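The visible text does not define the degree of solidification or the degree of freedom. The sketch below therefore assumes definitions commonly used in Chinese new word discovery: solidification as the minimum, over all binary splits of a phrase, of P(phrase) / (P(left part) * P(right part)), and freedom as the smaller of the left-neighbour and right-neighbour entropies. The function names and the token-list input are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter, defaultdict


def ngram_stats(tokens, max_n=2):
    """Count n-grams up to max_n and record each n-gram's left/right neighbours."""
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            if i > 0:
                left[gram][tokens[i - 1]] += 1
            if i + n < len(tokens):
                right[gram][tokens[i + n]] += 1
    return counts, left, right


def entropy(counter):
    """Shannon entropy (natural log) of a neighbour distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def solidification(gram, counts, total):
    """Assumed definition: min over binary splits of P(gram) / (P(a) * P(b)).

    gram must contain at least two tokens.
    """
    p = counts[gram] / total
    return min(
        p / ((counts[gram[:k]] / total) * (counts[gram[k:]] / total))
        for k in range(1, len(gram))
    )


def freedom(gram, left, right):
    """Assumed definition: the smaller of left and right neighbour entropy."""
    return min(entropy(left[gram]), entropy(right[gram]))


# Toy corpus: "a b" always occurs together but with varying neighbours.
tokens = ["a", "b", "x", "a", "b", "y", "a", "b", "z"]
counts, left, right = ngram_stats(tokens)
cand = ("a", "b")
s = solidification(cand, counts, len(tokens))
f = freedom(cand, left, right)
```

A candidate would then be kept as a second phrase only if both values pass the preset thresholds; the thresholds themselves are not given in the visible text.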


Abstract

The invention discloses a method for extracting new words, comprising: obtaining a given corpus, and performing word segmentation on the given corpus to obtain several first phrases, wherein the given corpus includes several articles; calculating the degree of solidification and the degree of freedom of each first phrase, and then extracting the first phrases whose degree of solidification and degree of freedom meet preset threshold conditions as second phrases; calculating, for each second phrase, its article frequency in each article of the given corpus, wherein the article frequency is the number of times a phrase appears in an article; weighting each article frequency of each second phrase according to the article influence of each article to obtain the weighted article frequency of each second phrase; and sorting the second phrases by weighted article frequency and extracting new words according to the sorting result. By implementing the embodiments of the present invention, words with a low total word frequency are not omitted, and the accuracy of new word extraction is improved.
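The weighting and sorting steps can be sketched as follows. The abstract does not say how the weighted frequencies are combined or how article influence is measured, so this sketch assumes a simple influence-weighted sum of per-article frequencies and takes influence as a given number per article. All names (`weighted_article_frequency`, `extract_new_words`, `top_k`) are hypothetical.

```python
from collections import Counter


def weighted_article_frequency(articles, influence, candidates):
    """Sum each candidate's per-article frequency, weighted by article influence.

    articles:   {article_id: list of phrases after segmentation}
    influence:  {article_id: numeric influence weight} (assumed to be given)
    candidates: set of second phrases that passed the threshold step
    """
    scores = {phrase: 0.0 for phrase in candidates}
    for art_id, phrases in articles.items():
        article_freq = Counter(phrases)  # times each phrase appears in this article
        for phrase in candidates:
            scores[phrase] += influence[art_id] * article_freq[phrase]
    return scores


def extract_new_words(articles, influence, candidates, top_k):
    """Sort candidates by weighted article frequency and keep the top_k."""
    scores = weighted_article_frequency(articles, influence, candidates)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


# "x" and "y" have the same total frequency (3), but "y" appears in more
# influential articles, so it ranks first.
articles = {"a1": ["x", "x", "x", "y"], "a2": ["y"], "a3": ["y"]}
influence = {"a1": 1.0, "a2": 2.0, "a3": 2.0}
scores = weighted_article_frequency(articles, influence, {"x", "y"})
top = extract_new_words(articles, influence, {"x", "y"}, top_k=1)
```

Under these assumptions, a phrase spread across influential articles can outrank one with an equal total count concentrated in a single article, which is the effect the abstract describes.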

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for extracting new words.

Background

[0002] In the era of information explosion, new words emerge in an endless stream. Extracting new words is of great significance to many Chinese information processing fields, such as information retrieval, automatic word segmentation, dictionary compilation, and machine translation. In the prior art, new words are mainly extracted by taking a given corpus, for example multiple articles, segmenting it into words, calculating the total word frequency of each word in the given corpus, and finally sorting and extracting according to the total word frequency. With this method, however, phrases that have a low total word frequency but are widely used and highly popular are ignored and cannot be extracted. For example, if there are 10 artic...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/216, G06F40/289
CPC: G06F40/216, G06F40/289
Inventor: 蓝建敏, 池沐霖
Owner: 京华信息科技股份有限公司