
Method and device for discovering new words

A new-word discovery method and device, applied in the computer field, that addresses problems such as the omission of new words and achieves the effect of reducing dependence

Active Publication Date: 2020-06-05
BEIJING GRIDSUM TECH CO LTD
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

In mixed corpora that span different fields, low-frequency new words are easily filtered out, so genuine new words are missed.




Detailed Description of Embodiments

[0024] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0025] To solve the problem that existing new-word discovery methods easily miss low-frequency new words, an embodiment of the present invention provides a method for discovering new words. As shown in Figure 1, the method includes:

[0026] 101. Acquire a candidate new word and a substring of the candidate new word.

[0027] A character string whose occurrence frequency satisfies a preset frequency thre...
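Step 101 can be illustrated with a minimal sketch. It assumes that candidates are character n-grams counted over the corpus and that a "substring" means any contiguous proper substring of the candidate; the source text is truncated before it defines either, so the function names `candidate_strings` and `substrings` and the parameters `max_len` and `min_freq` are hypothetical.

```python
from collections import Counter


def candidate_strings(corpus, max_len=4, min_freq=2):
    """Count character n-grams (length 2..max_len) and keep the frequent ones.

    Assumption: the "preset frequency threshold" is a simple minimum count.
    """
    counts = Counter()
    for sentence in corpus:
        for n in range(2, max_len + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    return {s: c for s, c in counts.items() if c >= min_freq}


def substrings(candidate):
    """Return all contiguous proper substrings of a candidate, sorted.

    Assumption: the patent's "substring" is taken to mean any shorter
    contiguous piece of the candidate string.
    """
    n = len(candidate)
    pieces = {
        candidate[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if j - i < n  # exclude the candidate itself
    }
    return sorted(pieces)
```

For example, `candidate_strings(["abcab", "abc"], max_len=3, min_freq=2)` keeps `"ab"`, `"bc"` and `"abc"`, and `substrings("abc")` lists the shorter pieces that a later step would compare against the candidate.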



Abstract

The invention discloses a method and device for discovering new words. It relates to the field of computer technology and aims to solve the problem that existing new-word discovery methods are liable to miss low-frequency new words. The method comprises: obtaining candidate new words and the substrings of each candidate new word, where a candidate new word is a character string that appears in the corpus used for new-word discovery and whose occurrence frequency satisfies a preset frequency threshold; calculating an intra-word statistical information value for each candidate new word from the relationship between the statistical information of the candidate's left and right affixes and the statistical information of the left and right affixes of its substrings; calculating an inter-word statistical information value for each candidate new word from the statistical information of its left and right affixes and its occurrence frequency in the corpus; calculating a word-forming score for each candidate new word from its intra-word and inter-word statistical information values; and determining from the word-forming score whether the candidate new word is a new word. The method and device are applicable to text analysis and information mining.
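The scoring pipeline in the abstract might be sketched roughly as follows. The abstract does not give any formulas, so everything numeric here is an assumption made for illustration: "statistical information of left and right affixes" is taken to be the entropy of the characters adjacent to a string, the intra-word value is the candidate's boundary entropy minus the mean boundary entropy of its substrings, the inter-word value is the boundary entropy weighted by log frequency, and the two are simply added. The names `affix_stats`, `word_forming_score`, `is_new_word` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(counter):
    """Shannon entropy (natural log) of a Counter of observations."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def affix_stats(corpus, s):
    """Return (left entropy, right entropy, frequency) of s in the corpus.

    Assumption: affix statistics are neighbour-character entropies;
    "^" and "$" stand in for the start and end of a text.
    """
    left, right = Counter(), Counter()
    for text in corpus:
        start = text.find(s)
        while start != -1:
            end = start + len(s)
            left[text[start - 1] if start > 0 else "^"] += 1
            right[text[end] if end < len(text) else "$"] += 1
            start = text.find(s, start + 1)
    return entropy(left), entropy(right), sum(left.values())


def word_forming_score(corpus, candidate, subs):
    """Combine illustrative intra-word and inter-word values into one score."""
    c_left, c_right, freq = affix_stats(corpus, candidate)
    boundary = min(c_left, c_right)
    sub_bounds = [min(affix_stats(corpus, s)[:2]) for s in subs]
    mean_sub = sum(sub_bounds) / len(sub_bounds) if sub_bounds else 0.0
    intra = boundary - mean_sub            # assumed intra-word value
    inter = boundary * math.log(1 + freq)  # assumed inter-word value
    return intra + inter


def is_new_word(corpus, candidate, subs, threshold=1.0):
    """Accept the candidate when its score reaches a hypothetical threshold."""
    return word_forming_score(corpus, candidate, subs) >= threshold
```

With the toy corpus `["xabcy", "zabcw", "qabcr"]`, the string `"abc"` has varied neighbours on both sides while `"ab"` is always followed by `"c"`, so under these assumptions `"abc"` scores higher than `"ab"`. Because the intra-word term compares a candidate with its own substrings rather than relying on raw frequency alone, a sketch like this shows how a low-frequency string could still be accepted.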

Description

Technical field

[0001] The invention relates to the field of computer technology, and in particular to a method and device for discovering new words.

Background

[0002] As information becomes easier to disseminate, new information appears on the Internet ever faster and its volume keeps growing. New Internet words, buzzwords, and industry terms emerge one after another. Identifying these new words quickly and effectively has become a major difficulty in text processing and information mining. Word recognition in these fields usually relies on dictionaries or thesauri. For new-word recognition, a relatively complete new-word thesaurus can likewise be built for research analysts to use as a reference.

[0003] The usual establishment of a new thesaurus uses traditional statistical methods to discover ne...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F40/216, G06F16/9535
CPC: G06F16/951, G06F40/216, G06F40/289
Inventor: 史立华
Owner: BEIJING GRIDSUM TECH CO LTD