New word discovery method and system, electronic equipment and medium

A new word discovery technology, applied in the technical field of data capabilities, that addresses the dependence of new word discovery on an existing thesaurus, its low accuracy, and the weak logic of existing new word discovery methods, with the effect of improving accuracy and making the discovery process more logical.

Pending Publication Date: 2021-09-28
SHANGHAI MININGLAMP ARTIFICIAL INTELLIGENCE GRP CO LTD

AI Technical Summary

Problems solved by technology

[0003] The embodiments of the present application provide a new word discovery method, system, electronic device and medium, so as to at least solve the problems that new word discovery depends on an existing thesaurus, that its accuracy is low, and that existing new word discovery methods lack logical rigor.

Examples

Embodiment 1

[0065] This embodiment provides a new word discovery method. Figure 1 is a flow chart of a new word discovery method according to an embodiment of the present application. As shown in Figure 1, the new word discovery method includes the following steps:

[0066] Candidate word cohesion calculation step S1: calculate the candidate word frequency and the split word frequency, then calculate the candidate word cohesion from these two frequencies;

[0067] Candidate word degree-of-freedom calculation step S2: calculate the information entropy of the left adjacent words and the information entropy of the right adjacent words of the candidate word, and take the smaller of the two entropy values as the degree of freedom of the candidate word; ...
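Steps S1 and S2 can be illustrated with a minimal Python sketch. The text above does not give the exact cohesion formula, so the sketch assumes a commonly used form: the minimum, over all two-part splits of the candidate, of P(candidate) / (P(left part) * P(right part)). The function names, the character-level n-gram counting, and the use of the text length as the probability normalizer are all assumptions made for illustration, not details taken from the patent.

```python
import math
from collections import Counter


def ngram_counts(text, max_len):
    """Count every substring of length 1..max_len in the text."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cohesion(word, counts, total):
    """Assumed cohesion: min over two-part splits of P(word) / (P(left) * P(right)).

    `word` must have at least two characters and must occur in the text
    that `counts` was built from; `total` is the normalizer (here the
    text length).
    """
    p_word = counts[word] / total
    return min(
        p_word / ((counts[word[:i]] / total) * (counts[word[i:]] / total))
        for i in range(1, len(word))
    )


def entropy(neighbors):
    """Shannon entropy (natural log) of a Counter of neighboring characters."""
    total = sum(neighbors.values())
    return -sum((c / total) * math.log(c / total) for c in neighbors.values())


def freedom(word, text):
    """Degree of freedom: the smaller of the left and right neighbor entropies."""
    left, right = Counter(), Counter()
    start = text.find(word)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(word)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(word, start + 1)
    if not left or not right:
        return 0.0  # no neighbor observed on one side: treat as not free
    return min(entropy(left), entropy(right))
```

Taking the minimum of the two entropies means a candidate counts as "free" only if it appears in varied contexts on both sides; a string that always follows the same character scores near zero even if what follows it varies.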

Embodiment 2

[0084] Figure 2 is a structural schematic diagram of the new word discovery system of the present invention. As shown in Figure 2, the new word discovery system, which applies the above-mentioned new word discovery method, comprises:

[0085] Candidate word cohesion calculation unit 51: calculates the candidate word frequency and the split word frequency, then calculates the candidate word cohesion from these two frequencies;

[0086] Candidate word degree-of-freedom calculation unit 52: calculates the information entropy of the left adjacent words and the information entropy of the right adjacent words of the candidate word, and takes the smaller of the two entropy values as the degree of freedom of the candidate word;

[0087] New word jud...

Embodiment 3

[0100] As shown in Figure 3, this embodiment discloses a specific implementation of an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.

[0101] Specifically, the above-mentioned processor 81 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.

[0102] The memory 82 may include mass storage for data or instructions. By way of example and not limitation, the memory 82 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), a flash memory, an optical disk, a magneto-optical disk, a magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable ...

Abstract

The invention discloses a new word discovery method and system, electronic equipment and a medium. The new word discovery method comprises the following steps: a candidate word cohesion calculation step of calculating a candidate word frequency and a split word frequency, and then calculating the candidate word cohesion from the candidate word frequency and the split word frequency; a candidate word degree-of-freedom calculation step of calculating the information entropy of the left adjacent words and the information entropy of the right adjacent words of the candidate word, and taking the smaller of the two entropy values as the candidate word degree of freedom; and a new word judgment step of calculating a vocabulary score from the candidate word cohesion and the candidate word degree of freedom, selecting the candidates whose vocabulary score is greater than a vocabulary score threshold to obtain words, comparing these words with the words in a word bank, and obtaining new words according to the comparison result. The invention improves the accuracy of new word discovery and makes the new word discovery process more logical.
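The new word judgment step can be sketched in Python as follows. The abstract does not say how cohesion and degree of freedom are combined into the vocabulary score, nor how the comparison with the word bank decides what is new. This sketch assumes the score is the product of the two values and that a word is new when it is absent from the word bank. The function name `find_new_words` and the sample values in the usage lines are hypothetical.

```python
def find_new_words(cohesion, freedom, word_bank, score_threshold):
    """Return candidates that pass the score threshold and are not in the word bank.

    cohesion, freedom: dicts mapping each candidate word to its precomputed value.
    word_bank: set of already known words.
    The score is assumed to be cohesion * freedom (the combination rule is
    not specified in the source text).
    """
    scores = {w: cohesion[w] * freedom[w] for w in cohesion if w in freedom}
    # Keep only candidates whose score is strictly greater than the threshold.
    words = [w for w, s in scores.items() if s > score_threshold]
    # Assumed comparison rule: a word is "new" if the word bank lacks it.
    new_words = [w for w in words if w not in word_bank]
    return sorted(new_words, key=lambda w: scores[w], reverse=True)


# Hypothetical values, for illustration only.
example_cohesion = {"区块链": 50.0, "我们": 40.0, "的区": 2.0}
example_freedom = {"区块链": 1.5, "我们": 2.0, "的区": 0.1}
example_bank = {"我们"}
result = find_new_words(example_cohesion, example_freedom, example_bank, 10.0)
```

With these sample values, "的区" falls below the threshold and "我们" is already in the word bank, so only "区块链" is returned.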

Description

Technical field

[0001] The present application relates to the technical field of data capabilities, in particular to a new word discovery method, system, electronic equipment and medium.

Background

[0002] In the field of Chinese word segmentation, new word discovery is a very important NLP topic. As people's material and cultural needs grow, vocabulary is expanding extremely rapidly, and a large number of new words appear every year. How can newly emerging words such as person names, place names, organization names, brand names, professional terms, abbreviations and new Internet words be identified? Over the past ten years, the field of Chinese word segmentation has concentrated on overcoming this difficulty, and the discovery and recognition of new words has become a key link. The traditional new word discovery method relies on the existing tokenizer to segment the text first, and then guesses that the r...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/33, G06F16/31, G06F16/36
CPC: G06F16/3346, G06F16/3344, G06F16/313, G06F16/36
Inventors: 付金伟, 梁吉光
Owner: SHANGHAI MININGLAMP ARTIFICIAL INTELLIGENCE GRP CO LTD