Online automatic neologism excavating method and electronic device

A technology for automatically mining new words, applied in the field of information processing, that addresses the problem of manual collection being unable to meet users' input needs in a timely manner

Active Publication Date: 2014-06-18
BAIDU INT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

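Manually collecting kana-kanji entries for these new words (for example, from blogs, Twitter, Facebook, papers, and patents) can no longer meet the growing input needs of hundreds of millions of users in a timely manner.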


Examples

Detailed Description of the Embodiments

[0059] The present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

[0060] First, the new words addressed by the online automatic new-word mining method and device of the present invention are described. The new words mined by the present invention include entries of the form "Chinese term - English explanation" or "Japanese kanji string - Japanese kana reading". For example:

[0061] 1) Montblanc (montblanc);

[0062] 2) Anna (あんな).

[0063] For ease of description, the new words discussed in the rest of the text are all "Japanese kanji string - Japanese kana reading" entries, but the new words mined by the present invention should not be regarded as limited to entries of that form.
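As an illustration only, an entry of the kind described above can be modelled as a pair of strings, with a check that the reading consists solely of kana. The names `Entry` and `is_kana` are hypothetical and do not come from the patent; the check relies on the Unicode Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks.

```python
from dataclasses import dataclass


def is_kana(text: str) -> bool:
    """Return True if text is non-empty and every character lies in the
    Unicode Hiragana or Katakana block (U+3040 to U+30FF)."""
    return bool(text) and all("\u3040" <= ch <= "\u30ff" for ch in text)


@dataclass(frozen=True)
class Entry:
    """Hypothetical container for one dictionary entry (not from the patent)."""
    surface: str  # written form, e.g. a kanji string or a Chinese term
    reading: str  # kana reading, or an English explanation

    def has_kana_reading(self) -> bool:
        # Distinguishes "kanji string - kana reading" entries from
        # "Chinese term - English explanation" entries.
        return is_kana(self.reading)
```

With the two examples listed above, `Entry("Anna", "あんな").has_kana_reading()` is true, while `Entry("Montblanc", "montblanc").has_kana_reading()` is false.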

[0064] See Figure 1, which is a flow chart of an embodiment of the method for automatically mining new words online in the present in...

Abstract

The invention discloses an online method for automatically mining new words (neologisms) and an electronic device. The method comprises: obtaining a plurality of candidate entries from a corpus; obtaining a first candidate entry set through a similarity algorithm; obtaining, according to an established word alignment model between a set of first language forms and a set of second language forms, the probability of a first character string corresponding to each candidate entry in the first candidate entry set; and judging whether the weighted score of each candidate entry in the first candidate entry set reaches a second threshold, and, if it does, taking that candidate entry as a mined new word. With this method, a computer can mine large numbers of new words quickly and accurately, replacing manual collection and meeting users' steadily growing input needs.
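The flow in the abstract can be sketched roughly as follows. This is a minimal sketch under several assumptions, not the patented implementation: the abstract names neither the similarity algorithm nor the way the weighted score is computed, so both scoring functions are passed in as hypothetical callables, the similarity filter is assumed to use a first threshold, and the weighted score is assumed to be a linear combination controlled by a hypothetical `weight` parameter.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    surface: str  # first language form, e.g. a kanji string
    reading: str  # second language form, e.g. a kana reading


def mine_new_words(
    candidates: Iterable[Candidate],
    similarity: Callable[[Candidate], float],      # assumed: score in [0, 1]
    alignment_prob: Callable[[Candidate], float],  # assumed: word alignment model probability
    first_threshold: float,
    second_threshold: float,
    weight: float = 0.5,  # assumed interpolation weight, not from the source
) -> List[Candidate]:
    """Filter candidates in two stages, loosely following the abstract."""
    mined = []
    for cand in candidates:
        # Stage 1: the similarity filter builds the first candidate set.
        sim = similarity(cand)
        if sim < first_threshold:
            continue
        # Stage 2: combine similarity with the alignment probability
        # (assumed linear weighting) and compare to the second threshold.
        score = weight * sim + (1.0 - weight) * alignment_prob(cand)
        if score >= second_threshold:
            mined.append(cand)
    return mined
```

Passing the scorers in as callables keeps the sketch independent of any particular similarity measure or alignment model, since the abstract does not specify either.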

Description

Technical field

[0001] The invention relates to the technical field of information processing, and in particular to an online method for automatically mining new words and an electronic device.

Background

[0002] In a Japanese input method, constructing the kanji sequence the user expects from the kana sequence the user types, and conversely annotating a kanji sequence with its kana reading, both require a large-scale set of "kana-kanji" entries.

[0003] In the information age, new words appear on the Internet every day, such as organization names, company names, personal names, and technical terms. Manually collecting kana-kanji entries for these new words (for example, from blogs, Twitter, Facebook, papers, and patents) can no longer meet the growing input needs of hundreds of millions of users in a timely manner.

Summary of the invention

[0004] The technical problem ma...

Application Information

IPC(8): G06F17/30
CPC: G06F16/374
Inventor: 吴先超
Owner: BAIDU INT TECH (SHENZHEN) CO LTD