Compound word processing method and device used for semantic mining and equipment thereof

A semantic mining and processing technology, applied in the field of information processing, that addresses three problems: the high cost of corpus training, the reduced effectiveness of the bag-of-words model, and the impact on memory performance.

Active Publication Date: 2018-04-10
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] For this reason, the first purpose of the present invention is to propose a compound word processing method for semantic mining that addresses the following problems in the prior art: corpus training is costly; improving the effect of the bag-of-words model requires introducing more bigram (two-word bundled) words, which affects memory performance; and simply embedding the compound words obtained by bundling two words into the word vector space as new semantic features, and inputting them to the bag-of-words model, affects the effect of the bag-of-words model.




Embodiment Construction

[0029] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0030] The compound word processing method, device and equipment for semantic mining according to the embodiments of the present invention will be described below with reference to the accompanying drawings.

[0031] The embodiment of the present invention provides a compound word processing method for semantic mining that extends the Bigram feature statistics to any N adjacent words, combining them into N-gram phrases. These newly generated words are out of order to achieve statistical word frequency and other pa...
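As a rough illustration of extending Bigram features to N adjacent words, the sketch below joins every run of N adjacent segmented words into one compound string. The function name `adjacent_ngrams` and the `_` separator are assumptions made for this example only; the patent text does not specify how the words are joined.

```python
from typing import List


def adjacent_ngrams(tokens: List[str], n: int, sep: str = "_") -> List[str]:
    """Join every run of n adjacent tokens into one compound string.

    `sep` is a hypothetical separator chosen for readability; the source
    does not say how the bundled words are concatenated.
    """
    # The source requires 2 <= N <= M, where M is the number of segmented words.
    if n < 2 or n > len(tokens):
        return []
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    words = ["compound", "word", "processing", "method"]
    print(adjacent_ngrams(words, 2))  # bigram compounds
    print(adjacent_ngrams(words, 3))  # trigram compounds
```

With N = 2 this reduces to the ordinary Bigram case; larger N yields coarser-grained phrases from the same segmented statement.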



Abstract

The invention proposes a compound word processing method, device and equipment for semantic mining. The method comprises the following steps: determining the M segmented words of each statement in a training corpus; selecting N segmented words, in the order in which the M segmented words appear, to generate N-dimensional compound words, where M is greater than or equal to 2 and N is greater than or equal to 2 and less than or equal to M; applying K hash operations to the character string of each N-dimensional compound word, searching a pre-established random hash dictionary space to obtain the position uniquely corresponding to each hash operation, and generating a K-dimensional word vector for the N-dimensional compound word from the floating point numbers at the K positions corresponding to the K hash results, where K is an integer greater than 1; and screening out the N-dimensional target compound words that meet a preset condition according to the K-dimensional word vectors of all the N-dimensional compound words, and inputting those target compound words into a bag-of-words model for semantic mining. In this way, more semantic features of larger granularity are introduced into the bag-of-words model, further improving its effect.
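The hashing step in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the table size, the value of K, the use of MD5 with an index prefix to obtain K different hash operations, and the uniform initialisation of the hash dictionary space are all choices made for this example, not details taken from the patent.

```python
import hashlib
import random
from typing import List

TABLE_SIZE = 1 << 16  # hypothetical size of the random hash dictionary space
K = 4                 # hypothetical number of hash operations (must be > 1)


def build_hash_space(size: int = TABLE_SIZE, seed: int = 0) -> List[float]:
    """Pre-establish a table of random floating point numbers (assumed uniform)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


def hash_position(text: str, k_index: int, size: int) -> int:
    """The k-th hash operation: map a string to one position in the table.

    Prefixing the string with k_index is an assumed way of deriving K
    distinct hash functions from a single digest algorithm.
    """
    digest = hashlib.md5(f"{k_index}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def compound_vector(compound: str, space: List[float], k: int = K) -> List[float]:
    """Build a K-dimensional vector from the floats at the K hashed positions."""
    return [space[hash_position(compound, i, len(space))] for i in range(k)]


if __name__ == "__main__":
    space = build_hash_space()
    print(compound_vector("semantic_mining", space))
```

Because every compound word is mapped into the same fixed-size table, memory use in this sketch does not grow with the number of distinct compound words, which is consistent with the memory concern raised in [0006]. The screening of target compound words against the preset condition is not shown, since the condition is not stated in this excerpt.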

Description

Technical field

[0001] The invention relates to the technical field of information processing, in particular to a compound word processing method, device and equipment for semantic mining.

Background

[0002] Artificial Intelligence, abbreviated AI, is a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that responds in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems, etc.

[0003] At present, the common bag-of-words (Bag of Words) model is widely used in text semantic relevance matching tasks. In the related technology...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/30, G06F40/289, G06F40/284
Inventor: 陈徐屹, 冯仕堃, 朱志凡, 何径舟, 朱丹翔, 曹宇慧
Owner BEIJING BAIDU NETCOM SCI & TECH CO LTD