Method for extracting words containing two Chinese characters based on restriction of semantic word forming

A technique based on semantic word-formation constraints, classified under complex mathematical operations, that addresses the difficulty of inducing word-formation rules from linguistics and the poor generality and effectiveness of such rules.

Inactive Publication Date: 2003-10-08
TSINGHUA UNIV
0 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

However, the corresponding rules are quite difficult to induce from linguistics and generalize poorly, so these methods do not perform well.



Examples


Embodiment Construction

[0060] See Figures 1 and 2. Taking "Russian Army" as the candidate string, the steps are as follows:

[0061] (1) Input the candidate string "Russian Army".

[0062] (2) Look up the "Chinese Character Meaning Information Base":

[0063] The character "Russia" has two senses: Di02 (Russia) and Eb25 (very short time);

[0064] The character "army" has two senses: Di09 (army organization) and Di11 (army);

[0065] (3) Look up the following values from the training data:

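[0066] $p_{11} = p(\mathrm{Di02} \mid \text{Russia}) = 0.99686$, $p_{12} = p(\mathrm{Eb25} \mid \text{Russia}) = 0.00314$

[0067] $p_{21} = p(\mathrm{Di09} \mid \text{Army}) = 0.00485$, $p_{22} = p(\mathrm{Di11} \mid \text{Army}) = 0.99515$

[0068] $MI_{11} = MI(\mathrm{Di02}, \mathrm{Di09}) = -0.15850$

[0069] $MI_{12} = MI(\mathrm{Di02}, \mathrm{Di11}) = 4.31200$

[0070] $MI_{21} = MI(\mathrm{Eb25}, \mathrm{Di09}) = 3.76725$

[0071] $MI_{22} = MI(\mathrm{Eb25}, \mathrm{Di11}) = -10.74512$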

[0072] (4) Calculate the likelihood of forming a word = $MI_{11} \times p_{11} \times p_{21} + MI_{12}$ ...
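The formula in step (4) is cut off in this listing. A minimal sketch of the calculation follows, assuming the sum continues over all four sense pairs, each mutual information value weighted by the probabilities of its two senses. That completion is an assumption, not the patent's confirmed formula, and the function and variable names are illustrative only.

```python
from itertools import product

# Sense probabilities from step (3): p(sense | character).
p_first = {"Di02": 0.99686, "Eb25": 0.00314}   # senses of the character "Russia"
p_second = {"Di09": 0.00485, "Di11": 0.99515}  # senses of the character "army"

# Mutual information between sense pairs, from step (3).
mi = {
    ("Di02", "Di09"): -0.15850,
    ("Di02", "Di11"): 4.31200,
    ("Eb25", "Di09"): 3.76725,
    ("Eb25", "Di11"): -10.74512,
}


def word_formation_score(p_first, p_second, mi):
    """Probability-weighted sum of mutual information over sense pairs.

    ASSUMPTION: the truncated formula in step (4) extends to every
    (first sense, second sense) pair in the same pattern as its first term.
    """
    return sum(
        mi[(a, b)] * p_first[a] * p_second[b]
        for a, b in product(p_first, p_second)
    )


score = word_formation_score(p_first, p_second, mi)
print(f"word-formation score: {score:.4f}")
```

Under this assumption the score comes out at about 4.24, dominated by the Di02/Di11 pair (Russia, army), which is both the most probable sense combination and the one with the highest mutual information. The listing does not state the threshold against which the value is judged, so none is applied here.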


Abstract

A method for extracting two-character Chinese words based on semantic word-formation constraints. It measures the strength of the semantic binding between characters to judge whether a candidate string can form a word. The semantics of Chinese words are represented with a hidden Markov model (HMM), and the Baum-Welch algorithm is used to re-estimate the HMM's semantic state-transition probability matrix and the matrix of character output probabilities at state transitions. From these probability matrices, the method obtains the probabilities of the relevant semantic states and the joint probability of the semantic sequence, which express the semantic binding relationship, and uses them to calculate the word judgment value.
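To make the HMM framing concrete, here is a minimal sketch of how the joint probability of a two-step semantic sequence and a two-character string could be read off trained matrices. All numbers are hypothetical toy values, not taken from the patent. The sketch also assumes the common state-emission form of an HMM, whereas the abstract describes character outputs attached to state transitions, and it omits Baum-Welch training entirely.

```python
from itertools import product

# Hypothetical toy HMM: semantic classes are hidden states, characters are outputs.
STATES = ["Di02", "Di11"]
CHARS = ["Russia", "army"]

# ASSUMPTION: illustrative probabilities only; a real model would be
# re-estimated from training data with the Baum-Welch algorithm.
initial = {"Di02": 0.6, "Di11": 0.4}
transition = {
    "Di02": {"Di02": 0.1, "Di11": 0.9},
    "Di11": {"Di02": 0.5, "Di11": 0.5},
}
emission = {
    "Di02": {"Russia": 0.8, "army": 0.2},
    "Di11": {"Russia": 0.1, "army": 0.9},
}


def joint_probability(s1, s2, c1, c2):
    """p(s1, s2, c1, c2) for a two-character string under the toy HMM."""
    return initial[s1] * emission[s1][c1] * transition[s1][s2] * emission[s2][c2]


def string_probability(c1, c2):
    """Marginal probability of the string, summed over all semantic sequences."""
    return sum(
        joint_probability(s1, s2, c1, c2)
        for s1, s2 in product(STATES, repeat=2)
    )


print(joint_probability("Di02", "Di11", "Russia", "army"))
print(string_probability("Russia", "army"))
```

The transition matrix is where the semantic binding between the two characters would show up: a high transition probability from one semantic class to another means that strings pairing those senses are likely under the model.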

Description

Technical field

[0001] The Chinese two-character word extraction method based on semantic word-formation constraints belongs to the technical field of natural language processing.

Background technique

[0002] Language develops with the passage of time. The powerful communication ability of the Internet makes people's vocabulary grow and change more rapidly. Simply using general-purpose dictionaries or professional dictionaries cannot accommodate all the information. In Chinese, there are no explicit separators between words, so how to automatically recognize words has become an important research topic. The Chinese automatic word extraction method uses a computer as a processing tool, and through automatic machine learning, the computer can automatically judge whether a candidate string is a word.

[0003] In Chinese, words are made up of characters. This is similar to the situation of phrases in English: phrases are composed of several words, and there are no explicit separat...


Application Information

IPC(8): G06F17/16
Inventor: 罗盛芬, 孙茂松
Owner: TSINGHUA UNIV