Word relationship mining method and device

A word relationship mining technology, applied in the Internet and computer fields, that addresses the many erroneous relationships and low accuracy of existing methods, improving the relevance and accuracy of the mined relationships and the user experience.

Active Publication Date: 2011-07-20
SHENZHEN SHI JI GUANG SU INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0005] Existing co-occurrence-based word relationship mining methods rely only on the mutual information statistic to mine word relationships, so they produce many wrong relationships and their accuracy is low.


Examples


Embodiment 1

[0062] To improve the accuracy of the mined word relationships and the user experience, this embodiment of the present invention provides a word relationship mining method. Referring to Figure 1, the method is as follows:

[0063] 101: Obtain the candidate relationship between two entries, the frequency of the candidate relationship, and the word frequency of each entry;

[0064] 102: Obtain the mutual information statistic and the log-likelihood ratio statistic from the candidate relationship, its frequency, and the word frequencies;

[0065] 103: Obtain a normalized credibility value from the mutual information statistic and the log-likelihood ratio statistic;

[0066] 104: Sort the candidate relationships by normalized credibility value, and output those that meet the preset threshold as word relationships.

[0067] Among them, obtaining the candidate relationship between two...
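
Steps 101 to 104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the mutual information statistic is taken to be pointwise mutual information, the log-likelihood ratio is Dunning's G² over a 2x2 contingency table, and the credibility value is assumed to be the average of the two min-max-normalized statistics. The text shown here does not give the actual combination formula, and the names `pmi`, `llr`, `min_max` and `mine_relations` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pmi(c12: int, c1: int, c2: int, n: int) -> float:
    """Pointwise mutual information from raw counts.

    c12: frequency of the candidate pair, c1/c2: word frequencies,
    n: number of counting units (for example, sentences).
    """
    return math.log((c12 * n) / (c1 * c2))


def llr(c12: int, c1: int, c2: int, n: int) -> float:
    """Dunning's log-likelihood ratio (G^2) over the 2x2 contingency table."""
    table = [
        [c12, c1 - c12],
        [c2 - c12, n - c1 - c2 + c12],
    ]
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # a zero cell contributes nothing
                expected = rows[i] * cols[j] / n
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; all-equal input maps to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def mine_relations(
    pair_freq: Dict[Pair, int],
    word_freq: Dict[str, int],
    n: int,
    threshold: float = 0.5,
) -> List[Tuple[Pair, float]]:
    """Score candidates, sort by credibility, keep those meeting the threshold."""
    pairs = list(pair_freq)
    counts = [(pair_freq[p], word_freq[p[0]], word_freq[p[1]], n) for p in pairs]
    mi_norm = min_max([pmi(*c) for c in counts])
    llr_norm = min_max([llr(*c) for c in counts])
    # ASSUMPTION: credibility = mean of the two normalized statistics.
    scored = [(p, (a + b) / 2.0) for p, a, b in zip(pairs, mi_norm, llr_norm)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(p, s) for p, s in scored if s >= threshold]
```

Combining two statistics is meant to offset the known weakness of mutual information alone, which tends to over-reward rare pairs; the log-likelihood ratio is less sensitive to low counts.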

Embodiment 2

[0090] To improve the accuracy of the mined word relationships and the user experience, this embodiment of the present invention provides a word relationship mining method. Referring to Figure 2, the method is as follows:

[0091] 201: The computer prepares the original corpus data;

[0092] In this embodiment, the corpus consists of question-and-answer documents.

[0093] 202: Obtain the title and the first best answer from the original corpus data prepared in step 201;

[0094] Each question-and-answer document in the original corpus data is processed as a unit. Because the title and the answers in each document are marked by specific delimiters, the title and answer can be obtained by matching those delimiters. The documents are processed one by one until all of them have been handled. In the embodiment of the present invention, the title delimiter refers to the title startin...
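
The delimiter-based extraction in step 202 might look like the sketch below. The actual delimiters are not shown in the text above, so the `[TITLE]`/`[ANSWER]` markers are purely hypothetical, as are the function names. The sketch also assumes that the first answer in a document is the best answer.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# HYPOTHETICAL delimiters: the real markers are not given in the source text.
TITLE_RE = re.compile(r"\[TITLE\](.*?)\[/TITLE\]", re.S)
ANSWER_RE = re.compile(r"\[ANSWER\](.*?)\[/ANSWER\]", re.S)


def extract_title_and_first_answer(doc: str) -> Optional[Tuple[str, str]]:
    """Return (title, first answer) for one document, or None if a marker is missing."""
    title = TITLE_RE.search(doc)
    answer = ANSWER_RE.search(doc)  # search() stops at the first match
    if title is None or answer is None:
        return None
    return title.group(1).strip(), answer.group(1).strip()


def iter_corpus(docs: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Process documents one by one, skipping those without both markers."""
    for doc in docs:
        pair = extract_title_and_first_answer(doc)
        if pair is not None:
            yield pair
```

For example, a document `"[TITLE]What is fun in Beijing?[/TITLE][ANSWER]The Great Wall.[/ANSWER]"` would yield the pair `("What is fun in Beijing?", "The Great Wall.")`.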

Embodiment 3

[0142] To improve the accuracy of the mined word relationships and the user experience, this embodiment of the present invention provides a word relationship mining method. Referring to Figure 3, the method is as follows:

[0143] 301: The computer prepares the original corpus data;

[0144] In this embodiment, the corpus consists of ordinary documents.

[0145] 302: Taking each sentence as a unit, perform word segmentation to obtain a set of entries;

[0146] So that the mined words are correlated, word segmentation is performed on each sentence of each document in the corpus, yielding the set of entries that make up that sentence. For example, for the sentence "What's interesting in Beijing, please help, thank you?", the word segmentation system yields the entries "Beijing, there, what, fun, ...
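
As a rough illustration of sentence-level segmentation feeding the counts used later, the sketch below tokenizes each sentence and counts word frequencies and unordered pair frequencies. It rests on several assumptions: a simple regular-expression tokenizer stands in for the word segmentation system (which the text does not specify), each entry is counted once per sentence, and any two entries sharing a sentence form a candidate pair. The names `segment` and `count_cooccurrences` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def segment(sentence: str) -> List[str]:
    """Stand-in segmenter (ASSUMPTION): lowercase alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def count_cooccurrences(sentences: Iterable[str]) -> Tuple[Counter, Counter, int]:
    """Return (word_freq, pair_freq, n) counted per sentence."""
    word_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    n = 0
    for sentence in sentences:
        # De-duplicate and sort so each pair has one canonical order.
        entries = sorted(set(segment(sentence)))
        if not entries:
            continue
        n += 1
        word_freq.update(entries)
        pair_freq.update(combinations(entries, 2))
    return word_freq, pair_freq, n
```

A real pipeline would likely filter stop words such as "please" or "thank you" before pairing, since they co-occur with almost everything and carry little relational signal.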


Abstract

The invention discloses a word relationship mining method and a word relationship mining device, which belong to the fields of computers and the Internet. The method comprises the following steps: acquiring the candidate relationship between two entries, the frequency of the candidate relationship, and the word frequency of the entries; acquiring the mutual information statistic and the log-likelihood ratio statistic according to the candidate relationship, the frequency, and the word frequency; acquiring a normalized credibility value according to the mutual information statistic and the log-likelihood ratio statistic; and sorting according to the normalized credibility value and outputting the candidate relationships that meet a preset threshold as word relationships. The device comprises a first acquisition module, a second acquisition module, a third acquisition module, and an output module. The scheme provided by the embodiments of the invention improves the accuracy of the mined word relationships and improves the user experience.

Description

Technical field

[0001] The invention relates to the fields of computers and the Internet, and in particular to a word relationship mining method and device.

Background

[0002] Word relationships are an important type of knowledge and take many forms, such as hyponymy, part-whole, geographical location, company acquisition, and job change relationships. The two most commonly used word relationship mining methods are pattern-based and co-occurrence-based. Pattern-based methods usually rely on a large-scale corpus and certain types of representations. Co-occurrence-based methods usually compute statistical features of two words co-occurring in a sentence or document, on the premise that such co-occurrence indicates some relationship between the two words.

[0003] In the prior art, a word relationship mining method based on co-oc...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30, G06F17/27
Inventors: 田国刚, 贾自艳
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH