Iterative extraction of Chinese synonyms based on pattern learning

A pattern-learning technique for synonym extraction, applied in semantic analysis, unstructured text data retrieval, text database clustering/classification, and related fields.

Active Publication Date: 2019-03-26
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0005] To address the difficulty of obtaining synonym information from massive unstructured text, the present invention proposes an iterative Chinese synonym extraction method based on pattern learning, which can extract a large number of Chinese synonymous entities with high accuracy.

Examples

Embodiment

[0080] The concrete steps of this example are described in detail below, following the method of this technology:

[0081] (1) As shown in Figure 1, a Lucene index is built over the encyclopedia text, and 5000 synonym pairs are randomly selected from the seed thesaurus as seeds. The seed word pairs are used to search the corpus, and the text between the two words of each pair is extracted as a candidate pattern. The candidate patterns are clustered, with each candidate pattern group represented by its pattern prototype; the frequency of each group is counted, and only groups whose frequency is greater than 5 are kept;

[0082] (2) As shown in Figure 1, the candidate patterns are matched against the text, and the entities before and after the pattern in each candidate sentence are extracted as a candidate synonym pair;

[0083] (3) As shown in Figure 1, word2vec is used to calculate the semantic similarity between word pairs as positive and negative example...
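Steps (1) to (3) can be illustrated with a minimal Python sketch. It is not the patented implementation, and it makes several assumptions: a plain in-memory scan stands in for the Lucene index, identical gap strings are counted together instead of being clustered around a pattern prototype, a known entity vocabulary (`entities`) replaces entity recognition, and hand-written toy vectors stand in for trained word2vec embeddings. All function names, the `max_gap` parameter, and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def extract_candidate_patterns(sentences, seed_pairs, min_freq=5, max_gap=10):
    """Step (1): collect the text between seed words and keep frequent patterns."""
    counts = Counter()
    for first, second in seed_pairs:
        # Lazily match a short gap (1..max_gap characters) between the two seed words.
        gap = re.compile(re.escape(first) + "(.{1,%d}?)" % max_gap + re.escape(second))
        for sentence in sentences:
            for match in gap.finditer(sentence):
                counts[match.group(1)] += 1
    # Keep patterns whose frequency is greater than min_freq (5 in the source).
    return {pattern: n for pattern, n in counts.items() if n > min_freq}


def extract_candidate_pairs(sentences, patterns, entities):
    """Step (2): pair the entity ending before a pattern with the one starting after it."""
    pairs = set()
    for sentence in sentences:
        for pattern in patterns:
            start = sentence.find(pattern)
            while start != -1:
                left, right = sentence[:start], sentence[start + len(pattern):]
                # Prefer the longest known entity touching the pattern on each side.
                head = max((e for e in entities if left.endswith(e)), key=len, default=None)
                tail = max((e for e in entities if right.startswith(e)), key=len, default=None)
                if head and tail and head != tail:
                    pairs.add((head, tail))
                start = sentence.find(pattern, start + 1)
    return pairs


def cosine_similarity(u, v):
    """Step (3): cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Toy data (hypothetical): six sentences share the pattern "又称" ("also called").
corpus = ["番茄又称西红柿"] * 6 + ["土豆又称马铃薯,是一种蔬菜"]
patterns = extract_candidate_patterns(corpus, [("番茄", "西红柿")])
pairs = extract_candidate_pairs(corpus, patterns, {"番茄", "西红柿", "土豆", "马铃薯", "蔬菜"})
toy_vectors = {"土豆": [0.9, 0.1], "马铃薯": [0.8, 0.2]}  # stand-ins for word2vec output
similarity = cosine_similarity(toy_vectors["土豆"], toy_vectors["马铃薯"])
```

In this toy run the single seed pair yields one pattern, and that pattern then surfaces a second pair that was not among the seeds, which is the behaviour the iterative method relies on.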

Abstract

The invention discloses an iterative extraction method for Chinese synonyms based on pattern learning. Taking the unstructured data of encyclopedia entries as the corpus, seed synonym pairs obtained from redirects are matched against the corpus text, and the text between each word pair is taken as a candidate pattern. Through candidate pattern matching, the entity pairs before and after the pattern in each sentence are extracted as candidate synonym pairs. Word2vec is used to calculate the semantic similarity between entity pairs and evaluate the similarity of word pairs. The number of seeds supported by each candidate pattern is counted, and the quality of the candidate patterns is evaluated according to the words they extract. The candidate synonyms are then scored by pattern score, entity confidence and word-pair similarity, and the effective synonym entity pairs are selected. The extracted high-quality synonyms are used as new seeds to obtain more Chinese synonym pairs. The method successfully extracts a large number of Chinese synonym entities with high accuracy from ten million encyclopedia entry texts, and is of great practical significance for extracting synonym information from massive unstructured text.
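The scoring and iteration described in the abstract can be sketched as a bootstrapping loop. The abstract names the three signals (pattern score, entity confidence, word-pair similarity) but not how they are combined, so the product used here, the `threshold` value, the `max_rounds` cap, and the choice to accumulate seeds across rounds are all assumptions made for illustration. `extract_and_score` is a hypothetical callback that stands for the pattern learning and candidate extraction stages.

```python
def score_candidate(pattern_score, entity_confidence, similarity):
    """Combine the three signals; the product is an assumed combination rule."""
    return pattern_score * entity_confidence * similarity


def bootstrap(seeds, extract_and_score, threshold=0.5, max_rounds=5):
    """Grow the synonym set by feeding accepted pairs back in as new seeds.

    extract_and_score(seed_set) must return a dict that maps each candidate
    pair to a (pattern_score, entity_confidence, similarity) tuple.
    """
    known = set(seeds)
    for _ in range(max_rounds):
        scored = extract_and_score(known)
        # Accept only candidates that score high enough and are not known yet.
        accepted = {
            pair for pair, parts in scored.items()
            if score_candidate(*parts) >= threshold
        } - known
        if not accepted:
            break  # no new high-quality pairs, so the iteration has converged
        known |= accepted
    return known
```

Stopping when a round adds no new pairs keeps the loop from running indefinitely; the threshold controls the trade-off between the number of extracted pairs and their accuracy.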

Description

Technical field

[0001] The invention relates to a method for iteratively extracting Chinese synonyms based on pattern learning, in particular to an open method for iteratively extracting synonyms.

Background

[0002] Synonyms are a group of words or phrases that have the same or almost the same meaning and express the same concept. As a typical semantic relation, synonymy helps in understanding rich and varied language and in mining important information from text. As a basic resource in information processing, synonym relations are widely used in information retrieval, natural language processing, text mining, and knowledge graph construction. With the advent of the information age, the massive growth of data has led to a rapid increase in synonyms, and manual extraction consumes a great deal of time and manpower. Therefore, designing and implementing an automatic synonym extraction sy...

Application Information

IPC(8): G06F17/27, G06F16/35
CPC: G06F40/216, G06F40/247, G06F40/289, G06F40/30
Inventor: 鲁伟明, 俞家乐, 吴飞, 庄越挺
Owner ZHEJIANG UNIV