Method and system for expanding corpus regularities of sample corpora

A corpus-expansion technology, applied in the field of semantic analysis, that addresses the poor generalization ability of corpus regular expressions.

Pending Publication Date: 2020-10-30
GUANGDONG XIAOTIANCAI TECH CO LTD

AI Technical Summary

Problems solved by technology

However, corpus regular expressions written by analyzing user sentence patterns and expanding the lexicon generalize poorly.



Examples


Embodiment Construction

[0068] In the following description, specific details such as particular system structures and technologies are presented for the purpose of illustration rather than limitation, to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

[0069] It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0070] In order ...



Abstract

The invention provides a method and system for expanding the corpus regular expressions of sample corpora. The method comprises the steps of: establishing a knowledge graph according to entity content; obtaining a sample corpus and generating a corpus regular expression according to the sample corpus; performing word segmentation on the sample corpus to obtain corpus segmented words; comparing the corpus segmented words with the entity content of the knowledge graph, and defining a corpus segmented word as an entity segmented word if it matches the entity content of the knowledge graph; obtaining, according to the knowledge graph, the upper-level entity concept corresponding to each entity segmented word; and adjusting the corpus regular expression according to the entity concept to obtain the expanded regular expression. According to the method, the upper-level entity concept corresponding to the entity segmented words in the sample corpus is obtained through the knowledge graph, and the corpus regular expression obtained from the sample corpus is expanded according to that entity concept, so that the generalization ability of the semantic regular expression is improved.
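The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation described in the patent: the knowledge graph is reduced to a hypothetical dictionary `ENTITY_TO_CONCEPT` mapping each entity to one upper-level concept, word segmentation is replaced by whitespace splitting, and "adjusting" the regular expression is assumed to mean replacing an entity word with an alternation of all entities that share its concept. All function names are invented for the example.

```python
import re

# Hypothetical toy knowledge graph: entity -> upper-level concept.
ENTITY_TO_CONCEPT = {
    "apple": "fruit",
    "banana": "fruit",
    "pear": "fruit",
    "tiger": "animal",
    "panda": "animal",
}


def build_concept_index(entity_to_concept):
    """Invert the graph: concept -> sorted list of member entities."""
    index = {}
    for entity, concept in entity_to_concept.items():
        index.setdefault(concept, []).append(entity)
    return {concept: sorted(members) for concept, members in index.items()}


def segment(sentence):
    """Whitespace segmentation; a stand-in for a real word segmenter."""
    return sentence.lower().split()


def corpus_regex(tokens):
    """Literal corpus regular expression that matches only the sample."""
    return r"\s+".join(re.escape(tok) for tok in tokens)


def expanded_regex(tokens, entity_to_concept):
    """Replace each entity word with an alternation over its concept."""
    index = build_concept_index(entity_to_concept)
    parts = []
    for tok in tokens:
        concept = entity_to_concept.get(tok)
        if concept is None:
            # Not an entity in the graph: keep the word literally.
            parts.append(re.escape(tok))
        else:
            # Entity word: generalize to every entity under the concept.
            alternatives = "|".join(re.escape(e) for e in index[concept])
            parts.append(f"(?:{alternatives})")
    return r"\s+".join(parts)


sample = "what does a pear look like"
tokens = segment(sample)
literal = re.compile(corpus_regex(tokens))
expanded = re.compile(expanded_regex(tokens, ENTITY_TO_CONCEPT))

query = "what does a banana look like"
literal_hit = literal.fullmatch(query) is not None
expanded_hit = expanded.fullmatch(query) is not None
```

In this sketch the literal expression matches only the original sample, while the expanded expression also accepts other entities under the same concept ("banana" for "pear") and still rejects entities under a different concept ("tiger").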

Description

Technical field

[0001] The invention relates to the technical field of semantic analysis, and in particular to a method and system for expanding the corpus regular expressions of sample corpora.

Background

[0002] With the rapid development of networks, it is becoming more and more common to process information intelligently by computer. Computers, smart devices and the like may need to process thousands of pieces of information every day. Smart devices generally analyze a corpus to obtain corresponding corpus regular expressions, which are then used to parse the corpus. However, corpus regular expressions written by analyzing user sentence patterns and expanding a thesaurus generalize poorly. Therefore, there is a need for a method and system for expanding the corpus regular expressions of sample corpora, so as to improve their generalization ability.

Summary of the invention

[0003] The purpose of the present inven...


Application Information

IPC(8): G06F40/30, G06F40/289, G06F16/36, G06F16/33
CPC: G06F16/367, G06F16/334
Inventor: 李选洪
Owner: GUANGDONG XIAOTIANCAI TECH CO LTD