A method and system for batch corpus generation

Batch corpus generation technology, applied to semantic tool creation and unstructured text data retrieval, addresses problems such as low efficiency, incomplete corpus coverage, and insufficient corpus data. It saves manpower and time, provides good scene reusability, and yields rich corpus data.

Active Publication Date: 2019-02-26
杭州快小智科技有限公司

AI Technical Summary

Problems solved by technology

[0005] (1) The data collection, statistics, and labeling stages all take considerable manpower and time, so efficiency is low;
[0006] (2) The corpus data obtained from the above data source channels is neither rich nor complete, and cannot cover all possible corpus;
[0007] (3) For different application scenarios, the data collection, statistics, and labeling stages must be carried out separately, so scenario reusability is poor.


Examples


Specific Embodiment

[0062] First context: When will buy one get one free start?!

[0063] Second context: When did buy one get one free start?

[0064] Third context: When to buy one get one free

[0065] The first, second, and third contexts above differ in sentence pattern, punctuation (possibly none at all), word count, and word order. However, their language purpose is the same: each asks for the specific time of the "buy one get one free" activity. After the language purpose of these three contexts is generalized to a higher level, the situation is named "asking the time to do something". It is named "asking the time to do something" rather than "asking the time to buy one get one free" for the sake of scene reusability: the event "buy one get one free" is further abstracted to "do something". In this example, the context "asking ...
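The generalization described in [0065] can be illustrated with a minimal Python sketch. It assumes that a situation is stored as a name plus a list of sentence patterns, and that the abstracted event is marked by an `{event}` slot. The names `SITUATION`, `PATTERNS`, `render`, and the slot syntax are hypothetical and do not come from the patent text.

```python
# Hypothetical representation: the patent does not prescribe a data format.
SITUATION = "asking the time to do something"

# Sentence patterns belonging to this situation. "{event}" is an assumed
# slot marker standing for the abstracted "do something".
PATTERNS = [
    "When will {event} start?!",
    "When did {event} start?",
    "When to {event}",
]


def render(event):
    """Fill the event slot of every pattern of the situation."""
    return [pattern.format(event=event) for pattern in PATTERNS]


# The concrete event from the example yields one sentence per pattern.
promo_contexts = render("buy one get one free")

# Because the situation is named after "do something" rather than the
# concrete event, the same patterns can be reused for another event.
other_contexts = render("free shipping")
```

Keeping the event out of the situation name is what allows the same pattern list to serve a different promotion or a different application scenario without being relabeled.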


Abstract

A method and system for generating corpus in batches. The method includes the following steps:
S1: setting a scene in which the corpus is applied;
S2: setting an intention database for the scene, wherein the intention database contains at least one intention corresponding to the scene;
S3: setting a situation database and a sentence pattern database, wherein the situation database contains at least one situation, the sentence pattern database contains at least one sentence pattern, and at least one sentence pattern is assigned to each situation;
S4: selecting at least one situation corresponding to each intention;
S5: for each sentence pattern in each situation, generating the phrases required to complete the sentence pattern according to the scene, intention, and situation to which the sentence pattern belongs;
S6: applying the phrases to the corresponding sentence patterns to obtain a primary corpus;
S7: screening the primary corpus and selecting the high-quality corpus.
The corpus generated by the invention is rich and complete, has good scene reusability, saves considerable manpower and time, and is highly practical.
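Steps S1 to S7 can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not the patented implementation: the dictionary layout, the `{slot}` syntax, the sample phrases, and the screening rule (dropping duplicates and very short sentences) are all hypothetical, since the abstract does not specify how phrases are generated or how quality is judged.

```python
from itertools import product
import string

# S1: the scene in which the corpus is applied (example value).
SCENE = "e-commerce customer service"

# S2 + S4: intentions for the scene, each mapped to its situations.
INTENTS = {"ask about a promotion": ["asking the time to do something"]}

# S3: sentence patterns assigned to each situation ("{...}" slots assumed).
PATTERNS = {
    "asking the time to do something": [
        "When will {event} start?",
        "When will {event} {verb}?",
    ],
}

# S5: phrases per (intention, situation); hand-written here as a stand-in
# for whatever phrase generation the method actually uses.
PHRASES = {
    ("ask about a promotion", "asking the time to do something"): {
        "event": ["buy one get one free", "the half-price sale"],
        "verb": ["start", "begin"],
    },
}


def slots_in(pattern):
    """Return the slot names used in a pattern, in order of appearance."""
    return [name for _, name, _, _ in string.Formatter().parse(pattern) if name]


def generate_primary_corpus():
    """S6: apply every phrase combination to every sentence pattern."""
    corpus = []
    for intent, situations in INTENTS.items():
        for situation in situations:
            table = PHRASES[(intent, situation)]
            for pattern in PATTERNS[situation]:
                names = slots_in(pattern)
                for combo in product(*(table[n] for n in names)):
                    text = pattern.format(**dict(zip(names, combo)))
                    corpus.append({"scene": SCENE, "intent": intent,
                                   "situation": situation, "text": text})
    return corpus


def screen(corpus, min_words=3):
    """S7 (assumed rule): drop duplicates and sentences that are too short."""
    seen, kept = set(), []
    for item in corpus:
        key = item["text"].lower()
        if key in seen or len(item["text"].split()) < min_words:
            continue
        seen.add(key)
        kept.append(item)
    return kept


primary = generate_primary_corpus()
selected = screen(primary)
```

Each generated sentence carries its scene, intention, and situation as labels, so the output needs no separate annotation pass. Adding one pattern or one phrase multiplies the output, which is how a small hand-built database can yield a large corpus.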

Description

Technical field

[0001] The present invention relates to the field of natural language generation, in particular to a method and system for generating corpus in batches.

Background technique

[0002] In recent years, with the development of the information age and the Internet age, the number of users and the penetration rate of e-commerce platforms have grown rapidly. On e-commerce platforms, enterprises generally provide human customer service to handle user inquiries, after-sales service, tracking, and other needs. As the number of e-commerce platform users has increased rapidly, so has the number of user needs to be handled. At the same time, the cost of traditional human customer service has continued to rise, making it difficult for human customer service to cope with the huge service demand. Therefore, in order to meet the increasing service needs of users and improve user experience, it has become the active...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36
Inventor: 胡云华郑俊成莫瑜孔委高鹏
Owner: 杭州快小智科技有限公司