
Corpus expansion method and related equipment

A corpus expansion technique that uses grammatical information, applied to corpus expansion methods and related equipment. It addresses problems such as low expansion efficiency, and achieves a richer expanded corpus, automatic expansion, and improved expansion efficiency.

Active Publication Date: 2019-10-08
SIMPLECREDIT MICRO LENDING CO LTD

AI Technical Summary

Problems solved by technology

[0003] At present, an initial corpus is expanded mainly by hand. For example, a person brainstorms on an initial corpus entry to produce a dozen or more expanded entries that match the way the initial entry phrases its question. The expansion efficiency of this approach is low.



Examples


Embodiment Construction

[0023] In the embodiment of the present invention, the dynamic word vector of each word in the short text corpus to be expanded can be obtained, together with the content word information, function word information, and grammatical information corresponding to the short text corpus. Based on the dynamic word vectors, a candidate set of synonyms matching the content word information and the function word information is then determined. Since dynamic word vectors reflect the meanings of words in different contexts, the accuracy of the determined synonym candidate set can be improved. Further, the short text corpus can be expanded by combining the synonym candidate set and/or the grammatical information to determine the target expansion corpus corresponding to the short text corpus. In this way, on the one hand, the short text corpus can be expanded automatically, which improves expansion efficiency; on the other hand, the short text corpus is expanded in combination with the candidate se...
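The synonym-matching step described above can be illustrated with a minimal sketch. It assumes that candidates are ranked by cosine similarity between word vectors and filtered by a threshold; the patent text shown here does not state the actual matching rule, so this is only one plausible reading. The function names, the threshold, and the three-dimensional toy vectors are hypothetical stand-ins for real contextual embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def synonym_candidates(word_vec, vocab_vecs, threshold=0.9, exclude=()):
    """Return vocabulary words whose vectors are close to word_vec, best first.

    Assumption: similarity above `threshold` is treated as "synonymous".
    """
    scored = [
        (cosine(word_vec, vec), word)
        for word, vec in vocab_vecs.items()
        if word not in exclude
    ]
    return [word for score, word in sorted(scored, reverse=True) if score >= threshold]

# Toy, hand-made vectors standing in for contextual embeddings (not model output).
vocab = {
    "loan": [0.9, 0.1, 0.0],
    "credit": [0.85, 0.2, 0.05],
    "river": [0.0, 0.1, 0.95],
}
# Hypothetical vector for "loan" as it appears in a lending-related sentence.
query_vec = [0.88, 0.15, 0.02]
candidates = synonym_candidates(query_vec, vocab, threshold=0.95, exclude={"loan"})
```

Because a dynamic word vector depends on the surrounding sentence, the same word would produce a different `query_vec` in a different context and could therefore yield a different candidate set.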


Abstract

The embodiment of the invention discloses a corpus expansion method and related equipment. The corpus expansion method is applied to the technical field of data processing and comprises the following steps: obtaining a dynamic word vector of each word in a short text corpus to be expanded, together with the content word information, function word information, and grammatical information corresponding to the short text corpus; determining, from a corpus set and based on the dynamic word vectors, a synonym candidate set matching the content word information and the function word information; and expanding the short text corpus according to the synonym candidate set and/or the grammatical information to determine a target expansion corpus set corresponding to the short text corpus. With this method, the short text corpus can be expanded automatically, which improves the expansion efficiency of the short text corpus.
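The expansion step in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: it covers only synonym substitution (the grammar-based rewriting is omitted), it assumes a synonym candidate set has already been determined for each word, and the function name `expand_corpus`, the `limit` parameter, and the sample sentence are all hypothetical.

```python
from itertools import product

def expand_corpus(tokens, candidates, limit=50):
    """Generate variants by replacing each token with itself or one of its synonyms.

    tokens:     the words of the short text to expand
    candidates: mapping from a token to its synonym candidate list (assumed given)
    limit:      cap on the number of variants, since combinations grow quickly
    """
    options = [
        [tok] + [c for c in candidates.get(tok, []) if c != tok]
        for tok in tokens
    ]
    variants = []
    for combo in product(*options):
        if combo == tuple(tokens):
            continue  # skip the unchanged original sentence
        variants.append(" ".join(combo))
        if len(variants) >= limit:
            break
    return variants

tokens = ["how", "do", "I", "repay", "my", "loan"]
candidates = {"repay": ["pay back"], "loan": ["credit"]}
expanded = expand_corpus(tokens, candidates)
```

With two substitutable words that each have one synonym, the sketch yields three variants of the original question. A fuller implementation would also filter variants for fluency and apply grammatical transformations, which this sketch leaves out.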

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a corpus expansion method and related equipment.

Background

[0002] In an intelligent customer service system, understanding the user's business requires learning and recognizing the question-and-answer label data of each user through machine learning, and machine learning usually requires a certain amount of initial corpus. For business scenarios in various fields, it is often difficult to provide a large, standardized initial corpus. Therefore, when the initial corpus is insufficient, it usually has to be expanded.

[0003] At present, an initial corpus is expanded mainly by hand. For example, a person brainstorms on an initial corpus entry to produce a dozen or more expanded entries that match the way the initial entry phrases its question. The expansion efficiency of this approach is low. S...


Application Information

IPC(8): G06F16/332, G06F16/33, G06F17/27, G06F16/35
CPC: G06F16/3329, G06F16/3344, G06F16/35, G06F40/211
Inventor: 张欢韵
Owner: SIMPLECREDIT MICRO LENDING CO LTD