A corpus training method, device, electronic equipment and storage medium

A corpus training method, applied in semantic tool creation, digital data processing, natural language data processing, and related fields, that addresses the low accuracy of learned vectors by increasing the data volume and thereby improving accuracy.

Active Publication Date: 2020-05-05
GUANGZHOU LIZHI NETWORK TECH CO LTD

AI Technical Summary

Problems solved by technology

[0007] Embodiments of the present invention propose a corpus training method, device, electronic equipment, and storage medium to solve the problem that vectors learned for geographic regions such as cities have low accuracy when there is not enough data to fit them.


Examples

Detailed Description of the Embodiments

[0077] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0078] Referring to Figure 1, which shows a flow chart of the steps of a corpus training method according to an embodiment of the present invention, the method may specifically include the following steps:

[0079] Step 101, obtaining original corpus.

[0080] In the embodiment of the present invention, the original corpus can be obtained by crawling the Internet with a spider, by downloading manually classified documents of dialect statistics, and the like.

[0081] In a specific implementation, the original corpus includes geographical areas and the languages that are used in those areas and have affiliation relationships.

[0082] Wherein, geographical areas can be divided according to administrative areas, such as provi...


Abstract

Embodiments of the invention provide a corpus training method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: obtaining an original corpus that comprises geographic areas and the languages that are used in those areas and have affiliation relationships; taking the languages as nodes and generating a language tree according to the affiliation relationships; assigning the geographic areas to the nodes in the language tree; and training on the geographic areas in the same node as a target corpus. Because different geographic areas are associated with languages at different levels to form training samples, the data volume of the samples increases, which in turn improves the accuracy of the vectors learned for the geographic areas.
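The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the sample areas and languages, the names `AREA_LANGUAGE`, `PARENT`, `build_language_tree`, `ancestors`, and `assign_areas` are all hypothetical, and the choice to place each area in its own language node and in every ancestor node is an assumption made to reflect "languages at different levels". The training step itself is left out, since this excerpt does not specify the model.

```python
from collections import defaultdict

# Hypothetical original corpus: (geographic area, language used there).
AREA_LANGUAGE = [
    ("Guangzhou", "Cantonese"),
    ("Foshan", "Cantonese"),
    ("Meizhou", "Hakka"),
    ("Beijing", "Mandarin"),
]

# Hypothetical affiliation relationships: child language -> parent language.
PARENT = {
    "Cantonese": "Chinese",
    "Hakka": "Chinese",
    "Mandarin": "Chinese",
}


def build_language_tree(parent):
    """Return {language: [child languages]}, with every language as a node."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
        children.setdefault(child, [])  # leaves appear as nodes too
    return dict(children)


def ancestors(language, parent):
    """Yield the language itself, then each ancestor up to the root."""
    while language is not None:
        yield language
        language = parent.get(language)


def assign_areas(area_language, parent):
    """Assign each area to its language node and (by assumption) its ancestors."""
    node_areas = defaultdict(list)
    for area, language in area_language:
        for node in ancestors(language, parent):
            node_areas[node].append(area)
    return dict(node_areas)


tree = build_language_tree(PARENT)
node_areas = assign_areas(AREA_LANGUAGE, PARENT)

# Each node's list of areas is one target corpus for a downstream trainer.
for node, areas in node_areas.items():
    print(node, areas)
```

Under this assumption, higher-level nodes collect the areas of all their descendants, so an area with little data of its own still appears in larger samples at coarser levels of the tree.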

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, and in particular to a corpus training method, device, electronic equipment and storage medium.

Background

[0002] In natural language processing, geographical regions such as cities are a commonly used type of corpus.

[0003] For example, in an information recommendation system, the city where the user is located is obtained and input into a prediction model (such as a neural network) as a user feature to predict the user's interest in certain information.

[0004] When a corpus of cities is processed, features of the id type are digitized and vectorized, that is, a city is converted into a floating-point value as input.

[0005] The general approach is to treat each city and province as an id type, represent it with an int value, apply one-hot mapping to obtain a one-hot vector representation, and then rely on a large amount of data to learn the w...
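The conventional id-to-one-hot mapping described in [0005] can be sketched as follows. This is only an illustration of the background technique: the city list and the helper names `build_id_map` and `one_hot` are hypothetical, and whatever the model learns on top of these vectors is outside the sketch.

```python
def build_id_map(names):
    """Assign each distinct name an int id in first-seen order."""
    ids = {}
    for name in names:
        ids.setdefault(name, len(ids))
    return ids


def one_hot(index, size):
    """Return a float vector of length `size` with 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


# Hypothetical sample of cities treated as id-type features.
cities = ["Guangzhou", "Shenzhen", "Beijing"]
city_ids = build_id_map(cities)
vectors = {city: one_hot(i, len(city_ids)) for city, i in city_ids.items()}

for city, vec in vectors.items():
    print(city, city_ids[city], vec)
```

Because every one-hot vector is orthogonal to every other, this representation carries no notion of similarity between cities on its own, so any such structure has to come from the data.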


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/289, G06F16/36
CPC: G06F40/289
Inventor: 庄正中
Owner: GUANGZHOU LIZHI NETWORK TECH CO LTD