Method for generating text based on pre-trained structured data

A structured-data pre-training technique, applied in neural learning methods, electrical digital data processing, special data processing applications, and related fields, that addresses the low accuracy of text generation caused by failing to consider the implicit relationships inherent in the data.

Active Publication Date: 2019-12-24
HARBIN INST OF TECH
9 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to solve the problem that existing models do not consider the inherent implicit relationships between data when modeling tabular data, which results in low text generation accuracy. To this end, it proposes a method for generating text based on pre-trained structured data.

Examples

Specific Embodiment 1

[0053] Specific Embodiment 1: This embodiment is described with reference to Figure 1. The specific process of the method for generating text based on pre-trained structured data in this embodiment is as follows:

[0054] The implementation is carried out on the NBA game dataset Rotowire, which was proposed by the Harvard University natural language processing research group in the paper "Challenges in Data-to-Document Generation" at the EMNLP 2017 conference. The dataset consists of 4853 NBA games, and each game is paired with a news report written by a reporter.

[0055] The numerical modeling pre-training objective is constructed with hand-written rules: the data in the table are related by addition, subtraction, multiplication and division, that is, the total score of a team is composed of the scores of all the players of the team, or the total score of the players is composed of the team's four quarters are com...
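
The masking rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the entity names, the scores, the token format of the calculation sequence, and the helper names `mask_one`, `calculation_sequence` and `evaluate` are all assumptions made for the example. Only the sum rule (team total equals the sum of the player scores) and the @ mask symbol come from the text.

```python
import random

# Hypothetical (entity, attribute, value) triples for one team.
# The names and numbers are made up for illustration.
triples = [
    ("Team_X", "PTS", 30),
    ("Player_A", "PTS", 16),
    ("Player_B", "PTS", 9),
    ("Player_C", "PTS", 5),
]

MASK = "@"


def mask_one(triples, index):
    """Return a copy of the triples with the value at `index` replaced by MASK."""
    return [
        (ent, attr, MASK if i == index else val)
        for i, (ent, attr, val) in enumerate(triples)
    ]


def calculation_sequence(triples, index):
    """Build a token sequence that recovers the masked value.

    Assumed rule: team total (row 0) = sum of the player scores (rows 1..n).
    The flat token format is an assumption, not taken from the patent.
    """
    team_total = triples[0][2]
    players = triples[1:]
    if index == 0:
        # Masked team total: add up every player's score.
        tokens = []
        for _, _, val in players:
            tokens += [str(val), "+"]
        return tokens[:-1]
    # Masked player score: team total minus the other players' scores.
    tokens = [str(team_total)]
    for i, (_, _, val) in enumerate(players, start=1):
        if i != index:
            tokens += ["-", str(val)]
    return tokens


def evaluate(tokens):
    """Evaluate a flat sequence of integers joined by + or -, left to right."""
    result = int(tokens[0])
    for op, val in zip(tokens[1::2], tokens[2::2]):
        result = result + int(val) if op == "+" else result - int(val)
    return result


index = random.randrange(len(triples))
masked = mask_one(triples, index)
sequence = calculation_sequence(triples, index)
print(masked, sequence, evaluate(sequence))
```

Whichever triple is masked, evaluating the sequence reproduces the hidden value, which is what makes it usable as a self-supervised target.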

Specific Embodiment 2

[0115] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that, in step 2, the pre-training encoding part of the model is performed: all the triple information obtained in step 1 is input into the pre-training model for entity relationship modeling, to obtain the row vector row_i produced by mean pooling over all records in the same row of the table (the same row of the table belongs to one entity, so this yields an overall representation of that entity);

[0116] An example is as follows: Player A recorded 16 points, 10 rebounds, and 4 assists in a game. All the attributes of the entity Player A form one row. Assume that the i-th row holds the data of Player A and the j-th attribute is the score, so that r_{i,j} indicates that Player A scored 16 points in this game. The final modeling goal is for the vector of the 16-point record to integrate the information of all the data of Player A, that is, to measure whether the score of 16 poi...
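
A minimal sketch of the mean-pooling step for one row is given below. The 2-dimensional record vectors are placeholder numbers chosen for readability; in the method they would come from the pre-training encoder, whose details are not given here. The function name `mean_pool` and the attribute keys are assumptions for illustration.

```python
def mean_pool(record_vectors):
    """Average a list of equal-length vectors element-wise."""
    if not record_vectors:
        raise ValueError("need at least one record vector")
    dim = len(record_vectors[0])
    if any(len(vec) != dim for vec in record_vectors):
        raise ValueError("all record vectors must have the same length")
    count = len(record_vectors)
    return [sum(vec[k] for vec in record_vectors) / count for k in range(dim)]


# Hypothetical encodings r_{i,j} of Player A's records (row i of the table).
player_a_records = {
    "PTS": [1.0, 0.0],  # 16 points
    "REB": [0.0, 1.0],  # 10 rebounds
    "AST": [2.0, 2.0],  # 4 assists
}

# row_i: one vector summarising the whole entity.
row_a = mean_pool(list(player_a_records.values()))
print(row_a)
```

Because every record in the row contributes equally to the pooled vector, it gives a single entity-level summary against which an individual record such as the 16-point score can be compared.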

Specific Embodiment 3

[0132] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 or 2 in that, in step 3, to generate the implicit calculation sequence for @, the decoder must decode at each time step to produce the content of the calculation sequence. At decoding time t, there are two ways to obtain a token: one is to copy it from the 602 triples (the copy probability), and the other is to select a word from the vocabulary (the generation probability);

[0133] The calculation sequence for @ is formed by generating text (vocabulary generation) or by copying the value of a triple (triple copy), until all of them are input into the pre-training model, and the parameters of the pre-training model are retained; retaining the parameters is equivalent to retaining the ability the model obtained through pre-training. The process is:

[0134] Pass the hidden layer of the decoder LSTM at the current mom...
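
The two decoding routes can be illustrated with a small sketch. The text states only that a copy probability and a generation probability exist; combining them as a weighted mixture gated by `p_copy`, in the style of pointer-generator decoders, is an assumption made here, as are the function names, the toy vocabulary and all scores. In a real model the scores would be computed from the decoder LSTM hidden state.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(vocab, vocab_scores, triple_values, copy_scores, p_copy):
    """Mix a generation distribution and a copy distribution into one.

    p_copy is the (assumed) gate: the weight given to copying a triple value;
    1 - p_copy is the weight given to generating from the vocabulary.
    """
    if not 0.0 <= p_copy <= 1.0:
        raise ValueError("p_copy must lie in [0, 1]")
    generate = softmax(vocab_scores)
    copy = softmax(copy_scores)
    dist = {}
    for token, prob in zip(vocab, generate):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_copy) * prob
    for value, prob in zip(triple_values, copy):
        token = str(value)
        dist[token] = dist.get(token, 0.0) + p_copy * prob
    return dist


# Hypothetical operator vocabulary and triple values for one decoding step.
vocab = ["+", "-", "<eos>"]
triple_values = [30, 16, 9]
dist = decode_step(
    vocab,
    vocab_scores=[2.0, 0.5, 0.1],
    triple_values=triple_values,
    copy_scores=[0.2, 3.0, 0.1],
    p_copy=0.7,
)
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

With a gate this close to copying, the step emits a number taken from the table rather than an operator; a low `p_copy` would favour `+` or `-` instead.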

Abstract

The invention discloses a method for generating text based on pre-trained structured data, and relates to methods for generating text from structured data. The objective of the invention is to solve the problem of low text generation accuracy that arises because existing models, when modeling table data to generate text from structured data, do not consider the internal implicit relationships between the data. The method comprises the following steps: 1, randomly masking one piece of data in one triple among a plurality of triples, and replacing the data with a symbol; obtaining the implicit calculation sequence characterizing the symbol according to the calculation relationships between the data in the table; 2, obtaining the row vectors of all records in the same row of the table after mean pooling; 3, obtaining a pre-training model, and retaining the parameters of the pre-training model; 4, obtaining a table row vector; 5, verifying the pre-training model from step 3; 6, obtaining the row vectors of all records in the same row of the table after pooling; and 7, obtaining the information represented by the data in the table. The method is used in the field of text generation.

Description

Technical field

[0001] The invention relates to a method for generating text from structured data.

Background

[0002] In current published research on generating text from structured data, the quality of the generated text often depends on the model's ability to model the magnitude of numbers and the relationships between numbers. Language models such as BERT and ELMo use pre-training to strengthen the contextual relationships of each word in a sentence. Their pre-training is based on text, so the model acquires through pre-training the relationships expressed in text (including part of speech, verb-object relationships, subject-predicate-object relationships, the meaning of a word in different contexts, and other information), whereas table-to-text lacks similar relationships because the input is individual triples rather than text. Th...

Application Information

IPC(8): G06F17/22, G06F17/24, G06F17/27, G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/08
Inventor: 冯骁骋, 秦兵, 刘挺, 陈昱宇
Owner: HARBIN INST OF TECH