Method for automatically generating abbreviations of English paper titles in computer

A technology for automatically generating abbreviations and acronyms, applied in the field of text analysis. It addresses the problem that existing methods produce so many candidates that users find it difficult to pick out suitable abbreviations.

Inactive Publication Date: 2018-01-12
NANJING UNIV

AI Technical Summary

Problems solved by technology

However, when the content of the description becomes longer, this method will cause another problem, that ...

Examples

Embodiment 1

[0106] Suppose the title to be abbreviated is "A Second Generation RDF Query Language".

[0107] First, the title is parsed syntactically to obtain a dependency parse tree. The resulting syntax tree is shown in Figure 3.

[0108] After the dependency tree is obtained, the words are scored according to their position on the tree. It is assumed here that the weights related to the syntax tree are set to 0.99368, 0.95529, 0.44995, and 0.15046. Words in the title are scored as follows:

[0109] Table I

[0110]
Word        Score
A           1.91059
Second      1.91059
Generation  1.91059
RDF         1.91059
Query       1.91059
Language    1.98736

[0111] After the syntax tree analysis, the title is analyzed semantically. The word vectors were trained beforehand, so here they can be read directly from the saved file. Because each word vector is 300-dimensional, the word vectors of the words in the title are not listed here. After calculation,...

Embodiment 2

[0125] All code of the present invention is written in Java. The machine used has an Intel Xeon X7550 processor with a clock frequency of 2.00 GHz and 40 GB of memory. The Stanford Parser and word2vec used in the present invention are currently common open-source tools for syntactic parsing and word vector training, respectively.

[0126] More specifically, as shown in Figure 1, the present invention operates as follows:

[0127] 1. Description content analysis: use the Stanford Parser and word2vec to analyze the title and obtain a score for each word in the title, that is, the importance of the word;

[0128] 2. Use beam search to generate candidate acronyms: use beam search to explore the space of candidate abbreviations, and calculate the score of the current acronym every time the state is updated.

[0129] 3. Adjust the score of each candidate abbreviation: use the language model and the length of the abbreviation to adjust the score of the abbreviation, and output the candidates in descending order of score...
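Step 2 above can be illustrated with a minimal sketch. It is not the patented implementation: the class name `AcronymBeamSearch`, the two choices per word (skip it or take its initial letter), and the rule that a candidate's score is the sum of the weights of the words it uses are all assumptions made for illustration. The word scores are the ones from Table I of Embodiment 1, and the adjustment by language model and length (step 3) is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: beam search over "skip the word" / "take its initial" choices. */
class AcronymBeamSearch {

    /** A partial or complete candidate: the letters chosen so far and their summed word weights. */
    static final class Candidate {
        final String letters;
        final double score;

        Candidate(String letters, double score) {
            this.letters = letters;
            this.score = score;
        }
    }

    // Highest score first; ties are broken alphabetically so the output is deterministic.
    static final Comparator<Candidate> BEST_FIRST = (a, b) -> {
        int byScore = Double.compare(b.score, a.score);
        return byScore != 0 ? byScore : a.letters.compareTo(b.letters);
    };

    /** Returns at most beamWidth candidates, best first. Assumes every word is non-empty. */
    static List<Candidate> search(String[] words, double[] weights, int beamWidth) {
        List<Candidate> beam = new ArrayList<>();
        beam.add(new Candidate("", 0.0));
        for (int i = 0; i < words.length; i++) {
            char initial = Character.toUpperCase(words[i].charAt(0));
            List<Candidate> expanded = new ArrayList<>();
            for (Candidate c : beam) {
                // Choice 1 (assumed): skip this word, score unchanged.
                expanded.add(c);
                // Choice 2 (assumed): take the initial letter and add the word's weight.
                expanded.add(new Candidate(c.letters + initial, c.score + weights[i]));
            }
            // The score is recomputed on every state update; only the best states survive.
            expanded.sort(BEST_FIRST);
            beam = new ArrayList<>(expanded.subList(0, Math.min(beamWidth, expanded.size())));
        }
        return beam;
    }

    public static void main(String[] args) {
        String[] words = {"A", "Second", "Generation", "RDF", "Query", "Language"};
        double[] weights = {1.91059, 1.91059, 1.91059, 1.91059, 1.91059, 1.98736};
        for (Candidate c : search(words, weights, 5)) {
            System.out.printf("%s %.5f%n", c.letters, c.score);
        }
    }
}
```

With a purely additive score the candidate that uses every word always ranks first, which is one reason a later adjustment by abbreviation length and a language model, as in step 3, is needed to separate usable acronyms from merely long ones.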

Abstract

The invention discloses a method for automatically generating abbreviations of English paper titles in a computer. The method includes the steps of analyzing the description content to obtain the weight of each word, namely its degree of importance; according to the weight of each word, using beam search to explore candidate abbreviations in the whole abbreviation space and obtain preliminary scores for the abbreviations; and adjusting the scores of the abbreviations to obtain the final scores, then sorting the abbreviations by final score in descending order. By improving on existing abbreviation generation methods, the method avoids treating every part of the description text as equally important. Meanwhile, through syntactic parsing, semantic parsing, language models and other natural language processing techniques, the linguistic knowledge that people take into account when creating abbreviations can be learned to a certain degree, so that the generated abbreviations are more explanatory.

Description

Technical field

[0001] The invention belongs to the field of text analysis, and in particular relates to a method for automatically generating abbreviations of English paper titles in a computer.

Background art

[0002] Using abbreviations to name an item or a long text description is a very common language phenomenon. For example, the acronym IBM is often used to represent International Business Machines Corporation. Acronyms also often play an important role in scholarly communication. Usually, the full name of a method or a system needs many characters to summarize its core content, and such a name is harder for users to remember or mention. In comparison, acronyms that closely resemble words are easier for people to remember and refer to, and they can also better remind people of what they represent.

[0003] There are many ways to create acronyms. For example, SVM is an acronym for Support Vector Machine, which uses the initial...
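As a small illustration of the simplest approach mentioned in the background, taking the initial letter of every word, the following sketch builds an acronym from a phrase. The class name `InitialAcronym` and method `of` are hypothetical and are not part of the patent.

```java
/** Hypothetical sketch: build an acronym from the initial letter of every word. */
class InitialAcronym {

    /** Splits on whitespace and concatenates the upper-cased first letter of each word. */
    static String of(String phrase) {
        StringBuilder acronym = new StringBuilder();
        for (String word : phrase.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                acronym.append(Character.toUpperCase(word.charAt(0)));
            }
        }
        return acronym.toString();
    }

    public static void main(String[] args) {
        System.out.println(of("Support Vector Machine"));                      // prints SVM
        System.out.println(of("International Business Machines Corporation")); // prints IBMC
    }
}
```

The second call yields IBMC rather than the familiar IBM, which shows why treating every word equally is not enough and some words have to be weighted or dropped.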

Application Information

IPC(8): G06F17/27
Inventor: 张建兵, 黄书剑, 孙一欣, 王晓亮, 俞扬, 戴新宇, 陈家骏
Owner: NANJING UNIV