An Iterative Method for Text Sequences for Semantic Understanding

A technology for text sequences and semantic understanding, applied in the field of text sequence iteration for semantic understanding. It addresses problems such as the high cost of reproduction, limited scalability, and low efficiency.

Active Publication Date: 2018-10-02
Harbin Institute of Technology Artificial Intelligence Research Institute Co., Ltd. (哈尔滨工业大学人工智能研究院有限公司)

AI Technical Summary

Problems solved by the technology

[0014] The purpose of the present invention is to propose a text sequence iteration method for semantic understanding that overcomes the following shortcomings of the prior art. For semantic similarity calculation, existing methods crawl search engine hit counts, which is inefficient and makes reproduction excessively costly. They also rely on ontology libraries and semantic resources that require manual proofreading or purely manual construction, so their scalability needs improvement, which limits their practical application to a certain extent.



Examples


Specific Embodiment 1

[0034] Specific embodiment 1: This embodiment is described with reference to Figure 1. The text sequence iteration method for semantic understanding in this embodiment is carried out according to the following steps:

[0035] Step 1: Extract the background knowledge base triples and the original text triples; the original text is used to verify the model.

[0036] As the name implies, a knowledge concept is a unit that expresses a complete piece of conceptual information. As mentioned in Section 4.2, this model represents it as a triple. To enable the triples to fully express the semantic information in the text, we use Semantic Role Labeling (SRL) to extract the backbone information of the sentences in the text [19] (Liu T, Che W, Li S, et al. Semantic role labeling system using maximum entropy classifier[C]//Proceedings of the Ninth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 2005: 189-192.), mainly extracting a triple suc...
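The SRL description in [0036] is cut off before it states exactly which semantic roles make up a triple. The sketch below assumes the common convention of taking the agent (A0), the predicate, and the patient (A1) of each predicate frame, following the A0/A1 core-role labels used by PropBank-style SRL systems. The `SrlFrame` structure and `frames_to_triples` function are hypothetical and do not reproduce LTP's actual output format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A knowledge-concept triple: (agent, predicate, patient).
Triple = Tuple[str, str, str]


@dataclass
class SrlFrame:
    """Hypothetical SRL result for one predicate: role label -> argument text."""
    predicate: str
    arguments: Dict[str, str] = field(default_factory=dict)


def frames_to_triples(frames: List[SrlFrame]) -> List[Triple]:
    """Build an (A0, predicate, A1) triple from every frame that has both core roles."""
    triples: List[Triple] = []
    for frame in frames:
        agent = frame.arguments.get("A0")
        patient = frame.arguments.get("A1")
        # Frames missing either core argument cannot form a complete triple.
        if agent and patient:
            triples.append((agent, frame.predicate, patient))
    return triples


if __name__ == "__main__":
    frames = [
        SrlFrame("released", {"A0": "the company", "A1": "its annual report", "TMP": "yesterday"}),
        SrlFrame("rose", {"A1": "the share price"}),  # no A0, so it is skipped
    ]
    print(frames_to_triples(frames))
    # [('the company', 'released', 'its annual report')]
```

Adjunct roles such as the time argument are simply dropped here; a real extractor might keep them as extra context.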

Specific Embodiment 2

[0042] Specific embodiment 2: This embodiment differs from specific embodiment 1 in how the background knowledge base triples and the original text triples are extracted in Step 1. The specific process is as follows:

[0043] The experimental data set is the Internet text classification corpus provided by Sogou Lab. After preliminary filtering (using manually set rules to remove illegal characters from the articles and to discard overly long articles), 17,199 texts remain available. The corpus covers 9 categories: finance, IT, health, sports, tourism, education, recruitment, culture, and military. For each category, 200 articles are randomly selected as the test corpus, giving 1,800 original texts in total. The extraction tool is the LTP language technology platform of the Social Computing and Information Retrieval Research Center of Harbin Industrial Un...
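The test split in [0043] draws 200 random articles from each of the 9 categories, so 9 × 200 = 1,800 original texts. A minimal sketch of such a stratified draw, assuming documents are held as (category, text) pairs; the function name `sample_test_corpus` and the fixed seed are illustrative choices, since the source does not say how the random selection was implemented.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# The 9 categories listed in [0043].
CATEGORIES = ["finance", "IT", "health", "sports", "tourism",
              "education", "recruitment", "culture", "military"]

Doc = Tuple[str, str]  # (category, text)


def sample_test_corpus(docs: Sequence[Doc], per_category: int = 200,
                       seed: int = 0) -> List[Doc]:
    """Randomly draw `per_category` documents from every category present in `docs`."""
    by_category: Dict[str, List[Doc]] = defaultdict(list)
    for doc in docs:
        by_category[doc[0]].append(doc)

    rng = random.Random(seed)  # fixed seed: the same split on every run
    sample: List[Doc] = []
    for category in sorted(by_category):
        pool = by_category[category]
        if len(pool) < per_category:
            raise ValueError(f"category {category!r} has only {len(pool)} documents")
        sample.extend(rng.sample(pool, per_category))  # sampling without replacement
    return sample


if __name__ == "__main__":
    # Toy corpus: 300 placeholder documents per category.
    toy = [(c, f"{c}-doc-{i}") for c in CATEGORIES for i in range(300)]
    print(len(sample_test_corpus(toy)))  # 1800
```

Sampling per category rather than from the pooled corpus guarantees the balanced 200-per-class test set that [0043] describes.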

Specific Embodiment 3

[0046] Specific embodiment 3: This embodiment differs from specific embodiment 1 or 2 in that, in Step 3, the weight of each original text triple is set to 1 and, with the original text triples as search starting points, cosine similarity is used to calculate the semantic similarity between the real-number vector of an original text triple and the real-number vector of a background knowledge base triple. The specific process is as follows:

[0047] The semantic similarity between the real-number vector of an original text triple and the real-number vector of a background knowledge base triple is calculated as:

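[0048]

$$\text{similarity} = \cos\theta = \frac{A \cdot B}{\|A\| * \|B\|} = \frac{\sum_{i=1}^{n} A_i * B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} * \sqrt{\sum_{i=1}^{n} B_i^2}}$$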

[0049] In the formula, A is the real-number vector of a triple in the original text, B is the real-number vector of a triple in the background knowledge base, θ is the angle between A and B, · denotes the inner product of vectors, * denotes multiplication, n is the dimension of the vectors (a positive integer), and ||A|| is the n...
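The cosine similarity described in [0046] to [0049] can be sketched in plain Python as below. How the triple vectors themselves are produced is not stated in the visible text, so the vectors here are toy inputs. The `rank_background_triples` helper is a hypothetical illustration of using one original-text triple as the search starting point and scoring every background knowledge base triple against it; the weight of 1 assigned to original triples is not modelled.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = sum(a_i * b_i) / (||a|| * ||b||) for two n-dimensional vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension n")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_background_triples(query: Sequence[float],
                            background: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every background triple against one original-text triple, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in background.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))  # 1.0
    ranked = rank_background_triples([1.0, 0.0], {
        "kb_x": [1.0, 0.0],  # same direction
        "kb_y": [0.0, 1.0],  # orthogonal
        "kb_z": [1.0, 1.0],  # 45 degrees apart
    })
    print([name for name, _ in ranked])  # ['kb_x', 'kb_z', 'kb_y']
```

Because cosine similarity depends only on the angle θ, a background triple whose vector points in the same direction as the original triple scores 1 regardless of vector length.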



Abstract

The invention relates to a text sequence iteration method for semantic understanding. It addresses the following problems in the prior art: for semantic similarity calculation, existing methods crawl search engine hit counts, which is inefficient and makes reproduction excessively costly; they also depend on ontology libraries and semantic resources that require manual checking or purely manual construction, so their scalability needs improvement and their practical application is limited to some extent. The technical scheme comprises the following steps: (1) extract the original text triples and the background knowledge base triples; (2) calculate the real-number vectors of the original text triples and of the background knowledge base triples; (3) calculate the semantic similarity; (4) calculate the weights of the real-number vectors of the background knowledge base triples; (5) construct a sequence iteration model; (6) classify the texts with a support vector machine and evaluate the classification performance. The method is applied in the computer field.
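Step 6 evaluates classification performance, but the abstract does not name the metrics. As an assumption, the sketch below computes per-category F1 and its macro average over categories, a common choice for a balanced multi-class test set such as the 9-category Sogou corpus. The support vector machine classifier itself is not shown; `gold` and `pred` stand in for true and predicted category labels.

```python
from typing import Dict, Sequence


def per_class_f1(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """F1 score of every label appearing in the gold or predicted labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of the per-category F1 scores."""
    scores = per_class_f1(gold, pred)
    if not scores:
        raise ValueError("no labels to evaluate")
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    gold = ["finance", "finance", "IT", "IT"]
    pred = ["finance", "IT", "IT", "IT"]
    print({k: round(v, 4) for k, v in per_class_f1(gold, pred).items()})
    # {'IT': 0.8, 'finance': 0.6667}
    print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Macro averaging gives each category equal weight, which matches the equal 200-article share per category in the test corpus.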

Description

Technical field

[0001] The invention relates to a text sequence iteration method for semantic understanding.

Background

[0002] In traditional natural language processing and text mining research [1] (Kao A, Poteet S R. Natural language processing and text mining[M]. Springer Science & Business Media, 2007.), the main focus has been on the word, phrase, and sentence levels. However, as application scenarios become increasingly complex and technical, discourse-level semantic analysis has received more and more attention in recent years and has gradually become a research hotspot.

[0003] Text semantic analysis takes text as its research object, and its goal is to analyze the semantics behind the text. Most existing semantic analysis research focuses on the text itself and ignores the background information related to the text content. However, when writing a text, in order to better highlight the main content, the author some relevant backgroun...


Application Information

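Patent type and authority: Patent (China)
IPC (8th edition): G06F17/30
CPC: G06F16/3334, G06F16/3335, G06F16/3344, G06F16/35
Inventors: Qin Bing (秦兵), Liu Ting (刘挺), Zhang Muyu (张牧宇), Zheng Mao (郑茂), Li Jiaqi (李家琦)
Owner: Harbin Institute of Technology Artificial Intelligence Research Institute Co., Ltd. (哈尔滨工业大学人工智能研究院有限公司)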