Patent literature similarity measurement method based on ontology

A similarity measurement technique for patent documents, applied in the field of semantic retrieval systems for patent subject terms. It addresses the problems of low recall and precision and of inconvenient retrieval, with the effect of speeding up retrieval and improving its comprehensiveness, accuracy and relevance.

Inactive Publication Date: 2017-10-13
BEIJING INSTITUTE OF TECHNOLOGY
4 Cites, 72 Cited by

AI Technical Summary

Problems solved by technology

However, the retrieval process of the above two methods does not fully consider the characteristics of the patent document data itself, which results in low recall and precision and in inconvenient retrieval.


Examples


Embodiment 1

[0039] Figure 1 is a schematic flow chart of the patent document similarity measurement method based on knowledge ontology, which includes the following steps:

[0040] Step 1): extract the core technical solution information according to the structural features, position features and keyword features of the patent documents;

[0041] Here, the structural feature of a patent document is the correspondence between each category of the patent description and its XML file tag; the position feature is the paragraph in the document from which the core technical solution information is extracted, such as the first paragraph of the summary of the invention; the keyword feature is the set of characteristic descriptive words that signal each category of core technical solution information, such as "optimization", "improvement" and "solution".
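Step 1) can be illustrated with a minimal Python sketch. The tag names in `SECTION_TAGS`, the cue-word stems in `CUE_WORDS` and the function `extract_core_solution` are hypothetical placeholders, not taken from the patent; real patent XML uses whatever tags the patent office schema defines. The sketch looks up a section by its tag (structural feature), always keeps that section's first paragraph (position feature), and keeps any further paragraph containing a cue word (keyword feature).

```python
import xml.etree.ElementTree as ET

# Structural feature: description category -> XML tag (hypothetical tag names).
SECTION_TAGS = {
    "summary": "invention-content",
    "background": "background-art",
}

# Keyword feature: stems of cue words such as "optimization", "improvement",
# "solution" (an illustrative list, not the patent's own dictionary).
CUE_WORDS = ("optimiz", "improv", "solution", "solv")


def extract_core_solution(xml_text, section="summary"):
    """Return the paragraphs of one section that look like core solution text."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//{SECTION_TAGS[section]}")
    if node is None:
        return []

    picked = []
    for index, paragraph in enumerate(node.findall("p")):
        text = (paragraph.text or "").strip()
        if not text:
            continue
        is_first = index == 0  # position feature
        has_cue = any(cue in text.lower() for cue in CUE_WORDS)  # keyword feature
        if is_first or has_cue:
            picked.append(text)
    return picked
```

Combining the position and keyword features with a logical OR is a design choice of this sketch; the source does not state how the three features are weighed against each other.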

[0042] As is well known, the Patent Law of the People's Republic of China stipulates: To apply for an i...

Abstract

The invention relates to an ontology-based method for measuring the similarity of patent documents, in the technical field of ontology-oriented natural language information processing. The method comprises the following steps: extracting the core technical solution according to the structural features, position features and keyword features of the patent documents; constructing a model of the relations between the subject terms of patent classes; constructing a domain dictionary from that model, then segmenting the core technical solution into terms and removing stop words; extracting keywords and their weights with TextRank, using TF-IDF combined with the subject-term relations as the initial term weights; training a FastText model and generating term vectors; and calculating the EMD (Earth Mover's Distance) from the keywords, term weights and term vectors to obtain the semantic distance. Compared with the prior art, the method addresses the poor similarity results that arise when the structural features, domain features, term-relation features and semantically approximate expressions of patent documents are not fully considered.
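The keyword-weighting step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the function names `tf_idf` and `textrank` are hypothetical, the subject-term relation model is omitted, and the TF-IDF weights are used as the teleport (restart) distribution of the ranking iteration. That last point is an assumption made here because, on a connected co-occurrence graph, plain TextRank converges to the same scores whatever the starting values are, so a prior only has a lasting effect if it also enters the update rule.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (smoothed IDF)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return weights


def textrank(tokens, prior, window=3, damping=0.85, iters=50):
    """Rank terms on a co-occurrence graph, biased toward the prior weights."""
    # Undirected, weighted co-occurrence graph over a sliding window.
    neighbors = defaultdict(Counter)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                neighbors[term][other] += 1
                neighbors[other][term] += 1

    vocab = sorted(set(tokens))
    total = sum(prior[term] for term in vocab)
    teleport = {term: prior[term] / total for term in vocab}

    score = dict(teleport)  # the prior is also the starting point
    for _ in range(iters):
        updated = {}
        for term in vocab:
            # Each neighbor passes on its score in proportion to edge weight.
            inflow = sum(
                score[other] * weight / sum(neighbors[other].values())
                for other, weight in neighbors[term].items()
            )
            updated[term] = (1 - damping) * teleport[term] + damping * inflow
        score = updated
    return score
```

Calling `textrank(doc, tf_idf(corpus)[i])` for the i-th tokenized document yields a term-to-weight mapping; in the pipeline described above, such weights would accompany the FastText term vectors as inputs to the EMD calculation.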

Description

Technical field

[0001] The invention discloses a patent document similarity measurement method based on knowledge ontology and a patent document subject word semantic retrieval system using the method, and relates to the technical field of patent text-oriented natural language information processing.

Background technique

[0002] Today's society is an information-based society. Massive data are generated in various fields of society. How to dig out valuable information from massive data has always been a hot spot in academic research. As a special information strategic resource, patent is an important part of the development of national strategic resources.

[0003] Patent information records the achievements of inventions and creations of human society. It integrates technology, law and economy into one, and is the most important treasure house of technical knowledge in contemporary society. Patents have the characteristics of novelty, creativity and practicability. With t...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/90332, G06F40/30
Inventors: 李建宏, 张华平
Owner: BEIJING INSTITUTE OF TECHNOLOGY