A patent document clustering method

A patent document clustering method, applied in special data processing applications, instruments, electronic digital data processing, and related fields. It addresses the problems of missing hidden information and insufficient patent analysis, and achieves the effect of avoiding the curse of dimensionality.

Active Publication Date: 2017-10-17
DALIAN UNIV OF TECH
Cites 4; cited by 1

AI Technical Summary

Problems solved by technology

Selecting only one clustering factor for patent cluster analysis may make the analysis insufficiently comprehensive.
On the other hand, using only a single form of corpus for cluster fusion may miss a great deal of hidden information.

Examples

Embodiment

[0037] S1. Corpus collection and preprocessing:

[0038] a1. Corpus collection:

[0039] Select the automotive field and use a web crawler to collect patent document information from the "State Intellectual Property Office Patent Database" for each of the eight IPC classification categories A-H, forming a corpus. The patent document information includes the patent title, the IPC classification number, and the patent abstract. The abstracts of all patent documents in the corpus are stored as the word-vector training corpus. The abstracts of 1,000 patent documents are extracted from the corpus and stored as the attribute and attribute-value model training corpus, which contains abstracts from all eight categories A-H, 125 per category. Extract the patent titles, abstracts, and IPC classification numbers of 640 patent documents from the corpus and store them...
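The corpus construction in [0039] amounts to three splits of the crawled records. The sketch below is illustrative only: it assumes each record is a dict with hypothetical `title`, `ipc`, and `abstract` keys, that the IPC category is the first character of the classification number, and that the 640 clustering documents are drawn at random. The source does not specify the storage format or how those 640 documents are selected.

```python
import random
from collections import defaultdict

# Hypothetical record layout for one crawled patent:
# {"title": str, "ipc": str such as "B60K6/20", "abstract": str}
SECTIONS = "ABCDEFGH"  # the eight IPC categories A-H


def split_corpus(patents, per_section=125, n_cluster=640, seed=0):
    """Build the three corpora described in [0039] (illustrative sketch).

    Returns:
      word_vec_corpus: abstracts of all patents (word-vector training)
      attr_corpus:     per_section abstracts from each category A-H
                       (8 x 125 = 1,000 for the attribute model)
      cluster_set:     (title, abstract, ipc) tuples for n_cluster patents;
                       drawn uniformly at random here, which is an assumption
    """
    rng = random.Random(seed)
    word_vec_corpus = [p["abstract"] for p in patents]

    # Group patents by IPC category, i.e. the first character of the code.
    by_section = defaultdict(list)
    for p in patents:
        by_section[p["ipc"][0]].append(p)

    # Sample without replacement; raises ValueError if a category is too small.
    attr_corpus = []
    for section in SECTIONS:
        chosen = rng.sample(by_section[section], per_section)
        attr_corpus.extend(p["abstract"] for p in chosen)

    cluster_set = [(p["title"], p["abstract"], p["ipc"])
                   for p in rng.sample(patents, n_cluster)]
    return word_vec_corpus, attr_corpus, cluster_set
```

With 200 synthetic patents per category (1,600 in total), the sketch returns 1,600 abstracts for word-vector training, 1,000 for the attribute model, and 640 clustering records.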

Abstract

A patent literature clustering method comprises the following steps: S1, collecting and preprocessing a corpus; S2, extracting feature words from the corpus for cluster analysis; S3, representing the patent data as vectors for cluster analysis based on word vectors; S4, performing the clustering; and S5, evaluating the clustering result. The method jointly considers the title and abstract information of patent documents and uses the abstract information from different angles: it considers the overall information of the abstract text as well as the attributes and attribute values in the abstract, so that the implicit semantic information in patent abstracts is fully mined. Information hidden in large-scale corpora is fully exploited: the large-scale corpus is used for feature training and words are represented as low-dimensional vectors, which avoids the curse of dimensionality while extracting textual information more effectively. By setting different weights, data in three forms (the title, the abstract, and the attribute values of the abstract) are fused, yielding a good patent clustering effect.
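Steps S3 to S5 can be illustrated with a small sketch. It assumes pre-trained word vectors held in a plain dict, represents each of the three data forms (title, abstract, and attribute/attribute-value words) as the mean of its word vectors, and fuses them as a weighted sum of per-view cosine similarities. The abstract does not state the fusion formula, the weight values, or the evaluation metric, so the weighted similarity sum, the placeholder weights, and purity against IPC labels are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

VIEWS = ("title", "abstract", "attrs")  # the three data forms to fuse


def mean_vector(words, embeddings, dim):
    """Average the vectors of in-vocabulary words; zero vector if none."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def fused_similarity(doc_a, doc_b, embeddings, dim, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of per-view cosine similarities.

    Each doc maps every name in VIEWS to a list of tokens. The weights are
    hypothetical placeholders, not values taken from the patent.
    """
    return sum(
        w * cosine(mean_vector(doc_a[view], embeddings, dim),
                   mean_vector(doc_b[view], embeddings, dim))
        for w, view in zip(weights, VIEWS)
    )


def purity(cluster_ids, labels):
    """Share of documents whose label is the majority label of their cluster."""
    groups = defaultdict(list)
    for c, y in zip(cluster_ids, labels):
        groups[c].append(y)
    hits = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return hits / len(labels)
```

A pairwise matrix of fused similarities built this way could feed any similarity-based clustering algorithm. Averaging word vectors keeps every view at the embedding dimension rather than the vocabulary size.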

Description

Technical field

[0001] The invention relates to a method for clustering a patent document corpus, in particular to a method for clustering patent documents.

Background

[0002] In the current economic environment, patents play an increasingly important role in enhancing corporate value. By applying for patents, an enterprise can protect its intellectual property rights and thereby its core competitiveness. Scholars have already conducted extensive research on patent documents, such as labeling patent abstracts, extracting the key technologies of patents, and performing cluster analysis on patents.

[0003] In recent years, research on text clustering in the field of data mining has produced many results. Many of these methods represent documents as vectors and apply clustering algorithms to group and analyze them. Patent documents contain a large amount of unstructured information, so clustering can b...
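The vector-space approach described in [0003] can be sketched with term-frequency vectors and k-means. Both choices are assumptions, since the passage names no specific representation or algorithm. Each bag-of-words vector has one dimension per vocabulary word, which is the kind of high dimensionality that the method's low-dimensional word vectors are meant to avoid.

```python
import math
from collections import Counter


def bow_vectors(docs):
    """Term-frequency vectors over the shared vocabulary (one dimension per word)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, n in Counter(d.split()).items():
            v[index[w]] = float(n)
        vecs.append(v)
    return vecs


def kmeans(vectors, k, iters=20):
    """Plain k-means with Euclidean distance and a fixed number of iterations.

    Centroids start at the first k vectors, a simplification; k-means++
    seeding is the more common choice in practice.
    """
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for each document vector.
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

For example, `kmeans(bow_vectors(["engine motor engine", "motor engine", "seat belt", "belt seat seat"]), 2)` places the two engine documents in one cluster and the two seat documents in the other.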

Application Information

IPC(8): G06F17/27
Inventors: 林鸿飞 (Lin Hongfei), 孙东普 (Sun Dongpu)
Owner: DALIAN UNIV OF TECH