Chinese herbal medicine plant picture capturing method based on professional term vector of traditional Chinese medicine and pharmacy field

A word-vector technique for Chinese medicine, applied to crawling pictures of Chinese herbal medicine plants using professional word vectors from the field of Chinese medicine. It addresses problems such as unsatisfactory retrieval results.

Active Publication Date: 2016-08-03
ZHEJIANG UNIV
3 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

Therefore, the text-based image retrieval function provided by the general search engine can be used to quickly build a Chinese medi...

Examples

Embodiment

[0055] As shown in Figure 1, this embodiment implements a method for crawling pictures of Chinese herbal medicine plants based on professional word vectors in the field of Chinese medicine. Steps that are not described in detail here, such as steps 3) and 8), are performed as described in the foregoing specific implementation. The main steps adopted in this embodiment are as follows:

[0056] 1) Perform OCR on books such as "The Essence of Modern Famous Chinese Medical Cases in China", "Famous Doctors' Cases", "Continued Medical Cases" and "Continued Famous Doctors' Cases" to extract the text of the medical records. At the same time, crawl entries related to traditional Chinese medicine from encyclopedia websites such as Baidu Baike, Hudong Baike, Sogou Baike and Wikipedia.

[0057] 2) Segment the text obtained in step 1) using the CRF model and the longest word matching method, and filter out the stop words to construct a Word2Vec traini...
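The longest-word-matching and stop-word-filtering part of step 2) can be illustrated with a minimal sketch. It covers only dictionary-based forward maximum matching, not the CRF model, and the vocabulary, stop-word list and function names below are hypothetical toy values rather than anything specified in the patent.

```python
def forward_max_match(text, vocabulary, max_len=None):
    """Greedy longest-word-first segmentation against a fixed vocabulary."""
    if max_len is None:
        max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary
        # contains it; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def build_training_sentence(text, vocabulary, stop_words):
    """Segment the text, then drop stop words, yielding one token list."""
    return [t for t in forward_max_match(text, vocabulary) if t not in stop_words]


# Hypothetical toy dictionary and stop-word list (assumptions for illustration).
VOCAB = {"金银花", "清热", "解毒", "清热解毒"}
STOP_WORDS = {"的", "，", "。"}

sentence = build_training_sentence("金银花的清热解毒", VOCAB, STOP_WORDS)
print(sentence)
```

Token lists of this shape are the usual input format for word-vector training; how the patent combines the CRF output with the dictionary matches is not visible in this excerpt.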


Abstract

The invention discloses a method for capturing pictures of Chinese herbal medicine plants based on professional term vectors of the traditional Chinese medicine and pharmacy field. The method comprises the following steps. First, traditional Chinese medicine and pharmacy text data are selected and collected, including text from medicine books and dictionary entries relevant to traditional Chinese medicine and pharmacy captured from encyclopedia websites. Second, a Word2Vec model is trained on the text data to obtain professional term vectors for the field. Third, a candidate picture set and the text of each picture's source web page are acquired through the text-based image retrieval function provided by common search engines, such as Google and Baidu, and a feature vector is calculated for each source web page using a Doc2Vec model. Finally, the candidate picture set is re-ranked according to the similarity between each source web page's feature vector and the corresponding Chinese herbal medicine term vector, a number of top-ranked pictures are selected, and the selected set is de-duplicated using a perceptual hash algorithm to obtain the final picture set.
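The last step of the abstract (re-rank by vector similarity, keep the top pictures, then de-duplicate by perceptual hash) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: cosine similarity is assumed as the similarity measure, a simple average hash stands in for the unspecified perceptual hash algorithm, and the vectors, thumbnails, `top_k` and `max_distance` values are hypothetical toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(herb_vector, candidates, top_k):
    """candidates: list of (image_id, source_page_vector) pairs."""
    ranked = sorted(candidates,
                    key=lambda c: cosine_similarity(herb_vector, c[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:top_k]]


def average_hash(pixels):
    """Average hash of a small grayscale thumbnail given as a 2D list."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(h1, h2):
    """Number of positions at which two hashes differ."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))


def deduplicate(image_ids, thumbnails, max_distance=1):
    """Keep an image only if its hash is far from every image already kept."""
    kept, kept_hashes = [], []
    for image_id in image_ids:
        h = average_hash(thumbnails[image_id])
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(image_id)
            kept_hashes.append(h)
    return kept


# Hypothetical stand-ins for a Word2Vec herb vector and Doc2Vec page vectors.
herb_vector = [1.0, 0.0]
candidates = [("a", [0.9, 0.1]), ("b", [0.0, 1.0]),
              ("c", [1.0, 0.0]), ("d", [0.8, 0.2])]
# Hypothetical 2x2 grayscale thumbnails; "a" is a near-copy of "c".
thumbnails = {
    "a": [[12, 198], [11, 205]],
    "c": [[10, 200], [10, 200]],
    "d": [[200, 10], [200, 10]],
}

top = rerank(herb_vector, candidates, top_k=3)
final = deduplicate(top, thumbnails)
print(top, final)
```

Because de-duplication walks the list in ranked order, the copy that survives from each near-duplicate group is the one whose source page is most similar to the herb term vector.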

Description

Technical field

[0001] The invention relates to a method for crawling pictures of Chinese herbal medicine plants, in particular to a method for crawling pictures of Chinese herbal medicine plants based on professional word vectors in the field of Chinese medicine.

Background technique

[0002] In order to build a professional Chinese herbal medicine plant image retrieval system, it is first necessary to build a Chinese herbal medicine plant image library. With the rapid development of Internet technology and the rapid growth of image acquisition equipment, a large number of pictures of Chinese herbal plants have been generated on the Internet. Therefore, grabbing pictures of Chinese herbal plants from the Internet is an effective way to quickly build an image library of Chinese herbal plants. As people's demand for retrieving target images from massive images becomes more and more urgent, general search engines, such as Google and Baidu, provide image retrieval functions, in...

Application Information

IPC(8): G06F17/30
CPC: G06F16/5866, G06F16/9535
Inventor: 魏宝刚, 张引, 庄越挺, 谭亮
Owner ZHEJIANG UNIV