
A method for extracting enterprise name keywords

A method for extracting enterprise name keywords, applied in the field of data processing. It addresses the large manual investment and technical difficulty of obtaining enterprise name keywords, and achieves high coverage.

Active Publication Date: 2021-08-03
中检美亚(厦门)科技有限公司
Cites: 18 | Cited by: 0

AI Technical Summary

Problems solved by the technology

Because enterprise names are complex and diverse, it is difficult to extract enterprise name keywords with data processing technology.
At present, enterprise name keyword data can only be screened and supplemented manually. Obtaining a large volume of enterprise name keyword data with high coverage therefore requires a great deal of manpower in practice.

Examples

Embodiment

[0036] Referring to Figure 1, the invention discloses a method for extracting enterprise name keywords, comprising the following steps:

[0037] S1. Build a basic hot-word library related to enterprise names, and tag the hot words in it to define the tag category of each hot word. The basic hot-word library is built as follows:

[0038] S11. Prepare enterprise name data in advance. In this embodiment, the enterprise name data is collected by a web crawler and contains more than 40 million enterprise names.

[0039] S12. Perform Chinese word segmentation on the enterprise name data, using the IKAnalyzer segmenter, a word segmenter, the Ansj segmenter, or the Stanford segmenter; other segmenters can of course also be used, and the present invention does not...
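The paragraph above is cut off before it explains how segmented words become hot words or what the tag categories are. The Python below is therefore only a minimal sketch under stated assumptions. It hand-fills a small tagged library, with tags assumed to mirror the non-keyword elements of a name (administrative division, industry, organizational form). It also uses a toy dictionary-substring segmenter as a stand-in for IKAnalyzer, Ansj, or Stanford. `HotWordTag`, `HOT_WORDS`, `fine_grained_segment`, the sample words, and the company name are all hypothetical. The segmenter only imitates the overlapping, fine-grained output seen in Example 1; it does not reproduce any real segmenter's algorithm.

```python
from __future__ import annotations

from enum import Enum


class HotWordTag(Enum):
    # Assumed tag categories (the non-keyword elements of an enterprise name);
    # the patent's actual tag set is not shown in this excerpt.
    REGION = "administrative division"
    INDUSTRY = "industry"
    ORG_FORM = "organizational form"


# Hypothetical basic hot-word library: hot word -> tag category.
HOT_WORDS: dict[str, HotWordTag] = {
    "厦门": HotWordTag.REGION,        # Xiamen
    "信息": HotWordTag.INDUSTRY,      # information
    "科技": HotWordTag.INDUSTRY,      # technology
    "信息科技": HotWordTag.INDUSTRY,  # information technology
    "有限公司": HotWordTag.ORG_FORM,  # Co., Ltd.
    "公司": HotWordTag.ORG_FORM,      # company
}


def fine_grained_segment(text: str, vocab: set[str], max_len: int = 8) -> list[str]:
    """Toy stand-in for a fine-grained segmenter (hypothetical, not IKAnalyzer).

    Emits every vocabulary word found in text, overlaps included, ordered by
    start offset and, at the same offset, longest first. Characters covered
    by no vocabulary word are skipped.
    """
    words: list[str] = []
    for start in range(len(text)):
        for end in range(min(len(text), start + max_len), start, -1):
            if text[start:end] in vocab:
                words.append(text[start:end])
    return words


# Hypothetical vocabulary: the hot words plus two ordinary words.
VOCAB = set(HOT_WORDS) | {"星海", "云图"}
# Hypothetical name, roughly "Xiamen Xinghai Yuntu Information Technology Co., Ltd."
segments = fine_grained_segment("厦门星海云图信息科技有限公司", VOCAB)
# ["厦门", "星海", "云图", "信息科技", "信息", "科技", "有限公司", "公司"]
```

Real segmenters apply their own dictionaries and models. The point here is only the output shape: overlapping candidate words that the later matching step can filter against the library.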

Example 1

[0060] 1. In step S2, the user inputs "Xiamen Meiya Shangding Information Technology Co., Ltd.", and the word segmentation result is:

[0061] {Xiamen, Xiamen City, Meiya, Yashang, Information Technology Co., Ltd., Information, Technology Co., Ltd., Technology Co., Ltd., Technology, Co., Ltd., Co., Ltd.}

[0062] 2. In step S3, the resulting array arrs_a (that is, the segmented words that match the basic hot-word library) is:

[0063] {Xiamen, Xiamen City, Information Technology Co., Ltd., Information, Technology Co., Ltd., Technology Co., Ltd., Technology, Co., Ltd.}

[0064] 3. In step S4, the sorted array arrs_a is:

[0065] {Information Technology Co., Ltd., Technology Co., Ltd., Technology Co., Ltd., Xiamen City, Company, Technology, Information, Xiamen}

[0066] 4. In step S5, the blank operation process is as follows:

[0067]

[0068] The final result is: Meiya Shangding.

[0069] 5. In step S6, it is determined that the length of "Meiya Shangding" is greater than 2,...
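Steps S4 and S5 of Example 1 can be sketched in Python as follows. The sketch assumes three things: S5 replaces every occurrence of each word with the empty string (Example 2's empty result suggests blanks are empty strings), "position" means the word's first occurrence in the name, and each word in arrs_a occurs in the name. The tie-break direction for equal-length words is not stated. The translated sorted array in Example 1 appears to list equal-length words from the end of the name backwards (Company, Technology, Information, Xiamen), so the direction is exposed as a hypothetical `later_first` flag. `sort_hot_words`, `blank_out`, and the sample name are illustrative, not the patent's.

```python
from __future__ import annotations


def sort_hot_words(arrs_a: list[str], name: str, later_first: bool = False) -> list[str]:
    """S4 sketch: longest words first; equal lengths ordered by position in name.

    The tie-break direction is an assumption, exposed as `later_first`.
    """
    def key(word: str) -> tuple[int, int]:
        pos = name.find(word)  # first occurrence; words in arrs_a come from name
        return (-len(word), -pos if later_first else pos)

    return sorted(arrs_a, key=key)


def blank_out(name: str, words: list[str]) -> str:
    """S5 sketch: replace each word with "" in the given order.

    Assumes every occurrence of a word is blanked, not just the first.
    """
    for word in words:
        name = name.replace(word, "")
    return name


# Hypothetical name shaped like Example 1 (not the patent's data).
NAME = "厦门星海云图信息科技有限公司"
ARRS_A = ["厦门", "信息科技", "信息", "科技", "有限公司", "公司"]

ordered = sort_hot_words(ARRS_A, NAME)
# ["信息科技", "有限公司", "厦门", "信息", "科技", "公司"]
keyword = blank_out(NAME, ordered)  # "星海云图"
passes_s6 = len(keyword) > 2  # True; what S6 does next is truncated in the excerpt

# Blanking "有限" (limited) before "有限公司" splits the longer word apart.
residue = blank_out("星海云图有限公司", ["有限", "有限公司"])  # "星海云图公司"
clean = blank_out(
    "星海云图有限公司",
    sort_hot_words(["有限", "有限公司"], "星海云图有限公司"),
)  # "星海云图"
```

The last two results suggest one plausible motivation for sorting by length, though this excerpt does not state it. Blanking a shorter hot word first can break a longer one and leave part of it behind.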

Example 2

[0071] 1. The user enters "Xiamen Beichen Shanchuan Culture Communication Co., Ltd.", and steps S2 to S6 are executed. Every part of the company name is replaced with blanks, so the result is the empty string "", and step S7 is executed.

[0072] 2. The execution process of step S7 is:

[0073]

Abstract

The invention discloses a method for extracting enterprise name keywords, comprising the following steps: building a basic hot-word library related to enterprise names; performing Chinese word segmentation on the enterprise name entered by the user and outputting the segmentation result; declaring a new array arrs_a and traversing the segmentation result, adding each word that matches a hot word in the basic hot-word library to arrs_a; sorting arrs_a by word length and then by word position; and traversing the sorted arrs_a, blanking each of its words out of the enterprise name in turn, and taking the final remaining word as the keyword of the enterprise name. The invention can quickly extract keywords from enterprise names, making it easy to obtain enterprise name keyword data in large volume and with high coverage.
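The steps in the abstract can be combined into a compact end-to-end sketch. It makes several assumptions:

- the basic hot-word library is a plain set;
- the segmenter is passed in as a callable (here a hypothetical `toy_segment` over a tiny vocabulary);
- length ties are broken by earlier position;
- S6 means "accept the remainder only if it is longer than 2 characters", based on Example 1's check.

Returning `None` only marks the point where the patent hands over to S7, whose details this excerpt does not include. `extract_keyword`, `toy_segment`, and the sample names are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional


def extract_keyword(
    name: str,
    hot_words: set[str],
    segment: Callable[[str], list[str]],
    length_threshold: int = 2,
) -> Optional[str]:
    """Sketch of the abstract's pipeline; None marks the hand-off to S7."""
    # S2: segment the enterprise name entered by the user.
    segments = segment(name)
    # S3: arrs_a collects the segments that match the basic hot-word library.
    arrs_a = [word for word in segments if word in hot_words]
    # S4: longest first, then by position (tie direction assumed: earlier first).
    arrs_a.sort(key=lambda word: (-len(word), name.find(word)))
    # S5: blank every hot word out of the name, in sorted order.
    remainder = name
    for word in arrs_a:
        remainder = remainder.replace(word, "")
    # S6 (assumed reading): keep the remainder only if it is longer than the
    # threshold; the S7 fallback is not described in this excerpt.
    return remainder if len(remainder) > length_threshold else None


def toy_segment(text: str) -> list[str]:
    # Hypothetical stand-in for a real Chinese segmenter: every vocabulary
    # substring of text, overlaps included.
    vocab = {"厦门", "星海", "云图", "信息", "科技", "信息科技", "有限公司", "公司"}
    return [
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if text[i:j] in vocab
    ]


HOT_WORDS = {"厦门", "信息", "科技", "信息科技", "有限公司", "公司"}

found = extract_keyword("厦门星海云图信息科技有限公司", HOT_WORDS, toy_segment)  # "星海云图"
# Shaped like Example 2: every part is a hot word, so nothing survives.
missing = extract_keyword("厦门信息科技有限公司", HOT_WORDS, toy_segment)  # None
```

Swapping `toy_segment` for a real segmenter changes only the candidate list fed to S3. The matching, sorting, and blanking steps stay the same.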

Description

Technical field

[0001] The invention relates to the technical field of data processing, and in particular to a method for extracting enterprise name keywords.

Background

[0002] The enterprise name keyword is the most important part of an enterprise name and a core data asset of the enterprise, and it plays an important role in processing enterprise data. If keywords can be extracted quickly from collected enterprise names, they can be provided to third-party systems for other purposes, including but not limited to search engines, crawlers, and public opinion analysis.

[0003] An enterprise name usually consists of four elements: administrative division, trade name, industry, and organizational form, of which the trade name is the core of the enterprise name keyword. Due to the complexity and diversity of enterprise names, it is more dif...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/284; G06F16/2458
CPC: G06F16/2462; G06F40/284
Inventors: 郑旭, 王志永, 郭建辉, 林文东, 吴少茂
Owner: 中检美亚(厦门)科技有限公司