
Enterprise name keyword extraction method

An enterprise name keyword extraction method, applied in the field of data processing, that addresses the heavy investment and difficulty of compiling enterprise name keyword data and achieves high coverage.

Active Publication Date: 2018-03-02
中检美亚(厦门)科技有限公司
Cites: 18. Cited by: 6.

AI Technical Summary

Problems solved by technology

Because enterprise names are complex and diverse, extracting enterprise name keywords with data processing technology is difficult.
At present, enterprise name keyword data can only be screened and supplemented manually, so obtaining a large volume of keyword data with high coverage requires a great deal of manpower in practice.



Examples


Embodiment

[0036] Referring to Figure 1, the invention discloses a method for extracting enterprise name keywords, comprising the following steps:

[0037] S1. Build a basic hot word library related to enterprise names, and tag each hot word in the library to define its tag category. The basic hot word library is built as follows:

[0038] S11. Prepare enterprise name data in advance. In this embodiment, the enterprise name data is collected by a web crawler and contains more than 40 million enterprise names.

[0039] S12. Perform Chinese word segmentation on the enterprise name data, using the IKAnalyzer, Ansj, or Stanford word segmenter. Other word segmenters can of course also be used; the present invention does not...
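Step S1 describes hot words that each carry a tag category. The following is a minimal sketch of one way such a tagged library might be represented. The `HotWord` class, the `tag_of` helper, the sample entries, and the tag names (loosely modeled on the administrative division, industry, and organizational form elements of an enterprise name) are all hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HotWord:
    """One entry of the basic hot word library (hypothetical layout)."""
    text: str  # the hot word itself
    tag: str   # its tag category; the category names here are assumptions


# Hypothetical sample entries; a real library would be derived from the
# segmented enterprise name data prepared in steps S11 and S12.
LIBRARY = {
    entry.text: entry
    for entry in [
        HotWord("Xiamen", "administrative_division"),
        HotWord("Technology", "industry"),
        HotWord("Co., Ltd.", "organizational_form"),
    ]
}


def tag_of(word: str) -> Optional[str]:
    """Return the tag category of a hot word, or None if it is not a hot word."""
    entry = LIBRARY.get(word)
    return entry.tag if entry is not None else None


print(tag_of("Xiamen"))  # a hot word, so a tag category is returned
print(tag_of("Meiya"))   # not in the library, so None
```

Keying the library by the word text keeps the membership test used in the later matching step a constant-time lookup.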

Example 1

[0060] 1. In step S2, the user inputs "Xiamen Meiya Shangding Information Technology Co., Ltd.", and the word segmentation result is:

[0061] {Xiamen, Xiamen City, Meiya, Yashang, Information Technology Co., Ltd., Information, Technology Co., Ltd., Technology Co., Ltd., Technology, Co., Ltd., Co., Ltd.}

[0062] 2. In step S3, the obtained array arrs_a (that is, the segmented words that match the basic hot word library) is:

[0063] {Xiamen, Xiamen City, Information Technology Co., Ltd., Information, Technology Co., Ltd., Technology Co., Ltd., Technology, Co., Ltd.}

[0064] 3. In step S4, the sorted array arrs_a is:

[0065] {Information Technology Co., Ltd., Technology Co., Ltd., Technology Co., Ltd., Xiamen City, Company, Technology, Information, Xiamen}

[0066] 4. In step S5, the blank-replacement process is as follows:

[0067]

[0068] The final result is: Meiya Shangding.

[0069] 5. In step S6, it is determined that the length of "Meiya Shangding" is greater than 2,...
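Steps S3 to S5 of this example can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it works on the English rendering of the name rather than the original Chinese, `HOT_WORDS` is a hypothetical library, the segment list is a simplified version of the one in [0061], "position" is taken to be the index in the segmentation result, equal-length ties are assumed to put the later word first, and "blank replacement" is assumed to mean deleting the matched word. Because translated word lengths differ from the Chinese originals, the intermediate order need not match [0065].

```python
# Hypothetical hot word library; a real one would come from step S1.
HOT_WORDS = {
    "Xiamen", "Xiamen City", "Information Technology Co., Ltd.",
    "Information", "Technology Co., Ltd.", "Technology", "Co., Ltd.",
}


def match_hot_words(segments, hot_words):
    """S3: collect the segmented words that appear in the hot word library."""
    arrs_a = []
    for segment in segments:
        if segment in hot_words:
            arrs_a.append(segment)
    return arrs_a


def sort_matches(arrs_a):
    """S4: longest word first; for equal lengths, later position first (assumed)."""
    indexed = list(enumerate(arrs_a))
    indexed.sort(key=lambda pair: (-len(pair[1]), -pair[0]))
    return [word for _, word in indexed]


def blank_out(name, sorted_words):
    """S5: delete each matched word from the name in turn, then tidy whitespace."""
    remaining = name
    for word in sorted_words:
        remaining = remaining.replace(word, "")
    return " ".join(remaining.split())


name = "Xiamen Meiya Shangding Information Technology Co., Ltd."
segments = [
    "Xiamen", "Xiamen City", "Meiya", "Yashang",
    "Information Technology Co., Ltd.", "Information",
    "Technology Co., Ltd.", "Technology", "Co., Ltd.",
]

arrs_a = match_hot_words(segments, HOT_WORDS)
keyword = blank_out(name, sort_matches(arrs_a))
print(keyword)  # Meiya Shangding
```

The length check of step S6 and the fallback of step S7 are left out because their details are not available in the text above.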

Example 2

[0071] 1. The user inputs "Xiamen Beichen Shanchuan Culture Communication Co., Ltd.", and steps S2-S6 are executed. Every part of the company name is replaced with blanks, so the result is "", and step S7 is executed.

[0072] 2. The execution process of step S7 is:

[0073]


Abstract

The invention discloses an enterprise name keyword extraction method. The method comprises the following steps: establishing a basic hot word library related to enterprise names; performing Chinese word segmentation on the enterprise name input by a user and outputting a word segmentation result; declaring a new array arrs_a and traversing the word segmentation result, adding each segmented word that matches a hot word in the basic hot word library to arrs_a; sorting arrs_a by word length and then by the position of each segmented word; and traversing the sorted arrs_a, performing a blank-replacement operation on the enterprise name for each segmented word in turn, and taking the word that finally remains as the enterprise name keyword. The keyword can thus be extracted quickly from the enterprise name, making it convenient to obtain enterprise name keyword data with large volume and high coverage.
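The abstract sorts arrs_a by word length before the replacement pass. A small sketch of why that ordering can matter when one hot word contains another, assuming the replacement deletes each matched word from the name. The sample name, the hot words, and the `strip_words` helper are hypothetical.

```python
def strip_words(name: str, words: list[str]) -> str:
    """Delete each word from the name in the given order, then tidy whitespace."""
    remaining = name
    for word in words:
        remaining = remaining.replace(word, "")
    return " ".join(remaining.split())


# Hypothetical input: "Xiamen" is a substring of the longer hot word "Xiamen City".
name = "Xiamen City Meiya Co., Ltd."
hot = ["Xiamen", "Xiamen City", "Co., Ltd."]

longest_first = sorted(hot, key=len, reverse=True)
shortest_first = sorted(hot, key=len)

print(strip_words(name, longest_first))   # Meiya
print(strip_words(name, shortest_first))  # City Meiya
```

Removing the shorter word first breaks up the longer hot word, so it can no longer be matched and a fragment of it is left in the result. Processing the longest words first avoids this.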

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a method for extracting enterprise name keywords.

Background technique

[0002] The enterprise name keyword is the most important part of an enterprise name and is also a core data asset of the enterprise. It plays an important role in processing enterprise data. If the keyword can be quickly extracted from collected enterprise names, it can be provided to third-party systems for other purposes, including but not limited to search engines, crawlers, and public opinion analysis.

[0003] An enterprise name usually consists of four elements: administrative division, trade name, industry, and organizational form, among which the trade name is the core part of the enterprise name keyword. Due to the complexity and diversity of enterprise names, it is more dif...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F16/2462, G06F40/284
Inventor: 郑旭, 王志永, 郭建辉, 林文东, 吴少茂
Owner: 中检美亚(厦门)科技有限公司