Data cleaning and integrating method and system for enterprise website

A data cleaning and enterprise website technology, applied in network data retrieval, electronic digital data processing, other database retrieval, and related fields

Status: Inactive
Publication Date: 2017-11-07
南京樯图数据科技有限公司

AI Technical Summary

Problems solved by technology

The disadvantage of the existing technology is that it lacks a way to analyze the relationships between enterprises.



Embodiment Construction

[0036] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0037] As shown in Figure 1, the data cleaning and integration method for enterprise official websites in the embodiment of the present invention includes the following steps:

[0038] Step S1: obtain the enterprise name entered by the user, call the search engine API to search by that name, collect multiple records, and obtain the pages at the returned URL links.

[0039] In one embodiment of the present invention, the number of records collected by the search engine API is determined a...
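Step S1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function `collect_candidate_urls`, the `search_fn` callback, the `"url"` record key, the default of 10 records, and the de-duplication of URLs are all assumptions introduced here, and `fake_search` is a hypothetical stand-in for a real search engine API so the example runs without network access.

```python
from typing import Callable, Dict, List

# Assumed interface: a search backend takes (query, limit) and returns
# records that carry at least a "url" key.
SearchFn = Callable[[str, int], List[Dict[str, str]]]


def collect_candidate_urls(
    enterprise_name: str, search_fn: SearchFn, max_records: int = 10
) -> List[str]:
    """Search by enterprise name and return the de-duplicated result URLs."""
    name = enterprise_name.strip()
    if not name:
        raise ValueError("enterprise name must not be empty")
    records = search_fn(name, max_records)
    urls: List[str] = []
    seen = set()
    for record in records[:max_records]:
        url = record.get("url", "").strip()
        # Skip empty links and links already collected (an assumption;
        # the source does not say how duplicates are handled).
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def fake_search(query: str, limit: int) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a real search engine API (no network)."""
    results = [
        {"title": f"{query} official site", "url": "https://example.com/"},
        {"title": f"{query} news", "url": "https://news.example.org/a"},
        {"title": f"{query} home", "url": "https://example.com/"},
    ]
    return results[:limit]
```

Passing the search backend in as a callback keeps the collection logic independent of any particular search provider; a real client would replace `fake_search` and the returned URLs would then be fetched for the scoring step.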


Abstract

The invention provides a data cleaning and integration method and system for enterprise websites. The method includes: obtaining an enterprise name entered by a user; calling a search engine to search by the enterprise name, collecting multiple records, and obtaining the returned website link pages; analyzing and scoring the pages and taking the highest-scoring page as the enterprise website; extracting and storing the hyperlink-free paragraphs of that page that rank highest by word count; calculating how often words recur in those texts and extracting words that appear with high frequency in the texts but low frequency in a corpus as enterprise keywords; searching a preset database with the enterprise keywords to obtain search results; and performing trend analysis on the search results to obtain the final enterprise evaluation data. This achieves a preliminary structuring of the relevant enterprise information and facilitates subsequent analysis and evaluation.
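The paragraph selection and keyword extraction steps of the abstract can be illustrated with a short sketch. It assumes the page has already been parsed into `(text, has_hyperlink)` pairs and that a table of relative word frequencies in a reference corpus is available. The scoring formula (term frequency times the log of the inverse corpus frequency, similar in spirit to TF-IDF), the function names, the `floor` value for unseen words, and the ASCII regex tokenizer are assumptions made for illustration; the source does not specify them, and Chinese text would need a proper word segmenter.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def top_link_free_paragraphs(
    paragraphs: List[Tuple[str, bool]], top_n: int = 3
) -> List[str]:
    """Keep paragraphs without hyperlinks; return the top_n by word count."""
    link_free = [text for text, has_link in paragraphs if not has_link]
    return sorted(link_free, key=lambda t: len(t.split()), reverse=True)[:top_n]


def extract_keywords(
    texts: List[str], corpus_freq: Dict[str, float], top_k: int = 5
) -> List[str]:
    """Rank words that are frequent in texts but rare in the corpus."""
    words = re.findall(r"[a-z]+", " ".join(texts).lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return []
    floor = 1e-6  # assumed relative frequency for words unseen in the corpus
    scores = {
        word: (count / total)
        * math.log(1.0 / max(corpus_freq.get(word, floor), floor))
        for word, count in counts.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]
```

With this scoring, a common word such as "the" gets a low score even when it recurs, while a domain term that recurs in the page but is rare in the corpus rises to the top.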

Description

Technical field

[0001] The invention relates to the technical field of Internet data processing, and in particular to a data cleaning and integration method and system for enterprise official websites.

Background

[0002] Most existing comprehensive enterprise-information websites simply list enterprise information and focus on summarizing and analyzing a single enterprise. The disadvantage of the prior art is that it lacks a way to analyze the interrelationships between enterprises. In particular, how to search massive data by the user's keywords, filter out the enterprise's official website from the results, and structure the data is a technical problem that currently needs to be solved.

Summary of the invention

[0003] The aim of the present invention is to solve at least one of said technical drawbacks.

[0004] Therefore, the object of the present invention is to propose a data cleani...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/9535, G06F16/955
Inventor: 辛柯俊
Owner: 南京樯图数据科技有限公司