62 results about "Web scraping" patented technology

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
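As a rough illustration of "gathering specific data and copying it" from a page, the sketch below pulls link targets and link text out of an HTML string using only Python's standard library. The names `LinkExtractor`, `extract_links`, and `SAMPLE_HTML` are illustrative assumptions; a real scraper would first fetch the page over HTTP or through a browser, which is omitted here.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) rows, ready for a spreadsheet or table."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment standing in for a downloaded document.
SAMPLE_HTML = (
    '<ul><li><a href="/a">First</a></li>'
    '<li><a href="/b"> Second </a></li></ul>'
)
rows = extract_links(SAMPLE_HTML)
```

The resulting rows could then be written to a local database or spreadsheet for later analysis.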

Method for automatically finding network content quotation

Active · CN1770159A · Speed up the auto-discovery process · Low hardware requirement · Special data processing applications · Information retrieval · Natural language understanding
The invention relates to a method for automatically finding quotations of network content. It introduces a pre-search step to accelerate the automatic discovery process and uses the indexing service provided by search engine websites, which eliminates the need to crawl web pages and build a content index. The method has low hardware requirements and helps protect the intellectual property of network content.
Owner:新方正控股发展有限责任公司 +2

Method and device for reading webpage resources, and electronic equipment

The embodiment of the invention discloses a method and a device for reading web page resources, and electronic equipment. The method applies to WebView controls on Android versions 4.0 to 4.3. It comprises the following steps: when the web page resource to be fetched, which corresponds to a received page fetching request, has finished loading, obtaining the URL (Uniform Resource Locator) information of that resource; obtaining the resource cache file path mapped to the package name of the application that constructs the current web page; extracting a binary data file under the resource cache file path and traversing it to find an information field matching the URL information; searching the information before the matched field to obtain preset symbolic information; deriving the web page resource file corresponding to the URL information from the information before the symbolic information and a file name calculation strategy; and reading that web page resource file under the resource cache file path. The method and the device can improve the efficiency of web resource utilization.
Owner:KINGSOFT

Filtering expression and rendering engine based method for automatically monitoring update of dynamic webpage

The invention discloses a method, based on a filtering expression and a rendering engine, for automatically monitoring updates to a dynamic web page. Through a visual interface, the user designates a part of the web page as a point of interest, and an application or client automatically generates a filtering expression corresponding to that point. A server renders the dynamic web page with the rendering engine to obtain the same page the user sees, and extracts the user's point of interest. When the point of interest is updated, the server promptly pushes the updated content to the user. By helping the user designate a point of interest and using the rendering engine to check for page updates on the server, the method provides a customizable program for monitoring dynamic web pages. It addresses the lack of customization in conventional information subscription modes such as RSS (Really Simple Syndication), overcomes the inability of conventional web page capture to analyze dynamic web pages, and improves the efficiency with which users obtain web page updates.
Owner:SOUTHEAST UNIV
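The monitoring loop described in this abstract can be sketched as follows. This is only an illustration under strong assumptions: the abstract does not specify the filtering expression syntax, so an element `id` stands in for it, and the rendering engine is left out entirely (the code receives already-rendered HTML as a string). `IdTextExtractor`, `extract_focus`, and `FocusMonitor` are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Collects the text inside the element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while inside the element of interest
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_focus(rendered_html, target_id):
    """Apply the stand-in filter (an element id) to rendered HTML."""
    parser = IdTextExtractor(target_id)
    parser.feed(rendered_html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


class FocusMonitor:
    """Remembers a fingerprint of the point of interest between checks."""

    def __init__(self, target_id):
        self.target_id = target_id
        self._digest = None

    def check(self, rendered_html):
        """Return the new text if the point of interest changed, else None.

        The first call only records a baseline and returns None.
        """
        text = extract_focus(rendered_html, self.target_id)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        changed = self._digest is not None and digest != self._digest
        self._digest = digest
        return text if changed else None
```

Because only the extracted point of interest is fingerprinted, changes elsewhere on the page (advertisements, timestamps) do not trigger a notification. In a deployment along the lines of the abstract, the server would call `check` on each freshly rendered page and push any non-`None` result to the user.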

Cascade crawling method and device for multi-level pages based on web crawlers

The invention relates to a cascade crawling method for multi-level pages based on web crawlers. The method comprises: crawling an upper-level page and storing the captured data in an upper-level page data analysis table; setting primary key values in that table for the objects whose lower-level pages still need to be crawled, each object having a distinct primary key value; crawling a lower-level page and storing the captured data in a lower-level page data analysis table; and setting a foreign key in the lower-level table by taking, from the upper-level table, the primary key value of the object the lower-level page belongs to. This allows upper-level and lower-level pages to be queried together once the captured data has been persisted. The method provides a data acquisition mode that can restore the logical relationships between pages, ensures the completeness of the capture, stores the data in the original page hierarchy order, and makes associated multi-level page data easy to retrieve.
Owner:厦门商集网络科技有限责任公司
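The primary key and foreign key linkage described in this abstract might look like the sketch below, which uses SQLite from Python's standard library. The table names `upper_page` and `lower_page`, the column names, and the `fetch_lower` callback are assumptions made for illustration; the patent does not name a database or schema, and real crawling is replaced by a caller-supplied function.

```python
import sqlite3


def store_cascade(conn, upper_rows, fetch_lower):
    """Store upper-level rows, then their lower-level rows linked by key.

    upper_rows  : iterable of (title, detail_url) scraped from the upper page
    fetch_lower : callable taking a detail_url and returning an iterable of
                  text items scraped from the corresponding lower-level page
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS upper_page ("
        "id INTEGER PRIMARY KEY, title TEXT, detail_url TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lower_page ("
        "id INTEGER PRIMARY KEY, "
        "upper_id INTEGER NOT NULL REFERENCES upper_page(id), "
        "body TEXT)"
    )
    for title, url in upper_rows:
        cur = conn.execute(
            "INSERT INTO upper_page (title, detail_url) VALUES (?, ?)",
            (title, url),
        )
        upper_id = cur.lastrowid  # distinct primary key for this object
        for body in fetch_lower(url):
            # The upper row's primary key becomes the lower row's foreign key.
            conn.execute(
                "INSERT INTO lower_page (upper_id, body) VALUES (?, ?)",
                (upper_id, body),
            )
    conn.commit()


def joined(conn):
    """Associated query across both levels, in original page order."""
    return conn.execute(
        "SELECT u.title, l.body FROM upper_page u "
        "JOIN lower_page l ON l.upper_id = u.id "
        "ORDER BY u.id, l.id"
    ).fetchall()


# Hypothetical scraped data standing in for real pages.
FAKE_LOWER_PAGES = {"/item/1": ["spec A", "spec B"], "/item/2": ["spec C"]}
conn = sqlite3.connect(":memory:")
store_cascade(
    conn,
    [("Item 1", "/item/1"), ("Item 2", "/item/2")],
    lambda url: FAKE_LOWER_PAGES[url],
)
```

Calling `joined(conn)` afterwards returns each lower-level item next to the upper-level object it came from, which is the associated query the abstract refers to.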