60 results about "Web extraction" patented technology

Automated identification of phishing, phony and malicious web sites

A method and system for automated identification of phishing, phony, and malicious web sites are disclosed. According to one embodiment, a computer implemented method, comprises receiving a first input, the first input including a universal resource locator (URL) for a webpage. A second input is received, the second input including feedback information related to the webpage, the feedback information including an indication designating the webpage as safe or unsafe. A third input is received from a database, the third input including reputation information related to the webpage. Data is extracted from the webpage. A safety status is determined for the webpage, including whether the webpage is hazardous by using a threat score for the webpage and the second input, wherein calculating the threat score includes analyzing the extracted data from the webpage. The safety status for the webpage is reported.
Owner:CLOUDFLARE

Managing and indexing content on a network with image bookmarks and digital watermarks

A method of managing content, and in particular, managing content on the Internet retrieves a web page that includes an image and detects whether the image included within the web page is embedded with a digital watermark. It generates an indicia associated with an image included in the web page that is embedded with a digital watermark. The indicia indicate to the user which images include watermarks. The watermarks may be used to convey links to related web pages or specific information about the images, such as usage rights and licensing information. Variations of this method create image bookmarks to web pages including images using thumbnails of those images. A content management system comprises a first program for retrieving web pages including images. It also includes a second program for extracting an image from a web page, creating a thumbnail of the image, and forming an image bookmark linking the thumbnail to the web page that the image has been extracted from. The thumbnails are used to create a visual index to corresponding web pages from which the images originated on the Internet. A method of visual indexing of content on a network, such as the Internet, retrieves a web page, extracts an image included on the web page, generates a thumbnail of the image, and creates a link between the thumbnail and a location of the web page from which the image has been extracted.
Owner:DIGIMARC CORP

Method and system for extracting information from web pages

A crawler collects webpage data and obtains a list of URL's of interest used to construct a searchable index. The HTML stream is received for each relevant URL and each HTML stream is imported onto a browser or rendering engine so as to render the page. From the browser, the run-time data structure for each page is obtained. From the run-time data structure, layout information of the webpage is obtained. The layout information can include location and size of images, text, video clips, banners, etc. Using various heuristics, selected items of interest are identified as relevant according to their associated layout information. Then, when a query is received and a match is found in the index, only the information identified as relevant is fetched and presented to the user.
Owner:BRILLIANT SHOPPER

Computer method and apparatus for extracting data from web pages

Computer method and apparatus for extracting information from a Web page is disclosed. The invention apparatus is formed of an extractor coupled to receive Web pages from a source. The extractor uses natural language processing to extract desired information from the Web page. A storage subsystem receives from the extractor the extracted desired information and stores the extracted desired information in a database. The invention method for extracting data from a Web page includes the computer implemented steps of (i) using natural language processing, finding possible formal names on a given Web page, (ii) using pattern matching, searching the given Web page for formal names not found by the natural language processing, and (iii) refining a combined set of the found formal names to produce a working set of people and organization names extracted from the given Web page. The refining includes determining aliases of respective people and organization names, so as to effectively reduce duplicate names.
Owner:ELIYON TECH CORP
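
The three steps in the abstract above (find names, catch missed names by pattern matching, then refine by merging aliases) can be sketched roughly as follows. This is only an illustration: the capitalized-word regex is a crude stand-in for the natural language processing step, and the honorific pattern, the last-word alias rule, and the names `find_names` and `merge_aliases` are all assumptions, not the patented method.

```python
import re

# Stand-in for the NLP step (assumption): two or more capitalized words in a row.
CAP_SEQ = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")
# Pattern-matching step (assumption): a single name that follows an honorific.
HONORIFIC = re.compile(r"\b(?:Mr|Ms|Dr)\. ([A-Z][a-z]+)\b")


def find_names(text):
    """Collect candidate names, then add pattern matches the first pass missed."""
    names = CAP_SEQ.findall(text)
    names += [n for n in HONORIFIC.findall(text) if n not in names]
    return names


def merge_aliases(names):
    """Group short forms under a longer name that ends with the same word."""
    groups = {}
    # Longest names first, so short forms attach to an existing full name.
    for name in sorted(set(names), key=lambda n: (-len(n.split()), n)):
        last = name.split()[-1]
        match = next((full for full in groups if full.split()[-1] == last), None)
        if match is None:
            groups[name] = set()
        else:
            groups[match].add(name)
    return groups
```

For a text such as "Jane Smith founded Acme Robotics. In 2020, Dr. Smith left.", the short form "Smith" is grouped under "Jane Smith" rather than kept as a separate name.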

System and method for transcoding web content for display by alternative client devices

A computer-implemented method and system for processing transactions between a client device and a web page. The system includes an adapter for receiving and interpreting a request from the client device, wherein the adapter is configured to interface with the client device. A generator retrieves a web page specified by the request. A transcoder receives the retrieved web page and applies a transcoding rule to extract data from the web page. The transcoding rule used is one of a set of predefined rules relating to the web page. The transcoder also transforms the data into a standardized form so that the adapter can then modify the standardized data into a compatible form for display by the client device. Therefore, web based transactions can be performed by a variety of client devices, including portable, wireless and voice-based devices.
Owner:D-LINK

Web browser embedded button for structured data extraction and sharing via a social network

The present invention is directed to a system and method with which users can identify database elements in a web page, store the extraction template representing the location and type of elements on the page, extract and store the product record in their collection, use the extraction template to automatically extract all the data from the web site, and continually check the extraction templates for correctness, updating them if necessary. Additionally, the system provides crowd-sourced creation of web page data record extraction templates to build a database of templates, which can then be used by others to extract the information from the web pages at the site where the extraction template(s) were created, and to save the information to a social network. Moreover, the crowd-based template creation and storage system can be used to create extraction templates for batch extraction of information from remote web sites. Also, the data record information extracted from the web page can be used to find the same or similar products at other web sites in a central product record database that is created with the previously mentioned batch extraction system.
Owner:PAPPAS DEREK EDWIN +1

Method for Extracting Data from Web Pages

Embodiments of the invention describe a computer-implemented method for extracting data from web pages. During a learning stage, the embodiments receive a template web page represented by a template Document Object Model (DOM) and select a record node, which is a root node of a sub-tree of the template DOM that contains data to be extracted. After that, a record node sub-tree and data field sub-paths are stored in a memory, wherein the record node is a root node of the record node sub-tree, and the data field sub-paths are relative paths of the template DOM from the record node to data field nodes. During the extraction stage, a web page represented by a DOM-tree is received and a matched sub-tree of the DOM-tree according to a structure of the record node sub-tree is identified. Next, data from the matched sub-tree according to the data field sub-paths are extracted.
Owner:MITSUBISHI ELECTRIC RES LAB INC
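
The learning and extraction stages described above can be sketched as follows. This is a minimal illustration under stated assumptions: pages are well-formed XML so that the standard-library parser can stand in for a DOM, sub-trees are matched on tag structure only, and the names `signature`, `learn`, and `extract` are hypothetical rather than taken from the patent.

```python
import xml.etree.ElementTree as ET


def signature(node):
    """Structural signature of a sub-tree: tag names only, text ignored."""
    return (node.tag, tuple(signature(child) for child in node))


def learn(template_page, record_path, field_paths):
    """Learning stage: store the record node's structure and the field sub-paths."""
    root = ET.fromstring(template_page)
    record = root.find(record_path)
    return signature(record), dict(field_paths)


def extract(page, record_signature, field_paths):
    """Extraction stage: find matching sub-trees and read fields by relative path."""
    records = []
    for node in ET.fromstring(page).iter():
        if signature(node) == record_signature:
            records.append(
                {name: node.find(path).text for name, path in field_paths.items()}
            )
    return records
```

A template with one `div` record (an `h2` for the name, a `p/span` for the price) is enough to pull every structurally identical `div` out of a later page, while unrelated sub-trees are skipped.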

An image search method and its search engine

Inactive | CN102270234A | Implement the extraction function | Achieve a specific effect | Special data processing applications | Image segmentation | Imaging Feature
The invention provides an image search method and a search engine thereof. The method obtains similar images by crawling pictures across the whole network, extracts the context and subject information of the pictures from the source webpages of the similar images, and finally provides comprehensive image search results according to the semantic features and visual features of the images. The image search engine includes an acquisition module, a primary search module, a secondary search module, a word segmentation module and a determination module. The acquisition module obtains the source image, the primary search module obtains a similar image set, and the secondary search module establishes a data structure of web page information of the similar image set. The word segmentation module marks the position weight of the picture context, extracts the longest phrase and marks the word weight, and the determination module extracts the core subject words and crawls relevant picture information. The present invention provides a search engine and search method that more comprehensively uses images to search subject information and related images, and users can generate different needs according to different scenarios and achieve specific effects.
Owner:BEIHANG UNIV

Web page content translator

A system, method, and computer readable medium for reformatting web content into a format readable on one or more mobile devices is provided. A user generates a user request for a web page from a mobile device to a proxy server. The proxy server forwards the user request to an origin web server, which returns the requested web page to the proxy server. A conversion engine within the proxy server extracts the desired content from the web page, and reformats the content in accordance with one or more predefined transform methods associated with the one or more mobile devices before transmitting the transformed web page with the desired content to the one or more mobile devices. Secure or unsecure connection provided via a decorated uniform resource locator can be used to connect a mobile device, the proxy server, and an origin web server.
Owner:CRFD RES

Method and apparatus for web page co-browsing

A method and apparatus for extracting information from a web page on a standard end user browser without plug-ins, includes the steps of dynamically creating an element on a web page being viewed by an end user, copying at least a portion of the contents of the web page or form field values, and uploading the data to a target domain, wherein the target domain may be different from the domain of the web page. In co-browsing applications, the data uploaded is used to create a copy of the website for display to a third party.
Owner:ORACLE OTC SUBSIDIARY

Method and system for extracting Web information based on Nutch

The invention discloses a system for extracting Web information based on Nutch. The system comprises an information extraction module, a storage module, an index module and a retrieval module, wherein the information extraction module is used for capturing webpage data from the Internet through the Nutch framework and analyzing the data; the storage module is used for storing webpage extraction files in which the webpage data is filtered; the index module is used for transmitting the webpage information collected by Nutch to Solr to establish an index; and the retrieval module is used for using Solr to respond to a user query request and displaying the query result to the user in the form of an XML page. The response and running speed, stability and expandability of information extraction are improved, the excessive storage space occupied by the program is reduced, and the user is better assured of obtaining effective information in time.
Owner:NANTONG UNIVERSITY

Information processing apparatus, and method and system for searching for reputation of content

An information processing apparatus, includes: an acquisition section configured to acquire content-related information; a keyword extraction section configured to extract a search keyword from the content-related information; a site search section configured to perform a search through the Internet for websites with a web page, and acquire information concerning websites; a first site selection section configured to select top N websites from the websites; a second site selection section configured to access each of the N websites to extract a written text from a web page of each of the N websites, and select two or more of the N websites as seed sites; and a reputation result acquisition section configured to collect written texts from the seed sites and subordinate websites linked to the seed sites, and acquire a reputation result of the content from the collected written texts.
Owner:SONY CORP

Method and device for extracting webpage content

The invention provides a method and a device for extracting webpage content. The method comprises the following steps: based on a digital document analyzing (DDA) method, extracting the webpage content of an input webpage to generate a DDA extraction result; based on a document image recognition (DIR) method, extracting the webpage content of the input webpage to generate a DIR extraction result; and merging the DDA extraction result and the DIR extraction result to generate a merging result. The method and the device can acquire better webpage extraction result compared with the prior art.
Owner:RICOH KK

Method and system for scheduling tasks of distributed network crawlers

The invention belongs to the technical field of internet search engines, and provides a method and system for scheduling tasks of distributed network crawlers. The method comprises the following steps: configuring distributed network crawler clusters; analyzing, by a first crawler, a webpage corresponding to a first layer link and extracting a second layer link existing in the webpage; distributing a crawling task corresponding to the second layer link according to a consistent hashing algorithm; recording the crawling task corresponding to the second layer link to a crawling task document corresponding to a crawler with the corresponding sequence number if the second layer link is distributed to a crawler other than the first crawler; packaging and uploading crawling task documents to a shared directory at pre-set time intervals; and having each crawler regularly extract and perform its corresponding crawling task from the shared directory. According to the invention, the cooperative task scheduling of the distributed network crawler tasks is realized through the shared directory, so that the tasks can be distributed to each crawler uniformly.
Owner:SHENZHEN COSHIP ELECTRONICS CO LTD
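
The hash-based distribution step above can be illustrated with a small consistent-hashing ring. This is a sketch, not the patented scheduler: the use of MD5, the number of virtual points per crawler, and the names `HashRing` and `assign` are assumptions, and the shared-directory upload is left out.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring with several virtual points per crawler."""

    def __init__(self, crawler_ids, replicas=100):
        self._ring = sorted(
            (_hash(f"{cid}#{i}"), cid) for cid in crawler_ids for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, url):
        """Return the crawler whose ring point follows the URL's hash."""
        idx = bisect.bisect(self._points, _hash(url)) % len(self._ring)
        return self._ring[idx][1]


def assign(urls, ring):
    """Group second-layer links into one task list per crawler."""
    tasks = {}
    for url in urls:
        tasks.setdefault(ring.owner(url), []).append(url)
    return tasks
```

One reason to prefer a ring over `hash(url) % n` is that removing a crawler only reassigns the links that crawler owned; links owned by the remaining crawlers stay where they were.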

Webpage extraction method based on attribute reproduction and labeled path

Inactive | CN102760150A | Efficiently determine the label path | Special data processing applications | Web extraction | Text string
The invention discloses a webpage extraction method based on attribute reproduction and labeled path. The web extraction method comprises the following steps of: constructing an attribute value seed set through extracting a target website or an attribute value list page, wherein part value of a target attribute is contained; acquiring a partial sample page, and determining a relative labeled path, between an attribute name and an attribute value, of each attribute; downloading a partial page, constructing a training sample base, and storing the acquired codes in a local database; inquiring and labeling all reproductions of each seed attribute value in the training webpage, recording to the labeled path corresponding to each reproduction; taking the labeled path with highest support to a same attribute webpage as an extraction rule for extracting other webpage information except the training samples; accessing other webpage HTML (Hypertext Markup Language) trees in the target website by using the acquired labeled path, locating the label where the attribute value is, and extracting a text character string; and deleting the attribute value without the attribute name or with an incorrect attribute name, and storing the correct attribute value into the local database, thereby finishing the attribute value extraction of page attribute.
Owner:NAT UNIV OF DEFENSE TECH
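
The idea of labeling every reproduction of a seed value and keeping the labeled path with the highest support can be sketched as below. The sketch simplifies the abstract considerably: pages are assumed to be well-formed XML, a path is the chain of tags with sibling positions from the root (not a path relative to the attribute name), and `walk`, `learn_path`, and `extract` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def walk(node, prefix="", index=1):
    """Yield (labeled path, stripped text) for every node in the tree."""
    path = f"{prefix}/{node.tag}[{index}]"
    yield path, (node.text or "").strip()
    seen = Counter()
    for child in node:
        seen[child.tag] += 1
        yield from walk(child, path, seen[child.tag])


def learn_path(training_pages, seed_values):
    """Count the paths where seed values reappear; keep the best-supported one."""
    support = Counter()
    for page in training_pages:
        for path, text in walk(ET.fromstring(page)):
            if text in seed_values:
                support[path] += 1
    return support.most_common(1)[0][0]


def extract(page, path):
    """Apply the learned path as an extraction rule to an unseen page."""
    return [text for p, text in walk(ET.fromstring(page)) if p == path and text]
```

A seed value that also shows up in a page heading contributes a second, weakly supported path, which the support count then outvotes.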

Hadoop cluster-based large-scale Web information extraction method and system

The invention discloses a Hadoop cluster-based large-scale Web information extraction method and a Hadoop cluster-based large-scale Web information extraction system, aimed at the problem that a single node cannot meet the requirements of large-scale Web information extraction. The method comprises the following steps: an aggregation processing node extracts the website seeds to be queried according to predetermined conditions, performs load-balancing segmentation according to the processing capacity of each query node, and transmits the seeds to be queried to each query node; each query node performs Web extraction locally according to its seeds and reports to the aggregation processing node; and the aggregation processing node aggregates the reported information to obtain large-scale Web information. According to the method and the system, massive data extraction is performed in a Hadoop cluster mode, and data are processed by a high-efficiency HBase-type memory database; the extraction efficiency is greatly improved compared with a single machine and a traditional relational database, and the reliability and the expansibility are high.
Owner:华夏文广传媒集团股份有限公司
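
The load-balancing segmentation and the aggregation step can be illustrated without any Hadoop machinery. In this sketch the proportional split, the dictionary-based reports, and the names `split_by_capacity` and `aggregate` are assumptions chosen for illustration; the abstract does not specify how capacity is measured or how reports are merged.

```python
def split_by_capacity(seeds, capacities):
    """Give each query node a contiguous share of seeds proportional to its capacity."""
    total = sum(capacities.values())
    shares, start, cumulative = {}, 0, 0
    for node, capacity in capacities.items():
        cumulative += capacity
        # Cumulative rounding guarantees the last share ends exactly at len(seeds).
        end = round(len(seeds) * cumulative / total)
        shares[node] = seeds[start:end]
        start = end
    return shares


def aggregate(reports):
    """Merge per-node reports (seed -> extracted data) into one result."""
    merged = {}
    for report in reports:
        merged.update(report)
    return merged
```

With six seeds and capacities of 2 and 1, the first node receives four seeds and the second receives two.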

Seat belt retractor for the safety belt of a motor vehicle

A seat belt retractor for a safety belt of a motor vehicle having a belt shaft (1) rotatably mounted in a frame, a profile head (2) lockable in relation to the frame, a load limiting device (20) located between the profile head (2) and the belt shaft (1) for enabling the belt shaft (1) to undergo a load limited rotation in the belt webbing extraction direction (A) with the profile head (2) being locked and a load limiting level predetermined by the load limiting device (20) being exceeded. The load limiting device (20) is formed from at least two load limiting elements (4, 5), by the activation of which the load limiting level can be switched from a lower to a higher level during the load limited belt webbing extraction, a first load limiting element (4) with a higher load limiting level is provided, which load limiting element with a first end (4a) is connected to the profile head (2), and a second load limiting element (5) with a lower load limiting level is provided, which load limiting element with a second end (5b) is connected to the belt shaft (1), and the second end (4b) of the load limiting element (4) with the higher load limiting level is connected to the first end (5a) of the load limiting element (5) with the lower load limiting level via a connecting element (6), and a coupling element (7) is provided for coupling the connecting element (6) to the belt shaft (1), via which coupling element the connecting element (6) can be coupled to the belt shaft (1) after the same has performed a rotation of a predetermined angle.
Owner:AUTOLIV DEV AB

Storm based stream computing frame text index method and system

The invention discloses a Storm based stream computing frame text index method and system. The method includes: implementing a Storm topology, designing a Storm real-time data processing framework, and completing an automatic webpage extraction program for a web spider; automatically extracting key words; and classifying texts into one or more classes according to their content or attributes. The method and the system can restore backup data in the event of data corruption or data loss; provide centralized operation and maintenance of the system; offer a convenient, visual graphical management interface; support function expansion to meet later demands for system and usage expansion; and include a fault tolerance mechanism for illegal data caused by user input or incorrect operations.
Owner:YUNNAN UNIV +2

Topic Map for Navigational Control

Included are embodiments for providing a topic map. At least one embodiment of a method includes receiving a plurality of web pages, the web pages including metadata, extracting at least a portion of the metadata from the web pages, and creating at least one topic associated with the web pages, the at least one topic corresponding to at least a portion of the metadata.
Owner:BELLSOUTH INTPROP COR
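
Extracting metadata from pages and creating topics from it can be sketched with the standard-library HTML parser. The choice of the `keywords` meta tag as the metadata source, the lowercasing of topics, and the names `MetaKeywords` and `build_topic_map` are assumptions for illustration only.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaKeywords(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]


def build_topic_map(pages):
    """Map each topic (keyword) to the URLs of the pages that declare it."""
    topics = defaultdict(list)
    for url, html in pages.items():
        parser = MetaKeywords()
        parser.feed(html)
        for keyword in parser.keywords:
            topics[keyword].append(url)
    return dict(topics)
```

The resulting dictionary can drive navigation directly: each key is a topic, and each value lists the pages to offer under it.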

Apparatus and method for sharing web contents using inspector script

An apparatus for sharing Web contents is provided. The apparatus includes a Web browser that loads and outputs a Web page, and a Web content transmission client that is linked with the Web browser to extract context information that is current state information from the Web page, and transmits the extracted context information to at least one other terminal.
Owner:ELECTRONICS & TELECOMM RES INST

Variable passenger restraint controlled system

An aspect of the present invention provides a passenger protection device that includes a brake pedal sensor configured to detect the amount by which a brake pedal of a vehicle is operated, a webbing, one end of the webbing fixed to the vehicle, configured to restrain a passenger seated on a seat of the vehicle, a retractor connected to the other end of the webbing, the retractor configured to retract the webbing and inhibit extraction of the webbing, and an ECU electrically coupled to the brake pedal sensor, the ECU configured to detect that the brake pedal operation amount detected by the pedal sensor exceeds a first threshold value, the ECU configured to revise the threshold value based on safety related information of the vehicle to control the inhibition of the webbing extraction of the retractor.
Owner:NISSAN MOTOR CO LTD

Method of printing web page by using mobile terminal and mobile terminal for performing the method

A method of printing a web page by using a mobile terminal and a mobile terminal are provided. The method includes displaying the web page on the mobile terminal, extracting objects that are to be printed from the web page displayed on the mobile terminal, setting a layout of the extracted objects, and generating printing data by rendering the objects according to the layout.
Owner:HEWLETT PACKARD DEV CO LP

Method and device for detecting malicious website

The invention provides a method for detecting a malicious website. The method comprises the steps that a target webpage pointed to by a target web address of a target website to which a user has access is acquired, and a contact way in the target webpage is extracted; a malicious contact way database is searched for target information data matched with the contact way in the target webpage, wherein information data determined as a malicious contact way are stored in the malicious contact way database; when the target information data matched with the contact way in the target webpage are acquired, the target website is determined as the malicious website, and the target website is written into a malicious web address database. In addition, the invention provides a device for detecting the malicious website. According to the method and device for detecting the malicious website, the efficiency of detecting the malicious website can be improved.
Owner:TENCENT TECH (SHENZHEN) CO LTD
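
The contact-matching check described above can be sketched as follows. The regular expressions for phone numbers and e-mail addresses, the normalization rules, and the in-memory sets standing in for the two databases are all assumptions; the abstract does not say which contact types or storage are used.

```python
import re

# Illustrative patterns only (assumption): real contact formats vary widely.
PHONE = re.compile(r"\+?\d[\d\- ]{6,}\d")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def normalize(contact):
    """Lowercase e-mail addresses; reduce phone numbers to digits only."""
    if "@" in contact:
        return contact.lower()
    return re.sub(r"\D", "", contact)


def extract_contacts(page_text):
    """Extract and normalize every contact found in the page text."""
    found = EMAIL.findall(page_text) + PHONE.findall(page_text)
    return {normalize(c) for c in found}


def check_site(url, page_text, bad_contacts, bad_urls):
    """Flag the site, and record its URL, if any contact is known to be malicious."""
    hits = extract_contacts(page_text) & bad_contacts
    if hits:
        bad_urls.add(url)
    return bool(hits)
```

Normalizing before the lookup matters here, because the same number may appear with different separators on different pages.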

Data Extraction System, Terminal Apparatus, Program of the Terminal Apparatus, Server Apparatus, and Program of the Server Apparatus

This invention provides a terminal that searches for web pages on the web and extracts prescribed data from the web pages, and a server that verifies and accumulates the extracted data. The prescribed data can be extracted from the web pages in such a manner that the processing relating to the data extraction is distributed between the terminal and the server. Therefore, the processes needed up to the data extraction are distributed, and the burden placed on each apparatus can be lessened. Further, new data not formerly found in the web pages can be found and extracted from web pages that have been updated or newly created.
Owner:SQUARE ENIX HLDG CO LTD

Passenger protection device

An aspect of the present invention provides a passenger protection device that includes a brake pedal sensor configured to detect the amount by which a brake pedal of a vehicle is operated, a webbing, one end of the webbing fixed to the vehicle, configured to restrain a passenger seated on a seat of the vehicle, a retractor connected to the other end of the webbing, the retractor configured to retract the webbing and inhibit extraction of the webbing, and an ECU electrically coupled to the brake pedal sensor, the ECU configured to detect that the brake pedal operation amount detected by the pedal sensor exceeds a first threshold value, the ECU configured to revise the threshold value based on safety related information of the vehicle to control the inhibition of the webbing extraction of the retractor.
Owner:NISSAN MOTOR CO LTD

Seat belt retractor for the safety belt of a motor vehicle

A seat belt retractor for a safety belt of a motor vehicle having a belt shaft (1) rotatably mounted in a frame, a profile head (2) lockable in relation to the frame, a load limiting device (20) located between the profile head (2) and the belt shaft (1) for enabling the belt shaft (1) to undergo a load limited rotation in the belt webbing extraction direction (A) with the profile head (2) being locked and a load limiting level predetermined by the load limiting device (20) being exceeded.
Owner:AUTOLIV DEV AB

Information retrieval apparatus and computer program

An information retrieval apparatus for retrieving a web page using a retrieval keyword includes a word extract portion for extracting an objective word from the web page on a display based on the specifying a display position of the web page on the display, wherein the word extract portion further extracts a peripheral word placed around the objective word, and a retrieval portion that performs web page retrieval using the objective and peripheral words.
Owner:KDDI CORP
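
Picking an objective word from a specified position and adding the words around it to the query can be sketched as below. A character offset into plain text stands in for the display position, and the window size and the names `words_at` and `build_query` are assumptions for illustration.

```python
import re


def words_at(text, offset, window=2):
    """Return the word covering `offset` and up to `window` words on each side."""
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+", text)]
    for i, (start, end, word) in enumerate(tokens):
        if start <= offset < end:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            return word, [w for _, _, w in neighbours]
    return None, []


def build_query(objective, peripheral):
    """Combine the objective word and its peripheral words into one query string."""
    return " ".join([objective] + peripheral)
```

The peripheral words give the retrieval step context: "jaguar" next to "habitat" points toward the animal rather than the car.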