598 results about "Content extraction" patented technology

Content extraction is the task of separating boilerplate, such as comments, navigation bars, social media links and ads, from the main body of text of an article formatted as HTML. The main content typically accounts for only a small portion of a page’s source code.

Automated world wide web navigation and content extraction

Storage mediums and a computer-implemented method for automating web navigation and content extraction are provided. In particular, a storage medium with program components which are executable through a common application program interface and are utilizable by a developer to write programming instructions is provided. In some cases, the storage medium may include a program component for adaptively navigating through one or more websites and another program component for extracting scripted content from the one or more websites. In addition or alternatively, the storage medium may include a program component for standardizing content on a web page. In some cases, the storage medium may be configured to allow a user to include XPath query language in program instructions written from the storage medium. A storage medium comprising program instructions executable using a processor for performing such functions and a computer-implemented method employing such processes are also provided herein.
Owner:ACTIAN CORP

Method, apparatus and system for capturing and analyzing interaction based content

An apparatus and methods for capturing and analyzing customer interactions, the apparatus comprising interaction information units, interaction meta-data information units associated with each of the interaction information units, a rule-based analysis engine component for receiving the interaction information, an adaptive database, an interaction capture and storage component for capturing interaction information, a multi-segment interaction capture device, an initial set-up and calibration device, and a pre-processing and content extraction device.
Owner:NICE SYSTEMS

Meta-content analysis and annotation of email and other electronic documents

Methods are provided for performing meta-content analysis and annotation on the body of email documents and other electronic documents, and for creating a displayable index of these instances of meta-content, sorted and annotated by type. In addition, the electronic document is enhanced by providing links for the semantic foci to external documents containing related information. An electronic document adapted for delivery to one or more recipients, the electronic document including a header and a body, is processed by: performing meta-content extraction of semantic foci within said header and said body, the semantic foci comprising a plurality of types of information including one or more of email addresses, URLs, dates, currency values, organization names, names of people, names of places, and phone numbers; creating a meta-content index of the document based upon said extracted semantic foci; arranging the meta-content index according to said plurality of types; combining said meta-content index with said header and said body to provide an enhanced document; and sending said enhanced document to said one or more recipients via a communication network. The process includes converting the electronic mail document to a markup language format, wherein said meta-content index comprises one or more objects expressed in said markup language adapted for presentation with the body in said enhanced document.
Owner:SAP AMERICA
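
The indexing step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the regular expressions, the `PATTERNS` table and the `build_meta_index` function are hypothetical names chosen for illustration, and only three of the listed semantic-focus types (email addresses, URLs, currency values) are covered. Recognizing names of people, places or organizations would need something beyond simple patterns.

```python
import re
from collections import defaultdict

# Illustrative patterns only (assumptions, not taken from the patent).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s<>\"]+"),
    "currency": re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d+)?"),
}

def build_meta_index(text):
    """Return {type: sorted unique matches} for every type found in text."""
    index = defaultdict(set)
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            index[kind].add(match.group(0))
    # Arrange the index by type, with the entries sorted within each type.
    return {kind: sorted(values) for kind, values in sorted(index.items())}

sample = ("Contact ann@example.com or see https://example.com/pricing "
          "for the $1,200.50 plan.")
meta_index = build_meta_index(sample)
for kind, values in meta_index.items():
    print(kind, values)
```

An index built this way could then be rendered in a markup format and attached to the message, which is the "enhanced document" the abstract refers to.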

Systems and methods for content extraction

Active | US20070050708A1 | Maintain information | Maintain usability | Web data indexing | Natural language data processing | Automatic control | World Wide Web
Systems and methods are presented for content extraction from markup language text. The content extraction process may parse markup language text into a hierarchical data model and then apply one or more filters. Output filters may be used to make the process more versatile. The operation of the content extraction process and the one or more filters may be controlled by one or more settings set by a user, or automatically by a classifier. The classifier may automatically enter settings by classifying markup language text and entering settings based on this classification. Automatic classification may be performed by clustering unclassified markup language texts with previously classified markup language texts.
Owner:THE TRUSTEES OF COLUMBIA UNIV IN THE CITY OF NEW YORK
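
The parse-then-filter pipeline in this abstract can be sketched roughly as follows, using only Python's standard `html.parser`. The `Node`, `TreeBuilder`, `drop_tags` and `extract` names, the choice of filtered tags, and the `settings` keys are assumptions made for illustration; the abstract does not specify them. Settings are passed in by hand here, whereas the abstract also describes a classifier that chooses them automatically.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs or []), parent
        self.children = []  # child Nodes and text strings, in document order

class TreeBuilder(HTMLParser):
    """Build a minimal hierarchical model; assumes reasonably well-formed markup."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())

def drop_tags(*tags):
    """Return a filter that removes every subtree rooted at one of the tags."""
    def apply(node):
        node.children = [c for c in node.children
                         if isinstance(c, str) or c.tag not in tags]
        for child in node.children:
            if isinstance(child, Node):
                apply(child)
    return apply

def text_of(node):
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)

def extract(html, settings):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    filters = []
    if settings.get("remove_scripts", True):
        filters.append(drop_tags("script", "style"))
    if settings.get("remove_navigation", True):
        filters.append(drop_tags("nav", "footer"))
    for apply_filter in filters:
        apply_filter(builder.root)
    return text_of(builder.root)

page = ("<html><body><nav><a href='/'>Home</a></nav>"
        "<p>Main <b>story</b> text.</p><script>var x = 1;</script></body></html>")
result = extract(page, {"remove_scripts": True, "remove_navigation": True})
print(result)
```

Keeping each filter as an independent function over the tree is one way to make the process configurable: a user, or a classifier, only has to decide which filters to switch on.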

Method for extracting and processing network information and its system

The invention relates to a method for extracting and processing network information. Using artificial intelligence and natural language processing techniques, it automatically downloads the latest daily news and information from specified websites, performs content extraction, classification, automatic abstracting and condensing of the full text, then stores the full text and indexes it for efficient full-text retrieval later.
Owner:陈文中

Linguistic extraction of temporal and location information for a recommender system

One embodiment of the present invention provides a system that recommends activities. During operation, the system receives a piece of content obtained from text or converted to text from speech. The system then analyzes the received content to identify any activity type, indication of willingness to participate in any type of activities, and at least one piece of temporal information, which can be implicitly and / or explicitly stated in the content, and / or one piece of location information associated with the activity type. The system further recommends one or more activities, venues, and / or services that afford or support activities for a user based on the information extracted from the content.
Owner:XEROX CORP

Customer service information providing method and device, electronic equipment and storage medium

The invention provides a customer service information providing method and device, electronic equipment and a storage medium. The method comprises the steps of receiving a Chinese text input by a user; inputting the input Chinese text into a Chinese customer service question-answering model based on a Bi-LSTM (Bidirectional Long Short-Term Memory) model and a CNN (Convolutional Neural Network) model to acquire an answering statement; inputting the input Chinese text into a content extraction and intention classification model based on a Bi-LSTM-CRF (Conditional Random Field) model and an LSTM classifier to acquire customer intention classification and key information; determining the service recommended to the user according to the customer intention classification and the key information; inputting the input Chinese text into a Chinese text emotion analysis model based on the CNN model to acquire a user emotion classification; adjusting the answering statement according to the user emotion classification; and, in combination with the adjusted answering statement and the determined service, providing customer service information to the user. According to the method and device optimization model provided by the invention, automatic customer service answering is realized.
Owner:上海携程国际旅行社有限公司

Document image information management apparatus and document image information management program

Metadata of document images can be universally handled by dealing with the document images in units of individual regions according to their contents, thereby making it possible to improve convenience for management, search, operation thereof and so on. In order to manage metadata of contents and contexts related to the document images, prescribed image regions are analyzed as image objects based on image contents of the document images, and attribute information is extracted based on contents of the image objects thus analyzed, so that the metadata of the contents thus extracted is managed in association with the document images and the image objects. Also, attribute information is extracted based on a situation of the documents of the document images, so that the metadata of the contexts extracted is managed in association with the document images and the image objects.
Owner:KK TOSHIBA +1

Systems and methods for indexing and searching digital video content

The present invention relates to systems and methods for indexing digital video content maintained on a storage media item. The method of the present invention comprises extracting caption and subtitle content from one or more video object (“VOB”) files maintained on the storage media item. The extracted caption and subtitle content are segmented into one or more segments and video and audio content corresponding to the one or more segments are extracted. Descriptions of the video and audio content corresponding to the segmented caption and subtitle content are generated. The captions, subtitles, descriptions, and corresponding video and audio content associated with the one or more segments of the one or more VOB files are indexed.
Owner:VERIZON PATENT & LICENSING INC

Knowledge management tool

A document processor for use with an indexing application comprising: a content extractor proxy that implements a pre-defined programmatic interface for content extractors; a data store; and an extended document metadata processor; wherein: the content extractor proxy receives a signal from the indexing application identifying a target document; and the document metadata processor creates from the target document extended document metadata for storage in the data store.
Owner:BA INSIGHT

Method for extracting, analyzing and searching network flow and content

The invention discloses a method for extracting, analyzing and searching network flow and content. The method comprises the following steps: shunting original flow into n data processing queues; independently processing an original data message of each data processing queue by the data processing queue, performing protocol recognition and filtration on the message and performing conversation recombination on TCP (Transmission Control Protocol) flow in the message; performing protocol resolving and decoding on a recombined TCP conversation and extracting out structured data information therein; and as for key information specified by requirements, performing searching labeling in data content extracted by a content resolving and extracting module based on a multimode matching algorithm or a search engine technology, and submitting labeling results to a searching labeling information database, thereby providing searching labeling results for multiple modes of applications. The method can be used for solving the problems of repeated data packets, serial number zero adjustment and the like in the TCP conversation recombination, realizing the character labeling for the original flow, and ensuring that a user can acquire effective information conveniently.
Owner:XI AN JIAOTONG UNIV

Method for picking-up, and aggregating micro content of web page, and automatic updating system

A method for picking up and aggregating micro-content of web pages includes: inputting a web page address at the user end; transmitting the legal content to a web page micro-content analysis subsystem at the server end; labeling different micro-content blocks or columns according to hyperlink groups; transmitting the labeled HTML text content back to the user end; and selecting an original micro-content item or its parent node and adding it to the micro-content desktop subsystem at the user end to finalize the desktop arrangement.
Owner:北京中搜云商网络技术有限公司

Display control apparatus, recording media, display control method, and display control program

A display control apparatus is provided for controlling displaying of a play list specifying a reproduction sequence of a plurality of pieces of content. The display control apparatus has a play list feature extraction block configured to extract a feature of a play list on the basis of a plurality of content belonging to the play list, a display pattern selection block configured to select a display pattern for displaying the play list on the basis of the feature of the play list, the feature being extracted by the feature extraction block, and a control block configured to execute control such that the play list is displayed on a display block on the basis of the display pattern selected by the display pattern selection block. The novel configuration provides new ways of enjoying content.
Owner:SONY CORP

Digital media content extraction and natural language processing system

An automated lesson generation learning system extracts text-based content from a digital programming file. The system parses the extracted content to identify one or more topics, parts of speech, named entities and / or other material in the content. The system then automatically generates and outputs a lesson containing content that is relevant to the content that was extracted from the digital programming file.
Owner:WESPEKE

Apparatus and method of delivering content between applications

Disclosed are an apparatus and a method of delivering content between applications. Content which is to be delivered from a source application to a target application may be extracted according to a user input signal, a content type describing object and a content extraction scheme corresponding to a content type. For example, the source application may be a web application including information received through a network, and the target application may be a local application which is executed using information stored in the apparatus, and vice versa.
Owner:SAMSUNG ELECTRONICS CO LTD

Knowledge management tool

A document processor for use with an indexing application comprising: a pre-defined programmatic interface for content extractors; a data store; and an extended document metadata processor; wherein: the content extractor proxy receives a signal from the indexing application identifying a target document; and the document metadata processor creates from the target document extended document metadata for storage in the data store.
Owner:BA INSIGHT

Listed-company announcement classification and abstract generation method based on deep learning

The invention discloses a listed-company announcement classification and abstract generation method based on deep learning. The method comprises the following steps: step 1, acquiring announcement original-text data, extracting text, picture and form information, and establishing structured documents; step 2, establishing a classification rule word library of different announcements on the basis of industry knowledge of announcement fields according to various company operation change event keyword differences, and carrying out statistical judgment on announcement classes; and step 3, for the announcements of the different classes, extracting announcement document contents, combining the rule word library of corresponding class keywords to train an announcement content classification model, and automatically generating document abstract contents, wherein content extraction, training set selection, keyword model optimization, model training, model testing, result analysis and content generation are included. The method can solve technical problems of automatically classifying the announcements for a large amount of announcement information generated each day, automatically extracting key and important information according to classification situations, generating the abstract contents and the like.
Owner:北京文因互联科技有限公司

Method and System for a Speech Synthesis and Advertising Service

Methods and systems for providing a network-accessible text-to-speech synthesis service are provided. The service accepts content as input. After extracting textual content from the input content, the service transforms the content into a format suitable for high-quality speech synthesis. Additionally, the service produces audible advertisements, which are combined with the synthesized speech. The audible advertisements themselves can be generated from textual advertisement content.
Owner:CHEMTRON RES

Webpage text content extracting method and device

The invention discloses a webpage text content extracting method and device. The method comprises the following steps: acquiring two webpages which belong to a catalogue at the same hierarchy level below the same site; for each acquired webpage, respectively executing the following steps: dividing the webpage into content blocks; determining the label density and / or link density of each content block; selecting the content blocks whose label density and / or link density meets corresponding preset conditions; extracting the content blocks whose text content is not consistent with the text content of the content blocks selected from the other webpage; and determining the extracted content blocks as the text content of the webpage. By adopting the technical scheme of the invention, the problem of low accuracy when extracting the text content of a webpage in the prior art can be solved.
Owner:CHINA MOBILE COMM GRP CO LTD
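
The cross-page comparison in this abstract can be sketched in Python as below. This is a minimal illustration under several assumptions: each page has already been divided into blocks, each block is given as a `(text, link_chars)` pair, only link density is checked, and the `0.5` threshold is arbitrary. The function names `select_blocks` and `main_text` are hypothetical.

```python
def select_blocks(blocks, max_link_density=0.5):
    """Keep blocks whose link density (link chars / total chars) is within the limit."""
    selected = []
    for text, link_chars in blocks:
        total = len(text)
        if total and link_chars / total <= max_link_density:
            selected.append(text)
    return selected

def main_text(page_blocks, sibling_blocks, max_link_density=0.5):
    """Return the selected blocks of one page that do not also appear on the sibling page."""
    sibling_texts = set(select_blocks(sibling_blocks, max_link_density))
    return [text for text in select_blocks(page_blocks, max_link_density)
            if text not in sibling_texts]

# Two pages from the same directory level of a site (made-up data).
page_a = [("Home | News | Sports", 14),
          ("Copyright Example Media", 0),
          ("City council approves new park budget.", 0)]
page_b = [("Home | News | Sports", 14),
          ("Copyright Example Media", 0),
          ("Local team wins regional final.", 0)]

print(main_text(page_a, page_b))
print(main_text(page_b, page_a))
```

The density check removes the navigation row, while the comparison with the sibling page removes low-link boilerplate such as the copyright line, which a density threshold alone would let through.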

Webpage content extraction forwarding system for mobile communication terminal and application method thereof

Inactive | CN101674374A | Solve the technical problem of not being able to send to by SMS | Easy to share | Substation equipment | Special data processing applications | Hyperlink | Text message
The invention relates to the field of mobile communication equipment terminals, in particular to a browse system for a mobile communication equipment terminal and an application method thereof. The invention provides a browse system for the mobile communication equipment terminal, which comprises a browse module, a short message converting module, a shortening module, an identifying module and a skipping module, wherein the browse module is arranged in the mobile communication equipment terminal and used for browsing a page, the short message converting module is arranged in the mobile communication equipment terminal and used for sending a hyperlink by a short message, the shortening module is arranged in the mobile communication equipment terminal and uses a short link for replacing the hyperlink, and the identifying module is arranged on a transferring server and used for transmitting the short link. The browse system transmits the hyperlink to users and friends in a short message mode, lets users conveniently share network resources, solves the technical problem that an overlong hyperlink cannot be sent by short message, and lets users send various hyperlinks by short message.
Owner:UCWEB

System, method and program for extracting web page core content based on web page layout

The invention provides a system and method for extracting the core content of a web page. The system receives HTML documents (web pages) and extracts their core content. It comprises a text block analyzer, which uses HTML tags as delimiters to divide the text fragments in each available basic structure of the input HTML document into one or more independent text blocks and outputs all the text blocks connected in order, where the available basic structures contain the core content of the web page; and a text block checker, which removes the text blocks that do not contain core content and outputs the rest as the core content of the web page. The invention determines whether each text block contains advertisements or navigation information, and can thus accurately determine the core content of a web page while improving processing efficiency.
Owner:IBM CN

Method and system for extracting news webpage content using webpage label clustering

The invention provides a method and system for extracting news webpage content by using webpage tag clustering. The method includes: preprocessing the webpage content, including parsing the webpage content into a DOM tree and counting the information of each node of the DOM tree; deleting nodes of the DOM tree heuristically; deleting nodes of the DOM tree according to rules; and clustering and deleting nodes of the DOM tree based on the tag structure, thereby generating a final DOM tree for output.
Owner:BEIJING ZHONGSOU NETWORK TECH

Apparatus and method for sharing social media content

An apparatus for sharing social media content includes a content management unit for, when sharable content is input by a user, extracting profile information about the sharable content by analyzing the sharable content, and generating social media content by associating the extracted profile information about the sharable content with profile information about a user to store the generated social media content in a database. Further, the apparatus for sharing the social media content includes a content searching unit for extracting an initial sample by searching the database based on keywords requested to be searched for in response to a search request of a user, and searching for the sample by comparing profile information about each piece of content included in the initial sample with one of the keywords.
Owner:ELECTRONICS & TELECOMM RES INST

Apparatus and method for displaying multimedia contents

Provided are an apparatus and method of displaying multimedia contents, more particularly, an apparatus and method of displaying stored multimedia contents to accommodate a user's preference using limited buttons of a remote control device or a cellular phone. The apparatus for displaying multimedia contents includes an alignment condition determination unit determining an alignment condition corresponding to a first user command signal among a plurality of alignment conditions, a detailed condition determination unit determining a detailed condition corresponding to a second user command signal among detailed conditions included in the alignment condition, a contents extraction unit extracting first multimedia contents according to the determined detailed condition, and a display unit displaying the determined alignment condition in a first region, the determined detailed condition in a second region, and a second multimedia contents selected by a user among the extracted first multimedia contents in a third region of a screen.
Owner:SAMSUNG ELECTRONICS CO LTD

Inference method and device of MicroBlog user interests

The invention provides a method for establishing a MicroBlog user interest inference model. The method comprises an interest label calculation model, an interest model used for MicroBlog text content extraction and a blogger interest point model used for blogger social relationship extraction, and the three models are fused through a model fusion strategy to obtain the final MicroBlog user interest inference model. The method combines personal information, MicroBlog contents and the social relationship, adopts a USER strategy in which all MicroBlog contents of the same blogger are mixed to address the sparsity problem of the MicroBlog contents, mines an implicit theme of the MicroBlog by an LPA (Label propagation algorithm), puts forward a social label propagation algorithm on the basis of a network formed by blogger attention, and calculates the influence of various interest labels on the blogger. The method exhibits good identification capability and information filtering capability, and filters false information to identify false bloggers before recommendation is carried out, so that the recommendation quality and accuracy of a recommendation system can be improved, and a better experience is brought to the blogger.
Owner:HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

Systems and methods for content extraction

Active | US8468445B2 | Maintain information | Maintain usability | Web data indexing | Natural language data processing | World Wide Web | Human language
A content extraction process may parse markup language text into a hierarchical data model and then apply one or more filters. Output filters may be used to make the process more versatile. The operation of the content extraction process and the one or more filters may be controlled by one or more settings set by a user, or automatically by a classifier. The classifier may automatically enter settings by classifying markup language text and entering settings based on this classification. Automatic classification may be performed by clustering unclassified markup language texts with previously classified markup language texts.
Owner:THE TRUSTEES OF COLUMBIA UNIV IN THE CITY OF NEW YORK

Method and device for extracting webpage text content

The invention discloses a method and a device for extracting webpage text content. The method includes steps of dividing a webpage with requirement on text content extraction into different content blocks; executing operations, including determining link text length and non-link text length of the content blocks, to the different divided content blocks respectively; determining the link text density of the corresponding content block according to the determined link text length and non-link text length; and determining that the content blocks are the text content of the webpage when the link text density is not higher than a first specified threshold value. By the method and the device for extracting webpage text content, the problem of low accuracy in webpage text content extraction in the prior art is solved.
Owner:ALIBABA (CHINA) CO LTD
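
The link text density test in this abstract can be illustrated with a short Python sketch. The abstract does not give the exact formula, so this sketch assumes density = link text length / (link text length + non-link text length), counts non-whitespace characters, and uses an arbitrary threshold of `0.3`. The `LinkTextCounter`, `link_text_density` and `is_main_text` names are hypothetical.

```python
from html.parser import HTMLParser

class LinkTextCounter(HTMLParser):
    """Count characters of text inside and outside <a> elements."""
    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.link_len = 0
        self.non_link_len = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1

    def handle_data(self, data):
        n = len("".join(data.split()))  # non-whitespace characters only
        if self.anchor_depth:
            self.link_len += n
        else:
            self.non_link_len += n

def link_text_density(block_html):
    counter = LinkTextCounter()
    counter.feed(block_html)
    counter.close()
    total = counter.link_len + counter.non_link_len
    return counter.link_len / total if total else 0.0

def is_main_text(block_html, threshold=0.3):
    """A block counts as main text when its link text density is not above the threshold."""
    return link_text_density(block_html) <= threshold

nav = "<div><a href='/a'>Home</a> <a href='/b'>News</a></div>"
article = "<div>The council met on Monday. <a href='/x'>More</a></div>"
print(link_text_density(nav), is_main_text(nav))
print(round(link_text_density(article), 3), is_main_text(article))
```

A navigation block consists almost entirely of link text and is rejected, while an article paragraph with one trailing link stays well under the threshold.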