Patents
Literature
Patsnap Copilot is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Patsnap Copilot

5952 results about "Information extraction" patented technology

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction Due to the difficulty of the problem, current approaches to IE focus on narrowly restricted domains.

Visual and interactive wrapper generation, automated information extraction from web pages, and translation into xml

A method and a system for information extraction from Web pages formatted with markup languages such as HTML [8]. A method and system for interactively and visually describing information patterns of interest based on visualized sample Web pages [5,6,16-29]. A method and data structure for representing and storing these patterns [1]. A method and system for extracting information corresponding to a set of previously defined patterns from Web pages [2], and a method for transforming the extracted data into XML is described. Each pattern is defined via the (interactive) specification of one or more filters. Two or more filters for the same pattern contribute disjunctively to the pattern definition [3], that is, an actual pattern describes the set of all targets specified by any of its filters. A method and for extracting relevant elements from Web pages by interpreting and executing a previously defined wrapper program of the above form on an input Web page [9-14] and producing as output the extracted elements represented in a suitable data structure. A method and system for automatically translating said output into XML format by exploiting the hierarchical structure of the patterns and by using pattern names as XML tags is described.
Owner:LIXTO SOFTWARE

Wrapper induction by hierarchical data analysis

An inductive algorithm, denominated STALKER, generating high accuracy extraction rules based on user-labeled training examples. With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Web-based information agent is a set of wrappers that can extract the relevant data from semistructured information sources. The novel approach to wrapped induction provided herein is based on the idea of hierarchical information extraction, which turns the hard problem of extracting data from an arbitrarily complex document into a series of easier extraction tasks. Labeling the training data represents the major bottleneck in using wrapper induction techniques, and experimental results show that STALKER performs significantly better than other approaches; on one hand, STALKER requires up to two orders of magnitude fewer examples than other algorithms, while on the other hand it can handle information sources that could not be wrapped by prior techniques. STALKER uses an embedded catalog formalism to parse the information source and render a predictable structure from which information may be extracted or by which such information extraction may be facilitated and made easier.
Owner:UNIV OF SOUTHERN CALIFORNIA +1

Dynamic information extraction with self-organizing evidence construction

A data analysis system with dynamic information extraction and self-organizing evidence construction finds numerous applications in information gathering and analysis, including the extraction of targeted information from voluminous textual resources. One disclosed method involves matching text with a concept map to identify evidence relations, and organizing the evidence relations into one or more evidence structures that represent the ways in which the concept map is instantiated in the evidence relations. The text may be contained in one or more documents in electronic form, and the documents may be indexed on a paragraph level of granularity. The evidence relations may self-organize into the evidence structures, with feedback provided to the user to guide the identification of evidence relations and their self-organization into evidence structures. A method of extracting information from one or more documents in electronic form includes the steps of clustering the document into clustered text; identifying patterns in the clustered text; and matching the patterns with the concept map to identify evidence relations such that the evidence relations self-organize into evidence structures that represent the ways in which the concept map is instantiated in the evidence relations.
Owner:TECHTEAM GOVERNMENT SOLUTIONS

Medical information extraction system and method based on depth learning and distributed semantic features

ActiveCN105894088AAvoid floating point overflow problemsHigh precisionNeural learning methodsNerve networkStudy methods
he invention discloses a medical information extraction system and method based on depth learning and distributed semantic features. The system is composed of a pretreatment module, a linguistic-model-based word vector training module, a massive medical knowledge base reinforced learning module, and a depth-artificial-neural-network-based medical term entity identification module. With a depth learning method, generation of the probability of a linguistic model is used as an optimization objective; and a primary word vector is trained by using medical text big data; on the basis of the massive medical knowledge base, a second depth artificial neural network is trained, and the massive knowledge base is combined to the feature leaning process of depth learning based on depth reinforced learning, so that distributed semantic features for the medical field are obtained; and then Chinese medical term entity identification is carried out by using the depth learning method based on the optimized statement-level maximum likelihood probability. Therefore, the word vector is generated by using lots of unmarked linguistic data, so that the tedious feature selection and optimization adjustment process during medical natural language process can be avoided.
Owner:神州医疗科技股份有限公司 +1
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products