Discovering terms using statistical corpus analysis

Inactive Publication Date: 2016-04-28

IBM CORP

AI Technical Summary

Benefits of technology

The present invention provides a method, computer program product, and system for identifying relevant terms in a corpus of text. A first term is identified based on the contextual use of terms already in a set of category related terms, and is then added to that set. The system then identifies further terms based on the contextual use of the newly added term. The technical effect is improved efficiency and accuracy in identifying relevant terms in a text.

Problems solved by technology

Many challenges in NLP involve natural language understanding (that is, enabling computers to derive meaning from human or natural language input).

Because domain ontologies represent concepts in very specific and often eclectic ways, they are often incompatible with one another.

In the context of NLP, term extraction becomes difficult when the text being processed belongs to a different domain (for example, medical technology) than the domain from which the NLP software was built (for example, financial news).

II. Example Embodiment

[0045] FIG. 2 shows flowchart 250 depicting a method according to the present invention. FIG. 3 shows program 300 for performing at least some of the method steps of flowchart 250. This method and associated software will now be discussed, over the course of the following paragraphs, with extensive reference to FIG. 2 (for the method step blocks) and FIG. 3 (for the software blocks).

[0046] The present embodiment refers extensively to a high precision domain lexicon (HPDL). The HPDL (also referred to as a “set of category related terms”) is a collection of terms (words or sets of words) that belong to a specific domain, category, or genre (“domain”). In term extraction, and more generally in natural language processing, the HPDL can serve as an underlying “knowledge base” for a given domain so as to extract more contextually relevant terms from a piece of text (or corpus). In many embodiments of the present invention, the HPDL is used to: (i) extract contextually ...
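As an illustration of how an HPDL might serve as a knowledge base during extraction, the following is a minimal sketch that represents the lexicon as a plain set of terms and finds their occurrences in a piece of text. The tokenizer, the greedy longest-match strategy, the function names (`tokenize`, `find_lexicon_terms`), and the sample lexicon are all assumptions made for illustration; none of them is taken from the patent.

```python
import re

def tokenize(text):
    """Lowercase word tokenizer (an assumption; the source does not specify one)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def find_lexicon_terms(text, lexicon):
    """Return (token_index, term) pairs for lexicon terms found in text.

    Uses greedy longest-match over token n-grams, so a multiword term such as
    "blood pressure" is preferred over a shorter term starting at the same word.
    """
    tokens = tokenize(text)
    entries = {tuple(tokenize(term)) for term in lexicon}
    entries.discard(())  # ignore terms that tokenize to nothing
    max_len = max((len(e) for e in entries), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest possible n-gram first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in entries:
                matches.append((i, " ".join(candidate)))
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return matches

# Hypothetical medical-domain lexicon and sample sentence.
hpdl = {"blood pressure", "stent", "catheter"}
sample = "The catheter was removed after blood pressure stabilized."
matches = find_lexicon_terms(sample, hpdl)
print(matches)
```

Storing the terms as token tuples lets words and sets of words be handled uniformly, which matches the description of the HPDL as containing both.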

Abstract

Software that extracts contextually relevant terms from a text sample (or corpus) by performing the following steps: (i) identifying a first term from a corpus, based, at least in part, on a set of initial contextual characteristic(s), where each initial contextual characteristic of the set of initial contextual characteristic(s) relates to the contextual use of at least one category related term of a set of category related term(s) in the corpus; (ii) adding the first term to the set of category related term(s), thereby creating a revised set of category related term(s) and a set of first term contextual characteristic(s), where each first term contextual characteristic of the set of first term contextual characteristic(s) relates to the contextual use of the first term in the corpus; and (iii) identifying a second term from the corpus, based, at least in part, on the set of first term contextual characteristic(s).
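Steps (i) to (iii) can be sketched in a few lines of Python. This is only a toy reading of the abstract under stated assumptions: a "contextual characteristic" is modeled as the pair (previous word, next word) around an occurrence of a term, terms are single words, and a candidate is scored by how often it appears in a known context. The patent does not confirm any of these choices, and the names `context_at`, `contexts_of`, `best_candidate`, and `discover_terms` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokenizer (an assumption, not from the source)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def context_at(tokens, i):
    """Assumed contextual characteristic: (previous word, next word) at position i."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (prev_tok, next_tok)

def contexts_of(tokens, terms):
    """Collect the contexts in which any of the given terms occurs."""
    return {context_at(tokens, i) for i, tok in enumerate(tokens) if tok in terms}

def best_candidate(tokens, contexts, known_terms):
    """Return the unknown word seen most often in the given contexts, or None."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok not in known_terms and context_at(tokens, i) in contexts:
            counts[tok] += 1
    return counts.most_common(1)[0][0] if counts else None

def discover_terms(corpus, seed_terms):
    tokens = tokenize(corpus)
    # (i) Identify a first term from the contexts of the category related terms.
    initial_contexts = contexts_of(tokens, seed_terms)
    first = best_candidate(tokens, initial_contexts, seed_terms)
    if first is None:
        return None, None, set(seed_terms)
    # (ii) Add it to the set and derive the first term's own contexts.
    revised = set(seed_terms) | {first}
    first_contexts = contexts_of(tokens, {first})
    # (iii) Identify a second term from the first term's contexts.
    second = best_candidate(tokens, first_contexts, revised)
    return first, second, revised

# Hypothetical toy corpus and seed set.
corpus = (
    "The surgeon inserted the stent into the artery. "
    "The surgeon inserted the catheter into the artery. "
    "The nurse flushed the catheter with saline. "
    "The nurse flushed the cannula with saline."
)
first, second, revised = discover_terms(corpus, {"stent"})
print(first, second, sorted(revised))
```

In this toy corpus, "cannula" never shares a context with the seed term "stent"; it becomes reachable only through the contexts of the newly added term, which is the point of step (iii).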

Description

BACKGROUND OF THE INVENTION

[0001] The present invention relates generally to the field of natural language processing, and more particularly to “term extraction.”

[0002] Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding (that is, enabling computers to derive meaning from human or natural language input).

[0003] Information Extraction (IE) is a known element of NLP. IE is the task of automatically extracting structured information from unstructured (and/or semi-structured) machine-readable documents. Term Extraction is a sub-task of IE. The goal of Term Extraction is to automatically extract relevant terms from a given text (or “corpus”). Term Extraction is used in many NLP tasks and applications, such as question answering, i...

Application Information

IPC(8): G06F17/30

CPC: G06F17/30684, G06F17/30719, G06F17/30705, G06F40/30, G06F16/3344, G06F16/345, G06F16/35

Inventor: AJMERA, JITENDRA; PARIKH, ANKUR

Owner: IBM CORP
