Acronym Extraction System and Method of Identifying Acronyms and Extracting Corresponding Expansions from Text

An acronym extraction system and method, applied in the field of text analysis and processing systems, that addresses the difficulty readers have in understanding infrequently used or subject-specific acronyms and in using those acronyms in vocabulary, and that improves the accuracy of the extracted expansions.

Inactive Publication Date: 2008-02-07
ITT MFG ENTERPRISES LLC
1 Cites, 46 Cited by

AI Technical Summary

Benefits of technology

[0014] It is another object of the present invention to employ a bi-directional candidate search technique to select terms near an acronym in order to enhance identification of acronym expansions.
[0016] Still another object of the present invention is to apply rules to identified acronym expansions to verify the validity of those expansions and enhance accuracy.

Problems solved by technology

Acronyms may present challenges to readers in several ways.
In particular, individuals unfamiliar with a certain acronym tend to have difficulty understanding the acronym and using the acronym in vocabulary.
For example, commonly known acronyms, such as “LASER” and “CDROM”, are widely understood, while infrequently used or subject-specific acronyms may be difficult for readers to understand (e.g., “AABFS” for Amphibious Assault Bulk Fuel System).
Acronym lists with corresponding expansions may be prepared manually; however, doing so becomes prohibitive due to the effort required and is prone to errors.
In particular, the AcronymFinder system is highly inefficient because its list is generated from manual submissions.
Further, the list is typically generic and static and may not suit or be tailored to various needs of particular organizations.
Although the above-described systems extract acronyms and corresponding expansions, the results produced by these systems have limited accuracy.
This tends to frustrate readers since the systems may omit acronyms within text or provide incorrect expansions for the acronyms, thereby requiring the reader to perform an additional task of ascertaining the correct expansion in another manner (e.g., manually).




Embodiment Construction

[0033] The acronym expansion system or tool of the present invention basically receives ASCII or plain text documents and extracts acronyms (e.g., including phrasal abbreviations, such as “SYSAD” for System Administration) and corresponding expansions. The acronym expansion tool may process text contained in documents of other formats (e.g., .pdf, HTML, etc.); however, these documents are typically converted to plain text format for processing. The acronym expansion tool is preferably implemented by a computer system as illustrated, by way of example only, in FIG. 1. Specifically, the computer system is typically implemented by a conventional personal or other suitable computer system or workstation preferably equipped with a display or monitor 2, a base 4 (e.g., including the processor, memories and/or internal or external communications devices (e.g., modem, network cards, etc.)), a keyboard 6 and optional mouse 8 or other input device. The computer system includes software (e.g.,...


Abstract

An acronym expansion system of the present invention receives electronic documents and extracts acronyms and their corresponding expansions. A part-of-speech tagger decomposes text into string tokens or words and tags them with their part-of-speech, while an acronym identifier determines whether a word is a potential acronym based on various conditions. An expansion identifier retrieves lists of words preceding and following a potential acronym to search for the expansion. The resulting word lists are examined sequentially to identify and retrieve an expansion for the potential acronym. An expansion extractor receives the potential acronym and a processed word list to retrieve the expansion of the potential acronym from that list. The extractor may utilize information from prior search iterations, and verifies an extracted expansion against a set of rules to remove spurious expansions.
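The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: a plain regex tokenizer stands in for the part-of-speech tagger, and the acronym conditions, the window size, the initial-letter matching and the verification rule are all assumptions chosen for illustration. All function names (`tokenize`, `is_potential_acronym`, `extract_expansion`, `is_valid`, `find_expansions`) are hypothetical. Because the sketch matches only initial letters, it would not resolve phrasal abbreviations such as “SYSAD”.

```python
import re

# Hypothetical stop-word list used by the illustrative verification rule.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def tokenize(text):
    """Split text into word tokens (a regex stands in for the POS tagger)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def is_potential_acronym(word, min_len=2, max_len=10):
    """Assumed conditions: all upper-case letters and a bounded length."""
    return min_len <= len(word) <= max_len and word.isalpha() and word.isupper()


def extract_expansion(acronym, words, nearest_last):
    """Return a contiguous run of words whose initials spell the acronym.

    For the preceding list the run closest to the acronym is at the end,
    so the list is scanned from the back when nearest_last is True.
    """
    n = len(acronym)
    starts = range(len(words) - n + 1)
    if nearest_last:
        starts = reversed(starts)
    for start in starts:
        run = words[start:start + n]
        if all(w[0].lower() == c for w, c in zip(run, acronym.lower())):
            return " ".join(run)
    return None


def is_valid(expansion):
    """Assumed rule: no word is itself an acronym; no leading stop word."""
    words = expansion.split()
    if any(is_potential_acronym(w) for w in words):
        return False
    return words[0].lower() not in STOP_WORDS


def find_expansions(text, window=8):
    """Search the words before, then after, each potential acronym."""
    words = tokenize(text)
    found = {}
    for i, word in enumerate(words):
        if not is_potential_acronym(word) or word in found:
            continue
        preceding = words[max(0, i - window):i]
        following = words[i + 1:i + 1 + window]
        for candidates, nearest_last in ((preceding, True), (following, False)):
            expansion = extract_expansion(word, candidates, nearest_last)
            if expansion and is_valid(expansion):
                found[word] = expansion
                break
    return found


print(find_expansions("The Amphibious Assault Bulk Fuel System (AABFS) supplies fuel."))
# {'AABFS': 'Amphibious Assault Bulk Fuel System'}
```

Searching the preceding list first reflects the common pattern of an expansion followed by its acronym in parentheses; the following list covers the reverse order, as in “NATO (North Atlantic Treaty Organization)”.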

Description

BACKGROUND OF THE INVENTION

[0001] 1. Technical Field

[0002] The present invention pertains to text analysis and processing systems. In particular, the present invention pertains to a system that identifies acronyms and extracts the appropriate acronym expansion from text.

[0003] 2. Discussion of Related Art

[0004] An acronym is a word that is formed from the initial letter or letters of each component of a compound term (e.g., NATO, RADAR, SNAFU, etc.), while an abbreviation is a shortened form of a written word or phrase that is used or substituted for the whole word (e.g., “amt” is an abbreviation for amount). Acronyms and abbreviations tend to overlap and are frequently used in daily verbal discourse, in written documents and in electronic documents and web pages on the Internet. In certain communities (e.g., military, engineering, medicine, etc.), numerous acronyms are employed constantly. For example, a page of a military document commonly includes in excess of ten acronyms. [...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F17/27
CPC: G06F17/277, G06F40/284
Inventor: GUPTA, KALYAN M.
Owner: ITT MFG ENTERPRISES LLC