Apparatus and Method of Implementing Batch-Mode Active Learning for Technology-Assisted Review of Documents

Active learning technology for document review, applied in the field of electronic document review, addresses the difficulty of designing a suitable criterion, the impracticality of retraining a classifier at every iteration, and the generally high cost of existing methods.

Pending Publication Date: 2022-05-26
LEGILITY DATA SOLUTIONS LLC
0 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

Due to the positive feedback in favor of responsiveness, such methods yield high recall, but their learning models more often suffer from a self-fulfilling prophecy and the quality of such methods depends strongly on the initial batch of instances.
But, for practical reasons, it is unreasonable to retrain the classifier at every iteration with only one additional training sample.
But these methods are generally costly, and their applicability in the legal domain, where the dimension of the feature space is substantially large (several million), has yet to be explored.
Designing such a criterion is a difficult task.
However, these are offline metrics that are not integrated within the learning framework.
Such methods fail to shift the initial hyperplane towards the ideal hyperplane because every iteration selects instances that are closest to the prevailing hyperplane without any exploration.
Thus, they perform poorly if the initial hyperplane is far from the ideal hyperplane.
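The selection rule criticized above can be sketched as follows. This is a minimal illustration of picking the instances closest to the prevailing hyperplane, not the method claimed in the disclosure; the function names, the toy pool, and the linear model parameters `w` and `b` are assumptions made for the example.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; its magnitude is proportional to the distance from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def closest_to_hyperplane(
    w: Sequence[float], b: float, pool: List[Sequence[float]], batch_size: int
) -> List[int]:
    """Return indices of the batch_size unlabeled instances nearest the current hyperplane."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(decision_value(w, b, pool[i])))
    return ranked[:batch_size]


# Hypothetical unlabeled pool and current (possibly poor) hyperplane.
pool = [[0.1, 0.0], [2.0, 1.0], [-0.2, 0.1], [-3.0, 0.5], [0.05, -0.05]]
w, b = [1.0, 0.0], 0.0

batch = closest_to_hyperplane(w, b, pool, batch_size=2)
print(batch)  # indices of the two instances with the smallest |w.x + b|
```

Because the ranking depends only on the current `w` and `b`, every batch is drawn from the same narrow band around the prevailing boundary, which is why a badly placed initial hyperplane is slow to correct without some form of exploration.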
So, such methods perform poorly on such datasets.
Such a parameter is hard to choose.
The primary motivation for having a stopping condition is to stop training as early as possible (training is costly).
However, recall does not provide any indication of the attorneys' effort for labeling the train dataset.
In most real-life legal datasets, the document collection contains several issues that impact the performance of any classification technique.
. . . . Without any preprocessing, these files cause computation times and storage volumes that exceed acceptable levels.
Another significant challenge with real-life legal datasets that is unaddressed in the existing literature is that document collections are rarely fixed.
Specifically, any term weighting scheme with a global weighting component (e.g., Term Frequency-Inverse Document Frequency (TF-IDF)) could result in feature vectors changing over the course of the learning task with unstudied effects on the active learning process.
In addition, iControlESI® has performed experiments with TF-IDF, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Log-Entropy feature weighting schemes and found them to perform no better than the standard bag-of-words model.
One other potential feature selection scheme is the hashing trick, which would be suitable since it has no global component, but it was also found to perform worse than the bag-of-words model.
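The difference between a weighting scheme with a global component and one without can be sketched with a toy example. This is only an illustration under stated assumptions: the smoothed IDF formula is one common variant, CRC32 is an arbitrary stand-in for the hash function, and the documents and names are hypothetical, none of them taken from the disclosure.

```python
import math
import zlib
from collections import Counter
from typing import Dict, List


def idf(term: str, docs: List[List[str]]) -> float:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    df = sum(1 for d in docs if term in d)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


def tfidf(doc: List[str], docs: List[List[str]]) -> Dict[str, float]:
    """TF-IDF weights for one document; depends on the whole collection via idf()."""
    counts = Counter(doc)
    return {t: c * idf(t, docs) for t, c in counts.items()}


def hashed_bow(doc: List[str], n_buckets: int = 16) -> List[int]:
    """Bag-of-words via the hashing trick; the bucket depends only on the token itself."""
    vec = [0] * n_buckets
    for tok in doc:
        vec[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1
    return vec


doc = ["merger", "agreement", "draft"]
collection_v1 = [doc, ["lunch", "menu"]]
# The collection grows while the review is under way.
collection_v2 = collection_v1 + [["merger", "memo"], ["merger", "schedule"]]

tfidf_v1 = tfidf(doc, collection_v1)
tfidf_v2 = tfidf(doc, collection_v2)
hash_v1 = hashed_bow(doc)
hash_v2 = hashed_bow(doc)  # the collection is not an input, so nothing changes

print(tfidf_v1["merger"], tfidf_v2["merger"])  # weights differ after the collection grows
print(hash_v1 == hash_v2)
```

In the sketch, the same document receives a different TF-IDF vector once new documents arrive, so previously labeled training instances silently change, whereas the hashed vector is a function of the document alone.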




Embodiment Construction

[0030] To describe the technical features of the present disclosure, a discussion is provided first to describe details about an apparatus and method for implementing a batch-mode active learning for technology-assisted review (TAR) of documents in accordance with an embodiment of the present disclosure (see FIGS. 1-5). Thereafter, a discussion is provided to explain in detail the various operations-steps implemented by the apparatus and method in accordance with embodiments of the present disclosure (see FIGS. 6-11) (note: these particular discussions are based on the teachings of the first U.S. Provisional Application No. 62/288,660). Then, a discussion is provided to describe details about an apparatus and method for implementing a more generalized version of batch-mode active learning for TAR of documents in accordance with an embodiment of the present disclosure. Thereafter, a discussion is provided to explain in detail the various operations-steps implemented by the more genera...


Abstract

The present disclosure relates to the electronic document review field and, more particularly, to various apparatuses and methods of implementing batch-mode active learning for technology-assisted review (TAR) of documents (e.g., legal documents).

Description

CLAIM OF PRIORITY

[0001] This application claims the benefit of priority to U.S. Provisional Application No. 62/288,660, filed on Jan. 29, 2016, and to U.S. Provisional Application No. 62/246,719, filed on Oct. 27, 2015, the entire contents of each of these applications are hereby incorporated by reference for all purposes.

RELATED PATENT APPLICATION

[0002] This application is related to the co-filed U.S. application Ser. No. ______, entitled “Apparatus and Method of Implementing Enhanced Batch-Mode Active Learning for Technology-Assisted Review of Documents” (Docket No. WJT018-0002). The entire contents of this document are hereby incorporated herein by reference for all purposes.

TECHNICAL FIELD

[0003] The present disclosure relates to the electronic document review field and, more particularly, to various apparatuses and methods of implementing batch-mode active learning for technology-assisted review (TAR) of documents (e.g., legal documents).

BACKGROUND

[0004] The following terms are here...


Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06N20/00, G06N7/00, G06F16/35, G06N20/10, G06N3/08
CPC: G06N20/00, G06N7/005, G06N3/08, G06N20/10, G06F16/35, G06N7/01
Inventor: JOHNSON, JEFFREY A.; HABIB, MD AHSAN; BURGESS, CHANDLER L.; SAHA, TANAY KUMAR; HASAN, MOHAMMAD AL
Owner: LEGILITY DATA SOLUTIONS LLC