A query method for indefinite words and sentences of evaluation documents based on inverted index

An inverted-index query method, applied in the field of data science

Active Publication Date: 2019-01-29
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

The purpose of the present invention is to overcome the deficiencies of manually retrieving words and sentences in evaluation documents by providing a method for querying words and sentences of evaluation documents based on an inverted index, so that information can be retrieved from text data quickly and accurately and the value of the data can be mined.



Examples


Specific embodiment

[0075] The review and evaluation data of colleges and universities consist mainly of text reports in Word and PDF formats, which include quantitative numerical evaluations and qualitative text evaluations of teaching quality. Of these, the text-based evaluation is the main part of the evaluation report. When looking for problems that are common across universities and problems specific to individual universities, it is necessary to search the evaluation data for key words, especially words and sentences of variable length.

[0076] Step 1: Preprocess the evaluation reports to be processed, convert them into plain text format and store them in the same directory, as shown in Table 1.

[0077] Table 1. Sample of the data to be processed

[0078]
Serial number | File name | File type | File format | File size
1 | Jilin Police Academy | Evaluation Documentation | Word file (.doc) | 64KB
2 | zhejiang foreign language universi...
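
As a rough illustration of what follows Step 1, the sketch below loads already-converted plain-text reports from one directory and collects word frequency information. It assumes the reports are stored as UTF-8 .txt files with one paragraph per line. The function names load_reports and word_frequencies are hypothetical, and str.split stands in for the jieba segmentation named in the abstract. It is a minimal sketch under those assumptions, not the patent's implementation.

```python
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, List


def load_reports(directory: str) -> Dict[str, List[str]]:
    """Map each .txt file name (without extension) to its non-empty paragraphs.

    Assumption: one paragraph per line, UTF-8 encoded.
    """
    reports: Dict[str, List[str]] = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        reports[path.stem] = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return reports


def word_frequencies(reports: Dict[str, List[str]],
                     tokenize: Callable[[str], List[str]] = str.split) -> Counter:
    """Count how often each token occurs across all paragraphs of all reports.

    str.split is a stand-in tokenizer; for Chinese text a word segmenter
    would be passed in instead.
    """
    counts: Counter = Counter()
    for paragraphs in reports.values():
        for paragraph in paragraphs:
            counts.update(tokenize(paragraph))
    return counts
```

Passing the tokenizer as a parameter keeps the loading and counting logic independent of the segmentation library.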


Abstract

The invention relates to a method for querying variable-length (indefinite) words and phrases in evaluation documents based on an inverted index. It draws on indexing methods from data science and word segmentation methods from NLP, and solves the problem of querying variable-length words and phrases in evaluation documents. The invention comprises the following steps: 1, data preprocessing is carried out on the documents to be queried, word segmentation is performed using the jieba word segmentation method, and a word dictionary and word frequency information are obtained; 2, an adaptive inverted table is established, based on the inverted index principle with a complete reconstruction strategy; 3, using the information of the variable-length words and phrases to be searched, the position information of each of their constituent words is identified in the adaptive inverted table index, and the paragraphs in which the variable-length words and phrases occur are indexed, which completes the query function for variable-length words and phrases in evaluation documents. The basic idea of the invention is to segment the text data into words and build an inverted index so that variable-length words and sentences can be searched quickly, thereby realizing the query function for evaluation documents. The method applies to a wide range of scenarios and therefore has high socio-economic value.
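
The index-and-query idea in steps 2 and 3 can be sketched as a plain positional inverted index. This is not the patent's "adaptive inverted table", whose details are not given in this excerpt. The names build_index and query_phrase and the posting layout (document, paragraph, token position) are assumptions made for illustration. str.split stands in for segmentation; assuming jieba is installed, jieba.lcut could be passed as the tokenizer for Chinese text.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical layout for this sketch: (doc_id, paragraph_id, token_position).
Posting = Tuple[int, int, int]
Index = Dict[str, List[Posting]]


def build_index(docs: List[List[str]],
                tokenize: Callable[[str], List[str]] = str.split) -> Index:
    """Build a positional inverted index; each document is a list of paragraphs."""
    index: Index = defaultdict(list)
    for doc_id, paragraphs in enumerate(docs):
        for par_id, paragraph in enumerate(paragraphs):
            for pos, token in enumerate(tokenize(paragraph)):
                index[token].append((doc_id, par_id, pos))
    return index


def query_phrase(index: Index, phrase: str,
                 tokenize: Callable[[str], List[str]] = str.split
                 ) -> List[Tuple[int, int]]:
    """Return sorted (doc_id, paragraph_id) pairs where the phrase's tokens
    occur consecutively, for a phrase of any length."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    # Start from every occurrence of the first token.
    candidates = set(index.get(tokens[0], []))
    # Keep only occurrences followed by each later token at the right offset.
    for offset, token in enumerate(tokens[1:], start=1):
        following = set(index.get(token, []))
        candidates = {(d, p, pos) for (d, p, pos) in candidates
                      if (d, p, pos + offset) in following}
    return sorted({(d, p) for (d, p, _) in candidates})


docs = [
    ["teaching quality is good", "labs need more funding"],
    ["the teaching staff is small", "teaching quality varies"],
]
index = build_index(docs)
hits = query_phrase(index, "teaching quality")
```

One caveat of this simplified version: the query must be segmented into the same tokens as the indexed text, which a real segmenter does not always guarantee for a phrase taken out of context.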

Description

Technical field

[0001] The invention relates to a data indexing method in the field of data science and a word segmentation method in the field of natural language processing, in particular to a method for querying words and sentences of variable length in evaluation documents based on an inverted index.

Background technique

[0002] With the explosive growth of the amount of data in the information age, people have found that enormous value is hidden in massive data, which attracts more and more researchers to study it. For structured data, good results can be obtained by applying traditional or modern data mining methods. For unstructured data, such as massive evaluation text reports, extracting value from the information requires modern data mining methods together with methods from fields such as natural language processing. Evaluation documents are characterized by the coexistence of digital evaluation and text evaluation, and there a...


Application Information

IPC(8): G06F16/31, G06F17/27
CPC: G06F40/242, G06F40/289
Inventor: 沈毅, 赵虹博, 杨朔, 王宏志, 张淼
Owner: HARBIN INST OF TECH