Method for rapidly establishing full-text retrieval tool for common files

Full-text indexing and retrieval technology, applied in semi-structured data retrieval, special data processing applications, and semi-structured data mapping/conversion. It addresses problems such as the difficulty of completing searches over file contents, database performance limitations, and database incompleteness, and makes large collections of files easy to manage.

Inactive Publication Date: 2015-03-04
LANGCHAO ELECTRONIC INFORMATION IND CO LTD
3 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0003] Many users' search needs are still met at the database level, but under a heavy search workload the performance of a database is limited. Moreover, searching the content of a large number of files is almost impossible for a database, or at best quite difficult. Building a search tool around a mature open source search engine is a good choice, but constructing a practical full-text search tool is very complicated, and there is essentially no unified, simple construction method. The present invention provides a method for quickly building a full-text search tool for commonly used documents. The tool is based on the open source search engine Solr: files are stored in the search engine, a full-text index is built for them, and all relevant content can be quickly retrieved by search keyword and presented to the user.


Examples


Embodiment 1

[0027] The document parsing module is responsible for parsing files;

[0028] The Chinese word segmentation module applies a Chinese word segmentation algorithm to the full text of each file so that a full-text index can be built;

[0029] The full-text index building module builds the full-text index from the words produced by the Chinese word segmentation module;

[0030] The full-text index library is responsible for data storage;

[0031] The retrieval module handles users' retrieval requests.
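The source does not say which Chinese word segmentation algorithm the module uses. As an illustration only, the sketch below uses forward maximum matching, a classic dictionary-based approach that is swapped in here as a stand-in. The function name `forward_max_match`, the `max_len` parameter, and the tiny dictionary are all hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text by greedily taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real one would hold many thousands of entries.
DICTIONARY = {"全文", "检索", "全文检索", "工具", "文件"}

print(forward_max_match("全文检索工具", DICTIONARY))
```

The greedy longest-match rule prefers "全文检索" over the shorter "全文" followed by "检索". A production segmenter would typically add statistical disambiguation on top of this kind of dictionary lookup.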

[0032] A method for quickly building a full-text search tool for commonly used documents, with the following specific steps:

[0033] ① The document parsing module reads the Word file, parses it, and converts it into XML format. Each file is parsed into two attributes, the file name and the full-text content of the file, where the file name includes the absolute path o...
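A minimal sketch of the parsing output described in step ①, assuming the XML takes the shape of Solr's XML update message (`<add><doc><field name="...">`). The field names `filename` and `content` and the helper `build_add_xml` are hypothetical, since the source does not name them, and extracting text from the original file is assumed to have happened already.

```python
import os
import xml.etree.ElementTree as ET


def build_add_xml(path, content):
    """Wrap one parsed file as a Solr-style XML add message with two fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    # Attribute 1: the file name, stored with its absolute path.
    name_field = ET.SubElement(doc, "field", name="filename")  # assumed field name
    name_field.text = os.path.abspath(path)

    # Attribute 2: the full-text content of the file.
    content_field = ET.SubElement(doc, "field", name="content")  # assumed field name
    content_field.text = content

    # ElementTree escapes characters such as < and & in the field text.
    return ET.tostring(add, encoding="unicode")


print(build_add_xml("report.docx", "Quarterly figures: revenue < target & rising"))
```

Building the message with an XML library rather than string concatenation keeps file content containing markup characters from corrupting the request.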

Embodiment 2

[0040] The document parsing module is responsible for parsing files;

[0041] The Chinese word segmentation module applies a Chinese word segmentation algorithm to the full text of each file so that a full-text index can be built;

[0042] The full-text index building module builds the full-text index from the words produced by the Chinese word segmentation module;

[0043] The full-text index library is responsible for data storage;

[0044] The retrieval module handles users' retrieval requests.

[0045] A method for quickly building a full-text search tool for commonly used documents, with the following specific steps:

[0046] ① The document parsing module reads the PDF file, parses it, and converts it into XML format. Each file is parsed into two attributes, the file name and the full-text content of the file, where the file name includes the absolute path of...

Abstract

The invention discloses a method for rapidly establishing a full-text retrieval tool for common files, belonging to the field of retrieval tools. The method comprises the following steps: (1) a document parsing module reads all the files, parses them into HTTP requests, and sends the requests to a Chinese word segmentation module; (2) the Chinese word segmentation module segments the attribute content in the received HTTP requests; (3) a full-text index building module customizes a retrieval service type; (4) a retrieval module parses a retrieval command and performs the corresponding operation, completing the establishment of the retrieval tool; (5) after a user submits search terms, the retrieval module segments the search terms, generates a query request, queries the index library, and presents the results to the user. The method makes it possible to build a search engine dedicated to individuals and enterprises: individual retrieval requirements can be met with relatively little time and effort, and a large number of internal files can be managed easily.
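The index-then-query flow in steps (2) to (5) can be illustrated with a toy in-memory inverted index. This is a sketch under stated assumptions, not the patented implementation: `segment` is a whitespace tokenizer standing in for the Chinese word segmentation module, `FullTextIndex` is a hypothetical stand-in for the index library that Solr provides, and requiring every query term to match (AND semantics) is an assumption.

```python
from collections import defaultdict


def segment(text):
    # Stand-in tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


class FullTextIndex:
    """Toy inverted index mapping each term to the set of files containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, filename, content):
        # Index every segmented term of the file content under the file name.
        for term in segment(content):
            self.postings[term].add(filename)

    def search(self, query):
        # Segment the query the same way as the documents, then intersect postings.
        terms = segment(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = FullTextIndex()
index.add("/docs/a.txt", "Solr builds a full-text index")
index.add("/docs/b.txt", "The index library stores data")
print(index.search("index"))
print(index.search("full-text index"))
```

Applying the same segmentation to documents and queries is the key design point: a query term can only hit the index if it was produced by the same tokenization that built the postings.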

Description

Technical field

[0001] The invention discloses a method for rapidly constructing a retrieval tool, which belongs to the field of retrieval tools, and in particular a method for rapidly constructing a full-text retrieval tool for commonly used documents.

Background technique

[0002] Full-text search is a search that finds any piece of content in an entire stored book or article. It can retrieve the relevant chapters, sections, paragraphs, sentences, words, and other information in the full text as needed, which is similar to attaching a label to every word in the book, and it can also support various statistics and analyses. Solr is an independent enterprise-level search application server that provides an API similar to a web service. Users can submit XML files in a specified format to the search engine server through HTTP requests to generate indexes; they can also submit search requests through HTTP GET operations and receive the returned results in XML format. [...
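The search side of the HTTP interaction described for Solr can be sketched as follows. The host, the core name `files`, and the helper names are assumptions, and `SAMPLE` is a hand-written response in the general shape of Solr's XML output rather than captured server output. No request is actually sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_select_url(base_url, core, keywords):
    """Build an HTTP GET search URL; wt=xml asks Solr for an XML response."""
    params = urlencode({"q": keywords, "wt": "xml"})
    return f"{base_url}/solr/{core}/select?{params}"


def parse_results(xml_text):
    """Collect each <doc> in the response as a dict of field name to text."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        # Simplification: only single-valued fields are handled here.
        docs.append({child.get("name"): child.text for child in doc})
    return docs


# Hand-written sample response (assumption), not real server output.
SAMPLE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="filename">/docs/a.docx</str>
    </doc>
  </result>
</response>"""

print(build_select_url("http://localhost:8983", "files", "full text"))
print(parse_results(SAMPLE))
```

In a real deployment the URL would be fetched with an HTTP client and the response body passed to `parse_results`; multi-valued fields would need extra handling.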

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/8358, G06F16/838, G06F16/84
Inventor: 刘粉粉
Owner: LANGCHAO ELECTRONIC INFORMATION IND CO LTD