Method for determining the degree of relevance between an electronic document and a given field, and application thereof

A technique for determining the degree of relevance between an electronic document and a given field, applied in industry search engines. It addresses data streams that lack file names or file attributes and therefore cannot be classified, as well as cumbersome content analysis, and it improves execution efficiency and the quality of results.

Inactive Publication Date: 2008-11-12
白云

AI Technical Summary

Problems solved by technology

[0003] Existing technologies search and classify documents using information such as file names, directory names, and file attributes. This file-name-oriented approach cannot search by topic and cannot find files whose names do not reflect their content, and it requires the user to know a great deal about the query target in advance. Data streams that have no file names or file attributes, such as web pages captured by search engine crawlers, cannot be classified by this approach at all.
[0004] Other technologies parse the content of a document and extract a summary, but this is extremely costly. The cumbersome analysis makes the whole data processing pipeline very long, which seriously reduces the efficiency of automated execution. These techniques also cannot quantify the professional relevance of documents or data streams.
[0005] In addition, when information search engines such as Google build an index, the page rank (PR) is evaluated mainly from the number of backlinks to the page, and PR is one of the main inputs to the ranking algorithm. Most specialized information pages, however, sit in the deeper layers of a website and have few backlinks. Specialized content therefore often has a low page rank and may not even be indexed because of it. Such an algorithm inevitably lowers the quality of search results.
[0006] No prior art discloses a method for quantitatively calculating the professional degree, and none applies the professional degree to document classification, document category identification, or the blocking engines of industry search engines. As a result, when content permitted by category information and/or blocking policies is returned to end users as search results, accuracy is low and the blocking policy applied is not optimal.

Examples

Embodiment 1

[0073] Calculation method of professional degree

[0074] For example, the industry characteristic code library contains the following characteristic codes:

[0075] The library contains the industry characteristic code "XYZ", whose corresponding industry characteristic degrees are "A(0.09), B(0.12), C(0.18), D(0.59), E(0.88), F(0.07),..."

[0076] The library contains the industry characteristic code "ACD", whose corresponding industry characteristic degrees are "A(0.08), B(0.22), C(0.38), D(0.77), E(0.28), F(0.09),..."

[0077] The library contains the industry characteristic code "ECA", whose corresponding industry characteristic degrees are "A(0.09), B(0.16), C(0.31), D(0.27), E(0.16), F(0.03),..."

[0078] The industry characteristic code library has an industry characteristic code "GIHF", and the industry characteristic degree correspondi...

Embodiment 2

[0093] Application of professional degree in document content analysis

[0094] For example, when a computer industry search engine crawls three webpages A, B, and C, it extracts keywords and calculates the industry relevance as follows:

[0095] Page A: computer industry (82.5%), pharmaceutical industry (2.1%), chemical industry (3.2%), agriculture (1.5%)...

[0096] Page B: computer industry (1.2%), pharmaceutical industry (5.5%), chemical industry (22.1%), agriculture (53.9%)...

[0097] Page C: computer industry (3.7%), pharmaceutical industry (77.3%), chemical industry (13.2%), agriculture (11.6%)...

[0098] The results show that page A is most relevant to the computer industry, page B to agriculture, and page C to the pharmaceutical industry. Page A can therefore be processed as a computer industry web page and stored in a computer database. Page C is not very relevant to the computer industry. For computer industry search eng...
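The comparison in [0098] can be sketched in a few lines of Python. This is only an illustration: the relevance percentages are copied from [0095] to [0097], while the dictionary layout, the function names, and the 50% acceptance threshold are assumptions that do not come from the patent text.

```python
# Industry relevance (percent) per page, copied from paragraphs [0095]-[0097].
PAGES = {
    "A": {"computer": 82.5, "pharmaceutical": 2.1, "chemical": 3.2, "agriculture": 1.5},
    "B": {"computer": 1.2, "pharmaceutical": 5.5, "chemical": 22.1, "agriculture": 53.9},
    "C": {"computer": 3.7, "pharmaceutical": 77.3, "chemical": 13.2, "agriculture": 11.6},
}


def dominant_industry(relevance):
    """Return the industry with the highest relevance for one page."""
    return max(relevance, key=relevance.get)


def pages_for_industry(pages, industry, threshold=50.0):
    """Return the pages an industry search engine would keep.

    The threshold is a hypothetical cut-off; the source gives no value.
    """
    return [name for name, rel in pages.items() if rel.get(industry, 0.0) >= threshold]


if __name__ == "__main__":
    for name, rel in PAGES.items():
        print(name, dominant_industry(rel))
    print(pages_for_industry(PAGES, "computer"))
```

With these figures, a computer industry engine would keep only page A, which matches the conclusion drawn in [0098].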

Embodiment 3

[0101] Application of professional degree in document content indexing

[0102] For example, when a pharmaceutical industry search engine indexes the cached crawled web pages A, B, and C, it extracts keywords and calculates the industry relevance as follows:

[0103] Page A: computer industry (82.5%), pharmaceutical industry (2.1%), chemical industry (3.2%), agriculture (1.5%)...

[0104] Page B: computer industry (1.2%), pharmaceutical industry (5.5%), chemical industry (22.1%), agriculture (53.9%)...

[0105] Page C: computer industry (3.7%), pharmaceutical industry (77.3%), chemical industry (13.2%), agriculture (11.6%)...

[0106] The page rank values of pages A, B, and C calculated by conventional methods are 1.232, 0.573, and 1.107 respectively. If the professional degree is not introduced, the ranking order is A→C→B. If the professional degree is introduced with an adjustment factor ε of 0.2, the page rank can be adjusted as follows:

[0107] PR(A)=1.232×0.8+(0.021×0.2×10)...
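The adjustment in [0107] can be sketched as below. The sketch assumes that the factor 0.8 is 1 − ε, that the constant 10 is a fixed scale applied to the relevance term, and that pages B and C are adjusted with the same form as page A; the source shows only the expression for page A, so these are assumptions. The function and variable names are hypothetical.

```python
def adjusted_rank(pr, relevance, epsilon=0.2, scale=10.0):
    """Blend a conventional page rank with industry relevance.

    Mirrors the visible expression PR(A) = 1.232 x 0.8 + (0.021 x 0.2 x 10),
    reading 0.8 as (1 - epsilon). `relevance` is a fraction, e.g. 0.021 for 2.1%.
    """
    return pr * (1.0 - epsilon) + relevance * epsilon * scale


# Conventional page ranks from [0106] and pharmaceutical relevance from [0103]-[0105].
CONVENTIONAL_PR = {"A": 1.232, "B": 0.573, "C": 1.107}
PHARMA_RELEVANCE = {"A": 0.021, "B": 0.055, "C": 0.773}


def rank_pages(pr, relevance, epsilon=0.2):
    """Return page names sorted by adjusted rank, highest first."""
    scores = {page: adjusted_rank(pr[page], relevance[page], epsilon) for page in pr}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_pages(CONVENTIONAL_PR, PHARMA_RELEVANCE, epsilon=0.0))
    print(rank_pages(CONVENTIONAL_PR, PHARMA_RELEVANCE, epsilon=0.2))
```

Setting ε to 0 reproduces the conventional order A→C→B. Under the assumptions above, ε = 0.2 moves page C, the page most relevant to the pharmaceutical industry, to the top.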

Abstract

This invention discloses a method for judging the degree of relevance between an electronic document and a given field, and applies the resulting professional degree in a search tool and/or engine to sort, filter, and/or block file information. The method comprises: looking up the key phrases and keywords extracted by the search tool or engine system in a professional characteristic code library or blocking characteristic code library to obtain the corresponding preset industry characteristic degrees; taking a weighted average of those degrees; and multiplying the average by an industry characteristic ratio to obtain the professional degree. Used in search, the professional degree improves execution efficiency and the quality of search results, or yields search results that satisfy a blocking policy or belong to a specific category.
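The calculation outlined in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the degrees for "XYZ" and "ACD" are copied from Embodiment 1, but the dictionary layout, the use of per-keyword weights, and the way the industry characteristic ratio is supplied as a plain number are all assumptions, since the visible text does not define them.

```python
# Hypothetical library layout: characteristic code -> {industry label: degree}.
# The numeric degrees are copied from Embodiment 1, paragraphs [0075]-[0076].
LIBRARY = {
    "XYZ": {"A": 0.09, "B": 0.12, "C": 0.18, "D": 0.59, "E": 0.88, "F": 0.07},
    "ACD": {"A": 0.08, "B": 0.22, "C": 0.38, "D": 0.77, "E": 0.28, "F": 0.09},
}


def professional_degree(keyword_weights, industry, ratio, library=LIBRARY):
    """Weighted average of preset degrees for `industry`, multiplied by `ratio`.

    keyword_weights: extracted keyword -> weight (assumed here; e.g. a count).
    ratio: the industry characteristic ratio; its definition is not given in
    the visible text, so the caller must supply it.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for keyword, weight in keyword_weights.items():
        degrees = library.get(keyword)
        if degrees is None or industry not in degrees:
            continue  # keyword has no entry for this industry in the library
        weighted_sum += weight * degrees[industry]
        total_weight += weight
    if total_weight == 0:
        return 0.0  # no extracted keyword matched the library
    return (weighted_sum / total_weight) * ratio


if __name__ == "__main__":
    # Illustrative call: "XYZ" weighted 1, "ACD" weighted 3, ratio 0.5 (all made up).
    print(professional_degree({"XYZ": 1, "ACD": 3}, "D", ratio=0.5))
```

Keeping the ratio as an explicit argument avoids guessing how the patent derives it; any concrete definition can be computed by the caller and passed in.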

Description

Technical field:

[0001] The invention relates to the indexing, searching, and classification of computer electronic documents, and in particular to a method for judging the degree of relevance between an electronic document and a given field and its application in industry search engines.

Background art:

[0002] With the development of computer and Internet information technology, electronic documents such as text and multimedia content used on the Internet and in other data networks and systems have increased rapidly. At present, managing and searching these documents relies mainly on text- and keyword-based search tools or engines to find the required information. Existing search tools or engines generally do not classify and store such data, which greatly reduces the execution efficiency and the quality of execution results of search tools or...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 白云刘圣何顺超
Owner: 白云