Method for discriminating the degree of relevance between an electronic file and a certain field, and application thereof
A technology for determining the degree of relevance between an electronic document and a given field, applied to identifying that relevance and to industry-specific search engines. It addresses problems such as missing file names, cumbersome analysis, and unavailable data flow, and achieves optimal results and improved execution efficiency.
Examples
Embodiment 1
[0073] Method for calculating the professional degree
[0074] For example, the industry characteristic code library contains the following characteristic codes:
[0075] The industry characteristic code library has an industry characteristic code "XYZ", and the industry characteristic degree corresponding to this characteristic code is "A(0.09), B(0.12), C(0.18), D(0.59), E(0.88), F(0.07), ..."
[0076] The industry characteristic code library has an industry characteristic code "ACD", and the industry characteristic degree corresponding to this characteristic code is "A(0.08), B(0.22), C(0.38), D(0.77), E(0.28), F(0.09), ..."
[0077] The industry characteristic code library has an industry characteristic code "ECA", and the industry characteristic degree corresponding to this characteristic code is "A(0.09), B(0.16), C(0.31), D(0.27), E(0.16), F(0.03), ..."
[0078] The industry characteristic code library has an industry characteristic code "GIHF", and the industry characteristic degree correspondi...
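The text above is truncated before it states how the per-code degrees are combined, so the following is only a minimal sketch under a stated assumption. It stores the library from [0075] to [0077] as a mapping from characteristic code to per-industry degree, and then (assumption, not from the source) sums the degrees of every code matched in a document, counting repeats, and normalizes the totals so that the industries sum to 1. The function name `professional_degree` is hypothetical; the "GIHF" entry and the "..." industries are omitted because their values are not shown.

```python
"""Hypothetical sketch: scoring a document against an industry
characteristic code library (values copied from [0075]-[0077])."""
from collections import Counter

# Characteristic code -> {industry label: industry characteristic degree}.
# Only the A-F values visible in the text are included.
CODE_LIBRARY: dict[str, dict[str, float]] = {
    "XYZ": {"A": 0.09, "B": 0.12, "C": 0.18, "D": 0.59, "E": 0.88, "F": 0.07},
    "ACD": {"A": 0.08, "B": 0.22, "C": 0.38, "D": 0.77, "E": 0.28, "F": 0.09},
    "ECA": {"A": 0.09, "B": 0.16, "C": 0.31, "D": 0.27, "E": 0.16, "F": 0.03},
}


def professional_degree(codes: list[str]) -> dict[str, float]:
    """Assumed aggregation: sum the degrees of each matched code
    (weighted by how often it occurs), then normalize to sum to 1.
    Codes missing from the library are ignored."""
    totals: Counter = Counter()
    for code, count in Counter(codes).items():
        for industry, degree in CODE_LIBRARY.get(code, {}).items():
            totals[industry] += degree * count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no known codes matched
    return {industry: value / grand_total for industry, value in totals.items()}


if __name__ == "__main__":
    scores = professional_degree(["XYZ", "ACD"])
    print(max(scores, key=scores.get), round(scores["D"], 4))
```

For a document containing the codes "XYZ" and "ACD", this assumed scheme gives the largest share to industry D (1.36 out of a combined 3.75). A weighted sum was chosen only because it is the simplest combination consistent with per-code degrees; the patent's actual formula may differ.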
Embodiment 2
[0093] Application of the professional degree in document content analysis
[0094] For example, when a computer-industry search engine crawls three web pages A, B, and C, it extracts keywords and calculates the industry relevance of each page as follows:
[0095] Page A: computer industry (82.5%), pharmaceutical industry (2.1%), chemical industry (3.2%), agriculture (1.5%)...
[0096] Page B: computer industry (1.2%), pharmaceutical industry (5.5%), chemical industry (22.1%), agriculture (53.9%)...
[0097] Page C: computer industry (3.7%), pharmaceutical industry (77.3%), chemical industry (13.2%), agriculture (11.6%)...
[0098] The results show that page A is most related to the computer industry, page B is most related to agriculture, and page C is most related to the pharmaceutical industry. Therefore, page A can be processed as a computer-industry web page and stored in a computer database. Page C is not very relevant to the computer industry. For computer industry search eng...
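The decision described in [0098] amounts to picking each page's highest-relevance industry and keeping only the pages that match the engine's own industry. A minimal sketch using the percentages from [0095] to [0097] follows; the function names and the 50% acceptance threshold are hypothetical additions, since the passage is truncated before it states any cut-off.

```python
"""Hypothetical sketch: routing crawled pages by their dominant industry."""

# Industry relevance (%) per page, copied from [0095]-[0097].
PAGES: dict[str, dict[str, float]] = {
    "A": {"computer": 82.5, "pharmaceutical": 2.1, "chemical": 3.2, "agriculture": 1.5},
    "B": {"computer": 1.2, "pharmaceutical": 5.5, "chemical": 22.1, "agriculture": 53.9},
    "C": {"computer": 3.7, "pharmaceutical": 77.3, "chemical": 13.2, "agriculture": 11.6},
}


def dominant_industry(relevance: dict[str, float]) -> str:
    """Return the industry with the highest relevance percentage."""
    return max(relevance, key=relevance.get)


def pages_for_engine(pages: dict[str, dict[str, float]], industry: str,
                     threshold: float = 50.0) -> list[str]:
    """Keep pages whose dominant industry is `industry` and whose relevance
    to it reaches `threshold` percent (threshold is an assumed value)."""
    return [
        name for name, rel in pages.items()
        if dominant_industry(rel) == industry and rel[industry] >= threshold
    ]


if __name__ == "__main__":
    for name, rel in PAGES.items():
        print(name, dominant_industry(rel))
    print(pages_for_engine(PAGES, "computer"))
```

With these figures the sketch reports computer, agriculture, and pharmaceutical as the dominant industries of pages A, B, and C, and a computer-industry engine would keep only page A, matching the conclusion in [0098].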
Embodiment 3
[0101] Application of the professional degree in document content indexing
[0102] For example, when a pharmaceutical-industry search engine indexes the cached copies of crawled web pages A, B, and C, it extracts keywords and calculates the industry relevance of each page as follows:
[0103] Page A: computer industry (82.5%), pharmaceutical industry (2.1%), chemical industry (3.2%), agriculture (1.5%)...
[0104] Page B: computer industry (1.2%), pharmaceutical industry (5.5%), chemical industry (22.1%), agriculture (53.9%)...
[0105] Page C: computer industry (3.7%), pharmaceutical industry (77.3%), chemical industry (13.2%), agriculture (11.6%)...
[0106] The page rank values of pages A, B, and C calculated by a conventional method are 1.232, 0.573, and 1.107, respectively. Without the professional degree, the ranking order is A→C→B. If the professional degree is introduced with an adjustment factor ε of 0.2, the page ranks can be adjusted as follows:
[0107] PR(A)=1.232×0.8+(0.021×0.2×10)...
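The expression above is cut off after page A, so the following sketch assumes (inferred from that single visible term, not stated in the source) that every page is adjusted the same way: PR'(X) = PR(X) × (1 − ε) + P(X) × ε × 10, where P(X) is the page's pharmaceutical-industry relevance as a fraction and 10 is the scale factor seen in PR(A). The names `adjusted_rank`, `rank_pages`, and `scale` are hypothetical.

```python
"""Hypothetical sketch: blending a conventional page rank with the
professional degree, generalized from the visible PR(A) expression."""

# Conventional page rank values from [0106].
PAGE_RANK = {"A": 1.232, "B": 0.573, "C": 1.107}
# Pharmaceutical-industry relevance as fractions, from [0103]-[0105].
PHARMA_DEGREE = {"A": 0.021, "B": 0.055, "C": 0.773}


def adjusted_rank(rank: float, degree: float, epsilon: float,
                  scale: float = 10.0) -> float:
    """Assumed form: rank * (1 - epsilon) + degree * epsilon * scale."""
    return rank * (1 - epsilon) + degree * epsilon * scale


def rank_pages(epsilon: float) -> list[str]:
    """Return page names sorted by adjusted rank, highest first."""
    scores = {
        page: adjusted_rank(PAGE_RANK[page], PHARMA_DEGREE[page], epsilon)
        for page in PAGE_RANK
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_pages(0.0))  # conventional order
    print(rank_pages(0.2))  # order after introducing the professional degree
```

Under this assumption, ε = 0 reproduces the conventional order A→C→B, while ε = 0.2 gives adjusted values of about 1.028, 0.568, and 2.432 for A, B, and C, moving the pharmaceutical page C to the top (C→A→B). The source's own adjusted values are truncated, so these figures only illustrate the assumed formula.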