Crowd portrait storage and orientation system and method under big data high concurrency

An orientation (targeting) system based on big data technology, applicable to database indexing, database updating, structured data retrieval, and similar fields. It addresses problems such as the inability to make full use of multi-core servers, the inability to retain historical data, and service interruption on the main server. It achieves flexible loading and release of memory space, fast and accurate orientation, and lower server usage.

Pending Publication Date: 2021-08-24
苏州合数科技有限公司

AI-Extracted Technical Summary

Problems solved by technology

Worse still, fully synchronizing a large amount of data makes the main server consume large amounts of CPU and memory, which interrupts service and leaves the server unable to respond to requests.
Furthermore, Redis is single-threaded, and a single server cannot make full use of the CPU of a multi-core s...

Method used

This system combines a custom cache system based on file-backed shared memory, a message processing management system, bit mapping, storage of the user ID as a 64-bit hash value, compression writing that avoids collisions through multiple hash functions and cyclic comparison, and a reading process in which the orientation function module uses bit operations and check values. Together these form a complete storage and orientation system for crowd portraits. The system is robust, advanced, versatile, maintainable, and easy to use. It not only meets the bidding system's orientation and crowd portrait cache storage ...

Abstract

The invention discloses a crowd portrait storage and orientation system and method under big data high concurrency. The system is built on a custom cache system based on file-backed shared memory and on a message processing management system. It bit-maps the crowd portrait data, converts the user identifier (ID) into a 64-bit hash value for storage, and uses multiple hash functions with cyclic comparison so that compression writing avoids collisions. Combined with the reading process of an orientation function module based on bit operations and check values, this forms a complete crowd portrait storage and orientation system. The system is robust, advanced, versatile, maintainable, and easy to use.



Example Embodiment

[0024] Example:
[0025] As shown in Figure 1, in the crowd portrait storage and orientation system under big data high concurrency, the output of the system is connected to the DSP bidding server, and its inputs are connected to the DSP bidding server, the distributed message system server, and the DMP server respectively. The storage and orientation system includes a crowd portrait cache system, a reading system, a compression writing system, and a message processing management system.
[0026] The method for crowd portrait storage and orientation under big data high concurrency works as follows. The crowd portrait data is divided into blocks, and a block identifier is computed for each block. Shared memory files are loaded or created, file-based shared memory is set up, and a custom cache is generated into which the shared memory is mapped. The custom cache starts a timer for automatic management and checks the update status block by block at regular intervals. If the update status of a block is a full update, the cache loads the shared memory file corresponding to the block identifier, allocates a shared memory space, loads the data into it, updates the custom cache with the shared memory mapping of the new block, and releases the shared memory of the old block. Otherwise, it checks whether a shared memory mapping for the block exists. If not, it loads the shared memory file corresponding to the block identifier, allocates a shared memory space, loads the data into it, and maps the shared memory into the custom cache. If the mapping exists, it moves on to the next block. The check loops until the last block has been checked.
Incremental additions and updates are handled by the message processing system, which manages multiple compression writing processes. When an exception occurs in a writing process, the machine corresponding to that process is marked unreadable. When data is added or updated, the compression writing process shards the data, uses a hash function to generate a 32-bit hash value for the user ID, and uses that hash value to obtain the linked list data at the corresponding cache address. It first uses the first hash function to generate a 64-bit hash value for the user ID and compares it with each 64-bit hash value of the crowd portrait data in the linked list. If any are equal, it uses the next hash function to generate a hash value and compares again. If none are equal, it compares the serial number of the hash function used with the largest hash function serial number used in the linked list. If it is smaller than the largest serial number, it uses the next hash function to generate a hash value and compares again, until there is no equality at all (this ensures that each user ID hash value is unique). When all are unequal, it takes the crowd portrait data, maps the data to bits one by one, builds a record according to the rules from the 64-bit hash value of the user ID, the mapping value of the crowd portrait, and the serial number of the hash function used, stores the record at the end of the linked list, and writes it into the cache. The updated data in shared memory is then flushed to the corresponding shared memory file on disk.
When the bidding system performs user orientation, it first calls the orientation function module with the user ID. The orientation function module queries the crowd portrait Bloom filter to ask whether the user ID has a crowd portrait. If so, the module computes the shard group information and determines that a server in that group of cache servers is readable, and a reading process connected to the cache server takes the corresponding shared memory mapping from the custom cache according to the group identifier. It then uses the hash function to generate a 32-bit hash value for the user ID and obtains the linked list data at the cache address. It traverses the linked list, takes the hash value and hash function serial number of each record, uses the hash function with that serial number to generate a 64-bit hash value for the user ID, and compares the two. If they are equal, it checks the collision flag at the head of the linked list; if there is no collision, it returns the crowd portrait mapping value directly. If there is a collision, it saves the crowd portrait data in the result array and compares the next record, until the whole linked list has been traversed. It then checks the result array; if there is more than one entry, it takes the crowd portrait mapping value of the entry with the largest hash function serial number and returns the result to the orientation function module. The orientation function module temporarily stores the crowd portrait mapping value, performs a bitwise AND between it and the crowd portrait mapping value required by the order, computes a check value from the result of the AND, and compares the check value with the orientation value required by the order. If they match, the order is returned to the bidding system.
A system for storing and orienting crowd portraits under big data high concurrency is built on this method. The custom cache based on shared memory files divides the crowd portrait data into blocks; loading and unloading by block greatly reduces the use of machine resources and avoids the impact of full updates on the system. Incremental addition and update management through the message processing module makes online additions and updates timely and accurate. Through compression writing, the crowd portrait data is bit-mapped and the user ID is stored as a hash value, which greatly reduces memory usage; this saves servers and cost, reduces operation and maintenance complexity, and improves query efficiency and speed. The orientation function module performs the bit operation between the orientation requirements of the order and the mapping value of the crowd portrait data. Introducing the check value allows accurate judgment and speeds up orientation. The system as a whole addresses the shortcomings of existing distributed caches and the difficulties of crowd portrait storage and orientation, and is well suited to big data, high-concurrency environments.
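The final orientation check in [0026] (bitwise AND, then a check value comparison) can be sketched in Python as follows. This is a minimal sketch, not the patented implementation. The source does not say how the check value is derived from the AND result, so the sketch assumes it is the number of set bits. The names `encode_states`, `popcount`, and `matches_order` are hypothetical.

```python
PORTRAIT_BITS = 45  # the source maps 45 portrait states to bits


def encode_states(states):
    """Set one bit per portrait state index (0..44); 1 means yes, 0 means no."""
    value = 0
    for index in states:
        if not 0 <= index < PORTRAIT_BITS:
            raise ValueError("state index out of range")
        value |= 1 << index
    return value


def popcount(value):
    """Number of set bits. ASSUMED check value function, not from the source."""
    return bin(value).count("1")


def matches_order(user_bits, order_bits, order_check):
    """Bitwise AND the user's portrait with the order's, then compare check values."""
    matched = user_bits & order_bits
    return popcount(matched) == order_check


# Usage: the order requires states 3 and 44, and the user has both.
user = encode_states([0, 3, 44])
order = encode_states([3, 44])
order_matches = matches_order(user, order, popcount(order))
```

Under this assumption an order matches only when every state it requires is set for the user. A different check value function would change the matching rule without changing the AND step.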
[0027] As shown in Figure 2, the system consists of a DSP bidding server cluster, a log processing server cluster, a crowd portrait DMP (data management platform) server cluster, a distributed message system cluster, a crowd portrait storage system cluster, and other parts. The DSP bidding server calls the orientation function module, which sends a request to the crowd portrait storage and orientation system server to query the user's crowd portrait data. The crowd portrait storage and orientation system server returns the user's crowd portrait data through the corresponding algorithm. After a series of checks, the orientation function module returns the orders that meet the orientation requirements to the bidding server, and the bidding server, after its own series of checks, returns the creative materials and bid prices of the qualifying orders to the media server. The DSP bidding server sends the bid request information it receives to the log processing server. The log processing server formats the information and sends it to the DMP server. The DMP server processes it in combination with other data sources, sends the new or updated crowd portrait data to the distributed message system server, and adds to or updates its own crowd portrait database. The message processing system of the crowd portrait storage system server processes the crowd portrait data and writes the processed data into the custom cache. The DMP server regularly generates shared memory files for the full set of crowd portrait blocks and sends them to the crowd portrait storage and orientation system server, which then starts loading them or performs its regular update.
[0028] As shown in Figure 3, the custom cache system of the crowd portrait storage and orientation system works as follows:
[0029] 1. When the system starts, the custom cache system is initialized, the socketserver is started, and the listening port is started.
[0030] 2. Startup initializes the timer thread; the timer fires every 10 minutes.
[0031] 3. Determine whether the initialization information has been updated. If not, do nothing and wait for the next task execution.
[0032] 4. If there is an update, start initialization and obtain the initialization information.
[0033] 5. If the acquisition fails, check the number of retries. If it is fewer than 3, wait 10 seconds before retrying. If it is at least 3 but fewer than 10, wait 5 minutes before retrying. If it is more than 10, stop retrying, send a message to the monitoring system, exit the information initialization, and wait for the next task execution.
[0034] 6. If the acquisition is successful, it is judged whether the custom cache exists, and if it does not exist, the cache is created.
[0035] 7. If the cache exists, obtain machine group information and shared memory file information.
[0036] 8. Take a block information identifier and initialize information for the block information identifier.
[0037] 9. Determine whether the shared cache mapping already exists in the custom cache, and if so, go to the next one.
[0038] 10. If it does not exist, initialize the shared memory information.
[0039] 11. Determine whether there is a shared memory file, if not, generate a shared memory file, save it on the disk, and open up a shared memory space.
[0040] 12. If it exists, judge whether the shared memory space has been opened up. If so, map the shared memory to the custom cache, and cycle through steps 8 to 11.
[0041] 13. If not opened, then open up a shared memory space. Load the contents of the memory file into shared memory.
[0042] 14. Check the update mark of the custom block to see if it is a full update.
[0043] 15. If it is a full update, update the shared memory mapping of the new block to the custom cache, and release the shared memory corresponding to the old block.
[0044] 16. Otherwise, update the shared memory mapping of the block into the custom cache.
[0045] 17. Repeat steps 8 to 16 until all blocks are processed.
[0046] 18. The information update initialization is complete; wait for the next task execution.
[0047] 19. The next task execution repeats steps 3 to 18.
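The block loading and full-update swap in steps 8 to 16 can be sketched with Python's `mmap` module standing in for the file-based shared memory. This is a minimal sketch under stated assumptions: the class `BlockCache`, its methods, and the idea that a full update arrives as a new file path are all hypothetical, and the timer, retry, and monitoring logic is omitted.

```python
import mmap
import os


class BlockCache:
    """Maps block identifiers to file-backed memory regions (hypothetical)."""

    def __init__(self):
        self._maps = {}

    def load(self, block_id, path, size, full_update=False):
        """Map the block's file; on a full update, swap in the new mapping."""
        if block_id in self._maps and not full_update:
            return self._maps[block_id]  # mapping already exists, nothing to do
        if not os.path.exists(path):
            # No shared memory file yet: create a zero-filled one on disk.
            with open(path, "wb") as handle:
                handle.truncate(size)
        with open(path, "r+b") as handle:
            new_map = mmap.mmap(handle.fileno(), 0)  # length 0 maps the whole file
        old_map = self._maps.get(block_id)
        self._maps[block_id] = new_map  # publish the new block first
        if old_map is not None:
            old_map.close()  # then release the memory of the old block
        return new_map

    def flush(self, block_id):
        """Write changes in the mapped memory back to the file on disk."""
        self._maps[block_id].flush()
```

Publishing the new mapping before closing the old one mirrors step 15, where the new block is mapped into the cache and only then is the old block's memory released. A real multi-process reader would also need some coordination so that no reader still holds the old mapping when it is closed.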
[0048] As shown in Figure 4, the bit-mapping compression of the crowd portrait storage and orientation system works as follows:
[0049] 1. When the DMP has new or updated crowd portrait data, it sends the formatted crowd portrait data to the distributed message system server.
[0050] 2. The compression process of the crowd portrait storage and orientation system reads one message from the distributed message system and parses the data.
[0051] 3. Compute the machine group information and the block information from the user ID in the data.
[0052] 4. Obtain the shared memory mapping of the identifier in the custom cache through the block identifier.
[0053] 5. Determine whether there is a shared memory mapping in the custom cache, if not, send the information to the monitoring system, end the processing, and take out the next message from the distributed message system for processing.
[0054] 6. If it exists, use the user ID to generate a 32-bit hash value through the hash function.
[0055] 7. Take the remainder of the hash value divided by the maximum number of records the block can store to compute the position of the hash value in the cache, and read the 20-byte crowd portrait record at that position. This record is the head of the linked list. Its fields, separated here by #, are: the 64-bit hash value of the user ID # the mapping value of the crowd portrait # which hash function is used # the storage address of the next record in the linked list # the block where the next record in the linked list is located. Their sizes are 8 # 6 # 1 # 4 # 1 bytes, 20 bytes in total.
[0056] 8. Take the 15th byte of the 20 bytes, which marks which hash function is used, i.e. the serial number of the hash function. (In the head record of the linked list, this byte indicates one of three cases: 0 means the head stores no crowd portrait data; 1 means the head stores data and the 64-bit hash values in the linked list have no collision; 2 means the head stores data and the 64-bit hash values in the linked list have a collision.)
[0057] 9. Compute the decimal value of the 15th byte.
[0058] 10. Determine whether the value of the 15th byte is greater than 0. If it is equal to 0, use the first 64-bit hash function to generate the hash of the ID and store it in 1-8 bytes of the 20 bytes.
[0059] 11. Take out 6 categories and 45 states of the crowd portrait, corresponding to the last 45 bits of 9-14 bytes, 1 means yes, 0 means no, and store them in 9-14 bytes of the 20 bytes.
[0060] 12. Set the 15th byte to decimal 1, and set the 16-20 byte to 0.
[0061] 13. Write the assembled 20-byte record back to the address, i.e. the linked list head.
[0062] 14. If the value of the 15th byte is greater than 0, take bytes 1-8 and the 15th byte, compute their decimal values, and put them into the array (that is, the 64-bit hash value of the user ID and the serial number of the hash function used).
[0063] 15. Take the 16th-19th byte and calculate the decimal value (the pointer of the linked list/the position of the next value), and judge whether it is greater than 0.
[0064] 16. If it is greater than 0, loop steps 14-15 until the linked list ends.
[0065] 17. If it is not greater than 0, the end of the linked list has been reached; save the tail record of the linked list, the address of the tail record, and the largest hash function serial number.
[0066] 18. Use the first 64-bit hash function to generate a 64-bit hash value for the ID, and compare it with all the 64-bit hash values stored in the array.
[0067] 19. Determine whether any value is equal. If so, set the hash collision flag to true and use the next 64-bit hash function to generate a 64-bit hash value for the ID.
[0068] 20. Repeat steps 18 to 19 until there are no equal values.
[0069] 21. If none are equal, compare the serial number of the current hash function with the largest hash function serial number in the linked list, and determine whether the current serial number is greater than or equal to the largest one.
[0070] 22. If yes, go to step 27.
[0071] 23. If not, take from the array all hash values and serial numbers whose hash function serial number is greater than that of the current hash function, and store them in a temporary array.
[0072] 24. Traverse the temporary array. For each entry, generate a hash value for the user ID with the hash function that produced that entry, and check whether it equals the stored hash value.
[0073] 25. If any are equal, loop through steps 18 to 24 until none are equal (this ensures that the stored user ID hash values are unique).
[0074] 26. If not equal, proceed to the next step.
[0075] 27. Determine whether the collision flag is true, if so, set the 15th byte of the first data in the linked list to decimal 2, and continue to step 29.
[0076] 28. If no, continue to the next step.
[0077] 29. Store the hash value that the 64-bit hash function generated for the ID in bytes 1-8 of the new 20-byte record.
[0078] 30. Take the 6 categories and 45 states of the crowd portrait, which correspond to the last 45 bits of bytes 9-14 (1 means yes, 0 means no), and store them in bytes 9-14 of the 20-byte record.
[0079] 31. Set the 15th byte to decimal 1, and set bytes 16-20 to 0.
[0080] 32. Store the assembled 20-byte crowd portrait record at a new address.
[0081] 33. Update the tail record of the linked list: put the new address into bytes 16-19 and the block into byte 20.
[0082] 34. Determine whether the writing is successful, and if so, return the addition success.
[0083] 35. If not, return adding failure.
[0084] 36. End this processing, take out the next piece of information from the distributed message system for processing, and then loop through steps 2 to 28.
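The 20-byte record from step 7 (8 + 6 + 1 + 4 + 1 bytes) and the bucket position calculation can be sketched with Python's `struct` module. This is a minimal sketch, not the patent's encoding: the big-endian byte order, the use of `zlib.crc32` as the 32-bit hash, and the slot-times-record-size offset are assumptions, and the names `pack_record`, `unpack_record`, and `bucket_offset` are hypothetical.

```python
import struct
import zlib

# 8-byte hash, 6-byte portrait bitmap, 1-byte hash function number,
# 4-byte next address, 1-byte next block. ">" disables padding: 20 bytes.
RECORD = struct.Struct(">Q6sBIB")


def pack_record(hash64, portrait_bits, fn_index, next_addr=0, next_block=0):
    """Assemble one 20-byte record; portrait_bits must fit in 48 bits."""
    portrait = portrait_bits.to_bytes(6, "big")
    return RECORD.pack(hash64, portrait, fn_index, next_addr, next_block)


def unpack_record(raw):
    """Split a 20-byte record back into its five fields."""
    hash64, portrait, fn_index, next_addr, next_block = RECORD.unpack(raw)
    return hash64, int.from_bytes(portrait, "big"), fn_index, next_addr, next_block


def bucket_offset(user_id, capacity):
    """Byte offset of the list head: 32-bit hash modulo the block capacity.

    zlib.crc32 is an ASSUMED stand-in for the unspecified 32-bit hash.
    """
    slot = zlib.crc32(user_id.encode("utf-8")) % capacity
    return slot * RECORD.size
```

With this layout the "15th byte" of the text is `raw[14]`, and the next-record address occupies `raw[15:19]`.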
[0085] As shown in Figure 5, reading in the crowd portrait storage and orientation system works as follows:
[0086] 1. The bidding system of the DSP bidding server starts user orientation.
[0087] 2. The bidding system retrieves the orientation information of an order.
[0088] 3. The DSP bidding server invokes the orientation function module with the orientation information of the order.
[0089] 4. The orientation function module checks whether a crowd portrait mapping value is temporarily stored for the user ID.
[0090] 5. If there is, the orientation function module parses the crowd portrait mapping value.
[0091] 6. The orientation function module obtains the crowd portrait value and check value required by the order's orientation.
[0092] 7. The crowd portrait mapping value corresponding to the user ID is bit-ANDed with the crowd portrait value required by the order orientation.
[0093] 8. Calculate the check value from the result after bit AND operation.
[0094] 9. Compare with the verification value of the group portrait required by the order orientation.
[0095] 10. Determine whether they are equal, and return the result to the bidding system if they are equal.
[0096] 11. If not equal, the bidding system judges whether there is a next order.
[0097] 12. If there is, the steps from 2 to 11 are repeated until all order checks are completed.
[0098] 13. If not, the order check ends; the orientation function module deletes the temporarily stored crowd portrait mapping value for the user ID and returns no match to the bidding system.
[0099] 14. If there is no temporarily stored crowd portrait mapping value for the user ID, the orientation function module asks the crowd portrait filter whether a crowd portrait exists for the user.
[0100] 15. If it does not exist, the filter informs the orientation function module, which returns no crowd portrait to the bidding system.
[0101] 16. If it exists, the orientation function module sends information such as the user identification ID to the crowd portrait cache system.
[0102] 17. The crowd portrait cache system calculates the grouped machine information and block information through the user ID.
[0103] 18. Obtain the shared memory mapping of the identifier in the custom cache through the block identifier.
[0104] 19. Determine whether there is a shared memory mapping in the custom cache, if not, send information to the monitoring system, and return that the user's crowd portrait does not exist to the targeting function module, and the targeting function module returns no crowd portrait to the bidding system.
[0105] 20. If it exists, use the user ID to generate a 32-bit hash value through a hash function.
[0106] 21. Take the remainder of the hash value divided by the maximum number of records the block can store to compute the position of the hash value in the cache, and read the 20-byte crowd portrait record at that position. This record is the head of the linked list.
[0107] 22. Take the 15th byte of the 20 bytes, which is the serial number of the hash function. (In the head record of the linked list, this byte indicates one of three cases: 0 means the head stores no crowd portrait data; 1 means the head stores data and the 64-bit hash values in the linked list have no collision; 2 means the head stores data and the 64-bit hash values in the linked list have a collision.)
[0108] 23. Compute the decimal value of the 15th byte.
[0109] 24. Determine whether the value is greater than 1. If it is equal to 0, return to the orientation function module that the user's crowd portrait does not exist, and the orientation function module returns no crowd portrait to the bidding system.
[0110] 25. If it is equal to 1, determine whether the value of bytes 16-19 is greater than 0.
[0111] 26. If it is not greater than 0, use the first 64-bit hash function to generate a hash value for the ID, and compare it with the value taken from bytes 1-8.
[0112] 26. If they are not equal, return to the orientation function module that the user's crowd portrait does not exist, and the orientation function module returns no crowd portrait to the bidding system.
[0113] 27. If they are equal, take the data in bytes 9-14 and compute its decimal value, return the user's crowd portrait value to the orientation function module, which temporarily stores the crowd portrait mapping value for the user ID, and continue to step 5.
[0114] 28. If the value of bytes 16-19 is greater than 0, continue to step 30.
[0115] 29. If the decimal value of the 15th byte is greater than 1, set the hash collision flag to true.
[0116] 30. Take bytes 16-19 and compute the decimal value (the linked list pointer, i.e. the address of the next record in the linked list).
[0117] 31. Take the 1-8th byte and calculate the decimal value (64-bit long hash value).
[0118] 32. Take the 15th byte of the record and compute its decimal value (the hash function used).
[0119] 33. Use that 64-bit hash function to generate a hash value for the ID, and compare whether the two hash values are equal.
[0120] 34. If the two hash values ​​are not equal, determine whether the address of the next value in the linked list is greater than 0.
[0121] 35. If it is greater than 0, loop through steps 30 to 34 until the last item in the linked list.
[0122] 36. If it is equal to 0, the end of the linked list has been reached; determine whether the length of the result array is greater than 0.
[0123] 37. If it is not greater than 0, return to the orientation function module that the user's crowd portrait does not exist, and the orientation function module returns no crowd portrait to the bidding system.
[0124] 38. If it is greater than 0, determine whether the length of the array is greater than 1.
[0125] 39. If it is greater than 1, take the crowd portrait data with the largest hash function serial number in the array, and determine whether the length of the array is greater than 2; if so, send a message to the monitoring system so that memory can be adjusted. Go to step 41.
[0126] 40. If it is equal to 1, the crowd portrait data in the array is taken out.
[0127] 41. Take out the data corresponding to 9-14 bytes and calculate the decimal value, return the user group portrait value to the orientation function module, the orientation function module temporarily saves the user identification ID corresponding to the crowd portrait mapping value, and continue to step 5.
[0128] 42. If the two hash values are equal, determine whether the hash collision flag is true.
[0129] 43. If it is true, store the crowd portrait data in the result array and repeat steps 34 to 41.
[0130] 44. If it is not true, take the data in bytes 9-14 and compute its decimal value, return the user's crowd portrait value to the orientation function module, which temporarily stores the crowd portrait mapping value for the user ID, and continue to step 5.
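The chain traversal in steps 30 to 44 can be sketched as a lookup over an in-memory list. This is a simplified sketch, not the patent's reader: the chain is a Python list of `(hash value, portrait bits, hash function number)` tuples instead of 20-byte records linked by address, the head flag is passed in as the boolean `collision`, and the 64-bit hash family `hash64` (salted BLAKE2b) is an assumption because the source does not name its hash functions.

```python
import hashlib


def hash64(user_id, fn_index):
    """ASSUMED family of 64-bit hashes: BLAKE2b salted with the function number."""
    salt = fn_index.to_bytes(2, "big")
    digest = hashlib.blake2b(user_id.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def lookup(user_id, chain, collision, hash_fn=hash64):
    """Return the portrait bits stored for user_id, or None if absent.

    chain: list of (hash_value, portrait_bits, fn_index) in list order.
    collision: True when the head record carries the collision flag (value 2).
    """
    matches = []
    for value, portrait_bits, fn_index in chain:
        # Recompute the hash with the function that produced this record.
        if hash_fn(user_id, fn_index) != value:
            continue
        if not collision:
            return portrait_bits  # no collision in this chain: first match wins
        matches.append((fn_index, portrait_bits))
    if not matches:
        return None
    # With collisions, the record written with the largest function number wins.
    return max(matches, key=lambda match: match[0])[1]
```

Passing `hash_fn` as a parameter keeps the traversal independent of the hash choice, which also makes collisions easy to simulate when testing.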
[0131] This system combines a custom cache system based on file-backed shared memory, a message processing management system, bit mapping of crowd portrait data, storage of the user ID as a 64-bit hash value, compression writing that avoids collisions through multiple hash functions and cyclic comparison, and a reading process in which the orientation function module uses bit operations and check values. Together these form a complete storage and orientation system for crowd portraits. The system is robust, advanced, versatile, maintainable, and easy to use. It meets the orientation and crowd portrait cache storage requirements of the bidding system, and it can also be used on its own as a distributed cache for crowd portraits, in particular as a persistent key/value cache for resource libraries under big data high concurrency. Converting string keys into 64-bit hash values greatly reduces memory usage, and some values can also use the bit mapping method of the present invention. This addresses many difficult problems under high-concurrency big data, greatly improves the CPU and memory utilization of a single machine, enables dynamic loading and release of memory for full updates, makes system deployment more flexible, and uses memory space more compactly.
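The "multiple hash functions and cyclic comparison" used when writing can be sketched as choosing a hash function number for a new record. This is a sketch under stated assumptions, not the patented procedure: `hash64` (salted BLAKE2b) stands in for the unnamed hash functions, the chain is a list of `(hash value, hash function number)` pairs, `max_index` is an arbitrary bound, and the sketch also raises the collision flag in the second check, which the source does not state explicitly.

```python
import hashlib


def hash64(user_id, fn_index):
    """ASSUMED family of 64-bit hashes: BLAKE2b salted with the function number."""
    salt = fn_index.to_bytes(2, "big")
    digest = hashlib.blake2b(user_id.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def choose_hash_index(user_id, chain, hash_fn=hash64, max_index=255):
    """Pick the first hash function whose value is unique within the chain.

    chain: list of (hash_value, fn_index) for records already stored.
    Returns (fn_index, hash_value, collided).
    """
    stored_values = {value for value, _ in chain}
    collided = False
    for k in range(1, max_index + 1):
        candidate = hash_fn(user_id, k)
        if candidate in stored_values:
            collided = True  # same 64-bit value already stored: try the next function
            continue
        # A reader recomputes the hash with each record's own function number, so a
        # stored record with a larger number that matches would shadow the new one.
        if any(j > k and hash_fn(user_id, j) == value for value, j in chain):
            collided = True  # assumption: flag this case as a collision too
            continue
        return k, candidate, collided
    raise RuntimeError("no collision-free hash function available")
```

Because only the 8-byte hash and a 1-byte function number are stored instead of the string key, the per-record cost stays fixed regardless of the length of the user ID.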