当前位置:网站首页>Redis data type (VII) -- hyperloglog
Redis data type (VII) -- hyperloglog
2022-06-12 06:18:00 【leo_ messi94】
Preface
At work , The functional requirements related to statistics are often met , For example, Statistics website PV(pageview Page visits ), have access to reids Of incr、incrby Make it easy .
But like UV(uniquevistor, Independent visitor )、 Independent IP Count 、 How to solve the problems that need to be de duplicated and counted, such as the number of search records ? This problem of finding the number of non repeating elements in a set becomes a technical problem
There are many solutions to technical problems :
- The data is stored in mysql In the table , Use distinct count Calculate the number of non duplicates
- Use Redis Provided hash、set、bitmaps And other data structures
The results of the above scheme are accurate , But as the data increases , Resulting in more and more occupied space , It is impractical for very large data sets .
Can you reduce the accuracy to balance the storage space ?Redis Push it out HyperLogLog
brief introduction
Redis HyperLogLog Is used to do technical statistics algorithm , Advantage is , When the number or volume of input elements is very, very large , The space needed to calculate the cardinality is always fixed , And it's very small .
stay redis in , Every hyperLogLog Keys only cost 12KB Memory , Note that proximity can be calculated 2^64 Cardinality of different elements . This is the same as calculating the cardinality , The more elements, the more memory consuming collections form a sharp contrast .
however , because HyperLogLog Only the input elements will be used to calculate the cardinality , Instead of storing the input elements of this book , therefore HyperLogLog It can't be like a collection , Return the various elements of the input .
What is the cardinality
Like data sets {1,3,5,7,5,7,8} The cardinality set of is {1,3,5,7,8}, base ( Number of non repeating elements ) by 5. The cardinality is within the acceptable range of error , Fast base calculation .
command :
1.pfadd: Store parameters other than the first parameter in a variable named with the first parameter HyperLogLog In structure
- Format :PFADD key element [element …]
- One side effect of this command is that it may change this HyperLogLog To reflect the cardinality estimated when each unique object is added ( The cardinality of a set ).
- If one HyperLogLog The approximate cardinality of the estimate changes during the execution of the command , PFADD return 1, Otherwise return to 0, If specified key non-existent , This command will automatically create an empty HyperLogLog structure ( Specify the length and encoding of the string ).
- It is also possible to provide only the variable name without specifying the element when calling the command , If this variable name exists , There will be no operation , If it doesn't exist , Then a data structure will be created ( return 1)
- Return value : If HyperLogLog The interior of the has been modified , Then the return 1, Otherwise return to 0
- Complexity :O(1)
redis> PFADD hll a b c d e f g
(integer) 1
redis> PFCOUNT hll
(integer) 7
2.pfcount: Return stored in HyperLogLog The approximate cardinality of the variable of the structure
- Format :PFCOUNT key [key …]
- When the parameter is a key when , Return stored in HyperLogLog The approximate cardinality of the variable of the structure , If the variable does not exist , Then return to 0.
- When Parameters are multiple key when , Back to these HyperLogLog Approximate cardinality of Union , This value will be given to all key Of HyperLoglog The structure is merged into a temporary HyperLogLog Calculated in the structure .
- HyperLogLog Can use fixed and very little memory ( Every HyperLogLog The structure requires 12K Bytes plus key A few bytes of itself ) To store the only elements of the collection .
- The returned visible set cardinality is not an exact value , But one with 0.81% The standard error (standard error) Approximate value .
- For example, to record how many different search queries are executed in a day , A program can call once each time a search query is executed PFADD, And by calling PFCOUNT Command to get the approximate result of this record .
- Be careful : One side effect of this command is that it may cause HyperLogLog Internal changed , For caching purposes , It will use 8 Bytes to record the cardinality of the latest calculation , therefore PFCOUNT The command is technically a write command
- Return value : Approximate number of unique elements added
- Complexity :O(1) with every small average constant times when called with a single key. O(N) with N being the number of keys, and much bigger constant times, when called with multiple keys
redis> PFADD hll foo bar zap
(integer) 1
redis> PFADD hll zap zap zap
(integer) 0
redis> PFADD hll foo bar
(integer) 0
redis> PFCOUNT hll
(integer) 3
redis> PFADD some-other-hll 1 2 3
(integer) 1
redis> PFCOUNT hll some-other-hll
(integer) 6
performance
When calling PFCOUNT Command to specify a key Is the parameter , Good performance , Even with dealing with one HyperLogLog It takes as little time . This may be related to PFCOUNT The command can directly use the estimated cardinality of the cache , Most of the PFADD Nor will any registers be updated , So this value is rarely changed . Theoretically, it can achieve hundreds of operations per second .
When calling PFCOUNT Command to specify multiple key, Because there are many HperLogLog Structure to perform a slower merge operation , Moreover, the cardinality obtained by union calculation cannot be cached , PFCOUNT The command also takes milliseconds to execute multiple key Union operation of , It will take a long time , So don't abuse this multiple key The way .
The user needs to understand this command to handle 1 individual key And multiple key The semantics of execution are different , And the performance of the execution is also different .
For more information, please refer to This article . Source code hyperloglog.c The document is also easy to understand , Including sparse and dense implementation of the code .
3.pfmerge: Will be multiple HyperLogLog Merge (merge) For one HyperLogLog
- Format :PFMERGE destkey sourcekey [sourcekey …]
- After the merger HyperLogLog The cardinality of is close to all inputs HyperLogLog The visible set of (observed set) Union .
- It's a combination of HyperLogLog Will be stored in the target variable ( The first parameter ) Inside , If the key doesn't exist , So before the command is executed , An empty... Will be created for the key first
- Return value :OK
- Complexity :O(N) to merge N HyperLogLogs, but with high constant times
redis> PFADD hll1 foo bar zap a
(integer) 1
redis> PFADD hll2 a b c foo
(integer) 1
redis> PFMERGE hll3 hll1 hll2
OK
redis> PFCOUNT hll3
(integer) 6
边栏推荐
- LeetCode个人题解(剑指offer3-5)3.数组中重复的数字,4.二维数组中的查找,5.替换空格
- Three years of sharpening a sword: insight into the R & D efficiency of ant financial services
- Jackson - how to convert the array string with only one map object to list < map >
- dlib 人脸检测
- Front desk display LED number (number type on calculator)
- Poisson disk sampling for procedural placement
- Directx11 advanced tutorial cluster based deffered shading
- Understand Houdini's (heightfield) remap operation
- SQLite cross compile dynamic library
- First note
猜你喜欢

EBook upload

Book classification based on Naive Bayes

BRDF of directx11 advanced tutorial PBR (2)

Houdini terrain creation

Explanation of sensor flicker/banding phenomenon

Why don't databases use hash tables?

Houdini & UE4 programmed generation of mountains and multi vegetation scattering points

Using hidden Markov model to mark part of speech

Guns framework multi data source configuration without modifying the configuration file

First note
随机推荐
. Net core - pass Net core will Net to cross platform
Cross compile libev
Unity C script implements AES encryption and decryption
Unity3d display FPS script
Project and build Publishing
English grammar_ Adverb_ With or without ly, the meaning is different
User login (medium)
Cause analysis of motion blur / smear caused by camera shooting moving objects
Leetcode sword finger offer II 119 Longest continuous sequence
English语法_副词_有无ly,意义不同
Getting started with houdininengine HDA and UE4
Why do I object so [1.01 to the power of 365 and 0.99 to the power of 365]
RNN model
为什么联合索引是最左匹配原则?
Es6-es11 learning
MLP sensor
LeetCode个人题解(剑指offer3-5)3.数组中重复的数字,4.二维数组中的查找,5.替换空格
BRDF of directx11 advanced tutorial PBR (2)
LeetCode-剑指Offer(第二版)个人题解完整版
Idea common configuration