当前位置:网站首页>A complete tutorial for getting started with redis: hyperloglog
A complete tutorial for getting started with redis: hyperloglog
2022-07-04 22:53:00 【Gu Ge academic】
HyperLogLog It's not a new data structure ( The actual type is string type ), and
Is a cardinality algorithm , adopt HyperLogLog You can use very little memory space to complete the independent total number
The statistics of , Data sets can be IP、Email、ID etc. .HyperLogLog Provides 3 An order :
pfadd、pfcount、pfmerge. for example 2016-03-06 The access user is uuid-1、uuid-2、
uuid-3、uuid-4,2016-03-05 The access user is uuid-4、uuid-5、uuid-6、uuid-7, Such as
chart 3-15 Shown .
Be careful
HyperLogLog The algorithm is composed of Philippe
Flajolet(https://en.wikipedia.org/wiki/Philippe_Flajolet) stay The analysis of a
near-optimal cardinality estimation algorithm In this paper , If the reader is interested
You can read it yourself .
1. add to
pfadd key element [element … ]
pfadd Used to direct to HyperLogLog Additive elements , If the addition is successful, return 1:
127.0.0.1:6379> pfadd 2016_03_06:unique:ids "uuid-1" "uuid-2" "uuid-3" "uuid-4"
(integer) 1
2. Count the number of independent users
pfcount key [key … ]
pfcount Used to calculate one or more HyperLogLog Independent total of , for example
2016_03_06:unique:ids The total number of independents is 4:
127.0.0.1:6379> pfcount 2016_03_06:unique:ids
(integer) 4
If at this time to 2016_03_06:unique:ids Insert uuid-1、uuid-2、uuid-3、uuid-
90, The result is 5( newly added uuid-90):
127.0.0.1:6379> pfadd 2016_03_06:unique:ids "uuid-1" "uuid-2" "uuid-3" "uuid-90"
(integer) 1
127.0.0.1:6379> pfcount 2016_03_06:unique:ids
(integer) 5
At present, the effect of memory saving in this example is not very obvious , Now use the script to
HyperLogLog Insert 100 m id, Record before inserting info memory:
127.0.0.1:6379> info memory
# Memory
used_memory:835144
used_memory_human:815.57K
... towards 2016_05_01:unique:ids Insert 100 Million users , Insert... Every time 1000 strip :
elements=""
key="2016_05_01:unique:ids"
for i in `seq 1 1000000`
do
elements="${elements} uuid-"${i}
if [[ $((i%1000)) == 0 ]];
then
redis-cli pfadd ${key} ${elements}
elements=""
fi
done
When the above code is executed , You can see that the memory is only increased 15K about :
127.0.0.1:6379> info memory
# Memory
used_memory:850616
used_memory_human:830.68K
however , And you can see that pfcount The result is not 100 ten thousand :
127.0.0.1:6379> pfcount 2016_05_01:unique:ids
(integer) 1009838
It can be done to 100 m uuid Use the collection type to test , The code is as follows :
elements=""
key="2016_05_01:unique:ids:set"
for i in `seq 1 1000000`
do
elements="${elements} "${i}
if [[ $((i%1000)) == 0 ]];
then
redis-cli sadd ${key} ${elements}
elements=""
fi
done
You can see the memory usage 84MB:
127.0.0.1:6379> info memory
# Memory
used_memory:88702680
used_memory_human:84.59M
But the number of independent users is 100 ten thousand :
127.0.0.1:6379> scard 2016_05_01:unique:ids:set
(integer) 1000000
surface 3-6 Lists the use of set types and HperLogLog Count the occupied space of millions of users
Than .
You can see ,HyperLogLog Memory footprint is surprisingly small , But with such a small space
Such a huge amount of data , It must not be 100% Right , There must be an error rate .Redis Officer,
The number given by Fang is 0.81% The error rate of .
3. Merge
pfmerge destkey sourcekey [sourcekey ...]
pfmerge You can find multiple HyperLogLog And assign to destkey, For example, to calculate
2016 year 3 month 5 Day and 3 month 6 Number of independent users per day , This can be done as follows , Sure
See that the number of end independent users is 7:
127.0.0.1:6379> pfadd 2016_03_06:unique:ids "uuid-1" "uuid-2" "uuid-3" "uuid-4"
(integer) 1
127.0.0.1:6379> pfadd 2016_03_05:unique:ids "uuid-4" "uuid-5" "uuid-6" "uuid-7"
(integer) 1
127.0.0.1:6379> pfmerge 2016_03_05_06:unique:ids 2016_03_05:unique:ids
2016_03_06:unique:ids
OK
127.0.0.1:6379> pfcount 2016_03_05_06:unique:ids
(integer) 7
HyperLogLog Memory footprint is very small , But there is an error rate , Developers are working on data
During structure selection, only the following two items need to be confirmed :
· Just to calculate the independent total , No need to get a single piece of data .
· Can tolerate a certain error rate , After all HyperLogLog It has a great advantage in the occupation of memory
potential .
边栏推荐
- MySQL Architecture - logical architecture
- Common methods in string class
- Locust performance test - environment construction and use
- leetcode 72. Edit Distance 编辑距离(中等)
- 页面关闭前,如何发送一个可靠请求
- [cooking record] - stir fried 1000 pieces of green pepper
- Attack and defense world misc advanced area ditf
- LOGO特訓營 第三節 首字母創意手法
- 剑指Offer 68 - II. 二叉树的最近公共祖先
- Unity修仙手游 | lua动态滑动功能(3种源码具体实现)
猜你喜欢
Redis入门完整教程:GEO
Redis: redis configuration file related configuration and redis persistence
集群的概述与定义,一看就会
Redis入门完整教程:集合详解
The Sandbox 和数字好莱坞达成合作,通过人力资源开发加速创作者经济的发展
Locust performance test - environment construction and use
2022-07-04: what is the output of the following go language code? A:true; B:false; C: Compilation error. package main import “fmt“ func main() { fmt.Pri
[OpenGL] note 29 anti aliasing (MSAA)
新版判断PC和手机端代码,手机端跳转手机端,PC跳转PC端最新有效代码
Redis入门完整教程:Redis Shell
随机推荐
[Lua] Int64 support
High school physics: linear motion
The overview and definition of clusters can be seen at a glance
Is Huatai Securities a nationally recognized securities firm? Is it safe to open an account?
LOGO特训营 第二节 文字与图形的搭配关系
集群的概述与定义,一看就会
新版判断PC和手机端代码,手机端跳转手机端,PC跳转PC端最新有效代码
Serial port data frame
Redis入门完整教程:Pipeline
How to manage 15million employees easily?
Why is Dameng data called the "first share" of domestic databases?
记录:关于Win10系统中Microsoft Edge上的网页如何滚动截屏?
MySQL Architecture - user rights and management
攻防世界 MISC 进阶 glance-50
Deployment of JVM sandbox repeater
Redis入门完整教程:GEO
Insert sort, select sort, bubble sort
Logo special training camp section II collocation relationship between words and graphics
Analysis of the self increasing and self decreasing of C language function parameters
Solana chain application crema was shut down due to hacker attacks