当前位置:网站首页>A complete tutorial for getting started with redis: hyperloglog
A complete tutorial for getting started with redis: hyperloglog
2022-07-04 22:53:00 【Gu Ge academic】
HyperLogLog It's not a new data structure ( The actual type is string type ), and
Is a cardinality algorithm , adopt HyperLogLog You can use very little memory space to complete the independent total number
The statistics of , Data sets can be IP、Email、ID etc. .HyperLogLog Provides 3 An order :
pfadd、pfcount、pfmerge. for example 2016-03-06 The access user is uuid-1、uuid-2、
uuid-3、uuid-4,2016-03-05 The access user is uuid-4、uuid-5、uuid-6、uuid-7, Such as
chart 3-15 Shown .
Be careful
HyperLogLog The algorithm is composed of Philippe
Flajolet(https://en.wikipedia.org/wiki/Philippe_Flajolet) stay The analysis of a
near-optimal cardinality estimation algorithm In this paper , If the reader is interested
You can read it yourself .
1. add to
pfadd key element [element … ]
pfadd Used to direct to HyperLogLog Additive elements , If the addition is successful, return 1:
127.0.0.1:6379> pfadd 2016_03_06:unique:ids "uuid-1" "uuid-2" "uuid-3" "uuid-4"
(integer) 1
2. Count the number of independent users
pfcount key [key … ]
pfcount Used to calculate one or more HyperLogLog Independent total of , for example
2016_03_06:unique:ids The total number of independents is 4:
127.0.0.1:6379> pfcount 2016_03_06:unique:ids
(integer) 4
If at this time to 2016_03_06:unique:ids Insert uuid-1、uuid-2、uuid-3、uuid-
90, The result is 5( newly added uuid-90):
127.0.0.1:6379> pfadd 2016_03_06:unique:ids "uuid-1" "uuid-2" "uuid-3" "uuid-90"
(integer) 1
127.0.0.1:6379> pfcount 2016_03_06:unique:ids
(integer) 5
At present, the effect of memory saving in this example is not very obvious , Now use the script to
HyperLogLog Insert 100 m id, Record before inserting info memory:
127.0.0.1:6379> info memory
# Memory
used_memory:835144
used_memory_human:815.57K
... towards 2016_05_01:unique:ids Insert 100 Million users , Insert... Every time 1000 strip :
elements=""
key="2016_05_01:unique:ids"
for i in `seq 1 1000000`
do
elements="${elements} uuid-"${i}
if [[ $((i%1000)) == 0 ]];
then
redis-cli pfadd ${key} ${elements}
elements=""
fi
done
When the above code is executed , You can see that the memory is only increased 15K about :
127.0.0.1:6379> info memory
# Memory
used_memory:850616
used_memory_human:830.68K
however , And you can see that pfcount The result is not 100 ten thousand :
127.0.0.1:6379> pfcount 2016_05_01:unique:ids
(integer) 1009838
It can be done to 100 m uuid Use the collection type to test , The code is as follows :
elements=""
key="2016_05_01:unique:ids:set"
for i in `seq 1 1000000`
do
elements="${elements} "${i}
if [[ $((i%1000)) == 0 ]];
then
redis-cli sadd ${key} ${elements}
elements=""
fi
done
You can see the memory usage 84MB:
127.0.0.1:6379> info memory
# Memory
used_memory:88702680
used_memory_human:84.59M
But the number of independent users is 100 ten thousand :
127.0.0.1:6379> scard 2016_05_01:unique:ids:set
(integer) 1000000
surface 3-6 Lists the use of set types and HperLogLog Count the occupied space of millions of users
Than .
You can see ,HyperLogLog Memory footprint is surprisingly small , But with such a small space
Such a huge amount of data , It must not be 100% Right , There must be an error rate .Redis Officer,
The number given by Fang is 0.81% The error rate of .
3. Merge
pfmerge destkey sourcekey [sourcekey ...]
pfmerge You can find multiple HyperLogLog And assign to destkey, For example, to calculate
2016 year 3 month 5 Day and 3 month 6 Number of independent users per day , This can be done as follows , Sure
See that the number of end independent users is 7:
127.0.0.1:6379> pfadd 2016_03_06:unique:ids "uuid-1" "uuid-2" "uuid-3" "uuid-4"
(integer) 1
127.0.0.1:6379> pfadd 2016_03_05:unique:ids "uuid-4" "uuid-5" "uuid-6" "uuid-7"
(integer) 1
127.0.0.1:6379> pfmerge 2016_03_05_06:unique:ids 2016_03_05:unique:ids
2016_03_06:unique:ids
OK
127.0.0.1:6379> pfcount 2016_03_05_06:unique:ids
(integer) 7
HyperLogLog Memory footprint is very small , But there is an error rate , Developers are working on data
During structure selection, only the following two items need to be confirmed :
· Just to calculate the independent total , No need to get a single piece of data .
· Can tolerate a certain error rate , After all HyperLogLog It has a great advantage in the occupation of memory
potential .
边栏推荐
- Redis入门完整教程:哈希说明
- Advanced area a of attack and defense world misc Masters_ good_ idea
- 繁华落尽、物是人非:个人站长该何去何从
- 【lua】int64的支持
- Sword finger offer 68 - I. nearest common ancestor of binary search tree
- Google Earth Engine(GEE)——Tasks升级,实现RUN ALL可以一键下载任务类型中的所有影像
- Redis入门完整教程:Pipeline
- Redis入门完整教程:客户端通信协议
- SQL中MAX与GREATEST的区别
- Redis入门完整教程:GEO
猜你喜欢
Google Earth engine (GEE) - globfire daily fire data set based on mcd64a1
攻防世界 MISC 進階區 Erik-Baleog-and-Olaf
Lost in the lock world of MySQL
The sandbox has reached a cooperation with digital Hollywood to accelerate the economic development of creators through human resource development
Locust performance test - environment construction and use
Redis入门完整教程:客户端通信协议
Logo special training camp Section IV importance of font design
Three stage operations in the attack and defense drill of the blue team
NFT insider 64: e-commerce giant eBay submitted an NFT related trademark application, and KPMG will invest $30million in Web3 and metauniverse
Hit the core in the advanced area of misc in the attack and defense world
随机推荐
Redis入门完整教程:发布订阅
Redis introduction complete tutorial: slow query analysis
Advanced area of attack and defense world misc 3-11
Analog rocker controlled steering gear
Feature scaling normalization
Serial port data frame
Redis入门完整教程:Redis使用场景
Locust performance test - environment construction and use
Install the gold warehouse database of NPC
Wake up day, how do I step by step towards the road of software testing
Unity修仙手游 | lua动态滑动功能(3种源码具体实现)
Redis入門完整教程:Pipeline
关于栈区、堆区、全局区、文字常量区、程序代码区
Microservices -- Opening
Common methods in string class
Sobel filter
记录:关于Win10系统中Microsoft Edge上的网页如何滚动截屏?
[Lua] Int64 support
Why is Dameng data called the "first share" of domestic databases?
Logo special training camp section II collocation relationship between words and graphics