
Redis HyperLogLog: a powerful tool for counting active users

2022-07-05 01:32:00 Fulongyuan resident

Preface

Websites often need to count their daily active users. What are the ways to do it?


Use

1. Use a Redis set

After a user logs in, add the user's ID to a Redis set. The set removes duplicates automatically, like this:

127.0.0.1:6379> sadd users_2019_06_17 user1
(integer) 1
127.0.0.1:6379> sadd users_2019_06_17 user2
(integer) 1
127.0.0.1:6379> sadd users_2019_06_17 user3
(integer) 1

To get the number of users for the day, a single scard command is enough:

127.0.0.1:6379> scard users_2019_06_17
(integer) 3

As you can see, the number of users on June 17, 2019 is 3.

This is simple, but a set only suits cases with a small number of users. With 1 million users, the set stores 1 million IDs. If each ID takes 32 bytes, that is about 32 MB per day and 960 MB per month, almost 1 GB!
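
The estimate above can be reproduced with a short Python sketch. The 32-byte ID size comes from the text; the 30-day month and all names are assumptions made for illustration, and a real Redis set adds per-element overhead that this lower bound ignores.

```python
# Back-of-the-envelope estimate for keeping raw user IDs in a set.
# Assumptions: 1 million users per day, 32 bytes per ID, 30-day month.
USERS_PER_DAY = 1_000_000
BYTES_PER_ID = 32
DAYS_PER_MONTH = 30


def set_bytes(users: int, bytes_per_id: int) -> int:
    """Payload size only; ignores Redis' own per-element overhead."""
    return users * bytes_per_id


daily = set_bytes(USERS_PER_DAY, BYTES_PER_ID)
monthly = daily * DAYS_PER_MONTH

print(f"per day:   {daily / 1_000_000:.0f} MB")
print(f"per month: {monthly / 1_000_000:.0f} MB")
```

The cost grows linearly with the number of users, which is the property the next two approaches avoid.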


2. Use a Bitmap

Storing 1 million IDs takes 1 million bits, that is, 1,000,000 / 8 = 125,000 bytes (125 KB). Take each numeric ID modulo 1 million and use the remainder as the bit index:

127.0.0.1:6379> setbit login_2019_06_17 10000 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 1024 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 238 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 3434 1
(integer) 0

Here too, a single bitcount returns the number of users:

127.0.0.1:6379> bitcount login_2019_06_17
(integer) 4

Storing 1 million users now takes only 125 KB, which is only about 4 MB per month.
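
The bitmap idea can be tried without a Redis server. The sketch below is an in-memory stand-in written for illustration, not a Redis client: the Bitmap class and the offset_for helper are hypothetical names, and the last user ID is an assumed example chosen to show that taking IDs modulo the bitmap size can map two different IDs to the same bit.

```python
# In-memory stand-in for a Redis bitmap (illustrative, not a Redis client).
BITMAP_BITS = 1_000_000  # one bit per slot, as in the text


class Bitmap:
    def __init__(self, bits: int) -> None:
        self.data = bytearray((bits + 7) // 8)

    def setbit(self, offset: int, value: int) -> int:
        """Set the bit at offset and return its previous value, like SETBIT."""
        byte, shift = divmod(offset, 8)
        mask = 0x80 >> shift  # Redis numbers bits from the most significant bit
        old = 1 if self.data[byte] & mask else 0
        if value:
            self.data[byte] |= mask
        else:
            self.data[byte] &= ~mask & 0xFF
        return old

    def bitcount(self) -> int:
        """Count the set bits, like BITCOUNT over the whole key."""
        return sum(bin(b).count("1") for b in self.data)


def offset_for(user_id: int) -> int:
    """Map a numeric ID to a bit index by taking it modulo the bitmap size."""
    return user_id % BITMAP_BITS


login = Bitmap(BITMAP_BITS)
# 1_001_024 is an assumed ID that lands on the same bit as 1024.
for uid in (10000, 1024, 238, 3434, 1_001_024):
    login.setbit(offset_for(uid), 1)

print(len(login.data))   # bytes used by the bitmap
print(login.bitcount())  # number of distinct bits set
```

Five IDs go in but only four bits are set, so the count is exact only while the IDs stay within the bitmap's range.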

Is there a way to use even less storage?


3. Use Redis HyperLogLog

Redis HyperLogLog is an algorithm for cardinality estimation. Its advantage is that even when the number or volume of input elements is very large, the space needed to compute the cardinality stays fixed and very small.

In Redis, each HyperLogLog key costs only 12 KB of memory and can estimate the cardinality of close to 2^64 distinct elements. This is in sharp contrast to a set, which consumes more memory the more elements it holds.

However, because HyperLogLog only uses the input elements to compute the cardinality and does not store the elements themselves, it cannot return the individual input elements the way a set can.

The principle is fairly complicated, so I will skip it and only show the usage:

127.0.0.1:6379> pfadd login.2019_06_17 user1
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user2
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user3
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user4
(integer) 1
127.0.0.1:6379> pfcount login.2019_06_17
(integer) 4

Storing 1 million unique users now takes only about 15 KB, which is only about 480 KB per month!

Note that the HyperLogLog result is not an exact value. The standard error is about 0.81%, which is accurate enough for counting users.
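
Both the 12 KB figure and the 0.81% error can be traced to the same parameters. The sketch below assumes, as background not stated in the article, that Redis' dense HyperLogLog encoding uses 2^14 registers of 6 bits each, and that the estimator's standard error with m registers is 1.04 / sqrt(m). The variable names are illustrative.

```python
import math

# Assumed parameters of Redis' dense HyperLogLog encoding (background
# knowledge, not taken from the article): 2**14 registers, 6 bits each.
REGISTERS = 2 ** 14
BITS_PER_REGISTER = 6

# Space taken by the registers alone (a small fixed header is not counted).
register_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Standard error of the HyperLogLog estimator with m registers: 1.04 / sqrt(m).
standard_error = 1.04 / math.sqrt(REGISTERS)

print(register_bytes)
print(f"{standard_error:.2%}")

# What that error means for 1 million daily users (one standard deviation).
users = 1_000_000
print(round(users * standard_error))
```

In other words, a reported count of 1 million is typically off by several thousand users, which rarely matters for a daily-active-user dashboard.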


Summary

Three approaches:

1) Use a Redis set

2) Use a Bitmap

3) Use HyperLogLog (recommended)



Original text: https://mp.weixin.qq.com/s/t0g54IqFBx3Zoxq2z37s8A
Author: If fish 1919



