Redis HyperLogLog: a powerful tool for counting active users

2022-07-05 01:32:00 【Fulongyuan resident】

Preface

Websites often need to count their daily active users. What are the ways to do it?

Approaches

1. Use a Redis set

After a user logs in, add the user ID to a Redis set. The set removes duplicates automatically, like this:

127.0.0.1:6379> sadd users_2019_06_17 user1
(integer) 1
127.0.0.1:6379> sadd users_2019_06_17 user2
(integer) 1
127.0.0.1:6379> sadd users_2019_06_17 user3
(integer) 1

Getting the count then takes a single SCARD command:

127.0.0.1:6379> scard users_2019_06_17
(integer) 3

As you can see, there were 3 users on June 17, 2019.

This is simple, but a set only suits a small number of users. With 1 million users, the set stores 1 million IDs. If each ID takes 32 bytes, that is about 32 MB per day and 960 MB per month, almost 1 GB!
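
The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 32-byte ID size and the 30-day month are the assumptions used in the text, the variable names are made up for illustration, and a real Redis set adds per-element overhead on top of the raw ID bytes.

```python
# Back-of-the-envelope storage estimate for the set approach.
# Assumptions (from the text): 1 million users per day, 32 bytes per ID,
# one set per day, 30 days per month. Real sets carry extra overhead.
USERS_PER_DAY = 1_000_000
BYTES_PER_ID = 32
DAYS_PER_MONTH = 30

daily_bytes = USERS_PER_DAY * BYTES_PER_ID
monthly_bytes = daily_bytes * DAYS_PER_MONTH

print(f"per day:   {daily_bytes / 1_000_000:.0f} MB")
print(f"per month: {monthly_bytes / 1_000_000:.0f} MB")
```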

2. Use a bitmap

Storing 1 million IDs needs 1 million bits, that is 1,000,000 / 8 = 125,000 bytes (about 125 KB). Take the ID modulo 1 million and use the remainder as the bit index:

127.0.0.1:6379> setbit login_2019_06_17 10000 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 1024 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 238 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 3434 1
(integer) 0

Here too, a single BITCOUNT returns the number of users:

127.0.0.1:6379> bitcount login_2019_06_17
(integer) 4

Storing 1 million users now takes only about 125 KB per day, or roughly 4 MB per month.

Is there a way to use even less storage?
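
To see where the 125 KB comes from without a Redis server, the bitmap idea can be mimicked with a plain Python bytearray. This is a sketch, not how Redis stores bitmaps internally; BITMAP_BITS and the helper names are hypothetical, and it assumes numeric user IDs. It also shows one caveat of the modulo trick: IDs that share a remainder land on the same bit and are counted once.

```python
# In-memory stand-in for the SETBIT / BITCOUNT approach (no Redis needed).
# BITMAP_BITS and the function names are hypothetical, chosen for this sketch.
BITMAP_BITS = 1_000_000

def new_bitmap(bits: int = BITMAP_BITS) -> bytearray:
    """Allocate enough bytes to hold `bits` bits, rounded up."""
    return bytearray((bits + 7) // 8)

def mark_active(bitmap: bytearray, user_id: int, bits: int = BITMAP_BITS) -> None:
    """Set the bit at index user_id % bits, like SETBIT key index 1."""
    index = user_id % bits
    # Redis numbers bits from the most significant bit of each byte.
    bitmap[index // 8] |= 1 << (7 - index % 8)

def count_active(bitmap: bytearray) -> int:
    """Count the set bits, like BITCOUNT key."""
    return sum(bin(byte).count("1") for byte in bitmap)

bitmap = new_bitmap()
for user_id in (10000, 1024, 238, 3434, 1024):  # user 1024 logs in twice
    mark_active(bitmap, user_id)

print(len(bitmap))           # bytes used by the bitmap
print(count_active(bitmap))  # distinct active users

# Caveat: this ID has the same remainder as 238, so the count does not grow.
mark_active(bitmap, 238 + BITMAP_BITS)
print(count_active(bitmap))
```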

3. Use Redis HyperLogLog

Redis HyperLogLog is an algorithm for cardinality estimation. Its advantage is that even when the number or size of the input elements is very large, the space needed to compute the cardinality is fixed and very small.
In Redis, each HyperLogLog key costs only 12 KB of memory and can estimate the cardinality of close to 2^64 distinct elements. This contrasts sharply with a set, which consumes more memory the more elements it holds.
However, because HyperLogLog only uses the input elements to compute the cardinality and does not store the elements themselves, it cannot return the individual elements the way a set can.

The principle is quite complicated, so I will skip it and just show the usage:
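
To make the fixed-size property a little more concrete, here is a simplified sketch of the HyperLogLog idea in pure Python. It is not Redis's implementation: the class name, the SHA-1 hash, and the one-byte-per-register layout are choices made for this illustration, and it omits Redis's sparse encoding and bias corrections. What it does share with Redis is the register count: Redis uses 16384 registers of 6 bits each, which is where the 12 KB figure comes from (16384 * 6 / 8 = 12288 bytes).

```python
import hashlib
import math

P = 14       # number of hash bits used as the register index
M = 1 << P   # 16384 registers, the same count Redis uses

class ToyHyperLogLog:
    """Simplified HyperLogLog; illustrative only, not Redis's code."""

    def __init__(self) -> None:
        # One byte per register for simplicity (Redis packs 6 bits each).
        self.registers = bytearray(M)

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-1 digest as the hash (an arbitrary choice).
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - P)              # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)   # remaining 50 bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - P) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:
            # Small cardinalities: fall back to linear counting.
            return round(M * math.log(M / zeros))
        return round(raw)

hll = ToyHyperLogLog()
for i in range(100_000):
    hll.add(f"user{i}")
    hll.add(f"user{i}")  # duplicates never raise a register further

estimate = hll.count()
print(f"estimate: {estimate} (true count: 100000)")
print(f"registers: {len(hll.registers)}")  # unchanged by the number of users
```

Only the per-register maximum rank is kept, so the structure stays the same size no matter how many users are added, and the original user IDs cannot be recovered from it.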

127.0.0.1:6379> pfadd login.2019_06_17 user1
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user2
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user3
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user4
(integer) 1
127.0.0.1:6379> pfcount login.2019_06_17
(integer) 4

Storing 1 million distinct users now takes only about 15 KB per day, or roughly 450 KB per month!

Note that the HyperLogLog count is not exact. The standard error is about 0.81%, which is good enough for counting users.
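
The 0.81% figure follows from the usual HyperLogLog standard-error formula, 1.04 / sqrt(m), with the m = 16384 registers that Redis uses. The short sketch below just evaluates that formula and shows what it means for a day with 1 million users; the variable names are illustrative.

```python
import math

REGISTERS = 16384        # m, the number of registers in a Redis HyperLogLog
TRUE_USERS = 1_000_000   # example daily active users, as in the text

# Standard error of the HyperLogLog estimate: 1.04 / sqrt(m).
standard_error = 1.04 / math.sqrt(REGISTERS)

# Typical size of the deviation, in users, for one standard error.
typical_deviation = standard_error * TRUE_USERS

print(f"standard error: {standard_error * 100:.4f}%")
print(f"typical deviation for {TRUE_USERS} users: about {typical_deviation:.0f}")
```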