Preface
Websites often need to count their daily active users. What are the ways to do it?
Usage
1. Use a Redis set
After a user logs in, add the user's id to a Redis set. A set removes duplicates automatically, like this:
127.0.0.1:6379> sadd users_2019_06_17 user1
(integer) 1
127.0.0.1:6379> sadd users_2019_06_17 user2
(integer) 1
127.0.0.1:6379> sadd users_2019_06_17 user3
(integer) 1
To get the count, a single scard command is enough:
127.0.0.1:6379> scard users_2019_06_17
(integer) 3
As you can see, there were 3 users on June 17, 2019.
This is simple, but a set only suits sites with a small number of users. With 1,000,000 users the set stores 1,000,000 ids; at 32 bytes per id that is about 32 MB a day and 960 MB a month, almost 1 GB!
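The estimate above can be reproduced with a few lines of arithmetic. This is only a rough sketch: the 32-byte id size is the article's assumption, a 30-day month is assumed, and the extra per-element overhead that Redis adds is ignored.

```python
# Back-of-the-envelope estimate of raw id storage in a Redis set.
# Assumptions: 32 bytes per id, 30 days per month, no Redis overhead counted.
USERS_PER_DAY = 1_000_000
BYTES_PER_ID = 32
DAYS_PER_MONTH = 30

daily_bytes = USERS_PER_DAY * BYTES_PER_ID
monthly_bytes = daily_bytes * DAYS_PER_MONTH

print("per day (MB):  ", daily_bytes // 1_000_000)
print("per month (MB):", monthly_bytes // 1_000_000)
```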
2. Use a bitmap
Storing 1,000,000 ids takes 1,000,000 bits, that is 1,000,000 / 8 = 125,000 bytes (125 KB). Take the id modulo 1,000,000 and use the remainder as the bit index (this assumes numeric ids that do not collide modulo 1,000,000):
127.0.0.1:6379> setbit login_2019_06_17 10000 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 1024 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 238 1
(integer) 0
127.0.0.1:6379> setbit login_2019_06_17 3434 1
(integer) 0
As before, a single bitcount returns the number of users:
127.0.0.1:6379> bitcount login_2019_06_17
(integer) 4
Storing 1,000,000 users this way takes only 125 KB a day, or about 4 MB a month.
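To make the bit indexing concrete, here is a minimal in-memory sketch of the same idea. It does not talk to Redis; the function names are made up for illustration, and a plain bytearray stands in for the bitmap key.

```python
# In-memory stand-in for a Redis bitmap; no Redis server is involved.
# All names here are hypothetical and exist only for this illustration.
BITMAP_BITS = 1_000_000  # one bit per possible remainder


def new_bitmap() -> bytearray:
    """Allocate 1,000,000 bits, i.e. 125,000 bytes."""
    return bytearray(BITMAP_BITS // 8)


def mark_login(bitmap: bytearray, user_id: int) -> int:
    """Set the user's bit and return its previous value, like SETBIT."""
    index = user_id % BITMAP_BITS      # the remainder is the bit index
    byte, offset = divmod(index, 8)
    mask = 0x80 >> offset              # bit 0 is the most significant bit
    previous = 1 if bitmap[byte] & mask else 0
    bitmap[byte] |= mask
    return previous


def count_logins(bitmap: bytearray) -> int:
    """Count the set bits, like BITCOUNT."""
    return sum(bin(b).count("1") for b in bitmap)


bitmap = new_bitmap()
for uid in (10000, 1024, 238, 3434, 1024):  # user 1024 logs in twice
    mark_login(bitmap, uid)

print(len(bitmap), count_logins(bitmap))
```

Note that two ids with the same remainder, such as 238 and 1,000,238, share one bit, so the modulo trick undercounts once ids go beyond the bitmap size.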
Is there a way to use even less storage space?
3. Use Redis HyperLogLog
Redis HyperLogLog is a data structure for cardinality estimation. Its advantage is that even when the number or volume of input elements is very large, the space needed to compute the cardinality stays fixed and very small.
In Redis, each HyperLogLog key takes only 12 KB of memory and can estimate the cardinality of close to 2^64 distinct elements. This is in sharp contrast to a set, which uses more memory the more elements it holds.
However, because HyperLogLog only uses the input elements to compute the cardinality and does not store the elements themselves, it cannot return the individual elements the way a set can.
The principle is fairly involved, so I will skip it and only show the usage:
127.0.0.1:6379> pfadd login.2019_06_17 user1
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user2
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user3
(integer) 1
127.0.0.1:6379> pfadd login.2019_06_17 user4
(integer) 1
127.0.0.1:6379> pfcount login.2019_06_17
(integer) 4
Storing 1,000,000 distinct users this way takes only about 15 KB a day, or about 480 KB a month!
Note that the HyperLogLog result is not exact. The standard error is about 0.81%, which is good enough for counting users.
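The 12 KB and 0.81% figures are related. As a rough sketch, assuming the dense Redis encoding with 16384 registers of 6 bits each and the usual HyperLogLog error estimate of 1.04 / sqrt(m) for m registers, both numbers fall out of simple arithmetic:

```python
import math

# Assumed parameters of Redis's dense HyperLogLog encoding.
REGISTERS = 16384        # 2**14 registers
BITS_PER_REGISTER = 6

# Register storage in bytes (the small key header is not counted).
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Usual HyperLogLog standard-error estimate for m registers.
standard_error = 1.04 / math.sqrt(REGISTERS)

print("register bytes:", dense_bytes)
print("standard error:", standard_error)
```

The register storage comes to 12288 bytes, i.e. 12 KB, and the error estimate is a little over 0.008, which is where the 0.81% comes from.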
Summary
Three approaches:
1) Use a Redis set
2) Use a bitmap
3) Use HyperLogLog (recommended)
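For application code, the recommended approach might be wrapped roughly as follows. This is a sketch, not the author's code: the helper names are hypothetical, and `client` is assumed to be any object that exposes `pfadd` and `pfcount` the way a redis-py `redis.Redis` instance does.

```python
from datetime import date


def daily_key(day: date) -> str:
    """Build a per-day key in the style used above, e.g. login.2019_06_17."""
    return f"login.{day:%Y_%m_%d}"


def record_login(client, user_id: str, day: date) -> None:
    """Add the user to that day's HyperLogLog (PFADD)."""
    client.pfadd(daily_key(day), user_id)


def daily_active_users(client, day: date) -> int:
    """Return the approximate number of distinct users for the day (PFCOUNT)."""
    return client.pfcount(daily_key(day))
```

With redis-py installed, a client created with `redis.Redis(host="127.0.0.1", port=6379)` could be passed in as `client`. Calling `record_login` on every successful login keeps one small key per day.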
Original article: https://mp.weixin.qq.com/s/t0g54IqFBx3Zoxq2z37s8A
Author: If fish 1919