当前位置:网站首页>Tencent three sides: how to duplicate 4billion QQ numbers?
Tencent three sides: how to duplicate 4billion QQ numbers?
2022-06-23 17:41:00 【Programmer waiter】
today , Let's talk about a common test question , It also appears in the three aspects of Tencent interview , Very interesting . The specific topics are as follows :
In file 40 One hundred million QQ number , Please design an algorithm for QQ Number de duplication , same QQ Keep only one number , Memory limit 1G.
The meaning of this topic should be very clear , More straightforward . For your understanding , Let me draw a moving picture for fun , Hope you enjoy it .
Can you do it right , It largely determines whether Tencent can win offer, Have a certain skill , Let's see .
In the original question , There is actually 40 One hundred million QQ number , For convenience , In illustration and narration , Only with 4 individual QQ For example .
Method 1 : Sort
naturally , The simplest way is for all QQ Number to sort , Repetitive QQ The number must be adjacent , Keep the first , Just get rid of the repetition .
The original QQ Number is :
After sorting QQ Number is :
It's easy to remove the weight :
But , The interviewer wants to ask you , Do you have to sort to redo ? obviously , The time complexity of sorting is too high , Unable to pass Tencent interview .
Method 2 :hashmap
Since the time complexity of direct sorting is too high , Then use hashmap Well , The specific idea is to QQ The number records hashmap in :
mapFlag[123] = truemapFlag[567] = truemapFlag[123] = truemapFlag[890] = true
because hashmap The de duplication property of , You can see that it automatically becomes :
mapFlag[123] = truemapFlag[567] = truemapFlag[890] = true
Obviously , Only 123,567,890 There is , So this is the result of de duplication .
But , The interviewer will ask you again : Actually save 40 Billion QQ number ,1G Is there enough memory to allocate so much space ? Obviously not. , Unable to pass Tencent interview .
Method 3 : Document cutting
obviously , This is a massive data problem . Job seekers who have seen a lot , Naturally think of the way to cut documents , Avoid excessive memory .
But , Rack your brains , Or use merge sort between files , Or use bucket sorting , Anyway, it can be sorted in the end .
Now that it's sorted , Then you can achieve weight removal , It seems that everything is all right . I can only say frankly , Happy a little early .
next , The interviewer will ask you again : So many file operations , Naturally, the efficiency is not high . obviously , Unable to pass Tencent interview .
Method four :bitmap
Let's see the trick ! We can hashmap To optimize , use bitmap This data structure , It can successfully solve the problems of time and space at the same time .
In many practical projects ,bitmap Frequently used . I've seen the source code of many components , Found in many places bitmap Realization ,bitmap The diagram is as follows :
This is a unsigned char type , You can see , share 8 position , The value range is [0, 255], Like this unsigned char The value of is 255, It can identify 0~7 These numbers exist .
Empathy , Here is unsigned char The value of the type is 254, Its corresponding meaning is :1~7 These numbers exist , And the number 0 non-existent :
thus it can be seen , One unsigned char Data of type , It can mark 0~7 this 8 The existence of integers . And so on :
- One unsigned int Type data can identify 0~31 this 32 The existence of integers .
- Two unsigned int Type data can identify 0~63 this 64 The existence of integers .
obviously , It can be deduced that :512MB Large enough to identify all QQ Whether the number exists or not , Please note that :QQ The theoretical maximum number is 2^32 - 1, Probably 43 Million or so .
The next question is simple : use 512MB Of unsigned int Array to record QQ Whether the number exists or not , To form a bitmap, such as :
bitmapFlag[123] = 1bitmapFlag[567] = 1bitmapFlag[123] = 1bitmapFlag[890] = 1
It's actually :
bitmapFlag[123] = 1bitmapFlag[567] = 1bitmapFlag[890] = 1
Then traverse all positive integers from small to large (4 byte ), When bitmapFlag The value is 1 when , It shows that the number exists . and , From the above process, we can see , Automatic weight removal . obviously , This way can be through Tencent's interview .
边栏推荐
- 如何通过线上股票开户?在线开户安全么?
- 创新技术领航者!华为云GaussDB获颁2022年云原生数据库领域权威奖项
- Elk log collection system deployment
- C. Product 1 Modulo N-Codeforces Round #716 (Div. 2)
- 微信小程序:酒店预计到店日期的时间选择器
- How long does it take to open a stock account by mobile phone? Is online account opening safe?
- Easyplayer mobile terminal plays webrtc protocol for a long time. Pressing the play page cannot close the "about us" page
- 《AN4190应用笔记 天线选择指南》——天线理论2
- hands-on-data-analysis 第二单元 第四节数据可视化
- Analytic analog-to-digital (a/d) converter
猜你喜欢

I successfully joined the company with 27K ByteDance. This interview notes on software testing has benefited me for life

What can the accelerated implementation of digital economy bring to SMEs?

12 initialization of beautifulsoup class

【网络通信 -- WebRTC】WebRTC 源码分析 -- 接收端带宽估计

Redis cluster operation method

Tupu software builds smart city with lightweight modeling

Performance test bottleneck tuning in 10 minutes! If you want to enter a large factory, you must know

Huawei mobile phones install APK through ADB and prompt "the signature is inconsistent. The application may have been modified."

Query the size of each table in the database

Rongyun: let the bank go to the "cloud" easily
随机推荐
MySQL的 安裝、配置、卸載
Three minutes to learn how to retrieve the MySQL password
How do you choose to buy stocks? Good security?
QT layout manager [qvboxlayout, qhboxlayout, qgridlayout]
I successfully joined the company with 27K ByteDance. This interview notes on software testing has benefited me for life
C. Add One--Divide by Zero 2021 and Codeforces Round #714 (Div. 2)
Redis cluster operation method
bypassuac提权
Add new members to the connector family! Scenario connector helps enterprises comprehensively improve the operational efficiency of business systems
How to select an oscilloscope? These 10 points must be considered!
单火线设计系列文章10:拓展应用-单火开关实现双控
ERP管理系统的重要性
Date to localdatetime
华为手机通过adb安装APK提示“签名不一致,该应用可能已被修改”
Tupu software builds smart city with lightweight modeling
B. Integers Shop-Hello 2022
根据年份获取第一天和最后一天
Comparison of asemi Schottky diode and ultrafast recovery diode in switching power supply
图扑软件以轻量化建模构建智慧城市
What can the accelerated implementation of digital economy bring to SMEs?