Top-K interview questions
2022-07-03 16:23:00 【Zuo Haifeng blog】
The task: find the K largest or smallest items in a large amount of data.
First example: you have 4 GB of memory and 10 million IPv4 addresses, and need to find the 100 addresses that appear most often. Take the address 192.168.111.222. As a string it is 15 chars, and at two bytes per char each address needs 30 bytes of storage. Ten million addresses therefore need 300 million bytes, which is about 0.3 GB, so there is enough memory. If each IPv4 address is stored as an int (4 bytes), it takes even less memory.
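The estimate above can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the two-bytes-per-char figure is the text's own assumption, not a property of every language.

```python
import ipaddress

CHARS_PER_ADDRESS = 15   # length of "192.168.111.222", the longest dotted form
BYTES_PER_CHAR = 2       # assumption taken from the text (a 2-byte char)
BYTES_PER_INT = 4        # an IPv4 address is exactly 32 bits


def string_storage_bytes(num_addresses):
    """Bytes needed if every address is kept as a 15-char string."""
    return num_addresses * CHARS_PER_ADDRESS * BYTES_PER_CHAR


def int_storage_bytes(num_addresses):
    """Bytes needed if every address is kept as a 32-bit integer."""
    return num_addresses * BYTES_PER_INT


def ip_to_int(address):
    """Convert a dotted IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))


if __name__ == "__main__":
    n = 10_000_000
    print(string_storage_bytes(n) / 10**9, "GB as strings")
    print(int_storage_bytes(n) / 10**9, "GB as ints")
    print(ip_to_int("192.168.111.222"))
```

These numbers cover only the raw address data. A real hash map adds per-entry overhead on top of them, but the total still fits comfortably in 4 GB.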
1. When there is enough memory:
Count first, then sort.
Build a k,v structure that records how many times each IP appears. This can be a hashmap or an array mapping. After counting, sort to find the largest or smallest entries. (The sorting here is different from heap sort.) For example, to find the top 100, assume the first 100 entries in the hashmap or array are the largest and record only the smallest of them. Then compare each new entry with that smallest one. If the new entry is larger, keep it and throw the recorded one away; if it is smaller, discard it.
A heap reduces the cost of this. To find the 100 largest, use a min-heap; to find the 100 smallest, use a max-heap. For the 100 largest, create a min-heap of 100 nodes. Maintaining this heap is different from heap sort. Heap sort builds all nodes into one heap, then repeatedly swaps the top of the heap with the last node and removes that last node. Here we only maintain a heap of 100 nodes, and its size never changes.
Then walk through the hashmap or array of counts. Take the first 100 entries as the provisional most frequent ones and build a min-heap from them. Starting from the 101st entry, compare each entry with the top of the heap. If it is larger than the top, replace the top with it and restore the heap property, then move on to the 102nd entry, and so on. When the traversal ends, the heap holds the 100 most frequent addresses. The complexity is O(n log m), where n is the 10 million entries and m is 100.
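The count-then-heap procedure can be sketched in Python as follows. The function name `top_k_frequent` is hypothetical; `Counter` plays the role of the hashmap and the standard `heapq` module provides the min-heap.

```python
import heapq
from collections import Counter


def top_k_frequent(items, k):
    """Return the k most frequent items as (item, count) pairs, most frequent first."""
    if k <= 0:
        return []
    counts = Counter(items)      # step 1: the k,v structure (item -> occurrences)
    heap = []                    # step 2: min-heap of (count, item), at most k entries
    for item, count in counts.items():
        if len(heap) < k:
            # The first k entries are provisionally treated as the top k.
            heapq.heappush(heap, (count, item))
        elif count > heap[0][0]:
            # Larger than the smallest of the current top k: replace the heap top.
            heapq.heapreplace(heap, (count, item))
        # Otherwise the entry is too small and is discarded.
    return [(item, count) for count, item in sorted(heap, reverse=True)]


if __name__ == "__main__":
    sample = ["1.1.1.1"] * 3 + ["2.2.2.2"] * 2 + ["3.3.3.3"]
    print(top_k_frequent(sample, 2))
```

The heap never grows beyond k entries, so each comparison costs at most O(log k). In everyday Python code, `Counter.most_common(k)` or `heapq.nlargest` gives the same result with less code; the explicit loop is shown only because it mirrors the steps in the text.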
2. When memory is insufficient:
You are given 100 GB of text with two columns: one is an IP address, the other is a URL. Find the 100 URLs visited by the largest number of distinct IPs. You have a machine with 4 GB of memory and unlimited disk space.
1.2.3.4 baidu.com
1.2.3.4 taobao.com
1.2.3.3 taobao.com
1.2.2.2 baidu.com
1.2.2.2 baidu.com
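Here baidu.com counts as 2 (visited by 1.2.3.4 and 1.2.2.2; the repeated visit from 1.2.2.2 counts once), and taobao.com also counts as 2 (visited by 1.2.3.4 and 1.2.3.3).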
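To make the counting rule concrete, here is a small in-memory sketch that reproduces the result for the five sample lines. The names `SAMPLE` and `distinct_ip_counts` are illustrative only, and this version assumes the data fits in memory, which the real 100 GB input does not.

```python
from collections import defaultdict

SAMPLE = """\
1.2.3.4 baidu.com
1.2.3.4 taobao.com
1.2.3.3 taobao.com
1.2.2.2 baidu.com
1.2.2.2 baidu.com
"""


def distinct_ip_counts(lines):
    """Map each URL to the number of distinct IPs that visited it."""
    ips_by_url = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue                 # skip blank or malformed lines
        ip, url = parts
        ips_by_url[url].add(ip)      # a set ignores repeated visits from one IP
    return {url: len(ips) for url, ips in ips_by_url.items()}


if __name__ == "__main__":
    print(distinct_ip_counts(SAMPLE.splitlines()))
```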
1. Ask whether more machines are available. If so, the work can be split across multiple machines, with each machine handling a portion.
2. Use existing software such as MySQL and find the answer with an SQL statement.
3. Use the hard disk to partition the data. Either name a file after each domain name and write the IPs that visited that domain into it, or hash the domain name, take the remainder modulo n, and write all records for the same domain into the same file. When counting, there is no need to load all the files into memory: read one file at a time and count the number of distinct IPs for each domain in it. Once the counts are available, the heap approach described earlier yields the top 100.
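Option 3 can be sketched roughly as below. This is one possible layout, not a definitive implementation: the function names, the `bucket_<i>.txt` file naming, and the use of `zlib.crc32` as the hash are all assumptions made for the example. It also assumes each bucket file is small enough for its per-URL IP sets to fit in memory, which depends on choosing a large enough number of buckets.

```python
import heapq
import os
import zlib
from collections import defaultdict


def partition_by_url(input_path, out_dir, num_buckets):
    """Split the log so that every line for a given URL lands in the same bucket file."""
    outs = [open(os.path.join(out_dir, f"bucket_{i}.txt"), "w")
            for i in range(num_buckets)]
    try:
        with open(input_path) as src:
            for line in src:             # streams the input line by line
                parts = line.split()
                if len(parts) != 2:
                    continue
                ip, url = parts
                # crc32 is stable across runs, unlike Python's built-in hash() for str.
                bucket = zlib.crc32(url.encode()) % num_buckets
                outs[bucket].write(f"{ip} {url}\n")
    finally:
        for f in outs:
            f.close()


def top_urls_by_distinct_ips(out_dir, num_buckets, k):
    """Process one bucket at a time and keep the k best URLs in a min-heap."""
    if k <= 0:
        return []
    heap = []                            # (distinct_ip_count, url), at most k entries
    for i in range(num_buckets):
        ips_by_url = defaultdict(set)
        with open(os.path.join(out_dir, f"bucket_{i}.txt")) as f:
            for line in f:
                ip, url = line.split()
                ips_by_url[url].add(ip)
        # A URL appears in exactly one bucket, so its count here is final.
        for url, ips in ips_by_url.items():
            if len(heap) < k:
                heapq.heappush(heap, (len(ips), url))
            elif len(ips) > heap[0][0]:
                heapq.heapreplace(heap, (len(ips), url))
    return sorted(heap, reverse=True)
```

Hashing by URL rather than by IP is what makes the per-file counts final: all visits to one URL end up in one file, so no merging of partial counts is needed afterwards.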