How do you find the K largest items in a massive dataset?
2022-06-30 20:00:00 【兔云程序】
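This is a deceptively deep algorithm question: people who know it can design remarkably clever solutions, while people who don't may not even know where to start. The top-K problem on massive data shows up all over the products of the big internet companies. WeChat's step counter, for example, picks out the top K users and then ranks them.
A similar question: given 100 million floating-point numbers, how do you find the largest 10,000? The skills being tested are memory handling and the deduplication and optimization of data, that is, processing a dataset that would otherwise need a large amount of memory. Below are several approaches, from the most naive to the most sensible, that you can talk through in an interview.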
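If memory allows: sort everything
This is probably the most direct, brute-force approach. But our premise is massive data, so although it counts as an option, it is not a reliable one. After sorting everything in descending order, we simply take the first K items. This needs a lot of memory and is inefficient, because it fully orders all n items when only K of them are needed, so it is not recommended. That said, if your mind goes blank in an oral interview, you can offer this one first.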
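As a baseline for comparison, the full-sort approach can be sketched in a few lines of Python. The function name `top_k_by_full_sort` is made up for illustration, and the sketch assumes the whole input fits in memory.

```python
def top_k_by_full_sort(values, k):
    """Return the k largest values by sorting everything first."""
    # sorted() builds a new list holding every input value, so memory use
    # is O(n) and the time cost is O(n log n), however small k is.
    return sorted(values, reverse=True)[:k]


print(top_k_by_full_sort([3.5, 9.1, 0.2, 7.7, 4.4], 2))  # prints [9.1, 7.7]
```

The result is correct, but all of the work spent ordering the items outside the top K is thrown away.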
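If memory allows: divide and conquer
Quicksort and merge sort are both built on the divide-and-conquer idea. "Divide" means splitting the data into N parts; "conquer" means finding the largest K numbers in each part. The per-part results are then merged, and the largest K of those candidates are the answer.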
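One possible reading of this idea, sketched in Python: split the input into fixed-size chunks, keep the K largest of each chunk, then take the K largest of the combined candidates. The names `iter_chunks` and `top_k_divide_and_conquer` and the `chunk_size` parameter are assumptions made for this example, not something the original text specifies.

```python
from itertools import islice


def iter_chunks(iterable, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def top_k_divide_and_conquer(values, k, chunk_size=1000):
    """Return the k largest values, largest first."""
    candidates = []
    for chunk in iter_chunks(values, chunk_size):
        # Conquer: only the k largest values of a chunk can be in the
        # global top k, so the rest of the chunk is discarded here.
        candidates.extend(sorted(chunk, reverse=True)[:k])
    # Merge: the answer is the k largest of the per-chunk winners.
    return sorted(candidates, reverse=True)[:k]


print(top_k_divide_and_conquer([5, 1, 9, 3, 7, 8, 2], 3, chunk_size=3))
# prints [9, 8, 7]
```

Only one chunk plus the candidate list has to be in memory at a time, and the per-chunk step could also run on separate machines. The candidate list can still grow to K times the number of chunks.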
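Min-heap method (also called local elimination)
This is a local elimination method. First read the first K numbers and build a min-heap from them. Then compare each remaining number with the top of the heap: if it is less than or equal to the top, move on to the next number; otherwise, remove the top element, insert the new number, and re-adjust the heap. After all the data has been traversed, the heap holds the largest K numbers.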
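The time complexity is O(n log m), where m is K (for example, 10000): building the initial heap costs O(m), and each of the remaining numbers costs at most O(log m) to process. Only the m numbers in the heap need to be held in memory.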
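The min-heap procedure can be sketched with Python's standard `heapq` module. The function name `top_k_min_heap` is made up for this example. The input can be any iterable, so the data can be streamed from a file rather than loaded all at once.

```python
import heapq
from itertools import islice


def top_k_min_heap(values, k):
    """Return the k largest values, largest first, using a size-k min-heap."""
    if k <= 0:
        return []
    it = iter(values)
    heap = list(islice(it, k))  # read the first k numbers
    heapq.heapify(heap)         # build the min-heap in O(k)
    for x in it:
        # heap[0] is the smallest of the current k candidates.
        if x > heap[0]:
            # Remove the top and insert x in one O(log k) step.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)


print(top_k_min_heap(iter([5, 1, 9, 3, 7, 8]), 3))  # prints [9, 8, 7]
```

For everyday use, the standard library's `heapq.nlargest(k, values)` returns the same result without any hand-written loop.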