[Recommendation algorithms] Interview questions from small company C
2022-07-01 04:06:00 【Mountain peak evening view】
0. Project questions
0.1 User profiles
- The first part is offline processing: obtain data by crawling it, then build profiles from the crawled data.
- The user profiles in MongoDB are built from the user registration table and the user log data (e.g., read count, number of likes, number of favorites) in MySQL.
- User profiles and item profiles (the news materials) are stored in the SinaNews database in MongoDB. MongoDB is used here because its documents resemble JSON objects, which makes adding and removing fields very convenient.
- The processed materials are stored in Redis, because pulling them directly from MongoDB would be more cumbersome. The recommendation list and the popular list are computed offline and saved to Redis, and the front end displays them from there.
- PS: To keep the system easy to build, data is not updated online in real time. Instead, data is crawled at a fixed time every night.
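The nightly offline job described above can be sketched as follows. This is a minimal illustration under several assumptions. The log rows have a hypothetical shape, {"item_id": ..., "action": ...}. Plain Python dicts stand in for the MongoDB profile collection and the Redis key. The action names and popularity weights are illustrative and are not the project's actual values.

```python
from typing import Dict, List

# Hypothetical mapping from a log action to a profile field, and hypothetical weights.
ACTION_FIELD = {"read": "reads", "like": "likes", "collect": "favorites"}
WEIGHTS = {"reads": 1, "likes": 2, "favorites": 3}


def build_item_profiles(user_logs: List[dict]) -> Dict[str, Dict[str, int]]:
    """Aggregate raw log rows into per-item counters (a very simple item profile)."""
    profiles: Dict[str, Dict[str, int]] = {}
    for row in user_logs:
        profile = profiles.setdefault(row["item_id"], {f: 0 for f in WEIGHTS})
        profile[ACTION_FIELD[row["action"]]] += 1
    return profiles


def popular_list(profiles: Dict[str, Dict[str, int]], k: int) -> List[str]:
    """Rank items by a weighted engagement score; ties are broken by item id."""
    def score(item_id: str) -> int:
        return sum(WEIGHTS[f] * n for f, n in profiles[item_id].items())
    return sorted(profiles, key=lambda i: (-score(i), i))[:k]


def nightly_job(user_logs: List[dict], mongo_profiles: dict, redis_store: dict, k: int = 10) -> None:
    """One offline run: build profiles, 'store' them, and cache the popular list."""
    profiles = build_item_profiles(user_logs)
    mongo_profiles.update(profiles)                          # stand-in for a MongoDB write
    redis_store["popular_list"] = popular_list(profiles, k)  # stand-in for a Redis write


if __name__ == "__main__":
    logs = [
        {"item_id": "n1", "action": "read"},
        {"item_id": "n1", "action": "read"},
        {"item_id": "n2", "action": "like"},
        {"item_id": "n2", "action": "read"},
        {"item_id": "n3", "action": "collect"},
    ]
    mongo, cache = {}, {}
    nightly_job(logs, mongo, cache)
    print(cache["popular_list"])  # ['n2', 'n3', 'n1']
```

In a real deployment, the two dict writes would become MongoDB and Redis client calls. The front end would only read the precomputed list.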
0.2 Modules I was responsible for
Data cleaning, algorithm models, etc.
0.3 Cold start problem
The problem is analyzed with the following mind map (image not preserved); see also [1].
1. Machine learning algorithms
1.0 Where does the randomness of a random forest come from?
The randomness in a random forest comes mainly from three sources:
- First, bootstrap sampling, which gives each tree a randomly resampled training set.
- Second, each node randomly selects a subset of features, and impurity is computed only over that subset.
- Third, when random split-point selection is used, the split thresholds themselves are random. In this case the model is called Extremely Randomized Trees.
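The three sources of randomness can be illustrated with a small standard-library sketch. The function names are hypothetical. Using sqrt(n_features) features per node is a common choice for classification, and it is an assumption here rather than a fixed rule. The sketch only shows where each random draw happens; it does not build a full tree.

```python
import math
import random
from typing import List, Sequence


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Source 1: draw n_samples row indices *with replacement* for one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def out_of_bag(indices: Sequence[int], n_samples: int) -> List[int]:
    """Rows never drawn by the bootstrap; these can serve for out-of-bag evaluation."""
    drawn = set(indices)
    return [i for i in range(n_samples) if i not in drawn]


def node_feature_subset(n_features: int, rng: random.Random) -> List[int]:
    """Source 2: at each node, only a random subset of features is considered.
    Subset size sqrt(n_features) is an assumed (common) setting."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


def extra_trees_threshold(values: Sequence[float], rng: random.Random) -> float:
    """Source 3 (Extremely Randomized Trees): draw the split point uniformly between
    the feature's min and max at this node instead of searching for the best one."""
    return rng.uniform(min(values), max(values))


if __name__ == "__main__":
    rng = random.Random(0)
    idx = bootstrap_indices(10, rng)
    print("bootstrap:", idx, "out-of-bag:", out_of_bag(idx, 10))
    print("features at this node:", node_feature_subset(16, rng))
    print("random threshold:", extra_trees_threshold([2.0, 5.0, 3.5], rng))
```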
1.1 How does GBDT differ from random forest?
(I) GBDT = decision trees (CART regression trees) + gradient boosting ensemble learning. It is a boosting method, but it is not AdaBoost: AdaBoost re-weights the samples, whereas GBDT fits each new tree to the negative gradient of the loss. For squared-error loss, that negative gradient is exactly the residual. At prediction time, the outputs of all trees are added up (usually scaled by a learning rate) to obtain the final prediction.
(II) Random forest is a bagging algorithm whose base learners are decision trees (usually CART trees).
(1) For regression problems, a random forest outputs the average of the base learners' outputs.
(2) For classification problems, a random forest can use either of two strategies:
The first is the voting strategy used in the original paper: each learner outputs a class, and the class predicted most often is returned.
The second is the probability-averaging strategy used in sklearn. First, average the class-probability distributions output by the learners to get the mean probability that the sample belongs to each class. Then take the $\arg\max$ of this averaged distribution to output the most likely class: $\hat{y} = \arg\max_{c} \frac{1}{T}\sum_{t=1}^{T} p_t(c \mid x)$. Here $T$ is the number of trees and $p_t(c \mid x)$ is tree $t$'s predicted probability of class $c$.
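The aggregation rules above can be compared directly. Below is a minimal sketch with hypothetical function names that operates on the trees' outputs rather than on real trees. The example probabilities are chosen so that the two classification strategies disagree: one very confident tree can outweigh two weakly confident ones under probability averaging.

```python
from collections import Counter
from typing import Hashable, Sequence


def average_regression(predictions: Sequence[float]) -> float:
    """Regression: the forest outputs the mean of the trees' outputs."""
    return sum(predictions) / len(predictions)


def hard_vote(class_predictions: Sequence[Hashable]) -> Hashable:
    """Original-paper strategy: each tree votes for one class; the most frequent wins."""
    return Counter(class_predictions).most_common(1)[0][0]


def soft_vote(prob_rows: Sequence[Sequence[float]]) -> int:
    """sklearn-style strategy: average per-class probabilities over the trees,
    then return the index of the class with the highest mean probability."""
    n_trees, n_classes = len(prob_rows), len(prob_rows[0])
    mean_probs = [sum(row[c] for row in prob_rows) / n_trees for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean_probs[c])


if __name__ == "__main__":
    # Per-tree class probabilities for one sample (classes 0 and 1).
    tree_probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
    votes = [row.index(max(row)) for row in tree_probs]  # [0, 1, 1]
    print("hard vote:", hard_vote(votes))       # 1: two trees out of three prefer class 1
    print("soft vote:", soft_vote(tree_probs))  # 0: mean probabilities are about 0.58 vs 0.42
```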
1.2 Differences between bagging and boosting
The error of a base classifier can be decomposed into bias + variance. More precisely, for squared loss the expected error is bias² + variance + irreducible noise.
- Boosting reduces the bias of the ensemble by having each base classifier focus step by step on the samples that earlier ones got wrong. After a weak classifier is trained, its errors or residuals are computed and used as the input for training the next classifier. This process keeps decreasing the loss function and moves the model steadily toward the "bull's eye".
- Bagging reduces the variance of the ensemble with a divide-and-conquer strategy. It resamples the training data many times, trains a model on each sample, and combines the models into a joint decision. Loosely speaking, suppose $n$ independent, uncorrelated models each have variance $\sigma^2$. Averaging their predictions gives $\mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{\sigma^2}{n}$, i.e., $1/n$ of a single model's variance.
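The boosting bullet (each new learner is trained on the previous residuals) can be made concrete with a minimal gradient-boosting sketch. It handles 1-D regression with squared loss. Decision stumps serve as the weak learners, which is a simplification: GBDT implementations use deeper regression trees. With squared loss, the negative gradient in each round is simply the residual y - F(x), and that residual is what the next stump is fitted to. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Stump:
    """One-split regression tree: predicts left_value if x <= threshold, else right_value."""
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: float) -> float:
        return self.left_value if x <= self.threshold else self.right_value


def fit_stump(xs: Sequence[float], targets: Sequence[float]) -> Stump:
    """Pick the threshold that minimizes squared error; each side predicts its mean."""
    best, best_err = None, float("inf")
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best_err:
            best, best_err = Stump(t, lv, rv), err
    if best is None:  # all xs are equal: fall back to a constant prediction
        mean = sum(targets) / len(targets)
        best = Stump(float("inf"), mean, mean)
    return best


def gradient_boost(xs: Sequence[float], ys: Sequence[float],
                   n_rounds: int = 100, learning_rate: float = 0.1) -> Callable[[float], float]:
    base = sum(ys) / len(ys)  # F_0: the constant that minimizes squared loss
    preds = [base] * len(xs)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - F)^2 with respect to F is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(x) for p, x in zip(preds, xs)]

    def predict(x: float) -> float:
        # Final prediction: base value plus the shrunken sum of all trees' outputs.
        return base + learning_rate * sum(s.predict(x) for s in stumps)

    return predict


if __name__ == "__main__":
    model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
    print(round(model(2), 3), round(model(5), 3))  # 1.0 5.0
```

Each round removes 10% of the remaining residual here, because learning_rate is 0.1. That is the "step-by-step approach to the bull's eye" in code form.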
2. Recommendation algorithms
(1) Describe NeuralCF's training process and its sampling process.
(2) Why does DIN introduce an attention mechanism?
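For question (1), the NCF paper trains on implicit feedback with a pointwise log loss. Observed user-item pairs are positives, and negatives are sampled uniformly from the items the user has not interacted with, at a fixed ratio to the positives. The sketch below covers only the sampling and the loss. The model itself is omitted: look up the user/item ID embeddings, pass them through the interaction network, apply a sigmoid. Function names and the ratio num_neg are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def build_training_instances(
    interactions: List[Tuple[int, int]], n_items: int, num_neg: int, rng: random.Random
) -> List[Tuple[int, int, int]]:
    """Turn implicit feedback into (user, item, label) rows: each observed pair is a
    positive, and num_neg items the user never interacted with are sampled as negatives.
    In practice the negatives are typically re-sampled every epoch."""
    seen: Dict[int, Set[int]] = {}
    for u, i in interactions:
        seen.setdefault(u, set()).add(i)
    instances: List[Tuple[int, int, int]] = []
    for u, i in interactions:
        instances.append((u, i, 1))
        if len(seen[u]) >= n_items:  # nothing unobserved left to sample for this user
            continue
        for _ in range(num_neg):
            j = rng.randrange(n_items)
            while j in seen[u]:  # rejection sampling over unobserved items
                j = rng.randrange(n_items)
            instances.append((u, j, 0))
    return instances


def log_loss(label: int, prob: float, eps: float = 1e-12) -> float:
    """Pointwise binary cross-entropy between a 0/1 label and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


if __name__ == "__main__":
    rows = build_training_instances([(0, 1), (0, 2), (1, 0)], n_items=5, num_neg=4,
                                    rng=random.Random(1))
    print(len(rows), rows[:5])
    print(log_loss(1, 0.9), log_loss(0, 0.9))
```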
| Model | Basic principle | Characteristics | Limitations |
|---|---|---|---|
| NeuralCF | Replaces the dot product of the user vector and item vector in traditional matrix factorization with a neural network that models their interaction | A matrix factorization model with stronger expressive power | Uses only user and item ID features; no other features are added |
| Wide&Deep | Uses the Wide part to strengthen the model's memorization ability and the Deep part to strengthen its generalization ability | Pioneered the construction of hybrid (composite) models | The Wide part requires manually engineered feature combinations |
| Deep&Cross | Replaces the Wide part of Wide&Deep with a Cross network | Solves Wide&Deep's need for manual feature combinations | The Cross network has high complexity |
| DeepFM | Replaces the Wide part of Wide&Deep with FM | Strengthens the feature-crossing ability of the Wide part | Its structure differs little from classic Wide&Deep |
| DIN | Introduces an attention mechanism; attention scores are computed from the relevance between items in the user's behavior history and the target ad item | Makes more targeted recommendations depending on the target ad item | Does not fully use features other than historical behavior |
| DIEN | Uses a sequence model to simulate how the user's interests evolve | The sequence model strengthens the system's ability to express changes in user interest, so the system starts to use the valuable information in time-ordered behavior sequences | Sequence model training is complex and online serving latency is long, so engineering optimization is required |
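For question (2): without attention, a user's behavior embeddings are pooled with equal weights. The resulting user vector is then the same no matter which ad is being scored. DIN instead weights each historical item by its relevance to the target ad. The sketch below simplifies DIN in one respect: a dot product stands in for DIN's MLP "activation unit". It follows the DIN paper in another: the weights are not softmax-normalized, which preserves the intensity of interest. Names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def sum_pooling(behavior_embs: Sequence[Vector]) -> List[float]:
    """Baseline without attention: every historical item counts equally,
    so the user vector does not depend on the candidate ad."""
    dim = len(behavior_embs[0])
    return [sum(e[d] for e in behavior_embs) for d in range(dim)]


def din_attention_pooling(behavior_embs: Sequence[Vector],
                          target_emb: Vector) -> Tuple[List[float], List[float]]:
    """DIN-style pooling (simplified): weight each behavior item by its relevance to the
    target ad (dot product here, an MLP in DIN), then take the weighted sum."""
    weights = [dot(e, target_emb) for e in behavior_embs]
    dim = len(target_emb)
    pooled = [sum(w * e[d] for w, e in zip(weights, behavior_embs)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0]]  # e.g. one sports item, one cooking item
    print(sum_pooling(history))                          # same for every ad
    print(din_attention_pooling(history, [1.0, 0.0])[0])  # a sports ad emphasizes item 0
    print(din_attention_pooling(history, [0.0, 1.0])[0])  # a cooking ad emphasizes item 1
```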
DeepFM model structure: (diagram not preserved)
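In DeepFM, the FM component and the DNN component share the same feature embeddings, and the final output is sigmoid(y_FM + y_DNN). The FM part is a first-order linear term plus pairwise interactions ⟨v_i, v_j⟩·x_i·x_j. The pairwise term is usually computed with the O(kn) identity ½·Σ_f[(Σ_i v_if·x_i)² − Σ_i (v_if·x_i)²] rather than the naive O(kn²) double loop. The sketch checks that the two agree. The DNN output is passed in as a plain number, and all names are hypothetical.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def fm_pairwise_naive(x: Sequence[float], v: Matrix) -> float:
    """Sum over i < j of <v_i, v_j> * x_i * x_j, computed directly: O(k n^2)."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            total += sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
    return total


def fm_pairwise_fast(x: Sequence[float], v: Matrix) -> float:
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]: O(k n)."""
    total = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


def fm_output(x: Sequence[float], w0: float, w: Sequence[float], v: Matrix) -> float:
    """FM component: bias + first-order term + second-order (pairwise) term."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x)) + fm_pairwise_fast(x, v)


def deepfm_predict(y_fm: float, y_dnn: float) -> float:
    """DeepFM combines both components with a sigmoid on their sum."""
    return 1.0 / (1.0 + math.exp(-(y_fm + y_dnn)))


if __name__ == "__main__":
    x = [1.0, 0.0, 2.0]                        # feature values (sparse input)
    v = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]  # latent vectors, k = 2
    print(fm_pairwise_naive(x, v), fm_pairwise_fast(x, v))  # both -3.0
    print(deepfm_predict(fm_output(x, 0.1, [0.2, 0.0, -0.1], v), y_dnn=0.5))
```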
3. Python basics
3.1 Python's memory management mechanism
- Immutable objects: numbers, strings, tuples. Mutable objects: dicts, lists, byte arrays (bytearray).
- Immutable types include int, float, str, tuple, etc. Python 2 also had a separate long type; in Python 3, int has arbitrary precision and long no longer exists.
For a variable bound to an immutable object, "changing" the variable creates a new value and rebinds the variable to it. If the old value is no longer referenced, it waits to be garbage-collected.
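The rebinding behavior just described can be seen by keeping a second name (an alias) on the original object. The variable names are illustrative. An augmented assignment on a tuple builds a new object. The same operator on a list modifies the existing object in place, so the alias sees the change.

```python
# Immutable: "modifying" builds a new object and rebinds the name.
t = (1, 2)
alias_t = t
t += (3,)           # a new tuple (1, 2, 3) is created and t now points to it
# alias_t still refers to the original (1, 2)

# Mutable: the same operator changes the existing object in place.
lst = [1, 2]
alias_lst = lst
lst += [3]          # list.__iadd__ extends the same list object

print(alias_t, t, alias_t is t)          # (1, 2) (1, 2, 3) False
print(alias_lst, lst, alias_lst is lst)  # [1, 2, 3] [1, 2, 3] True
```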
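Python's garbage collection is based mainly on reference counting. Its drawbacks are that it cannot reclaim objects caught in circular references, and that it needs extra space to maintain the reference counts. CPython therefore supplements reference counting with a generational garbage collector, exposed through the gc module, which detects and frees reference cycles.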
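The circular-reference drawback can be observed on CPython with a weak reference, which does not keep an object alive. The Node class below is a hypothetical example. Two objects that point at each other survive after their last external name is gone, because each still has a reference count of at least 1. An explicit gc.collect() then finds the cycle and frees it. Automatic collection is disabled during the demo so that it cannot run at an unpredictable moment.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self) -> None:
        self.partner = None


def make_cycle() -> "weakref.ref[Node]":
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a -> b -> a: neither refcount can drop to 0
    return weakref.ref(a)        # a weak reference does not keep `a` alive


gc.disable()  # keep the automatic cycle collector out of the way during the demo
try:
    ref = make_cycle()
    alive_before_collect = ref() is not None  # reference counting alone cannot free it
    gc.collect()                              # the cycle detector finds and frees it
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print(alive_before_collect, alive_after_collect)  # True False (on CPython)
```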
In the following four situations, an object's reference count is incremented by 1:
- the object is created (a=11);
- the object is referenced (b=a);
- the object is passed to a function as an argument (func(a));
- the object is stored as an element in a container (e.g., lst1=[a,a]).

In the following four situations, an object's reference count is decremented by 1:
- an alias of the object is explicitly destroyed (del a);
- an alias of the object is bound to a new object (a=66);
- the object leaves its scope (e.g., when func finishes executing, its local variables go away; global variables do not);
- the container holding the object is destroyed, or the object is removed from the container.
```python
#!/usr/bin/env python3
import sys

# Every count below subtracts 1 for the temporary reference that
# sys.getrefcount itself holds on its argument.

def func(c):
    print('in func function', sys.getrefcount(c) - 1)

print('init', sys.getrefcount(11) - 1)
a = 11
print('after a=11----', sys.getrefcount(11) - 1)
b = a
print('after b=a----', sys.getrefcount(11) - 1)
# Inside the call the count is +2 on the CPython version used here: the caller's
# stack holds the argument, and the parameter c holds another reference.
func(11)
print('after func(11)----', sys.getrefcount(11) - 1)
lst1 = [a, 12, 14]
print('after lst1=[a,12,14]----', sys.getrefcount(11) - 1)
a = 666
print('after a=666----', sys.getrefcount(11) - 1)
del a  # a now names 666, so deleting it does not affect the count of 11
print('after del a----', sys.getrefcount(11) - 1)
del b
print('after del b----', sys.getrefcount(11) - 1)
del lst1
print('after del lst1----', sys.getrefcount(11) - 1)
```
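The result is shown below. The absolute numbers depend on the interpreter. The baseline of 50 comes from references to the cached small integer 11 held elsewhere in CPython. On CPython 3.12 and later, small integers are immortal objects, so sys.getrefcount returns a large constant and these step-by-step changes no longer appear. What matters is the change at each step.
```text
init 50
after a=11---- 51
after b=a---- 52
in func function 54
after func(11)---- 52
after lst1=[a,12,14]---- 53
after a=666---- 52
after del a---- 52
after del b---- 51
after del lst1---- 50
```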
4. Redis
4.1 Source code implementation of ×× in Redis
Stage 1: read the data-structure part of Redis, which is located mainly in the following files:
- Memory allocation: zmalloc.c and zmalloc.h
- Dynamic strings (SDS): sds.h and sds.c
- Doubly linked list: adlist.c and adlist.h
- Dictionary: dict.h and dict.c
- Skip list: the zskiplist and zskiplistNode structures in server.h (redis.h in older versions), plus all functions in t_zset.c whose names start with zsl, such as zslCreate, zslInsert, zslDeleteNode, etc.
- Cardinality estimation: the hllhdr structure in hyperloglog.c and all functions whose names start with hll
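Before reading zslCreate and zslInsert, it may help to see the skip-list idea in a few lines. Below is a simplified Python sketch, not a port of t_zset.c. It assumes the following, following Redis's ordering rules and default constants:
- nodes are ordered by (score, member), as Redis does;
- the level-promotion probability is 0.25 and there is a cap of 32 levels, values taken to match ZSKIPLIST_P and ZSKIPLIST_MAXLEVEL in recent Redis versions (treat them as assumptions);
- the member is not already present.

It omits Redis's span fields, backward pointer and deletion.

```python
import random
from typing import List, Optional, Tuple

MAX_LEVEL = 32  # assumed to match ZSKIPLIST_MAXLEVEL
P = 0.25        # assumed to match ZSKIPLIST_P


class SkipNode:
    __slots__ = ("score", "member", "forward")

    def __init__(self, score: float, member: Optional[str], level: int) -> None:
        self.score = score
        self.member = member
        self.forward: List[Optional["SkipNode"]] = [None] * level  # one pointer per level


class SkipList:
    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.head = SkipNode(float("-inf"), None, MAX_LEVEL)  # sentinel, like zslCreate
        self.level = 1
        self.length = 0
        self.rng = rng if rng is not None else random.Random()

    def _random_level(self) -> int:
        level = 1
        while self.rng.random() < P and level < MAX_LEVEL:
            level += 1
        return level

    @staticmethod
    def _precedes(node: SkipNode, score: float, member: str) -> bool:
        # Order by score first, then by member (Redis compares members on score ties).
        return node.score < score or (node.score == score and node.member < member)

    def insert(self, score: float, member: str) -> None:
        update = [self.head] * MAX_LEVEL  # last node visited on each level
        x = self.head
        for i in range(self.level - 1, -1, -1):  # descend from the top level
            while x.forward[i] is not None and self._precedes(x.forward[i], score, member):
                x = x.forward[i]
            update[i] = x
        level = self._random_level()
        self.level = max(self.level, level)  # update[] already holds head above old level
        node = SkipNode(score, member, level)
        for i in range(level):  # splice the node in on each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        self.length += 1

    def range_by_score(self, lo: float, hi: float) -> List[Tuple[str, float]]:
        """Members with lo <= score <= hi, in order (like ZRANGEBYSCORE)."""
        x = self.head
        for i in range(self.level - 1, -1, -1):
            while x.forward[i] is not None and x.forward[i].score < lo:
                x = x.forward[i]
        x = x.forward[0]
        out: List[Tuple[str, float]] = []
        while x is not None and x.score <= hi:
            out.append((x.member, x.score))
            x = x.forward[0]
        return out


if __name__ == "__main__":
    sl = SkipList(random.Random(7))
    for score, member in [(3, "a"), (1, "b"), (2, "c"), (2, "d")]:
        sl.insert(score, member)
    print(sl.range_by_score(2, 3))  # [('c', 2), ('d', 2), ('a', 3)]
```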
Stage 2: get familiar with Redis's in-memory encoding structures
- Integer set: intset.h and intset.c
- Compressed list (ziplist): ziplist.h and ziplist.c
Stage 3: get familiar with the implementation of the Redis data types
- Object system: object.c
- String keys: t_string.c
- List keys: t_list.c
- Hash keys: t_hash.c
- Set keys: t_set.c
- Sorted set keys: all functions in t_zset.c except those whose names start with zsl
- HyperLogLog keys: the functions in hyperloglog.c whose names start with pf
Stage 4: get familiar with the implementation of the Redis database
- Database implementation: the redisDb structure in redis.h, and the db.c file
- Notifications: notify.c
- RDB persistence: rdb.c
- AOF persistence: aof.c
Then read the implementation of some independent functional modules:
- Publish/subscribe: the pubsubPattern structure in redis.h, and the pubsub.c file
- Transactions: the multiState and multiCmd structures in redis.h, and the multi.c file
Stage 5: get familiar with the client and server code
- Event handling module: ae.c, ae_epoll.c, ae_evport.c, ae_kqueue.c, ae_select.c
- Network connection library: anet.c and networking.c
- Server: redis.c
- Client: redis-cli.c
At this point, you can also read the code of the following independent functional modules:
- Lua scripting: scripting.c
- Slow query log: slowlog.c
- Monitoring: monitor.c
Stage 6: get familiar with the multi-machine part of the Redis code
- Replication: replication.c
- Redis Sentinel: sentinel.c
- Cluster: cluster.c
Reference
[1] How to solve the cold start problem in the recommendation system