[Recommendation algorithms] Interview questions from small company C
2022-07-01 04:06:00 【Mountain peak evening view】
0. Project questions
0.1 User profiles
- The first part is offline processing: acquire data by crawling, then build profiles from the crawled data.
- The user profiles in MongoDB are built from the user registration table and the user log data in MySQL (e.g., read counts, like counts, and favorite counts).
- User profiles, item profiles, and news materials are stored in the SinaNews database in MongoDB. MongoDB is used here because its documents are similar to JSON objects, which makes adding and removing fields very convenient.
- The processed materials are then stored in Redis (pulling them directly from MongoDB would be more cumbersome). The recommendation lists and popular lists are computed offline, saved in Redis, and displayed by the front end.
- PS: For ease of construction, the data is not updated online in real time; instead, new data is crawled at a fixed time every night.
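The offline step of this pipeline (aggregate raw logs into per-user profile documents, then rank items into a popular list) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's code: the field names (`user_id`, `news_id`, `action`) and the engagement weights are hypothetical, and plain dicts and lists stand in for the MySQL tables, the MongoDB documents, and the list cached in Redis.

```python
from collections import Counter, defaultdict

# Hypothetical engagement weights per action type (not from the original project).
ACTION_WEIGHTS = {"read": 1, "like": 3, "collect": 5}


def build_user_profiles(log_rows):
    """Aggregate raw log rows into one JSON-like document per user (MongoDB-style)."""
    profiles = defaultdict(lambda: {"read": 0, "like": 0, "collect": 0})
    for row in log_rows:
        profiles[row["user_id"]][row["action"]] += 1
    return {uid: {"user_id": uid, **counts} for uid, counts in profiles.items()}


def build_popular_list(log_rows, k):
    """Score each news item by weighted engagement and keep the top k (the list cached in Redis)."""
    scores = Counter()
    for row in log_rows:
        scores[row["news_id"]] += ACTION_WEIGHTS[row["action"]]
    return [news_id for news_id, _ in scores.most_common(k)]


# Hypothetical log rows, standing in for rows pulled from MySQL.
logs = [
    {"user_id": "u1", "news_id": "n1", "action": "read"},
    {"user_id": "u1", "news_id": "n1", "action": "like"},
    {"user_id": "u2", "news_id": "n2", "action": "read"},
    {"user_id": "u2", "news_id": "n1", "action": "collect"},
]
profiles = build_user_profiles(logs)
popular = build_popular_list(logs, k=2)
```

Because the profile is a plain dict, adding a new counter field later only touches the aggregation code, which mirrors the schema flexibility that motivated MongoDB above.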
0.2 Modules I was responsible for
Data cleaning, algorithm models, etc.
0.3 Cold start problem
The problem can be analyzed with the following mind map:
[Figure: mind map of cold-start solutions]
1. Machine learning algorithms
1.0 Where does the randomness of a random forest come from?
The randomness in a random forest comes mainly from three sources:
first, bootstrap sampling makes each tree's training set random;
second, a random subset of features is selected at each node before the impurity is computed;
third, split points can also be chosen at random (a random forest built this way is also called Extremely Randomized Trees).
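The three sources of randomness can be illustrated with the standard `random` module. This is a minimal sketch, not sklearn's implementation; the function names and the choice of `max_features=3` are hypothetical. Rows that the bootstrap never draws form the out-of-bag set, which is often used for validation.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Source 1: draw n_samples row indices with replacement (bootstrap sampling)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def feature_subset(n_features, max_features, rng):
    """Source 2: the candidate features considered at one node, drawn without replacement."""
    return sorted(rng.sample(range(n_features), max_features))


def random_threshold(values, rng):
    """Source 3 (Extra-Trees style): pick a split point uniformly between the min and max."""
    return rng.uniform(min(values), max(values))


rng = random.Random(0)
rows = bootstrap_indices(10, rng)
feats = feature_subset(n_features=8, max_features=3, rng=rng)
threshold = random_threshold([0.5, 2.0, 1.2, 3.4], rng)
out_of_bag = sorted(set(range(10)) - set(rows))  # rows this tree never sees
```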
1.1 Differences between GBDT and random forests
(I) GBDT = decision trees + gradient boosting (a boosting ensemble method). GBDT trains each new tree on the residuals of the current ensemble. More precisely, it fits the negative gradient of the loss, which equals the residual under squared loss. At prediction time, the outputs of all trees are summed to obtain the final prediction.
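The residual-fitting loop can be sketched in pure Python. This sketch assumes squared loss, single-feature data, and depth-1 trees (decision stumps) in place of full CART trees. The function names, `n_trees`, and `learning_rate` are hypothetical choices for illustration.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on x that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]   # never empty: t is in xs
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else 0.0
        err = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)            # initial prediction: the mean of y
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # For squared loss L = (y - F)^2 / 2, the negative gradient is y - F, i.e. the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        # Final prediction = base value + sum of all (shrunken) tree outputs.
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1)
```

Each round removes a fixed fraction (the learning rate) of the remaining residual, so with these settings the predictions approach 1 and 5 geometrically.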
(II) A random forest is a bagging algorithm whose base learners are decision trees (usually CART trees).
(1) For regression, the random forest outputs the average of the base learners' predictions;
(2) For classification, the random forest has two strategies:
The first is the voting strategy used in the original paper: each learner outputs a class, and the class predicted most often is returned;
The second is the probability-averaging strategy used in sklearn: the class-probability distributions output by the learners are averaged to get the mean probability that the sample belongs to each class, and then $\arg\max$ of the averaged distribution gives the most likely class.
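The two aggregation strategies can disagree. This minimal sketch (function names hypothetical) uses three trees' class-probability outputs: one tree is confident about class 0, and two are lukewarm about class 1.

```python
from collections import Counter


def hard_vote(prob_rows):
    """Original-paper style: each tree votes for its argmax class; the most frequent class wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_rows]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(prob_rows):
    """sklearn style: average the per-tree class distributions, then take the argmax."""
    n_classes = len(prob_rows[0])
    avg = [sum(p[c] for p in prob_rows) / len(prob_rows) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


# Three trees, two classes.
trees = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
print(hard_vote(trees))  # 1: two of the three trees vote for class 1
print(soft_vote(trees))  # 0: averaged probabilities are about [0.567, 0.433]
```

Soft voting lets a confident learner outweigh several uncertain ones, while hard voting discards that confidence information.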
1.2 Differences between bagging and boosting
The error of a base classifier can be decomposed into bias and variance (more precisely, expected squared error = bias² + variance + irreducible noise).
- Boosting reduces the bias of the ensemble by aggregating base classifiers step by step, each one focusing on the samples its predecessors got wrong. After a weak classifier is trained, its errors or residuals are computed and used as input to the next classifier. This process keeps decreasing the loss function, moving the model ever closer to the "bull's eye".
- Bagging reduces the variance of the ensemble through a divide-and-conquer strategy: the training samples are resampled many times, a model is trained on each resample, and the models make a combined decision. Loosely speaking, averaging the predictions of n independent, uncorrelated models with the same variance gives a variance of 1/n of a single model's variance.
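The 1/n claim follows from a short variance calculation. Assume n models whose predictions $X_1,\dots,X_n$ each have variance $\sigma^2$ and are pairwise uncorrelated, so that every covariance term vanishes:

$$
\operatorname{Var}\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{1}{n^2}\cdot n\sigma^2
= \frac{\sigma^2}{n}.
$$

If every pair of predictions has correlation $\rho$ instead, the $n(n-1)$ covariance terms each contribute $\rho\sigma^2$:

$$
\operatorname{Var}\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big)
= \frac{1}{n^2}\big(n\sigma^2 + n(n-1)\rho\sigma^2\big)
= \rho\sigma^2 + \frac{1-\rho}{n}\sigma^2.
$$

This is why the statement only holds loosely for bagging. Models trained on overlapping bootstrap samples are correlated, so the variance cannot drop below $\rho\sigma^2$ however many models are added. It is also why random forests decorrelate their trees further by sampling features.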
2. Recommendation algorithms
(1) The NeuralCF training process and sampling process.
(2) Why does DIN introduce an attention mechanism?
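For question (1): NeuralCF is commonly trained on implicit feedback as a binary classification task. Observed (user, item) pairs are positives, unobserved items are sampled as negatives, and the network's output is trained with binary cross-entropy. A minimal sketch of the sampling step follows. It assumes item ids are 0..n_items-1, uniform negative sampling, and 4 negatives per positive (a hypothetical choice). The model and training loop are omitted, and in practice negatives are often resampled every epoch.

```python
import random


def sample_training_pairs(interactions, n_items, num_negatives, rng):
    """Turn implicit feedback into (user, item, label) triples.

    interactions: dict user -> set of item ids the user interacted with (positives).
    For every positive pair, draw `num_negatives` items the user has NOT interacted with.
    Assumes no user has interacted with every item (otherwise the resampling loop never ends).
    """
    triples = []
    for user, pos_items in interactions.items():
        for item in pos_items:
            triples.append((user, item, 1))
            for _ in range(num_negatives):
                neg = rng.randrange(n_items)
                while neg in pos_items:          # resample until we hit an unobserved item
                    neg = rng.randrange(n_items)
                triples.append((user, neg, 0))
    return triples


rng = random.Random(42)
interactions = {0: {1, 3}, 1: {2}}
data = sample_training_pairs(interactions, n_items=10, num_negatives=4, rng=rng)
```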
| Model | Basic principle | Characteristics | Limitations |
|---|---|---|---|
| NeuralCF | Replaces the dot product of the user and item vectors in traditional matrix factorization with a neural network that models their interaction | A matrix factorization model with stronger expressive power | Uses only user and item ID features; no other features are added |
| Wide&Deep | Uses the Wide part to strengthen the model's memorization and the Deep part to strengthen its generalization | Pioneered the approach of building composite (hybrid) models | The Wide part requires manually crossed features |
| Deep&Cross | Replaces the Wide part of Wide&Deep with a Cross network | Solves Wide&Deep's need for manual feature crossing | The Cross network has high complexity |
| DeepFM | Replaces the Wide part of Wide&Deep with FM | Strengthens the feature-crossing ability of the Wide part | Structurally not very different from the classic Wide&Deep |
| DIN | Introduces an attention mechanism that computes attention scores from the correlation between the items in the user's behavior history and the target ad item | Makes more targeted recommendations depending on the target ad item | Does not make full use of features other than historical behavior |
| DIEN | Uses a sequence model to simulate how user interests evolve | The sequence model strengthens the system's ability to express changes in user interest, so the system begins to use the valuable information in time-ordered behavior sequences | Training the sequence model is complex and online serving latency is high, so engineering optimization is required |
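For question (2), the table's DIN row can be made concrete. The point of attention in DIN is that the user's interest vector is no longer fixed: it is a weighted sum of history-item embeddings whose weights depend on the target ad. A minimal sketch with hypothetical 2-d embeddings follows. A plain dot product stands in for DIN's learned activation unit (a small MLP). The softmax is also a simplification for readability, since the DIN paper deliberately does not normalize the weights.

```python
import math


def din_style_pooling(history, target):
    """Return (user_interest_vector, weights) for one target item."""
    # Stand-in relevance score: dot product between each behavior item and the target.
    scores = [sum(h * t for h, t in zip(item, target)) for item in history]
    # Softmax (numerically stabilized); a simplification, DIN itself skips this step.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(target)
    interest = [sum(w * item[d] for w, item in zip(weights, history)) for d in range(dim)]
    return interest, weights


# History: one "sports" item and one "cooking" item (hypothetical embeddings).
history = [[1.0, 0.0], [0.0, 1.0]]
_, w_sports = din_style_pooling(history, target=[2.0, 0.0])
_, w_cooking = din_style_pooling(history, target=[0.0, 2.0])
```

With the same history, a sports ad puts most of the weight on the sports item and a cooking ad on the cooking item. This is the "more targeted recommendations" effect listed in the table.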
Among these, the DeepFM model:
[Figure: DeepFM model structure]
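DeepFM's FM component models second-order feature interactions as $\sum_{i<j}\langle v_i, v_j\rangle x_i x_j$, and DeepFM shares the same embeddings $v_i$ between its FM part and its deep part. A minimal sketch follows that compares the direct $O(kn^2)$ computation with the standard $O(kn)$ reformulation. The feature values and latent vectors are made up for illustration.

```python
def fm_pairwise_naive(x, v):
    """Sum over i<j of <v_i, v_j> * x_i * x_j, computed directly in O(k * n^2)."""
    n, k = len(x), len(v[0])
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(v[i][f] * v[j][f] for f in range(k)) * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity in O(k * n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


x = [1.0, 0.0, 2.0, 1.0]                                   # hypothetical feature values
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.5]]     # hypothetical latent vectors, k = 2
naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
```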
3. Python basics
3.1 Python memory management
- Immutable objects: numbers, strings, tuples; mutable objects: dictionaries, lists, byte arrays.
- Immutable types include int, float, long (Python 2 only), str, tuple, etc.
For a variable bound to an immutable object, "changing" the variable actually creates a new object and rebinds the variable to it. If the old object is no longer referenced, it waits to be garbage-collected.
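The rebinding behavior can be observed with the `is` operator, which compares object identity. A minimal sketch contrasting an immutable int and tuple with a mutable list:

```python
# Immutable: "modifying" an int rebinds the name to a new object.
x = 1000
old = x
x += 1                 # creates a new int object and rebinds x
print(old is x)        # False: x now names a different object
print(old)             # 1000: the original object is unchanged

# Mutable: in-place operations change the object that every alias shares.
lst = [1, 2]
alias = lst
lst += [3]             # list += extends the same list in place
print(alias is lst)    # True
print(alias)           # [1, 2, 3]

# Tuples are immutable, so += builds a new tuple instead.
t = (1, 2)
t_alias = t
t += (3,)
print(t_alias is t)    # False
print(t_alias)         # (1, 2)
```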
Python's garbage collection is based mainly on reference counting. Its drawbacks are that it cannot handle circular references between objects, and that it needs extra space to maintain the reference counts.
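CPython covers the circular-reference weakness with a supplementary cycle-detecting collector, exposed as the `gc` module. A minimal sketch, assuming CPython, follows. A weak reference lets us observe whether the objects are still alive without keeping them alive ourselves. The `Node` class and `make_cycle` helper are hypothetical.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a    # a -> b -> a: each keeps the other's refcount above zero
    return weakref.ref(a)          # a weak reference does not increase the refcount


gc.disable()                       # stop automatic cycle collection so the effect is visible
ref = make_cycle()
survived_refcounting = ref() is not None   # refcounting alone cannot free the cycle
gc.collect()                       # an explicit collection still runs while automatic GC is disabled
freed_by_gc = ref() is None                # the cycle detector reclaimed both nodes
gc.enable()
print(survived_refcounting, freed_by_gc)   # True True
```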
In the following four situations, an object's reference count is incremented by 1:
- The object is created (a = 11)
- The object is referenced by another name (b = a)
- The object is passed to a function as an argument (func(a))
- The object is stored as an element in a container (e.g., lst1 = [a, a])
In the following four situations, an object's reference count is decremented by 1:
- An alias of the object is explicitly destroyed (del a)
- An alias of the object is bound to a new object (a = 66)
- The object leaves its scope (e.g., when function fun finishes, its local variables are released; global variables are not)
- The container holding the object is destroyed, or the object is removed from the container
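#!/usr/bin/python
# -*- coding: utf-8 -*-
# sys.getrefcount(x) also counts the temporary reference held by its own argument, hence "- 1".
# Absolute numbers depend on the interpreter: on CPython 3.12+, small ints such as 11 are
# immortal, so getrefcount(11) returns the same large constant at every step.
import sys
def func(c):
    # Reference count of the argument as seen from inside the function
    print('in func function', sys.getrefcount(c) - 1)
print('init', sys.getrefcount(11) - 1)
a = 11  # name bound to the object: +1
print('after a=11----', sys.getrefcount(11) - 1)
b = a  # object referenced by another name: +1
print('after b=a----', sys.getrefcount(11) - 1)
func(11)  # Inside the call it is +2: the parameter c, plus the reference the call stack holds to the argument
print('after func(11)----', sys.getrefcount(11) - 1)
lst1 = [a, 12, 14]  # object stored in a container: +1
print('after lst1=[a,12,14]----', sys.getrefcount(11) - 1)
a = 666  # alias rebound to a new object: -1
print('after a=666----', sys.getrefcount(11) - 1)
del a  # a now names 666, so deleting it does not affect 11
print('after del a----', sys.getrefcount(11) - 1)
del b  # alias destroyed: -1
print('after del b----', sys.getrefcount(11) - 1)
del lst1  # container destroyed: -1
print('after del lst1----', sys.getrefcount(11) - 1)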
The output is (absolute counts depend on the Python version):
init 50
after a=11---- 51
after b=a---- 52
in func function 54
after func(11)---- 52
after lst1=[a,12,14]---- 53
after a=666---- 52
after del a---- 52
after del b---- 51
after del lst1---- 50
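The int 11 is shared across the whole interpreter, and it is immortal on CPython 3.12+, so its absolute count is noisy. A minimal sketch, assuming CPython, that tracks only the changes in a freshly created object's count. The helper name is hypothetical.

```python
import sys


def refcount_deltas():
    obj = object()                     # a fresh object: no other references, not immortal
    base = sys.getrefcount(obj)
    alias = obj                        # referenced by another name: +1
    after_alias = sys.getrefcount(obj)
    box = [obj, obj]                   # stored twice in a container: +2
    after_list = sys.getrefcount(obj)
    del alias                          # alias destroyed: -1
    after_del = sys.getrefcount(obj)
    box.clear()                        # removed from the container: -2
    after_clear = sys.getrefcount(obj)
    return [after_alias - base, after_list - after_alias,
            after_del - after_list, after_clear - after_del]


print(refcount_deltas())   # [1, 2, -1, -2] on CPython
```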
4. Redis
4.1 Source code implementation of ×× in Redis
Stage 1: read the data structure part of Redis, which is mainly located in the following files:
- Memory allocation: zmalloc.c and zmalloc.h
- Dynamic strings: sds.h and sds.c
- Doubly linked lists: adlist.c and adlist.h
- Dictionaries: dict.h and dict.c
- Skip lists: the zskiplist and zskiplistNode structures in server.h, plus all functions in t_zset.c whose names start with zsl, such as zslCreate, zslInsert, zslDeleteNode, etc.
- Cardinality estimation (HyperLogLog): the hllhdr structure in hyperloglog.c, and all functions whose names start with hll
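A skip list is a sorted linked list with extra "express lane" levels, which gives O(log n) expected search and insert time. A simplified sketch follows; it is not the Redis code. It stores bare scores, whereas zskiplist nodes also hold members, per-level span widths for rank queries, and backward pointers. The promotion probability 0.25 and the 32-level cap are assumptions modeled on Redis's ZSKIPLIST_P and ZSKIPLIST_MAXLEVEL. The class names are hypothetical.

```python
import random


class _Node:
    __slots__ = ("score", "forward")

    def __init__(self, score, level):
        self.score = score
        self.forward = [None] * level   # forward[i] = next node on level i


class SkipList:
    MAX_LEVEL = 32   # assumption, modeled loosely on ZSKIPLIST_MAXLEVEL
    P = 0.25         # assumption, modeled on ZSKIPLIST_P

    def __init__(self, seed=None):
        self.head = _Node(None, self.MAX_LEVEL)
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 1
        while self.rng.random() < self.P and level < self.MAX_LEVEL:
            level += 1
        return level

    def insert(self, score):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):     # walk down from the top level
            while node.forward[i] is not None and node.forward[i].score < score:
                node = node.forward[i]
            update[i] = node                        # last node before the insert point on level i
        level = self._random_level()
        self.level = max(self.level, level)         # new top levels start from the head
        new = _Node(score, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, score):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].score < score:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.score == score

    def __iter__(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.score
            node = node.forward[0]


sl = SkipList(seed=1)
for s in [5, 1, 9, 3, 7, 3]:
    sl.insert(s)
```

As in a sorted set, duplicate scores are allowed, and iterating level 0 yields the scores in order.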
Stage 2: get familiar with Redis's memory encoding structures
- Integer set: intset.h and intset.c
- Compressed list (ziplist): ziplist.h and ziplist.c
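An intset keeps unique integers in a sorted array, looks them up by binary search, and uses the narrowest element width (16, 32, or 64 bits) that fits every member. When a wider value arrives, the whole array is upgraded, and it is never downgraded. A minimal sketch of that behavior follows. It assumes a Python list in place of the packed byte array, and the class and method names are hypothetical.

```python
import bisect


class IntSet:
    # (width in bytes, min, max) for the int16 / int32 / int64 encodings
    _RANGES = [(2, -2**15, 2**15 - 1), (4, -2**31, 2**31 - 1), (8, -2**63, 2**63 - 1)]

    def __init__(self):
        self.encoding = 2      # start with the narrowest encoding
        self.values = []       # kept sorted and unique

    def _required_encoding(self, v):
        for width, lo, hi in self._RANGES:
            if lo <= v <= hi:
                return width
        raise OverflowError("value does not fit in 64 bits")

    def add(self, v):
        """Insert v; return False if it was already present."""
        needed = self._required_encoding(v)
        if needed > self.encoding:
            self.encoding = needed             # upgrade every element's width; never downgrade
        i = bisect.bisect_left(self.values, v)  # binary search for the insert position
        if i < len(self.values) and self.values[i] == v:
            return False
        self.values.insert(i, v)
        return True

    def __contains__(self, v):
        i = bisect.bisect_left(self.values, v)
        return i < len(self.values) and self.values[i] == v

    def blob_size(self):
        """Bytes the packed array would occupy at the current encoding."""
        return self.encoding * len(self.values)
```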
Stage 3: get familiar with how Redis data types are implemented
- Object system: object.c
- String keys: t_string.c
- List keys: t_list.c
- Hash keys: t_hash.c
- Set keys: t_set.c
- Sorted set keys: all functions in t_zset.c except those whose names start with zsl
- HyperLogLog keys: the functions in hyperloglog.c whose names start with pf
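HyperLogLog estimates the number of distinct elements with a small, fixed amount of memory. Each element is hashed; the first p bits of the hash select a register, and that register keeps the maximum position of the leftmost 1-bit seen in the remaining bits. A minimal sketch of the classic estimator with the small-range (linear counting) correction follows. It assumes p=10 and SHA-1 as the hash, and the names are hypothetical. Redis itself uses its own 64-bit hash, 16384 registers, and sparse/dense encodings.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p                    # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")  # 64-bit hash
        idx = h >> (64 - self.p)                        # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1    # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)           # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)    # linear counting for small sets
        return raw
```

With 1024 registers, the standard error is about 1.04 / sqrt(1024), roughly 3%. Adding a duplicate element never changes any register, so it does not change the estimate.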
Stage 4: get familiar with how the Redis database is implemented
- Database implementation: the redisDb structure in redis.h (server.h in newer versions), and the db.c file
- Notifications: notify.c
- RDB persistence: rdb.c
- AOF persistence: aof.c
Also the implementation of some standalone feature modules:
- Publish/subscribe: the pubsubPattern structure in redis.h, and the pubsub.c file
- Transactions: the multiState and multiCmd structures in redis.h, and the multi.c file
Stage 5: get familiar with the client and server code
- Event processing module: ae.c, ae_epoll.c, ae_evport.c, ae_kqueue.c, ae_select.c
- Networking library: anet.c and networking.c
- Server: redis.c (server.c in newer versions)
- Client: redis-cli.c
- At this point, you can read the code of the following standalone feature modules:
  - Lua scripting: scripting.c
  - Slow log: slowlog.c
  - Monitoring: monitor.c
Stage 6: get familiar with the multi-machine part of Redis
- Replication: replication.c
- Redis Sentinel: sentinel.c
- Cluster: cluster.c