[Recommendation algorithms] C interview questions from a small company
2022-07-01 04:06:00 【Mountain peak evening view】
Zero. Project questions
0.1 User profiles
- Offline processing comes first: crawl the data, then build the profiles from it.
- The user profiles in MongoDB are built from the user registration table in MySQL and from user log data (read counts, likes, favorites, and so on).
- User profiles and item profiles (the material) are stored in the SinaNews database in MongoDB. MongoDB is used here because its documents resemble JSON objects, which makes adding and removing fields easy.
- The processed material is then stored in Redis, since pulling it directly from MongoDB would be more cumbersome. The recommendation list and the hot list are computed offline and saved to Redis, and the front end displays them from there.
- PS: To keep the setup simple, data is not fetched online in real time; instead, I crawl data at a fixed time every night.
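The offline flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the log field names, the `build_user_profiles` and `build_hot_list` helpers, and the `hot_list` key are all hypothetical, and plain Python dicts stand in for MongoDB and Redis.

```python
import json
from collections import Counter

# Hypothetical log records; the field names are assumptions for illustration.
logs = [
    {"user_id": "u1", "news_id": "n1", "action": "read"},
    {"user_id": "u1", "news_id": "n2", "action": "like"},
    {"user_id": "u2", "news_id": "n1", "action": "read"},
    {"user_id": "u2", "news_id": "n1", "action": "collect"},
    {"user_id": "u3", "news_id": "n2", "action": "read"},
    {"user_id": "u3", "news_id": "n3", "action": "read"},
]


def build_user_profiles(records):
    """Aggregate per-user action counts into JSON-like profile documents."""
    profiles = {}
    for rec in records:
        doc = profiles.setdefault(rec["user_id"], {"user_id": rec["user_id"]})
        key = rec["action"] + "_count"
        doc[key] = doc.get(key, 0) + 1  # new fields can be added freely
    return profiles


def build_hot_list(records, top_k=2):
    """Rank news items by how many log records mention them."""
    hits = Counter(rec["news_id"] for rec in records)
    return [news_id for news_id, _ in hits.most_common(top_k)]


# Nightly offline job: compute once, cache the result for the front end.
profile_store = build_user_profiles(logs)  # stand-in for MongoDB documents
kv_store = {}                              # stand-in for Redis
kv_store["hot_list"] = json.dumps(build_hot_list(logs))

print(profile_store["u1"])
print(kv_store["hot_list"])
```

The front end then only reads the precomputed list from the key-value store, which is why the data can lag by up to one nightly run.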
0.2 Modules I was responsible for
Data cleaning, algorithm models, and so on.
0.3 Cold start problem
Analyze the problem with reference to a mind map (figure: mind map of the cold start problem).
One. Machine learning algorithms
1.0 Where does the randomness in a random forest come from
The randomness in a random forest mainly comes from three sources:
The first is the randomness of the training set introduced by bootstrap sampling.
The second is the random selection of a feature subset at each node for the impurity calculation.
The third is the randomness of choosing split points at random (a forest built this way is also called Extremely Randomized Trees).
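The three sources of randomness can be illustrated with a small sketch. The helper names below are hypothetical and only the standard library is used; this is not how any particular library implements them.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Source 1: draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, max_features, rng):
    """Source 2: pick the candidate features for one node, without replacement."""
    return sorted(rng.sample(range(n_features), max_features))


def random_split_point(values, rng):
    """Source 3: draw a threshold uniformly between the min and max value."""
    return rng.uniform(min(values), max(values))


rng = random.Random(0)  # fixed seed so the run is repeatable
rows = bootstrap_indices(10, rng)
features = random_feature_subset(8, 3, rng)
threshold = random_split_point([0.2, 1.5, 3.1, 4.0], rng)

print("bootstrap rows:", rows)          # some rows repeat, some are left out
print("candidate features:", features)
print("random threshold:", threshold)
```

Because the bootstrap sample is drawn with replacement, some rows usually appear more than once and others not at all, so each tree sees a different training set.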
1.1 Differences between GBDT and random forests
(One) GBDT = decision trees + gradient boosting ensemble learning. GBDT trains each new tree on the residuals of the current ensemble (more precisely, each tree fits the negative gradient of the loss, which equals the residual for squared-error loss). At prediction time, the predictions of all trees are summed to give the final result.
(Two) A random forest is a bagging algorithm whose base learners are decision trees (commonly CART trees).
(1) For regression, the output is the average of the base learners' outputs.
(2) For classification, there are two strategies:
The first is the voting strategy of the original paper: each learner outputs a class, and the most frequently predicted class is returned.
The second is the probability aggregation strategy used in sklearn: the probability distributions output by the learners are averaged, and $\arg\max$ over the averaged distribution gives the most likely class.
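The two classification strategies can disagree. A minimal sketch, with made-up per-tree probabilities and hypothetical helper names:

```python
from collections import Counter


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def hard_vote(tree_probas):
    """Each tree votes for its most likely class; the majority class wins."""
    votes = [argmax(p) for p in tree_probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(tree_probas):
    """Average the per-tree class probabilities, then take the arg max."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    mean = [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]
    return argmax(mean)


# Made-up outputs of three trees for one sample (two classes).
tree_probas = [
    [0.90, 0.10],  # very confident in class 0
    [0.40, 0.60],  # mildly prefers class 1
    [0.45, 0.55],  # mildly prefers class 1
]

print("hard vote:", hard_vote(tree_probas))
print("soft vote:", soft_vote(tree_probas))
```

Here two trees weakly prefer class 1 while one tree strongly prefers class 0, so majority voting picks class 1 and probability averaging picks class 0.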
1.2 Differences between bagging and boosting
Base classifier error = bias + variance
- Boosting reduces the bias of the ensemble by focusing, step by step, on the samples that the base classifiers get wrong. After a weak classifier is trained, its error or residual is computed and used as input to the next classifier. This process keeps reducing the loss function, moving the model closer to the "bull's eye".
- Bagging uses a divide-and-conquer strategy: the training samples are resampled many times, and the resulting models make a combined decision, which reduces the variance of the ensemble. Loosely speaking, averaging the predictions of n independent, uncorrelated models gives a variance that is 1/n of the variance of a single model.
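The 1/n claim can be checked with a quick simulation. This is a sketch under a strong assumption: each model's prediction error is an independent standard normal draw. Real bagged models are correlated, so they achieve less than this.

```python
import random
import statistics


def simulate(n_models, n_trials=20000, seed=0):
    """Compare the variance of one model with the variance of an n-model average."""
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(n_trials):
        # Assumption: independent, zero-mean, unit-variance prediction errors.
        preds = [rng.gauss(0.0, 1.0) for _ in range(n_models)]
        single.append(preds[0])
        averaged.append(sum(preds) / n_models)
    return statistics.pvariance(single), statistics.pvariance(averaged)


var_single, var_avg = simulate(n_models=10)
ratio = var_avg / var_single
print("single model variance:", round(var_single, 3))
print("averaged variance:", round(var_avg, 3))
print("ratio (expected near 1/10):", round(ratio, 3))
```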
Two. Recommendation algorithms
(1) The training process and sampling process of NeuralCF.
(2) Why DIN introduces an attention mechanism.
| Model | Basic principle | Characteristics | Limitations |
|---|---|---|---|
| NeuralCF | Replaces the dot product of the user vector and item vector in traditional matrix factorization with a neural network that learns the interaction | A matrix factorization model with stronger expressive power | Uses only user and item ID features; no other features are added |
| Wide&Deep | Uses the wide part to strengthen the model's memorization and the deep part to strengthen its generalization | Pioneered the combined-model architecture | The wide part still needs manually crafted feature crosses |
| Deep&Cross | Replaces the wide part of Wide&Deep with a cross network | Solves the manual feature crossing problem of Wide&Deep | The cross network has high complexity |
| DeepFM | Replaces the wide part of Wide&Deep with FM | Strengthens the feature crossing ability of the wide part | Differs little in structure from the classic Wide&Deep |
| DIN | Introduces an attention mechanism and computes attention scores from the relevance between the user's behavior items and the target ad item | Makes more targeted recommendations depending on the target ad item | Does not make full use of features other than historical behavior |
| DIEN | Uses a sequence model to simulate how user interests evolve | The sequence model improves the system's ability to express changes in user interests, so the system starts to use the valuable information in time-ordered behavior sequences | Sequence models are complex to train and slow to serve online, so engineering optimization is required |
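To see why attention helps in DIN, compare plain sum pooling of the behavior history with pooling weighted by relevance to the candidate item. The sketch below is only illustrative: the embeddings are made up, the helper names are hypothetical, and a simple dot product stands in for DIN's learned scoring network.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sum_pooling(behavior_embs):
    """Candidate-independent summary: every behavior counts the same."""
    dim = len(behavior_embs[0])
    return [sum(e[d] for e in behavior_embs) for d in range(dim)]


def attention_pooling(behavior_embs, candidate_emb):
    """Weight each behavior by its relevance to the candidate item."""
    # Assumption: a dot product replaces the learned scoring network.
    weights = [dot(e, candidate_emb) for e in behavior_embs]
    dim = len(candidate_emb)
    pooled = [sum(w * e[d] for w, e in zip(weights, behavior_embs))
              for d in range(dim)]
    return weights, pooled


# Made-up 2-d embeddings: axis 0 is roughly "sports", axis 1 roughly "cooking".
behaviors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
sports_ad = [1.0, 0.0]
cooking_ad = [0.0, 1.0]

w_sports, pooled_sports = attention_pooling(behaviors, sports_ad)
w_cooking, pooled_cooking = attention_pooling(behaviors, cooking_ad)

print("sum pooling (same for every ad):", sum_pooling(behaviors))
print("weights for sports ad:", w_sports, "->", pooled_sports)
print("weights for cooking ad:", w_cooking, "->", pooled_cooking)
```

With sum pooling the user representation is identical for both ads; with attention, sports-related behaviors dominate for the sports ad and cooking-related ones for the cooking ad.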
Among these, the DeepFM model: (figure: DeepFM model structure)
Three. Python basics
3.1 Python memory management
- Immutable objects: numbers, strings, tuples. Mutable objects: dictionaries, lists, byte arrays.
- Immutable types include int, float, long (Python 2 only), str, tuple, and so on.
For a variable of an immutable type, changing the variable creates a new value and binds the variable to it; if the old value is no longer referenced, it waits for garbage collection.
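A short sketch of the difference between rebinding an immutable value and mutating a mutable one (variable names are illustrative only):

```python
# Immutable: "changing" an int creates a new object and rebinds the name.
a = 1000
b = a            # b refers to the same int object as a
a += 1           # a is rebound to a new int; the old object is untouched
print(a, b, a is b)

# Mutable: a list is modified in place, so every alias sees the change.
lst = [1, 2]
alias = lst
lst.append(3)
print(alias, alias is lst)
```

After `a += 1`, `b` still refers to the old value 1000, while `alias` and `lst` remain the same list object.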
Python's garbage collection is mainly based on reference counting. Its drawbacks: it cannot handle circular references between objects, and it needs extra space to maintain the reference counts.
In the following four situations, an object's reference count is incremented by 1:
The object is created (a=11); the object is referenced (b=a); the object is passed to a function as an argument, func(a); the object is stored as an element in a container (such as lst1=[a,a]).
In the following four situations, an object's reference count is decremented by 1:
An alias of the object is explicitly destroyed (del a); an alias of the object is bound to a new object (a=66); the object leaves its scope (for example, when the function fun finishes executing, its local variables go out of scope; global variables do not); the container holding the object is destroyed, or the object is removed from the container.
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys


def func(c):
    print('in func function', sys.getrefcount(c) - 1)


# getrefcount() holds a temporary reference of its own, hence the "- 1".
# The absolute counts depend on the interpreter; only the deltas matter.
print('init', sys.getrefcount(11) - 1)
a = 11
print('after a=11----', sys.getrefcount(11) - 1)
b = a
print('after b=a----', sys.getrefcount(11) - 1)
# Inside the call the count is +2: besides the parameter binding, the
# function's call stack also holds a reference to the argument.
func(11)
print('after func(11)----', sys.getrefcount(11) - 1)
lst1 = [a, 12, 14]
print('after lst1=[a,12,14]----', sys.getrefcount(11) - 1)
a = 666
print('after a=666----', sys.getrefcount(11) - 1)
del a  # a now refers to 666, so the count for 11 does not change
print('after del a----', sys.getrefcount(11) - 1)
del b
print('after del b----', sys.getrefcount(11) - 1)
del lst1
print('after del lst1----', sys.getrefcount(11) - 1)
```
The output is:

```
init 50
after a=11---- 51
after b=a---- 52
in func function 54
after func(11)---- 52
after lst1=[a,12,14]---- 53
after a=666---- 52
after del a---- 52
after del b---- 51
after del lst1---- 50
```
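The circular reference drawback mentioned earlier can be demonstrated with the standard `gc` and `weakref` modules. This is a minimal sketch assuming CPython; the `Node` class and `make_cycle` helper are hypothetical.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


def make_cycle():
    """Create two objects that reference each other, then drop all names."""
    a = Node("a")
    b = Node("b")
    a.other = b
    b.other = a
    return weakref.ref(a)  # a weak reference does not keep the object alive


gc.disable()  # rely on reference counting alone for a moment
ref = make_cycle()
# The local names are gone, but each object still holds the other,
# so neither reference count reaches zero.
alive_before = ref() is not None

gc.collect()  # the cycle detector finds and frees the unreachable pair
alive_after = ref() is not None
gc.enable()

print("alive with reference counting only:", alive_before)
print("alive after gc.collect():", alive_after)
```

Reference counting frees most objects immediately; the cycle collector in `gc` exists to clean up the groups of objects that counting alone can never release.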
Four. Redis
4.1 Source code implementation of ×× in Redis
Stage 1: read the data structure part of Redis, which is mostly located in the following files:
- Memory allocation: zmalloc.c and zmalloc.h
- Dynamic strings: sds.h and sds.c
- Doubly linked list: adlist.c and adlist.h
- Dictionary: dict.h and dict.c
- Skip list: the zskiplist and zskiplistNode structures in server.h, and all functions in t_zset.c that start with zsl, such as zslCreate, zslInsert, and zslDeleteNode
- Cardinality estimation: the hllhdr structure in hyperloglog.c, and all functions that start with hll
Stage 2: get familiar with Redis's in-memory encoding structures
- Integer set data structure: intset.h and intset.c
- Compressed list data structure: ziplist.h and ziplist.c
Stage 3: get familiar with the implementation of Redis data types
- Object system: object.c
- String keys: t_string.c
- List keys: t_list.c
- Hash keys: t_hash.c
- Set keys: t_set.c
- Sorted set keys: all functions in t_zset.c except those that start with zsl
- HyperLogLog keys: the functions in hyperloglog.c that start with pf
Stage 4: get familiar with the Redis database implementation
- Database implementation: the redisDb structure in redis.h, and db.c
- Database notifications: notify.c
- RDB persistence: rdb.c
- AOF persistence: aof.c
Along with the implementation of some independent functional modules:
- Publish and subscribe: the pubsubPattern structure in redis.h, and pubsub.c
- Transactions: the multiState and multiCmd structures in redis.h, and multi.c
Stage 5: get familiar with the client and server code
- Event handling module: ae.c/ae_epoll.c/ae_evport.c/ae_kqueue.c/ae_select.c
- Networking library: anet.c and networking.c
- Server: redis.c
- Client: redis-cli.c
At this point you can also read the implementation of the following independent functional modules:
- Lua scripting: scripting.c
- Slow query log: slowlog.c
- Monitor: monitor.c
Stage 6: get familiar with the multi-machine part of Redis
- Replication: replication.c
- Redis Sentinel: sentinel.c
- Cluster: cluster.c