Recommendation systems: feature engineering, common features
2022-07-30 00:55:00 【u013250861】
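1. The importance of feature engineering

- Good feature engineering helps machine learning models achieve better results.

Features commonly used in recommender systems:

- User behavior information
- Attribute and tag information (not easy to obtain)
- User relationship information
- Content information
- Context information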
2. Shortcomings of raw features

- Features are not on a common scale (different dimensions or units)
- Redundant information
- Qualitative features that have not been quantified
- Missing values
3. Common feature engineering methods

- Standardization
  - Better suited to data that is roughly normally distributed (e.g., prices)
  - Less sensitive to outliers than normalization
- Normalization
  - Suitable for data whose distribution is unknown (e.g., categorical data after dummy encoding)
  - Relatively sensitive to outliers
- Binarization
  - Converts a quantitative feature into a 0/1 qualitative feature by applying a threshold
- Dummy encoding
  - Converts discrete categorical features into 0/1 vectors
- Missing-value imputation
  - Common fill values: 0, the mean, the median, etc.
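The transformations listed above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the author's code; the function names (`standardize`, `min_max_normalize`, `binarize`, `dummy_encode`, `impute`) are hypothetical. It assumes "standardization" means z-score scaling with the population standard deviation and "normalization" means min-max scaling to [0, 1].

```python
import statistics


def standardize(values):
    """Z-score standardization: (x - mean) / std, using the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant feature: nothing to scale
        return [0.0] * len(values)
    return [(x - mean) / std for x in values]


def min_max_normalize(values):
    """Min-max normalization: rescale values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(x - lo) / (hi - lo) for x in values]


def binarize(values, threshold):
    """Map a quantitative feature to 1 if x > threshold, else 0."""
    return [1 if x > threshold else 0 for x in values]


def dummy_encode(categories):
    """One 0/1 column per distinct category (columns sorted for a stable order)."""
    vocab = sorted(set(categories))
    rows = [[1 if c == v else 0 for v in vocab] for c in categories]
    return vocab, rows


def impute(values, strategy="mean"):
    """Replace None with 0, the mean, or the median of the observed values."""
    observed = [x for x in values if x is not None]
    if strategy == "zero":
        fill = 0
    elif strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if x is None else x for x in values]


if __name__ == "__main__":
    prices = [10.0, 20.0, 30.0, 40.0]
    print([round(z, 2) for z in standardize(prices)])  # [-1.34, -0.45, 0.45, 1.34]
    print(min_max_normalize(prices))
    print(binarize(prices, threshold=25))  # [0, 0, 1, 1]
    print(dummy_encode(["book", "music", "book", "video"]))
    print(impute([1, None, 3, None, 8], strategy="median"))  # [1, 3, 3, 3, 8]
```

In a real pipeline the statistics (mean, std, min, max, fill values, category vocabulary) would be computed on the training data only and then reused for new data. Libraries such as scikit-learn offer equivalents (`StandardScaler`, `MinMaxScaler`, `Binarizer`, `OneHotEncoder`, `SimpleImputer`).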
4. Feature selection



Apache Spark
- An open-source distributed computing framework
- Fast: up to 100x faster than Hadoop MapReduce
- Powerful caching: combined memory and disk caching through a simple interface
- Flexible deployment: supports cluster managers such as YARN and Kubernetes (k8s)
- Real-time: provides dedicated tools for stream processing
- General-purpose: APIs in multiple languages and abstractions for many kinds of workloads
- RDD
  - Resilient Distributed Dataset
  - Resilient: good fault tolerance with automatic recovery from failures
  - Distributed: distributed by design
  - Dataset: gives users a unified programming interface in which distribution is transparent
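To make the "resilient" idea concrete, here is a toy, single-process model in plain Python. It is not Spark's API: the class and method names (`ToyRDD`, `compute_partition`, `lose_partition`) are hypothetical. It assumes, as Spark does, that an RDD records the chain of transformations (its lineage) that produced each partition, so a lost partition can be recomputed from its source data. Partitions here are computed one after another; Spark would process them in parallel on executors.

```python
class ToyRDD:
    """Hypothetical toy: a partitioned dataset that remembers its lineage."""

    def __init__(self, source_partitions, lineage=None):
        self._source = source_partitions  # base data, assumed to be durable
        self._lineage = lineage or []     # ordered ("map" | "filter", fn) steps
        self._cache = {}                  # partition index -> materialized rows

    def map(self, fn):
        # Transformations are lazy: they only return a new RDD with a longer lineage.
        return ToyRDD(self._source, self._lineage + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self._source, self._lineage + [("filter", pred)])

    def compute_partition(self, i):
        # Rebuild partition i by replaying the lineage over its source data.
        if i not in self._cache:
            rows = list(self._source[i])
            for kind, fn in self._lineage:
                if kind == "map":
                    rows = [fn(x) for x in rows]
                else:
                    rows = [x for x in rows if fn(x)]
            self._cache[i] = rows
        return self._cache[i]

    def lose_partition(self, i):
        # Simulate a failed worker discarding its copy of partition i.
        self._cache.pop(i, None)

    def collect(self):
        # An action: forces computation of every partition, in order.
        return [x for i in range(len(self._source)) for x in self.compute_partition(i)]


if __name__ == "__main__":
    rdd = ToyRDD([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * 10).filter(lambda x: x > 20)
    print(rdd.collect())  # [30, 40, 50, 60]
    rdd.lose_partition(1)
    print(rdd.collect())  # [30, 40, 50, 60] again: partition 1 recomputed from lineage
```

Recomputing from lineage is cheaper than replicating every intermediate result, which is why the toy stores only the source data and the list of steps.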
Behavior data collection
- Data generated when users interact with the product, such as likes, favorites, and views
- Usually reported (uploaded) by the client
- Why use Kafka to process behavior data?
  - Decoupling: message producers and consumers can work independently of each other
  - Scalability: capacity can be expanded efficiently as the number of users grows rapidly
  - Peak shaving: buffers traffic spikes during events so the downstream load stays smooth
  - Asynchronous communication: well suited to processing behavior data
- Kafka core concepts
  - Broker: a server in the cluster
  - Topic: a logical category of messages
  - Partition: the physical storage unit of a topic
  - Producer/Consumer: message producers and consumers
  - Consumer Group: a group of consumers
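The Kafka concepts above can be illustrated with a toy, single-process model. This is not the Kafka client API (a real application would use a client library talking to brokers, which are not modeled here); `ToyTopic`, `ToyProducer`, `ToyConsumerGroup`, the topic name `user_behavior`, and the event fields are all hypothetical. The sketch shows producers writing behavior events to a partitioned topic without knowing who reads them, each partition being read by exactly one member of a consumer group, and separate consumer groups reading the same events independently through their own offsets.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic: one append-only log per partition."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Same key -> same partition, so one user's events keep their order.
        # Simplification: Kafka's default partitioner hashes keys with murmur2, not CRC32.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p


class ToyProducer:
    """Knows only the topic, never the consumers (decoupling)."""

    def __init__(self, topic):
        self.topic = topic

    def send(self, key, value):
        return self.topic.append(key, value)


class ToyConsumerGroup:
    """Members of one group share the partitions; each group tracks its own offsets."""

    def __init__(self, topic, members):
        self.topic = topic
        self.members = members
        self.offsets = [0] * len(topic.partitions)  # next offset to read, per partition

    def assignment(self):
        # Each partition goes to exactly one member (round-robin here, a simplification).
        owners = defaultdict(list)
        for p in range(len(self.topic.partitions)):
            owners[self.members[p % len(self.members)]].append(p)
        return owners

    def poll(self, member):
        # Return everything this member has not read yet from its partitions.
        records = []
        for p in self.assignment().get(member, []):
            log = self.topic.partitions[p]
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return records


if __name__ == "__main__":
    topic = ToyTopic("user_behavior", num_partitions=3)
    producer = ToyProducer(topic)
    for user, action in [("u1", "view"), ("u2", "like"), ("u1", "favorite"), ("u3", "view")]:
        producer.send(user, {"user": user, "action": action})

    realtime = ToyConsumerGroup(topic, ["worker-a", "worker-b"])
    archive = ToyConsumerGroup(topic, ["archiver"])
    print(len(realtime.poll("worker-a")) + len(realtime.poll("worker-b")))  # 4
    print(len(archive.poll("archiver")))  # 4: a second group reads the same events
```

Because the topic keeps events until each group reads them, a burst of events during a promotion only makes the logs grow; consumers catch up at their own pace, which is the peak-shaving behavior listed above.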