Recommendation systems: feature engineering, common features
2022-07-30 00:55:00 【u013250861】
一、The importance of feature engineering


- Feature engineering can make machine learning models achieve better results
Features commonly used in recommender systems









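- User behavior information
- Attribute and tag information (hard to obtain)
- User relationship information
- Content information
- Context information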
二、Shortcomings of raw features




- Features are not on the same scale (different units and value ranges)
- Information is redundant
- Some features are qualitative rather than quantitative
- Missing values are present
三、Common feature engineering methods









- Standardization
  - Better suited to data that is roughly normally distributed (e.g. price)
  - Less sensitive to outliers than normalization
- Normalization
  - Suitable for data whose distribution is unknown (e.g. categorical data after dummy encoding)
  - Fairly sensitive to outliers
- Binarization
  - Convert a quantitative feature into a 0/1 feature by comparing it with a threshold
- Dummy encoding
  - Convert discrete categorical features into 0/1 vectors
- Missing value imputation
  - Common choices are filling with 0, the mean, or the median
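
The methods listed above can be sketched in a few lines of plain Python. This is a minimal illustration using only the standard library, not a production implementation. The function names, the `strategy` values, and the sample `prices` list are assumptions made for this example; the standardization uses the population standard deviation, and both scaling functions assume the input is not constant (otherwise they would divide by zero).

```python
from statistics import mean, median, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def binarize(values, threshold):
    """Map each value to 1 if it is above the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]


def dummy_encode(values):
    """Return one 0/1 vector per value, with one position per distinct category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def fill_missing(values, strategy="mean"):
    """Replace None with 0, the mean, or the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"zero": 0, "mean": mean(observed), "median": median(observed)}[strategy]
    return [fill if v is None else v for v in values]


# Hypothetical sample data for illustration only.
prices = [10.0, 20.0, 30.0, 40.0]
scaled = standardize(prices)            # centered on 0 with unit variance
ranged = min_max_normalize(prices)      # smallest value -> 0.0, largest -> 1.0
flags = binarize(prices, threshold=25)  # 1 for prices above 25
vectors = dummy_encode(["news", "video", "news"])
filled = fill_missing([1.0, None, 3.0], strategy="mean")
```

A single extreme value changes the minimum or maximum directly, which is why min-max normalization reacts more strongly to outliers than standardization does.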
四、Feature selection



Apache Spark
- An open source distributed computing framework
- Fast computation: up to 100x faster than Hadoop
- Powerful cache design: provides memory plus disk caching through a simple interface
- Flexible deployment: supports cluster managers such as YARN and k8s
- Strong real-time capability: provides tools dedicated to stream computing
- General purpose: provides APIs in multiple languages and a variety of business abstractions
- RDD
  - Resilient Distributed Dataset
  - Resilient: good fault tolerance and automatic error recovery
  - Distributed: distributed by design
  - Dataset: gives users a uniform programming interface that hides the distribution
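
To make the three words in "Resilient Distributed Dataset" more concrete, here is a toy, single-process sketch in plain Python. It is not the Spark API: `ToyRDD`, `lose_partition`, and the sample data are hypothetical names invented for this illustration. The sketch assumes the commonly described recovery model in which a dataset remembers the transformations that produced it, so a lost partition can be recomputed from the source data.

```python
class ToyRDD:
    """Toy stand-in for an RDD: partitioned data plus recorded transformations."""

    def __init__(self, partitions, lineage=()):
        self._source = [list(p) for p in partitions]  # original input partitions
        self._lineage = tuple(lineage)                # transformations, in order
        self._cache = {}                              # partition index -> computed rows

    def map(self, fn):
        # Lazy: only record the step, nothing is computed yet.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def _compute(self, index):
        # Replay the recorded steps on one source partition.
        rows = self._source[index]
        for kind, fn in self._lineage:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def collect(self):
        # The caller sees one flat dataset, not the individual partitions.
        out = []
        for i in range(len(self._source)):
            if i not in self._cache:
                self._cache[i] = self._compute(i)
            out.extend(self._cache[i])
        return out

    def lose_partition(self, index):
        # Simulate losing the computed result of one partition.
        self._cache.pop(index, None)


rdd = ToyRDD([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * 10).filter(lambda x: x > 20)
before = rdd.collect()
rdd.lose_partition(1)   # the cached result for partition 1 disappears
after = rdd.collect()   # partition 1 is rebuilt by replaying map and filter
```

In this sketch `before` and `after` are equal, which is the point: the caller never has to handle the lost partition explicitly.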
Behavioral data collection
- Data generated when a user interacts with a product, such as likes, favorites, and views
- Usually uploaded by the client
- Why use Kafka to process behavioral data?
  - Decoupling: message producers and consumers can work independently of each other
  - Scalability: capacity can be expanded efficiently to cope with rapid growth in the number of users
  - Peak shaving: buffers traffic spikes so that load stays smooth during promotional events
  - Asynchronous communication: well suited to processing behavioral data
- Kafka core concepts
  - Broker: a server in the cluster
  - Topic: the logical category of a message
  - Partition: the physical storage unit of a topic
  - Producer / Consumer: the producer and consumer of messages
  - Consumer Group: a group of consumers
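
The relationship between topics, partitions, producers, and consumer groups can be pictured with a small in-memory model. This is only a sketch in plain Python, not a Kafka client: `ToyTopic`, `assign_partitions`, the topic name, and the sample events are hypothetical, and the CRC32 hash and round-robin assignment are simplifying assumptions chosen to keep the example deterministic, not a description of Kafka's actual defaults.

```python
from collections import defaultdict
from zlib import crc32


class ToyTopic:
    """In-memory stand-in for one topic split into partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        """Append to the partition chosen by hashing the key; return its index."""
        index = crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one member of a consumer group (round-robin)."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


# Hypothetical behavior events keyed by user id.
topic = ToyTopic("user-behavior", num_partitions=3)
events = [("u1", "like"), ("u2", "view"), ("u1", "favorite")]
for user_id, action in events:
    topic.produce(user_id, action)

# Two consumers in one group share the three partitions between them.
group = assign_partitions(3, ["consumer-a", "consumer-b"])
```

Because the partition is derived from the key, all events for the same user land in the same partition and keep their relative order, while different partitions can be read in parallel by different members of the group.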









