Recommendation systems: feature engineering, common features
2022-07-30 00:55:00 【u013250861】
1. The importance of feature engineering

- Good feature engineering helps machine learning models achieve better results

Features commonly used in recommender systems
- User behavior data
- Attribute and tag data (hard to obtain)
- User relationship data
- Content data
- Context data

2. Shortcomings of raw features

- Features are on different scales (their units and ranges differ)
- Redundant information
- Some features are qualitative and not yet quantified
- Missing values

3. Common feature engineering methods

- Standardization
  - Better suited to data that is roughly normally distributed (e.g., prices)
  - Less sensitive to outliers than normalization
- Normalization
  - Suited to data whose distribution is unknown (e.g., categorical data after dummy encoding)
  - Fairly sensitive to outliers
- Binarization
  - Applies a threshold to a quantitative feature so that it becomes a 0/1 (qualitative) feature
- Dummy encoding
  - Converts discrete categorical features into 0/1 vectors
- Missing value imputation
  - Missing values are commonly filled with 0, the mean, or the median
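The five methods above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, "normalization" is taken to mean min-max scaling to [0, 1], standardization uses the population standard deviation, and the encoder is one-hot style (one column per category; strict dummy coding would drop one column). In practice a library such as scikit-learn (`StandardScaler`, `MinMaxScaler`, `Binarizer`, `OneHotEncoder`, `SimpleImputer`) would normally be used instead.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def standardize(xs: Sequence[float]) -> List[float]:
    """Z-score standardization: (x - mean) / std, giving mean 0 and std 1."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in xs]   # a constant feature carries no information
    return [(x - mu) / sigma for x in xs]


def min_max_normalize(xs: Sequence[float]) -> List[float]:
    """Min-max normalization to [0, 1]; one extreme value stretches the range."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def binarize(xs: Sequence[float], threshold: float) -> List[int]:
    """Map values above the threshold to 1 and everything else to 0."""
    return [1 if x > threshold else 0 for x in xs]


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """One 0/1 column per category (sorted so the column order is stable)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors


def impute(xs: Sequence[Optional[float]], strategy: str = "mean") -> List[float]:
    """Fill None with 0, the mean, or the median of the observed values."""
    observed = [x for x in xs if x is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if x is None else x for x in xs]
```

For example, `min_max_normalize([10, 20, 30])` gives `[0.0, 0.5, 1.0]`, but adding a single outlier such as `1000` squeezes the other values toward 0, which is why min-max normalization is described above as sensitive to outliers.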

4. Feature selection

Apache Spark
- An open source distributed computing framework
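- Fast: up to 100× faster than Hadoop MapReduce
- Powerful caching: a simple interface provides in-memory plus on-disk caching
- Flexible deployment: supports cluster managers such as YARN and Kubernetes (k8s)
- Real-time processing: provides tools dedicated to stream processing
- General purpose: offers APIs in multiple languages and abstractions for many kinds of workloads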
- RDD (Resilient Distributed Dataset)
  - Resilient: good fault tolerance, with automatic recovery from failures
  - Distributed: distributed by design
  - Dataset: gives users a uniform programming interface in which the distribution is transparent
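To make the three words in "RDD" concrete, here is a toy, single-process model in Python. It is a sketch under stated assumptions, not Spark's API: the class `ToyRDD` and its methods are hypothetical. Data is split into partitions (distributed), transformations such as `map` and `filter` only record lineage, a lost cached partition is rebuilt by replaying that lineage (resilient), and the caller works with the whole dataset through one interface (dataset).

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy model of an RDD (NOT Spark's API): partitioned data plus lineage."""

    def __init__(self, source: Optional[List[list]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None):
        self._source = source    # raw partitions (root RDD only)
        self._parent = parent    # lineage: the RDD this one was derived from
        self._fn = fn            # per-partition transformation of the parent
        self._persisted = False
        self._cache: Dict[int, list] = {}
        self.computations = 0    # how many partitions were (re)computed here

    @classmethod
    def parallelize(cls, data: list, num_partitions: int) -> "ToyRDD":
        size = -(-len(data) // num_partitions)  # ceiling division
        return cls(source=[data[i * size:(i + 1) * size] for i in range(num_partitions)])

    def num_partitions(self) -> int:
        return len(self._source) if self._parent is None else self._parent.num_partitions()

    # Transformations are lazy: they only record how to derive the new RDD.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def cache(self) -> "ToyRDD":
        self._persisted = True
        return self

    def compute(self, i: int) -> list:
        if i in self._cache:
            return self._cache[i]
        self.computations += 1
        if self._parent is None:
            data = list(self._source[i])
        else:
            data = self._fn(self._parent.compute(i))  # replay the lineage
        if self._persisted:
            self._cache[i] = data
        return data

    def lose_partition(self, i: int) -> None:
        """Simulate losing a cached partition (e.g. a failed executor)."""
        self._cache.pop(i, None)

    # Actions trigger computation over every partition.
    def collect(self) -> list:
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]
```

In real Spark the same pipeline would be written with `SparkContext.parallelize`, `RDD.filter`, `RDD.map`, `RDD.cache` and `RDD.collect`; the toy only mirrors that shape. After `lose_partition(1)`, the next `collect()` recomputes only partition 1 from its lineage, while partition 0 is still served from the cache.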
Behavior data collection
- Data generated when a user interacts with a product, such as likes, favorites, and views
- Usually uploaded by the client
- Why use Kafka to process behavior data?
  - Decoupling: message producers and consumers can work independently of each other
  - Scalability: capacity can be expanded efficiently to keep up with rapid growth in the number of users
  - Peak shaving and valley filling: buffers traffic spikes during events so that the load stays smooth
  - Asynchronous communication: a good fit for processing behavior data
- Kafka core concepts
  - Broker: a server in the cluster
  - Topic: the logical category of a message
  - Partition: the physical storage unit of a topic
  - Producer / Consumer: the message producer and the message consumer
  - Consumer Group: a group of consumers
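The Kafka concepts above can be illustrated with an in-memory toy in Python. This is a hypothetical sketch, not the Kafka client API: `ToyTopic` and `ToyConsumerGroup` are invented names, messages are JSON-encoded behavior events, `zlib.crc32` stands in for the murmur2 key hashing of Kafka's default partitioner, and partitions are assigned round-robin as a simplification. It shows why keying by user keeps one user's events in order, and how two consumer groups read the same topic independently (the decoupling point).

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyTopic:
    """In-memory stand-in for a Kafka topic (NOT the Kafka client API)."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        # Each partition is an append-only log of encoded messages.
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: dict) -> Tuple[int, int]:
        """Append a message; return (partition, offset)."""
        # Same key -> same partition, so one user's events stay in order.
        # Kafka's default partitioner uses murmur2; crc32 is only a stand-in.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(json.dumps(value).encode("utf-8"))
        return p, len(self.partitions[p]) - 1


class ToyConsumerGroup:
    """Within a group, each partition is read by exactly one member."""

    def __init__(self, topic: ToyTopic, members: List[str]):
        self.topic = topic
        # Offsets are tracked per group, so groups do not affect each other.
        self.offsets = [0] * len(topic.partitions)
        self.assignment: Dict[str, List[int]] = defaultdict(list)
        for p in range(len(topic.partitions)):
            # Simplified round-robin assignment of partitions to members.
            self.assignment[members[p % len(members)]].append(p)

    def poll(self, member: str) -> List[dict]:
        """Return new messages from the member's partitions and advance offsets."""
        records: List[dict] = []
        for p in self.assignment.get(member, []):
            log = self.topic.partitions[p]
            records.extend(json.loads(m) for m in log[self.offsets[p]:])
            self.offsets[p] = len(log)  # commit: the next poll starts after these
        return records
```

For example, a client could call `topic.produce("u1", {"user": "u1", "action": "like", "item": "v9"})` for each interaction; a recommendation group and an analytics group can then each `poll` the full stream at their own pace without the producer knowing about either.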