Using TensorFlow to forecast Airbnb rental prices in New York City
2020-11-06 01:14:00 【Artificial intelligence meets pioneer】
Author | TIMOTHY102  Compiled by | VK  Source | Analytics Vidhya
Introduction
Airbnb is an online marketplace that lets people rent out their properties or spare rooms to guests. It charges a commission on every booking: 3% from hosts and between 6% and 12% from guests.
Since its founding in 2009, the company has grown from helping 21,000 guests a year find accommodation to helping 6 million people a year go on holiday. It currently lists an impressive 800,000 properties in 34,000 cities across 90 countries.
In this article, I use the Kaggle New York City Airbnb Open Data dataset and try to build a neural network model with TensorFlow to predict prices.
The goal is to build a suitable machine learning model that can predict the price of future listings.
I will walk through the Jupyter Notebook I created. You can find it on GitHub: https://github.com/Timothy102/Tensorflow-for-Airbnb-Prices
Loading the data
First, let's see how to load the data. We use wget to fetch the data directly from the Kaggle website. Note the -O flag, which sets the name of the output file (lowercase -o would set the log file instead).
The loaded dataset has 48,895 rows and 16 columns.
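Once the file is on disk, the loading step might look roughly like this in pandas. This is a minimal sketch: the file name `AB_NYC_2019.csv` and the helper names `load_listings` and `describe_shape` are assumptions made for illustration, not taken from the notebook.

```python
import pandas as pd

# Assumed local file name; use whatever name you passed to wget.
CSV_PATH = "AB_NYC_2019.csv"


def load_listings(path=CSV_PATH):
    """Read the Airbnb listings CSV into a DataFrame."""
    return pd.read_csv(path)


def describe_shape(df):
    """Return a short 'rows, columns' summary for a quick sanity check."""
    rows, cols = df.shape
    return f"{rows} rows, {cols} columns"
```

Calling `describe_shape(load_listings())` on the full download is a quick way to confirm that the row and column counts match the figures quoted above.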
Data analysis and preprocessing
Seaborn has a very simple API for plotting all kinds of graphs for all kinds of data. If you are not familiar with the syntax, check out this article: https://www.analyticsvidhya.com/blog/2019/09/comprehensive-data-visualization-guide-seaborn-python/
After calling corr on the pandas DataFrame, we pass the result to seaborn's heatmap function, which draws the pairwise correlations between the numeric columns.
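The correlation step can be sketched as follows. The sample frame is made up, and the helper names `correlation_matrix` and `plot_heatmap` are hypothetical. Plotting needs seaborn and matplotlib installed, so those imports are kept inside the plotting helper.

```python
import pandas as pd


def correlation_matrix(df):
    """Pearson correlations between the numeric columns only."""
    return df.select_dtypes(include="number").corr()


def plot_heatmap(corr):
    """Draw the matrix as an annotated heatmap (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
    plt.show()
    return ax


# Tiny made-up frame standing in for the real listings.
sample = pd.DataFrame({
    "price": [100, 150, 200, 250],
    "minimum_nights": [1, 2, 3, 4],
    "room_type": ["a", "b", "a", "b"],  # non-numeric, so it is left out
})
corr = correlation_matrix(sample)
```

Selecting the numeric columns first avoids errors from text columns such as the listing name or room type.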
Since we have latitude, longitude, and neighbourhood data, we can also draw a scatter plot of the listings.
In addition, I removed duplicate rows and some unnecessary columns, and filled in "reviews_per_month", because it had too many missing values. The cleaned data has 10 columns and no null values.
Pretty good, right?
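A cleaning pass along these lines might look like the sketch below. The article does not say which columns were dropped or what fill value was used, so the `DROP_COLUMNS` list and the fill value of 0 are both assumptions, and `clean_listings` is a hypothetical helper name.

```python
import pandas as pd

# Assumption: identifier and free-text columns are the "unnecessary" ones.
DROP_COLUMNS = ["id", "name", "host_id", "host_name", "last_review"]


def clean_listings(df):
    """Drop duplicate rows and unused columns, then fill missing review rates."""
    out = df.drop_duplicates()
    out = out.drop(columns=[c for c in DROP_COLUMNS if c in out.columns])
    # Assumption: a listing without reviews gets a rate of 0 reviews per month.
    out = out.assign(reviews_per_month=out["reviews_per_month"].fillna(0.0))
    return out.reset_index(drop=True)
```

Running `clean_listings(df).isna().sum()` afterwards is a simple way to check that no null values remain.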
First of all, computers work with numbers. That's why we need to convert each categorical column into a numeric encoding, such as a one-hot vector or integer codes. Here this is done with pandas' factorize method, which produces integer codes. Plenty of other tools would work as well.
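As a small illustration of what factorize does, the sketch below encodes one made-up column. The helper name `encode_categories` is hypothetical, and which columns the notebook encodes is not stated here.

```python
import pandas as pd


def encode_categories(df, columns):
    """Replace each listed column with integer codes; also return the code tables."""
    out = df.copy()
    categories = {}
    for col in columns:
        codes, uniques = pd.factorize(out[col])
        out[col] = codes
        categories[col] = list(uniques)  # position in the list == integer code
    return out, categories


sample = pd.DataFrame(
    {"room_type": ["Private room", "Entire home/apt", "Private room"]}
)
encoded, categories = encode_categories(sample, ["room_type"])
# encoded["room_type"] is now [0, 1, 0]
```

Codes are assigned in order of first appearance, so they carry no meaning beyond identity. If a true one-hot encoding is wanted, `pd.get_dummies` is one alternative.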
To keep the loss function in a stable range, let's standardize some of the data so that it has a mean of 0 and a standard deviation of 1.
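Standardization subtracts the column mean and divides by the column standard deviation. A minimal sketch follows. Which columns the notebook standardizes is not stated, so the column list is left to the caller, and `standardize` is a hypothetical helper name.

```python
import pandas as pd


def standardize(df, columns):
    """Rescale the given columns to mean 0 and standard deviation 1."""
    out = df.copy()
    for col in columns:
        mean = out[col].mean()
        std = out[col].std(ddof=0)  # population standard deviation
        # A constant column has std 0; map it to all zeros instead of dividing.
        out[col] = (out[col] - mean) / std if std > 0 else 0.0
    return out


sample = pd.DataFrame({"minimum_nights": [1, 2, 3, 4]})
scaled = standardize(sample, ["minimum_nights"])
```

In practice the mean and standard deviation should be computed on the training data only and then reused for any held-out data.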
Feature crosses
We have to make one more change, and it is an essential one. To relate latitude and longitude to the model output, we have to create a feature cross. The following links should give you enough background to get a proper feel for feature crosses:
- https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
- https://www.kaggle.com/vikramtiwari/feature-crosses-tensorflow-mlcc
Our goal is to introduce a latitude-longitude cross, one of the oldest tricks in the book. If we simply fed these two columns into the model as raw values, it would assume that the output rises or falls progressively with each of them.
Instead, we use a feature cross, which means dividing the latitude × longitude map into a grid. Fortunately, TensorFlow makes this easy.
I step from the minimum to the maximum of each coordinate in increments of (max-min)/100, which produces evenly spaced grid boundaries.
I use a 100×100 grid.
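The boundary computation can be sketched in a few lines. This version returns only the interior edges, so 100 cells need 99 boundaries. Whether the notebook also includes the end points is not stated, so treat that choice, and the helper name `grid_boundaries`, as assumptions.

```python
import numpy as np


def grid_boundaries(values, n_cells=100):
    """Evenly spaced interior edges that split [min, max] into n_cells buckets."""
    lo, hi = float(np.min(values)), float(np.max(values))
    step = (hi - lo) / n_cells
    # Edges at lo + step, lo + 2 * step, ..., hi - step: n_cells - 1 of them.
    return [lo + i * step for i in range(1, n_cells)]
```

A plain list of floats like this is the form that the `boundaries` argument of `tf.feature_column.bucketized_column` accepts.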
Essentially, what we do here is define bucketized columns using the boundaries defined earlier and create a DenseFeatures layer, which is then passed to the Sequential API.
If you're not familiar with the TensorFlow syntax, check the documentation: https://www.tensorflow.org/api_docs/python/tf/feature_column/
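To make the idea concrete without the feature-column machinery, here is the same bucketize-and-cross logic written in plain NumPy. It is a stand-in for illustration, not the TensorFlow code from the notebook. The coordinates are made up, and `cross_lat_lon` and `one_hot` are hypothetical helper names.

```python
import numpy as np


def cross_lat_lon(lat, lon, lat_edges, lon_edges):
    """Map each (lat, lon) pair to the index of the grid cell it falls in."""
    lat_bucket = np.digitize(lat, lat_edges)  # values in 0 .. len(lat_edges)
    lon_bucket = np.digitize(lon, lon_edges)  # values in 0 .. len(lon_edges)
    n_lon = len(lon_edges) + 1                # number of longitude buckets
    return lat_bucket * n_lon + lon_bucket


def one_hot(cells, n_cells):
    """Turn cell indices into one-hot rows, one column per grid cell."""
    out = np.zeros((len(cells), n_cells), dtype=np.float32)
    out[np.arange(len(cells)), cells] = 1.0
    return out


# A 2x2 grid over made-up coordinates: one interior edge per axis.
lat = np.array([40.6, 40.8, 40.8])
lon = np.array([-74.0, -74.0, -73.8])
cells = cross_lat_lon(lat, lon, lat_edges=[40.7], lon_edges=[-73.9])
# cells is [0, 2, 3]
encoded = one_hot(cells, n_cells=4)
```

Each grid cell gets its own weight in the model, so the network can learn that one block is expensive and a neighbouring one is cheap without assuming any smooth trend in latitude or longitude.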
Now, finally, we are ready for model training. Apart from splitting the data, that is.
Obviously, we have to create two datasets: one containing all the input data and the other containing the values to predict. Because mismatched data sizes could cause problems for our model, I decided to truncate the data that was too long.
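Separating inputs from the target, and holding some rows back for evaluation, could be sketched as below. The target column name `price` comes from the dataset. The 80/20 split, the seed, and both helper names are assumptions, and the truncation step mentioned above is not reproduced because its details are not given.

```python
import pandas as pd


def split_features_target(df, target="price"):
    """Return (features, labels) with the target column removed from the features."""
    return df.drop(columns=[target]), df[target]


def train_test_split_frame(df, test_fraction=0.2, seed=42):
    """Shuffle the rows, then hold out the first test_fraction of them."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled.iloc[n_test:], shuffled.iloc[:n_test]
```

Splitting rows first and separating the target afterwards keeps features and labels aligned, which avoids the size mismatches mentioned above.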
Creating the model
Finally, we build a Keras Sequential model.
We compile the model with the Adam optimizer, mean squared error loss, and two metrics.
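A model of this kind might be set up as in the sketch below. The layer sizes, the learning rate, and the choice of MAE and RMSE as the two metrics are assumptions, since the article does not name them. The sketch also takes a plain numeric input rather than starting with the DenseFeatures layer described earlier.

```python
import tensorflow as tf


def build_model(n_features, hidden_units=(64, 32), learning_rate=1e-3):
    """Fully connected regression network compiled with Adam and MSE loss."""
    layers = [tf.keras.Input(shape=(n_features,))]
    for units in hidden_units:
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
    layers.append(tf.keras.layers.Dense(1))  # single output: the predicted price
    model = tf.keras.Sequential(layers)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",
        # Assumed metrics; the article only says that two are used.
        metrics=["mae", tf.keras.metrics.RootMeanSquaredError()],
    )
    return model
```

The final layer has no activation because price prediction is a regression problem, and `build_model(n_features).summary()` prints the resulting architecture.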
In addition, we use two callbacks:
- Early stopping, which is self-explanatory
- Reducing the learning rate on plateau
After training for 50 epochs with a batch size of 64, our model is quite successful.
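The two callbacks and the training call can be sketched with the standard Keras classes `EarlyStopping` and `ReduceLROnPlateau`. The epoch count and batch size come from the text. The patience values, the reduction factor, the validation split, and the helper names `make_callbacks` and `train` are assumptions.

```python
import tensorflow as tf


def make_callbacks(patience=5):
    """Early stopping plus learning-rate reduction when val_loss stops improving."""
    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=patience, restore_best_weights=True
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=max(1, patience // 2), min_lr=1e-6
        ),
    ]


def train(model, x, y, epochs=50, batch_size=64):
    """Fit a compiled Keras model, holding out 20% of the rows for validation."""
    return model.fit(
        x,
        y,
        validation_split=0.2,
        epochs=epochs,
        batch_size=batch_size,
        callbacks=make_callbacks(),
        verbose=0,
    )
```

Giving the learning-rate callback a shorter patience than early stopping means the rate is lowered at least once before training is cut off.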
Conclusion
We used the New York City Airbnb data to build a fully connected neural network that predicts future prices. Pandas and seaborn made it very easy to visualize and inspect the data. We introduced a latitude-longitude cross as a feature in the model. And thanks to Kaggle's open dataset, we have a fully working machine learning model.
Original article: https://www.analyticsvidhya.com/blog/2020/10/predicting-nyc-airbnb-rental-prices-tensorflow/