How to Use Databricks for Data Analysis on TiDB Cloud | TiDB Cloud User Guide
2022-07-28 13:00:00 【InfoQ】

Set up a TiDB Cloud Dev Tier cluster
- Register a TiDB Cloud account and log in.
- Under Create Cluster > Developer Tier, select 1 year Free Trial.
- Set a cluster name and select a region for the cluster.
- Click Create. The TiDB Cloud cluster is created in about 1 to 3 minutes.
- On the Overview panel, click Connect and create a traffic filter. For example, add the IP address 0.0.0.0/0 to allow access from all IP addresses.
Import the sample data into TiDB Cloud
- In the cluster information pane, click Import. The Data Import Task page appears.
- Configure the import task as follows:
- Data Source Type: Amazon S3
- Bucket URL: s3://tidbcloud-samples/data-ingestion/
- Data Format: TiDB Dumpling
- Role-ARN: arn:aws:iam::385595570414:role/import-sample-access
- When configuring Target Database, enter the Username and Password of the TiDB cluster.
- Click Import to start importing the sample data. The whole process takes about 3 minutes.
- Return to the Overview panel and click Connect to get the MyCLI URL.
- Use the MyCLI client to check whether the sample data was imported successfully:
$ mycli -u root -h tidb.xxxxxx.aws.tidbcloud.com -P 4000
(none)> SELECT COUNT(*) FROM bikeshare.trips;
+----------+
| COUNT(*) |
+----------+
| 816090 |
+----------+
1 row in set
Time: 0.786s
Connect Databricks to TiDB Cloud
- In your Databricks workspace, create and attach a Spark cluster as follows:

- Configure JDBC in the Databricks notebook. TiDB can use the default JDBC driver in Databricks, so you do not need to configure the driver parameter:
%scala
val url = "jdbc:mysql://tidb.xxxx.prod.aws.tidbcloud.com:4000"
val table = "bikeshare.trips"
val user = "root"
val password = "xxxxxxxxxx"
- url: the JDBC URL used to connect to TiDB Cloud
- table: the table to read, in database.table form, for example: bikeshare.trips
- user: the user name used to connect to TiDB Cloud
- password: the password of that user
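The four values above can also be assembled from parts instead of being typed into the notebook as literals. The following is a minimal sketch under stated assumptions: the helper names `tidbJdbcUrl` and `setting`, and the environment variable names `TIDB_USER` and `TIDB_PASSWORD`, are hypothetical and not part of TiDB Cloud or Databricks.

```scala
// Hypothetical helper: builds a JDBC URL in the same form as the `url` value above.
// 4000 is the port used in the examples in this guide.
def tidbJdbcUrl(host: String, port: Int = 4000): String =
  s"jdbc:mysql://$host:$port"

// Hypothetical helper: looks up a setting by name so that it is not hardcoded.
// Returns None when the name is missing or its value is blank.
def setting(name: String, env: Map[String, String] = sys.env): Option[String] =
  env.get(name).map(_.trim).filter(_.nonEmpty)

// Assumed variable names; replace the host placeholder with your own cluster host.
val jdbcUrl = tidbJdbcUrl("tidb.xxxx.prod.aws.tidbcloud.com")
val jdbcUser = setting("TIDB_USER").getOrElse("root")
val jdbcPassword: Option[String] = setting("TIDB_PASSWORD")
```

Keeping the password as an `Option` makes the "not configured" case explicit, so a notebook can stop early with a clear message instead of attempting a connection with an empty password.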
- Check the connectivity to TiDB Cloud:
%scala
import java.sql.DriverManager
val connection = DriverManager.getConnection(url, user, password)
connection.isClosed()
res2: Boolean = false
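The check above leaves the connection open. A slightly more defensive variant, sketched below, runs a trivial query and always closes the connection. The helper name `pingTidb` is hypothetical; it uses only the standard `java.sql` API and `scala.util.Try`.

```scala
import java.sql.DriverManager
import scala.util.Try

// Hypothetical helper: opens a connection, runs SELECT 1, and closes
// the statement and connection even if the query fails.
// Returns Success(value of SELECT 1) or Failure(the exception that occurred).
def pingTidb(url: String, user: String, password: String): Try[Int] = Try {
  val connection = DriverManager.getConnection(url, user, password)
  try {
    val statement = connection.createStatement()
    try {
      val rows = statement.executeQuery("SELECT 1")
      rows.next()
      rows.getInt(1)
    } finally statement.close() // closing the statement also closes its result set
  } finally connection.close()
}
```

With the variables defined earlier, `pingTidb(url, user, password)` should yield `Success(1)` when the cluster is reachable, and a `Failure` carrying the underlying exception otherwise, for example when the traffic filter blocks the notebook's IP address.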
Analyze the data in Databricks
- Create a Spark DataFrame to load the TiDB data. Here, we reference the variables defined in the previous step:
%scala
val remote_table = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .load()
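Loading the whole table is fine for the sample data. For larger tables, Spark's JDBC source also accepts a parenthesized subquery as the `dbtable` value, so an aggregate can run in TiDB before any rows reach Spark. The sketch below only prepares the option values; the helper names `asDerivedTable` and `jdbcOptions` and the column alias `cnt` are hypothetical.

```scala
// Hypothetical helper: wraps a SQL query as a derived table, the form that
// Spark's JDBC `dbtable` option accepts in place of a table name.
def asDerivedTable(query: String, alias: String = "t"): String = {
  val trimmed = query.trim.stripSuffix(";").trim
  s"($trimmed) AS $alias"
}

// Hypothetical helper: collects the JDBC options used in this guide into
// one map, so the same settings can be reused for reads and writes.
def jdbcOptions(url: String, user: String, password: String,
                dbtable: String): Map[String, String] =
  Map("url" -> url, "user" -> user, "password" -> password, "dbtable" -> dbtable)

// Count trips per bike type inside TiDB instead of in Spark.
val typeCounts = asDerivedTable(
  "SELECT rideable_type, COUNT(*) AS cnt FROM bikeshare.trips GROUP BY rideable_type;")
```

In the notebook, these values could be passed as `spark.read.format("jdbc").options(jdbcOptions(url, user, password, typeCounts)).load()`, in which case only the grouped rows should be transferred to Spark.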
- Query the data. Databricks provides powerful chart display features, and you can customize the chart type:
%scala
display(remote_table.select("*"))

- Create a DataFrame view or a DataFrame table. As an example, let's create a temporary view named "trips":
%scala
remote_table.createOrReplaceTempView("trips")
- Query the data using SQL statements. The following statement counts the bikes of each type:
%sql
SELECT rideable_type, COUNT(*) count FROM trips GROUP BY rideable_type ORDER BY count DESC
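- Write the analysis results to TiDB Cloud. This step assumes that a table or view named type_count already holds the result to write:
%scala
import org.apache.spark.sql.SaveMode

// Append the contents of type_count to the TiDB table bikeshare.type_count.
spark.table("type_count")
  // Renames the column "type" to "count"; a no-op if no such column exists.
  .withColumnRenamed("type", "count")
  .write
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "bikeshare.type_count")
  .option("user", user)
  .option("password", password)
  // NONE: do not set a transaction isolation level for the write.
  .option("isolationLevel", "NONE")
  .mode(SaveMode.Append)
  .save()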
Import the TiDB Cloud sample notebook into Databricks
- In your Databricks workspace, click Create > Import and paste the TiDB Cloud sample URL to download the notebook to your Databricks workspace.
- Attach this notebook to your Spark cluster.
- Replace the JDBC configuration with your own TiDB Cloud cluster information.
- Follow the steps in the notebook to use TiDB Cloud through Databricks.