How to Use Databricks for Data Analysis on TiDB Cloud | TiDB Cloud User Guide
2022-07-28 13:00:00 【InfoQ】

Set up a TiDB Cloud Dev Tier cluster
- Register a TiDB Cloud account and log in.
- Under Create Cluster > Developer Tier, choose 1 year Free Trial.
- Set the cluster name and select a region for the cluster.
- Click Create. The TiDB Cloud cluster is created in about 1 to 3 minutes.
- On the Overview panel, click Connect and create a traffic filter. For example, add the IP address 0.0.0.0/0 to allow access from all IPs.
Import the sample data into TiDB Cloud
- In the cluster information pane, click Import. The Data Import Task page appears.
- Configure the import task as follows:
- Data Source Type: Amazon S3
- Bucket URL: s3://tidbcloud-samples/data-ingestion/
- Data Format: TiDB Dumpling
- Role-ARN: arn:aws:iam::385595570414:role/import-sample-access
- When configuring Target Database, enter the Username and Password of your TiDB cluster.
- Click Import to start importing the sample data. The whole process takes about 3 minutes.
- Return to the Overview panel and click Connect to get the MyCLI URL.
- Use the MyCLI client to check whether the sample data was imported successfully:
$ mycli -u root -h tidb.xxxxxx.aws.tidbcloud.com -P 4000
(none)> SELECT COUNT(*) FROM bikeshare.trips;
+----------+
| COUNT(*) |
+----------+
|   816090 |
+----------+
1 row in set
Time: 0.786s
Connect Databricks to TiDB Cloud
- In the Databricks workspace, create and attach a Spark cluster as follows:

- Configure JDBC in the Databricks notebook. TiDB works with the default JDBC driver shipped with Databricks, so there is no need to configure any driver parameters:
%scala
val url = "jdbc:mysql://tidb.xxxx.prod.aws.tidbcloud.com:4000"
val table = "bikeshare.trips"
val user = "root"
val password = "xxxxxxxxxx"
- url: the JDBC URL used to connect to TiDB Cloud
- table: the table to read, in the format database.table (here bikeshare.trips)
- user: the username used to connect to TiDB Cloud
- password: the password of that user
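As a side note (an alternative sketch added here, not part of the original guide), these settings can also be bundled into a java.util.Properties object and passed to spark.read.jdbc instead of chaining .option() calls as in the steps below:
%scala
// Alternative sketch: bundle the connection settings defined above into
// java.util.Properties and read the table with spark.read.jdbc.
// Equivalent to the .option() style used later in this guide.
import java.util.Properties

val connProps = new Properties()
connProps.put("user", user)
connProps.put("password", password)

val tripsViaProps = spark.read.jdbc(url, table, connProps)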
- Check the connectivity to TiDB Cloud:
%scala
import java.sql.DriverManager
val connection = DriverManager.getConnection(url, user, password)
connection.isClosed()
res2: Boolean = false
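Before moving on to Spark, you can optionally run a quick query over the same raw JDBC connection and then close it. This extra check is an addition to the original steps and uses only standard java.sql calls:
%scala
// Optional extra check (not part of the original guide): run a simple query
// over the JDBC connection opened above, then close it. Spark opens its own
// connections later, so closing this one is safe.
val stmt = connection.createStatement()
val rs = stmt.executeQuery("SELECT COUNT(*) FROM bikeshare.trips")
while (rs.next()) {
  println(s"trips rows: ${rs.getLong(1)}")
}
rs.close()
stmt.close()
connection.close()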
Analyze the data in Databricks
- Create a Spark DataFrame to load the TiDB data. Here we refer to the variables defined in the previous steps:
%scala
// Load the TiDB table over JDBC using the connection variables defined above.
val remote_table = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .load()
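For larger tables, Spark's standard JDBC partitioning options (partitionColumn, lowerBound, upperBound, numPartitions) can split the read across executors. The sketch below is an assumption: it presumes the table has a timestamp column named started_at; replace the column and bounds with values that match your schema:
%scala
// Sketch of a partitioned JDBC read. The partition column "started_at" and the
// bounds are assumptions about the sample schema; any numeric, date, or
// timestamp column in the table works.
val partitionedTrips = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .option("partitionColumn", "started_at")
  .option("lowerBound", "2020-01-01 00:00:00")
  .option("upperBound", "2022-01-01 00:00:00")
  .option("numPartitions", "8")
  .load()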
- Query the data. Databricks provides powerful chart display features, and you can customize the chart type:
%scala
display(remote_table.select("*"))

- Create a DataFrame view or a DataFrame table. As an example, create a temporary view named "trips":
%scala
remote_table.createOrReplaceTempView("trips")
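The temporary view can be queried from %sql cells or from Scala via spark.sql. A quick sanity check (added here as an example, not part of the original steps):
%scala
// Query the temporary view registered above from Scala.
display(spark.sql("SELECT COUNT(*) AS trips_count FROM trips"))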
- Query the data with SQL statements. The following statement counts the number of bikes of each type:
%sql
SELECT rideable_type, COUNT(*) count FROM trips GROUP BY rideable_type ORDER BY count DESC
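The write step below reads a table named type_count, so the aggregation result needs to be registered under that name. One way to do this (a sketch based on the query above; the original sample notebook may register it differently) is to express the same aggregation with the DataFrame API and register it as a temporary view:
%scala
// Same aggregation as the SQL above, expressed with the DataFrame API and
// registered as the temporary view "type_count" used by the next step.
import org.apache.spark.sql.functions.desc

val typeCount = spark.table("trips")
  .groupBy("rideable_type")
  .count()                 // adds a column named "count"
  .orderBy(desc("count"))

typeCount.createOrReplaceTempView("type_count")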
- Write the analysis results back to TiDB Cloud:
%scala
spark.table("type_count")
.withColumnRenamed("type", "count")
.write
.format("jdbc")
.option("url", url)
.option("dbtable", "bikeshare.type_count")
.option("user", user)
.option("password", password)
.option("isolationLevel", "NONE")
.mode(SaveMode.Append)
.save()
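As a final check (an addition to the original steps), the table just written can be read back from TiDB Cloud with the same connection settings:
%scala
// Read back bikeshare.type_count to verify the write succeeded.
val writtenBack = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "bikeshare.type_count")
  .option("user", user)
  .option("password", password)
  .load()

display(writtenBack)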
Import the TiDB Cloud sample notebook into Databricks
- In the Databricks workspace, click Create > Import and paste the TiDB Cloud sample URL to download the notebook into your Databricks workspace.
- Attach this notebook to your Spark cluster.
- Replace the JDBC configuration with your own TiDB Cloud cluster information.
- Follow the steps in the notebook to use TiDB Cloud through Databricks.
Summary
This guide walked through creating a TiDB Cloud Developer Tier cluster, importing the bikeshare sample data, connecting to the cluster from Databricks over JDBC, analyzing the data with Spark and SQL, and writing the results back to TiDB Cloud.