How to Use Databricks for Data Analysis on TiDB Cloud | TiDB Cloud User Guide
2022-07-28 13:00:00 【InfoQ】

Set up a TiDB Cloud Dev Tier cluster
- Register a TiDB Cloud account and log in.
- Under the Create Cluster > Developer Tier menu, choose 1 year Free Trial.
- Set the cluster name and select a region for the cluster.
- Click Create. The TiDB Cloud cluster will be created in about 1 to 3 minutes.
- On the Overview panel, click Connect and create a traffic filter. For example, add the IP address 0.0.0.0/0 to allow access from all IPs.
Import sample data into TiDB Cloud
- In the cluster information pane, click Import. The Data Import Task page appears.
- Configure the import task as follows:
- Data Source Type: Amazon S3
- Bucket URL: s3://tidbcloud-samples/data-ingestion/
- Data Format: TiDB Dumpling
- Role-ARN: arn:aws:iam::385595570414:role/import-sample-access
- When configuring the Target Database, enter the Username and Password of your TiDB cluster.
- Click Import to start importing the sample data. The whole process takes about 3 minutes.
- Return to the Overview panel and click Connect to get the MyCLI URL.
- Use the MyCLI client to check whether the sample data was imported successfully:
$ mycli -u root -h tidb.xxxxxx.aws.tidbcloud.com -P 4000
(none)> SELECT COUNT(*) FROM bikeshare.trips;
+----------+
| COUNT(*) |
+----------+
|   816090 |
+----------+
1 row in set
Time: 0.786s
Connect to TiDB Cloud from Databricks
- In the Databricks workspace, create and attach a Spark cluster as follows:

- Configure JDBC in a Databricks notebook. TiDB can use the default JDBC driver of Databricks, so there is no need to configure driver parameters:
%scala
val url = "jdbc:mysql://tidb.xxxx.prod.aws.tidbcloud.com:4000"
val table = "bikeshare.trips"
val user = "root"
val password = "xxxxxxxxxx"
- url: the JDBC URL used to connect to TiDB Cloud
- table: the table to read, in the form database.table, for example bikeshare.trips
- user: the username used to connect to TiDB Cloud
- password: the user's password
- Check connectivity to TiDB Cloud:
%scala
import java.sql.DriverManager
val connection = DriverManager.getConnection(url, user, password)
connection.isClosed()
res2: Boolean = false
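- The check above leaves the test connection open; once verified, it can be closed (a minimal sketch):
%scala
// Close the test connection; the Spark JDBC reader used below manages its own connections.
connection.close()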
Analyze the data in Databricks
- Create a Spark DataFrame to load the TiDB data. Here, we reference the variables defined in the previous steps:
%scala
val remote_table = spark.read.format("jdbc")
.option("url", url)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.load()
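- For larger tables, the JDBC source can also read in parallel. The following is a minimal sketch using Spark's standard JDBC partitioning options; the started_at column and its bounds are assumptions about the Bikeshare sample schema, so adjust them to your data:
%scala
// Sketch: split the JDBC read into parallel partitions (column and bounds are assumed).
val partitioned = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", table)
  .option("user", user)
  .option("password", password)
  .option("partitionColumn", "started_at")
  .option("lowerBound", "2021-01-01 00:00:00")
  .option("upperBound", "2021-02-01 00:00:00")
  .option("numPartitions", "8")
  .load()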
- Query the data. Databricks provides a powerful chart display feature, and you can customize the chart type:
%scala
display(remote_table.select("*"))
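- To see which columns are available for charting, you can also print the schema Spark inferred over JDBC (a small sketch):
%scala
// Inspect the inferred schema of the TiDB table.
remote_table.printSchema()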

- Create a DataFrame view or a DataFrame table. As an example, let's create a temporary view named "trips":
%scala
remote_table.createOrReplaceTempView("trips")
- Query the data with SQL statements. The following statement counts the rides for each bike type:
%sql
SELECT rideable_type, COUNT(*) count FROM trips GROUP BY rideable_type ORDER BY count DESC
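- The write-back step below reads a table named type_count, which has not been created yet. Here is a minimal sketch that registers the aggregation above as a temporary view under that name (the column names follow the query above):
%scala
// Register the aggregation result as a temporary view named "type_count"
// so the next cell can read it with spark.table("type_count").
spark.sql("""
  SELECT rideable_type, COUNT(*) AS count
  FROM trips
  GROUP BY rideable_type
  ORDER BY count DESC
""").createOrReplaceTempView("type_count")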
- Write the analysis results back to TiDB Cloud:
%scala
import org.apache.spark.sql.SaveMode

spark.table("type_count")
  .withColumnRenamed("rideable_type", "type") // rename to a shorter column name
  .write
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "bikeshare.type_count")
  .option("user", user)
  .option("password", password)
  .option("isolationLevel", "NONE") // write without wrapping each partition in a transaction
  .mode(SaveMode.Append)
  .save()
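- As a quick sanity check (a sketch), the freshly written table can be read back from TiDB Cloud:
%scala
// Read the table we just wrote back from TiDB to verify the append succeeded.
val written = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "bikeshare.type_count")
  .option("user", user)
  .option("password", password)
  .load()
display(written)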
Import the TiDB Cloud sample notebook into Databricks
- In the Databricks workspace, click Create > Import, and paste the TiDB Cloud example URL to download the notebook into your Databricks workspace.
- Attach this notebook to your Spark cluster.
- Replace the JDBC configuration with your own TiDB Cloud cluster information.
- Follow the steps in the notebook to use TiDB Cloud with Databricks.
Summary
This guide walked through creating a TiDB Cloud Dev Tier cluster, importing the Bikeshare sample data, connecting to the cluster from Databricks over JDBC, analyzing the data with Spark and SQL, and writing the results back to TiDB Cloud.