How to Use Databricks for Data Analysis on TiDB Cloud | TiDB Cloud User Guide
2022-07-28 13:00:00 【InfoQ】

Set up a TiDB Cloud Developer Tier cluster
- Register a TiDB Cloud account and log in.
- Under Create Cluster > Developer Tier, choose 1 year Free Trial.
- Set the cluster name and select a region for the cluster.
- Click Create. The TiDB Cloud cluster will be created in about 1 to 3 minutes.
- On the Overview panel, click Connect and create a traffic filter. For example, add the IP address 0.0.0.0/0 to allow access from any IP.
Import the sample data into TiDB Cloud
- In the cluster information pane, click Import. The Data Import Task page appears.
- Configure the import task as follows:
- Data Source Type: Amazon S3
- Bucket URL: s3://tidbcloud-samples/data-ingestion/
- Data Format: TiDB Dumpling
- Role-ARN: arn:aws:iam::385595570414:role/import-sample-access
- When configuring Target Database, enter the Username and Password of your TiDB cluster.
- Click Import to start importing the sample data. The whole process takes about 3 minutes.
- Return to the Overview panel and click Connect to get the MyCLI URL.
- Use the MyCLI client to check whether the sample data has been imported successfully:
$ mycli -u root -h tidb.xxxxxx.aws.tidbcloud.com -P 4000
(none)> SELECT COUNT(*) FROM bikeshare.trips;
+----------+
| COUNT(*) |
+----------+
|   816090 |
+----------+
1 row in set
Time: 0.786s
Connect to TiDB Cloud with Databricks
- In the Databricks workspace, create a Spark cluster and attach your notebook to it.

- Configure JDBC in a Databricks notebook. TiDB can use the default JDBC driver provided by Databricks, so there is no need to configure driver parameters:
%scala
val url = "jdbc:mysql://tidb.xxxx.prod.aws.tidbcloud.com:4000"
val table = "bikeshare.trips"
val user = "root"
val password = "xxxxxxxxxx"
- url: the JDBC URL used to connect to TiDB Cloud
- table: the table to read, in database.table format, for example bikeshare.trips
- user: the username used to connect to TiDB Cloud
- password: the user's password
- Check the connectivity to TiDB Cloud:
%scala
import java.sql.DriverManager
val connection = DriverManager.getConnection(url, user, password)
connection.isClosed()   // returns false when the connection is open
res2: Boolean = false
Analyze the data in Databricks
- Create a Spark DataFrame that loads the TiDB data. Here we reference the variables defined in the previous steps:
%scala
val remote_table = spark.read.format("jdbc")
.option("url", url)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.load()
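- Optionally, you can sanity-check the loaded DataFrame before going further. This is a small extra step that is not part of the original tutorial; the row count should match the figure returned by MyCLI earlier:
%scala
// Optional sanity check (not in the original tutorial):
// the schema should show the trips columns and the count should be 816090.
remote_table.printSchema()
println(remote_table.count())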
- Query the data. Databricks provides powerful charting features, and you can customize the chart type:
%scala
display(remote_table.select("*"))
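- As an illustration of the charting features (a hypothetical extra step, not from the original article), you can also display an aggregated result, which is easy to render as a bar or pie chart:
%scala
// Hypothetical example: ride counts per bike type, which charts nicely
display(remote_table.groupBy("rideable_type").count())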

- Create a DataFrame view or a DataFrame table. Here we create a temporary view named "trips" as an example:
%scala
remote_table.createOrReplaceTempView("trips")
- Query the data with SQL statements. The following statement counts the number of rides for each bike type:
%sql
SELECT rideable_type, COUNT(*) count FROM trips GROUP BY rideable_type ORDER BY count DESC
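- The write-back cell below reads a table named type_count, which this section never shows being created. As a minimal sketch (an assumption on our part, not a step from the original article), the grouped counts can be registered as a temporary view with that name:
%scala
// Assumption: materialize the per-type ride counts as a temp view named "type_count"
// so that the write-back cell below can read it with spark.table("type_count").
spark.sql("""
  SELECT rideable_type, COUNT(*) AS count
  FROM trips
  GROUP BY rideable_type
""").createOrReplaceTempView("type_count")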
- Write the analysis results back to TiDB Cloud:
%scala
import org.apache.spark.sql.SaveMode

// Rename the grouping column so the target table gets columns (type, count)
spark.table("type_count")
.withColumnRenamed("rideable_type", "type")
.write
.format("jdbc")
.option("url", url)
.option("dbtable", "bikeshare.type_count")
.option("user", user)
.option("password", password)
.option("isolationLevel", "NONE")
.mode(SaveMode.Append)
.save()
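- As a quick optional check (not part of the original steps), you can read the newly written table back over the same JDBC connection and display it:
%scala
// Optional: read bikeshare.type_count back to confirm the write succeeded
val type_count_check = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "bikeshare.type_count")
  .option("user", user)
  .option("password", password)
  .load()
display(type_count_check)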
Import the TiDB Cloud sample notebook into Databricks
- In the Databricks workspace, click Create > Import and paste the TiDB Cloud example URL to download the notebook into your Databricks workspace.
- Attach this notebook to your Spark cluster.
- Replace the JDBC configuration with your own TiDB Cloud cluster information.
- Follow the steps in the notebook to use TiDB Cloud with Databricks.
Summary
This guide showed how to set up a TiDB Cloud Developer Tier cluster, import the bikeshare sample data, connect to the cluster from Databricks over JDBC, and analyze the data with Spark and SQL before writing the results back to TiDB Cloud.