Pyspark Machine Learning: Vectors and Common Operations
2022-08-01 04:28:00 【Sun_Sherry】
Spark version: V3.2.1
This article introduces the vector operations in pyspark.ml.linalg.
1. DenseVector (dense vector)
1.1 Creation
A dense vector is similar to an ordinary array. It can be created as follows:
from pyspark.ml import linalg
import numpy as np
dvect1=linalg.Vectors.dense([1,2,3,4,5])
dvect2=linalg.Vectors.dense(1.2,3,3,4,5)
print(dvect1)
print(dvect2)
The result is as follows (note that the elements are stored as floats):
1.2 Common operations
- Addition, subtraction, multiplication and division of two vectors of the same length. These are applied element-wise, as follows:
res1=dvect1+dvect2
res2=dvect1-dvect2
res3=dvect1*dvect2
res4=dvect1/dvect2
print(res1)
print(res2)
print(res3)
print(res4)
The result is as follows:
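As a plain-Python illustration (not Spark's implementation), the four operators act position by position on vectors of equal length. The helper name `elementwise` is made up for this sketch; the input values are those of dvect1 and dvect2 above.

```python
import operator

def elementwise(op, a, b):
    """Apply a binary operator position by position (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [op(x, y) for x, y in zip(a, b)]

v1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # same values as dvect1
v2 = [1.2, 3.0, 3.0, 4.0, 5.0]   # same values as dvect2

total = elementwise(operator.add, v1, v2)
diff = elementwise(operator.sub, v1, v2)
prod = elementwise(operator.mul, v1, v2)
quot = elementwise(operator.truediv, v1, v2)

print(total)
print(diff)
print(prod)
print(quot)
```

Only the first position differs between the two inputs, so only the first element of each result is non-trivial (up to floating-point rounding).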
- Some numpy.ndarray attributes are available through the vector's array attribute
dvec1_shape=dvect1.array.shape
dvec1_size=dvect1.array.size
print(dvec1_shape)# result: (5,)
print(dvec1_size)# result: 5
- dot: dot product
res_1=dvect1.dot([1,2,3,4,5])
res_2=dvect1.dot([0,1,0,0,0])
res_3=dvect1.dot(dvect2)
print(res_1) # result: 55.0
print(res_2) # result: 2.0
print(res_3) # result: 57.2
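The arithmetic behind these three results can be checked by hand. The following is a minimal plain-Python sketch of a dot product (the function `dot` here is a stand-in written for illustration, not the pyspark method):

```python
def dot(a, b):
    """Sum of pairwise products of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum(x * y for x, y in zip(a, b))

v1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # same values as dvect1
v2 = [1.2, 3.0, 3.0, 4.0, 5.0]   # same values as dvect2

print(dot(v1, [1, 2, 3, 4, 5]))   # 1 + 4 + 9 + 16 + 25
print(dot(v1, [0, 1, 0, 0, 0]))   # picks out the second element of v1
print(dot(v1, v2))                # 1.2 + 6 + 9 + 16 + 25
```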
- Compute the norm of a vector
dvect1=linalg.Vectors.dense([1,2,3,4,5])
norm_0=dvect1.norm(0)
norm_1=dvect1.norm(1)
norm_2=dvect1.norm(2)
print('L0 norm of dvect1: {}'.format(norm_0))
print('L1 norm of dvect1: {}'.format(norm_1))
print('L2 norm of dvect1: {:.3f}'.format(norm_2))
The result is as follows:
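For the vector [1, 2, 3, 4, 5] the three norms can be worked out directly. The sketch below assumes that norm(p) follows the usual NumPy vector-norm convention, in which p = 0 counts the non-zero entries; `p_norm` is a hypothetical helper, not part of pyspark.

```python
import math

def p_norm(values, p):
    """Vector p-norm for p > 0; p == 0 counts the non-zero entries."""
    if p == 0:
        return float(sum(1 for x in values if x != 0))
    return sum(abs(x) ** p for x in values) ** (1.0 / p)

v = [1.0, 2.0, 3.0, 4.0, 5.0]

l0 = p_norm(v, 0)   # number of non-zero elements
l1 = p_norm(v, 1)   # 1 + 2 + 3 + 4 + 5
l2 = p_norm(v, 2)   # sqrt(1 + 4 + 9 + 16 + 25) = sqrt(55)

print('L0: {}'.format(l0))
print('L1: {}'.format(l1))
print('L2: {:.3f}'.format(l2))
```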
- numNonzeros() counts the number of non-zero elements
dvect1=linalg.Vectors.dense([1,0,3,0,5])
num_nonzero=dvect1.numNonzeros()
print(num_nonzero)# result: 3
- squared_distance() computes the squared distance between two vectors of the same dimension
dvect1=linalg.Vectors.dense([1,0,3])
dvect2=linalg.Vectors.dense([1,1,1])
dist=dvect1.squared_distance(dvect2) # value: 5.0
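The squared distance is the sum of squared element-wise differences, so its square root is the ordinary Euclidean distance. A plain-Python sketch with the same two vectors (the function below is written for illustration only):

```python
import math

def squared_distance(a, b):
    """Sum of squared differences between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b))

d2 = squared_distance([1.0, 0.0, 3.0], [1.0, 1.0, 1.0])  # 0 + 1 + 4
d = math.sqrt(d2)                                         # Euclidean distance

print(d2)
print('{:.3f}'.format(d))
```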
- Get the values of a vector
dvect1=linalg.Vectors.dense([1,0,3])
print(dvect1.toArray())
print(dvect1.values)
2. SparseVector (sparse vector)
2.1 Creation
There are several ways to create a sparse vector:
- Vectors.sparse(vector length, index array, array of values corresponding to the index array), where indices start from 0 (the same applies below);
- Vectors.sparse(vector length, {index: value, index: value, ...})
- Vectors.sparse(vector length, [(index, value), (index, value), ...])
For example:
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
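All three call styles describe the same information: a length plus (index, value) pairs. The following plain-Python sketch normalizes each style to a (size, indices, values) triple. `sparse_parts` is a hypothetical helper written for illustration, and sorting the pairs in every case is a simplification of this sketch rather than a statement about Spark's internals.

```python
def sparse_parts(size, *args):
    """Normalize the three argument styles to (size, indices, values).

    Hypothetical helper for illustration only.
    """
    if len(args) == 2:                  # (index array, value array)
        pairs = list(zip(args[0], args[1]))
    elif isinstance(args[0], dict):     # {index: value, ...}
        pairs = list(args[0].items())
    else:                               # [(index, value), ...]
        pairs = list(args[0])
    pairs.sort()                        # keep indices in increasing order
    indices = [int(i) for i, _ in pairs]
    values = [float(v) for _, v in pairs]
    if indices and (indices[0] < 0 or indices[-1] >= size):
        raise ValueError("index out of range for size {}".format(size))
    return size, indices, values

p1 = sparse_parts(3, [0, 1], [3.4, 4.5])       # same arguments as svect1
p2 = sparse_parts(3, {0: 3.4, 2: 4.5})         # same arguments as svect2
p3 = sparse_parts(4, [(2, 3), (3, 2.3)])       # same arguments as svect3

print(p1)
print(p2)
print(p3)
```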
2.2 Common operations
Some operations on sparse vectors are the same as those on dense vectors and are not repeated here. Only the following two are introduced:
- toArray() returns all values of a sparse vector, including the zeros
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
print(svect1.toArray())
print(svect2.toArray())
print(svect3.toArray())
The result is as follows:
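Conceptually, converting a sparse vector to an array means starting from all zeros and writing each stored value at its index. A plain-Python sketch of that expansion, using the same sizes, indices and values as the three vectors above (`to_dense` is a hypothetical helper, not the pyspark method):

```python
def to_dense(size, indices, values):
    """Expand (size, indices, values) into a full list of floats."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = float(v)
    return dense

a1 = to_dense(3, [0, 1], [3.4, 4.5])   # same data as svect1
a2 = to_dense(3, [0, 2], [3.4, 4.5])   # same data as svect2
a3 = to_dense(4, [2, 3], [3, 2.3])     # same data as svect3

print(a1)
print(a2)
print(a3)
```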
- indices (an attribute, not a method) returns the indices of the non-zero elements stored in a sparse vector
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
print(svect1.indices) # returns [0 1] (array type, same below)
print(svect2.indices) # returns [0 2]
print(svect3.indices) # returns [2 3]