Pyspark Machine Learning: Vectors and Common Operations
2022-08-01 04:28:00 【Sun_Sherry】
Spark version: V3.2.1
This post introduces the vector types in pyspark.ml.linalg and their common operations.
1. DenseVector (dense vector)
1.1 Creation
A dense vector stores every element explicitly, much like an ordinary array. It can be created from a list or from individual numbers:
from pyspark.ml import linalg
import numpy as np
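dvect1=linalg.Vectors.dense([1,2,3,4,5])  # from a list
dvect2=linalg.Vectors.dense(1.2,3,3,4,5)  # from individual numbers
print(dvect1)  # output: [1.0,2.0,3.0,4.0,5.0]
print(dvect2)  # output: [1.2,3.0,3.0,4.0,5.0]
Note that the elements are stored as floats, even when integers are passed in.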
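Vectors.dense accepts either one sequence or several separate numbers, and converts every value to a float. The plain-Python sketch below roughly imitates that argument handling; dense_values is a hypothetical helper for illustration only, not part of PySpark, and it returns a list rather than a DenseVector.

```python
def dense_values(*elements):
    """Hypothetical helper imitating how Vectors.dense interprets its arguments."""
    # A single non-numeric argument is treated as the whole sequence of values;
    # otherwise each positional argument is one value.
    if len(elements) == 1 and not isinstance(elements[0], (int, float)):
        elements = elements[0]
    # Every value is stored as a float, which is why 1 is printed as 1.0.
    return [float(x) for x in elements]


print(dense_values([1, 2, 3, 4, 5]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(dense_values(1.2, 3, 3, 4, 5))  # [1.2, 3.0, 3.0, 4.0, 5.0]
```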
1.2 Common operations
- Addition, subtraction, multiplication and division of two vectors of the same length, applied element by element:
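res1=dvect1+dvect2
res2=dvect1-dvect2
res3=dvect1*dvect2
res4=dvect1/dvect2
print(res1)  # output: [2.2,5.0,6.0,8.0,10.0]
print(res2)  # output: [-0.19999999999999996,-1.0,0.0,0.0,0.0]
print(res3)  # output: [1.2,6.0,9.0,16.0,25.0]
print(res4)  # output: [0.8333333333333334,0.6666666666666666,1.0,1.0,1.0]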
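Each operator pairs the i-th entries of the two vectors and combines them. The plain-Python sketch below spells that out; elementwise is a hypothetical helper, and its explicit length check is this sketch's own simplification (DenseVector delegates the operators to numpy on its underlying array, which raises its own error for incompatible shapes).

```python
import operator


def elementwise(a, b, op):
    """Hypothetical helper: apply a binary operator to matching positions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [op(x, y) for x, y in zip(a, b)]


v1 = [1.0, 2.0, 3.0, 4.0, 5.0]
v2 = [1.2, 3.0, 3.0, 4.0, 5.0]
print(elementwise(v1, v2, operator.add))      # [2.2, 5.0, 6.0, 8.0, 10.0]
print(elementwise(v1, v2, operator.truediv))  # [0.8333333333333334, 0.6666666666666666, 1.0, 1.0, 1.0]
```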
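- The underlying numpy.ndarray is exposed as the array attribute, so its attributes such as shape and size can be used:
dvec1_shape=dvect1.array.shape
dvec1_size=dvect1.array.size
print(dvec1_shape)  # output: (5,)
print(dvec1_size)  # output: 5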
- dot(): dot product
res_1=dvect1.dot([1,2,3,4,5])
res_2=dvect1.dot([0,1,0,0,0])
res_3=dvect1.dot(dvect2)
print(res_1)  # output: 55.0
print(res_2)  # output: 2.0
print(res_3)  # about 57.2 (the last digits depend on floating-point rounding)
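The dot product multiplies matching entries and sums the products, i.e. the sum of a[i]*b[i]. A plain-Python version with a hypothetical dot helper gives the same values as the first two calls above:

```python
def dot(a, b):
    """Hypothetical helper: sum of the products of matching entries."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


v1 = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dot(v1, [1, 2, 3, 4, 5]))  # 55.0
print(dot(v1, [0, 1, 0, 0, 0]))  # 2.0
```

Multiplying by a vector with a single 1 (as in the second call) simply picks out one element.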
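- norm(p): the p-norm of a vector
dvect1=linalg.Vectors.dense([1,2,3,4,5])
norm_0=dvect1.norm(0)  # p=0 counts the non-zero elements (numpy convention)
norm_1=dvect1.norm(1)  # sum of absolute values
norm_2=dvect1.norm(2)  # Euclidean length
print('L0 norm of dvect1: {}'.format(norm_0))  # output: L0 norm of dvect1: 5.0
print('L1 norm of dvect1: {}'.format(norm_1))  # output: L1 norm of dvect1: 15.0
print('L2 norm of dvect1: {:.3f}'.format(norm_2))  # output: L2 norm of dvect1: 7.416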
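For p >= 1 the p-norm is (sum of |x_i|^p)^(1/p). DenseVector.norm passes p on to numpy.linalg.norm, where p = 0 by convention counts the non-zero entries (strictly speaking not a norm) and p = inf gives the largest absolute value. A plain-Python sketch of those conventions, with a hypothetical p_norm helper, applied to [1, 2, 3, 4, 5]:

```python
import math


def p_norm(values, p):
    """Hypothetical helper following numpy.linalg.norm's vector conventions."""
    if p == 0:
        # Not a true norm: the number of non-zero entries.
        return float(sum(1 for x in values if x != 0))
    if p == math.inf:
        # Infinity norm: the largest absolute value.
        return float(max(abs(x) for x in values))
    return sum(abs(x) ** p for x in values) ** (1.0 / p)


v = [1, 2, 3, 4, 5]
print(p_norm(v, 0))                   # 5.0
print(p_norm(v, 1))                   # 15.0
print('{:.3f}'.format(p_norm(v, 2)))  # 7.416
```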
- numNonzeros(): count the non-zero elements
dvect1=linalg.Vectors.dense([1,0,3,0,5])
num_nonzero=dvect1.numNonzeros()
print(num_nonzero)  # output: 3
- squared_distance(): the squared distance between two vectors of the same dimension
dvect1=linalg.Vectors.dense([1,0,3])
dvect2=linalg.Vectors.dense([1,1,1])
dist=dvect1.squared_distance(dvect2)
print(dist)  # output: 5.0
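The squared distance is the sum of the squared differences of matching entries; taking its square root gives the ordinary Euclidean distance. A plain-Python sketch with a hypothetical squared_distance helper:

```python
import math


def squared_distance(a, b):
    """Hypothetical helper: sum of (a[i] - b[i])**2 over matching positions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


d2 = squared_distance([1.0, 0.0, 3.0], [1.0, 1.0, 1.0])
print(d2)             # 5.0  (0 + 1 + 4)
print(math.sqrt(d2))  # Euclidean distance, about 2.236
```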
- toArray() and values: get the values of a vector as a numpy array
dvect1=linalg.Vectors.dense([1,0,3])
print(dvect1.toArray())  # output: [1. 0. 3.]
print(dvect1.values)  # output: [1. 0. 3.]
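2. SparseVector (sparse vector)
2.1 Creation
A sparse vector stores only its size plus the positions and values of its non-zero entries. It can be created in several ways:
- Vectors.sparse(size, indices, values): an array of indices and the array of values at those indices; indices start at 0 (the same applies below);
- Vectors.sparse(size, {index: value, index: value, …})
- Vectors.sparse(size, [(index, value), (index, value), …])
For example: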
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
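All three forms describe the same kind of object: the vector size plus parallel arrays of indices and values, with indices in strictly increasing order. The plain-Python sketch below imitates that normalization; normalize_sparse is a hypothetical helper, and the assumption that the dictionary and pair forms get sorted by index reflects how I understand SparseVector's constructor to behave, not a documented guarantee.

```python
def normalize_sparse(size, *args):
    """Hypothetical helper: turn the three input forms into (size, indices, values)."""
    if len(args) == 1:
        # Dictionary {index: value} or list of (index, value) pairs, sorted by index.
        pairs = args[0].items() if isinstance(args[0], dict) else args[0]
        pairs = sorted(pairs)
        indices = [int(i) for i, _ in pairs]
        values = [float(v) for _, v in pairs]
    else:
        # Separate sequences of indices and values.
        indices = [int(i) for i in args[0]]
        values = [float(v) for v in args[1]]
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
    # Indices must be strictly increasing and lie in [0, size).
    for prev, nxt in zip(indices, indices[1:]):
        if prev >= nxt:
            raise ValueError("indices must be strictly increasing")
    if indices and (indices[0] < 0 or indices[-1] >= size):
        raise ValueError("index out of range")
    return size, indices, values


print(normalize_sparse(3, [0, 1], [3.4, 4.5]))  # (3, [0, 1], [3.4, 4.5])
print(normalize_sparse(3, {0: 3.4, 2: 4.5}))    # (3, [0, 2], [3.4, 4.5])
print(normalize_sparse(4, [(2, 3), (3, 2.3)]))  # (4, [2, 3], [3.0, 2.3])
```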
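2.2 Common operations
Sparse vectors support many of the same methods as dense vectors, such as dot(), norm(), numNonzeros() and squared_distance(), so they are not repeated here (unlike DenseVector, however, SparseVector does not overload the +, -, * and / operators). Only the following two operations are introduced: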
- toArray(): show all values of a sparse vector, with the zeros filled in, as a numpy array
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
print(svect1.toArray())  # output: [3.4 4.5 0. ]
print(svect2.toArray())  # output: [3.4 0.  4.5]
print(svect3.toArray())  # output: [0.  0.  3.  2.3]
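Expanding a sparse vector amounts to starting from a list of zeros of the full length and writing each stored value at its index. A plain-Python sketch, with sparse_to_list as a hypothetical helper:

```python
def sparse_to_list(size, indices, values):
    """Hypothetical helper: expand (size, indices, values) into a full list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = float(v)
    return dense


print(sparse_to_list(3, [0, 1], [3.4, 4.5]))  # [3.4, 4.5, 0.0]
print(sparse_to_list(3, [0, 2], [3.4, 4.5]))  # [3.4, 0.0, 4.5]
print(sparse_to_list(4, [2, 3], [3, 2.3]))    # [0.0, 0.0, 3.0, 2.3]
```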
- indices: the indices of the non-zero (stored) elements of a sparse vector; note that indices is an attribute, not a method
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
print(svect1.indices)  # output: [0 1] (a numpy array; likewise below)
print(svect2.indices)  # output: [0 2]
print(svect3.indices)  # output: [2 3]
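Keeping only the stored entries is what makes sparse vectors cheap: a dot product with a dense vector only has to visit the stored indices, and counting non-zeros only has to look at the stored values. The plain-Python sketch below illustrates both ideas; sparse_dot and count_nonzeros are hypothetical helpers. In this representation an explicitly stored 0 still occupies an index but is not counted as a non-zero value.

```python
def sparse_dot(indices, values, dense):
    """Hypothetical helper: dot product of a sparse (indices, values) with a dense list."""
    # Only stored positions can contribute; every other entry is zero.
    return sum(v * dense[i] for i, v in zip(indices, values))


def count_nonzeros(values):
    """Hypothetical helper: count stored values that are actually non-zero."""
    return sum(1 for v in values if v != 0)


# Same data as svect2 above: size 3, indices [0, 2], values [3.4, 4.5].
print(sparse_dot([0, 2], [3.4, 4.5], [1.0, 1.0, 1.0]))  # about 7.9
# Indices [0, 1] with values [0.0, 2.0]: two stored entries, one non-zero.
print(count_nonzeros([0.0, 2.0]))  # 1
```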