Pyspark Machine Learning: Vectors and Common Operations
2022-08-01 04:28:00 【Sun_Sherry】
Spark version: 3.2.1
This post introduces the vector operations in pyspark.ml.linalg.
1. DenseVector (dense vectors)
1.1 Creation
A dense vector is similar to an ordinary array. It is created as follows:
from pyspark.ml import linalg
import numpy as np
dvect1=linalg.Vectors.dense([1,2,3,4,5])
dvect2=linalg.Vectors.dense(1.2,3,3,4,5)
print(dvect1)
print(dvect2)
Note that the elements are stored as floats, so the integers passed in are printed as 1.0, 2.0, and so on.
1.2 Common operations
- Element-wise addition, subtraction, multiplication, and division of two vectors of the same length:
res1=dvect1+dvect2
res2=dvect1-dvect2
res3=dvect1*dvect2
res4=dvect1/dvect2
print(res1)
print(res2)
print(res3)
print(res4)
Each result is a new DenseVector computed element by element.
- Some attributes of numpy.ndarray are available through the array attribute:
dvec1_shape=dvect1.array.shape
dvec1_size=dvect1.array.size
print(dvec1_shape)# result: (5,)
print(dvec1_size)# result: 5
- dot: the dot product
res_1=dvect1.dot([1,2,3,4,5])
res_2=dvect1.dot([0,1,0,0,0])
res_3=dvect1.dot(dvect2)
print(res_1) # result: 55.0
print(res_2) # result: 2.0
print(res_3) # result: 57.2
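- norm: computing the norm of a vector
dvect1=linalg.Vectors.dense([1,2,3,4,5])
norm_0=dvect1.norm(0)
norm_1=dvect1.norm(1)
norm_2=dvect1.norm(2)
print('L0 norm of dvect1: {}'.format(norm_0))
print('L1 norm of dvect1: {}'.format(norm_1))
print('L2 norm of dvect1: {:.3f}'.format(norm_2))
The three values are 5.0, 15.0 and 7.416. The L0 "norm" is the number of nonzero elements.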
- numNonzeros(): counts the nonzero elements
dvect1=linalg.Vectors.dense([1,0,3,0,5])
num_nonzero=dvect1.numNonzeros()
print(num_nonzero)# result: 3
- squared_distance(): the squared Euclidean distance between two vectors of the same dimension
dvect1=linalg.Vectors.dense([1,0,3])
dvect2=linalg.Vectors.dense([1,1,1])
dist=dvect1.squared_distance(dvect2) # value: 5.0
- Getting the values of a vector as a NumPy array:
dvect1=linalg.Vectors.dense([1,0,3])
print(dvect1.toArray())
print(dvect1.values)
2. SparseVector (sparse vectors)
2.1 Creation
There are several ways to create a sparse vector:
- Vectors.sparse(size, index array, array of the values at those indices), where indices are numbered from 0 (the same applies below);
- Vectors.sparse(size, {index: value, index: value, ...})
- Vectors.sparse(size, [(index, value), (index, value), ...])
For example:
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
2.2 Common operations
Most operations on sparse vectors are the same as those on dense vectors and are not repeated here. Only the following two are introduced:
- toArray(): returns all values of a sparse vector, zeros included
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
print(svect1.toArray())
print(svect2.toArray())
print(svect3.toArray())
The results are the full arrays [3.4, 4.5, 0.0], [3.4, 0.0, 4.5], and [0.0, 0.0, 3.0, 2.3].
- indices: returns the indices of the nonzero (stored) elements of a sparse vector
svect1=linalg.Vectors.sparse(3,[0,1],[3.4,4.5])
svect2=linalg.Vectors.sparse(3,{0:3.4,2:4.5})
svect3=linalg.Vectors.sparse(4,[(2,3),(3,2.3)])
print(svect1.indices) # returns [0 1] (a NumPy array; same below)
print(svect2.indices) # returns [0 2]
print(svect3.indices) # returns [2 3]