Transfer Learning - Joint Geometrical and Statistical Alignment for Visual Domain Adaptation
2022-08-05 01:38:00 【orokok】
Study notes on "Joint Geometrical and Statistical Alignment for Visual Domain Adaptation"
2017 CVPR
This paper proposes an unsupervised domain adaptation algorithm for cross-domain visual recognition. The authors propose a unified framework, called Joint Geometrical and Statistical Alignment (JGSA), that reduces the shift between domains both statistically and geometrically.
Specifically, two coupled projections are learned that map the source-domain and target-domain data into low-dimensional subspaces in which the geometrical shift and the distribution shift are reduced simultaneously. The objective function can be solved efficiently in closed form.
Extensive experiments show that the method outperforms several state-of-the-art domain adaptation methods on a synthetic dataset and on three different kinds of real-world cross-domain visual recognition tasks.
Domain adaptation is a realistic strategy for exploiting previously labeled source-domain data to improve a task in a new target domain.
Depending on the availability of labeled target data, domain adaptation is generally divided into semi-supervised and unsupervised domain adaptation.
However, both semi-supervised and unsupervised domain adaptation require sufficient unlabeled target-domain data. This paper focuses on unsupervised domain adaptation, which is considered more practical and more challenging.
The most common domain adaptation approaches are instance-based adaptation, feature-representation adaptation, and classifier-based adaptation.
In unsupervised domain adaptation, classifier-based adaptation is not feasible because the target domain has no labeled data.
Alternatively, the problem can be handled by minimizing the distribution divergence between domains together with the empirical source error.
It is usually assumed that the distribution divergence can be compensated either by instance-based adaptation methods or by feature-transformation-based methods.
Instance-based methods require strict assumptions, namely that
- the conditional distributions of the source and target domains are identical, and
- certain portions of the source-domain data can be reused for learning in the target domain by re-weighting.
Feature-transformation-based methods relax these assumptions and only assume that there exists a common space in which the distributions of the two domains are similar.
This paper follows the feature-transformation-based approach.
Two main categories of feature transformation methods are identified in the literature: data-centric methods and subspace-centric methods.
Data-centric methods: seek a unified transformation that projects the data of both domains into a domain-invariant space, reducing the distribution divergence between domains while preserving the data properties of the original spaces.
Subspace-centric methods: reduce the domain shift by manipulating the subspaces of the two domains, so that each domain's own subspace contributes to the final mapping and domain-specific features are exploited.
For example, Gong et al. [10] regard the two subspaces as two points on a Grassmann manifold and find points on the geodesic path between them that serve as a bridge between the source and target subspaces.
Fernando et al. [11] directly align the source and target subspaces with a linear transformation matrix. However, subspace-centric methods operate only on the subspaces of the two domains and do not explicitly consider the distribution shift between the projected data of the two domains.
JGSA learns two coupled projections that map the source and target data into their respective subspaces, such that
- the variance of the target-domain data is maximized, preserving the target data properties;
- the discriminative information of the source data is preserved, so that class information is transferred effectively;
- both the marginal and the conditional distribution divergences between the source and target domains are minimized, reducing the domain shift statistically;
- the divergence between the two projections is kept small, reducing the domain shift geometrically.
Hence, unlike data-centric methods, JGSA does not need the strong assumption that a unified transformation can reduce the distribution shift while preserving the data properties.
Unlike subspace-centric methods, it reduces not only the shift between the subspace geometries but also the distribution shift between the two domains.
In addition, the method can easily be extended to a kernelized version to handle situations in which the shift between domains is nonlinear, and the objective function can be solved efficiently in closed form.
Pan et al. [6] proposed Transfer Component Analysis (TCA), which uses the Maximum Mean Discrepancy (MMD) to learn transfer components across domains in an RKHS.
Joint Distribution Adaptation (JDA) [7] improves TCA by using pseudo labels of the target domain to consider not only the marginal distribution shift but also the conditional distribution shift.
Transfer Joint Matching (TJM) [8] improves TCA by jointly re-weighting instances and finding a common subspace.
Scatter Component Analysis (SCA) [9] additionally considers the between-class and within-class scatter of the source domain.
However, these methods require the strong assumption that a unified transformation exists that maps the source and target domains into a shared subspace with a small distribution shift.
2.2 Subspace-Centric Methods
As mentioned above, subspace-centric methods can address a problem of data-centric methods, namely that the latter exploit only the features shared by the two domains.
Fernando et al. [11] proposed a subspace-centric method called Subspace Alignment (SA).
The key idea of SA is to align the source basis vectors ($A$) with the target basis vectors ($B$) using a transformation matrix $M$.
However, after the source subspace is mapped with this linear map, the projected source data still have a different variance from the target data because of the domain shift.
In this case, the subspaces aligned by SA cannot minimize the distribution mismatch between domains.
In addition, SA cannot handle situations in which the shift between the two subspaces is nonlinear.
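For comparison with JGSA, here is a minimal NumPy sketch of the SA idea: a PCA basis is computed per domain, and the alignment matrix is the closed-form minimizer of $\|AM-B\|_F^2$, which is $M=A^TB$ when $A$ has orthonormal columns. The helper name `pca_basis`, the toy data, and $k=3$ are illustrative assumptions, and preprocessing steps of the original SA pipeline (such as data normalization) are omitted.

```python
import numpy as np


def pca_basis(X, k):
    """Top-k principal directions (D x k, orthonormal columns) of X (D x n, samples as columns)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k]


rng = np.random.default_rng(2)
X_src = rng.normal(size=(6, 50))
X_tgt = rng.normal(size=(6, 60)) * np.arange(1, 7)[:, None]  # hypothetical target with different spread
k = 3
A = pca_basis(X_src, k)        # source basis
B = pca_basis(X_tgt, k)        # target basis
M = A.T @ B                    # closed-form minimizer of ||A M - B||_F (A has orthonormal columns)
A_aligned = A @ M              # source basis aligned towards the target basis
Z_src = X_src.T @ A_aligned    # source samples in the aligned source subspace (n_s x k)
Z_tgt = X_tgt.T @ B            # target samples in the target subspace (n_t x k)
```

Because $M$ only rotates and rescales the basis, the variances of the projected source data are not matched to the target's, which is the weakness discussed above.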
Subspace Distribution Alignment (SDA) [14] improves SA by additionally considering the variances of the orthogonal principal components. However, these variances are computed on the already aligned subspaces, so only the magnitude along each eigen-direction is changed, which may still fail when the domain shift is large.
This is confirmed by the synthetic-data illustration in Figure 2 and by the experimental results on real-world datasets.
3. Joint Geometrical and Statistical Alignment
We begin with the definitions of terms. The source-domain data are denoted $X_s\in\mathbb{R}^{D\times n_s}$, drawn from the distribution $P_s(X_s)$, and the target-domain data are denoted $X_t\in\mathbb{R}^{D\times n_t}$, drawn from the distribution $P_t(X_t)$, where $D$ is the dimension of a data instance and $n_s$ and $n_t$ are the numbers of samples in the source and target domains, respectively.
In the training stage of unsupervised domain adaptation, there are sufficient labeled source-domain data $D_s=\{(x_i,y_i)\}_{i=1}^{n_s}$, $x_i\in\mathbb{R}^D$, and sufficient unlabeled target-domain data $D_t=\{x_j\}_{j=1}^{n_t}$, $x_j\in\mathbb{R}^D$.
We assume that the feature space and the label space are the same across domains: $\mathcal{X}_s=\mathcal{X}_t$ and $\mathcal{Y}_s=\mathcal{Y}_t$. Due to dataset shift, $P_s(X_s)\ne P_t(X_t)$.
Unlike previous domain adaptation methods, we do not assume that there exists a unified transformation $\phi(\cdot)$ such that $P_s(\phi(X_s))=P_t(\phi(X_t))$ and $P_s(Y_s\mid\phi(X_s))=P_t(Y_t\mid\phi(X_t))$, because this assumption becomes invalid when the dataset shift is large.
To overcome the limitations of both data-centric and subspace-centric methods, the proposed framework (JGSA) reduces the domain divergence both statistically and geometrically by exploiting both the features shared by the two domains and the domain-specific features.
JGSA is formulated as finding two coupled projections ($A$ for the source domain and $B$ for the target domain) that give each domain a new representation, such that
- the variance of the target domain is maximized,
- the discriminative information of the source domain is preserved,
- the divergence between the source and target distributions is small, and
- the divergence between the source and target subspaces is small.
3.2.1 Target Variance Maximization
To avoid projecting features onto irrelevant dimensions, we encourage the variance of the target domain to be maximized in its subspace. Variance maximization can therefore be written as
$$\max_B \operatorname{Tr}\left(B^TS_tB\right) \tag{1}$$
where
$$S_t=X_tH_tX_t^T \tag{2}$$
is the target-domain scatter matrix, $H_t=I_t-\frac{1}{n_t}\mathbf{1}_t\mathbf{1}_t^T$ is the centering matrix, $I_t\in\mathbb{R}^{n_t\times n_t}$ is the identity matrix, and $\mathbf{1}_t\in\mathbb{R}^{n_t}$ is the column vector of all ones.
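As a quick numerical check of Eqs. (1)-(2), the sketch below builds $H_t$ explicitly and confirms that $S_t$ is the unnormalized covariance of $X_t$. It also illustrates that, if $B$ were constrained to have orthonormal columns (an assumption made only for this illustration; JGSA combines (1) with the other terms), (1) would be maximized by the top-$k$ eigenvectors of $S_t$, i.e. PCA on the target data. The toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_t, k = 5, 40, 2
# Hypothetical target data (D x n_t, samples as columns) with unequal spread per dimension
X_t = rng.normal(size=(D, n_t)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])[:, None]

H_t = np.eye(n_t) - np.ones((n_t, n_t)) / n_t   # centering matrix H_t
S_t = X_t @ H_t @ X_t.T                         # target scatter matrix, Eq. (2)

B = np.linalg.qr(rng.normal(size=(D, k)))[0]    # an arbitrary projection with orthonormal columns
target_variance = np.trace(B.T @ S_t @ B)       # objective of Eq. (1)

# With B^T B = I, the maximizer of Eq. (1) is given by the top-k eigenvectors of S_t
eigvals, eigvecs = np.linalg.eigh(S_t)          # eigenvalues in ascending order
B_pca = eigvecs[:, -k:]
best_variance = np.trace(B_pca.T @ S_t @ B_pca)
```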
3.2.2 Source Discriminative Information Preservation
Since the source-domain labels are available, the label information can be used to constrain the new representation of the source-domain data to be discriminative:
$$\max_A \operatorname{Tr}\left(A^TS_bA\right) \tag{3}$$
$$\min_A \operatorname{Tr}\left(A^TS_wA\right) \tag{4}$$
where $S_w$ is the within-class scatter matrix and $S_b$ is the between-class scatter matrix of the source-domain data, defined as follows:
$$S_w=\sum_{c=1}^{C}X_s^{(c)}H_s^{(c)}\left(X_s^{(c)}\right)^T \tag{5}$$
$$S_b=\sum_{c=1}^{C}n_s^{(c)}\left(m_s^{(c)}-\bar m_s\right)\left(m_s^{(c)}-\bar m_s\right)^T \tag{6}$$
Here $X_s^{(c)}\in\mathbb{R}^{D\times n_s^{(c)}}$ is the set of source samples belonging to class $c$, $m_s^{(c)}=\frac{1}{n_s^{(c)}}\sum_{i=1}^{n_s^{(c)}}x_i^{(c)}$ is the mean of class $c$, $\bar m_s=\frac{1}{n_s}\sum_{i=1}^{n_s}x_i$ is the overall source mean, $H_s^{(c)}=I_s^{(c)}-\frac{1}{n_s^{(c)}}\mathbf{1}_s^{(c)}\left(\mathbf{1}_s^{(c)}\right)^T$ is the centering matrix of class $c$, $I_s^{(c)}\in\mathbb{R}^{n_s^{(c)}\times n_s^{(c)}}$ is the identity matrix, $\mathbf{1}_s^{(c)}\in\mathbb{R}^{n_s^{(c)}}$ is the column vector of all ones, and $n_s^{(c)}$ is the number of source samples in class $c$.
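To make Eqs. (5)-(6) concrete, here is a small sketch that computes $S_w$ and $S_b$ with samples as columns, as in the text. It also checks the standard identity that the within-class and between-class scatter add up to the total scatter $X_sH_sX_s^T$, which is a useful sanity test for any implementation. The function name `source_scatters` and the toy data are hypothetical.

```python
import numpy as np


def source_scatters(X_s, y_s):
    """Within-class S_w (Eq. 5) and between-class S_b (Eq. 6); X_s is D x n_s."""
    D = X_s.shape[0]
    m_bar = X_s.mean(axis=1, keepdims=True)                 # overall source mean
    S_w, S_b = np.zeros((D, D)), np.zeros((D, D))
    for c in np.unique(y_s):
        X_c = X_s[:, y_s == c]                              # X_s^{(c)}
        n_c = X_c.shape[1]
        H_c = np.eye(n_c) - np.ones((n_c, n_c)) / n_c       # H_s^{(c)}
        S_w += X_c @ H_c @ X_c.T
        diff = X_c.mean(axis=1, keepdims=True) - m_bar      # m_s^{(c)} - m_bar_s
        S_b += n_c * (diff @ diff.T)
    return S_w, S_b


rng = np.random.default_rng(1)
y_s = np.repeat([0, 1, 2], 10)
X_s = rng.normal(size=(4, 30)) + y_s          # hypothetical data with class-dependent offsets
S_w, S_b = source_scatters(X_s, y_s)

n_s = X_s.shape[1]
H_s = np.eye(n_s) - np.ones((n_s, n_s)) / n_s
S_total = X_s @ H_s @ X_s.T                   # total scatter of the source data
```

Since $S_b$ is built from $C$ mean differences whose weighted sum is zero, its rank is at most $C-1$.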
3.2.3 Distribution Divergence Minimization
We use the MMD criterion to compare the distributions of the two domains; it computes the distance between the sample means of the source and target data in the $k$-dimensional embeddings:
$$\min_{A,B}\left\|\frac{1}{n_s}\sum_{x_i\in X_s}A^Tx_i-\frac{1}{n_t}\sum_{x_j\in X_t}B^Tx_j\right\|_F^2 \tag{7}$$
Long et al. [7] proposed using a source-domain classifier to predict pseudo labels for the target domain, which then represent the class-conditional data distributions of the target domain, and iteratively refining these pseudo labels to further reduce the difference between the conditional distributions of the two domains.
We follow this idea and minimize the conditional distribution shift between domains:
$$\min_{A,B}\sum_{c=1}^{C}\left\|\frac{1}{n_s^{(c)}}\sum_{x_i\in X_s^{(c)}}A^Tx_i-\frac{1}{n_t^{(c)}}\sum_{x_j\in X_t^{(c)}}B^Tx_j\right\|_F^2 \tag{8}$$
where $X_t^{(c)}$ is the set of target samples whose pseudo label is $c$ and $n_t^{(c)}$ is its size.
Combining the marginal and conditional distribution shift terms, the final distribution divergence minimization term can be rewritten as
$$\min_{A,B}\operatorname{Tr}\left(\begin{bmatrix}A^T & B^T\end{bmatrix}\begin{bmatrix}M_s & M_{st}\\ M_{ts} & M_t\end{bmatrix}\begin{bmatrix}A\\ B\end{bmatrix}\right) \tag{9}$$
where
$$M_s=X_s\left(L_s+\sum_{c=1}^{C}L_s^{(c)}\right)X_s^T,\quad L_s=\frac{1}{n_s^2}\mathbf{1}_s\mathbf{1}_s^T,\quad \left(L_s^{(c)}\right)_{ij}=\begin{cases}\frac{1}{\left(n_s^{(c)}\right)^2}, & x_i,x_j\in X_s^{(c)}\\ 0, & \text{otherwise}\end{cases} \tag{10}$$
$$M_t=X_t\left(L_t+\sum_{c=1}^{C}L_t^{(c)}\right)X_t^T,\quad L_t=\frac{1}{n_t^2}\mathbf{1}_t\mathbf{1}_t^T,\quad \left(L_t^{(c)}\right)_{ij}=\begin{cases}\frac{1}{\left(n_t^{(c)}\right)^2}, & x_i,x_j\in X_t^{(c)}\\ 0, & \text{otherwise}\end{cases} \tag{11}$$
$$M_{st}=X_s\left(L_{st}+\sum_{c=1}^{C}L_{st}^{(c)}\right)X_t^T,\quad L_{st}=-\frac{1}{n_sn_t}\mathbf{1}_s\mathbf{1}_t^T,\quad \left(L_{st}^{(c)}\right)_{ij}=\begin{cases}-\frac{1}{n_s^{(c)}n_t^{(c)}}, & x_i\in X_s^{(c)},\ x_j\in X_t^{(c)}\\ 0, & \text{otherwise}\end{cases} \tag{12}$$
$$M_{ts}=X_t\left(L_{ts}+\sum_{c=1}^{C}L_{ts}^{(c)}\right)X_s^T,\quad L_{ts}=-\frac{1}{n_sn_t}\mathbf{1}_t\mathbf{1}_s^T,\quad \left(L_{ts}^{(c)}\right)_{ij}=\begin{cases}-\frac{1}{n_s^{(c)}n_t^{(c)}}, & x_j\in X_s^{(c)},\ x_i\in X_t^{(c)}\\ 0, & \text{otherwise}\end{cases} \tag{13}$$
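The block form (9)-(13) can be checked numerically. The sketch below builds the $L$ matrices from class indicator vectors, assembles $M_s$, $M_t$, $M_{st}$ and $M_{ts}$ (note $M_{ts}=M_{st}^T$ because $L_{ts}=L_{st}^T$), and compares the trace in (9) with the sum of (7) and (8) computed directly from projected means. The pseudo labels `yhat_t`, the toy data, and the rule of skipping a class that has no samples in one domain (which would otherwise divide by zero) are assumptions of this sketch.

```python
import numpy as np


def mmd_matrices(X_s, y_s, X_t, yhat_t, C):
    """M_s, M_t, M_st, M_ts of Eqs. (10)-(13); yhat_t are target pseudo labels."""
    n_s, n_t = X_s.shape[1], X_t.shape[1]
    L_s = np.full((n_s, n_s), 1.0 / n_s**2)
    L_t = np.full((n_t, n_t), 1.0 / n_t**2)
    L_st = np.full((n_s, n_t), -1.0 / (n_s * n_t))
    for c in range(C):
        e_s = (y_s == c).astype(float)       # indicator vector of X_s^{(c)}
        e_t = (yhat_t == c).astype(float)    # indicator vector of X_t^{(c)}
        ns_c, nt_c = e_s.sum(), e_t.sum()
        if ns_c == 0 or nt_c == 0:           # assumption: skip classes missing in either domain
            continue
        L_s += np.outer(e_s, e_s) / ns_c**2
        L_t += np.outer(e_t, e_t) / nt_c**2
        L_st -= np.outer(e_s, e_t) / (ns_c * nt_c)
    M_st = X_s @ L_st @ X_t.T
    return X_s @ L_s @ X_s.T, X_t @ L_t @ X_t.T, M_st, M_st.T


rng = np.random.default_rng(3)
D, k, C = 4, 2, 3
X_s = rng.normal(size=(D, 30)); y_s = np.arange(30) % C
X_t = rng.normal(size=(D, 24)) + 0.5; yhat_t = np.arange(24) % C
A = rng.normal(size=(D, k)); B = rng.normal(size=(D, k))

M_s, M_t, M_st, M_ts = mmd_matrices(X_s, y_s, X_t, yhat_t, C)
W = np.vstack([A, B])
M = np.block([[M_s, M_st], [M_ts, M_t]])
mmd_trace = np.trace(W.T @ M @ W)                      # Eq. (9)


def mean_gap(Xs, Xt):
    """Squared distance between projected sample means, as in Eqs. (7)-(8)."""
    return np.sum((A.T @ Xs.mean(axis=1) - B.T @ Xt.mean(axis=1)) ** 2)


mmd_direct = mean_gap(X_s, X_t) + sum(                 # Eq. (7) + Eq. (8)
    mean_gap(X_s[:, y_s == c], X_t[:, yhat_t == c]) for c in range(C))
```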
Note that this differs from TCA and JDA: no unified subspace is used, because there may not exist a common subspace in which the distributions of the two domains are similar.
3.2.4 Subspace Divergence Minimization
Similar to SA [11], we also move the source and target subspaces closer to each other to help reduce the difference between domains.
As mentioned above, SA needs an additional transformation matrix $M$ to map the source subspace onto the target subspace.
JGSA, however, does not learn an additional matrix to map the two subspaces.
Instead, $A$ and $B$ are optimized simultaneously, so that the source class information and the target variance are preserved while the two subspaces are moved closer together.
The following term moves the two subspaces together:
$$\min_{A,B}\|A-B\|_F^2 \tag{14}$$
By combining term (14) with term (9), both shared and domain-specific features are exploited, so that the two domains are well aligned both geometrically and statistically.
Combining all of the above, the overall objective is
$$\max\ \frac{\mu\{\text{Target Var.}\}+\beta\{\text{Between Class Var.}\}}{\{\text{Distribution shift}\}+\lambda\{\text{Subspace shift}\}+\beta\{\text{Within Class Var.}\}}$$
where $\lambda$, $\mu$, and $\beta$ are trade-off parameters that balance the importance of each quantity, and Var. denotes variance.
Following [9], we further impose the constraint that $\operatorname{Tr}(B^TB)$ is small in order to control the scale of $B$. Specifically, the two coupled projections $A$ and $B$ are found by solving the following optimization problem:
$$\max_{A,B}\frac{\operatorname{Tr}\left(\begin{bmatrix}A^T & B^T\end{bmatrix}\begin{bmatrix}\beta S_b & \mathbf{0}\\ \mathbf{0} & \mu S_t\end{bmatrix}\begin{bmatrix}A\\ B\end{bmatrix}\right)}{\operatorname{Tr}\left(\begin{bmatrix}A^T & B^T\end{bmatrix}\begin{bmatrix}M_s+\lambda I+\beta S_w & M_{st}-\lambda I\\ M_{ts}-\lambda I & M_t+(\lambda+\mu)I\end{bmatrix}\begin{bmatrix}A\\ B\end{bmatrix}\right)} \tag{15}$$
where $I\in\mathbb{R}^{D\times D}$ is the identity matrix.
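The $\lambda$ and $\mu$ entries of the denominator in (15) are term (14) and the extra $\operatorname{Tr}(B^TB)$ penalty written in block form: with $W=\begin{bmatrix}A\\ B\end{bmatrix}$, $\lambda\|A-B\|_F^2+\mu\operatorname{Tr}(B^TB)=\operatorname{Tr}\left(W^T\begin{bmatrix}\lambda I & -\lambda I\\ -\lambda I & (\lambda+\mu)I\end{bmatrix}W\right)$. A short numerical check with random (hypothetical) $A$ and $B$:

```python
import numpy as np

rng = np.random.default_rng(4)
D, k, lam, mu = 5, 3, 1.0, 1.0
A = rng.normal(size=(D, k))
B = rng.normal(size=(D, k))
W = np.vstack([A, B])                 # W^T = [A^T B^T]
I = np.eye(D)

# Regularization part of the denominator block matrix in Eq. (15)
R = np.block([[lam * I, -lam * I], [-lam * I, (lam + mu) * I]])
via_blocks = np.trace(W.T @ R @ W)
via_terms = lam * np.linalg.norm(A - B, "fro") ** 2 + mu * np.trace(B.T @ B)
```

Expanding the block product gives $\lambda\operatorname{Tr}(A^TA)-2\lambda\operatorname{Tr}(A^TB)+(\lambda+\mu)\operatorname{Tr}(B^TB)$, which is exactly the right-hand side.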
Minimizing the denominator of (15) encourages small marginal and conditional distribution shifts and a small within-class variance in the source domain.
Maximizing the numerator of (15) encourages a large target-domain variance and a large between-class variance in the source domain.
Similar to JDA, the pseudo labels of the target-domain data are updated iteratively using the learned transformations to improve label quality, until convergence.
To optimize (15), we rewrite $\begin{bmatrix}A^T & B^T\end{bmatrix}$ as $W^T$.
The objective function can then be rewritten as:
$$\max_W\frac{\operatorname{Tr}\left(W^T\begin{bmatrix}\beta S_b & \mathbf{0}\\ \mathbf{0} & \mu S_t\end{bmatrix}W\right)}{\operatorname{Tr}\left(W^T\begin{bmatrix}M_s+\lambda I+\beta S_w & M_{st}-\lambda I\\ M_{ts}-\lambda I & M_t+(\lambda+\mu)I\end{bmatrix}W\right)} \tag{16}$$
Note that the objective function is invariant to rescaling of $W$. Therefore, objective (16) can be rewritten in the constrained form
$$\begin{aligned}\max_W\ &\operatorname{Tr}\left(W^T\begin{bmatrix}\beta S_b & \mathbf{0}\\ \mathbf{0} & \mu S_t\end{bmatrix}W\right)\\ \text{s.t.}\ &W^T\begin{bmatrix}M_s+\lambda I+\beta S_w & M_{st}-\lambda I\\ M_{ts}-\lambda I & M_t+(\lambda+\mu)I\end{bmatrix}W=I\end{aligned} \tag{17}$$
The corresponding Lagrangian, with a symmetric matrix $\Phi$ of Lagrange multipliers, is
$$\begin{aligned}L={}&\operatorname{Tr}\left(W^T\begin{bmatrix}\beta S_b & \mathbf{0}\\ \mathbf{0} & \mu S_t\end{bmatrix}W\right)\\ &-\operatorname{Tr}\left(\left(W^T\begin{bmatrix}M_s+\lambda I+\beta S_w & M_{st}-\lambda I\\ M_{ts}-\lambda I & M_t+(\lambda+\mu)I\end{bmatrix}W-I\right)\Phi\right)\end{aligned} \tag{18}$$
Setting $\frac{\partial L}{\partial W}=0$, we obtain:
$$\begin{bmatrix}\beta S_b & \mathbf{0}\\ \mathbf{0} & \mu S_t\end{bmatrix}W=\begin{bmatrix}M_s+\lambda I+\beta S_w & M_{st}-\lambda I\\ M_{ts}-\lambda I & M_t+(\lambda+\mu)I\end{bmatrix}W\Phi \tag{19}$$
where $\Phi=\operatorname{diag}(\lambda_1,\ldots,\lambda_k)$ contains the $k$ leading eigenvalues and $W=[W_1,\ldots,W_k]$ contains the corresponding eigenvectors; both are obtained in closed form by generalized eigenvalue decomposition.
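The denominator matrix in (19) is symmetric positive definite whenever $\lambda,\mu>0$: the MMD and $\beta S_w$ parts are positive semidefinite, and $\lambda\|a-b\|^2+\mu\|b\|^2>0$ for any nonzero $[a;b]$. The generalized problem can therefore be reduced to a standard symmetric eigenproblem through a Cholesky factorization. The sketch below does this with NumPy only; random stand-in matrices replace the actual JGSA blocks, and `top_k_generalized_eig` is a hypothetical helper name. SciPy's `scipy.linalg.eigh(P, Q)` solves the same problem directly.

```python
import numpy as np


def top_k_generalized_eig(P, Q, k):
    """Solve P w = phi Q w for symmetric P and symmetric positive-definite Q.

    Reduces to a standard symmetric problem via the Cholesky factor Q = L L^T and
    returns the k largest eigenvalues with eigenvectors W satisfying W^T Q W = I.
    """
    L_inv = np.linalg.inv(np.linalg.cholesky(Q))
    vals, V = np.linalg.eigh(L_inv @ P @ L_inv.T)   # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]                # indices of the k leading eigenvalues
    W = L_inv.T @ V[:, idx]                         # back-transform: w = L^{-T} v
    return vals[idx], W


rng = np.random.default_rng(5)
n = 6
G = rng.normal(size=(n, n))
P = G @ G.T                                  # stand-in for the numerator block matrix (PSD)
H = rng.normal(size=(n, n))
Q = H @ H.T + n * np.eye(n)                  # stand-in for the denominator block matrix (PD)
eigvals, W = top_k_generalized_eig(P, Q, k=2)
Phi = np.diag(eigvals)
```

The returned $W$ satisfies both (19) and the constraint in (17), and the trace ratio in (16) is unchanged if $W$ is rescaled.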
Once the transformation matrix $W$ is obtained, the subspaces follow directly: $A$ consists of the first $D$ rows of $W$ and $B$ of the remaining $D$ rows.
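Putting the pieces together, the following is a compact sketch of primal (linear) JGSA with the pseudo-label refinement described above. Assumptions not fixed by the text: labels are integers $0,\ldots,C-1$ with every class present in the source, initial target pseudo labels come from 1-NN in the original feature space, classes with no predicted target samples are skipped in the conditional MMD, $T\ge 1$, and the loop simply runs $T$ iterations without a separate convergence test. The toy data at the end are hypothetical.

```python
import numpy as np


def scatter(X):
    """X H X^T with H the centering matrix; X is D x n (samples as columns)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc @ Xc.T


def one_nn(Z_train, y_train, Z_test):
    """1-NN labels with Euclidean distance; rows of Z_* are samples."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[np.argmin(d2, axis=1)]


def jgsa(X_s, y_s, X_t, k=2, T=10, lam=1.0, mu=1.0, beta=0.1):
    """Sketch of primal JGSA. X_s: D x n_s, X_t: D x n_t, y_s in {0, ..., C-1}, T >= 1."""
    D, n_s = X_s.shape
    n_t = X_t.shape[1]
    C = int(y_s.max()) + 1
    I, Z = np.eye(D), np.zeros((D, D))
    # Target scatter (Eq. 2) and source within/between-class scatter (Eqs. 5-6)
    S_t = scatter(X_t)
    m_bar = X_s.mean(axis=1, keepdims=True)
    S_w, S_b = np.zeros((D, D)), np.zeros((D, D))
    for c in range(C):
        X_c = X_s[:, y_s == c]
        S_w += scatter(X_c)
        d = X_c.mean(axis=1, keepdims=True) - m_bar
        S_b += X_c.shape[1] * (d @ d.T)
    P = np.block([[beta * S_b, Z], [Z, mu * S_t]])      # numerator of Eq. (16)
    yhat_t = one_nn(X_s.T, y_s, X_t.T)                  # assumed initial pseudo labels
    for _ in range(T):
        # Marginal + conditional MMD matrices, Eqs. (10)-(13)
        L_s = np.full((n_s, n_s), 1.0 / n_s**2)
        L_t = np.full((n_t, n_t), 1.0 / n_t**2)
        L_st = np.full((n_s, n_t), -1.0 / (n_s * n_t))
        for c in range(C):
            e_s = (y_s == c).astype(float)
            e_t = (yhat_t == c).astype(float)
            if e_t.sum() == 0:           # assumption: skip classes with no pseudo-labelled targets
                continue
            L_s += np.outer(e_s, e_s) / e_s.sum() ** 2
            L_t += np.outer(e_t, e_t) / e_t.sum() ** 2
            L_st -= np.outer(e_s, e_t) / (e_s.sum() * e_t.sum())
        M_s, M_t = X_s @ L_s @ X_s.T, X_t @ L_t @ X_t.T
        M_st = X_s @ L_st @ X_t.T        # M_ts = M_st^T because L_ts = L_st^T
        Q = np.block([[M_s + lam * I + beta * S_w, M_st - lam * I],
                      [M_st.T - lam * I, M_t + (lam + mu) * I]])   # denominator of Eq. (16)
        # Generalized eigenproblem P W = Q W Phi (Eq. 19) via Cholesky reduction Q = L L^T
        L_inv = np.linalg.inv(np.linalg.cholesky(Q))
        vals, V = np.linalg.eigh(L_inv @ P @ L_inv.T)
        W = L_inv.T @ V[:, np.argsort(vals)[::-1][:k]]   # k leading eigenvectors
        A, B = W[:D], W[D:]                              # W^T = [A^T B^T]
        yhat_t = one_nn(X_s.T @ A, y_s, X_t.T @ B)       # refine target pseudo labels
    return A, B, yhat_t


# Hypothetical toy data: three 3-D classes, target shifted by +1 in every coordinate
rng = np.random.default_rng(6)
means = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
y_s = np.repeat(np.arange(3), 30)
y_t = np.repeat(np.arange(3), 30)
X_s = (means[y_s] + rng.normal(scale=0.5, size=(90, 3))).T
X_t = (means[y_t] + 1.0 + rng.normal(scale=0.5, size=(90, 3))).T
A, B, yhat_t = jgsa(X_s, y_s, X_t, k=2, T=5)
acc = float(np.mean(yhat_t == y_t))
```

The block matrices here are $2D\times 2D$, which matches the later remark that JGSA decomposes a matrix twice the size of JDA's.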
3.4 Kernelization Analysis
Using a kernel function $\phi$, JGSA can be generalized to nonlinear problems in a Reproducing Kernel Hilbert Space (RKHS).
The method is kernelized using the Representer Theorem, $P=\Phi(X)A$ and $Q=\Phi(X)B$, where $X=[X_s,X_t]$ denotes all source and target training samples, $\Phi(X)=[\phi(x_1),\ldots,\phi(x_n)]$, and $n$ is the total number of samples. The objective becomes
$$\max_{P,Q}\frac{\operatorname{Tr}\left(\begin{bmatrix}P^T & Q^T\end{bmatrix}\begin{bmatrix}\beta S_b & \mathbf{0}\\ \mathbf{0} & \mu S_t\end{bmatrix}\begin{bmatrix}P\\ Q\end{bmatrix}\right)}{\operatorname{Tr}\left(\begin{bmatrix}P^T & Q^T\end{bmatrix}\begin{bmatrix}M_s+\lambda I+\beta S_w & M_{st}-\lambda I\\ M_{ts}-\lambda I & M_t+(\lambda+\mu)I\end{bmatrix}\begin{bmatrix}P\\ Q\end{bmatrix}\right)} \tag{20}$$
In the kernel version, every $X_t$ is replaced by $\Phi(X_t)$ and every $X_s$ by $\Phi(X_s)$, including in $S_t$, $S_w$, $S_b$, $M_s$, $M_t$, $M_{st}$, and $M_{ts}$.
Substituting $\Phi(X)A$ and $\Phi(X)B$ for $P$ and $Q$ gives the following objective:
$$\max_{A,B}\frac{\operatorname{Tr}\left(\begin{bmatrix}A^T & B^T\end{bmatrix}\begin{bmatrix}\beta S_b & \mathbf{0}\\ \mathbf{0} & \mu S_t\end{bmatrix}\begin{bmatrix}A\\ B\end{bmatrix}\right)}{\operatorname{Tr}\left(\begin{bmatrix}A^T & B^T\end{bmatrix}\begin{bmatrix}M_s+\lambda K+\beta S_w & M_{st}-\lambda K\\ M_{ts}-\lambda K & M_t+(\lambda+\mu)K\end{bmatrix}\begin{bmatrix}A\\ B\end{bmatrix}\right)} \tag{21}$$
where $S_t=\tilde K_t\tilde K_t^T$ and $S_w=\sum_{c=1}^{C}K_s^{(c)}H_s^{(c)}\left(K_s^{(c)}\right)^T$ with $K_s^{(c)}=\Phi(X)^T\Phi(X_s^{(c)})$; $K=\Phi(X)^T\Phi(X)$, $K_s=\Phi(X)^T\Phi(X_s)$, $K_t=\Phi(X)^T\Phi(X_t)$, and $\tilde K_t=K_t-\mathbf{1}_tK-K_t\mathbf{1}_n+\mathbf{1}_tK\mathbf{1}_n$, where $\mathbf{1}_t\in\mathbb{R}^{n_t\times n}$ and $\mathbf{1}_n\in\mathbb{R}^{n\times n}$ are matrices whose entries all equal $\frac{1}{n}$.
For $S_b$, Eq. (6) is used with the kernelized means $m_s^{(c)}=\frac{1}{n_s^{(c)}}\sum_{i=1}^{n_s^{(c)}}k_i^{(c)}$ and $\bar m_s=\frac{1}{n_s}\sum_{i=1}^{n_s}k_i$, where $k_i=\Phi(X)^T\phi(x_i)$.
In the MMD terms,
$$M_s=K_s\left(L_s+\sum_{c=1}^{C}L_s^{(c)}\right)K_s^T,\quad M_t=K_t\left(L_t+\sum_{c=1}^{C}L_t^{(c)}\right)K_t^T,$$
$$M_{st}=K_s\left(L_{st}+\sum_{c=1}^{C}L_{st}^{(c)}\right)K_t^T,\quad M_{ts}=K_t\left(L_{ts}+\sum_{c=1}^{C}L_{ts}^{(c)}\right)K_s^T.$$
Once the kernelized objective function (21) is obtained, it can be solved in the same way as the original objective function to compute $A$ and $B$.
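To illustrate the kernelized quantities, the sketch below forms an RBF kernel matrix $K=\Phi(X)^T\Phi(X)$ over $X=[X_s,X_t]$ and slices out $K_s$ and $K_t$. It then checks, with a linear kernel where $\Phi$ is the identity map, that substituting $P=\Phi(X)A$ turns a target term such as $P^TX_tX_t^TP$ into $A^TK_tK_t^TA$, which is the mechanism behind replacing $X$-terms with $K$-terms. The RBF kernel, its width `gamma`, and the toy data are assumptions; the sketch does not reproduce the $\tilde K_t$ centering given above.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2); X is D x n, Y is D x m (samples as columns)."""
    sq = (X ** 2).sum(axis=0)[:, None] + (Y ** 2).sum(axis=0)[None, :] - 2.0 * X.T @ Y
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off


rng = np.random.default_rng(7)
X_s = rng.normal(size=(3, 8))
X_t = rng.normal(size=(3, 5)) + 1.0
n_s, n_t = X_s.shape[1], X_t.shape[1]
X = np.hstack([X_s, X_t])                 # X = [X_s, X_t], n = n_s + n_t columns

K = rbf_kernel(X, X, gamma=0.5)           # K = Phi(X)^T Phi(X), n x n
K_s, K_t = K[:, :n_s], K[:, n_s:]         # Phi(X)^T Phi(X_s) and Phi(X)^T Phi(X_t)

# Linear-kernel check of the substitution P = Phi(X) A (Phi is the identity map here):
# the target term P^T X_t X_t^T P becomes A^T K_t K_t^T A.
K_lin = X.T @ X
A = rng.normal(size=(n_s + n_t, 2))       # expansion coefficients, n x k
P = X @ A                                 # the corresponding D x k projection
lhs = P.T @ (X_t @ X_t.T) @ P
rhs = A.T @ (K_lin[:, n_s:] @ K_lin[:, n_s:].T) @ A
```

Note that in the kernel version $A$ and $B$ are $n\times k$ coefficient matrices rather than $D\times k$ projections, which is why the identity blocks in (21) become $K$.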
The method is compared with the following state-of-the-art methods: Subspace Alignment (SA) [11], Subspace Distribution Alignment (SDA) [14], Geodesic Flow Kernel (GFK) [10], Transfer Component Analysis (TCA) [6], Joint Distribution Adaptation (JDA) [7], Transfer Joint Matching (TJM) [8], Scatter Component Analysis (SCA) [9], Optimal Transport (OTGL) [15], and Kernel Manifold Alignment (KEMA) [16]. For all baseline methods, the parameters recommended in the original papers are used.
For JGSA, $\lambda=1$ and $\mu=1$ are fixed in all experiments, so the distribution shift, the subspace shift, and the target variance are treated as equally important. The authors verified experimentally that these fixed parameters give promising results on different types of tasks. Hence, the subspace dimension $k$, the number of iterations $T$, and the regularization parameter $\beta$ are the free parameters.
The synthetic source-domain and target-domain samples are drawn from a mixture of three RBF distributions.
Between the domains, both the global mean and the mean of the third class are shifted. The original data are three-dimensional. For all methods, the subspace dimension is set to 2.
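A hypothetical generator for data of this kind, modelling each RBF component as an isotropic Gaussian in three dimensions. The post does not give the actual class means, spreads, or shift sizes, so every number below is an illustrative assumption; samples are stored as columns to match $X\in\mathbb{R}^{D\times n}$.

```python
import numpy as np

rng = np.random.default_rng(8)
# Hypothetical parameters: the actual means, spreads, and shifts are not given in the post
class_means = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
global_shift = np.array([1.0, 1.0, 1.0])        # applied to every target sample
third_class_shift = np.array([0.0, 0.0, 2.0])   # extra shift applied to class 2 only
n_per_class = 50

y = np.repeat(np.arange(3), n_per_class)         # class labels 0, 1, 2 (shared by both domains)
X_s = (class_means[y] + rng.normal(scale=0.5, size=(y.size, 3))).T
X_t = (class_means[y] + global_shift + rng.normal(scale=0.5, size=(y.size, 3))).T
X_t[:, y == 2] += third_class_shift[:, None]     # shift the third class further in the target
```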
The method is evaluated on three cross-domain visual recognition tasks: object recognition (Office, Caltech256), handwritten digit recognition (USPS, MNIST), and RGB-D-based action recognition (MSRAction3DExt, G3D, UTD-MHAD, and MAD). Sample images or video frames are shown in Figure 1.
For object recognition, two types of features are considered: SURF descriptors (encoded as 800-bin histograms with a codebook trained on a subset of the Amazon images) and Decaf6 features (activations of the sixth fully connected layer of a convolutional network trained on ImageNet).
As in [10], the 1-nearest-neighbor (1-NN) classifier is chosen as the base classifier. For the free parameters, $k=30$, $T=10$, and $\beta=0.1$.
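In this setting the base classifier is plain 1-NN in the learned subspaces: source samples projected with $A$ form the training set, and target samples projected with $B$ are classified. A minimal NumPy sketch with hypothetical toy points, assuming Euclidean distance:

```python
import numpy as np


def one_nn_predict(Z_train, y_train, Z_test):
    """1-NN with Euclidean distance; rows are samples (e.g. X_s^T A for training, X_t^T B for test)."""
    # Squared distances via broadcasting, shape (n_test, n_train)
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[np.argmin(d2, axis=1)]


Z_train = np.array([[0.0, 0.0], [5.0, 5.0]])
y_train = np.array([0, 1])
Z_test = np.array([[0.2, -0.1], [4.0, 6.0], [2.4, 2.4]])
pred = one_nn_predict(Z_train, y_train, Z_test)
```

The broadcasting approach is fine for small sets; for large datasets a chunked or tree-based search would avoid building the full distance array.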
For handwritten digit recognition, all images are uniformly rescaled to $16\times16$, and each image is represented by a feature vector encoding its gray-level pixel values. For the free parameters, $k=100$, $T=10$, and $\beta=0.01$.
RGB-D-based action recognition:
For the free parameters, $k=100$ and $\beta=0.01$.
To avoid overfitting to the target training set, $T=1$ is used for the action recognition task.
Tables 1, 2, and 3 show the results on the three real-world cross-domain datasets (object, digit, and action).
JGSA primal denotes the results of JGSA in the original data space, while JGSA linear and JGSA RBF denote the results with a linear kernel and an RBF kernel, respectively.
The runtime complexity was also evaluated on the cross-domain object datasets (SURF with a linear kernel). The average runtime is 28.97 s, about three times that of the best baseline method (JDA). This is because JGSA learns two mappings simultaneously, so the matrix to be eigendecomposed is twice the size of the one in JDA.
Results on the different types of datasets show that fixing $\lambda=1$ and $\mu=1$ is sufficient for all three tasks. Therefore, only the other three parameters ($k$, $\beta$, and $T$) are evaluated.
Experiments on USPS→MNIST, W→A (SURF descriptors with a linear kernel), and MSR→MAD serve as illustrations, as shown in Figure 3.
The solid lines show the accuracy of JGSA with different parameter values, and the dashed lines show the results of the best baseline method on each dataset. Similar trends are observed on the other datasets.
In this paper, the authors propose a new framework for unsupervised domain adaptation, called Joint Geometrical and Statistical Alignment (JGSA). JGSA reduces the domain shift by taking both the geometrical and the statistical properties of the source and target data into account and by exploiting both shared and domain-specific features.
Comprehensive experiments on synthetic data and on three different types of real-world visual recognition tasks verify the effectiveness of JGSA compared with several state-of-the-art domain adaptation methods.