Introduction

Definition

$c$, $\lambda$, $\delta$ - scalars: $c$ is a single number or constant value, $\lambda$ a weight on a loss term, and $\delta$ a small difference

$x$ - a tensor, i.e. an $n$-dimensional array. Generally, we use $x$ for an instance, the input of a forward model (e.g. an MRI image in a segmentation model), and $y$ for the labels (e.g. a manual contour in a segmentation task).
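
For a concrete picture, a minimal sketch of such a pair (the shapes are hypothetical, not from any specific dataset):

```python
import numpy as np

# Hypothetical shapes for a segmentation task: x is a single-channel
# 3-D MRI volume (the instance fed to the forward model), y is the
# voxel-wise manual contour (one integer class id per voxel).
x = np.zeros((1, 128, 128, 64), dtype=np.float32)  # instance, model input
y = np.zeros((128, 128, 64), dtype=np.int64)       # labels, same spatial shape
```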

$x_i$ - the $i$-th element of the tensor $x$

$x^T$ - the transpose of the tensor $x$

$X$ - the collection or set of instances $x$.

$\mathcal{D}$ - the domain, which consists of a feature space $\mathcal{X}$ and a marginal distribution $p(X)$, i.e. $\mathcal{D} = \{\mathcal{X}, p(X)\}$, where $X = \{x_1, \dots, x_n\} \in \mathcal{X}$ is a set of instances in the domain
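
As a minimal sketch (the distribution parameters are made up), two domains can share the same feature space while differing only in their marginal distributions, e.g. MRI intensities from two different scanners:

```python
import numpy as np

# Both domains live in the same feature space (scalar intensities),
# but their marginal distributions p_s(X) and p_t(X) differ,
# so D_s != D_t even though no labels are involved yet.
rng = np.random.default_rng(0)
x_source = rng.normal(loc=0.0, scale=1.0, size=1000)  # sample from p_s(X)
x_target = rng.normal(loc=0.8, scale=1.3, size=1000)  # sample from p_t(X)
print(x_source.mean(), x_target.mean())  # different marginal statistics
```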

$\mathcal{T}$ - a task, which consists of a label space $\mathcal{Y}$ and a decision function $f$, i.e. $\mathcal{T} = \{\mathcal{Y}, f\}$

$s$ - a subscript denoting the source (e.g. $\mathcal{D}_s$, the source domain)

$t$ - a subscript denoting the target (e.g. $\mathcal{D}_t$, the target domain)

$\theta$ - parameters in the model

$p$ - a probability distribution

$f$ - the decision function, which can be implicit (e.g. the operation of segmentation). Simply, we use $f$ for the true function, $\hat{f}$ for an estimate or prediction of $f$, and $f^*$ for the optimal function.
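
To make the distinction concrete (the loss $L$ is an extra symbol introduced here for illustration), $\hat{f}$ minimizes the empirical risk on a finite sample, while $f^*$ minimizes the expected risk under the distribution $p$:

$$\hat{f} = \arg\min_{f} \frac{1}{n} \sum_{i=1}^{n} L\big(f(x_i), y_i\big), \qquad f^* = \arg\min_{f} \mathbb{E}_{(x, y) \sim p}\big[L(f(x), y)\big]$$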

$\|\cdot\|$ - the norm

Concepts

Many concepts and method names are similar but subtly different. Here we try to distinguish them.

  1. Domain adaptation vs. transfer learning. Domain adaptation aims to reduce the differences between domains, while transfer learning aims to improve performance on the target domain. The two share the same ultimate goal, but domain adaptation can serve as a preprocessing step for transfer learning. For example, when transfer learning works in the feature space and tries to find a common latent space for the source and target domains (symmetric transfer), it is close to domain adaptation; a sketch of this idea follows the list.
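
A minimal sketch of symmetric, feature-space alignment, using a Maximum Mean Discrepancy (MMD) penalty as one common choice (PyTorch is assumed; the feature size, batch size, and kernel bandwidth are hypothetical):

```python
import torch

def mmd_loss(feat_s: torch.Tensor, feat_t: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between source and target features with a Gaussian kernel.

    Minimizing this pulls the two feature distributions toward a common
    latent space, the symmetric-transfer idea described above.
    """
    def kernel(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        d2 = torch.cdist(a, b).pow(2)             # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))  # Gaussian kernel matrix

    return (kernel(feat_s, feat_s).mean()
            + kernel(feat_t, feat_t).mean()
            - 2 * kernel(feat_s, feat_t).mean())

# Usage: features from a shared encoder phi applied to source/target batches,
# added as a penalty alongside the task loss on the source domain.
feat_s = torch.randn(32, 64)  # phi(x_s), hypothetical 64-d features
feat_t = torch.randn(32, 64)  # phi(x_t)
alignment_loss = mmd_loss(feat_s, feat_t)
```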