ConceptEmbedding
- class hybrid_learning.concepts.models.embeddings.ConceptEmbedding(state_dict, kernel_size, normal_vec_name=None, bias_name=None, scaling_factor=1.0, **meta_info)[source]
Bases:
object
Representation of an embedding of a concept within a DNN. The representation aims to be technology-independent.
Main aspects:
the parameters: \(\text{concept vector} = \text{weight}\), \(\text{bias} = -\text{threshold}\)
the layer it is attached to, given by the model up to that layer.
Public Data Attributes:
Dictionary to reproduce the instance.
normal_vec: A normal vector to the represented hyperplane.
bias: The bias \(B\) of the represented hyperplane.
support_factor: A factor \(b\) to obtain the orthogonal support vector \(b\cdot n\) from the normal vector \(n\).
Public Methods:
scale(): Return a new equivalent embedding with scaling_factor == 1.
save(filepath[, overwrite]): Save the embedding parameters and some description as a torch pt file.
distance(point): Calculate the scaled distance of a point from the embedding hyperplane.
normalize(): Yield a new, equivalent embedding with normalized normal vec.
unique(): Yield a new, equivalent, unique embedding with normalized normal vec and positive scaling.
unique_upper_sphere(): Yield a new, equivalent, unique embedding with the normal vec normalized in the upper hemisphere.
to_pos_scaling(): Return the representation of this embedding with positive scaling.
forget_scaling(): Return the embedding with the same normal vec and support but scaling factor 1.
Special Methods:
__init__(state_dict, kernel_size[, ...]): Init.
__getattr__(key): Hand over attribute access to meta_info if necessary.
__eq__(other): Convert both embeddings to unique representation and compare values.
__repr__(): Information about concept, model, layer, concept vector, and threshold.
__copy__(): Return a copy of this embedding.
__deepcopy__([memo]): Return a deep copy of this embedding.
- __eq__(other)[source]
Convert both embeddings to unique representation and compare values.
- Parameters
other (ConceptEmbedding) –
- __init__(state_dict, kernel_size, normal_vec_name=None, bias_name=None, scaling_factor=1.0, **meta_info)[source]
Init.
- Parameters
state_dict (Dict[str, ndarray]) – numpy representations of a torch.nn.Module state dict describing the concept model of this embedding
kernel_size (Tuple[int, ...]) – see kernel_size
normal_vec_name (Optional[str]) – the key of the concept vector within state_dict
bias_name (Optional[str]) – the key of the bias within state_dict
support_factor – the negative concept threshold; calculated as -bias over the squared normal vector length
scaling_factor (Union[ndarray, float]) – see scaling_factor
concept – the concept that is embedded; part of meta_info
model_stump – the model up to the layer of the embedding; part of meta_info
layer_id – if model_stump is not given, optional specification of the layer_id; part of meta_info
concept_name – if concept is not given, optional specification of the concept name; part of meta_info
meta_info – any other meta information
- distance(point)[source]
Calculate the scaled distance of a point from the embedding hyperplane. The distance of a point pt is given by
\[d(pt) = \text{scaling_factor} \cdot \left((n \circ pt) - b (n \circ n)\right)\]
- Parameters
point (ndarray) –
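The formula above can be sketched in plain numpy; the function name and argument layout below are illustrative, not this library's actual API:

```python
import numpy as np

def scaled_distance(point, normal_vec, support_factor, scaling_factor=1.0):
    """Signed, scaled distance of a point from the hyperplane (n, b).

    Implements d(pt) = scaling_factor * ((n . pt) - b * (n . n)).
    """
    n = np.asarray(normal_vec, dtype=float).flatten()
    pt = np.asarray(point, dtype=float).flatten()
    return scaling_factor * (n @ pt - support_factor * (n @ n))

# The orthogonal support vector b*n lies on the hyperplane, so its distance is 0:
n, b = np.array([3.0, 4.0]), 0.2
print(scaled_distance(b * n, n, b))  # 0.0
```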
- classmethod first(embeddings)[source]
Select a copy of the first element of the list.
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) –
- forget_scaling()[source]
Return the embedding with the same normal vec and support but scaling factor 1.
- classmethod load(filepath)[source]
Load an embedding using torch.load(). The format should be as used by save(). For .npz files, a legacy loading mechanism based on numpy.load() is used.
Warning
Be aware that unpickling is used for loading.
- Parameters
filepath (str) –
- classmethod mean(embeddings)[source]
Get the normalized embedding whose distance function is the mean of the normalized distance functions. Consider the non-scaled distance functions of the normalized versions of the given embeddings. Then the condition for the normalized mean embedding is that at any point, the distance from the embedding hyperplane to the point is the mean of these normalized distances:
\[d_{\frac{n}{|n|}, b\cdot |n|} = mean\left( d_{\frac{n_j}{|n_j|}, |n_j|\cdot b_j} \right)\]
The scaling factor in the end is the mean of the scaling factors of the normalized representations of the given embeddings.
- Returns
normalized
- Raises
ValueError
if the mean \(mean\left(\frac{n_j}{|n_j|}\right)\) of the normalized normal vectors \(n_j\) is 0
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) –
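The condition above determines the mean embedding in closed form. A minimal numpy sketch of that computation (illustrative names, not the library's API): each embedding \((n_j, b_j)\) is first normalized to \((n_j/|n_j|,\ b_j|n_j|)\), then averaged.

```python
import numpy as np

def mean_embedding(normal_vecs, support_factors):
    """Sketch of mean(): average the normalized representations
    (n_j/|n_j|, b_j*|n_j|); the result (n, b) satisfies
    d_{n,b} = mean of the normalized distance functions."""
    norms = [np.linalg.norm(n) for n in normal_vecs]
    ms = [np.asarray(n, dtype=float) / ln for n, ln in zip(normal_vecs, norms)]
    cs = [b * ln for b, ln in zip(support_factors, norms)]
    n = np.mean(ms, axis=0)
    if np.allclose(n, 0):
        raise ValueError("mean of the normalized normal vectors is 0")
    # Choose b such that b * |n|^2 equals the mean normalized support factor:
    return n, float(np.mean(cs)) / (n @ n)
```

For identical inputs this reduces to the normalized representation of the input embedding, as expected.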
- classmethod mean_by_angle(embeddings)[source]
Get the embedding for which the signed distances to the given hyperplanes at each point sum up to 0.
The Math Behind
This routine approximates an “average” hyperplane from the given embeddings where here average hyperplane means the one for which the following holds: Given a point \(x\) on the average hyperplane, the signed distances to all hyperplanes along the average hyperplane’s normal vector sum up to zero. The signed distance from \(x\) to a hyperplane H non-orthogonal to the average hyperplane is
\[\left(\left( (R\cdot n + x) \cap H \right) - x \right) \circ n,\]where
\(n\) is the normalized normal vector of the average hyperplane,
\((R \cdot n + x)\) is the 1-dim affine sub-space through \(x\) in the direction of \(n\), and
\(((R \cdot n + x) \cap H)\) is the unique intersection of above line with \(H\).
The average hyperplane has the following properties:
The average hyperplane is unique.
The average normal vector only depends on the normal vectors of the hyperplanes, not their supports/biases.
Given the normalized normal vector n of the average hyperplane, a support vector is given by:
\[\frac{1}{N} \sum_{j=1}^{N} \frac{|b_j|^2}{n \circ b_j} \cdot n\]where the sum goes over the N hyperplanes, \(n\) is a normalized normal vector of the average hyperplane and \(b_j\) is the orthogonal support vector of the jth hyperplane (i.e. a support vector which is a multiple of the normal vector).
Assume the normalized normal vectors of the hyperplanes all lie in the same hemisphere and are given in angle coordinates on the unit hypersphere. Then each entry of the average normal vector in angle coordinates is the mean of the corresponding entries of the hyperplanes’ normal vectors.
Implementation Notes
- Normal vector:
The normal vector is computationally expensive to calculate (should be the spherical barycenter of the normed normal vectors in one hemisphere) and can be approximated by the normalized barycenter of the normalized normal vectors which lie in the same hemisphere.
- Support:
If the normal vectors do not differ too much, the support can also be approximated by the mean of the orthogonal support vectors (or be considered as an optimisation problem and be learned from the concept data).
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) – list of embeddings or list of list of embeddings
- Returns
The embedding representing the average hyperplane of the hyperplanes represented by the given embeddings
- Raises
ValueError
if the mean of the normalized normal vectors \(\frac{n_j}{|n_j|}\) of the given embeddings is 0
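The support-vector formula and the barycenter approximation from the implementation notes can be sketched in numpy as follows; the helper names are hypothetical, and this is an approximation under the stated hemisphere assumption, not the library's implementation:

```python
import numpy as np

def approx_average_normal(normal_vecs):
    """Approximation from the implementation notes: the normalized
    barycenter of the normalized normal vectors (assumed to lie in
    one hemisphere)."""
    ms = [np.asarray(n, dtype=float) / np.linalg.norm(n) for n in normal_vecs]
    bary = np.mean(ms, axis=0)
    return bary / np.linalg.norm(bary)

def average_support_vec(avg_normal, orth_support_vecs):
    """Support vector (1/N) * sum_j (|b_j|^2 / (n . b_j)) * n for the
    normalized average normal n and orthogonal support vectors b_j."""
    n = np.asarray(avg_normal, dtype=float)
    coeffs = [(b @ b) / (n @ b)
              for b in (np.asarray(b, dtype=float) for b in orth_support_vecs)]
    return float(np.mean(coeffs)) * n
```

Sanity check: for a single hyperplane whose orthogonal support vector is a multiple of the average normal, the formula returns that support vector unchanged.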
- classmethod mean_by_distance(embeddings)[source]
Get the embedding whose distance measure is the mean of those of the given embeddings. This routine only works if the mean of the scaled embeddings’ normal vectors is non-zero.
The distance of a point \(x\) from a hyperplane \((n, b)\) with normal vector \(n\) and support vector \(b\cdot n\) is defined as
\[d_{n,b}(x) = \left((x - b\cdot n) \circ n\right) = x \circ n - b \cdot |n|^2\]
For an embedding \((n, b, s)\) with scaling factor \(s\), the distance measure is the one of its scaled version \((s n, \frac{b}{s}, 1)\), which turns out to be
\[d_{s n, \frac{b}{s}} = s \cdot d_{n,b}\]
This routine determines the “average” hyperplane for the given embeddings, where average hyperplane \((n, b)\) means the one with the following property:
\[d_{n,b} = mean(d_{n_j,b_j}) = \frac 1 N \sum_{j=1}^{N} d_{n_j,b_j}\]
i.e. at any point \(x\) in space, the distance of the average hyperplane to \(x\) is the mean of the distances of all N given hyperplanes \((n_j,b_j)\) to \(x\). It is unique (the points on the plane are those with distance 0 and thus all the same), and given by the following combination (with scaling factor 1):
\[\begin{split}n &= mean(n_j) \\ b &= \frac{1}{|n|^2} mean(b_j \cdot |n_j|^2)\end{split}\]
Possible problems: This will weight the contributions of the given embeddings by their confidence, i.e. their scaling factor. To avoid this, the mean can be taken over the normalized versions with scaling factor set to one, and the scaling factor of the mean can be determined by confidence calibration.
- Returns
embedding describing the hyperplane with above properties
- Raises
ValueError
if the mean of the scaled normal vectors of the given embeddings is 0
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) –
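The closed-form combination above is straightforward to sketch in numpy (the function below is illustrative, not the library's API):

```python
import numpy as np

def mean_by_distance(normal_vecs, support_factors):
    """Sketch: n = mean(n_j), b = mean(b_j * |n_j|^2) / |n|^2, so that
    d_{n,b}(x) is the mean of the d_{n_j,b_j}(x) at every point x."""
    ns = [np.asarray(n, dtype=float) for n in normal_vecs]
    n = np.mean(ns, axis=0)
    if np.allclose(n, 0):
        raise ValueError("mean of the scaled normal vectors is 0")
    b = float(np.mean([b_j * (n_j @ n_j)
                       for n_j, b_j in zip(ns, support_factors)])) / (n @ n)
    return n, b
```

The defining property can be verified numerically: at any point, the distance to the resulting hyperplane equals the mean of the distances to the inputs.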
- normalize()[source]
Yield a new, equivalent embedding with normalized normal vec. The sign of the scaling factor is not changed.
- save(filepath, overwrite=True)[source]
Save the embedding parameters and some description as a torch pt file. Load the embedding using load().
- classmethod std_deviation(embeddings, ddof=1)[source]
Get the (by default unbiased) standard deviation of a list of embeddings. The standard deviations are calculated on the unique normalized representations of the embeddings, and encompass the standard deviation of:
the normal vector
the support vector factor (= distance to 0)
the scaling factor (= length of the normal vector).
The deviations are calculated as the square root of the variances (see variance()).
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) – sequence of embeddings
ddof (int) – delta degrees of freedom: the divisor used in calculations is \(\text{num_embeddings} - \text{ddof}\); if ddof=1 (default), the unbiased standard deviation is obtained
- Returns
Tuple of the standard deviations of (normal vecs, support factors, scaling factors) for the normalized representations of the given embeddings
- to_pos_scaling()[source]
Return the representation of this embedding with positive scaling.
- unique()[source]
Yield a new, equivalent, unique embedding with normalized normal vec and positive scaling.
- unique_upper_sphere()[source]
Yield a new, equivalent, unique embedding with the normal vec normalized and lying in the upper hemisphere.
An embedding defines a hyperplane as follows:
the \(weight\) is a (not necessarily normalized) normal vector of the hyperplane
\(bias \cdot weight\) is a support vector orthogonal to the plane
This representation is not unique. In many cases it is desirable to consider the representation in which the normal vector is normalized and lies on the upper half of a given sphere (including the equator). To also obtain unique results for the equator cases, the rule is that, when flattened, the first non-zero entry must be positive. The representation obtained as follows is then unique (\(sign(weight)\) is the sign of the first non-zero entry of the flattened weight):
\[\begin{split}weight_{new} &= sign(weight) \cdot \frac{weight} {|weight|} \\ bias_{new} &= sign(weight) \cdot (bias \cdot |weight|)\end{split}\]
Then the weight is normalized and
\[weight_{new} \cdot bias_{new} = weight \cdot bias\]
is still an orthogonal support vector. Two equivalent representations will yield the same such normalized embedding.
- Returns
Equivalent embedding where the weight of the output embedding is normalized and, when flattened, the weight’s first non-zero entry is positive
- Raises
ValueError
if the weight of the embedding is zero
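A numpy sketch of this normalization rule (illustrative, not the library's implementation):

```python
import numpy as np

def unique_upper_sphere(weight, bias):
    """Normalize the weight and flip signs so the first non-zero entry
    of the flattened weight is positive; rescale the bias accordingly."""
    w = np.asarray(weight, dtype=float)
    norm = np.linalg.norm(w)
    if norm == 0:
        raise ValueError("weight of the embedding is zero")
    flat = w.flatten()
    sign = np.sign(flat[np.nonzero(flat)[0][0]])
    return sign * w / norm, sign * bias * norm

# The orthogonal support vector weight * bias is preserved:
w_new, b_new = unique_upper_sphere(np.array([-3.0, 4.0]), 2.0)
print(w_new * b_new)  # [-6.  8.], same as np.array([-3.0, 4.0]) * 2.0
```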
- classmethod variance(embeddings, ddof=1)[source]
Get the (by default unbiased) variance of a list of embeddings. The variances are calculated on the unique normalized representations of the embeddings, and encompass the variance of:
the normal vector
the support vector factor (= distance to 0)
the scaling factor (= length of the normal vector).
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) – sequence of embeddings to take variance of
ddof (int) – delta degrees of freedom: the divisor used in calculations is \(\text{num_embeddings} - \text{ddof}\); if ddof=1 (default), the unbiased variance is obtained
- Returns
Tuple of the variances of (normal vecs, support factors, scaling factors) for the normalized representations of the given embeddings
- __hash__ = None
- property bias: Optional[numpy.ndarray]
The bias \(B\) of the represented hyperplane. A vector \(v\) is on the hyperplane defined by the normal vector \(n\) and the bias \(B\) iff
\[0 = d(v) = v \circ n + B\]
- property normal_vec: Optional[numpy.ndarray]
A normal vector to the represented hyperplane.
- scaling_factor: numpy.ndarray
The factor to obtain the original normal vector. Only applies if a normal vector is given (see normal_vec_name). Any two embeddings with normal vectors \(n_1, n_2\) and support factors \(b_1, b_2\) fulfilling the following represent the same hyperplane:
\begin{align*} \frac{|n_1 \circ n_2|} {(|n_1| \cdot |n_2|)} &= 1 &\text{and} && \frac{|n_1|} {|n_2|} &= \frac{b_2} {b_1} \end{align*}
However, the signed orthogonal distance measure of an embedding \((n, b)\) for a vector \(v\),
\[d(v) = (v - b \cdot n) \circ n = |n| \cdot \left(v \circ \frac{n}{|n|}\right) - b\cdot|n|^2\]
which is used e.g. in concept layers, depends quadratically on the normal vector length. If the hyperplane representation is changed, the original normal vector and support factor providing the original distance measure can be obtained via
\[\left(n \cdot \text{scaling_factor}, \frac{b}{\text{scaling_factor}}\right)\]
Examples: The scaling_factor is 1 if the original weight was not changed, and \(|weight|\) if it was normalized.
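The rescaling behaviour of the distance measure can be checked numerically with plain numpy, independently of the library:

```python
import numpy as np

def distance(v, n, b):
    """Signed orthogonal distance measure d(v) = (v - b*n) . n."""
    return (v - b * n) @ n

# Rescaling (n, b) to (s*n, b/s) scales the distance measure by s,
# so (n * scaling_factor, b / scaling_factor) recovers the original measure:
n, b, s = np.array([3.0, 4.0]), 0.2, 2.0
v = np.array([1.0, 1.0])
assert np.isclose(distance(v, s * n, b / s), s * distance(v, n, b))
```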
- state_dict: Dict[str, numpy.ndarray]
The concept model’s state dict. Assumed to be the result of a call to torch.nn.Module.state_dict().
- property support_factor: Optional[float]
A factor \(b\) to obtain the orthogonal support vector \(b\cdot n\) from the normal vector \(n\). A vector \(v\) is on the hyperplane iff
\[0 = d(v) = (v - b\cdot n) \circ n = v \circ n - b\cdot |n|^2\]
Here, \(d(v)\) denotes the signed orthogonal distance of \(v\) from the hyperplane (cf. bias). If given, it is calculated from the bias \(B\) and the normal vector \(n\) as \(-\frac{B}{\|n\|^2}\).
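That the bias formulation \(d(v) = v \circ n + B\) and the support-factor formulation \(d(v) = (v - b\cdot n) \circ n\) describe the same distance follows from \(b = -\frac{B}{\|n\|^2}\); a quick numpy check:

```python
import numpy as np

n = np.array([3.0, 4.0])
B = -5.0                 # bias of the hyperplane
b = -B / (n @ n)         # support factor, here 0.2
v = np.array([1.0, 2.0])
# Both formulations give the same signed orthogonal distance:
assert np.isclose(v @ n + B, (v - b * n) @ n)
```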