ConceptEmbedding
- class hybrid_learning.concepts.models.embeddings.ConceptEmbedding(state_dict, kernel_size, normal_vec_name=None, bias_name=None, scaling_factor=1.0, **meta_info)[source]
Bases:
object
Representation of an embedding of a concept within a DNN. The representation aims to be technology-independent.
Main aspects:
the parameters: \(\text{concept vector} = \text{weight}\), \(\text{bias} = -\text{threshold}\)
the layer it is attached to, given by the model up to that layer.
Public Data Attributes:
Dictionary to reproduce the instance.
normal_vec: A normal vector to the represented hyperplane.
bias: The bias \(B\) of the represented hyperplane.
support_factor: A factor \(b\) to obtain the orthogonal support vector \(b\cdot n\) from the normal vector \(n\).
Public Methods:
scale(): Return a new equivalent embedding with scaling_factor == 1.
save(filepath[, overwrite]): Save the embedding parameters and some description as a torch pt file.
distance(point): Calculate the scaled distance of a point from the embedding hyperplane.
normalize(): Yield a new, equivalent embedding with normalized normal vec.
unique(): Yield a new, equivalent, unique embedding with normalized normal vec and positive scaling.
unique_upper_sphere(): Yield a new, equivalent, unique embedding with the normal vec normalized in the upper hemisphere.
to_pos_scaling(): Return the representation of this embedding with positive scaling.
forget_scaling(): Return the embedding with the same normal vec and support but scaling factor 1.
Special Methods:
__init__(state_dict, kernel_size[, ...]): Init.
__getattr__(key): Hand over attribute access to meta_info if necessary.
__eq__(other): Convert both embeddings to unique representation and compare values.
__repr__(): Information about concept, model, layer, concept vector, and threshold.
__copy__(): Return a copy of this embedding.
__deepcopy__([memo]): Return a deep copy of this embedding.
- __eq__(other)[source]
Convert both embeddings to unique representation and compare values.
- Parameters
other (ConceptEmbedding) –
- __init__(state_dict, kernel_size, normal_vec_name=None, bias_name=None, scaling_factor=1.0, **meta_info)[source]
Init.
- Parameters
state_dict (Dict[str, ndarray]) – numpy representations of a torch.nn.Module state dict describing the concept model of this embedding
kernel_size (Tuple[int, ...]) – see kernel_size
normal_vec_name (Optional[str]) – the key of the concept vector within state_dict
bias_name (Optional[str]) – the key of the bias within state_dict
support_factor – the negative concept threshold; calculated as -bias over the squared normal vector length
scaling_factor (Union[ndarray, float]) – see scaling_factor
concept – the concept that is embedded; part of meta_info
model_stump – the model up to the layer of the embedding; part of meta_info
layer_id – if model_stump is not given, optional specification of the layer_id; part of meta_info
concept_name – if concept is not given, optional specification of the concept name; part of meta_info
meta_info – any other meta information
- distance(point)[source]
Calculate the scaled distance of a point from the embedding hyperplane. The distance of a point pt is given by
\[d(pt) = \text{scaling_factor} \cdot \left((n \circ pt) - b (n \circ n)\right)\]
- Parameters
point (ndarray) –
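The formula above can be sketched in plain numpy; the function name and argument layout below are illustrative, not this library's actual API:

```python
import numpy as np

def scaled_distance(point, normal_vec, support_factor, scaling_factor=1.0):
    """Signed, scaled distance of a point from the hyperplane (n, b).

    Implements d(pt) = scaling_factor * ((n . pt) - b * (n . n)).
    """
    n = np.asarray(normal_vec, dtype=float).flatten()
    pt = np.asarray(point, dtype=float).flatten()
    return scaling_factor * (n @ pt - support_factor * (n @ n))

# The orthogonal support vector b*n lies on the hyperplane, so its distance is 0:
n, b = np.array([3.0, 4.0]), 0.2
print(scaled_distance(b * n, n, b))  # 0.0
```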
- classmethod first(embeddings)[source]
Select a copy of the first element of the list.
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) –
- forget_scaling()[source]
Return the embedding with the same normal vec and support but scaling factor 1.
- classmethod load(filepath)[source]
Load an embedding using torch.load(). The format should be as used by save(). For .npz files, a legacy loading mechanism based on numpy.load() is used.
Warning
Be aware that unpickling is used for loading.
- Parameters
filepath (str) –
- classmethod mean(embeddings)[source]
Get the normalized embedding whose distance function is the mean of the normalized distance functions. Consider the non-scaled distance functions of the normalized versions of the given embeddings. Then the condition for the normalized mean embedding is that at any point, the distance from the embedding hyperplane to the point is the mean of these normalized distances:
\[d_{\frac{n}{|n|}, b\cdot |n|} = mean\left( d_{\frac{n_j}{|n_j|}, |n_j|\cdot b_j} \right)\]
The scaling factor in the end is the mean of the scaling factors of the normalized representations of the given embeddings.
- Returns
normalized
- Raises
ValueError
if the mean \(mean\left(\frac{n_j}{|n_j|}\right)\) of the normalized normal vectors \(n_j\) is 0
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) –
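The condition above determines the mean embedding in closed form. A minimal numpy sketch of that computation (illustrative names, not the library's API): each embedding \((n_j, b_j)\) is first normalized to \((n_j/|n_j|,\ b_j|n_j|)\), then averaged.

```python
import numpy as np

def mean_embedding(normal_vecs, support_factors):
    """Sketch of mean(): average the normalized representations
    (n_j/|n_j|, b_j*|n_j|); the result (n, b) satisfies
    d_{n,b} = mean of the normalized distance functions."""
    norms = [np.linalg.norm(n) for n in normal_vecs]
    ms = [np.asarray(n, dtype=float) / ln for n, ln in zip(normal_vecs, norms)]
    cs = [b * ln for b, ln in zip(support_factors, norms)]
    n = np.mean(ms, axis=0)
    if np.allclose(n, 0):
        raise ValueError("mean of the normalized normal vectors is 0")
    # Choose b such that b * |n|^2 equals the mean normalized support factor:
    return n, float(np.mean(cs)) / (n @ n)
```

For identical inputs this reduces to the normalized representation of the input embedding, as expected.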
- classmethod mean_by_angle(embeddings)[source]
Get the embedding for which the signed distances to the given hyperplanes at each point sum up to 0.
The Math Behind
This routine approximates an “average” hyperplane from the given embeddings where here average hyperplane means the one for which the following holds: Given a point \(x\) on the average hyperplane, the signed distances to all hyperplanes along the average hyperplane’s normal vector sum up to zero. The signed distance from \(x\) to a hyperplane H non-orthogonal to the average hyperplane is
\[\left(\left( (R\cdot n + x) \cap H \right) - x \right) \circ n,\]where
\(n\) is the normalized normal vector of the average hyperplane,
\((R \cdot n + x)\) is the 1-dim affine sub-space through \(x\) in the direction of \(n\), and
\(((R \cdot n + x) \cap H)\) is the unique intersection of above line with \(H\).
The average hyperplane has the following properties:
The average hyperplane is unique.
The average normal vector only depends on the normal vectors of the hyperplanes, not their supports/biases.
Given the normalized normal vector n of the average hyperplane, a support vector is given by:
\[\frac{1}{N} \sum_{j=1}^{N} \frac{|b_j|^2}{n \circ b_j} \cdot n\]where the sum goes over the N hyperplanes, \(n\) is a normalized normal vector of the average hyperplane and \(b_j\) is the orthogonal support vector of the jth hyperplane (i.e. a support vector which is a multiple of the normal vector).
Assume the normalized normal vectors of the hyperplanes all lie in the same hemisphere and are given in angle coordinates on the unit hypersphere. Then each entry of the average normal vector in angle coordinates is the mean of the corresponding entries of the hyperplanes’ normal vectors.
Implementation Notes
- Normal vector:
The normal vector is computationally expensive to calculate (should be the spherical barycenter of the normed normal vectors in one hemisphere) and can be approximated by the normalized barycenter of the normalized normal vectors which lie in the same hemisphere.
- Support:
If the normal vectors do not differ too much, the support can also be approximated by the mean of the orthogonal support vectors (or be considered as an optimisation problem and be learned from the concept data).
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) – list of embeddings or list of list of embeddings
- Returns
The embedding representing the average hyperplane of the hyperplanes represented by the given embeddings
- Raises
ValueError
if the mean of the normalized normal vectors \(\frac{n_j}{|n_j|}\) of the given embeddings is 0
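The support-vector formula and the barycenter approximation from the implementation notes can be sketched in numpy as follows; the helper names are hypothetical, and this is an approximation under the stated hemisphere assumption, not the library's implementation:

```python
import numpy as np

def approx_average_normal(normal_vecs):
    """Approximation from the implementation notes: the normalized
    barycenter of the normalized normal vectors (assumed to lie in
    one hemisphere)."""
    ms = [np.asarray(n, dtype=float) / np.linalg.norm(n) for n in normal_vecs]
    bary = np.mean(ms, axis=0)
    return bary / np.linalg.norm(bary)

def average_support_vec(avg_normal, orth_support_vecs):
    """Support vector (1/N) * sum_j (|b_j|^2 / (n . b_j)) * n for the
    normalized average normal n and orthogonal support vectors b_j."""
    n = np.asarray(avg_normal, dtype=float)
    coeffs = [(b @ b) / (n @ b)
              for b in (np.asarray(b, dtype=float) for b in orth_support_vecs)]
    return float(np.mean(coeffs)) * n
```

Sanity check: for a single hyperplane whose orthogonal support vector is a multiple of the average normal, the formula returns that support vector unchanged.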
- classmethod mean_by_distance(embeddings)[source]
Get the embedding whose distance measure is the mean of those of the given embeddings. This routine only works if the mean of the scaled embeddings’ normal vectors is non-zero.
The distance of a point \(x\) from a hyperplane \((n, b)\) with normal vector \(n\) and support vector \(b\cdot n\) is defined as
\[d_{n,b}(x) = \left((x - b\cdot n) \circ n\right) = x \circ n - b \cdot |n|^2\]
For an embedding \((n, b, s)\) with scaling factor \(s\), the distance measure is the one of its scaled version \((s n, \frac{b}{s}, 1)\), which turns out to be
\[d_{s n, \frac{b}{s}} = s \cdot d_{n,b}\]
This routine determines the “average” hyperplane for the given embeddings, where average hyperplane \((n, b)\) means the one with the following property:
\[d_{n,b} = mean(d_{n_j,b_j}) = \frac 1 N \sum_{j=1}^{N} d_{n_j,b_j}\]
i.e. at any point \(x\) in space, the distance of the average hyperplane to \(x\) is the mean of the distances of all N given hyperplanes \((n_j,b_j)\) to \(x\). It is unique (the points on the plane are those with distance 0 and thus all the same), and given by the following combination (with scaling factor 1):
\[\begin{split}n &= mean(n_j) \\ b &= \frac{1}{|n|^2} mean(b_j \cdot |n_j|^2)\end{split}\]
Possible problems: This will weight the contributions of the given embeddings by their confidence, i.e. their scaling factor. To avoid this, the mean can be taken over the normalized versions with scaling factor set to one, and the scaling factor of the mean can be determined by confidence calibration.
- Returns
embedding describing the hyperplane with above properties
- Raises
ValueError
if the mean of the scaled normal vectors of the given embeddings is 0
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) –
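The closed-form combination above is straightforward to sketch in numpy (the function below is illustrative, not the library's API):

```python
import numpy as np

def mean_by_distance(normal_vecs, support_factors):
    """Sketch: n = mean(n_j), b = mean(b_j * |n_j|^2) / |n|^2, so that
    d_{n,b}(x) is the mean of the d_{n_j,b_j}(x) at every point x."""
    ns = [np.asarray(n, dtype=float) for n in normal_vecs]
    n = np.mean(ns, axis=0)
    if np.allclose(n, 0):
        raise ValueError("mean of the scaled normal vectors is 0")
    b = float(np.mean([b_j * (n_j @ n_j)
                       for n_j, b_j in zip(ns, support_factors)])) / (n @ n)
    return n, b
```

The defining property can be verified numerically: at any point, the distance to the resulting hyperplane equals the mean of the distances to the inputs.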
- normalize()[source]
Yield a new, equivalent embedding with normalized normal vec. The sign of the scaling factor is not changed.
- save(filepath, overwrite=True)[source]
Save the embedding parameters and some description as a torch pt file. Load the embedding using load().
- classmethod std_deviation(embeddings, ddof=1)[source]
Get the (by default unbiased) standard deviation of a list of embeddings. The standard deviations are calculated on the unique normalized representations of the embeddings, and encompass the standard deviation of:
the normal vector
the support vector factor (= distance to 0)
the scaling factor (= length of the normal vector).
The deviations are calculated as the square root of the variances (see variance()).
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) – sequence of embeddings
ddof (int) – delta degrees of freedom: the divisor used in calculations is \(\text{num_embeddings} - \text{ddof}\); if ddof=1 (default), the unbiased standard deviation is obtained
- Returns
Tuple of the standard deviations of (normal vecs, support factors, scaling factors) for the normalized representations of the given embeddings
- to_pos_scaling()[source]
Return the representation of this embedding with positive scaling.
- unique()[source]
Yield a new, equivalent, unique embedding with normalized normal vec and positive scaling.
- unique_upper_sphere()[source]
Yield a new, equivalent, unique embedding with the normal vec normalized and lying in the upper hemisphere.
An embedding defines a hyperplane as follows:
the \(weight\) is a (not necessarily normalized) normal vector of the hyperplane
\(bias \cdot weight\) is a support vector orthogonal to the plane
This representation is not unique. In many cases it is desirable to consider the representation in which the normal vector is normalized and lies on the upper half of a given sphere (including the equator). To also obtain unique results for the equator cases, the rule is that, when flattened, the first non-zero entry must be positive. The representation obtained as follows is then unique (\(sign(weight)\) is the sign of the first non-zero entry of the flattened weight):
\[\begin{split}weight_{new} &= sign(weight) \cdot \frac{weight} {|weight|} \\ bias_{new} &= sign(weight) \cdot (bias \cdot |weight|)\end{split}\]
Then the weight is normalized and
\[weight_{new} \cdot bias_{new} = weight \cdot bias\]
is still an orthogonal support vector. Two equivalent representations will yield the same such normalized embedding.
- Returns
Equivalent embedding where the weight of the output embedding is normalized and, when flattened, the weight’s first non-zero entry is positive
- Raises
ValueError
if the weight of the embedding is zero
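A numpy sketch of this normalization rule (illustrative, not the library's implementation):

```python
import numpy as np

def unique_upper_sphere(weight, bias):
    """Normalize the weight and flip signs so the first non-zero entry
    of the flattened weight is positive; rescale the bias accordingly."""
    w = np.asarray(weight, dtype=float)
    norm = np.linalg.norm(w)
    if norm == 0:
        raise ValueError("weight of the embedding is zero")
    flat = w.flatten()
    sign = np.sign(flat[np.nonzero(flat)[0][0]])
    return sign * w / norm, sign * bias * norm

# The orthogonal support vector weight * bias is preserved:
w_new, b_new = unique_upper_sphere(np.array([-3.0, 4.0]), 2.0)
print(w_new * b_new)  # [-6.  8.], same as np.array([-3.0, 4.0]) * 2.0
```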
- classmethod variance(embeddings, ddof=1)[source]
Get the (by default unbiased) variance of a list of embeddings. The variances are calculated on the unique normalized representations of the embeddings, and encompass the variance of:
the normal vector
the support vector factor (= distance to 0)
the scaling factor (= length of the normal vector).
- Parameters
embeddings (Union[Sequence[ConceptEmbedding], Sequence[Sequence[ConceptEmbedding]]]) – sequence of embeddings to take variance of
ddof (int) – delta degrees of freedom: the divisor used in calculations is \(\text{num_embeddings} - \text{ddof}\); if ddof=1 (default), the unbiased variance is obtained
- Returns
Tuple of the variances of (normal vecs, support factors, scaling factors) for the normalized representations of the given embeddings
- __hash__ = None
- property bias: Optional[numpy.ndarray]
The bias \(B\) of the represented hyperplane. A vector \(v\) is on the hyperplane defined by the normal vector \(n\) and the bias \(B\) iff
\[0 = d(v) = v \circ n + B\]
- property normal_vec: Optional[numpy.ndarray]
A normal vector to the represented hyperplane.
- scaling_factor: numpy.ndarray
The factor to obtain the original normal vector. Only applies if a normal vector is given (see normal_vec_name). Any two embeddings with normal vectors \(n_1, n_2\) and support factors \(b_1, b_2\) fulfilling the following represent the same hyperplane:
\begin{align*} \frac{|n_1 \circ n_2|} {(|n_1| \cdot |n_2|)} &= 1 &\text{and} && \frac{|n_1|} {|n_2|} &= \frac{b_2} {b_1} \end{align*}
However, the signed orthogonal distance measure of an embedding \((n, b)\) for a vector \(v\),
\[d(v) = (v - b \cdot n) \circ n = |n| \cdot \left(v \circ \frac{n}{|n|}\right) - b\cdot|n|^2\]
which is used e.g. in concept layers, depends quadratically on the normal vector length. If the hyperplane representation is changed, the original normal vector and support factor providing the original distance measure can be obtained via
\[\left(n \cdot \text{scaling_factor}, \frac{b}{\text{scaling_factor}}\right)\]
Examples: The scaling_factor is 1 if the original weight was not changed, and \(|weight|\) if it was normalized.
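The rescaling behaviour of the distance measure can be checked numerically with plain numpy, independently of the library:

```python
import numpy as np

def distance(v, n, b):
    """Signed orthogonal distance measure d(v) = (v - b*n) . n."""
    return (v - b * n) @ n

# Rescaling (n, b) to (s*n, b/s) scales the distance measure by s,
# so (n * scaling_factor, b / scaling_factor) recovers the original measure:
n, b, s = np.array([3.0, 4.0]), 0.2, 2.0
v = np.array([1.0, 1.0])
assert np.isclose(distance(v, s * n, b / s), s * distance(v, n, b))
```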
- state_dict: Dict[str, numpy.ndarray]
The concept model’s state dict. Assumed to be the result of a call to torch.nn.Module.state_dict().
- property support_factor: Optional[float]
A factor \(b\) to obtain the orthogonal support vector \(b\cdot n\) from the normal vector \(n\). A vector \(v\) is on the hyperplane iff
\[0 = d(v) = (v - b\cdot n) \circ n = v \circ n - b\cdot |n|^2\]
Here, \(d(v)\) denotes the signed orthogonal distance of \(v\) from the hyperplane (cf. bias). If given, it is calculated from the bias \(B\) and the normal vector \(n\) as \(-\frac{B}{\|n\|^2}\).
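That the bias formulation \(d(v) = v \circ n + B\) and the support-factor formulation \(d(v) = (v - b\cdot n) \circ n\) describe the same distance follows from \(b = -\frac{B}{\|n\|^2}\); a quick numpy check:

```python
import numpy as np

n = np.array([3.0, 4.0])
B = -5.0                 # bias of the hyperplane
b = -B / (n @ n)         # support factor, here 0.2
v = np.array([1.0, 2.0])
# Both formulations give the same signed orthogonal distance:
assert np.isclose(v @ n + B, (v - b * n) @ n)
```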