Conducting a Concept Analysis
Concept analysis refers to finding concept embeddings within the latent spaces of a DNN, assessing their strength (i.e. prediction performance as concept models), possibly *aggregating* several candidates to a single one for stabilization, and selecting the final embedding for a concept.
The base class holding the functionality for concept analysis and logging is ConceptAnalysis:
>>> from hybrid_learning.concepts.analysis import ConceptAnalysis
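Conceptually, a concept embedding found by the analysis is a linear model on latent activations, parameterized by a normal vector and a bias (the support factor). The following minimal numpy sketch is independent of the library API; all names and values in it are hypothetical and only illustrate how such an embedding scores activations:

```python
import numpy as np

# Conceptual sketch (not the library API): a concept embedding is a
# linear model on latent activation vectors. Activations whose
# projection onto the normal vector exceeds the bias are predicted
# to show the concept.
rng = np.random.default_rng(0)
activations = rng.normal(size=(10, 256))  # 10 latent vectors, 256 channels
normal_vec = rng.normal(size=256)         # hypothetical embedding normal vector
bias = 0.5                                # hypothetical support factor

# Concept score: signed distance of each activation to the hyperplane
scores = activations @ normal_vec / np.linalg.norm(normal_vec) - bias
concept_mask = scores > 0                 # binary concept prediction
print(concept_mask.shape)  # (10,)
```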
In the following, an exemplary concept analysis is conducted of the concept “face” on an AlexNet model in the backbone convolutional layer features.5 with 3 cross-validation splits.
Preparation: Getting the concept
First initialize the concept and the main model to analyse:
>>> # Main model & layers to analyse
>>> from torchvision.models import alexnet
>>> main_model = alexnet(pretrained=True)
>>> layers = ['features.5'] # more layers can be added
>>>
>>> # Concept data and concept
>>> import os
>>> from hybrid_learning.datasets.custom import coco
>>> from hybrid_learning.datasets import DataTriple
>>> root = os.path.join("dataset", "coco_test", "images", "{}2017")
>>> data = DataTriple(
... train_val=coco.ConceptDataset([coco.BodyParts.FACE],
... split="train", dataset_root=root.format("train"), img_size=(224, 224)).subset(num=10),
... test=coco.ConceptDataset([coco.BodyParts.FACE],
... split="test", dataset_root=root.format("val"), img_size=(224, 224)).subset(num=3)
... )
>>> from hybrid_learning.concepts.concepts import SegmentationConcept2D
>>> concept = SegmentationConcept2D("face", data, rel_size=(0.05, 0.05))
Collecting embeddings and performances
Now one can start an analysis with the desired number of runs on the layers of interest. If the logging level is set to INFO, intermediate results are logged.
>>> analyser = ConceptAnalysis(
... concept=concept, model=main_model,
... # layer info (Iterable with IDs of layers to analyse)
... layer_infos=layers,
... # the number of independent cross-validation runs per layer
... cross_val_runs=1,
... # the number of runs/splits per cross-validation run
... num_val_splits=3,
... )
>>> analysis_results = analyser.analysis()
The analysis results can be turned into a pandas.DataFrame:
>>> run_info = analysis_results.to_pandas()
Aggregation and selection
The layer-wise best embedding is obtained by aggregating the results for each layer (see embedding_reduction()). To obtain the final best embedding over all layers, their evaluation results are compared and the best one is selected (see best_layer_from_stats()). Both steps are united in best_embedding(), which directly returns the best embedding. To automate experiment saving, use best_embedding_with_logging() instead.
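The two steps can be illustrated independently of the library API. The sketch below is purely conceptual (it is not the implementation of embedding_reduction() or best_layer_from_stats()); the layer names and statistics are hypothetical:

```python
import numpy as np

# Conceptual sketch of aggregation: combine several candidate
# embeddings of one layer by averaging their unit normal vectors.
candidates = [np.array([1.0, 0.2]), np.array([0.9, 0.3]), np.array([1.1, 0.1])]
unit_vecs = [v / np.linalg.norm(v) for v in candidates]
aggregated = np.mean(unit_vecs, axis=0)  # layer-wise aggregate normal vector

# Conceptual sketch of selection: compare an evaluation statistic
# (e.g. mean test IoU; values here are made up) and pick the best layer.
layer_stats = {'features.5': 0.41, 'features.8': 0.38}
best_layer = max(layer_stats, key=layer_stats.get)
print(best_layer)  # features.5
```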
Note
If no analysis results are given, a complete new analysis is conducted.
>>> best_emb = analyser.best_embedding(analysis_results)
>>> type(best_emb)
<class 'hybrid_learning.concepts.models.embeddings.ConceptEmbedding'>
Performance assessment
Prediction performance
The performance of an embedding can be (re-)evaluated with respect to the analysis settings by using evaluate_embedding().
>>> from hybrid_learning.concepts.analysis.results import ResultsHandle
>>> best_emb_stats = analyser.evaluate_embedding(best_emb)
>>> print(ResultsHandle.emb_info_to_string(best_emb, best_emb_stats))
test_loss ...
test_set_iou ...
test_mean_iou ...
normal vec len ...
support factor ...
scaling factor ...
Variance and standard deviation
The variance and standard deviation of different runs within one layer can be obtained via the embedding functionalities:
>>> # Embeddings for this layer from the analysis_results:
>>> embeddings = [e for embs, stats in
... analysis_results.results[best_emb.layer_id].values()
... for e in embs]
>>> # Variances of the different embedding aspects:
>>> from hybrid_learning.concepts.models import ConceptEmbedding
>>> std_normal_vec, std_support_factor, std_scaling_factor = \
... ConceptEmbedding.std_deviation(embeddings)
Cosine distance
The pair-wise cosines between the normal vectors of concept embeddings from one layer can be calculated as follows (here including the layer’s best_emb):
>>> import numpy as np
>>> import pandas as pd
>>> # pair-wise cosines with last row and column the best_embedding:
>>> all_vecs = ([e.normal_vec for e in embeddings] + [best_emb.normal_vec])
>>> keys = list(range(len(embeddings))) + ['best_emb']
>>> pairwise_cosines = pd.DataFrame([
...     # cosine similarity between two vectors:
...     [np.sum(n1 * n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
...      for n1 in all_vecs]
...     for n2 in all_vecs], index=keys, columns=keys)
>>> # Mean cosine similarity of best_emb to the other embeddings in the layer:
>>> mean_pairwise_cosine = pairwise_cosines.iloc[:-1, -1].mean()
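Note that the values computed above are cosine similarities; a common way to turn a cosine similarity into an actual distance is to subtract it from one. A small self-contained sketch on toy vectors:

```python
import numpy as np

# Cosine distance = 1 - cosine similarity, illustrated on two toy vectors
a, b = np.array([1.0, 0.0]), np.array([1.0, 1.0])
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # 1/sqrt(2)
cos_dist = 1.0 - cos_sim
print(round(cos_dist, 4))  # 0.2929
```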
Using an embedding
Finally, the embedding can be used to initialize a concept model (here fed to visualization):
>>> from hybrid_learning.concepts.models import ConceptDetectionModel2D
>>> from hybrid_learning.concepts.models import ConceptDetection2DTrainTestHandle
>>> best_concept_model = ConceptDetectionModel2D.from_embedding(best_emb)
>>> from hybrid_learning.concepts.analysis import data_for_concept_model, visualization as vis
>>> vis.visualize_concept_model(
... ConceptDetection2DTrainTestHandle(
... best_concept_model,
... data_for_concept_model(best_concept_model),
... **analyser.train_val_args),
... save_as=None
... )