ConceptAnalysis

class hybrid_learning.concepts.analysis.analysis_handle.ConceptAnalysis(concept, model, layer_infos=None, cross_val_runs=1, num_val_splits=5, emb_reduction=EmbeddingReduction.MEAN_NORMALIZED_DIST, concept_model_args=None, train_val_args=None, data_args=None, after_layer_hook=None)

Bases: object

Handle for conducting a concept embedding analysis. Saves the analysis settings and can run a complete analysis.

The core methods are:

  • analysis(): plain analysis (collect cross_val_runs · num_val_splits embeddings for each layer in layer_infos)

  • best_embedding(): aggregate embeddings of an analysis per layer, then choose best one

  • best_embedding_with_logging(): combination of the two methods above with automatic logging and result saving

Public Data Attributes:

num_runs

The total number of runs that are conducted per layer.

settings

Settings dict to reproduce instance.

Public Methods:

best_embedding([analysis_results])

Conduct an analysis and from results derive the best embedding.

analysis()

Conduct a concept embedding analysis.

analysis_for_layer(layer_id)

Get a concept embedding of the given concept in the given layer.

concept_model_handle([c_model, emb, layer_id])

Train and eval handle for the given concept model.

concept_model_for_layer(layer_id)

Return a concept model for the given layer ID.

concept_model_from_embedding(emb)

Get concept model from embedding for training and eval.

data_for_concept_model([c_model, layer_id])

Get the concept model data for this instance.

evaluate_embedding(embedding[, log_prefix])

Evaluate the embedding on its concept test data.

embedding_reduction(results_per_run)

Aggregate the embeddings collected in results_per_run to a best one.

fill_train_data_infos()

Collect layer-wise information about the corresponding concept model.

best_embedding_with_logging(concept_exp_root)

Conduct an analysis, collect mean and best embeddings, and save and log all results.

Special Methods:

__init__(concept, model[, layer_infos, ...])

Init.

__repr__()

Return repr(self).


__init__(concept, model, layer_infos=None, cross_val_runs=1, num_val_splits=5, emb_reduction=EmbeddingReduction.MEAN_NORMALIZED_DIST, concept_model_args=None, train_val_args=None, data_args=None, after_layer_hook=None)

Init.

Parameters
  • concept (Concept) – concept to find the embedding of

  • model (Module) – the DNN

  • layer_infos (Optional[Union[Dict[str, Dict[str, Any]], Sequence[str]]]) –

    information about the layers in which to look for the best concept embedding; it may be given either as a sequence of layer IDs or as a dict whose keys are the layer keys in the model’s torch.nn.Module.named_modules() dict; used keys of the per-layer dicts:

    • kernel_size: fixed kernel size to use for this layer (overrides value from concept_model_args)

    • lr: learning rate to use

  • num_val_splits (int) – the number of validation splits to use for each cross-validation run

  • cross_val_runs (int) – for a layer, several concept models are trained in different runs; the runs differ in model initialization and validation data split; cross_val_runs is the number of cross-validation runs, i.e. collections of runs with num_val_splits distinct validation sets each

  • emb_reduction (EmbeddingReduction) – aggregation function to reduce list of embeddings to one

  • concept_model_args (Optional[Dict[str, Any]]) – dict with arguments for the concept model initialization

  • train_val_args (Optional[Dict[str, Any]]) – any further arguments to initialize the concept model handle; a loss and a metric are added by default

  • data_args (Optional[Dict[str, Any]]) – any further arguments to initialize the training and eval data tuple using data_for_concept_model()

  • after_layer_hook (Optional[Callable[[AnalysisResult, ConceptDetection2DTrainTestHandle], Any]]) – see after_layer_hook
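The two accepted forms of layer_infos can be illustrated with a small sketch. The helper normalize_layer_infos is hypothetical and not part of hybrid_learning; it only shows how a sequence of layer IDs could be brought into the dict form. The layer names and values are made up.

```python
from typing import Any, Dict, Sequence, Union

LayerInfos = Union[Dict[str, Dict[str, Any]], Sequence[str]]


def normalize_layer_infos(layer_infos: LayerInfos) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: bring both accepted forms into the dict form."""
    if isinstance(layer_infos, dict):
        # Copy the per-layer settings so the caller's dict is not mutated.
        return {layer_id: dict(info) for layer_id, info in layer_infos.items()}
    # A plain sequence of layer IDs carries no per-layer overrides.
    return {layer_id: {} for layer_id in layer_infos}


# Form 1: sequence of layer IDs (names are illustrative only).
as_sequence = normalize_layer_infos(["features.5", "features.7"])

# Form 2: dict using the documented per-layer keys kernel_size and lr.
as_dict = normalize_layer_infos({
    "features.5": {"kernel_size": (3, 3), "lr": 0.001},
    "features.7": {},
})
```

In a real setup the layer IDs would have to match keys of dict(model.named_modules()) for the model that is passed in.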

__repr__()

Return repr(self).

analysis()

Conduct a concept embedding analysis.

For each layer in layer_infos, cross_val_runs · num_val_splits embeddings are collected.

Returns

an analysis result object holding a dictionary of {layer_id: {run: (embedding, pandas.Series with {pre_: metric_val})}}

Return type

AnalysisResult
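The nesting of the result dictionary can be walked as in the following sketch. It uses stand-ins only: real results hold ConceptEmbedding objects and pandas.Series, whereas here plain strings and dicts are used, and the metric key "test_set_iou" is invented for illustration.

```python
# Stand-in for the documented nesting {layer_id: {run: (embedding, stats)}}.
# Embeddings are placeholder strings; the metric key is made up.
results = {
    "layer_a": {0: ("emb_a0", {"test_set_iou": 0.41}),
                1: ("emb_a1", {"test_set_iou": 0.45})},
    "layer_b": {0: ("emb_b0", {"test_set_iou": 0.52}),
                1: ("emb_b1", {"test_set_iou": 0.50})},
}

# Flatten to one row per (layer, run) for inspection.
rows = [
    (layer_id, run, embedding, stats["test_set_iou"])
    for layer_id, runs in results.items()
    for run, (embedding, stats) in runs.items()
]
```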

analysis_for_layer(layer_id)

Get a concept embedding of the given concept in the given layer. cross_val_runs cross-validation runs are conducted, each with num_val_splits non-intersecting splits for the validation data. If num_val_splits is 1, cross_val_runs normal training runs are conducted instead.

After the analysis is completed, after_layer_hook is called.
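The run scheme can be pictured with a sketch in plain Python. It is not the library's splitting code: the function validation_splits, the seeding, and the round-robin partitioning are assumptions chosen for illustration, and the special case num_val_splits = 1 is not covered.

```python
import random
from typing import List


def validation_splits(num_samples: int, cross_val_runs: int,
                      num_val_splits: int, seed: int = 0) -> List[List[List[int]]]:
    """Return, per cross-validation run, num_val_splits disjoint validation index sets."""
    rng = random.Random(seed)
    runs = []
    for _ in range(cross_val_runs):
        indices = list(range(num_samples))
        rng.shuffle(indices)  # each cross-validation run gets its own shuffling
        # Deal the shuffled indices round-robin into disjoint validation sets.
        runs.append([indices[i::num_val_splits] for i in range(num_val_splits)])
    return runs


splits = validation_splits(num_samples=10, cross_val_runs=2, num_val_splits=5)

# One training run per validation set: cross_val_runs * num_val_splits in total.
total_training_runs = sum(len(run) for run in splits)
```

Within one cross-validation run every sample index lands in exactly one validation set, which is the non-intersecting property described above.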

Parameters

layer_id (str) – ID of the layer to find embedding in; key in layer_infos

Returns

an analysis result object holding only information on this layer

Return type

AnalysisResult

best_embedding(analysis_results=None)

Conduct an analysis and from results derive the best embedding.

Parameters

analysis_results (Optional[Dict[str, Dict[int, Tuple[ConceptEmbedding, Series]]]]) – optionally the results of a previously run analysis; defaults to running a new analysis via analysis()

Returns

the determined best embedding of all layers analysed

Return type

ConceptEmbedding

best_embedding_with_logging(concept_exp_root, logger=None, file_logging_formatter=None, log_file='log.txt', img_fp_templ='{}.png', visualization_transform=None)

Conduct an analysis, collect mean and best embeddings, and save and log all results.

Saved results

Saved visualizations

  • visualization of the training data

  • visualization of the final best embedding on some test data samples

  • visualization of the best embedding and each embedding in its layer for comparison (the best embedding is a kind of mean of the embeddings from its layer)

  • visualization of the aggregated embeddings of each layer for comparison

Parameters
  • concept_exp_root (str) – the root directory in which to save results for this part

  • logger (Optional[Logger]) – the logger to use for file logging; defaults to the module level logger; for the analysis, the logging level is set to logging.INFO

  • file_logging_formatter (Optional[Formatter]) – if given, the formatter for the file logging

  • log_file (str) – the path to the logfile to use relative to concept_exp_root

  • img_fp_templ (Optional[str]) – template for the path of image files relative to concept_exp_root; must include one '{}' formatting variable

  • visualization_transform (Optional[Callable[[Any, Any], Tuple[Tensor, Tensor]]]) – a transformation applied to the tuple of concept model output and ground truth mask before visualization as mask
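How the path arguments relate can be shown with a short sketch. The root directory and the image name "best_embedding" are hypothetical; only the defaults '{}.png' and 'log.txt' come from the signature, and the way the library fills the placeholder internally is an assumption.

```python
import os

concept_exp_root = "experiments/my_concept"  # hypothetical root directory
img_fp_templ = "{}.png"                      # default from the signature
log_file = "log.txt"                         # default from the signature

# The single '{}' placeholder is filled with a per-image name (name is made up).
image_path = os.path.join(concept_exp_root, img_fp_templ.format("best_embedding"))

# The log file path is interpreted relative to concept_exp_root.
log_path = os.path.join(concept_exp_root, log_file)
```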

Returns

the found best embedding for that part

Return type

ConceptEmbedding

static best_layer_from_stats(results_per_layer)

From the embedding quality results per layer, select the best layer. For segmentation concepts, select by set IoU.

Parameters

results_per_layer (BestEmbeddingResult) – a best embedding result object

Returns

layer ID with best stats

Return type

str
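The selection rule can be sketched as an argmax over per-layer statistics. The function best_layer and the plain dict input are stand-ins: the real method takes a BestEmbeddingResult, and the metric key "set_iou" and all values are assumptions for illustration.

```python
from typing import Dict


def best_layer(stats_per_layer: Dict[str, Dict[str, float]],
               metric: str = "set_iou") -> str:
    """Hypothetical stand-in: return the layer ID with the highest metric value."""
    return max(stats_per_layer,
               key=lambda layer_id: stats_per_layer[layer_id][metric])


# Made-up statistics for three layers.
stats_per_layer = {
    "layer_a": {"set_iou": 0.43},
    "layer_b": {"set_iou": 0.51},
    "layer_c": {"set_iou": 0.47},
}
chosen = best_layer(stats_per_layer)
```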

concept_model_for_layer(layer_id)

Return a concept model for the given layer ID.

Parameters

layer_id – ID of the layer the concept model should be attached to; key in layer_infos

Returns

concept model for concept attached to given layer in model

concept_model_from_embedding(emb)

Get concept model from embedding for training and eval.

Parameters

emb (Union[ConceptEmbedding, Sequence[ConceptEmbedding]]) –

Return type

ConceptDetectionModel2D

concept_model_handle(c_model=None, emb=None, layer_id=None)

Train and eval handle for the given concept model. The concept model to handle can either be specified directly or is created from an embedding or from a given layer_id.

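Parameters
  • c_model – the concept model to handle, if specified directly

  • emb – the embedding from which to create the concept model, if given

  • layer_id – the ID of the layer for which to create the concept model, if given
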
Returns

a handle for the specified or created concept model

Return type

ConceptDetection2DTrainTestHandle

data_for_concept_model(c_model=None, layer_id=None)

Get the concept model data for this instance.

Return type

DataTriple

embedding_reduction(results_per_run)

Aggregate the embeddings collected in results_per_run to a best one. This is a wrapper with standard deviation and stats collection and logging around a call to emb_reduction().

Parameters

results_per_run (AnalysisResult) – analysis result object as returned by analysis()

Returns

a best embedding result object holding one tuple entry of

  • an aggregated (“mean”) embedding for the concept and the layer,

  • the standard deviation values of the normal vectors,

  • the stats for the chosen “mean” embedding

Return type

BestEmbeddingResult
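The kind of output described above (an aggregated vector plus standard deviations of the normal vectors) can be illustrated with a deliberately simplified sketch. It computes a plain component-wise mean and is not the MEAN_NORMALIZED_DIST reduction; the function mean_and_std and the example vectors are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def mean_and_std(normal_vectors: Sequence[Sequence[float]]
                 ) -> Tuple[List[float], List[float]]:
    """Component-wise mean and population standard deviation of the given vectors."""
    components = list(zip(*normal_vectors))  # regroup values by vector component
    return [mean(c) for c in components], [pstdev(c) for c in components]


# Normal vectors from three hypothetical runs on one layer.
vectors = [[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]]
mean_vec, std_vec = mean_and_std(vectors)
```

A large standard deviation in some component would indicate that the runs disagree on that direction of the normal vector.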

evaluate_embedding(embedding, log_prefix=None)

Evaluate the embedding on its concept test data.

fill_train_data_infos()

Collect layer-wise information about the corresponding concept model. The results are stored into layer_infos and returned as pandas.DataFrame.

Return type

DataFrame

after_layer_hook: Callable[[AnalysisResult, ConceptDetection2DTrainTestHandle], Any]

Callable that is called after each layer analysis with the analysis result and the used training handle as arguments. Can e.g. be used to clear dataset caches or store the result.
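A hook matching the documented call signature might look like the following sketch. The function store_result_hook and the simulated calls are illustrative only; the stand-in results are plain dicts rather than AnalysisResult objects, and no real training handle is involved.

```python
from typing import Any, List

collected: List[Any] = []


def store_result_hook(analysis_result: Any, train_handle: Any) -> None:
    """Example hook: keep each layer's result for later inspection."""
    collected.append(analysis_result)


# Simulated calls with stand-in results; in a real analysis the hook would be
# passed as after_layer_hook and called once per analysed layer.
for fake_result in ({"layer_a": "result_a"}, {"layer_b": "result_b"}):
    store_result_hook(fake_result, train_handle=None)
```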

concept: Concept

The concept to find the embedding for.

concept_model_args: Dict[str, Any]

Any arguments for initializing a new concept model.

cross_val_runs: int

The number of cross-validation runs to conduct for each layer. A cross-validation run consists of num_val_splits training runs with distinct validation sets. The resulting embeddings of all runs of all cross-validation runs are then used to obtain the layer’s best concept embedding.

data_args: Dict[str, Any]

Any arguments except for the concept model specifiers to the concept model data initializer. See data_for_concept_model() for details.

emb_reduction: EmbeddingReduction

Aggregation function to reduce a list of embeddings from several runs to one.

layer_infos: Dict[str, Dict[str, Any]]

Information about the layers in which to look for the best concept embedding; the keys are the layer keys in the model’s torch.nn.Module.named_modules() dict.

model: torch.nn.modules.module.Module

The model in which to find the embedding.

property num_runs: int

The total number of runs that are conducted per layer.

num_val_splits: int

The number of validation splits per cross-validation run. If set to 1, no cross-validation is conducted but simply a number of cross_val_runs training runs. See analysis_for_layer().

property settings: Dict[str, Any]

Settings dict to reproduce instance.

train_val_args: Dict[str, Any]

Any training and evaluation arguments for the concept model initialization.