ConceptAnalysis

class hybrid_learning.concepts.analysis.analysis_handle.ConceptAnalysis(concept, model, layer_infos=None, cross_val_runs=1, num_val_splits=5, emb_reduction=EmbeddingReduction.MEAN_NORMALIZED_DIST, concept_model_args=None, train_val_args=None, data_args=None, after_layer_hook=None)[source]

Bases: object

Handle for conducting a concept embedding analysis. Saves the analysis settings and can run a complete analysis.

The core methods are:

analysis(): plain analysis (collect \(\text{cross\_val\_runs}\cdot\text{num\_val\_splits}\) embeddings for each layer in layer_infos)
best_embedding(): aggregate embeddings of an analysis per layer, then choose best one
best_embedding_with_logging(): combination of the latter two with automatic logging and result saving
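
The number of embeddings collected per layer can be illustrated with a few lines of plain Python. This is only a sketch of the arithmetic stated above: the helper runs_per_layer and the layer IDs are hypothetical and not part of the library.

```python
def runs_per_layer(cross_val_runs: int = 1, num_val_splits: int = 5) -> int:
    """Hypothetical helper: embeddings collected per layer.

    Mirrors the product cross_val_runs * num_val_splits from the text;
    the defaults are those of the constructor signature.
    """
    return cross_val_runs * num_val_splits


# Made-up layer IDs, only used to show how the cost scales with layers.
layer_ids = ["features.5", "features.10"]

embeddings_per_layer = runs_per_layer(cross_val_runs=2, num_val_splits=5)
total_trainings = embeddings_per_layer * len(layer_ids)
print(embeddings_per_layer, total_trainings)
```

Since every embedding stems from one trained concept model, the total training effort grows linearly with each of the three factors.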

Public Data Attributes:

`num_runs`	The total number of runs that are conducted per layer.
`settings`	Settings dict to reproduce instance.

Public Methods:

`best_embedding`([analysis_results])	Conduct an analysis and from results derive the best embedding.
`analysis`()	Conduct a concept embedding analysis.
`analysis_for_layer`(layer_id)	Get a concept embedding of the given concept in the given layer.
`concept_model_handle`([c_model, emb, layer_id])	Train and eval handle for the given concept model.
`concept_model_for_layer`(layer_id)	Return a concept model for the given layer ID.
`concept_model_from_embedding`(emb)	Get concept model from embedding for training and eval.
`data_for_concept_model`([c_model, layer_id])	Get the concept model data for this instance.
`evaluate_embedding`(embedding[, log_prefix])	Evaluate the embedding on its concept test data.
`embedding_reduction`(results_per_run)	Aggregate the embeddings collected in `results_per_run` to a best one.
`fill_train_data_infos`()	Collect layer-wise information about the corresponding concept model.
`best_embedding_with_logging`(concept_exp_root)	Conduct an analysis, collect mean and best embeddings, and save and log all results.

Special Methods:

`__init__`(concept, model[, layer_infos, ...])	Init.
`__repr__`()	Return repr(self).

Parameters

concept (Concept) –
model (Module) –
layer_infos (Union[Dict[str, Dict[str, Any]], Sequence[str]]) –
cross_val_runs (int) –
num_val_splits (int) –
emb_reduction (EmbeddingReduction) –
concept_model_args (Dict[str, Any]) –
train_val_args (Dict[str, Any]) –
data_args (Dict[str, Any]) –
after_layer_hook (Callable[[AnalysisResult, ConceptDetection2DTrainTestHandle], Any]) –

__init__(concept, model, layer_infos=None, cross_val_runs=1, num_val_splits=5, emb_reduction=EmbeddingReduction.MEAN_NORMALIZED_DIST, concept_model_args=None, train_val_args=None, data_args=None, after_layer_hook=None)[source]

Init.

Parameters

concept (Concept) – concept to find the embedding of
model (Module) – the DNN
layer_infos (Optional[Union[Dict[str, Dict[str, Any]], Sequence[str]]]) –
information about the layers in which to look for the best concept embedding; it may be given either as a sequence of layer IDs or as a dict whose keys are the layer keys in the model’s torch.nn.Module.named_modules() dict; the following keys of the per-layer dicts are used:
- kernel_size: fixed kernel size to use for this layer (overrides value from concept_model_args)
- lr: learning rate to use
num_val_splits (int) – the number of validation splits to use for each cross-validation run
cross_val_runs (int) – for a layer, several concept models are trained in different runs; the runs differ in model initialization and in the validation data split; cross_val_runs is the number of cross-validation runs, i.e. collections of runs with num_val_splits distinct validation sets each
emb_reduction (EmbeddingReduction) – aggregation function to reduce list of embeddings to one
concept_model_args (Optional[Dict[str, Any]]) – dict with arguments for the concept model initialization
train_val_args (Optional[Dict[str, Any]]) – any further arguments to initialize the concept model handle; a loss and a metric are added by default
data_args (Optional[Dict[str, Any]]) – any further arguments to initialize the training and eval data tuple using data_for_concept_model()
after_layer_hook (Optional[Callable[[AnalysisResult, ConceptDetection2DTrainTestHandle], Any]]) – see after_layer_hook
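
The two accepted forms of layer_infos can be brought into one dict form as sketched below. The function normalize_layer_infos is a hypothetical illustration, and the assumption that a plain layer ID corresponds to an empty per-layer settings dict is not confirmed by this page.

```python
from typing import Any, Dict, Sequence, Union

LayerInfos = Union[Dict[str, Dict[str, Any]], Sequence[str]]


def normalize_layer_infos(layer_infos: LayerInfos) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: return layer_infos as {layer_id: settings}.

    Assumption: a bare layer ID is treated like a layer with no
    overrides (no fixed kernel_size, no layer-specific lr).
    """
    if isinstance(layer_infos, dict):
        # Copy the per-layer dicts so the caller's input is not mutated.
        return {layer_id: dict(info) for layer_id, info in layer_infos.items()}
    return {layer_id: {} for layer_id in layer_infos}


# Sequence form: only layer IDs (made-up names).
print(normalize_layer_infos(["features.5", "features.10"]))

# Dict form: per-layer overrides using the documented keys.
print(normalize_layer_infos({"features.5": {"kernel_size": (3, 3), "lr": 0.001}}))
```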

__repr__()[source]: Return repr(self).

analysis()[source]

Conduct a concept embedding analysis.

For each layer in layer_infos:

train cross_val_runs x num_val_splits concept models,
collect their evaluation results,
convert them to embeddings.

Returns: an analysis result object holding a dictionary of {layer_id: {run: (embedding, pandas.Series with {pre_: metric_val})}}
Return type: AnalysisResult

analysis_for_layer(layer_id)[source]

Get a concept embedding of the given concept in the given layer. A total of cross_val_runs cross-validation runs are conducted, each with num_val_splits non-intersecting splits for the validation data. If num_val_splits is 1, no cross-validation is done and just cross_val_runs normal training runs are conducted.

After the analysis is completed, after_layer_hook is called.

Parameters: layer_id (str) – ID of the layer to find embedding in; key in layer_infos
Returns: an analysis result object holding only information on this layer
Return type: AnalysisResult
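
The run scheme described above (several cross-validation runs, each with non-intersecting validation splits) can be sketched in plain Python as follows. This is not the library's implementation: run_splits, the shuffling, and the seed parameter are assumptions. In particular, this page does not state which validation data a plain training run uses when num_val_splits is 1, so the sketch simply leaves the validation list empty in that case.

```python
import random
from typing import Iterator, List, Tuple


def run_splits(num_samples: int, cross_val_runs: int = 1, num_val_splits: int = 5,
               seed: int = 0) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (cv_run, split, train_indices, val_indices) for every run."""
    rng = random.Random(seed)
    for cv_run in range(cross_val_runs):
        indices = list(range(num_samples))
        rng.shuffle(indices)  # each cross-validation run gets its own shuffle
        if num_val_splits == 1:
            # Assumption: plain training run without a dedicated validation split.
            yield cv_run, 0, indices, []
            continue
        # Striding yields num_val_splits disjoint folds covering all indices.
        folds = [indices[i::num_val_splits] for i in range(num_val_splits)]
        for split, val in enumerate(folds):
            val_set = set(val)
            train = [i for i in indices if i not in val_set]
            yield cv_run, split, train, val


for cv_run, split, train, val in run_splits(10, cross_val_runs=2, num_val_splits=5):
    print(cv_run, split, sorted(val))
```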

best_embedding(analysis_results=None)[source]

Conduct an analysis and from results derive the best embedding.

Parameters: analysis_results (Optional[Dict[str, Dict[int, Tuple[ConceptEmbedding, Series]]]]) – optionally the results of a previously run analysis; defaults to running a new analysis via analysis()
Returns: the determined best embedding of all layers analysed
Return type: ConceptEmbedding

best_embedding_with_logging(concept_exp_root, logger=None, file_logging_formatter=None, log_file='log.txt', img_fp_templ='{}.png', visualization_transform=None)[source]

Conduct an analysis, collect mean and best embeddings, and save and log all results.

Saved results

the embedding of each layer and run as .pt file; for format see hybrid_learning.concepts.models.embeddings.ConceptEmbedding.save(); load with hybrid_learning.concepts.models.embeddings.ConceptEmbedding.load()
the aggregated (best) embedding for each layer (see above)
the final best embedding amongst all layers (chosen from above best embeddings; see above)
statistics of the runs for each layer, including evaluation results and information on the final embedding obtained by each run; for format see save(); load with load()
statistics for the aggregated (best) embeddings; for format see save()

Saved visualizations

visualization of the training data
visualization of the final best embedding on some test data samples
visualization of the best embedding and each embedding in its layer for comparison (the best embedding is a kind of mean of the embeddings from its layer)
visualization of the aggregated embeddings of each layer for comparison

Parameters

concept_exp_root (str) – the root directory in which to save results for this part
logger (Optional[Logger]) – the logger to use for file logging; defaults to the module level logger; for the analysis, the logging level is set to logging.INFO
file_logging_formatter (Optional[Formatter]) – if given, the formatter for the file logging
log_file (str) – the path to the logfile to use relative to concept_exp_root
img_fp_templ (Optional[str]) – template for the path of image files relative to concept_exp_root; must include one '{}' formatting variable
visualization_transform (Optional[Callable[[Any, Any], Tuple[Tensor, Tensor]]]) – a transformation applied to the tuple of concept model output and ground truth mask before visualization as mask

Returns

the found best embedding for that part

Return type

ConceptEmbedding
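
How log_file and img_fp_templ relate to concept_exp_root can be sketched as below. The function resolve_output_paths and the image name are hypothetical; the names the library actually inserts into the template are not listed on this page.

```python
import os
from typing import Dict, Iterable, Tuple


def resolve_output_paths(concept_exp_root: str, image_names: Iterable[str],
                         log_file: str = "log.txt",
                         img_fp_templ: str = "{}.png") -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: build paths relative to concept_exp_root."""
    # The template must contain exactly one '{}' formatting variable.
    if img_fp_templ.count("{}") != 1:
        raise ValueError("img_fp_templ must include one '{}' formatting variable")
    log_path = os.path.join(concept_exp_root, log_file)
    image_paths = {name: os.path.join(concept_exp_root, img_fp_templ.format(name))
                   for name in image_names}
    return log_path, image_paths


# "best_embedding" is a made-up image name for illustration.
log_path, image_paths = resolve_output_paths("experiments/my_concept", ["best_embedding"])
print(log_path)
print(image_paths["best_embedding"])
```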

static best_layer_from_stats(results_per_layer)[source]

From the embedding quality results per layer, select the best layer. For segmentation concepts, select by set IoU.

Parameters: results_per_layer (BestEmbeddingResult) – a best embedding result object
Returns: layer ID with best stats
Return type: str
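
Selecting a layer by its set IoU amounts to an argmax over the per-layer statistics. The sketch below uses a plain dict as a stand-in for the BestEmbeddingResult object; the key name "set_iou", the layer IDs, and the values are hypothetical.

```python
from typing import Dict


def best_layer(stats_per_layer: Dict[str, Dict[str, float]],
               metric: str = "set_iou") -> str:
    """Return the layer ID whose stats have the highest value for metric."""
    return max(stats_per_layer, key=lambda layer_id: stats_per_layer[layer_id][metric])


# Made-up statistics for two made-up layers.
stats = {
    "features.5": {"set_iou": 0.41},
    "features.10": {"set_iou": 0.57},
}
print(best_layer(stats))
```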

concept_model_for_layer(layer_id)[source]

Return a concept model for the given layer ID.

Parameters: layer_id – ID of the layer the concept model should be attached to; key in layer_infos
Returns: concept model for concept attached to given layer in model

concept_model_from_embedding(emb)[source]

Get concept model from embedding for training and eval.

Parameters: emb (Union[ConceptEmbedding, Sequence[ConceptEmbedding]]) –
Return type: ConceptDetectionModel2D

concept_model_handle(c_model=None, emb=None, layer_id=None)[source]

Train and eval handle for the given concept model. The concept model to handle can either be specified directly or be created from an embedding or from a given layer_id.

Parameters

c_model (Optional[ConceptDetectionModel2D]) – the concept model to provide a handle for
emb (Optional[Union[ConceptEmbedding, Sequence[ConceptEmbedding]]]) – if c_model is not given, it is initialized using concept_model_from_embedding() on emb
layer_id (Optional[str]) – if neither c_model nor emb is given, the concept model is initialized using concept_model_for_layer() on layer_id

Returns

a handle for the specified or created concept model

Return type

ConceptDetection2DTrainTestHandle
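
The precedence of the three arguments can be sketched as a simple fallback chain. The function resolve_concept_model is hypothetical, and the two callables stand in for concept_model_from_embedding() and concept_model_for_layer(). Raising an error when nothing is given is an assumption; this page does not document that case.

```python
from typing import Any, Callable, Optional


def resolve_concept_model(c_model: Optional[Any] = None,
                          emb: Optional[Any] = None,
                          layer_id: Optional[str] = None, *,
                          from_embedding: Callable[[Any], Any],
                          for_layer: Callable[[str], Any]) -> Any:
    """Pick the concept model in the documented order of precedence."""
    if c_model is not None:
        return c_model               # 1. directly specified model
    if emb is not None:
        return from_embedding(emb)   # 2. model created from an embedding
    if layer_id is not None:
        return for_layer(layer_id)   # 3. fresh model for the given layer
    # Assumption: behavior for "nothing given" is not documented.
    raise ValueError("one of c_model, emb, or layer_id must be given")


# String-building stand-ins make the chosen branch visible.
model = resolve_concept_model(
    layer_id="features.5",
    from_embedding=lambda e: "model from embedding",
    for_layer=lambda l: "model for layer " + l,
)
print(model)
```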

data_for_concept_model(c_model=None, layer_id=None)[source]

Get the concept model data for this instance.

Parameters

c_model (Optional[ConceptDetectionModel2D]) –
layer_id (Optional[str]) –

Return type

DataTriple

embedding_reduction(results_per_run)[source]

Aggregate the embeddings collected in results_per_run to a best one. This is a wrapper with standard deviation and stats collection and logging around a call to emb_reduction().

Parameters

results_per_run (AnalysisResult) – analysis result object as returned by analysis()

Returns

a best embedding result object holding one tuple entry of

an aggregated (“mean”) embedding for the concept and the layer,
the standard deviation values of the normal vectors,
the stats for the chosen “mean” embedding

Return type

BestEmbeddingResult
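
To give an intuition for the first two tuple entries, the sketch below computes an element-wise mean and standard deviation over a list of normal vectors. It deliberately uses a plain arithmetic mean, which is not the library's EmbeddingReduction.MEAN_NORMALIZED_DIST and is only a stand-in; the function mean_and_std, the input vectors, and the choice of the population standard deviation are made up for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def mean_and_std(normal_vectors: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Element-wise mean and population standard deviation of the vectors."""
    # zip(*...) regroups the values by vector component.
    components = list(zip(*normal_vectors))
    return [mean(c) for c in components], [pstdev(c) for c in components]


# Three made-up normal vectors, e.g. one per training run.
vectors = [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
mean_vec, std_vec = mean_and_std(vectors)
print(mean_vec, std_vec)
```

A large standard deviation in some component indicates that the runs disagree on that direction of the normal vector.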

evaluate_embedding(embedding, log_prefix=None)[source]

Evaluate the embedding on its concept test data.

Parameters

embedding (Union[ConceptEmbedding, Sequence[ConceptEmbedding]]) –
log_prefix (Optional[str]) –

fill_train_data_infos()[source]

Collect layer-wise information about the corresponding concept model. The results are stored into layer_infos and returned as pandas.DataFrame.

Return type: DataFrame

after_layer_hook: Callable[[AnalysisResult, ConceptDetection2DTrainTestHandle], Any]: Callable that is called after each layer analysis with the analysis result and the used training handle as arguments. Can e.g. be used to clear dataset caches or store the result.
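
A hook only needs to accept the two documented arguments. The following sketch shows one possible hook that stores each layer's result; the class CollectResultsHook is hypothetical, and the string and object passed in the demo call are placeholders for an AnalysisResult and a ConceptDetection2DTrainTestHandle.

```python
from typing import Any, List


class CollectResultsHook:
    """Hypothetical hook: remember the analysis result of every layer."""

    def __init__(self) -> None:
        self.results: List[Any] = []

    def __call__(self, analysis_result: Any, handle: Any) -> None:
        # The handle is unused here; a real hook could use it,
        # e.g. to clear dataset caches.
        self.results.append(analysis_result)


hook = CollectResultsHook()
hook("result of layer features.5", object())  # placeholder arguments
print(len(hook.results))
```

An instance of such a class would be passed as after_layer_hook to the constructor.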

concept: Concept: The concept to find the embedding for.

concept_model_args: Dict[str, Any]: Any arguments for initializing a new concept model.

cross_val_runs: int: The number of cross-validation runs to conduct for each layer. A cross-validation run consists of num_val_splits training runs with distinct validation sets. The resulting embeddings of all runs of all cross-validation runs are then used to obtain the layer’s best concept embedding.

data_args: Dict[str, Any]: Any arguments except for the concept model specifiers to the concept model data initializer. See data_for_concept_model() for details.

emb_reduction: EmbeddingReduction: Aggregation function to reduce a list of embeddings from several runs to one.

layer_infos: Dict[str, Dict[str, Any]]: Information about the layers in which to look for the best concept embedding; the indices are the layer keys in the model’s torch.nn.Module.named_modules() dict

model: torch.nn.modules.module.Module: The model in which to find the embedding.

property num_runs: int: The total number of runs that are conducted per layer.

num_val_splits: int: The number of validation splits per cross-validation run. If set to 1, no cross-validation is conducted but simply a number of cross_val_runs training runs. See analysis_for_layer().

property settings: Dict[str, Any]: Settings dict to reproduce instance.

train_val_args: Dict[str, Any]: Any training and evaluation arguments for the concept model initialization.