ConceptAnalysis
- class hybrid_learning.concepts.analysis.analysis_handle.ConceptAnalysis(concept, model, layer_infos=None, cross_val_runs=1, num_val_splits=5, emb_reduction=EmbeddingReduction.MEAN_NORMALIZED_DIST, concept_model_args=None, train_val_args=None, data_args=None, after_layer_hook=None)[source]
Bases:
object
Handle for conducting a concept embedding analysis. Saves the analysis settings and can run a complete analysis.
The core methods are:
analysis()
: plain analysis (collect \(\text{cross_val_runs}\cdot\text{num_val_splits}\) embeddings for each layer in layer_infos)
best_embedding()
: aggregate embeddings of an analysis per layer, then choose best one
best_embedding_with_logging()
: combination of the latter two with automatic logging and result saving
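As a small illustration of the run count mentioned for analysis(), the following sketch computes how many embeddings would be collected. The values and layer IDs are made up; only the product of cross_val_runs and num_val_splits per layer is taken from the description above.

```python
# Hypothetical stand-ins for the constructor arguments of the same names.
cross_val_runs = 2
num_val_splits = 5
layer_ids = ["layer1", "layer2"]  # made-up layer IDs

# One embedding is collected per (cross-validation run, validation split) pair.
runs_per_layer = cross_val_runs * num_val_splits
total_embeddings = runs_per_layer * len(layer_ids)
```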
Public Data Attributes:
The total number of runs that are conducted per layer.
Settings dict to reproduce instance.
Public Methods:
best_embedding([analysis_results]): Conduct an analysis and from results derive the best embedding.
analysis(): Conduct a concept embedding analysis.
analysis_for_layer(layer_id): Get a concept embedding of the given concept in the given layer.
concept_model_handle([c_model, emb, layer_id]): Train and eval handle for the given concept model.
concept_model_for_layer(layer_id): Return a concept model for the given layer ID.
concept_model_from_embedding(emb): Get concept model from embedding for training and eval.
data_for_concept_model([c_model, layer_id]): Get the concept model data for this instance.
evaluate_embedding(embedding[, log_prefix]): Evaluate the embedding on its concept test data.
embedding_reduction(results_per_run): Aggregate the embeddings collected in results_per_run to a best one.
fill_train_data_infos(): Collect layer-wise information about the corresponding concept model.
best_embedding_with_logging(concept_exp_root): Conduct an analysis, collect mean and best embeddings, and save and log all results.
- __init__(concept, model, layer_infos=None, cross_val_runs=1, num_val_splits=5, emb_reduction=EmbeddingReduction.MEAN_NORMALIZED_DIST, concept_model_args=None, train_val_args=None, data_args=None, after_layer_hook=None)[source]
Init.
- Parameters
concept (Concept) – concept to find the embedding of
model (Module) – the DNN
layer_infos (Optional[Union[Dict[str, Dict[str, Any]], Sequence[str]]]) – information about the layers in which to look for the best concept embedding; it may be given either as a sequence of layer IDs or as a dict where the indices are the layer keys in the model’s torch.nn.Module.named_modules() dict; used keys:
kernel_size: fixed kernel size to use for this layer (overrides value from concept_model_args)
lr: learning rate to use
num_val_splits (int) – the number of validation splits to use for each cross-validation run
cross_val_runs (int) – for a layer, several concept models are trained in different runs; the runs differ by model initialization and validation data split; cross_val_runs is the number of cross-validation runs, i.e. collections of runs with num_val_splits distinct validation sets each
emb_reduction (EmbeddingReduction) – aggregation function to reduce list of embeddings to one
concept_model_args (Optional[Dict[str, Any]]) – dict with arguments for the concept model initialization
train_val_args (Optional[Dict[str, Any]]) – any further arguments to initialize the concept model handle; a loss and a metric are added by default
data_args (Optional[Dict[str, Any]]) – any further arguments to initialize the training and eval data tuple using
data_for_concept_model()
after_layer_hook (Optional[Callable[[AnalysisResult, ConceptDetection2DTrainTestHandle], Any]]) – see
after_layer_hook
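Since layer_infos may be given either as a sequence of layer IDs or as a dict, the two forms can be brought into one shape. The sketch below uses a hypothetical helper, normalize_layer_infos, which is not part of the documented API; the layer IDs and the tuple format of kernel_size are assumptions for illustration only.

```python
from typing import Any, Dict, Sequence, Union

LayerInfos = Dict[str, Dict[str, Any]]


def normalize_layer_infos(layer_infos: Union[LayerInfos, Sequence[str]]) -> LayerInfos:
    """Hypothetical helper: map both accepted input forms to a dict of per-layer settings."""
    if isinstance(layer_infos, dict):
        # Copy the per-layer settings so the caller's dict is not modified later.
        return {layer_id: dict(info) for layer_id, info in layer_infos.items()}
    # A plain sequence of layer IDs carries no per-layer overrides.
    return {layer_id: {} for layer_id in layer_infos}


as_sequence = normalize_layer_infos(["features.5", "features.10"])
as_dict = normalize_layer_infos({"features.5": {"kernel_size": (3, 3), "lr": 0.001}})
```

In the dict form, the per-layer keys kernel_size and lr are the ones named in the parameter description; whether the library normalizes its input in this way is not stated in the source.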
- analysis()[source]
Conduct a concept embedding analysis.
For each layer in layer_infos:
train cross_val_runs x num_val_splits concept models,
collect their evaluation results,
convert them to embeddings.
- Returns
an analysis result object holding a dictionary of
{layer_id: {run: (embedding, pandas.Series with {pre_: metric_val})}}
- Return type
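The nested layout of the returned dictionary can be pictured with placeholder values. In this sketch a string stands in for each embedding and a plain dict stands in for the pandas.Series of metric values; the metric name test_set_iou and all numbers are made up.

```python
# Placeholder objects: a string stands in for a ConceptEmbedding and a plain
# dict stands in for the pandas.Series of metric values (metric names are made up).
results = {
    "layer1": {
        0: ("embedding_run_0", {"test_set_iou": 0.61}),
        1: ("embedding_run_1", {"test_set_iou": 0.64}),
    },
    "layer2": {
        0: ("embedding_run_0", {"test_set_iou": 0.70}),
    },
}

# Walk the structure: layer ID -> run index -> (embedding, stats).
flat = [
    (layer_id, run, stats["test_set_iou"])
    for layer_id, runs in results.items()
    for run, (embedding, stats) in runs.items()
]
```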
- analysis_for_layer(layer_id)[source]
Get a concept embedding of the given concept in the given layer. A number of cross_val_runs cross-validation runs is conducted, each with num_val_splits non-intersecting splits for the validation data. In case num_val_splits is 1, just cross_val_runs normal training runs are conducted. After the analysis is completed, after_layer_hook is called.
- Parameters
layer_id (str) – ID of the layer to find embedding in; key in
layer_infos
- Returns
an analysis result object holding only information on this layer
- Return type
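The run scheme described for analysis_for_layer() can be sketched with plain index lists. This is a simplified illustration, not the library's implementation: validation_splits is a hypothetical helper, shuffling and model initialization are omitted, and the handling of validation data when num_val_splits is 1 is an assumption.

```python
from typing import List, Tuple


def validation_splits(num_samples: int, num_val_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, val_indices) pairs with pairwise disjoint validation sets."""
    indices = list(range(num_samples))
    if num_val_splits == 1:
        # Assumption: without cross-validation, no samples are held out here.
        return [(indices, [])]
    splits = []
    for k in range(num_val_splits):
        val = indices[k::num_val_splits]  # every num_val_splits-th sample, offset k
        val_set = set(val)
        train = [i for i in indices if i not in val_set]
        splits.append((train, val))
    return splits


cross_val_runs = 2
# One training run per (cross-validation run, validation split) pair.
runs = [
    (cv_run, split_idx, train, val)
    for cv_run in range(cross_val_runs)
    for split_idx, (train, val) in enumerate(validation_splits(10, 5))
]
```

With 2 cross-validation runs and 5 splits, the list holds one entry per training run of the layer.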
- best_embedding(analysis_results=None)[source]
Conduct an analysis and from results derive the best embedding.
- Parameters
analysis_results (Optional[Dict[str, Dict[int, Tuple[ConceptEmbedding, Series]]]]) – optionally the results of a previously run analysis; defaults to running a new analysis via
analysis()
- Returns
the determined best embedding of all layers analysed
- Return type
- best_embedding_with_logging(concept_exp_root, logger=None, file_logging_formatter=None, log_file='log.txt', img_fp_templ='{}.png', visualization_transform=None)[source]
Conduct an analysis, collect mean and best embeddings, and save and log all results.
Saved results
the embedding of each layer and run as .pt file; for format see hybrid_learning.concepts.models.embeddings.ConceptEmbedding.save(); load with hybrid_learning.concepts.models.embeddings.ConceptEmbedding.load()
the aggregated (best) embedding for each layer (see above)
the final best embedding amongst all layers (chosen from above best embeddings; see above)
statistics of the runs for each layer incl. evaluation results and infos on final embedding obtained by each run; for format see save(); load with load()
statistics for the aggregated (best) embeddings; for format see save()
Saved visualizations
visualization of the training data
visualization of the final best embedding on some test data samples
visualization of the best embedding and each embedding in its layer for comparison (the best embedding is a kind of mean of the embeddings from its layer)
visualization of the aggregated embeddings of each layer for comparison
- Parameters
concept_exp_root (str) – the root directory in which to save results for this part
logger (Optional[Logger]) – the logger to use for file logging; defaults to the module level logger; for the analysis, the logging level is set to
logging.INFO
file_logging_formatter (Optional[Formatter]) – if given, the formatter for the file logging
log_file (str) – the path to the logfile to use relative to
concept_exp_root
img_fp_templ (Optional[str]) – template for the path of image files relative to concept_exp_root; must include one '{}' formatting variable
visualization_transform (Optional[Callable[[Any, Any], Tuple[Tensor, Tensor]]]) – a transformation applied to the tuple of concept model output and ground truth mask before visualization as mask
- Returns
the found best embedding for that part
- Return type
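How log_file and img_fp_templ relate to concept_exp_root can be shown with the standard library alone. The sketch below is not the method's implementation: the logger name, the format string, and the image name best_embedding are made up, and a temporary directory stands in for a real experiment root.

```python
import logging
import os
import tempfile

concept_exp_root = tempfile.mkdtemp()  # stand-in for a real experiment root
log_file = "log.txt"
img_fp_templ = "{}.png"

# Both paths are interpreted relative to concept_exp_root.
log_path = os.path.join(concept_exp_root, log_file)
img_path = os.path.join(concept_exp_root, img_fp_templ.format("best_embedding"))

logger = logging.getLogger("concept_analysis_example")  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.info("analysis started")

# Detach and close the handler so the log file is flushed and released.
logger.removeHandler(handler)
handler.close()
```

The single '{}' in the template is filled with the name of each image, which is why the template must contain exactly one formatting variable.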
- static best_layer_from_stats(results_per_layer)[source]
From the embedding quality results per layer, select the best layer. For segmentation concepts, select by set IoU.
- Parameters
results_per_layer (BestEmbeddingResult) – a best embedding result object
- Returns
layer ID with best stats
- Return type
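Selecting a layer by set IoU amounts to taking the maximum over the per-layer stats. In this sketch the stats are plain dicts with made-up values, and the key name set_iou is an assumption; the actual BestEmbeddingResult object and its column names may differ.

```python
# Made-up stats per layer; "set_iou" is an assumed key name for the set IoU metric.
stats_per_layer = {
    "layer1": {"set_iou": 0.58},
    "layer2": {"set_iou": 0.71},
    "layer3": {"set_iou": 0.66},
}


def best_layer(stats):
    """Return the layer ID whose stats have the highest set IoU."""
    return max(stats, key=lambda layer_id: stats[layer_id]["set_iou"])


best = best_layer(stats_per_layer)
```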
- concept_model_for_layer(layer_id)[source]
Return a concept model for the given layer ID.
- Parameters
layer_id – ID of the layer the concept model should be attached to; key in
layer_infos
- Returns
- concept_model_from_embedding(emb)[source]
Get concept model from embedding for training and eval.
- Parameters
emb (Union[ConceptEmbedding, Sequence[ConceptEmbedding]]) –
- Return type
- concept_model_handle(c_model=None, emb=None, layer_id=None)[source]
Train and eval handle for the given concept model. The concept model to handle can either be specified directly or is created from an embedding or from a given layer_id.
- Parameters
c_model (Optional[ConceptDetectionModel2D]) – the concept model to provide a handle for
emb (Optional[Union[ConceptEmbedding, Sequence[ConceptEmbedding]]]) – if c_model is not given, it is initialized using concept_model_from_embedding() on emb
layer_id (Optional[str]) – if c_model and emb are not given, it is initialized using concept_model_for_layer() on layer_id
- Returns
a handle for the specified or created concept model
- Return type
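The fallback order among c_model, emb, and layer_id can be sketched as a small resolver. The function resolve_concept_model and the two stand-in factories are hypothetical; raising ValueError when nothing is given is an assumption, since the source does not say what happens in that case.

```python
def resolve_concept_model(c_model=None, emb=None, layer_id=None,
                          from_embedding=None, for_layer=None):
    """Hypothetical sketch of the fallback order: c_model, then emb, then layer_id."""
    if c_model is not None:
        return c_model
    if emb is not None:
        return from_embedding(emb)
    if layer_id is not None:
        return for_layer(layer_id)
    # Assumption: the documented method may behave differently here.
    raise ValueError("one of c_model, emb, or layer_id must be given")


def from_emb(emb):
    """Stand-in for concept_model_from_embedding(); only records the path taken."""
    return ("from_embedding", emb)


def for_layer(layer_id):
    """Stand-in for concept_model_for_layer(); only records the path taken."""
    return ("for_layer", layer_id)


direct = resolve_concept_model(c_model="given_model", from_embedding=from_emb, for_layer=for_layer)
via_emb = resolve_concept_model(emb="emb0", from_embedding=from_emb, for_layer=for_layer)
via_layer = resolve_concept_model(layer_id="layer1", from_embedding=from_emb, for_layer=for_layer)
```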
- data_for_concept_model(c_model=None, layer_id=None)[source]
Get the concept model data for this instance.
- Parameters
c_model (Optional[ConceptDetectionModel2D]) –
- Return type
- embedding_reduction(results_per_run)[source]
Aggregate the embeddings collected in results_per_run to a best one. This is a wrapper with standard deviation and stats collection and logging around a call to emb_reduction().
- Parameters
results_per_run (AnalysisResult) – analysis result object as returned by
analysis()
- Returns
a best embedding result object holding one tuple entry of
an aggregated (“mean”) embedding for the concept and the layer,
the standard deviation values of the normal vectors,
the stats for the chosen “mean” embedding
- Return type
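To make the idea of a "mean" embedding with standard deviation values concrete, the sketch below averages normalized normal vectors component-wise. This is a simplification under stated assumptions: it is a plain mean of unit vectors, not necessarily what EmbeddingReduction.MEAN_NORMALIZED_DIST computes, and the vectors are made up.

```python
import math
from statistics import mean, pstdev


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Made-up normal vectors from three runs in one layer.
normal_vecs = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
unit_vecs = [normalize(v) for v in normal_vecs]

# Component-wise mean and (population) standard deviation over the runs.
mean_vec = [mean(component) for component in zip(*unit_vecs)]
std_vec = [pstdev(component) for component in zip(*unit_vecs)]
```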
- evaluate_embedding(embedding, log_prefix=None)[source]
Evaluate the embedding on its concept test data.
- Parameters
embedding (Union[ConceptEmbedding, Sequence[ConceptEmbedding]]) –
- fill_train_data_infos()[source]
Collect layer-wise information about the corresponding concept model. The results are stored into layer_infos and returned as pandas.DataFrame.
- Return type
DataFrame
- after_layer_hook: Callable[[AnalysisResult, ConceptDetection2DTrainTestHandle], Any]
Callable that is called after each layer analysis with the analysis result and the used training handle as arguments. Can e.g. be used to clear dataset caches or store the result.
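A hook of this shape could look as follows. The sketch only assumes the two-argument call signature given above; the cache attribute on the handle and the DummyHandle class are made up for illustration and do not reflect the real training handle.

```python
collected = []


def store_result_hook(analysis_result, training_handle):
    """Example hook: keep the per-layer result and clear a (hypothetical) cache."""
    collected.append(analysis_result)
    cache = getattr(training_handle, "cache", None)  # "cache" is a made-up attribute
    if cache is not None:
        cache.clear()


class DummyHandle:
    """Stand-in for the training handle passed to the hook."""

    def __init__(self):
        self.cache = {"sample_0": "activation"}


handle = DummyHandle()
store_result_hook({"layer1": {}}, handle)
```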
- cross_val_runs: int
The number of cross-validation runs to conduct for each layer. A cross-validation run consists of
num_val_splits
training runs with distinct validation sets. The resulting embeddings of all runs of all cross-validation runs are then used to obtain the layer’s best concept embedding.
- data_args: Dict[str, Any]
Any arguments except for the concept model specifiers to the concept model data initializer. See
data_for_concept_model()
for details.
- emb_reduction: EmbeddingReduction
Aggregation function to reduce a list of embeddings from several runs to one.
- layer_infos: Dict[str, Dict[str, Any]]
Information about the layers in which to look for the best concept embedding; the indices are the layer keys in the model’s
torch.nn.Module.named_modules()
dict
- model: torch.nn.modules.module.Module
The model in which to find the embedding.
- num_val_splits: int
The number of validation splits per cross-validation run. If set to 1, no cross-validation is conducted but simply a number of cross_val_runs training runs. See analysis_for_layer().