flatten_cols

hybrid_learning.experimentation.exp_eval_common.flatten_cols(df, columns, keep_columns=())

If items in a dataframe are iterables, flatten them. Flattening here means:

>>> df = pd.DataFrame({'a': {'idx1': (1,2), 'idx2': (3,)},
...                    'b': {'idx1': ('x', 'y'), 'idx2': ('z',)},
...                    'keep': {'idx1': 'k1', 'idx2': 'k2'}})
>>> flatten_cols(df, columns=['a', 'b'], keep_columns=['keep'])
   orig_idx  a  b  keep
0         0  1  x    k1
1         0  2  y    k1
2         1  3  z    k2

Parameters

df (DataFrame) – the dataframe to create a modified version of
columns (Sequence[str]) – the columns to include in the new frame, with their values flattened
keep_columns (Sequence[str]) – columns whose entries are copied from each original row to every row newly created from that row

Returns

a new dataframe new_df with columns ['orig_idx', *columns, *keep_columns] such that, for each index idx of df, each col in columns, and each position i: df.loc[idx, col][i] == new_df[new_df['orig_idx']==idx][col].iloc[i]; and, for each index new_idx of new_df and each kcol in keep_columns: new_df.loc[new_idx, kcol] == df.loc[new_df.loc[new_idx, 'orig_idx'], kcol]

Return type

DataFrame
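
To make the contract above concrete, here is a minimal sketch of a function with the same documented behaviour. It is not the library's implementation: the name flatten_cols_sketch is hypothetical, and it assumes that all flattened columns hold iterables of equal length within a row. It also assumes that orig_idx stores the original index label, as the Returns description implies. The printed example shows integers in that column, so the real function may differ on this point.

```python
from typing import Sequence

import pandas as pd


def flatten_cols_sketch(df: pd.DataFrame, columns: Sequence[str],
                        keep_columns: Sequence[str] = ()) -> pd.DataFrame:
    """Hypothetical re-implementation for illustration only."""
    rows = []
    for idx, row in df.iterrows():
        # zip pairs the i-th items of every flattened column in this row
        # (assumption: the iterables in one row all have the same length).
        for values in zip(*(row[col] for col in columns)):
            new_row = {'orig_idx': idx}  # assumption: keep the original index label
            new_row.update(zip(columns, values))
            # Copy the kept entries unchanged into every newly created row.
            new_row.update({kcol: row[kcol] for kcol in keep_columns})
            rows.append(new_row)
    return pd.DataFrame(rows, columns=['orig_idx', *columns, *keep_columns])


df = pd.DataFrame({'a': {'idx1': (1, 2), 'idx2': (3,)},
                   'b': {'idx1': ('x', 'y'), 'idx2': ('z',)},
                   'keep': {'idx1': 'k1', 'idx2': 'k2'}})
new_df = flatten_cols_sketch(df, columns=['a', 'b'], keep_columns=['keep'])

# Check the first documented invariant for every original row and column.
for idx in df.index:
    for col in ['a', 'b']:
        flattened = new_df[new_df['orig_idx'] == idx][col]
        for i, item in enumerate(df.loc[idx, col]):
            assert flattened.iloc[i] == item
```

The row-by-row loop favours readability over speed. For large frames, a vectorised approach such as pandas' DataFrame.explode may be worth considering instead.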