fusilli.fusionmodels.tabularfusion.mcvae_model

This module implements the MCVAE (multi-channel variational autoencoder) model for fusing two types of tabular data.

Functions

mcvae_early_stopping_tol(patience, ...[, ...])

Simple early stopping function for the MCVAE model's training.

Classes

MCVAESubspaceMethod(datamodule[, k, ...])

Class for creating the MCVAE (multi-channel variational autoencoder) joint latent space.

MCVAE_tab(prediction_task, data_dims, ...)

This class implements a model that fuses the two types of tabular data using the MCVAE approach.

class MCVAESubspaceMethod(datamodule, k=None, max_epochs=5000, train_subspace=True)

Bases: object

Class for creating the MCVAE (multi-channel variational autoencoder) joint latent space.

If you want to change the tolerance or patience for early stopping, you can do so by adding extra keyword arguments to the function prepare_fusion_data. For example:

"mcvae_patience": value, "mcvae_tolerance": value,

where value is the number of epochs to wait (for mcvae_patience) or the loss tolerance (for mcvae_tolerance).
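
As a minimal sketch of how these two keys might be supplied, the snippet below collects them in a dictionary and unpacks it as keyword arguments. Note that prepare_fusion_data_stub is a hypothetical stand-in written for this example, not fusilli's real prepare_fusion_data (whose other arguments are omitted here), and the values 10 and 3 are arbitrary.

```python
# Extra keyword arguments controlling MCVAE early stopping.
# The key names come from the documentation above; the values are arbitrary.
mcvae_kwargs = {
    "mcvae_patience": 10,   # number of epochs to wait before stopping
    "mcvae_tolerance": 3,   # tolerance on the loss
}


def prepare_fusion_data_stub(**kwargs):
    """Hypothetical stand-in: returns only the MCVAE settings it receives."""
    return {key: value for key, value in kwargs.items() if key.startswith("mcvae_")}


# With the real function, the same dictionary would be unpacked alongside
# its usual arguments (not shown here).
received = prepare_fusion_data_stub(**mcvae_kwargs)
print(received)
```

Keeping the settings in one dictionary makes it easy to reuse the same early stopping configuration across several runs.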

datamodule

Datamodule object containing the data.

Type:

datamodule object

num_latent_dims

Number of latent dimensions.

Type:

int

fit_model

Mcvae object containing the fitted model.

Type:

Mcvae object

__init__(datamodule, k=None, max_epochs=5000, train_subspace=True)
Parameters:
  • datamodule (datamodule object) – Datamodule object containing the data.

  • k (int, optional) – Number of latent dimensions, by default None

  • max_epochs (int, optional) – Maximum number of epochs, by default 5000

  • train_subspace (bool, optional) – Whether to train the subspace model, by default True.

check_params()

Checks the parameters of the model.

Return type:

None

convert_to_latent(test_dataset)

Converts the test dataset to the latent space.

Parameters:

test_dataset (list) – List containing the two types of tabular data.

Returns:

  • test_mean_latents (torch.Tensor) – Tensor containing the mean latents of the dataset.

  • labels (pd.DataFrame) – Dataframe containing the labels of the dataset.

  • [self.num_latent_dims, None, None] (list) – List containing the dimensions of the data.

get_latents(dataset)

Gets the latent representations of the multimodal dataset. The two latent spaces are averaged to form the joint latent space.

Parameters:

dataset (list) – List containing the two types of tabular data.

Returns:

mean_latents – Array containing the mean latents of the dataset.

Return type:

np.array
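
The averaging step described for get_latents can be illustrated with a small NumPy sketch. This is not the library's implementation: average_latents is a hypothetical helper, and it assumes each modality has already been encoded into an array of shape (n_samples, n_latent_dims). The numbers are made up.

```python
import numpy as np


def average_latents(latents_a, latents_b):
    """Average two per-modality latent arrays of shape (n_samples, n_latent_dims)."""
    latents_a = np.asarray(latents_a, dtype=float)
    latents_b = np.asarray(latents_b, dtype=float)
    if latents_a.shape != latents_b.shape:
        raise ValueError("both latent arrays must have the same shape")
    # Element-wise mean across the two modalities gives the joint latent space.
    return np.mean(np.stack([latents_a, latents_b], axis=0), axis=0)


# Two samples, three latent dimensions per modality (made-up numbers).
z_tab1 = np.array([[0.0, 2.0, 4.0], [1.0, 1.0, 1.0]])
z_tab2 = np.array([[2.0, 2.0, 0.0], [3.0, 5.0, 7.0]])
joint = average_latents(z_tab1, z_tab2)
print(joint)  # rows: [1. 2. 2.] and [2. 3. 4.]
```

The joint array has the same shape as each input, so downstream layers see one vector of latent features per sample.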

load_ckpt(checkpoint_path)

Loads the checkpoint.

Parameters:

checkpoint_path (list) – List containing the path to the checkpoint.

Return type:

None

train(train_dataset, val_dataset=None)

Trains the model.

Parameters:
  • train_dataset (list) – List containing the two types of tabular data.

  • val_dataset (list, optional) – List containing the two types of tabular data, by default None

Returns:

  • mean_latents (torch.Tensor) – Tensor containing the mean latents of the dataset.

  • labels (pd.DataFrame) – Dataframe containing the labels of the dataset.

class MCVAE_tab(prediction_task, data_dims, multiclass_dimensions)

Bases: ParentFusionModel, Module

This class implements a model that fuses the two types of tabular data using the MCVAE approach. MCVAE: multi-channel variational autoencoder.

The MCVAE creates a joint latent space of the two types of tabular data based on a joint latent prior and joint decoding.

References

Antelmi, L., Ayache, N., Robert, P., & Lorenzi, M. (2019). Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data. Proceedings of the 36th International Conference on Machine Learning, 302–311. https://proceedings.mlr.press/v97/antelmi19a.html

subspace_method

Class of the subspace method: MCVAESubspaceMethod

Type:

class

latent_space_layers

Dictionary containing the layers of the first type of tabular data. Here the first type of tabular data is the joint latent space created by the MCVAESubspaceMethod class.

Type:

dict

fused_dim

Number of features of the fused layers. This is the flattened output size of the latent space layers.

Type:

int

fused_layers

Sequential layer containing the fused layers.

Type:

nn.Sequential

final_prediction

Sequential layer containing the final prediction layers.

Type:

nn.Sequential

__init__(prediction_task, data_dims, multiclass_dimensions)
Parameters:
  • prediction_task (str) – Type of prediction to be performed.

  • data_dims (list) – List containing the dimensions of the data.

  • multiclass_dimensions (int) – Number of classes in the multiclass classification task.

calc_fused_layers()

Calculates the fused layers of the model.

Return type:

None

forward(x)

Forward pass of the model.

Parameters:

x (torch.Tensor) – torch.Tensor containing the input data: joint latent space of the two types of tabular data.

Returns:

out_pred – List containing the predictions.

Return type:

list

fusion_type = 'subspace'

Type of fusion.

Type:

str

method_name = 'MCVAE Tabular'

Name of the method.

Type:

str

modality_type = 'tabular_tabular'

Type of modality.

Type:

str

subspace_method

alias of MCVAESubspaceMethod

mcvae_early_stopping_tol(patience, tolerance, loss_logs, verbose=False)

Simple early stopping function for the MCVAE model’s training.

Parameters:
  • patience (int) – Number of epochs to wait before stopping

  • tolerance (int) – Tolerance for loss

  • loss_logs (list) – List of loss logs

  • verbose (bool) – Whether to print out information

Returns:

i – Epoch to stop at

Return type:

int
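
To make the roles of patience and tolerance concrete, here is a minimal sketch of one common early stopping rule. It is an illustration only and is not taken from the library's source: early_stopping_epoch is a hypothetical function, and the rule it applies (stop once the loss has failed to improve on the best value by more than tolerance for patience consecutive epochs) is an assumption about how such a check might work.

```python
def early_stopping_epoch(patience, tolerance, loss_logs):
    """Return the index of the epoch at which training would stop.

    Assumed rule: stop once the loss has failed to improve on the best loss
    seen so far by more than `tolerance` for `patience` consecutive epochs.
    If that never happens, return the index of the last logged epoch.
    """
    if not loss_logs:
        raise ValueError("loss_logs must not be empty")
    best_loss = loss_logs[0]
    epochs_without_improvement = 0
    for epoch in range(1, len(loss_logs)):
        if best_loss - loss_logs[epoch] > tolerance:
            # Meaningful improvement: record it and reset the counter.
            best_loss = loss_logs[epoch]
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(loss_logs) - 1


# Made-up loss curve that plateaus around 6.4 after the third epoch.
losses = [10.0, 8.0, 6.5, 6.4, 6.45, 6.39, 6.0]
stop_epoch = early_stopping_epoch(patience=3, tolerance=0.2, loss_logs=losses)
print(stop_epoch)  # prints 5
```

A larger patience tolerates longer plateaus before stopping, while a larger tolerance demands bigger improvements for an epoch to count as progress.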