Loading your Data

fusilli supports two kinds of fusion: tabular data with tabular data, and tabular data with images.

Data Format Requirements

Your data must follow the formats below so that the fusilli.data.prepare_fusion_data() function can read it correctly.

Pass the paths to your data source files to fusilli.data.prepare_fusion_data() as a dictionary with the keys tabular1, tabular2 and image:

data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "path/to/tabular2_data.csv",
    "image": "path/to/image_data.pt",
}

Warning

If you are not using a particular data source, set its value to an empty string ("").

For example, if you are not using tabular2, set "tabular2": "" in the dictionary.
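A quick check before calling prepare_fusion_data() can catch a misspelled key or a wrong file path early. The sketch below is a hypothetical helper, check_data_paths, that is not part of fusilli. It assumes exactly the three keys shown above and treats "" as "this modality is not used".

```python
from pathlib import Path

# The keys used in the data_paths dictionary above.
EXPECTED_KEYS = {"tabular1", "tabular2", "image"}


def check_data_paths(data_paths: dict) -> list:
    """Hypothetical helper (not part of fusilli): validate a data_paths dict.

    Returns the sorted keys of the modalities in use (value is not "").
    Raises ValueError for missing or unexpected keys, and
    FileNotFoundError for a non-empty path that is not an existing file.
    """
    keys = set(data_paths)
    if keys != EXPECTED_KEYS:
        raise ValueError(f"expected keys {sorted(EXPECTED_KEYS)}, got {sorted(keys)}")

    used = []
    for key in sorted(EXPECTED_KEYS):
        path = data_paths[key]
        if path == "":
            continue  # an empty string marks an unused data source
        if not Path(path).is_file():
            raise FileNotFoundError(f"{key}: no file at {path!r}")
        used.append(key)
    return used
```

Calling check_data_paths(data_paths) returns the list of modalities that will be loaded, which makes it easy to confirm that an unused source really was set to "".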

Tabular and Tabular Data

All tabular data must be in CSV format.

Columns named ID and prediction_label are required:

ID: Unique identifiers for each row.
prediction_label: Labels (integers for classification or floats for regression).
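To check a CSV against these two requirements before loading it, the following sketch uses Python's standard csv module. check_tabular_csv is a hypothetical helper, not a fusilli function. It checks only that both required columns are present and that every ID is unique; it does not check whether the labels are integers or floats.

```python
import csv


def check_tabular_csv(path: str) -> int:
    """Hypothetical helper (not part of fusilli).

    Checks that the CSV at `path` has the required ID and prediction_label
    columns and that no ID value appears twice. Returns the number of data rows.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []  # None for an empty file
        missing = {"ID", "prediction_label"} - set(columns)
        if missing:
            raise ValueError(f"{path}: missing required column(s) {sorted(missing)}")

        seen_ids = set()
        n_rows = 0
        for row in reader:
            if row["ID"] in seen_ids:
                raise ValueError(f"{path}: duplicate ID {row['ID']!r}")
            seen_ids.add(row["ID"])
            n_rows += 1
    return n_rows
```

Run it on each tabular file (tabular1 and, if used, tabular2), since the column requirements apply to all tabular data.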

Example of loading two tabular modalities:

from fusilli.data import prepare_fusion_data

data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "path/to/tabular2_data.csv",
    "image": "",
}

data_module = prepare_fusion_data(prediction_task=...,
                                  fusion_model=some_example_model,
                                  data_paths=data_paths,
                                  output_paths=...)

Tabular and Image Data

Tabular data should follow the format described above. Image data must be saved as a .pt file with dimensions (num_samples, num_channels, height, width) for 2D images; 3D images have one additional spatial dimension.

For example, for 100 2D grey-scale images of 28x28 pixels, the data returned by torch.load() on images.pt should have dimensions (100, 1, 28, 28).

For 100 3D RGB images of 32x32x32 voxels, the data returned by torch.load() should have dimensions (100, 3, 32, 32, 32).
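The shape rule above can be checked with a few lines of plain Python. The sketch below is a hypothetical helper, check_image_shape, that is not part of fusilli; it assumes the layout (num_samples, num_channels, then 2 or 3 spatial dimensions) described above.

```python
def check_image_shape(shape, spatial_dims: int) -> tuple:
    """Hypothetical helper (not part of fusilli).

    Checks that `shape` is (num_samples, num_channels, *spatial) with
    `spatial_dims` spatial dimensions (2 for 2D images, 3 for 3D images).
    Returns (num_samples, num_channels).
    """
    if spatial_dims not in (2, 3):
        raise ValueError("spatial_dims must be 2 (2D images) or 3 (3D images)")

    shape = tuple(shape)
    expected_ndim = 2 + spatial_dims
    if len(shape) != expected_ndim:
        raise ValueError(
            f"expected {expected_ndim} dimensions "
            f"(num_samples, num_channels, {spatial_dims} spatial), got {shape}"
        )
    if any(d <= 0 for d in shape):
        raise ValueError(f"all dimensions must be positive, got {shape}")
    return shape[0], shape[1]
```

Because torch.Size is a subclass of tuple, the shape of a loaded tensor can be passed straight in, for example check_image_shape(torch.load("path/to/image_data.pt").shape, spatial_dims=2).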

Example of loading tabular and image data:

from fusilli.data import prepare_fusion_data

data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "",
    "image": "path/to/image_data.pt",
}

data_module = prepare_fusion_data(prediction_task=...,
                                  fusion_model=some_example_model,
                                  data_paths=data_paths,
                                  output_paths=...)

Downsampling Images

To downsample images before they are passed to the model, set the image_downsample_size parameter of the fusilli.data.prepare_fusion_data() function.

Example of downsampling 2D images to 16x16:

data_module = prepare_fusion_data(prediction_task=...,
                                  fusion_model=some_example_model,
                                  data_paths=data_paths,
                                  output_paths=...,
                                  image_downsample_size=(16, 16))
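The effect of image_downsample_size on the image dimensions can be sketched as follows. This is an illustration under stated assumptions, not fusilli's implementation: downsampled_shape is a hypothetical helper, and it assumes that image_downsample_size gives one target size per spatial dimension (two values for 2D images, three for 3D) while the sample and channel dimensions stay unchanged. It also rejects targets larger than the original size, since that would be upsampling.

```python
def downsampled_shape(image_shape, image_downsample_size) -> tuple:
    """Hypothetical helper (not part of fusilli).

    ASSUMPTION: image_downsample_size has one entry per spatial dimension,
    and num_samples / num_channels are not changed by downsampling.
    Returns the expected shape after downsampling.
    """
    num_samples, num_channels, *spatial = tuple(image_shape)
    target = tuple(image_downsample_size)

    if len(target) != len(spatial):
        raise ValueError(
            f"image_downsample_size has {len(target)} values but the images "
            f"have {len(spatial)} spatial dimensions"
        )
    if any(new > old for old, new in zip(spatial, target)):
        raise ValueError(f"{target} is larger than the original size {tuple(spatial)}")
    return (num_samples, num_channels, *target)
```

Under these assumptions, 100 grey-scale 28x28 images downsampled with image_downsample_size=(16, 16) would have shape (100, 1, 16, 16).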

Incorporating External Test Data

To evaluate a model on external test data:

Provide the paths to the test data sources in a second dictionary with the same keys as data_paths (tabular1, tabular2 and image).
Use the same data formats as for the training data.
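A simple consistency check between the two dictionaries can be written as below. check_test_paths_match is a hypothetical helper, not part of fusilli. Beyond requiring the same keys, it also assumes that a modality used for training must be used for testing (and an unused one must stay "" in both), which is an assumption made for this sketch.

```python
def check_test_paths_match(data_paths: dict, test_data_paths: dict) -> None:
    """Hypothetical helper (not part of fusilli).

    Checks that test_data_paths has the same keys as data_paths and
    (ASSUMPTION) that each modality is either used in both dictionaries
    or set to "" in both. Raises ValueError otherwise.
    """
    if set(data_paths) != set(test_data_paths):
        raise ValueError(
            f"key mismatch: {sorted(data_paths)} vs {sorted(test_data_paths)}"
        )
    for key in sorted(data_paths):
        used_in_training = data_paths[key] != ""
        used_in_test = test_data_paths[key] != ""
        if used_in_training != used_in_test:
            raise ValueError(
                f'{key!r} must be either used in both dictionaries or set to "" in both'
            )
```

Calling check_test_paths_match(data_paths, external_test_data_paths) before evaluation catches, for example, a test dictionary that leaves out the image path.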

Calling the from_new_data method of an evaluation figure class, such as RealsVsPreds, evaluates the model on the external test data and plots the results.

Example of training and evaluating a model with external test data:

from fusilli.data import prepare_fusion_data
from fusilli.train import train_and_save_models
from fusilli.eval import RealsVsPreds

data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "path/to/tabular2_data.csv",
    "image": "path/to/image_data.pt",
}

external_test_data_paths = {
    "tabular1": "path/to/tabular1_test_data.csv",
    "tabular2": "path/to/tabular2_test_data.csv",
    "image": "path/to/image_test_data.pt",
}

# Using the training data
data_module = prepare_fusion_data(prediction_task=...,
                                  fusion_model=some_example_model,
                                  data_paths=data_paths,
                                  output_paths=...)

# Train the model on the training data
trained_model = train_and_save_models(data_module=data_module,
                                      fusion_model=some_example_model)

# Evaluate the model on the external test data
RealsVsPreds.from_new_data(trained_model,
                           output_paths=...,
                           test_data_paths=external_test_data_paths)