Loading your Data
fusilli facilitates the fusion of tabular data with tabular data, or of tabular data with images.
Data Format Requirements
Your data must adhere to specific formats for fusilli to read it correctly with the fusilli.data.prepare_fusion_data() function.
The paths to the data source files must be collected in a dictionary before being passed to the fusilli.data.prepare_fusion_data() function.
data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "path/to/tabular2_data.csv",
    "image": "path/to/image_data.pt",
}
Warning
If you are not using a particular data source, set its value to "".
For example, if you are not using tabular2, set tabular2 in the dictionary to "".
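The convention above can be sketched with a small helper that always returns all three keys and leaves unused sources as empty strings. Note that build_data_paths is a hypothetical helper written for illustration; it is not part of fusilli.

```python
# Hypothetical helper (not part of fusilli): builds the data_paths dictionary
# with all three expected keys, using "" for any source that is not used.
def build_data_paths(tabular1="", tabular2="", image=""):
    """Return a dict with every expected key; unused sources stay as ""."""
    return {"tabular1": tabular1, "tabular2": tabular2, "image": image}


# tabular1 plus image; tabular2 is unused and therefore left as "".
data_paths = build_data_paths(
    tabular1="path/to/tabular1_data.csv",
    image="path/to/image_data.pt",
)
print(data_paths)
```

The resulting dictionary can then be passed as the data_paths argument in the same way as a hand-written one.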
Tabular and Tabular Data
All tabular data must be in CSV format.
Columns named ID and prediction_label are required:
ID: Unique identifiers for each row.
prediction_label: Labels (integers for classification or floats for regression).
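As a rough pre-flight check, the column requirements above could be verified before handing a file to fusilli. The sketch below uses only the Python standard library; check_tabular_csv, the feature column names, and the example rows are all made up for illustration and are not part of fusilli.

```python
import csv
import io

REQUIRED_COLUMNS = {"ID", "prediction_label"}


def check_tabular_csv(csv_file):
    """Check an open CSV file for the required columns and unique IDs.

    Returns the number of data rows if both checks pass.
    """
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")

    seen_ids = set()
    for row in reader:
        if row["ID"] in seen_ids:
            raise ValueError(f"Duplicate ID found: {row['ID']}")
        seen_ids.add(row["ID"])
    return len(seen_ids)


# Made-up example content; the feature column names are illustrative only.
example_csv = io.StringIO(
    "ID,feature_a,feature_b,prediction_label\n"
    "subject_1,0.5,1.2,0\n"
    "subject_2,0.1,3.4,1\n"
    "subject_3,0.9,2.2,0\n"
)
num_rows = check_tabular_csv(example_csv)
print(num_rows)
```

For a file on disk, the same function can be called on the handle returned by open(path, newline="").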
Example of loading two tabular modalities:
from fusilli.data import prepare_fusion_data

data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "path/to/tabular2_data.csv",
    "image": "",
}

data_module = prepare_fusion_data(prediction_task=...,
                                  fusion_model=some_example_model,
                                  data_paths=data_paths,
                                  output_paths=...)
Tabular and Image Data
Tabular data should follow the format specified above. Image data should be in a .pt file with dimensions (num_samples, num_channels, height, width).
For example, for 100 2D 28x28 grey-scale images, loading the images.pt file with torch.load() would give a tensor with dimensions (100, 1, 28, 28).
For 100 3D 32x32x32 RGB images, loading the images.pt file with torch.load() would give a tensor with dimensions (100, 3, 32, 32, 32).
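The dimension layout described above can be made concrete with a small shape check. The helper below is a hypothetical illustration (describe_image_shape is not part of fusilli) and works on a plain tuple, so it does not require PyTorch.

```python
def describe_image_shape(shape):
    """Split a (num_samples, num_channels, *spatial) shape into named parts."""
    if len(shape) not in (4, 5):
        raise ValueError(
            "Expected 4 dimensions (2D images) or 5 dimensions (3D images), "
            f"got {len(shape)}."
        )
    return {
        "num_samples": shape[0],
        "num_channels": shape[1],
        "spatial_size": tuple(shape[2:]),
        "image_type": "2D" if len(shape) == 4 else "3D",
    }


# The two examples from the text: grey-scale 2D images and RGB 3D images.
greyscale_2d = describe_image_shape((100, 1, 28, 28))
rgb_3d = describe_image_shape((100, 3, 32, 32, 32))
print(greyscale_2d)
print(rgb_3d)
```

With PyTorch installed, and assuming the .pt file stores a single tensor, the shape could be obtained with tuple(torch.load("path/to/image_data.pt").shape) and passed to the same function.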
Example of loading tabular and image data:
from fusilli.data import prepare_fusion_data

data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "",
    "image": "path/to/image_data.pt",
}

data_module = prepare_fusion_data(prediction_task=...,
                                  fusion_model=some_example_model,
                                  data_paths=data_paths,
                                  output_paths=...)
Downsampling Images
To downsample images before model input, use the image_downsample_size parameter in the fusilli.data.prepare_fusion_data() function.
Example of downsampling 2D images to 16x16:
data_module = prepare_fusion_data(prediction_task=...,
                                  fusion_model=some_example_model,
                                  data_paths=data_paths,
                                  output_paths=...,
                                  image_downsample_size=(16, 16))
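To reason about what a given image_downsample_size would produce, the expected shape can be worked out by hand. The sketch below assumes that downsampling replaces only the spatial dimensions and leaves the sample and channel dimensions unchanged; downsampled_shape is a hypothetical helper, and its checks are illustrative rather than a description of fusilli's internal behaviour.

```python
def downsampled_shape(original_shape, image_downsample_size):
    """Return the expected shape after downsampling the spatial dimensions.

    Assumption: only the spatial dimensions change; num_samples and
    num_channels are preserved.
    """
    num_samples, num_channels, *spatial = original_shape
    if len(image_downsample_size) != len(spatial):
        raise ValueError(
            f"Downsample size has {len(image_downsample_size)} dimensions, "
            f"but the images have {len(spatial)} spatial dimensions."
        )
    # Illustrative check: a "downsample" target should not exceed the original.
    if any(new > old for new, old in zip(image_downsample_size, spatial)):
        raise ValueError("Downsample size is larger than the original image size.")
    return (num_samples, num_channels, *image_downsample_size)


# 100 grey-scale 28x28 images downsampled to 16x16.
new_shape = downsampled_shape((100, 1, 28, 28), (16, 16))
print(new_shape)
```

Under the same assumption, 3D images would take a three-element size such as (16, 16, 16).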
Incorporating External Test Data
For evaluating models with external test data:
Provide paths to the test data sources in another dictionary like data_paths, with the same keys tabular1, tabular2, and image.
Use the same data format as the training data.
Calling the from_new_data method of an evaluation figure class evaluates the model on the external test data and plots the results.
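Before evaluation, it may help to confirm that the external test dictionary mirrors the training one. The sketch below checks that both dictionaries have the same keys and, as an additional assumption made here, that the same sources are in use in both. check_test_paths is a hypothetical helper and is not part of fusilli.

```python
def check_test_paths(data_paths, test_data_paths):
    """Check that the test paths dictionary mirrors the training one."""
    if set(data_paths) != set(test_data_paths):
        raise ValueError(
            f"Key mismatch: {sorted(data_paths)} vs {sorted(test_data_paths)}"
        )
    # Assumption: a source used for training is also needed for testing,
    # and a source left as "" for training should stay "" for testing.
    for key, train_path in data_paths.items():
        if bool(train_path) != bool(test_data_paths[key]):
            raise ValueError(f"Source '{key}' is used in only one of the dictionaries.")
    return True


data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "",
    "image": "path/to/image_data.pt",
}
external_test_data_paths = {
    "tabular1": "path/to/tabular1_test_data.csv",
    "tabular2": "",
    "image": "path/to/image_test_data.pt",
}
paths_ok = check_test_paths(data_paths, external_test_data_paths)
print(paths_ok)
```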
Example of training and evaluating a model with external test data:
from fusilli.data import prepare_fusion_data
from fusilli.train import train_and_save_models
from fusilli.eval import RealsVsPreds

data_paths = {
    "tabular1": "path/to/tabular1_data.csv",
    "tabular2": "path/to/tabular2_data.csv",
    "image": "path/to/image_data.pt",
}

external_test_data_paths = {
    "tabular1": "path/to/tabular1_test_data.csv",
    "tabular2": "path/to/tabular2_test_data.csv",
    "image": "path/to/image_test_data.pt",
}

# Prepare the data module from the training data
data_module = prepare_fusion_data(prediction_task=...,
                                  fusion_model=some_example_model,
                                  data_paths=data_paths,
                                  output_paths=...)

# Train the model on the training data
trained_model = train_and_save_models(data_module=data_module,
                                      fusion_model=some_example_model)

# Evaluate the model on the external test data
RealsVsPreds.from_new_data(trained_model,
                           output_paths=...,
                           test_data_paths=external_test_data_paths)