MedFuncta: Modality-Agnostic Representations Based on Efficient Neural Fields

arXiv preprint

University of Basel, Department of Biomedical Engineering

(Left) The proposed network architecture with shared weights \(\theta\) and signal-specific parameters \(\phi^{(i)}\) that modulate the base network. (Right) The proposed meta-learning approach to obtain shared weights \(\theta^*\) that allow fitting signal-specific parameters \(\phi^{(1, ..., N)}\) in a few update steps.

Abstract

Recent research in medical image analysis with deep learning almost exclusively focuses on grid- or voxel-based data representations. We challenge this common choice by introducing MedFuncta, a modality-agnostic continuous data representation based on neural fields. We demonstrate how to scale neural fields from single instances to large datasets by exploiting redundancy in medical signals and by applying an efficient meta-learning approach with a context reduction scheme. We further address the spectral bias in commonly used SIREN activations by introducing an \(\omega_0\)-schedule, improving reconstruction quality and convergence speed. We validate our proposed approach on a large variety of medical signals of different dimensions and modalities (1D: ECG; 2D: Chest X-ray, Retinal OCT, Fundus Camera, Dermatoscope, Colon Histopathology, Cell Microscopy; 3D: Brain MRI, Lung CT) and successfully demonstrate that we can solve relevant downstream tasks on these representations. We additionally release a large-scale dataset of more than 550k annotated neural fields to promote research in this direction.

General Idea

It is a common choice to represent data on discretized grids, e.g., to represent an image as a grid of pixels. While this data representation is widely explored, it scales poorly with grid resolution and ignores the often continuous nature of the underlying signal. Recent research has shown that neural fields (NFs) provide an interesting, continuous alternative for representing different kinds of data modalities like sound, images, shapes, or 3D scenes, by treating data as a neural function that takes a spatial position (e.g., a pixel coordinate) as input and outputs the appropriate measurement (e.g., an image intensity value). This work investigates how to find meaningful functional representations of medical data, allowing relevant downstream tasks to be solved on this compressed, modality-agnostic representation rather than the original signal, mitigating the need to design modality-specific networks.

Difference between explicitly defined discrete representations on the left and continuous Neural Fields that define a signal (e.g., an image) as a neural function \(f_\theta\) on the right.
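To make the coordinate-to-value view concrete, the sketch below fits a small sine-activated MLP (the SIREN-style activation mentioned in the abstract) to a toy 1-D signal. All dimensions, layer sizes, and the fixed \(\omega_0=30\) are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Sine(nn.Module):
    """Sine activation as in SIREN; omega_0 scales the input frequency."""
    def __init__(self, omega_0=30.0):
        super().__init__()
        self.omega_0 = omega_0

    def forward(self, x):
        return torch.sin(self.omega_0 * x)

class NeuralField(nn.Module):
    """Maps a C-dim coordinate to a D-dim value, e.g. (x, y) -> intensity."""
    def __init__(self, in_dim=2, hidden=64, out_dim=1, omega_0=30.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), Sine(omega_0),
            nn.Linear(hidden, hidden), Sine(omega_0),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coords):
        return self.net(coords)

# Fit the field to a toy 1-D "signal" sampled on a normalized grid.
field = NeuralField(in_dim=1, out_dim=1)
coords = torch.linspace(-1, 1, 128).unsqueeze(-1)   # (128, 1) coordinates
signal = torch.sin(3.0 * coords)                    # stand-in measurement
opt = torch.optim.Adam(field.parameters(), lr=1e-4)
for _ in range(200):
    opt.zero_grad()
    loss = ((field(coords) - signal) ** 2).mean()   # MSE reconstruction loss
    loss.backward()
    opt.step()
```

The signal now lives implicitly in the network weights and can be queried at arbitrary (also off-grid) coordinates.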

Data as Functions

We argue that most signals, especially in medicine, contain large amounts of redundant information or structure that we can learn over an entire set of signals. We therefore define a neural network \(f_{\theta,\phi^{(i)}}:\mathbb{R}^C\rightarrow\mathbb{R}^D\) with shared network parameters \(\theta\) that represent this redundant information and additional signal-specific parameters \(\phi^{(i)}\in\mathbb{R}^{P}\) that condition the base network to represent a specific signal \(s_i\). The proposed network can handle data of different dimensionalities (from 1D ECG data to 3D MRI) by simply changing the input and/or output dimensions for the specific data type. The representation of a single datapoint, namely the signal-specific parameters \(\phi^{(i)}\), always remains a single 1D vector.

Proposed framework with shared model parameters \(\theta\) that parameterize a shared network, and signal-specific parameters \(\phi^{(i)}\) that condition this network to represent a single signal.
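One common way to condition a shared base network on a per-signal vector is additive shift modulation, as popularized by Functa: \(\phi^{(i)}\) is mapped to a per-layer shift that is added before each activation. The sketch below uses this scheme; the latent size, depth, and fixed \(\omega_0\) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ModulatedField(nn.Module):
    """Shared base network (theta) conditioned by a per-signal latent phi
    via layer-wise additive shift modulations (one common choice)."""
    def __init__(self, in_dim=2, hidden=64, out_dim=1, latent_dim=32, depth=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(in_dim if i == 0 else hidden, hidden) for i in range(depth)]
        )
        # Maps phi to one additive shift per hidden layer (part of theta).
        self.shifts = nn.ModuleList(
            [nn.Linear(latent_dim, hidden) for _ in range(depth)]
        )
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, coords, phi):
        h = coords
        for layer, shift in zip(self.layers, self.shifts):
            h = torch.sin(30.0 * (layer(h) + shift(phi)))
        return self.out(h)

model = ModulatedField()
coords = torch.rand(256, 2) * 2 - 1   # normalized 2-D pixel coordinates
phi = torch.zeros(32)                 # signal-specific parameters (a 1-D vector)
values = model(coords, phi)           # predicted intensities at those coordinates
```

Swapping `in_dim`/`out_dim` adapts the same architecture from 1-D ECG traces to 3-D volumes, while \(\phi\) stays a flat vector.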

Meta-Learning Shared Model Parameters (with Context Reduction)

To efficiently create a set of NFs, we aim to meta-learn the shared parameters \(\theta\) so that we can fit a signal \(s_i\) by only optimizing \(\phi^{(i)}\) for very few update steps. We follow a CAVIA meta-learning approach. Before we take a single meta-update step with respect to the shared parameters \(\theta\), we update \(\phi\) for \(G\) inner-loop update steps. Since performing a single meta-update step requires backpropagating through the entire inner-loop optimization, the computational graph must be retained in GPU memory to compute the required second-order gradients. This is a resource-intensive task that does not scale well to high-dimensional signals. While first-order approximations or auto-decoder training approaches that do not rely on second-order optimization exist, recent research has shown that this results in severe performance drops or unstable training.

(Left) The proposed meta-learning framework with context reduction. (Right) The test-time optimization process after meta-learning.
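A CAVIA-style meta-update can be sketched in a few lines: only \(\phi\) is adapted in the inner loop, with `create_graph=True` so that second-order gradients flow back to \(\theta\) in the outer step. The tiny model, step counts, and learning rates below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TinyModulatedField(nn.Module):
    """Minimal shared network (theta) conditioned by a per-signal latent phi."""
    def __init__(self, latent_dim=16, hidden=32):
        super().__init__()
        self.l1 = nn.Linear(1, hidden)
        self.shift = nn.Linear(latent_dim, hidden)
        self.l2 = nn.Linear(hidden, 1)

    def forward(self, coords, phi):
        return self.l2(torch.sin(30.0 * (self.l1(coords) + self.shift(phi))))

def inner_loop(model, coords, targets, latent_dim=16, steps=3, inner_lr=1e-2):
    """Adapt only phi for G=steps updates, keeping the graph so the outer
    loss can backpropagate through the inner optimization."""
    phi = torch.zeros(latent_dim, requires_grad=True)
    for _ in range(steps):
        loss = ((model(coords, phi) - targets) ** 2).mean()
        # create_graph=True retains the graph for second-order gradients.
        (grad,) = torch.autograd.grad(loss, phi, create_graph=True)
        phi = phi - inner_lr * grad
    return phi

model = TinyModulatedField()
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-5)
coords = torch.linspace(-1, 1, 64).unsqueeze(-1)
targets = torch.sin(2.0 * coords)        # one toy "signal"

phi = inner_loop(model, coords, targets)
outer_loss = ((model(coords, phi) - targets) ** 2).mean()
meta_opt.zero_grad()
outer_loss.backward()    # second-order gradients flow through the inner loop
meta_opt.step()
```

The memory cost of retaining this graph for every inner step is exactly what motivates the context reduction described next.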

To circumvent this issue, we present a context-reduction approach that reduces the number of coordinate-value pairs used to optimize \(\phi\) in the inner loop. We randomly sample \(\gamma|\mathcal{C}|\) coordinate-value pairs from the full context set \(\mathcal{C}\). This significantly reduces the computational burden, while resulting in only marginal performance drops.
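The subsampling itself is a one-liner over the context set; a possible implementation (with an assumed default of \(\gamma = 0.25\)) could look like:

```python
import torch

def sample_context(coords, values, gamma=0.25, generator=None):
    """Randomly keep a gamma-fraction of coordinate-value pairs for the
    inner loop, shrinking the retained graph for second-order meta-learning."""
    n = coords.shape[0]
    k = max(1, int(gamma * n))
    idx = torch.randperm(n, generator=generator)[:k]
    return coords[idx], values[idx]

coords = torch.rand(1024, 2)                 # full context set C
values = torch.rand(1024, 1)
sub_coords, sub_values = sample_context(coords, values, gamma=0.25)
```

Because a fresh subset is drawn per inner loop, all coordinates are still seen in expectation over training, which is why the accuracy cost stays small.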


Test-Time Adaptation

Given the meta-learned model parameters \(\theta^{*}\), we fit a neural field to each signal \(s_1, ..., s_N\). We start by initializing the signal-specific parameters \(\phi^{(i)}:=\mathbf{0}\) and optimize \(\phi^{(i)}\) for \(H\) steps. As no second-order optimization is required at test time, we can make use of the full context set. A set of NFs representing the signals \(s_1, ..., s_N\) is therefore defined by the network architecture, the shared model parameters \(\theta^{*}\), and the signal-specific parameters \(\phi^{(1)}, ..., \phi^{(N)}\).
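At test time the procedure reduces to ordinary first-order optimization of \(\phi\) with \(\theta^{*}\) frozen. The sketch below assumes a minimal shift-modulated model and illustrative hyperparameters; in practice the frozen weights would come from meta-learning.

```python
import torch
import torch.nn as nn

class TinyModulatedField(nn.Module):
    """Minimal shared network conditioned by a per-signal latent phi."""
    def __init__(self, latent_dim=16, hidden=32):
        super().__init__()
        self.l1 = nn.Linear(1, hidden)
        self.shift = nn.Linear(latent_dim, hidden)
        self.l2 = nn.Linear(hidden, 1)

    def forward(self, coords, phi):
        return self.l2(torch.sin(30.0 * (self.l1(coords) + self.shift(phi))))

def fit_signal(model, coords, targets, latent_dim=16, steps=20, lr=1e-2):
    """Test-time adaptation: optimize only phi (initialized to zero) for H
    steps on the full context set; no second-order terms are needed."""
    phi = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.SGD([phi], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(coords, phi) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    return phi.detach()

model = TinyModulatedField()
for p in model.parameters():
    p.requires_grad_(False)                  # theta* stays fixed at test time
coords = torch.linspace(-1, 1, 64).unsqueeze(-1)
targets = torch.sin(2.0 * coords)
phi = fit_signal(model, coords, targets)     # 1-D representation of the signal
```

Storing only `phi` per signal (plus one copy of \(\theta^{*}\)) is what makes the representation compact across a large dataset.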

Learning on this Representation

To demonstrate that our obtained representation is meaningful, we perform classification experiments on our MedFuncta representation and compare it to commonly used classifiers on the original data. We apply a very simple MLP classifier with ReLU activations and dropout to our representation and find that this simple classifier generally performs well, indicating that the representation contains meaningful information about the underlying data.
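Since each signal is summarized by a flat vector \(\phi^{(i)}\), the downstream classifier can be dimension-agnostic. A minimal sketch of such an MLP head follows; the layer widths, dropout rate, and class count are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Simple MLP head over the 1-D signal-specific parameters phi.
# The same classifier works for NFs of 1-D ECGs, 2-D images, or 3-D volumes,
# because phi is always a flat vector.
classifier = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(64, 10),   # 10 classes, chosen for illustration
)

phis = torch.randn(8, 256)     # a batch of 8 NF representations
logits = classifier(phis)      # per-class scores, shape (8, 10)
```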

Reconstruction Results

BibTeX

      
@article{friedrich2025medfuncta,
  title={MedFuncta: Modality-Agnostic Representations Based on Efficient Neural Fields},
  author={Friedrich, Paul and Bieder, Florentin and Cattin, Philippe C},
  journal={arXiv preprint arXiv:2502.14401},
  year={2025}
}
      

Acknowledgements

This work was financially supported by the Werner Siemens Foundation through the MIRACLE II project.