Quick-start tutorial

In this section, we present a practical quick-start tutorial for CellMincer, using a synthetic Optosynth dataset as an example. This dataset (2.4 GB) can be downloaded here.

Declaring a dataset manifest

CellMincer accepts .tif, .bin, .npy, and .npz files as input. To standardize across these formats, it expects a YAML manifest containing the basic specifications of the data. For our example, the manifest would look like this:

n_frames_per_segment: 1000
n_segments: 7
order: tyx
sampling_rate: 500
infer_active_t_range: true

The dataset in question contains 7000 frames split into 7 equal segments of 1000 frames, each of which is stimulated at a different intensity. The order parameter describes the dataset's order of dimensions; in this case, the data read as a tensor is structured as time x height x width, abbreviated as tyx. The sampling rate is given in Hertz. If the data is structured into periods of activity and inactivity, infer_active_t_range should be set to true to restrict the frames over which features are computed. Furthermore, if the duration of stimulation over each segment is known, the parameters of that stimulation can be given under the stim field. A full description of these options can be found under Reference: Data manifest.
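As a quick sanity check, the arithmetic implied by the manifest can be worked out in a few lines of Python. This is only an illustrative sketch: the dictionary mirrors the example manifest above, and the helper name summarize_manifest is hypothetical, not part of CellMincer.

```python
# Values copied from the example manifest; nothing here calls CellMincer itself.
manifest = {
    "n_frames_per_segment": 1000,
    "n_segments": 7,
    "order": "tyx",
    "sampling_rate": 500,  # Hz
    "infer_active_t_range": True,
}


def summarize_manifest(m):
    """Return (total frames, seconds per segment, total seconds) for a manifest."""
    total_frames = m["n_frames_per_segment"] * m["n_segments"]
    segment_seconds = m["n_frames_per_segment"] / m["sampling_rate"]
    total_seconds = total_frames / m["sampling_rate"]
    return total_frames, segment_seconds, total_seconds


# Map each dimension letter in the order string to its tensor axis index.
axes = {name: index for index, name in enumerate(manifest["order"])}

total_frames, segment_seconds, total_seconds = summarize_manifest(manifest)
print(total_frames, segment_seconds, total_seconds)  # 7000 2.0 14.0
print(axes)  # {'t': 0, 'y': 1, 'x': 2}
```

At 500 Hz, each 1000-frame segment therefore spans 2 seconds and the full recording spans 14 seconds.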

For .bin files, which do not directly encode shape, we would specify additional arguments for width and height.
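To see why a raw binary file needs its shape supplied separately, consider the following sketch, which uses only the Python standard library. It is not CellMincer's reader: the function name read_raw_tyx, the toy file, and the unsigned 16-bit element type are all assumptions made for illustration.

```python
import array
import os
import tempfile


def read_raw_tyx(path, n_frames, height, width, typecode="H"):
    """Read a flat binary file and index it as frames[t][y][x].

    typecode "H" (unsigned 16-bit, native byte order) is an assumption;
    a real recording may use a different element type.
    """
    flat = array.array(typecode)
    with open(path, "rb") as f:
        # Raises EOFError if the file holds fewer values than the shape implies.
        flat.fromfile(f, n_frames * height * width)
    frames = []
    for t in range(n_frames):
        start = t * height * width
        rows = [
            list(flat[start + y * width : start + (y + 1) * width])
            for y in range(height)
        ]
        frames.append(rows)
    return frames


# The same 24 values can be read as 2 x 3 x 4 or as 2 x 4 x 3; the file cannot say which.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.bin")
    with open(toy_path, "wb") as f:
        array.array("H", range(24)).tofile(f)
    as_2x3x4 = read_raw_tyx(toy_path, n_frames=2, height=3, width=4)
    as_2x4x3 = read_raw_tyx(toy_path, n_frames=2, height=4, width=3)

print(as_2x3x4[0][1])  # [4, 5, 6, 7]
print(as_2x4x3[0][1])  # [3, 4, 5]
```

Both readings succeed, but they yield different images, which is why the width and height must be declared explicitly for .bin input.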

Preprocessing

Preprocessing is performed using the command cellmincer preprocess. It takes as input a path to the dataset, its manifest, and a configuration YAML file. It outputs a directory containing the detrended data, precomputed global features, and other components used for training and inference.

Unlike the manifest, the preprocessing configuration does not generally need to be tailored to the data, barring certain imaging artifacts that need correction. A suitable configuration for our Optosynth data can be found here, and more information about preprocessing options can be found under Reference: Preprocessing configuration.

Note that this configuration assumes a GPU is available; if not, the device configuration option should be changed to cpu.

With our three files in our current working directory (we recommend using a fresh directory), we can generate our preprocessing output in the same location with this command:

(cellmincer) $ cellmincer preprocess \
               -i optosynth__1__20__50.tif \
               -o ./optosynth__1__20__50 \
               --manifest manifest.yaml \
               --config preprocess.yaml
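When scripting this step, it can help to confirm that the three input files are present before launching the command. The sketch below assembles the same command as an argument list and reports any missing inputs. The helper names build_preprocess_command and missing_inputs are hypothetical; only the flags shown in the command above are used.

```python
from pathlib import Path


def build_preprocess_command(dataset, output_dir, manifest, config):
    """Return the argument list for the preprocessing command shown above."""
    return [
        "cellmincer", "preprocess",
        "-i", str(dataset),
        "-o", str(output_dir),
        "--manifest", str(manifest),
        "--config", str(config),
    ]


def missing_inputs(*paths):
    """Return the paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


inputs = ("optosynth__1__20__50.tif", "manifest.yaml", "preprocess.yaml")
cmd = build_preprocess_command(
    inputs[0], "./optosynth__1__20__50", inputs[1], inputs[2]
)

missing = missing_inputs(*inputs)
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("Ready to run:", " ".join(cmd))
    # To launch from Python: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.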

Training

Note

This step takes much longer than the others. Consider skipping it and downloading this pretrained model instead.

Training is conducted using cellmincer train. It takes as input one or more dataset directories (generated by preprocessing) and a configuration YAML file. It outputs a PyTorch Lightning checkpoint containing the final model state. Optionally, it can take arguments for initializing from an existing model state, for resuming training from an intermediate checkpoint, and for increasing the number of GPUs used. We recommend using this configuration for training a first model. More information about training options can be found under Reference: Training configuration.

We can use the following command to train on our single Optosynth dataset:

(cellmincer) $ cellmincer train \
               -i ./optosynth__1__20__50 \
               -o . \
               --config train.yaml

After each iteration, the current model state will be logged to the current working directory as last.ckpt. When our training is complete, this will be our trained model.

For consistency, let’s rename our trained model to optosynth.ckpt.

(cellmincer) $ mv last.ckpt optosynth.ckpt

Denoising

With a trained model, denoising is a relatively quick operation. Using the same processed dataset directory, we can use cellmincer denoise with a model checkpoint to denoise the data. Optionally, the .avi visualization can be disabled with the --no-avi flag (necessary if your system lacks an FFmpeg installation). If you are using the CellMincer Docker image, FFmpeg is available. Additionally, you can customize the visualization by choosing a frame range to render over or by adjusting the scaling. See the Reference for more details.

(cellmincer) $ cellmincer denoise \
               -i ./optosynth__1__20__50 \
               -o . \
               --model optosynth.ckpt

This outputs two versions of the denoised data. The first is in the original scale, while the second is “detrended” and can more easily be visualized.
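If the denoising step is scripted, the FFmpeg caveat can be handled automatically. The sketch below adds --no-avi only when no ffmpeg executable is found on the PATH. The helper name build_denoise_command is hypothetical, and checking for an executable named ffmpeg is an assumption about how the dependency is located; adjust it if your installation differs.

```python
import shutil


def build_denoise_command(dataset_dir, output_dir, model, have_ffmpeg=None):
    """Return the denoise argument list, adding --no-avi when FFmpeg is absent.

    If have_ffmpeg is None, look for an executable named "ffmpeg" on the PATH
    (an assumption about how FFmpeg is discovered on your system).
    """
    if have_ffmpeg is None:
        have_ffmpeg = shutil.which("ffmpeg") is not None
    cmd = [
        "cellmincer", "denoise",
        "-i", str(dataset_dir),
        "-o", str(output_dir),
        "--model", str(model),
    ]
    if not have_ffmpeg:
        cmd.append("--no-avi")  # skip the .avi visualization
    return cmd


denoise_cmd = build_denoise_command("./optosynth__1__20__50", ".", "optosynth.ckpt")
print(" ".join(denoise_cmd))
```

The have_ffmpeg argument can be set explicitly to override detection, for example when running inside a container where FFmpeg is known to be present.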