Data formats and data loading

Accessing the displayed data

The data currently displayed in PIT can always be accessed and modified from the IPython console through pit.get_data() and pit.set_data().

We could, for example, replace the first slice of our data by random numbers:

# Import numpy (it may already be available as np in PIT's console)
[1] import numpy as np
# Get hold of the data
[2] data = pit.get_data()
# Get the shape
[3] x, y, z = data.shape
# Randomize the first slice
[4] data[:,:,0] = np.random.rand(x, y)
# Set the data again
[5] pit.set_data(data)

The set_data() function takes an additional argument axes, which should be a list/tuple/array of 3 arrays representing the x, y and z axis coordinates. The following example would rescale the x-axis by an order of magnitude and shift the y-axis by 50 units:

# Get the current axes
[1] axes = pit.axes
# Set modified axes coordinates with the set_data method
[2] pit.set_data(axes=[axes[0]*10, axes[1]+50, axes[2]])

As another example, this is how we could apply some Gaussian blurring to the data (requires scipy to be installed):

# Import the Gaussian filter from scipy
[1] from scipy.ndimage import gaussian_filter
# Get the current data
[2] data = pit.get_data()
# Set blurred data
[3] pit.set_data(gaussian_filter(data, 1))
# Play with different levels of blurring
[4] pit.set_data(gaussian_filter(data, 10))

If there are certain operations that you routinely carry out on your data, it is recommended to automate the process by writing a plugin.
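In the meantime, a lightweight alternative is to collect such routine operations in a small helper function of your own. The following sketch (the function name and the default sigma value are our own choices, not part of PIT) simply wraps the blurring workflow shown above:

# A small helper that blurs the data currently displayed in PIT.
from scipy.ndimage import gaussian_filter

def blur_displayed_data(pit, sigma=1):
    """Apply a Gaussian blur of width sigma to the displayed data."""
    data = pit.get_data()
    pit.set_data(gaussian_filter(data, sigma))

You could keep this in a small module of your own and call it from the console, e.g. blur_displayed_data(pit, sigma=2).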

Loading data from a file

The existing dataloaders found in the dataloading module can handle two types of input data formats:

  1. A binary file that has been created using Python's pickle module. The pickled object can be either a numpy array, or a dictionary or argparse.Namespace containing the keys data and axes, as described in Dataloader_Pickle.
  2. A plain text file containing values for x, y, z and the actual data in four columns, as described in the documentation of Dataloader_3dtxt. Notice that there is a utility function (three_d_to_txt()) that can help you create such a text file in the correct format from existing data; a sketch of how such a file can be read back with plain numpy follows this list.
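For orientation, a generic four-column text file of this kind could be read back into flat coordinate and value arrays like so (this is only an illustrative sketch; the file name is made up and the exact header and column conventions are defined by Dataloader_3dtxt):

>>> import numpy as np
>>> x, y, z, values = np.loadtxt('my_data.txt', unpack=True)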

In the following, we give an example tutorial for each case.

Binary pickle files

Step 1: Load the data into Python

We will assume our starting point to be a 3D data file in any format. It could be an Igor file, an HDF5 file, a MATLAB file, a plain text file in a different format than what Dataloader_3dtxt requires, or anything else. In any case, you will first have to find a way of loading that dataset into Python (for the examples just listed, the packages igor, h5py, scipy and numpy could be used, respectively).
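To make this step concrete for one of those formats, loading an HDF5 file with h5py could look like the following (the file name my_file.h5 and the dataset name data are purely hypothetical here):

>>> import h5py
>>> with h5py.File('my_file.h5', 'r') as f:
...     my_data = f['data'][()]      # read the dataset into a numpy array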

In order to make the example more concrete and to allow for a step-by-step code-along experience, we will explain here how we prepared the MRI brain scan data that can be loaded by issuing mw.brain() in PIT's console. But again, how exactly this step is handled depends on your starting point. The result of Step 1 should always be the same, though: your data should be accessible as a numpy array in Python.

OK, now let’s get started with the concrete example. We find and download our brain scan data of a meditating person here [1].

Once downloaded, we need a way of opening it. A quick web search leads us to the library NiBabel, which seems to be able to open .nii files. Thus, we install that library (depending on your system and setup there may be different ways of doing this):

pip install nibabel

Now, following the instructions on the NiBabel webpage, we load the image data as a numpy array:

>>> import nibabel as nib
>>> img = nib.load('sub-01_T1w.nii.gz')
>>> my_data = img.get_fdata()

Note

You need to be in the directory where you placed the downloaded file (here sub-01_T1w.nii.gz) in order for this to work.

Note

Just to point it out once more, the details of this first step depend very much on your use case. It also does not matter whether you work in the live Python interpreter, as in the example, or whether you wrap it all in a script. You're fine as long as you have a way of getting your data into the form of a numpy array.

Step 2: Optionally change data arrangement

Now that we have our data in a numpy array, we are free to swap axes, cut off undesired parts or apply any other processing we like. This, again, depends completely on your use case. Since in this example we just want to look at the data, there is nothing to do here.

In case you find yourself wanting to do some rearrangements, here are a few functions that might be of interest: numpy.moveaxis(), numpy.transpose() and all basic numpy.ndarray operations, like slicing and indexing.
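For illustration, two such rearrangements (neither is needed for the brain example, they are just possible patterns) could look like this:

>>> import numpy as np
>>> my_data = np.moveaxis(my_data, 0, -1)   # make the first axis the last one
>>> my_data = my_data[:, :, 10:-10]         # crop a few slices at both ends of the last axis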

Step 3: Optionally create axis information

Skipping this step means that the data we end up loading into PIT will have axes that simply count the number of pixels (voxels) from 0 upwards. But we can assign more meaningful units to our axes; in this example, for instance, we could give the x, y and z axes length units. To do this, we have to create one 1D array for each axis and collect them in a list.

In our example, we found by inspecting the original data that 1 pixel corresponds to 0.85 mm along the x and y directions and 1.5 mm along the z direction. To create some reasonable axes, we could therefore do the following:

>>> import numpy as np
>>> nx, ny, nz = my_data.shape
>>> x_axis = np.arange(0, nx*0.85, 0.85)
>>> y_axis = np.arange(0, ny*0.85, 0.85)
>>> z_axis = np.arange(0, nz*1.5, 1.5)
>>> my_axes = [x_axis, y_axis, z_axis]

The three axes should of course have lengths matching the corresponding data dimensions.
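If you want to avoid potential off-by-one surprises from the floating point steps in np.arange(), an equivalent construction with np.linspace() fixes the number of points explicitly (a purely optional alternative to the code above):

>>> x_axis = np.linspace(0, (nx-1)*0.85, nx)
>>> y_axis = np.linspace(0, (ny-1)*0.85, ny)
>>> z_axis = np.linspace(0, (nz-1)*1.5, nz)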

Step 4: Pickle it!

Finally, we can store our data in a format that can be efficiently read by PIT. Here we have three options, depending on whether and how we want to provide the axes information from step 3. In all three cases we make use of the convenience function dump(), which uses the pickle module to store any Python object:

>>> from data_slicer.dataloading import dump

Option 1: no axes information

This is the easiest option; you can just do:

>>> dump(my_data, 'brain.p')

This will create the file brain.p in your current working directory. If a file of that name already exists, dump() will ask you for confirmation before overwriting it. (Obviously you can pick a filename of your choice; it doesn't even have to end in .p.)
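Since dump() is described as using the pickle module, you should be able to read the file back directly with pickle as an optional sanity check (the variable name restored is arbitrary; if dump() adds any wrapping of its own, this may differ):

>>> import pickle
>>> with open('brain.p', 'rb') as f:
...     restored = pickle.load(f)
>>> restored.shape == my_data.shape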

Option 2: with axes information in a dictionary

In order to also store the axis information we created in step 3, we just construct a dictionary and pickle it:

>>> D = dict(data=my_data, axes=my_axes)
>>> dump(D, 'brain.p')

In this case it is important that the keys are named exactly data and axes; other names will not work. The only exception is an alternative form in which you provide the three axes separately, like this:

>>> D = dict(data=my_data, xaxis=x_axis, yaxis=y_axis, zaxis=z_axis)
>>> dump(D, 'brain.p')

Option 3: with axes information in a Namespace

This option is given for convenience and for consistency with the data_slicer.dataloading.Dataloader objects. Whether you use option 2 or 3 is entirely up to your personal preference and shouldn't make any difference. The idea is exactly the same, except that we create an argparse.Namespace instead of a dictionary:

>>> from argparse import Namespace
>>> D = Namespace(data=my_data, axes=my_axes)
>>> dump(D, 'brain.p')

Conclusion

And that’s it. We have now successfully converted a data file into a PIT-readable format. Of course, if you have to do this kind of operation often, it would be a good idea to write a little script that does these steps for you. If you’re feeling confident, you could even create a plugin for the filetype(s) you need to use and make it available to other people. Or, if you’re lucky, somebody else has already done this and you can just use that plugin.
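As an example of such a script, the following sketch bundles steps 1 to 4 for the brain scan data. The file names and the voxel sizes are taken from this tutorial; the script name and the overall arrangement are just one possible way of doing it:

# convert_brain.py - convert the .nii.gz file into a PIT-readable pickle.
import nibabel as nib
import numpy as np
from data_slicer.dataloading import dump

# Step 1: load the data as a numpy array
img = nib.load('sub-01_T1w.nii.gz')
my_data = img.get_fdata()

# Step 3: build axes from the known voxel sizes (0.85 mm in x/y, 1.5 mm in z)
nx, ny, nz = my_data.shape
my_axes = [np.linspace(0, (nx - 1) * 0.85, nx),
           np.linspace(0, (ny - 1) * 0.85, ny),
           np.linspace(0, (nz - 1) * 1.5, nz)]

# Step 4: pickle data and axes together
dump(dict(data=my_data, axes=my_axes), 'brain.p')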

Plain text files

Working with plain text (ASCII) files is significantly slower and requires more disk space than working with binary formats, but it can be useful to have the data in a human-readable form. In order to create a plain text file in the correct format from some existing data, you will have to go through steps 1 to 3 exactly as described above. The only thing that changes is the final step, step 4.

Step 4 for plain text files

In this case, we can just use the function data_slicer.dataloading.three_d_to_txt():

>>> from data_slicer.dataloading import three_d_to_txt
>>> three_d_to_txt('brain.txt', my_data, axes=my_axes)

If you’ve skipped step 3, you can just leave out the axes argument. If you are following along with this tutorial, you will notice that creating this text file takes much longer than in the binary case - up to several minutes.
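That is, if you skipped step 3 and have no axes, the call simplifies to:

>>> three_d_to_txt('brain.txt', my_data)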

Footnotes

[1] This data set is taken from the OpenNeuro database, accession number ds000108: Wager, T.D., Davidson, M.L., Hughes, B.L., Lindquist, M.A., Ochsner, K.N. (2008). Prefrontal-subcortical pathways mediating successful emotion regulation. Neuron, 59(6):1037-50. doi: 10.1016/j.neuron.2008.09.006