Dataset module

Dataset Functions

dataset.get_dataset_class(base_dir, dataset)

Returns the dataset object without loading the data

Parameters:
  • base_dir (str) – Functional Fusion base directory

  • dataset (str) – Name of the dataset

Returns:

my_dataset (Dataset) – Dataset object

dataset.get_dataset(base_dir, dataset, atlas='SUIT3', sess='all', subj=None, type=None, ext=None, exclude_subjects=True)

Gets the data tensor, the info data frame, and the dataset object.

Parameters:
  • base_dir (str) – Base directory for the Functional Fusion data structure

  • dataset (str) – Data set indicator

  • atlas (str) – Atlas indicator. Defaults to ‘SUIT3’.

  • sess (str or list) – Sessions. Defaults to ‘all’.

  • subj (ndarray, str, or list) – Subject numbers/names to get [None = all]

  • type (str) – Type of data: ‘CondHalf’, ‘CondRun’, etc.

  • ext (str) – Additional qualifier (e.g., smoothing). Defaults to None.

  • exclude_subjects (bool) – If True, excludes subjects that have been specified in the exclude column of the participants.tsv file.

Returns:
  • data (ndarray) – nsubj x ncond x nvox data tensor

  • info (pd.DataFrame) – Dataframe with info about the data

  • my_dataset (DataSet) – Dataset object

dataset.prewhiten_data(data)

Prewhitens a list of data matrices. It assumes that the last row of each data matrix contains the ResMS values. Returns a list of data matrices in which each matrix is one row shorter (the ResMS row is removed).

Parameters:

data (list of ndarrays) – List of data arrays
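A minimal sketch of what prewhitening could look like, assuming univariate prewhitening: each beta is divided by the square root of the ResMS value stored in the last row of its matrix. The function name prewhiten_sketch is hypothetical; this is an illustration, not the library's implementation.

```python
import numpy as np


def prewhiten_sketch(data):
    """Hypothetical univariate prewhitening of a list of arrays.

    Each array is assumed to have shape (N + 1) x P, where the last row
    holds the ResMS value of each voxel.
    """
    whitened = []
    for arr in data:
        betas = arr[:-1, :]                 # all rows except the ResMS row
        resms = arr[-1, :].astype(float)    # copy of the ResMS row
        resms[resms <= 0] = np.nan          # guard against invalid variances
        whitened.append(betas / np.sqrt(resms))  # scale by residual SD
    return whitened


# Two conditions, three voxels; the last row holds the ResMS values.
example = [np.array([[2.0, 4.0, 6.0],
                     [1.0, 2.0, 3.0],
                     [4.0, 4.0, 1.0]])]
out = prewhiten_sketch(example)
print(out[0].shape)  # (2, 3): one row shorter than the input
```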

dataset.agg_data(info, by, over, subset=None)

Aggregates data over rows (conditions) safely by sorting them by the fields in “by” while integrating out the fields in “over”. Adds an n_rep field that counts how many original rows were combined into each new row. Returns the condensed data frame and the contrast matrix.

Parameters:
  • info (DataFrame) – Original DataFrame

  • by (list) – Fields that define the index of the new data

  • over (list) – Fields to ignore / integrate over. All other fields will be pulled through.

  • subset (bool array) – If given, ignores certain rows from the original data frame

Returns:
  • data_info (DataFrame) – Reduced data frame

  • C (ndarray) – Indicator matrix defining the mapping from full to reduced

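Example

    import numpy as np
    import dataset as ds

    # base_dir: Functional Fusion base directory
    data, info, mdtb = ds.get_dataset(base_dir, 'MDTB', atlas='MNISymDentate1', sess='ses-s1', type='CondRun')
    cinfo, C = ds.agg_data(info, ['cond_num_uni'], ['run', 'half', 'reg_num', 'names'])
    cdata = np.linalg.pinv(C) @ data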
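To illustrate why np.linalg.pinv(C) @ data averages the collapsed rows, here is a minimal pandas sketch that builds such an indicator matrix, assuming C has one row per original row and one column per unique combination of the “by” fields. The helper indicator_matrix and the toy column cond are hypothetical; unlike agg_data, this sketch keeps only the “by” fields plus n_rep.

```python
import numpy as np
import pandas as pd


def indicator_matrix(info, by):
    """Hypothetical helper: map each row of `info` to its group in `by`."""
    # Sorted unique combinations of the `by` fields define the new rows
    groups = info[by].drop_duplicates().sort_values(by).reset_index(drop=True)
    # Position of each original row in the reduced table (left merge keeps order)
    idx = info[by].merge(groups.reset_index(), on=by, how="left")["index"].to_numpy()
    C = np.zeros((len(info), len(groups)))
    C[np.arange(len(info)), idx] = 1.0
    groups["n_rep"] = C.sum(axis=0).astype(int)  # original rows per group
    return groups, C


info = pd.DataFrame({"cond": [1, 2, 1, 2], "run": [1, 1, 2, 2]})
data = np.array([[1.0], [10.0], [3.0], [20.0]])
cinfo, C = indicator_matrix(info, ["cond"])
# pinv of an indicator matrix is C.T scaled by 1/n_rep, so this averages
cdata = np.linalg.pinv(C) @ data
```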

dataset.agg_parcels(data, label_vec, fcn=np.nanmean)

Aggregates data over columns to condense it into parcels.

Parameters:
  • data (ndarray) – Either 2d or 3d data structure, P has to be the last dimension

  • label_vec (ndarray) – 1d-array that gives the labels (P-vector)

  • fcn (function) – Function used to aggregate the values within each parcel. Defaults to np.nanmean.

Returns:
  • aggdata (ndarray) – Aggregated either 2d or 3d data structure

  • labels (ndarray) – Region number corresponding to each “column”
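A minimal NumPy sketch of parcel aggregation over the last (voxel) dimension, which works for both 2d and 3d inputs. agg_parcels_sketch is hypothetical; in this sketch every distinct label, including any 0 background label, becomes its own parcel, which may differ from the library's handling.

```python
import numpy as np


def agg_parcels_sketch(data, label_vec, fcn=np.nanmean):
    """Hypothetical sketch: aggregate the last (voxel) axis by parcel label."""
    labels = np.unique(label_vec)  # sorted region numbers
    # Apply fcn over the voxels of each parcel, keeping the leading dimensions
    agg = [fcn(data[..., label_vec == lab], axis=-1) for lab in labels]
    return np.stack(agg, axis=-1), labels


data = np.array([[1.0, 3.0, np.nan, 10.0],
                 [2.0, 4.0, 6.0, 20.0]])
label_vec = np.array([1, 1, 2, 2])
aggdata, labels = agg_parcels_sketch(data, label_vec)
# aggdata -> [[2., 10.], [3., 13.]], labels -> [1, 2]
```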

dataset.optimal_contrast(data, C, X, reg_in=None)

Recombines betas from a GLM into an optimal new contrast, taking the design matrix into account. For mathematical background and motivation, see:

Parameters:
  • data (list of ndarrays) – List of N x P_i arrays of beta estimates of the original GLM

  • C (ndarray) – Contrast matrix (N x Q) going from the original GLM to the new GLM

  • X (ndarray) – Original (T x Nx) design matrix used in estimation of the data. Nx can be larger than N because of regressors of no interest.

  • reg_in (ndarray) – Contrast of interest: logical vector indicating which rows of C will be put in the matrix (defaults to all)
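One way to read “optimal” here is as a least-squares re-estimation: the fitted signal X b is re-explained by the new design X C. The sketch below is written under that assumption and further assumes that the regressors of interest are the first N columns of X; regressors of no interest are ignored and reg_in is not handled. optimal_contrast_sketch is hypothetical and not the library's code.

```python
import numpy as np


def optimal_contrast_sketch(data, C, X):
    """Hypothetical least-squares recombination of GLM betas.

    data: list of N x P_i beta arrays; C: N x Q contrast matrix;
    X: T x Nx design matrix whose first N columns are assumed to be
    the regressors of interest.
    """
    N, Q = C.shape
    Xi = X[:, :N]       # regressors of interest (assumed to come first)
    Xn = Xi @ C         # new T x Q design matrix
    # Map from old to new betas: (Xn' Xn)^-1 Xn' Xi
    W = np.linalg.solve(Xn.T @ Xn, Xn.T @ Xi)
    return [W @ d for d in data]


# One condition measured in two runs of unequal length (3 scans vs 1 scan)
X = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
C = np.array([[1.0], [1.0]])  # merge both run regressors into one
res = optimal_contrast_sketch([np.array([[2.0], [6.0]])], C, X)
# weights 0.75 / 0.25 give 3.0 rather than the plain average 4.0
```

The weighting follows from how much each original regressor contributes to the combined design, which is why better-determined betas count more than in a simple average.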

dataset.reliability_maps(base_dir, dataset_name, atlas='MNISymC3', type='CondHalf', subtract_mean=True, voxel_wise=True, subject_wise=False)

Calculates the average within-subject reliability maps across sessions for a single dataset.

Parameters:
  • base_dir (str / path) – Base directory

  • dataset_name (str) – Name of data set

  • atlas (str) – Atlas indicator. Defaults to ‘MNISymC3’.

  • subtract_mean (bool) – If True, removes the mean per voxel before calculating the correlation. Defaults to True.

Returns:

_type_ – _description_
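A minimal sketch of a voxel-wise split-half reliability, assuming reliability is the correlation across conditions between the estimates from the two halves (as provided by type='CondHalf'), with the per-voxel mean optionally removed as subtract_mean describes. split_half_reliability is hypothetical, and whether the library uses exactly this estimator is an assumption.

```python
import numpy as np


def split_half_reliability(half1, half2, subtract_mean=True):
    """Hypothetical voxel-wise reliability between two K x P half estimates."""
    a = np.asarray(half1, dtype=float)
    b = np.asarray(half2, dtype=float)
    if subtract_mean:
        # Remove each voxel's mean across conditions (Pearson correlation)
        a = a - a.mean(axis=0)
        b = b - b.mean(axis=0)
    # Without mean removal this is an uncentered correlation
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return num / den


half1 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 0.0]])  # 3 conditions x 2 voxels
half2 = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]])
r = split_half_reliability(half1, half2)
# Averaging such maps over subjects, e.g. np.mean(list_of_maps, axis=0),
# gives an average reliability map.
```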

General dataset classes

class dataset.DataSet(base_dir)

DataSet class: Implements the interface for each dataset. Note that the actual preprocessing and GLM estimation do not have to be performed with the functionality provided by this class. The class is simply an instrument that presents the user with a uniform interface for getting subject information.

Parameters:

base_dir (str) – base directory for dataset

condense_data(data, info, type='CondHalf', participant_id=None, ses_id=None, subset=None)

Condenses the data across measures to a certain level. If a design matrix file exists, it is used to combine betas optimally. Supported types:

  • ‘CondHalf’: Conditions with separate estimates for the first and second half of the experiment (default)

  • ‘CondRun’: Conditions with separate estimates per run

  • ‘CondAll’: Conditions with a single estimate averaging over all runs

  • ‘TaskHalf’: Tasks with separate estimates for the first and second half of the experiment

  • ‘TaskRun’: Tasks with separate estimates per run

  • ‘TaskAll’: Tasks with a single estimate averaging over all runs

If dataset.subtract_baseline is True, the baseline is subtracted from the data.

Parameters:
  • data (list of ndarrays) – List of extracted datasets

  • info (DataFrame) – Data frame with a row-wise description of the data

  • type (str) – Type of extraction (see the list of supported types above). Defaults to ‘CondHalf’.

  • participant_id (str) – ID of participant

  • ses_id (str) – Name of session

  • subset (bool array) – If given, ignores certain rows from the data frame

Returns:
  • Y (list of np.ndarray) – A list (len = numatlas) of N x P_i numpy arrays of prewhitened data

  • T (pd.DataFrame) – A data frame with information about the N numbers provided
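The condensing types differ mainly in which descriptor fields define a row of the output. The pandas sketch below shows how plain averaging would collapse the rows for each type, assuming an info frame with columns named cond, task, run, half, and value; these column names, GROUP_FIELDS, and condense_sketch are all hypothetical. Note that condense_data itself combines betas optimally when a design matrix is available, which this sketch does not do.

```python
import pandas as pd

# Hypothetical mapping from condensing type to the fields that define a row
GROUP_FIELDS = {
    "CondHalf": ["cond", "half"],
    "CondRun": ["cond", "run"],
    "CondAll": ["cond"],
    "TaskHalf": ["task", "half"],
    "TaskRun": ["task", "run"],
    "TaskAll": ["task"],
}


def condense_sketch(df, data_type="CondHalf"):
    """Average the rows of `df` that share the grouping fields of `data_type`."""
    return df.groupby(GROUP_FIELDS[data_type], as_index=False)["value"].mean()


# Two conditions of one task, four runs; runs 1-2 form the first half
df = pd.DataFrame({
    "cond": [1, 2] * 4,
    "task": ["a"] * 8,
    "run": [1, 1, 2, 2, 3, 3, 4, 4],
    "half": [1, 1, 1, 1, 2, 2, 2, 2],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
cond_half = condense_sketch(df, "CondHalf")  # 4 rows: cond x half
cond_all = condense_sketch(df, "CondAll")    # 2 rows: one per condition
```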

extract_all(ses_id='ses-s1', type='CondHalf', atlas='SUIT3', smooth=None, interpolation=1, subj='all', exclude_subjects=True)

Extracts data in Volumetric space from a dataset in which the data is stored in Native space. Saves the results as CIFTI files in the data directory.

Parameters:
  • ses_id (str) – Session. Defaults to ‘ses-s1’.

  • type (str) – Type for condense_data. Defaults to ‘CondHalf’.

  • atlas (str) – Short atlas string. Defaults to ‘SUIT3’.

  • smooth (float) – Smoothing kernel. Defaults to None.

  • subj (list / str) – List of subject numbers to use. Defaults to ‘all’.

  • exclude_subjects (bool) – If True, excludes subjects that have been specified in the exclude column of the participants.tsv file.

get_atlasmaps(atlas, sub, ses_id, smooth=None, interpolation=1)

Generates the atlas maps that project the data of a specific subject into a specific atlas space. The general DataSet.get_atlasmaps defines atlas maps for the following spaces:
  • SUIT: Using individual normalization from source space.

  • MNI152NLin2009cSymC: Via individual SUIT normalization + group

  • MNI152NLin6AsymC: Via individual SUIT normalization + group

  • MNI152Lin2009cSym: Via individual MNI normalization

  • MNI152NLin6Asym: Via individual MNI normalization

  • fs32k: Via individual pial and white surfaces (need to be in source space)

Other dataset classes will overwrite and extend this function.

Parameters:
  • atlas (FunctionFusion.Atlas) – Functional Fusion atlas object

  • sub (str) – Subject_id for the individual subject

  • ses_id (str) – Session_id for the individual subject if atlasmap is session dependent. (defaults to none)

  • smooth (float) – Width of smoothing kernel for extraction. Defaults to None.

Returns:

AtlasMap – List of AtlasMap objects; usually a single one, but two for fs32k

get_data(space='SUIT3', ses_id='ses-s1', type=None, subj=None, exclude_subjects=True, fields=None, ext=None, verbose=False)

Loads all the CIFTI files in the data directory for a certain space / type and returns their content as a NumPy array.

Parameters:
  • space (str) – Atlas space (Defaults to ‘SUIT3’).

  • ses_id (str) – Session ID (Defaults to ‘ses-s1’).

  • type (str) – Type of data (Defaults to ‘CondHalf’).

  • subj (ndarray, str, or list) – Subject numbers/names to get [None = all]

  • exclude_subjects (bool) – If True, excludes subjects that have been specified in the exclude column of the participants.tsv file.

  • fields (list) – Column names of the info structure that are returned; these are also tested for equivalence across subjects

Returns:
  • Data (ndarray) – (n_subj, n_contrast, n_voxel) array of data

  • info (DataFrame) – Data frame with common descriptor
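A hypothetical sketch of how per-subject arrays can be stacked into the (n_subj, n_contrast, n_voxel) tensor while checking that the requested fields agree across subjects. stack_subjects and the toy info frames are illustrative only, and loading the CIFTI files themselves is not shown.

```python
import numpy as np
import pandas as pd


def stack_subjects(arrays, infos, fields):
    """Stack per-subject (n_contrast, n_voxel) arrays; check shared fields."""
    reference = infos[0][fields].reset_index(drop=True)
    for i, info in enumerate(infos[1:], start=1):
        # Every subject must have identical values in the requested fields
        if not info[fields].reset_index(drop=True).equals(reference):
            raise ValueError(f"info fields differ for subject index {i}")
    data = np.stack(arrays, axis=0)  # -> (n_subj, n_contrast, n_voxel)
    return data, reference


info = pd.DataFrame({"cond": [1, 2], "half": [1, 1]})
data, common = stack_subjects([np.zeros((2, 5)), np.ones((2, 5))],
                              [info, info.copy()], ["cond"])
```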

get_data_fnames(participant_id, session_id=None, type='Cond')

Gets all raw data files.

Parameters:
  • participant_id (str) – Subject

  • session_id (str) – Session ID. Defaults to None.

  • type (str) – Type of data. Defaults to ‘Cond’ for task-based data. For rest data use ‘Tseries’.

Returns:
  • fnames (list) – List of fnames, last one is the resMS image

  • T (pd.DataFrame) – Info structure for regressors (reginfo)

get_info(ses_id='ses-s1', type=None, subj=None, fields=None, exclude_subjects=True)

Loads the tsv-files and returns the most complete info structure

Parameters:
  • ses_id (str) – Session ID (Defaults to ‘ses-s1’).

  • type (str) – Type of data (Defaults to ‘CondHalf’).

  • subj (ndarray) – Subject numbers to get. Defaults to None (all subjects).

  • fields (list) – Column names of the info structure that are returned; these are also tested for equivalence across subjects

  • exclude_subjects (bool) – If True, excludes subjects that have been specified in the exclude column of the participants.tsv file.

Returns:
  • Data (ndarray) – (n_subj, n_contrast, n_voxel) array of data

  • info (DataFrame) – Data frame with common descriptor

get_participants(exclude_subjects=True)

Returns a data frame with all participants available in the study. The fields in the data frame correspond to the standard columns in participants.tsv (see https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html).

Parameters:

exclude_subjects (bool) – If True, excludes subjects that have been specified in the exclude column of the participants.tsv file.

Returns:

Pinfo (pandas data frame) – Participant information in standard BIDS format
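A minimal sketch of the exclusion logic, assuming participants.tsv is tab-separated and that excluded participants are marked with 1 in a column named exclude; the exact coding the library expects is an assumption. read_participants is hypothetical, and io.StringIO stands in for the file.

```python
import io

import pandas as pd


def read_participants(source, exclude_subjects=True):
    """Read a BIDS participants.tsv; optionally drop rows flagged in 'exclude'."""
    pinfo = pd.read_csv(source, sep="\t")
    if exclude_subjects and "exclude" in pinfo.columns:
        # Keep rows whose exclude flag is not 1 (missing values are kept)
        pinfo = pinfo[pinfo["exclude"] != 1].reset_index(drop=True)
    return pinfo


tsv = "participant_id\texclude\nsub-01\t0\nsub-02\t1\nsub-03\t0\n"
pinfo = read_participants(io.StringIO(tsv))  # sub-02 is dropped
```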

group_average_data(ses_id=None, type=None, atlas='SUIT3', subj=None)

Loads data from all subjects for a certain session, type, and atlas, averages the data across subjects, and saves the results as CIFTI files in the data/group directory.

Parameters:
  • ses_id (str, optional) – Session ID. If not provided, the first session ID in the dataset will be used.

  • type (str, optional) – Type of data. If not provided, the default type will be used.

  • atlas (str, optional) – Short atlas string. Defaults to ‘SUIT3’.

  • subj (list or None, optional) – Subset of subjects to include in the group average. If None, all subjects will be included.

class dataset.DataSetNative(base_dir)

Dataset with estimates stored as NIfTI files in native space.

DataSet class: Implements the interface for each dataset. Note that the actual preprocessing and GLM estimation do not have to be performed with the functionality provided by this class. The class is simply an instrument that presents the user with a uniform interface for getting subject information.

Parameters:

base_dir (str) – base directory for dataset

get_atlasmaps(atlas, sub, ses_id, smooth=None, interpolation=1)

Generates the atlas map for the data of a specific subject in a specific atlas space. For native space, individual maps are used for SUIT and surface space. Additionally, deformations into MNI space are defined via the individual normalization into MNI152NLin6Asym (FSL, SPM Segment). Other MNI spaces (symmetric, etc.) are not implemented yet.

Parameters:
  • atlas (FunctionFusion.Atlas) – Functional Fusion atlas object

  • sub (str) – Subject_id for the individual subject

  • ses_id (str) – Session_id for the individual subject if atlasmap is session dependent. (defaults to none)

  • smooth (float) – Width of smoothing kernel for extraction. Defaults to None.

Returns:

AtlasMap – Built AtlasMap object

class dataset.DataSetMNIVol(base_dir, space='MNI152NLin6Asym')

Dataset with estimates stored as NIfTI files in a standard group space. The exact MNI template should be indicated in the space argument (‘MNI152NLin6Asym’, ‘MNI152N2009cAsym’, ‘MNI152N2009cSym’). The small deformations between the different MNI spaces are applied when extracting the data.

Parameters:
  • base_dir (str) – Base directory

  • space (str) – Group Space in which data is stored (Defaults to ‘MNI152NLin6Asym’).

get_atlasmaps(atlas, sub, ses_id, smooth=None, interpolation=1)

Generates the atlas map for data stored in MNI space. For SUIT and surface space, it uses deformations estimated on the individual anatomy. If atlas.space matches dataset.space, no deformation is applied and the data are read out directly. For a mismatching MNI space, it tries to find the correct transformation file.

Parameters:
  • atlas (FunctionFusion.Atlas) – Functional Fusion atlas object

  • sub (str) – Subject_id for the individual subject

  • ses_id (str) – Session_id for the individual subject if atlasmap is session dependent. (defaults to none)

  • smooth (float) – Width of smoothing kernel for extraction. Defaults to None.

Returns:

AtlasMap – Built AtlasMap object
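When atlas.space matches dataset.space, a direct readout amounts to converting the atlas's world (mm) coordinates into voxel indices of the data image with the inverse of its affine. The NumPy sketch below assumes a 4 x 4 voxel-to-world affine and nearest-neighbour rounding; world_to_voxel and direct_readout are hypothetical, and the library's interpolation and smoothing options are not reproduced.

```python
import numpy as np


def world_to_voxel(coords, affine):
    """Map P x 3 world coordinates (mm) to integer voxel indices."""
    coords = np.asarray(coords, dtype=float)
    homog = np.c_[coords, np.ones(len(coords))]       # P x 4 homogeneous
    vox = (np.linalg.inv(affine) @ homog.T).T[:, :3]  # world -> voxel space
    return np.rint(vox).astype(int)                   # nearest neighbour


def direct_readout(volume, coords, affine):
    """Sample a 3D volume at the given world coordinates."""
    i, j, k = world_to_voxel(coords, affine).T
    return volume[i, j, k]


# 2 mm isotropic voxels with the origin of voxel (0, 0, 0) at (-10, -10, -10) mm
affine = np.diag([2.0, 2.0, 2.0, 1.0])
affine[:3, 3] = -10.0
vol = np.arange(1000).reshape(10, 10, 10)
vals = direct_readout(vol, [[-10, -10, -10], [0, 0, 0]], affine)
```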

class dataset.DataSetCifti(base_dir)

Dataset in HCP format that comes as pre-extracted CIFTI files.

DataSet class: Implements the interface for each dataset. Note that the actual preprocessing and GLM estimation do not have to be performed with the functionality provided by this class. The class is simply an instrument that presents the user with a uniform interface for getting subject information.

Parameters:

base_dir (str) – base directory for dataset

extract_all(ses_id='ses-s1', type='CondHalf', atlas='SUIT3', exclude_subjects=True, interpolation=1, smooth=None)

Extracts cerebellar data and saves the results as CIFTI files in the data directory.

Parameters:
  • ses_id (str, optional) – Session. Defaults to ‘ses-s1’.

  • type (str, optional) – Type, as defined in get_data. Defaults to ‘CondHalf’.

  • atlas (str, optional) – Short atlas string. Defaults to ‘SUIT3’.

  • exclude_subjects (bool) – If True, excludes subjects that have been specified in the exclude column of the participants.tsv file.

get_data_fnames(participant_id, session_id=None)

Gets all raw data files

Parameters:
  • participant_id (str) – Subject

  • session_id (str) – Session ID. Defaults to None.

Returns:
  • fnames (list) – List of fnames, last one is the resMS image

  • T (pd.DataFrame) – Info structure for regressors (reginfo)

Specific datasets

class dataset.DataSetMDTB(dir)

DataSet class: Implements the interface for each dataset. Note that the actual preprocessing and GLM estimation do not have to be performed with the functionality provided by this class. The class is simply an instrument that presents the user with a uniform interface for getting subject information.

Parameters:

base_dir (str) – base directory for dataset