MuData specification [RFC]

Building on top of the AnnData spec, this document provides details on the MuData on-disk format. For user-facing features, please see this document.

>>> import h5py
>>> f = h5py.File("citeseq.h5mu")
>>> list(f.keys())
['mod', 'obs', 'obsm', 'obsmap', 'uns', 'var', 'varm', 'varmap']

.mod

Modalities are stored in the .mod group of the .h5mu file in alphabetical order. To preserve the intended order of the modalities, a "mod-order" attribute lists them in that order. If any modality is missing from that attribute, the attribute is to be ignored.

>>> dict(f["mod"].attrs)
{'mod-order': array(['rna', 'protein'], dtype=object)}
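
The fallback rule above can be sketched in a few lines. This is a minimal illustration, not the MuData reader itself: the helper name `resolve_mod_order` is hypothetical, and it assumes the stored modality names and the "mod-order" value have already been read into plain Python lists. Skipping names that are listed in the attribute but not stored is also an assumption of this sketch.

```python
def resolve_mod_order(stored, mod_order=None):
    """Return modality names in the order a reader should use.

    stored    -- names found in the .mod group (alphabetical on disk)
    mod_order -- value of the "mod-order" attribute, or None if absent
    """
    stored = list(stored)
    if mod_order is None:
        return stored
    order = list(mod_order)
    # Ignore the attribute if any stored modality is missing from it.
    if not set(stored) <= set(order):
        return stored
    # Assumption: names in the attribute that are not stored are skipped.
    return [m for m in order if m in stored]


# The attribute covers every stored modality, so its order is used.
print(resolve_mod_order(["protein", "rna"], ["rna", "protein"]))
# "atac" is missing from the attribute, so the attribute is ignored.
print(resolve_mod_order(["atac", "protein", "rna"], ["rna", "protein"]))
```

The first call returns the order given by the attribute; the second falls back to the stored (alphabetical) order because one modality is not listed.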

.obsmap and .varmap

While in practice MuData relies on .obs_names and .var_names to collate global observations and variables, it can also disambiguate between items with the same name using integer maps. For example, a global observation has a non-zero integer value in .obsmap["rna"] if it is present in the "rna" modality. An observation or a variable that is missing from a modality has the value 0 in the corresponding map.
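
As an illustration of how such a map can be read, the sketch below uses a made-up map for five global observations. Only the zero versus non-zero distinction comes from the description above. Treating a non-zero value as a 1-based position within the modality is an assumption of this example, and the helper name `split_map` is hypothetical.

```python
def split_map(values):
    """Split an integer map into a presence mask and local positions.

    values -- one integer per global observation (or variable);
              0 means the item is missing from the modality.
    """
    present = [v != 0 for v in values]
    # Assumption: a non-zero value v is a 1-based position, so the
    # 0-based index within the modality is v - 1.
    local = [v - 1 if v != 0 else None for v in values]
    return present, local


# Hypothetical .obsmap["rna"] for five global observations.
obsmap_rna = [1, 2, 0, 3, 0]
present, local = split_map(obsmap_rna)
print(present)  # [True, True, False, True, False]
print(local)    # [0, 1, None, 2, None]
```

The same helper applies unchanged to a .varmap entry, since both maps follow the same zero-means-missing convention.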