MuData nuances#

This is the "sharp bits" page for mudata. It describes the nuances to keep in mind when working with MuData objects.

First, install and import mudata and other libraries.

[1]:
%pip install mudata
[2]:
import mudata
from mudata import MuData, AnnData
[3]:
import numpy as np
import pandas as pd

Prepare some simple AnnData objects:

[4]:
n, d1, d2, k = 1000, 100, 200, 10

np.random.seed(1)
z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))
w1 = np.random.normal(size=(d1,k))
w2 = np.random.normal(size=(d2,k))

mod1 = AnnData(X=np.dot(z, w1.T))
mod2 = AnnData(X=np.dot(z, w2.T))

Variable names#

NB: It is best to keep variable names unique across all the modalities. This helps to avoid ambiguity and benefits the performance of some functionality such as updating (see below).

MuData is designed with features (variables) being different in different modalities in mind. Hence their names should be unique and different between modalities. In other words, .var_names are checked for uniqueness across modalities.

This behaviour ensures all the functions are easy to reason about. For instance, if there is a var_name that is present in both modalities, what happens during plotting a joint embedding from .obsm coloured by this var_name is not strictly defined.

Nevertheless, MuData can accommodate modalities with duplicated .var_names. For the typical workflows, we recommend renaming them manually or calling .var_names_make_unique().

[5]:
mdata = MuData({"mod1": mod1, "mod2": mod2})
print(mdata.var_names)
mdata.var_names_make_unique()
print(mdata.var_names)
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
       ...
       '190', '191', '192', '193', '194', '195', '196', '197', '198', '199'],
      dtype='object', length=300)
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
       'mod1:7', 'mod1:8', 'mod1:9',
       ...
       'mod2:190', 'mod2:191', 'mod2:192', 'mod2:193', 'mod2:194', 'mod2:195',
       'mod2:196', 'mod2:197', 'mod2:198', 'mod2:199'],
      dtype='object', length=300)
/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/src/mudata/_core/mudata.py:869: UserWarning: Cannot join columns with the same name because var_names are intersecting.
  warnings.warn(
/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/src/mudata/_core/mudata.py:1478: UserWarning: Modality names will be prepended to var_names since there are identical var_names in different modalities.
  warnings.warn(

Variable names in AnnData objects#

In the example above it is worth pointing out that .var_names_make_unique() is an in-place operation, just as the same method is in anndata.

Hence original AnnData objects’ .var_names have also been modified:

[6]:
mdata["mod1"].var_names[:10]
[6]:
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
       'mod1:7', 'mod1:8', 'mod1:9'],
      dtype='object')

Update#

NB: If individual modalities are changed, updating the MuData object containing them might be required.

Modalities in MuData objects are full-featured AnnData objects. Hence they can be operated on individually, and their MuData parent will have to be updated to fetch this information.

NB: Starting from v0.3, mudata will be adopting a more flexible approach to metadata management: updating global index with .update() will become independent from managing columns, which can now be done with .pull_obs()/.pull_var() and .push_obs()/.push_var().

See more about annotations management in the respective tutorial.

Filtering data#

In rare cases some observations (or variables) can be dropped from all the contained modalities:

[7]:
smaller_mdata = MuData({
    "mod1": mod1[:900].copy(),
    "mod2": mod2[:900].copy(),
})

Here smaller_mdata is built from modalities that have already been filtered, so its global dimensions match them from the start:

[8]:
smaller_mdata
[8]:
MuData object with n_obs × n_vars = 900 × 300
  2 modalities
    mod1:   900 x 100
    mod2:   900 x 200

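The .update() method synchronises the global index with the modalities. Calling it is required when modalities are changed after the MuData object has been created: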

[9]:
smaller_mdata.update()
smaller_mdata
[9]:
MuData object with n_obs × n_vars = 900 × 300
  2 modalities
    mod1:   900 x 100
    mod2:   900 x 200

The global dimensions of the MuData object match those of its modalities.

Observations annotations#

Consider the following example: a new column has been added to a modality-specific metadata table:

[10]:
mod1.obs["mod1_profiled"] = True

While mdata includes mod1 as its first modality, nothing has changed at the global level of the annotation:

[11]:
mdata.obs.columns
[11]:
Index([], dtype='object')

With the option below, the .update() method will only sync the obs_names:

[12]:
# default from v0.4
mudata.set_options(pull_on_update=False)
[12]:
<mudata._core.config.set_options at 0x3597fec00>
[13]:
mdata.update()
print(mdata.obs.columns)
Index([], dtype='object')

If we need the annotation at the global level, we can copy it from all the underlying modalities:

[14]:
mdata.pull_obs()
print(mdata.obs.columns)
Index(['mod1:mod1_profiled'], dtype='object')
[15]:
del mdata.obs["mod1:mod1_profiled"]

As MuData objects are designed with shared observations by default, this annotation is automatically prefixed by the modality that originated this annotation.

There is, however, flexibility when it comes to using prefixes for observation annotations that are specific to individual modalities:

[16]:
mdata.pull_obs(prefix_unique=False)
print(mdata.obs.columns)
Index(['mod1_profiled'], dtype='object')

Variables#

On the other hand, for variables, the default consideration is that they are unique to their modalities. This allows us to merge annotations across modalities, when possible.

[17]:
mod1.var["assay"] = "A"
mod2.var["assay"] = "B"

# Will fetch these values
mdata.pull_var()
[18]:
np.random.seed(10)
mdata.var.sample(5)
[18]:
         assay
mod1:24      A
mod1:65      A
mod2:13      B
mod2:161     B
mod2:88      B

See how e.g. muon operates with MuData objects and enables access to modality-specific slots beyond just metadata in the tutorials.