MuData nuances#

This page collects the sharp bits of mudata: nuances to keep in mind when working with MuData objects.

First, install and import mudata and other libraries.

[1]:
! pip install mudata
[2]:
import mudata as md
from mudata import MuData, AnnData
[3]:
import numpy as np
import pandas as pd

Prepare some simple AnnData objects:

[4]:
n, d1, d2, k = 1000, 100, 200, 10

np.random.seed(1)
z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))
w1 = np.random.normal(size=(d1,k))
w2 = np.random.normal(size=(d2,k))

mod1 = AnnData(X=np.dot(z, w1.T))
mod2 = AnnData(X=np.dot(z, w2.T))

Variable names#

*NB: It is best to keep variable names unique across all the modalities. This helps avoid ambiguity and improves the performance of some functionality, such as updating (see below).*

MuData is designed on the assumption that features (variables) differ between modalities. Hence their names should be unique across modalities. In other words, .var_names are checked for uniqueness across modalities.

This behaviour ensures all the functions are easy to reason about. For instance, if a var_name is present in both modalities, it is not strictly defined what happens when plotting a joint embedding from .obsm coloured by this var_name.

Nevertheless, MuData can accommodate modalities with duplicated .var_names. For the typical workflows, we recommend renaming them manually or calling .var_names_make_unique().

[5]:
mdata = MuData({"mod1": mod1, "mod2": mod2})
print(mdata.var_names)
mdata.var_names_make_unique()
print(mdata.var_names)
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
       ...
       '190', '191', '192', '193', '194', '195', '196', '197', '198', '199'],
      dtype='object', length=300)
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
       'mod1:7', 'mod1:8', 'mod1:9',
       ...
       'mod2:190', 'mod2:191', 'mod2:192', 'mod2:193', 'mod2:194', 'mod2:195',
       'mod2:196', 'mod2:197', 'mod2:198', 'mod2:199'],
      dtype='object', length=300)
/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:404: UserWarning: Cannot join columns with the same name because var_names are intersecting.
  warnings.warn(
/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:852: UserWarning: Modality names will be prepended to var_names since there are identical var_names in different modalities.
  warnings.warn(
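
As an illustration of the manual renaming route, here is a minimal sketch in plain pandas. The modality names and feature names are hypothetical and are not taken from the example above. It finds names shared between two modalities and then prefixes every name with its modality, following the same "modality:name" pattern seen in the output above. The resulting index could then be assigned to a modality's .var_names before constructing the MuData object.

```python
import pandas as pd

# Hypothetical feature names for two modalities (illustrative only)
names = {
    "rna": pd.Index(["GeneA", "GeneB", "GeneC"]),
    "prot": pd.Index(["GeneB", "GeneC", "GeneD"]),
}

# Names present in both modalities are the ambiguous ones
shared = names["rna"].intersection(names["prot"])
print(sorted(shared))

# Prefix each name with its modality ("modality:name")
renamed = {
    mod: pd.Index([f"{mod}:{name}" for name in idx])
    for mod, idx in names.items()
}

# After prefixing, the concatenated names no longer collide
combined = renamed["rna"].append(renamed["prot"])
print(combined.is_unique)
```

Prefixing every name, rather than only the shared ones, keeps the naming scheme uniform within a modality. This is a design choice of the sketch, not a requirement.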

Variable names in AnnData objects#

In the example above, it is worth pointing out that .var_names_make_unique() is an in-place operation, just like the method of the same name in anndata.

Hence the .var_names of the original AnnData objects have also been modified:

[6]:
mdata["mod1"].var_names[:10]
[6]:
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
       'mod1:7', 'mod1:8', 'mod1:9'],
      dtype='object')

Update#

*NB: If individual modalities are changed, updating the MuData object containing them might be required.*

Modalities in MuData objects are full-featured AnnData objects. Hence they can be operated on individually, and their MuData parent will have to be updated to fetch the changes.

Observations#

Consider the following example: a new column has been added to a modality-specific metadata table:

[7]:
mod1.obs["mod1_profiled"] = True

While mdata includes mod1 as its first modality, it currently does not know about this change:

[8]:
mdata.obs.columns
[8]:
Index([], dtype='object')

The .update() method fetches these changes and propagates them to the global .obs table.

[9]:
mdata.update()
print(mdata.obs.columns)
mdata.obs.head(2)
Index(['mod1:mod1_profiled'], dtype='object')
[9]:
   mod1:mod1_profiled
0                True
1                True

Because MuData objects are designed around observations that are shared between modalities by default, the annotation is automatically prefixed with the name of the modality it originated from.
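
To see why the prefix is useful, consider two modalities that annotate the same observations with a column of the same name. The sketch below mimics the idea in plain pandas. It is not how mudata implements .update(), and the table and column names are hypothetical.

```python
import pandas as pd

# The same observations (cells) are described by both modalities
cells = ["cell0", "cell1"]
obs_mod1 = pd.DataFrame({"profiled": [True, True]}, index=cells)
obs_mod2 = pd.DataFrame({"profiled": [True, False]}, index=cells)

# Joining side by side: prefixes keep the two "profiled" columns apart
global_obs = pd.concat(
    [obs_mod1.add_prefix("mod1:"), obs_mod2.add_prefix("mod2:")],
    axis=1,
)
print(list(global_obs.columns))
```

Without the prefixes, the joined table would contain two columns both named "profiled", and it would be unclear which modality each came from.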

Variables#

For variables, on the other hand, the default assumption is that they are unique to their modalities. This allows annotations to be merged across modalities when possible.

[10]:
mod1.var["assay"] = "A"
mod2.var["assay"] = "B"

# Will fetch these values
mdata.update()
[11]:
np.random.seed(10)
mdata.var.sample(5)
[11]:
         assay
mod1:24      A
mod1:65      A
mod2:13      B
mod2:161     B
mod2:88      B
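
The merged "assay" column can be pictured as stacking the per-modality .var tables on top of each other. The following plain-pandas sketch illustrates that idea only. It is not mudata's implementation, and the small tables are hypothetical.

```python
import pandas as pd

# Each modality has its own variables, so the row labels do not overlap
var_mod1 = pd.DataFrame({"assay": ["A", "A"]}, index=["mod1:0", "mod1:1"])
var_mod2 = pd.DataFrame({"assay": ["B", "B"]}, index=["mod2:0", "mod2:1"])

# Stacking rows: the same-named column becomes a single column
global_var = pd.concat([var_mod1, var_mod2], axis=0)
print(global_var["assay"].tolist())
```

Because the rows are disjoint, a column with the same name in both tables never holds two conflicting values for one variable, which is what makes the merge unambiguous.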

See the tutorials for how e.g. muon operates on MuData objects and provides access to modality-specific slots beyond just metadata.