MuData nuances
This is the sharp bits page for mudata, which describes the nuances of working with MuData objects.
First, install and import mudata and the other libraries used on this page.
[1]:
! pip install mudata
[2]:
import mudata as md
from mudata import MuData, AnnData
[3]:
import numpy as np
import pandas as pd
Prepare some simple AnnData objects:
[4]:
n, d1, d2, k = 1000, 100, 200, 10
np.random.seed(1)
z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))
w1 = np.random.normal(size=(d1,k))
w2 = np.random.normal(size=(d2,k))
mod1 = AnnData(X=np.dot(z, w1.T))
mod2 = AnnData(X=np.dot(z, w2.T))
Variable names#
*NB: It is best to keep variable names unique across all modalities. This avoids ambiguity and improves the performance of some functionality, such as updating (see below).*
MuData is designed on the assumption that features (variables) differ between modalities, so their names should be unique across modalities. For this reason, .var_names are checked for uniqueness across modalities.
This behaviour makes all functions easy to reason about. For instance, if the same var_name is present in two modalities, it is not strictly defined what happens when plotting a joint embedding from .obsm coloured by this var_name.
Nevertheless, MuData can accommodate modalities with duplicated .var_names. For typical workflows, we recommend renaming them manually or calling .var_names_make_unique().
[5]:
mdata = MuData({"mod1": mod1, "mod2": mod2})
print(mdata.var_names)
mdata.var_names_make_unique()
print(mdata.var_names)
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
...
'190', '191', '192', '193', '194', '195', '196', '197', '198', '199'],
dtype='object', length=300)
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
'mod1:7', 'mod1:8', 'mod1:9',
...
'mod2:190', 'mod2:191', 'mod2:192', 'mod2:193', 'mod2:194', 'mod2:195',
'mod2:196', 'mod2:197', 'mod2:198', 'mod2:199'],
dtype='object', length=300)
/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:404: UserWarning: Cannot join columns with the same name because var_names are intersecting.
warnings.warn(
/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:852: UserWarning: Modality names will be prepended to var_names since there are identical var_names in different modalities.
warnings.warn(
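Before combining modalities, it can help to see which names collide. The following is a minimal sketch in plain pandas, not part of the mudata API: the helper `shared_var_names` is hypothetical, and the two indices merely stand in for the `.var_names` of `mod1` and `mod2` above.

```python
import pandas as pd


def shared_var_names(var_names_by_mod):
    """Return the names that occur in more than one modality (hypothetical helper)."""
    # De-duplicate within each modality first, so only cross-modality clashes count.
    names = pd.Series(
        [name for v in var_names_by_mod.values() for name in pd.Index(v).unique()]
    )
    counts = names.value_counts()
    return sorted(counts[counts > 1].index)


# Stand-ins for mod1.var_names (100 features) and mod2.var_names (200 features).
mod1_names = pd.Index([str(i) for i in range(100)])
mod2_names = pd.Index([str(i) for i in range(200)])

overlap = shared_var_names({"mod1": mod1_names, "mod2": mod2_names})
print(len(overlap), "names are shared between modalities")
```

An empty result would suggest that no renaming is needed.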
Variable names in AnnData objects#
In the example above, note that .var_names_make_unique() is an in-place operation, just like the method of the same name in anndata. Hence the .var_names of the original AnnData objects have also been modified:
[6]:
mdata["mod1"].var_names[:10]
[6]:
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
'mod1:7', 'mod1:8', 'mod1:9'],
dtype='object')
Update#
*NB: If individual modalities are changed, the MuData object containing them might need to be updated.*
Modalities in MuData objects are full-featured AnnData objects. Hence they can be operated on individually, and their MuData parent then has to be updated to fetch the changes.
Observations#
Consider the following example: a new column has been added to a modality-specific metadata table:
[7]:
mod1.obs["mod1_profiled"] = True
While mdata includes mod1 as its first modality, it does not yet know about this change:
[8]:
mdata.obs.columns
[8]:
Index([], dtype='object')
The .update() method fetches these updates and propagates them to the global .obs table.
[9]:
mdata.update()
print(mdata.obs.columns)
mdata.obs.head(2)
Index(['mod1:mod1_profiled'], dtype='object')
[9]:
| | mod1:mod1_profiled |
|---|---|
| 0 | True |
| 1 | True |
As MuData objects are designed with shared observations by default, this annotation is automatically prefixed with the name of the modality it came from.
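The prefixing idea can be illustrated with plain pandas. This is a conceptual sketch only, not mudata's implementation: the tables in `obs_by_mod` are made up, and the second modality with its `n_counts` column is an assumption added for illustration.

```python
import pandas as pd

# Hypothetical per-modality .obs tables; observations "1" and "2" are shared.
obs_by_mod = {
    "mod1": pd.DataFrame({"mod1_profiled": True}, index=["0", "1", "2"]),
    "mod2": pd.DataFrame({"n_counts": [10, 20]}, index=["1", "2"]),
}

# Prefix each column with its modality name, then align rows on the observation names.
prefixed = [df.add_prefix(f"{name}:") for name, df in obs_by_mod.items()]
global_obs = pd.concat(prefixed, axis=1, join="outer")
print(global_obs)
```

Because rows are aligned on observation names, the prefix keeps track of which modality a column came from, and observations missing from a modality get missing values in that modality's columns.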
Variables#
Variables, on the other hand, are considered unique to their modalities by default. This allows annotations to be merged across modalities when possible.
[10]:
mod1.var["assay"] = "A"
mod2.var["assay"] = "B"
# Will fetch these values
mdata.update()
[11]:
np.random.seed(10)
mdata.var.sample(5)
[11]:
| | assay |
|---|---|
| mod1:24 | A |
| mod1:65 | A |
| mod2:13 | B |
| mod2:161 | B |
| mod2:88 | B |
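Why a single merged column is possible here can be seen in a plain pandas sketch. Again, this only illustrates the idea and is not how mudata is implemented: the tables in `var_by_mod` are made-up stand-ins for the per-modality .var tables.

```python
import pandas as pd

# Hypothetical per-modality .var tables with non-overlapping variable names.
var_by_mod = {
    "mod1": pd.DataFrame({"assay": "A"}, index=["mod1:0", "mod1:1"]),
    "mod2": pd.DataFrame({"assay": "B"}, index=["mod2:0", "mod2:1", "mod2:2"]),
}

# Variables are stacked row-wise, so a column shared by name becomes one column.
global_var = pd.concat(list(var_by_mod.values()), axis=0)
print(global_var)
```

Since no variable belongs to two modalities, each row has exactly one value for `assay` and no prefix is needed to disambiguate it.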
See the tutorials for how e.g. muon operates with MuData objects and enables access to modality-specific slots beyond just metadata.