MuData nuances
This is the sharp bits page for mudata, which provides information on the nuances of working with MuData objects.
First, install and import mudata and other libraries.
[1]:
%pip install mudata
[2]:
import mudata
from mudata import MuData, AnnData
[3]:
import numpy as np
import pandas as pd
Prepare some simple AnnData objects:
[4]:
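# observations, features in mod1, features in mod2, latent factors
n, d1, d2, k = 1000, 100, 200, 10
np.random.seed(1)
# Latent factors shared by all observations
z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))
# Modality-specific loadings
w1 = np.random.normal(size=(d1,k))
w2 = np.random.normal(size=(d2,k))
# Each modality is a linear projection of the same latent factors
mod1 = AnnData(X=np.dot(z, w1.T))
mod2 = AnnData(X=np.dot(z, w2.T))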
Variable names
NB: It is best to keep variable names unique across all the modalities. This helps avoid ambiguity and improves the performance of some functionality, such as updating (see below).
MuData is designed with the assumption that features (variables) differ between modalities. Hence their names should be unique across modalities; in other words, .var_names are checked for uniqueness across modalities.
This behaviour keeps all the functions easy to reason about. For instance, if a var_name is present in both modalities, what happens when plotting a joint embedding from .obsm coloured by this var_name is not well defined.
Nevertheless, MuData can accommodate modalities with duplicated .var_names. For typical workflows, we recommend renaming them manually or calling .var_names_make_unique().
[5]:
mdata = MuData({"mod1": mod1, "mod2": mod2})
print(mdata.var_names)
mdata.var_names_make_unique()
print(mdata.var_names)
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
...
'190', '191', '192', '193', '194', '195', '196', '197', '198', '199'],
dtype='object', length=300)
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
'mod1:7', 'mod1:8', 'mod1:9',
...
'mod2:190', 'mod2:191', 'mod2:192', 'mod2:193', 'mod2:194', 'mod2:195',
'mod2:196', 'mod2:197', 'mod2:198', 'mod2:199'],
dtype='object', length=300)
/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/src/mudata/_core/mudata.py:869: UserWarning: Cannot join columns with the same name because var_names are intersecting.
warnings.warn(
/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/src/mudata/_core/mudata.py:1478: UserWarning: Modality names will be prepended to var_names since there are identical var_names in different modalities.
warnings.warn(
Variable names in AnnData objects
In the example above, it is worth pointing out that .var_names_make_unique() is an in-place operation, just like the method of the same name in anndata. Hence the .var_names of the original AnnData objects have also been modified:
[6]:
mdata["mod1"].var_names[:10]
[6]:
Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',
'mod1:7', 'mod1:8', 'mod1:9'],
dtype='object')
Update
NB: If individual modalities are changed, the MuData object containing them might need to be updated.
Modalities in MuData objects are full-featured AnnData objects. Hence they can be operated on individually, and their MuData parent will then have to be updated to fetch this information.
NB: Starting from v0.3, mudata will be adopting a more flexible approach to metadata management: updating the global index with .update() will become independent of managing columns, which can now be done with .pull_obs()/.pull_var() and .push_obs()/.push_var().
See more about annotation management in the respective tutorial.
Filtering data
In rare cases, some observations (or variables) can be dropped from all the contained modalities:
[7]:
smaller_mdata = MuData({
"mod1": mod1[:900].copy(),
"mod2": mod2[:900].copy(),
})
While smaller_mdata includes mod1 and mod2 as its modalities, it currently does not know about this change:
[8]:
smaller_mdata
[8]:
MuData object with n_obs × n_vars = 900 × 300
  2 modalities
    mod1: 900 x 100
    mod2: 900 x 200
The .update() method will fetch these updates:
[9]:
smaller_mdata.update()
smaller_mdata
[9]:
MuData object with n_obs × n_vars = 900 × 300
  2 modalities
    mod1: 900 x 100
    mod2: 900 x 200
Notice that the global dimensions are now correctly reflected in the MuData object.
Observation annotations
Consider the following example: a new column has been added to a modality-specific metadata table:
[10]:
mod1.obs["mod1_profiled"] = True
While mdata includes mod1 as its first modality, nothing has changed at the global level of the annotation:
[11]:
mdata.obs.columns
[11]:
Index([], dtype='object')
The .update() method will only sync the obs_names:
[12]:
# default from v0.4
mudata.set_options(pull_on_update=False)
[12]:
<mudata._core.config.set_options at 0x3597fec00>
[13]:
mdata.update()
print(mdata.obs.columns)
Index([], dtype='object')
If we need the annotation at the global level, we can copy it from all the underlying modalities:
[14]:
mdata.pull_obs()
print(mdata.obs.columns)
Index(['mod1:mod1_profiled'], dtype='object')
[15]:
del mdata.obs["mod1:mod1_profiled"]
As MuData objects are designed with shared observations by default, this annotation is automatically prefixed with the name of the modality it originates from.
There is, however, flexibility in whether to use prefixes for observation annotations that are specific to individual modalities:
[16]:
mdata.pull_obs(prefix_unique=False)
print(mdata.obs.columns)
Index(['mod1_profiled'], dtype='object')
Variables
For variables, on the other hand, the default assumption is that they are unique to their modalities. This allows annotations to be merged across modalities, when possible.
[17]:
mod1.var["assay"] = "A"
mod2.var["assay"] = "B"
# Will fetch these values
mdata.pull_var()
[18]:
np.random.seed(10)
mdata.var.sample(5)
[18]:
| | assay |
|---|---|
| mod1:24 | A |
| mod1:65 | A |
| mod2:13 | B |
| mod2:161 | B |
| mod2:88 | B |
See the tutorials for how, e.g., muon operates on MuData objects and enables access to modality-specific slots beyond just metadata.