{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MuData nuances"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scverse/mudata/blob/master/docs/source/notebooks/nuances.ipynb)\n",
"\n",
"[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/scverse/mudata/master?labpath=docs%2Fsource%2Fnotebooks%2Fnuances.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is *the sharp bits* page for ``mudata``, which provides information on the nuances when working with ``MuData`` objects."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, install and import `mudata` and other libraries."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"! pip install mudata"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import mudata as md\n",
"from mudata import MuData, AnnData"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Prepare some simple [AnnData](https://anndata.readthedocs.io/) objects:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"n, d1, d2, k = 1000, 100, 200, 10\n",
"\n",
"np.random.seed(1)\n",
"z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))\n",
"w1 = np.random.normal(size=(d1,k))\n",
"w2 = np.random.normal(size=(d2,k))\n",
"\n",
"mod1 = AnnData(X=np.dot(z, w1.T))\n",
"mod2 = AnnData(X=np.dot(z, w2.T))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Variable names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> ***NB:** It is best to keep variable names unique across all the modalities. This will help to avoid ambiguity as well as performance of some functionality such as updating (see below).*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"``MuData`` is designed with features (variables) being different in different modalities in mind. Hence their names should be unique and different between modalities. In other words, ``.var_names`` are checked for uniqueness across modalities.\n",
"\n",
"This behaviour ensures all the functions are easy to reason about. For instance, if there is a ``var_name`` that is present in both modalities, what happens during plotting a joint embedding from ``.obsm`` coloured by this ``var_name`` is not strictly defined."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nevertheless, ``MuData`` can accommodate modalities with duplicated ``.var_names``. For the typical workflows, we recommend renaming them manually or calling ``.var_names_make_unique()``."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',\n",
" ...\n",
" '190', '191', '192', '193', '194', '195', '196', '197', '198', '199'],\n",
" dtype='object', length=300)\n",
"Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',\n",
" 'mod1:7', 'mod1:8', 'mod1:9',\n",
" ...\n",
" 'mod2:190', 'mod2:191', 'mod2:192', 'mod2:193', 'mod2:194', 'mod2:195',\n",
" 'mod2:196', 'mod2:197', 'mod2:198', 'mod2:199'],\n",
" dtype='object', length=300)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:404: UserWarning: Cannot join columns with the same name because var_names are intersecting.\n",
" warnings.warn(\n",
"/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:852: UserWarning: Modality names will be prepended to var_names since there are identical var_names in different modalities.\n",
" warnings.warn(\n"
]
}
],
"source": [
"mdata = MuData({\"mod1\": mod1, \"mod2\": mod2})\n",
"print(mdata.var_names)\n",
"mdata.var_names_make_unique()\n",
"print(mdata.var_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Variable names in AnnData objects"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the example above it is worth pointing out that `.var_names_make_unique()` is an in-place operation, just as [the same method](https://anndata.readthedocs.io/en/stable/anndata.AnnData.var_names_make_unique.html) is in `anndata`.\n",
"\n",
"Hence original AnnData objects' `.var_names` have also been modified:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',\n",
" 'mod1:7', 'mod1:8', 'mod1:9'],\n",
" dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mdata[\"mod1\"].var_names[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Update"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> ***NB:** If individual modalities are changed, updating the MuData object containing it might be required.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Modalities in ``MuData`` objects are full-featured ``AnnData`` objects. Hence they can be operated individually, and their ``MuData`` parent will have to be updated to fetch this information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Observations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider the following example: a new column has been added to a modality-specific metadata table:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"mod1.obs[\"mod1_profiled\"] = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While `mdata` includes `mod1` as its first modality, it currently does not know about this change:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index([], dtype='object')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mdata.obs.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`.update()` method will fetch these updates and propagate them to the global `.obs` table."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['mod1:mod1_profiled'], dtype='object')\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" mod1:mod1_profiled | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" True | \n",
"
\n",
" \n",
" 1 | \n",
" True | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" mod1:mod1_profiled\n",
"0 True\n",
"1 True"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mdata.update()\n",
"print(mdata.obs.columns)\n",
"mdata.obs.head(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As `MuData` objects are designed with shared observations by default, this annotation is automatically prefixed by the modality that originated this annotation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On the other hand, for variables, the default consideration is that they are unique to their modalities. This allows us to merge annotations across modalities, when possible."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"mod1.var[\"assay\"] = \"A\"\n",
"mod2.var[\"assay\"] = \"B\"\n",
"\n",
"# Will fetch these values\n",
"mdata.update()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" assay | \n",
"
\n",
" \n",
" \n",
" \n",
" mod1:24 | \n",
" A | \n",
"
\n",
" \n",
" mod1:65 | \n",
" A | \n",
"
\n",
" \n",
" mod2:13 | \n",
" B | \n",
"
\n",
" \n",
" mod2:161 | \n",
" B | \n",
"
\n",
" \n",
" mod2:88 | \n",
" B | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" assay\n",
"mod1:24 A\n",
"mod1:65 A\n",
"mod2:13 B\n",
"mod2:161 B\n",
"mod2:88 B"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.seed(10)\n",
"mdata.var.sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See how e.g. ``muon`` operates with ``MuData`` objects and enables access to modality-specific slots beyond just metadata [in the tutorials](https://muon-tutorials.readthedocs.io/en/latest/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}