{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MuData nuances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scverse/mudata/blob/master/docs/source/notebooks/nuances.ipynb)\n", "\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/scverse/mudata/master?labpath=docs%2Fsource%2Fnotebooks%2Fnuances.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is *the sharp bits* page for ``mudata``, which provides information on the nuances when working with ``MuData`` objects." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, install and import `mudata` and other libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "! pip install mudata" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import mudata as md\n", "from mudata import MuData, AnnData" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare some simple [AnnData](https://anndata.readthedocs.io/) objects:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "n, d1, d2, k = 1000, 100, 200, 10\n", "\n", "np.random.seed(1)\n", "z = np.random.normal(loc=np.arange(k), scale=np.arange(k)*2, size=(n,k))\n", "w1 = np.random.normal(size=(d1,k))\n", "w2 = np.random.normal(size=(d2,k))\n", "\n", "mod1 = AnnData(X=np.dot(z, w1.T))\n", "mod2 = AnnData(X=np.dot(z, w2.T))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variable names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> ***NB:** It is best to keep variable names unique across all the modalities. This will help to avoid ambiguity as well as performance of some functionality such as updating (see below).*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "``MuData`` is designed with features (variables) being different in different modalities in mind. Hence their names should be unique and different between modalities. In other words, ``.var_names`` are checked for uniqueness across modalities.\n", "\n", "This behaviour ensures all the functions are easy to reason about. For instance, if there is a ``var_name`` that is present in both modalities, what happens during plotting a joint embedding from ``.obsm`` coloured by this ``var_name`` is not strictly defined." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nevertheless, ``MuData`` can accommodate modalities with duplicated ``.var_names``. For the typical workflows, we recommend renaming them manually or calling ``.var_names_make_unique()``." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',\n", " ...\n", " '190', '191', '192', '193', '194', '195', '196', '197', '198', '199'],\n", " dtype='object', length=300)\n", "Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',\n", " 'mod1:7', 'mod1:8', 'mod1:9',\n", " ...\n", " 'mod2:190', 'mod2:191', 'mod2:192', 'mod2:193', 'mod2:194', 'mod2:195',\n", " 'mod2:196', 'mod2:197', 'mod2:198', 'mod2:199'],\n", " dtype='object', length=300)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:404: UserWarning: Cannot join columns with the same name because var_names are intersecting.\n", " warnings.warn(\n", "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:852: UserWarning: Modality names will be prepended to var_names since there are identical var_names in different modalities.\n", " warnings.warn(\n" ] } ], "source": [ "mdata = MuData({\"mod1\": mod1, \"mod2\": mod2})\n", "print(mdata.var_names)\n", "mdata.var_names_make_unique()\n", "print(mdata.var_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variable names in AnnData objects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the example above it is worth pointing out that `.var_names_make_unique()` is an in-place operation, just as [the same method](https://anndata.readthedocs.io/en/stable/anndata.AnnData.var_names_make_unique.html) is in `anndata`.\n", "\n", "Hence original AnnData objects' `.var_names` have also been modified:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['mod1:0', 'mod1:1', 'mod1:2', 'mod1:3', 'mod1:4', 'mod1:5', 'mod1:6',\n", " 'mod1:7', 'mod1:8', 'mod1:9'],\n", " dtype='object')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata[\"mod1\"].var_names[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Update" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> ***NB:** If individual modalities are changed, updating the MuData object containing it might be required.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Modalities in ``MuData`` objects are full-featured ``AnnData`` objects. Hence they can be operated individually, and their ``MuData`` parent will have to be updated to fetch this information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Observations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider the following example: a new column has been added to a modality-specific metadata table:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "mod1.obs[\"mod1_profiled\"] = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While `mdata` includes `mod1` as its first modality, it currently does not know about this change:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index([], dtype='object')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata.obs.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`.update()` method will fetch these updates and propagate them to the global `.obs` table." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['mod1:mod1_profiled'], dtype='object')\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mod1:mod1_profiled
0True
1True
\n", "
" ], "text/plain": [ " mod1:mod1_profiled\n", "0 True\n", "1 True" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata.update()\n", "print(mdata.obs.columns)\n", "mdata.obs.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As `MuData` objects are designed with shared observations by default, this annotation is automatically prefixed by the modality that originated this annotation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the other hand, for variables, the default consideration is that they are unique to their modalities. This allows us to merge annotations across modalities, when possible." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "mod1.var[\"assay\"] = \"A\"\n", "mod2.var[\"assay\"] = \"B\"\n", "\n", "# Will fetch these values\n", "mdata.update()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
assay
mod1:24A
mod1:65A
mod2:13B
mod2:161B
mod2:88B
\n", "
" ], "text/plain": [ " assay\n", "mod1:24 A\n", "mod1:65 A\n", "mod2:13 B\n", "mod2:161 B\n", "mod2:88 B" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.seed(10)\n", "mdata.var.sample(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See how e.g. ``muon`` operates with ``MuData`` objects and enables access to modality-specific slots beyond just metadata [in the tutorials](https://muon-tutorials.readthedocs.io/en/latest/)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }