{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Axes in MuData" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scverse/mudata/blob/master/docs/source/notebooks/axes.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/scverse/mudata/master?labpath=docs%2Fsource%2Fnotebooks%2Faxes.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebooks introduces *axes* interface that supercharges MuData to be used beyond multimodal data storage." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Briefly, the default multimodal storage means that the modalities (AnnData objects) have observations as a shared axis (`axis=0`), and the variables are effectively concatenated.\n", "\n", "We can imagine a symmetrical storage model where the variables are shared and observations are concatenated. This is possible with `axis=1` provided at MuData creation time.\n", "\n", "More than that, in some cases we might want to relax constraints even more and assume that both observations and variables are in fact shared. This allows, for instance, to store subsets of features in the same object. As both axes are shared, a convention is used here, and it is `axis=-1`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Imports" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, install and import `mudata` and other libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "! pip install mudata" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import mudata as md\n", "from mudata import MuData, AnnData" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "np.random.seed(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multimodal: `axis=0`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, this is the default behaviour." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To illustrate it, let's prepare some modalities first:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "n, d1, d2 = 100, 1000, 1500\n", "\n", "ax = AnnData(np.random.normal(size=(n,d1)))\n", "\n", "ay = AnnData(np.random.normal(size=(n,d2)))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/mudata/_core/mudata.py:445: UserWarning: Cannot join columns with the same name because var_names are intersecting.\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
MuData object with n_obs × n_vars = 100 × 2500\n", " 2 modalities\n", " x:\t100 x 1000\n", " y:\t100 x 1500" ], "text/plain": [ "MuData object with n_obs × n_vars = 100 × 2500\n", " 2 modalities\n", " x:\t100 x 1000\n", " y:\t100 x 1500" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# same as:\n", "# mdata = MuData({\"x\": ax, \"y\": ay})\n", "mdata = MuData({\"x\": ax, \"y\": ay}, axis=0)\n", "mdata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As `axis=0` corresponds to shared observations, the features should be specific to their modalities. The variable names, however, are unique, which the warning is displayed about:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ax.var_names: [ 0, 1, 2, 3, 4, ..., 999 ]\n", "ay.var_names: [ 0, 1, 2, 3, 4, ..., 1499 ]\n" ] } ], "source": [ "print(\"ax.var_names: [\", \", \".join(ax.var_names.values[:5]) + \", ..., \", ax.var_names.values[d1-1], \"]\")\n", "print(\"ay.var_names: [\", \", \".join(ay.var_names.values[:5]) + \", ..., \", ay.var_names.values[d2-1], \"]\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In real-world workflows we expect to be able to identify features by their (unique) names:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "ax.var_names = [f\"x_var{i+1}\" for i in range(d1)]\n", "ay.var_names = [f\"y_var{i+1}\" for i in range(d2)]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
MuData object with n_obs × n_vars = 100 × 2500\n", " 2 modalities\n", " x:\t100 x 1000\n", " y:\t100 x 1500" ], "text/plain": [ "MuData object with n_obs × n_vars = 100 × 2500\n", " 2 modalities\n", " x:\t100 x 1000\n", " y:\t100 x 1500" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata = MuData({\"x\": ax, \"y\": ay}, axis=0)\n", "mdata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multidataset: `axis=1`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, AnnData objects can represent e.g. multiple scRNA-seq datasets. When analysing them together, it is convenient to store them in one object. This object can then incorporate annotations such as a joint embedding of the datasets." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "n1, n2, d = 100, 500, 1000\n", "\n", "ad1 = AnnData(np.random.normal(size=(n1,d)))\n", "ad2 = AnnData(np.random.normal(size=(n2,d)))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Cell barcodes are dataset-specific\n", "ad1.obs_names = [f\"dat1-cell{i+1}\" for i in range(n1)]\n", "ad2.obs_names = [f\"dat2-cell{i+1}\" for i in range(n2)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What would happen if we create a MuData without specifying the axis?\n", "\n", "```python\n", "mdata = MuData({\"dat1\": ad1, \"dat2\": ad2})\n", "mdata\n", "```\n", "\n", "
MuData object with n_obs × n_vars = 600 × 1000\n", " 2 modalities\n", " dat1:\t100 x 1000\n", " dat2:\t500 x 1000" ], "text/plain": [ "MuData object with n_obs × n_vars = 600 × 1000\n", " 2 modalities\n", " dat1:\t100 x 1000\n", " dat2:\t500 x 1000" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata = MuData({\"dat1\": ad1, \"dat2\": ad2}, axis=1)\n", "mdata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Different views on one modality: `axis=-1`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In some workflows, like the ones with [scVI](https://scvi-tools.org/), AnnData objects typically contain only selected features, e.g. genes. Raw counts for all of the genes are still valuable to keep, for other analyses.\n", "\n", "MuData handles this scenario using the `axis=-1` convention." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "n, d_raw, d_preproc = 100, 900, 300\n", "\n", "a_raw = AnnData(np.random.normal(size=(n,d_raw)))\n", "a_preproc = a_raw[:,np.sort(np.random.choice(np.arange(d_raw), d_preproc, replace=False))].copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What would happen if we create a MuData with `axis=0`?\n", "\n", "```python\n", "mdata = MuData({\"raw\": a_raw, \"preproc\": a_preproc}, axis=0)\n", "mdata\n", "```\n", "\n", "
MuData object with n_obs × n_vars = 100 × 900\n", " 2 modalities\n", " raw:\t100 x 900\n", " preproc:\t100 x 300" ], "text/plain": [ "MuData object with n_obs × n_vars = 100 × 900\n", " 2 modalities\n", " raw:\t100 x 900\n", " preproc:\t100 x 300" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata = MuData({\"raw\": a_raw, \"preproc\": a_preproc}, axis=-1)\n", "mdata" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }