Input data#
A default way to import MuData
is the following:
from mudata import MuData
There are various ways in which the data can be provided to create a MuData object:
AnnData objects#
MuData object can be constructed from a dictionary of existing AnnData objects:
mdata = MuData({'rna': adata_rna, 'atac': adata_atac})
AnnData objects themselves can be easily constructed from NumPy arrays and/or Pandas DataFrames annotating features (variables) and samples/cells (observations). This makes it a rather general data format to work with any type of high-dimensional data.
from anndata import AnnData
adata = AnnData(X=matrix, obs=metadata_df, var=features_df)
Please see more details on how to operate on AnnData objects in the anndata documentation.
Omics data#
When data fromats specific to genomics are of interest, specialised readers can be found in analysis frameworks such as muon. These functions, including the ones for Cell Ranger count matrices as well as Snap files, are described here.
Remote storage#
MuData objects can be read and cached from remote locations including via HTTP(S) or from S3 buckets. This is achieved via [fsspec](fsspec/filesystem_spec). For example, to read a MuData object from a remote server:
import fsspec
fname = "https://github.com/gtca/h5xx-datasets/raw/main/datasets/minipbcite.h5mu?download="
with fsspec.open(fname) as f:
mdata = mudata.read_h5mu(f)
A caching layer can be added in the following way:
fname_cached = "filecache::" + fname
with fsspec.open(fname_cached, filecache={'cache_storage': '/tmp/'}) as f:
mdata = mudata.read_h5mu(f)
For more fsspec usage examples see [its documentation](https://filesystem-spec.readthedocs.io/).
S3#
MuData objects in the .h5mu
format stored in an S3 bucket can be read with fsspec
as well:
storage_options = {
'endpoint_url': 'localhost:9000',
'key': 'AWS_ACCESS_KEY_ID',
'secret': 'AWS_SECRET_ACCESS_KEY',
}
with fsspec.open('s3://bucket/dataset.h5mu', **storage_options) as f:
mudata.read_h5mu(f)
MuData objects stored in the .zarr
format in an S3 bucket can be read from a mapping:
import s3fs
s3 = s3fs.S3FileSystem(**storage_options)
store = s3.get_mapper('s3://bucket/dataset.zarr')
mdata = mudata.read_zarr(store)