Quickstart#

Consider three ways in which you might arrive at hypotheses about gene sets during an analysis:

1. You obtain a DataFrame with differentially expressed genes and store it at the end of your analysis as a parquet file.
2. You read a paper that reports up- and down-regulated genes associated with a mechanism or treatment. You write down a few notes along the lines of “I read paper X and these were the findings” and store them in a text file (or similar).
3. You run or train an ML model and obtain genes either as a prediction or by linearly decoding the latent space of your deep learning model. You store either the model or the prediction as an artifact.

In all three cases, the “actual analysis result” is a loosely structured artifact that can’t easily be queried or tabulated to guide the design of the next experiment.

Hence, we introduce a metadata registry, AnalysisResult, that links the actual artifacts to the analyses that produced them and can serve as a decision-making vehicle for the team.
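To make the idea of a queryable registry concrete, here is a minimal, library-free sketch. The class `AnalysisResultSketch`, the helper `analyses_reporting_up`, and the toy records are all hypothetical illustrations; they are not the lrex schema or API, which the rest of this quickstart uses directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnalysisResultSketch:
    """Hypothetical stand-in for a registry record; not the lrex schema."""

    name: str
    up_genes: set[str] = field(default_factory=set)
    down_genes: set[str] = field(default_factory=set)
    artifacts: list[str] = field(default_factory=list)  # e.g. file names or artifact uids


def analyses_reporting_up(registry: list[AnalysisResultSketch], gene: str) -> list[str]:
    """Return the names of all records that list `gene` as up-regulated."""
    return [record.name for record in registry if gene in record.up_genes]


# Toy records: one from a differential expression analysis, one from reading notes.
registry = [
    AnalysisResultSketch(
        name="DE analysis, donor A",
        up_genes={"SELL", "FCMR"},
        down_genes={"JUN"},
        artifacts=["degs_up.parquet", "degs_down.parquet"],
    ),
    AnalysisResultSketch(
        name="Notes on paper X",
        up_genes={"SELL"},
        down_genes={"BTG2"},
        artifacts=["notes.txt"],
    ),
]

print(analyses_reporting_up(registry, "SELL"))
```

Because each record carries structured gene sets next to its artifacts, a question such as "which analyses found this gene up-regulated?" becomes a simple filter rather than a search through files.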

In this quickstart, we illustrate how to use AnalysisResult with a simplified example of scenario 1 mentioned above.

!lamin init --storage ./test-lrex --schema bionty,lrex

import lamindb as ln
import bionty as bt
import lrex as lx
import scanpy as sc

ln.settings.transform.stem_uid = "NwPopSnhDS1t"
ln.settings.transform.version = "1"

Run an analysis#

Run a mock analysis:

# track run with parameters
pvals_adj = 0.05
ln.track(params={"pvals_adj": pvals_adj})

# get mock data
adata = ln.core.datasets.anndata_human_immune_cells(populate_registries=True)
adata

# run analysis
sc.tl.rank_genes_groups(
    adata,
    use_raw=False,
    groupby="donor",
    method="wilcoxon",
    groups=["582C"],
    reference="rest",
)
rank_genes_groups_df = sc.get.rank_genes_groups_df(adata, "582C")
rank_genes_groups_df.head()
degs_up = rank_genes_groups_df[
    (rank_genes_groups_df["logfoldchanges"] > 0)
    & (rank_genes_groups_df["pvals_adj"] < pvals_adj)
]
degs_down = rank_genes_groups_df[
    (rank_genes_groups_df["logfoldchanges"] < 0)
    & (rank_genes_groups_df["pvals_adj"] < pvals_adj)
]
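The selection rule above can be checked in isolation. The following sketch applies the same two conditions (sign of the log fold change, adjusted p-value strictly below the threshold) to a few made-up rows in plain Python; the function `split_degs` and the gene names are hypothetical and only mirror the column names used in the DataFrame.

```python
# Toy rows mimicking the columns used above; the values are made up.
rows = [
    {"names": "GENE_A", "logfoldchanges": 2.1, "pvals_adj": 0.001},
    {"names": "GENE_B", "logfoldchanges": -1.4, "pvals_adj": 0.020},
    {"names": "GENE_C", "logfoldchanges": 0.8, "pvals_adj": 0.300},  # not significant
    {"names": "GENE_D", "logfoldchanges": -0.2, "pvals_adj": 0.049},
]


def split_degs(rows, pvals_adj_threshold=0.05):
    """Split rows into (up, down) lists of gene names using the rule above."""
    significant = [r for r in rows if r["pvals_adj"] < pvals_adj_threshold]
    up = [r["names"] for r in significant if r["logfoldchanges"] > 0]
    down = [r["names"] for r in significant if r["logfoldchanges"] < 0]
    return up, down


up, down = split_degs(rows)
print(up, down)
```

Note that both comparisons are strict, so a gene with a log fold change of exactly 0, or an adjusted p-value exactly equal to the threshold, ends up in neither set.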

Store analysis results#

Detailed results (the full DEG tables, saved as artifacts):

result_up = ln.Artifact.from_df(degs_up, description="DEGs up").save()
result_down = ln.Artifact.from_df(degs_down, description="DEGs down").save()

Abstracted results (the DEG names mapped to Gene records via their Ensembl gene IDs):

genes_up = bt.Gene.from_values(degs_up["names"].values, bt.Gene.ensembl_gene_id, organism="human")
genes_down = bt.Gene.from_values(degs_down["names"].values, bt.Gene.ensembl_gene_id, organism="human")

Create the AnalysisResult record:

analysis = lx.AnalysisResult(name="My analysis").save()
analysis.up_genes.set(genes_up)
analysis.down_genes.set(genes_down)
analysis.artifacts.set([result_up, result_down])

Queries on `AnalysisResult`#

analysis.up_genes.df().head()

	uid	symbol	stable_id	ensembl_gene_id	ncbi_gene_ids	biotype	description	synonyms	organism_id	public_source_id	created_at	updated_at	created_by_id
id
874	5rVZ6jQgHzOA	SMAP2	None	ENSG00000084070	64744	protein_coding	small ArfGAP2	SMAP1L	1	9	2024-05-07 20:50:42.912682+00:00	2024-05-07 20:50:42.912696+00:00	1
973	3QdwQJwY0B0b	RPS8	None	ENSG00000142937	6202	protein_coding	ribosomal protein S8	S8	1	9	2024-05-07 20:50:42.932163+00:00	2024-05-07 20:50:42.932177+00:00	1
2087	PL64XVlf6Aco	RPS27	None	ENSG00000177954	6232	protein_coding	ribosomal protein S27	S27|MPS-1|MPS1	1	9	2024-05-07 20:50:43.164002+00:00	2024-05-07 20:50:43.164017+00:00	1
2438	7MB2cNP2oD4y	SELL	None	ENSG00000188404	6402	protein_coding	selectin L	LAM1|HLHRC|PLNHR|LEU-8|LSEL|LAM-1|LYAM-1|LYAM1...	1	9	2024-05-07 20:50:43.386625+00:00	2024-05-07 20:50:43.386640+00:00	1
2848	7lqYcBIs0hYC	FCMR	None	ENSG00000162894	9214	protein_coding	Fc mu receptor	FAIM3|TOSO|FCMUR	1	9	2024-05-07 20:50:43.471022+00:00	2024-05-07 20:50:43.471036+00:00	1

analysis.down_genes.df().head()

	uid	symbol	stable_id	ensembl_gene_id	ncbi_gene_ids	biotype	description	synonyms	organism_id	public_source_id	created_at	updated_at	created_by_id
id
1180	5Kzn6IKvI1ES	JUN	None	ENSG00000177606	3725	protein_coding	Jun proto-oncogene, AP-1 transcription factor ...	C-JUN|AP-1	1	9	2024-05-07 20:50:42.976136+00:00	2024-05-07 20:50:42.976150+00:00	1
2057	5wZu55GSw4d4	S100A6	None	ENSG00000197956	6277	protein_coding	S100 calcium binding protein A6	CABP|2A9|CACY|PRA	1	9	2024-05-07 20:50:43.158945+00:00	2024-05-07 20:50:43.158959+00:00	1
2059	3ck8bNMmVpIh	S100A4	None	ENSG00000196154	6275	protein_coding	S100 calcium binding protein A4	18A2|P9KA|PEL98|MTS1|FSP1|CAPL|42A	1	9	2024-05-07 20:50:43.159282+00:00	2024-05-07 20:50:43.159297+00:00	1
2771	cM5Scoeej5xN	BTG2	None	ENSG00000159388	7832	protein_coding	BTG anti-proliferation factor 2	MGC126064|MGC126063|PC3|APRO1|TIS21	1	9	2024-05-07 20:50:43.454949+00:00	2024-05-07 20:50:43.454963+00:00	1
3769	2NWfGttVcAlz	YPEL5	None	ENSG00000119801	51646	protein_coding	yippee like 5	CGI-127	1	9	2024-05-07 20:50:43.665980+00:00	2024-05-07 20:50:43.665995+00:00	1

analysis.artifacts.df()

	version	uid	storage_id	key	suffix	accessor	description	size	hash	hash_type	n_objects	n_observations	transform_id	run_id	visibility	key_is_virtual	created_at	updated_at	created_by_id
id
1	None	5VQWjoJAUeO7obqZP87u	1	None	.parquet	DataFrame	DEGs up	6310	Erme0TktObLQCmKSNSVMAw	md5	None	None	1	1	1	True	2024-05-07 20:51:04.731997+00:00	2024-05-07 20:51:04.732026+00:00	1
2	None	E1UuvYA9h4ZQt1qZs8IP	1	None	.parquet	DataFrame	DEGs down	7273	9_HwJdp8bwCqNzsx9YZ05w	md5	None	None	1	1	1	True	2024-05-07 20:51:04.739717+00:00	2024-05-07 20:51:04.739741+00:00	1

The actual analysis (a notebook, a pipeline, a script, a UI interaction):

analysis.transform

The run of the analysis including parameters:

analysis.run

Data lineage:

analysis.transform.view_parents()

Make a new version of the analysis#

Say we re-run an analysis and want to record the result as a new version of the existing one. Here’s how we can do this:

analysis_v2 = lx.AnalysisResult(is_new_version_of=analysis).save()

analysis_v2.versions.df()

	version	uid	name	description	transform_id	run_id	created_at	updated_at	created_by_id
id
1	1	xvyQYNh3	My analysis	None	1	1	2024-05-07 20:51:11.525839+00:00	2024-05-07 20:51:11.747865+00:00	1
2	2	xvyQYNh3jbaS	My analysis	None	1	1	2024-05-07 20:51:11.759980+00:00	2024-05-07 20:51:11.760008+00:00	1
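In the table above, the second record keeps the name of the first and carries the next version number. As a rough mental model only, this can be sketched with a plain dataclass; `VersionedRecord` and `new_version_of` are hypothetical names and say nothing about how lrex or lamindb implement versioning internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VersionedRecord:
    """Hypothetical record mirroring only the name and version columns above."""

    name: str
    version: str


def new_version_of(previous: VersionedRecord) -> VersionedRecord:
    """Copy the previous record and increment its integer version string."""
    return replace(previous, version=str(int(previous.version) + 1))


v1 = VersionedRecord(name="My analysis", version="1")
v2 = new_version_of(v1)
print(v2)
```

The previous record is left unchanged, so both versions remain available side by side, as in the `versions` table.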
