Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible for Gene_maps values to use the same scale instead of min-max per gene? #72

Open
NicolaasVanRenne opened this issue Jul 15, 2023 · 3 comments

Comments

@NicolaasVanRenne
Copy link

If I understand correctly; the gene maps are minimum and maximum expression for each gene.

However, genes that are hardly expressed in your biological sample will look like they have a strong expression pattern which is overblown (imho).

Would it be possible to have an output where the gene_maps of all genes have the same scale? In this way, genes that are hardly expressed would also show dramatic expression. For example in my tissue, CD1E expression is really low, but it appears enormous in XFuse (with a distribution that doesnt make much sense, its more noise that gets amplified...)

image

Maybe I did sth wrong and this should not be? I expect in this case that the entire tissue would be blackish colour...

@ludvb
Copy link
Owner

ludvb commented Jul 24, 2023

Yes, that's correct. There is currently no option to give the gene maps a common scale. The reason for this is that they are produced one gene at a time, so we don't know the max until all genes have been processed. What you could do possibly is to save the underlying data that the maps are based on by setting writer = "tensor" in the gene_maps section of the config toml file. Then you could create the figures manually. Keep in mind, however, that the tensor file gets very large, so you probably don't want to do this for all genes in the dataset and instead limit the gene set using the gene_regex option.

As an aside, I agree that this gene map looks quite weird. Usually, microstructure (e.g. nuclei) are typically visually distinct. What is the resolution of the H&E after scaling? Usually, I aim for something like 1500 - 2000 px in height/width.

@NicolaasVanRenne
Copy link
Author

I tried what you said, and the genes I selected are now in the output folder as a .pt format.

  1. How do I turn these .pt files into tiff gene expression maps?

  2. Are these gene maps then trained on the entire gene expression data, or only on the genes that I put in the gene_regex?

  3. Can you just elaborate a bit on what writer ="tensor" does (and maybe where in the code it is located - would like to understand)

my toml file was something like this;

[xfuse]
network_depth = 6
network_width = 16
min_counts = 50

[expansion_strategy]
type = "DropAndSplit"
[expansion_strategy.DropAndSplit]
max_metagenes = 50

[optimization]
batch_size = 3
epochs = 100000
learning_rate = 0.0003
patch_size = 768

[analyses]
[analyses.metagenes]
type = "metagenes"
[analyses.metagenes.options]
method = "pca"

[analyses.gene_maps]
type = "gene_maps"
[analyses.gene_maps.options]
gene_regex = "^(GENE1|GENE2|GENE3| (and so on until GENE 34))$"
writer = "tensor"


[slides]
[slides.section1]
data = "sections/sample1/data.h5"
[slides.section1.covariates]
section = 1

[slides.section2]
data = "sections/sample2/data.h5"
[slides.section2.covariates]
section = 2

[slides.section3]
data = "sections/sample3/data.h5"
[slides.section3.covariates]
section = 3

[slides.section4]
data = "sections/sample4/data.h5"
[slides.section4.covariates]
section = 4

@ludvb
Copy link
Owner

ludvb commented Sep 15, 2023

Awesome!

How do I turn these .pt files into tiff gene expression maps?

You will have to do this manually. To give you an idea, something like the below script should work (it's untested so it will likely need some slight modifications to work). You need to run it in the same directory as the *.pt files. It first computes the max expression over all maps and then normalizes the gene maps based on that. Note that it could be that the max will be too high and all maps will then be quite dark. In that case, it could make more sense to clip the maps on the 99th percentile or similar instead of the max.

import os
from glob import glob

import matplotlib.cm as cm
import tifffile

def load_gene_map(path):
    gene_map = torch.load(path, map_location="cpu")
    gene_map = gene_map.mean(0)
    return gene_map

max_expression_all = 0
gene_map_files = glob("*.pt")
for path in gene_map_files:
    gene_map = load_gene_map(path)
    max_expression = gene_map.max()
    if max_expression > max_expression_all:
        max_expression_all = max_expression

for path in gene_map_files:
    gene_name,_ = os.path.splitext(os.path.basename(path))
    gene_map = load_gene_map(path)
    gene_map = gene_map / max_expression_all
    gene_map = gene_map.numpy()
    gene_map = cm.inferno(gene_map)
    tifffile.imsave(f"{gene_name}.tif", gene_map)

Are these gene maps then trained on the entire gene expression data, or only on the genes that I put in the gene_regex?

Good question! As long as the model is trained on the entire gene expression data, the latent state and therefore maps will also be informed by other genes.

Can you just elaborate a bit on what writer ="tensor" does (and maybe where in the code it is located - would like to understand)

It sets the gene map writer to the "tensor" writer, which saves the sampled gene maps as pytorch tensors (as opposed to image files saved by the "image" writer). The tensor writer is defined here:

def _save_tensor(gene, samples, tissue_mask):
if tissue_mask is not None:
samples[:, ~tissue_mask] = 0.0
torch.save(samples, f"{gene}.pt")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants