Visualization scheme for gene maps #41

angadps · 2022-03-30T00:19:50Z

Hello,

This is mostly a question to better understand the current visualizations and if and how they could be configured. Hope this hasn't been already discussed.

I've run xfuse on a couple of dozen samples so far and I'm trying to investigate about a dozen genes in each of the samples. Based on the raw Visium gene expression data some of the plots aren't making sense to me. For instance, one of the genes is expressed in < 10 spots in the Visium data, still I see bright spots throughout the tissue in the gene maps analysis. Maybe having a better understanding of the visuals might help.

Is the data being normalized by default when visualizing the "xfused" gene expression? If so, the relative scaling can cause some of the regions to "light up", even though the absolute predicted expression values might be low. This would not require a scale/legend as there isn't one right now. However, if the visualization is that of the absolute values, having a scale would be useful.
Is the visualization happening on the the original scale, or on a log-scale? Some of the genes are both highly expressed throughout the tissue and have large variation in the expression levels. Visualizing using a log scale can help catch the high expression values throughout along with capturing the differences.

Alternatively, are there configurable options for the above which I can use to display my plots?

Thanks,
--Angad.

ludvb · 2022-03-30T09:37:19Z

Hi Angad,

The gene maps use original scale data. By default, the data for each gene is normalized to the range of the colormap. There is no pixel-wise normalization. I agree that a scale would be very useful.

The data that is shown is the estimated posterior mean and not the observed expression. In case of low-expressed genes, the posterior mean will likely be much smaller than 1 and should not be taken to imply actual expression. It is also likely that the prediction has a high degree of uncertainty, as there may be insufficient data to reliably estimate expression levels.

Here are all the config options for the gene maps analysis with brief comments:

[analyses.analysis-gene_maps.options]
gene_regex = ".*"     # Regex specifying which genes to predict
num_samples = 10      # Number of Monte Carlo samples to draw from the inferred posterior
genes_per_batch = 10  # Number of genes per batch (use a lower value if you get OOM errors)
predict_mean = true   # Whether to draw samples of the posterior mean or the emitted negative binomial count values
normalize = false     # Whether to use pixel-wise normalization
mask_tissue = true    # Whether to mask out regions outside the tissue
scale = 1.0           # A rescaling factor in the range (0, 1). A lower value speeds up the analysis and results in smaller output files (at the cost of lower resolution).
writer = "image"      # Output format (can be either "image" or "tensor")

I think a good idea may be to set the writer argument to "tensor". This will save the expression maps as torch tensors, which can be loaded in Python using torch.load. The tensors will have the shape (num_samples, height, width) and contain the unnormalized samples. It would then be possible to normalize the data in the way most suitable for your analysis and to get a better sense of predictive dispersion/uncertainty.

Let me know if you have any questions!

angadps · 2022-03-31T00:29:20Z

Hi,

Thanks for the detailed explanation. That does explain a lot of my observations. In fact I tried writing out the (num_samples, height, width) maps. As a sanity check, I first read in these new tensors and try plotting them without doing anything else. Incidentally, I notice that the values that I read in have "inverted" masks. For instance, an individual samples from a tensor look something like this:

array([[NZ, NZ, NZ,... NZ, NZ, NZ, ... NZ, NZ, NZ],
[NZ, 0, 0,... 0, 0, 0, ... 0, 0, NZ],
[NZ, 0, 0,... 0, 0, 0, ... 0, 0, NZ],
[NZ, 0, 0,... 0, 0, 0, ... 0, 0, NZ],
[NZ, 0, 0,... 0, 0, 0, ... 0, 0, NZ],
[NZ, NZ, NZ,... NZ, NZ, NZ, ... NZ, NZ, NZ]], dtype=float32)

Here, NZ is any non-zero floating point value. My tissue sample is placed in the center of the slide. How should I interpret the zero and non-zero values here? Should I not expect to see the zero-values for masked out regions and non-zero values for regions containing tissue? Replacing the writer to "image" shows me the posterior means as expected (without inversion). I might be making an incorrect assumption here regarding the tensor object?

Thanks for the help!

ludvb · 2022-03-31T11:51:55Z

That does indeed look like the mask has been inverted! I think this may be a bug. Can you try to modify this line:

xfuse/xfuse/analyze/gene_maps.py

Line 159 in f6fbab3

samples[:, tissue_mask] = 0.0

by adding a tilde before tissue_mask:

samples[:, ~tissue_mask] = 0.0

and see if that fixes it?

angadps · 2022-04-04T21:14:46Z

Yes, that fixes it. Thank you!
The tensor allows me to normalize and combine expression across multiple genes as well.
Sample masking though does not always work but that's for another thread.

ludvb · 2022-04-06T13:53:34Z

Feel free to PR a fix for this if you like!

angadps · 2022-04-12T20:02:59Z

Thanks, I just sent you a PR. I've tested the changes over your latest master commit and it looks good.

I also want to update you that I'm able to use the tensor objects and create visualizations suited better for my needs.

Essentially I've copied you visualization code under analyze/gene_maps.py and added a "pseudo coverage mean normalization". Ideally I would calculate the the total of posterior means across all genes and divide the posterior mean for each individual gene by that number. But obviously it is not feasible to generate the tensors for all genes. Instead, I'm using the spatial gene expression data (as proxy) to do the same. That along with a log scale helps visualize a wide range of expression values across many samples and genes.

Thanks for your help!

ludvb · 2022-04-13T12:04:14Z

Thanks, much appreciated! :) Glad to hear that you have found a good way to do the gene maps analysis. Could be an interesting workflow to implement as an option!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Visualization scheme for gene maps #41

Visualization scheme for gene maps #41

angadps commented Mar 30, 2022

ludvb commented Mar 30, 2022

angadps commented Mar 31, 2022

ludvb commented Mar 31, 2022

angadps commented Apr 4, 2022

ludvb commented Apr 6, 2022

angadps commented Apr 12, 2022

ludvb commented Apr 13, 2022

Visualization scheme for gene maps #41

Visualization scheme for gene maps #41

Comments

angadps commented Mar 30, 2022

ludvb commented Mar 30, 2022

angadps commented Mar 31, 2022

ludvb commented Mar 31, 2022

angadps commented Apr 4, 2022

ludvb commented Apr 6, 2022

angadps commented Apr 12, 2022

ludvb commented Apr 13, 2022