-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Visualization scheme for gene maps #41
Comments
Hi Angad, The gene maps use original scale data. By default, the data for each gene is normalized to the range of the colormap. There is no pixel-wise normalization. I agree that a scale would be very useful. The data that is shown is the estimated posterior mean and not the observed expression. In case of low-expressed genes, the posterior mean will likely be much smaller than 1 and should not be taken to imply actual expression. It is also likely that the prediction has a high degree of uncertainty, as there may be insufficient data to reliably estimate expression levels. Here are all the config options for the gene maps analysis with brief comments: [analyses.analysis-gene_maps.options]
gene_regex = ".*" # Regex specifying which genes to predict
num_samples = 10 # Number of Monte Carlo samples to draw from the inferred posterior
genes_per_batch = 10 # Number of genes per batch (use a lower value if you get OOM errors)
predict_mean = true # Whether to draw samples of the posterior mean or the emitted negative binomial count values
normalize = false # Whether to use pixel-wise normalization
mask_tissue = true # Whether to mask out regions outside the tissue
scale = 1.0 # A rescaling factor in the range (0, 1). A lower value speeds up the analysis and results in smaller output files (at the cost of lower resolution).
writer = "image" # Output format (can be either "image" or "tensor") I think a good idea may be to set the Let me know if you have any questions! |
Hi, Thanks for the detailed explanation. That does explain a lot of my observations. In fact I tried writing out the (num_samples, height, width) maps. As a sanity check, I first read in these new tensors and try plotting them without doing anything else. Incidentally, I notice that the values that I read in have "inverted" masks. For instance, an individual samples from a tensor look something like this: array([[NZ, NZ, NZ,... NZ, NZ, NZ, ... NZ, NZ, NZ], Here, NZ is any non-zero floating point value. My tissue sample is placed in the center of the slide. How should I interpret the zero and non-zero values here? Should I not expect to see the zero-values for masked out regions and non-zero values for regions containing tissue? Replacing the writer to "image" shows me the posterior means as expected (without inversion). I might be making an incorrect assumption here regarding the tensor object? Thanks for the help! |
That does indeed look like the mask has been inverted! I think this may be a bug. Can you try to modify this line: xfuse/xfuse/analyze/gene_maps.py Line 159 in f6fbab3
by adding a tilde before tissue_mask :
samples[:, ~tissue_mask] = 0.0 and see if that fixes it? |
Yes, that fixes it. Thank you! |
Feel free to PR a fix for this if you like! |
Thanks, I just sent you a PR. I've tested the changes over your latest master commit and it looks good. I also want to update you that I'm able to use the tensor objects and create visualizations suited better for my needs. Essentially I've copied you visualization code under analyze/gene_maps.py and added a "pseudo coverage mean normalization". Ideally I would calculate the the total of posterior means across all genes and divide the posterior mean for each individual gene by that number. But obviously it is not feasible to generate the tensors for all genes. Instead, I'm using the spatial gene expression data (as proxy) to do the same. That along with a log scale helps visualize a wide range of expression values across many samples and genes. Thanks for your help! |
Thanks, much appreciated! :) Glad to hear that you have found a good way to do the gene maps analysis. Could be an interesting workflow to implement as an option! |
Hello,
This is mostly a question to better understand the current visualizations and if and how they could be configured. Hope this hasn't been already discussed.
I've run xfuse on a couple of dozen samples so far and I'm trying to investigate about a dozen genes in each of the samples. Based on the raw Visium gene expression data some of the plots aren't making sense to me. For instance, one of the genes is expressed in < 10 spots in the Visium data, still I see bright spots throughout the tissue in the gene maps analysis. Maybe having a better understanding of the visuals might help.
Is the data being normalized by default when visualizing the "xfused" gene expression? If so, the relative scaling can cause some of the regions to "light up", even though the absolute predicted expression values might be low. This would not require a scale/legend as there isn't one right now. However, if the visualization is that of the absolute values, having a scale would be useful.
Is the visualization happening on the the original scale, or on a log-scale? Some of the genes are both highly expressed throughout the tissue and have large variation in the expression levels. Visualizing using a log scale can help catch the high expression values throughout along with capturing the differences.
Alternatively, are there configurable options for the above which I can use to display my plots?
Thanks,
--Angad.
The text was updated successfully, but these errors were encountered: