Unfinished OptICA step #16

Open
MathiasJoensson opened this issue Feb 21, 2024 · 7 comments

Comments

@MathiasJoensson

I am having an issue, both on a local MacBook and on a virtual Linux machine on Azure, where the OptICA step does not finish. It seems to hang indefinitely. For instance, I ran this on a dataset of 164 samples:
bash ./run_ica.sh -n 16 -o ../data/interim/ -v ../data/processed_data/log_tpm_norm.csv

Here is the output, where it hangs:

Computing dimension 160 of 164

##################################

Setting up...
0.25 seconds elapsed

Running ICA...
Completed run 1 of 7 on Processor 0
2.10 minutes elapsed
Completed run 2 of 7 on Processor 0
2.08 minutes elapsed
Completed run 3 of 7 on Processor 0
1.85 minutes elapsed
Completed run 4 of 7 on Processor 0
1.58 minutes elapsed
Completed run 5 of 7 on Processor 0
2.07 minutes elapsed
Completed run 6 of 7 on Processor 0
52.93 seconds elapsed
Completed run 7 of 7 on Processor 0
1.60 minutes elapsed

All ICA runs complete!
12.33 minutes elapsed

So in this case I get the A and M files for dimension 150, but not for 160. I get the same issue with the following command, where dimension 152 does not complete:
bash ./run_ica.sh -n 16 -m 152 -s 2 -o ../data/interim/ -v ../data/processed_data/log_tpm_norm.csv

Thanks for any help!
/Mathias

Details of machine:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

Linux avm-sdt-nilmat-ica 5.15.0-1054-azure #62~20.04.1-Ubuntu SMP Wed Jan 17 12:22:56 UTC 2024 x86_64 GNU/Linux

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GH

Memory:
       total    used    free  shared  buff/cache  available
Mem:  128756    3437  122362       6        2956     124229

@HegedusB

I have a somewhat similar problem. When I started the optICA step with a matrix of 13617 rows and 214 columns, I observed a continuous slowdown during the ICA calculation. In addition, the CPU utilization of the compute node is extremely high: a load average of 110 on a 64-core node. I cannot figure out the cause of this problem. I would guess that MPI is causing it, but that is just a guess. I would like to know what the solution is.
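To put the reported load in context, the load average can be divided by the number of cores. The following is a minimal Python sketch of that arithmetic, not part of the pipeline; the function name `load_per_core` is hypothetical, and the only inputs taken from this thread are the figures 110 and 64.

```python
import os


def load_per_core(load_average, n_cores):
    """Return the load average divided by the number of cores."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return load_average / n_cores


# Figures quoted in the comment above: load average 110 on 64 cores.
reported = load_per_core(110, 64)
print(f"reported load per core: {reported:.2f}")  # prints 1.72

# The same check on the current machine (os.getloadavg is Unix-only
# and can raise OSError if the value is unavailable).
if hasattr(os, "getloadavg"):
    try:
        one_minute, _, _ = os.getloadavg()
        current = load_per_core(one_minute, os.cpu_count() or 1)
        print(f"current load per core: {current:.2f}")
    except OSError:
        print("load average not available on this system")
```

A value well above 1 generally means more tasks are competing for CPU time than there are cores. The number alone does not show whether MPI or something else is responsible.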

Command used:
bash ./run_ica.sh -i 50 -n 3 -o ../../../iModulon_Round1/data/interim/ -l ../../../iModulon_Round1/data/interim/ica.log ../../../iModulon_Round1/data/raw_data/log_cpm_norm.csv

All ICA runs complete!
2.77 minutes elapsed

All ICA runs complete!
3.11 minutes elapsed

All ICA runs complete!
10.79 minutes elapsed

All ICA runs complete!
14.82 minutes elapsed

All ICA runs complete!
22.52 minutes elapsed

All ICA runs complete!
47.39 minutes elapsed

All ICA runs complete!
54.11 minutes elapsed

All ICA runs complete!
1.33 hours elapsed

All ICA runs complete!
2.13 hours elapsed

All ICA runs complete!
2.91 hours elapsed

All ICA runs complete!
3.73 hours elapsed

All ICA runs complete!
5.74 hours elapsed

All ICA runs complete!
5.82 hours elapsed

All ICA runs complete!
6.70 hours elapsed

All ICA runs complete!
7.36 hours elapsed

All ICA runs complete!
7.47 hours elapsed
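
The slowdown in the log above is easier to see once every total is expressed in the same unit. The following is a minimal Python sketch, assuming only the two-line pattern visible in the output ("All ICA runs complete!" followed by an elapsed time); the function name and the inline sample are illustrative, not part of the pipeline.

```python
import re

# Matches lines such as "2.77 minutes elapsed" or "1.33 hours elapsed".
ELAPSED = re.compile(r"([\d.]+) (seconds|minutes|hours) elapsed")
TO_MINUTES = {"seconds": 1 / 60, "minutes": 1.0, "hours": 60.0}


def total_times_in_minutes(log_text):
    """Return, in minutes, the elapsed time reported after each
    'All ICA runs complete!' line."""
    times = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != "All ICA runs complete!":
            continue
        # The elapsed time is expected on the next non-empty line.
        for following in lines[i + 1:]:
            if not following.strip():
                continue
            match = ELAPSED.search(following)
            if match:
                value, unit = match.groups()
                times.append(float(value) * TO_MINUTES[unit])
            break
    return times


# A few entries copied from the log above.
sample = """
All ICA runs complete!
2.77 minutes elapsed

All ICA runs complete!
1.33 hours elapsed

All ICA runs complete!
7.47 hours elapsed
"""

for minutes in total_times_in_minutes(sample):
    print(f"{minutes:.1f} min")
```

With the full log as input, the resulting list shows how the time per dimension grows from a few minutes to several hours.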

@HegedusB

Unfortunately, I am experiencing the same problem. The program simply gets stuck in the last ICA iteration step. Do you have an idea for a solution?

@MathiasJoensson
Author

Hey HegedusB, no, I don't have any idea, unfortunately. I experimented with lowering the max_iter option, and that seemed to work.

@HegedusB

Hi MathiasJoensson. Thanks for the tip! Does run_ica.sh have a max_iter option at all? What is its default value?

@MathiasJoensson
Author

Sorry, it's this option
-i|--iter <n_iter> Number of random restarts (default: 100)
I lowered it to 50 and it worked. I am now increasing the number of iterations to see where it stops working.
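
When searching for the value at which the run stops finishing, a time limit avoids waiting indefinitely on a hung run. The following is a minimal Python sketch using the standard `subprocess` module; the helper name `finishes_within` is hypothetical, and the commented-out `run_ica.sh` invocation simply combines the command and the `-i 50` setting quoted in this thread, as an assumption about how they would be used together.

```python
import subprocess
import sys


def finishes_within(command, timeout_seconds):
    """Run `command`; return True if it exits before the timeout and
    False if it was still running and had to be killed."""
    try:
        subprocess.run(command, timeout=timeout_seconds, check=False)
    except subprocess.TimeoutExpired:
        return False
    return True


# Stand-in commands so the sketch runs anywhere: one exits at once,
# the other sleeps longer than the time limit.
quick = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(10)"]

print(finishes_within(quick, timeout_seconds=30))  # True
print(finishes_within(slow, timeout_seconds=1))    # False

# Assumed usage with the command from this issue (not executed here):
# finishes_within(
#     ["bash", "./run_ica.sh", "-i", "50", "-n", "16",
#      "-o", "../data/interim/",
#      "-v", "../data/processed_data/log_tpm_norm.csv"],
#     timeout_seconds=24 * 3600,
# )
```

Note that `subprocess.run` kills only the process it started directly; any worker processes launched by the script may need to be cleaned up separately.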

@HegedusB

Yes, I did that too. I reduced the n_iter from 100 to 50. After a week of running, it got stuck at the last ICA iteration. What was your final result? An M and an A table that summarizes the M and A tables of the ICA iterations?

@avsastry
Owner

Hi @HegedusB and @MathiasJoensson, unfortunately I'm no longer working with this code. However, it is currently being maintained at https://github.com/SBRG/modulome-workflow

I have talked with the current maintainer and this seems to be a known issue within the group. I'm not sure if there is a current solution, but please try posting the issue there.

Best of luck!

avsastry reopened this Feb 29, 2024