Unexpected nan values in TDVPSchmitt with specific n_samples values #1959
-
I'm experiencing an issue with TDVPSchmitt time evolution in NetKet where the algorithm produces nan values for certain values of n_samples. Specifically, when n_samples is one of certain powers of two (e.g., 128, 256, 512, 1024, and others), the observables computed during the time evolution become nan.

Code to Reproduce:

import numpy as np
import netket as nk
import netket.experimental as nkx
import copy
# System parameters
L = 2
hi = nk.hilbert.Spin(0.5, L)
# Hamiltonian setup
h = 1.0
J = 1.0
h_eff = h + J
H1 = nk.operator.LocalOperator(hi, dtype=np.complex128)
for i in range(L):
    H1 -= h_eff * nk.operator.spin.sigmaz(hi, i)
# Sample sizes (powers of two)
n_samples = [2**i for i in range(7, 15)]
for samples in n_samples:
    print(f"=========== samples: {samples} ===========")
    model = nk.models.LogStateVector(hi, param_dtype=complex)
    sa = nk.sampler.ExactSampler(hi)
    vs1 = nk.vqs.MCState(
        model=model,
        sampler=sa,
        n_samples=samples,
        seed=214748364,
    )
    vs2 = copy.deepcopy(vs1)
    # Observables
    obs = {
        "sum_sx": sum(nk.operator.spin.sigmax(hi, i) for i in range(L)),
        "sum_sy": sum(nk.operator.spin.sigmay(hi, i) for i in range(L)),
        "sum_sz": sum(nk.operator.spin.sigmaz(hi, i) for i in range(L)),
    }
    # Time evolution parameters
    dt = 0.001
    integrator = nkx.dynamics.Euler(dt=dt)
    qgt = nk.optimizer.qgt.QGTJacobianDense(holomorphic=True)
    total_time = 0.5
    # TDVP Schmitt time evolution
    te1 = nkx.driver.TDVPSchmitt(
        operator=H1,
        variational_state=vs1,
        ode_solver=integrator,
        holomorphic=True,
    )
    te1.run(T=total_time, obs=obs)
    print("sz", te1.state.expect(obs["sum_sz"]))
    print("sx", te1.state.expect(obs["sum_sx"]))
    # Standard TDVP time evolution for comparison
    te2 = nkx.driver.TDVP(
        operator=H1,
        variational_state=vs2,
        ode_solver=integrator,
        qgt=qgt,
    )
    te2.run(T=total_time, obs=obs)

Observed Behavior:

For several n_samples that are powers of two, the observables from TDVPSchmitt suddenly become nan:
What is very strange to me, however, is that if we instead use n_samples incremented by 1 (
The standard
To be clear, this does not happen for all
I think the issue is related to the Hamiltonian in this case. We do not get this behaviour with a regular
Context/env:
Replies: 2 comments 10 replies
-
Oh, wow! Thank you for the clear reproducible example! I tried to modify it so that it stops as soon as we get a nan, by inserting

import jax.numpy as jnp

def stopmecb(step, logdata, driver):
    # Keep running until the driver has computed an update
    if driver._dw is None:
        return True
    # Flatten the update; returning False stops the run once it contains a NaN
    dw, _ = nk.jax.tree_ravel(driver._dw)
    return not bool(jnp.any(jnp.isnan(dw)))

...
te1.run(T=total_time, obs=obs, callback=stopmecb)

which will stop when the update is NaN. Then we can check the eigenvalues...

e, s = jnp.linalg.eigh(te1._S.to_dense())
print(e)

and I see that there is reliably a numerical zero (1e-17). I then ran the algorithm by hand with the variational state and samples at that point, and I saw that rho gets a zero

>>> rho
Array([ 0.00000000e+00+0.j, -5.55111512e-17+0.j, -1.54679608e+00+0.j,
        3.21964677e-15+0.j], dtype=complex128)

so snr becomes nan

>>> snr
Array([           nan, 1.36241825e-13, 1.59092599e+03, 3.78584085e-12], dtype=float64)

So I guess it might be that we have to sanitise
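For illustration only, here is a rough NumPy sketch of how a nan like the one above can arise and how it could be guarded against. It assumes the signal-to-noise ratio has the form |rho| * sqrt(N) / sigma and that the nan comes from a 0/0 in that ratio; this is not NetKet's actual expression. The function names, the `eps` threshold and the `sigma` values are made up for the example; only the `rho` values are taken from the output above.

```python
import numpy as np


def snr_naive(rho, sigma, n_samples):
    # Hypothetical SNR of the form |rho| * sqrt(N) / sigma (an assumption,
    # not NetKet's exact formula). A 0/0 component evaluates to nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.abs(rho) * np.sqrt(n_samples) / sigma


def snr_guarded(rho, sigma, n_samples, eps=1e-14):
    # Components whose noise estimate is numerically zero get SNR = 0
    # instead of nan, so a later regularisation step can discard them.
    valid = sigma > eps
    safe_sigma = np.where(valid, sigma, 1.0)  # dummy denominator, masked below
    snr = np.abs(rho) * np.sqrt(n_samples) / safe_sigma
    return np.where(valid, snr, 0.0)


# rho as printed above; sigma is made up purely for this example
rho = np.array([0.0, -5.55111512e-17, -1.54679608, 3.21964677e-15], dtype=complex)
sigma = np.array([0.0, 1e-3, 1e-2, 1e-2])

naive = snr_naive(rho, sigma, n_samples=128)
guarded = snr_guarded(rho, sigma, n_samples=128)
print(naive)
print(guarded)
```

The guard replaces the denominator before dividing rather than patching the result afterwards, which is the pattern one would also want under `jax.numpy` so that no nan is produced in the first place. Whether zero is the right value for such components depends on how the SNR is used downstream.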
-
Marking as answered since the reason behind the unexpected behaviour was found and explained. @PhilipVinc let me know if you want me to open an issue or PR attempt :)
@Daniel-Haas-B, in the case I was checking above, the E_loc is not nan: