Low battery voltage if wifi disconnected. #220

Open
Linusten opened this issue Jun 19, 2023 · 22 comments

Comments

@Linusten

Describe the bug
If the WiFi connection is lost, the controller reports to my Victron that the battery voltage is low.

image

Hardware/Software Versions
Controller version (from PCB): 4.5
Host name: DIYBMS-009DAA84
Processor: ESP32
Version: 5b7135f8127c6fd9d5d18525f7d5de72a32b4232
Compiled: 2023-04-17T08:23:56.977Z
Language: en
SDK Version: v4.4.4
Min free Heap: 59820
Free heap: 109184
Heap size: 293796

To Reproduce
Steps to reproduce the behavior:
Break the wifi connection while the controller is running.

@stuartpittaway
Owner

Hello, this is a strange one!

I can't recreate this issue on my test rig. As you would expect, the WIFI code has nothing to do with the communication or alarm monitoring over the CANBUS to Victron.

How are you powering the controller? Is this directly from the battery?

How do you test for lost wifi - do you simply switch the router off?

Can you capture the text/log output of the USB serial port on the ESP32?

@JochenSchmidt

Does MQTT still run over Wifi to IOBroker or something?
Which Cerbo and Multiplus firmware is on it?
What is set in the Cerbo, which voltage sensor is used? Have you set everything to visible in the Cerbo under Settings - System setup - Battery measurement?
Then you could check in the VRM which sensor is triggering the alarm. In the VRM you can also see the origin of the error (e.g. VE.Bus System [276]). 276 would be the MP, 512 comes from the diyBMS, and if you have a Victron shunt, that would be 279.

@Linusten
Author

Thanks @JochenSchmidt for the nice hints :)
In VRM I can see that the error is being raised by [276]

VE.Bus System [276] | Automatic monitoring | Low battery: Alarm

Which is very strange because the battery was never below 60%...

@JochenSchmidt

@Linusten
Ok, then the cause of the alarm comes from the VE.Bus system (MP). One of the MPs thinks the voltage is gone for a moment.
I've had this before, very rarely, and I have no idea why it happens, but not because I disconnected the WiFi connection between the diyBMS and the repeater. It just happens.
Until a few days ago I had version V501 (3x Quattro II) on the Quattros and version 3.00~18 (beta version) on the Cerbo.
Now V505 on the Quattros and 3.00 on the Cerbo.
I'll test it with WiFi off. How did you do that? Simply pull the plug on the router/repeater, or how?

@JochenSchmidt

@Linusten
Please have a look at Issue #225
I ran some tests: after switching the WLAN off, nothing went wrong.
After switching the WLAN back on, the system switched off completely and the diyBMS had a system error. I had to start it manually by pressing the left button.

@Linusten
Author

Linusten commented Jul 7, 2023

I am testing the newest commit and will update you if the error occurs.

-> https://github.com/stuartpittaway/diyBMSv4ESP32/actions/runs/5476905297

@virtuvas

virtuvas commented Jul 18, 2023

@Linusten Ok, then the cause of the alarm comes from the VE.Bus system (MP). One of the MPs thinks the Voltage is gone for a moment. I've had this before - very rarely and I have no idea why this happens - but not because I disconnect the WiFi connection between the diyBMS and the repeater. Simply that way. Until a few days ago I had version V501 (3x Quattro II) on the Quattros and version 3.00~18 (beta version) on the Cerbo. Now V505 on the Quattros and 3.00 on the Cerbo. I'll test it with WiFi off. How did you do that? Simply pull the plug on the router/repeater, or how?

+1 on @JochenSchmidt comment!

system: multiplus II (3000/24, firmware v5.02) connected via MK2 to rpi 3B+ running Venus large pre 3.00 betas and diyBMS on canbus to rpi. Also a Victron MPPT and 600W solar connected to the rpi.
Things were almost OK; I would get the occasional low-battery alarm and have to reboot the lot.

A few months ago, upgraded diyBMS to latest and rpi to VenusOS 3.00 release Large
The system would crash very often, leaving me with a setup that needed a full shutdown/reboot and a second one on the rpi (all were powered together via one dropper). Sorry, it didn't occur to me to reboot the ESP32 via the left button, so I haven't tried that.

A couple of days ago, I upgraded the multiplus II from 5.02 to 5.05 firmware. Since then: no locking up of the diyBMS, nothing going offline, and no settings getting messed up with "no BMS found"!
I only managed once to induce a low battery alarm (again from the multiplus), but it didn't affect the rest and only lasted 3-4 seconds before the connection was re-established and everything kept on working. At that moment it was drawing 17A from the generator, charging the bank at 20% SoC and at the same time providing 2 kW to the watermaker.
Unplugging the boat router and leaving it offline for 5-10 minutes didn't affect the system either; everything kept on working fine, with no complaints from the multiplus.

So if you face such issues, I'd highly recommend getting the multiplus (and any other Victron devices) firmware updated PDQ! (I vaguely remember some months back someone on victron site mentioning that you MUST upgrade multi firmware if you go to full release 3.0 venusOS, and I guess they were right...)

cheers

V.

PS. not using MQTT so not experiencing anything like #225

@bertvaneyken
Contributor

Hello all, today was the 2nd time I experienced something similar.

The internet connection failed today while I was in a Teams call, (routing issue at the provider this time)
6 minutes later we lost power to the house (router/modem/wifi on small UPS) and I saw on the Victron GX these errors:

  • Internal Failure
  • Low Battery Voltage

Power restored and failed multiple times when I was observing my setup, until internet was restored and stopped the cycling.
I did not change a thing on my setup.

A few weeks ago they cut a coax cable in our town, and in hindsight the same thing happened. Then I reset the controller to solve the cycling.
I now also think the CAN bus alarm I had previously is linked to loss of WiFi and not to a bug in the CAN bus code.

Since I run the latest Victron firmware the GX now also shows the internal failure notification.
The battery isn't low, the GX just loses communication.
I had also enabled a rule to power a relay on an Internal BMS Error so I could react before the GX error: the display shows red with "Modules or RS-485 error".

I have 14 modules, some rules defined, an SD card (60s logging) and MQTT enabled.
Maybe MQTT is the culprit, I didn't have time yet to study the code.

Cheers,
Bert

Screenshot 2023-09-07 111606

@stuartpittaway
Owner

Hi @bertvaneyken thanks for taking the time to report the issue.

We've seen this problem on a few installations now, some of it appears to be bugs in the Victron software, but I also agree that DIYBMS MQTT interface appears to add to the problem.

The DIYBMS reported to Victron the "internal failure" - this typically only happens when the modules stop responding to the controller. During a power cut, or when the power is going on/off/on/off very quickly, I've seen the symptoms of "power spikes" affecting the DC battery and the modules.

Perhaps this could also be seen in your system?

@bertvaneyken
Contributor

bertvaneyken commented Jan 8, 2024

Hi Stuart, I'm still struggling with this.

Looking for a bright idea here after enjoying myself with debugging...

  • installed a new access point closer by, RSSI is now -54 dBm
  • disabled MQTT entirely, it makes no statistical difference
  • upgraded to 2023-12-27 which had the effect of no Internal Failures anymore but now it reboots more than hourly instead of daily.
  • unplugged the grid (completely offgrid), it makes no statistical difference
  • I can't correlate power usage peaks with reboots
  • I can't read voltage spikes with a multimeter when connected directly to the inverter input
  • tried 2 times to catch serial debug output, never got lucky. I cannot debug for more than 3 hrs as I can't use mains to charge my laptop. (I use the INA229 add-on board.) It looks like a cold reboot will keep it stable for longer.
  • replaced the 5V PSU with a new one (I use 2 PSU's one from 48v to 12v and one from the 12v rail to 5v)
  • the 12v environmental system (fan - small heating) is unplugged
  • the running LED on the cell boards looks like it only stops after the controller reboots

To rule out a hardware issue I ordered a new ESP32 (with external antenna) and some missing chips to build a second controller on a spare v4.2 board.

As far as I can see in the code it makes no sense the controller reboots after reporting Low battery over CAN.
Maybe the Victron concludes this after losing CAN connection?

I just completely disabled the Current & Voltage monitoring as a last attempt.

I hope it is something obvious when I replace the controller :)

2024-01-08 20_52_33-VRM Portal - Victron Energy - Mobiele Zonnepanelen - VRM Portal

2024-01-08 21_45_31-VRM Portal - Victron Energy - Mobiele Zonnepanelen - VRM Portal

@stuartpittaway
Owner

To rule out a hardware issue I ordered a new ESP32 (with external antenna) and some missing chips to build a second controller on a spare v4.2 board.

As far as I can see in the code it makes no sense the controller reboots after reporting Low battery over CAN.

The DIYBMS controller should NEVER reboot unintentionally.

Can I ask you to provide the initial serial debug output when the ESP32 is powered up? I'm wondering if the ESP32 is a particular hardware revision which is causing problems. If you have ordered another one it would be a good test.

DIYBMS is reliable - this is a screenshot from my home system, with uptime over 65 days (since I manually rebooted it) and during that time, I've had zero communication issues and over 30 million CANBUS messages.

image

@bertvaneyken
Contributor

No doubt it should be stable and reliable :-)

I have never seen CAN errors either.
2024_01_09_12_37_49

I'm building new modules as well with parts I have laying around (4.40) so I can swap out everything.
A hardware issue is the most probable cause IMO.

I did notice the modules do throw some errors (I use the standard baudrate):
2024-01-09 12_23_14-DIY BMS CONTROLLER v4

Logs while running are here, I'll post boot logs tonight. (MQTT was under maintenance in the first part)
diybms20240105.log

@stuartpittaway
Owner

Two observations from the logs...

You are getting SD card errors. Might be worth removing it and re-formatting it on a PC.

[127003][E][vfs_api.cpp:332] VFSFileImpl(): fopen(/sd/data_20240105.csv) failed
I (133555) diybms: Cell monitor log file
I (133647) diybms: Task 2
[127232][E][vfs_api.cpp:332] VFSFileImpl(): fopen(/sd/modbus90_20240105.csv) failed

The available memory is dropping over time - this might be related to the SD card problem, this would ultimately force the controller to reboot if the memory gets too low.

D (102980) diybms: total_free_byte=98428 total_allocated_byte=192376 largest_free_blk=59380 min_free_byte=86152 alloc_blk=545 free_blk=12 total_blk=557
I (7855562) diybms: Time now: Fri Jan  5 21:56:24 2024
D (7855562) diybms: total_free_byte=95572 total_allocated_byte=194640 largest_free_blk=49140 min_free_byte=68884 alloc_blk=582 free_blk=24 total_blk=606
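
To check whether free memory really trends downward over a long log, the heap lines above can be extracted and compared. The following is a minimal Python sketch, assuming the log format matches the two `total_free_byte=...` lines quoted here; the function names are hypothetical helpers for illustration and are not part of the DIYBMS code.

```python
import re

# Matches heap statistics lines such as:
# "D (102980) diybms: total_free_byte=98428 ... largest_free_blk=59380 min_free_byte=86152 ..."
HEAP_LINE = re.compile(
    r"^[A-Z] \((?P<ms>\d+)\) diybms: total_free_byte=(?P<free>\d+)"
    r".*?largest_free_blk=(?P<largest>\d+)"
    r".*?min_free_byte=(?P<min_free>\d+)"
)


def parse_heap_samples(log_text):
    """Return (uptime_ms, total_free, largest_free_blk, min_free) for each heap line."""
    samples = []
    for line in log_text.splitlines():
        match = HEAP_LINE.match(line.strip())
        if match:
            samples.append((
                int(match["ms"]),
                int(match["free"]),
                int(match["largest"]),
                int(match["min_free"]),
            ))
    return samples


def free_heap_change(samples):
    """Change in total_free_byte from first to last sample (negative means shrinking)."""
    if len(samples) < 2:
        return 0
    return samples[-1][1] - samples[0][1]


LOG = """\
D (102980) diybms: total_free_byte=98428 total_allocated_byte=192376 largest_free_blk=59380 min_free_byte=86152 alloc_blk=545 free_blk=12 total_blk=557
I (7855562) diybms: Time now: Fri Jan  5 21:56:24 2024
D (7855562) diybms: total_free_byte=95572 total_allocated_byte=194640 largest_free_blk=49140 min_free_byte=68884 alloc_blk=582 free_blk=24 total_blk=606
"""

samples = parse_heap_samples(LOG)
print("samples:", len(samples))
print("change in total_free_byte:", free_heap_change(samples))
```

Two samples are too few to prove a leak; running the same parser over the full log file would show whether `min_free_byte` and `largest_free_blk` keep falling.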

@bertvaneyken
Contributor

I've just removed the SD-card from the controller and I'll leave it running without now.

Serial output logs of the initialization are here:
putty.log

(the mqtt password was missing while I ran the dump, so that is why it now fails)

@bertvaneyken
Contributor

You were right, it seems that the worn-out SD card is the culprit of the crashes.

@stuartpittaway
Owner

Wow, I always had my suspicion but never any proof the SD card would cause the problem.

@bertvaneyken
Contributor

I'm not 100% sure the SD card alone is at fault, but it is way more stable without it.

I could finally catch a spontaneous reboot via the serial output.
It looks like a null pointer exception?

I (356427567) diybms-mqtt: MQTT counters: Err_Con=0,Err_Trans=1,Conn=1,Disc=1
I (356427568) diybms: Time now: Thu Feb 1 20:52:39 2024
D (356427568) diybms: total_free_byte=119900 total_allocated_byte=170408 largest_free_blk=77812 min_free_byte=94132 alloc_blk=576 free_blk=16 total_blk=592
D (356427668) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356427687) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356428681) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356428681) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
E (356428909) esp-tls: [sock=51] select() timeout
E (356428911) TRANSPORT_BASE: Failed to open a new connection: 32774
E (356428911) MQTT_CLIENT: Error transport connect
E (356428914) diybms-mqtt: ERROR_TYPE_TCP (Success)
I (356428918) diybms-mqtt: MQTT_EVENT_DISCONNECTED
D (356429671) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356429671) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
I (356429688) diybms: WIFI_EVENT_STA_DISCONNECTED
I (356429790) diybms-mqtt: Stopping MQTT client
W (356429961) diybms-mqtt: MQTT enabled, but not connected
W (356429961) diybms-mqtt: MQTT enabled, but not connected
W (356429962) diybms-mqtt: MQTT enabled, but not connected
D (356430662) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356430663) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
I (356431354) diybms: Task 2
D (356431668) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356431669) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356432671) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356432671) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356433670) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356433672) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356434663) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356434664) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
W (356434966) diybms-mqtt: MQTT enabled, but not connected
D (356435662) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356435662) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356436672) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356436673) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356436855) diybms: Task 3, s=0 e=13
D (356437664) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356437665) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356438662) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356438662) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356439663) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356439663) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
W (356439971) diybms-mqtt: MQTT enabled, but not connected
D (356440662) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356440662) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356441670) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356441670) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356442665) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356442682) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
D (356443664) diybms: CANBUS received message ID: 305, DLC: 8, flags: 0
D (356443664) diybms: CANBUS received message ID: 307, DLC: 8, flags: 0
I (356443935) diybms: WIFI connect quick retry 1
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x401b579e PS : 0x00060c30 A0 : 0x801b5883 A1 : 0x3ffd8590
A2 : 0x3ffb6328 A3 : 0xffffffff A4 : 0x00000000 A5 : 0xffffffff
A6 : 0x00000000 A7 : 0x3ffe2d9c A8 : 0x3ffda640 A9 : 0x3ffd8500
A10 : 0x00000000 A11 : 0x00000001 A12 : 0x3ffe3b48 A13 : 0x3ffe3b48
A14 : 0x3ffe2d6c A15 : 0x3ffe2da6 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4008c0e1 LEND : 0x4008c0f1 LCOUNT : 0xfffffffe

Backtrace: 0x401b579b:0x3ffd8590 0x401b5880:0x3ffd85e0

ELF file SHA256: 31418cd666101b8d

Rebooting...
ets Jun 8 2016 00:22:57

full log:
putty20240203.zip

@stuartpittaway
Owner

Ok, I've had another user report similar problems.

@atanisoft
Contributor

I could finally catch a spontaneous reboot via the serial output.
It looks like a null pointer exception?

Definitely a null dereference, I'm suspecting something with the MQTT client or the http server is causing the crash within the event handler but it is not clear which is at fault.
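
The reasoning can be read off the register dump: EXCCAUSE 0x1c is 28, the Xtensa LoadProhibited cause named in the panic message, and EXCVADDR (the address being accessed) is 0x00000000. A minimal Python sketch of that check follows; the helper names and the 0x1000 "low address" threshold are assumptions made for illustration.

```python
import re

# Xtensa exception causes relevant here: 28 = LoadProhibited, 29 = StoreProhibited.
EXCCAUSE_NAMES = {28: "LoadProhibited", 29: "StoreProhibited"}


def parse_registers(dump_text):
    """Collect 'NAME : 0xVALUE' pairs from an ESP32 register dump into a dict."""
    pairs = re.findall(r"([A-Z][A-Z0-9]*)\s*:\s*0x([0-9a-fA-F]{8})", dump_text)
    return {name: int(value, 16) for name, value in pairs}


def looks_like_null_dereference(registers, low_page=0x1000):
    """True if the fault was a prohibited load/store at an address near zero.

    low_page is an arbitrary illustrative threshold, so that member accesses
    through a null pointer (small offsets from 0) are also caught.
    """
    cause = registers.get("EXCCAUSE")
    address = registers.get("EXCVADDR")
    return cause in EXCCAUSE_NAMES and address is not None and address < low_page


DUMP = """\
PC : 0x401b579e PS : 0x00060c30 A0 : 0x801b5883 A1 : 0x3ffd8590
A2 : 0x3ffb6328 A3 : 0xffffffff A4 : 0x00000000 A5 : 0xffffffff
A14 : 0x3ffe2d6c A15 : 0x3ffe2da6 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4008c0e1 LEND : 0x4008c0f1 LCOUNT : 0xfffffffe
"""

registers = parse_registers(DUMP)
print(EXCCAUSE_NAMES.get(registers["EXCCAUSE"]), hex(registers["EXCVADDR"]))
print("null dereference likely:", looks_like_null_dereference(registers))
```

This only confirms the kind of fault; it does not say which handler passed the null pointer.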

@stuartpittaway
Owner

It looks like a null pointer exception?

Thanks for this. You appear to have MQTT enabled, but it's not connected to the MQTT server ("MQTT enabled, but not connected"). Does the crash still occur with MQTT disabled?

@stuartpittaway
Owner

stuartpittaway commented Feb 6, 2024

@bertvaneyken
I've started another debug log from my environment - #276

Could you try to reproduce the same test?

I managed to get a core panic - @atanisoft does this still look like a null dereference issue?

I (1759137) diybms: WIFI connect quick retry 1
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x401b5f3e  PS      : 0x00060a30  A0      : 0x801b6023  A1      : 0x3ffd8f00
A2      : 0x3ffb62d4  A3      : 0xffffffff  A4      : 0x00000000  A5      : 0xffffffff  
A6      : 0x00000000  A7      : 0x3ffe3458  A8      : 0x3ffdae70  A9      : 0x3ffd8e70
A10     : 0x00000000  A11     : 0x00000001  A12     : 0x3ffe2928  A13     : 0x3ffe2928  
A14     : 0x3ffe3428  A15     : 0x3ffe3462  SAR     : 0x00000004  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x4008c0e1  LEND    : 0x4008c0f1  LCOUNT  : 0xfffffffe  


Backtrace: 0x401b5f3b:0x3ffd8f00 0x401b6020:0x3ffd8f50

  #0  0x401b5f3b:0x3ffd8f00 in handler_execute at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:139
      (inlined by) esp_event_loop_run at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:590
  #1  0x401b6020:0x3ffd8f50 in esp_event_loop_run_task at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_event/esp_event.c:115 (discriminator 15)  
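
A raw `Backtrace:` line can be turned into source locations like the ones above with the toolchain's `xtensa-esp32-elf-addr2line`, provided the ELF file from exactly the same build is at hand. The Python sketch below only extracts the addresses and assembles the command line; `firmware.elf` is a placeholder path and the helper names are hypothetical.

```python
import re


def backtrace_pcs(backtrace_line):
    """Extract the program-counter half of each PC:SP pair in a 'Backtrace:' line."""
    return re.findall(r"(0x[0-9a-fA-F]{8}):0x[0-9a-fA-F]{8}", backtrace_line)


def addr2line_command(backtrace_line, elf_path="firmware.elf"):
    """Build the addr2line argument list; elf_path is a placeholder for the matching ELF."""
    return [
        "xtensa-esp32-elf-addr2line",
        "-pfiaC",  # pretty-print, function names, inlined frames, addresses, demangle
        "-e",
        elf_path,
        *backtrace_pcs(backtrace_line),
    ]


LINE = "Backtrace: 0x401b5f3b:0x3ffd8f00 0x401b6020:0x3ffd8f50"
command = addr2line_command(LINE)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` on a machine that has the ESP32 toolchain installed. Addresses from a different build will decode to the wrong lines, which is why the "ELF file SHA256" line in the panic output is worth comparing first.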

@bertvaneyken
Contributor

Hi Stuart, I disabled MQTT and it hasn't rebooted since; however, it could run for days or weeks in the past, so that is no real proof.
I also checked my other MQTT sending devices and they appear to have kept sending data during the time the BMS restarted.

Tonight I re-enabled MQTT and did the following tests:

  • stopped the MQTT service on my Azure server for 15 minutes
  • unplugged the access point servicing the BMS for 45 minutes
  • disabled the WiFi radio on the AP for 15 minutes

None of this provoked an issue... so I'm not sure what the direct cause would be.
