Expanding nisprog to CAN? #20
Looks like, again, things would be easier on Linux, thanks to socketcan and this recently merged ISO-TP support: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e057dd3fc20ffb3d7f150af46542a51b59b90127 socketcan already supports a lot of different CAN hardware natively, but there is nothing even remotely similar on win*. Interesting that the ELM327 would be discontinued. That's another nail in the coffin as far as I'm concerned; you would be entirely on your own to work on and test that. For fun, go look at what was required just for iso9141/14230: https://github.com/fenugrec/freediag/blob/master/scantool/diag_l0_elm.c I'm vaguely familiar with Macchina; if it does provide a J2534 API, then one doesn't need to know Rust at all? (assuming they produce a DLL or some kind of C bindings) When I was working on nisprog more intensively, what I had in mind for future development was for part of freediag to become a J2534 driver (i.e. use a cheap USB-K line cable or what have you, and provide a J2534 API), and to rewrite nisprog to use J2534 as a backend, provided either by this new freediag-j2534 layer or by any other commercial J2534 hardware and its driver. If, as I understand it, reflashing a CAN ECU (whether Nissan or Subaru) is quite a different process from K-line, then there may be no advantage to reusing nisprog? i.e. on Nissan, CAN doesn't need a kernel, and RomRaider apparently can now do it (with J2534 hardware). Plus, it's not like the CLI interface is a very popular feature :)
After a brief glance, it looks similar. Here is the key part of the RR code from J2534NCS (for Nissan) which, as you say, relies on a J2534 cable. There appears to be no reliance on a kernel.
The Subaru ROM I'm looking at has a CAN method that appears to be very similar to the K-Line approach; it's just that data goes in/out via CAN mailboxes (MBs). And a kernel is still required. I stumbled across WSL. Maybe that is a way to get a SocketCAN-based solution working on Windows? https://www.reddit.com/r/CarHacking/comments/ot3gjf/socketcancanutils_on_windows/ and https://github.com/microsoft/WSL/issues/5533
My knowledge in this space is pretty thin, but is it actually possible to use a USB-K Line cable to do CAN comms? I had assumed the physical differences (e.g. voltage levels) would make it 'impossible'.
Hm, interesting. You're certain it's not just copying some of its built-in funcs to RAM and then jumping to them?
It sounded like it was incomplete (no USB-CAN support) and/or very experimental? I wouldn't know, I'm not on win* anymore. Either that or a regular VM and a minimal Linux distro might work... that means a lot of downloading for the initial install, but most people probably wouldn't care.
No sorry, I was just talking about what I would have used for regular K-line ECUs. Until recently I never really considered adding CAN reflash to nisprog. But there is some J2534 hardware that does both K and CAN (same API). "Like the one I never finished designing"... sigh. I guess my points are:
None of this is insurmountable, but there are no easy solutions. And for some of them, like I said, you would be on your own: ELM I am not touching; anything on WSL / native win* I am unable to test.
After some digging and thinking, I think I will have a go using this fully open source option: https://fischl.de/usbtin/. It's nice that it's fully open source... USBtin support might also be a future development for RR which would then allow Nissan CAN flashing without an (expensive) J2534 cable.
Yes, it still requires a kernel. There is none of the encryption / decryption, but it retains the basic process: load a kernel into RAM, check the checksum, and then jump into the kernel. Step 1 will be to write Java code for USBtin to establish CAN comms with the ECU and enter the CAN bootloader. Step 2 will be to rejig the npkern RX/TX for CAN comms. Pretty sure I'll be recycling lots of code in Steps 1 and 2!
I admire your bravery - not much of a java guy myself.
The main challenge IMO will be implementing a minimal subset of iso15765 / ISO-TP to be able to transfer multi-frame packets, or at least, hopefully, some other existing simple protocol; else you'd be inventing a custom protocol with sequencing + reconstruction etc. (gross). cmd_parser.c is fairly hardcoded for serial comms, but some of it could be split out: probably move all the command handlers and message / checksum builders to common code, and have entirely separate "comms parsers" for SCI and CAN. There are just too many differences to handle in one generic tree.
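For reference, the sender side of a minimal ISO-TP (ISO 15765-2) subset fits in a few dozen lines of C. This sketch assumes classic 8-byte CAN frames and payloads up to 4095 bytes; `can_frame8` and `isotp_segment()` are hypothetical names, and it deliberately skips the receiver's Flow Control frame (block size, STmin), which a real implementation would have to wait for before sending the Consecutive Frames.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ISOTP_MAX_LEN 4095u  /* 12-bit length field in the First Frame */

typedef struct {
    uint8_t data[8];
    uint8_t dlc;
} can_frame8;  /* hypothetical raw classic-CAN frame */

/* Split `len` bytes of `msg` into ISO-TP frames written to `out`.
 * Returns the number of frames produced, or 0 if len is 0, too large,
 * or does not fit in `max_frames`. Flow Control is NOT handled. */
size_t isotp_segment(const uint8_t *msg, size_t len,
                     can_frame8 *out, size_t max_frames)
{
    if (len == 0 || len > ISOTP_MAX_LEN || max_frames == 0)
        return 0;

    if (len <= 7) {
        /* Single Frame: PCI high nibble 0x0, low nibble = length */
        out[0].data[0] = (uint8_t)len;
        memcpy(&out[0].data[1], msg, len);
        out[0].dlc = (uint8_t)(len + 1);
        return 1;
    }

    /* First Frame carries 6 payload bytes, each Consecutive Frame 7 */
    size_t rest = len - 6;
    size_t needed = 1 + (rest + 6) / 7;  /* 1 + ceil(rest / 7) */
    if (needed > max_frames)
        return 0;

    /* First Frame: PCI 0x1, then 12-bit total length */
    out[0].data[0] = (uint8_t)(0x10 | ((len >> 8) & 0x0F));
    out[0].data[1] = (uint8_t)(len & 0xFF);
    memcpy(&out[0].data[2], msg, 6);
    out[0].dlc = 8;

    size_t pos = 6;
    uint8_t sn = 1;  /* sequence number starts at 1, wraps 15 -> 0 */
    for (size_t i = 1; i < needed; i++) {
        size_t chunk = (len - pos > 7) ? 7 : len - pos;
        out[i].data[0] = (uint8_t)(0x20 | sn);  /* Consecutive Frame PCI */
        memcpy(&out[i].data[1], msg + pos, chunk);
        out[i].dlc = (uint8_t)(chunk + 1);
        pos += chunk;
        sn = (uint8_t)((sn + 1) & 0x0F);
    }
    return needed;
}
```

The receiver side is the mirror image: read the PCI nibble, check that each Consecutive Frame's sequence number is the expected one, and abort on a gap.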
Sadly, neither am I. There are some open-source C# and Python libraries by other people listed on the USBtin webpage. But I thought I'd give Java a go, seeing as it might help with any future integration with RR. I'm open to other approaches... bravery and stupidity are often two sides of the same coin!
It looks like the SH705x does most of the low-level protocol stuff itself. The ECU code simply copies data to the relevant CAN mailbox and sets a register to TX the message (and the converse to RX a message). But, yes, a multi-frame outer loop of some kind will be required.
Talk to dschultz before writing any Java... he should be able to direct you to the right areas of RR to look at, if he hasn't already secretly started working on USBtin support...
Well, of course it gives you mailboxes, hardware filters etc., but you still need to manage frame ordering, bus-off, no-ACK and other error conditions... the HCAN peripheral is fairly complex, but all you're going to get from it is still just raw 8-byte frames.
A little while in the making, but fully open-source CAN access to Subaru ECUs should now be possible using the open-source USBtin. See https://github.com/rimwall/USBtinSubProg/tree/USBtinSubProg_test and https://github.com/rimwall/npkern/tree/ssm_can_test I have successfully done test dumps and writes to my bench ECU (Subaru 7058). Looking at ROM dumps from 2005 through to 2019, this should work for most (all?) Subaru CAN ECUs. The kernel is derived from npkern (full credit to you). CAN comms are much more limited: I have cut down the scope of the comms just enough to stumble across the line with minimal dump/flash capability. Almost all the changes are in cmd_parser.c. A couple of things I wonder:
Good work. I've only skimmed the code so far, I'll have more questions later, but for a start
The way the HCAN periph works is that you assign a CAN ID to a mailbox. Having done that, all messages with that ID are sent to that mailbox. During initialization, the ROM sets up only two IDs - CAN ID 0x21 on MB1_0 for TX/RX and CAN ID 0xFFFFE on MB0_0 for RX only. The ROM disables all others, although the other MBs wouldn't get involved unless they somehow got assigned the same CAN IDs.
Not yet, that's next on the list, I'm just waiting for another OBD plug and pigtail to arrive. Although I don't think it will be a problem. The entry command uses CAN ID 0xFFFFE which will override every other CAN ID. And once in the bootloader, all normal CAN traffic stops. I can see this on my bench ECU because it pumps out many CAN messages prior to bootloader entry, but nil thereafter.
Hmmm, good question, I will try it and see. The Java error handling is currently rudimentary; this whole 'Exception' thing is new to me. One problem is that the 'catch' block can't seem to access the instance of the USBtin struct to shut down the connections. And if I try to make the instance of the USBtin struct global (like the other globals), the Java compiler complains. So I have to physically disconnect / reconnect at the USB port whenever there is an Exception. My hope is this will all get improved when integrated into RR.
Tested it in a 2016 Subaru by simply poking some wires into the OBD connector. It suffered from losing some packets, perhaps because the wiring to the ECU is now longer or because the wires provide a poor connection. I could still load the kernel by slightly increasing the wait per packet, but kept losing a few packets in a 1MB ROM dump, even with the kernel idling longer between packets. I am going to try doing the data transfer the same way the flash loading is done (using pure data packets), and may also need to break it into smaller chunks and test the checksum after each chunk. The good news is that the kernel load / run process worked just fine.
1MB dump successful in a car. Had to break it down into 256-byte chunks and repeat chunks with missing packets. There were about 12 missed packets in 0.5MB. Not sure what causes the missed packets. I have no missed packets when dumping on the bench, so it must be something to do with an 'in car' situation. I guess the ECU is quiet (aside from the kernel), so it may be due to all the other components on the CAN bus (e.g. ABS, TCU, displays etc.). The kernel has a very low CAN priority (inherited from the ECU ROM initialization), so perhaps if I make the kernel's priority very high the problem will go away.
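The chunk-and-retry approach could look roughly like this on the host side. It's a sketch in C rather than the Java used in USBtinSubProg, and `read_chunk_fn`, `dump_chunked()`, `CHUNK_SIZE` and `MAX_RETRIES` are hypothetical: the transport hook is assumed to request one 256-byte chunk and report how many of the 32 expected 8-byte frames arrived before a timeout. Counting frames only catches lost frames; a per-chunk checksum from the kernel would still be needed to catch corrupted ones.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE       256u
#define FRAMES_PER_CHUNK (CHUNK_SIZE / 8u)  /* 32 frames of 8 bytes */
#define MAX_RETRIES      5

/* Hypothetical transport hook: ask the kernel for the chunk at `addr`,
 * store whatever arrives in `buf`, and return the number of 8-byte
 * frames received before timing out. Not a real USBtin/npkern API. */
typedef size_t (*read_chunk_fn)(uint32_t addr, uint8_t *buf, void *ctx);

/* Dump `len` bytes (a multiple of CHUNK_SIZE) starting at `start`,
 * re-requesting any chunk that came back short. Returns false if one
 * chunk still fails after MAX_RETRIES extra attempts. */
bool dump_chunked(uint32_t start, uint32_t len, uint8_t *dst,
                  read_chunk_fn read_chunk, void *ctx, unsigned *retries)
{
    *retries = 0;
    for (uint32_t off = 0; off < len; off += CHUNK_SIZE) {
        int attempt = 0;
        /* A short chunk means at least one frame was lost: the whole
         * chunk is discarded and requested again. */
        while (read_chunk(start + off, dst + off, ctx) != FRAMES_PER_CHUNK) {
            (*retries)++;
            if (++attempt > MAX_RETRIES)
                return false;
        }
    }
    return true;
}
```

Smaller chunks make each retry cheaper but add more request/response round trips; 256 bytes is simply the size used in the thread.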
Well, that CAN ID is actually the second-lowest-priority ID, but if it has its own mailbox that should be fine.
Which direction are you seeing packet loss ?
I think you should be doing the memcpy before touching the RXPR bit (which essentially marks the mailbox as "emptied" if I understand right) Does
I think this test won't do what you want, since
Also,
I think you should get rid of that call to
Thanks again for the feedback / help!
Ah, yes. High ID = Low priority. I had it the wrong way round. I guess it's still possible that ID 0x21 is occasionally getting overruled by a lower ID. That theory is supported by the lost packets only occurring in the car.
RX from ECU to USBtinSubProg TX from USBtinSubProg to ECU
I based this off the bootloader. And, yes, it also puzzled me it's done in this order. I didn't want to try a different order until other bugs were ironed out.
Ah, good pick-up. I will change this so the continue is triggered for values of 0 or -1. I don't think this bug was the source of the above problems.
I'm not sure; this isn't communicated in any way to the Java application. I can do so if it's important?
Having a lower priority doesn't mean a packet will get lost; it just means whoever is sending it will have to retry after losing arbitration. CAN hardware usually does this automatically unless specifically configured not to. There is a flag in HCAN for this; maybe check that it's set properly?
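A toy model of bitwise arbitration shows why a lower numeric ID wins and why the loser isn't dropped: the bus is a wired-AND, so a dominant 0 overrides a recessive 1, and a node that sends recessive but reads back dominant simply stops and retries later (unless automatic retransmission is disabled, which is what the DART bit controls in HCAN). The sketch only models distinct 11-bit standard IDs; extended IDs such as 0xFFFFE add the SRR/IDE bits and an 18-bit extension, which this does not simulate. `can_arbitrate()` is a hypothetical helper.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of CAN arbitration for distinct 11-bit standard IDs.
 * Each node sends its ID MSB first; the wired-AND bus reads 0 (dominant)
 * if any active node sends 0. A node that sent 1 (recessive) but sees 0
 * has lost and backs off. Returns the index of the winning node, or n
 * if n is 0 or larger than this sketch supports. */
size_t can_arbitrate(const uint16_t *ids, size_t n)
{
    int active[16];  /* sketch limit: at most 16 contending nodes */
    if (n == 0 || n > 16)
        return n;
    for (size_t i = 0; i < n; i++)
        active[i] = 1;

    for (int bit = 10; bit >= 0; bit--) {
        int bus = 1;  /* recessive unless someone drives dominant */
        for (size_t i = 0; i < n; i++)
            if (active[i] && !((ids[i] >> bit) & 1u))
                bus = 0;
        for (size_t i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1u) && bus == 0)
                active[i] = 0;  /* lost arbitration: hardware retries later */
    }
    for (size_t i = 0; i < n; i++)
        if (active[i])
            return i;
    return n;  /* not reached when all IDs are distinct */
}
```

For example, with nodes sending 0x21, 0x7E0 and 0x010 at the same instant, the 0x010 frame goes out first; the other two are not lost, they just go out in the following arbitration rounds.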
Hm. Do you have another CAN device you could hook up to sniff the bus, to make sure those dropped packets are really being sent on the bus?
You could do what higher-level protocols like iso15765 or CANopen do: put a sequence number in each frame (or in a few bits of the CAN ID), and every N frames require an acknowledgement from the receiver before continuing. Btw, what is that protocol you implemented?
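One way to add a sequence number without giving up any of the 8 data bytes is the "few bits of the CAN ID" variant. Everything in this sketch is hypothetical (`SEQ_BASE_ID`, `ACK_EVERY`, `seq_rx_frame()` are made-up names and values): the low 3 bits of an 11-bit ID carry a sequence number, the receiver flags a gap as soon as a number is skipped, and it asks for an ACK every 4 frames. Keeping the ACK window (4) smaller than the sequence space (8) means a retransmitted window can't be mistaken for the next one. It also assumes the kernel's mailbox filtering is set up to accept all 8 IDs.

```c
#include <stdint.h>

/* Hypothetical layout: ID = SEQ_BASE_ID | seq, seq in 0..7. */
#define SEQ_BASE_ID 0x7A0u  /* illustrative only, not from any ROM */
#define SEQ_MASK    0x7u
#define ACK_EVERY   4u      /* must stay below the sequence space (8) */

typedef struct {
    uint8_t  expected;   /* next sequence number we expect (0..7) */
    uint32_t since_ack;  /* frames accepted since the last ACK */
} seq_rx_state;

typedef enum { SEQ_OK, SEQ_OK_SEND_ACK, SEQ_GAP, SEQ_BAD_ID } seq_result;

/* Classify one received frame by its CAN ID. On SEQ_GAP the receiver
 * would NAK and ask the sender to resume from st->expected. */
seq_result seq_rx_frame(seq_rx_state *st, uint16_t can_id)
{
    if ((can_id & ~SEQ_MASK) != SEQ_BASE_ID)
        return SEQ_BAD_ID;
    uint8_t seq = (uint8_t)(can_id & SEQ_MASK);
    if (seq != st->expected)
        return SEQ_GAP;  /* at least one frame went missing */
    st->expected = (uint8_t)((st->expected + 1) & SEQ_MASK);
    if (++st->since_ack == ACK_EVERY) {
        st->since_ack = 0;
        return SEQ_OK_SEND_ACK;  /* sender waits for this before continuing */
    }
    return SEQ_OK;
}
```

The trade-off is that the kernel then occupies a range of 8 IDs instead of one, and every 4th frame costs an extra round trip for the ACK.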
A more robust structure would be to IRQ on CAN mailbox RX, copy the frame to a FIFO from there, and drain that FIFO from cmd_loop .
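A minimal sketch of that structure, assuming a single-core MCU where the RX interrupt handler is the only producer and cmd_loop is the only consumer. `rx_queue`, `rxq_push()`, `rxq_pop()` and `COMPILER_BARRIER()` are hypothetical names; the ISR would call `rxq_push()` right after copying the mailbox and clearing its RXPR bit. Free-running `head`/`tail` counters with a power-of-two size avoid a separate "full" flag, and `dropped` gives cmd_loop a way to notice that the queue overflowed. Whether volatile indices plus a compiler barrier are sufficient for the actual SH toolchain is an assumption to verify.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Stop the compiler from moving buffer accesses across index updates. */
#if defined(__GNUC__)
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void)0)
#endif

#define RXQ_SIZE 32u  /* power of two; here one 256-byte chunk of frames */

typedef struct {
    uint8_t data[8];
    uint8_t dlc;
} rxq_frame;

typedef struct {
    rxq_frame buf[RXQ_SIZE];
    volatile uint32_t head;     /* written only by the RX ISR */
    volatile uint32_t tail;     /* written only by cmd_loop */
    volatile uint32_t dropped;  /* frames discarded because the queue was full */
} rx_queue;

/* Producer: called from the mailbox RX interrupt. */
bool rxq_push(rx_queue *q, const uint8_t *data, uint8_t dlc)
{
    if (q->head - q->tail == RXQ_SIZE) {  /* full (works across wraparound) */
        q->dropped++;
        return false;
    }
    uint8_t n = dlc > 8 ? 8 : dlc;
    rxq_frame *f = &q->buf[q->head & (RXQ_SIZE - 1u)];
    memcpy(f->data, data, n);
    f->dlc = n;
    COMPILER_BARRIER();  /* slot fully written before it is published */
    q->head++;
    return true;
}

/* Consumer: called from cmd_loop. Returns false if the queue is empty. */
bool rxq_pop(rx_queue *q, rxq_frame *out)
{
    if (q->head == q->tail)
        return false;
    *out = q->buf[q->tail & (RXQ_SIZE - 1u)];
    COMPILER_BARRIER();  /* slot fully read before it is released */
    q->tail++;
    return true;
}
```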
What bootloader ? This really looks wrong.
Well, if there's an RX overrun, you can be certain that at least one frame has been lost. Could be a problem in some cases, like receiving a block of data?
OK, checked the flag. DART is not set, so it should retransmit.
I only have the USBtin. Before I put in can_idle(), I got the kernel running in the ECU, terminated SubProg, started USBtinViewer and issued a dump command using USBtinViewer. I got the same result (i.e. the first 20-30 packets were received OK, then some packets were lost after that). The can_tx8bytes() function is so simple it must be TXing the data. So, it's not getting RX'd for some unknown reason.
The new approach of transferring in 256 byte chunks is a little like this (see repo). I haven't added a frame number because that would reduce the number of data bytes I can send per frame. The new approach can handle these lost frames.
The first two bytes duplicate how the bootloader works. After that, it's all made up.
Mmm, OK. There was a FIFO buffer in the bootloader, but RX didn't work on interrupts (RX was checked on each loop cycle). So I couldn't see how a FIFO buffer was of any use when one packet is being RX'd and processed at a time.
The bootloader in the ECU ROM. I agree it looks weird, although it's been like that in Subaru ROMs since 2005. Here's the assembly. r14 has the value of 1, and 0xFFFFD042 (RXPR0) is definitely set before the data is copied from the MB.
So, a few things to try:
OK, I don't follow. You're saying there's a CAN bootloader in the Subaru ROMs? What is its purpose, if not for reflashing?
Interesting. FIFO helps when you expect to receive multiple frames in a burst and you know you won't have time to process all of them - so you try to empty the mailbox as fast as possible. If the host is capable of sending data way faster than your code can process, then you need some higher-level flow control (like your 256-byte chunks maybe) to force the sender to wait before sending more.
Found this: Unlikely to be causing all your problems, but it should be addressed. Also, figure 17.11 in the 7058 datasheet has a flowchart that confirms you should read the mailbox before writing 1 to RXPRn.
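The read-then-acknowledge order from that flowchart, plus a check of the unread-message (UMSR) bit to detect the RX overrun mentioned earlier, can be exercised on a PC with a simulated register block. All of this is illustrative: `hcan_sim`, `w1c()` and `can_rx8bytes_sim()` are hypothetical, the struct is not the SH7058 memory map, and bit n of RXPR/UMSR is assumed to correspond to mailbox n. RXPR is cleared by writing 1 to the bit; UMSR is assumed to behave the same way, and `w1c()` imitates that.

```c
#include <stdint.h>
#include <string.h>

/* Simulated HCAN state for one receive path (NOT the real memory map). */
typedef struct {
    uint16_t rxpr;            /* receive-pending bits, one per mailbox */
    uint16_t umsr;            /* unread-message (overrun) bits */
    uint8_t  mb_data[16][8];  /* mailbox data fields */
    uint8_t  mb_dlc[16];
} hcan_sim;

/* Model of a write-1-to-clear register write. */
static void w1c(uint16_t *reg, uint16_t mask) { *reg &= (uint16_t)~mask; }

enum { CAN_RX_NONE = -1, CAN_RX_OVERRUN = -2 };

/* Returns the DLC on success, CAN_RX_NONE if nothing is pending, or
 * CAN_RX_OVERRUN if the mailbox was overwritten before we read it
 * (the newest frame is still copied out, but at least one was lost). */
int can_rx8bytes_sim(hcan_sim *h, unsigned mb, uint8_t out[8])
{
    if (mb >= 16)
        return CAN_RX_NONE;
    uint16_t bit = (uint16_t)(1u << mb);
    if (!(h->rxpr & bit))
        return CAN_RX_NONE;

    /* 1. Copy the mailbox while RXPR still marks it as full. */
    uint8_t dlc = h->mb_dlc[mb] > 8 ? 8 : h->mb_dlc[mb];
    memcpy(out, h->mb_data[mb], dlc);

    /* 2. Check (and clear) the overrun flag for this mailbox. */
    int lost = (h->umsr & bit) != 0;
    if (lost)
        w1c(&h->umsr, bit);

    /* 3. Only now release the mailbox by writing 1 to its RXPR bit. */
    w1c(&h->rxpr, bit);
    return lost ? CAN_RX_OVERRUN : (int)dlc;
}
```

Returning a distinct code for the overrun case is one way to surface it to the host, e.g. as the extra error code discussed later in the thread.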
If you decide to go that route, I can recommend https://github.com/EliasOenal/etools. Here's a trimmed-down example from another project I'm working on:
Also found a snippet of code in an SH7047 app note:
Yes, there is a very minimal CAN-based bootloader in many (all?) Subaru ECU ROMs (in the space up to 0x1000). It is for loading, checking and jumping to a RAM-located kernel. No flashing; that must be done by the kernel. Thanks for the other info. I'll work on updates and trial them over the next few days.
ok, I updated the TX and RX functions...
...and I still needed can_idle() for it to work. Then I tried some other things, reversed them, reconnected my alligator clips, and it worked without can_idle(). So now I'm wondering if part of the issue is the connection quality of the alligator clips. The OBD plug arrives soon, so then I'll be able to use soldered connections. This is all bench testing; I haven't tested in a car yet. I can't figure out why the ROM would have a FIFO queue in conjunction with a simple message processing loop. If the loop is activated every time a message is received, and that message is immediately processed in the loop, then I can't see how any received messages would ever be queued. Unless I'm missing something... If I were to implement interrupt-based RX, given that kernel uploading / dumping / flash uploading are usually many hundreds of packets, I'm not sure how a queue would help, unless it was enormous...?
Probably should clear RXPR too before returning -1 ?
I haven't seen their bootloader, but on a 500kbps link it takes about 220us to transmit an 8-byte frame (45 bits overhead + DLC * 8, ignoring bit-stuffing), so that if any part of the processing loop takes more than 2*220us, then one frame will occupy the MB but the next will be dropped. Depending on how the protocol is designed (i.e. is there flow control for the slow operations), this may or may not be necessary. For example, if host triggers Again, that's where checking the UMSR bit is useful - the hardware will tell you if it dropped a frame. Why not have it set a flag that you can check with some other command, or a RAM read, or as a special error return code etc ? For the K-line comms, I get away without a fifo because
True, maybe (hopefully) you don't need a FIFO. The only way to be sure is to check UMSR: if it's sometimes set, then you definitely have problems. If not, it's no guarantee, but maybe you'll be OK.
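The frame-time arithmetic above, as a quick C helper. It uses the same approximation (about 45 overhead bits per classic frame, no bit stuffing), so the results are lower bounds; `can_frame_time_us()` and `can_min_transfer_us()` are hypothetical names. An extended-ID frame such as the 0xFFFFE entry command would be roughly 20 bits longer.

```c
#include <stdint.h>

/* Thread's approximation of classic-frame overhead; stuffing ignored. */
#define FRAME_OVERHEAD_BITS 45u

/* Time on the wire for one frame with `dlc` data bytes, in microseconds
 * (rounded down). */
uint32_t can_frame_time_us(uint32_t dlc, uint32_t bitrate)
{
    uint32_t bits = FRAME_OVERHEAD_BITS + 8u * dlc;
    return (uint32_t)((uint64_t)bits * 1000000u / bitrate);
}

/* Lower bound on moving `nbytes` of payload, 8 bytes per frame, frames
 * back to back, with no ACKs, handshakes or processing delays. */
uint64_t can_min_transfer_us(uint32_t nbytes, uint32_t bitrate)
{
    uint32_t frames = (nbytes + 7u) / 8u;
    return (uint64_t)frames * can_frame_time_us(8, bitrate);
}
```

With these numbers, an 8-byte frame at 500 kbit/s takes about 218 us, and a 1 MB dump needs at least 131072 frames, i.e. roughly 28.6 s on the wire before counting stuffing, handshake frames or the kernel's own processing time. That same 218 us is the budget the receive loop has per frame if the sender transmits back to back.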
Yes, agreed, thanks, done.
If you want to have a look, see attached. This is the first time I have tried to extract a portion of a Ghidra file, so hopefully this works. There are 3 files:
I have added another error code for UMSR (see below), but it is never triggered, even when I remove wait() from the host program so that it pushes data too quickly for the kernel to cope. Instead I get other errors (e.g. checksum errors). So, for whatever reason, it seems UMSR is never triggered.
I have been working through the ROM on the CAN comms for kernel upload and RAM Jump, so am ready to see if I can communicate with the ECU over CAN. I've been looking into options to expand nisprog to CAN. It will be a steep learning curve, but I'm hoping I can do the bulk of the work. Interested in your thoughts on these options...?
Option 1: Expand on the ELM327 capability already in freediag. The company ELM shuts down at the end of this month, so I imagine only clones will be available. Advantages: cheap (clone cables are ~$20), nisprog already has some ELM code. Disadvantages: variability of ELM clones, CAN compliant only, not full J2534 (although only CAN is required to communicate with the ECU)
Option 2: Expand on the ELM327 capability already in freediag with the target hardware being STTN2230 based https://www.scantool.net/scantool/downloads/98/stn1100-frpm.pdf. Apparently compatible with ELM327. Advantages: relatively cheap (OBDLink SX USB to OBD cable is ~$50), possibly the same coding in nisprog as required for Option 1 meaning it could work for ELM clones (unreliably) and STTN2230 (reliably). Disadvantages: CAN compliant only, not full J2534 (although only CAN is required to communicate with the ECU)
Option 3: Utilise J2534 code from OpenVehicleDiag https://github.com/rnd-ash/Macchina-J2534 in combination with an open-source M2 https://www.macchina.cc/catalog. Advantages: J2534 compliant (mostly?). Disadvantages: no familiarity with Rust (for me), CAN compliance is "partial", more expensive than the above options but cheaper than most J2534 solutions, integration with nisprog may be difficult(?)