Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about CUSBEndpoint::SkipPID(unsigned int, bool) #519

Closed
davefilip opened this issue Dec 9, 2024 · 12 comments
Closed

Question about CUSBEndpoint::SkipPID(unsigned int, bool) #519

davefilip opened this issue Dec 9, 2024 · 12 comments

Comments

@davefilip
Copy link

Rene,

More of a question than necessarily a problem with Circle.

I have spent a few days trying to track down a problem I'm having with a Synchronous exception. The PC seems to always be in this function: CUSBEndpoint::SkipPID(unsigned int, bool)

And only when using the keyboard on a RPi3. As I add and remove code, the PC value changes a bit, but it always seems to be in this function (usually first 1 - 5 instructions).

Can you please advise as to what that function does, which might (or might not) help me track down the problem?

It seems to be crashing when opening a socket while using the USB keyboard. If I perform the same functions from the network (telnet server), then I can't replicate the problem no matter how hard I try.

Thanks for any information you can provide that might possibly be helpful. I know this is a long shot, but I am running out of ideas.

Cheers,

Dave.

@davefilip
Copy link
Author

Rene,

I may have a possible work-around, but with the loss of some functionality.

I removed the following code from my keyboard input loop and added it to execute once at the top of kernel.cpp:

m_pKeyboard = (CUSBKeyboardDevice *) m_DeviceNameService.GetDevice ("ukbd1", FALSE);

if (m_pKeyboard != 0) {
    m_pKeyboard->RegisterRemovedHandler(KeyboardRemovedHandler);
    m_pKeyboard->RegisterShutdownHandler(ShutdownHandler);
    m_pKeyboard->RegisterKeyPressedHandler(KeyPressedHandler);
}

I believe this may have solved my keyboard problem. I believe I got this code fragment from one of your examples, and it hasn't changed since the Summer. I don't believe the input loop can ever be executed simultaneously from multiple Tasks, but while check the code to be sure.

Of course, doing this means that the keyboard has to be plugged in when my Circle-based OS boots, and it can't be removed and re-inserted. But that is not something that I need to do often.

I tried this on a hunch, given that the crashes always seemed to be in the USB management functions (you may remember I had a similar earlier problem with RPi 4 + Keyboard).

Regards,

Dave.

@rsta2
Copy link
Owner

rsta2 commented Dec 10, 2024 via email

@davefilip
Copy link
Author

davefilip commented Dec 10, 2024 via email

@rsta2
Copy link
Owner

rsta2 commented Dec 10, 2024 via email

@davefilip
Copy link
Author

davefilip commented Dec 11, 2024 via email

@rsta2
Copy link
Owner

rsta2 commented Dec 11, 2024 via email

@davefilip
Copy link
Author

Rene,

Replying here, so that it is properly documented, and available to others who might run into a similar problem.

Thank you so much for the detailed debugging, and a full explanation. This all makes sense now, and would also explain my problem on the RPi4 as well. I believe the difference between then and now is that I recently ran into a problem with the (previous) 1 MB buffer being too small, so I took the easy way out and just increased it to 2 MB. My thinking was that my operating system is using so little memory (< 100 MB with everything compiled in and running, compared to a full Linux system which minimally boots in at more than >300 MB, even before running any user programs), that allocating a 2 MB data buffer should not be a problem.

My bad, of course, is that I was not thinking about the stack size. Just curious, though, should something else have detected exceeding the stack size? Without running through a debugger -- although that may always be the right answer -- is there any other way that this should have been caught?

I will admit that I am still getting my brain to think properly about things like memory management, which is handled differently in C/C++ than Java, which I had been programming almost exclusively for the last 25+ years (as you may already know, Java has a garbage collector that recovers memory from any object that is no longer referenced). So with C/C++ I am always worried about allocating memory with 'new' or malloc(), and then having to remember to deallocate it later, otherwise run the risk of a memory leak (which is the biggest problem with most operating systems). Easy enough when everything works, but having to make sure that every possible error condition also frees any allocated buffers before returning the error to the user.

Nonetheless, thanks for showing me the problem, and suggesting two possible ways to fix it. I will try to keep in mind the stack size for local variables going forward.

Cheers,

Dave.

============================================================
From: Rene Stange [email protected]
Subject: Re: colorOS Crashing
Date: December 14, 2024 at 11:42:19 AM EST
To: David Filip [email protected]

David,

I was able to reproduce the problem and I think I have found the reason.

My setup were a RPi 3B and a 3A+ connected to the WLAN and HDMI display/USB
keyboard connected to the 3A+. I started the cluster with "message connect" on
the 3B via telnet and did the same via the USB keyboard on the 3A+. After the
cluster appears, I entered "remote rpi3 dir" (rpi3 is the node name of the 3B)
on the 3A+ via the keyboard. This resulted in an Synchronous exception.

Then I started everything again with the JTAG debugger connected to the 3A+
and everything again like above. Please see the attached debug.log file. My
comments are starting with ## .

It turned out, that the method CSocket:Receive() overwrites some parts of the
kernel image code by calling memcpy() with a destination pointer, which points
into the text segment. This pointer has handed over by the method
COSRemote::GetResponse(). When the overwritten code is executed again by the
USB driver, the exception occurs.

How is this possible? The kernel stack by default is only 128KBytes in size.
(Tasks have only a stack of 32KBytes.) So with this definition in remote.h

#define REMOTE_MAX_RESPONSE_SIZE 2097152

the Reponse buffer, which is allocated on the stack as a local variable, is
much too big for the stack. Because the stack expands to lower addresses and
it starts at 0x2A0000 (the stack top) the stack is inside the text segment.
See circle/doc/memorymap.txt for the memory map.

How to solve this? You could increase the kernel stack size by defining

DEFINE += -DKERNEL_STACK_SIZE=0x400000

in Config.mk (for a 4 MBytes stack for example, must be a multiple of
16KBytes), but I don't know, if this is enough, because I cannot check all the
code. I think a better solution is, to allocate such large buffers on the heap
with the "new" operator.

Regards,

Rene

@davefilip
Copy link
Author

Rene,

Sorry, for clarification, if I am gong to go the 'easy' route of increasing the stack side to 4 MB (yes, the 2 MB buffer in the REMOTE command is by far the biggest anywhere in the code I've written), and I am gong to add:

DEFINE += -DKERNEL_STACK_SIZE=0x400000

to Config.mk, and I am building and linking against Circle + Stephan's stdlib/newlib, do I need to increase the stack size in both:

./circle-stdlib/libs/circle/Config.mk
./circle-stdlib/Config.mk

as well as in my own personal build:

./colorOS/Config.mk

Or only my own personal build (compile and link using the standard Circle build scripts)?

I'm thinking that it is only necessary in my own compile and link, but before I get myself into any more trouble, do I also need the increased stack size when building the Circle and stdlib/newlib libraries? Or is that not necessary?

Cheers,

Dave.

@davefilip
Copy link
Author

Fixed by adding:

DEFINE += -DKERNEL_STACK_SIZE=0x400000

to all instances of Config.mk and rebuilding. I will consider refactoring code to use new / malloc for large buffers at a later date (which I current do for GET URL (http/https) and GET FILE (tftp), but not REMOTE).

@rsta2
Copy link
Owner

rsta2 commented Dec 15, 2024

Dave,

you are welcome. I had a look, if some stack checking could be implemented, but this would require support from the GNU compiler. There is the option -fstack-check, but it can only applied with a fixed address of the stack bottom, which is not sufficient, because Circle uses multiple stacks for the different cores and exception levels. So this cannot be easily implemented.

The normal way to apply the KERNEL_MAX_SIZE= option is by using the -o option, when calling configure in circle-stdlib. But if you want to add it manually, adding it to all three config files cannot be wrong. ;)

Rene

@davefilip
Copy link
Author

davefilip commented Dec 15, 2024 via email

@rsta2
Copy link
Owner

rsta2 commented Dec 15, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants