Fix pty close behavior, signals at exit #2387

cgull · 2024-05-11T03:40:48Z

There's two big things in here.

When running mosh --local 0, I found that mosh-server would never terminate on iSH. It was waiting for the master side of its pty to close. Most applications that use ptys use other methods to detect child termination, but mosh-server expects that the pty master will act as a closed file after the last program on the pty slave exits-- any waiting reads should return immediately with 0 bytes, and further read calls should return EOF. This is what pipes and sockets do, of course. iSH master ptys stayed in the read call forever, even though the pty slave was gone. This fixes that and ends up better separating generic tty code from pty slaves from pty masters too.

dtach behaves similarly and the fix helps it too.

After I fixed that, I found that CLI ish would exit with signal 9 when mosh-server exited. This is because the README recommends to use /bin/login as the starting process. Unfortunately, that means /bin/login runs as pid 1 (init) and is notified when orphaned processes terminate (which mosh-server is, because it double forks). /bin/login has never been intended as a process group leader or init process, it expects to start and wait() for exactly 1 child, usually an interactive shell.

But running ish -f alpine /bin/sh didn't work correctly either, when the initial shell exited iSH would exit on signal 9. (Or something, this was a couple of months ago and I'm forgetting details. Maybe it was still when mosh-server exited, or when a subshell exited.) That turned out to be a nasty problem with halt_system(). This code runs when pid 1 exits. This code uses pthread_kill() to send a SIGKILL to threads running the emulated process group to kill off emulated processes in iSH. Uhh... that doesn't work. SIGKILL always nukes the entire process, not a single thread.

After I changed that code to do a more kernel-like thing of using deliver_signal() to send first SIGTERM, then SIGKILL, and then finally pthread_kill() with SIGTERM, iSH behaves a lot better about emulated process termination, and exits cleanly (rather than with SIGKILL) when pid 1 terminates.

I changed the README to tell people to use /bin/sh as the starting process for CLI ish, because it can work as init/pid 1 and ignores termination of orphaned processes. iSH.app should probably change to do the same.

I've not been able to build iSH.app in XCode, so all of this is untested there, but I believe it will help there-- the app won't quit when the initial shell exits.

After fixing this and then finding/fixing the procfs problem in #2386, I ran into yet another bug in iSH, I forget what it was. At that point I gave up chasing bugs in iSH.

cgull · 2024-05-11T17:30:47Z

Tabs removed.

tbodt

Thanks for working on this!

tbodt · 2024-05-11T18:17:49Z

kernel/log.c

@@ -154,6 +154,10 @@ static void log_line(const char *line) {
 static void log_line(const char *line) {
    os_log_fault(OS_LOG_DEFAULT, "%s", line);
 }
+#elif LOG_HANDLER_STDERR


I'm unclear on the utility of this since the program in ish may also be writing to stderr, but if you found it useful it doesn't hurt to have.

If iSH is running interactively (stdin and stdout are on a tty), then emulated programs are running with both stdout and stderr on the iSH tty driver. (This will be true for iSH.app as well, but writing to stderr might be less useful there.)

I hadn't even thought of CLI iSH supporting non-interactive use but yes, it does support that, and logging to stderr would be inappropriate then.

tbodt · 2024-05-11T18:23:04Z

fs/pty.c

@@ -51,6 +69,13 @@ static int pty_slave_open(struct tty *tty) {
    return 0;
 }

+static int pty_slave_close(struct tty *tty) {
+    if (tty->refcount - 1 == (tty->session ? 2 : 1)) {


This looks like a really tricky bit. Could you add a comment?

tbodt · 2024-05-11T18:23:29Z

fs/tty.c

@@ -110,16 +110,7 @@ void tty_release(struct tty *tty) {
        cond_destroy(&tty->produced);
        free(tty);
    } else {
-        // bit of a hack


Bravo for cleaning this up

tbodt · 2024-05-11T18:26:58Z

fs/tty.c

@@ -414,7 +408,7 @@ static bool pty_is_half_closed_master(struct tty *tty) {
    struct tty *slave = tty->pty.other;
    // only time one tty lock is nested in another
    lock(&slave->lock);
-    bool half_closed = slave->ever_opened && slave->refcount == 1;
+    bool half_closed = slave->ever_opened && (slave->refcount == 1 || slave->hung_up);


I wonder if the refcount check is still necessary? Would the new pty_slave_close do that?

This is needed when pty_slave_cleanup() runs, I think. It calls pty_hangup_other(), which calls tty_hangup() on the master. That calls tty_input_wakeup() on the master, which may result in that user program waking up in tty_read(), or calling read() again, before the pty slave device instance is fully cleaned up and the refcount decremented. That's what I remember, anyway.

Many pty-using programs use wait() or SIGCHLD to detect child termination, but some expect the pty master to close and return EOF when all children close the slave pty. This did not work on iSH. If the slave pty was closed by the last program exiting, the program on the master would not receive any read/poll wakeups, and the master pty would not be hung up. Reference counting wasn't working properly when the pty was in a process group or session. The fix for that is a bit simplistic, but it's a big improvement. This also improves the tty/pty code separation and fixes some other minor things.

Using SIGKILL on iSh itself was never going to end well. :)

francoisvignon · 2024-05-17T15:11:37Z

Hi. It seem the new behavior suffer an issue: once the client close his side of the pty, no new connection can be issued to the provider without closing and reopening the provider An idea ?

…

____________________ Francois

Le 15 mai 2024 à 18:02, tbodt ***@***.***> a écrit : Merged #2387 <#2387> into master. — Reply to this email directly, view it on GitHub <#2387 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AK466UU3JOVMRSXAS5BBMOTZCOBIFAVCNFSM6AAAAABHRSHQV2VHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJSHAZDAMRYGEZTIMY>. You are receiving this because you are subscribed to this thread.

tbodt · 2024-05-17T16:14:27Z

@francoisvignon do you have an easy way to reproduce? @cgull could you check on this?

francoisvignon · 2024-05-18T11:59:55Z

here is an easy way to reproduce.

using iSH 669 and :

gcc-11.2.1_git20220219-r2 x86 {gcc} (GPL-2.0-or-later LGPL-2.1-or-later) [installed]
libgcc-11.2.1_git20220219-r2 x86 {gcc} (GPL-2.0-or-later LGPL-2.1-or-later) [installed]
libstdc++-11.2.1_git20220219-r2 x86 {gcc} (GPL-2.0-or-later LGPL-2.1-or-later) [installed]
minicom-2.8-r0 x86 {minicom} (GPL-2.0-or-later) [installed]
musl-dev-1.2.3-r3 x86 {musl} (MIT) [installed]
musl-utils-1.2.3-r3 x86 {musl} (MIT BSD GPL2+) [installed]
make-4.3-r0 x86 {make} (GPL-3.0-or-later) [installed]

source: main.c

#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
  
int main(int argv, char **argc)
{
	int fd;
	char buf[256];
	fd = posix_openpt(O_RDWR | O_NOCTTY);
	grantpt(fd);
	unlockpt(fd);
	ptsname_r(fd, buf, 256);
	printf("client name %s\n", buf);

	for(;;)
	{
		int l;
		l = read(fd, buf, 256);
		printf("l=%d, errno=%d\n", l, errno);
		sleep(1); // to not have too many traces ;-)
	}
	return(0);
}

compiling by :

make main

executing:

McFv:~/test# ./main 
client name /dev/pts/2
// from other iSH window, connect to /dev/pts/2 by : minicom -D /dev/pts/2
// successfull
// keying CR one and two times
l=1, errno=0
l=2, errno=0
// disconnecting minicom by ^A ^X
l=-1, errno=5
l=0, errno=5
l=0, errno=5
l=0, errno=5
// from other iSH window, trying to reconnect by : minicom -D /dev/pts/2
// no way ...
l=0, errno=5
l=0, errno=5
l=0, errno=5
l=0, errno=s
// killing main
^C
McFv:
// and restarting main
McFv:~/test# ./main
client name /dev/pts/2 
// from other iSH window, connect to /dev/pts/2 by : minicom -D /dev/pts/2
// successfull
// keying CR one time
1=1, errno=0
// disconnecting minicom by ^A ^X
l=-1, errno=5
l=0, errno=5
l=0, errno=5
l=0, errno=5
l=0, errno=5
...

the same test with iSH 614

McFv:~/test# ./main
client name /dev/pts/2
// connect to /dev/pts/2 by : minicom -D /dev/pts/2
// successfull
// keying CR one and two times
l=1, errno=0
l=2, errno=0
// disconnecting minicom by ^A ^X
// read not interrrupted by disconnection, so, as read is blocking ... no new traces
// from other iSH window, trying to reconnect by : minicom -D /dev/pts/2
// successfull
// keying CR one and two times
l=1, errno=0
// killing main
^C
// minicom show /dev/pts/2 is not longer available
McFv:

with 614, client disconnection can be detected by other mean (setting read not blocking and checking errno).

please note, the source is voluntary simple to be the shortest code showing the issue (as example, displaying errno if not error is voluntary to not have too many lines, not setting read as not blocking also, ... ).

pelase note also: MacOs and Debian allows reconnection (like 614) but manage disconnection like 669 (but with a little variation about read returning -1 for debian and 0 for MacOs)

@cgull : I'm available to do more testing. thanks for your worlk

tbodt · 2024-05-18T15:12:51Z

Ah, one bug replaced by another... The behavior on linux is different from either of those.

$ ./main
client name /dev/pts/1
// connect, type stuff
l=1, errno=0
l=3, errno=0
l=1, errno=0
l=6, errno=0
l=6, errno=0
// disconnect, get error 5 repeating
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
// reconnect, errors stop (well it still says `errno=5` but since the actual return code was success ignore that)
l=1, errno=5
l=2, errno=5
// disconnect, errors are back
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5
l=-1, errno=5

francoisvignon · 2024-05-18T15:40:03Z

this test was done with what ?

…

____________________ Francois

Le 18 mai 2024 à 17:13, tbodt ***@***.***> a écrit : Ah, one bug replaced by another... The behavior on linux is different from either of those. $ ./main client name /dev/pts/1 // connect, type stuff l=1, errno=0 l=3, errno=0 l=1, errno=0 l=6, errno=0 l=6, errno=0 // disconnect, get error 5 repeating l=-1, errno=5 l=-1, errno=5 l=-1, errno=5 l=-1, errno=5 // reconnect, errors stop (well it still says `errno=5` but since the actual return code was success ignore that) l=1, errno=5 l=2, errno=5 // disconnect, errors are back l=-1, errno=5 l=-1, errno=5 l=-1, errno=5 l=-1, errno=5 l=-1, errno=5 l=-1, errno=5 l=-1, errno=5 l=-1, errno=5 — Reply to this email directly, view it on GitHub <#2387 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AK466UUL6QH4WHPBJRKGXRDZC5VYTAVCNFSM6AAAAABHRSHQV2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJYHA2TGOBSGY>. You are receiving this because you were mentioned.

tbodt · 2024-05-18T15:40:43Z

Linux linux-luka 6.7.6-1-aarch64-ARCH #1 SMP PREEMPT_DYNAMIC Sat Feb 24 10:08:35 MST 2024 aarch64 GNU/Linux

francoisvignon · 2024-05-18T15:42:52Z

ok, this is the "expected" behavior :-)

…

____________________ Francois

Le 18 mai 2024 à 17:41, tbodt ***@***.***> a écrit : Linux linux-luka 6.7.6-1-aarch64-ARCH #1 <#1> SMP PREEMPT_DYNAMIC Sat Feb 24 10:08:35 MST 2024 aarch64 GNU/Linux — Reply to this email directly, view it on GitHub <#2387 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AK466UXO5U5XFHIZYIFUF5DZC5ZBDAVCNFSM6AAAAABHRSHQV2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJYHA3DCMZTG4>. You are receiving this because you were mentioned.

tbodt · 2024-05-18T16:35:03Z

I'm trying to decide if I should revert this PR, if the new bug is worse than the old bug. Obviously ideally both bugs would be fixed, but I don't really have time to do that work. How bad is the new behavior?

francoisvignon · 2024-05-18T16:49:22Z

I will check the old bug tomorrow: I have used the old pty implementation without major issue ... so, I will double check, and I tell you ...send from my mobile.____________________FrancoisLe 18 mai 2024 à 18:35, tbodt ***@***.***> a écrit : I'm trying to decide if I should revert this PR, if the new bug is worse than the old bug. Obviously ideally both bugs would be fixed, but I don't really have time to do that work. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

tbodt · 2024-05-18T16:50:02Z

Do you run into major issues with the new behavior?

francoisvignon · 2024-05-18T16:59:28Z

Major ? annoying is a better word:- with the previous implementaion, I can check if a client is connected or not: with this one, not longer.- with this one, when the client disconnect, I close the master pty and open it again, with the risk to have a different client pty.So ... I prefer the old one (more standard accross various plateform) and I think the solution used to solve the issue solved by the new one is maybe not the correct one.send from my mobile.____________________FrancoisLe 18 mai 2024 à 18:50, tbodt ***@***.***> a écrit : Do you run into major issues with the new behavior? —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

cgull · 2024-08-31T01:39:34Z

I think I've fixed this in #2449. Sorry for dropping the ball on it. I'm curious, though, what sort of application would you be running in iSH that needs this? The things I can think of include embedded applications or applications needing to connect to or emulate physical devices, none of which seem like something you'd want to do in an iOS app.

cgull added 3 commits March 15, 2024 00:46

Fix an operator precedence warning from clang.

d8b3df7

Add another very simple log destination-- fprintf to stderr

ebaba89

Add basic LLDB config

edc6016

cgull force-pushed the cgull/pty-init-fixes branch from f61ed97 to 2831b53 Compare May 11, 2024 17:17

cgull mentioned this pull request May 11, 2024

Avoid crashes on /proc/ish from CLI build #2386

Merged

tbodt reviewed May 11, 2024

View reviewed changes

cgull added 3 commits May 13, 2024 01:41

Shutdown more cleanly when init dies.

6e5cdaf

Using SIGKILL on iSh itself was never going to end well. :)

Update READMEs to say /bin/sh rather than bin/login, for CLI ish

7cc11f0

cgull force-pushed the cgull/pty-init-fixes branch from 2831b53 to 7cc11f0 Compare May 13, 2024 05:44

tbodt approved these changes May 15, 2024

View reviewed changes

tbodt merged commit cee7abd into ish-app:master May 15, 2024
6 checks passed

cgull mentioned this pull request Aug 31, 2024

Fix reuse of pty slave devices #2449

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix pty close behavior, signals at exit #2387

Fix pty close behavior, signals at exit #2387

cgull commented May 11, 2024 •

edited

Loading

cgull commented May 11, 2024

tbodt left a comment

tbodt May 11, 2024

cgull May 13, 2024

tbodt May 11, 2024

cgull May 13, 2024

tbodt May 11, 2024

tbodt May 11, 2024

cgull May 13, 2024

francoisvignon commented May 17, 2024 via email

tbodt commented May 17, 2024

francoisvignon commented May 18, 2024 •

edited

Loading

tbodt commented May 18, 2024

francoisvignon commented May 18, 2024 via email

tbodt commented May 18, 2024

francoisvignon commented May 18, 2024 via email

tbodt commented May 18, 2024 •

edited

Loading

francoisvignon commented May 18, 2024 via email

tbodt commented May 18, 2024

francoisvignon commented May 18, 2024 via email

cgull commented Aug 31, 2024

Fix pty close behavior, signals at exit #2387

Fix pty close behavior, signals at exit #2387

Conversation

cgull commented May 11, 2024 • edited Loading

cgull commented May 11, 2024

tbodt left a comment

Choose a reason for hiding this comment

tbodt May 11, 2024

Choose a reason for hiding this comment

cgull May 13, 2024

Choose a reason for hiding this comment

tbodt May 11, 2024

Choose a reason for hiding this comment

cgull May 13, 2024

Choose a reason for hiding this comment

tbodt May 11, 2024

Choose a reason for hiding this comment

tbodt May 11, 2024

Choose a reason for hiding this comment

cgull May 13, 2024

Choose a reason for hiding this comment

francoisvignon commented May 17, 2024 via email

tbodt commented May 17, 2024

francoisvignon commented May 18, 2024 • edited Loading

tbodt commented May 18, 2024

francoisvignon commented May 18, 2024 via email

tbodt commented May 18, 2024

francoisvignon commented May 18, 2024 via email

tbodt commented May 18, 2024 • edited Loading

francoisvignon commented May 18, 2024 via email

tbodt commented May 18, 2024

francoisvignon commented May 18, 2024 via email

cgull commented Aug 31, 2024

cgull commented May 11, 2024 •

edited

Loading

francoisvignon commented May 18, 2024 •

edited

Loading

tbodt commented May 18, 2024 •

edited

Loading