There was an issue when we rebooted a node via qarsh: the packet size
would make it back from qarshd, but none of the data, so we would get
stuck in a read(). We needed to avoid a blocking read() and check for a
heartbeat while we wait for the rest of the packet.
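The fix described above can be sketched as a read loop that polls the socket with select() and a timeout, running the heartbeat check each time it times out while waiting for the rest of the packet. The names here (read_packet_body, hbeat_cb) are illustrative, not the actual qarsh functions:

```c
#include <sys/select.h>
#include <unistd.h>
#include <errno.h>
#include <stddef.h>

typedef int (*hbeat_cb)(void);   /* return nonzero if the peer is alive */

/* Read exactly len bytes; on each 1-second timeout, run the heartbeat
 * check. Returns len on success, -1 on read error or dead peer. */
ssize_t read_packet_body(int fd, void *buf, size_t len, hbeat_cb alive)
{
    size_t got = 0;
    while (got < len) {
        fd_set rfds;
        struct timeval tv = { 1, 0 };   /* heartbeat interval */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        int n = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {                   /* timed out: check the peer */
            if (alive && !alive())
                return -1;              /* node rebooted, give up */
            continue;
        }
        ssize_t r = read(fd, (char *)buf + got, len - got);
        if (r <= 0)
            return -1;                  /* error or EOF mid-packet */
        got += r;
    }
    return (ssize_t)got;
}
```

With this shape, a reboot mid-packet shows up as a failed heartbeat instead of a read() that never returns.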
After a long-running command, or when multiple qarsh processes are
running in parallel, it's hard to know which host is having trouble
when a "Remote host rebooted" message shows up. Include the host name
in these messages to make them more informative.
This is an option from ssh which disables pseudo-tty allocation.
Since we don't allocate them in the first place, we're compatible with
it.
If we don't get a heartbeat when qarsh starts, print a warning to check
on the btimed service. Commands could be stopped early if the command
doesn't produce any output and qarsh thinks the host is down.
We really don't need this field since we
always copy data sequentially.
If xiogen is flooding requests across qarsh and xdoio decides to stop,
we need to handle that gracefully. Also, making the pipe non-blocking
was not a good idea: xdoio gets the read error EAGAIN and stops there.
When qarshd is run via xinetd, stderr still goes out the socket
and messages from sockutil.c or qarsh_packet.c can interfere
with the protocol. Create a thin wrapper which qacp and qarsh can
send to stderr and qarshd can send to syslog.
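A thin wrapper of the kind described might look like the sketch below: the clients log to stderr, while qarshd flips a switch at startup so the same calls go to syslog and nothing extra leaks out the xinetd socket. The names (qlog, qlog_use_syslog) are illustrative, not necessarily the real interface:

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

static int log_to_syslog;   /* qarshd sets this at startup */

/* Called by qarshd before any logging happens. */
void qlog_use_syslog(const char *ident)
{
    openlog(ident, LOG_PID, LOG_DAEMON);
    log_to_syslog = 1;
}

/* One logging call for qarsh, qacp, and qarshd alike. */
void qlog(int priority, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    if (log_to_syslog) {
        vsyslog(priority, fmt, ap);     /* daemon: keep the socket clean */
    } else {
        vfprintf(stderr, fmt, ap);      /* client: normal stderr */
        fputc('\n', stderr);
    }
    va_end(ap);
}
```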
This coordinates the buffer sizes with the
max packet size. qarshd and qarsh will probably break
if this value does not match between client and server
builds. Also increase the value to reduce overhead.
A max packet size of 16k only yields 40MB/s. Increase
that to 128k and we can do 500MB/s.
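Coordination like this typically lives in one shared header so the client and daemon are always built with the same limit; the names here (MAX_PACKET_SIZE, qa_buf) are illustrative:

```c
#include <stddef.h>

/* One wire limit for both sides; was 16k, raised to cut per-packet
 * overhead (the message above measured 40MB/s at 16k vs 500MB/s at 128k). */
#define MAX_PACKET_SIZE (128 * 1024)

/* Buffer sized to match the wire limit, so a full packet always fits. */
struct qa_buf {
    size_t len;
    char data[MAX_PACKET_SIZE];
};
```

Keeping the constant in a single header is what prevents the client/server mismatch the message warns about: a side built with a smaller limit would truncate or reject the other side's packets.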
If the user is specified as part of the host, we don't need to free it;
if it was a separate option, it will get freed when the process ends.
Added a new packet to limit data sent from the other side.
I don't see any way to coexist with the old "protocol"
Conflicts:
qarsh.c
sockutil.c
The user-specified time is for holding a connection only. If the user
chooses too small a time, for example when they are rebooting a node,
the initial connection may fail.
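One way to keep a short hold time from killing the initial connect is to retry the connect until a separate deadline passes, so the node gets time to come back from a reboot. This is a sketch under that assumption; connect_with_retry and its parameters are illustrative:

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <time.h>

/* Keep trying to connect for up to total_secs, sleeping between tries.
 * Returns a connected fd, or -1 with errno set. */
int connect_with_retry(const struct sockaddr_in *addr, int total_secs)
{
    time_t deadline = time(NULL) + total_secs;
    do {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
            return fd;                  /* connected */
        close(fd);
        sleep(1);                       /* node may still be rebooting */
    } while (time(NULL) < deadline);
    errno = ETIMEDOUT;
    return -1;
}
```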
Running things in parallel with pthreads in perl can
lead to file descriptor leaks which may cause hangs
in qarsh.
In rare cases the getpwuid() call will fail because of a YP
or LDAP timeout. If we're not using the local username we
shouldn't even bother looking it up.
If qarshd is broken enough that it can't load libxml2.so, it
won't return an XML packet which we can parse. set_remote_user()
really needs to error out if we didn't get a packet back.
I don't know how, but I found one instance of qarsh looping
through the pselect loop with a one-second timeout. If the command has
exited and the output file descriptors are all closed, we fall onto
a continue which prevents us from reaching the break at the end of
the loop. The only thing the continue skips is the exit check, which
we really should be making, so remove the continue.
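The loop shape after the fix can be sketched as follows: always fall through to the exit test at the bottom of the iteration instead of skipping it with a continue. The state flags here are illustrative stand-ins for qarsh's own:

```c
#include <stdbool.h>

struct cmd_state {
    bool cmd_exited;      /* exit-status packet received */
    int open_fds;         /* remote stdout/stderr still open */
    int loops;            /* how many times we polled */
};

/* Returns the number of poll iterations before we noticed we were done. */
int wait_for_cmd(struct cmd_state *st)
{
    for (;;) {
        st->loops++;
        /* ... pselect() and packet handling would happen here ... */

        /* No "continue" here: always reach the exit test. */
        if (st->cmd_exited && st->open_fds == 0)
            break;
    }
    return st->loops;
}
```

With the continue in place, a command that had already exited with all fds closed would spin on the timeout forever; without it, the loop exits on the first pass.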
All the actions which need to be done before we exit are done after the
pselect. Waiting until after the next pselect can cause us to sit for
a second before we exit, which slows down anything that uses qarsh.
We need to test all exit conditions at once so we fall back into the
hbeat code. I was falling into a case when running "reboot -fin &"
where the command would exit, but the sockets would not close and we
weren't getting to hbeat() to detect the reboot.
In another case which I can't completely explain, we were getting a
double-free error from glibc in the qpfree() at the end of
run_remote_cmd(). Instead of waiting until the very end to read the
exit status, save it off as soon as we get the packet and use
cmd_exitted to determine if we have an exit code or if something went
horribly wrong.
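The save-it-early idea can be sketched like this: copy the status out of the cmdexit packet the moment it arrives, and use the flag at the end to decide whether we have a real exit code. Field and function names mirror the message above but are illustrative:

```c
#include <stdbool.h>

struct remote_cmd {
    bool cmd_exitted;     /* did we ever see a cmdexit packet? */
    int exit_status;      /* saved off as soon as it arrives */
};

/* Called from the packet handler when a cmdexit packet is received. */
void handle_cmdexit(struct remote_cmd *rc, int status_from_packet)
{
    rc->exit_status = status_from_packet;  /* copy now; the packet is freed later */
    rc->cmd_exitted = true;
}

/* At the end of run_remote_cmd(): report the saved status, or flag
 * that something went horribly wrong. */
int final_status(const struct remote_cmd *rc)
{
    return rc->cmd_exitted ? rc->exit_status : -1;
}
```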
quickly that we do a hbeat then go back to the pselect.
When running rsync on an existing directory structure, rsync may be too
busy to read everything that qarsh is writing to it from the remote
rsync daemon. Create a buffer for each of stdin, stdout, and stderr
and keep it around until we are able to write it, holding off further
reads until it can be written. We still don't handle partial writes.
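The per-stream buffering described above can be sketched as one pending buffer per stream: accept new data only when the buffer is empty, tell the caller to drop the read fd from its poll set while data is pending, and (as the message notes) handle only whole-buffer writes. Names and sizes are illustrative:

```c
#include <string.h>
#include <stdbool.h>

#define STREAM_BUF 4096

struct stream {
    char buf[STREAM_BUF];
    size_t len;           /* bytes waiting to be written; 0 = buffer free */
};

/* Stash data read from the source; returns false when the buffer is
 * occupied, meaning the caller must hold off further reads. */
bool stream_stash(struct stream *s, const void *data, size_t n)
{
    if (s->len || n > sizeof(s->buf))
        return false;
    memcpy(s->buf, data, n);
    s->len = n;
    return true;
}

/* While this is true, skip the source fd in the poll set. */
bool stream_busy(const struct stream *s) { return s->len != 0; }

/* Called when the destination fd is writable; this is where the
 * buffered write(2) would happen (whole buffer only, no partial writes). */
void stream_flush(struct stream *s) { s->len = 0; }
```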
has exited and we've processed all the output.
help get all output before we exit. There is still a race if the cmdexit
packet returns before all output where we could truncate output.
reproduced properly by qarsh.
Return 127 as an exit code on internal error cases.
name in the usage output would show "(null)." qarsh isn't called anything
else, so just hard-code qarsh in the usage message.
OpenSSH. After they get to the end of the args and they haven't gotten
a host name yet they chew the next arg as the hostname and restart parsing
the command line. Now we do too.
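The OpenSSH-style behavior can be sketched with a hand-rolled parser: when an argument is not an option and no host has been seen yet, chew it as the host and keep parsing, so options placed after the host (e.g. "qarsh node1 -l root uname") still work. The option set and struct here are illustrative:

```c
#include <string.h>
#include <stddef.h>

struct opts {
    const char *user;
    const char *host;
};

/* Parse argv, OpenSSH-style: the first bare argument becomes the host
 * and parsing restarts after it. Returns the index where the remote
 * command starts, or -1 on an unknown option. */
int parse_args(int argc, char **argv, struct opts *o)
{
    int i = 1;
    while (i < argc) {
        if (strcmp(argv[i], "-l") == 0 && i + 1 < argc) {
            o->user = argv[++i];
            i++;
        } else if (argv[i][0] == '-') {
            return -1;                  /* unknown option */
        } else if (o->host == NULL) {
            o->host = argv[i++];        /* chew this arg as the host, keep parsing */
        } else {
            break;                      /* first word of the remote command */
        }
    }
    return i;
}
```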
|
| |
|