Failed to bind socket acceptor after restarts

Azarchius · July 8, 2017

Hi,

Every time my server restarts, I have to wait 30 seconds or so before it can boot again; otherwise I get the error below. For some reason TrinityCore isn't releasing the socket gracefully. I thought it had something to do with the way I shut down the core, but it still happens even with graceful, non-forced shutdowns, and I haven't seen any thread about it (other than a years-old one that turned out not to be a TC issue). Is something just wrong with my server? This wasn't a problem before I upgraded to 7.x. I did some preliminary investigation and couldn't find anything wrong with my server's network configuration.

Using the latest 7.x build.

Cheers

World initialized in 0 minutes 29 seconds
StartNetwork failed to bind socket acceptor
Failed to initialize network
stage@ju386:~/server/bin$
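"StartNetwork failed to bind socket acceptor" presumably means the bind() call for the configured address and port returned an error. While the error is happening, a small standalone probe can show which error it is: EADDRINUSE points at something still holding the port, while EADDRNOTAVAIL would point at a BindIP that is not an address of this machine. A minimal Python sketch, not TrinityCore code; port 8085 and the wildcard address are assumptions to replace with your own config values:

```python
import errno
import socket
from typing import Optional


def bind_error(host: str, port: int) -> Optional[str]:
    """Try to bind (host, port) like an acceptor would; return the errno name on failure.

    None means the bind succeeded (the probe socket is closed again right away).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            # EADDRINUSE: something still holds the port.
            # EADDRNOTAVAIL: host is not an address of this machine.
            return errno.errorcode.get(exc.errno, str(exc))
    return None


if __name__ == "__main__":
    # Assumption: world port 8085 on the wildcard address; substitute your BindIP.
    print(bind_error("0.0.0.0", 8085) or "bind OK")
```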

CDawg · July 8, 2017

I'm not 100% sure, but it sounds like the firewall is shutting it down and choking the socket after initialization.

Azarchius · July 8, 2017

Hrm, but why the firewall? Seems strange. It only happens for the half minute or so after the initial shutdown. There's also nothing particularly custom about my iptables setup. I am serving two IPs from the same machine, but that's probably not relevant, since the core worked before.

CDawg · July 8, 2017

What OS are you using?
Did you check whether there was a security update? I've run Ubuntu/CentOS/Debian, and iptables updates have caused issues in the past.

When it worked before, which commit were you on?

Azarchius · July 8, 2017

Ubuntu 16.04. Now that I think about it, at the time I updated not only the core but also the server OS. For what it's worth, I was on 14.04 before the upgrade.

It was very old: the latest 4.3.4 commit at the time (IIRC TC isn't updating Cata anymore).

Also, I held on to this reply for a while, since what you said gave me an idea. I upgraded all the software on the system, including the distro, and then recompiled from scratch. Works like a charm, thanks! I do seem to have lost the ability to tunnel into MySQL as root, though... something to investigate later.

Azarchius · July 8, 2017

Never mind, the issue came right back. Strange that it worked at first: I interrupted the process and booted it back up immediately. I did it again just now and the error is back. Alas.

Edit: Found a symptom. It seems to happen only if someone has logged into the server.

Could the core be stuck trying to send them disconnect packets? The client *does* disconnect gracefully even if I interrupt the process.

I tried logging into the server, logging off, and only then restarting. This time I actually received an error:

World initialized in 0 minutes 27 seconds
StartNetwork failed to bind instance socket acceptor
Failed to initialize network
	/home/stage/core/src/server/shared/Networking/SocketMgr.h:35 in ~SocketMgr ASSERTION FAILED:
  !_threads && !_acceptor && !_threadCount StopNetwork must be called prior to SocketMgr destruction
Segmentation fault
stage@ju386:~/server/bin$
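One possible explanation for the "only after someone has logged in" symptom, which nobody in this thread confirmed: the side that closes a TCP connection first keeps it in TIME_WAIT for a while (60 seconds on Linux), and until that expires a new bind() to the same port fails with EADDRINUSE unless the server sets SO_REUSEADDR on its listening socket before binding. The Python sketch below reproduces that on loopback, assuming Linux. It illustrates plain TCP behaviour, not TrinityCore's acceptor code; whether the 7.x acceptor sets SO_REUSEADDR is not something this thread establishes.

```python
import errno
import socket


def bind_loopback(port: int, reuse: bool) -> socket.socket:
    """Bind a TCP socket to 127.0.0.1:port, optionally with SO_REUSEADDR set first."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            # Must be set before bind() to have any effect on it.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))
    except OSError:
        s.close()
        raise
    return s


def rebind_after_session(reuse: bool) -> bool:
    """Serve one connection, close everything, then immediately bind the same port.

    Returns True if the re-bind succeeded and False on EADDRINUSE.
    """
    listener = bind_loopback(0, reuse)  # port 0: the kernel picks a free port
    port = listener.getsockname()[1]
    listener.listen(1)

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = listener.accept()
    conn.close()      # server side closes first, so it ends up in TIME_WAIT
    client.close()
    listener.close()  # nothing is listening on the port any more

    try:
        bind_loopback(port, reuse).close()
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise


if __name__ == "__main__":
    # On Linux this prints False for the first run and True for the second.
    print("without SO_REUSEADDR:", rebind_after_session(reuse=False))
    print("with SO_REUSEADDR:   ", rebind_after_session(reuse=True))
```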

CDawg · July 9, 2017

Running Ubuntu 16.04 here too, with the latest updates. Here is a comparison of which processes should be running for TC.

Here is my netstat. Note: I am running two servers, so you can ignore 7879 and 8086 (the SOAP and world sockets of the second one). Also, with the latest updates there is a snapd.socket; make sure it doesn't close prematurely, since that can cause segmentation faults.
Also check SELinux for any wonkiness; I had to turn mine off.
I still think it's something with the network and not the code itself (not 100% sure).

~$ netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.11:7878       *:*                     LISTEN
tcp        0      0 192.168.1.11:7879       *:*                     LISTEN
tcp        0      0 *:3724                  *:*                     LISTEN
tcp        0      0 *:8085                  *:*                     LISTEN
tcp        0      0 *:8086                  *:*                     LISTEN
tcp        0      0 *:ssh                   *:*                     LISTEN
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN
udp        0      0 *:bootpc                *:*
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ACC ]     STREAM     LISTENING     17959    /run/user/1000/systemd/private
unix  2      [ ACC ]     SEQPACKET  LISTENING     9112     /run/udev/control
unix  2      [ ACC ]     STREAM     LISTENING     9100     /run/systemd/private
unix  2      [ ACC ]     STREAM     LISTENING     9104     /run/systemd/fsck.progress
unix  2      [ ACC ]     STREAM     LISTENING     9111     /run/lvm/lvmpolld.socket
unix  2      [ ACC ]     STREAM     LISTENING     9113     /run/systemd/journal/stdout
unix  2      [ ACC ]     STREAM     LISTENING     9115     /run/lvm/lvmetad.socket
unix  2      [ ACC ]     STREAM     LISTENING     16793    /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     16794    /run/snapd.socket
unix  2      [ ACC ]     STREAM     LISTENING     16795    /run/snapd-snap.socket
unix  2      [ ACC ]     STREAM     LISTENING     16796    /run/acpid.socket
unix  2      [ ACC ]     STREAM     LISTENING     16797    /run/uuidd/request
unix  2      [ ACC ]     STREAM     LISTENING     16927    @ISCSIADM_ABSTRACT_NAMESPACE
unix  2      [ ACC ]     STREAM     LISTENING     16792    /var/lib/lxd/unix.socket

CDawg · July 9, 2017

Found the problem!!!

I was able to reproduce it. I was working in screen sessions and noticed that I already had a worldserver running on that port, so it was occupied.

Long story short, it looks like worldserver is not "restarting" properly; you have to do a full .server shutdown.

Check that you don't have another worldserver running on the same port.

Example:

$ ps auxf | grep -i worldserver
cellyson 14595 0.7 5.0 1490564 1250720 pts/2 Tl 16:06 1:06 \_ ./worldserver
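The ps check above can also be scripted, for example by a restarter before it relaunches the server. A minimal Linux-only Python sketch; it assumes /proc is mounted and that the process name is exactly worldserver:

```python
import os
from typing import List


def find_pids(name: str = "worldserver") -> List[int]:
    """Return sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only).

    comm holds at most 15 characters of the process name; longer names would
    have to be matched against /proc/<pid>/cmdline instead.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            # The process exited mid-scan, or we are not allowed to read it.
            continue
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    print("worldserver PIDs:", find_pids() or "none")
```

Note that the STAT column in the ps line above reads Tl: T means the process is stopped (for example by Ctrl+Z in a screen session) rather than exited, and a stopped process keeps its listening socket. The /proc scan lists stopped processes as well.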

Azarchius · July 9, 2017

I don't have SELinux, and my netstat is clean both after shutting down the worldserver and while starting it again:

The only related thing running there is the bnetserver. You can see the server booting on the right; it promptly hit the error.

Also, yeah, the problem seemingly is that it's not... giving up the port? But it certainly did give it up. And I don't have two worldservers up; there's no scenario where I could, since worldserver always runs in the same screen.
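One thing worth ruling out at this point: netstat -l only lists listening sockets, so connections on the world port that are still closing (for example in TIME_WAIT) would not show up in it, while netstat -tan or ss -tan would list them. On Linux the same information is in /proc/net/tcp. A minimal Python sketch (Linux only, IPv4 only, port 8085 assumed) that counts the sockets on a given local port by state:

```python
from collections import Counter

# Kernel TCP state codes as printed (in hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def states_on_port(port: int, path: str = "/proc/net/tcp") -> Counter:
    """Count IPv4 TCP sockets whose *local* port is `port`, grouped by state."""
    counts: Counter = Counter()
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if len(fields) < 4:
                continue
            # local_address looks like 0100007F:1F95 (hex IP:hex port).
            local_port = int(fields[1].rsplit(":", 1)[1], 16)
            if local_port == port:
                counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


if __name__ == "__main__":
    # Entries without LISTEN mean nothing is listening, but connections on
    # that port still exist in the kernel.
    print(dict(states_on_port(8085)) or "nothing on port 8085")
```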

CDawg · July 9, 2017

That's really odd. I'm only able to reproduce the issue if I run another worldserver on top of one that's already running (same configs).
It gives me the exact error you get; the only difference is that mine says it couldn't bind to port 8085, which is expected since they're running on top of each other.

Try disabling SOAP or RA in the configs...

I only get an assertion error if two worldservers run on top of each other or the SOAP port is in use. It just vomits the same message you get.

Azarchius · July 9, 2017

Unfortunately, I'm not running RA or SOAP.

If the SOAP port is in use, you say? That's pretty interesting, though I can't think of anything else that would behave like it and cause the same issue...

By the way, thanks a lot for your help so far, I really appreciate it, dude. If you can think of anything else, I'd love to hear it.

Edit: For the record, the bnetserver behaves exactly the same way. If I stop it, I can't get it back up for a good minute.

Azarchius · July 9, 2017

Hrm, could this be related?

Prevented sending of [SMSG_UPDATE_OBJECT 0x280D (10253)] to non existent socket 1 to [Player: Dbdr GUID Full: 0x08000400000000000000000000000001 Type: Player Entry: 0 Low: 1, Account: 1]

Prevented sending of [SMSG_LOGOUT_COMPLETE 0x26AF (9903)] to non existent socket 0 to [Player: Account: 1]

I'm getting loads of messages like these on server shutdown.

Azarchius · July 10, 2017

It seems the aforementioned packets are not sent if the server is interrupted or crashes while running under gdb and the thread itself is not terminated. If I just run it again without closing gdb (i.e. type run and answer yes when prompted to overwrite the previous thread), the issue does not occur.

CDawg · July 10, 2017

Seems like an issue only with 7.x? I haven't run into that problem on 335a.

Azarchius · July 10, 2017

I mean, yeah. 335a has been stable forever. Like I said in the OP, it's definitely a 7.x issue, since I didn't have it on 4.x either.

LordPsyan · September 30, 2017

I have the exact same issue with 335a, running the latest TC as of this post. Base core, no extras. It's driving me nuts... I'm not running in screen, but I am on Debian. I do have another worldserver running, but on a different port (8085); this one runs on 8086, but both bind to 0.0.0.0.

CDawg · October 1, 2017

Setting BindIP to 0.0.0.0 in multiple worldserver configs won't matter.
The ports are where the issues are happening: the service is trying to run on a port that is already occupied.

The only logical fix is to find the running worldserver service and kill it. When restarting, it seems the service is not shutting down properly and is therefore still occupying the port.
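The BindIP point can be checked in isolation: two listeners can both bind 0.0.0.0 as long as their ports differ, and only a second bind to a port that is already taken fails. A small Python sketch, with kernel-chosen ports standing in for 8085 and 8086 so it runs anywhere:

```python
import errno
import socket


def listen_on(port: int) -> socket.socket:
    """Open a TCP listener on 0.0.0.0:port (port 0 lets the kernel choose)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("0.0.0.0", port))
        s.listen()
    except OSError:
        s.close()
        raise
    return s


def wildcard_demo() -> dict:
    """Two 0.0.0.0 listeners on different ports, then a third on a port already in use."""
    first = listen_on(0)
    second = listen_on(0)  # same wildcard address, different port: both binds succeed
    result = {"different_ports": "ok"}
    try:
        listen_on(first.getsockname()[1]).close()
        result["same_port"] = "ok"
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            raise
        result["same_port"] = "EADDRINUSE"
    finally:
        first.close()
        second.close()
    return result


if __name__ == "__main__":
    print(wildcard_demo())  # on Linux: {'different_ports': 'ok', 'same_port': 'EADDRINUSE'}
```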

Aokromes · October 1, 2017

I am unable to reproduce this with the restarter I use (maybe because I run the core under gdb + screen). Every time I restart, the port is released for me (60 seconds between checks of whether worldserver is up).
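Instead of relying on a fixed delay, a restarter can also check explicitly that the port is bindable before launching again. A minimal Python sketch; the ./worldserver path, port 8085, the 120-second cap and the restart_loop helper are all hypothetical, not taken from any restarter mentioned here:

```python
import errno
import socket
import subprocess
import time


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """True if (host, port) can be bound right now; the probe socket is closed at once."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
    return True


def wait_for_port(port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll every `interval` seconds until the port is bindable; give up after `timeout`."""
    deadline = time.monotonic() + timeout
    while not port_is_free(port):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def restart_loop(cmd=("./worldserver",), port: int = 8085) -> None:
    """Hypothetical restarter: relaunch `cmd` each time it exits, once `port` is free."""
    while True:
        if not wait_for_port(port):
            print(f"port {port} still busy after the timeout; waiting again")
            continue
        exit_code = subprocess.call(list(cmd))
        print(f"worldserver exited with code {exit_code}; restarting")
```

Calling restart_loop() from a script inside screen or tmux would replace a fixed sleep. Because port_is_free does not set SO_REUSEADDR, it also waits for connections that are still closing, so it may wait longer than the server itself would need to if the server sets that option.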

LordPsyan · October 1, 2017

Doing .server shutdown force 1 gives the same result. My restarter runs every 10 seconds, but that doesn't matter. The realm being down for 30 seconds to 1 minute isn't a big deal, since my server is only for me, my kids and my wife, but on a populated server 1 minute could be bad. I see no worldserver process running. I don't know what other information I can give...

Back to spell scripts lol

Doctor · March 12, 2018

I'm having the same problem on TrinityCore 3.3.5.
I've tried almost everything, even changing the machine I was using, but I still have the same problem.
I'm on Debian 9.3, I'm using iptables as the firewall, and I run the server with this restarter (sh) in a tmux session.
I can't find a solution!
