On Fri, Mar 04, 2011 at 04:12:59AM -0500, Marc Kramis wrote:
> Hi All
> I'm experiencing a very strange network problem that occurs every four
> to six weeks and lasts for approximately one hour. I cannot manually
> provoke the problem and, during that hour, I cannot resolve it even by
> rebooting the server. Even stranger, two physically separate servers
> suffer from the same problem at the same time. Both servers use nginx
> as an SSL reverse proxy, and each server handles a disjoint set of
> domains. For that hour, all we see is nginx's "Gateway Timeout". After
> the hour, it suddenly works again. It started around four months ago.
> Note that all other network traffic is unaffected; only nginx HTTPS and
> HTTP are affected.
> So, what's in common with both servers:
> 1) Hardware (UltraSPARC T2 Plus).
Some endianness issue?...
> 2) OS (Solaris 10 U9 latest patch level).
> 3) Time (both servers use the exact same NTP-controlled time).
> 4) Switch.
> 5) Firewall (I replaced the firewall four weeks ago, and the error has
> just appeared again).
> 6) Nginx 0.8.46-0.8.54, same configuration except for the domains, which
> are hosted internally on different servers, compiled with Solaris OpenSSL.
> I observed that nginx, during this mysterious hour, mistakenly proxies
> the requests back to the original IP on random ports instead of the
> proxy IP and that these requests are blocked by the firewall.
> Because two different machines are affected at the same time and it
> cannot be resolved by a restart of nginx or a reboot of the whole
> server, and it resolves itself after approximately one hour, my guess is
> that some time-dependent error occurs in nginx.
> I will replace nginx with Apache to verify that the problem really is
> nginx and not the OS, the switch or whatever else, and then wait and hope :-)
> Does anyone have an idea how to locate or investigate this problem?
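Since the suspicion is a time-dependent failure, it may help to record exactly when the hour starts and ends, ideally from a machine outside both servers, and compare those timestamps with the NTP time on both boxes. A minimal Python sketch, assuming a URL served through the nginx proxy is passed on the command line; the one-minute interval and 30-second timeout are arbitrary choices, not part of the original setup:

```python
import sys
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def probe(url, timeout=30):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 504 Gateway Timeout are raised as HTTPError.
        return err.code
    except OSError:
        # URLError, refused connections and socket timeouts are all OSErrors.
        return None


def samples(url, interval=60):
    """Yield (UTC timestamp, status) pairs forever, one per interval."""
    while True:
        yield datetime.now(timezone.utc), probe(url)
        time.sleep(interval)


def transitions(pairs):
    """Yield (timestamp, old_status, new_status) each time the status changes."""
    first = True
    previous = None
    for ts, status in pairs:
        if not first and status != previous:
            yield ts, previous, status
        first = False
        previous = status


# Only start probing when a URL is given, e.g. `python probe.py https://host/`.
if __name__ == "__main__" and len(sys.argv) > 1:
    for ts, old, new in transitions(samples(sys.argv[1])):
        print(f"{ts.isoformat()} status {old} -> {new}", flush=True)
```

Printing only the status changes keeps the output to a few lines per incident, so the start and end of the failure window on each server can be lined up directly.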
You may want to follow http://wiki.nginx.org/Debugging and provide your
config and a debug log (if you are able to obtain one), as well as the
nginx -V output. This may help to investigate the problem if the
issue is actually in nginx.
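A debug log is only available if nginx was built with the --with-debug configure option; debug messages are then enabled with a directive such as `error_log /path/to/log debug;`. A minimal Python sketch that checks the "configure arguments:" line of the nginx -V output for that flag; the path to the nginx binary is passed on the command line and is an assumption about your installation:

```python
import subprocess
import sys


def configure_arguments(version_output):
    """Return the flags listed on the 'configure arguments:' line, if any."""
    prefix = "configure arguments:"
    for line in version_output.splitlines():
        if line.startswith(prefix):
            # A plain whitespace split is enough to spot standalone flags like
            # --with-debug, though quoted values containing spaces get split too.
            return line[len(prefix):].split()
    return []


def has_debug_support(version_output):
    """True if nginx was configured with --with-debug."""
    return "--with-debug" in configure_arguments(version_output)


def nginx_version_output(binary):
    """Run `binary -V` and return its combined output."""
    result = subprocess.run([binary, "-V"], capture_output=True, text=True)
    # nginx prints the -V report on stderr; include stdout just in case.
    return result.stderr + result.stdout


# Only run nginx when a binary path is given, e.g. /opt/nginx/sbin/nginx.
if __name__ == "__main__" and len(sys.argv) > 1:
    output = nginx_version_output(sys.argv[1])
    print("debug logging available:", has_debug_support(output))
```

If the flag is missing, nginx would have to be rebuilt with it before a debug log could be captured during the next failure window.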