I'm experiencing a very strange network problem that occurs every four
to six weeks and lasts for approximately one hour. I cannot manually
provoke the problem and, during this very hour, I cannot resolve it
even by rebooting the server. Even stranger, two physically separated
servers suffer from the same problem at the same time. Both servers use
nginx as an SSL reverse proxy. Each server handles a disjoint set of domains.
During that hour, we just see: Nginx - Gateway Timeout. After one hour, it
suddenly works again. It started around four months ago. Note that no
other network traffic is affected, only HTTPS and HTTP through nginx.
So, here is what both servers have in common:
1) Hardware (UltraSPARC T2 Plus).
2) OS (Solaris 10 U9 latest patch level).
3) Time (both servers use the exact same NTP-controlled time).
5) Firewall (I replaced the firewall four weeks ago, the error still
occurs).
6) Nginx 0.8.46-0.8.54, same configuration but for different domains
hosted internally on different servers, compiled against Solaris OpenSSL.
I observed that nginx, during this mysterious hour, mistakenly proxies
the requests back to the original IP on random ports instead of the
proxy IP and that these requests are blocked by the firewall.
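A minimal sketch of how such misdirected connections could be flagged, assuming the connection attempts have been exported as text lines of the form "src_ip:src_port -> dst_ip:dst_port". That line format, the function names, and the example addresses are assumptions for illustration only; they are not the output of nginx, snoop, or the firewall.

```python
from typing import Iterable, List, Set, Tuple

Endpoint = Tuple[str, int]


def parse_endpoint(text: str) -> Endpoint:
    """Split 'host:port' into (host, port); raises ValueError on bad input."""
    host, _, port = text.strip().rpartition(":")
    return host, int(port)


def unexpected_destinations(lines: Iterable[str],
                            allowed: Set[Endpoint]) -> List[Endpoint]:
    """Return every destination that is not a known upstream endpoint.

    'lines' uses the hypothetical 'src -> dst' format described above.
    Lines that do not match the format are skipped.
    """
    hits: List[Endpoint] = []
    for line in lines:
        if "->" not in line:
            continue
        _, _, dst = line.partition("->")
        try:
            endpoint = parse_endpoint(dst)
        except ValueError:
            continue
        if endpoint not in allowed:
            hits.append(endpoint)
    return hits


if __name__ == "__main__":
    # Made-up sample data: the second line goes back to the proxy's own IP.
    sample = [
        "192.0.2.10:40001 -> 10.0.0.5:8080",
        "192.0.2.10:40002 -> 192.0.2.10:51234",
    ]
    print(unexpected_destinations(sample, {("10.0.0.5", 8080)}))
```

Running this over captures taken inside and outside the failing hour would show whether the wrong destinations appear only during the outage.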
Because two different machines are affected at the same time and it
cannot be resolved by a restart of nginx or a reboot of the whole
server, and it resolves itself after approximately one hour, my guess is
that some time-dependent error occurs in nginx.
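One way to test the time-dependence guess would be to record periodic probe results on both servers and compare the outage windows. The sketch below assumes each probe is stored as a (timestamp, HTTP status) pair and treats 504 as the failure marker; the function name, the five-minute gap threshold, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Sample = Tuple[datetime, int]
Window = Tuple[datetime, datetime]


def outage_windows(samples: Iterable[Sample],
                   max_gap: timedelta = timedelta(minutes=5)) -> List[Window]:
    """Group 504 responses into (first_seen, last_seen) windows.

    Two consecutive 504 samples belong to the same window when they are
    at most 'max_gap' apart. Non-504 samples are ignored.
    """
    windows: List[Window] = []
    for ts, status in sorted(samples):
        if status != 504:
            continue
        if windows and ts - windows[-1][1] <= max_gap:
            # Extend the current window to this sample.
            windows[-1] = (windows[-1][0], ts)
        else:
            windows.append((ts, ts))
    return windows


if __name__ == "__main__":
    t0 = datetime(2011, 1, 1, 12, 0)  # made-up start time
    probes = [
        (t0, 200),
        (t0 + timedelta(minutes=1), 504),
        (t0 + timedelta(minutes=3), 504),
        (t0 + timedelta(minutes=60), 504),
    ]
    for start, end in outage_windows(probes):
        print(start.isoformat(), "to", end.isoformat())
```

If the windows from both machines start and end at the same wall-clock times, and the spacing between windows is regular, that would support a clock-related cause over a purely local fault.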
I will replace nginx with Apache to verify that the problem actually is nginx
and not the OS, the switch, or whatever, and then wait and hope :-)
Does anyone have an idea how to locate or investigate this problem?
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,180289,180289#msg-180289