On Tue, Mar 15, 2011 at 12:41:32AM -0400, assistlydavid wrote:
> We're running multiple nginx web servers on EC2 behind multiple
> ELBs (load balancers). I'm seeing strange behaviour in our
> nginx system logs. This strange behaviour seems to coincide
> with brief outages spotted by our external monitoring (chartbeat
> & new relic). I'm not sure whether I'm on the right track here,
> investigating this strange logging ... or if it's just a
> coincidence and this is normal logging for nginx.
> It starts with typical nginx logging, timestamps increasing
> chronologically. Then there's a jump back in the timestamps
> while a large block of ELB-only traffic is logged (health
> checks). After that, logging returns to normal (increasing
> chronologically) and the timestamps resume from where they left
> off before the ELB-only burst.
> For example (details changed to protect the innocent):
> 126.96.36.199 - - [14/Mar/2011:06:39:08 +0000] "GET /stuff HTTP/1.1" 301 178 "-" "Jakarta Commons-HttpClient/3.1" http example.com /stuff
> 10.162.18.95 - - [14/Mar/2011:06:23:17 +0000] "GET /host HTTP/1.1" 200 0 "-" "ELB-HealthChecker/1.0" - 10.162.217.229:80 /host
> Notice the jump back in time, then ELB-only traffic is logged,
> then logging returns to normal once the ELB timestamps have
> caught up with user traffic.
> Should I expect all nginx log messages to increase
> chronologically? What would explain these large jumps back?
> We redefined our log format to use a 32k buffer. Unfortunately,
> I can't risk changing/disabling this in production to see if the
> behaviour changes. Could there be more than one 32k buffer
> flushing at different times?
Each worker process has its own log buffer and flushes it as soon
as it fills up. So the observed behaviour is normal when log
buffering is used and traffic is low: one worker handling mostly
ELB health checks can flush a full buffer of older lines after
another worker has already written newer ones.
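
For reference, buffering is controlled by the buffer= parameter of
the access_log directive; dropping it makes each worker write lines
immediately, so timestamps stay ordered at the cost of more write()
calls. (The log path and the "elb_log" format name below are just
placeholders, not your actual config.)

    # buffered: each worker flushes its own 32k buffer when full
    access_log  /var/log/nginx/access.log  elb_log  buffer=32k;

    # unbuffered: lines written immediately, in timestamp order
    access_log  /var/log/nginx/access.log  elb_log;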