May 22, 2014 at 3:54 PM
Edited May 22, 2014 at 4:25 PM
- Windows Server 2008 R2 SP1 x64, fully patched.
- .NET 4.0, fully patched.
- Long-Running ASP.NET Web App "Daemons" in IIS 7.5
- * Basically, long-running apps without the pain of Windows Service applications
- v8 188.8.131.52 x64 Release
I have one particular script that runs very, very frequently on a portion of the server farm, and it works flawlessly in multiple applications. These processes are restarted only once every two weeks, when we deploy.
This same script also runs on a "newer" portion of the server farm, in a "newer" application. That application runs the identical script, but much less frequently than the applications elsewhere in the server farm. After roughly 11 hours, we start to get compile errors from this script (the script code is not changing anywhere), but only in the "newer" application on the "newer" portion of the server farm.
Once these compiler errors begin, they continue until the process is restarted.
Process memory of the "newer" application stays stable throughout, and the application otherwise continues to function normally.
Unfortunately, we've never been able to reproduce this problem in non-production environments, even with similar servers and under load tests.
The exception we get is:
The code we're using is quite simple:
    MyResponse response = new MyResponse();
    if (null != objectInput)
    // Log and handle the error
We have verified that the app servers are identical. We have added logging to emit the script when this occurs, and the script is always intact (unchanged). Interestingly (confusingly?), if we run the "newer" application on the "older" portion of the server farm, alongside the other applications, this compile error never occurs, even over weeks.
We are currently auto-recycling our "newer" application every 11-ish hours to work around this problem.
Has anyone run into anything similar?
Are we doing something wrong that we're just missing?
Things we're investigating:
- Running an additional canary script to see if the failure is script-specific
- * Since the affected servers run only this one script
- Upgrading to the latest GitHub code and the latest stable v8
- Adding alternative support for ClearScript
- * Though its performance is noticeably worse for our use cases
- Compiling/running scripts in an isolated app domain, so we can tear it down and re-create it when this problem appears
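The last idea in the list (isolating the engine in its own app domain) could look roughly like the C# sketch below. It is a minimal sketch under several assumptions: `ScriptWorker` and `IsolatedScriptHost` are hypothetical names, the actual v8 wrapper call is left as a placeholder because it isn't shown in this post, and any exception from a script call is treated as a signal to discard the whole domain. Creating and unloading AppDomains is only supported on the full .NET Framework (such as the .NET 4.0 stack above), not on .NET Core.

```csharp
using System;

// Hypothetical worker that lives inside the isolated AppDomain. Deriving from
// MarshalByRefObject means the host only holds a proxy, so the engine state
// created by Run stays inside the child domain.
public sealed class ScriptWorker : MarshalByRefObject
{
    public string Run(string script)
    {
        // Placeholder: compile and run `script` with the v8 wrapper here.
        // The real call is not shown in the post, so this just returns a stub result.
        return "ran " + script.Length + " chars";
    }
}

// Owns one child AppDomain and replaces it with a fresh one whenever a
// script call fails. Not thread-safe as written.
public sealed class IsolatedScriptHost : IDisposable
{
    private AppDomain _domain;
    private ScriptWorker _worker;

    public IsolatedScriptHost()
    {
        Recreate();
    }

    public string Run(string script)
    {
        try
        {
            return _worker.Run(script);
        }
        catch (Exception)
        {
            // Assumption: any failure (e.g. the sporadic compile error) means the
            // engine state is suspect, so throw away the domain, then rethrow.
            Recreate();
            throw;
        }
    }

    public void Dispose()
    {
        Unload();
    }

    private void Recreate()
    {
        Unload();

        // Copy the probing paths of the current (web app) domain so the child
        // domain resolves assemblies from the same folders, e.g. the app's bin.
        var setup = new AppDomainSetup
        {
            ApplicationBase = AppDomain.CurrentDomain.BaseDirectory,
            PrivateBinPath = AppDomain.CurrentDomain.RelativeSearchPath
        };
        _domain = AppDomain.CreateDomain(
            "ScriptDomain-" + Guid.NewGuid().ToString("N"), null, setup);

        // Load the worker type by file path inside the child domain and get a
        // proxy back (assumes the worker is compiled into an assembly on disk).
        _worker = (ScriptWorker)_domain.CreateInstanceFromAndUnwrap(
            typeof(ScriptWorker).Assembly.Location,
            typeof(ScriptWorker).FullName);
    }

    private void Unload()
    {
        if (_domain != null)
        {
            AppDomain.Unload(_domain);
            _domain = null;
            _worker = null;
        }
    }
}
```

The application would keep one `IsolatedScriptHost`, call `host.Run(script)`, and log the rethrown exception as it does today. One caveat worth testing: if the v8 wrapper keeps native state outside the managed heap, unloading the domain may not reset that state, so this would need to be confirmed against the actual failure before replacing the 11-hour recycle.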