Using a Python timeout decorator for uploading to S3
At work we are uploading many images to S3 using Python's boto library. However we are experiencing a RequestTimeTooSkewed error once every 100 uploads on average. We googled, but did not find a solution. Our system time was in sync and our file sizes were small (~50KB).
Since we couldn't find the root cause, we added a watchdog timer as a bandaid solution. We already use a retry decorator to retry uploads to S3 when we get a 500 Internal Server Error response. To this we added a timeout decorator which cancels the S3 upload if it takes more than a couple minutes. With this decorator, we don't have to wait the full 15 minutes before S3 returns the 403 Forbidden (RequestTimeTooSkewed error) response.
I found the timeout decorator at Activestate's Python recipes. It makes use of Python's signal library. Below is an example of how it's used.
import signal
class TimeoutError(Exception):
def __init__(self, value = "Timed Out"):
self.value = value
def __str__(self):
return repr(self.value)
def timeout(seconds_before_timeout):
def decorate(f):
def handler(signum, frame):
raise TimeoutError()
def new_f(*args, **kwargs):
old = signal.signal(signal.SIGALRM, handler)
signal.alarm(seconds_before_timeout)
try:
result = f(*args, **kwargs)
finally:
# reinstall the old signal handler
signal.signal(signal.SIGALRM, old)
# cancel the alarm
# this line should be inside the "finally" block (per Sam Kortchmar)
signal.alarm(0)
return result
new_f.func_name = f.func_name
return new_f
return decorate
Try it out:
import time
@timeout(5)
def mytest():
print "Start"
for i in range(1,10):
time.sleep(1)
print "%d seconds have passed" % i
if __name__ == '__main__':
mytest()
Results:
Start 1 seconds have passed 2 seconds have passed 3 seconds have passed 4 seconds have passed Traceback (most recent call last): File "timeout_ex.py", line 47, in <module> function_times_out() File "timeout_ex.py", line 17, in new_f result = f(*args, **kwargs) File "timeout_ex.py", line 42, in function_times_out time.sleep(1) File "timeout_ex.py", line 12, in handler raise TimeoutError() __main__.TimeoutError: 'Timed Out'
Bug found by Sam Kortchmar (added 2018-08-18)
The code on the Activestate recipe has signal.alarm(0)
outside of the finally
block, but Sam Kortchmar reported to me that it needs to be inside the finally
block so that the alarm will be cancelled even if there is an exception in the user's function that is handled by the user. With signal.alarm(0)
outside of the finally
block, the alarm still fires in that case.
Here is the test case sent by Sam:
import unittest2 import time class TestTimeout(unittest2.TestCase): def test_watchdog_doesnt_kill_interpreter(self): """If this test executes at all, it's working! otherwise, the whole testing section will be killed and print out "Alarm clock" """ @timeout(1) def my_func(): raise Exception try: my_func() except Exception: pass time.sleep(1.2) assert True
The RequestTimeTooSkewed error
S3ResponseError: 403 Forbidden <?xml version="1.0" encoding="UTF-8"?> <Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the current time is too large.</Message><MaxAllowedSkewMilliseconds>900000</MaxAllowedSkewMilliseconds><RequestId>7DDDS67HF8E37</RequestId><HostId>LKE893JFGDLASKJR9BJ-A9NASFPNAOPWEJORG-98DFGJA498JVJ-A04320JF0293JLKE</HostId><RequestTime>Tue, 27 Apr 2010 22:20:58 GMT</RequestTime><ServerTime>2010-04-27T22:55:24Z</ServerTime></Error>
See also
Related posts
- Fabric post-run processing Python decorator — posted 2010-11-06
- Two of the simplest Python decorator examples — posted 2010-03-09
- Trying out a Retry decorator in Python — posted 2009-11-17
Comments
The problem is that the internal "hardware" clock is off the mark. You need to NTPD on that server and re-sync the clock. It is important to skip the use the local hardware clock and adjust time ONLY to the NTP time standard.
This issue comes up most of the time on Windows EC2 Servers. But I have seen CentOS servers that have this issue also.
NTPD or xntpd is the cure. Just add this service to the server with yum and set up the external resources.
Edward M. Goldberg http://myCloudWaatcher.com/ e.m.g.
This is not thread-safe decorator, right?
Edwrrd: We did apt-get install ntp
on our Ubuntu Hardy server and still got the RequestTimeTooSkewed
errors.
Alexander: I don't know much about thread safety so I'm not sure.
Hey Eliot, I know this is an old thread, but did you ever figure out the root cause of your _RequestTimeTooSkewed_ errors? I think I'm running into the same issue -- NTPD is installed and the time on the server is correct, but requests will randomly hang for 15 minutes before returning a 403 error. Subsequent requests work immediately.
I'll probably end up implementing a similar fix, but I'd like to fix the source of the problem if I can...
Hi Kevin, I don't think I ever did find the root cause. If I did, I don't remember now. Sorry.
I have sort of run into related issue with boto and Google Storage and I seem to be able to use socket.setdefaulttimeout()
function:
import socket
socket.setdefaulttimeout(30)
It sets the timeout on reads/writes to newly created sockets. Currently the only advantage of using this function over the alarm signal approach that I can think of is the ability to use it in a multithreaded environment (@Alexander signals and threads don't match well).
Mansour