Bugzilla – Bug 818
Fix problems fetching DTDs from w3.org: increase HTTP timeout, remove 1 DTD
Last modified: 2011-05-17 11:43:18 CEST
I tried the hg tip today and it seems that www.w3.org has some issues serving some pages. I have modified build/build.py to use urllib2 instead of urllib and added an increased HTTP timeout. It also seems that sometimes an empty response is returned, so I had to catch the BadStatusLine exception. A trivial patch is available here: https://bitbucket.org/saper/validator-build/changeset/b893eb8c0260

Another issue was that we get a 404 fetching http://www.w3.org/TR/xhtml1/DTD/xhtml11.dtd (note the double "1" after the first "xhtml"). There never seems to have been anything at that URL (even web.archive.org knows nothing about it). I have removed it, but I wonder what should be there instead? Maybe it was supposed to be this? http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd A patch is here: https://bitbucket.org/saper/validator-validator/changeset/55e83b3149fb

After those changes, the validator built successfully on my FreeBSD machine. --saper
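A sketch of the retry-on-empty-response idea described above, updated to Python 3 (where urllib2 and httplib became urllib.request and http.client). The function name fetch_with_retry and the stub opener are illustrative, not code from build.py; the stub stands in for a flaky server that first returns a garbled status line.

```python
import io
from http.client import BadStatusLine
from urllib.request import urlopen

def fetch_with_retry(url, timeout=30, max_tries=3, opener=urlopen):
    """Fetch url, retrying when the server sends an empty/garbled status line."""
    for attempt in range(1, max_tries + 1):
        try:
            with opener(url, timeout=timeout) as f:
                return f.read()
        except BadStatusLine:
            print("received error, retrying (attempt %d)" % attempt)
    raise IOError("gave up on %s after %d tries" % (url, max_tries))

# Quick self-check with a stub opener that fails once, then succeeds.
calls = []
def stub_open(url, timeout=None):
    calls.append(url)
    if len(calls) == 1:
        raise BadStatusLine("")          # simulate the empty response
    return io.BytesIO(b"<!DOCTYPE html>")  # simulate a successful fetch

data = fetch_with_retry("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
                        opener=stub_open)
```

Note the patch in the repository retries indefinitely; a bounded max_tries as above avoids spinning forever if w3.org keeps misbehaving.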
https://bitbucket.org/validator/build/changeset/b893eb8c0260 https://bitbucket.org/validator/validator/changeset/05607e55502f
Marcin, Thanks for the build patch. It's now landed in the upstream repo, along with a fix for the bogus DTD URL. Sorry for having taken so long to get around to landing the patch.
Marcin, I now notice that your patch depends on the 'timeout' argument to urllib2.urlopen(...), and that argument seems to be new in Python 2.6, so it won't work in a Python 2.5 environment. If you have time, please take a look at the following refinement and let me know if you see any problems with it.

diff -r b893eb8c0260 build.py
--- a/build.py	Sat Feb 12 20:51:26 2011 +0000
+++ b/build.py	Tue May 17 17:35:45 2011 +0900
@@ -25,6 +25,7 @@
 import shutil
 import httplib
 import urllib2
+import socket
 import re
 try:
     from hashlib import md5
@@ -643,14 +644,18 @@
     # I bet there's a way to do this with more efficient IO and less memory
     print url
     completed = False
+    defaultTimeout = socket.getdefaulttimeout()
     while not completed:
         try:
-            f = urllib2.urlopen(url, timeout=httpTimeoutSeconds)
+            socket.setdefaulttimeout(httpTimeoutSeconds)
+            f = urllib2.urlopen(url)
             data = f.read()
             f.close()
             completed = True
         except httplib.BadStatusLine, e:
             print "received error, retrying"
+        finally:
+            socket.setdefaulttimeout(defaultTimeout)
     if md5sum:
         m = md5(data)
         if md5sum != m.hexdigest():
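The refinement above swaps the per-call timeout= argument for socket.setdefaulttimeout(), which exists in Python 2.5, taking care to restore the previous default in a finally block so other code in the process is not affected. A minimal, self-contained sketch of that save/set/restore pattern (the helper name with_socket_timeout is made up for illustration; it also runs under Python 3):

```python
import socket

def with_socket_timeout(seconds, action):
    """Run action() with the process-wide default socket timeout set to seconds."""
    saved = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        return action()
    finally:
        socket.setdefaulttimeout(saved)  # restore even if action() raises

before = socket.getdefaulttimeout()
# The action here just reads the timeout back, standing in for a real fetch.
seen = with_socket_timeout(15, socket.getdefaulttimeout)
after = socket.getdefaulttimeout()
```

The trade-off is that the default timeout is global state: it affects every socket opened while it is set, which is why the restore in finally matters.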