NOTE: The current preferred location for bug reports is the GitHub issue tracker.
Bug 818 - Fix problems fetching DTDs: increase HTTP timeout, remove 1 DTD
Classification: Unclassified
Component: General
Hardware: All  OS: All
Importance: P2 major
Assigned To: Marcin Cieślak
Reported: 2011-02-12 23:43 CET by Marcin Cieślak
Modified: 2011-05-17 11:43 CEST
CC List: 1 user

Description Marcin Cieślak 2011-02-12 23:43:59 CET
I tried the hg tip today and it seems to have some issues
serving some pages. I have modified build/ to use urllib2 instead
of urllib and added an increased HTTP timeout. It also seems that sometimes
an empty response is returned, so I had to catch the BadStatusLine exception.
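
The retry-on-BadStatusLine behavior described above can be sketched in modern Python 3 (where httplib is http.client and urllib2's urlopen lives in urllib.request). This is only an illustration of the technique, not the validator's actual build-script code; the function name fetch_with_retry and the injectable opener parameter are made up here.

```python
import http.client
import urllib.request

def fetch_with_retry(url, attempts=3, opener=urllib.request.urlopen):
    # Retry when the server sends back an empty/garbled response, which
    # urllib surfaces as http.client.BadStatusLine.
    # 'opener' is injectable purely so the retry logic can be exercised
    # without touching the network; it defaults to the real urlopen.
    last_error = None
    for _ in range(attempts):
        try:
            return opener(url).read()
        except http.client.BadStatusLine as e:
            last_error = e
            print("received error, retrying")
    raise last_error
```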

A trivial patch is available here:

Another issue was that we get a 404 fetching

(note the single "1" after the first "xhtml").

There never seems to have been anything like this (even a web search turns up nothing).

I have removed this, but I wonder what should be there instead?

Maybe it was supposed to be this?

A patch is here:

After those changes, the validator built successfully on my FreeBSD machine.

Comment 2 Michael[tm] Smith 2011-04-20 17:09:26 CEST

Thanks for the build patch. It's now landed in the upstream repo, along with a fix for the bogus DTD URL.

Sorry for having taken so long to get around to landing the patch.
Comment 3 Michael[tm] Smith 2011-05-17 11:43:18 CEST

I now notice that your patch depends on the 'timeout' argument to urllib2.urlopen(...), and that argument seems to be new in python 2.6. So it won't work in a python 2.5 environment.

So if you have time, please take a look at the following refinement and let me know if you see any problems with it.

diff -r b893eb8c0260
--- a/	Sat Feb 12 20:51:26 2011 +0000
+++ b/	Tue May 17 17:35:45 2011 +0900
@@ -25,6 +25,7 @@
 import shutil
 import httplib
 import urllib2
+import socket
 import re
 
 from hashlib import md5
@@ -643,14 +644,18 @@
   # I bet there's a way to do this with more efficient IO and less memory
   print url
   completed = False
+  defaultTimeout = socket.getdefaulttimeout()
   while not completed:
     try:
-      f = urllib2.urlopen(url, timeout=httpTimeoutSeconds)
+      socket.setdefaulttimeout(httpTimeoutSeconds)
+      f = urllib2.urlopen(url)
       data = f.read()
       completed = True
     except httplib.BadStatusLine, e:
       print "received error, retrying"
+    finally:
+      socket.setdefaulttimeout(defaultTimeout)
   if md5sum:
     m = md5(data)
     if md5sum != m.hexdigest():
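
For what it's worth, the save/set/restore dance around socket.setdefaulttimeout() maps naturally onto a context manager in modern Python. This is an illustrative sketch of that pattern, not part of the proposed patch; the name default_socket_timeout is invented here.

```python
import socket
from contextlib import contextmanager

@contextmanager
def default_socket_timeout(seconds):
    # Save the process-wide default socket timeout, set the new one, and
    # restore the original on the way out -- the same try/finally shape
    # as the refinement above, but reusable around any blocking call.
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        socket.setdefaulttimeout(previous)
```

Any urlopen() call wrapped in `with default_socket_timeout(60):` then gets the longer timeout without leaking it to the rest of the process.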