As far as the problem of the validator repeatedly not returning anything at
all: I can't recall ever seeing any other reports of that, nor any other
reports of problems accessing it through Tor. But that could just be because
not many people have tried using it through Tor.
Does it happen every time you make a validation request to
validator.w3.org through Tor, or only sometimes? Does it happen if you use
http://validator.w3.org/nu/ instead? Or if you use the CSS validator?
We may be able to troubleshoot it by having you make a validation request for
a particular URL and checking the validator logs to see if anything unexpected
is getting logged at the times when you’re seeing nothing returned.
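To give us a concrete request to match against the logs, here's a sketch of one you could run from the command line. It assumes curl and the torsocks wrapper are installed (torsocks is just one way to route a command through Tor); the file name is a stand-in for whatever document you want checked, and out=json is the Nu checker's parameter for machine-readable results. The script prints the command rather than running it, so you can copy it as-is:

```shell
# Build a checker request to run through Tor.
URL='https://validator.w3.org/nu/?out=json'
FILE='page.html'   # stand-in name for a local file you want checked

# Printed rather than executed here; copy and run it yourself:
echo "torsocks curl -sS -H 'Content-Type: text/html; charset=utf-8' --data-binary @${FILE} '${URL}'"
```

If that consistently returns nothing through Tor but works fine without torsocks, that would narrow things down considerably.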
As far as the “excessive traffic pattern blocked” problem: the W3C systems
team does get reports of that regularly, and every single time they
investigate, they find that a particular IP address or range really is
sending excessive traffic to the validator.
The short answer to how to solve that problem is: run a local copy of the
Nu HTML Checker instead, either by using the vnu.jar executable directly
from the command line, or by using it to run your own web-based persistent
instance of the checker.
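For reference, here's a sketch of both modes, assuming you've downloaded a vnu.jar release and have a Java runtime installed (the guards just make the sketch a no-op if vnu.jar isn't in the working directory, and 8888 is an arbitrary port choice):

```shell
# One-shot mode: check files directly from the command line.
if [ -f vnu.jar ]; then
  java -jar vnu.jar site/*.html
fi

# Persistent mode: run your own web-based instance of the checker;
# nu.validator.servlet.Main takes the port to listen on.
if [ -f vnu.jar ]; then
  java -cp vnu.jar nu.validator.servlet.Main 8888 &
fi
```

The persistent instance then answers requests on localhost, so scripts and plugins can hammer it as hard as they like without any rate limiting.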
Any traffic from a particular IP address or range that exceeds a
certain maximum number of allowed requests per minute is considered
excessive. The maximum is set high enough that you're never going to hit
it if you're just checking documents using the web-based form frontend,
manually entering a URL for a document to check or a file to upload for
checking.
But the common case where people have run into it in the past has been when
they've installed a browser plugin that automatically sends a request to
the validator for every single page they visit, or when somebody else on
their local network has such a browser plugin installed.
The only other case where I think you'd ever hit the limit is if
you're running a script or some other custom application that's capable of
sending a large number of requests to the validator in a short amount of
time. I've gotten blocked myself just from running a simple shell script
that recursively finds all the HTML files in a particular directory and then
uses curl to make a validator request for each HTML file it finds.
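A minimal sketch of the kind of script I mean, for the curious. The URL and the out=gnu parameter (line-oriented, grep-able output) are the Nu checker's real interface; everything else is my own scaffolding, including a DRY_RUN guard so that by default it only prints the commands instead of sending any requests:

```shell
#!/bin/sh
# Recursively find HTML files under a directory and send each one to the
# Nu checker with curl. With DRY_RUN=1 (the default) it only prints the
# commands it would run, so nothing actually hits the validator.
validate_tree() {
  find "$1" -name '*.html' | while read -r f; do
    cmd="curl -sS -H 'Content-Type: text/html; charset=utf-8' \
--data-binary @\"$f\" 'https://validator.w3.org/nu/?out=gnu'"
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "$cmd"      # show what would be sent
    else
      eval "$cmd"
      sleep 1          # crude throttle; no guarantee it stays under the limit
    fi
  done
}

# Example: dry-run over the current directory.
DRY_RUN=1 validate_tree .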
If you're running a script or app like that and it's processing a lot of
files at once, you're going to get blocked, and there's no way to avoid it,
because it really is generating an excessive amount of traffic as far as
the W3C rate-limiting metrics are concerned.
But as I mentioned above, the solution to that problem is pretty
straightforward: just grab the latest vnu.jar release and run it locally.
I ran into the issue using http://validator.w3.org/; at first
consistently, then hit and miss (occasionally the validator would
work), now it seems all fine (but I’ll have been dealt new IPs).
I guess my concern is that some services, like VPNs, only offer a
limited number of endpoints, and a bigger user base sharing those
endpoints may inevitably look excessive to a service. I know from
Wikipedia, Google, and others that they throw CAPTCHAs or even disable
their services in that case, which is not exactly helpful. And so that's
what I thought had happened with the validator, too.
I'll keep an eye on the local option, though I've shied away from it
before. I've set it up in some environments in the past and it
wasn't always that straightforward :) Maybe it's different now.