June 01, 2007 WEBDEV: URL underscores and IIS6+

I’m not doing much Microsoft-platform web development these days. I’m working with PHP/MySQL frameworks and deploying to my ‘nix/Apache server at (mt) MediaTemple. That means I’m not deploying to IIS anymore either, and certainly not using it as a test environment.
Recently contracted to do a full HTML build of a site for a firm, I stuck to static XHTML/CSS without using includes for navigation. The reason? Make it easy for them to make edits and changes once it went live. Anything that was reproduced across pages was coded with no page-specific references, so edits to navigation or other common elements could be copied and pasted across files.
Much as my content management system (ExpressionEngine) does with my blog entry URLs and mod_rewrite, I coded the site’s pages as index.html across the board, with each residing in its own subdirectory, named with meaningful_names_that_use_underscore for search optimization and navigability.
The site went live, and as each link was clicked, _ was converted to %5f, resulting in meaningful%5fnames%5fthat%5fuse%5funderscore/. Not dysfunctional, but not so friendly.
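A quick way to see that the two forms name the same path is to decode one and compare. This is only a sketch using Python's standard library, not anything that ran on the site in question:

```python
from urllib.parse import quote, unquote

# The path as it showed up in the address bar after clicking a link.
encoded = "meaningful%5fnames%5fthat%5fuse%5funderscore/"

# %5f is the percent-encoded form of "_", so decoding restores the original name.
decoded = unquote(encoded)
print(decoded)

# "_" is an unreserved URL character, so a standard encoder leaves it untouched.
print(quote(decoded) == decoded)
```

In other words, the encoding is harmless but unnecessary, which is why the links still worked.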
I immediately assessed it as a server configuration issue, citing the fact that when I hovered over links I’d see /meaningful_names_that_use_underscore in the browser’s status bar. I was assuming the site had been deployed to Apache and something was enabled or not disabled, causing the server response to urlencode all the URLs. I was wrong.
I checked the site’s domain at http://www.whois.sc and learned the site was running on IIS6. A little digging on Google uncovered an IIS6 security feature. As it turns out, when IIS receives a request without a trailing ‘/’, it has to determine whether a file is being requested, or a directory’s default file—index.html. .../meaningful_names_that_use_underscore is not clearly a directory, so it does a check to see if that file exists. If not, it does a redirect to the folder (or something to that effect—I’m not sure if the redirect is the check or the final action): .../meaningful_names_that_use_underscore/.
The catch is that when IIS issues a redirect, it URL-encodes a number of characters in the target to prevent cross-site scripting (XSS) attacks or other manipulation. So when the server redirected my link to a directory, it swapped in the encoded value for the _.
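The behavior described above can be modeled roughly as follows. This is a toy sketch, not IIS's actual logic: the function names, the "serve"/"redirect" labels, and the choice to encode only the underscore are all assumptions made for illustration.

```python
def encode_for_redirect(path):
    # Assumption: only "_" is encoded here. The real set of characters
    # IIS encodes on a redirect is larger and is not listed in this post.
    return path.replace("_", "%5f")


def respond(path, files, directories):
    """Return an (action, target) pair for a requested path (hypothetical model)."""
    if path in files:
        # An exact file match is served as-is.
        return ("serve", path)
    if path.endswith("/") and path[:-1] in directories:
        # A trailing slash makes the directory request unambiguous:
        # serve its default document with no redirect.
        return ("serve", path + "index.html")
    if path in directories:
        # No trailing slash and no such file: redirect to the directory,
        # encoding the target on the way out.
        return ("redirect", encode_for_redirect(path + "/"))
    return ("not found", None)


dirs = {"/meaningful_names_that_use_underscore"}
print(respond("/meaningful_names_that_use_underscore", set(), dirs))
print(respond("/meaningful_names_that_use_underscore/", set(), dirs))
```

Only the request without the trailing slash goes through the redirect branch, which is the only place the encoding is applied.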
The solution: Go through all the HTML files and update each link to have a trailing ‘/’.
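I made the change by hand, but for a larger site something like the following could do it. It is a minimal sketch that assumes double-quoted href attributes and treats any link whose last path segment has no dot as a directory; the function name add_trailing_slash is made up for the example, and the output would need checking before it overwrote real files.

```python
import re

# Matches double-quoted href values that contain no fragment or query string.
HREF = re.compile(r'href="([^"#?]+)"')


def add_trailing_slash(html):
    """Append "/" to href values that look like directory links."""
    def fix(match):
        url = match.group(1)
        last_segment = url.rsplit("/", 1)[-1]
        # Leave the link alone if it already ends in "/", if its last segment
        # contains a dot (a file name or bare host name), or if it is a mailto: link.
        if url.endswith("/") or "." in last_segment or url.startswith("mailto:"):
            return match.group(0)
        return 'href="%s/"' % url
    return HREF.sub(fix, html)


sample = '<a href="/meaningful_names_that_use_underscore">About</a>'
print(add_trailing_slash(sample))
```

Links with a fragment or query string are skipped entirely, since the slash would have to go before the # or ? and that is easier to handle by hand.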
I’m not sure that asking about the target platform at the outset would’ve saved me any trouble in this case, as I made a conscious decision up front to build the site to be platform-independent. I know now, though, to always code links to directories with a trailing ‘/’.