Friendly URL scheme?
One of the many things that's been lacking from the scraper service I set up last week is pretty URLs. Right now the user parameter is passed into the script with ?u=, a symptom of a lazy hack (which the script admittedly is). I've been thinking about redoing it, though, and I'd like some feedback on the options available. Right now there are two pages, update and chart, that provide information to the user. Here are the two possibilities I came up with, where "1234" is the user ID number. (For technical reasons the user name unfortunately cannot be used.)
Option #1:

- http://< tld >/update/1234
- http://< tld >/chart/1234

Option #2:

- http://< tld >/1234/update
- http://< tld >/1234/chart
Option #1, conceptually, is calling update with the user ID. Option #2 is providing a verb that operates on a user ID.
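To make the difference concrete, here's a minimal sketch in plain Python (no framework; the function names and the ValueError handling are my own illustration, not the actual script):

```python
# Minimal parsers for the two candidate schemes. Assumes user IDs
# are all-numeric and the action is one of a known set of verbs.

ACTIONS = {"update", "chart"}

def parse_option1(path):
    """Verb first: /update/1234 -> ("update", 1234)."""
    action, user_id = path.strip("/").split("/")
    if action not in ACTIONS or not user_id.isdigit():
        raise ValueError("unrecognized URL: %s" % path)
    return action, int(user_id)

def parse_option2(path):
    """User ID first: /1234/update -> ("update", 1234)."""
    user_id, action = path.strip("/").split("/")
    if action not in ACTIONS or not user_id.isdigit():
        raise ValueError("unrecognized URL: %s" % path)
    return action, int(user_id)
```

Either way the handler ends up with the same (action, user ID) pair; the two schemes differ only in which segment comes first.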
From a consistency standpoint, which makes more sense?
Another option that was mentioned is:
- http://< tld >/user/1234/update
- http://< tld >/user/1234/chart
This provides room for pages not relating to a specific user, e.g.
- http://< tld >/stats
I'd be gently inclined toward leading with the user ID -- option #2 -- since (what exists of) the directory structure is two different functions over a user's data. It's the user's chart, and the user's update.
It's a pretty minor point, though, without knowing whether there are plans for significant expansion of this thing's functionality.
- Is everything going forward going to be additional functions foo and bar and baz for individual users? If so, option #2 gets more attractive for the above reason -- the user ID is the core data, so it makes sense semantically to start with it.
- Are you going to add non-user-driven functionality? Leading with a header directory might make sense then -- /user/1234/update, /user/1234/chart, /question/45678/activity, /question/45678/stats, etc.
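Sketching that namespaced layout (the question resource and its actions are hypothetical future additions, not part of the current service):

```python
# Namespaced routes: /<resource>/<id>/<action>, plus top-level pages
# like /stats. Resource and action names here are illustrative.

RESOURCES = {
    "user": {"update", "chart"},
    "question": {"activity", "stats"},  # hypothetical future resource
}
TOP_LEVEL = {"stats"}

def parse(path):
    parts = path.strip("/").split("/")
    if len(parts) == 1 and parts[0] in TOP_LEVEL:
        return (parts[0], None, None)  # site-wide page, no object
    if len(parts) == 3:
        resource, obj_id, action = parts
        if resource in RESOURCES and action in RESOURCES[resource] \
                and obj_id.isdigit():
            return (resource, int(obj_id), action)
    raise ValueError("unrecognized URL: %s" % path)
```

The leading resource segment is what keeps /stats from colliding with a future numeric path, which is the main argument for this layout.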
If you go with the verb-first scheme (option #1), it becomes simple to stop (well-behaved) robots from spidering your site:

- http://< tld >/update/1234
- http://< tld >/chart/1234
This is because you could set up a /robots.txt file containing:

User-agent: *
Disallow: /update/
Disallow: /chart/
To me that is a nice bonus which is often overlooked.
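You can sanity-check those rules with Python's standard-library robots.txt parser (example.com is a stand-in for the real host):

```python
from urllib import robotparser

# Feed the proposed robots.txt rules directly to the parser.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /update/",
    "Disallow: /chart/",
])

# Verb-first URLs are blocked; the id-first scheme would not be,
# since /1234/update doesn't match either Disallow prefix.
print(rp.can_fetch("*", "http://example.com/update/1234"))  # False
print(rp.can_fetch("*", "http://example.com/1234/update"))  # True
```

Note this only works cleanly because the verb comes first: with option #2 you'd have no common prefix to disallow short of listing every user ID.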