I’ve been thinking about an import/export API for PickyWiki; I want something that’s sensible, and works well enough that it can be the basis for things like creating restorable snapshots, integration with version control systems, and self-hosting documentation.
So far I’ve made a simple import/export system based on Atom. You can export the entire site as an Atom feed, and you can import Atom feeds. But whole-site import/export isn’t enough for the tools I’d like to write on top of the API.
WebDAV would seem like a logical choice, as it lets you get and put resources. But it’s not a great choice for a few reasons:
- It’s really hard to implement on the server.
- Even clients are hard to implement.
- It uses GET to get resources. This is probably its most fatal flaw. There is no CMS that I know of (except maybe one) where the thing you view in the browser is the thing that you’d actually edit. To work around this, CMSes use User-Agent sniffing or an alternate URL space.
- WebDAV is worried about "collections" (i.e., directories). The web basically doesn’t know what "collections" are, it only knows paths, and paths are strings.
- (In summary) WebDAV uses HTTP, but it is not of the web.
I don’t want to invent something new though. So I started thinking of Atom some more, and Atompub.
The first thought is how to fix the GET problem in WebDAV. A web page isn’t an editable representation, but it’s pretty reasonable to put an editable representation into an Atom entry. Clients won’t necessarily understand extensions and properties you might add to those entries, but I don’t see any way around that. An entry might look like:
<entry>
<content type="html">QUOTED HTML</content>
... other normal metadata (title etc) ...
<privateprop:myproperty xmlns:privateprop="URL" name="foo" value="bar" />
</entry>
While there is special support for HTML, XHTML, and plain text in Atom, you can put any type of content in <content>, encoded in base64.
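To make that concrete, here’s a rough sketch (the `make_entry` helper is made up for illustration) of building such an entry, base64-encoding any body that isn’t one of Atom’s specially supported text types:

```python
import base64
from xml.sax.saxutils import escape

def make_entry(title, body, content_type):
    """Build a minimal Atom entry string. For types Atom has no special
    support for (e.g. image/png), the body is base64-encoded."""
    if content_type in ("text", "html"):
        content = escape(body)
    else:
        if isinstance(body, str):
            body = body.encode("utf-8")
        content = base64.b64encode(body).decode("ascii")
    return (
        '<entry xmlns="http://www.w3.org/2005/Atom">'
        '<title>%s</title>'
        '<content type="%s">%s</content>'
        '</entry>' % (escape(title), content_type, content)
    )

entry = make_entry("Logo", b"\x89PNG", "image/png")
```

A server would fill in the rest of the required metadata (id, updated, and so on) before serving the entry.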
To find the editable representation, the browser page can point to it. I imagine something like this:
<link rel="alternate" type="application/atom+xml; type=entry"
href="this-url?format=atom">
The actual URL (in this example this-url?format=atom) can be pretty much anything. My one worry is that this could be confused with feed detection, which looks like:
<link rel="alternate" type="application/atom+xml"
href="/atom.xml">
The only difference is "; type=entry", which I’m betting a lot of clients don’t pay attention to.
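To illustrate how small the distinction is, here’s a client-side sketch (hypothetical code) that accepts an alternate link as an editable entry only when the media type carries the `type=entry` parameter, and otherwise treats it as feed detection:

```python
from html.parser import HTMLParser

class EditLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> targets, splitting them into
    editable-entry links (type=entry parameter present) and feed links."""
    def __init__(self):
        super().__init__()
        self.entry_links = []
        self.feed_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        if attrs.get("rel") != "alternate":
            return
        # Split "application/atom+xml; type=entry" into type and parameters
        parts = [p.strip() for p in attrs.get("type", "").split(";")]
        if parts[0] != "application/atom+xml":
            return
        if "type=entry" in parts[1:]:
            self.entry_links.append(attrs["href"])
        else:
            self.feed_links.append(attrs["href"])

finder = EditLinkFinder()
finder.feed('''<head>
  <link rel="alternate" type="application/atom+xml; type=entry"
        href="this-url?format=atom">
  <link rel="alternate" type="application/atom+xml" href="/atom.xml">
</head>''')
```

A client that skips the parameter check will happily offer the entry URL as a subscribable feed, which is exactly the confusion worried about above.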
The Atom entries then can have an element:
<link rel="edit" href="this-url" />
This is a location where you can PUT a new entry to update the resource. You could allow the client to PUT directly over the old page, or use this-url?format=atom or whatever is convenient on the server-side. Additionally, DELETE to the same URL would delete.
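Sketched as client code (hypothetical helpers; the edit URL is whatever the entry’s `<link rel="edit">` points at), update and delete are just a PUT and a DELETE:

```python
import urllib.request

def _request(url, method, body=None, content_type=None):
    # Build (but don't send) the HTTP request; pass the result to
    # urllib.request.urlopen() to actually talk to the server.
    req = urllib.request.Request(url, data=body, method=method)
    if content_type:
        req.add_header("Content-Type", content_type)
    return req

def update_entry(edit_url, entry_xml):
    """PUT a revised Atom entry to the URL from <link rel="edit">."""
    return _request(edit_url, "PUT", entry_xml.encode("utf-8"),
                    "application/atom+xml; type=entry")

def delete_entry(edit_url):
    """DELETE the resource via the same edit URL."""
    return _request(edit_url, "DELETE")
```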
This handles updates and deletes, and single-page reads. The next issue is creating pages.
Atompub makes creation fairly simple. First you have to get the Atompub service document. This is a document with the type application/atomsvc+xml and it gives the collection URL. It’s suggested you make this document discoverable like:
<link rel="service" type="application/atomsvc+xml"
href="/atomsvc.xml">
This document then points to the "collection" URL, which for our purposes is where you create documents. The service document would look like:
<service xmlns="http://www.w3.org/2007/app"
xmlns:atom="http://www.w3.org/2005/Atom">
<workspace>
<atom:title>SITE TITLE</atom:title>
<collection href="/atomapi">
<atom:title>SITE TITLE</atom:title>
<accept>*/*</accept>
<accept>application/atom+xml;type=entry</accept>
</collection>
</workspace>
</service>
Basically this indicates that you can POST any media to /atomapi (both Atom entries, and things like images).
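Client-side, discovering the collection URL from the service document is a bit of namespace-aware XML parsing; a sketch:

```python
import xml.etree.ElementTree as ET

APP = "{http://www.w3.org/2007/app}"

def collection_urls(service_xml):
    """Return (href, accepted media types) for each collection in an
    Atompub service document."""
    root = ET.fromstring(service_xml)
    out = []
    for coll in root.iter(APP + "collection"):
        accepts = [a.text for a in coll.findall(APP + "accept")]
        out.append((coll.get("href"), accepts))
    return out

doc = '''<service xmlns="http://www.w3.org/2007/app"
   xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>SITE TITLE</atom:title>
    <collection href="/atomapi">
      <atom:title>SITE TITLE</atom:title>
      <accept>*/*</accept>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>
</service>'''
```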
To create a page, a client then does a POST like:
POST /atomapi
Content-Type: application/atom+xml; type=entry
Slug: /page/path
<entry xmlns="...">...</entry>
There’s an awkwardness here, that you can only suggest (via the Slug header) what the URL for the new page should be. The client can find the actual URL of the new page from the Location header in the response. But the client can’t demand that the slug be respected (getting an error back if it is not), and there are lots of use cases where the client doesn’t want to merely suggest a path (for instance, other documents that are being created might rely on that path for links).
Also, "slug" implies… well, a slug. That is, some path segment probably derived from the title. There’s nothing stopping the client from putting a complete path in there, but it’s very likely to be misinterpreted (e.g. translating /page/path to /2009/01/pagepath).
But I digress. Anyway, you can post every resource as an entry, base64-encoding the resource body, but Atompub also allows POSTing media directly. When you do that, the server puts the media somewhere and creates a simple Atom entry for the media. If you wanted to add properties to that entry, you’d edit the entry after creating it.
The last missing piece is how to get a list of all the pages on a site. Atompub does have an answer for this: just GET /atomapi will give you an Atom feed, and for our purposes we can demand that the feed is complete (using paging so that any one page of the feed doesn’t get too big). But this doesn’t seem like a good solution to me. GData specifies a useful set of queries for feeds, but I’m not sure that this is very useful here; the kind of queries a client needs to do for this use case aren’t things GData was designed for.
The queries that seem most important to me are queries by page path (which allows some sense of "collections" without being formal) and by content type. Also, to allow incremental updates on the client side, these queries should be filterable by last-modified time (i.e., all pages created since I last looked). Reporting queries (date of creation, update, author, last editor, and custom properties) of course could be useful, but don’t seem as directly applicable.
Also, often the client won’t want the complete Atom entry for the pages, but only a list of pages (maybe with minimal metadata). I’m unsure about the validity of abbreviated Atom entries, but it seems like one solution. Any Atom entry can have something like:
<link rel="self" type="application/atom+xml; type=entry"
href="url?format=atom" />
This indicates where the entry exists, though it doesn’t suggest very forcefully that the actual entry is abbreviated. Anyway, I could then imagine a feed like:
<feed>
<entry>
<content type="some/content-type" />
<link rel="self" href="..." />
<updated>YYYY-MM-DDTHH:MM:SSZ</updated>
</entry>
...
</feed>
This isn’t entirely valid, however — you can’t just have an empty <content> tag. You can use a src attribute to use indirection for the content, and then add Yet Another URL for each page that points to its raw content. But that’s just jumping through hoops. This also seems like an opportunity to suggest that the entry is incomplete.
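Here’s a sketch of building such an abbreviated entry with the src indirection (the ?format=raw and ?format=atom URL conventions are made up for illustration):

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def stub_entry(path, content_type, updated):
    """Build an abbreviated entry: an empty <content> with a src
    attribute pointing at the raw body, plus a self link to the full
    Atom entry. Nothing here marks the entry as incomplete, which is
    exactly the gap noted above."""
    entry = ET.Element("{%s}entry" % ATOM)
    content = ET.SubElement(entry, "{%s}content" % ATOM)
    content.set("type", content_type)
    content.set("src", path + "?format=raw")
    link = ET.SubElement(entry, "{%s}link" % ATOM)
    link.set("rel", "self")
    link.set("href", path + "?format=atom")
    ET.SubElement(entry, "{%s}updated" % ATOM).text = updated
    return entry
```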
To actually construct these feeds, you need some way of getting the feed. I suggest that another entry be added to the Atompub service document, something like:
<cmsapi:feed href="URI-TEMPLATE" />
That would be a URI Template that accepted several known variables (though frustratingly, URI Templates aren’t properly standardized yet). Things like:
- content-type: the content type of the resource (allowing wildcards like image/*)
- container: a path to a container, i.e., /2007 would match all pages in /2007/...
- path-regex: some regular expression to match the paths
- last-modified: return all pages modified at the given date or later
All parameters would be ANDed together.
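Server-side, the ANDed filtering might look like this sketch (the `match_page` helper and the page dict layout are made up; note that ISO 8601 timestamps compare correctly as plain strings):

```python
import fnmatch
import re

def match_page(page, content_type=None, container=None,
               path_regex=None, last_modified=None):
    """Return True if `page` passes every supplied filter (all
    parameters are ANDed). `page` is a dict with 'path', 'type', and
    'modified' (an ISO 8601 timestamp)."""
    if content_type and not fnmatch.fnmatch(page["type"], content_type):
        return False  # supports wildcards like image/*
    if container and not page["path"].startswith(container.rstrip("/") + "/"):
        return False  # /2007 matches /2007/...
    if path_regex and not re.search(path_regex, page["path"]):
        return False
    if last_modified and page["modified"] < last_modified:
        return False  # string comparison works for ISO 8601 dates
    return True
```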
So, open issues:
- How to strongly suggest a path when creating a resource (better than Slug)
- How to rename (move) or copy a page (it’s easy enough to punt on copy, but I’d rather moving be a little more formal than just recreating a resource in a new location and deleting the original)
- How to represent abbreviated Atom entries
With these resolved I think it’d be possible to create a much simpler API than WebDAV, and one that can be applied to existing applications much more easily. (If you think there’s more missing, please comment.)
Useful thinking, it seems. Why don’t you go ahead and build something and report on what experience teaches you? That’d be even more useful.
Hi,
interesting, but I really have to say that I find your statements about WebDAV hard to accept…
I do not believe this is the case. And, as a matter of fact, you don’t need to implement all of it; there are lots of libraries out there that help.
Using GET to get a representation of a resource is a feature, not a problem.
It’s correct that this becomes problematic when the representation to be edited is not the same as the one served by default. One way around that is a different URI space (this is what AtomPub allows, but so does WebDAV). The other way is to use Content-Negotiation, although I wouldn’t recommend it (this is supported in all Microsoft clients/servers, using the “Translate” header).
Well, paths are strings, but they do have hierarchy. See RFC 3986. WebDAV maps these hierarchies to WebDAV collections, but that doesn’t mean that any use case for WebDAV actually needs to use them.
Obviously, as I disagree with the statements made before, I disagree with that one as well :-)
Best regards, Julian
> It uses GET to get resources. This is probably its most fatal flaw. There is no CMS that I know of (except maybe one) where the thing you view the browser is the thing that you’d actually edit. To work around this CMSes use User-Agent sniffing or an alternate URL space.
> It’s correct that this becomes problematic when the representation to be edited is not the same as the one served by default. One way around that is a different URI space (this is what AtomPub allows, but so does WebDAV).
Ian, your proposal also uses GET to get the editable form of the resource, doesn’t it? And it uses a different URL space (in this case, URLs appended with “this-url?format=atom”), doesn’t it? So I don’t see how your proposal is any different than WebDAV in terms of meeting your objection to the way that GET is used to get source resources.
Incidentally, perhaps all of the discussion participants are already aware of this, but in case any readers are not, the old spec for WebDAV (http://www.ietf.org/rfc/rfc2518.txt) included a DAV:source property that would be attached to the “processed” URL and which would point to the “source” URL (i.e., DAV:source is analogous to Ian’s <link>). DAV:source was removed from the current spec (http://www.ietf.org/rfc/rfc4918.txt) due to “lack of implementation experience” — but presumably no one would complain if you used DAV:source as originally intended.
My proposal uses an explicit <link>, which I think satisfies this problem. I suppose this is similar to DAV:source, though the specification for DAV:source seems very vague to me, and not at all reassuring. Also, I think it requires fetching properties on a page, and just involves a lot of chasing links around (thus you can’t determine the presence of that link automatically), compared to putting it in the HTML explicitly.

Of course, it would be even better if you could not just label the source for an entire page, but could label a fragment of a page with its source. Heck, maybe you could start to mark up not just the underlying data sources, but also the underlying templates. Simply by marking up fragments of data, you could start building up a Javascript layer that would read these links and do inline editing on the client side.
> How to rename (move) or copy a page (it’s easy enough to punt on copy, but I’d rather move by a little more formal than just recreating a resource in a new location and deleting the original).
Please forgive me that I have to point out that this becomes a non-issue if you use WebDAV.
Best regards, Julian
Writing a WebDAV server isn’t hard, especially if you rely on the premade libraries. Perl has Net::DAV::Server, which I’ve used to serve a custom file system within 10 minutes of work and I’m confident that Python also has a mostly premade WebDAV server to which you just need to provide the proper backend.
I also don’t see the problem of having to access things via a different URI when wanting to download/edit them – in fact, I don’t see what’s wrong with an appended ?action=edit or /webdav/foo to edit the resource /foo.
WebDAV clients are different between OSes and unfortunately, Microsoft managed to make the “Web Folders” unusable with XP SP2 , but again, I think that Python should also have a WebDAV client library.
For what it’s worth, Nanoki, a simple wiki engine implemented in Lua, does provide a WebDAV interface… so one can ‘mount’ the entire wiki directly on a file system… check the online demo for an illustration:
http://svr225.stepx.com:3388/nanoki
dav://svr225.stepx.com:3388/
As far as WebDAV being ‘hard’ and/or ‘a misfit’, such characterization seems to be more related to, hmmm, ‘unfamiliarity’ than anything else :)
> How to strongly suggest a path when creating a resource (better than Slug)
Wouldn’t the obvious answer be using PUT? It would be more forceful than strongly suggesting, but I would assume that for a wiki, explicitly stating the URI would indeed be what you want.
I know it’s strictly speaking not part of Atompub, but otoh, it is definitely consistent with HTTP, and could be seen as a compatible extension to Atompub. Everything else, including creating via POST (with Slug), would still work. PUT on a non-existing URI would return 201 Created if this method of creating is supported, or 404 Not Found if the service is restricted to standard Atompub. PUT on an existing URI would remain as it is.

Agreed, except that I think something other than 404 would be a better choice in case the server doesn’t allow that particular URI (such as 403 or 409).
It occurred to me too, but I still went for 404. It would perhaps be nice to know whether the server supports it or not, but servers that don’t allow/support this would probably return 404 anyway. It seems a bad idea to implement a non-standard atompub extension by requiring that non-supporting servers do special-case handling.
Furthermore:

- What if you’re actually doing PUT-for-update as in standard Atompub — would you want to get anything other than 404 if the entry doesn’t exist?
- What if it already exists, but you are really forbidden to update (403), or there is a conflict (409)?
If needed, indicating support for this seems to belong in the service document. But I’m not sure 404 actually poses a problem, because the client will presumably know it is trying to create the entry (and will hence understand what the 404 means)
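To make the status codes being debated here concrete, a toy sketch of the server’s decision (a plain dict stands in for storage; this is illustration, not part of any spec):

```python
def respond_to_put(pages, path, body, allow_create=True):
    """Status-code sketch for the PUT-to-create extension discussed
    above. `pages` maps path -> body. A plain Atompub server
    (allow_create=False) answers 404 for unknown paths, which is
    indistinguishable from "no such entry"."""
    if path in pages:
        pages[path] = body
        return 200          # normal Atompub PUT-for-update
    if not allow_create:
        return 404          # restricted to standard Atompub
    pages[path] = body
    return 201              # the extension: PUT created the resource
```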
[Someone else suggested the same thing](http://blog.mozilla.com/rob-sayre/2009/01/11/ian-bicking-on-atompub-vs-webdav/) and as I first thought about it PUT seemed reasonable, but after a bit more thought I’m not that happy with it. The parts of the protocol I’ve mapped out so far don’t require changes to existing CMS URLs and controllers except for the introduction of the <link> tags. With PUT, all pages have to accept and properly respond to those requests. Note that when editing a resource, the PUT location is the URL from <link rel="edit">, not the resource URL itself.

Ian, why are you saying that supporting PUT in one part of the namespace requires doing so everywhere?
Well, you need PUT everywhere that’s editable. With the indirection otherwise available, you could have something like <link rel="edit" href="/atompubapi.php?path=/path/to/page" /> in the Atom entry.

Hmmm, hadn’t thought about the edit URI and its implications for PUT/create… It could be made to work with the resource URI: create (and return in Content-Location) a new edit URI (but that seems to introduce other problems).
Another idea maybe: from the http spec: “The meaning of the Content-Location header in PUT or POST requests is undefined; servers are free to ignore it in those cases.” More “undefined custom stuff”, I know… But I would expect Content-Location in POST to have the effect of forcing the path of the newly created resource. (problem: you couldn’t count on the server respecting it…)
I think I’d just use PUT on a non-existing URI from the edit uri-space, and expect this to result in an equivalent resource uri (wiki-like uris ignore uri-opacity anyway, and you’d probably want edit uri’s to have a similar path as well). And then ignore the problems only if/until you actually hit the wall ;-)
Maybe a way to make it specific would be an extension to the atompub service document. Something like:
Then the client can know for sure that the server supports this feature. Maybe the PUT functionality people suggest could be allowed for, like:
Ian,
> There’s an awkwardness here, that you can suggest (via the Slug header) what the URL for the new page is. The client can find the actual URL of the new page from the Location header in the response. But the client can’t demand that the slug be respected (getting an error back if it is not), and there’s lots of use cases where the client doesn’t just want to suggest a path (for instance, other documents that are being created might rely on that path for links).
To be fair, since the server can do what it sees best, you can write your server to always respect the Slug header. It’s an implementation detail which isn’t formally advertised, but I don’t think that’s really an issue here.
> (…) you can put any type of content in <content>, encoded in base64.

While this is true, media types ending in /xml or +xml don’t require base64 encoding, nor do those starting with text/.
> This indicates where the entry exists, though it doesn’t suggest very forcefully that the actual entry is abbreviated.
You could define a specific rel attribute value to indicate that or perhaps use an atom:category for that. Alternatively you might even be able to use the app:control element for that sort of meta-data.
If it’s already overloading Slug with atypical functionality, I wonder if it would be better just to use a different header entirely?
Reading the [section on app:control](http://bitworking.org/projects/atom/rfc5023.html#rfc.section.13.1), it appears that it’s only intended to be used by clients to indicate something to the server. I suppose just a custom element would be best, as there’s nothing already in Atom that really fits. A category would be a misrepresentation of the original entry, I think.
Regarding Slug overloading, I think it depends if you’re talking about Slug in your particular application or in a more general way. Slug is rather loosely defined and it would seem rather fair to make it more clearly specified in your application as long as it wasn’t going against the spirit of the spec.
About app:control, I guess you’re right: since it’s more a way for the client to pass information, it is probably not the right tool here. Your remark on the category element is fair only because AtomPub decided to treat POSTing an Atom entry as a specific case, where the returned member entry is generated from the POSTed entry. I’ve often thought it would lead to confusion not to consider an Atom entry as any other media-type, in other words, so that you could distinguish what had been sent from what had been generated. That’s the way it is, and due to that the category element is indeed probably the wrong idea here.
I don’t see what webdav offers that SVN does not.
SVN is WebDAV with versioning extensions. So yes, WebDAV doesn’t offer anything SVN does not…? Despite that, I don’t think it was ever a useful basis for Subversion (e.g., [this post](http://blog.red-bean.com/sussman/?p=139))
That post really is about DeltaV (the WebDAV versioning extensions defined in RFC 3253), not WebDAV itself. As far as I can tell, nobody is questioning the value of WebDAV (GET/PUT/PROPFIND/PROPPATCH/REPORT) itself as Subversion’s transport protocol.
A few years ago I was wondering about the relationship between WebDAV and Atom. Here is a wiki page in which I recorded notes on what I found:
http://intertwingly.net/wiki/pie/WebDav
Summary for the busy:
Seems that Sam Ruby and Greg Stein, at least, worked on WebDAV before working on Atom, and Sam made a comment that may imply that he thinks of Atom as a lightweight replacement for WebDAV, at least for some tasks (but I haven’t actually asked Sam Ruby if I’m interpreting him correctly).
Hmm, I think asking every little blog and wiki to implement queries limited by “path-regex” is too much. I think this proposal clarifies which operations wikis need:
http://www.jspwiki.org/Wiki.jsp?page=WikiRPCInterface2
I think if you develop a standard that forces wiki servers to do more than that, people won’t implement it. In fact, even that proposal is more than what will be tolerated by most wiki programmers. I think the most you could really get all people to do is:
(1) Get HTML version of page
(2) Get editable wiki text version of page
(3) Get RecentChanges
(4) Submit edited page
(5) Identify which version/level/services of the standard is/are being provided (in a very easy-to-implement way, i.e. the server might return a number or a string or two, and maybe a list of strings (“list of services provided”), and maybe even a hardcoded XML document)
Other things that you might be able to get a substantial number of implementors to do (in my opinion):
* Present RecentChanges in an Atom feed
* Hardcode a service discovery document (I doubt you would convince everyone to add service discovery links to each wiki page though)
* A minority might even deign to allow a “later than date X” query to limit RecentChanges.
* A minority might support getting page histories and past page versions.
> I think the most you could really get all people to do is:
oh, and i forgot
(6) a list of all of the pages on the wiki
oh and i should note — even though i’m pointing out things that i think won’t be universally implemented, i do believe that sometimes it’s a good idea to do something in your own implementation, or even to suggest something in a spec, even if you don’t think “everyone will implement it”.
You may also be interested in posting to http://www.wikisym.org/cgi-bin/mailman/listinfo/wiki-standards, in fact, maybe I’ll post there directing people to this article. Before wiki-standards existed, some similar topics used to be aired on https://lists.sourceforge.net/lists/listinfo/interwiki-discuss.
Also, I worked on a project a while ago that might be slightly related, called WikiGateway (http://interwiki.sourceforge.net/cgi-bin/wiki.pl?WikiGateway), the goal of which was to provide a standard API for interacting programmatically with various different types of wiki (WikiGateway included a command-line wiki client, a Perl library and a Python library for clients, and also gateway servers that exposed foreign wikis via WebDAV, Atom, and XMLRPC). The project only ever worked with a handful of wiki types, and it is now a couple of years out of date (maybe in a few more years I’ll update it again…). I don’t know what similar things have been done since.
I hope that someday the various wiki implementors rally around a single standard for programmatic interaction with wikis. There was some discussion of this around 2005, but I haven’t heard much about it recently; then again, I haven’t been looking.