.. -*- coding: utf-8 -*-
imageSTORE REST protocol
========================
A frontend can interact with the store using a REST_ interface. The
store is more like a web site than like an API. Instead of objects and
methods what is exposed are resources and representations. A client
can interact with the resources using HTTP GET, POST, PUT and DELETE
requests.
This document shows how you can interact with the store using the REST
interface. We go into the details of various interactions. The nice
thing about REST is that it is just HTTP, and most languages have
support libraries to deal with the HTTP protocol.
The REST protocol in this application uses XML. In this document, we
will also show how you can use XML processing tools to make dealing
with XML relatively convenient. Your own application may be written in
a different language than Python, in which case you cannot use lxml,
the XML processing library we use in this document. Since XML
processing tools exist in most languages, it should be possible to
find equivalents to what we do here.
.. _REST: http://rest.blueoxen.net/cgi-bin/wiki.pl
Accessing the application
-------------------------
The application is stored in an object database, the ZODB. In order to
create the application we first need to create a new application
object. Let's create it here::
>>> from imagestore.app import ImageStore
>>> store = ImageStore()
We now need to store this object into the object database itself. The
object database exposes a root container we can store things in::
>>> root_container = getRootFolder()
It exposes its content using Python dictionary style access. We will store
the store in the root container, naming it 'store'::
>>> root_container['store'] = store
The URL of the application will be as follows::
>>> app_url = 'http://localhost/store'
We can now access the application using HTTP GET on this URL::
>>> response = http_get(app_url)
The GET request was a success, so we get the HTTP status of 200 (OK)
from the server in the response::
>>> response.getStatusString()
'200 Ok'
What is more interesting is what is in the body of the response. We
will now go into some detail on how to handle this. What is in the
body is an XML document::
>>> xml = response.getBody()
The ``Content-Type`` of the response will be ``application/xml``. All
XML content will be encoded using the UTF-8 encoding (the XML
default)::
>>> response.getHeader('Content-Type')
'application/xml; charset=UTF-8'
XML documents are essentially just plaintext documents. Let's take a
look at the raw text of the document::
>>> xml
''
We need to parse this document using an XML parser in order to do
anything useful with it, so let's parse it::
>>> from lxml import etree
>>> el = etree.XML(xml)
``el`` now refers to the top element of the document, allowing us to
access the rest of the document (by navigating to its children and so
on). The lxml library exposes a convenient way to access and
manipulate XML structures.
Raw XML documents as returned by the ImageStore are a bit hard to
read, as they are all on one line. This is compact and more convenient
to deal with programmatically, but not very readable. Using lxml's
pretty-print facility, we can display the XML document in
pretty-printed form::
>>> print etree.tostring(el, pretty_print=True)
We will be using this pretty-print facility a lot to display XML
documents. Let's define a convenience function that will help us
format the response (and return the parsed root element, which we
may need later)::
>>> def pretty(response):
...     el = etree.XML(response.getBody())
...     print etree.tostring(el, pretty_print=True)
...     return el
This function is a bit ugly, since it prints the document as a *side
effect* while also returning the parsed element, but it does
the job.
Let's see whether it works::
>>> el = pretty(response)
Let's take a closer look at the document. The document contains a
namespace declaration::
xmlns="http://studiolab.io.tudelft.nl/ns/imagestore"
What this declares is that the whole document is in the following
namespace::
http://studiolab.io.tudelft.nl/ns/imagestore
Namespaces are used to make it possible to reuse short element names
in different XML vocabularies without conflicts. They are URIs. This
one looks like a URL (a special kind of URI), and it is one, but it
doesn't function like a normal URL: nothing has to happen when you
point your browser to it. URLs are used for namespaces simply as a way
to create a unique identifier. Since we will need this
particular namespace URI programmatically later, let's store it::
>>> NS = 'http://studiolab.io.tudelft.nl/ns/imagestore'
This namespace URI is one we created specifically for the image store
protocol. It is also the only one we're ever going to need when
handling the protocol of the image store.
Let's take a look at the top XML element of the document::
>>> el.tag
'{http://studiolab.io.tudelft.nl/ns/imagestore}imagestore'
What you see here is that the lxml library represents the
``imagestore`` tag with an extra bit in front:
``{http://studiolab.io.tudelft.nl/ns/imagestore}``. Because of the
special ``xmlns=`` declaration in the XML document, all elements in
this document will be in this namespace. Let's for example look at the
``sessions`` element that is below the top element (the third
element, so index ``2``)::
>>> el[2].tag
'{http://studiolab.io.tudelft.nl/ns/imagestore}sessions'
In technical terms, this special way to represent the element name is
called "Clark notation". Clark notation is verbose, but since the
namespace URI is in there it is actually quite convenient to use
programmatically, and always unique across all XML documents. This is
why lxml uses it.
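The notation described above can be built and taken apart with plain string handling. The following is a minimal sketch, written for Python 3 and using the standard library's ``xml.etree.ElementTree`` (which uses the same tag representation) instead of lxml. The sample document and the helper names ``clark`` and ``split_clark`` are assumptions made up for illustration; only the namespace URI and the element names come from the text.

```python
import xml.etree.ElementTree as ET

NS = 'http://studiolab.io.tudelft.nl/ns/imagestore'

# Hypothetical sample document: only the namespace and the element names
# are taken from the text; the real store returns more than this.
SAMPLE = '<imagestore xmlns="%s"><sessions href="sessions"/></imagestore>' % NS


def clark(namespace, local_name):
    """Return the '{namespace}name' form of a namespaced element name."""
    return '{%s}%s' % (namespace, local_name)


def split_clark(tag):
    """Split a '{namespace}name' tag into (namespace, local name)."""
    if tag.startswith('{'):
        namespace, local_name = tag[1:].split('}', 1)
        return namespace, local_name
    # No namespace part: return the tag unchanged.
    return None, tag


root = ET.fromstring(SAMPLE)
print(root.tag == clark(NS, 'imagestore'))
print(split_clark(root[0].tag))
```

Comparing tags through a helper like ``clark`` avoids repeating the long URI throughout client code.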
Constructing URLs
-----------------
The store has a number of sessions in it, contained in the sessions
container. There are no sessions yet, so the ``sessions`` element in
the XML is empty::
>>> len(el[2])
0
We want to create a new session now. To do this, we need to have the
URL of the sessions container. How do we get to it?
One idea behind REST is that the client application should never make
any assumptions about URLs itself and only use those provided by the
server. In effect it is just like a human user typically uses a web
site or web application: the user usually clicks links, and doesn't
construct them manually in the location bar. The site provides a
navigation structure so that the user can get to where they want to go.
In this RESTful application, the server provides a relative path to
the sessions container in the ``href`` attribute of the ``sessions``
element. We want to access the sessions container directly, so we need
this information.
XPath is a way to easily retrieve bits of information from an XML
document. We will use it to retrieve the ``href`` attribute from
the document. Another way to retrieve information from the document
would be to look at its tree structure and navigate to the right
attribute. In fact, we saw some of this navigation above, using the
``[2]`` construction to get to the ``sessions`` element. Yet another
way would be to parse it with a streaming parser like SAX and watch
for the information we need. The advantage of using XPath is that it
is both succinct and convenient, so in this document we will typically
use XPath.
We know that all our elements are in a special namespace. We use a
default namespace declaration to indicate this. XPath 1.0, which is
what we're using, unfortunately does not support default
namespaces. Instead, namespaces need to be indicated specifically with
a so called *namespace prefix*. Such namespace prefixes can in fact
also be used in XML documents themselves, but since we don't use that
facility here we won't show it.
A prefix is just a shorthand for a namespace URI. It is there for
convenience only (it's less long than a URI) and does not have a
meaning by itself. We can define what prefix belongs to which namespace
URI using a mapping like the following::
>>> NS_MAP = {'ids': NS}
Here we say that the prefix ``ids`` (we just made that shorthand up)
is mapped to the namespace URI in ``NS`` (which we defined earlier
on).
When we want to address elements that are in a namespace, as all the
elements in the image store XML are, we need to use prefixes in the
XPath expression.
Inside an XPath expression you can now identify an element in a
certain namespace by using the prefix, the colon (``:``) and the
element itself, like this::
ids:sessions
With lxml, we can use the ``xpath`` method to evaluate XPath
expressions on element objects. One element we already have is ``el``:
the ``imagestore`` element. Let's now access the ``sessions`` element
using XPath. We know this is directly under the ``imagestore``
element. In XPath you can get to an element directly under another one
like this::
ids:sessions
A very simple XPath expression indeed. We just say: give us all elements below
the current one (``el`` in our case) that have that name. In Clark notation
that would read like this::
'{http://studiolab.io.tudelft.nl/ns/imagestore}sessions'
Now let's take a look. Note that we have to pass ``NS_MAP`` along,
otherwise XPath will have no idea what the prefix ``ids`` really
stands for::
>>> l = el.xpath('ids:sessions', namespaces=NS_MAP)
XPath typically returns a list instead of a single element, even if
there is only a single sub-element available. Our list will just have
a single element::
>>> len(l)
1
Let's get it::
>>> sessions_el = l[0]
In fact it's one we've already seen before (``el[2]``)::
>>> sessions_el is el[2]
True
>>> sessions_el.tag
'{http://studiolab.io.tudelft.nl/ns/imagestore}sessions'
We are going to use a more complicated XPath expression now to
retrieve the contents of the ``href`` attribute that is on the
``sessions`` element. We will not go into further details on how this
works - you can look up the rest in an XPath tutorial::
>>> el.xpath('ids:sessions/@href', namespaces=NS_MAP)
['sessions']
That's correct: the ``href`` attribute of the ``sessions`` element
indeed has the value ``sessions`` too.
As we said previously, XPath usually returns a list of matching
elements or strings. When we look for a single ``href`` this is
somewhat inconvenient. Let's define a convenience function which will
take some work out of our hands (including passing in ``NS_MAP``)::
>>> def xpath(el, path):
...     return el.xpath(path, namespaces=NS_MAP)[0]
Let's get the relative path again using this convenience function::
>>> rel_path = xpath(el, 'ids:sessions/@href')
>>> rel_path
'sessions'
That's better.
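A helper that indexes ``[0]`` fails with a bare ``IndexError`` when nothing matches. As a rough sketch of a more explicit variant, the code below uses Python 3 and the standard library's ``xml.etree.ElementTree`` rather than lxml. ElementTree supports only a subset of XPath and cannot select an attribute with ``@href``, so the attribute is read with ``get()`` instead. The name ``href_of`` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = 'http://studiolab.io.tudelft.nl/ns/imagestore'
NS_MAP = {'ids': NS}

# Hypothetical sample document, modelled on the description in the text.
SAMPLE = '<imagestore xmlns="%s"><sessions href="sessions"/></imagestore>' % NS


def href_of(el, path):
    """Return the ``href`` of the first element below ``el`` matching ``path``.

    Raises LookupError with a readable message when the element or the
    attribute is missing.
    """
    found = el.find(path, NS_MAP)
    if found is None:
        raise LookupError('no element matches %r' % path)
    href = found.get('href')
    if href is None:
        raise LookupError('element %r has no href attribute' % path)
    return href


root = ET.fromstring(SAMPLE)
print(href_of(root, 'ids:sessions'))
```

Whether a missing link should raise or return ``None`` is a design choice; raising keeps a client from silently building a wrong URL.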
We cannot use relative URLs directly to access the application. We
have to turn them into absolute URLs first. Using the relative URL we
retrieved with XPath, we can now construct the absolute URL to the
sessions container by appending the relative URL to the URL we are
currently accessing::
>>> app_url + '/' + rel_path
'http://localhost/store/sessions'
Let's define another convenience function to construct an absolute URL
from another one and a relative one::
>>> def rel(path, rel_path):
...     return path + '/' + rel_path
We can now use this function to construct the absolute URL to the
sessions container::
>>> sessions_url = rel(app_url, rel_path)
>>> sessions_url
'http://localhost/store/sessions'
We can make things even more convenient by combining these two
functions::
>>> def url_to(url, element, path):
...     return rel(url, xpath(element, path))
Using this we can construct the absolute URL indicated by some ``href`` in the
document in one step::
>>> sessions_url = url_to(app_url, el, 'ids:sessions/@href')
>>> sessions_url
'http://localhost/store/sessions'
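String concatenation works here because every ``href`` in this walkthrough is a single path segment. As an alternative sketch, the standard library can resolve relative references; the example below assumes Python 3 (``urllib.parse``; the Python 2 equivalent lives in the ``urlparse`` module). Note that ``urljoin`` replaces the last path segment unless the base URL ends in a slash, which is why this version of ``rel`` adds one.

```python
from urllib.parse import urljoin


def rel(base_url, rel_path):
    """Resolve ``rel_path`` against ``base_url``, treating the base as a container."""
    # Without a trailing slash urljoin would drop the last segment of the base.
    if not base_url.endswith('/'):
        base_url += '/'
    return urljoin(base_url, rel_path)


print(rel('http://localhost/store', 'sessions'))
# The pitfall this helper avoids:
print(urljoin('http://localhost/store', 'sessions'))
```

The first call prints ``http://localhost/store/sessions``; the second prints ``http://localhost/sessions``, because ``store`` is treated as a document rather than a container.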
Let's take a look at what is behind that URL by issuing an HTTP GET
request to it::
>>> response = http_get(sessions_url)
Let's take a look at the response using our ``pretty`` convenience function::
>>> el = pretty(response)
Creating a session
------------------
Now that we have constructed a URL directly to the ``sessions``
container, how do we actually create a new session?
We can do this by issuing a POST request to the sessions container
URL. The POST request is supplied with XML which defines the new
session. That XML looks like this::
>>> session_xml = '''
...
... '''
We want to create a new session with the name ``one``. Please also
note that we need to declare all elements to be contained in our
special namespace using ``xmlns`` again; if we don't do that, things
won't work.
Let's now issue the HTTP POST request to the ``sessions`` container::
>>> response = http_post(sessions_url, session_xml)
When we have successfully created a new object using a POST, we expect
a status of 201 (Created)::
>>> response.getStatusString()
'201 Created'
Since our POST to create a new session was indeed successful, we
expect the URL of the newly created object to be the ``Location``
header of the response::
>>> response.getHeader('Location')
'http://localhost/store/sessions/one'
The body of the POST response contains a simple statement of success::
>>> success_el = pretty(response)
The new session should now be visible in the sessions container::
>>> response = http_get(sessions_url)
>>> el = pretty(response)
...
...
We cannot POST a session with the same name twice::
>>> response = http_post(sessions_url, session_xml)
The status code of the response will be 409 (Conflict)::
>>> response.getStatusString()
'409 Conflict'
The Location header will have the URL to the resource that this request
is conflicting with (the existing session called 'one')::
>>> response.getHeader('Location')
'http://localhost/store/sessions/one'
The body contains more information about what is wrong::
>>> error_el = pretty(response)
There is already a resource with this name in this location.
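A client will usually want to branch on these two outcomes. The function below is only a sketch of how that might look in Python 3; ``describe_creation`` is a hypothetical name, and it takes the status string and ``Location`` value as plain arguments rather than assuming any particular HTTP library.

```python
def describe_creation(status, location):
    """Summarise the outcome of a POST that tries to create a resource.

    ``status`` is a status string such as '201 Created'; ``location`` is
    the value of the Location header, or None when it is absent.
    """
    code = int(status.split(' ', 1)[0])
    if code == 201:
        # Location is the URL of the newly created resource.
        return ('created', location)
    if code == 409:
        # Location is the URL of the existing resource we collided with.
        return ('conflict', location)
    return ('unexpected', None)


print(describe_creation('201 Created', 'http://localhost/store/sessions/one'))
print(describe_creation('409 Conflict', 'http://localhost/store/sessions/one'))
```

In both cases the ``Location`` header identifies a usable resource, so a client that only needs the session to exist can continue with that URL either way.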
Each session is automatically supplied with an ``images`` container,
containing the images in use for that session, as well as a
``collection``, which is the root group of a group hierarchy for that
session.
Let's construct a link to the individual session we just created::
>>> session_url = url_to(sessions_url, el, 'ids:session[@name="one"]/@href')
>>> session_url
'http://localhost/store/sessions/one'
We can examine this session::
>>> response = http_get(session_url)
>>> response.getStatusString()
'200 Ok'
>>> el = pretty(response)
...
...
Images in the images container
------------------------------
The session has an images container. This container contains the binary
images: JPGs and so on. Let's examine its XML representation::
>>> images_url = url_to(session_url, el, 'ids:images/@href')
>>> images_url
'http://localhost/store/sessions/one/images'
>>> response = http_get(images_url)
>>> response.getStatusString()
'200 Ok'
>>> el = pretty(response)
It doesn't contain any images yet. Let's POST an image to the
``images`` container to create a new image. We need to have some way
to set the name of the new image, and unlike in the session, we don't
have the XML body to set it. Instead, we supply the name of the image
using the special ``Slug`` header::
>>> data = image_data('test1.jpg')
>>> response = http_post(images_url, data,
... Slug='alpha.jpg')
We should get the 201 (Created) response status again::
>>> response.getStatusString()
'201 Created'
The location of the new image object is in the Location header::
>>> response.getHeader('Location')
'http://localhost/store/sessions/one/images/alpha.jpg'
The body contains a simple success message::
>>> success_el = pretty(response)
The image should now be there::
>>> response = http_get(images_url)
>>> el = pretty(response)
It should contain the uploaded data::
>>> alpha_url = url_to(images_url, el, 'ids:source-image/@href')
>>> alpha_url
'http://localhost/store/sessions/one/images/alpha.jpg'
>>> response = http_get(alpha_url)
>>> response.getStatusString()
'200 Ok'
>>> response.getBody() == data
True
We cannot upload an image without the Slug header::
>>> response = http_post(images_url, data)
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
Slug header is missing from request.
Using HTTP PUT we can also replace an existing image::
>>> data2 = image_data('test2.jpg')
>>> response = http_put(alpha_url, data2)
>>> response.getStatusString()
'200 Ok'
>>> response = http_get(alpha_url)
>>> retrieved_data = response.getBody()
>>> retrieved_data == data
False
>>> retrieved_data == data2
True
Let's put back the original image, and add a second image::
>>> response = http_put(alpha_url, data)
>>> response = http_post(images_url, data2,
... Slug='beta.jpg')
We should now see two images::
>>> response = http_get(images_url)
>>> el = pretty(response)
Let's add a third image::
>>> response = http_post(images_url, data,
... Slug='gamma.jpg')
And a fourth image::
>>> response = http_post(images_url, data,
... Slug='delta.jpg')
>>> response = http_get(images_url)
>>> el = pretty(response)
We cannot add an image with the same name twice::
>>> response = http_post(images_url, data,
... Slug='delta.jpg')
>>> response.getStatusString()
'409 Conflict'
There is a conflict with an already existing resource. The resource's
URL is in the Location header::
>>> response.getHeader('Location')
'http://localhost/store/sessions/one/images/delta.jpg'
The body of the response contains a bit more information about what
went wrong::
>>> error_el = pretty(response)
There is already a resource with this name in this location.
Now let's look into deleting images. First, we construct a URL to the
delta image::
>>> delta_url = url_to(images_url, el, 'ids:source-image[@name="delta.jpg"]/@href')
>>> delta_url
'http://localhost/store/sessions/one/images/delta.jpg'
Using HTTP DELETE we delete the ``delta`` image::
>>> response = http_delete(delta_url)
We expect a 200 OK response code if everything went okay::
>>> response.getStatusString()
'200 Ok'
The response body contains a success message::
>>> success_el = pretty(response)
The ``delta`` image should now be gone from the overview::
>>> response = http_get(images_url)
>>> el = pretty(response)
Note that Slugs have restrictions on naming. We cannot add images with
(Slug) names that contain illegal characters, such as space
characters::
>>> response = http_post(images_url, data,
... Slug='delta .jpg')
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
Slug name 'delta .jpg' contains illegal characters.
Groups
------
Groups are contained in other groups. The root group of a session is
the ``collection`` group, which is directly contained by the session.
Let's examine the ``collection`` group of the session now::
>>> response = http_get(session_url)
>>> el = etree.XML(response.getBody())
>>> collection_url = url_to(session_url, el, 'ids:group[@name="collection"]/@href')
>>> collection_url
'http://localhost/store/sessions/one/collection'
Each session also contains a special ``groups`` section. This shows a
list of all groups in that session (flattened from their nested
structure)::
>>> groups_url = url_to(session_url, el, 'ids:groups/@href')
>>> groups_url
'http://localhost/store/sessions/one/groups'
The only group in the session at this point in time is the collection
(root) group::
>>> response = http_get(groups_url)
>>> el = pretty(response)
...
Let's go to the collection group directly::
>>> response = http_get(collection_url)
>>> el = pretty(response)
...
We can examine it in more detail::
>>> print etree.tostring(el, pretty_print=True)
...
As we can see, no groups are contained by the collection group
yet. The ``objects`` listing is empty.
We will now add a new group by issuing a POST to the collection
group's objects URL. As the request body we will supply the XML
describing the group::
>>> collection_objects_url = url_to(collection_url, el, 'ids:objects/@href')
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
...
... '''
We create a group with a particular image as its source, a set of
metadata and no sub-objects::
>>> response = http_post(collection_objects_url, xml)
>>> response.getStatusString()
'201 Created'
>>> response.getHeader('Location')
'http://localhost/store/sessions/one/collection/objects/test'
>>> success_el = pretty(response)
The new group should be visible in the collection group::
>>> response = http_get(collection_url)
>>> el = pretty(response)
...
...1.0...2.03.04.0
We will add another group called ``test2``::
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
...
... '''
>>> response = http_post(collection_objects_url, xml)
>>> response.getStatusString()
'201 Created'
>>> response.getHeader('Location')
'http://localhost/store/sessions/one/collection/objects/test2'
>>> success_el = pretty(response)
We should see this new group in the ``objects`` listing now::
>>> response = http_get(collection_url)
>>> el = pretty(response)
...
...
...
Let's go back for a bit to the flat groups listing on the session. The
new groups should be there too::
>>> response = http_get(groups_url)
>>> el2 = pretty(response)
...
...
...
We will remove the ``test2`` group again, using an HTTP DELETE::
>>> test2_url = url_to(collection_url, el, 'ids:objects/ids:group[@name="test2"]/@href')
>>> test2_url
'http://localhost/store/sessions/one/collection/objects/test2'
>>> response = http_delete(test2_url)
>>> response.getStatusString()
'200 Ok'
>>> success_el = pretty(response)
The deleted group should now be gone::
>>> response = http_get(collection_url)
>>> el = pretty(response)
...
...
XML validation
--------------
When we submit data with POST or PUT, we should submit XML text. When
we submit something that is not XML at all, we expect an error::
>>> non_xml = "This isn't valid XML at all"
>>> response = http_post(collection_objects_url, non_xml)
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
The submitted data was not well-formed XML.
Even if the data looks like XML it can be malformed. Let's try again::
>>> non_xml = '''
...
...
...
... '''
>>> response = http_post(collection_objects_url, non_xml)
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
The submitted data was not well-formed XML.
This isn't allowed with a PUT either::
>>> response = http_put(collection_objects_url, non_xml)
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
The submitted data was not well-formed XML.
Even if we do add well-formed XML, we can add invalid XML: XML that
contains illegal elements or XML that isn't allowed in this location.
Let's try adding a group with invalid XML (the ``flub`` element)::
>>> xml = '''
...
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
...
... '''
>>> response = http_post(collection_objects_url, xml)
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
The submitted XML was invalid.
There should of course be no new ``test3`` resource available now::
>>> response = http_get(collection_objects_url)
>>> el2 = pretty(response)
...
We aren't allowed to PUT invalid XML either::
>>> response = http_put(collection_objects_url, xml)
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
The submitted XML was invalid.
Let's for a change inspect the raw HTTP result to see whether the response
structure is what we expect::
>>> print http('POST /store/sessions/one/collection/objects HTTP/1.1\r\n%s' % xml)
HTTP/1.1 400 Bad Request
Content-Length: 117
Content-Type: application/xml; charset=UTF-8
The submitted XML was invalid.
We cannot POST or PUT something which has a ``name`` attribute in it
that contains illegal characters. The name is restricted to avoid the
creation of URLs with hard to read characters in them. Let's try
POSTing a document that has a name with an illegal character (a
question mark)::
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
...
... '''
>>> response = http_post(collection_objects_url, xml)
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
The submitted XML was invalid.
Spaces in names aren't allowed either::
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
...
... '''
>>> response = http_post(collection_objects_url, xml)
>>> response.getStatusString()
'400 Bad Request'
It is legal to include periods, underscores and hyphens::
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
...
... '''
>>> response = http_post(collection_objects_url, xml)
>>> response.getStatusString()
'201 Created'
>>> response.getHeader('Location')
'http://localhost/store/sessions/one/collection/objects/foo_bar-baz.foo'
Let's clean up again::
>>> response = http_delete(response.getHeader('Location'))
>>> response.getStatusString()
'200 Ok'
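A client can save a round trip by checking a name before sending it. The exact rule the server applies is not shown in this document, so the pattern below is an assumption inferred only from the examples here: spaces and question marks are rejected, while letters, digits, periods, underscores and hyphens are accepted. ``looks_like_valid_name`` is a hypothetical helper (Python 3), and the server remains the authority.

```python
import re

# Assumed rule, inferred from the examples in this document only; the
# server's real validation may be stricter or looser.
_NAME_RE = re.compile(r'[A-Za-z0-9._-]+')


def looks_like_valid_name(name):
    """Return True if ``name`` consists only of the assumed legal characters."""
    return _NAME_RE.fullmatch(name) is not None


for candidate in ['foo_bar-baz.foo', 'delta .jpg', 'what?']:
    print(candidate, looks_like_valid_name(candidate))
```

A ``400 Bad Request`` from the server should still be handled, since this pre-check is only a guess at the rule.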
Content also cannot be added to a location that does not accept
it. Here we try to add a group directly to another group (NOT its
objects sub-URL)::
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
...
... '''
>>> response = http_post(collection_url, xml)
>>> response.getStatusString()
'400 Bad Request'
>>> error_el = pretty(response)
The submitted content could not be added in this location.
Image objects
-------------
A group's ``objects`` container can, besides sub-groups, also contain
``image`` objects.
Let's examine the test group::
>>> test_url = url_to(collection_url, el, 'ids:objects/ids:group[@name="test"]/@href')
>>> response = http_get(test_url)
>>> el = pretty(response)
...1.0...2.03.04.0
The group has metadata, and a number of sub-objects. In this case we
have no sub-objects yet. Let's take a look at the sub-object listing
by itself::
>>> test_objects_url = url_to(test_url, el, 'ids:objects/@href')
>>> response = http_get(test_objects_url)
>>> el = pretty(response)
We can add an image object to a group using a POST to the group's
objects container::
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
... '''
>>> response = http_post(test_objects_url, xml)
>>> response.getStatusString()
'201 Created'
>>> response.getHeader('Location')
'http://localhost/store/sessions/one/collection/objects/test/objects/a'
>>> success_el = pretty(response)
The image can now be found in the objects container::
>>> response = http_get(test_objects_url)
>>> el = pretty(response)
...1.0...2.03.04.0
Let's add a second object::
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
... '''
>>> response = http_post(test_objects_url, xml)
The object is now there::
>>> response = http_get(test_objects_url)
>>> el = pretty(response)
...
...
Let's remove it again::
>>> b_url = url_to(test_objects_url, el, 'ids:image[@name="b"]/@href')
>>> response = http_delete(b_url)
It should be gone from the overview again::
>>> response = http_get(test_objects_url)
>>> el = pretty(response)
...
Let's examine the ``a`` object. It has a link to the actual image used
(that we supplied) and links to the x and y coordinates of that
image::
>>> a_url = url_to(test_objects_url, el, 'ids:image[@name="a"]/@href')
>>> response = http_get(a_url)
>>> el = pretty(response)
...1.0...2.03.04.0
We can also look at the source individually::
>>> source_url = url_to(a_url, el, 'ids:source/@href')
>>> response = http_get(source_url)
>>> source_el = pretty(response)
We can modify the source using PUT. Note that we should modify the
name, as the ``src`` URL is autogenerated::
>>> xml = '''
...
... '''
>>> response = http_put(source_url, xml)
>>> response.getStatusString()
'200 Ok'
It will have changed now::
>>> response = http_get(source_url)
>>> source_el = pretty(response)
We now look at the metadata by itself::
>>> a_metadata_url = url_to(a_url, el, 'ids:metadata/@href')
>>> response = http_get(a_metadata_url)
>>> el = pretty(response)
...1.0...2.03.04.0
We can modify the metadata using PUT::
>>> xml = '''
...
... 0.0
... 0.0
...
... 170.0
... 65.0
...
... '''
>>> response = http_put(a_metadata_url, xml)
>>> response.getStatusString()
'200 Ok'
When we retrieve the metadata, it will have changed::
>>> response = http_get(a_metadata_url)
>>> el = pretty(response)
...0.0...0.0170.065.0
Accessing individual metadata fields
------------------------------------
We can access individual metadata fields::
>>> response = http_get(a_metadata_url)
>>> el = etree.XML(response.getBody())
>>> x_field_url = url_to(a_metadata_url, el, 'ids:x/@href')
>>> x_field_url
'http://localhost/store/sessions/one/collection/objects/test/objects/a/metadata/x'
>>> response = http_get(x_field_url)
>>> el = pretty(response)
170.0
We can also alter individual metadata fields using PUT::
>>> xml = etree.tostring(el, pretty_print=True)
>>> xml = xml.replace('170.0', '181.0')
>>> response = http_put(x_field_url, xml)
>>> response.getStatusString()
'200 Ok'
>>> response = http_get(x_field_url)
>>> el = pretty(response)
181.0
This also works for set fields::
>>> response = http_get(a_metadata_url)
>>> el = etree.XML(response.getBody())
>>> tags_field_url = url_to(a_metadata_url, el, 'ids:tags/@href')
>>> response = http_get(tags_field_url)
>>> el = pretty(response)
Let's change the list of tags with a PUT::
>>> sub = etree.SubElement(el, '{%s}tag' % NS)
>>> sub.text = 'a'
>>> sub = etree.SubElement(el, '{%s}tag' % NS)
>>> sub.text = 'b'
>>> xml = etree.tostring(el, pretty_print=True)
>>> response = http_put(tags_field_url, xml)
>>> response = http_get(tags_field_url)
>>> el = pretty(response)
ab
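The same kind of request body can be built from scratch instead of modifying a retrieved element. This is a sketch for Python 3 using the standard library's ``xml.etree.ElementTree`` in place of lxml. The ``tag`` child elements follow the code above; the ``tags`` root element name is an assumption based on the XPath expression used earlier, and ``tags_xml`` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = 'http://studiolab.io.tudelft.nl/ns/imagestore'


def tags_xml(values):
    """Serialise ``values`` as a namespaced ``tags`` element (assumed layout)."""
    root = ET.Element('{%s}tags' % NS)
    for value in values:
        # Each value becomes one namespaced <tag> child element.
        ET.SubElement(root, '{%s}tag' % NS).text = value
    return ET.tostring(root, encoding='utf-8')


body = tags_xml(['a', 'b'])
# Parse the result again to check what was written.
parsed = ET.fromstring(body)
print([child.text for child in parsed])
```

ElementTree writes the namespace with a generated prefix rather than as a default namespace. The two forms are equivalent to a namespace-aware parser, but this sketch does not verify how the store treats prefixed input.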
Let's look at an overview of all the metadata after our changes::
>>> response = http_get(a_metadata_url)
>>> el = pretty(response)
...0.0...0.0ab181.065.0
We can use non-ascii characters in tags::
>>> xml = u'Hé'.encode('UTF-8')
>>> response = http_put(tags_field_url, xml)
>>> response = http_get(tags_field_url)
We'll look at the raw response body to see whether the character is
encoded correctly. Because doctests do not support printing unicode
characters, we'll look at the raw string output instead. We expect the
character 'é' to be encoded in UTF-8 as \xc3\xa9::
>>> response.getBody()
'H\xc3\xa9'
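The byte values can be checked independently of the store. In Python 3, where text and bytes are separate types, a small sketch of the same round trip looks like this:

```python
text = 'H\u00e9'                    # the string 'Hé'
encoded = text.encode('utf-8')      # 'é' becomes the two bytes 0xC3 0xA9
print(encoded)
print(encoded.decode('utf-8') == text)
```

This prints ``b'H\xc3\xa9'`` followed by ``True``, matching the two bytes expected above for 'é'.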
Creation and modification datetime
----------------------------------
Objects track the time they were created and when they were modified.
To demonstrate this, we first need to record the current datetime::
>>> from datetime import datetime
>>> start = datetime.now()
When we create a new object, it will have its creation datetime set::
>>> xml = '''
...
...
...
... 1.0
... 2.0
...
... 3.0
... 4.0
...
...
... '''
>>> response = http_post(test_objects_url, xml)
When we examine the newly created object it will contain a created and
modified field in its metadata::
>>> response = http_get(test_objects_url)
>>> el = etree.XML(response.getBody())
>>> c_object_url = url_to(test_objects_url, el, 'ids:image[@name="c"]/@href')
>>> response = http_get(c_object_url)
>>> el = pretty(response)
...
...
...
.../modified>
...
We will now record the end datetime::
>>> end = datetime.now()
We know that the newly created object will have a created and modified
datetime between ``start`` and ``end``::
>>> created_datestamp = xpath(el, 'ids:metadata/ids:created/text()')
>>> from imagestore.metadata import parse_iso_to_datetime
>>> created_datetime = parse_iso_to_datetime(created_datestamp)
>>> start <= created_datetime <= end
True
>>> modified_datestamp = xpath(el, 'ids:metadata/ids:modified/text()')
>>> modified_datetime = parse_iso_to_datetime(modified_datestamp)
>>> start <= modified_datetime <= end
True
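The bracketing check can also be tried outside the test suite. The sketch below assumes Python 3.7 or later and uses ``datetime.fromisoformat`` as a stand-in for the project's ``parse_iso_to_datetime``; it further assumes the server sends naive ISO 8601 datestamps without a timezone, like ``2007-01-01T17:00:00``. The helper name ``within`` is hypothetical.

```python
from datetime import datetime


def within(start, stamp, end):
    """Return True if the ISO 8601 ``stamp`` lies between ``start`` and ``end``."""
    return start <= datetime.fromisoformat(stamp) <= end


start = datetime(2007, 1, 1, 0, 0, 0)
end = datetime(2007, 1, 2, 0, 0, 0)
print(within(start, '2007-01-01T17:00:00', end))
print(within(start, '2007-01-03T09:30:00', end))
```

The check only makes sense when the client and server clocks agree, as they do in these tests where both run in the same process.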
Setting the created or modified fields explicitly using PUT will have no effect
as these are managed by the system::
>>> created_url = url_to(c_object_url, el, 'ids:metadata/ids:created/@href')
>>> xml = '''
... 2007-01-01T17:00:00
... '''
>>> response = http_put(created_url, xml)
We get the original created datetime again::
>>> response = http_get(c_object_url)
>>> el = etree.XML(response.getBody())
>>> new_created_datestamp = xpath(el, 'ids:metadata/ids:created/text()')
>>> new_created_datestamp == created_datestamp
True
We also cannot set these by PUTing all the metadata at once; these
values will be ignored by the system as well::
>>> xml = '''
...
... 2007-01-01T17:00:00
... 0.0
... 2007-01-02T18:00:00
... 0.0
...
... 170.0
... 66.0
...
... '''
>>> c_metadata_url = url_to(c_object_url, el, 'ids:metadata/@href')
>>> start = datetime.now()
>>> response = http_put(c_metadata_url, xml)
>>> end = datetime.now()
We expect the created field to be the same as before::
>>> response = http_get(c_object_url)
>>> el = etree.XML(response.getBody())
>>> new_created_datestamp = xpath(el, 'ids:metadata/ids:created/text()')
>>> new_created_datestamp == created_datestamp
True
The modified field should be set to the datetime of the last
modification::
>>> modified_datestamp = xpath(el, 'ids:metadata/ids:modified/text()')
>>> modified_datetime = parse_iso_to_datetime(modified_datestamp)
>>> start <= modified_datetime <= end
True
We make another modification, this time through a field directly::
>>> x_url = url_to(c_object_url, el, 'ids:metadata/ids:x/@href')
>>> xml = '''
... 777.0
... '''
>>> start = datetime.now()
>>> response = http_put(x_url, xml)
>>> end = datetime.now()
When we retrieve the metadata again, the x value is indeed modified::
>>> response = http_get(c_object_url)
>>> el = etree.XML(response.getBody())
>>> print xpath(el, 'ids:metadata/ids:x/text()')
777.0
Moreover, the modified datestamp is now again listing the time of
modification::
>>> modified_datestamp = xpath(el, 'ids:metadata/ids:modified/text()')
>>> modified_datetime = parse_iso_to_datetime(modified_datestamp)
>>> start <= modified_datetime <= end
True
Custom XML metadata
-------------------
The special ``custom`` metadata field can be used to maintain arbitrary
XML with client-specific information.
Let's first access an empty ``custom`` field::
>>> response = http_get(a_metadata_url)
>>> el = etree.XML(response.getBody())
>>> custom_field_url = url_to(a_metadata_url, el, 'ids:custom/@href')
>>> custom_field_url
'http://localhost/store/sessions/one/collection/objects/test/objects/a/metadata/custom'
>>> response = http_get(custom_field_url)
>>> el = pretty(response)
As you can see, this field is very empty. Let's now put some arbitrary
XML in there using PUT::
>>> xml = ''
>>> xml += ''
>>> xml += ''
>>> response = http_put(custom_field_url, xml)
>>> response.getStatusString()
'200 Ok'
>>> response = http_get(custom_field_url)
>>> el = pretty(response)
The custom XML field can also be set to be empty again::
>>> xml = ''
>>> response = http_put(custom_field_url, xml)
>>> response = http_get(custom_field_url)
>>> el = pretty(response)