Wednesday, 29 August 2007

IJDC again

At the end of July I reported on the second issue of the International Journal of Digital Curation (IJDC), and asked some questions:
"We are aware, by the way, that there is a slight problem with our journal in a presentational sense. Take the article by Graham Pryor, for instance: it contains various representations of survey results presented as bar charts, etc in a PDF file (and we know what some people think about PDF and hamburgers). Unfortunately, the data underlying these charts are not accessible!

"For various reasons, the platform we are using is an early version of the OJS system from the Public Knowledge Project. It's pretty clunky and limiting, and does tend to restrict what we can do. Now that release 2 is out of the way, we will be experimenting with later versions, with an aim to including supplementary data (attached? External?) or embedded data (RDFa? Microformats?) in the future. Our aim is to practice what we may preach, but we aren't there yet."

I didn't get any responses, but over on the eFoundations blog, Andy Powell was taking us to task for only offering PDF:
"Odd though, for a journal that is only ever (as far as I know) intended to be published online, to offer the articles using PDF rather than HTML. Doing so prevents any use of lightweight 'semantic' markup within the articles, such as microformats, and tends to make re-use of the content less easy."
His blog is more widely read than this one, and he attracted 11 comments! The gist of them was that PDF plus HTML (or preferably XML) was the minimum that we should be offering. For example, Chris Leonard [update, not Tom Wilson! See end comments] wrote:
"People like to read printed-out pdfs (over 90% of accesses to the fulltext are of the pdf version) - but machines like to read marked-up text. We also make the xml versions availble for precisely this purpose."
Cornelius Puschmann [update, not Peter Sefton] wrote:
"Yeah, but if you really want semantic markup why not do it right and use XML? The problematic thing with OJS (at least to some extent) is/was that XML article versions are not the basis for the "derived" PDF and HTML, which deal almost purely with visuals. XML is true semantic markup and therefore the best way to store articles in the long term (who knows what formats we'll have 20 years from now?). HTML can clearly never fill that role - it's not its job either. From what I've heard OJS will implement XML (and through it neat things such as OpenOffice editing of articles while they're in the workflow) via Lemon8 in the future."
Bruce D'Arcus [update, not Jeff] says:
"As an academic, I prefer the XHTML + PDF option myself. There are times I just want to quickly view an article in a browser without the hassle of PDF. There are other times I want to print it and read it "on the train."

"With new developments like microformats and RDFa, I'd really like to see a time soon where I can even copy-and-paste content from HTML articles into my manuscripts and have the citation metadata travel with it."
Jeff [update, not Cornelius Puschmann] wrote:
"I was just checking through some OJS-based journals and noticed that several of them are only in PDF. Hmmm, but a few are in HTML and PDF. It has been a couple of years since I've examined OJS but it seems that OJS provides the tools to generate both HTML and PDF, no? Ironically, I was going to do a quick check of the OJS documentation but found that it's mostly only in PDF!

"I suspect if a journal decides not to provide HTML then it has some perceived limitations with HTML. Often, for scholarly journals, that revolves around the lack of pagination. I noticed one OJS-based journal using paragraph numbering but some editors just don't like that and insist on page numbers for citations. Hence, I would be that's why they chose PDF only."
I think in this case we used only PDF because that was all our (old) version of the OJS platform allowed. I certainly wanted HTML as well. As I said before, we're looking into that, and hope to move to a newer version of the platform soon. I'm not sure it has been an issue, but I believe HTML can be tricky for some kinds of articles (Maths used to be a real difficulty, but maybe they've fixed that now).

I think my preference is for XHTML plus PDF, with the authoritative source article in XML, perhaps in the NLM DTD (that seems to be the one people are converging towards, and it's the one adopted by a couple of long-term archiving platforms). I guess the workflow should be author-source -> XML -> XHTML plus PDF, where author-source is most likely to be MS Word or LaTeX.
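To make the XML -> XHTML step of that workflow a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how OJS or any real publishing pipeline does it: the sample article uses a handful of element names loosely modelled on the NLM DTD (heavily simplified, and not a valid NLM document), and the function name xml_to_xhtml is my own invention for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified article; element names are loosely
# modelled on the NLM DTD but this is NOT a valid NLM document.
SOURCE_XML = """<article>
  <front><article-title>Sample article</article-title></front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
  </body>
</article>"""


def xml_to_xhtml(source: str) -> str:
    """Derive a bare-bones XHTML rendition from the XML source article."""
    article = ET.fromstring(source)

    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(html, "head")
    title_text = article.findtext("front/article-title", default="Untitled")
    ET.SubElement(head, "title").text = title_text

    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title_text

    # Each <sec> becomes an <h2> heading followed by its paragraphs.
    for sec in article.findall("body/sec"):
        heading = sec.findtext("title")
        if heading:
            ET.SubElement(body, "h2").text = heading
        for para in sec.findall("p"):
            # Flatten any inline markup to plain text for this sketch.
            ET.SubElement(body, "p").text = "".join(para.itertext())

    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(xml_to_xhtml(SOURCE_XML))
```

The point of the design is simply that the XML stays the single authoritative source and the XHTML is regenerated from it; a PDF rendition would be derived from the same XML by a separate step, which is not shown here.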

But I'm STILL looking for more concrete ideas on how we should co-present data with our articles!

[Update: Peter Sefton pointed out to me in a comment that I had wrongly attributed a quote to him (and, by extension, had misattributed all the other quotes too), because the names appear below rather than above the comments in Andy's article. My apologies for such a basic error, which also explains why I had such difficulty finding the blog that Peter's actual comment mentions: I was looking in someone else's blog! I have corrected the names above.

In fact Peter's blog entry is very interesting; he mentions the ICE-RS project, which aims to provide a workflow that will generate both PDF and HTML, and also bemoans how inhospitable most repository software is to HTML. He writes:
"It would help for the Open Access community and repository software publishers to help drive the adoption of HTML by making OA repositories first-class web citizens. Why isn't it easy to put HTML into Eprints, DSpace, VITAL and Fez?

"To do our bit, we're planning to integrate ICE with Eprints, DSpace and Fedora later this year building on the outcomes from the SWORD project – when that's done I'll update my papers in the USQ repository, over the Atom Publishing Protocol interface that SWORD is developing."
So thanks again Peter for bringing this basic error to my attention, apologies to you and others I originally mis-quoted, and I look forward to the results of your efforts! End Update]

1 comment:

  1. You have wrongly attributed a quote to me. The names are UNDER the comments.

    My comment was mainly a link to where I blogged about how to get HTML and PDF.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.