Monday, 6 April 2009

Libraries of the Future: SourceForge as Repository?

In his talk (which he pre-announced on his resumed blog), Peter Murray-Rust (PMR) suggested, as he has done previously, that we might like to think of SourceForge as an alternative model for a scholarly repository (“Sourceforge. A true repository where I store all my code, versioned, preserved, sharable”). Stimulated by PMR’s earlier remarks, I had already put forward a related suggestion on the JISC-Repositories email list, where it got a bit of consideration. This time the reaction, especially via the Twitter #lotf09 feed, was quite a bit stronger. Reactions fell into two groups. The first held that SourceForge was setting our sights way too low: it’s ugly, cluttered by adverts, and SLOOOOOWWW (I can’t find the actual quote in my Twitter search; maybe it was on Second Life). The second held that the use cases are different, eg @lescarr: “Repositories will look like sourceforge when researchers look like developers” (it might be worth noting that PMR is a developer as well as a researcher).

Through my JISC-Repositories explorations, I was already beginning to be convinced by the “different use-case” argument, although it was not articulated that way. SourceForge acts as a platform providing a range of services and utilities supporting the development of software by disparate teams of developers. Is there a parallel use case in the repositories area?

The closest I could come was some of my suggestions from my Research Repository System posts, for moving the repository upstream in the researcher’s workflow, and specifically helping authors write multi-author articles. This is a reasonable parallel in several ways, but there are some significant differences.

Software source files are simple text files, completely independent of whatever system is used to edit them. The source of a software system comprises many individual software modules, each of which is usually in a separate file and can be edited independently. There are strong file name conventions, and complex “recipe” systems (eg makefiles) for converting source into executable programs, combining it with software libraries and system-wide variables. If one developer changes one module, the system “knows” how to rebuild itself.
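To make that last point a little more concrete, here is a minimal sketch in Python of how a “recipe” system can work out what to rebuild when one module changes. It is an illustration only, not how make itself is implemented: the file names, the DEPS table and the function names are all invented for the example, and it assumes the dependency graph has no cycles.

```python
from typing import Dict, List

# Hypothetical recipe table: each target lists the files it is built from.
# These names are invented for illustration; they come from no real project.
DEPS: Dict[str, List[str]] = {
    "parser.o": ["parser.c"],
    "main.o": ["main.c"],
    "app": ["parser.o", "main.o"],
}


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """List every file so that prerequisites come before what depends on them.

    Assumes the graph is acyclic; a cycle would recurse forever.
    """
    order: List[str] = []
    seen = set()

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        for prereq in deps.get(node, []):
            visit(prereq)
        order.append(node)

    for target in deps:
        visit(target)
    return order


def targets_to_rebuild(deps: Dict[str, List[str]], changed: str) -> List[str]:
    """Return the targets affected by editing `changed`, in build order."""
    dirty = {changed}
    rebuild: List[str] = []
    for node in topological_order(deps):
        # Only entries with a recipe (keys of deps) can be rebuilt;
        # plain source files are skipped.
        if node in deps and any(p in dirty for p in deps[node]):
            dirty.add(node)
            rebuild.append(node)
    return rebuild


if __name__ == "__main__":
    print(targets_to_rebuild(DEPS, "parser.c"))
```

Editing parser.c marks parser.o and then app for rebuilding, while main.o is left alone. That selective behaviour depends on the source being split into independent files with declared dependencies, which is exactly what a typical journal article lacks.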

Journal articles, by contrast, are mostly written using proprietary (sometimes open source) office systems which create and edit highly complex files whose formats are closely linked to the application. Apart from some graphic and tabular elements, an article will usually be one file, and if not, there is no convention on how it might be composed. Workflow for multi-author articles is entirely dependent on the lead author, who will usually work with colleagues to assign sections for authoring, gather these contributions, assemble and edit them into a coherent story, and circulate the result repeatedly for comments and suggested improvements (sometimes using tracked change features).

Using LaTeX and include files, it might be possible to re-create a similar approach to that used in software, but for the human reader, the absence of a single editorial “voice” would be very apparent. And anyway, most of us just don’t want to work that way. I got out of the habit of writing in LaTeX around 20 years ago, when word processors began to arrive on workable desktop computers (Word on the early Macs), and I’d prefer not to go back.

Oh, and while it’s a “category mistake” to confuse SourceForge with any particular version control system, it looks to me (and to at least one respondent on JISC-Repositories) as if distributed systems like git are fitter for purpose (in any kind of distributed authoring system) than their more centralised counterparts such as CVS and SVN.

Well, is PMR’s suggestion of SourceForge as a model for scholarly repositories entirely a dead duck? I think it probably is, but I do hope that those who design repository systems, and indeed virtual research environments, think hard about the capabilities of such systems when working through their responses to user requirements, particularly where those requirements relate to what researchers and authors need, rather than their institutions, libraries or readers.

1 comment:

  1. Heh, my experience with LaTeX was the exact opposite: I moved from word processors to LaTeX about 10 years ago, and I'd prefer not to go back ☺, especially to the bloated behemoth that is Word. I much prefer LaTeX for its predictability and lack of weirdness.

    That aside, I do agree that the workflow model for multi-author documents is quite different from that for software. Source code just doesn't have the same level of interdependence ("coupling" would be the correct software engineering term) that any sort of written document does, whether an article or a book. Indeed, proper software development methods should discourage this, whereas with writing you really want to encourage the exact opposite.

    I think SourceForge could perhaps provide a basic framework, but the tools would be completely different.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence.