
Tuesday, 20 January 2009

Load testing repositories

One of the issues that has worried me about moving from repositories of e-prints to repositories of data is the increased challenge of scale. Scale could be vastly different for data repositories in several dimensions, including:
  • rate of deposit
  • numbers of objects
  • size of objects
  • rate of access
  • rate of change to existing objects...
Now Stuart Lewis is reporting on the first stage of the JISC-funded ROAD project, where they have load-tested a DSpace implementation (on a fairly chunky configuration), loading 300,000 digital objects of 9 MB each.

Stuart reports
  • "As expected, the more items that were in the repository, the longer an average deposit took to complete.
  • On average deposits into an empty repository took about one and a half seconds
  • On average deposits into a repository with three hundred thousand items took about seven seconds
  • If this linear-looking relationship between number of deposits and speed of deposit were to continue at the same rate, an average deposit into a repository containing one million items would take about 19 to 20 seconds.
  • Extrapolate this to work out throughput per day, and that is about 10MB deposited every 20 seconds, 30MB per minute, or 43GB of data per day.
  • The ROAD project proposal suggested we wanted to deposit about 2GB of data per day, which is therefore easily possible.
  • If we extrapolate this further, then DSpace could theoretically hold 4 to 5 million items, and still accept 2GB of data per day deposited via SWORD."
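
To see how that arithmetic hangs together, here is a minimal sketch (mine, not the ROAD project's) of the linear model implied by Stuart's figures; the assumption that deposit time keeps growing linearly well past 300,000 items is exactly that, an assumption:

    # Sketch of the linear extrapolation, using the figures quoted above.
    # Assumes deposit time grows linearly with repository size, from
    # ~1.5 s into an empty repository to ~7 s at 300,000 items.

    T_EMPTY = 1.5                          # seconds per deposit, empty repository
    T_300K = 7.0                           # seconds per deposit at 300,000 items
    SLOPE = (T_300K - T_EMPTY) / 300_000   # extra seconds per existing item
    ITEM_SIZE_MB = 9                       # size of each test object

    def deposit_time(items):
        """Predicted seconds per deposit for a repository of this size."""
        return T_EMPTY + SLOPE * items

    def daily_throughput_gb(items):
        """GB/day if deposits run back to back at the predicted rate."""
        return (86_400 / deposit_time(items)) * ITEM_SIZE_MB / 1024

    print(deposit_time(1_000_000))         # ~19.8 s, matching the 19-20 s estimate
    print(daily_throughput_gb(1_000_000))  # ~38 GB/day (~43 with the post's rounding)
    print(daily_throughput_gb(5_000_000))  # ~8 GB/day, still well above 2 GB/day

On this (naive) linear model, throughput only drops below the 2GB/day target somewhere past 20 million items, so the headroom Stuart describes looks real.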
They plan to repeat these tests on the EPrints and Fedora platforms.

Like all such exercises, it's an artificial test, but it does give encouragement that DSpace could scale to handle a data repository for some tasks. I don't know whether other issues would be show-stoppers for something like a lab repository, but most of the scale issues look manageable.

Saturday, 19 July 2008

Load testing repository platforms

Very interesting post on Stuart Lewis's blog:
"One of the core aims of the ROAD project is to load test DSpace, EPrints and Fedora repositories to see how they scale when it comes to using them as repositories to archive large amounts of data (in the form of experimental results and metadata). According to ROAR, the largest repositories (housing open access materials) based on these platforms are 191,510, 59,715 and 85,982 respectively (as of 18th July 208). We want to push them further and see how they fare."
Stuart goes on to mention a paper, "Testing the Scalability of a DSpace-based Archive", which pushed a test repository up to 1 million items with "no particular issues"! They have their chunky test hardware in place, and plan to run the stress test via SWORD, so that they can throw the same suite of data at each repository.
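
For anyone curious, here is a rough sketch of what such a SWORD-driven test loop might look like; the deposit URL, credentials and packaging identifier are placeholders of mine, not the ROAD project's actual setup:

    # Rough sketch of a SWORD-style load test: repeatedly POST the same
    # package to a repository's deposit endpoint and time each deposit.
    # The URL, credentials and packaging identifier below are placeholders.

    import time
    import requests  # third-party HTTP client

    DEPOSIT_URL = "http://repo.example.org/sword/deposit/test-collection"  # hypothetical
    PACKAGE = "test-item.zip"  # the ~9 MB test object, packaged once and reused

    def deposit_once(session):
        """POST one package via SWORD and return the elapsed seconds."""
        with open(PACKAGE, "rb") as f:
            start = time.time()
            response = session.post(
                DEPOSIT_URL,
                data=f,
                headers={
                    "Content-Type": "application/zip",
                    "Content-Disposition": "filename=test-item.zip",
                    # SWORD 1.x packaging identifier; varies by platform
                    "X-Packaging": "http://purl.org/net/sword-types/METSDSpaceSIP",
                },
            )
        response.raise_for_status()
        return time.time() - start

    session = requests.Session()
    session.auth = ("depositor", "secret")  # placeholder credentials

    timings = [deposit_once(session) for _ in range(100)]
    print("mean deposit time: %.2f s" % (sum(timings) / len(timings)))

Driving every platform through the same SWORD interface is what makes the comparison fair: the identical packages and the identical deposit protocol, with only the repository behind it changing.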

He says
"More details will be blogged once we start getting some useful comparative data, however seeing as the report cited above took about 10 days to deposit 1 million items, it may be some weeks before we’re able to report data from meaningful tests on each platform."
This is an extremely worthwhile exercise, particularly for those thinking of using repository software platforms for data, where the numbers of objects can very rapidly mount up.