Of All The Things We've Made

Posted on 26th August 2013

Several years ago, we frequently updated the Birmingham.pm website with book reviews. To begin with, updating all the book information was rather laborious. Thankfully, CPAN had a set of modules, written by Andrew Schamp, that provided a framework for searching online resources. I then wrote drivers for Amazon, O'Reilly & Associates, Pearson Education and Yahoo!. As the books we were reviewing were technical books, these four sources were able to cover all the books we reviewed.

A few years ago, I started working for a book company. In one project, we needed to evaluate book data, particularly for books where we had no data or very little. Often these were imports or out of stock titles that we could still order, but for which we lacked information. As such, I created a number of further drivers, particularly for non-UK online catalogues, to help retrieve this information. I managed to create a collection of 17 drivers and 1 bundle, all available on CPAN.

Via my CPAN Testers work, I've been promoting the CPAN::Changes Kwalitee Service website. Neil Bowers read one of the posts, and thought it would be good to improve the Changes files in his distributions, by way of QuestHub. I'd not heard of this site before, but after reading Neil's post I joined up, as I had been looking for a suitable way to keep a TODO list of my Perl work for a while. Neil had created a stencil to standardise the Changes file in 5 distributions, but unfortunately, I only had a few distributions of my own to complete. Another stencil emerged to add License and Repository information to 5 CPAN distributions. Again, I'd completed this for most of my distributions, apart from my 18 ISBN distributions, which I'd never got around to creating repositories for.

Then Neil had the idea to look at some of the quality aspects of all the CPAN distributions, and highlight those that might need adoption. As part of his reviews of similar modules over the past few years, he's adopted several modules, and was looking at what others he could help with. The results included 2 of the modules written by Andrew Schamp, which formed part of the ISBN searching framework I used for my ISBN distributions. Seeing as they hadn't been touched in eight years, I suspected that Andrew had moved on to other languages or work. So I contacted him to see whether he was interested in letting me take the modules on and update them.

It turns out that Andrew had written the modules for a college project. Since he has moved on to C, and his programming interests now have nothing to do with books, he was happy to hand over the keys to the modules. Over the past week, I have taken ownership of Andrew's 5 modules, added these and my own 18 ISBN distributions to my local git repository, added all 23 to GitHub, updated the Changes files and the License & Repository info in the 5 new modules, and released them all to CPAN. My next task is to update the Repository info in my 18 ISBN distributions and release these to CPAN.

Although I don't work in the book industry anymore, writing these search drivers has been fun. The distributions are perhaps my most frequent releases to CPAN, because the various websites keep updating their sites. Now that I have access to the core modules in the framework, I plan to move some of the code that is repeated across many of the drivers into the core modules. I also plan to merge the three main modules into one distribution. When Andrew originally wrote the modules, it wasn't uncommon to have 1 module per distribution. However, as all three are tightly bound together, it doesn't make much sense to keep them separate. The two drivers Andrew wrote have not worked for several years, as unsurprisingly the websites have changed in the last 8 years. I've already updated one, and will be working on the other soon.

It's nice to realise that a few of my CPAN Testers summary posts inspired Neil, who in turn has inspired me, and has ended up with me helping to keep a small corner of CPAN relevant and up to date again.

If you're a new Perl developer who wants to take a more active role in CPAN and the Perl community, a great way to start is to look at the stencils on QuestHub, and help to patch distributions and submit pull requests or RT tickets to update them. If you feel adventurous, take a look at the possible adoption list, and see whether anything there is something you'd like to fix and bring up to date. You can also look at the failing distributions lists, and see whether the authors would like help with the test suites in their distributions. You can then create your tasks as quests in QuestHub and earn points for your endeavours. Be warned though, it can become addictive :)

There is one more ISBN distribution on the adoption list, and I have now emailed the author. Depending on the response, I may be going through the adoption process all over again :) [Late update, the author came back to me and he's happy for me to take on his distribution too]

File Under: isbn / opensource / perl

Behind The Lines

Posted on 25th May 2011

Last year I got a curious email from a fellow London.pm'er asking why I was releasing so many WWW-Scraper-ISBN distributions. The reason was quite simple: to make my life easier! Well okay, that's why I wrote the distributions, but I figured others might find them useful too.

In the UK the book trade is a bit odd, and I dare say the rest of the world suffers from this too. The publishers don't like to give too much information away about their books, and the central body for allocating ISBNs, Nielsen, don't always have all the necessary metadata available. The book trade uses MARC Records to transfer this metadata around, and unfortunately, while there is provision to include much of the metadata, it often isn't included. The obvious things such as the Author, Title and the ISBN itself are usually there, but some of the data relating to the physical attributes (pages, height, width and weight) rarely are.

Originally I wrote the Amazon, Pearson Education, O'Reilly Media and Yahoo! Books distributions to use within Labyrinth, particularly for the Birmingham Perl Mongers website, and our Book Reviews. The plugin mechanism allowed me, when I received a review, to enter the ISBN and prepopulate the metadata fields and links before adding the review itself. The four distributions saved a lot of time, but the initial releases were quite basic.

Jumping forward several years, now needing this extra metadata, I first expanded the original four distributions. However, not all of these online bookstores provided this extra metadata. Picking a variety of books I searched to see what metadata I could retrieve, and came across several sites around the world that included this information to varying degrees. Much of the basic information regarding an ISBN shouldn't change from country to country, so metadata retrieved from Australia or New Zealand is as valid as that from America or the UK. There are aspects that can differ, such as the cover illustration, but the majority of metadata returned should be applicable regardless of location.
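To illustrate the idea of filling gaps from several sources, here is a minimal sketch in plain Perl (5.10 or later, for the `//=` operator). It is not code from the distributions themselves: the `merge_records` helper, the field names and the sample values are all assumptions made up for this example. It simply keeps the first non-empty value seen for each field, so the order in which the sources are passed decides which one wins.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Scraper::ISBN): combine several
# metadata hashes for the same ISBN, keeping the first defined, non-empty
# value found for each field.
sub merge_records {
    my @records = @_;
    my %merged;
    for my $record (@records) {
        for my $field ( keys %$record ) {
            my $value = $record->{$field};
            next unless defined $value && length $value;
            $merged{$field} //= $value;    # earlier sources take priority
        }
    }
    return \%merged;
}

# Made-up sample data: the first source has no physical attributes,
# the second source supplies them.
my $uk_source = { title => 'Example Book', author => 'A. N. Author', pages => '' };
my $nz_source = { title => 'Example Book', pages => 320, weight => 540 };

my $book = merge_records( $uk_source, $nz_source );
print "$_: $book->{$_}\n" for sort keys %$book;
```

Whether "first source wins" is the right rule depends on how far you trust each site; for something like the cover illustration, which can differ by country, you may want to prefer a local source instead.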

There were some interesting discrepancies in the units of weights and measures used across the sites too. While some stuck to a fixed set of units, others changed depending on how big the values were, particularly between grammes and kilogrammes. I settled on grammes for weight and millimetres for height and width, seeing as metric was the most commonly used on the various sites.
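As a rough illustration of that normalisation, the sketch below converts a scraped measurement string into a whole number of grammes or millimetres. It is only a sketch under my own assumptions, not the code used in the drivers: the helper names, the unit tables and the input formats are invented for this example, and real sites will need their own patterns.

```perl
use strict;
use warnings;

# Assumed conversion tables; the set of units is illustrative only.
my %GRAMMES_PER = ( g => 1, kg => 1000 );
my %MM_PER      = ( mm => 1, cm => 10, in => 25.4 );

# Hypothetical helper: pull "<number> <unit>" out of a string and convert it
# using the given table, rounding to the nearest whole unit. Returns undef
# (in scalar context) when the text cannot be parsed or the unit is unknown.
sub _normalise {
    my ( $text, $factor_for ) = @_;
    return unless defined $text;
    my ( $value, $unit ) = $text =~ /(\d+(?:\.\d+)?)\s*([a-z]+)/i
        or return;
    my $factor = $factor_for->{ lc $unit } or return;
    return int( $value * $factor + 0.5 );
}

sub weight_in_grammes { return _normalise( $_[0], \%GRAMMES_PER ) }
sub length_in_mm      { return _normalise( $_[0], \%MM_PER ) }

print weight_in_grammes('1.2 kg'), "\n";    # 1200
print weight_in_grammes('350g'),   "\n";    # 350
print length_in_mm('23.4 cm'),     "\n";    # 234
print length_in_mm('198 mm'),      "\n";    # 198
```

Keeping the conversion in one small function means each driver only has to find the right piece of text on the page, whatever unit the site happens to display it in.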

It did cross my mind to include prices in the metadata returned, but as prices fluctuate frequently and are very location dependent, you are probably better off writing this side of things yourself for your specific purpose, such as a comparison website. I also left out depth, as only a few sites regularly provided a value for it. I can always save it for a future release anyway.

Hopefully those that work in the book trade, who have been wishing that MARC Records were populated a little more fully than they are currently, can make use of these distributions to help fill in the gaps.

File Under: book / isbn / opensource / perl

Some Rights Reserved. Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a Creative Commons by Attribution Non-Commercial License. If you wish to use material for commercial purposes, please contact me for further assistance regarding commercial licensing.