Lullaby of London

Posted on 21st December 2013

The 2013 London Perl Workshop Conference Survey results are now online.

Although percentage wise the submissions are up, the actual number of respondents are just slightly lower than previous years. Though it has to be said I'm still pleased to get roughly a third of attendees submitting survey responses. It might not give a completely accurate picture of the event, but hopefully we still get a decent flavour of it.

Two questions, which I plan to pay closer attention to in future surveys are; 'How do you rate your Perl knowledge?' and 'How long have you been programming in Perl?' Originally the age question usually gave some indication of how long someone had been using Perl, but from experience, I now know that doesn't work. As such, these two questions hopefully give us a better idea of the level of knowledge and experience of attendees. Perhaps unsurprisingly London.pm had a lot of attendees who have been around the Perl community for many years, particularly as it was the first non-US Perl Monger group. However, we do still see a notable number of people who are relatively new to Perl. It will be interesting to see whether these numbers change over the years, as although the community doesn't appear to be growing radically, it is still attracting first-time attendees.

Looking at the list of suggested topics, I was intrigued to see "Testing" in there. Apart from my own talk and Daniel Perrett's, there wasn't anything specifically about testing. I don't know if its because the older hands are more weary of giving test talks, or whether everyone thinks everything has been said, but I do think it's a topic that worth repeating. We regularly have new attendees who have never seen these talks before, so hopefully we'll see some more submitted at future workshops and YAPCs. There was also a lot of interest in practical uses of web frameworks. Although Andrew Solomon held a Dancer tutorial, seeing how to solve specific problems with web applications would be valuable to many. Having said that, the diverse range of subjects that was on offer at the workshop, was equally as interesting. I just hope Mark and Ian are so inundated with talks next year, we have an even greater choice from the schedule.

Thank you to Mark and Ian from organising another great Perl event, and thanks to all the speakers for making it worth attending. Also to all the attendees, especially those who took the time to respond to the survey, and for all the talk evaluations. I know the speakers appreciate the evaluations, as I've had a few thank yous already :)

Enjoy the results.

File Under: community / london / opensource / survey / workshop
NO COMMENTS


Even Flow

Posted on 8th December 2013

The following is part of an occasional series of highlighting CPAN modules/distributions and why I use them. This article looks at Data::FlexSerializer.

Many years ago the most popular module for persistent data storage in Perl was Storable. While still used, it's limitations have often cause problems. It's most significant problem was that each version was incompatible with another. Upgrading had to be done carefully. The data store was often unportable, and made making backups problematic. In more recent years JSON has grown to be more acceptable as a data storage format. It benefits from being a compact data structure format, and human readable, and was specifically a reaction to  XML, which requires lots of boilerplate and data tags to form simple data elements. It's one reason why most modern websites use JSON for AJAX calls rather than XML.

Booking.com had a desire to move away from Storable and initially looked to moving to JSON. However, since then they have designed their own data format, Sereal. But more of that later. Firstly they needed some formatting code to read their old Storable data, and translate into JSON. The next stage was to compress the JSON. Although JSON is already a compact data format, it is still plain text. Compressing a single data structure can reduce the storage by as much as half the original data size, which when you're dealing with millions of data items can be considerable. In Booking.com's case they needed to do this with zero downtime, running the conversion on live data as it was being used. The resulting code was to later become the basis for Data::FlexSerializer.

However, for Booking.com they found JSON to be unsuitable for their needs, as they were unable to store Perl data structures they way they wanted to. As such they created a new storage format, which they called Searal. You can read more about the thoughts behind the creation of Sereal on the Booking.com blog. That blog post also looks at the performance and sizes of the different formats, and if you're looking for a suitable serialisation format, Sereal is very definitely worth investigating.

Moving back to my needs, I had become interested in the work Booking.com had done, as within the world of CPAN Testers, we store the reports in JSON format. With over 32 million reports at the time (now over 37 million), the database table had grown to over 500GB. The old server was fast running out of disk space, and before exploring options for increasing storage capacity, I wanted to try and see whether there was an option to reduce the size of the JSON data structures. Data::FlexSerializer was an obvious choice. It could read uncompressed JSON and return compressed JSON in milliseconds.

So how easy was it to convert all 32 million reports? Below is essentially the code that did the work:

  my $serializer = Data::FlexSerializer->new( detect_compression => 1 );

    for my $next ( $options{from} .. $options{to} ) {
        my @rows = $dbx->GetQuery('hash','GetReport',$next);
        return  unless(@rows);

        my ($data,$json);
        eval {
            $json = $serializer->deserialize($rows[0]->{report});
            $data = $serializer->serialize($json);
        };

        next  if($@ || !$data);

        $dbx->DoQuery('UpdateReport',$data,$rows[0]->{id});
    }

Simple, straighforward and got the job done very efficiently. The only downside was the database calls. As the old server was maxed out on I/O, I could only run the script to convert during quiet periods as the CPAN Testers server would become unresponsive. This wasn't a fault of Data::FlexSerializer, but very much a problem with our old server.

Before the conversion script completed, the next step was to add functionality to permanently store reports in a compressed format. This only required 3 extra lines being added to CPAN::Testers::Data::Generator.

  use Data::FlexSerializer;

    $self->{serializer} = Data::FlexSerializer->new( detect_compression => 1 );

    my $data = $self->{serializer}->serialize($json);

The difference has been well worth the move. The compressed version of the table has reclaimed around 250GB. Because MySQL doesn't automatical free the data back to the system, you need to run the optimize command on a table. Unfortunately, for CPAN Testers this wouldn't be practical as it would mean locking the database for far too long. Also with the rapid growth of CPAN Testers (we now receive over 1 million reports a month) it is likely we'll be back up to 500GB in a couple of years anyway. Now that we've moved to a new server, our backend hard disk is 3TB, so has plenty of storage capacity for several years to come.

But I've only scratched the surface of why I think Data::FlexSerializer is so good. Aside from its ability to compress and uncompress, as well as encode and decode, at speed, it is ability to switch between formats is what makes it such a versatile tool to have around. Aside from Storable, JSON and Sereal, you can also create your own serialisation interface, using the add_format method. Below is an example, from the module's own documentation, which implements Data::Dumper as a serialsation format:

    Data::FlexSerializer->add_format(
        data_dumper => {
            serialize   => sub { shift; goto \&Data::Dumper::Dumper },
            deserialize => sub { shift; my $VAR1; eval "$_[0]" },
            detect      => sub { $_[1] =~ /\$[\w]+\s*=/ },
        }
    );
    
    my $flex_to_dd = Data::FlexSerializer->new(
      detect_data_dumper => 1,
      output_format => 'data_dumper',
    );

It's unlikely CPAN Testers will move from JSON to Sereal (or any other format), but if we did, Data::FlexSerializer would be only tool I would need to look to. My thanks to Booking.com for releasing the code, and thanks to the authors; Steffen Mueller, Ævar Arnfjörð Bjarmason, Burak Gürsoy, Elizabeth Matthijsen, Caio Romão Costa Nascimento and Jonas Galhordas Duarte Alves, for creating the code behind the module in the first place.

File Under: database / modules / opensource / perl
2 COMMENTS


Of All The Things We've Made

Posted on 26th August 2013

Several years ago, we frequently updated the Birmingham.pm website with book reviews. To begin with, updating all the book information was rather labourious. Thankfully, on CPAN there was a set of modules that had been written by Andrew Schamp, that provided the framework to search online resources. I then wrote drivers for Amazon, O'Reilly & Associates, Pearson Education and Yahoo!. As the books we were reviewing were technical books, these four sources were able to cover all the books we reviewed.

A few years ago, I started working for a book company. In one project, we needed to evaluate book data, particularly for books where we had no data or very little. Often these were imports or out of stock titles that we could still order, but we were lacking information about. As such I created a number of further drivers, particularly for non-UK online catalogues, to help retrieve this information. I managed to create a collection of 17 drivers, and 1 bundle, all available on CPAN.

Via my CPAN Testers work, I've been promoting the CPAN::Changes Kwalitee Service website. Neil Bowers read one of the posts, and thought it would be good to improve the Changes files in his distributions, by way of QuestHub. I'd not heard of this site before, but after reading Neil's post I joined up, as I had been looking for a suitable way to keep a TODO list of my Perl work for a while. Neil had created a stencil to standardise the Changes file in 5 distributions, but unfortunately, I only had a few distributions of my own to complete. Another stencil emerged to add License and Repository information to 5 CPAN distributions. Again, I'd completed this for most of my distributions, apart from my 18 ISBN distributions, which I'd never got around to creating repositories for.

Then Neil had the idea to look at some of the quality aspects of all the CPAN distributions, and highlight those that might need adoption. As part of his reviews of similar modules over the past few years, he's adopted several modules, and was looking at what others he could help with. The results included 2 of the modules written by Andrew Schamp, which formed part of the ISBN searching framework I used for my ISBN distributions. Seeing as they hadn't been touched in eight years, I suspected that Andrew had moved on to other languages or work. So I contacted him to see whether he was interested in letting me take the modules on and update them.

It turns out that Andrew had written the modules for a college project, and since moving to C and with his programming interests now nothing to do with books, he was happy to hand over the keys to the modules. Over the past week, I have now taken ownership of Andrew's 5 modules, added these and my own 18 ISBN distributions to my local git repository, added all 23 to GitHub, updated the Changes file, and License & Repository info to the 5 new modules and released them all to CPAN. My next task is to update the Repository info in my 18 ISBN distributions and release these to CPAN.

Although I don't work in the book industry anymore, writing these search drivers has been fun. The distributions are perhaps my most frequently releases to CPAN, due to the various websites updating their sites. Now that I have access to the core modules in the framework, I plan to move some of the repeated code across many of the drivers into the core modules. I also plan to merge the three main modules into one distribution. When Andrew originally wrote the modules, it wasn't uncommon to have 1 module per distribution. However, as all three are tightly bound together, it doesn't make much sense to keep them separate. The two drivers Andrew wrote have not worked for several years, as unsurprisingly the websites have changed in the last 8 years. I've already updated one, and will be working on the other soon.

It's nice to realise that a few of my CPAN Testers summary posts inspired Neil, who in turn has inspired me, and has ended up with me helping to keep a small corner of CPAN relevant and up to date again.

If you're a new Perl developer, who wants to take a more active role in CPAN and the Perl community, a great way to start is to look at the stencils on QuestHub, and help to patch and submit pull/RT requests to update distributions. If you feel adventurous, take a look at the possible adoption list, and see whether anything there is something you'd like to fix and bring up to date. You can also look at the failing distributions lists, and see whether the authors would like help with the test suites in their distributions. You can then create your tasks as quests in QuestHub and earn points for your endeavours. Be warned though, it can become addictive :)

There is one more ISBN distribution on the adoption list, and I have now emailed the author. Depending on the response, I may be going through the adoption process all over again :) [Late update, the author came back to me and he's happy for me to take on his distribution too]

File Under: isbn / opensource / perl
NO COMMENTS


Lost In The Echo

Posted on 26th August 2012

I've just released new versions of my use.perl distributions, WWW-UsePerl-Journal and WWW-UsePerl-Journal-Thread. As use.perl became decommisioned at the end of 2010, the distrubutions had been getting a lot of failure reports, as they used screen-scraping to get the content. As such, I had planned to put them out to pasture in BackPAN. That was until recently I discovered that Léon Brocard had not only released WWW-UsePerl-Server, but also provided a complete SQL archive of the use.perl database (see the POD for a link). Then combining the two, he put up a read-only version of the website.

While at YAPC::Europe this last week, I started tinkering, and fixing the URLs, regexes, logic and tests in my two distributions. Both distributions have had functionality removed, as the read-only site doesn't provide all the same features as the old dynamic site. The most obvious is that posting new journal entries is now disabled, but other lesser features not available are searching for comments based on thread id or users based on the user id. The majority of the main features are still there, and those that aren't I've used alternative methods to retrieve them where possible.

Although the distributions and modules are now working again, they're not perhaps as useful as they once were. As such, I will be looking to merge both distributions for a future release, and also providing support to a local database of the full archive from Léon.

Seeing as no-one else seems to have stepped forward and written similar modules for blogs.perl, I'm now thinking it might also be useful to take my use.perl modules and adapt them for blogs.perl. It might be a while before I finish them, but it'll be nice to have the ability to have many of the same features. I also note that blogs.perl.org also now has paging. Yeah \o/ :) This has been a feature that I have been wanting to see on the site since it started, so thanks to the guys for finding some tuits. There was a call at YAPC::Europe for people to help add even more functionality, so I look forward to seeing what delights we have in store next.

File Under: opensource / perl / website
NO COMMENTS


Young Parisians

Posted on 10th April 2012

Did I mention I went to Paris to take part in the 2012 QA Hackathon? Did I remember to mention all the cool stuff I got done? Well if you've been hiding for the past few weeks, have a look at the last couple of posts :)

As per usual, while there I took my camera along. However, unlike many previous visits to Paris, I didn't do any sight-seeing. And that includes failing to wander around the venue we were in and discovering the real submarine among other things, that others found while taking a breath of fresh air.

Instead I spent my time hacking away, and only occasionly coming up for air for food, drink and some camera action.

With over 40 people in attendance, it was going to be difficult to capture everyone, but I think I managed it. If I did miss you, my apologies. It was great to meet so many friends old and new, and a real pleasure to finally put faces to names that I've known for a while, but not had the opportunity to meet in person.

So many great things happened in Paris, and I'm really looking forward to see what we can achieve in London for the 2013 QA Hackathon. See you there.

File Under: community / hackathon / opensource / paris / photography / qa / testing
NO COMMENTS


<< Page 1 Page 3 >>

Some Rights Reserved Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a Creative Commons by Attribution Non-Commercial License. If you wish to use material for commercial puposes, please contact me for further assistance regarding commercial licensing.