Counting Out Time

Posted on 20th March 2014

I had an SQL query I wanted to translate into a DBIx::Class statement. I knew there must be a way, but trying to find the answer took some time. As a result I though it worth sharing in the event somebody else might be trying to find a similar answer.

The SQL I was trying to convert was:

SELECT status,count(*) AS mailboxes,
count(distinct username) AS customers
FROM mailbox_password_email GROUP BY status

The result I got running this by hand gave me:

+-----------+-----------+-----------+
| status    | mailboxes | customers |
+-----------+-----------+-----------+
| active    |     92508 |     48791 |
| completed |       201 |       174 |
| inactive  |    116501 |     56843 |
| locked    |    129344 |     61220 |
| pending   |      1004 |       633 |
+-----------+-----------+-----------+

My first attempt was:

my @rows = $schema->resultset('Mailboxes')->search({},
    {
        group_by => 'status',
        distinct => 1,
        '+select' => [
            { count => 'id', -as => 'mailboxes' },
            { count => 'username', -as => 'customers' } ]
    })->all;

Unfortunately this gave me the following error:

DBIx::Class::ResultSet::all(): Useless use of distinct on a grouped 
resultset ('distinct' is ignored when a 'group_by' is present) at 
myscript.pl line 469

So I took the 'distinct  => 1' out and got the following results:

+-----------+-----------+-----------+
| status    | mailboxes | customers |
+-----------+-----------+-----------+
| active    |     92508 |     92508 |
| completed |       201 |       201 |
| inactive  |    116501 |    116501 |
| locked    |    129344 |    129344 |
| pending   |      1004 |      1004 |
+-----------+-----------+-----------+

Which might be distinct for the mailboxes, but is not sadly distinct for customers. So I try:

my @rows = $schema->resultset('Mailboxes')->search({},
{
group_by  => 'status',
      '+select' => [
            { count => 'id', -as => 'mailboxes' },
            { count => 'username', -as => 'customers', distinct  => 1 } ]
    })->all;

and get:

Failed to retrieve mailbox password email totals: 
DBIx::Class::ResultSet::all(): Malformed select argument - too many keys
 in hash: -as,count,distinct at myscript.pl line 469\n

After several attempts at Google, and reading the DBIx::Class::Manual, I finally stumbled on: SELECT COUNT(DISTINCT colname)

My query now looks like:

my @rows = $schema->resultset('Mailboxes')->search({},
    {
group_by  => 'status',
'+select' => [
{ count => 'id', -as => 'mailboxes' },
{ count => { distinct => 'username' }, -as => 'customers' } ] })->all;

And provides the following results:

+-----------+-----------+-----------+
| status    | mailboxes | customers |
+-----------+-----------+-----------+
| active    |     92508 |     48791 |
| completed |       201 |       174 |
| inactive  |    116501 |     56843 |
| locked    |    129344 |     61220 |
| pending   |      1004 |       633 |
+-----------+-----------+-----------+

Exactly what I was after.

DBIx::Class does require some head-scratching at times, but looking at the final statement it now seems obvious, and pretty much maps directly to my original SQL!

Hopefully, this provides a lesson others can find and learn from.

File Under: database / perl
NO COMMENTS


Rendez-Vous 6

Posted on 17th March 2014

My 2014 QA Hackathon

Day One

I arrived the previous day, as did most of us, and we naturally talked about coding projects. Not necessarily about work at the hackathon, but discussion did come around to that too. I talked with Tux at one point, who convinced me that a stand-alone smoker client would be really useful. Once upon a time, we did have this, but with the advent of the more sophisticated smokers, and the move to the Metabase transport layer, the old script never got updated. The following morning Tux sent me a copy of the script he has, so at some point over the next few months I will take a look to see what I do to make it compatible with the modern smokers.

My intention was to release a distribution each day of the Hackthon. Unfortunately this was scuppered on the first day, when trying to add support for the full JSON report from CPAN Testers, when I realised I don't store the full report in the database. In the future when we have MongoDB and replication set up, this will be a non-issue, but for the moment, I now need to store the full report. This now requires a change to the metabase database on the cpanstats server (as opposed to the Metabase server). Over the course of the hackthon I reviewed the changes needed, and updated a lot of the Generator code, as it was an ideal time to remove SQLite references too.

In looking into the code changes, Andreas and I again looked at the updated timestamp used by the various CPAN Testers sites to do statistical analysis, which was also causing us problems. In the Metabase, the CPAN Testers Report fact is the container for all the child facts, such as LegacyReport and TestSummary. When the facts are created by the tester, the 'creation' timestamp is used to reference the time on the tester's own server that the report was generated. This could be better stored as UTC, but that's a problem for another day. However, it does mean the timestamp could be different to the one on the Metabase server. When the Metabase server retrieves the report from the outside world, it updates the 'updated' timestamp across all facts and saves into the SimpleDB instance on the server. Except it doesn't. The 'updated' timestamp is always the same as the 'creation' timestamp. Andreas has been noting this for quite some time, and finally he convinced me, at which point we roped in David Golden to take a look. Reviewing the code, there is nothing wrong that we can see. The 'updated' timestamp should be updated with the current timestamp on the Metabase server, which should also cascade to each child fact. As such you would expect several reports to have a different 'creation' timestamp from that of the 'updated' timestamp, even if only different by a second. Sadly this is going to take more effort/time to debug, as David in particular is working several different aspects of QA here at the hackathon.

Towards the end of the day, I spoke with liz and Tobias (FROGGS) about how CPAN Testers might handle perl6 modules. Currently there is no client available, but there could be in the future. However, due to the way Perl6 modules are to be uploaded to CPAN it is possible that smokers may submit reports for perl6 only modules, as many ignore the path to the distribution. Right now, liz tells me that all perl6 modules are being release under the /perl6/ path inside the authors' directory. This makes things easier for CPAN Testers as we can initially ignore these test reports, as they will not be valid. However, in the longer term it will be interesting to have a CPAN Testers smoker client for Perl6. The main difference would be to record in the metadata that it's a perl6 only distribution, and we *should* be able to carry on as normal, submitting reports to the Metabase, etc. It may require some distributions to have a 'Did you mean the Perl 6 distribution?' link on the website, but for the most part I think we could handle this. It will require further work to define a CPAN Testers Perl 6 Fact, but it will be a nice addition to the family.

Day Two

The morning was spent visiting the Charteuse cellars, and enjoying a tasting session, before heading back to the hacking in the afternoon.

In the afternoon, I started to look at some of the statistics the CPAN Testers Statistic site generated. After some discussions with Neil Bowers, he was interested in the drop-off of report submissions when a distribution was released. I believed this to be fairly consistent, and found that it did indeed last roughly 8 days, with a tail off that could last for months or years. There was an initial blast of tests within the first few hours, thanks to Chris' and Andreas' smokers, but the rest of the more reliable smokers get submitted within those first 8 days. Neil has created some initial graphs, and I'm looking at ways to integrate those with the Reports site. How we display these will likely revolve around a specific selected version, as overlaying versions might be a bit too much ... we'll see.

It also led me to think about what time of day do testers submit reports. So, I'll be looking at creating some graphs to show submissions per month, per day of the week, and per hour of the day. Along with BooK, we discussed further metrics, although they look likely to be used within their CPAN Dashboard project, although some of the data can be provided by CPAN Testers APIs already, so little work need by me :)

Looking through aggregated data, as stored and indexed within the Statistics codebase, it was obvious some had were now incomplete. It seems some of the outages we had in the last few months, prevented the data storage files from being saved. As such, I started off a complete reindex. It meant the Statistics site was out of sync for the following day, but at least it meant we once had again had correct data to produce the graphs we wanted.

There was more work rewriting the Generator to store the report objects. Yves asked why I wasn't using Sereal sometime ago, when I posted about using Data::FlexSerializer, and at the time I didn't have a need to rework the code. However, seeing as I'm rewriting to store the perl object now, rather than just JSON, it does make sense to move to Sereal, so hopefully that will make Yves happy too ;)

Day Three

Continued work on the Generator to remove all SQLite references, and a few further clean ups. Also worked on adding the necessary support to allow perl6 reports to be ignored. At some point in the future we will accept perl6 reports, but following further discussion with Tobias, we'll handle this using the metadata in the report not on the path of the resource.

Salve interviewed me for a future post about CPAN Testers. It'll be interesting to see whether I made sense or not, but hopefully I managed to convey the usefulness and uniqueness of CPAN Testers to Perl and the community. It good opportunity to also thanked Salve for starting the QA Hackathons, as without them CPAN Testers may well have stalled several years ago. Like many other projects, if we had relied on email to handle all the discussions and move the project forward, it would have taken years to get the Metabase working and move away from the old email/NNTP mechanisms.

charsbar updated CPANTS with some altered metrics, and at the same time added selected CSS colours for BooK and Leon, so I asked too. I now have a shade of my own purple on my author page ;) Thanks charsbar.

As Wendy went to lunch, she made the mistake of asking whether we wanted anything. I asked for a Ferrari, but sadly they couldn't find one, so I got a Lambourgini instead. If you don't ask, you don't get .... vroom, vrooom, vroom :) I'll add a picture once I've sorted them out.

At some point during the afternoon, Ricardo told me one of his asks for the hackathon. He wanted to be able to ignore the NA reports in his No Pass RSS feeds. Mulling it over this seemed entirely sensible, and so I fixed it. Ricardo celebrated :)

During a discussion with Neil, he mentioned that Paul Johnson was creating a Devel::Cover service, that he wanted to run like a CPAN Testers service. The idea was to write a system, that could allow distributed testing with testers sending in reports, which could then be accumulated, based on the OS being tested. As the Metabase is already able to handle different buckets, adding another bucket for coverage reports simplifies some of the work. The distributed client can then be moduled on the CPAN Testers means of report contruction, creating a new coverage report fact and use the same transport mechanism to submit to the Metabase. A web service can then poll the Metabase for the new bucket, and create report pages in exactly the same way as CPAN Testers. It'll be interesting to see whether we can use the same (or similar) code to provide this.

Day Four

The morning threw us a curve-ball, as the building wouldn't open up. It was a Sunday and apparently no-one works on a Sunday. Thankfully a few phonecalls to the right people got us in, just in time for lunch. In the meantime as we all were staying in the same hotel, we took over the bar, and borrowed a conference for the morning.

The poor wifi connection, gave us a good opportunity to have further discussions. Neil gathered together several interested parties to discuss author emails. Both PAUSE and CPAN Testers send emails to authors, and there is a plan to send authors a yearly email to advertise improvements to their modules, and let them know about sites and tools that they might not be aware of. However, although many emails get through without a problem, several fail to reach their intended recipient. Typically this is because authors have changed their email address but failed to update the email stored within the PAUSE system. CPAN Testers highlights some of these Missing In Action authors, but it would be better to have an automated system. Also, as Ricardo noted, the envelope of an email is left unchanged when is sent to the develooper network, so bouncebacks come back to the original sender containing the authors' potenmtially secret email address. It would be much better to have a service that monitors bouncebacks, but change the envelope to return to the handling network and can send an appropriate email to the sender. It could then provide an API to enable PAUSE and CPAN Testers, and any future system, to know whether compiling an email was worth the effort. For CPAN Testers there can be a great deal of analysis to prepare the summary emails, so knowing in advance an author email is not going to get through would be very beneficial. Neil is going to write up the ideas, so we can more formally design a system that will work all of PAUSE related systems. CPAN Testers already has the Preferences site to allow authors to manage their summary emails, and also turn off receiving any emails, and it may be worth extending this to PAUSE or other system to provide a subscription handling system.

The rest of the day was mostly spent monitoring the metabase table in the cpanstats database, as the new 'fact' column was added. The new field will store the reports from the parent in Sereal. I was a bit worried about locking the table all day, but no-one seemed to notice. While this was happening, I started back on the original new module I started on the first day of the conference,and had hoped to release. However, it highlighted further problems with the way reports are stored. I'm not sure what is doing it, but the underlying fact.content field in JSON was being stored as a string. In most cases this isn't a problem, however for this module it caused problems trying to encode/decode the JSON. After fixing the Generator code, it means the new module still didn't get finished. Well at least I have something to start my neocpanism.once-a-week.info stint with :)

Wrap Up

I now have several pieces of work to continue with, some for a few months to come, but these 4 days have been extremely productive. Despite playing with the CPAN Testers databases rather than writing code, the discussions have been invaluable. Plus it's always great to catch up with everyone.

This year's QA Hackthon was great, and it wouldn't have been possible without BooK and Laurent organising it, Wendy keeping us eating healthily (and in good supply of proper English tea ... I'll try and remember to bring the PG Tips next time), Booking.com for supplying the venue and all the other sponsors for helping to make the QA Hackathon the great success it was. In no particular order, thanks to Booking.com, SPLIO, Grant Street Group, DYN, Campus Explorer, EVOZON, elasticsearch, Eligo, Mongueurs de Perl, WenZPerl for the Perl6 Community, PROCURA, Made In Love and The Perl Foundation.

Looking forward to 2015 QA Hackathon in Berlin.

File Under: hackathon / perl / qa
NO COMMENTS


History Of Modern (part I)

Posted on 23rd February 2014

Neil Bowers recently unleashed CPAN::ReleaseHistory on the world. Internally the distribution uses the a BACKPAN Index, which records every release to CPAN. I was already interested in this kind of representation, as I wanted to add a similar metric on each Author page of the CPAN Testers Reports website, but hadn't got around to it. Neil then posted about the script included in the distribution, cpan-release-counts in an interesting post; What's your CPAN release history?.

After a quick download, I ran the following for myself:

barbie@kmfdm:~$ cpan-release-counts --char = --width 30 --user barbie
 2003 ( 12) ==
 2004 ( 26) =====
 2005 ( 80) ===============
 2006 (  6) =
 2007 ( 59) ===========
 2008 ( 62) ===========
 2009 (122) =======================
 2010 (148) ============================
 2011 ( 89) =================
 2012 (156) ==============================
 2013 (123) =======================
 2014 ( 11) ==

So my most prolific year was in 2012. I'll have to see if I can change that this year. However, it does give a nice yearly snapshot of my releases.

As it turns out, for CPAN Testers I don't need the BACKPAN index, as I already generate and maintain an 'uploads' table within the 'cpanstats' database. I do need to write the code to add this metric to the Author pages. Thanks to Neil's script though, he has given me a starting point. Being able to see the releases for yourself (or a particular Author) is quite cool, so I may adapt that to make any such matrix more dynamic. It might also be worth adding a more generic metric for all of CPAN to the CPAN Testers Statistics website. Either way, I now have two more things to add to my list of projects for the QA Hackathon next month. Neil will be there too, so I hope he can give me even more ideas, while I'm there ;)

File Under: hackathon / opensource / perl
NO COMMENTS


Grand Designs

Posted on 31st December 2013

Over the last year I've made several releases for Labyrinth and its various plugins. Some have been minor improvements, while others have been major improvements as I've reviewed the code for various projects. I originally wrote Labyrinth after being made redundant back in December 2002, and after realising all mistakes I made with the design of its predecessor, Mephisto. In the last 11 years has helped me secure jobs, enabled me to implement numerous OpenSource projects (CPAN Testers and YAPC Conference Surveys to name just two) and provided the foundation to create several websites for friends and family. It has been a great project to work on, as I've learnt alot about Perl, AJAX/JSON, Payment APIs, Security, Selenium and many other aspects of web development.

I did a talk about Labyrinth in Frankfurt for YAPC::Europe 2011, and one question I was asked, was about comparing Labyrinth to Catalyst. When I created Labyrinth, Catalyst and its predecessor Maypole, were 2 years (and 1 year) away from release. Back then I no idea about an MVC, but I was pleased that in later years when I was introduced to the design concept, that it had seemed an obvious and natural way to design a web framework. Aside from this and both being written in Perl, Labyrinth and Catalyst are very different beasts. If you're looking for a web framework to design a mojor system for your company, then Catalyst is perhaps the better choice. Catalyst also has a much bigger community, whereas Labyrinth is essentially just me. I'd love for Labyrinth to get more usage and exposure, but for the time being, I'm quite comfortable with it being the quiet machine behind CPAN Testers, YAPC Surveys, and all the other commercial and non-commercial sites I've worked on over the years.

This year I finally released the code to enable Labyrinth to run under PSGI and Plack. It was much easier than I thought, and enabled me to better understand the concepts behind the PSGI protocol. There are several other concepts in web development that are emerging, and I'm hoping to allow Labyrinth to teach me some of them. However, I suspect most of my major work with Labyrinth in 2014 is going to be centred on some of the projects I'm currently involved with.

The first is the CPAN Testers Admin site. This has been a long time coming, and is very close to release. There are some backend fixes that are still needed to join the different sites together, but the site itself is mostly done. It still needs testing, but it'll be another Labyrinth site to join the other 4 in the CPAN Testers family. The site has taken a long time to develop, not least because of various other changes to CPAN Testers that have happened over the few years, and the focus on getting the reports online sooner rather than later.

The next major Labyrinth project I plan to work on during 2014, is the YAPC Conference Surveys. Firstly to release the current code base and language packs, to enable others to develop their own survey sites, as that has been long over due. Secondly, I want to integrate the YAPC surveys into the Act software tool, so that promoting surveys for YAPCs and Perl Workshops will be much easier, and we won't have to rely on people remembering their keycode login. Many people have told me after various events that they never received the email to login to the surveys. Some have later been found in spam folders, but some have changed their email address and the one stored in Act is no longer valid. Allowing Act to request survey links will enable attendees to simply log into the conference site and click a link. Further to this, if the conference has surveys enabled, then I'd like the Act site to be able to provide links next to each talk, so that talk evaluations can be donme much more easily.

Lastly, I finally want to get all the raw data online as possible. I still have the archives of all the surveys that have been undertaken, and some time ago I wrote a script to create a data file, combining both the survey questions and the responses, appropriately anonymised, with related questions linked, so that others can evaluate the results and provide even more statistical analysis than I currently provide.

In the meantime the next notable release from Labyrinth will be a redesign of the permissions system. From the very beginning Labyrinth had a permissions system, which for many of the websites was adequate. However, the original Mephisto project encompassed a permissions system for the tools it used, which for Labyrinth were redesigned as plugins. Currently a user has a level of permission; Reader, Editor, Publisher, Admin and Master. Each level grants more access than the previous one as you might expect. Users can also be assigned to groups, which also have permissions. It is quite simplistic, but as most of the sites I've developed only have a few users, granting these permissions across the whole site has been perfectly acceptable.

However, with a project I'm currently working on this isn't enough. Each plugin, and its level of functionality (View, Edit, Delete), need different permissions for different users and/or groups. The permissions system employed by Mephisto came close, but they aren't suitable for the current project. A brainwave over Christmas saw a better way to do this, and not just to implement for the current project, but to improve and simplify the current permission system, and enable to plugins to set their permissions in data or configuration rather than code, which is a key part of the design of Labyrinth.

This ability to control via data is a key element of how Labyrinth was designed, and it isn't just about your data model. In Catalyst and other web frameworks, the dispatch table is hardcoded. At the time we designed Mephisto, CGI::Application was the most prominent web framework, and this hardcoding was something that just seemed wrong. If you need to change the route through your request at short notice, you shouldn't have to recode your application and make another release. With Labyrinth switching templates, actions and code paths is done via configuration files. Changing can be dne in seconds. Admittedly it isn't something I've needed to do very often, but it has been necessary from time to time, such as disabling functionality due to broken 3rd party APIs, or switching templates for different promotions.

The permission system needs to be exactly the same. A set of permissions for one site may be entirely different for another. Taking this further, the brainwave encompassed the idea of profiles. Similar to groups, a profile can establish a set of generic permissions. Specific permissions can then be adjusted as required, and reset via a profile on a per user or per group basis. This then allows the site permissions to be tailored for a specific user. This then allows UserA and UserB to have generic Reader access, but for UserA to have Editor access to TaskA and UserB to be granted Editor access to TaskB. Previously the permission system would have meant both users be granted Editor access for the whole site. Now, or at least when the system is finished, a user's permissions can be set so they can be restricted to only the tasks they need access to.

Over Christmas there have been a few other fixes and enhancements to various Labyrinth sites, so expect to see those to also find their way back into the core code and plugins. I expect several Labyrinth related releases this year, and hopefully a few more talks at YAPCs, Workshops and technical events in the coming year about them all. Labyrinth has been a fun project to work on, and long may it continue.

File Under: labyrinth / opensource / website
NO COMMENTS


Lullaby of London

Posted on 21st December 2013

The 2013 London Perl Workshop Conference Survey results are now online.

Although percentage wise the submissions are up, the actual number of respondents are just slightly lower than previous years. Though it has to be said I'm still pleased to get roughly a third of attendees submitting survey responses. It might not give a completely accurate picture of the event, but hopefully we still get a decent flavour of it.

Two questions, which I plan to pay closer attention to in future surveys are; 'How do you rate your Perl knowledge?' and 'How long have you been programming in Perl?' Originally the age question usually gave some indication of how long someone had been using Perl, but from experience, I now know that doesn't work. As such, these two questions hopefully give us a better idea of the level of knowledge and experience of attendees. Perhaps unsurprisingly London.pm had a lot of attendees who have been around the Perl community for many years, particularly as it was the first non-US Perl Monger group. However, we do still see a notable number of people who are relatively new to Perl. It will be interesting to see whether these numbers change over the years, as although the community doesn't appear to be growing radically, it is still attracting first-time attendees.

Looking at the list of suggested topics, I was intrigued to see "Testing" in there. Apart from my own talk and Daniel Perrett's, there wasn't anything specifically about testing. I don't know if its because the older hands are more weary of giving test talks, or whether everyone thinks everything has been said, but I do think it's a topic that worth repeating. We regularly have new attendees who have never seen these talks before, so hopefully we'll see some more submitted at future workshops and YAPCs. There was also a lot of interest in practical uses of web frameworks. Although Andrew Solomon held a Dancer tutorial, seeing how to solve specific problems with web applications would be valuable to many. Having said that, the diverse range of subjects that was on offer at the workshop, was equally as interesting. I just hope Mark and Ian are so inundated with talks next year, we have an even greater choice from the schedule.

Thank you to Mark and Ian from organising another great Perl event, and thanks to all the speakers for making it worth attending. Also to all the attendees, especially those who took the time to respond to the survey, and for all the talk evaluations. I know the speakers appreciate the evaluations, as I've had a few thank yous already :)

Enjoy the results.

File Under: community / london / opensource / survey / workshop
NO COMMENTS


Even Flow

Posted on 8th December 2013

The following is part of an occasional series of highlighting CPAN modules/distributions and why I use them. This article looks at Data::FlexSerializer.

Many years ago the most popular module for persistent data storage in Perl was Storable. While still used, it's limitations have often cause problems. It's most significant problem was that each version was incompatible with another. Upgrading had to be done carefully. The data store was often unportable, and made making backups problematic. In more recent years JSON has grown to be more acceptable as a data storage format. It benefits from being a compact data structure format, and human readable, and was specifically a reaction to  XML, which requires lots of boilerplate and data tags to form simple data elements. It's one reason why most modern websites use JSON for AJAX calls rather than XML.

Booking.com had a desire to move away from Storable and initially looked to moving to JSON. However, since then they have designed their own data format, Sereal. But more of that later. Firstly they needed some formatting code to read their old Storable data, and translate into JSON. The next stage was to compress the JSON. Although JSON is already a compact data format, it is still plain text. Compressing a single data structure can reduce the storage by as much as half the original data size, which when you're dealing with millions of data items can be considerable. In Booking.com's case they needed to do this with zero downtime, running the conversion on live data as it was being used. The resulting code was to later become the basis for Data::FlexSerializer.

However, for Booking.com they found JSON to be unsuitable for their needs, as they were unable to store Perl data structures they way they wanted to. As such they created a new storage format, which they called Searal. You can read more about the thoughts behind the creation of Sereal on the Booking.com blog. That blog post also looks at the performance and sizes of the different formats, and if you're looking for a suitable serialisation format, Sereal is very definitely worth investigating.

Moving back to my needs, I had become interested in the work Booking.com had done, as within the world of CPAN Testers, we store the reports in JSON format. With over 32 million reports at the time (now over 37 million), the database table had grown to over 500GB. The old server was fast running out of disk space, and before exploring options for increasing storage capacity, I wanted to try and see whether there was an option to reduce the size of the JSON data structures. Data::FlexSerializer was an obvious choice. It could read uncompressed JSON and return compressed JSON in milliseconds.

So how easy was it to convert all 32 million reports? Below is essentially the code that did the work:

  my $serializer = Data::FlexSerializer->new( detect_compression => 1 );

    for my $next ( $options{from} .. $options{to} ) {
        my @rows = $dbx->GetQuery('hash','GetReport',$next);
        return  unless(@rows);

        my ($data,$json);
        eval {
            $json = $serializer->deserialize($rows[0]->{report});
            $data = $serializer->serialize($json);
        };

        next  if($@ || !$data);

        $dbx->DoQuery('UpdateReport',$data,$rows[0]->{id});
    }

Simple, straighforward and got the job done very efficiently. The only downside was the database calls. As the old server was maxed out on I/O, I could only run the script to convert during quiet periods as the CPAN Testers server would become unresponsive. This wasn't a fault of Data::FlexSerializer, but very much a problem with our old server.

Before the conversion script completed, the next step was to add functionality to permanently store reports in a compressed format. This only required 3 extra lines being added to CPAN::Testers::Data::Generator.

  use Data::FlexSerializer;

    $self->{serializer} = Data::FlexSerializer->new( detect_compression => 1 );

    my $data = $self->{serializer}->serialize($json);

The difference has been well worth the move. The compressed version of the table has reclaimed around 250GB. Because MySQL doesn't automatical free the data back to the system, you need to run the optimize command on a table. Unfortunately, for CPAN Testers this wouldn't be practical as it would mean locking the database for far too long. Also with the rapid growth of CPAN Testers (we now receive over 1 million reports a month) it is likely we'll be back up to 500GB in a couple of years anyway. Now that we've moved to a new server, our backend hard disk is 3TB, so has plenty of storage capacity for several years to come.

But I've only scratched the surface of why I think Data::FlexSerializer is so good. Aside from its ability to compress and uncompress, as well as encode and decode, at speed, it is ability to switch between formats is what makes it such a versatile tool to have around. Aside from Storable, JSON and Sereal, you can also create your own serialisation interface, using the add_format method. Below is an example, from the module's own documentation, which implements Data::Dumper as a serialsation format:

    Data::FlexSerializer->add_format(
        data_dumper => {
            serialize   => sub { shift; goto \&Data::Dumper::Dumper },
            deserialize => sub { shift; my $VAR1; eval "$_[0]" },
            detect      => sub { $_[1] =~ /\$[\w]+\s*=/ },
        }
    );
    
    my $flex_to_dd = Data::FlexSerializer->new(
      detect_data_dumper => 1,
      output_format => 'data_dumper',
    );

It's unlikely CPAN Testers will move from JSON to Sereal (or any other format), but if we did, Data::FlexSerializer would be only tool I would need to look to. My thanks to Booking.com for releasing the code, and thanks to the authors; Steffen Mueller, Ævar Arnfjörð Bjarmason, Burak Gürsoy, Elizabeth Matthijsen, Caio Romão Costa Nascimento and Jonas Galhordas Duarte Alves, for creating the code behind the module in the first place.

File Under: database / modules / opensource / perl
2 COMMENTS


The Great Gates of Kiev

Posted on 27th October 2013

I've now uploaded the survey results for YAPC::Europe 2013 and The Pittsburgh Perl Workshop 2013. Both had only a third of attendees respond, which for PPW is still 20 out of 54, and 122 out of 333 for YAPC::Europe.

YAPC::Europe

In previous years we have had higher percentages of response at YAPC::Europe, but that is possibly because I was in attendance and promoted the surveys during lightning talks, and encouraged other speakers to remind people about them. It may also be the fact that there is a newer crowd coming to YAPCs, and the fact we had 44 out of the 122 respondees saying that this was their first YAPC, who have never experienced the surveys. While definitely encouraging to see newer attendees, it would be great to see more of their feedback to help improve the conferences each year. Like YAPC::NA 2013, we have reintroduced the gender question. This time around I didn't get the negative reaction, but this may also be due to the fact I've had more feedback about approaching the subject this time around. Perhaps unsurprisingly, there were rather more male respondees, but I am also very encouraged to see that 8 respondees were female. While its difficult to know the exact numbers at the event, I'd like to think that we have been able to welcome more women to the event, and hopefully will see this number increase in the future.

Looking at the locations where attendees were travelling from to attend YAPC::Europe in Kiev, it is interesting to see a much more diverse spread. Once upon a time the UK was often the highest number, even eclipsing the host country. This year, it seems many more from across the whole of Europe took advantage of the conference. Again I think this is very encouraging. If Perl is to grow and reach newer (and younger) audiences, it needs to be of interest to a large number of people, particular from many different locations. While the UK (particularly London, thanks to Dave Cross) was perhaps the start of European Perl community, YAPC::Europe is now capable of being hosted in just about any major European city and see several hundred people attend. It will be interesting to see if Sofia next year, has a similar evenly spread of locations.

Of those that responded, it does seem that we had more people in the advanced realm. Particularly seeing as we had 56 people respond with more than 10 years experience of Perl. Back when we started the surveys, it would likely have been only a handful of people who attended who could have said that they had been programming Perl for more than 10 years. Thankfully though, it isn't just us old hands, as those only programming in Perl for a few years or less, are still making it worthwhile for speakers to come back each year and promote their projects big and small to a new audience.

One comment in the feedback however, described the Perl community as hermetic. I'm not entirely convinced that's true, but it is quite likely that some find it difficult to introduce themselves and get involved with projects. Having said that, there are plenty of attendees who have only been coming to YAPCs, or been involved with the Perl community, for a short while, who have made an impact, and are now valued contributors. So I guess it may just be down to having the right personality to just get stuck in and introduce yourself. This is one area of the Perl community that Yaakov Sloman is keen to break down barriers for, even perceived ones. We do need more Yaakov's at these events to not just break the ice, but shatter it, so we all see the benefit of getting know each other better.

And talking of getting to know others better, it was a shame I didn't get to meet the 15 CPAN Testers who responded. We have had group photos in the past, and I'd like to do more when I next attend a YAPC, but I think it would also be very worthwhile if the Catalyst, Dancer, Padre and many other projects could find the time to do some group shots while at YAPCs. At YAPC::NA it is a bit of a tradition for all those who contribute to #perl on IRC to have a large group photo, but it's never encouraged others to do the same. Perhaps this is also a way for people to get to know project contributors better, as new attendees will have a better idea of who to look out for, rather than trying to figure out who fits an IRC nick or PAUSEID.

The suggest topics for future talks were quite diverse, and "Web Development Web Frameworks Testing" is definitely an interesting suggestion, particularly as we are seeing more and more web frameworks written in Perl now, and we are after all very well known for our testing culture. One question I'm planning to include next years surveys, also looks at some of these topics and attempts to find out what primary interests people have. Again, this might help guide future speakers towards subjects that are of interest to their target audience.

Pittsburgh Perl Workshop

Workshops, by their very nature, are much smaller events, but with Pittsburgh being the home of the very first YAPC::NA, it is well established to host a workshop, and it would seem attracted some high profile speakers too. Possibly as a consequence, at least one attendee felt some of the talks were a little too advanced for them. At a smaller technical event it is much harder to try and please everyone, and with fewer tracks there often is less diversity. Having said that, I hope that the attendee didn't feel too overwhelmed, and got something out of the event in other talks.

From the feedback it would seem that more knowledgeable Perl developers were in attendance, so understandable that more talks might lean towards more advanced subjects, but as mentioned for YAPCs, speakers shouldn't feel afraid of beginner style introductions or howtos for their project, that could appeal to all levels of interest.

Overall I think the Pittsburgh Perl Workshop went down very well.

What's Next?

I now have to compile the more detailed personal feedback for these and the YAPC::NA organisers, so expect to see some further documentation updates in the near future. In addition, I want to work more on the raw data downloads. While it's interesting to see the data as currently presented, others may have other ideas to interrogate the raw data for further interesting analysis. I also still need to put the current code base on CPAN/GitHub and add the features to integrate with Act better.

The next survey will be for the London Perl Workshop at the end of November. If you are planning a workshop, YAPC or other technical event that you'd to have a survey for, please let me know and I'll set you up. It typically takes me a weekend to set up an instance, so please provide as much advanced warning as possible.

File Under: community / conference / perl / survey / workshop / yapc
NO COMMENTS


Highway Star

Posted on 15th September 2013

A few years ago I posted about whether the M6 viaduct over the Gravelly Hill area of Birmingham, was the longest bridge in the UK. In February 2012, someone added a page to Wikipedia for the bridge, which appears to be called the Bromford Viaduct. The Wikipedia page references the same page I found from the Motorway Archive, but nowhere on that page did it reference the name. So now I had two questions:

  1. Is the Bromford Viaduct its official name?
  2. Is this still the longest bridge in the UK?

Regards the name, it would seem that the proposals for HS2 refer to it as the M6 Bromford Viaduct. After some further searching, it would seem that The Highways Agency calls it the Bromford Viaduct too, so I guess that is its official name, seeing as they are the government department in charge of UK roads. I still think the Spaghetti Viaduct sounds better :)

However, the second question still remained unanswered. On the Motorway Archive page it states the bridge "was then the longest continuous viaduct in Great Britain", which implies it no longer is. Having said that, in November 2010, a few months after my initial post, it seems someone else had thought the same thought, and added the bridge to the Wikipedia page for the longest bridges. Seeing as there is no other mention of a longer UK bridge and the Second Severn Crossing is measured as being 472m (1576ft) shorter, I think, until proven otherwise, Brummies can be proud to have the longest bridge in the UK.

File Under: birmingham / bridges / road
1 COMMENT


Of All The Things We've Made

Posted on 26th August 2013

Several years ago, we frequently updated the Birmingham.pm website with book reviews. To begin with, updating all the book information was rather labourious. Thankfully, on CPAN there was a set of modules that had been written by Andrew Schamp, that provided the framework to search online resources. I then wrote drivers for Amazon, O'Reilly & Associates, Pearson Education and Yahoo!. As the books we were reviewing were technical books, these four sources were able to cover all the books we reviewed.

A few years ago, I started working for a book company. In one project, we needed to evaluate book data, particularly for books where we had no data or very little. Often these were imports or out of stock titles that we could still order, but we were lacking information about. As such I created a number of further drivers, particularly for non-UK online catalogues, to help retrieve this information. I managed to create a collection of 17 drivers, and 1 bundle, all available on CPAN.

Via my CPAN Testers work, I've been promoting the CPAN::Changes Kwalitee Service website. Neil Bowers read one of the posts, and thought it would be good to improve the Changes files in his distributions, by way of QuestHub. I'd not heard of this site before, but after reading Neil's post I joined up, as I had been looking for a suitable way to keep a TODO list of my Perl work for a while. Neil had created a stencil to standardise the Changes file in 5 distributions, but unfortunately, I only had a few distributions of my own to complete. Another stencil emerged to add License and Repository information to 5 CPAN distributions. Again, I'd completed this for most of my distributions, apart from my 18 ISBN distributions, which I'd never got around to creating repositories for.

Then Neil had the idea to look at some of the quality aspects of all the CPAN distributions, and highlight those that might need adoption. As part of his reviews of similar modules over the past few years, he's adopted several modules, and was looking at what others he could help with. The results included 2 of the modules written by Andrew Schamp, which formed part of the ISBN searching framework I used for my ISBN distributions. Seeing as they hadn't been touched in eight years, I suspected that Andrew had moved on to other languages or work. So I contacted him to see whether he was interested in letting me take the modules on and update them.

It turns out that Andrew had written the modules for a college project, and since moving to C and with his programming interests now nothing to do with books, he was happy to hand over the keys to the modules. Over the past week, I have now taken ownership of Andrew's 5 modules, added these and my own 18 ISBN distributions to my local git repository, added all 23 to GitHub, updated the Changes file, and License & Repository info to the 5 new modules and released them all to CPAN. My next task is to update the Repository info in my 18 ISBN distributions and release these to CPAN.

Although I don't work in the book industry anymore, writing these search drivers has been fun. The distributions are perhaps my most frequently releases to CPAN, due to the various websites updating their sites. Now that I have access to the core modules in the framework, I plan to move some of the repeated code across many of the drivers into the core modules. I also plan to merge the three main modules into one distribution. When Andrew originally wrote the modules, it wasn't uncommon to have 1 module per distribution. However, as all three are tightly bound together, it doesn't make much sense to keep them separate. The two drivers Andrew wrote have not worked for several years, as unsurprisingly the websites have changed in the last 8 years. I've already updated one, and will be working on the other soon.

It's nice to realise that a few of my CPAN Testers summary posts inspired Neil, who in turn has inspired me, and has ended up with me helping to keep a small corner of CPAN relevant and up to date again.

If you're a new Perl developer, who wants to take a more active role in CPAN and the Perl community, a great way to start is to look at the stencils on QuestHub, and help to patch and submit pull/RT requests to update distributions. If you feel adventurous, take a look at the possible adoption list, and see whether anything there is something you'd like to fix and bring up to date. You can also look at the failing distributions lists, and see whether the authors would like help with the test suites in their distributions. You can then create your tasks as quests in QuestHub and earn points for your endeavours. Be warned though, it can become addictive :)

There is one more ISBN distribution on the adoption list, and I have now emailed the author. Depending on the response, I may be going through the adoption process all over again :) [Late update, the author came back to me and he's happy for me to take on his distribution too]

File Under: isbn / opensource / perl
NO COMMENTS


Who Knows Where The Time Goes?

Posted on 24th July 2013

YAPC::NA 2013 - The Results Are Out

The YAPC::NA 2013 Conference Survey results are now online.

The number of responses was much lower than in previous years, which is a shame, but may in part be due to one comment I received, saying it was too long. Reviewing the survey, I'd have to agree, and I'll be removing some of the questions for future surveys. Some of the questions had good intentions originally, and did provide an insight to what people got out of the conference. However, there is now a degree of predictability about them, that doesn't warrant their inclusion. Such questions about holidays and speakers you missed really don't add anything any more. The latter has generated some interesting comments over the years, but typically the same names appear each year.

This year was also slightly different, as the organisers asked for a lot of additional questions. Particularly related to the Code of Conduct. I will be forwarding the results of these questions to the TPF in the next day or two. They may choose to make the results public, but for now they won't appear on the YAPC Survey site. Of the other questions they asked, most related specifically to YAPC::NA, and wouldn't be applicable to other conferences and workshops. These too will be reviewed for next year.

Interestingly, VM Brasseur has done some analysis of the survey data, particularly around the age of attendees, and the length of time people have been a Perl programmer. Although the survey includes the former, it doesn't really include the latter. We do ask what level people feel they are at, but it'll be an area I'll be reviewing for future surveys.

As both the surveys and VM's analysis shows, the Perl community (at least those answering the survey) is getting older. I've noticed this too when attending. There are new and younger people attending, but generally the audience has been getting older. In the UK, this was identified in an technical article I read a few years ago (sadly I don't have a link to the source), which highlighted a shift in the late 80s/early 90s away from writing computer games on Spectrums, Dragons and Beebs to just playing consoles. I suspect the age of attendees at other technical conferences are also seeing a shift.

As noted in a previous post, I'm going to be looking at the Conference Survey software over the summer, and hopefully integrate it more with the Act software. I'm hoping this may encourage more to respond. I'll also be reviewing the survey itself, and looking at better and more relevant questions to include. If you have ideas of how to improve the survey, please feel free to drop me an email.

Enjoy :)

File Under: conference / perl / survey / yapc
NO COMMENTS


Some Rights Reserved Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a Creative Commons by Attribution Non-Commercial License. If you wish to use material for commercial puposes, please contact me for further assistance regarding commercial licensing.