Let's Go Crazy

Posted on 31st May 2016

Last weekend saw me in Rugby for the 9th QA Hackathon. This is a Perl event where the key developers for CPAN, PAUSE, MetaCPAN, CPAN Testers, Dist::Zilla, Test2 and many other Perl 5 and Perl 6 projects get together to discuss problems and future plans, and collaborate on code.

Although I was a co-organiser of the event, I really would like to thank my fellow co-organisers: Neil Bowers (NEILB) and JJ Allen (JONALLEN). Without these guys, organising this QA Hackathon would have been tough, as they really did all the hard work. Also many thanks to Wendy for keeping us fed with nibbles, keeping notes and generally making sure we all stayed focused. An event like this needs a team, and they are an awesome team.

My main aim for this event was to meet Doug Bell (PREACTION). Back last summer, the CPAN Testers server had some severe problems, which meant we had to switch to a new physical server. It was at this moment I realised that I couldn't do this alone any more. Doug stepped up and started to take over the reins, and has done a great job since. However, I'd never met Doug, so this was going to be the first opportunity to catch up in person. After only a few moments of saying hello, I knew we had found the right person to take over CPAN Testers. Doug has a lot of great ideas, and is more than capable of taking the project to the next level, which is where I wanted to see it grow to, but knew it needed fresh eyes to take it there. I feel immensely confident that I have left the keys in capable hands, and with the ideas Doug has already shown me, I expect bigger and better things for CPAN Testers' future. Please look after him :)

On the first day Oriol Soriano Vila (UREE) introduced himself to Doug and me. Oriol was suggested to the organisers by his employer, and after we explained what the event was about, Oriol was even more enthusiastic to attend. I'm glad he did, as he is another great asset to both CPAN Testers and Perl. Although we have referred to him as our "intern", Oriol proved he was just one of the team. He has some great ideas, asked all the right questions and had results by the end of the hackathon too! You can read more on his own blog.

So once we got our introductions out of the way, we started looking at a high-priority problem, one that had been reported in two different ways but was in fact the same issue: the Summary RAG bars and the release database (as used by MetaCPAN). It turns out the cause was straightforward. After the server crash last year, the database schema used to rebuild the new server was missing a new column in the release summary table, which thus wasn't getting updated. Once that was fixed, it was "simply" a matter of rebuilding the table. Sadly it took the whole weekend to rebuild, but once completed, we were able to start regenerating the release SQLite database. That took a week, but I'm pleased to say all is now updating and available again.

While that was rebuilding, I started to take a look at some other issues. After introducing Oriol to our family of websites, he registered for the wiki, and spotted a problem with the registration process. After some tinkering, I got that working again. I've no idea how long it's been a problem, but apologies to anyone affected.

In the introductions at the beginning of the event, Leo Lapworth (LLAP) mentioned that he was hoping to refine MetaCPAN's use of Fastly, and was interested in helping anyone else who might be interested in using the service for their project. I got Leo to sit with me for a while and he gave me a good run-through of what the service is, what it can do, and why we should use it for CPAN Testers. I didn't take much convincing, and quickly opened an account and started to move the main family of websites to it. We have since seen a slight drop in hits to the server, but I expect that to improve as the static pages (the individual reports) are cached. Even the dynamic pages can benefit from the caching, as although many will change throughout the day, only a small portion are updated more than once an hour. Once we learn more about Fastly, expect to see better response times for your page hits.

Talking with Christian Walde (MITHALDU), he wanted to help with the performance of the websites, particularly the Reports website. However, with the rebuilding ongoing, the server wasn't in the best place to really evaluate performance. He did happen to mention that the reports he was getting from the mailer were coming through as garbage. After some investigation, I discovered that the mailer had not been upgraded to use Sereal, which is now our serialiser of choice for the reports stored in the database. With that fixed, together with some further improvements and with all tests passing, we put it live and waited. The following morning Christian reported that he had readable reports coming through again.

One aspect of testing the Reports site, and one that would have limited Christian's ability to evaluate its performance, is that apart from my and Doug's development machines, there is no stable, installable, full instance of the CPAN Testers Reports site, including databases and cron scripts. As such, Doug has been working on providing exactly that. It has been on my TODO list for some time, as some of the bug reports and issue requests would have been quashed much more efficiently had others been able to fire up a working site and send a pull request. You can read more about Doug's progress on his blog, and hopefully this will encourage more people in the longer term to get involved with CPAN Testers development work.

Throughout the weekend I worked on cleaning up some of the templates on the various websites, ensuring that sponsors were correctly attributed, and fixed several bugs in some of the associated distributions. Not all have been pushed to CPAN, but work is ongoing.

Having finally met, Doug and I went through all the website logins and social media accounts, and made sure he had all the keys. The handover process has been a longish one, but I didn't want to overwhelm Doug, and wanted him to find his feet first. After this weekend, expect more posts and updates from him rather than me. Please look after him :)

I also joined in some of the discussions regarding the CPAN River and the naming of the QA Hackathon. Neil has written up both admirably, and while I didn't contribute much, it was good to see a lot of healthy discussion on both subjects. Regarding the naming of the event, I do think it's a shame that the likes of Google have turned the word "Hackathon" into the concept of a competitive event, which the QA Hackathon definitely is not. Ours is about collaboration and planning for the future, with many of the key technical leads for the various toolchain and associated projects within Perl 5 and Perl 6. I don't have a suitable name to suggest, but I would recommend ensuring the acronym could not be used negatively.

In the coming weeks, I hope to collate all the website tests I run prior to updating the CPAN Testers family of websites, and hand them over to Doug for his new CPAN Testers development environment. This will hopefully enable easier access for anyone wanting to help fix problems on the websites and backends in the future.

In short, my completed tasks during the hackathon were:

  • Fixed the registrations for the CPAN Testers Wiki.
  • Got CPAN Testers Reports running on the Fastly (http://fastly.com) service, allowing us to cache some of the pages and reduce the load on the webserver when recreating reasonably static pages. It also means better routing for anyone viewing the site from outside Europe, which should reduce page load times too.
  • Fixed some bugs in the Reports Mailer, refreshed the tests and test data, and tidied up the notifications.
  • Fixed the Reports Mailer for sending individual reports, due to the DB storage now using Sereal. Note this had no effect on the summary reports.
  • Fixed a long-running bug with the Summary panel (and release summary table), which it turns out has also been affecting MetaCPAN.
  • Continued to hand over the final keys to Doug Bell (PREACTION), who is now carrying the torch for CPAN Testers.
  • Fixed a few bugs in other distributions, a couple related to CPAN Testers.
  • Cleaned up some of the CPAN Testers family website templates.
  • Joined discussions for The Perl River, the (re)naming of the QAH and the future of CPAN Testers.

It was a very productive event, and for CPAN Testers, I'm pleased it gave Doug and me a chance to share knowledge, and ensure he has everything he needs to not only keep the project going, but help develop new ideas to solve some of the big data problems that CPAN Testers sometimes throws up. Over the past 6 months or so, I have been taking a back seat, for various reasons, and in the coming months you will hear much less from me regarding CPAN Testers. Occasionally, I may pitch in to discussions to help give some background to decisions that were made, to give some context to why we wrote code a certain way, or designed a DB table the way we did, but this is now Doug's project; he will be the main point of contact from now on.

During the wrap-up at the end of the event, where we each got to say a little piece about what we achieved, Chris Williams (BINGOS) made an announcement to say thank you to me for 10 years of CPAN Testers. After taking on the challenge to grow CPAN Testers, and make it more interesting for people to get involved, I think I've achieved that. The project is well respected throughout the Perl community, I've had some kind words from people in the wider OpenSource community too, and with over 68 million test reports in the database, I think I can safely say it has been a success. I wish Doug all the best taking it to the next level, and hope he gains as much knowledge and experience (if not more) from the project as I have done. Thanks to everyone who has supported the project, me, and all those that came before.

The QA Hackathon would not have been possible without the Sponsors. No matter what they have contributed, we owe them all our thanks for enabling the participants the time and ability to work together for the benefit of all. Thank you to FastMail, ActiveState, ZipRecruiter, Strato, SureVoIP, CV-Library, OpusVL, thinkproject!, MongoDB, Infinity, Dreamhost, Campus Explorer, Perl 6, Perl Careers, Evozon, Booking, Eligo, Oetiker+Partner, CAPSiDE, Perl Services, Procura, Constructor.io, Robbie Bow, Ron Savage, Charlie Gonzalez, and Justin Cook.

File Under: hackathon / opensource / perl / qa / rugby

Crash Course in Brain Surgery

Posted on 22nd March 2015

A Year of CPAN Uploads

On Thursday, 19th March 2015 I uploaded my 366th consecutive release to CPAN. To most that may well be "meh, whatever!", but for me it has been an exhausting yet fulfilling exercise. The last 60 days, though, were undoubtedly the hardest to achieve.

When I started this escapade, I did it without realising it. It was several days before I noticed that I had been committing changes every day, just after the QA Hackathon in Lyon. What made it worse was that I then discovered that I had missed a day, and could have had a 3-day head start beyond the 9 days I already had in hand. Just one day behind me was Neil Bowers, and the pair of us set about trying to reach 100 consecutive days. It took a while for us to get into the flow, but once we did, we were happily committing each day.

Both of us created our own automated upload scripts, to help us maintain the daily uploads. This was partly to ensure we didn't forget, but also allowed us to be away for a day or two and still know that we would be able to upload something. In my case I had worried I would miss out when I went on holiday to Cornwall, but thankfully the apartment had wifi installed, and I was able to manage my releases and commits every morning before we left to explore for the day.

I mostly worked at weekends and stocked up on releases, sometimes with around 10 days prepared in advance. Most of the changes centred around bug fixes, documentation updates and test suite updates, but after a short while, we both started looking at our CPANTS ratings and other metrics around what makes a good packaged release. We both created quests on QuestHub, and ticked off the achievements as we went. There were plenty of new features along the way too, as well as some new modules and distributions, as we both wanted to avoid making only minor tweaks just for the sake of releasing something. I even adopted around 10 distributions from others, who had either moved on to other things or sadly passed away, and brought them all up to date.

Sadly, Neil wasn't able to sustain the momentum, and had to bail out after 111 consecutive uploads. Thankfully, I still had plenty of fixes and updates to work through, so I was hopeful I could keep going for a little while longer at least.

One major change that happened during 2014 was to the CPANTS analysis code. Kenichi Ishigaki updated the META file evaluations to employ a stricter rendition of the META Specification, which meant the license field in most of my distributions on CPAN now failed. As a consequence this gave me around 80 distributions that needed a release. On top of this, I committed myself to releasing 12 new distributions, one each month, for a year, beginning March 2014. Although I've now completed the release of the 12 distributions, I have yet to complete all the blog posts, so that quest is still incomplete.

I made a lot of changes to Labyrinth (my website management framework) and the various ISBN scrapers I had written, so these formed the bedrock of my releases. Without these I probably wouldn't have been able to make 100 consecutive releases, and definitely not a full year's worth. But here I am, 366+ days later, and I still have releases yet to do. Most of my future releases will centre around Labyrinth and CPAN Testers, but as both require quite in-depth work, it's unlikely you'll see such a frequent release schedule. I expect I'll be able to make at least one release a week, to maintain and extend my current 157-week stretch, but sustaining a daily release is going to be a struggle.

Having set the bar, Mohammad S Anwar (MANWAR) and Michal Špaček (SKIM) have now entered the race, and Mohammad has said he wants to beat my record. Both are just over 200 days behind, and judging from my experience, they are going to find it tricky once they hit around 250, unless they have plenty of plans for releases by then. After 100, I had high hopes of reaching 200, however I wasn't so sure I would make 300. After 300, it really was much tougher to think of what to release. Occasionally, I would be working on a test suite and bug fixes would suggest themselves, but mostly it was about working through the CPAN Testers reports. Although, I do have to thank the various book sites too, for updating their sites, which in turn meant I had several updates I could make to the scrapers.

I note that Mohammad and Michal are both sharing releases against the Map-Tube variants, which may keep them going for a while, but eventually they will need to think about other distributions. Both have plenty of other distributions in their repertoire, so it's entirely possible for them both to overtake me, but I suspect it will be a good while before anyone else attempts to tackle this particular escapade. I wish them both well on their respective journeys, but at least I am safe in the knowledge I was the first to break 1 year of daily consecutive CPAN uploads. Don't think I'll be trying it again though :)

File Under: cpan / opensource / perl

Wondering What Everyone Knows

Posted on 26th September 2014

The YAPC::Europe 2014 survey results are now online.

YAPC::Europe Survey

Although we appear to have had an increased response rate this year, 42% up from 36%, there were only 74 actual responses, down from 122 last year. Sadly it was a smaller audience in total, and perhaps that's partly due to the previously core attendees not wishing to travel further. However, that said, as we are attempting to increase attendance at conferences, it was hoped that more first-time attendees from South Eastern Europe would attend. Unfortunately, it would seem the opposite was the case, with only 4 responses from people who came from Bulgaria itself. I am willing to accept that many of the non-respondees are non-native English speakers and found it difficult to complete the survey, but that would be true across a lot of Europe too. Those responding were not just the usual attendees either; there were many first and second time attendees there too.

I think part of this lack of response compared with previous years is also down to the lack of promotion of the surveys. As I'm not able to attend in person at the moment, I have to hope that the organisers advertise the surveys, but it would be nice to have the speakers themselves mention them too. It is obvious that several speakers value the feedback they get, both publicly and privately, so it would be nice to see that encouraged after each talk. My hope is that the knock-on effect is that respondees also complete the main survey too.

Looking at both the demographics and the questions around Perl knowledge and experience, it would still seem we have an attendance that is on an upward curve in all aspects. Although the attendance is perhaps getting older and wiser, that's not necessarily true of Perl programmers in general, particularly outside the traditional group of people who get involved in the community. It would be great to see how we can change that in the future. After all, Perl's future relies on those coming to the language in their teens and early twenties. If you're interested in helping, reading Daisuke Maki's post, "what do you want your YAPC to be?", is very worthwhile. YAPC::Asia has helped to change attitudes, but that has only happened because the organisers set themselves goals of what they want to achieve. With YAPC::Europe, we haven't had this, partly due to the organisers changing each year, but also because there don't appear to be any long-term goals.

From the responses of this year's attendees, we can see that we have a lot of people who are involved with the community in several ways. However, what about those who aren't involved? Is it just that this conference is their first exposure to the Perl community, did those people not respond to the survey, or is it that they weren't there? YAPC::Asia's decision to be inclusive of other languages and technologies has been a benefit, not just to Perl, but to the whole OpenSource community in Japan. Couldn't we do the same in Europe or North America?

This year the conference schedule was largely handled by a small remote group of interested volunteers, who stepped up when Marian asked for help. I believe it worked, and if the same group continues, perhaps with input from next year's local organisers about what they would like, this might be a step towards improving the schedule. Using the same team for YAPC::Asia has worked, so why not elsewhere? It is always difficult to get to know about good speakers, particularly if a speaker was previously unknown to the schedule organisers. The missing speakers are not too surprising, but it is always nice to see some unexpected suggestions, and it is useful for schedule organisers to get these hints so that they can try to reduce clashes. I would also encourage attendees to make use of the star system in Act, as this too can be used by the schedule organisers to identify popular talks, and ensure that in future they are in appropriately sized rooms.

I would also suggest that speakers and organisers take note of the topic suggestions. These are subjects people are asking to hear about, and if they are returning attendees, they may well be your future audience at your next talk. The beginner track is also a great one, and would be worth looking at for the future. I know it was attempted for several years, but it seems to have died off, which is a shame. Those new to Perl who want to get more involved can and want to learn a lot from the rest of us. It's why they come. It would be great to have them return to their bosses afterwards full of ideas, which may then enable them and/or their colleagues to return in the future.

Talk Evaluations

On an organisers' mailing list, it was asked whether the talk evaluation surveys should be made partly public, and exposed to the organisers. For anyone worried about that, I have said no. One reason is that I don't think the results can or should be compared. Every attendee has a different opinion and a different scale from anyone else. We also have a wide variety of attendance at talks; trying to compare a talk with 100 attendees to one with 10 doesn't make any sense. My second reason is that I think speakers or respondees should not be publicly compared. In private, one bad review can be taken into consideration and either disregarded, or used to help the speaker improve next time. Making that public is likely to have two effects: firstly, requests from speakers not to be part of the evaluations, and secondly, judgements made against speakers for past work, when they have since learnt and improved. They deserve our support, not rejection.

I have also been asked about providing talk evaluations for the video streams. Initially I was against this, partly because of a couple of people launching a torrent of abuse at me for not doing them already! However, a couple of people who have been more rational have asked about doing them. As such, I've started to look at what is involved. I expect these talk evaluations to be different, if only because the listener is not in the room, and the rapport and presentation will have a different feel during the stream. I may look to introduce this for YAPC::NA 2015, if I can get the rest of my plans for the surveys implemented, together with the changes to Act in time.

Future Surveys

I have several plans to write about the future of the surveys, and I'll be writing a few more blog posts about them in the months to come. If you have suggestions for improvements, please let me know. The software is now on CPAN and GitHub, so you are welcome to contribute changes, to help move things along. If you would like your event, workshop or conference, to have a survey, please get in touch and I'll see what I can set up for you.

File Under: conference / survey / yapc

A Celebration

Posted on 11th August 2014

August the 16th is CPAN Day. It's the day marked as the first upload to CPAN. CPAN came into existence in July/August 1995, but on August 16th 1995 Andreas König made the first true upload to the archive. And so began a growth that continues to this day. Jumping ahead to several weeks ago, Neil Bowers decided to make a big deal about this day. After all, we celebrate Perl's birthday, so why not celebrate CPAN too?

Neil has been posting several ideas for how you can improve distributions on CPAN, with the aim of making several releases on CPAN Day. With that aim in mind, Neil posted some stats and graphs to show what has happened on CPAN Day in previous years. I did some data mining and came up with a page to help monitor CPAN Day. I sent the link to Neil, who then came up with several suggestions. Trying to create the graphs proved interesting, and thanks to everyone on Twitter who sent me various links to help out.

The page has expanded slightly and includes the neoCPANisms, which Neil has been monitoring. A neoCPANism is a distribution that has never been uploaded to CPAN before. It will be interesting to see how many new distributions get released on CPAN Day, as the biggest day for new releases was nearly 2 years ago, with 41 new distributions released on the same day.

The page is now created in real time (well, every 5 minutes) so you can see how we're progressing throughout the day. It is available at stats.cpantesters.org/uploads.html. You can watch progress for each day now, not just CPAN Day, but let's see if we can reach the suggested target on Saturday :)

File Under: cpan / perl / statistics

Sunshine Of Your Love

Posted on 17th July 2014

The survey results for YAPC::NA 2014 are now online.

Even with lower numbers of attendees this year, 27% of you took the time to respond to the survey. As always, this doesn't necessarily allow us to see the whole picture, but hopefully it is enough of a cross-section of the attendees to help us improve future events. Once again we had a healthy number of respondees for whom this was their first YAPC, many having never attended a workshop either.

There was a bit of a mixed reaction throughout the survey. Having read the feedback from the talk evaluations, though, there were a lot of positive comments, and several words of encouragement for some of the new speakers, which was great to see. Overall it seems to have been another great conference, although there are areas of communication that many felt could be improved.

I see I'll have to expand the options for the question "What other areas of the Perl Community do you contribute to?". Firstly, I would include hacking on the Perl core as part of a Perl project (i.e. a group of great people doing great work to improve Perl), but I would also add a new option: "I donate to one of the funds managed by TPF or EPO". During the conference I saw a few Twitter posts about contributing to some of the Perl funds, which I think came about following Dan Wright's presentation. It is great that so many have donated amounts big and small to the various funds. They all help to improve and promote Perl, and give us good reasons to continue putting together great conferences and workshops every year.

It was great to see a good list of suggestions for topics this year, and I hope that speakers new and old get some ideas for future talks from them.

Lastly, it does seem that the answer to the location question really does depend on where the current location is. The higher numbers last year may also indicate that Austin was easier to get to for most people, whereas a more easterly location, such as Florida, may restrict the ability to attend for those on the west coast. It would be interesting to see whether the opposite trend would result if the conference was held in Nevada, California, Oregon, Idaho, Utah or Arizona. There must be several Perl Monger groups in those states, so if you're in one, perhaps think about balancing out the number of eastern hosting states ;)

File Under: community / conference / perl / yapc

100 Nights

Posted on 13th July 2014

100 in more ways than one!

100 #1

11 years ago I was eager to be a CPAN Author, except I had nothing to release. I tried thinking of modules that I could write, but nothing seemed worth posting. Then I saw a post on a technical forum, and came up with a script to give the result the poster was looking for. Looking at the script I suddenly realised I had my first module. That script was then released as Calendar::List, and I'm pleased to say I still use it today. Although perhaps more importantly, I know of others who use it too.

Since then, I have slowly increased my distributions to CPAN. However, it wasn't until I got involved with CPAN Testers that my contributions increased noticeably. Another jump was when I wrote some WWW::Scraper::ISBN driver plugins for the Birmingham Perl Mongers website to help me manage the book reviews. I later worked for a book publishing company, during which time I added even more. My next big jump was the release of Labyrinth.

In between all of those big groups of releases, there have been several odds and ends to help me climb the CPAN Leaderboard. Earlier this year, with the idea of the Monthly New Distribution Challenge, I noticed I was tantalisingly close to having 100 distributions on CPAN. I remember when Simon Cozens was the first author to achieve that goal, and it was noted as quite an achievement. Since then Adam Kennedy, Ricardo Signes and Steven Haryanto have pushed those limits even further, with Steven having over 300 distributions on CPAN!

My 100th distribution came in the form of an adoption, Template-Plugin-Lingua-EN-Inflect, originally written by the sadly departed Andrew Ford.

100 #2

My 100th distribution came a few days before I managed to complete my target of 100 consecutive days of CPAN uploads. A run I started accidentally. After the 2014 QA Hackathon, I had several distribution releases planned. However, had I realised what I could be doing, I might have been a bit more vigilant and not missed the day between what now seems to be my false start and the real run. After 9 consecutive days, I figured I might as well try to reach at least a month's worth of releases, and take the top position from ZOFFIX (who had previously uploaded for 27 consecutive days) for the once-a-day CPAN regular releasers.

As it happened, Neil Bowers was on a run that was 1 day behind me, but inspired by my new quest, decided he would continue as my wingman. As I passed the 100 consecutive day mark, Neil announced that he was to end his run soon, and finally bowed out after 111 days of releases. My thanks to Neil for sticking with me, and additionally for giving me several ideas for releases, both as suggestions for package updates and a few ideas for new modules.

I have another quest to make 200 releases to CPAN this year, and with another 20 releases currently planned, I'm still continuing on. We'll see if I can make 200, or even 365, consecutive days, but reaching 100 was quite a milestone that I didn't expect to achieve.

100 #3

As part of my 100 consecutive days of CPAN uploads challenge, I also managed to achieve 100 consecutive days of commits to git. I had been monitoring GitHub for this, and was gutted to realise that just after 101 days, I forgot to commit some changes over that particular weekend. However, I'm still quite pleased to have made 101 days. I have a holiday coming up soon, so I may not have been able to keep that statistic up for much longer anyway.

100 #4

As part of updates to the CPAN Testers Statistics site, I looked at some additional statistics regarding CPAN uploads. In particular looking at the number of distributions authors have submitted to CPAN, both over the life of CPAN (aka BackPAN) and currently on CPAN. The result was two new distributions, Acme-CPANAuthors-CPAN-OneHundred and Acme-CPANAuthors-BACKPAN-OneHundred.

When I first released the distributions, I only featured in the second. For my 100th consecutive day, I released the latest Acme-CPANAuthors-CPAN-OneHundred up to that day, and with my newly achieved 100th distribution, was delighted to feature in the lists for both distributions.

File Under: opensource / perl

Time Waits For No One

Posted on 10th May 2014

When I relaunched the CPAN Testers sites back in 2008, I found myself responsible for 3 servers: the CPAN Testers server, the Birmingham Perl Mongers server, and my own server. While managing them wasn't too bad, I did think it would be useful to have some sort of monitoring system to help me keep an eye on them. After talking to a few people, the two systems most keenly suggested were Nagios and Munin. Most seemed to favour Munin, so I gave it a go. Sure enough it was pretty easy to set up, and I was soon monitoring all three servers from my home server. However, there was one area of monitoring that wasn't covered: the performance of the websites.

At the time I had around 10-20 sites up and running, and the default plugins didn't provide the sort of monitoring I was looking for. After some searching I found a script written by Nicolas Mendoza. The script not only got me started, but made clear how easy it was to write a Munin plugin. However, the script as it was didn't suit my needs exactly, so I had to make several tweaks. I then found myself copying the file around for each website, which seemed a bit unnecessary. So I wrote what was to become Munin::Plugin::ApacheRequest. Following the Hubris and DRY principles, copying the script around just didn't make sense, and being able to upgrade via a Perl module on each server was far easier than updating the 30+ scripts for the sites I now manage.

Although the module retains the original intent of the script, how it achieves it has changed. The magic still happens in the script itself.

To start with an example, this is the current script to monitor the CPAN Testers Reports site:

#!/usr/bin/perl -w
use Munin::Plugin::ApacheRequest;
my ($VHOST) = ($0 =~ /_([^_]+)$/);
Munin::Plugin::ApacheRequest::Run($VHOST, 1000);

Part of the magic is in the name of the script. This one is 'apache_request_reports'. The script extracts the last section of the name, in this case 'reports', and passes it to Run() as the name of the virtual host. If you wish to name your scripts slightly differently, you only need to amend this line to extract the name of your virtual host as appropriate. If you only have one website you may wish to name the host explicitly, but if you later create more it means editing each file, which is what I wanted to avoid. Now, when I create a new website, all I do is copy an existing file to one representing the new virtual host, and Munin automatically adds it to the list.

Munin::Plugin::ApacheRequest does make some assumptions, one of which is where you locate the log files, and how you name them for each virtual host. On my servers '/var/www/' contains all the virtual hosts (/var/www/reports, in this example), and '/var/www/logs/' contains the logs. I also use a consistent naming scheme for the logs, so '/var/www/logs/reports-access.log' is the Access Log for the CPAN Testers Reports site. Should you have a different path or naming format for your logs, you can alter the internal variable $ACCESS_LOG_PATTERN to the format you wish. Note that this is a sprintf format, and the first '%s' in the format string is replaced by the virtual host name. If you only have one website, you can change the format string to the specific path and file of the log, and no string interpolation will be done.
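To illustrate the pattern interpolation, here is a pure-Perl sketch (the pattern and vhost name follow my conventions above, but are otherwise just example values):

```perl
use strict;
use warnings;

# Example pattern following the default naming convention described above;
# the first '%s' is replaced by the virtual host name via sprintf.
my $ACCESS_LOG_PATTERN = '/var/www/logs/%s-access.log';

my $vhost      = 'reports';
my $access_log = sprintf $ACCESS_LOG_PATTERN, $vhost;

print "$access_log\n";    # /var/www/logs/reports-access.log
```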

The log format used is quite significant, and when you describe the LogFormat for your Access Log in the Apache config file, you will need to use an extended format. The field showing the time taken to execute a request is required, which is normally set using the %T (seconds) or %D (microseconds) format option (see also Apache Log Formats). For example my logs use the following:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %T %v"

The second to last field is our time field. In Munin::Plugin::ApacheRequest, its position is stored in the $TIME_FIELD_INDEX variable. By default this is -2, assuming a log format similar to the above. If you have a different format, where the execution time is in another position, then, as with $ACCESS_LOG_PATTERN, you can change this in your script before calling Run(). A positive number counts columns left to right, while a negative number counts right to left.
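As a sketch of how a negative index selects the time field counting from the right (the log line is made up, matching the extended format above):

```perl
use strict;
use warnings;

# A made-up access log line in the extended format shown above,
# where '3' is the %T (seconds) field and the final field is %v.
my $line = '1.2.3.4 - - [10/May/2014:12:00:00 +0000] "GET / HTTP/1.1" '
         . '200 512 "-" "agent" 3 reports.example.com';

my $TIME_FIELD_INDEX = -2;    # second to last field, counting right to left

my @fields     = split ' ', $line;
my $time_taken = $fields[$TIME_FIELD_INDEX];

print "$time_taken\n";        # 3
```

A negative index has the advantage that quoted fields containing spaces (such as the request and user agent) don't affect it, provided the time field sits at a fixed offset from the end of the line.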

The last number passed to the Run() method determines the number of lines read from the access log to calculate the average execution time. For high hit rate sites, you may wish this to be a higher number, but as most of my sites are not that frequently visited, 1000 seems a reasonable number.

The config statements that are generated for the Munin master monitor are currently hardcoded with values. This will change in a future version. For the example above the config produced reads as:

graph_title reports ave msecs last 1000 requests
graph_args --base 1000
graph_scale no
graph_vlabel Average request time (msec)
graph_category Apache
graph_info This graph shows average request times for the last 1000 requests
images.warning 30000000
images.critical 60000000
total.warning 10000000
total.critical 60000000

The virtual host name ('reports') and request count (1000) in the config above are interpolated from the arguments passed to Run(). In a future version I want to allow you to reconfigure the warning and critical values, and the graph base value, should you wish to.

I have now been using Munin::Plugin::ApacheRequest and the associated scripts for 6 years, and it has proved very successful. I have thought about releasing the module to CPAN before, and have made several attempts to contact Nicolas over the years, but have never had a reply. I know he was working for Opera when he released his script, but I have no idea of his whereabouts now. As the script contained no licensing information, I was also unsure what licence he had intended for the code. I hope he doesn't mind that I have adapted his original script, and that I'm now releasing the code under the Artistic License v2.

Although I haven't been able to contact Nicolas, I would like to thank him for releasing his original script. If I hadn't found it, it is unlikely I would have found a way to write a Munin plugin myself to do Apache website monitoring. With his headstart, I discovered how to write Munin plugins, and can now set up monitoring of a new website within a few seconds. Thanks Nicolas.

File Under: opensource / perl / website

Counting Out Time

Posted on 20th March 2014

I had an SQL query I wanted to translate into a DBIx::Class statement. I knew there must be a way, but trying to find the answer took some time. As a result I thought it worth sharing, in case somebody else is trying to find a similar answer.

The SQL I was trying to convert was:

SELECT status,count(*) AS mailboxes,
count(distinct username) AS customers
FROM mailbox_password_email GROUP BY status

The result I got running this by hand gave me:

| status    | mailboxes | customers |
| active    |     92508 |     48791 |
| completed |       201 |       174 |
| inactive  |    116501 |     56843 |
| locked    |    129344 |     61220 |
| pending   |      1004 |       633 |

My first attempt was:

my @rows = $schema->resultset('Mailboxes')->search({}, {
        group_by  => 'status',
        distinct  => 1,
        '+select' => [
            { count => 'id', -as => 'mailboxes' },
            { count => 'username', -as => 'customers' },
        ],
});

Unfortunately this gave me the following error:

DBIx::Class::ResultSet::all(): Useless use of distinct on a grouped 
resultset ('distinct' is ignored when a 'group_by' is present) at
myscript.pl line 469

So I took the 'distinct => 1' out and got the following results:

| status    | mailboxes | customers |
| active    |     92508 |     92508 |
| completed |       201 |       201 |
| inactive  |    116501 |    116501 |
| locked    |    129344 |    129344 |
| pending   |      1004 |      1004 |

Which might be distinct for the mailboxes, but sadly is not distinct for customers. So I tried:

my @rows = $schema->resultset('Mailboxes')->search({}, {
        group_by  => 'status',
        '+select' => [
            { count => 'id', -as => 'mailboxes' },
            { count => 'username', -as => 'customers', distinct => 1 },
        ],
});

and got:

Failed to retrieve mailbox password email totals: 
DBIx::Class::ResultSet::all(): Malformed select argument - too many keys
 in hash: -as,count,distinct at myscript.pl line 469\n

After several attempts at Google, and reading the DBIx::Class::Manual, I finally stumbled on: SELECT COUNT(DISTINCT colname)

My query now looks like:

my @rows = $schema->resultset('Mailboxes')->search({}, {
        group_by  => 'status',
        '+select' => [
            { count => 'id', -as => 'mailboxes' },
            { count => { distinct => 'username' }, -as => 'customers' },
        ],
});

And provides the following results:

| status    | mailboxes | customers |
| active    |     92508 |     48791 |
| completed |       201 |       174 |
| inactive  |    116501 |     56843 |
| locked    |    129344 |     61220 |
| pending   |      1004 |       633 |

Exactly what I was after.
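For completeness, here's a pure-Perl sketch (with made-up sample data) of what the final query computes per status: a plain row count versus a count of distinct usernames:

```perl
use strict;
use warnings;

# Made-up sample rows: [status, username]
my @data = (
    [ 'active',  'alice' ],
    [ 'active',  'alice' ],    # same customer, second mailbox
    [ 'active',  'bob'   ],
    [ 'pending', 'carol' ],
);

my (%mailboxes, %seen);
for my $row (@data) {
    my ($status, $username) = @$row;
    $mailboxes{$status}++;              # COUNT(*)
    $seen{$status}{$username} = 1;      # track usernames for COUNT(DISTINCT)
}

for my $status (sort keys %mailboxes) {
    my $customers = keys %{ $seen{$status} };   # COUNT(DISTINCT username)
    print "$status $mailboxes{$status} $customers\n";
}
# prints:
# active 3 2
# pending 1 1
```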

DBIx::Class does require some head-scratching at times, but looking at the final statement it now seems obvious, and pretty much maps directly to my original SQL!

Hopefully, this provides a lesson others can find and learn from.

File Under: database / perl

Rendez-Vous 6

Posted on 17th March 2014

My 2014 QA Hackathon

Day One

I arrived the previous day, as did most of us, and we naturally talked about coding projects. Not necessarily about work at the hackathon, but discussion did come around to that too. I talked with Tux at one point, who convinced me that a stand-alone smoker client would be really useful. Once upon a time we did have this, but with the advent of the more sophisticated smokers, and the move to the Metabase transport layer, the old script never got updated. The following morning Tux sent me a copy of the script he has, so at some point over the next few months I will take a look to see what I can do to make it compatible with the modern smokers.

My intention was to release a distribution each day of the Hackathon. Unfortunately this was scuppered on the first day when, trying to add support for the full JSON report from CPAN Testers, I realised I don't store the full report in the database. In the future, when we have MongoDB and replication set up, this will be a non-issue, but for the moment I now need to store the full report. This requires a change to the metabase database on the cpanstats server (as opposed to the Metabase server). Over the course of the hackathon I reviewed the changes needed, and updated a lot of the Generator code, as it was an ideal time to remove SQLite references too.

In looking into the code changes, Andreas and I again looked at the 'updated' timestamp used by the various CPAN Testers sites to do statistical analysis, which was also causing us problems. In the Metabase, the CPAN Testers Report fact is the container for all the child facts, such as LegacyReport and TestSummary. When the facts are created by the tester, the 'creation' timestamp references the time on the tester's own server that the report was generated. This could be better stored as UTC, but that's a problem for another day. However, it does mean the timestamp could be different from the one on the Metabase server. When the Metabase server retrieves the report from the outside world, it updates the 'updated' timestamp across all facts and saves into the SimpleDB instance on the server. Except it doesn't. The 'updated' timestamp is always the same as the 'creation' timestamp. Andreas has been noting this for quite some time, and finally he convinced me, at which point we roped in David Golden to take a look. Reviewing the code, there is nothing wrong that we can see. The 'updated' timestamp should be updated with the current timestamp on the Metabase server, which should also cascade to each child fact. As such, you would expect many reports to have a 'creation' timestamp different from the 'updated' timestamp, even if only by a second. Sadly this is going to take more effort and time to debug, as David in particular is working on several different aspects of QA here at the hackathon.

Towards the end of the day, I spoke with liz and Tobias (FROGGS) about how CPAN Testers might handle perl6 modules. Currently there is no client available, but there could be in the future. However, due to the way Perl6 modules are to be uploaded to CPAN, it is possible that smokers may submit reports for perl6-only modules, as many ignore the path to the distribution. Right now, liz tells me that all perl6 modules are being released under the /perl6/ path inside the authors' directory. This makes things easier for CPAN Testers, as we can initially ignore these test reports, as they will not be valid. However, in the longer term it will be interesting to have a CPAN Testers smoker client for Perl6. The main difference would be to record in the metadata that it's a perl6-only distribution, and we *should* be able to carry on as normal, submitting reports to the Metabase, etc. It may require some distributions to have a 'Did you mean the Perl 6 distribution?' link on the website, but for the most part I think we could handle this. It will require further work to define a CPAN Testers Perl 6 Fact, but it will be a nice addition to the family.

Day Two

The morning was spent visiting the Chartreuse cellars, and enjoying a tasting session, before heading back to the hacking in the afternoon.

In the afternoon, I started to look at some of the statistics the CPAN Testers Statistics site generates. After some discussions with Neil Bowers, who was interested in the drop-off of report submissions after a distribution is released, I believed this to be fairly consistent, and found that it did indeed last roughly 8 days, with a tail-off that could last for months or years. There is an initial blast of tests within the first few hours, thanks to Chris' and Andreas' smokers, but reports from the rest of the more reliable smokers are submitted within those first 8 days. Neil has created some initial graphs, and I'm looking at ways to integrate those with the Reports site. How we display these will likely revolve around a specific selected version, as overlaying versions might be a bit too much ... we'll see.

It also led me to think about what time of day testers submit reports, so I'll be looking at creating some graphs to show submissions per month, per day of the week, and per hour of the day. Along with BooK, we discussed further metrics, which look likely to be used within their CPAN Dashboard project, although some of the data can already be provided by CPAN Testers APIs, so little work is needed by me :)

Looking through the aggregated data, as stored and indexed within the Statistics codebase, it was obvious some of it was now incomplete. It seems some of the outages we had over the last few months prevented the data storage files from being saved. As such, I started off a complete reindex. It meant the Statistics site was out of sync for the following day, but at least it meant we once again had correct data to produce the graphs we wanted.

There was more work rewriting the Generator to store the report objects. Yves asked some time ago, when I posted about using Data::FlexSerializer, why I wasn't using Sereal, and at the time I didn't have a need to rework the code. However, seeing as I'm rewriting the code to store the perl object now, rather than just JSON, it does make sense to move to Sereal, so hopefully that will make Yves happy too ;)

Day Three

Continued work on the Generator to remove all SQLite references, plus a few further clean-ups. Also worked on adding the necessary support to allow perl6 reports to be ignored. At some point in the future we will accept perl6 reports, but following further discussion with Tobias, we'll handle this using the metadata in the report rather than the path of the resource.

Salve interviewed me for a future post about CPAN Testers. It'll be interesting to see whether I made sense or not, but hopefully I managed to convey the usefulness and uniqueness of CPAN Testers to Perl and the community. It was also a good opportunity to thank Salve for starting the QA Hackathons, as without them CPAN Testers may well have stalled several years ago. Like many other projects, if we had relied on email to handle all the discussions and move the project forward, it would have taken years to get the Metabase working and move away from the old email/NNTP mechanisms.

charsbar updated CPANTS with some altered metrics, and at the same time added selected CSS colours for BooK and Leon, so I asked too. I now have a shade of my own purple on my author page ;) Thanks charsbar.

As Wendy went to lunch, she made the mistake of asking whether we wanted anything. I asked for a Ferrari, but sadly they couldn't find one, so I got a Lamborghini instead. If you don't ask, you don't get .... vroom, vrooom, vroom :) I'll add a picture once I've sorted them out.

At some point during the afternoon, Ricardo told me one of his asks for the hackathon. He wanted to be able to ignore the NA reports in his No Pass RSS feeds. Mulling it over this seemed entirely sensible, and so I fixed it. Ricardo celebrated :)

During a discussion with Neil, he mentioned that Paul Johnson was creating a Devel::Cover service, which he wanted to run like a CPAN Testers service. The idea is to write a system that allows distributed testing, with testers sending in reports which can then be accumulated, based on the OS being tested. As the Metabase is already able to handle different buckets, adding another bucket for coverage reports simplifies some of the work. The distributed client can then be modelled on the CPAN Testers means of report construction, creating a new coverage report fact and using the same transport mechanism to submit to the Metabase. A web service can then poll the Metabase for the new bucket, and create report pages in exactly the same way as CPAN Testers. It'll be interesting to see whether we can use the same (or similar) code to provide this.

Day Four

The morning threw us a curve-ball, as the building wouldn't open up. It was a Sunday, and apparently no-one works on a Sunday. Thankfully a few phonecalls to the right people got us in, just in time for lunch. In the meantime, as we were all staying in the same hotel, we took over the bar, and borrowed a conference room for the morning.

The poor wifi connection gave us a good opportunity to have further discussions. Neil gathered together several interested parties to discuss author emails. Both PAUSE and CPAN Testers send emails to authors, and there is a plan to send authors a yearly email to advertise improvements to their modules, and let them know about sites and tools that they might not be aware of. However, although many emails get through without a problem, several fail to reach their intended recipient. Typically this is because authors have changed their email address but failed to update the email stored within the PAUSE system. CPAN Testers highlights some of these Missing In Action authors, but it would be better to have an automated system.

Also, as Ricardo noted, the envelope of an email is left unchanged when it is sent via the develooper network, so bouncebacks come back to the original sender containing the author's potentially secret email address. It would be much better to have a service that monitors bouncebacks, but changes the envelope to return to the handling network, which can then send an appropriate email to the sender. It could also provide an API to enable PAUSE and CPAN Testers, and any future system, to know whether compiling an email is worth the effort. For CPAN Testers there can be a great deal of analysis to prepare the summary emails, so knowing in advance that an author's email is not going to get through would be very beneficial. Neil is going to write up the ideas, so we can more formally design a system that will work across all PAUSE-related systems. CPAN Testers already has the Preferences site to allow authors to manage their summary emails, or turn off receiving any emails, and it may be worth extending this to PAUSE or other systems to provide a subscription handling system.

The rest of the day was mostly spent monitoring the metabase table in the cpanstats database, as the new 'fact' column was added. The new field will store the reports from the parent fact in Sereal. I was a bit worried about locking the table all day, but no-one seemed to notice. While this was happening, I started back on the new module I had begun on the first day of the conference, and had hoped to release. However, it highlighted further problems with the way reports are stored. I'm not sure what is doing it, but the underlying fact.content field in JSON was being stored as a string. In most cases this isn't a problem, but for this module it caused problems trying to encode/decode the JSON. Even after fixing the Generator code, the new module still didn't get finished. Well, at least I have something to start my neocpanism.once-a-week.info stint with :)

Wrap Up

I now have several pieces of work to continue with, some for a few months to come, but these 4 days have been extremely productive. Despite playing with the CPAN Testers databases rather than writing code, the discussions have been invaluable. Plus it's always great to catch up with everyone.

This year's QA Hackathon was great, and it wouldn't have been possible without BooK and Laurent organising it, Wendy keeping us eating healthily (and in good supply of proper English tea ... I'll try and remember to bring the PG Tips next time), Booking.com for supplying the venue, and all the other sponsors for helping to make the QA Hackathon the great success it was. In no particular order, thanks to Booking.com, SPLIO, Grant Street Group, DYN, Campus Explorer, EVOZON, elasticsearch, Eligo, Mongueurs de Perl, WenZPerl for the Perl6 Community, PROCURA, Made In Love and The Perl Foundation.

Looking forward to 2015 QA Hackathon in Berlin.

File Under: hackathon / perl / qa

History Of Modern (part I)

Posted on 23rd February 2014

Neil Bowers recently unleashed CPAN::ReleaseHistory on the world. Internally the distribution uses a BACKPAN index, which records every release to CPAN. I was already interested in this kind of representation, as I wanted to add a similar metric to each Author page of the CPAN Testers Reports website, but hadn't got around to it. Neil then posted about the script included in the distribution, cpan-release-counts, in an interesting post: What's your CPAN release history?.

After a quick download, I ran the following for myself:

barbie@kmfdm:~$ cpan-release-counts --char = --width 30 --user barbie
 2003 ( 12) ==
 2004 ( 26) =====
 2005 ( 80) ===============
 2006 (  6) =
 2007 ( 59) ===========
 2008 ( 62) ===========
 2009 (122) =======================
 2010 (148) ============================
 2011 ( 89) =================
 2012 (156) ==============================
 2013 (123) =======================
 2014 ( 11) ==

So my most prolific year was in 2012. I'll have to see if I can change that this year. However, it does give a nice yearly snapshot of my releases.

As it turns out, for CPAN Testers I don't need the BACKPAN index, as I already generate and maintain an 'uploads' table within the 'cpanstats' database. I do need to write the code to add this metric to the Author pages, but thanks to Neil's script, I have a starting point. Being able to see the releases for yourself (or a particular Author) is quite cool, so I may adapt that to make any such metric more dynamic. It might also be worth adding a more generic metric for all of CPAN to the CPAN Testers Statistics website. Either way, I now have two more things to add to my list of projects for the QA Hackathon next month. Neil will be there too, so I hope he can give me even more ideas while I'm there ;)

File Under: hackathon / opensource / perl


Some Rights Reserved Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a Creative Commons by Attribution Non-Commercial License. If you wish to use material for commercial purposes, please contact me for further assistance regarding commercial licensing.