Rendez-Vous 6

Posted on 17th March 2014

My 2014 QA Hackathon

Day One

I arrived the previous day, as did most of us, and we naturally talked about coding projects. Not necessarily about work at the hackathon, but discussion did come around to that too. I talked with Tux at one point, who convinced me that a stand-alone smoker client would be really useful. Once upon a time, we did have this, but with the advent of the more sophisticated smokers, and the move to the Metabase transport layer, the old script never got updated. The following morning Tux sent me a copy of the script he has, so at some point over the next few months I will take a look to see what I can do to make it compatible with the modern smokers.

My intention was to release a distribution each day of the Hackathon. Unfortunately this was scuppered on the first day: while trying to add support for the full JSON report from CPAN Testers, I realised I don't store the full report in the database. In the future when we have MongoDB and replication set up, this will be a non-issue, but for the moment, I need to store the full report. This requires a change to the metabase database on the cpanstats server (as opposed to the Metabase server). Over the course of the hackathon I reviewed the changes needed, and updated a lot of the Generator code, as it was an ideal time to remove SQLite references too.

In looking into the code changes, Andreas and I again looked at the updated timestamp used by the various CPAN Testers sites to do statistical analysis, which was also causing us problems. In the Metabase, the CPAN Testers Report fact is the container for all the child facts, such as LegacyReport and TestSummary. When the facts are created by the tester, the 'creation' timestamp is used to reference the time on the tester's own server that the report was generated. This could be better stored as UTC, but that's a problem for another day. However, it does mean the timestamp could be different to the one on the Metabase server. When the Metabase server retrieves the report from the outside world, it updates the 'updated' timestamp across all facts and saves into the SimpleDB instance on the server. Except it doesn't. The 'updated' timestamp is always the same as the 'creation' timestamp. Andreas has been noting this for quite some time, and finally he convinced me, at which point we roped in David Golden to take a look. Reviewing the code, there is nothing wrong that we can see. The 'updated' timestamp should be updated with the current timestamp on the Metabase server, which should also cascade to each child fact. As such you would expect several reports to have a 'creation' timestamp that differs from the 'updated' timestamp, even if only by a second. Sadly this is going to take more effort/time to debug, as David in particular is working on several different aspects of QA here at the hackathon.
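One way to put a number on the symptom is to count how many stored facts have identical 'creation' and 'updated' timestamps. The following is only a minimal sketch in Python, not the Metabase code: the dictionary layout, the field names `creation_time` and `update_time`, the timestamp format and the sample data are all assumptions made for illustration.

```python
from datetime import datetime


def parse_utc(stamp):
    """Parse a timestamp such as '2014-03-13T09:15:02Z' (assumed format)."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def count_unchanged(facts):
    """Return (unchanged, total), where 'unchanged' counts the facts
    whose update time is exactly equal to their creation time."""
    unchanged = sum(
        1
        for fact in facts
        if parse_utc(fact["update_time"]) == parse_utc(fact["creation_time"])
    )
    return unchanged, len(facts)


# Hypothetical sample data, invented for illustration only.
facts = [
    {"creation_time": "2014-03-13T09:15:02Z", "update_time": "2014-03-13T09:15:02Z"},
    {"creation_time": "2014-03-13T09:16:40Z", "update_time": "2014-03-13T09:16:43Z"},
]

unchanged, total = count_unchanged(facts)
print(f"{unchanged} of {total} facts have identical timestamps")
```

If the server really were stamping facts on arrival, a count like this over a large sample should come out well below the total.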

Towards the end of the day, I spoke with liz and Tobias (FROGGS) about how CPAN Testers might handle perl6 modules. Currently there is no client available, but there could be in the future. However, due to the way Perl6 modules are to be uploaded to CPAN it is possible that smokers may submit reports for perl6 only modules, as many ignore the path to the distribution. Right now, liz tells me that all perl6 modules are being released under the /perl6/ path inside the authors' directory. This makes things easier for CPAN Testers as we can initially ignore these test reports, as they will not be valid. However, in the longer term it will be interesting to have a CPAN Testers smoker client for Perl6. The main difference would be to record in the metadata that it's a perl6 only distribution, and we *should* be able to carry on as normal, submitting reports to the Metabase, etc. It may require some distributions to have a 'Did you mean the Perl 6 distribution?' link on the website, but for the most part I think we could handle this. It will require further work to define a CPAN Testers Perl 6 Fact, but it will be a nice addition to the family.
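As a rough illustration of the path-based check, a report could be skipped when the distribution path contains a perl6 directory. This is a sketch in Python rather than the actual CPAN Testers code; the function name, the case-insensitive match and the sample paths are assumptions.

```python
from pathlib import PurePosixPath


def is_perl6_only(dist_path):
    """Return True if any directory component of the distribution path
    is 'perl6' (compared case-insensitively). The file name itself is
    not inspected, so 'perl6-Foo-0.01.tar.gz' alone does not match."""
    directories = PurePosixPath(dist_path).parts[:-1]  # drop the file name
    return any(part.lower() == "perl6" for part in directories)


# Hypothetical paths, invented for illustration only.
print(is_perl6_only("A/AU/AUTHOR/perl6/Foo-Bar-0.01.tar.gz"))
print(is_perl6_only("A/AU/AUTHOR/Foo-Bar-0.01.tar.gz"))
```

A check like this only works while every perl6 upload lives under that directory, which is why recording the information in the report metadata is the more robust long-term route.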

Day Two

The morning was spent visiting the Chartreuse cellars, and enjoying a tasting session, before heading back to the hacking in the afternoon.

In the afternoon, I started to look at some of the statistics the CPAN Testers Statistics site generated. After some discussions with Neil Bowers, it turned out he was interested in the drop-off of report submissions after a distribution is released. I believed this to be fairly consistent, and found that it did indeed last roughly 8 days, with a tail off that could last for months or years. There was an initial blast of tests within the first few hours, thanks to Chris' and Andreas' smokers, but the rest of the reports from the more reliable smokers get submitted within those first 8 days. Neil has created some initial graphs, and I'm looking at ways to integrate those with the Reports site. How we display these will likely revolve around a specific selected version, as overlaying versions might be a bit too much ... we'll see.

It also led me to think about what time of day testers submit reports. So, I'll be looking at creating some graphs to show submissions per month, per day of the week, and per hour of the day. Along with BooK, we discussed further metrics. They look likely to be used within their CPAN Dashboard project, and some of the data can already be provided by the CPAN Testers APIs, so little work needed by me :)
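The bucketing behind graphs like these is simple enough to sketch. The Python below is an illustration only, not the Statistics codebase: it assumes each report's submission time is available as a Unix epoch value and that everything is counted in UTC, and the function name is made up.

```python
from collections import Counter
from datetime import datetime, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def submission_histograms(epochs):
    """Count submissions per month ('YYYY-MM'), per day of the week
    and per hour of the day, treating every timestamp as UTC."""
    by_month, by_weekday, by_hour = Counter(), Counter(), Counter()
    for epoch in epochs:
        when = datetime.fromtimestamp(epoch, tz=timezone.utc)
        by_month[when.strftime("%Y-%m")] += 1
        by_weekday[DAYS[when.weekday()]] += 1
        by_hour[when.hour] += 1
    return by_month, by_weekday, by_hour


# Three made-up submissions: two on 1 Jan 1970, one on 2 Jan 1970.
by_month, by_weekday, by_hour = submission_histograms([0, 3600, 86400])
print(dict(by_month), dict(by_weekday), dict(by_hour))
```

Counting in UTC keeps the buckets comparable across testers, at the cost of hiding each tester's local time of day.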

Looking through aggregated data, as stored and indexed within the Statistics codebase, it was obvious some of it was now incomplete. It seems some of the outages we had in the last few months prevented the data storage files from being saved. As such, I started off a complete reindex. It meant the Statistics site was out of sync for the following day, but at least it meant we once again had correct data to produce the graphs we wanted.

There was more work rewriting the Generator to store the report objects. Yves asked why I wasn't using Sereal some time ago, when I posted about using Data::FlexSerializer, and at the time I didn't have a need to rework the code. However, seeing as I'm rewriting to store the perl object now, rather than just JSON, it does make sense to move to Sereal, so hopefully that will make Yves happy too ;)

Day Three

Continued work on the Generator to remove all SQLite references, and a few further clean ups. Also worked on adding the necessary support to allow perl6 reports to be ignored. At some point in the future we will accept perl6 reports, but following further discussion with Tobias, we'll handle this using the metadata in the report not on the path of the resource.

Salve interviewed me for a future post about CPAN Testers. It'll be interesting to see whether I made sense or not, but hopefully I managed to convey the usefulness and uniqueness of CPAN Testers to Perl and the community. It was also a good opportunity to thank Salve for starting the QA Hackathons, as without them CPAN Testers may well have stalled several years ago. Like many other projects, if we had relied on email to handle all the discussions and move the project forward, it would have taken years to get the Metabase working and move away from the old email/NNTP mechanisms.

charsbar updated CPANTS with some altered metrics, and at the same time added selected CSS colours for BooK and Leon, so I asked too. I now have a shade of my own purple on my author page ;) Thanks charsbar.

As Wendy went to lunch, she made the mistake of asking whether we wanted anything. I asked for a Ferrari, but sadly they couldn't find one, so I got a Lamborghini instead. If you don't ask, you don't get .... vroom, vrooom, vroom :) I'll add a picture once I've sorted them out.

At some point during the afternoon, Ricardo told me one of his asks for the hackathon. He wanted to be able to ignore the NA reports in his No Pass RSS feeds. Mulling it over, this seemed entirely sensible, and so I fixed it. Ricardo celebrated :)
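The idea behind the change can be sketched as a filter over report grades. This is an illustration in Python, not the feed code itself; the function name, the `include_na` switch and the report dictionaries are hypothetical, and only the PASS, FAIL, NA and UNKNOWN grades are taken from CPAN Testers.

```python
def no_pass_reports(reports, include_na=True):
    """Return the reports that did not PASS. With include_na=False the
    NA reports are dropped as well, leaving only the other non-PASS
    grades such as FAIL and UNKNOWN."""
    excluded = {"PASS"} if include_na else {"PASS", "NA"}
    return [r for r in reports if r["grade"].upper() not in excluded]


# Hypothetical reports, invented for illustration only.
reports = [
    {"id": 1, "grade": "PASS"},
    {"id": 2, "grade": "FAIL"},
    {"id": 3, "grade": "NA"},
    {"id": 4, "grade": "UNKNOWN"},
]

print([r["id"] for r in no_pass_reports(reports)])
print([r["id"] for r in no_pass_reports(reports, include_na=False)])
```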

During a discussion with Neil, he mentioned that Paul Johnson was creating a Devel::Cover service that he wanted to run like a CPAN Testers service. The idea was to write a system that could allow distributed testing, with testers sending in reports which could then be accumulated, based on the OS being tested. As the Metabase is already able to handle different buckets, adding another bucket for coverage reports simplifies some of the work. The distributed client can then be modelled on the CPAN Testers means of report construction, creating a new coverage report fact and using the same transport mechanism to submit to the Metabase. A web service can then poll the Metabase for the new bucket, and create report pages in exactly the same way as CPAN Testers. It'll be interesting to see whether we can use the same (or similar) code to provide this.

Day Four

The morning threw us a curve-ball, as the building wouldn't open up. It was a Sunday and apparently no-one works on a Sunday. Thankfully a few phone calls to the right people got us in, just in time for lunch. In the meantime, as we were all staying in the same hotel, we took over the bar, and borrowed a conference room for the morning.

The poor wifi connection gave us a good opportunity to have further discussions. Neil gathered together several interested parties to discuss author emails. Both PAUSE and CPAN Testers send emails to authors, and there is a plan to send authors a yearly email to advertise improvements to their modules, and let them know about sites and tools that they might not be aware of. However, although many emails get through without a problem, several fail to reach their intended recipient. Typically this is because authors have changed their email address but failed to update the email stored within the PAUSE system. CPAN Testers highlights some of these Missing In Action authors, but it would be better to have an automated system. Also, as Ricardo noted, the envelope of an email is left unchanged when it is sent to the develooper network, so bouncebacks come back to the original sender containing the author's potentially secret email address. It would be much better to have a service that monitors bouncebacks, changes the envelope so that they return to the handling network, and can send an appropriate email to the sender. It could then provide an API to enable PAUSE and CPAN Testers, and any future system, to know whether compiling an email was worth the effort. For CPAN Testers there can be a great deal of analysis to prepare the summary emails, so knowing in advance that an author email is not going to get through would be very beneficial. Neil is going to write up the ideas, so we can more formally design a system that will work for all of the PAUSE related systems. CPAN Testers already has the Preferences site to allow authors to manage their summary emails, and also turn off receiving any emails, and it may be worth extending this to PAUSE or other systems to provide a subscription handling system.

The rest of the day was mostly spent monitoring the metabase table in the cpanstats database, as the new 'fact' column was added. The new field will store the reports from the parent in Sereal. I was a bit worried about locking the table all day, but no-one seemed to notice. While this was happening, I went back to the new module I had started on the first day of the conference, and had hoped to release. However, it highlighted further problems with the way reports are stored. I'm not sure what is doing it, but the underlying fact.content field in JSON was being stored as a string. In most cases this isn't a problem, however for this module it caused problems trying to encode/decode the JSON. After fixing the Generator code, the new module still didn't get finished. Well at least I have something to start my neocpanism.once-a-week.info stint with :)
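To show the kind of problem a string-valued content field causes, here is a small Python sketch. It is not the Generator fix: it assumes the content had been JSON-encoded a second time inside the outer JSON document, which is one plausible reading of the symptom, and the function name and sample fact are invented.

```python
import json


def decode_content(fact):
    """Return a copy of the fact whose 'content' is a decoded structure
    when it was stored as a JSON string holding an object or array.
    Plain text and already-decoded content are left untouched."""
    content = fact.get("content")
    if isinstance(content, str):
        try:
            decoded = json.loads(content)
        except ValueError:
            return dict(fact)  # ordinary text, not JSON
        if isinstance(decoded, (dict, list)):
            return {**fact, "content": decoded}
    return dict(fact)


# A made-up fact whose content was encoded twice.
raw = json.dumps({"type": "Example", "content": json.dumps({"grade": "pass"})})
fact = decode_content(json.loads(raw))
print(fact["content"]["grade"])
```

Without the second decode, `fact["content"]` is just a string, so any code expecting a nested structure fails as soon as it tries to index into it.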

Wrap Up

I now have several pieces of work to continue with, some for a few months to come, but these 4 days have been extremely productive. Despite playing with the CPAN Testers databases rather than writing code, the discussions have been invaluable. Plus it's always great to catch up with everyone.

This year's QA Hackathon was great, and it wouldn't have been possible without BooK and Laurent organising it, Wendy keeping us eating healthily (and in good supply of proper English tea ... I'll try and remember to bring the PG Tips next time), Booking.com for supplying the venue and all the other sponsors for helping to make the QA Hackathon the great success it was. In no particular order, thanks to Booking.com, SPLIO, Grant Street Group, DYN, Campus Explorer, EVOZON, elasticsearch, Eligo, Mongueurs de Perl, WenZPerl for the Perl6 Community, PROCURA, Made In Love and The Perl Foundation.

Looking forward to the 2015 QA Hackathon in Berlin.


Some Rights Reserved. Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a Creative Commons by Attribution Non-Commercial License. If you wish to use material for commercial purposes, please contact me for further assistance regarding commercial licensing.