The Time of the Turning

Posted on 7th May 2013

A few weeks ago I had the pleasure of attending the 6th annual QA Hackathon. The event has become THE event for developers of test modules, projects and toolchain applications to come together to discuss ideas and plan for the future, as well as to release some great work while they are there.

This year Shadowcat, the primary sponsors, took on the organisational duties. The event was originally to be in London, but due to personal circumstances the decision was made to move the location to Lancaster in the North West of England. Personally, I think they made the right choice. The venue itself was the new InfoLab building at Lancaster University. The attendees came from far and wide once again, and it was great to catch up with friends old and new, and even be introduced to some brand new ones.

My plan for the weekend was mainly to look at CPAN Testers. With the servers for the Metabase coming soon, David Golden and myself had hoped to be able to set them up, and start looking at changing the backend code to work with the new Metabase database. Unfortunately, the servers weren't ready for us just yet, so I started looking at other things. One area of CPAN Testers in particular, the cpanstats database side of things, needed attention: the speed of processing reports.

My first task once settled in was to look at the way that reports are consumed from the Metabase. Because SimpleDB has become very unreliable in the results it returns, the date-search criteria have been altered slightly to be a little more thorough, and a smaller range is now used to retrieve each set of GUIDs, in order to avoid missing reports. The results now appear to be a little more complete, although we still appear to be missing some every so often. A tail of log.txt also helps to catch up with any remaining reports. This work saw a new release of CPAN-Testers-Data-Generator.
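The idea behind the more thorough date search can be sketched as overlapping windows with de-duplication: if SimpleDB drops a report from one request, the next, overlapping request gets a second chance at it. This is only an illustrative sketch with invented data and function names, not the actual CPAN-Testers-Data-Generator code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a SimpleDB query: returns the GUIDs whose
# timestamps fall inside the requested window. Here we simulate 100
# reports with simple integer timestamps for the demonstration.
my @reports = map { { guid => "guid-$_", time => $_ } } 1 .. 100;
sub fetch_guids {
    my ($from, $to) = @_;
    return map  { $_->{guid} }
           grep { $_->{time} >= $from && $_->{time} < $to } @reports;
}

# Walk the period in small windows that overlap, so a report missed by
# one request can still appear in the next, and de-duplicate as we go.
my ($start, $end, $window, $overlap) = (1, 101, 20, 5);
my (%seen, @guids);
for (my $from = $start; $from < $end; $from += $window - $overlap) {
    for my $guid (fetch_guids($from, $from + $window)) {
        push @guids, $guid unless $seen{$guid}++;   # skip duplicates
    }
}
printf "retrieved %d unique GUIDs\n", scalar @guids;   # retrieved 100 unique GUIDs
```

The de-duplication hash makes the overlap harmless, so the window and overlap sizes can be tuned freely against how unreliable the backend is.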

A big factor in the slowness of the CPAN Testers server is that it requires a lot of disk I/O, with the database updates being a key factor. The most intensive updates surround the downloadable SQLite database, which also includes creating the Gzip and Bzip2 archives. As only web crawlers seem to be downloading the files, I've suspended the update. This has freed up a lot of resources, and consequently the performance of some of the other tasks, particularly the builder, has improved.

Next, the builder was the focus of my attention. Previously the builder built pages for both authors and distros all at once. Although the author pages are viewed slightly less, they were getting built more frequently, due to the way the requests are pushed into the queue for each report. Initially the logic for building pages was altered, which improved some of the more highly requested pages, but the more optimal solution was to split the builder into two: one for authors and one for distros. With the reduction in processing elsewhere, this improved the builder performance considerably. Monitoring the way the author pages are built since the hackathon has also allowed me to alter when the builder for authors runs, which in turn allows the builder for distros to take a higher priority. With more distro pages than author pages, this gives distro pages the opportunity to be built more quickly. Currently reports are appearing on the site within 24 hours of being submitted. These updates saw a new release of CPAN-Testers-WWW-Reports.

Another release while at the event related to the QA Hackathon itself: the main QA Hackathon website. Before the event, BooK had asked if the files that make up the site could be added to GitHub. As such, I packaged up the site into a git repository and released it. If you wish to help contribute to the site, please do.

Although there was a lot of coding work involved in the weekend, one of the bigger uses of time was the Lancaster Consensus organised by David Golden. For a few hours each afternoon, a large group of key toolchain developers, secondary project developers and various interested parties gathered to discuss various aspects associated with configuration, installation, testing and specification of Perl and CPAN. With so many developers in one room, it wasn't too surprising to have a few opposing views, but with a guiding hand from David, we did achieve a consensus. If you wish to read the outcome, please read David's write-ups of the discussion points. The Consensus meetings were perhaps the greatest achievement of the event. While there might not have been much immediate coding output from them, the potential to improve Perl and CPAN is considerable. From a CPAN Testers perspective, post-installation testing, case-insensitive package permissions and rules for distribution naming were perhaps of most interest. Although it may be some time before post-installation testing could be hooked into a CPAN Testers smoker, it will be a valuable addition for testing reports against pre-installed environments.

During the event, I had several discussions with Garu regarding his work on the cpanminus smoker client, and the common smoker client. In the last minutes of the hackathon we were able to push through a very notable report submission. It is exactly this sort of collaborative effort that makes these hackathons worthwhile. I look forward to seeing everyone again in Lyon.

The QA Hackathons could not be the success they are without the support of all the sponsors. My personal thanks to them for helping to provide accommodation, food and a venue for us all to hack in. A big thank you to cPanel, Dijkmat, Dyn, Eligo, Evozon, $foo, Shadowcat Systems Limited, Enlightened Perl Organisation and Mongueurs de Perl.

File Under: hackathon / perl / qa / testing


Young Parisians

Posted on 10th April 2012

Did I mention I went to Paris to take part in the 2012 QA Hackathon? Did I remember to mention all the cool stuff I got done? Well if you've been hiding for the past few weeks, have a look at the last couple of posts :)

As per usual, while there I took my camera along. However, unlike many previous visits to Paris, I didn't do any sight-seeing. That includes failing to wander around the venue and discover the real submarine, among other things that others found while taking a breath of fresh air.

Instead I spent my time hacking away, and only occasionally coming up for air for food, drink and some camera action.

With over 40 people in attendance, it was going to be difficult to capture everyone, but I think I managed it. If I did miss you, my apologies. It was great to meet so many friends old and new, and a real pleasure to finally put faces to names that I've known for a while, but not had the opportunity to meet in person.

So many great things happened in Paris, and I'm really looking forward to seeing what we can achieve in London for the 2013 QA Hackathon. See you there.

File Under: community / hackathon / opensource / paris / photography / qa / testing


Parisienne Walkways

Posted on 3rd April 2012

And so to the final part of my notes from the 2012 QA Hackathon.

CPAN Testers Report Status

After asking several times, Andreas thought he had finally understood what the dates mean on the Status page for the CPAN Testers Reports. He started watching and making page requests to see whether his requests were actioned. On Day 3 he pointed out that the date went backwards! Once he'd shown me, I understood why the first date is confusing. And for anyone else who has been confused by it, you can blame Amazon. SimpleDB sucks. It's why the Metabase is moving to another NoSQL DB.

The date references the update date of the report as it entered the Metabase. The last processed report is the last report that was extracted from the Metabase and entered into the cpanstats DB. Unfortunately, SimpleDB has a broken concept of searching. It will return results before the date requested, and regularly return the sorted results in an unsorted order. As such the dates you see on the Status page may go backwards in time! I'm not going to try and fix this, as it will all work as intended with the new system.
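In the meantime, a consumer can defend itself against the unsorted results by re-sorting on the client side before processing. A minimal sketch (the data is invented; this is not the actual Status-page code):

```perl
use strict;
use warnings;

# Results as SimpleDB might hand them back: out of order, with the
# update timestamps recorded as ISO-8601 strings.
my @results = (
    { guid => 'c', updated => '2012-04-03T10:00:00Z' },
    { guid => 'a', updated => '2012-04-01T09:00:00Z' },
    { guid => 'b', updated => '2012-04-02T12:30:00Z' },
);

# ISO-8601 timestamps compare correctly as plain strings, so a simple
# string sort restores chronological order without any date parsing.
my @ordered = sort { $a->{updated} cmp $b->{updated} } @results;
print join(' ', map { $_->{guid} } @ordered), "\n";   # a b c
```

With the results ordered locally, the displayed dates can only move forwards, whatever order the backend returned them in.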

Missing Reports

There have been several questions relating to missing reports over the past few years. Sometimes it just needs me to refresh the indices, but in other cases it may be due to the fact that SimpleDB omits reports from a request. Did I mention SimpleDB sucks? In a request to the Metabase, I will ask for all the reports from a given date. The results are limited to 2500, due to Amazon's own restriction. In the returned list it will often omit entries, due to its ignorance of sorting in the search request. I have gone through the Metabase code on several occasions and can verify it does the right thing. SimpleDB just chooses to ignore the complete search request and returns what it *thinks* you want to know.

Ribasushi questioned me about one of his modules that had been released recently, which still had no Cygwin reports listed, even though he sent a few himself. Further investigation revealed that they are indeed missing from the cpanstats DB. Although they did enter the Metabase, they never came out again.

To resolve this I have been revisiting the Generator code to rework the reparse and regenerate code to enable search requests for missing periods, in the hope that this will retrieve most of the missing results. If it doesn't, then I will be asking David to produce a definitive list for me, and I will make specific requests for any missing reports. The Generator code has been updated in GitHub to include all the performance improvements that have been live for some time too.

Erroneously Parsed Reports

Every so often the parsing mechanism fails and stores the wrong data within the cpanstats DB. These days it seems to only affect the platform, OS version and OS name. I'm not quite sure what is happening, as reparsing the report locally produces the correct results. This uses the same routine to parse the report, so why they occasionally fail remains a mystery. However, to combat this, I now have a script that can search periodically for this erroneous data and attempt to reparse the results. It can then alert me when it can't fix a report, and I can investigate manually. There have been occasions where a report couldn't be parsed because the output was corrupted on the test machine, which unfortunately we can't always resolve. Sometimes there are enough clues within other parts of the report that point to a particular OS, but sometimes we just have to leave it blank.

It seems in putting some of this code live before leaving the hackathon, I accidentally reintroduced a bug. Slaven was quick to spot it and tell me about it, but unfortunately it was too late for me to fix it, as I needed to leave and catch my flight home. It should be fixed by the time you read this though, so all should be back to your regular viewing pleasure :) With the new script I've written, it should hopefully find and fix these errors in the future, as well as alerting me to fix the bug again!

Thanks Again

So that was the 2012 QA Hackathon. The show ended with a group photo; a few people were missing due to their early departures home, but I think we got most of us in, including Miyagawa, who was taking the picture. The traditional thank yous and goodbyes ensued, and then Andreas and I headed off to begin our adventure getting to the airport! The next hackathon, the 2013 QA Hackathon, will be in London. We'll have the domain pointed to the right place just as soon as Andy gets the website up and running. I look forward to even more involvement next year, as we have been steadily growing in numbers each year. There has already been some significant output, but the event is much more than that. It's a chance to talk to people face to face, discuss ideas and plan for the future. Expect more news for CPAN Testers soon.

Once again I would like to thank ShadowCat Systems for getting me here, and for being a great supporter of the QA Hackathons, as well as many other Perl events over the years. Thanks too to Laurent Boivin (elbeho), Philippe Bruhat (BooK) and the French Perl Mongers for making the 2012 QA Hackathon happen. The Hackathon wouldn't have happened without the generosity of corporations and the communities that donate funds. So thank you to ... The City of Science and Industry, Diabolo.com, Dijkmat, DuckDuckGo, Dyn, Freeside Internet Services, Hedera Technology, Jaguar Network, Mongueurs de Perl, Shadowcat Systems Limited, SPLIO, TECLIB', Weborama, and $foo Magazine. We also have several individuals to thank too, who all made personal contributions to the event, so many thanks to Martin Evans, Mark Keating, Prakash Kailasa, Neil Bowers, 加藤 敦 (Ktat), Karen Pauley, Chad Davis, Franck Cuny, 近藤嘉雪, Tomohiro Hosaka, Syohei Yoshida, 牧 大輔 (lestrrat), and Laurent Boivin.

Meanwhile, Dan & Ethne would also like to thank Booking.com for their silly putty ;)

File Under: hackathon / opensource / paris / perl / qa / testing


Party In Paris

Posted on 31st March 2012

I'm currently at the 2012 QA Hackathon working on CPAN Testers servers, sites, databases and code. It has already been very productive, and I have two new module releases.

CPAN::Testers::WWW::Reports::Query::AJAX

This module was originally written in response to a question by Leo Lapworth about how the summary information is produced. As a consequence he wrote CPAN::Testers::WWW::Reports::Query::JSON, which takes the data from the stored JSON file. In most cases this data is sufficient, but the module requires parsing the JSON file, which may be slow for distributions with a large number of reports. On the CPAN Testers Reports site, in the side panel on the distribution page, you will see the temperature graphs measuring the percentage of PASS, FAIL, NA and UNKNOWN reports a particular release has. This is gleaned from an AJAX call to the server.

But what if you don't want an HTML/Javascript styled response? What if you want the results in plain text or XML? Enter CPAN::Testers::WWW::Reports::Query::AJAX. You can now use this to query the live data for a particular distribution, and optionally a specific version, and get all the result values back as a simple hash to do with as you please.
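As an illustration of what such a hash of grade counts enables, the temperature-graph percentages fall out of a few lines of plain Perl. The counts below are invented for the example, and this is not the module's own code:

```perl
use strict;
use warnings;

# Example grade counts for one release, of the kind a summary query
# might return (the numbers here are made up for illustration).
my %counts = ( PASS => 180, FAIL => 15, NA => 3, UNKNOWN => 2 );

my $total = 0;
$total += $_ for values %counts;

# Percentages as shown on the temperature graphs.
for my $grade (qw(PASS FAIL NA UNKNOWN)) {
    printf "%-7s %5.1f%%\n", $grade, 100 * $counts{$grade} / $total;
}
# PASS     90.0%
# FAIL      7.5%
# NA        1.5%
# UNKNOWN   1.0%
```

A project site could format the same numbers however it liked, which is exactly the point of returning raw values rather than styled HTML.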

I anticipate this might be most useful to project websites that wish to display their latest results from CPAN Testers in some way. They can now get the data and present it however they wish.

CPAN::Testers::WWW::Reports::Query::Reports

Now we get to perhaps the bigger module, even though it's smaller than the one above. This module is perhaps most useful to all those who are trying to maintain a version of the cpanstats metadata from the SQLite database. As mentioned previously, the SQLite database has been giving us grief over the past year, and we haven't got to the bottom of it. Andreas suspects there is some unusual textual data in some reports that is causing SQLite problems when it tries to store it. I'm not quite convinced by this, but as I'm only inserting records, I'm at a loss as to what else might be the cause.

The SQLite file now clocks in at over 1GB compressed and over 8GB uncompressed, and is starting to take a notable amount of disk space (though considerably smaller than the 250GB+ Metabase database ;) ). It is also a significant bandwidth consumer each day, which can slow processing and page displays, as disk access is our limiting factor now.

Enter CPAN::Testers::WWW::Reports::Query::Reports. This module uses the same principles as the AJAX module above, but accesses a new API on the CPAN Testers Reports site to enable consumers to get either a specific record or a whole range of report metadata records. Currently the maximum number of records that can be returned in a single request is 2500, but this may be increased once the system has proven to work well. Typically we have around 30,000 reports submitted each day, so to allow consumers to make best use of this API, I will look at increasing the limit to maybe 50,000 or 100,000. I want to impose a limit as I don't want accidental requests consuming the full database in one go, as again this would put a strain on disk access.

The aim of the module is to allow those who currently consume the SQLite database to request smaller updates more regularly and store the results in any database they choose, even a NoSQL-style database. It will ultimately reduce the bandwidth, the data stored and the processing needed to create the gzip and bzip2 archives, which means we can reallocate effort to more useful tasks.
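The consuming side of that workflow amounts to a paging loop: fetch up to the limit, remember the last id seen, and resume from there on the next request. The sketch below fakes the server with a local function, since the function name, the id scheme and the numbers are all invented for illustration rather than taken from the real API:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the range API: returns up to $limit report
# metadata records starting from id $from. A real consumer would make
# an HTTP request here instead.
my $limit   = 2500;   # current per-request maximum
my $highest = 6000;   # pretend ids 1..6000 exist on the server
sub fetch_range {
    my ($from) = @_;
    return () if $from > $highest;
    my $to = $from + $limit - 1;
    $to = $highest if $to > $highest;
    return map { { id => $_ } } $from .. $to;
}

# Pull the whole backlog in pages, resuming from the last id seen.
# Each batch would be inserted into whatever local database you use.
my ($next, $count) = (1, 0);
while (my @batch = fetch_range($next)) {
    $count += @batch;
    $next = $batch[-1]{id} + 1;   # resume after the last record
}
printf "consumed %d records\n", $count;   # consumed 6000 records
```

Run periodically with the last seen id persisted between runs, the same loop keeps a local copy up to date with small requests instead of one huge download.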

If you currently consume the SQLite database, please take a look at this module and see how you can use it. I plan to include some example scripts that could be drop-in replacements for your current processes, but if you get there first, please feel free to submit them to me too, and I will include them with full credit. If you spot any issues or improvements, please also let me know.

CPAN Testers Platform Metabase Facts

This morning we had a CPAN Testers presentation and discussion hosted by David Golden. As there is plenty of interest from a variety of parties about CPAN Testers, it was a good opportunity to highlight an area that needs work, but which David and myself, as well as other key developers in the CPAN Testers community, just don't have time to do. Breno de Oliveira (garu on IRC) has very kindly stepped forward to look at one particular task, which we have been wanting to write since the QA Hackathon in Birmingham, back in 2009!

Breno has written a CPAN Testers client for cpanminus. At the moment it's a stand-alone application, but it may well be included within cpanminus in the future. As part of writing the application, Breno asked David and myself about how the clients for CPAN::Reporter and CPANPLUS::YACSmoke create the report. Due to the legacy system we came from (email and NNTP), we still use an email-style presentation of the reports. However, it has always been our intention to produce structured data. A CPAN Testers Report currently has only two facts that are required, a Legacy Report and a Test Summary. However, there are other facts that we have already scoped; they are just not implemented.

Back last year the Birmingham Perl Mongers produced the CPAN::Testers::Fact::PlatformInfo fact, which consumes the data from Devel::Platform::Info (which we'd written the previous year). The problem with the way test reports are currently created is that we don't always know the definite platform information for the platform the test suite was run on. Reports, particularly in the Perl Config section, can lie. Not big lies necessarily, but enough that it can disguise why a particular OS may have problems with a particular distribution.

Breno is now looking to produce a module that abstracts all the metadata creation parts from CPAN::Reporter, CPANPLUS::YACSmoke and Test::Reporter, as well as his own new application, and puts them into a single library that can then create all the appropriate facts before submitting the report to the Metabase. Hopefully he can get this done during the Hackathon, but even if he doesn't, we're hopeful that he will get enough done to make it easy to complete soon after. Once we then patch the respective clients to use the new library, we will start to be able to do interesting things with how we present reports.

The CPAN Testers Reports site only displays the legacy style report, which for most is sufficient, but it really would be nice to have some specially styled presentations for particular sections, or even allow user preferences to show/hide sections automatically when a user reads a report.

CPAN Testers Admin site

This is a site that I have been working on, on and off, for about 4 years, since before we even had a Metabase. As a consequence it has been promised at various points and I've always failed to deliver. Now that I have released the modules above, and there have already been several comments about having such functionality, I think I need to put some focus on it again. I have shown Breno the site running on my laptop and he has given me some more ideas to make it even more useful. It'll still be a while before it's released, but this will likely be down to running with some beta testers first before a major launch, just so it doesn't break the eco-system too badly!

Essentially the site was written to help authors and testers to highlight dubious reports and have them deleted from the system. Although the reports won't actually be deleted, they will be marked to ignore, so that they can be removed from JSON files and summary requests, as well as on the CPAN Testers Report site. This will hopefully enable us to get more accurate data, and bogus reports about running out of memory or disk space can be disregarded.

However, following Breno's suggestions, I will look at making the site more public, so that authors can more easily see the reporting patterns without having to log in. The log-in aspect will still be needed to flag reports, but browsing reports by tester will be much more accessible.

Thanks

I would like to thank a few people who have helped to get me here, and have enabled these QA projects, not just CPAN Testers, to advance further.

Firstly I would like to single out ShadowCat Systems, who have very kindly paid for my flight here. Thanks to BooK and Laurent for organising the event, and to all the sponsors and Perl community who have provided the funding for the venue, accommodation and food for the event. It has already been very much appreciated, and hopefully the significant submissions to GitHub and PAUSE are evidence of just how worthwhile this event is.

Thanks also to all those who are here, and are helping out in all shapes and forms to help Perl QA be even better than it already is.

File Under: community / hackathon / opensource / paris / qa / testing


Rearviewmirror

Posted on 19th August 2011

Earlier this week I attended YAPC::Europe 2011. Many thanks to Andrew, Alex and all the others involved with bringing the conference to life, it was well worth all the effort.

During the conference I gave two talks. The first was my main talk, Smoking The Onion - Tales of CPAN Testers, which looked at how authors can use the CPAN Testers websites to improve their distributions, as well as some further hints and tips for common mistakes spotted by testers over the years. It also looked at how some of the sites can be used by users to see whether a particular distribution might be suitable for their purposes or not. The talk seemed to go down well, and it seems a few were disappointed to have missed it, after discovering it wasn't my usual update of what has been happening with CPAN Testers. Thankfully, I did video the talk, and I think the organisers also have a copy, so expect to see it on YAPC TV and Presenting Perl at some point in the future.

Photo by Jon Allen

My second talk, Perl Jam - How To Organise A Conference (and live to tell the tale), was a lightning talk to help promote my book and the YAPC Conference Surveys. The book is currently a work in progress, and I'd like to get more feedback from anyone who has been an organiser of a YAPC, Workshop or Hackathon, as well as any photos that would help to highlight particular sections of the book. If you think you could help, please take a look at the GitHub repository and send a pull request with any updates you think appropriate.

Congratulations to Frankfurt.pm for winning the chance to host YAPC::Europe 2012. See you next year.

File Under: book / community / conference / opensource / perl / survey / testing / yapc


Points of Authority

Posted on 27th May 2011

Back in February I did a presentation for the Birmingham Perl Mongers, regarding a chunk of code I had been using to test websites. The code was originally based on simple XHTML validation, using the DTD headers found on each page. I then expanded the code to include pattern matching so I could verify key phrases existed in the pages being tested. After the presentation I received several hints and suggestions, which I've now implemented and have set up a GitHub repository.

Since the talk, I have now started to add some WAI compliance testing. I got frustrated with finding online sites that claimed to be able to validate full websites, but either didn't or charged for the service. There are some downloadable applications, but most require you to have Microsoft Windows installed or again charge for the service. As I already had the bulk of the DTD validation code, it seemed a reasonable step to add the WAI compliance code. There is a considerable way to go before I get all the compliance tests that can be automated written into the distribution, but some of the more immediate tests are now there.

As mentioned in my presentation to Birmingham.pm, I still have not decided on a name. Part of the problem is that the front-end wrapper, Test::XHTML, is written using Test::Builder so you can use it within a standard Perl test suite, while the underlying package, Test::XHTML::Valid, uses a rather different approach and provides a wider API than just validating single pages against a DTD specification. Originally, I had considered that these two packages should be two separate releases, but now that I've added the WAI test package, I plan to expose more of the functionality of Test::XHTML::Valid within Test::XHTML. If you have namespace suggestions, please let me know, as I'm not sure Test-XHTML is necessarily suitable.

Ultimately I'm hoping this distribution can provide a more complete validation utility for web developers, which will be free to use and will work cross-platform. For those familiar with the Perl test suite structure, they can use it as such, but as it already has a basic stand-alone script to perform the DTD validation checks, it should be usable from the command-line too.

If this sounds interesting to you, please feel free to fork the GitHub repo and try it out. If you have suggestions for fixes and more tests, you are very welcome to send me pull requests. I'd be most interested in anyone who has the time to add more WAI compliance tests and can provide a better reporting structure, particularly when testing complete websites.

File Under: modules / opensource / perl / technology / testing / usability / web


Ultraviolet (Light My Way)

Posted on 28th February 2011

Last week I gave my first technical talk for several months. Despite being a bit rusty, everyone seemed to find the talk interesting. The talk itself was about code I'd written to test XHTML completeness of web pages and further pattern matching of page content. I've been using and developing the testing code over the last few years, having written the initial basic script, xhtml-valid, back in 2008. Over the last 18 months I have revisited the code and rewritten it into a traditional Perl testing structure. The talk looked at the current state of the code and asked for advice on where to take it next.

The code has developed into two packages, Test::XHTML and Test::XHTML::Valid, and as such the talk naturally fell into two parts, looking at each package in more depth. I had originally planned a demo, but unfortunately my laptop wouldn't talk to the projector, so had to rely on slides alone. This didn't seem to matter too much, as the slides conveyed enough of the API to give a decent flavour of what the packages were about.

The final questions I asked originally centred on where I was thinking of heading with the code base, but I also got asked a few questions regarding the technical aspects. My thanks to Colin Newell and Nick Morrott for giving me some ideas and pointers for further expansion of the code. As for my final questions, it was generally agreed that these should appear on CPAN in some form, and as two separate packages, but unfortunately nobody had a suitable name for either.

I plan to work further on the code, both to package them better and to include the suggestions from Colin and Nick, and then I'll see if anyone has some better suggestions for the names. In the meantime, the slides are now online [1] and the 2008 version 1.00 of the code base is also available [2]. I aim to have the current code base online soon, with a GitHub repo to provide ongoing developments for anyone who might be interested.

[1] http://birmingham.pm.org/talks/barbie/text-xhtml
[2] http://barbie.missbarbell.co.uk/page/code

File Under: opensource / perl / testing


Some Heads Are Gonna Roll

Posted on 11th February 2011

Some time ago I wrote Test-YAML-Meta. At the time the name was chosen as a complement to Test-YAML-Valid, which validates YAML files in terms of the formatting, rather than the data. Test-YAML-Meta took that a step further and validated the content data of META.yml files included with CPAN distributions against the evolving CPAN META Specification.

With the release of Parse-CPAN-Meta I wrote Test-CPAN-Meta, which dropped the sometimes complex dependency of the more verbose YAML parsers, for the one that was specifically aimed at CPAN META.yml files. With the emergence of JSON, there was a move to encourage authors to release META.json files too. Although considered a subset of the full YAML specification, JSON has a much better defined structure that has more complete parser support. Coinciding with this move was the desire by David Golden to properly define a specification for the CPAN Meta files. It was agreed that v2.0 of the CPAN Meta Specification should use JSON as the default implementation. As a consequence I then released Test-JSON-Meta.

Although the initial naming structure seemed the right thing at the time, it is becoming clearer that the names really need to be revised. As such I am looking to rename two of the distributions to better fit the implementations, so in the coming weeks expect to see some updates.

Underneath these current namespaces is the Version module that describes the data structures of the various specifications. In the short term these will also move, but will be replaced by a dependency on the main CPAN-Meta distribution in the future. There will be final releases for Test-YAML-Meta and Test-JSON-Meta, which will act as a wrapper distribution to re-point the respective distributions to their new identities.

File Under: modules / perl / qa / testing


Some Rights Reserved Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a Creative Commons by Attribution Non-Commercial License. If you wish to use material for commercial purposes, please contact me for further assistance regarding commercial licensing.