Time Waits For No One

Posted on 10th May 2014

When I relaunched the CPAN Testers sites back in 2008, I was in a position to be responsible for 3 servers, the CPAN Testers server, the Birmingham Perl Mongers server, and my own server. While managing them wasn't too bad, I did think it would be useful having some sort of monitoring system that could help me keep an eye on them. After talking to a few people, the two key systems most keenly suggested were Nagios and Munin. Most seemed to favour Munin, so I gave it a go. Sure enough it was pretty easy to set up, and I was able to monitor the servers, using my home server to monitor them. However, there was one area of monitoring that wasn't covered. The performance of the websites.

At the time I had around 10-20 sites up and running, and the default plugins didn't provide the sort of monitoring I was looking for. After some searching I found a script written by Nicolas Mendoza. The script not only got me started, but helped to make clear how easy it was to write a Munin plugin. However, the script as was, didn't suit my needs exactly, so had to make several tweaks. I then found myself copying the file around for each website, which seem a bit unnecessary. So I wrote what was to become Munin::Plugin::ApacheRequest. Following the Hubris and DRY principles copying the script around just didn't make sense, and being able to upgrade via a Perl Module on each server, was far easier than updating the 30+ scripts for the sites I now manage.

Although the module still contains the original intention of the script, how it does it has changed. The magic still happens in the script itself.

To start with an example, this is the current script to monitor the CPAN Testers Reports site:

#!/usr/bin/perl -w
use Munin::Plugin::ApacheRequest;
my ($VHOST) = ($0 =~ /_([^_]+)$/);

Part of the magic is in the name of the script. This one is 'apache_request_reports'. The script extracts the last section of the name, in this case 'reports', and passes that to Run() as the name of the virtual host. If you wish to name the scripts slightly differently, you only need to amend this line to extract the name of your virtual host as appropriate. If you only have one website you may wish to name the host explicity, but then if you create more it does mean you will need to edit each file, which is what I wanted to avoid. All I do now is copy an existing file to one to represent the new virtual host when I create a new website, and Munin automatically adds it to the list.

Munin::Plugin::ApacheRequest does make some assumptions, one of which is where you locate the log files, and how you name them for each virtual host. On my servers '/var/www/' contains all the virtual hosts (/var/www/reports, in this example), and '/var/www/logs/' contains the logs. I also use a conventional naming system for the logs, so '/var/www/logs/reports-access.log' is the Access Log for the CPAN Testers Reports site. Should you have a different path or naming format for your logs, you can alter the internal variable $ACCESS_LOG_PATTERN to the format you wish. Note that this is a sprintf format, and the first '%s' in the format string is replaced by the virtual host name. If you only have one website, you can change the format string to the specific path and file of the log, and no string interpolation is done.

The log format used is quite significant, and when you describe the LogFormat for your Access Log in the Apache config file, you will need to use an extended format type. The field to show the time taken to execute a request is needed, which is normally set using the %T (seconds) or %D (microseconds) format option (see also Apache Log Formats). For example my logs use the following:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %T %v"

The second to last field is our time field. In Munin::Plugin::ApacheRequest, this is stored in the $TIME_FIELD_INDEX variable. By default this is -2, assuming a similar log format as above. If you have a different format, where the execution time is in another position, like $ACCESS_LOG_PATTERN, you can change this in your script before calling Run(). A positive number assumes a column left to right, while a negative number assumes a column right to left.

The last number passed to the Run() method, determines the number of lines read for the access log to describe the average execution time. For high hit rate sites, you may wish this to be a higher number, but as most of my sites are not that frequently visited, 1000 seems to be a reasonable number.

The config statements that are generated for the Munin master monitor are currently hardcoded with values. This will change in a future version. For the example above the config produced reads as:

graph_title reports ave msecs last 1000 requests
graph_args --base 1000
graph_scale no
graph_vlabel Average request time (msec)
graph_category Apache
graph_info This graph shows average request times for the last 1000 requests
images.warning 30000000
images.critical 60000000
total.warning 10000000
total.critical 60000000

The highlighted values are interpolated from the arguments passed to Run(). In a future version I want to be able to allow you to reconfigure the warning and critical values and the graph base value, should you wish to.

I have now been using Munin::Plugin::ApacheRequest and the associated scripts for 6 years now, and it has proved very successful. I have thought about releasing the module to CPAN previously, and have made several attempts to contact Nicolas over the years, but have never had a reply. I know he was working for Opera when he released his script, but have no idea of his whereabouts now. As the script contained no licensing information, I was also unsure what licensing he had intended the code to be. I hope he doesn't mind me having adapted his original script, that I'm now releasing the code under the Artistic License v2.

Although I haven't been able to contact Nicolas, I would like to thank him for releasing his original script. If I hadn't have found it, it is unlikely I would have found a way to write a Munin plugin myself to do Apache website monitoring. With his headstart, I discovered how to write Munic plugins, and can now set up monitor of new websites within a few seconds. Thanks Nicolas.

File Under: opensource / perl / website

Grand Designs

Posted on 31st December 2013

Over the last year I've made several releases for Labyrinth and its various plugins. Some have been minor improvements, while others have been major improvements as I've reviewed the code for various projects. I originally wrote Labyrinth after being made redundant back in December 2002, and after realising all mistakes I made with the design of its predecessor, Mephisto. In the last 11 years has helped me secure jobs, enabled me to implement numerous OpenSource projects (CPAN Testers and YAPC Conference Surveys to name just two) and provided the foundation to create several websites for friends and family. It has been a great project to work on, as I've learnt alot about Perl, AJAX/JSON, Payment APIs, Security, Selenium and many other aspects of web development.

I did a talk about Labyrinth in Frankfurt for YAPC::Europe 2011, and one question I was asked, was about comparing Labyrinth to Catalyst. When I created Labyrinth, Catalyst and its predecessor Maypole, were 2 years (and 1 year) away from release. Back then I no idea about an MVC, but I was pleased that in later years when I was introduced to the design concept, that it had seemed an obvious and natural way to design a web framework. Aside from this and both being written in Perl, Labyrinth and Catalyst are very different beasts. If you're looking for a web framework to design a mojor system for your company, then Catalyst is perhaps the better choice. Catalyst also has a much bigger community, whereas Labyrinth is essentially just me. I'd love for Labyrinth to get more usage and exposure, but for the time being, I'm quite comfortable with it being the quiet machine behind CPAN Testers, YAPC Surveys, and all the other commercial and non-commercial sites I've worked on over the years.

This year I finally released the code to enable Labyrinth to run under PSGI and Plack. It was much easier than I thought, and enabled me to better understand the concepts behind the PSGI protocol. There are several other concepts in web development that are emerging, and I'm hoping to allow Labyrinth to teach me some of them. However, I suspect most of my major work with Labyrinth in 2014 is going to be centred on some of the projects I'm currently involved with.

The first is the CPAN Testers Admin site. This has been a long time coming, and is very close to release. There are some backend fixes that are still needed to join the different sites together, but the site itself is mostly done. It still needs testing, but it'll be another Labyrinth site to join the other 4 in the CPAN Testers family. The site has taken a long time to develop, not least because of various other changes to CPAN Testers that have happened over the few years, and the focus on getting the reports online sooner rather than later.

The next major Labyrinth project I plan to work on during 2014, is the YAPC Conference Surveys. Firstly to release the current code base and language packs, to enable others to develop their own survey sites, as that has been long over due. Secondly, I want to integrate the YAPC surveys into the Act software tool, so that promoting surveys for YAPCs and Perl Workshops will be much easier, and we won't have to rely on people remembering their keycode login. Many people have told me after various events that they never received the email to login to the surveys. Some have later been found in spam folders, but some have changed their email address and the one stored in Act is no longer valid. Allowing Act to request survey links will enable attendees to simply log into the conference site and click a link. Further to this, if the conference has surveys enabled, then I'd like the Act site to be able to provide links next to each talk, so that talk evaluations can be donme much more easily.

Lastly, I finally want to get all the raw data online as possible. I still have the archives of all the surveys that have been undertaken, and some time ago I wrote a script to create a data file, combining both the survey questions and the responses, appropriately anonymised, with related questions linked, so that others can evaluate the results and provide even more statistical analysis than I currently provide.

In the meantime the next notable release from Labyrinth will be a redesign of the permissions system. From the very beginning Labyrinth had a permissions system, which for many of the websites was adequate. However, the original Mephisto project encompassed a permissions system for the tools it used, which for Labyrinth were redesigned as plugins. Currently a user has a level of permission; Reader, Editor, Publisher, Admin and Master. Each level grants more access than the previous one as you might expect. Users can also be assigned to groups, which also have permissions. It is quite simplistic, but as most of the sites I've developed only have a few users, granting these permissions across the whole site has been perfectly acceptable.

However, with a project I'm currently working on this isn't enough. Each plugin, and its level of functionality (View, Edit, Delete), need different permissions for different users and/or groups. The permissions system employed by Mephisto came close, but they aren't suitable for the current project. A brainwave over Christmas saw a better way to do this, and not just to implement for the current project, but to improve and simplify the current permission system, and enable to plugins to set their permissions in data or configuration rather than code, which is a key part of the design of Labyrinth.

This ability to control via data is a key element of how Labyrinth was designed, and it isn't just about your data model. In Catalyst and other web frameworks, the dispatch table is hardcoded. At the time we designed Mephisto, CGI::Application was the most prominent web framework, and this hardcoding was something that just seemed wrong. If you need to change the route through your request at short notice, you shouldn't have to recode your application and make another release. With Labyrinth switching templates, actions and code paths is done via configuration files. Changing can be dne in seconds. Admittedly it isn't something I've needed to do very often, but it has been necessary from time to time, such as disabling functionality due to broken 3rd party APIs, or switching templates for different promotions.

The permission system needs to be exactly the same. A set of permissions for one site may be entirely different for another. Taking this further, the brainwave encompassed the idea of profiles. Similar to groups, a profile can establish a set of generic permissions. Specific permissions can then be adjusted as required, and reset via a profile on a per user or per group basis. This then allows the site permissions to be tailored for a specific user. This then allows UserA and UserB to have generic Reader access, but for UserA to have Editor access to TaskA and UserB to be granted Editor access to TaskB. Previously the permission system would have meant both users be granted Editor access for the whole site. Now, or at least when the system is finished, a user's permissions can be set so they can be restricted to only the tasks they need access to.

Over Christmas there have been a few other fixes and enhancements to various Labyrinth sites, so expect to see those to also find their way back into the core code and plugins. I expect several Labyrinth related releases this year, and hopefully a few more talks at YAPCs, Workshops and technical events in the coming year about them all. Labyrinth has been a fun project to work on, and long may it continue.

File Under: labyrinth / opensource / website

To Wish Impossible Things

Posted on 4th May 2013

The QA Hackathon website has had a bit of an update today. Primarily a new page and new photos have been added, but plenty of other updates have been included too.

The new page is a review page, to collect various blog and news posts relating to each year's event. Originally I listed all the reviews from previous years in the side panel, but now that we've just had the 6th annual event, the list was looking a little bit too cramped.

With the extra space, I've also been able to include the group shots that were taken at some of the events. Unfortunately there was no group shot taken in Birmingham, and I've not seen any during the 2010 and 2011 events, so if there are any, please let me know. Also if there is one of the Tokyo Satellite event this year I would love to include it on the site.

I've added some write-ups to the last few events in the About page. The biggest change though is likely only visible to those with screen readers, as I've made many changes to links and images to provide more accessibility. Several fixes to layout, spelling and wording have also been included too.

The site, particularly the list of reviews, is still incomplete. If a blog entry is missing that you think should be there, or you spot other items that could do with an update, feel free to email me with details, or fork the repo on GitHub and send me a pull request.

File Under: hackathon / perl / qa / website

Lost In The Echo

Posted on 26th August 2012

I've just released new versions of my use.perl distributions, WWW-UsePerl-Journal and WWW-UsePerl-Journal-Thread. As use.perl became decommisioned at the end of 2010, the distrubutions had been getting a lot of failure reports, as they used screen-scraping to get the content. As such, I had planned to put them out to pasture in BackPAN. That was until recently I discovered that Léon Brocard had not only released WWW-UsePerl-Server, but also provided a complete SQL archive of the use.perl database (see the POD for a link). Then combining the two, he put up a read-only version of the website.

While at YAPC::Europe this last week, I started tinkering, and fixing the URLs, regexes, logic and tests in my two distributions. Both distributions have had functionality removed, as the read-only site doesn't provide all the same features as the old dynamic site. The most obvious is that posting new journal entries is now disabled, but other lesser features not available are searching for comments based on thread id or users based on the user id. The majority of the main features are still there, and those that aren't I've used alternative methods to retrieve them where possible.

Although the distributions and modules are now working again, they're not perhaps as useful as they once were. As such, I will be looking to merge both distributions for a future release, and also providing support to a local database of the full archive from Léon.

Seeing as no-one else seems to have stepped forward and written similar modules for blogs.perl, I'm now thinking it might also be useful to take my use.perl modules and adapt them for blogs.perl. It might be a while before I finish them, but it'll be nice to have the ability to have many of the same features. I also note that blogs.perl.org also now has paging. Yeah \o/ :) This has been a feature that I have been wanting to see on the site since it started, so thanks to the guys for finding some tuits. There was a call at YAPC::Europe for people to help add even more functionality, so I look forward to seeing what delights we have in store next.

File Under: opensource / perl / website

Know Your Rights

Posted on 26th May 2011

The changes required as part of the EU Privacy and Electronic Communications Directive, which I discussed last week, come into effect today (26th May 2011). The Information Commissioner's Office (ICO) released a press release on their website stating that "Organisations and businesses that run websites aimed at UK consumers are being given 12 months to 'get their houses in order'." However, this statement only serves to confuse the issue more. Does this mean that individuals are not covered by the law (the directive implies they are) or does it mean that the leniency given to businesses does not apply to individuals, and thus the full weight of the law and fines will be imposed immediately. The press release also seems to imply that the new law only applies to businesses providing ecommerce websites, so does that mean other businesses and organisations are exempt?

Or, does it mean that those implementing the law and writing press releases are so eager to get something out, they have forgotten that their peace offering to (some?) businesses still leaves a gaping hole in their policy of adhering to the original directive.

And it gets worse. Reading an article on eWeek, George Thompson, information security director at KPMG, is quoted as saying "The new law inadvertently makes the collection of consent - yet another set of sensitive, customer data - compulsory. Companies need to tighten up their data management policies and make absolutely sure that every new data composition is covered." Which leads me to believe that you can now be fined if you don't ask the user to accept cookies, and can be fined if you don't record details of those who said they don't want cookies! Then I assume you can then be fined again if that data isn't securely stored away to adhere to the Data Protection Act.

Did no-one really sit down and think of the implications of all this?

The Register reports that only 2 countries within the EU have notified the Commision that all the rulings have been passed into law, with the other Member States possibly facing infringement proceedings. With such a weight of resistence, wouldn't it be more wise to review the directive properly so all Member States understand and agree to all the implications?

It's not all doom and gloom though. Another article by Brian Clifton on Measuring Success, looks at Google Analytics, and concludes that "Google Analytics uses 1st party cookies to anonymously and in aggregate report on visits to your website. This is very much at the opposite end of the spectrum to who this law is targeting. For Google Analytics users, complying with the ToS (and not using the other techniques described above), there is no great issue here - you already respect your visitors privacy...!" (also read Brian's car counting analogy in comment 3, as well as other comments). In fact Google's own site about Google Analytics supports Brian's conclusion too.

The BBC have posted on their BBC Internet Blog, explaining how they are going to be changing to comply with the law. To begin with they have updated their list of cookies used across all their services. Interestingly they list Google Analytics as 3rd-party cookies, even though they are not, but I think that comes from the misunderstanding many of us had about GA cookies.

Although the ICO website has tried to lead by example, with a form at the top of their pages requesting you accept cookies, this doesn't suit all websites. This method of capturing consent works fine for those generating dynamic websites from self controlled applications, such as ICO's own ASP.NET application, but what about static websites? What about off-the-shelf packages that haven't any support for this sort of requirement?

On the other side of the coin, the ICO themselves have discovered that a cookie used to maintain session state is required by their own application. Providing these are anonymous, the directive would seem to imply that these cookies are exempt, as being "strictly necessary" for the runing of the site. Then again, if they did contain identifying data, but the application wouldn't work without it, is that still "strictly necessary"? A first step for most website owners will be to audit their use of cookies, as the BBC have done, but I wonder how many will view them all as strictly necessary?

It generally means this is going to be an ongoing headache for quite sometime, with ever more questions than answers. As some have noted, it is going to take a legal test case before we truly know what is and isn't acceptable. Here's hoping it goes before a judge well versed with how the internet works, and that common sense prevails.

File Under: internet / law / life / website

Page 2 >>

Some Rights Reserved Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a Creative Commons by Attribution Non-Commercial License. If you wish to use material for commercial puposes, please contact me for further assistance regarding commercial licensing.