Points of Authority

Posted on 27th May 2011

Back in February I did a presentation for the Birmingham Perl Mongers, regarding a chunk of code I had been using to test websites. The code was originally based on simple XHTML validation, using the DTD headers found on each page. I then expanded the code to include pattern matching so I could verify key phrases existed in the pages being tested. After the presentation I received several hints and suggestions, which I've now implemented and have set up a GitHub repository.

Since the talk, I have started to add some WAI compliance testing. I got frustrated with finding online sites that claimed to be able to validate full websites, but either didn't or charged for the service. There are some downloadable applications, but most require you to have Microsoft Windows installed or, again, charge for the service. As I already had the bulk of the DTD validation code, it seemed a reasonable step to add the WAI compliance code. There is a considerable way to go before all the compliance tests that can be automated are written into the distribution, but some of the more immediate tests are now there.

As mentioned in my presentation to Birmingham.pm, I still have not decided on a name. Part of the problem is that the front-end wrapper, Test::XHTML, is written using Test::Builder so you can use it within a standard Perl test suite, while the underlying package, Test::XHTML::Valid, uses a rather different approach and provides a wider API than just validating single pages against a DTD specification. Originally, I had considered releasing these two packages separately, but now that I've added the WAI test package, I plan to expose more of the functionality of Test::XHTML::Valid within Test::XHTML. If you have namespace suggestions, please let me know, as I'm not sure Test-XHTML is necessarily suitable.
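The Test::Builder integration mentioned above is what lets a wrapper like Test::XHTML behave as an ordinary test module. The post doesn't show the actual API, so here is a minimal, hypothetical sketch of the pattern only: the function names and the crude validity check are invented for illustration, not the real Test::XHTML interface.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::Builder;

my $Test = Test::Builder->new;

# Crude stand-in check; a real validator would parse the page
# against its declared DTD.
sub looks_like_xhtml {
    my $content = shift;
    return $content =~ m{^<!DOCTYPE html}i && $content =~ m{</html>\s*$};
}

# The Test::Builder pattern: report a pass/fail into the running
# test suite, just like ok() from Test::More would.
sub valid_xhtml {
    my ($content, $name) = @_;
    my $ok = looks_like_xhtml($content);
    $Test->ok($ok, $name);
    $Test->diag('content failed basic XHTML checks') unless $ok;
    return $ok;
}

$Test->plan(tests => 1);
valid_xhtml(qq{<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n} .
            qq{"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n} .
            qq{<html></html>}, 'page looks like XHTML');
```

Because the reporting goes through Test::Builder, a wrapper built this way slots straight into prove and the standard Perl test harness alongside any other Test:: modules.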

Ultimately I'm hoping this distribution can provide a more complete validation utility for web developers, which will be free to use and will work cross-platform. Those familiar with the Perl test suite structure can use it as such, but as it already includes a basic stand-alone script to perform the DTD validation checks, it is usable from the command-line too.

If this sounds interesting to you, please feel free to fork the GitHub repo and try it out. If you have suggestions for fixes and more tests, you are very welcome to send me pull requests. I'd be most interested in anyone who has the time to add more WAI compliance tests and can provide a better reporting structure, particularly when testing complete websites.

File Under: modules / opensource / perl / technology / testing / usability / web

Know Your Rights

Posted on 26th May 2011

The changes required as part of the EU Privacy and Electronic Communications Directive, which I discussed last week, come into effect today (26th May 2011). The Information Commissioner's Office (ICO) released a press release on their website stating that "Organisations and businesses that run websites aimed at UK consumers are being given 12 months to 'get their houses in order'." However, this statement only serves to confuse the issue more. Does this mean that individuals are not covered by the law (the directive implies they are), or does it mean that the leniency given to businesses does not apply to individuals, and thus the full weight of the law and fines will be imposed on them immediately? The press release also seems to imply that the new law only applies to businesses providing ecommerce websites, so does that mean other businesses and organisations are exempt?

Or does it mean that those implementing the law and writing press releases are so eager to get something out, they have forgotten that their peace offering to (some?) businesses still leaves a gaping hole in their policy of adhering to the original directive?

And it gets worse. Reading an article on eWeek, George Thompson, information security director at KPMG, is quoted as saying "The new law inadvertently makes the collection of consent - yet another set of sensitive, customer data - compulsory. Companies need to tighten up their data management policies and make absolutely sure that every new data composition is covered." Which leads me to believe that you can now be fined if you don't ask the user to accept cookies, and can be fined if you don't record details of those who said they don't want cookies! And presumably you can be fined again if that data isn't securely stored away to adhere to the Data Protection Act.

Did no-one really sit down and think of the implications of all this?

The Register reports that only 2 countries within the EU have notified the Commission that all the rulings have been passed into law, with the other Member States possibly facing infringement proceedings. With such a weight of resistance, wouldn't it be wiser to review the directive properly so all Member States understand and agree to all the implications?

It's not all doom and gloom though. Another article by Brian Clifton on Measuring Success, looks at Google Analytics, and concludes that "Google Analytics uses 1st party cookies to anonymously and in aggregate report on visits to your website. This is very much at the opposite end of the spectrum to who this law is targeting. For Google Analytics users, complying with the ToS (and not using the other techniques described above), there is no great issue here - you already respect your visitors privacy...!" (also read Brian's car counting analogy in comment 3, as well as other comments). In fact Google's own site about Google Analytics supports Brian's conclusion too.

The BBC have posted on their BBC Internet Blog, explaining how they are going to be changing to comply with the law. To begin with they have updated their list of cookies used across all their services. Interestingly they list Google Analytics as 3rd-party cookies, even though they are not, but I think that comes from the misunderstanding many of us had about GA cookies.

Although the ICO website has tried to lead by example, with a form at the top of their pages requesting you accept cookies, this doesn't suit all websites. This method of capturing consent works fine for those generating dynamic websites from self-controlled applications, such as ICO's own ASP.NET application, but what about static websites? What about off-the-shelf packages that haven't any support for this sort of requirement?

On the other side of the coin, the ICO themselves have discovered that a cookie used to maintain session state is required by their own application. Providing these are anonymous, the directive would seem to imply that such cookies are exempt, as being "strictly necessary" for the running of the site. Then again, if they did contain identifying data, but the application wouldn't work without it, is that still "strictly necessary"? A first step for most website owners will be to audit their use of cookies, as the BBC have done, but I wonder how many will view them all as strictly necessary?

It generally means this is going to be an ongoing headache for quite some time, with ever more questions than answers. As some have noted, it is going to take a legal test case before we truly know what is and isn't acceptable. Here's hoping it goes before a judge well versed in how the internet works, and that common sense prevails.

File Under: internet / law / life / website

Behind The Lines

Posted on 25th May 2011

Back last year I got a curious email from a fellow London.pm'er asking why I was releasing so many WWW-Scraper-ISBN distributions. The reason was quite simple, to make my life easier! Well okay, that's why I wrote the distributions, but I figured others might find them useful too.

In the UK the book trade is a bit odd, and I dare say the rest of the world suffers from this too. The publishers don't like to give too much information away about their books, and the central body for allocating ISBNs, Nielsen, don't always have all the necessary metadata available. The book trade uses MARC Records to transfer this metadata around, and unfortunately, while there is provision to include much of the metadata, it often isn't included. The obvious things such as the Author, Title and the ISBN itself are usually there, but some of the data relating to the physical attributes (pages, height, width and weight) rarely are.

Originally I wrote the Amazon, Pearson Education, O'Reilly Media and Yahoo! Books distributions to use within Labyrinth, particularly for the Birmingham Perl Mongers website, and our Book Reviews. The plugin mechanism allowed me, when I received a review, to enter the ISBN and prepopulate the metadata fields and links before adding the review itself. The four distributions saved a lot of time, but the initial releases were quite basic.

Jumping forward several years, now needing this extra metadata, I first expanded the original four distributions. However, not all of these online bookstores provided this extra metadata. Picking a variety of books I searched to see what metadata I could retrieve, and came across several sites around the world that included this information to varying degrees. Much of the basic information regarding an ISBN shouldn't change from country to country, so metadata retrieved from Australia or New Zealand is as valid as that from America or the UK. There are aspects that can differ, such as the cover illustration, but the majority of metadata returned should be applicable regardless of location.
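Since no single store returned everything, combining records is the interesting part. The sketch below is purely illustrative — the field names and sample values are invented, and the real WWW-Scraper-ISBN drivers return their own structures — but it shows the idea of filling the gaps in one site's record from another's.

```perl
use strict;
use warnings;

# Hypothetical sketch: merge partial metadata records for one ISBN,
# keeping the first defined value seen for each field. Earlier
# records (e.g. the local site) take precedence over later ones.
sub merge_records {
    my @records = @_;
    my %merged;
    for my $record (@records) {
        for my $field (keys %$record) {
            $merged{$field} = $record->{$field}
                unless defined $merged{$field};
        }
    }
    return \%merged;
}

# Invented example data: the UK record has the bibliographic fields,
# the Australian record has the physical attributes.
my $uk = { isbn => '9780596001735', title => 'Perl Best Practices',
           author => 'Damian Conway' };
my $au = { isbn => '9780596001735', pages => 544,
           weight => 794, height => 233, width => 178 };

my $book = merge_records($uk, $au);
# $book now holds both the bibliographic and the physical attributes
```

The precedence rule is a design choice: cover illustrations and other locale-specific fields should come from the nearest site, while location-independent attributes can safely be filled from anywhere.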

There were some interesting discrepancies with the different units of weights and measures used across the sites too. While some stuck to a set of fixed units, others changed depending on how big the values were, particularly for grammes and kilogrammes. I settled on grammes for weight and millimetres for height and width, seeing as metric was the system most commonly used on the various sites.
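To make that concrete, here is a small sketch of the kind of normalisation involved; the set of unit labels handled is my assumption, not an exhaustive list of what the various sites actually used.

```perl
use strict;
use warnings;

# Conversion tables: everything normalised to grammes (weight)
# and millimetres (height/width).
my %to_grammes = ( g => 1, gm => 1, kg => 1000 );
my %to_mm      = ( mm => 1, cm => 10, m => 1000 );

# Convert a (value, unit) pair using the given table.
sub normalise {
    my ($value, $unit, $table) = @_;
    my $factor = $table->{lc $unit}
        or die "unknown unit '$unit'\n";
    return $value * $factor;
}

print normalise(0.794, 'kg', \%to_grammes), "\n";  # 794 grammes
print normalise(23.3,  'cm', \%to_mm), "\n";       # 233 millimetres
```

Unknown units die loudly rather than silently storing a wrong value, which matters when a site switches units once a figure crosses a kilogramme.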

It did cross my mind whether to include prices in the metadata returned, but as prices fluctuate frequently and are very location dependent, you are probably better off writing this side of things yourself for your specific purpose, such as a comparison website. I also left out depth, as only a few sites regularly provided a value for it. I can always save it for a future release anyway.

Hopefully those that work in the book trade, who have been wishing that MARC Records were populated a little more fully than they are currently, can make use of these distributions to help fill in the gaps.

File Under: book / isbn / opensource / perl

Time Flies By (When You're The Driver Of A Train)

Posted on 13th May 2011

In light of my recent posts, I received the following story from a friend.

Rules Is Rules...

Good news:
It was a normal day in Sharon Springs, Kansas, when a Union Pacific crew boarded a loaded coal train for the long trek to Salina.

Bad news:
Just a few miles into the trip a wheel bearing became overheated and melted, letting a metal support drop down and grind on the rail, creating white hot molten metal droppings spewing down to the rail.

Good news:
A very alert crew noticed smoke about halfway back in the train and immediately stopped the train in compliance with the rules.

Bad news:
The train stopped with the hot wheel over a wooden bridge with creosote ties and trusses.

The crew tried to explain this to Union Pacific higher-ups but were instructed not to move the train!

They were informed that Rules prohibited moving the train when a part was found to be defective!


(Don't ever let common sense get in the way of a good Disaster!)

And just in case you thought that wasn't a true story, here it is with pictures!

File Under: humour / trains

The Sanity Assassin

Posted on 12th May 2011

An update to my recent post.

With thanks to a fellow Perler, Smylers informs me that a Flash Cookie refers to the cookie used by Flash content on a site, which saves state on the user's machine, bypassing browser preferences. Odd that the advice singles out this type of cookie by name though, and not the others.

In a Wall Street Journal article I came across after posting my own, I was interested to discover that the ICO themselves use Google Analytics. So after 25th May, if you visit the ICO website and see no pop-up, I guess that means Google Analytics is good to go. Failing that, they'll see a deluge of complaints that their own website fails to follow the EU directive.

I also recommend reading StatCounter's response. They too note the problem with the way hosting locations are (not) covered by the directive, and the fact that the protection from behavioural advertising has been lost along the way.

After a discussion about this at the Birmingham.pm Social meeting last night, we came to the considered opinion that this would likely just be a wait and see game. Until the ICO bring a test case to court, we really won't know how much impact this will have. Which brings us back to the motives for the directives. If you're going to take someone to court, only big business is worth fining. Bankrupting an individual or a small business (ICO now have powers to fine up to £500,000) is going to give the ICO, the government and the EU a lot of really negative press.

Having tackled the problem in the wrong way, those the directives sought to bring into line are only going to use other technologies to retrieve and store the data they want. It may even affect EU hosting companies, if a sizeable portion of their market decides to register and host their websites in non-EU countries.

In the end the only losers will be EU businesses, and thus the EU economy. Did anyone seriously think these directives through?

File Under: government / law / security / technology / usability / web / website

The Planner's Dream Goes Wrong

Posted on 11th May 2011

On May 26th 2011, UK websites must adhere to an EU directive regarding cookies that still hasn't been finalised. Other member states of the EU are also required to have laws in place that enforce the directive.

Within the web developer world this has caused a considerable amount of confusion and annoyance, for a variety of reasons, and has enabled media outlets to scaremonger the doom and gloom that could befall developers, businesses and users. It wouldn't be so bad if there was a clear piece of legislation that could be read, understood and followed, but there isn't. Even the original EU directives are vague in the presentation of their requirements.

If you have the time and/or inclination the documents to read are Article 2 of Directive 2009/136/EC (the Directive), which amends the E-Privacy Directive 2002/58/EC (the E-Privacy Directive), with both part of the EU Electronic Communications Framework (ECF).

Aside from the ludicrous situation of trying to enforce a law with no actual documentation to abide by (George Orwell would have a field day), and questioning why we are paying politicians for this shambolic situation, I have to question the motives behind the creation of this directive.

The basic Data Protection premise for tightening up the directive is a reasonable one; however, the way it has been presented is potentially detrimental to the way developers, businesses and users, particularly in the EU, are going to build, browse and use the internet. The directive needed tightening due to the way advertisers use cookies to track users as they browse the web and to target adverts. There has been much to complain about in this regard, and far beyond the use of cookies, with companies such as Phorm trying to track information at the server level too. However, the directive has ended up being too vague and covering too wide a perspective to tackle the problem effectively.

Others have already questioned whether it could push users to use non-EU websites to do their business because they get put off using EU based sites. Continually being asked whether you want to have information stored in a cookie every time you visit a website is going to get pretty tiresome pretty quickly. You see, if you do not consent to the use of cookies, that information cannot be saved in a cookie, and so when revisiting the site, the site doesn't know you said no, and will ask you all over again. For those happy to save simple preferences and settings in cookies, you'll be asked once and never again. If you need an example of how bad it could get, Paul Carpenter took a satirical look at a possible implementation.
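The catch-22 can be spelled out in a few lines of code. This is my own illustrative sketch (the cookie name and values are invented), showing why a visitor who refuses cookies must be asked again on every visit:

```perl
use strict;
use warnings;

# Decide whether a visitor must be shown the consent banner, given
# the cookies their browser sent back with the request.
sub must_ask_consent {
    my (%cookies) = @_;
    # A stored 'yes' is the only state we are allowed to remember.
    return 0 if $cookies{consent} && $cookies{consent} eq 'yes';
    # A 'no' could only be remembered by setting a cookie, which is
    # exactly what the visitor refused - so the refusal is never
    # stored, and the question comes back every time.
    return 1;
}

# Visitor who accepted: banner shown once, then never again.
print must_ask_consent(consent => 'yes'), "\n";  # prints 0
# Visitor who refused (or is new): asked on every single visit.
print must_ask_consent(), "\n";                  # prints 1
```

For those happy to accept, the consent cookie persists and the banner never reappears; for everyone else there is nowhere lawful to record the "no".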

On Monday 9th May 2011, the Information Commissioner's Office (ICO) issued an advice notice to UK businesses and organisations on how to comply with the new law. However, even their own advice states the document "is a starting point for getting compliant rather than a definitive guide." They even invent cookie types that don't exist! Apparently "Flash Cookies" is a commonly used term, except in the web technology world there are just two types of cookie, Persistent Cookies and Session Cookies. They even reference the website AllAboutCookies, which makes no mention of "Flash Cookies". Still not convinced this is a complete shambolic mess?

The directives currently state that only cookies that are "strictly necessary" to the consumer are exempt from the ruling. In most cases shopping carts have been used as an example of cookie usage which would be exempt. However, it doesn't exempt all 1st party cookies (those that come from the originating domain), and especially targets 3rd party cookies (from other domains). The advice states "The exception would not apply, for example, just because you have decided that your website is more attractive if you remember users' preferences or if you decide to use a cookie to collect statistical information about the use of your website." Both of which have significant disruption potential for both websites and their visitors.

Many of the 1st party cookies I use are Session Cookies, which either store an encrypted key to keep you logged into the site, or store preferences to hide/show elements of the site. You could argue both are strictly necessary or not, depending on your view. Of the 3rd party cookies, like many people these days, I use Google Analytics to study the use of my websites. Of particular interest to me is how people find the site, and the search words that brought the visitor to the site. It could be argued that these are strictly necessary to help the site visitor find the site in the first place. Okay, it's a weak argument, but the point remains that people use these types of analysis to improve their sites and make the visitor experience more worthwhile.

Understandably, many people have questioned the implications of using Google Analytics, and on one Google forum thread, the Google approved answer seems to imply that it will only mean websites make it clearer that they use Google Analytics. However, this is at odds with the ICO advice, which says that isn't enough to comply with the law.

If the ruling had been more explicit about consent for the storing of personal data in cookies, such as a name or e-mail address, or the use of cookies to create a personal profile, such as with advertiser tracking cookies, it would have been much more reasonable and obvious what is permissible. Instead it feels like the politicians are using a wrecking ball to take out a few bricks, but then aiming at the wrong wall.

For a site like CPAN Testers Reports, it is quite likely that I will have to block anyone using the site, unless they explicitly allow me to use cookies. The current plan is to redirect people to the static site, which will have Google Analytics switched off, and has no other cookies to require consent. It also doesn't have the full dynamic driven content of the main site. In Germany, which already has much stricter requirements for data protection, several personal bloggers have chosen not to use Google Analytics at all in case they are prosecuted. I'm undecided at the moment whether I will remove GA from my websites, but will watch with interest whether other bloggers use pop-ups or remove GA from their sites.

Perhaps the most frustrating aspect of the directives and the advice is that it discusses only website compliance. It doesn't acknowledge that the websites and services may be hosted on servers outside the EU, although the organisation or domain may have been registered within the EU. It also doesn't differentiate between commercial businesses, voluntary organisations or individuals. Personal bloggers are just as at risk of prosecution as multinational, multibillion [currency of choice] businesses. The ICO is planning to issue separate guidance on how they intend to enforce these Regulations, but no timescale is given. I hope that they make it absolutely clear that commercial businesses, voluntary organisations and individuals will all be treated differently from each other.

In their eagerness to appear to be doing something, the politicians, in their ignorance, have crafted a very misguided ruling that will largely fail to prevent the tracking of information and creation of personal profiles, which was the original intent of the changes. When companies, such as Phorm, can create all this personal information on their servers, using the same technology to capture the data, but sending it back to a server, rather than saving a cookie, have these directives actually protected us? By and large this will be a resounding No. Have they put in place a mission to disrupt EU business and web usage, and deter some from using EU based websites? Definitely. How much this truly affects web usage remains to be seen, but I suspect initially there will be an increase in pop-ups appearing on websites asking to use cookies.

It will also be interesting to see how many government websites adhere to the rulings too.

File Under: government / law / security / technology / usability / web / website

Questions & Answers

Posted on 9th May 2011

I mentioned in my last post that I was working on a Survey Plugin for Labyrinth. The plugin is used within the YAPC Conference Survey system, which has now been running for several YAPC events over the last 5 years. I had promised to try and release the complete survey site last year, but with it being a Labyrinth based site setup, I didn't want to release it without releasing Labyrinth first. Now that's done I can concentrate on getting the Survey Plugin and the complete survey system on CPAN.

This year I will be running the YAPC::NA and YAPC::Europe surveys as per usual. However, this year I am delighted to say I have also been asked to handle the survey for the Pittsburgh Perl Workshop too. Hopefully if all goes to plan, this will provide the test bed for many other workshops to provide surveys.

The Conference Surveys themselves started in 2006, and have provided some very interesting feedback for organisers. While event organisers and myself never expect to get a 100% response from all attendees, the levels that we do get are absolutely phenomenal. With this kind of success, I would be very interested to see whether the same Survey system can be used by other non-Perl events. There is certainly nothing that prevents a non-Perl (or even a non-tech) event from using the system. Last year I did have a query from a non-Perl event, but the system wasn't ready for a stand-alone release, and I wasn't able to set anything up. However, this year, with a CPAN release coming soon, I am more hopeful that others might be able to use the system.

If you are an organiser for an event where you think a survey would be useful for feedback, please do get in touch. If I cannot host an instance for you, once I get a full release on CPAN, I can provide help and advice for getting your own hosting instance running.

File Under: conference / labyrinth / perl / survey / workshop / yapc

Into The Blue

Posted on 7th May 2011

I haven't been posting recently about the Perl projects I'm currently working on, so over the next few posts I hope to remedy that.

To begin with, one of the major projects I've been involved with for the past 8 years has been CPAN Testers, although you can find out more of my work there on the CPAN Testers Blog. This year I've been releasing the code that runs some of the websites, specifically those that are based on my other major project, Labyrinth. Spearheading these releases have been the CPAN Testers Wiki and CPAN Testers Blog, with further releases for the Reports, Preferences and Admin sites also planned. The releases have taken time to put together mostly because of the major dependency they all have, which is Labyrinth.

Labyrinth is the website management framework I started writing back in 2002. Since then it has grown and become a stable platform on which to build websites. With both the CPAN Testers Wiki and the CPAN Testers Blog, three key plugins for Labyrinth have also been released which hopefully others can make use of.

The Wiki plugin was intended to be written for the YAPC::Europe 2006 Wiki, but with the pressures of organising the conference and setting up the main conference site (which also used Labyrinth), I didn't get it finished in time. Once a CPAN Testers Wiki was mooted, I began finishing off the plugin and integrating it into Labyrinth. The plugin has been very stable for the last few years, and as a consequence was the first non-core plugin to be released. It's a fairly basic Wiki plugin, without too many bells and whistles, although there are a couple of Perlish shortcuts, but for the most part you don't need them. The CPAN Testers Wiki codebase release was also the first complete working site for Labyrinth, which was quite a milestone for me.

Following that success, the next release was for the CPAN Testers Blog. Again the underlying plugin, the Blog Plugin, has been stable for a few years, so was fairly quick to package and release; however, the secondary plugin, the Event Plugin, has been evolving for quite some time and took a little longer. As I use both these plugins for several other sites, it was a good opportunity to bring together any minor bug fixes and layout changes. Some of these have seen slight modifications to the core Labyrinth codebase and the core set of plugins. In addition it has prompted me to start working on the documentation. It is still a long way from being complete, but at least the current documentation might provide some guidance to other users.

One of my major goals for Labyrinth was for it to be a 'website in a box'. Essentially this means that I wanted anyone to take a pre-packaged Labyrinth base (similar to the Demo site), drop it on a hosting service and be able to run a simple installation script to instantiate the database and configuration. The installation would then also be able to load requested plugins, and amend the database and configuration files appropriately. I haven't got to that stage yet, but it is still a goal.

With this goal in mind, I have read with interest the recent postings regarding the fact that DotCloud are now able to run Perl apps. This is definitely great news, and is exactly the kind of setup I had wanted to make best use of for the 'website in a box' idea. However, with several other frameworks now racing to have the coolest instance, it isn't something I'm going to concentrate on right now for Labyrinth. Plus there is the fact that Labyrinth isn't a PSGI framework, a capability which others have eagerly added to their favourite framework. Labyrinth came from a very different mindset than other now more well known frameworks, and tries to solve some slightly different problems. With just me currently working on Labyrinth, as opposed to the teams of developers working on other frameworks, Labyrinth is never going to be the first choice for many reasons. I shall watch with interest the successes (and lessons learned from any hiccups) of the other frameworks, as it is something I would like to get working with Labyrinth. If anyone has the time, knows PSGI/Plack well enough, and would like to add those capabilities to Labyrinth, please get in touch.

The next notable plugins I'll be working on are the Survey, Music and Gallery Plugins. The first of these has its own post coming shortly. The next notable CPAN Testers site release planned is the Reports site. With it being considerably more involved, it might take a little longer to package and document, but it will likely be the most complex site release for Labyrinth, which will give anyone interested in the framework a good idea of how it can be used to drive several sites all at once.

File Under: labyrinth / opensource / perl / web / website


Some Rights Reserved Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a Creative Commons by Attribution Non-Commercial License. If you wish to use material for commercial purposes, please contact me for further assistance regarding commercial licensing.