LINQ fun - part one

Well, the recent musings of Alex James have caught my interest around LINQ (part one and two) - and so I thought I would start having a play with LINQ and using it to query some well known web sites... and decided to go with dragging some info out of youtube...



Basically, I like QI (also known as Quite Interesting - British comedy show) - and there are a lot of episodes on youtube; querying on the tags "QI", "Quite" & "Interesting" returns about 1000 results... almost every result represents a part of an episode... and most people posting are kind enough to include something in the title or description of the video which lets you know which episode and series it belongs to...



However it's going to take some effort to actually start watching at series 1, episode 1 and work your way through the episodes in the right order by endlessly browsing youtube's search results... unless we start "value adding" - creating an "episodic" view over youtube's data.



At this point you could put your magic hat on and wish for features from your youtube "episodes" site:

  • Perhaps an RSS feed notifying me of new episodes as they're listed on youtube daily.

  • A nice way to see a series, its episodes, and each part that I need to play.
  • Maybe some aggregation of episode summaries from another web site for each episode (in this case I'm thinking maybe TVRage.Com...)

So you can quickly see that we can start making something virtually out of nothing, like MacGyver... but much lamer and without the hair.



First things first, I started having a look at YouTube's API - you can search by tags and get plenty of info back from their REST web service, but you can only get 20 results per page. 20 results isn't much cop, so I built a simple class which gives access to the whole set of search results as an IEnumerable<YouTubeResult> - here's the code for that:


public class YouTubeSearcher
{
private const string TagQuery = "http://www.youtube.com/api2_rest?method=youtube.videos.list_by_tag&dev_id={0}&tag={1}&page={2}";
private string _developerId;

public YouTubeSearcher(string developerId)
{
if (string.IsNullOrEmpty(developerId)) throw new ArgumentNullException("developerId");
_developerId = developerId;
}

public IEnumerable<YouTubeResult> QueryByTags(params string[] tags)
{
// the tail of this message was lost in the original listing, so it is left elided here
if ((tags == null) || (tags.Length <= 0)) throw new ArgumentNullException("tags", "tags must contain one or more ...");

for (int page=1; true; page++)
{
List<YouTubeResult> results = QueryByTagAndPage(JoinTags(tags), page);
if (results.Count <= 0) yield break; // an empty page means we've run out of results
foreach (YouTubeResult result in results) yield return result;
}
}

private string JoinTags(string[] tags)
{
if (tags.Length == 1) return tags[0];

StringBuilder builder = new StringBuilder(tags[0]);

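// the loop body was cut off in the original listing; the space separator is an assumption
for (int i=1; i < tags.Length; i++) builder.Append(' ').Append(tags[i]);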

return builder.ToString();
}

private List<YouTubeResult> QueryByTagAndPage(string tag, int page)
{
Console.WriteLine("Querying by tag: {0}, page: {1}", tag, page);

Stopwatch watch = Stopwatch.StartNew();
try
{
List<YouTubeResult> results = new List<YouTubeResult>();

string uri = string.Format(TagQuery, _developerId, tag, page);
XPathDocument xpd = new XPathDocument(uri);

XPathNavigator xpn = xpd.CreateNavigator();

XPathNodeIterator xniError = xpn.Select(@"/ut_response");

xniError.MoveNext();

if (xniError.Current.GetAttribute("status", String.Empty) == "fail")
{
string expression = "/ut_response/error/description";
string errorText = xpn.SelectSingleNode(expression).InnerXml;

throw new YouTubeException("Error occurred while querying youtube: {0}", errorText);
}

try
{
XPathNodeIterator xni =
xpn.Select(@"/ut_response/video_list/video");

while (xni.MoveNext())
{
XPathNavigator navigator = xni.Current;

string title = navigator.SelectSingleNode("title").InnerXml;
string url = navigator.SelectSingleNode("url").InnerXml;
string thumbUrl = navigator.SelectSingleNode("thumbnail_url").InnerXml;
string id = navigator.SelectSingleNode("id").InnerXml;
string description = navigator.SelectSingleNode("description").InnerXml;
int lengthInSeconds = int.Parse(navigator.SelectSingleNode("length_seconds").InnerXml);
string author = navigator.SelectSingleNode("author").InnerXml;

results.Add(new YouTubeResult(id, url, title, thumbUrl, lengthInSeconds, description, author));
}
}
catch (XPathException xpe)
{
throw new YouTubeException("XPath exception occurred: {0}", xpe.Message);
}

return results;
}
finally
{
Console.WriteLine("Query complete in {0}ms", watch.ElapsedMilliseconds);
}
}
}



Following on from that we need to parse each search result and attempt to pull out its episode information:
  • Series Number

  • Episode Number
  • Part Number


At this point we might also make the assumption that parts should be grouped by the user who posted them - in case the same episode has been posted twice by two users (quite likely, people are silly).
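The grouping assumption above could be sketched roughly like this. PartInfo is a hypothetical stand-in for a parsed result (it is not the EpisodePart class from this post), and the string key format is just something I made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a parsed search result; only the fields needed here.
public class PartInfo
{
    public string Author;
    public int Series;
    public int Episode;
    public int Part;

    public PartInfo(string author, int series, int episode, int part)
    {
        Author = author;
        Series = series;
        Episode = episode;
        Part = part;
    }
}

public static class PartGrouping
{
    // Groups parts by poster + series + episode, so the same episode uploaded by
    // two different users ends up in two separate groups, each with its own
    // ordered list of part numbers.
    public static Dictionary<string, List<int>> GroupByPoster(IEnumerable<PartInfo> parts)
    {
        return parts
            .GroupBy(p => p.Author + "/s" + p.Series + "e" + p.Episode)
            .ToDictionary(
                g => g.Key,
                g => g.Select(p => p.Part).OrderBy(n => n).ToList());
    }
}
```

Keying on the poster as well as the series and episode means a half-uploaded copy from one user never gets mixed in with a complete copy from another.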



Parsing part information could be done using successive LINQ queries, but it's actually not that pleasant considering we're generally interrogating only two text fields - the title for the clip, and its description - horses for courses - so instead I built a quick 'n dirty "EpisodeParser" class... here's the code for that:


public class EpisodeParser
{
private List<AbstractContributor> _contributors = new List<AbstractContributor>();

public EpisodeParser()
: this(
new SeriesContributor(),
new EpisodeContributor(),
new PartContributor(),
new XFormatContributor(),
new PilotContributor(),
new PartOfPartsContributor(),
new WordNumberPartsContributor())
{
}

public EpisodeParser(params AbstractContributor[] contributors)
{
if (contributors != null) _contributors.AddRange(contributors);
}

public EpisodePart Parse(YouTubeResult result)
{
EpisodePart ep = new EpisodePart(result);

foreach (AbstractContributor contributor in _contributors)
{
contributor.Contribute(ep);
}

return ep;

/*
* QI Series 4 Episode 12 (part 3)
* s2e10 part 1/4
* Qi Series 1 Ep 5 Part 1/3
* QI 2x01
* Take Out 1
* QI Pilot Episode part 6
* S2E09
*/
}

private class SeriesContributor : AbstractContributor
{
public override void Contribute(EpisodePart episodePart)
{
int? seriesNumber = ParseNameNumber(episodePart, "series", "s");
AssignSeriesNumber(episodePart, seriesNumber);
}
}

private class EpisodeContributor : AbstractContributor
{
public override void Contribute(EpisodePart episodePart)
{
int? episodeNumber = ParseNameNumber(episodePart, "episode", "ep", "e");
AssignEpisodeNumber(episodePart, episodeNumber);
}
}

private class PartContributor : AbstractContributor
{
public override void Contribute(EpisodePart episodePart)
{
int? partNumber = ParseNameNumber(episodePart, "part", "p");
AssignPartNumber(episodePart, partNumber);
}
}

private class PilotContributor : AbstractContributor
{
public override void Contribute(EpisodePart episodePart)
{
if (episodePart.Result.Title.ToUpper().Contains("PILOT"))
{
AssignSeriesNumber(episodePart, 1);
AssignEpisodeNumber(episodePart, 1);
}
}
}

private class XFormatContributor : AbstractContributor
{
public override void Contribute(EpisodePart episodePart)
{
int seriesNumber = 0;
int episodeNumber = 0;

if (SplitNumber(episodePart, ref seriesNumber, ref episodeNumber, 'X'))
{
AssignSeriesNumber(episodePart, seriesNumber);
AssignEpisodeNumber(episodePart, episodeNumber);
}
}
}

private class PartOfPartsContributor : AbstractContributor
{
public override void Contribute(EpisodePart episodePart)
{
int partOf = 0;
int parts = 0;

if (SplitNumber(episodePart, ref partOf, ref parts, '/'))
{
AssignPartNumber(episodePart, partOf);
}
}
}

private class WordNumberPartsContributor : AbstractContributor
{
private static readonly Dictionary<string, int> _phrases;

static WordNumberPartsContributor()
{
// map each spelled-out phrase to its own part number
_phrases = new Dictionary<string, int>();
_phrases.Add("part one", 1);
_phrases.Add("part two", 2);
_phrases.Add("part three", 3);
_phrases.Add("part four", 4);
_phrases.Add("part five", 5);
_phrases.Add("part six", 6);
_phrases.Add("part seven", 7);
_phrases.Add("part eight", 8);
_phrases.Add("part nine", 9);
_phrases.Add("part ten", 10);
}

public override void Contribute(EpisodePart episodePart)
{
AssignPartNumber(episodePart, FindPhrase(episodePart, _phrases));
}
}
}
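The AbstractContributor base class isn't listed here, so as a rough idea of what a helper like ParseNameNumber might be doing, here's a minimal sketch. It's an assumption rather than the real implementation: it takes the text directly instead of an EpisodePart, and the regular expression is just my guess at something that copes with the sample titles in the comment above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameNumberSketch
{
    // Tries each name in order (longest/most specific first) and returns the
    // number that follows the first one found, or null if none match.
    // e.g. names "series", "s" will match "Series 4" or "s2e10".
    public static int? ParseNameNumber(string text, params string[] names)
    {
        foreach (string name in names)
        {
            // (?<![A-Za-z]) stops "s" matching the tail end of another word,
            // \s* allows "Series 4" as well as "s2", (\d+) captures the number.
            string pattern = @"(?<![A-Za-z])" + Regex.Escape(name) + @"\s*(\d+)";
            Match match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
            if (match.Success) return int.Parse(match.Groups[1].Value);
        }
        return null;
    }
}
```

Titles in the "2x01" format deliberately fall through to null here, which is why the parser has a separate contributor for that style.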



At this point we have the building blocks for starting to write some LINQ queries... here's my first test - running a basic "select all"...


YouTubeSearcher searcher = new YouTubeSearcher(DeveloperId);

IEnumerable<YouTubeResult> results = from result
in searcher.QueryByTags("QI", "Quite", "Interesting")
select result;

Console.WriteLine("total results: {0}", results.Count());



Trying a more explicit style of query, and parsing episodes:


EpisodeParser episodeParser = new EpisodeParser();

YouTubeSearcher searcher = new YouTubeSearcher(DeveloperId);

IEnumerable<EpisodePart> parts = searcher.QueryByTags("QI", "Quite", "Interesting")
.Select(result => episodeParser.Parse(result))
.Where(part => part.SeriesNumber == 1 && part.PartNumber == 1)
.OrderBy(part => part.EpisodeNumber);

foreach (EpisodePart part in parts.Distinct())
{
Console.WriteLine(part);
}



Which will let us know which episodes in series 1 exist...



Bit of a rush, but next time I'll start digging in a little deeper...



At this point though it's worth noting that we have some stuff for free because of IEnumerable<T>...
  • Search results are being processed as they're yielded, so if we're just looking for the first matching item for a query we can stop without having to request additional result pages once a match is made.

  • Same goes for episodes, we only parse them as they are required - no unnecessary overhead.
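To make the "for free" part a bit more concrete, here's a small self-contained sketch of the same paging pattern. The FetchPage method is a made-up stand-in for the web service call (3 pages of 20 numbers, then an empty page), and the PagesFetched counter is only there so you can see how many requests would have been made:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyPaging
{
    // Counts how many "requests" have been made to the fake service.
    public static int PagesFetched;

    // Fake paged service: pages 1-3 hold 20 numbers each, later pages are empty.
    private static List<int> FetchPage(int page)
    {
        PagesFetched++;
        if (page > 3) return new List<int>();
        return Enumerable.Range((page - 1) * 20, 20).ToList();
    }

    // Same shape as QueryByTags: keep pulling pages until one comes back empty,
    // yielding each item as we go so callers can stop early.
    public static IEnumerable<int> AllResults()
    {
        for (int page = 1; ; page++)
        {
            List<int> results = FetchPage(page);
            if (results.Count == 0) yield break;
            foreach (int result in results) yield return result;
        }
    }
}
```

Calling AllResults().First(n => n == 25) only touches the first two pages, whereas AllResults().Count() has to walk all three pages plus the empty one that ends the sequence.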


So far nothing has required LINQ, but I think we'll start to see it being a great time saver come the next couple of parts... compared to writing the code ourselves.



We shall see!

PHP...

Background



As a bit of background, for the last couple of months I've been doing some work for a personal client aside from the work for Seismic Technologies, which is on the back burner till we pick up some more investment interest (I'm still the lead dev though) - the project is an add-in for an existing product (COM interop) which must be deeply integrated, as well as being capable of being used in stand-alone mode...



It's a very advantageous project considering the time frame, but that's part of the fun :) Once the client's moved forward on some marketing I'll post a little more about some of the challenges I've faced along the way.



At any rate - the project's stalled briefly while the client's doing a little business analysis to get the underlying methodology sorted - so they've asked me to switch across to building the license generation / customer portal / license purchasing module for their preexisting CMS (CMS Made Simple - PHP) ... where are the Ruby or MonoRail CMSs to wean my clients onto?

PHP... ack

... So I haven't used PHP in anger for years and years, but the one advantage of dynamic languages is you can generally hit the ground running a lot quicker than with their statically compiled competitors...  maybe PHP even more so because it's focused on web development.



So far the two things that have bugged/puzzled me are:
  • Classes don't call their base class's default constructor implicitly - you have to do that yourself.  This isn't all bad, at least you can control when the default constructor is called.

  • Methods are instance, static and pseudo-instance all in one...


I think the second one bugs me more because you end up with 2+ potential code paths that should be accounted for in testing, if you're exposing an "api" for consumption - or more importantly you should throw an exception for the usages you don't wish to allow (I'm probably missing the "quick and dirty" point here of course ;o) - it's hard to fight years of instance methods != static methods...



Maybe I'm just old fashioned and there's nothing wrong with this, I should have a flick through Programming Language Pragmatics again, there must be some other dynamic languages with similar behavior?



At any rate, the example:


class A
{
function foo()
{
if (isset($this)) {
echo '$this is defined (';
echo get_class($this);
echo ")\n";
} else {
echo "\$this is not defined.\n";
}
}
}

class B
{
function bar()
{
A::foo();
}
}

$a = new A();
$a->foo();
A::foo();
$b = new B();
$b->bar();
B::bar();



And the output of that little example is shown below, notice how A::foo() knew it was being called from class B...  I wonder what Phalanger is doing under the hood to achieve the same thing in the CLR...


$this is defined (a)
$this is not defined.
$this is defined (b)
$this is not defined.


The slacker got tagged

Well I've been really slack of late re: the blog and the coding community in general - but at any rate, what brought me back into focus was actually getting email regarding Splicer - yes, there are people using it... who knew?

At any rate, I also noticed that reading back through other people's blogs - it appears I was actually tagged by Alex James - now I had a personal blog for a while before a technical one, and it's that kind of bollix that drove me away from it as a pastime :) but then I'm just a sour sod, soooo.... I've decided to respond, if for no other reason but to confirm that I do in fact read Alex James' blog - but I won't bother inflicting the pain on anyone else (plus most of the NZ bloggers have already been swatted a few weeks ago)

Without further delay, 5 things you probably don't know about me:

1) I got engaged last year on the rocks at Pink Beach, Omaha.

2) The first programming language I learned was BASIC for the VIC-20, followed shortly by GW-BASIC.  And then (Turbo) C++ when I was 11 or 12 (I forget exactly).

3) I bought my first car at the end of last year, a BMW 318ti.

4) I grew up on a farm and was home schooled for a couple of years, got Dux at the dubious Rodney College in Wellsford, and then did some tertiary studies (ie. made new friends) at Unitec.

5) My cat is named Shodan, and my previous cat was run over and consequently body-snatched by gypsies or possibly Italians (according to the neighbours at the time...)

Wasn't that fun, tune in next time when I actually post something of value :)


Apologies...

Apologies for the extended blog down-time... moved house last weekend (and made a few major changes to my network, including an entirely new domain etc.) and only just got a chance to set up an Ubuntu/Apache/mod_proxy box for forwarding requests onto the server where my blog lives.

Drop me a comment if anything seems a little wonky (ie. missing images etc.) as I did the config in a hurry  :)


Odd designer behaviour...

The evil designer attribute ;o)

I'm working on a project at the moment where the client wants an
"Add-in" for an existing piece of software... it's a COM interop
project, .Net 2.0 (and yes, it uses Castle, IoC sits in nicely
with the IServiceProvider :) and it hosts a custom designer, and
the add-ins must exist in their own directory, outside of the
host application's base directory.

At any rate, it's been working just fine so far, but today I was
trying to introduce some custom designers... and they weren't
being constructed... which highlighted some odd behaviour in the
design time support - I have 6 or so assemblies, most
are support assemblies, some containing controls and their
associated designers, something like this:

public class MyButtonDesigner : ControlDesigner { .. }

[Designer(typeof(MyButtonDesigner))]
public class MyCustomButton : ButtonBase { .. }

And then there is the core assembly which hosts the design
surface etc. and contains a number of COM-visible classes
- one of these classes in the core assembly is
constructed via COM interop by its name, and everything starts
its life from there in their Add-in, much like a Main class in a
WinForms project.

So the core assembly references the support assemblies, and has
no problem creating controls at runtime within these
assemblies...

But then...

However, the design time support fails to construct or make use
of the custom designers... and doesn't blow up, it just
fails silently - personally I hate this kind of behaviour...
I like things to fail fast, in your face :)

It appears the custom designer is never constructed because
the design time infrastructure fails to locate the
assembly it's held within, even though my code has
just created an instance of a control from the same said
assembly not an instant before... arghh!

To my mind this is madness... I'm not passing a string reference
to the type, and the assembly is already loaded, it should be as
simple as using the Type in the attribute to create an instance
of the designer - looking at the internal implementation of
this attribute, I can see where it all goes terribly wrong:

public DesignerAttribute(Type designerType)
{
      this.designerTypeName = designerType.AssemblyQualifiedName;
      this.designerBaseTypeName = typeof(IDesigner).FullName;
}

It's still a puzzle as to why this isn't being resolved... the
assembly is already loaded against the current AppDomain - I
guess I could mess with the fusion logger to figure it out, there
must be some subtle difference - but I didn't have the time to
waste, so I just cheated, implementing an
AppDomain.AssemblyResolve event handler to take
care of returning the assembly that's already loaded when
matching against a set of pre-loaded assemblies.
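For the curious, a workaround along those lines might look something like the sketch below. It isn't the exact handler from the project: the class and method names are made up, and it simply searches everything already loaded in the current AppDomain rather than a specific set of pre-loaded assemblies.

```csharp
using System;
using System.Reflection;

public static class LoadedAssemblyResolver
{
    // Hook the handler up once, early in the add-in's life.
    public static void Register()
    {
        AppDomain.CurrentDomain.AssemblyResolve += ResolveFromLoaded;
    }

    // Called when the runtime fails to resolve an assembly by name; hands back
    // the already-loaded assembly with the same full name if there is one.
    public static Assembly ResolveFromLoaded(object sender, ResolveEventArgs args)
    {
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (assembly.FullName == args.Name) return assembly;
        }

        // Returning null lets any other handlers (or the normal failure) take over.
        return null;
    }
}
```

Matching on the full name keeps the handler from accidentally handing back a different version of the same assembly.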

I think the silent failure is what annoys me the most, this is
something which could be easily introduced and might not be
immediately detected when performing a minor update... I'm still
not sure how I could write a test to detect it...

Speaking of custom designers... I'm unimpressed by how most
custom designers are marked internal... especially for classes
that are designed to be subclassed... why the hell is the
ButtonBaseDesigner marked internal?
