LINQ fun - part one

Well, Alex James's recent musings on LINQ (parts one and two) have caught my interest - so I thought I would have a play with LINQ and use it to query some well-known web sites... and decided to start by dragging some info out of YouTube...



Basically, I like QI (also known as Quite Interesting - a British comedy show) - and there are a lot of episodes on YouTube. Querying on the tags "QI", "Quite" & "Interesting" returns about 1000 results... almost every result is one part of an episode... and most people posting are kind enough to include something in the title or description of the video which lets you know which episode and series it belongs to...



However, it's going to take some effort to actually start watching at series 1, episode 1 and work your way through the episodes in the right order by endlessly browsing YouTube's search results... unless we start "value adding" - creating an "episodic" view over YouTube's data.



At this point you could put your magic hat on and wish for features from your YouTube "episodes" site:

  • Perhaps an RSS feed notifying me of new episodes as they're listed on YouTube daily.
  • A nice way to see a series, its episodes, and each part that I need to play.
  • Maybe some aggregation of episode summaries from another web site for each episode (in this case I'm thinking maybe TVRage.Com...)

So you can quickly see that we can start making something virtually out of nothing, like MacGyver... but much lamer and without the hair.



First things first, I started having a look at YouTube's API - you can search by tags and get plenty of info back from their REST web service, but you only get 20 results per page. 20 results isn't much cop, so I built a simple class which gives access to the whole set of search results as an IEnumerable<YouTubeResult> - here's the code for that:


using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Xml.XPath;

public class YouTubeSearcher
{
    private const string TagQuery = "http://www.youtube.com/api2_rest?method=youtube.videos.list_by_tag&dev_id={0}&tag={1}&page={2}";
    private string _developerId;

    public YouTubeSearcher(string developerId)
    {
        if (string.IsNullOrEmpty(developerId)) throw new ArgumentNullException("developerId");
        _developerId = developerId;
    }

    // Lazily yields every result, requesting one page at a time until a page comes back empty.
    public IEnumerable<YouTubeResult> QueryByTags(params string[] tags)
    {
        if ((tags == null) || (tags.Length <= 0)) throw new ArgumentNullException("tags", "tags must contain one or more values");

        for (int page = 1; true; page++)
        {
            List<YouTubeResult> results = QueryByTagAndPage(JoinTags(tags), page);
            if (results.Count <= 0) yield break;
            foreach (YouTubeResult result in results) yield return result;
        }
    }

    private string JoinTags(string[] tags)
    {
        if (tags.Length == 1) return tags[0];

        StringBuilder builder = new StringBuilder(tags[0]);

        // Assumption: tags are separated by a single space (the separator is not recoverable from the listing).
        for (int i = 1; i < tags.Length; i++) builder.Append(" ").Append(tags[i]);

        return builder.ToString();
    }

    // Fetches a single page of results and converts each <video> element into a YouTubeResult.
    private List<YouTubeResult> QueryByTagAndPage(string tag, int page)
    {
        Console.WriteLine("Querying by tag: {0}, page: {1}", tag, page);

        Stopwatch watch = Stopwatch.StartNew();
        try
        {
            List<YouTubeResult> results = new List<YouTubeResult>();

            string uri = string.Format(TagQuery, _developerId, tag, page);
            XPathDocument xpd = new XPathDocument(uri);

            XPathNavigator xpn = xpd.CreateNavigator();

            // The root element carries a status attribute; "fail" means the error description is populated.
            XPathNodeIterator xniError = xpn.Select(@"/ut_response");

            xniError.MoveNext();

            if (xniError.Current.GetAttribute("status", String.Empty) == "fail")
            {
                string expression = "/ut_response/error/description";
                string errorText = xpn.SelectSingleNode(expression).InnerXml;

                throw new YouTubeException("Error occurred while querying youtube: {0}", errorText);
            }

            try
            {
                XPathNodeIterator xni =
                    xpn.Select(@"/ut_response/video_list/video");

                while (xni.MoveNext())
                {
                    XPathNavigator navigator = xni.Current;

                    string title = navigator.SelectSingleNode("title").InnerXml;
                    string url = navigator.SelectSingleNode("url").InnerXml;
                    string thumbUrl = navigator.SelectSingleNode("thumbnail_url").InnerXml;
                    string id = navigator.SelectSingleNode("id").InnerXml;
                    string description = navigator.SelectSingleNode("description").InnerXml;
                    int lengthInSeconds = int.Parse(navigator.SelectSingleNode("length_seconds").InnerXml);
                    string author = navigator.SelectSingleNode("author").InnerXml;

                    results.Add(new YouTubeResult(id, url, title, thumbUrl, lengthInSeconds, description, author));
                }
            }
            catch (XPathException xpe)
            {
                throw new YouTubeException("XPath exception occurred: {0}", xpe.Message);
            }

            return results;
        }
        finally
        {
            Console.WriteLine("Query complete in {0}ms", watch.ElapsedMilliseconds);
        }
    }
}



Following on from that we need to parse each search result and attempt to pull out its episode information:
  • Series Number
  • Episode Number
  • Part Number


At this point we might also make the assumption that parts should be grouped by the user who posted them - in case the same episode has been posted twice by two users (quite likely, people are silly).
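
As a rough illustration of that grouping idea, here is a minimal sketch. It assumes a cut-down, hypothetical PartInfo class standing in for the real parsed result (the class, its property names and the string key format are all made up for the example), and simply buckets parts by poster, series and episode with the parts inside each bucket in playing order:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a parsed search result; not the real EpisodePart class.
public class PartInfo
{
    public string Author { get; set; }
    public int Series { get; set; }
    public int Episode { get; set; }
    public int Part { get; set; }
}

public static class GroupingSketch
{
    // One group per (poster, series, episode). OrderBy is stable and GroupBy keeps
    // source order inside each group, so every group lists its parts in part order.
    public static List<IGrouping<string, PartInfo>> GroupByUpload(IEnumerable<PartInfo> parts)
    {
        return parts
            .OrderBy(p => p.Part)
            .GroupBy(p => p.Author + "|s" + p.Series + "e" + p.Episode)
            .ToList();
    }
}
```

Two users posting the same episode end up in two separate groups, so a viewer could pick whichever upload has the full set of parts.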



Parsing part information could be done using successive LINQ queries, but it's actually not that pleasant considering we're generally interrogating only two text fields - the title for the clip, and its description - horses for courses - so instead I built a quick 'n dirty "EpisodeParser" class... here's the code for that:


using System.Collections.Generic;

public class EpisodeParser
{
    private List<AbstractContributor> _contributors = new List<AbstractContributor>();

    public EpisodeParser()
        : this(
            new SeriesContributor(),
            new EpisodeContributor(),
            new PartContributor(),
            new XFormatContributor(),
            new PilotContributor(),
            new PartOfPartsContributor(),
            new WordNumberPartsContributor())
    {
    }

    public EpisodeParser(params AbstractContributor[] contributors)
    {
        if (contributors != null) _contributors.AddRange(contributors);
    }

    /*
     * Examples of the title formats we need to cope with:
     *
     * QI Series 4 Episode 12 (part 3)
     * s2e10 part 1/4
     * Qi Series 1 Ep 5 Part 1/3
     * QI 2x01
     * Take Out 1
     * QI Pilot Episode part 6
     * S2E09
     */
    public EpisodePart Parse(YouTubeResult result)
    {
        EpisodePart ep = new EpisodePart(result);

        // Each contributor gets a chance to fill in whatever it can recognise.
        foreach (AbstractContributor contributor in _contributors)
        {
            contributor.Contribute(ep);
        }

        return ep;
    }

    // "series 4", "s2"
    private class SeriesContributor : AbstractContributor
    {
        public override void Contribute(EpisodePart episodePart)
        {
            int? seriesNumber = ParseNameNumber(episodePart, "series", "s");
            AssignSeriesNumber(episodePart, seriesNumber);
        }
    }

    // "episode 12", "ep 5", "e10"
    private class EpisodeContributor : AbstractContributor
    {
        public override void Contribute(EpisodePart episodePart)
        {
            int? episodeNumber = ParseNameNumber(episodePart, "episode", "ep", "e");
            AssignEpisodeNumber(episodePart, episodeNumber);
        }
    }

    // "part 3", "p3"
    private class PartContributor : AbstractContributor
    {
        public override void Contribute(EpisodePart episodePart)
        {
            int? partNumber = ParseNameNumber(episodePart, "part", "p");
            AssignPartNumber(episodePart, partNumber);
        }
    }

    // The pilot is treated as series 1, episode 1.
    private class PilotContributor : AbstractContributor
    {
        public override void Contribute(EpisodePart episodePart)
        {
            if (episodePart.Result.Title.ToUpper().Contains("PILOT"))
            {
                AssignSeriesNumber(episodePart, 1);
                AssignEpisodeNumber(episodePart, 1);
            }
        }
    }

    // "2x01" - series and episode either side of an 'X'
    private class XFormatContributor : AbstractContributor
    {
        public override void Contribute(EpisodePart episodePart)
        {
            int seriesNumber = 0;
            int episodeNumber = 0;

            if (SplitNumber(episodePart, ref seriesNumber, ref episodeNumber, 'X'))
            {
                AssignSeriesNumber(episodePart, seriesNumber);
                AssignEpisodeNumber(episodePart, episodeNumber);
            }
        }
    }

    // "1/4" - part number and total parts either side of a '/'
    private class PartOfPartsContributor : AbstractContributor
    {
        public override void Contribute(EpisodePart episodePart)
        {
            int partOf = 0;
            int parts = 0;

            if (SplitNumber(episodePart, ref partOf, ref parts, '/'))
            {
                AssignPartNumber(episodePart, partOf);
            }
        }
    }

    // "part one", "part two", ... spelled out in words
    private class WordNumberPartsContributor : AbstractContributor
    {
        private static readonly Dictionary<string, int> _phrases;

        static WordNumberPartsContributor()
        {
            _phrases = new Dictionary<string, int>();
            _phrases.Add("part one", 1);
            _phrases.Add("part two", 2);
            _phrases.Add("part three", 3);
            _phrases.Add("part four", 4);
            _phrases.Add("part five", 5);
            _phrases.Add("part six", 6);
            _phrases.Add("part seven", 7);
            _phrases.Add("part eight", 8);
            _phrases.Add("part nine", 9);
            _phrases.Add("part ten", 10);
        }

        public override void Contribute(EpisodePart episodePart)
        {
            AssignPartNumber(episodePart, FindPhrase(episodePart, _phrases));
        }
    }
}
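
The AbstractContributor base class isn't listed here, so as a rough idea of what a helper like ParseNameNumber could be doing, here is a minimal sketch. It is an assumption rather than the real implementation: it works on a plain string instead of an EpisodePart, and the regular expression is just one way of matching a name followed by a number.

```csharp
using System.Text.RegularExpressions;

public static class NameNumberSketch
{
    // Tries each name in turn and returns the number that follows the first one found,
    // e.g. "Series 4", "s2" or "Ep 5". Returns null when nothing matches.
    public static int? ParseNameNumber(string text, params string[] names)
    {
        foreach (string name in names)
        {
            // (?<![a-z]) stops a short name like "e" matching the tail of another word,
            // while still allowing run-together forms such as "s2e10".
            string pattern = @"(?<![a-z])" + Regex.Escape(name) + @"\s*(\d+)";
            Match match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
            if (match.Success) return int.Parse(match.Groups[1].Value);
        }

        return null;
    }
}
```

Listing the longer names first ("series" before "s") matters, because the first name that matches wins.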



At this point we have the building blocks for starting to write some LINQ queries... here's my first test - running a basic "select all"...


YouTubeSearcher searcher = new YouTubeSearcher(DeveloperId);

IEnumerable<YouTubeResult> results = from result
                                     in searcher.QueryByTags("QI", "Quite", "Interesting")
                                     select result;

Console.WriteLine("total results: {0}", results.Count());



Trying a more explicit style of query, and parsing episodes:


EpisodeParser episodeParser = new EpisodeParser();

YouTubeSearcher searcher = new YouTubeSearcher(DeveloperId);

IEnumerable<EpisodePart> parts = searcher.QueryByTags("QI", "Quite", "Interesting")
    .Select(result => episodeParser.Parse(result))
    .Where(part => part.SeriesNumber == 1 && part.PartNumber == 1)
    .OrderBy(part => part.EpisodeNumber);

foreach (EpisodePart part in parts.Distinct())
{
    Console.WriteLine(part);
}



Which will let us know which episodes in series 1 exist...
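
One thing to watch with Distinct(): unless the element type overrides Equals and GetHashCode, it compares references, so two freshly parsed objects for the same episode both survive. Whether EpisodePart does that isn't shown here, so the sketch below takes the other route and passes an explicit comparer. ClipInfo and SameEpisodeComparer are hypothetical names invented for the example, not part of the code above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for EpisodePart, reduced to the fields the comparison needs.
public class ClipInfo
{
    public int SeriesNumber { get; set; }
    public int EpisodeNumber { get; set; }
    public string Author { get; set; }
}

// Treats two clips as the same episode when series and episode numbers match,
// regardless of who posted them.
public class SameEpisodeComparer : IEqualityComparer<ClipInfo>
{
    public bool Equals(ClipInfo x, ClipInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SeriesNumber == y.SeriesNumber && x.EpisodeNumber == y.EpisodeNumber;
    }

    public int GetHashCode(ClipInfo clip)
    {
        if (clip == null) return 0;
        return clip.SeriesNumber * 1000 + clip.EpisodeNumber;
    }
}

public static class DistinctSketch
{
    // Returns each episode number once, in ascending order.
    public static List<int> EpisodesSeen(IEnumerable<ClipInfo> clips)
    {
        return clips
            .Distinct(new SameEpisodeComparer())
            .OrderBy(c => c.EpisodeNumber)
            .Select(c => c.EpisodeNumber)
            .ToList();
    }
}
```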



Bit of a rush, but next time I'll start digging in a little deeper...



At this point though it's worth noting that we have some stuff for free because of IEnumerable<T>...
  • Search results are being processed as they're yielded. If we're just looking for the first matching item for a query, we can stop without having to request additional result pages once a match is made.
  • Same goes for episodes, we only parse them as they are required - no unnecessary overhead.
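
To make the first point concrete, here is a small self-contained sketch. It doesn't touch YouTube at all: FetchPage is a made-up stand-in for the real page request (three pages of two items, then an empty page), and PagesFetched is a counter added purely so the laziness can be observed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyPagingDemo
{
    // Counts how many pages were "requested".
    public static int PagesFetched;

    // Hypothetical page source: pages 1 to 3 hold two items each, later pages are empty.
    private static List<string> FetchPage(int page)
    {
        PagesFetched++;
        List<string> results = new List<string>();
        if (page <= 3)
        {
            results.Add("video " + page + "a");
            results.Add("video " + page + "b");
        }
        return results;
    }

    // Same shape as QueryByTags: keep requesting pages until one comes back empty.
    public static IEnumerable<string> AllResults()
    {
        for (int page = 1; ; page++)
        {
            List<string> results = FetchPage(page);
            if (results.Count == 0) yield break;
            foreach (string result in results) yield return result;
        }
    }
}
```

Calling LazyPagingDemo.AllResults().First(v => v.EndsWith("2a")) only requests pages 1 and 2, whereas Count() has to walk everything and requests all four pages (the three with data plus the empty one that ends the loop).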


So far nothing has required LINQ, but I think we'll start to see it being a great time saver come the next couple of parts... compared to writing the code ourselves.



We shall see!
Written on February 2, 2007