Tag Clouds Revisited, Refactored and Improved

  Tags: Tassography, Tag Clouds
About a year ago, I added a Tag Cloud to let you navigate the tags associated with the photos on the site, but I've never been happy with the algorithm it used. I spent some time on Friday working out the problems and updating my code.

The Problems

1) Tags were distributed across ten buckets, each bucket containing the same number of tags. But even with 1800 tags, the frequency distribution isn't linear: the majority of tags have a frequency below the average frequency. This means the first five buckets are all effectively equal, yet they were drawn differently, which made some tags seem much more frequent than their frequency peers. You can see this in the following example:
[image: photo tag cloud drawn with equal-count buckets]

2) The buckets were calculated relative to all 1800 tags, not relative to each individual view of the tags. Therefore, in certain views, particularly the home page (where only the top 100 tags are shown), there's no clear difference between tags:
[image: home page tag cloud of the top 100 tags]

3) The number of buckets was hard-coded, so the method that calculated the buckets for the tags couldn't be modified without also changing the method that generated the CSS.

The Solution

I've started to add tags to blog posts, so it was time to revisit this algorithm: not only to solve these problems, but also to make it generic so that it could be applied to both Photo Tags and Blog Tags. After searching the internet, I came across a blog post that helped me solve these problems. Specifically, the solutions were:

1) Directly calculate the font size from the frequency of the tag, rather than artificially grouping tags of different frequencies into the same bucket. This way, if 90% of the tags are low frequency, they will all be drawn at the same size, using the smallest font.

With a couple of changes to my code, I was able to use this algorithm to generate a tag cloud for my blog. But there's still a problem: the frequency of the tags is not linear, so in the left image below, the vast majority of tags are small.
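To make problem 1 and the first solution concrete, here is a minimal C# sketch contrasting the two approaches. It is not the site's old or new code: the class and method names (LinearCloudSketch, EqualCountBuckets, LinearFontSize) are hypothetical, the equal-count bucketing is reconstructed from the description above, and the 10 to 40 font range is an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: equal-count bucketing versus a direct linear mapping
// from tag frequency to font size. Names and font range are assumptions.
public static class LinearCloudSketch
{
    // Equal-count bucketing, reconstructed from the description of problem 1:
    // tags are sorted by frequency and split into bucketCount groups of roughly
    // equal size, so tags with the same frequency can land in different buckets.
    public static int[] EqualCountBuckets(int[] frequencies, int bucketCount)
    {
        // OrderBy is a stable sort, so equal frequencies keep their input order
        int[] order = Enumerable.Range(0, frequencies.Length)
            .OrderBy(i => frequencies[i])
            .ToArray();
        int[] buckets = new int[frequencies.Length];
        for (int rank = 0; rank < order.Length; rank++)
        {
            // Integer arithmetic maps ranks 0..n-1 onto buckets 0..bucketCount-1
            buckets[order[rank]] = rank * bucketCount / order.Length;
        }
        return buckets;
    }

    // Direct linear mapping: the size depends only on the frequency,
    // so equal frequencies always get equal sizes.
    public static int LinearFontSize(int frequency, int minFrequency, int maxFrequency,
                                     int minFontSize, int maxFontSize)
    {
        if (maxFrequency == minFrequency)
            return minFontSize; // avoid dividing by zero when all tags are equal
        // Cast to double so this is not integer division
        double weight = (double)(frequency - minFrequency) / (maxFrequency - minFrequency);
        return (int)Math.Round(minFontSize + (maxFontSize - minFontSize) * weight);
    }
}
```

With the frequencies { 1, 1, 1, 1, 1, 1, 1, 1, 2, 50 } and five buckets, the eight tags used once are spread over buckets 0 to 3 even though they are equally frequent. The linear mapping gives all eight the same size of 10, but the tag used twice only gets 11, so nearly everything ends up at the smallest size.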
This can be solved by applying a logarithmic scale to the weight calculation, as discussed in that blog post. The resulting image on the right is significantly improved:
[images: blog tag cloud using the linear algorithm (left) and the logarithmic algorithm (right)]
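The logarithmic weighting can be sketched as a standalone function. This is a minimal sketch, not the site's code: LogCloudSketch and LogFontSize are hypothetical names and the 10 to 40 font range is assumed. The formula mirrors the weight calculation in the CalculateCloudBuckets method further down, with guards added for unused tags and for the case where every tag has the same frequency.

```csharp
using System;

// Hypothetical sketch of logarithmic weighting: frequencies are compared on a
// log scale before being mapped onto the font size range.
public static class LogCloudSketch
{
    public static int LogFontSize(int frequency, int minFrequency, int maxFrequency,
                                  int minFontSize, int maxFontSize)
    {
        // Unused tags get 0, as in the original method
        if (frequency <= 0)
            return 0;
        // minFrequency must be the smallest positive frequency: Math.Log(0) is -infinity
        if (minFrequency <= 0)
            throw new ArgumentOutOfRangeException(nameof(minFrequency));
        // All tags share one frequency: avoid 0 / 0
        if (maxFrequency == minFrequency)
            return minFontSize;
        // weight runs from 0 (least frequent) to 1 (most frequent)
        double weight = (Math.Log(frequency) - Math.Log(minFrequency))
                      / (Math.Log(maxFrequency) - Math.Log(minFrequency));
        return (int)Math.Round(minFontSize + (maxFontSize - minFontSize) * weight);
    }
}
```

With frequencies between 1 and 50 and the assumed 10 to 40 range, a tag used twice gets 15 and a tag used 7 times gets 25, compared with 11 and 14 under the linear mapping. Low and mid-frequency tags are spread across more of the available sizes.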
2) By mapping each tag directly to a font size, the calculation is much cleaner, and as a result it can be recalculated every time a view is rendered. So the home page view (where only the top 100 tags are shown) goes from a view with no clear bucketing:
[image: home page tag cloud before the change]
to a much improved view that shows the relative weight of these top 100 tags to each other, rather than to the entire list of 1800 tags:
[image: home page tag cloud after the change]

The Code

Martin Peck gives me frequent advice on coding, and in one discussion we talked about how less code is often better. The amusing thing about this refactoring is that the number of lines of code has halved compared with the old version, with all its flaws!
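using System;
using System.Linq;

public static void CalculateCloudBuckets(ITagCloudTag[] tagCloudTags)
{
    // ITagCloudTag has two properties: TagFrequency and TagCloudBucket.
    // TagFrequency has already been set; TagCloudBucket receives the font size.
    // Only tags that are actually used (frequency > 0) count towards the
    // minimum and maximum, because Math.Log(0) is negative infinity.
    var usedTags = tagCloudTags.Where(t => t.TagFrequency > 0).ToList();
    if (usedTags.Count == 0)
    {
        foreach (ITagCloudTag tag in tagCloudTags)
            tag.TagCloudBucket = 0;
        return;
    }
    // Get the minimum and maximum frequency for the tags in this view
    int minFrequency = usedTags.Min(t => t.TagFrequency);
    int maxFrequency = usedTags.Max(t => t.TagFrequency);
    double logRange = Math.Log(maxFrequency) - Math.Log(minFrequency);
    // Map each frequency onto the font size range on a logarithmic scale
    foreach (ITagCloudTag tag in tagCloudTags)
    {
        if (tag.TagFrequency > 0)
        {
            // weight runs from 0 (least frequent) to 1 (most frequent);
            // if every tag has the same frequency, use 0 to avoid 0 / 0
            double weight = logRange > 0
                ? (Math.Log(tag.TagFrequency) - Math.Log(minFrequency)) / logRange
                : 0;
            tag.TagCloudBucket = Convert.ToInt32(TagCloudGenerator.minFontSize +
                ((TagCloudGenerator.maxFontSize - TagCloudGenerator.minFontSize)
                * weight));
        }
        else
        {
            tag.TagCloudBucket = 0;
        }
    }
}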
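  If you want to see the linear method for yourself, just replace the weight calculation with the following. Note the cast to double: without it, the division is integer division and every weight comes out as either 0 or 1.
double weight = maxFrequency > minFrequency
    ? (double)(tag.TagFrequency - minFrequency) / (maxFrequency - minFrequency)
    : 0;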
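Point 2 of the solution relies on recalculating the sizes for each view rather than once for all 1800 tags. The following is a minimal sketch of that idea under stated assumptions: PerViewCloudSketch, FontSizesForView and the 10 to 40 font range are hypothetical, and it works on a plain dictionary of tag names to frequencies rather than on ITagCloudTag. It selects the top N tags first, then takes the minimum and maximum from that subset only before applying the same logarithmic weighting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: font sizes computed relative to the tags in one view
// (e.g. the top N tags on a home page), not relative to the whole tag list.
public static class PerViewCloudSketch
{
    public const int MinFontSize = 10; // assumed font range
    public const int MaxFontSize = 40;

    public static Dictionary<string, int> FontSizesForView(
        IDictionary<string, int> allTagFrequencies, int topCount)
    {
        // Select the tags this view shows: the topCount most frequent used tags
        var view = allTagFrequencies
            .Where(kv => kv.Value > 0)
            .OrderByDescending(kv => kv.Value)
            .Take(topCount)
            .ToList();
        var sizes = new Dictionary<string, int>();
        if (view.Count == 0)
            return sizes;

        // Min and max come from this view only, so the least frequent tag shown
        // gets the smallest font even if it is popular across the whole site
        double logMin = Math.Log(view.Min(kv => kv.Value));
        double logMax = Math.Log(view.Max(kv => kv.Value));

        foreach (var kv in view)
        {
            // Same log weighting as before; 0 if every shown tag is equally frequent
            double weight = logMax > logMin
                ? (Math.Log(kv.Value) - logMin) / (logMax - logMin)
                : 0.0;
            sizes[kv.Key] = (int)Math.Round(MinFontSize + (MaxFontSize - MinFontSize) * weight);
        }
        return sizes;
    }
}
```

For frequencies a = 100, b = 80, c = 60 plus two tags used once, a top-3 view gives a, b and c the sizes 40, 27 and 10. Measured against the full list (minimum frequency 1), the same three tags would land at roughly 40, 39 and 37, which mirrors the lack of contrast described in problem 2.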
  Well, that's it. With an hour of writing code (less time than it took to write this blog post), I have less code to maintain, a much improved photo tag cloud and a new blog tag cloud.

This website, all photography & other content is Copyright © Ben Vincent. Unauthorised use of images is strictly prohibited.
Last Updated: Wed, 14 Dec 2011, 16:30:58    |    Website Version v4.0.4138.41239    |    Content v7.002