A tag cloud solution in Django

Sunday, 20 August 2006, 11:20 a.m.

I'll describe here a 'non-perfect but suitable' way to build a tag cloud with Django, like the one I use on this blog.

Principle

We are going to use a many-to-many field to link a blog entry to many tags. This creates three tables: blog_entry, blog_entry_tags and blog_tag. Each tag needs fields to remember the number of blog entries it appears in, as well as its current font size in the tag cloud.

The tricky part was the algorithm that distributes n elements between b buckets, keeping a uniform distribution without putting tags that have the same number of references into different buckets. You can find discussions of font distribution algorithms around; one here.
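As a rough illustration of the bucket idea, here is a minimal sketch in plain Python, with no Django involved. It assumes equal-width thresholds between the smallest and largest reference count, which is the approach the dispatcher code later in this post takes. The function name assign_buckets and the sample counts are made up for the example.

```python
def assign_buckets(counts, nbr_of_buckets=8):
    """Map each tag name to a bucket index in [0, nbr_of_buckets - 1].

    counts is a dict of tag name -> number of entries using that tag.
    Tags with the same count always land in the same bucket, because
    the bucket depends only on the count.
    """
    if not counts:
        return {}
    low = min(counts.values())
    high = max(counts.values())
    delta = (high - low) / float(nbr_of_buckets)
    # upper bound of each bucket, smallest bucket first
    thresholds = [low + (i + 1) * delta for i in range(nbr_of_buckets)]
    buckets = {}
    for name, count in counts.items():
        # fall back to the top bucket in case float rounding leaves the
        # last threshold slightly below the largest count
        bucket = nbr_of_buckets - 1
        for i, threshold in enumerate(thresholds):
            if count <= threshold:
                bucket = i
                break
        buckets[name] = bucket
    return buckets


print(assign_buckets({"django": 17, "python": 9, "linux": 1}))
# {'django': 7, 'python': 3, 'linux': 0}
```

With 8 buckets and counts from 1 to 17, each bucket is 2 references wide, so a tag used 9 times sits in the fourth bucket (index 3).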

"Related items (the things being saved into the many-to-many relation) are not saved as part of a model's save method, as you have discovered. Instead, the Add- and ChangeManipulators save the many-to-many items later. In fact, for adding a new item, this is basically required, because you need to know the new instance's primary key value before you can save a reference to it in the m2m join table -- and that value does not necessarily exist before it is saved to the database." Malcolm Tredinnick (link)

An important point: when do we do this processing? At every blog view? In a post_save method on our blog Entry model? The problem comes from how many-to-many related objects are saved: the post_save method for an entry still runs before the related rows in blog_tag and blog_entry_tags are saved. See the quote above for more details.

Since I don't see how to work around the many-to-many saving process, and I don't want to recalculate the tag cloud values at every view, I will use a post_save dispatcher, which basically acts like a post_save model method.

Here comes the dodgy solution. On the first save of your entry (which you do by clicking save in the edit/create form for entry objects), the dispatcher calls your tag cloud generation method. But since the tags aren't saved yet, you need to save your entry twice. The second time, the tags are there, and the tag cloud generation method updates the total references and font size fields of the entry's tags. I hope you got it; my explanations are not famous for their clarity.

In order to update the tag cloud we need to save the entry object twice.
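The ordering problem can be mimicked in a few lines of plain Python. This is only a simulation of the sequence described above, not Django code: the FakeEntry class and its attributes are invented for the illustration.

```python
class FakeEntry:
    """Stand-in for an Entry; imitates the order of operations on save."""

    def __init__(self):
        self.saved_tags = []       # what the join table currently holds
        self.seen_by_handler = []  # what the post_save handler saw each time

    def save(self, form_tags):
        # 1. the entry row is saved and the post_save handler runs right away,
        #    so it only sees the tags written by a previous save
        self.seen_by_handler.append(list(self.saved_tags))
        # 2. only afterwards are the many-to-many rows written
        self.saved_tags = list(form_tags)


entry = FakeEntry()
entry.save(["django", "python"])  # first save: handler sees no tags
entry.save(["django", "python"])  # second save: handler sees the tags
print(entry.seen_by_handler)
# [[], ['django', 'python']]
```

The handler is always one save behind the join table, which is why the second save is needed.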

Models

The models.py file for my blog application.

from django.db import models

class Tag(models.Model):
    # the tag name doubles as the primary key
    name = models.CharField(maxlength=200, primary_key=True, core=True)
    total_ref = models.IntegerField(blank=True, default=0)  # number of entries using this tag
    font_size = models.IntegerField(blank=True, default=0)  # current size in the cloud

    def __str__(self):
        return self.name

    def get_absolute_url(self):
        return "/blog/tag/%s/" % (self.name)

    def __cmp__(self, other):
        # order tags by reference count, so max() and min() work on tag lists
        return cmp(self.total_ref, other.total_ref)

class Entry(models.Model):
    [...]
    tags = models.ManyToManyField(Tag)

Dispatcher

In our blog/view.py we can now connect the dispatcher to the tag cloud generation method. Tweak the final result using nbr_of_buckets and base_font_size.

[...]
from coulix_org.blog.models import Entry, Tag
from django.db.models import signals
from django.dispatch import dispatcher

def process_cloud_tag(instance):
    ''' distribution algo n tags to b bucket, where b represents
    font size. '''
    entry = instance
    # be sure to save the same entry twice, otherwise the new tags won't be updated.
    entry_tag_list = entry.tags.all()
    for tag in entry_tag_list:
        tag.total_ref = tag.entry_set.all().count()
        tag.save()

    tag_list = list(Tag.objects.all())
    if not tag_list:
        # no tags yet: max() and min() would fail on an empty list
        return
    nbr_of_buckets = 8
    base_font_size = 11
    thresholds = []
    max_tag = max(tag_list)
    min_tag = min(tag_list)
    delta = (float(max_tag.total_ref) - float(min_tag.total_ref)) / (float(nbr_of_buckets))
    # set a threshold for all buckets
    for i in range(nbr_of_buckets):
        thresh_value = float(min_tag.total_ref) + (i+1) * delta
        thresholds.append(thresh_value)
    # set font size for tags (per bucket): the first bucket whose threshold
    # is at or above the tag's reference count wins
    for tag in tag_list:
        for bucket in range(nbr_of_buckets):
            if tag.total_ref <= thresholds[bucket]:
                tag.font_size = base_font_size + bucket * 2
                tag.save()
                break

# connect signal
dispatcher.connect(process_cloud_tag,
    sender = Entry,
    signal = signals.post_save)
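To get a feel for what tweaking nbr_of_buckets and base_font_size does, the size formula from process_cloud_tag can be pulled out on its own. A minimal sketch: the helper name bucket_font_sizes and the step parameter are made up for illustration (the code above hard-codes a step of 2).

```python
def bucket_font_sizes(nbr_of_buckets=8, base_font_size=11, step=2):
    """Font size in px for each bucket, smallest bucket first."""
    return [base_font_size + bucket * step for bucket in range(nbr_of_buckets)]


print(bucket_font_sizes())       # [11, 13, 15, 17, 19, 21, 23, 25]
print(bucket_font_sizes(4, 10))  # [10, 12, 14, 16]
```

So with the values used here, the least used tags render at 11px and the most used ones at 25px.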

Templatetags

We use a template tag to return a list of all tags to our template. Add this code to the blog/templatetags/whatever.py file.

from django import template
from coulix_org.blog.models import Tag

register = template.Library()

# use for tag cloud
def show_tag_list(parser, token):
    """ {% get_tag_list %}"""
    return TagListObject()

class TagListObject(template.Node):
    def render(self, context):  
        context['blog_tags'] = Tag.objects.all()
        return ''

register.tag('get_tag_list', show_tag_list)

Template

Here comes the template, which generates the HTML to display a nice tag cloud.

{% load whatever %}
{% get_tag_list %}
{% for tag in blog_tags %}
    <span style="font-size: {{ tag.font_size }}px;">
        <a class="link-typeA" title="Number of entries: {{ tag.total_ref }}" href="/blog/tags/all/{{ tag.name }}">{{ tag.name }}</a>
    </span>
{% endfor %}

Currently it only uses font-size; I may add a color gradient later. Please comment and propose better solutions than the dodgy two-step saving method :).

Comments

  1. Aug 29 2006
    canada
    #1

    Maybe Django signals can be used to preserve the Tag Cloud in the Cache

  2. Aug 29 2006
    australia
    #2

    That's what is happening with dispatcher.connect(process_cloud_tag, ...). Isn't that what you mean?

  3. Oct 3 2006
    serbia and montenegro
    #3

    I added post_delete dispatcher to delete tags not in use:

    tag_list = Tag.objects.all()
    for tag in tag_list:
        total = tag.entry_set.all().count()
        if total == 0:
            tag.delete()
        else:
            tag.total_ref = total
            tag.save()

    tag_list = Tag.objects.all()
    nbr_of_buckets = 8
    ...

    Excellent work!

  4. Dec 29 2006
    united kingdom
    #4
    class Tag(Model):
        def save(self):
            Model.save(self)
            # I think that you could also shove all that stuff here rather than using signals
            Model.save(self)
    
  5. Jan 13 2007
    india
    #5

    Just curious: will signals work if the sender and receiver are on two different machines accessing the same database?

  6. Jan 13 2007
    australia
    #6

    No idea, a good question for #django :)

  7. Feb 11 2007
    germany
    #7

    signals are process-local

    this code will result in invalid data given a fitting race-condition

  8. Mar 21 2007
    japan
    #8

    Very nice tutorial. I've been using it for my pages.

  9. Jun 16 2007
    germany
    #9

    Sorry, I think another way!

  10. Jun 16 2007
    australia
    #10

    what other way ? bad ? :/
