Solving the build error Python.h: No such file or directory on Ubuntu

I struggled with this problem for a couple of days and found the solution just now, so I thought I would share it
with you all; I don’t want anyone else to land in the same trouble and waste as much time as I did.

I use Hardy (Ubuntu 8.04) and was trying to build subvertpy from source because no package was available for that release.

Why this error occurs

This error occurs when the Python development headers are missing. Many Python modules include C extensions that need these headers (Python.h among them) to compile.

Solution

Solutions to such problems are usually simple, and this one is no exception: install the development headers using the package manager. Open a terminal and run the following command to list the available Python packages:

sudo apt-cache search python

Select the package that matches your Python installation (in my case it is python2.5-dev) and run the following command to install it:

sudo apt-get install python2.5-dev

This installs the development headers in /usr/include/python2.5, and you can now build your Python module successfully.
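To confirm that the headers are where the build expects them, you can ask the interpreter itself. The following is a minimal sketch, assuming a Python new enough to ship the sysconfig module (2.7 or 3.2 and later; it is not available on the 2.5 interpreter mentioned above). The function name find_python_header is only illustrative.

```python
import os
import sysconfig


def find_python_header():
    """Return the expected path of Python.h and whether it exists."""
    # 'include' is the directory this interpreter reports for its C headers.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return header, os.path.isfile(header)


if __name__ == "__main__":
    header, present = find_python_header()
    if present:
        print("Found " + header)
    else:
        print("Missing " + header + "; install the matching -dev package")
```

If the script reports the header as missing, the directory it prints also tells you which Python version the -dev package has to match.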

Why BootStrapToday sent so many notifications yesterday?

Yesterday almost all BootStrapToday users received ticket modification notifications for all tickets assigned to them (probably a lot of email notifications). First, our apologies to our users for annoying them with so many mails. We have learned our lesson and have taken corrective steps to ensure that this does not happen again.

Short Version of what happened:

Yesterday we upgraded BootStrapToday to a new version. As part of this upgrade, we added a new field, ‘Complexity’, to the tickets. For all existing ‘closed’ tickets, complexity is set to ‘Unknown’; for other tickets it is set to ‘Simple’. Since this is a change to the ticket, our ticket notification system automatically triggered notifications for all these tickets. Somehow we missed this when we upgraded our QA server for testing. By the time we realized what had happened, the system had sent notifications to almost all users.

Long Version of what happened (for the technically interested):

BootStrapToday uses the Django framework. Our upgrade process is automated using ‘Fabric’. We use the celery daemon to run background processing tasks and trigger email notifications.

The moment a ticket is saved, a background task is added to the celery queue. This celery task determines who should receive notifications.

The upgrade process roughly goes through the following steps (these are Fabric steps and hence automated).

  1. Stop all services (including celery).
  2. Update the source.
  3. Execute schema changes (e.g. django syncdb).
  4. Execute scripts to update any existing data (as required).
    For example, adding a default complexity to tickets. In our case, we did not do this while adding the ‘complexity’ column to the tickets table. We added a separate post ‘schema changes’ script. It queried all ‘closed’ tickets, set the complexity to ‘Unknown’ and saved each ticket. Each ticket save added a celery task for notifications.
  5. Restart all services (e.g. web server, celery daemon etc).
    At this point, there were a large number of celery tasks queued for ticket notification, and the moment the celery daemon started, it began executing these tasks and sending emails.

Since the upgrade is an automated process, by the time we realized what was happening and stopped the celery daemon, hundreds of email notifications had already been sent.
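The failure mode is easy to reproduce in miniature. The sketch below is not our production code: Ticket, task_queue, run_worker and data_update are hypothetical stand-ins for the Django model, the celery queue, the celery daemon and the data update script. It only shows how tasks pile up while the worker is stopped and are all executed on restart.

```python
from collections import deque

task_queue = deque()   # stand-in for the celery queue
sent_emails = []       # stand-in for the outgoing mail server


class Ticket:
    """Hypothetical ticket whose save() queues a notification task."""

    def __init__(self, ticket_id, status):
        self.ticket_id = ticket_id
        self.status = status
        self.complexity = None

    def save(self):
        # Every save queues a task, whether or not a worker is running.
        task_queue.append(("notify", self.ticket_id))


def data_update(tickets):
    """Step 4: set a default complexity on closed tickets and save them."""
    for ticket in tickets:
        if ticket.status == "closed":
            ticket.complexity = "Unknown"
            ticket.save()


def run_worker():
    """Step 5: the restarted worker drains everything that was queued."""
    while task_queue:
        _, ticket_id = task_queue.popleft()
        sent_emails.append("Ticket %d was modified" % ticket_id)
```

Running data_update() on a list of closed tickets leaves one queued task per ticket and no emails; the first call to run_worker() afterwards sends them all at once.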

What are we doing to ensure that it does not happen again?

We have added a flag to disable email notifications in our ‘data update’ step. If emails are to be completely disabled, we intercept the Django EmailMessage.send() function and stop the email from being sent. There is also an additional check that disables only some notification types instead of all. If we need to notify users of some specific data update, we will put in a separate ‘inform user’ step.

Once again our apologies to our users for annoying them with so many mails.  We have learned our lesson and we have taken the corrective steps to ensure that this does not happen again.

PS: Given below is a quick, simple piece of code for intercepting EmailMessage.send()

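from django.conf import settings
from django.core.mail import EmailMessage

def intercept_email_send(email_send_func):
    def mail_send(*args, **kwargs):
        # ENABLE_EMAIL is our own project setting, not a built-in Django one.
        # Check if email is enabled in settings. If not, don't send the mail.
        if settings.ENABLE_EMAIL:
            return email_send_func(*args, **kwargs)
        # Report zero messages sent, as send() does when nothing goes out.
        return 0
    return mail_send

# Replace the original method with the wrapped version.
EmailMessage.send = intercept_email_send(EmailMessage.send)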
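
The flag for the ‘data update’ step can be scoped so that it cannot be left switched off by accident. What follows is a rough sketch of one way to do that, not our actual implementation: Notifier is a hypothetical stand-in for whatever code sends the ticket emails, and notifications_disabled is an illustrative name.

```python
from contextlib import contextmanager


class Notifier:
    """Hypothetical stand-in for the code path that sends ticket emails."""

    enabled = True
    sent = []

    @classmethod
    def send(cls, recipient, subject):
        # Drop the message silently while notifications are disabled.
        if not cls.enabled:
            return 0
        cls.sent.append((recipient, subject))
        return 1


@contextmanager
def notifications_disabled():
    """Turn notifications off for the duration of a 'with' block."""
    previous = Notifier.enabled
    Notifier.enabled = False
    try:
        yield
    finally:
        # Restore the earlier state even if the data update raises.
        Notifier.enabled = previous
```

A data update script would wrap its ticket saves in "with notifications_disabled():", and notifications resume automatically once the block ends.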

Virus that infects Python .pyc files

Some time back I came across an article on the Symantec blog, ‘This Python Has Venom!’. It documents a virus that infects Python’s .pyc files (compiled Python files). It’s a proof-of-concept virus and probably not a serious risk. However, it’s a novel method, and considering that .pyc files are ‘platform neutral’, it can lead to ‘cross-platform viruses’.

Just something interesting.

Optimizing Django database access : Part 2

A few months back we moved to Django 1.2. Immediately we realized that our method of ‘optimizing’ database queries on foreign key fields didn’t work any more. The main reason for this ‘break’ was the Django 1.2 changes to support multiple databases, specifically the way the ‘get’ function is called to access the value of a ForeignKey field.

In Django 1.2, ForeignKey fields are accessed using rel_mgr. See the call below.

rel_obj = rel_mgr.using(db).get(**params)

However, the manager.using() call returns a queryset. Hence if the manager has implemented a custom ‘get’ function, this ‘custom get’ function is not called. Since our database access optimization is based on a ‘custom get’ function, this change broke our optimization.

Check Django Ticket 16173 for the details.

Also, our original code didn’t consider multi-db scenarios. Hence the query ‘signature’ computation has to consider the db name as well.

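from django.core import signals
from django.db import connection, models

def install_cache(*args, **kwargs):
    # Start every request with an empty per-connection cache.
    setattr(connection, 'model_cache', dict())

def clear_cache(*args, **kwargs):
    # Guard against request_finished firing without a matching request_started.
    if hasattr(connection, 'model_cache'):
        delattr(connection, 'model_cache')

signals.request_started.connect(install_cache)
signals.request_finished.connect(clear_cache)

class YourModelManager(models.Manager):
    def get(self, *args, **kwargs):
        '''
        Added single row level caching per connection. Since each request
        opens a new connection, this essentially translates to per request
        caching.
        '''
        cache = getattr(connection, 'model_cache', None)
        if args or cache is None:
            # Positional Q objects, or code running outside a request:
            # skip the cache and query the database directly.
            return super(YourModelManager, self).get(*args, **kwargs)

        # The db alias is part of the key, so the same lookup against
        # another database is cached separately. Sorting the kwargs makes
        # the key independent of keyword order.
        model_key = (self.db, self.model, repr(sorted(kwargs.items())))

        model = cache.get(model_key)
        if model is not None:
            return model

        # Cache miss, so we have to get the object from the database:
        model = super(YourModelManager, self).get(*args, **kwargs)
        if model is not None and model.pk is not None:
            cache[model_key] = model
        return model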

There are minor changes from the first version. Now this manager class also stores the ‘db’ name in the ‘key’.

Along with this change you have to make one change to the core Django code. In the file django/db/models/fields/related.py, in the __get__ function of the ‘ReverseSingleRelatedObjectDescriptor’ class, replace the line

         rel_obj = rel_mgr.using(db).get(**params)

with the line

         rel_obj = rel_mgr.db_manager(db).get(**params)

This will ensure that YourModelManager’s get function is called while querying the foreign keys.
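
Why the one-line change matters can be shown without Django. The classes below are heavily simplified stand-ins, not Django’s real Manager and QuerySet. They only mimic the relevant behaviour: using() hands back a plain queryset, while db_manager() hands back a copy of the manager itself. CachingManager is a hypothetical name for a manager with a custom get.

```python
import copy


class QuerySet:
    """Simplified stand-in for a Django queryset."""

    def __init__(self, db=None):
        self.db = db

    def using(self, alias):
        return QuerySet(db=alias)

    def get(self, **params):
        return "plain QuerySet.get on %s" % self.db


class Manager:
    """Simplified stand-in for a Django manager."""

    def __init__(self):
        self._db = None

    def get_queryset(self):
        return QuerySet(db=self._db)

    def using(self, alias):
        # Returns a queryset, so any custom manager methods are left behind.
        return self.get_queryset().using(alias)

    def db_manager(self, alias):
        # Returns a copy of this manager bound to the given database.
        clone = copy.copy(self)
        clone._db = alias
        return clone

    def get(self, **params):
        return self.get_queryset().get(**params)


class CachingManager(Manager):
    def get(self, **params):
        return "custom CachingManager.get on %s" % self._db
```

With these stand-ins, CachingManager().using("default").get(pk=1) ends up in the plain queryset get, whereas CachingManager().db_manager("default").get(pk=1) ends up in the custom one.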

Do additional features in software reduce its durability?

Software design has become very competitive because of the emergence of many new languages. Developers spend much of their time finding the most suitable tools to reduce the complexity of developing software. The market is flooded with software companies, which target clients by creating a need for software in almost every routine activity. From switching television channels with a remote control to launching a satellite, the major dependency is software, and the success of each of these mechanisms rests squarely on it.

Companies around the globe hire the most efficient and highly skilled developers, because the software they develop makes an impression on the customer. Software can be developed in low-level as well as high-level languages. Some software is designed around the needs of a particular customer, while some is generic. The latter is usually designed for the mass market and for general use, so it comprises many features and functions. The company’s aim behind this strategy is cost saving: it prefers to design a single, multipurpose product instead of several small, single-purpose ones. Some customers find this kind of software complex and time consuming to use. Others like having a single piece of software that meets all their expectations.

Things become difficult when there is a bug or an issue in the software. When multipurpose software halts, the overall work of the company stops, whereas someone who uses a variety of software for different activities need not worry about a single failure, because the rest of the work at the company continues smoothly. The idea of a single piece of software performing multiple activities sounds good but falls short in implementation. Features should be added in such a way that the majority of them are actually used together; otherwise the software starts losing its durability and becomes a liability in the long run.

Thanks,

Bootstraptoday | www.bootstraptoday.com

ken@bootstraptoday.com

Optimizing Django database access : some ideas/experiments

As we added more features to BootStrapToday, we started facing performance issues. Page access was getting progressively slower. Recently we analyzed page performance using the excellent Django Debug Toolbar and discovered that in the worst case there were more than 500 database calls in a page. Obviously that was making page display slow. After looking at the various calls, analyzing the code and making changes, we were able to bring it down to about 80 calls, dramatically improving performance. Django has an excellent general-purpose caching framework. However, it’s better to minimize the calls first and then add caching for the remaining queries. In this article, I am going to show a simple but effective idea for improving database access performance.

In Django, a common reason for increased database calls is ForeignKey fields. When you access the attribute representing a foreign key, it typically results in a database access. The usually suggested solution to this problem is the ‘select_related’ function of the Django queryset API. It can substantially reduce the number of database calls. Sometimes, however, it is not possible to use ‘select_related’ (e.g. you don’t want to change how a third-party app works, or changing the queries requires significant changes in the logic).

In our case, we have ForeignKey fields like Priority, Status etc. on Ticket models. These fields are there because later we want to add the ability to ‘customize’ them. However, the values in these tables rarely change, so this usually results in multiple queries fetching the same data. Usually these are ‘get’ calls. If we add simple caching to the ‘get’ queries for status, priority etc., we can significantly reduce the number of database calls.

In Django, a new connection is opened to handle each new request. Hence if we add a model instance cache to the ‘connection’, query results will be cached while one request is being handled. A new request will make the database query again. With this strategy we don’t need complicated logic to clear ‘stale’ items from the cache.

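from django.core import signals
from django.db import connection, models

def install_cache(*args, **kwargs):
    # Start every request with an empty per-connection cache.
    setattr(connection, 'model_cache', dict())

def clear_cache(*args, **kwargs):
    # Guard against request_finished firing without a matching request_started.
    if hasattr(connection, 'model_cache'):
        delattr(connection, 'model_cache')

signals.request_started.connect(install_cache)
signals.request_finished.connect(clear_cache)

class YourModelManager(models.Manager):
    def get(self, *args, **kwargs):
        '''
        Added single row level caching per connection. Since each request
        opens a new connection, this essentially translates to per request
        caching.
        '''
        cache = getattr(connection, 'model_cache', None)
        if args or cache is None:
            # Positional Q objects, or code running outside a request:
            # skip the cache and query the database directly.
            return super(YourModelManager, self).get(*args, **kwargs)

        # Sorting the kwargs makes the key independent of keyword order.
        model_key = (self.model, repr(sorted(kwargs.items())))

        model = cache.get(model_key)
        if model is not None:
            return model

        # Cache miss, so we have to get the object from the database:
        model = super(YourModelManager, self).get(*args, **kwargs)
        if model is not None and model.pk is not None:
            cache[model_key] = model
        return model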

As a side benefit, since the same model instance is returned for the same query in subsequent calls, the number of duplicate instances is reduced and hence the memory footprint is also reduced.

Really simple python message queue

Some time back we were looking for a Python message queue. Our requirements were pretty simple.

Problem:
Our problem was to update the recent commit data into a database table through Subversion’s post-commit hook. Since this is a post-commit hook, if it takes a long time, the user may get ‘timeout errors’. That is not acceptable. Since commits are sometimes quite large, we have to put the task in a queue and process it later. One option is to start a separate process to update the table. However, we are using Django, so starting a new process means ‘restarting Django’, which would be a significant overhead.

So we wanted

  1. Something which keeps the queue in memory or in a database. A persistent queue was not needed.
  2. Implemented in Python. Since it was a very simple requirement, we didn’t want the overhead of setting up and maintaining another application.
  3. Today there will be a single Django process extracting tasks from the queue and executing them. However, tomorrow this process may run on another server.
  4. A simple queue client (preferably without any direct database dependency).

We looked at various options.

  1. Celery/django-celery
  2. ActiveMQ + python client
  3. PyMQ
  4. Few more.

However, all these approaches seemed to be overkill for our requirements. I was thinking about this requirement for two or three days. Then suddenly a thought struck me. An email server maintains a queue of email messages, and Python has a simple SMTP server class. I can use this class to implement a message queue.

  1. Derive a class from smtpd.SMTPServer.
  2. Override ‘process_message’ method.
  3. In ‘process_message’ start a thread.
  4. Inside the thread function, read the message contents and execute the task.
  5. The message contents are simple JSON objects.
  6. The client code is simple. The client just has to send an ‘email’ to this local SMTP server, with the task parameters encoded in JSON format as the content of the email. So the client can even be a simple shell script.
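
The steps above can be sketched roughly as follows. This is an illustration, not the original 40-line file. The TASKS registry, the "task" and "params" JSON keys, the QueueServer and enqueue names, and port 8025 are all assumptions made for the example. Note also that the smtpd module was removed from the standard library in Python 3.12, so the server class is only defined where the import succeeds.

```python
import json
import smtplib
import threading
from email import message_from_bytes

try:
    import smtpd  # in the standard library up to Python 3.11; removed in 3.12
except ImportError:
    smtpd = None

# Hypothetical task registry: maps a task name to a callable.
TASKS = {}


def run_task(data):
    """Decode one queued message and execute the task it names."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    # The JSON document travels as the body of the 'email'.
    body = message_from_bytes(data).get_payload(decode=True)
    payload = json.loads(body)
    return TASKS[payload["task"]](**payload.get("params", {}))


if smtpd is not None:
    class QueueServer(smtpd.SMTPServer):
        def process_message(self, peer, mailfrom, rcpttos, data, **kwargs):
            # Return to the sender quickly; the work happens in a thread.
            threading.Thread(target=run_task, args=(data,)).start()


def enqueue(task, params, host="127.0.0.1", port=8025):
    """Client side: send the task as the body of an 'email'."""
    body = json.dumps({"task": task, "params": params})
    message = "Subject: task\r\n\r\n" + body
    with smtplib.SMTP(host, port) as client:
        client.sendmail("client@localhost", ["queue@localhost"], message)
```

On an interpreter that still ships smtpd, the server side would be started with something like QueueServer(("127.0.0.1", 8025), None) followed by asyncore.loop(), after which any process on the machine can call enqueue().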

UPDATE: It took me 1 hour to code this. It’s a single Python file of about 40 lines. As the title says, it’s a ‘really simple message queue’. There are no configurations to maintain. Since this particular Python SMTP server is configured to listen on a non-standard port and receive mails only from the local machine, ‘injecting’ nasty messages is not a serious issue.
In future, when we need a serious queue server, we will use something else.
