glances – a new way to look at contention

glances is like top/htop, but a little different: it sorts output by contention (smartly and automatically) unless you change that, so if you just want to check what the biggest bottleneck in your system is, glances is a quick way to find out. Here is a quick description from the dnf info command:

 

Name : glances
Version : 2.11.1
Release : 2.fc28
Arch : noarch
Size : 3.2 M
Source : glances-2.11.1-2.fc28.src.rpm
Repo : @System
From repo : fedora
Summary : CLI curses based monitoring tool
URL : https://github.com/nicolargo/glances
License : GPLv3
Description : Glances is a CLI curses based monitoring tool for both GNU/Linux and BSD.
:
: Glances uses the PsUtil library to get information from your system.
:
: It is developed in Python.

Intro to Python Image Processing in Computational Photography

This post was written by Radu Balaban, SQL Developer for Toptal.

 

Computational photography is about enhancing the photographic process with computation. While we normally tend to think that this applies only to post-processing the end result (similar to photo editing), the possibilities are much richer since computation can be enabled at every step of the photographic process—starting with the scene illumination, continuing with the lens, and eventually even at the display of the captured image.

This is important because it allows for doing much more and in different ways than what can be achieved with a normal camera. It is also important because the most prevalent type of camera nowadays—the mobile camera—is not particularly powerful compared to its larger sibling (the DSLR), yet it manages to do a good job by harnessing the computing power it has available on the device.

We’ll take a look at two examples where computation can enhance photography—more precisely, we’ll see how simply taking more shots and using a bit of Python to combine them can create nice results in two situations where mobile camera hardware doesn’t really shine—low light and high dynamic range.

Low-light Photography

Let’s say we want to take a low-light photograph of a scene, but the camera has a small aperture (lens) and limited exposure time. This is a typical situation for mobile phone cameras which, given a low-light scene, could produce an image like this (taken with an iPhone 6 camera):

Image of a couple toys in a low-light environment

If we try to improve the contrast the result is the following, which is also quite bad:

The same image as above, much brighter but with a distracting visual noise

What happens? Where does all this noise come from?

The answer is that the noise comes from the sensor—the device that tries to determine when the light strikes it and how intense that light is. In low light, however, it has to increase its sensitivity by a great deal to register anything, and that high sensitivity means it also starts detecting false positives—photons that simply aren’t there. (As a side note, this problem does not affect only devices, but also us humans: Next time you’re in a dark room, take a moment to notice the noise present in your visual field.)

Some amount of noise will always be present in an imaging device; however, if the signal (useful information) has high intensity, the noise will be negligible (high signal to noise ratio). When the signal is low—such as in low light—the noise will stand out (low signal to noise).
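To make the signal-to-noise idea concrete, here is a small sketch (the numbers and the snr_db helper are illustrative, not from the article) that measures SNR in decibels for a bright and a dim signal corrupted by the same sensor noise:

```python
import numpy as np

rng = np.random.default_rng(2)

def snr_db(signal, noise_sigma):
    # SNR in decibels: ratio of signal power to noise power
    noise = rng.normal(0, noise_sigma, signal.shape)
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

bright = np.full(10000, 200.0)  # strong signal (well-lit scene)
dim = np.full(10000, 5.0)       # weak signal (low light)

print(snr_db(bright, 10))  # high SNR: noise is negligible
print(snr_db(dim, 10))     # negative SNR: noise dominates
```

With the same noise level, the bright signal sits around +26 dB while the dim one falls below 0 dB, which is exactly why low-light shots look so grainy.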

Still, we can overcome the noise problem, even with all the camera limitations, in order to get better shots than the one above.

To do that, we need to take into account what happens over time: The signal will remain the same (same scene, and we assume it’s static) while the noise will be completely random. This means that, if we take many shots of the scene, they will have different versions of the noise, but the same useful information.

So, if we average many images taken over time, the noise will cancel out while the signal will be unaffected.

The following illustration shows a simplified example: We have a signal (triangle) affected by noise, and we try to recover the signal by averaging multiple instances of the same signal affected by different noise.

A four-panel demonstration of the triangle, a scattered image representing the triangle with added noise, a sort of jagged triangle representing the average of 50 instances, and the average of 1000 instances, which looks nearly identical to the original triangle.

We see that, although the noise is strong enough to completely distort the signal in any single instance, averaging progressively reduces the noise and we recover the original signal.
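The illustration above is easy to reproduce in a few lines of NumPy; here is a minimal sketch, with a synthetic triangle signal standing in for the image:

```python
import numpy as np

rng = np.random.default_rng(0)

# The "signal": a triangle, as in the illustration
signal = np.concatenate([np.linspace(0, 1, 50), np.linspace(1, 0, 50)])

def noisy_copy():
    # Each shot is the same signal plus fresh, independent noise
    return signal + rng.normal(0, 0.5, signal.shape)

single = noisy_copy()
averaged = np.mean([noisy_copy() for _ in range(1000)], axis=0)

# The noise cancels out under averaging, while the signal is unaffected
print(np.abs(single - signal).mean())    # large error for one shot
print(np.abs(averaged - signal).mean())  # far smaller after 1000 shots
```

Averaging N independent shots reduces the noise amplitude by roughly a factor of sqrt(N), which is why the 1000-instance panel looks nearly identical to the original.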

Let’s see how this principle applies to images: First, we need to take multiple shots of the subject with the maximum exposure that the camera allows. For best results, use an app that allows manual shooting. It is important that the shots are taken from the same location, so an (improvised) tripod will help.

More shots will generally mean better quality, but the exact number depends on the situation: how much light there is, how sensitive the camera is, etc. A good range could be anywhere between 10 and 100.

Once we have these images (in raw format if possible), we can read and process them in Python.

For those not familiar with image processing in Python, we should mention that an image is represented as a 2D array of byte values (0-255)—that is, for a monochrome or grayscale image. A color image can be thought of as a set of three such images, one for each color channel (R, G, B), or effectively a 3D array indexed by vertical position, horizontal position and color channel (0, 1, 2).
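As a tiny sketch of that representation (one thing worth noting, since we will read files with OpenCV below, is that OpenCV stores channels in BGR order rather than RGB):

```python
import numpy as np

# A 2x3 color image as a 3D array: rows x columns x channels
img = np.zeros((2, 3, 3), dtype=np.uint8)

# Set the top-left pixel (OpenCV convention: channels are B, G, R)
img[0, 0] = [255, 128, 0]

print(img.shape)     # (2, 3, 3)
print(img[0, 0, 0])  # 255, the first channel of the top-left pixel
```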

We will make use of two libraries: NumPy (http://www.numpy.org/) and OpenCV (https://opencv.org/). The first allows us to perform computations on arrays very effectively (with surprisingly short code), while OpenCV handles reading/writing of the image files in this case, but is a lot more capable, providing many advanced graphics procedures—some of which we will use later in the article.


import os
import numpy as np
import cv2

folder = 'source_folder'

# We get all the image files from the source folder
files = list([os.path.join(folder, f) for f in os.listdir(folder)])

# We compute the average by adding up the images
# Start from an explicitly floating-point array, in order to force the
# conversion of the 8-bit values from the images, which would otherwise overflow
average = cv2.imread(files[0]).astype(np.float64)
for file in files[1:]:
    image = cv2.imread(file)
    # NumPy adds two images element-wise, so pixel by pixel / channel by channel
    average += image

# Divide by count (again, each pixel/channel is divided)
average /= len(files)

# Normalize the image, to spread the pixel intensities across 0..255
# This will brighten the image without losing information
output = cv2.normalize(average, None, 0, 255, cv2.NORM_MINMAX)

# Save the output
cv2.imwrite('output.png', output)

The result (with auto-contrast applied) shows that the noise is gone, a very large improvement from the original image.

The original photograph of toys, this time both brighter and much more clear, with very little discernable noise

However, we still notice some strange artifacts, such as the greenish frame and the grid-like pattern. This time, it’s not random noise, but fixed-pattern noise. What happened?

A close-up of the top left corner of the above image

A close-up of the top left corner, showing the green frame and grid pattern

Again, we can blame it on the sensor. In this case, we see that different parts of the sensor react differently to light, resulting in a visible pattern. Some elements of these patterns are regular and are most probably related to the sensor substrate (metal/silicon) and how it reflects/absorbs incoming photons. Other elements, such as the white pixel, are simply defective sensor pixels, which can be overly sensitive or overly insensitive to light.

Fortunately, there is a way to get rid of this type of noise too. It is called dark frame subtraction.

To do that, we need an image of the pattern noise itself, and this can be obtained if we photograph darkness. Yes, that’s right—just cover the camera hole and take a lot of pictures (say 100) with maximum exposure time and ISO value, and process them as described above.

When averaging over many black frames (which are not in fact black, due to the random noise) we will end up with the fixed pattern noise. We can assume this fixed noise will stay constant, so this step is only needed once: The resulting image can be reused for all future low-light shots.
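The calibration step can be sketched as follows, using synthetic arrays in place of real photos (the fixed pattern and noise levels here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def average_frames(frames):
    # Accumulate in floating point to avoid 8-bit overflow, as before
    acc = frames[0].astype(np.float64)
    for frame in frames[1:]:
        acc += frame
    return acc / len(frames)

# Simulate 100 "black" frames: a fixed pattern plus fresh random noise per shot
pattern = rng.integers(0, 20, (8, 8)).astype(np.float64)
frames = [pattern + rng.normal(0, 5, (8, 8)) for _ in range(100)]

average_noise = average_frames(frames)

# The random component averages away, leaving (approximately) the fixed pattern
print(np.abs(average_noise - pattern).max())
```

In practice you would average the real dark shots once and save the result (for example with np.save) so the calibration can be reused for all future low-light photos.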

Here is how the top left part of the pattern noise (contrast adjusted) looks for an iPhone 6:

The pattern noise for the portion of the frame displayed in the previous image

Again, we notice the grid-like texture, and even what appears to be a stuck white pixel.

Once we have the value of this dark frame noise (in the average_noise variable), we can simply subtract it from our shot so far, before normalizing:


average -= average_noise

output = cv2.normalize(average, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite('output.png', output)

Here is our final photo:

One more image of the photo, this time with absolutely no evidence of having been taken in low light

High Dynamic Range

Another limitation that a small (mobile) camera has is its small dynamic range, meaning the range of light intensities at which it can capture details is rather small.

In other words, the camera is able to capture only a narrow band of the light intensities from a scene; the intensities below that band appear as pure black, while the intensities above it appear as pure white, and any details are lost from those regions.

However, there is a trick that the camera (or photographer) can use—and that is adjusting the exposure time (the time the sensor is exposed to the light) in order to control the total amount of light that gets to the sensor, effectively shifting the range up or down in order to capture the most appropriate range for a given scene.
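A toy sensor model shows this shifting; the irradiance values below are invented purely for illustration:

```python
import numpy as np

def capture(scene, exposure):
    # Toy model: response scales with exposure time, then clips to the 0..255 band
    return np.clip(scene * exposure * 255, 0, 255).astype(np.uint8)

# Scene irradiance from deep shadow up to a bright lamp filament (arbitrary units)
scene = np.array([0.001, 0.05, 1.0, 40.0])

print(capture(scene, 1 / 4))     # long exposure: the brightest values blow out to 255
print(capture(scene, 1 / 1000))  # short exposure: the shadows crush to 0
```

Shifting the exposure moves the captured band up or down the intensity range, but it never widens the band itself.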

But this is a compromise. Many details fail to make it into the final photo. In the three images below, we see the same scene captured with different exposure times: a very short exposure (1/1000 sec), a medium exposure (1/50 sec) and a long exposure (1/4 sec).

Three versions of the same image of flowers, one so dark that most of the photo is black, one normal-looking, albeit with slightly unfortunate lighting, and a third with the light cranked up so high that it's hard to see the flowers in the foreground

As you can see, none of the three images is able to capture all the available details: The filament of the lamp is visible only in the first shot, and some of the flower details are visible either in the middle or the last shot, but not both.

The good news is that there’s something we can do about it, and again it involves building on multiple shots with a bit of Python code.

The approach we’ll take is based on the work of Paul Debevec et al., who describe the method in their paper. The method works like this:

First, it requires multiple shots of the same scene (stationary) but with different exposure times. Again, as in the previous case, we need a tripod or support to make sure the camera does not move at all. We also need a manual shooting app (if using a phone) so that we can control the exposure time and prevent camera auto-adjustments. The number of shots required depends on the range of intensities present in the image (from three upwards), and the exposure times should be spaced across that range so that the details we are interested in preserving show up clearly in at least one shot.

Next, an algorithm is used to reconstruct the response curve of the camera based on the color of the same pixels across the different exposure times. This basically lets us establish a map between the real scene brightness of a point, the exposure time, and the value that the corresponding pixel will have in the captured image. We will use the implementation of Debevec’s method from the OpenCV library.


# Read all the files with OpenCV
files = ['1.jpg', '2.jpg', '3.jpg', '4.jpg', '5.jpg']
images = list([cv2.imread(f) for f in files])
# Compute the exposure times in seconds
exposures = np.float32([1. / t for t in [1000, 500, 100, 50, 10]])

# Compute the response curve
calibration = cv2.createCalibrateDebevec()
response = calibration.process(images, exposures)

The response curve looks something like this:

A graph displaying the response curve as pixel exposure (log) over pixel value

On the vertical axis, we have the cumulative effect of the scene brightness of a point and the exposure time, while on the horizontal axis we have the value (0 to 255 per channel) the corresponding pixel will have.

This curve then allows us to perform the reverse operation (which is the next step in the process)—given the pixel value and the exposure time, we can compute the real brightness of each point in the scene. This brightness value is called irradiance, and it measures the amount of light energy that falls on a unit of sensor area. Unlike the image data, it is represented using floating point numbers because it reflects a much wider range of values (hence, high dynamic range). Once we have the irradiance image (the HDR image) we can simply save it:


# Compute the HDR image
merge = cv2.createMergeDebevec()
hdr = merge.process(images, exposures, response)

# Save it to disk
cv2.imwrite('hdr_image.hdr', hdr)

For those of us lucky enough to have an HDR display (which is getting more and more common), it may be possible to visualize this image directly in all its glory. Unfortunately, the HDR standards are still in their infancy, so the process to do that may be somewhat different for different displays.

For the rest of us, the good news is that we can still take advantage of this data, although a normal display requires the image to have byte value (0-255) channels. While we need to give up some of the richness of the irradiance map, at least we have control over how to do it.

This process is called tone-mapping and it involves converting the floating point irradiance map (with a high range of values) to a standard byte value image. There are techniques to do that so that many of the extra details are preserved. Just to give you an example of how this can work, imagine that before we squeeze the floating point range into byte values, we enhance (sharpen) the edges that are present in the HDR image. Enhancing these edges will help preserve them (and implicitly the detail they provide) also in the low dynamic range image.
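As a minimal sketch of the idea, here is the classic global curve L / (1 + L) in the style of Reinhard’s operator (a simplification for illustration, not one of the OpenCV operators discussed below):

```python
import numpy as np

def global_tonemap(irradiance):
    # Compress an unbounded floating point range into 0..1, then into bytes
    mapped = irradiance / (1.0 + irradiance)
    return (mapped * 255).astype(np.uint8)

# Toy irradiance values spanning five orders of magnitude
hdr = np.array([[0.01, 1.0], [10.0, 1000.0]])
print(global_tonemap(hdr))
```

Even though the input spans 0.01 to 1000, every output value fits in a byte, and the ordering of intensities is preserved.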

OpenCV provides a set of these tone-mapping operators, such as Drago, Durand, Mantiuk or Reinhard. Here is an example of how one of these operators (Durand) can be used, and of the result it produces.


durand = cv2.createTonemapDurand(gamma=2.5)
ldr = durand.process(hdr)

# Tonemap operators create floating point images with values in the 0..1 range
# This is why we multiply the image by 255 before saving
cv2.imwrite('durand_image.png', ldr * 255)

The result of the above computation displayed as an image

Using Python, you can also create your own operators if you need more control over the process. For instance, this is the result obtained with a custom operator that removes intensities that are represented in very few pixels before shrinking the value range to 8 bits (followed by an auto-contrast step):

The image that results from following the above process

And here is the code for the above operator:


def countTonemap(hdr, min_fraction=0.0005):
    counts, ranges = np.histogram(hdr, 256)
    min_count = min_fraction * hdr.size
    delta_range = ranges[1] - ranges[0]

    image = hdr.copy()
    for i in range(len(counts)):
        if counts[i] < min_count:
            image[image >= ranges[i + 1]] -= delta_range
            ranges -= delta_range

    return cv2.normalize(image, None, 0, 1, cv2.NORM_MINMAX)

Conclusion

We’ve seen how, with a bit of Python and a couple of supporting libraries, we can push the limits of the physical camera in order to improve the end result. Both examples we’ve discussed use multiple low-quality shots to create something better, but there are many other approaches for different problems and limitations.

While many camera phones have app-store or built-in apps that address these particular examples, it is clearly not difficult at all to program these by hand, and to enjoy the higher level of control and understanding that can be gained.

If you’re interested in image computations on a mobile device, check out OpenCV Tutorial: Real-time Object Detection Using MSER in iOS by fellow Toptaler and elite OpenCV developer Altaibayar Tseveenbayar.

Python Class Attributes: An Overly Thorough Guide

This article is originally published at Toptal.

I had a programming interview recently, a phone-screen in which we used a collaborative text editor.

I was asked to implement a certain API, and chose to do so in Python. Abstracting away the problem statement, let’s say I needed a class whose instances stored some data and some other_data.

I took a deep breath and started typing. After a few lines, I had something like this:


class Service(object):
    data = []

    def __init__(self, other_data):
        self.other_data = other_data
    ...

My interviewer stopped me:

  • Interviewer: “That line: data = []. I don’t think that’s valid Python?”

  • Me: “I’m pretty sure it is. It’s just setting a default value for the instance attribute.”
  • Interviewer: “When does that code get executed?”
  • Me: “I’m not really sure. I’ll just fix it up to avoid confusion.”

For reference, and to give you an idea of what I was going for, here’s how I amended the code:


class Service(object):

    def __init__(self, other_data):
        self.data = []
        self.other_data = other_data
    ...

As it turns out, we were both wrong. The real answer lay in understanding the distinction between Python class attributes and Python instance attributes.

Python class attributes vs. Python instance attributes

Note: if you have an expert handle on class attributes, you can skip ahead to use cases.

Python Class Attributes

My interviewer was wrong in that the above code is syntactically valid.

I too was wrong in that it isn’t setting a “default value” for the instance attribute. Instead, it’s defining data as a class attribute with value [].

In my experience, Python class attributes are a topic that many people know something about, but few understand completely.

Python Class Variable vs. Instance Variable: What’s the Difference?

A Python class attribute is an attribute of the class (circular, I know), rather than an attribute of an instance of a class.

Let’s use a Python class example to illustrate the difference. Here, class_var is a class attribute, and i_var is an instance attribute:


class MyClass(object):
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var

Note that all instances of the class have access to class_var, and that it can also be accessed as a property of the class itself:


foo = MyClass(2)
bar = MyClass(3)

foo.class_var, foo.i_var
## 1, 2
bar.class_var, bar.i_var
## 1, 3
MyClass.class_var ## <— This is key
## 1

For Java or C++ programmers, the class attribute is similar—but not identical—to the static member. We’ll see how they differ later.

Class vs. Instance Namespaces

To understand what’s happening here, let’s talk briefly about Python namespaces.

A namespace is a mapping from names to objects, with the property that there is zero relation between names in different namespaces. They’re usually implemented as Python dictionaries, although this is abstracted away.

Depending on the context, you may need to access a namespace using dot syntax (e.g., object.name_from_objects_namespace) or as a local variable (e.g., object_from_namespace). As a concrete example:


class MyClass(object):
    ## No need for dot syntax
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var

## Need dot syntax as we've left scope of class namespace
MyClass.class_var
## 1

Python classes and instances of classes each have their own distinct namespaces, represented by the pre-defined attributes MyClass.__dict__ and instance_of_MyClass.__dict__, respectively.
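Both namespaces can be inspected directly. A quick sketch (repeating the MyClass definition from above so the snippet is self-contained):

```python
class MyClass(object):
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var

foo = MyClass(2)

# The two namespaces really are separate dictionaries
print('class_var' in MyClass.__dict__)  # True
print('class_var' in foo.__dict__)      # False
print(foo.__dict__)                     # {'i_var': 2}
```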

When you try to access an attribute from an instance of a class, Python first looks at the instance namespace. If it finds the attribute, it returns the associated value. If not, it then looks in the class namespace and returns the attribute (if it’s present, throwing an error otherwise). For example:


foo = MyClass(2)

## Finds i_var in foo's instance namespace
foo.i_var
## 2

## Doesn't find class_var in instance namespace…
## So looks in class namespace (MyClass.__dict__)
foo.class_var
## 1

The instance namespace takes precedence over the class namespace: if there is an attribute with the same name in both, the instance namespace will be checked first and its value returned. Here’s a simplified version of the code (source) for attribute lookup:


def instlookup(inst, name):
    ## simplified algorithm...
    if name in inst.__dict__:
        return inst.__dict__[name]
    else:
        return inst.__class__.__dict__[name]

And, in visual form:

attribute lookup in visual form

How Class Attributes Handle Assignment

With this in mind, we can make sense of how Python class attributes handle assignment:

  • If a class attribute is set by accessing the class, it will override the value for all instances. For example:

    foo = MyClass(2)
    foo.class_var
    ## 1
    MyClass.class_var = 2
    foo.class_var
    ## 2

    At the namespace level… we’re setting MyClass.__dict__['class_var'] = 2. (Note: this isn’t the exact code, which would be setattr(MyClass, 'class_var', 2), as __dict__ returns a dictproxy, an immutable wrapper that prevents direct assignment, but it helps for demonstration’s sake.) Then, when we access foo.class_var, class_var has a new value in the class namespace, and thus 2 is returned.

  • If a Python class variable is set by accessing an instance, it will override the value only for that instance. This essentially overrides the class variable and turns it into an instance variable available, intuitively, only for that instance. For example:

    foo = MyClass(2)
    foo.class_var
    ## 1
    foo.class_var = 2
    foo.class_var
    ## 2
    MyClass.class_var
    ## 1

    At the namespace level… we’re adding the class_var attribute to foo.__dict__, so when we look up foo.class_var, we return 2. Meanwhile, other instances of MyClass will not have class_var in their instance namespaces, so they continue to find class_var in MyClass.__dict__ and thus return 1.

Mutability

Quiz question: What if your class attribute has a mutable type? You can manipulate (mutilate?) the class attribute by accessing it through a particular instance and, in turn, end up manipulating the referenced object that all instances are accessing (as pointed out by Timothy Wiseman).

This is best demonstrated by example. Let’s go back to the Service I defined earlier and see how my use of a class variable could have led to problems down the road.


class Service(object):
    data = []

    def __init__(self, other_data):
        self.other_data = other_data
    ...

My goal was to have the empty list ([]) as the default value for data, and for each instance of Service to have its own data that would be altered over time on an instance-by-instance basis. But in this case, we get the following behavior (recall that Service takes some argument other_data, which is arbitrary in this example):


s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])

s1.data.append(1)

s1.data
## [1]
s2.data
## [1]

s2.data.append(2)

s1.data
## [1, 2]
s2.data
## [1, 2]

This is no good—altering the class variable via one instance alters it for all the others!

At the namespace level… all instances of Service are accessing and modifying the same list in Service.__dict__ without making their own data attributes in their instance namespaces.

We could get around this using assignment; that is, instead of exploiting the list’s mutability, we could assign our Service objects to have their own lists, as follows:


s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])

s1.data = [1]
s2.data = [2]

s1.data
## [1]
s2.data
## [2]

In this case, we’re adding s1.__dict__['data'] = [1], so the original Service.__dict__['data'] remains unchanged.

Unfortunately, this requires that Service users have intimate knowledge of its variables, and is certainly prone to mistakes. In a sense, we’d be addressing the symptoms rather than the cause. We’d prefer something that was correct by construction.

My personal solution: if you’re just using a class variable to assign a default value to a would-be Python instance variable, don’t use mutable values. In this case, every instance of Service was going to override Service.data with its own instance attribute eventually, so using an empty list as the default led to a tiny bug that was easily overlooked. Instead of the above, we could’ve either:

  1. Stuck to instance attributes entirely, as demonstrated in the introduction.
  2. Avoided using the empty list (a mutable value) as our “default”:

    class Service(object):
        data = None

        def __init__(self, other_data):
            self.other_data = other_data
        ...

    Of course, we’d have to handle the None case appropriately, but that’s a small price to pay.
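One way to “handle the None case appropriately” is to create the per-instance list lazily on first use; the add method below is my own illustration, not part of the original Service:

```python
class Service(object):
    data = None

    def __init__(self, other_data):
        self.other_data = other_data

    def add(self, item):
        # Lazily replace the None default with a per-instance list
        if self.data is None:
            self.data = []
        self.data.append(item)

s1 = Service('a')
s2 = Service('b')
s1.add(1)

print(s1.data)  # [1]
print(s2.data)  # None: s2 is unaffected
```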

So When Should You Use Python Class Attributes?

Class attributes are tricky, but let’s look at a few cases when they would come in handy:

  1. Storing constants. As class attributes can be accessed as attributes of the class itself, it’s often nice to use them for storing class-wide, class-specific constants. For example:

    class Circle(object):
        pi = 3.14159

        def __init__(self, radius):
            self.radius = radius

        def area(self):
            return Circle.pi * self.radius * self.radius

    Circle.pi
    ## 3.14159

    c = Circle(10)
    c.pi
    ## 3.14159
    c.area()
    ## 314.159
  2. Defining default values. As a trivial example, we might create a bounded list (i.e., a list that can only hold a certain number of elements or fewer) and choose to have a default cap of 10 items:

    class MyClass(object):
        limit = 10

        def __init__(self):
            self.data = []

        def item(self, i):
            return self.data[i]

        def add(self, e):
            if len(self.data) >= self.limit:
                raise Exception("Too many elements")
            self.data.append(e)

    MyClass.limit
    ## 10

    We could then create instances with their own specific limits, too, by assigning to the instance’s limit attribute.

    foo = MyClass()
    foo.limit = 50
    ## foo can now hold 50 elements—other instances can hold 10

    This only makes sense if you want your typical instance of MyClass to hold just 10 elements or fewer—if you’re giving all of your instances different limits, then limit should be an instance variable. (Remember, though: take care when using mutable values as your defaults.)

  3. Tracking all data across all instances of a given class. This is sort of specific, but I could see a scenario in which you might want to access a piece of data related to every existing instance of a given class.To make the scenario more concrete, let’s say we have a 
    1
    Person

     class, and every person has a 

    1
    name

    . We want to keep track of all the names that have been used. One approach might be to iterate over the garbage collector’s list of objects, but it’s simpler to use class variables.

    Note that, in this case, 

    1
    names

     will only be accessed as a class variable, so the mutable default is acceptable.

    
    
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    class Person(object):
        all_names = []

        def __init__(self, name):
            self.name = name
            Person.all_names.append(name)

    joe = Person('Joe')
    bob = Person('Bob')
    print Person.all_names
    ## ['Joe', 'Bob']

    We could even use this design pattern to track all existing instances of a given class, rather than just some associated data.

    
    
    class Person(object):
        all_people = []

        def __init__(self, name):
            self.name = name
            Person.all_people.append(self)

    joe = Person('Joe')
    bob = Person('Bob')
    print Person.all_people
    ## [<__main__.Person object at 0x10e428c50>, <__main__.Person object at 0x10e428c90>]
  4. Performance (sort of… see below).
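One caveat about the instance-tracking pattern in point 3: `Person.all_people` holds strong references, so tracked instances can never be garbage-collected. A minimal sketch of a variant using the standard library’s `weakref.WeakSet`—my addition, not from the original post, written in Python 3 syntax:

```python
import weakref

class Person(object):
    # WeakSet entries vanish automatically once an instance is
    # otherwise unreferenced, so tracking doesn't leak memory.
    all_people = weakref.WeakSet()

    def __init__(self, name):
        self.name = name
        Person.all_people.add(self)

joe = Person('Joe')
bob = Person('Bob')
print(len(Person.all_people))   # 2
del bob                          # on CPython, collected immediately...
print(len(Person.all_people))   # ...so this prints 1
```

Whether you want dead instances to linger in the tracking collection is a design choice; the strong-reference list above is simpler if you do.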

Under the Hood

Note: If you’re worrying about performance at this level, you might not want to be using Python in the first place, as the differences will be on the order of tenths of a millisecond—but it’s still fun to poke around a bit, and it helps for illustration’s sake.

Recall that a class’s namespace is created and filled in at the time of the class’s definition. That means that we do just one assignment—ever—for a given class variable, while instance variables must be assigned every time a new instance is created. Let’s take an example.


def called_class():
    print "Class assignment"
    return 2

class Bar(object):
    y = called_class()

    def __init__(self, x):
        self.x = x

## "Class assignment"

def called_instance():
    print "Instance assignment"
    return 2

class Foo(object):
    def __init__(self, x):
        self.y = called_instance()
        self.x = x

Bar(1)
Bar(2)
Foo(1)
## "Instance assignment"
Foo(2)
## "Instance assignment"

We assign to `Bar.y` just once, but `instance_of_Foo.y` on every call to `__init__`.

As further evidence, let’s use the Python disassembler:


import dis

class Bar(object):
    y = 2

    def __init__(self, x):
        self.x = x

class Foo(object):
    def __init__(self, x):
        self.y = 2
        self.x = x

dis.dis(Bar)
##  Disassembly of __init__:
##  7           0 LOAD_FAST                1 (x)
##              3 LOAD_FAST                0 (self)
##              6 STORE_ATTR               0 (x)
##              9 LOAD_CONST               0 (None)
##             12 RETURN_VALUE

dis.dis(Foo)
## Disassembly of __init__:
## 11           0 LOAD_CONST               1 (2)
##              3 LOAD_FAST                0 (self)
##              6 STORE_ATTR               0 (y)

## 12           9 LOAD_FAST                1 (x)
##             12 LOAD_FAST                0 (self)
##             15 STORE_ATTR               1 (x)
##             18 LOAD_CONST               0 (None)
##             21 RETURN_VALUE

When we look at the bytecode, it’s again obvious that `Foo.__init__` has to do two assignments, while `Bar.__init__` does just one.

In practice, what does this gain really look like? I’ll be the first to admit that timing tests are highly dependent on often uncontrollable factors and the differences between them are often hard to explain accurately.

However, I think these small snippets (run with the Python timeit module) help to illustrate the differences between class and instance variables, so I’ve included them anyway.

Note: I’m on a MacBook Pro with OS X 10.8.5 and Python 2.7.2.
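For concreteness, here is a sketch—my reconstruction, not the author’s actual script—of how numbers like the ones below can be collected with the `timeit` module. The class definitions match the disassembly example above; absolute times will vary by machine, and this sketch uses Python 3-compatible syntax.

```python
import timeit

# Same Bar (class variable) and Foo (instance variable) as in the
# disassembly example above, passed to timeit as a setup string.
setup = """
class Bar(object):
    y = 2
    def __init__(self, x):
        self.x = x

class Foo(object):
    def __init__(self, x):
        self.y = 2
        self.x = x
"""

n = 100000  # the post uses 10,000,000 calls; fewer keeps this quick
bar_init = timeit.timeit('Bar(2)', setup=setup, number=n)
foo_init = timeit.timeit('Foo(2)', setup=setup, number=n)
print('%d calls to Bar(2): %.3fs' % (n, bar_init))
print('%d calls to Foo(2): %.3fs' % (n, foo_init))
```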

Initialization


10000000 calls to `Bar(2)`: 4.940s
10000000 calls to `Foo(2)`: 6.043s

The initializations of `Bar` are faster by over a second, so the difference does appear to be significant.

So why is this the case? One speculative explanation: we do two assignments in `Foo.__init__`, but just one in `Bar.__init__`.

Assignment


10000000 calls to `Bar(2).y = 15`: 6.232s
10000000 calls to `Foo(2).y = 15`: 6.855s
10000000 `Bar` assignments: 6.232s - 4.940s = 1.292s
10000000 `Foo` assignments: 6.855s - 6.043s = 0.812s

Note: There’s no way to re-run your setup code on each trial with timeit, so we have to reinitialize the variable within each trial. The second pair of lines shows the above times with the previously calculated initialization times deducted.

From the above, it looks like `Foo` only takes about 60% as long as `Bar` to handle assignments.

Why is this the case? One speculative explanation: when we assign to `Bar(2).y`, we first look in the instance namespace (`Bar(2).__dict__['y']`), fail to find `y`, and then look in the class namespace (`Bar.__dict__['y']`) before making the assignment. When we assign to `Foo(2).y`, we do half as many lookups, as we immediately assign to the instance namespace (`Foo(2).__dict__['y']`).
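Whatever the timing story, we can at least watch where `y` ends up. Reads fall back to the class `__dict__` when the instance `__dict__` lacks the name, while assigning through the instance writes into the instance `__dict__`, shadowing the class attribute. A small illustration of mine, in Python 3 syntax:

```python
class Bar(object):
    y = 2
    def __init__(self, x):
        self.x = x

b = Bar(2)
print('y' in b.__dict__)   # False -- a read of b.y falls back to the class
print(b.y)                 # 2, found via Bar.__dict__['y']

b.y = 15                   # assignment writes into the instance dict...
print('y' in b.__dict__)   # True
print(Bar.__dict__['y'])   # 2 -- the class attribute is untouched
```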

In summary, though these performance gains won’t matter in reality, these tests are interesting at the conceptual level. If anything, I hope these differences help illustrate the mechanical distinctions between class and instance variables.

In Conclusion

Class attributes seem to be underused in Python; a lot of programmers have different impressions of how they work and why they might be helpful.

My take: Python class variables have their place within the school of good code. When used with care, they can simplify things and improve readability. But when carelessly thrown into a given class, they’re sure to trip you up.

Appendix: Private Instance Variables

One thing I wanted to include but didn’t have a natural entrance point…

Python doesn’t have private variables, so to speak, but another interesting relationship between class and instance naming comes with name mangling.

In the Python style guide, it’s said that pseudo-private variables should be prefixed with a double underscore: ‘__’. This is not only a sign to others that your variable is meant to be treated privately, but also a way to prevent access to it, of sorts. Here’s what I mean:


class Bar(object):
    def __init__(self):
        self.__zap = 1

a = Bar()
a.__zap
## Traceback (most recent call last):
##   File "<stdin>", line 1, in <module>
## AttributeError: 'Bar' object has no attribute '__zap'

## Hmm. So what's in the namespace?
a.__dict__
## {'_Bar__zap': 1}
a._Bar__zap
## 1

Look at that: the instance attribute `__zap` is automatically prefixed with the class name to yield `_Bar__zap`.

While still settable and gettable using `a._Bar__zap`, this name mangling is a means of creating a ‘private’ variable as it prevents you and others from accessing it by accident or through ignorance.

Edit: as Pedro Werneck kindly pointed out, this behavior is largely intended to help out with subclassing. In the PEP 8 style guide, they see it as serving two purposes: (1) preventing subclasses from accessing certain attributes, and (2) preventing namespace clashes in these subclasses. While useful, name mangling shouldn’t be seen as an invitation to write code with an assumed public-private distinction, such as is present in Java.
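To see that subclassing point concretely, here is a small hypothetical example of mine (Python 3 syntax): each class’s `__count` is mangled with its own class name, so the parent’s and child’s attributes never collide.

```python
class Base(object):
    def __init__(self):
        self.__count = 1      # mangled to _Base__count

class Child(Base):
    def __init__(self):
        Base.__init__(self)
        self.__count = 99     # mangled to _Child__count -- no clash

c = Child()
print(sorted(c.__dict__))     # ['_Base__count', '_Child__count']
print('%d %d' % (c._Base__count, c._Child__count))   # 1 99
```

Without mangling, the child’s assignment would silently overwrite the parent’s attribute; with it, each class keeps its own copy.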