Python - Bytecodes and Types

We got started in Python with simple assignment statements and the print statement in the last entry.
In this one, we'll examine some of Python's innards and its type system.

Let us examine how Python converts your code into a machine-readable form.

In languages like C or C++, it is the compiler's task to create efficient machine code for the target architecture (usually by way of intermediate assembly and object code). These languages compile directly to the underlying architecture's instructions, which makes the resulting binaries hardware-dependent but typically quite efficient.

Contrast this with Java or Python, where the first stage of code generation ends with what is called bytecode. Bytecode is code written for an abstract virtual machine whose instruction set provides a convenient intermediary between the high level at which these languages operate and the underlying system hardware. This VM adds an extra layer to the process, but in exchange we get more portable code.

While it is popular to describe Python as an interpreted language, it isn't a purely interpreted language (like the original BASIC implementation). Python as a language has several popular implementations, and while each of them follows a broadly similar process they differ in the tools they use. In the 'mainstream' (if any such word is indeed applicable) implementation, CPython, source code is first compiled to bytecode and the bytecode is then executed by CPython's virtual machine, an interpreter loop written in C; it is not translated to native machine code. Other implementations include Jython (code compiled to Java bytecode and run on a Java VM), IronPython (which targets the .NET runtime) and PyPy (which uses a JIT compiler to generate machine code for frequently executed code at runtime).
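
To peek at the bytecode that CPython produces, the standard library's dis module can disassemble a function. This is only a small sketch and it assumes CPython; add_one is a hypothetical function made up for illustration, and the exact instructions printed differ between Python versions, so no output is shown here.

```python
import dis


def add_one(x):
    # Hypothetical example function, used only to have something to inspect.
    return x + 1


# CPython has already compiled the function body to bytecode;
# the result is stored on the function's code object.
raw_bytecode = add_one.__code__.co_code

# The raw bytecode is just a sequence of bytes, not meant for reading.
print(len(raw_bytecode))

# dis.dis prints a human-readable listing of the bytecode instructions.
dis.dis(add_one)
```

Running this on two different Python versions is a quick way to see that bytecode is an implementation detail of the interpreter rather than part of the language.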

You can see this SO thread for more discussion on this topic. Some points made there are especially good.
There are several relevant resources on related topics; just Google something like 'python bytecode' or 'python implementations'.

Let's now move on to Python's type system. Two terms that apply to Python here are dynamic typing and strong typing.

I misquoted Python as being a weakly-typed language in my last post - a mistake stemming from my own ignorance. I'm sorry for it and have rectified the error since.

While the definition of 'weakly-typed' as quoted in the last post still holds, Python doesn't fit that criterion. A variable name has no type of its own; the type belongs to the object it is bound to. A related idea is 'duck typing' - an object is judged usable by the attributes and methods it provides rather than by its declared type. When we say Python has a strong type system we mean that Python keeps separate types distinct. It doesn't allow a value of one type to silently morph into an unrelated type. The programmer needs to state the intention to change the type by explicitly converting the object (for example with int() or str()).
(Thanks to this SO thread.)
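
Both ideas can be tried out in a few lines. The snippet below is just an illustrative sketch (the variable names are my own): it shows Python refusing to mix a string and an integer, the explicit conversions that make the intention clear, and len() accepting any object that provides a length.

```python
# Strong typing: Python will not silently turn a str into an int.
try:
    result = '3' + 4
except TypeError:
    result = None  # the mixed-type addition was refused

# The programmer has to state the intended conversion explicitly.
as_number = int('3') + 4   # integer addition
as_text = '3' + str(4)     # string concatenation

# Duck typing: len() works on any object that provides a length,
# whatever its type happens to be.
lengths = [len('abc'), len([1, 2, 3]), len({'a': 1})]

print(result)     # None
print(as_number)  # 7
print(as_text)    # 34
print(lengths)    # [3, 3, 1]
```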

Dynamic typing means the language does not force the programmer to declare the types of variables. Types are attached to values, and a name simply takes on the type of whatever value it is bound to at runtime. Thus, value assignments decide the type Python sees for a variable.
Note how this markedly differs from statically-typed languages like C, where the variable's declaration fixes its type.
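
A quick way to see this is to rebind one name to values of different types and ask for the type each time. This is a minimal sketch; x and seen_types are names chosen only for the illustration.

```python
seen_types = []

x = 42                                # x is bound to an int
seen_types.append(type(x).__name__)

x = 'forty-two'                       # the same name, now bound to a str
seen_types.append(type(x).__name__)

x = [4, 2]                            # and now to a list
seen_types.append(type(x).__name__)

print(seen_types)  # ['int', 'str', 'list']
```

The name x never had a type of its own; only the objects it pointed to did.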

Remark (a): This does not mean that any language that lets you omit type declarations is dynamically typed - Haskell, for example, is statically typed but infers most types. (Also see 'type-inference in statically-typed languages')

Remark (b): Dynamic type-checking also introduces a large potential source of type-related bugs, since type errors surface only when the offending line actually runs. As demonstrated in the previous post, it is ridiculously easy to mess up variables with impossible values. Lesson: Use with caution.
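
Here is one way such a bug can slip through, as a small sketch. The function double is hypothetical; imagine the second call receiving a value that was read from user input as text.

```python
def double(value):
    # Hypothetical helper: the author had numbers in mind.
    return value * 2


from_code = double(5)     # numeric doubling
from_input = double('5')  # string repetition, and no error is raised

print(from_code)   # 10
print(from_input)  # 55
```

Nothing fails here, which is exactly the problem: the second result is the string '55', not the number 10.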

The syntax to check the type of any variable is very simple:
>>> a = 'Hello'  # 'a' is a string
>>> type(a)      # determine type
<type 'str'>
>>> b = a * 4    # repeating a string by an int is allowed
>>> print b
HelloHelloHelloHello
>>> print a*4.0  # but not by a float: strongly-typed
TypeError: can't multiply sequence by non-int of type 'float'
See how the shell throws a TypeError when we try to multiply a string by a floating-point value? String repetition is defined for an integer count only, and Python refuses to guess what 4.0 copies should mean instead of silently converting it. This shows that operations mixing types are allowed only where they are explicitly defined. We'll see more of this when we look at Python sequences.
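
If a float count really is what you have, the conversion has to be spelled out. A short sketch, with count and repeated as names picked for the example:

```python
a = 'Hello'
count = 4.0

try:
    repeated = a * count       # raises TypeError: count is a float
except TypeError:
    # Say explicitly that a whole number of copies is wanted.
    repeated = a * int(count)

print(repeated)  # HelloHelloHelloHello
```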

Fundamentally, Python's basic data types are ints, floats, strings (the 'str' type), bool and the 'NoneType', alongside container types like lists, tuples, sets and dictionaries.
Most of these data types should be familiar to everyone. The 'NoneType' has a single value, None, usually used to denote an empty or missing value - loosely comparable to NULL in C.
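
To make the list concrete, the sketch below asks Python for the type name of one sample value of each kind and then shows the usual way to test for None. The sample values and the function find_first_negative are made up for illustration; the set literal assumes Python 2.7 or newer.

```python
samples = [7, 3.14, 'spam', True, None, [1, 2], (1, 2), {1, 2}, {'k': 'v'}]
type_names = [type(s).__name__ for s in samples]
print(type_names)


def find_first_negative(numbers):
    # Hypothetical helper: returns None when there is nothing to report.
    for n in numbers:
        if n < 0:
            return n
    return None


found = find_first_negative([1, 2, 3])

# None is a single shared object, so the idiomatic test is 'is None'.
print(found is None)  # True
```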

PS: I originally intended to cover types and move on to writing functional scripts in this entry, with a little aside on Python's interpreter, but apparently I'm not good with brevity. In the interest of keeping this post short, I'll leave that discussion for the next one.

As usual please feel free to suggest any modifications or point out any errors whatsoever.

Thanks for reading!
