Python for data analysis

Why Python?

Python is a strong choice of programming language when portability, flexibility, syntax, style, and extensibility matter. Guido van Rossum designed the language with clean syntax in mind: the body of a function or loop is delimited by indentation rather than braces. As a result, a Python programmer can usually read even uncommented Python code and quickly understand what it does and why. From a business perspective, research tools built on Python tend to be cheaper, faster to develop, and easier to maintain.
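As a small illustration of indentation-delimited blocks, here is a minimal sketch. The function name mean_of_positives and the sample numbers are hypothetical, chosen only for this example.

```python
def mean_of_positives(values):
    """Return the mean of the positive entries, or None if there are none."""
    total = 0.0
    count = 0
    for value in values:      # the loop body is the indented block below
        if value > 0:         # a nested block is indented one level further
            total += value
            count += 1
    if count == 0:            # back at function level: the loop has ended
        return None
    return total / count


result = mean_of_positives([3, -1, 4, -5, 8])
```

No braces or end keywords are needed; where each block starts and stops is visible from the layout alone.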

Compiled languages like Fortran and C are natively much faster than pure Python, but the gap narrows considerably when Python is bound to them. Packages like Cython let Python interface with C code and pass data between the C program and Python in memory. This allows Python to approach the speed of the faster languages where it matters and to reuse legacy code (e.g., FFTW). The combination of Python with fast computation has attracted scientists and others in large numbers. Two packages in particular are the powerhouses of scientific Python: NumPy and SciPy. Both also make integrating legacy code easier.

NumPy and SciPy

Scientific programming relies on arrays and matrices, together with operations such as integration, differential equation solving, statistics, and much more. Python does not have these capabilities built in; its standard mathematical functions operate on single values, not on whole arrays or matrices. NumPy and SciPy are two powerful Python packages that fill this gap and enable the language to be used efficiently for scientific purposes.
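To make the limitation concrete, the following sketch uses only the standard library. The variable names and sample values are illustrative assumptions.

```python
import math

samples = [1.0, 4.0, 9.0]

# math.sqrt accepts a single number, so passing a whole list raises TypeError.
try:
    math.sqrt(samples)
    scalar_only = False
except TypeError:
    scalar_only = True

# Without an array library, the function has to be applied item by item.
roots = [math.sqrt(x) for x in samples]
```

The explicit loop works, but it becomes verbose and slow as the data grows, which is the gap the array packages are meant to close.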

NumPy specializes in numerical processing through multi-dimensional arrays (ndarrays), which support element-by-element operations; arrays of different but compatible shapes are combined automatically through a mechanism called broadcasting. If needed, linear algebra operations can be applied directly to the same arrays without converting them first. Arrays can also be reshaped freely, although changing the number of elements generally produces a new array. This removes many of the worries that usually mire quick programming in other languages. To get rid of certain elements, you do not need to loop over the array and build a new one by hand; you can apply a Boolean mask to it.
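A minimal sketch of these array features follows, assuming NumPy is installed. The temperature values and variable names are made up for illustration.

```python
import numpy as np

# A 2 x 3 array of hypothetical temperature readings in Celsius.
temps_c = np.array([[12.0, 15.5, -3.0],
                    [20.0, -1.5, 8.0]])

# Element-wise arithmetic: applied to every entry without an explicit loop.
temps_f = temps_c * 9.0 / 5.0 + 32.0

# Broadcasting: the shape-(3,) array is stretched across both rows.
offsets = np.array([0.5, 0.0, -0.5])
corrected = temps_c + offsets

# Boolean mask: keep only the entries that satisfy a condition.
above_freezing = temps_c > 0
kept = temps_c[above_freezing]   # 1-D array holding the selected values

# Linear algebra on the same array: matrix product via the @ operator.
gram = temps_c @ temps_c.T       # shape (2, 2)
```

Note that indexing with a Boolean mask returns a new one-dimensional array with the selected values; the original array is left unchanged.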

SciPy is built on the NumPy array framework and extends it with advanced mathematical functionality: numerical integration, ordinary differential equation solvers, special functions, optimization, and more. Listing all the functions in SciPy by name would take several pages at minimum.
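As a brief sketch of two of these capabilities, the example below uses quad and solve_ivp from scipy.integrate, assuming SciPy is installed. The integrand and the decay equation are illustrative choices, picked because their exact answers are known.

```python
import numpy as np
from scipy.integrate import quad, solve_ivp

# Numerical integration: the integral of exp(-x^2) from 0 to infinity,
# whose exact value is sqrt(pi) / 2.
area, abs_err = quad(lambda x: np.exp(-x**2), 0.0, np.inf)


# Ordinary differential equation: dy/dt = -2y with y(0) = 1,
# whose exact solution is y(t) = exp(-2t).
def decay(t, y):
    return -2.0 * y


solution = solve_ivp(decay, t_span=(0.0, 1.0), y0=[1.0], rtol=1e-8, atol=1e-10)
y_end = solution.y[0, -1]        # numerical estimate of y(1)
```

Comparing area with np.sqrt(np.pi) / 2 and y_end with np.exp(-2.0) is a quick way to check the results against the analytical values.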

Together, the two packages form the basis for all the tools we will be using for our analysis.


BISFERA is a research and development project of Semko Software Development that aims to create solutions for systematically understanding, and profiting from, Big Text Data.

These solutions are intended to support smart decisions in business and politics.

Our main aim is to extract and quantify subtle changes in relations, sentiments, trends, and intents in (huge) groups of people.