Skip to content

Data Structures In Python – Arrays | Create `em From Scratch

ArraysTN

Lets begin with our first post on data structures and algorithms and here we’ll mainly talk everything about arrays. We shall also create this data structure from scratch without using any built in libraries.

Data structures and algorithms form a major part of interviews in any top companies. Knowing them not only help you crack one but also understand how the basics of computer program work.

What to expect from this blog ?

  • Arrays in detail.
  • Different types of arrays.
  • How python manages memory while array operations.
  • Create our own array from scratch 🙂


1. Arrays In Detail

Just before we begin with arrays in detail , It is always awesome to know how our computer stores the information in memory.
Remember :

  • The smallest measure of information size is bits.
  • In computers , we normally measure smallest size of data in bytes. One byte is equal to 8 bits.
  • For every information stored in memory as bytes , they have a unique ID for themselves. i.e Consider word ‘A B’. A shall have a ID XXXX followed by B YYYY.
  • All the information storage and retrieval happens through this unique ID ( index ) . Our computer hardwares are designed such a way to support these operations. When we write a program , say python or Scala , these programs keep a track on the storages and indexes.
  • So finally , all these info are written to or read from RANDOM ACCESS MEMORY or RAM.

So , lets finally begin with arrays in python 😛


1.1 Does python have arrays ?

If we have some experience in python , we probably never hear name of “arrays” when we code. We instead hear names like lists , tuples. These are nothing but arrays in python. So henceforth when we refer arrays , in python it directly references lists or tuples.


1.2 Low level arrays Vs Referential arrays

Python usually has 2 types of arrays. Low level arrays usually means data will be stored as bytes in regular memory blocks whereas referential has concept of pointers. We shall discuss this in detail below to make concepts clear.

Let’s begin with low level arrays.
Here, python internally represents each character as 2 bytes i.e 16 bits. Consider word “Sample“.

sample
sample

Observe word “S” , it has 2 indexes 2146 & 2147. This means one word is represented in 2 bytes. Total of 6 characters are stored as 12 bytes.

Now imagine we need to store 1000 employees with their employee IDs. Though we can use the above method but it’s very inefficient.
Here , referential arrays come into picture. In referential arrays we store data as shown below,

ref_arrays
ref_arrays

We can notice every index will be pointed to a specific object. This is how referential arrays work.
Python is intelligent enough to point duplicates to same index which helps to save memory. Checkout below.

ref_arrays
ref_arrays_zeros

When your array is of size 8 and all the values are zeros , Python will reference it as above to save huge memory.
So , now what if one portion of array is updated ? Python manages it this way !

ref_arrays
ref_arrays_updates

A new referential object is created for an updated value only.

Isn’t it awesome ? Now lets consider one last scenario of lists getting extended. The memory blocks will look as below.

ref_arrays
ref_arrays_extended

Python assigns a new index list when a list is extended. i.e it would have a primary link to new index slab and secondary link to existing index slab as shown above. This helps immensely when performing update or alter operations to happen efficiently. This process continues as and when an array is extended.

If you couldn’t catch this , hang on ! The next section may help you get an idea.


1.3 What are dynamic arrays ? A game changer for python perf.

So we have gone though multiple ways an array is stored in memory. Now , in real practical world , we often use arrays to store huge amount of data and python has to consistently extend its memory allocation.

Though it all happens automatically , lets get an idea on how does python do it all by itself leaving less load on manual garbage collection.

DynamicArrays
DynamicArrays

Well , lets see a scenario of inserting elements 1-10 in an empty array. The dynamic nature of python arrays work following way.

  • Insert 1 and the default memory assigned to the list would be 2 byte ( 1 block ). This results in overflow.
  • Now python interpreter creates a FRESH array with doubles the memory to 4 bytes ( 2 blocks ). The content of 1 are copied to this fresh array. The old array undergo garbage collection. But yet , this array overflows.
  • Python interpreter creates another FRESH array of double the size with 4 blocks and copies the older content to new fresh array. Older array is discarded.
  • The process continues till the execution ends.

Now we know , how does a real memory management happens in python. The concept is pretty simple.

Once a list or tuple memory overflows , the interpreter creates a new list or tuple of double size and copies everything to the newly created fresh array. The older array is discarded.


2. Creating Arrays From Scratch

Here we try to create our own data structure ( similar to list ) using python.

import ctypes

class OurOwnList(object):
    
    def __init__(self):
        # Count actual elements (Default is 0)
        self.n = 0

        # Default Capacity
        self.capacity = 1 
        self.A = self.make_array(self.capacity)
        
    def __len__(self):
        """
        Return number of elements sorted in array
        """
        return self.n
    
    def __getitem__(self,k):
        """
        Return element at index k
        """
        if not 0 <= k <self.n:
            # Check it k index is in bounds of array
            return IndexError('K is out of bounds!') 

        #Retrieve from array at index k
        return self.A[k] 
        
    def append(self, ele):
        """
        Add element to end of the array
        """

        if self.n == self.capacity:
            self._resize(2*self.capacity) #Double capacity once overflow
        
        self.A[self.n] = ele #Set self.n index to element
        self.n += 1
        
    def _resize(self,new_cap):
        """
        Resize internal array to capacity new_cap
        """
        
        B = self.make_array(new_cap) # New bigger array
        
        for k in range(self.n): # Reference all existing values
            B[k] = self.A[k]
            
        self.A = B # Call A the fresh array
        self.capacity = new_cap # Reset the capacity
        
    def make_array(self,new_cap):
        """
        Returns a new array with new_cap capacity
        """
        return (new_cap * ctypes.py_object)()

I’ve included comments on functionality of every functions above. It’s okay if you don’t catch everything but this was an attempt to create our own data structures from scratch.

Lets verify the output of our new data structure OurOwnList 🙂

# Instantiate the class
arr = OurOwnList()
# Append new element
arr.append(10)
# Append another element
arr.append(12)
# Check array length
len(arr)
#Output : 2
# accessing the elements via index
arr[1]
#Output : 12

Phew ! it worked.

3. Accessing Elements In An Array – Python Specific

We have to remember , all the python arrays start from index ZERO ( 0 ).
If we see any array sequence with square brackets ” [ ]” , they are lists and one with “( )” are tuples.

Ex : a = [1,2,3]. is a list. whereas (1,2,3) is a tuple. ( Notice the brackets )

To access any elements , we can simply use a( 0 ) for first element a( 1 ) for second and so on !
The last element can be accessed using a[-1].

You may read more about array slicing here.

You may read continuation of data structure series here!
Trust me , you may perhaps not find more simpler explanations anywhere 🙂


4. Wrapping Off

That was an exciting journey of how the underlying logic of python arrays work. We also covered how the python interpreter manages memory and discards items that isn’t needed anymore.

Though most of the garbage collections are automatically taken care by python , it’s always great to know how are data structures of python language developed.

Thanks for reading !

You may not want to miss these exciting posts :