Data Structures & Algorithms
LinkedLists
Definition of a Node
```python
class Node:
    def __init__(self, val=0, next=None, previous=None):
        self.val = val
        self.next = next
        self.previous = previous  # only used if doubly linked
```
- A node holds two (or three) things:
  - the data
  - a pointer to the next node
  - a pointer to the previous node (if doubly linked)
Definition of Singly LinkedList
```python
class LinkedList:
    def __init__(self, items=None):
        self.head = None
        self.tail = None
        if items:
            for item in items:
                self.append(item)

    def prepend(self, val):
        node = Node(val)
        if not self.head:        # empty list -> node becomes both head & tail
            self.tail = node
        else:
            node.next = self.head
        self.head = node

    def append(self, val):
        node = Node(val)
        if not self.tail:        # empty list -> node becomes both head & tail
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    # ...
```
- A LinkedList is initialized with `head` and `tail` pointers (both `None` when empty)
- If initialized from an array, the list is built with repeated calls to `append()`
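A quick usage sketch of the constructor-from-array behavior (compact re-declaration of the classes above so the snippet runs standalone; the `to_list` helper is not part of the notes, just for inspection):

```python
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

class LinkedList:
    def __init__(self, items=None):
        self.head = None
        self.tail = None
        if items:
            for item in items:
                self.append(item)

    def append(self, val):
        node = Node(val)
        if not self.tail:        # empty list -> node becomes both head & tail
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def to_list(self):
        # Helper for inspection: walk the list, collecting values
        out, current = [], self.head
        while current:
            out.append(current.val)
            current = current.next
        return out

ll = LinkedList([1, 2, 3])
print(ll.to_list())              # [1, 2, 3]
print(ll.head.val, ll.tail.val)  # 1 3
```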
The most common variants of linked lists are:

| | next | previous | if Circular |
|---|---|---|---|
| Singly Linked | ☑️ | | tail.next -> head |
| Doubly Linked | ☑️ | ☑️ | tail.next -> head, head.previous -> tail |
Terminology
| Term | Definition |
|---|---|
| node | position in a LinkedList containing the stored value and at least one reference to another node (`next`) |
| head | node at the beginning of the list |
| tail | node at the end of the list |
| sentinel | a dummy node, typically placed at the head or end of the list, to simplify operations (e.g., delete) or to mark the end of the list |
Time & Space Complexity
| | Best | Worst |
|---|---|---|
| Occurs when node is at | head | tail |
| Accessing / Search | O(1) | O(N) |
| Insertion | O(1) | O(N) |
| Deletion | O(1) | O(N) |
Strategy: Dummy Head
Delete
```python
def delete(self, val):
    dummy = Node("sentinel")
    dummy.next = self            # `self` here is the head node
    placeholder = dummy
    current = self
    while current:
        if current.val == val:
            placeholder.next = current.next
            return dummy.next    # new head (changed only if head was deleted)
        placeholder = current
        current = current.next
    return dummy.next
```
1. Create a `dummy` node
    - `dummy.next` points to `head`
2. Set `placeholder` to `dummy` and `current` to `head`
3. Iterate: if `current.val == val`, set `placeholder.next` to `current.next` and return `dummy.next`
4. Otherwise advance: set `placeholder` to `current` and `current` to `current.next`
5. Return `dummy.next` (the possibly new head)
This typically saves you from writing special edge-case logic to operate on the head of a linked list.
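To see the payoff, here is the same idea as a standalone function over a head node (a sketch; the method above mixes the Node and LinkedList roles): deleting the head takes the same code path as deleting any other node.

```python
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def delete(head, val):
    # Dummy node lets us unlink the head with the same logic as any other node
    dummy = Node("sentinel")
    dummy.next = head
    placeholder, current = dummy, head
    while current:
        if current.val == val:
            placeholder.next = current.next
            return dummy.next
        placeholder = current
        current = current.next
    return dummy.next

head = Node(1, Node(2, Node(3)))
head = delete(head, 1)   # deleting the head: no special-case branch needed
vals = []
node = head
while node:
    vals.append(node.val)
    node = node.next
print(vals)              # [2, 3]
```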
Strategy: Multi Pass
Get Length of List
```python
def __len__(self):
    current = self       # `self` here is the head node
    length = 0
    while current:
        length += 1
        current = current.next
    return length
```
1. Iterate through list
2. Increment `length`
3. Set `current` to `current.next`
Most computations on a list will require O(N) time complexity, so a simple but very useful technique is to pass through the list a constant number of times to calculate some summary of the list that will simplify your algorithm.
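For example, finding the middle node can be done in two passes: one to compute the length, one to walk halfway (a standalone sketch with its own minimal `Node`):

```python
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def middle(head):
    # Pass 1: count the nodes
    length, current = 0, head
    while current:
        length += 1
        current = current.next
    # Pass 2: walk to index length // 2
    current = head
    for _ in range(length // 2):
        current = current.next
    return current

head = Node(1, Node(2, Node(3, Node(4, Node(5)))))
print(middle(head).val)   # 3
```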
Note on Recursion vs. Iteration:
The looping is done implicitly for you by the execution of your program: every successive call to your recursive function places some data on the call stack and then moves on to the next node.
Recursive Length of List
```python
def get_length_recursive(node):
    if not node:
        return 0    # base case: a None path has length 0
    # The length at this node is 1 + length of the rest of the list
    return 1 + get_length_recursive(node.next)
```
1. Base case: return `0` when the node is `None`
2. Otherwise recurse on `node.next`, adding `1` to the result as each call on the stack returns
Strategy: Two Pointer
Two-Pointer Solution for Detecting Cycle
```python
def has_cycle(ll):
    if not ll or not ll.next:
        return False
    slow = ll
    fast = ll.next
    while fast and fast.next:
        if fast == slow or fast.next == slow:
            return True
        fast = fast.next.next   # because it's fast
        slow = slow.next
    return False
```
Similar to a race track, the faster pointer will eventually cross/lap the slower pointer; if there is no cycle, they will never cross paths.

A very useful technique for dealing with linked lists is iterating through the list with 2 or more pointers. The differences in how the pointers iterate can be used to make calculations on the list more efficient.
| | Time | Space |
|---|---|---|
| Using Hashmap | O(n) | O(n) |
| Two Pointer | O(n) | O(1) |
Explanation:
The first approach most people think of is to use another data structure to store nodes we have already seen as we traverse the list. As we move through the nodes, we check whether the current node is already in the auxiliary structure; if it is, we have found a cycle. The typical choice is a hash map (or hash set) because it offers constant-time insertion and lookup.
We can get rid of the extra auxiliary data structure by utilizing only one additional pointer. We can then use the two pointers to iterate through the list at two different speeds. The motivation being that if there is a cycle, then the list can be thought of as a circle (at least the part of the list past the self-intersection). Similar to a race track, the faster pointer must eventually cross paths with the slower pointer, whereas if there is not a cycle they will never cross paths.
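The auxiliary-structure version described above can be sketched with a Python `set` (hashing on node identity):

```python
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def has_cycle_with_set(head):
    # O(n) time, O(n) space: remember every node we have already visited
    seen = set()
    current = head
    while current:
        if current in seen:
            return True
        seen.add(current)
        current = current.next
    return False

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c
print(has_cycle_with_set(a))   # False
c.next = a                     # create a cycle: 1 -> 2 -> 3 -> 1
print(has_cycle_with_set(a))   # True
```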
Heaps
Min Heap
```
    1     <- smallest
   / \
  3   2
 / \ /
4  6 5
```
```python
import heapq

class MinHeap:
    def __init__(self, minheap):           # minheap is the list we want to convert to a heap
        heapq.heapify(minheap)             # heapify converts the list to a heap in place
        self.minheap = minheap

    def insert(self, key):
        heapq.heappush(self.minheap, key)  # heapq automatically maintains the heap property

    def getMin(self):
        return self.minheap[0]             # smallest element of the heap, O(1)

    def removeMin(self):
        heapq.heappop(self.minheap)        # heappop removes the smallest element

    def printHeap(self):
        print(self.minheap)
```
Heaps are special tree-based data structures that satisfy two properties:

1. Ordered in a specific way:

| Heap | Root | Children |
|---|---|---|
| Min | smallest element | parent <= each of its children |
| Max | largest element | parent >= each of its children |

2. Are complete binary trees:

| | |
|---|---|
| Binary | at most two children: left and right |
| Complete | fills each level entirely, except the last level, which must be filled from the left |
Heaps are useful for getting the largest or smallest elements, and in situations where you don't care about fast lookup, delete, or search.
Terminology
Max Heap
```
    6     <- largest
   / \
  5   3
 / \ /
4  2 1
```
```python
class MaxHeap:
    def __init__(self):
        self.heap = []                    # heap stored as a list

    def getParentPosition(self, i):
        return (i - 1) // 2               # parent is at floor((i - 1) / 2)

    def getLeftChildPosition(self, i):
        return 2 * i + 1                  # left child is at 2i + 1

    def getRightChildPosition(self, i):
        return 2 * i + 2                  # right child is at 2i + 2

    def hasParent(self, i):
        return i > 0                      # every node except the root has a parent

    def hasLeftChild(self, i):
        return self.getLeftChildPosition(i) < len(self.heap)

    def hasRightChild(self, i):
        return self.getRightChildPosition(i) < len(self.heap)

    def insert(self, key):
        self.heap.append(key)             # add the key at the end of the list
        self.heapify(len(self.heap) - 1)  # bubble it up to restore the heap property

    def getMax(self):
        return self.heap[0]               # largest value in the heap, O(1)

    def heapify(self, i):
        # Bubble up: swap with the parent while this node is larger
        while self.hasParent(i) and self.heap[i] > self.heap[self.getParentPosition(i)]:
            parent = self.getParentPosition(i)
            self.heap[i], self.heap[parent] = self.heap[parent], self.heap[i]
            i = parent

    def printHeap(self):
        print(self.heap)
```
| Term | Definition |
|---|---|
| Bubble down | moving an element down by swapping it with one of its children until it reaches its proper position |
| Bubble up | moving an element up by swapping it with its parent until it reaches its proper position |
| Priority Queue | a queue where each element has an associated priority; higher-priority elements are dequeued before lower-priority ones, and ties are broken by position in the underlying array |
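A minimal priority-queue sketch using `heapq` (a min heap, so a smaller number means higher priority here; the insertion counter breaks ties so equal priorities dequeue in arrival order — the task names are made up for illustration):

```python
import heapq

pq = []
counter = 0
for priority, task in [(2, "write tests"), (1, "fix bug"), (2, "review PR")]:
    # (priority, counter, task): counter preserves arrival order on equal priority
    heapq.heappush(pq, (priority, counter, task))
    counter += 1

order = [heapq.heappop(pq)[2] for _ in range(len(pq))]
print(order)   # ['fix bug', 'write tests', 'review PR']
```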
Heap Operations
Building
This data structure can be represented using an array. Heaps are often implemented as arrays because arrays are a very efficient way of representing priority queues. Binary heaps are efficient for implementing priority queues because it's very easy to know and retrieve/remove the element with the highest priority: it will always be the **root** node!

Building a heap only takes O(n) time, so you can potentially optimize a solution by building a heap from a list instead of running insertion n times to create the heap.
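Both construction routes produce a valid heap; `heapq.heapify` is the O(n) one:

```python
import heapq

data = [5, 3, 8, 1, 9, 2]

# O(n): build in place from an existing list
heap_a = data[:]
heapq.heapify(heap_a)

# O(n log n): n pushes into an initially empty heap
heap_b = []
for x in data:
    heapq.heappush(heap_b, x)

# Both satisfy the heap property and pop in the same sorted order
popped = [heapq.heappop(heap_a) for _ in range(len(heap_a))]
print(popped)   # [1, 2, 3, 5, 8, 9]
```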
By the property of it being a complete binary tree, we can see how the parent-child relationships are maintained in the array using these formulas (where `i` is the index of the current node):

| Relation | Index |
|---|---|
| Parent | (i - 1) // 2 |
| Left child | 2i + 1 |
| Right child | 2i + 2 |
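The formulas can be checked directly on the array form of the max heap example above:

```python
# Array representation of the max heap drawn above: [6, 5, 3, 4, 2, 1]
heap = [6, 5, 3, 4, 2, 1]

def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

# Children of the root (index 0) are at indices 1 and 2
print(heap[left(0)], heap[right(0)])   # 5 3
# The parent of index 4 (value 2) is index 1 (value 5)
print(parent(4), heap[parent(4)])      # 1 5
```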
Insertion
When growing a heap, we can only ever add a node at the left-most available position on the lowest possible level.

If necessary, to satisfy both rules of shape and order, we swap the two nodes that are out of order.
Removal
When deleting or removing an element, most heaps are usually concerned with removing the **root** node. In order to maintain the rules of shape and order:

1. For shape: remove the right-most node at the lowest level and make it the `root`
2. For order: compare it with its child nodes and swap if out of order
3. Continue bubbling down (step 2) until the heap order property is no longer violated
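The `MaxHeap` class above only implements insertion; a hypothetical `remove_max` over the list representation, following the removal steps above, might look like:

```python
def remove_max(heap):
    # Sketch: assumes `heap` is a valid max heap stored as a list
    if not heap:
        raise IndexError("empty heap")
    top = heap[0]
    heap[0] = heap[-1]    # move the right-most node at the lowest level to the root
    heap.pop()
    i, n = 0, len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:  # heap order restored
            break
        # Bubble down: swap with the larger child
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest
    return top

heap = [6, 5, 3, 4, 2, 1]
print(remove_max(heap))   # 6
print(heap[0])            # 5
```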
Time Complexity
In the worst-case scenario, the swapping procedure for insertions and deletions will move the element through the full height of the heap. Because heaps are binary trees that are guaranteed to be as complete as possible, the number of levels in the heap is log n.
| | Best | Worst |
|---|---|---|
| Reading root node | O(1) | O(1) |
| Insertion | O(log n) | O(log n) |
| Deletion | O(log n) | O(log n) |
| Build heap | O(n) (heapify a list) | O(n log n) (n insertions into an empty heap) |
Strategy: Max Heap with heapq
```python
from heapq import heapify, nsmallest
from typing import List

def topKFrequent(words: List[str], k: int) -> List[str]:
    """
    Time: O(n + k log n)
    Space: O(n)
    """
    histo = {}
    # O(n) - build a frequency histogram
    for word in words:
        histo[word] = histo.get(word, 0) + 1
    # O(n) - negate counts because heapq is a min heap
    topK = [(-count, word) for word, count in histo.items()]
    heapify(topK)  # O(n): heapify the entire list instead of pushing one by one
    # O(k log n) - on equal counts, (-count, word) tuples compare by word,
    # so ties come back in alphabetical order
    return [word for _, word in nsmallest(k, topK)]
```
- Use negative values so max value is at top of the 'min' heap
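The sign-flip trick in isolation (the fruit counts are made up for illustration):

```python
from heapq import heapify, nsmallest

counts = {"apple": 3, "banana": 3, "cherry": 1}
# Negate the counts so the min heap surfaces the most frequent words first;
# on equal counts, the tuples compare by word, giving alphabetical order.
pairs = [(-count, word) for word, count in counts.items()]
heapify(pairs)
top2 = [word for _, word in nsmallest(2, pairs)]
print(top2)   # ['apple', 'banana']
```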
References
- https://medium.com/basecs/learning-to-love-heaps-cef2b273a238
- https://www.geeksforgeeks.org/binary-heap/
- https://www.section.io/engineering-education/heap-data-structure-python/
More References
- CodePath CS guides: https://guides.codepath.org/compsci
- InterviewCake: https://www.interviewcake.com/data-structures-reference
- TSiege Tech Interview Cheat Sheet: https://github.com/TSiege/Tech-Interview-Cheat-Sheet