Data Structures & Algorithms
LinkedLists
Definition of a Node
```python
class Node:
    def __init__(self, val=0, next=None, previous=None):
        self.val = val
        self.next = next
        self.previous = previous  # only used if doubly linked
```
- A node holds two (or three) things:
  - the data
  - a pointer to the next node
  - a pointer to the previous node (if doubly linked)
Definition of Singly LinkedList
```python
class LinkedList:
    def __init__(self, items=None):
        self.head = None
        self.tail = None
        if items:
            for item in items:
                self.append(item)

    def prepend(self, val):
        node = Node(val)
        if not self.head:        # empty list -> node becomes both head & tail
            self.tail = node
        else:
            node.next = self.head
        self.head = node

    def append(self, val):
        node = Node(val)
        if not self.tail:        # empty list -> node becomes both head & tail
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    # ...
```
- A LinkedList is initialized with `head` and `tail` pointers (both `None` when empty)
- If initialized from an array, the list is built with repeated calls to `append()`
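A quick usage sketch of the constructor-from-array behavior (compact re-declaration of the classes above so the snippet runs standalone; the `to_list` helper is not part of the notes, just for inspection):

```python
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

class LinkedList:
    def __init__(self, items=None):
        self.head = None
        self.tail = None
        if items:
            for item in items:
                self.append(item)

    def append(self, val):
        node = Node(val)
        if not self.tail:        # empty list -> node becomes both head & tail
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def to_list(self):
        # Helper for inspection: walk the list, collecting values
        out, current = [], self.head
        while current:
            out.append(current.val)
            current = current.next
        return out

ll = LinkedList([1, 2, 3])
print(ll.to_list())              # [1, 2, 3]
print(ll.head.val, ll.tail.val)  # 1 3
```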
The most common variants of linked lists are:

| | next | previous | if Circular |
|---|---|---|---|
| Singly Linked | ☑️ | | tail.next -> head |
| Doubly Linked | ☑️ | ☑️ | tail.next -> head, head.previous -> tail |
Terminology
| Term | Definition |
|---|---|
| node | position in a LinkedList containing the stored value and at least one reference to another node (`next`) |
| head | node at the beginning of the list |
| tail | node at the end of the list |
| sentinel | a dummy node, typically placed at the head or end of the list, to simplify operations (e.g., delete) or to mark the end of the list |
Time & Space Complexity
| | Best | Worst |
|---|---|---|
| Occurs when node is at | head | tail |
| Accessing / Search | O(1) | O(N) |
| Insertion | O(1) | O(N) |
| Deletion | O(1) | O(N) |
Strategy: Dummy Head
Delete
```python
def delete(self, val):
    dummy = Node("sentinel")
    dummy.next = self            # `self` here is the head node
    placeholder = dummy
    current = self
    while current:
        if current.val == val:
            placeholder.next = current.next
            return dummy.next    # new head (changed only if head was deleted)
        placeholder = current
        current = current.next
    return dummy.next
```
1. Create a `dummy` node
    - `dummy.next` points to `head`
2. Set `placeholder` to `dummy` and `current` to `head`
3. Iterate: if `current.val == val`, set `placeholder.next` to `current.next` and return `dummy.next`
4. Otherwise advance: set `placeholder` to `current` and `current` to `current.next`
5. Return `dummy.next` (the possibly new head)
This typically saves you from writing special edge-case logic to operate on the head of a linked list.
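To see the payoff, here is the same idea as a standalone function over a head node (a sketch; the method above mixes the Node and LinkedList roles): deleting the head takes the same code path as deleting any other node.

```python
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def delete(head, val):
    # Dummy node lets us unlink the head with the same logic as any other node
    dummy = Node("sentinel")
    dummy.next = head
    placeholder, current = dummy, head
    while current:
        if current.val == val:
            placeholder.next = current.next
            return dummy.next
        placeholder = current
        current = current.next
    return dummy.next

head = Node(1, Node(2, Node(3)))
head = delete(head, 1)   # deleting the head: no special-case branch needed
vals = []
node = head
while node:
    vals.append(node.val)
    node = node.next
print(vals)              # [2, 3]
```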
Strategy: Multi Pass
Get Length of List
```python
def __len__(self):
    current = self       # `self` here is the head node
    length = 0
    while current:
        length += 1
        current = current.next
    return length
```
1. Iterate through list
2. Increment `length`
3. Set `current` to `current.next`
Most computations on a list will require O(N) time complexity, so a simple but very useful technique is to pass through the list a constant number of times to calculate some summary of the list that will simplify your algorithm.
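For example, finding the middle node can be done in two passes: one to compute the length, one to walk halfway (a standalone sketch with its own minimal `Node`):

```python
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def middle(head):
    # Pass 1: count the nodes
    length, current = 0, head
    while current:
        length += 1
        current = current.next
    # Pass 2: walk to index length // 2
    current = head
    for _ in range(length // 2):
        current = current.next
    return current

head = Node(1, Node(2, Node(3, Node(4, Node(5)))))
print(middle(head).val)   # 3
```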
Note on Recursion vs. Iteration:
The looping is done implicitly for you by the execution of your program: every successive call to your recursive function places some data on the call stack and then moves on to the next node.
Recursive Length of List
```python
def get_length_recursive(node):
    if not node:
        return 0    # base case: a None path has length 0
    # The length at this node is 1 + length of the rest of the list
    return 1 + get_length_recursive(node.next)
```
1. Base case: return `0` when the node is `None`
2. Otherwise recurse on `node.next`, adding `1` to the result as each call on the stack returns
Strategy: Two Pointer
Two-Pointer Solution for Detecting Cycle
```python
def has_cycle(ll):
    if not ll or not ll.next:
        return False
    slow = ll
    fast = ll.next
    while fast and fast.next:
        if fast == slow or fast.next == slow:
            return True
        fast = fast.next.next   # because it's fast
        slow = slow.next
    return False
```
Similar to a race track, the faster pointer will eventually cross/lap the slower pointer; if there is no cycle, they will never cross paths.

A very useful technique for dealing with linked lists is iterating through the list with 2 or more pointers. The differences in how the pointers iterate can be used to make calculations on the list more efficient.
| | Time | Space |
|---|---|---|
| Using Hashmap | O(n) | O(n) |
| Two Pointer | O(n) | O(1) |
Explanation:
The first approach most people think of is to use another data structure to store nodes we have already seen as we traverse the list. As we move through the nodes, we check whether the current node is already in the auxiliary structure; if it is, we have found a cycle. The typical choice is a hash map (or hash set) because it offers constant-time insertion and lookup.
We can get rid of the extra auxiliary data structure by utilizing only one additional pointer. We can then use the two pointers to iterate through the list at two different speeds. The motivation being that if there is a cycle, then the list can be thought of as a circle (at least the part of the list past the self-intersection). Similar to a race track, the faster pointer must eventually cross paths with the slower pointer, whereas if there is not a cycle they will never cross paths.
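The auxiliary-structure version described above can be sketched with a Python `set` (hashing on node identity):

```python
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def has_cycle_with_set(head):
    # O(n) time, O(n) space: remember every node we have already visited
    seen = set()
    current = head
    while current:
        if current in seen:
            return True
        seen.add(current)
        current = current.next
    return False

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c
print(has_cycle_with_set(a))   # False
c.next = a                     # create a cycle: 1 -> 2 -> 3 -> 1
print(has_cycle_with_set(a))   # True
```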
Heaps
Min Heap
```
    1     <- smallest
   / \
  3   2
 / \ /
4  6 5
```
```python
import heapq

class MinHeap:
    def __init__(self, minheap):           # minheap is the list we want to convert to a heap
        heapq.heapify(minheap)             # heapify converts the list to a heap in place
        self.minheap = minheap

    def insert(self, key):
        heapq.heappush(self.minheap, key)  # heapq automatically maintains the heap property

    def getMin(self):
        return self.minheap[0]             # smallest element of the heap, O(1)

    def removeMin(self):
        heapq.heappop(self.minheap)        # heappop removes the smallest element

    def printHeap(self):
        print(self.minheap)
```
Heaps are special tree-based data structures that satisfy two properties:

1. Ordered in a specific way:

| Heap | Root | Children |
|---|---|---|
| Min | smallest element | parent <= each of its children |
| Max | largest element | parent >= each of its children |

2. Are complete binary trees:

| | |
|---|---|
| Binary | at most two children: left and right |
| Complete | fills each level entirely, except the last level, which must be filled from the left |
Heaps are useful for getting the largest or smallest elements, and in situations where you don't care about fast lookup, delete, or search.
Terminology
Max Heap
```
    6     <- largest
   / \
  5   3
 / \ /
4  2 1
```
```python
class MaxHeap:
    def __init__(self):
        self.heap = []                    # heap stored as a list

    def getParentPosition(self, i):
        return (i - 1) // 2               # parent is at floor((i - 1) / 2)

    def getLeftChildPosition(self, i):
        return 2 * i + 1                  # left child is at 2i + 1

    def getRightChildPosition(self, i):
        return 2 * i + 2                  # right child is at 2i + 2

    def hasParent(self, i):
        return i > 0                      # every node except the root has a parent

    def hasLeftChild(self, i):
        return self.getLeftChildPosition(i) < len(self.heap)

    def hasRightChild(self, i):
        return self.getRightChildPosition(i) < len(self.heap)

    def insert(self, key):
        self.heap.append(key)             # add the key at the end of the list
        self.heapify(len(self.heap) - 1)  # bubble it up to restore the heap property

    def getMax(self):
        return self.heap[0]               # largest value in the heap, O(1)

    def heapify(self, i):
        # Bubble up: swap with the parent while this node is larger
        while self.hasParent(i) and self.heap[i] > self.heap[self.getParentPosition(i)]:
            parent = self.getParentPosition(i)
            self.heap[i], self.heap[parent] = self.heap[parent], self.heap[i]
            i = parent

    def printHeap(self):
        print(self.heap)
```
| Term | Definition |
|---|---|
| Bubble down | moving an element down by swapping it with one of its children until it reaches its proper position |
| Bubble up | moving an element up by swapping it with its parent until it reaches its proper position |
| Priority Queue | a queue where each element has an associated priority; higher-priority elements are dequeued before lower-priority ones, and ties are broken by position in the underlying array |
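A minimal priority-queue sketch using `heapq` (a min heap, so a smaller number means higher priority here; the insertion counter breaks ties so equal priorities dequeue in arrival order — the task names are made up for illustration):

```python
import heapq

pq = []
counter = 0
for priority, task in [(2, "write tests"), (1, "fix bug"), (2, "review PR")]:
    # (priority, counter, task): counter preserves arrival order on equal priority
    heapq.heappush(pq, (priority, counter, task))
    counter += 1

order = [heapq.heappop(pq)[2] for _ in range(len(pq))]
print(order)   # ['fix bug', 'write tests', 'review PR']
```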
Heap Operations
Building
This data structure can be represented using an array. Heaps are often implemented as arrays because arrays are a very efficient way of representing priority queues. Binary heaps are efficient for implementing priority queues because it's very easy to know and retrieve/remove the element with the highest priority: it will always be the **root** node!

Building a heap only takes O(n) time, so you can potentially optimize a solution by building a heap from a list instead of running insertion n times to create the heap.
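Both construction routes produce a valid heap; `heapq.heapify` is the O(n) one:

```python
import heapq

data = [5, 3, 8, 1, 9, 2]

# O(n): build in place from an existing list
heap_a = data[:]
heapq.heapify(heap_a)

# O(n log n): n pushes into an initially empty heap
heap_b = []
for x in data:
    heapq.heappush(heap_b, x)

# Both satisfy the heap property and pop in the same sorted order
popped = [heapq.heappop(heap_a) for _ in range(len(heap_a))]
print(popped)   # [1, 2, 3, 5, 8, 9]
```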
By the property of it being a complete binary tree, we can see how the parent-child relationships are maintained in the array using these formulas (where `i` is the index of the current node):

| Relation | Index |
|---|---|
| Parent | (i - 1) // 2 |
| Left child | 2i + 1 |
| Right child | 2i + 2 |
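The formulas can be checked directly on the array form of the max heap example above:

```python
# Array representation of the max heap drawn above: [6, 5, 3, 4, 2, 1]
heap = [6, 5, 3, 4, 2, 1]

def parent(i):
    return (i - 1) // 2

def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

# Children of the root (index 0) are at indices 1 and 2
print(heap[left(0)], heap[right(0)])   # 5 3
# The parent of index 4 (value 2) is index 1 (value 5)
print(parent(4), heap[parent(4)])      # 1 5
```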
Insertion
When growing a heap, we can only ever add a node at the left-most available position on the lowest possible level.

If necessary, to satisfy both rules of shape and order, we swap the two nodes that are out of order.
Removal
When deleting or removing an element, most heaps are usually concerned with removing the **root** node. In order to maintain the rules of shape and order:

1. For shape: remove the right-most node at the lowest level and make it the `root`
2. For order: compare it with its child nodes and swap if out of order
3. Continue bubbling down (step 2) until the heap order property is no longer violated
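The `MaxHeap` class above only implements insertion; a hypothetical `remove_max` over the list representation, following the removal steps above, might look like:

```python
def remove_max(heap):
    # Sketch: assumes `heap` is a valid max heap stored as a list
    if not heap:
        raise IndexError("empty heap")
    top = heap[0]
    heap[0] = heap[-1]    # move the right-most node at the lowest level to the root
    heap.pop()
    i, n = 0, len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:  # heap order restored
            break
        # Bubble down: swap with the larger child
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest
    return top

heap = [6, 5, 3, 4, 2, 1]
print(remove_max(heap))   # 6
print(heap[0])            # 5
```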
Time Complexity
In the worst-case scenario, the swapping procedure for insertions and deletions will move the element through the full height of the heap. Because heaps are binary trees that are guaranteed to be as complete as possible, the number of levels in the heap is log n.
| | Best | Worst |
|---|---|---|
| Reading root node | O(1) | O(1) |
| Insertion | O(log n) | O(log n) |
| Deletion | O(log n) | O(log n) |
| Build heap | O(n) (heapify a list) | O(n log n) (n insertions into an empty heap) |
Strategy: Max Heap with heapq
```python
from heapq import heapify, nsmallest
from typing import List

def topKFrequent(words: List[str], k: int) -> List[str]:
    """
    Time: O(n + k log n)
    Space: O(n)
    """
    histo = {}
    # O(n) - build a frequency histogram
    for word in words:
        histo[word] = histo.get(word, 0) + 1
    # O(n) - negate counts because heapq is a min heap
    topK = [(-count, word) for word, count in histo.items()]
    heapify(topK)  # O(n): heapify the entire list instead of pushing one by one
    # O(k log n) - on equal counts, (-count, word) tuples compare by word,
    # so ties come back in alphabetical order
    return [word for _, word in nsmallest(k, topK)]
```
- Use negative values so max value is at top of the 'min' heap
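The sign-flip trick in isolation (the fruit counts are made up for illustration):

```python
from heapq import heapify, nsmallest

counts = {"apple": 3, "banana": 3, "cherry": 1}
# Negate the counts so the min heap surfaces the most frequent words first;
# on equal counts, the tuples compare by word, giving alphabetical order.
pairs = [(-count, word) for word, count in counts.items()]
heapify(pairs)
top2 = [word for _, word in nsmallest(2, pairs)]
print(top2)   # ['apple', 'banana']
```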
References
- https://medium.com/basecs/learning-to-love-heaps-cef2b273a238
- https://www.geeksforgeeks.org/binary-heap/
- https://www.section.io/engineering-education/heap-data-structure-python/
More References
- CodePath CS guides: https://guides.codepath.org/compsci
- InterviewCake: https://www.interviewcake.com/data-structures-reference
- TSiege Tech Interview Cheat Sheet: https://github.com/TSiege/Tech-Interview-Cheat-Sheet