brokensandals.net -> Technical -> Programming challenges -> Two HackerRank hashtable problems

Posted on 2019-06-03.

The Sock Merchant and Ransom Note HackerRank problems primarily test whether you have basic knowledge of how to use hashtables. Below, I show how to approach them from a couple different programming paradigms, and some of the performance and memory usage tradeoffs you can choose from.

If you just want hints, without the full solution, see: June KC Code Challenge Hints.

My solutions are written in Kotlin. I only show the bodies of the sockMerchant and checkMagazine functions; for the rest of the program, I’m using the template that HackerRank provides.

Table of Contents:

Imperative vs Functional hashtable-based solutions
- For Sock Merchant
- For Ransom Note
Faster storage for Sock Merchant
Trading time efficiency for space efficiency

Imperative vs Functional hashtable-based solutions

For Sock Merchant

I think the most straightforward solution to Sock Merchant is:

Create an empty hashtable. The keys will represent colors, and the values will be the number of socks that have that color.
In a loop, look at each element in the input array (containing the color of each sock). Look up the entry for that color in the hasthable and increment its value. At the end of the loop, the hashtable will have a correct record of the total number of socks for each color.
In a loop, look at each entry in the hashtable. Divide each value (representing the number of socks of that color) by 2 and discard the remainder (if the remainder is 1, it means there’s an unpaired sock). Add that quotient to a variable that keeps track of the total number of pairs.

Here’s the code:

fun sockMerchant(n: Int, ar: Array<Int>): Int {
    val socksByColor = mutableMapOf<Int, Int>()

    for (sockColor in ar) {
        socksByColor[sockColor] = (socksByColor[sockColor] ?: 0) + 1
    }

    var pairs = 0
    for ((color, socks) in socksByColor) {
        pairs += socks / 2
    }

    return pairs
}

Both the time complexity and the space complexity of this solution are O(N).

The time complexity is O(N) because we have a loop that looks at each element of the input array (that is, N elements).¹ Inside the loop, we’re reading a value from a hashtable and writing a value to a hashtable, both of which are considered O(1) operations.²

The space complexity is O(N) because we create a hashtable that could potentially hold up to N elements. The amount of memory used by a hashtable should be roughly proportional to the number of elements.

The above solution uses an imperative programming style. One meaning of the word “imperative” is “command”. The code reads like a list of commands: do this, do that, store this value in this spot, etc.

An alternative is a functional programming style. Here’s code written in that style:

fun sockMerchant(n: Int, ar: Array<Int>): Int =
    ar.groupingBy { it }.eachCount().values.map { it / 2 }.sum()

Like the imperative solution, this should be O(N) in both time and space complexity. It’s fundamentally doing basically the same thing:

ar.groupingBy { it }.eachCount() looks at each element in the array and constructs a hashtable.
values.map { it / 2 }.sum() looks at each value in the hashtable, divides it by 2, and adds up all the quotients.³

In this case, the functional solution is more concise. It also avoids mutation - that is, it never changes a variable or modifies a data structure after that variable or data structure has been created. Minimizing mutation is a key characteristic of functional programming, and it can make code much easier to reason about, especially as programs get larger.

On the other hand, the functional solution requires you to understand a handful of special-purpose methods (groupingBy, eachCount, values, map, sum) which may make it more intimidating to someone less familiar with this style. These are common trade-offs with functional programming.

For Ransom Note

A straightforward approach to this problem is similar to the one I described for Sock Merchant.

Create an empty hashtable. The keys will be words from the magazine, and the values will be the number of times that word appears in the magazine.
In a loop, look at each word in the magazine. Look up the entry for that word in the hashtable and increment its value. At the end of the loop, the hashtable will have a correct record of the total number of occurrences of each word.
In a loop, look at each word in the note. Find that word in the hashtable. If it’s missing or the value is 0, there aren’t enough occurrences in the magazine to let us make the note, so we can print No and exit. Otherwise, decrement the value in the hashtable, to indicate that we’ve used up one occurrence of the word from the magazine.

Here’s the code:

fun checkMagazine(magazine: Array<String>, note: Array<String>): Unit {
    val availableWords = mutableMapOf<String, Int>()
    
    for (word in magazine) {
        availableWords[word] = (availableWords[word] ?: 0) + 1
    }
    
    for (word in note) {
        val remaining = availableWords.getOrDefault(word, 0)
        
        if (remaining == 0) {
            println("No")
            return
        }
        
        availableWords[word] = remaining - 1
    }
    
    println("Yes")
}

We’ll call the time complexity of this solution O(M + N), where M is the length in words of the magazine and N is the length in words of the note. Really, it’s more complicated than that, because doing comparisons on strings and calculating the hash codes of strings is not an O(1) operation, but rather depends on the length of the strings. Just for simplicity, I’m going to ignore that throughout this blog post and treat those as O(1) operations.

So, the function above is O(M + N) time complexity because it looks at each of the M elements of the magazine in a loop, and then looks at each of the N elements of the note in a loop. All the operations in both loops are considered O(1), given my simplifying assumption about string operations, and bearing in mind the usual caveats about the time complexity of hashtable access².

If we ignore the input arrays, the space complexity is O(M), since the only structure we create is a hashtable with M elements. If we take the input arrays into account, space complexity is O(M + N), since there’s also an N-length array in memory representing the note.

Just as we did with Sock Merchant, we can rewrite this solution in a functional style:

fun checkMagazine(magazine: Array<String>, note: Array<String>): Unit {
    val magazineCounts = magazine.groupingBy { it }.eachCount()
    val noteCounts = note.groupingBy { it }.eachCount()

    val enough = noteCounts.all { (word, count) -> count <= (magazineCounts[word] ?: 0) }
    println(if (enough) "Yes" else "No")
}

This deviates a bit more from the imperative solution, because it creates two hashtables instead of one. The first records the number of times each unique word in the magazine appears in the magazine, and the second records the number of times each unique word in the note appears in the note.

Then, it checks whether all values in the second hashtable are less than or equal to the corresponding values in the first hashtable. That tells us whether all the words in the note occur at least as many times in the magazine.

The time and space complexity of this solution are both O(M + N).

Faster storage for Sock Merchant

Hashtables are fast, but the techniques they must use to deal with collisions of hash codes prevent them from being as fast as directly accessing an element of an array by index. Since the keys of our hashtable - the colors - are integers, we could easily use an array instead of a hashtable. The downside of this is the memory usage: there must be a slot in the array for every possible color value, regardless of whether or not the color ever actually appears in our list of socks. But if we know that the range of colors won’t be excessively large, that may not be a big deal. In this case, the problem statement says that all color values fall between 1 and 100, so we’d only need a 100-element array, which is negligible.

So, we could do something like this:

fun sockMerchant(n: Int, ar: Array<Int>): Int {
    val socksByColor = IntArray(101)

    for (sockColor in ar) {
        socksByColor[sockColor] = socksByColor[sockColor] + 1
    }

    var pairs = 0
    for (socks in socksByColor) {
        pairs += socks / 2
    }

    return pairs
}

Notice that the second loop now always runs 101 iterations, rather than the actual number of colors that there are. We can avoid that by calculating the total number of pairs as we go along:

fun sockMerchant(n: Int, ar: Array<Int>): Int {
    val socksByColor = IntArray(101)
    var pairs = 0

    for (sockColor in ar) {
        socksByColor[sockColor] = socksByColor[sockColor] + 1
        if (socksByColor[sockColor] % 2 == 0) {
            pairs++
        }
    }

    return pairs
}

Now notice that each iteration of that loop doesn’t actually need to know the total number of socks for a given color. It just needs to make sure that it only increments pairs every other time (i.e., every second, fourth, sixth, … time) it encounters a given color. So rather than storing the number of socks for a color, we could just store whether or not we need to increment pairs next time we encounter that color. That means we only need to store 1 bit per color, rather than storing a full 32-bit integer.

Kotlin (like many languages) offers a structure for efficiently storing an array of bits:

fun sockMerchant(n: Int, ar: Array<Int>): Int {
    val unmatched = BitSet()
    var pairs = 0

    for (color in ar) {
        if (unmatched[color]) {
            pairs++
        }
        unmatched.flip(color)
    }

    return pairs
}

This should use less space than the other array-based solutions, while still having O(N) time complexity and presumably being faster than the hashtable-based solutions.

Trading time efficiency for space efficiency

If we’re willing to sacrifice performance, we can write solutions to both problems that have O(1) space complexity (excluding the space used by the input arrays).

Hashtables are often used to help us deal with data that’s coming in an unpredictable order. For example, in Sock Merchant, we might get a few socks of one color, then one of another, then another of the first color, and so on. We use the hashtable to enable the program to switch back and forth between dealing with different colors, and remember how many socks it had found of each.

If the data gave us all the socks for a color at once, we wouldn’t need that context-switching ability. As suggested on the HackerRank forums for this problem, we can achieve that by sorting the array at the start of our method, then simply looking for all the elements where the immediately following element is the same color, like this:

fun sockMerchant(n: Int, ar: Array<Int>): Int {
    ar.sort()
    var pairs = 0

    var i = 0
    while (i < n - 1) {
        if (ar[i] == ar[i + 1]) {
            pairs++
            i += 2
        } else {
            i += 1
        }
    }

    return pairs
}

There are sorting algorithms such as heap sort with O(N * log N) time complexity and O(1) space complexity. (That may or may not be the complexity of the default sorting algorithm invoked by ar.sort(), but for simplicity we’ll assume it is; we could always explicitly invoke or implement a different sorting algorithm instead if we wanted.)

The loop iterates less than N times and only performs O(1) operations, and doesn’t allocate anything in memory. So if the sort algorithm is O(N * log N) time and O(1) space, so is the function as a whole. O(N * log N) is worse than the O(N) of solutions described above, but not dramatically.

For the Ransom Note problem, we can sort both input arrays. Then, we only need to walk each array once, simultaneously, to figure out whether all the words in the note array occur enough times in the magazine array. The algorithm is:

Use a variable to hold the current index into the magazine array, starting at 0.
Loop over every word in the note array; for each one:
1. Loop over words in the magazine array until we find one that is lexicographically greater than or equal to the note word.
  - If the magazine word is greater, then all subsequent words in the magazine array will also be greater, so we know this word isn’t in there. We can print No and exit.
  - If the magazine word is equal, we can move on to the next word in both the note and magazine arrays.
If we make it to the end of the note array, then we’ve succeeded and can print Yes.

Sorting the note array takes O(N * log N) time, and sorting the magazine array takes O(M * log M) time, for a total of O(N * log N + M * log M) time complexity. The loops do at most O(N + M) operations (since they only look at each element of each sorted array once), so the full time complexity of the function is dominated by that of the sorting.

Note, however that there’s also an O(N * M) algorithm with O(1) space complexity. That’s this simple brute force approach:

For each word in the note array:
1. For each word in the magazine array:
  - If the note word and magazine word are equal, remove the word from the magazine array and move on to the next word in the note array.
2. If no match was found in the magazine array, print No and exit.
If we make it to the end of the note array, print Yes.

When the magazine array is drastically longer than the note array, N * M can be smaller than N * log N + M * log M, meaning this brute force approach would be faster.

We can write a solution that chooses between these two different O(1) space complexity approaches, to use whichever should be more efficient based on the sizes of the input arrays:

fun checkMagazine(magazine: Array<String>, note: Array<String>): Unit {
    val m = magazine.size.toDouble()
    val n = note.size.toDouble()

    if (n > m) {
        println("No")
        return
    }

    val logM = Math.log(m.toDouble())
    val logN = Math.log(n.toDouble())

    if (m * n < m * logM + n * logN) {
        // Use the O(N*M) approach
        for (word in note) {
            val index = magazine.indexOf(word)
            if (index == -1) {
                println("No")
                return
            }

            magazine[index] = ""
        }
    } else {
        // Use the O(N*log(N)+M*log(M)) approach
        magazine.sort()
        note.sort()

        var magazineIndex = 0
        for (noteIndex in 0 until note.size) {
            val noteWord = note[noteIndex]

            var cmp: Int
            do {
                if (note.size - noteIndex > magazine.size - magazineIndex) {
                    println("No")
                    return
                }

                cmp = magazine[magazineIndex].compareTo(noteWord)
                magazineIndex++
            } while (cmp < 0)

            if (cmp > 0) {
                println("No")
                return
            }
        }
    }

    println("Yes")
}

There’s also a loop that looks at each element of the hashtable. But the hashtable can’t possibly have more than N elements (and it will only have that many if every sock is a different color), so that loop is also O(N). So this function has two O(N) loops, making the function O(2N). Constant factors like the 2 are usually dropped from this notation - we’re generally concerned with the differences between complexities like O(N), O(N^2), O(2^N), etc., and for large values of N those differences will render the differences caused by constant factors insignificant by comparison.
Some writes to the hashtable may take longer than others because the hashtable implementation may do things like resize or rebalance buckets when a certain size threshold is crossed. But the costs of that, and the frequency with which it’s triggered, are such that the total time complexity of N write operations should still be O(N), meaning that each write is O(1) on average.

Reads and writes for some keys may also take longer than for other keys due to hash collisions. For large N, though, this shouldn't add up to be very significant, unless your input data is very skewed.
One difference with the imperative approach is that the .map call may create a temporary list containing the results of dividing each value of the hashtable by 2. (I’m not familiar enough with Kotlin’s implementation to know if that’s the case or not.) If so, that would mean this has higher memory usage than the imperative solution. But since the size of that array will be less than or equal to N, the space complexity of the algorithm is still O(N).