4 minute read

❄️ Back to Advent of Code 2023 solutions homepage ❄️

Today is the first day of the Advent of Code. I hope to find some time each day to make an attempt for the daily puzzle, and may return later to those puzzles that I find interesting, if I feel that there are further interesting things to explore. I have no particular goal in mind, other than to flex my problem solving muscles, and also get better at writing up short blog posts for this site.

Part 1

In the first part of this puzzle, we are given a collection of words containing lower case letters of the alphabet and digits. Each word is guaranteed to have at least one digit. For each word, we must find the first and last digits (which may be the same character), and form a two digit number from them; the final answer is the sum of these numbers.

Method 1 - use strip()

We can use the strip method of the native string class in Python to remove leading and trailing characters belonging to a certain set. Using the set containing all the lower case letters, this allows us to read the first and last characters of the remaining string. Note that the resulting string may only have one character, in which case the resulting number repeats the same digit:

sum_1 = 0
for word in input_1:
    stripped_word = word.strip('abcdefghijklmnopqrstuvwxyz')
    word_int = "".join([stripped_word[0], stripped_word[-1]])
    sum_1 += int(word_int)
sum_1
54561

Method 2 - loop over the word

One problem with the previous method is that it creates a copy of the original word once the leading and trailing letters have been stripped off - this isn’t ideal if the word is very long. Instead, we can create some helper functions which allow us to loop over the input string directly, either from the start or the end of the word:

def first_digit_in_word(word):
    for c in word:
        if c.isdigit():
            return c
        
def last_digit_in_word(word):
    for i in range(len(word)):
        c = word[-i-1]
        if c.isdigit():
            return c

This allows us to find the relevant digits, while referencing the original string and without creating any copies:

sum([
    int("".join([first_digit_in_word(word), 
                 last_digit_in_word(word)])) 
    for word in input_1
])
54561

And so the answer to Part 1 is: 54561.

Part 2

In the second part of this puzzle, we need to not only look for digits, but also for spelled out numbers (e.g. ‘one’, ‘two’, ‘three’ etc.) at the start and end of the words. To help with this, let’s define a dict mapping each number string to the corresponding integer:

NUMBERS = {'one': 1,
           'two': 2,
           'three': 3,
           'four': 4,
           'five': 5,
           'six': 6,
           'seven': 7,
           'eight': 8,
           'nine': 9}

The strategy here is to work our way in from each end of the word, checking at each step:

  • If the current character is a digit - in which case we are done
  • If the substring leading up to this point ends in one of the spelled out numbers

Modifying the previous functions is pretty straightforward:

def first_number_in_word(word):
    for i in range(len(word)):
        # If next character is a digit, return it
        if word[i].isdigit():
            return int(word[i])
        else:
            # Check for number words ending here
            # All numbers have 3, 4, or 5 characters
            for j in [5, 4, 3]:
                # Find subword ending with this character
                start = max(0, i-j+1)
                subword = word[start:i+1]
                
                if subword in NUMBERS:
                    return NUMBERS[subword]
                
    raise(ValueError('No number found in word!'))
def last_number_in_word(word):
    for i in range(len(word)):
        pos = -i-1
        # If next character is a digit, return it
        if word[pos].isdigit():
            return int(word[pos])
        else:
            # Check for number words starting here
            # All numbers have 3, 4, or 5 characters
            for j in [5, 4, 3]:
                # Find subword starting with this character
                end = pos+j if pos+j < 0 else None
                subword = word[pos:end]
                
                if subword in NUMBERS:
                    return NUMBERS[subword]
                
    raise(ValueError('No number found in word!'))

Testing these functions on the first few words from the provided dataset, we can see that they give the desired results:

for word in input_1[:10]:
    print(f'{first_number_in_word(word)}, {last_number_in_word(word)} --- {word}')
6, 7 --- sixsrvldfour4seven
5, 8 --- 53hvhgchljnlxqjsgrhxgf1zfoureightmlhvvv
5, 2 --- fives2dznl
2, 3 --- twocrqvjsix5threethree
2, 9 --- gtjtwonefourone6fouronefccmnpbpeightnine
7, 1 --- seventdtrcseveneightsevencgjgjxfpmfsix8twones
4, 3 --- fourthreeseven1grvhrjxklh3ninetwothree
4, 8 --- fourninethrnnth8
2, 5 --- two2hnxcfivejrdjxtb
8, 5 --- bssbrgcx86vsmqstrxsjbjeightqzhbzxqg5

All that remains to do is to form the two-digit numbers for each of the input words and sum:

sum([
    10*first_number_in_word(word) + last_number_in_word(word) 
    for word in input_1
])
54076

And so the answer to Part 2 is: 54076.

There is a further optimization that could be done on these functions - in particular, at each step, we can keep track of a set of candidate numbers which are currently still possible, and ignore the others. For example, if the word begins in the letters xxxxx, we know that they can’t be part of any number string; therefore, at each of these steps, we won’t need to look back at all. Furthermore, if the next character is a t, we know that these can only be part of the numbers two or three, and so there is no need to check membership of the full number set.

However, as the total time taken to do all 1000 words is just over 15ms, it seems a bit pointless at this stage. If it becomes clear that these functions will be needed later, and become the bottleneck for a time-consuming process, I can look at implementing these improvements.

%time sum([10*first_number_in_word(word) + last_number_in_word(word) for word in input_1])
CPU times: total: 0 ns
Wall time: 15.5 ms





54076

Day 1 complete - a gentle introduction, with the difficulty set to increase from here!

Unit tests

test_words = [
    'two1nine',
    'eightwothree',
    'abcone2threexyz',
    'xtwone3four',
    '4nineeightseven2',
    'zoneight234',
    '7pqrstsixteen',
]

test_answers = [29, 83, 13, 24, 42, 14, 76]
for word, answer in zip(test_words, test_answers):
    assert 10*first_number_in_word(word) + last_number_in_word(word) == answer, word

Tags:

Updated:

Comments