Category: Programming

  • Migrating from Octopress to Jekyll

    Back in 2014 I abandoned WordPress for Octopress. It's been especially amazing for page load speeds, and I also enjoyed the fact that GitHub Pages are completely free - and I only need to pay for a domain name. Hosting a website can get expensive.

    Octopress was a short-lived framework built on top of Jekyll, focused on blogging and designed to run on GitHub Pages. Unfortunately, development stopped in 2015, and now, 10 years later, I couldn't set it up on a new machine because most of its dependencies are dangerously out of date.

    I chose to migrate to vanilla Jekyll, since it's a static site generator built on top of simple Markdown and HTML files. Jekyll's been around for some time, and I'm hoping Microsoft won't be shutting down GitHub Pages any time soon.

    Saying goodbye to Octopress (granted, it looks almost the same).

    The whole process only took a couple of hours, and I'd like to document some highlights and lowlights. You might find it useful if you're setting up a new Jekyll blog, or, like me, still have an Octopress blog that needs migrating.

    Fresh setup

    I went with a fresh Jekyll setup by installing Jekyll and running jekyll new blog. I successfully copied over the old _posts and images, and ported the relevant parts of _config.yml from Octopress to vanilla Jekyll.

    Octopress uses Liquid {% img %} tags, which aren't natively supported in Jekyll. I took the opportunity to convert those to Markdown-style syntax. I only have a few hundred posts, and I used a Vim macro to convert all {% img /foo/bar.png baz %} to ![baz](/foo/bar.png).
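
    If recording a Vim macro isn't your thing, the same conversion is a one-off regex pass. Here's a minimal Python sketch of the idea (the regex, file glob, and _posts path are just for illustration; it assumes the simple two-argument {% img %} form shown above):

    import glob
    import re

    # {% img /foo/bar.png baz %}  ->  ![baz](/foo/bar.png)
    IMG_TAG = re.compile(r'{%\s*img\s+(\S+)\s+(.*?)\s*%}')

    for path in glob.glob('_posts/*.markdown'):
        with open(path) as f:
            converted = IMG_TAG.sub(r'![\2](\1)', f.read())
        with open(path, 'w') as f:
            f.write(converted)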

    By default Jekyll ships with the minima theme, which I found mostly sufficient for my needs. I was able to override specific theme files by copying them from the gem installation location into my blog directory and modifying them - straightforward and customizable. For example, I recreated the way Octopress pagination looks by modifying _layouts/home.html.

    For backward compatibility, I also had to move the RSS feed to /atom.xml by modifying _config.yml:

    feed:
      path: /atom.xml
    

    I could immediately run the site locally with bundle exec jekyll serve --baseurl="".

    Missing functionality

    Two major things were missing straight out of the box: archive and category pages.

    I grew attached to my archive page, and recreating it only took a couple of minutes. All I had to do was add an archive.markdown page to the site's root directory:

    ---
    layout: page
    title: Archive
    navbar: Archive
    permalink: /blog/archive/
    ---
      
    {%- assign date_format = site.minima.date_format | default: "%b %-d, %Y" -%}
    
    <div>
      <ul>
        {% for post in site.posts %}
          {% capture this_year %}{{ post.date | date: "%Y" }}{% endcapture %}
          {% unless year == this_year %}
            {% assign year = this_year %}
            <h2 style="margin-top: 1em;">{{ year }}</h2>
          {% endunless %}
          <li>
            <a href="{{ root_url }}{{ post.url }}" itemprop="url">{{ post.title }}</a>
            <span class="text-muted">| 📅 {{ post.date | date: date_format }}</span>
          </li>
        {% endfor %}
      </ul>
    </div>
    

    Building category support turned out to be messier and more complicated. I didn't want to write a custom solution, and ended up with some technical debt I'll probably have to address in the future (wink-wink, this will never happen).

    I used the jekyll-category-pages gem, which worked okay-ish. The instructions on field-theory/jekyll-category-pages are extensive and aren't too difficult to follow - I appreciated not having to write my own category pages, but I had to:

    1. Stop category pages from being automatically added to the navigation bar.
    2. Disable pagination on category pages, because for some reason it really didnā€™t work with jekyll-category-pages.

    I also added my own basic category index page by creating categories.markdown:

    ---
    layout: page
    title: Categories
    navbar: Categories
    permalink: /blog/categories/
    ---
    
    {% assign category_names = "" | split: "" %}
    {% for category in site.categories %}
      {% assign category_names = category_names | push: category[0] %}
    {% endfor %}
    {% assign category_names = category_names | sort %}
      
    <div>
      <ul>
        {% for category in category_names %}
          <li>
            <a href="{{ root_url }}/{{ site.category_path }}/{{ category | slugify }}">{{ category }}</a>
          </li>
        {% endfor %}
      </ul>
    </div>
    

    GitHub Pages

    While the GitHub Pages documentation is extensive, getting Jekyll to work with GitHub Pages took longer than I'd like to admit. Specifically, the Gemfile generated by running jekyll new blog misleadingly tells you to comment out the latest version of the jekyll gem and use the github-pages gem instead:

    # Happy Jekylling!
    gem "jekyll", "~> 4.4.1"
    # If you want to use GitHub Pages, remove the "gem "jekyll"" above and
    # uncomment the line below. To upgrade, run `bundle update github-pages`.
    # gem "github-pages", group: :jekyll_plugins
    

    You don't want to do that, oh no. The default GitHub Pages gem is stuck in the past on Jekyll 3 (and at the time of writing we're on version 4), which caused all kinds of hidden problems - including the fact that my URL slugs silently weren't getting generated right. I switched back to the jekyll gem and set up a custom GitHub Actions workflow to deploy the site:

    name: Deploy Jekyll site to Pages
    
    on:
      push:
        branches: ["master"]
    
      # Allows you to run this workflow manually from the Actions tab
      workflow_dispatch:
    
    permissions:
      contents: read
      pages: write
      id-token: write
    
    concurrency:
      group: "pages"
      cancel-in-progress: false
    
    jobs:
      # Build job
      build:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout
            uses: actions/checkout@v4
          - name: Setup Ruby
            # https://github.com/ruby/setup-ruby/releases/tag/v1.207.0
            uses: ruby/setup-ruby@4a9ddd6f338a97768b8006bf671dfbad383215f4
            with:
              ruby-version: '3.1' # Not needed with a .ruby-version file
              bundler-cache: true # runs 'bundle install' and caches installed gems automatically
              cache-version: 0 # Increment this number if you need to re-download cached gems
          - name: Setup Pages
            id: pages
            uses: actions/configure-pages@v5
          - name: Build with Jekyll
            run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}"
            env:
              JEKYLL_ENV: production
          - name: Upload artifact
            uses: actions/upload-pages-artifact@v3
    
      # Deployment job
      deploy:
        environment:
          name: github-pages
          url: ${{ steps.deployment.outputs.page_url }}
        runs-on: ubuntu-latest
        needs: build
        steps:
          - name: Deploy to GitHub Pages
            id: deployment
            uses: actions/deploy-pages@v4
    

    Don't forget to set the "Build and deployment" source to "GitHub Actions" in the repository settings to actually use the action.

    My Octopress blog was set up in a source Git branch, and content was generated into the master branch. I wanted to change that to have the source in the master branch (the action above won't work without that), and I was able to replace my master with source with the following set of commands:

        git checkout master
        git pull
        git checkout source
        git merge -s ours master --allow-unrelated-histories
        git checkout master
        git merge source
    

    We merge the master branch into source using the ours merge strategy (effectively ignoring the master branch history), and then merge that back into master.

    Positive experience

    All in all, migrating to Jekyll has been a great experience, which is a testament to the Jekyll community's dedication to thorough documentation. Knowing that Jekyll is a mature, maintained, and documented project, and that GitHub Pages infrastructure is reliable and supported, provides a sense of stability. I hope this results in Jekyll and GitHub Pages becoming a (reasonably) future-proof platform for my blog. But let's check back in in 10 years - see you in 2035?

  • Static websites rule!

    I hope you've noticed that navigating to this page was quick (let's hope that the Internet Gods are kind to me, and nothing stood in the way of you accessing this page). In fact, most pages on my blog - hopefully including this one - should render in under a second. I didn't put any work into optimizing this site, and it's not a boast, nor is it a technological marvel - this is just a good old-fashioned static website.

    If this is new to you - a static website is just what it sounds like: static HTML and CSS files, sometimes with some light JavaScript sprinkled throughout. There's no server-side processing - the only bottlenecks are the host server speed, the recipient's connection speed, and the browser rendering speed. The page is stored as is, and is sent over as soon as it's requested. This is how the Internet used to be in the late '90s and early 2000s (with eclectic web design to boot, of course).

    I think static websites are cool and aren't used nearly enough, especially for websites that are, well, static. Think back to the last website you visited to read something - maybe a news site, or maybe a blog. Did it take at least a couple of seconds to load? Likely. Did its server have to waste unnecessary cycles putting together a page for you? Most definitely. Now, contrast this with your experience with a static website like this one. Here's the result from pagespeed.web.dev for this page:

    A screenshot with page speed test for www.rosipov.com (this website). It displays: first contentful paint: 0.8 s; largest contentful paint: 0.9 s; total blocking time: 0 ms; cumulative layout shift: 0.007; speed index: 0.7 s.

    Everything renders in under a second, and I didn't have to put any work into optimizing my website.

    This site is built on the (now unsupported) Octopress, which is itself built on top of Jekyll. You write pages in Markdown, generate web pages using a pre-made template, and deploy the resulting pages to a hosting provider. In fact, GitHub Pages lets you host your static website for free, and you can have a third-party platform like Disqus provide comment support.

    Static websites work great for portfolios, blogs, and websites that don't rely on extensive content manipulation. They're more secure (no backend to hack), simple to build and maintain, very fast even without optimization, and natively SEO friendly (search engines are great at understanding static pages). Static websites are cheap to run - I only pay for a domain name for this site (under $20 a year).

    If you have a blog or a portfolio and you're using an overly complicated content management system to write - consider slimming down. Jekyll (or any of its many alternatives) offers a number of pre-made off-ramps for major CMS users, is easy to set up, and is straightforward to work with. Can't recommend it enough - static websites rule!

  • Making a packing spreadsheet

    Being the unique person I am, I love traveling. Oftentimes I end up getting deep into trying to optimize my packing methods. There are dozens of tools online designed to help with this exact thing (services like Lighterpack or GearGrams). But, being handy with code, I decided to dabble in the subject on my own.

    One of the most important things in packing is the overall weight of the pack, and I always want to know what type of things are the heaviest. I also want to be able to run random queries on my data, whatever it is that I'm trying to learn. I want to have an inventory of items (backpacks, clothes, storage solutions) which I can plug in and out of a spreadsheet to see how the resulting picture changes on the go. Working with software as my day job, I'd also like for the solution to be automated whenever possible.

    Google Spreadsheets turned out to be the perfect solution, providing the ability to quickly sum up the weight of my things and output insights about the data.

    Final Result

    Here's a link to the spreadsheet; I encourage you to copy it and play around with it in any way you see fit.

    Here's the final result for a multi-day trip I'm taking this year. As you can see, my pack weighs around 3 kilograms, or a bit over 6 freedom units (not including water). My recently purchased Nintendo Switch is the heaviest item (and it's worth every gram, as it makes flying tolerable), but clothes take up most of the weight:

    A screenshot of a packing spreadsheet.

    I use indentation levels to show that some items are contained within other items. This also lets me calculate the absolute and relative weights of a whole container with everything inside of it (see the fields labeled "Container" and "Percentage").

    The "Weight" and "Breakdown" fields indicate absolute and relative item weight respectively, accounting for the quantity of the item (quantity defaults to 1 if not explicitly set). Weight-related fields are color-coded from the lightest items in green to the heaviest items in red.

    Categories are used to group items, execute queries on the data, and learn insights like these:

    A chart representing the weight of each category of items relative to a weight of the whole pack.

    There's a separate sheet where I enter item names, categories, and weights, which I then use to automatically populate values above and autofill item names in the primary sheet:

    A screenshot of an inventory tab of the packing spreadsheet.

    The Technical Details

    I haven't worked with Google Spreadsheets (or Excel, for that matter) a lot in the past, but with access to clear documentation (and a hundred web searches later) it was straightforward to get the hang of it.

    I started off by manually populating the final result sheet, adjusting formulas for Container/Percentage cells by hand, as I had no idea how I would detect the indentation yet. I like it when things look pretty, so I applied conditional formatting rules right away, and the look of the sheet really hasn't changed throughout the process.

    Next, I created an inventory sheet, which I wanted to use as a source of data in the resulting sheet. A few Google searches and some trial & error resulted in a lookup formula:

    =ArrayFormula(
     IF(
       NOT(ISBLANK($B2)),
       INDEX(InventoryCategories, MATCH($B2, InventoryItems, 0)),
       IF(
         NOT(ISBLANK($C2)),
         INDEX(InventoryCategories, MATCH($C2, InventoryItems, 0)),
         INDEX(InventoryCategories, MATCH($D2, InventoryItems, 0))
       )
      )
    )
    

    ArrayFormula is necessary in order to create arrays on the fly without printing intermediate results in the spreadsheet. InventoryItems and InventoryCategories are named ranges referring to item names and categories in the inventory sheet. MATCH finds the index of the first occurrence of the item name in the sheet, and INDEX retrieves the corresponding category name. The item weight is retrieved on the exact same principle.

    Trying to find the container weight took a lot more time, and resulted in a lot more headache. The solution for handling indentation in Google Spreadsheets wasn't as elegant as I had hoped for, but it got the job done:

    =ArrayFormula(
         SUM(
           $I2:INDIRECT(
             CONCATENATE(
               "$I",
               ROW() + IF(
                 NOT(ISBLANK($B2)),
                 MATCH(FALSE, ISBLANK($B3:$B), 0),
                 MATCH(FALSE, ISBLANK($C3:$C), 0)
               ) - 1
             )
           )
         )
    )
    

    The formula above finds the first non-empty cell in a column. It starts searching from the next row (for example, for an item on the second row, we look at the third row and below). Once it knows which row the first non-empty cell is in, the formula turns that into a cell reference (column $I contains item weights) and uses it as the upper bound of a sum of the weights. The finished formula is a bit longer (it adds sugar to only display the number when needed), and if you're interested - you can look it up in the spreadsheet.

    A screenshot used for explaining indentation logic.

    For example, in the screenshot above, the formula will start looking at the cells in the column right after "Packing cube". As soon as it finds a non-empty cell ("Nintendo Switch case"), the formula will determine this row to be the upper boundary. The formula will then sum weights starting with the "Packing cube" row, up to but not including the "Nintendo Switch case" row.
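
    If it helps to see the same logic outside of formula syntax, here's a small Python sketch of the idea (the item names and weights are made up for illustration):

    def container_weight(rows, start):
        """Sum weights from rows[start] up to (but not including) the next
        row that has a value in the top-level item column."""
        total = rows[start]['weight']
        for row in rows[start + 1:]:
            if row['item']:  # first non-empty cell ends the container
                break
            total += row['weight']
        return total

    rows = [
        {'item': 'Packing cube',         'weight': 150},
        {'item': '',                     'weight': 300},  # indented: inside the cube
        {'item': '',                     'weight': 120},  # indented: inside the cube
        {'item': 'Nintendo Switch case', 'weight': 250},
    ]
    print(container_weight(rows, 0))  # 570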

    The rest involved many small tweaks, adding pretty colors, hiding N/A errors and zeroes, and trying to find the perfect shade for column borders.

    And, since you made it this far, here's how the numbers above look in the real world:

    Side by side pictures of a packed backpack: open and closed.

    Hopefully you found this useful, or at least somewhat entertaining. There's a lot of room for improvement, and I aimed to provide a framework and a few basic ideas for building a spreadsheet to accommodate one's twisted ultralight needs.

  • Automating Octopress publishing

    I really like Octopress. It builds on top of Jekyll to address certain rough edges, and provides a ready-to-go, lightning-fast blogging platform. It's easily extensible if you know how to code (thanks to a rather clean and well-organized code base), and posts are just plain Markdown files.

    One issue though - I need to be near a computer to publish and preview the article. This becomes difficult if I'm traveling with, say, a tablet.

    I have a low-end AWS Lightsail instance I use for writing and publishing, but it's not always fun to write while SSHed into a server, and I often write offline - making it even more difficult to move files between where I write and where I publish. And managing images is a nightmare. To resolve this, I set up a few directories in Dropbox, and wrote a few scripts to move things around when needed.

    Here's the directory structure in Dropbox:

    - blog/
      - posts/
        - 2017-11-20-automatic-octopress-publishing.markdown
      - img/
        - input/
          - a-picture-of-a-flower.jpg
        - output/
    

    I put Markdown files in Dropbox/blog/posts/ (and numerous offline editors sync with Dropbox - I'm writing this with StackEdit, and I use iA Writer on my tablet). I add my images to Dropbox/blog/img/input/. I tend to strip metadata from my images and resize them to fit the maximum page width (I don't really care for high-resolution pictures; speed is preferred over the ability to zoom into a picture). For this, two tools are needed:

    sudo apt-get install imagemagick exiftool
    

    When I'm done writing or want to preview an article, I SSH into my AWS Lightsail instance and run sync.sh, a small script which moves posts to the proper directory, processes images and places them in the desired location, and starts an Octopress instance (this way I can preview my blog on the AWS Lightsail instance). Contents of sync.sh (don't forget to chmod +x it):

    #!/bin/bash
    cd $HOME/Dropbox/blog/img/input
    mogrify -resize 832x620 -quality 100 -path $HOME/Dropbox/blog/img/output *.jpg
    exiftool -all= $HOME/Dropbox/blog/img/output/*.jpg
    cp $HOME/Dropbox/blog/posts/*.markdown $HOME/blog/source/_posts
    cp $HOME/Dropbox/blog/img/output/*.jpg $HOME/blog/source/images/posts
    cd $HOME/blog
    rake generate
    rake preview
    

    I run the above script every time I want to preview the site. I'm sure it's easy to set up a daemon to watch for changes in the Dropbox folders, but I don't see the need for that yet. Also, I just statically resize images to a particular resolution (832x620) since all of the pictures I upload have the same aspect ratio; I'm sure there's a way to calculate that number on the fly so this script works with different aspect ratios.
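
    For what it's worth, the on-the-fly calculation is just a proportion; a quick sketch (832 being the maximum page width used above, the sample dimensions made up):

    def fit_to_width(width, height, max_width=832):
        """Scale (width, height) down to max_width, preserving the aspect ratio."""
        if width <= max_width:
            return width, height
        return max_width, int(round(height * max_width / float(width)))

    print(fit_to_width(4032, 3024))  # (832, 624)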

    Lastly, when I've deployed and committed my changes (I still commit and deploy manually out of habit), I run archive.sh:

    #!/bin/bash
    mv $HOME/Dropbox/blog/posts/*.markdown $HOME/Dropbox/blog/published
    rm $HOME/Dropbox/blog/img/input/*
    rm $HOME/Dropbox/blog/img/output/*
    

    It's not much, but it's enough to save some of the manual labor involved in publishing to Octopress.

  • Mob level distribution

    Distributing mobs in a dungeon based on the player's level (or some dungeon difficulty factor) was somewhat straightforward, but I would still like to document the process. I needed to place mobs that are somewhat within the difficulty level I want, plus or minus a few difficulty levels to spice it up.

    Random mob distribution in roguelike dungeon.

    Above you can see three rats, three cats, a dog (r, c, d, all level 1), a farmer (f, level 2), and a lonely bandit (b, level 3) in a level 1 dungeon.

    Without going straight into measure theory, I generated intervals for each mob based on the difference between the desired level and their level, and then randomly selected a point within the boundaries. Here's the abstract code:

    import bisect
    import random
    
    
    def get_random_element(data, target, chance):
        """Get random element from data set skewing towards target.
    
        Args:
            data   -- A dictionary with keys as elements and values as weights.
                      Duplicates are allowed.
            target -- Target weight, results will be skewed towards target
                      weight.
            chance -- A float 0..1, a factor by which chance of picking adjacent
                      elements decreases (i.e, with chance 0 we will always
                      select target element, with chance 0.5 probability of
                      selecting elements adjacent to target are halved with each
                      step).
    
        Returns:
            A random key from data with distribution respective of the target
            weight.
        """
        intervals = []  # We insert in order, no overlaps.
        next_i = 0
        for element, v in data.iteritems():
            d = max(target, v) - min(target, v)
            size = 100
            while d > 0:  # Decrease chunk size for each step of `d`.
                size *= chance
                d -= 1
            if size == 0:
                continue
            size = int(size)
            intervals.append((next_i, next_i + size, element))
            next_i += size + 1
        fst, _, _ = zip(*intervals)
        rnd = random.randint(0, next_i - 1)
        idx = bisect.bisect(fst, rnd)  # This part is O(log n).
        return intervals[idx - 1][2]
    

    Now, if I test the above for, say, 1,000,000 iterations, with a chance of 0.5 (halving the probability of selecting adjacent elements with each step), and 2 as a target, here's the distribution I end up with:

    import collections

    from matplotlib import pyplot

    target, chance, iterations = 2, 0.5, 1000000
    
    data = collections.OrderedDict([  # Ordered to make histogram prettier.
        ('A', 0), ('B-0', 1), ('B-1', 1), ('C', 2), ('D', 3), ('E', 4),
        ('F', 5), ('G', 6), ('H', 7), ('I', 8), ('J', 9),
    ])
    
    res = collections.OrderedDict([(k, 0) for k, _ in data.iteritems()])
    
    # This is just a test, so there's no need to optimize this for now.
    for _ in xrange(iterations):
        res[get_random_element(data, target, chance)] += 1
    
    pyplot.bar(
        range(len(res)), res.values(), width=0.7, align='center', color='green')
    pyplot.xticks(range(len(res)), res.keys())
    pyplot.title(
        'iterations={}, target={}, chance={}'.format(
            iterations, target, chance))
    pyplot.show()
    

    Distribution histogram: 1000000 iterations, 0.5 chance, and 2 as a target.

    You can see elements B-0 and B-1 having roughly the same distribution, since they have the same weight.

    Now, if I decrease the chance, the likelihood of the target being selected increases, while the likelihood of surrounding elements being selected decreases:

    Distribution histogram: 1000000 iterations, 0.33 chance, and 2 as a target.

    It works the opposite way as well: increasing the chance decreases the likelihood of the target being selected and increases the probability for surrounding elements.

    Distribution histogram: 1000000 iterations, 0.9 chance, and 2 as a target.

    For the sake of completeness, it works with 0 chance of surrounding elements being picked:

    Distribution histogram: 1000000 iterations, 0 chance, and 2 as a target.

    And an equal chance of picking surrounding elements:

    Distribution histogram: 1000000 iterations, 1 chance, and 2 as a target.

    After playing around with the configuration in a Jupyter Notebook, I cleaned up the algorithm above and included it in the mob placement routine.

  • Spawning evenly distributed mobs

    After generating a few good-looking random dungeons, I was puzzled by how to randomly place mobs on the resulting maps. To make it fair (and fun) for the player, mobs should be evenly distributed.

    Dungeon with randomly placed mobs.

    The obvious idea was to pick coordinates randomly within the rectangular map boundaries, and then place mobs if they have floor underneath them. But this way I lose control over the number of mobs and risk not placing any mobs at all. Plus, dungeons with a bigger surface area would get more mobs - which sounds somewhat realistic, but isn't entirely what I was aiming for.

    I could improve on the above by rerunning enemy placement multiple times and selecting the most favorable outcome - but that solution would be rather naive.

    To have control over the number of mobs I decided to place them as I generate the rooms of the dungeon. There's a trick one can use to get a random element with equal probability distribution from a sequence of an unknown size:

    import random
    
    
    def get_random_element(sequence):
        """Select a random element from a sequence of an unknown size."""
        selected = None
        for k, element in enumerate(sequence):
            if random.randint(0, k) == 0:
                selected = element
        return selected
    

    With each iteration, the chance of the current element becoming the selected item is 1 divided by the number of elements seen so far. Indeed, the probability of, say, the first element being the final selection out of a 4-item sequence is:

    1 * (1 - 1/2) * (1 - 1/3) * (1 - 1/4) = 1/2 * 2/3 * 3/4 = 6/24 = 1/4
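
    If you'd like to convince yourself empirically, a quick sketch like this (assuming the get_random_element function above is in scope) shows each element of a 4-item sequence being picked about a quarter of the time:

    import collections

    counts = collections.Counter(
        get_random_element('ABCD') for _ in range(100000))
    for element in sorted(counts):
        # each of A, B, C, D comes out close to 0.250
        print("{}: {:.3f}".format(element, counts[element] / 100000.0))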
    

    Now all I had to do was modify this to account for multiple mob placement. Here's a generalization of the function above which selects n elements from the sequence with even distribution.

    import random
    
    
    def get_random_element(sequence, n):
        """Select n random elements from a sequence of an unknown size."""
        selected = [None for _ in range(n)]
        for k, element in enumerate(sequence):
            for i in range(n):
                if random.randint(0, k) == 0:
                    selected[i] = element
        return selected
    

    I incorporated the logic above into the room generation code, accounted for duplicates, and ended up with decent distribution results.
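
    For reference, one standard way to get n distinct picks (rather than n independent ones) in a single pass is classic reservoir sampling - a sketch, not necessarily what ended up in the game code:

    import random


    def get_distinct_random_elements(sequence, n):
        """Select n distinct random elements from a sequence of an unknown size
        (classic reservoir sampling); assumes the sequence yields at least n items."""
        selected = []
        for k, element in enumerate(sequence):
            if k < n:
                selected.append(element)
            else:
                i = random.randint(0, k)
                if i < n:
                    selected[i] = element
        return selected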

  • Randomly generated dungeons

    After playing with random dungeon generation for a bit, I seem to be satisfied with the results:

    A randomly generated ASCII-dungeon.

    The above is displayed using ADOM notation - # represents a wall, . is a floor tile, and + is meant to be a closed door. After fiddling about with a few random dungeon generation ideas, I settled on the following.

    The algorithm

    1. Start with a random point on a canvas.
    2. Create a room with random width and height. Don't worry about walls yet.
    3. Select a few points on the sides of the room, put those in a stack. Save the direction in which the potential doors would point.
    4. Go through the stack, and try to add a room or a corridor (a corridor is just a room with a width or height of 1). A higher chance of corridors seems to look better and results in those wiggly passageways.
      1. Try to add a room a few dozen times with different random configurations. If no luck - give up and grab a new item from the stack. If we couldn't generate a continuation to a corridor - mark it as a loose end; we'll clean those up later.
      2. If a room was added successfully - add a door where it connects to the previous room/corridor. Add a floor section if it's a corridor connecting to another corridor.
      3. At this point one ends up with quite a few interconnected corridors, merged rooms, and all kinds of fun surroundings (my desired goal).
    5. Do the above until the stack is empty or a desired number of rooms has been generated.
    6. Clean up the loose ends from step 4.1. Remove loose corridor segments one by one until an intersection with another room/corridor is detected.
    7. Add walls around the rooms and corridors, while also cleaning up any extra doors we may have left behind when creating merged corridors or rooms.
      1. I simply used a slightly modified depth-first search starting inside any room, drawing walls wherever I found a floor/door tile not connected to anything.
      2. When encountering a door - check if it's surrounded by walls or doors on opposite sides. If not - remove the door and replace it with a floor tile. If any doors were adjacent to the removed door - requeue the door check on them.
    8. Perform steps 1-7 a few hundred times, saving the resulting dungeons each time. Pick the best candidate with the most desired features - like a number of rooms, breadth, square footage, longest corridors, etc.

    A more detailed explanation of the steps is below. For now, here are a few more dungeons generated using this method:

    A randomly generated ASCII-dungeon.

    A randomly generated ASCII-dungeon.

    A randomly generated ASCII-dungeon.

    I think dungeon generation is far more art than science, and I had a lot of fun tweaking all the different input parameters:

    • Room size boundaries.
    • Corridor lengths.
    • Frequency of corridor occurrences.
    • Number of exits from the room.
    • Number of room generation attempts.
    • Number of dungeon generation attempts.
    • Final dungeon picking heuristics.

    The last item on the list is the most interesting one - with a few hundred dungeons as input, picking the right one is rather important. I ended up settling on a max priority queue with the rough surface area of the dungeon as a key (it's more of a breadth, really - how wide and tall it is). Then I'd sift through some top results and pick the one with the most rooms available. This results in the most fun-looking map which fills up most of the screen, while still usually not being too cluttered.
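
    In code, the picking step boils down to something like the sketch below (the tuple layout and the number of top candidates to sift through are made up for illustration):

    import heapq


    def pick_dungeon(candidates, top=10):
        """Pick a dungeon from a list of (breadth, room_count, dungeon) tuples:
        the widest/tallest first, then the most rooms among the top results."""
        # heapq is a min-heap, so negate the key to treat it as a max priority queue
        heap = [(-breadth, index) for index, (breadth, _, _) in enumerate(candidates)]
        heapq.heapify(heap)
        top_indices = [index for _, index in heapq.nsmallest(top, heap)]
        best = max(top_indices, key=lambda index: candidates[index][1])
        return candidates[best][2]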

    Here's a breakdown of a simple scenario:

    Steps 1 and 2

    Pick a random spot on a canvas and generate a room of random size (4 x 3):

    ....
    ....
    ....
    

    Step 3

    Select potential spots for doors; let's label them 1, 2, 3.

    ....2
    ....
    ....1
      3
    

    I went for a uniform distribution by unfolding the rectangle and folding it back in to get a proper coordinate on the perimeter. Now the stack contains the coordinates of [1, 2, 3] (along with the directions in which they are pointing).
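
    Here's roughly what that "unfolding" looks like in code - a sketch for a room whose top-left floor tile is at (x, y):

    import random


    def random_exit(x, y, width, height):
        """Pick a uniform random spot on a room's perimeter by unfolding the
        rectangle into a line of length 2 * (width + height), then folding the
        chosen offset back into a coordinate. Returns the tile just outside
        the room and the direction the potential door would point."""
        offset = random.randrange(2 * (width + height))
        if offset < width:                                  # top edge
            return (x + offset, y - 1), 'up'
        offset -= width
        if offset < height:                                 # right edge
            return (x + width, y + offset), 'right'
        offset -= height
        if offset < width:                                  # bottom edge
            return (x + offset, y + height), 'down'
        offset -= width
        return (x - 1, y + offset), 'left'                  # left edge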

    Steps 4 and 5

    Add a room or a corridor to connector number 3. We'll be adding it to the right of number 3. Let's assume the RNG sends a corridor of length 5 our way. We're happy with the corridor pointing either up, down, or right - so we let the RNG decide again: up.

         4
         .
         .
    ....2.
    .... .
    ....3.
      1
    

    We add the end of the corridor to the stack as number 4 (now [1, 2, 4]). We also mark 4 as a loose end, in case we end up not adding a room to it. Dangling corridors are never pretty.

    Now, to replace number 3 with a door:

         4
         .
         .
    ....2.
    .... .
    ....+.
      1
    

    Add another random corridor of length 2 to point 4, pointing right. Replace number 4 with a floor segment, since point 4 was the end of another corridor. Remove point 4 from the loose ends, add point 5.

         ...5
         .
         .
    ....2.
    .... .
    ....+.
      1
    

    Take point 5 and generate a room of size 3 x 6. Point 5 becomes a door. The loose ends list is empty.

         ...+...
         .   ...
         .   ...
    ....2.   ...
    .... .   ...
    ....+.   ...
      1
    

    For simplicity's sake, let's assume we don't want any more exits from this room. Back to the stack of [1, 2]. Point 2 doesn't seem to have much room for growth. After a few unsuccessful attempts to place a room or a corridor there, we give up:

         ...+...
         .   ...
         .   ...
    .... .   ...
    .... .   ...
    ....+.   ...
      1
    

    Now for point 1: we get another corridor of length 3. Point 6 is now added to the loose ends list.

         ...+...
         .   ...
         .   ...
    .... .   ...
    .... .   ...
    ....+.   ...
      +
      .
      .
      .
      6
    

    Let's assume we run out of space and can't add anything to the end of 6. We're done generating the dungeon. Our stack is empty, and our loose ends list contains the coordinates of 6.

    Step 6

    Start with the loose end, and remove items one by one until a tile with multiple neighbors is encountered:

         ...+...
         .   ...
         .   ...
    .... .   ...
    .... .   ...
    ....+.   ...
      X
      X
      X
      X
      X
    

    Voila:

         ...+...
         .   ...
         .   ...
    .... .   ...
    .... .   ...
    ....+.   ...
    

    Steps 7 and 8

    There are no rogue doors in this scenario, so all we need to do is add the walls:

         #########
         #...+...#
         #.###...#
    ######.# #...#
    #....#.# #...#
    #....#.# #...#
    #....+.# #...#
    ######## #####
    

    All of the steps above should be repeated a few hundred times with different dungeons, and then the best dungeon should be picked as the final one.

    Did I miss anything? Was cleaning up "loose ends" too much of a hack? What should I have done differently?

  • Contributing to an existing Octopress blog

    I had to download my Octopress blog to a new machine today and the process of setting up didn't go as smoothly as I expected. At the end of the day the setup turned out to be simple, and here are the steps:

    git clone -b source https://github.com/username/username.github.io myblog
    cd myblog
    git clone https://github.com/username/username.github.io _deploy
    bundle install
    

    In a nutshell, you have to manually add a _deploy folder set to track the master branch of your Octopress repository. Otherwise the rake deploy command fails.

    Happy writing!

  • One more argument for 80 character limit

    Limiting code to 80 (or 100 or 120) characters per line. It's a well-known topic, and there's an overall consensus on the subject, not counting occasional questions from newbies and odd cases. Most established tech companies have their own line length guidelines. These often depend on the language, as in the case of Google with their 80-character Python and 100-character Java limits.

    In this article, I won't be rehashing all the usual arguments, such as easier diffs with version control software, or the ability to stack windows side by side on wide screens. No, I believe that battle was won quite a long time ago; the topic is now closed. But something I didn't find mentioned in any of the discussions is an interesting point from the world of writers and designers.

    Ever since I became interested in improving my writing skills, I've found article after article mentioning the importance of line length in reading. Interestingly enough, this issue was raised in the world of literature, and it had been resolved long before programmers were fascinated with the desire to use up rapidly expanding screen real estate.

    I am talking about something known as "measure" in typography. It seems to be the reason newspapers use narrow columns, or books leave such vast margins around the text on a page. Here's an excerpt from the Wikipedia article:

    Measure (or sometimes "The Measure") in typography is the length of a line of text. For a single-column design measure should ideally lie between 40 & 80 characters. Many typographers consider the perfect measure to be 65 characters. If the lines are too short then the text becomes disjointed, if they are too long the content loses rhythm as the reader searches for the start of each line. Punctuation should preferably hang outside the measure.

    Most programming languages use special operators and keywords, which can be considered "punctuation". Some languages are more verbose (use more punctuation), and some aren't. If you remove the punctuation, the 80/100/120-character limit fits the standard, time-honored "measure" quite well.

    Maybe I can use this as an additional argument the next time I have to explain to a junior new hire why the character limits are so strictly enforced.

  • Export WordPress posts to plain text

    I prefer to create and edit my posts in plain text. It's a timeless format, and I find it nice to have an archive of posts lying around in plain text.

    I wrote a little Python script which I used to extract an archive of my posts and pages into a bunch of files on my computer. In order to use it, you need the "WordPress Importer" plugin. Export your posts and pages into the WXR format, and feed the file to the script.

    The source code of the script is below (a link for downloading the script is at the bottom of this post):

    #!/usr/bin/env python
    
    """This script converts WXR file to a number of plain text files.
    
    WXR stands for "WordPress eXtended RSS", which basically is just a
    regular XML file. This script extracts entries from the WXR file into
    plain text files. Output format: article name prefixed by date for
    posts, article name for pages.
    
    Usage: wxr2txt.py filename [-o output_dir]
    """
    
    import os
    import re
    import sys
    from xml.etree import ElementTree
    
    NAMESPACES = {
            'content': 'http://purl.org/rss/1.0/modules/content/',
            'wp': 'http://wordpress.org/export/1.2/',
    }
    USAGE_STRING = "Usage: wxr2txt.py filename [-o output_dir]"
    
    def main(argv):
        filename, output_dir = _parse_and_validate_output(argv)
        try:
            data = ElementTree.parse(filename).getroot()
        except ElementTree.ParseError:
            _error("Invalid input file format. Can not parse the input.")
        page_counter, post_counter = 0, 0
        for post in data.find('channel').findall('item'):
            post_type = post.find('wp:post_type', namespaces=NAMESPACES).text
            if post_type not in ('post', 'page'):
                continue
            content = post.find('content:encoded', namespaces=NAMESPACES).text
            date = post.find('wp:post_date', namespaces=NAMESPACES).text
            title = post.find('title').text
            date = date.split(' ')[0].replace('-', '')
            title = re.sub(r'[_]+', '_', re.sub(r'[^a-z0-9+]', '_', title.lower()))
            if post_type == 'post':
                post_filename = date + '_' + title + '.txt'
                post_counter += 1
            else:
                post_filename = title + '.txt'
                page_counter += 1
            with open(os.path.join(output_dir, post_filename), 'w') as post_file:
                post_file.write(content.encode('utf8'))
        print "Saved {} posts and {} pages in directory '{}'.".format(
                post_counter, page_counter, output_dir)
    
    def _parse_and_validate_output(argv):
        if len(argv) not in (2, 4):
            _error("Wrong number of arguments.")
        filename = argv[1]
        if not os.path.isfile(filename):
            _error("Input file does not exist (or not enough permissions).")
        output_dir = argv[3] if len(argv) == 4 and argv[2] == '-o' else os.getcwd()
        if not os.path.isdir(output_dir):
            _error("Output directory does not exist (or not enough permissions).")
        return filename, output_dir
    
    def _error(text):
        print text
        print USAGE_STRING
        sys.exit(1)
    
    if __name__ == "__main__":
        main(sys.argv)
    

    You can download the script from here: wxr2txt.py.

  • Python: "ignored" context manager

    There was a recent fantastic addition to Python 3.4 by Raymond Hettinger: contextlib.ignored. It's a context manager which lets you shorten the following often-occurring pattern:

    try:
        os.remove('i_probably_do_not_exist.txt')
    except OSError:
        pass
    

    And turn it into this:

    with ignored(OSError):
        os.remove('i_probably_do_not_exist')
    

    Much shorter and prettier. But, like most engineers, you probably have to use an older version of Python in production. That's where this little chunk of code comes in. Create a little compat (as in "compatibility") library and add the following function:

    import contextlib
    
    @contextlib.contextmanager
    def ignored(*exceptions):
        try:
            yield
        except exceptions:
            pass
    

    Beautiful!

    UPDATE: As Andy pointed out in the comments, this context manager has been renamed to contextlib.suppress (https://bugs.python.org/issue19266).

  • Making Django and Lettuce play nice together

    Lettuce is a great BDD tool which allows you to parse expressions written in Gherkin syntax in Python. However, the documentation is not very comprehensive, and at the moment the current version (0.2.19) has some issues integrating with the latest Django (1.6.1 at the time of writing). Without further ado, I'll get to a comprehensive tutorial.

    Let's assume you are using pip and virtualenv for dependency control, and you already have a working project configured. Your project is called "myproject", and the only app you have within your project is called "polls".

    Setup

    First, you have to install the lettuce library. At the time of writing, the current released version (0.2.19) has an error integrating with Django, so we'll install the current development version. Releases 0.2.20 and up should include the fix, so pip install lettuce would be better once that version is out.

    pip install -e \
        git://github.com/gabrielfalcao/lettuce@cccc397#egg=lettuce-master
    pip install django-nose splinter
    pip freeze > requirements.txt
    

    The first line downloads the lettuce package from the GitHub repository and installs missing dependencies. You can replace cccc397 with the current commit. Technically the commit can be omitted, but we don't want an unstable, ever-changing branch in our requirements.txt. I also added django-nose, since nose assertions come in handy while writing Lettuce steps, as well as splinter, which is a great tool for testing web applications.

    Add Lettuce to the INSTALLED_APPS in your myproject/settings.py:

    INSTALLED_APPS = (
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.sites',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'django.contrib.admin',
        'django.contrib.admindocs',
        # ... third party apps ...
        'lettuce.django',
        'myproject',
        'polls',
    )
    

    You also have to explicitly specify the apps you want to use with lettuce:

    LETTUCE_APPS = (
        'polls',
    )
    

    By default, lettuce will run its tests against your default database. But we want to use a test database for that, so we have to add a few more settings:

    LETTUCE_TEST_SERVER = 'lettuce.django.server.DjangoServer'
    LETTUCE_SERVER_PORT = 9000
    

    Here LETTUCE_TEST_SERVER is a subclass of Django's LiveServerTestCase - a class which runs a test server for you - and LETTUCE_SERVER_PORT is different from port 8000 so you won't have issues running the development server via python manage.py runserver at the same time as the Lettuce tests.

    You also have to create a features directory inside each app you want to test with Lettuce:

    /myproject
        /myproject
            __init__.py
            settings.py
            urls.py
            wsgi.py
        /polls
            /features
                /steps
                    __init__.py
                    polls_list.py
                polls_list.feature
            __init__.py
            models.py
            tests.py
            views.py
        manage.py
        requirements.txt
        terrain.py
    

    Lettuce has its own settings file called terrain.py. It has to be in the same directory as manage.py:

    from django.core.management import call_command
    from django.conf import settings
    from lettuce import before, after, world
    from splinter.browser import Browser
    
    @before.each_scenario
    def flush_database(scenario):
        call_command('flush', interactive=False, verbosity=0)
    
    @before.each_scenario
    def prepare_browser(scenario):
        world.browser = Browser()
    
    @after.each_scenario
    def destroy_browser(scenario):
        world.browser.quit()
    

    This code flushes the database before each scenario, and prepares and destroys the splinter browser.

    Writing the features

    Feature files support standard Gherkin syntax; let's write one right now in polls/features/polls_list.feature:

    Feature: Polls list
    
        Scenario: Polls list without any polls
            When I access the "polls list" url
            Then I see a text "We didn't find any polls!"
    
        Scenario: Polls list with one poll
            Given a poll with the title "Hello world"
            When I access the "polls list" url
            Then I see a text "Hello world"
            And I do not see a text "We didn't find any polls!"
    

    Now describe the steps in polls/features/steps/polls_list.py:

    from django.core.urlresolvers import reverse
    from lettuce import step, world
    from lettuce.django import django_url
    from nose.tools import assert_in, assert_not_in
    from polls.models import Poll
    
    PAGES = {
        "polls list": "polls:list"
    }
    
    @step(r'access the "(.*)" url')
    def access_the_url(step, name):
        world.browser.visit(django_url(reverse(PAGES[name])))
    
    @step(r'see a text "(.*)"')
    def see_a_text(step, text):
        assert_in(text, world.browser.html)
    
    @step(r'a poll with the title "(.*)"')
    def create_a_poll_with_the_title(step, title):
        poll = Poll.objects.create(title=title)
        poll.save()
    
    @step(r'do not see a text "(.*)"')
    def do_not_see_a_text(step, text):
        assert_not_in(text, world.browser.html)
    

    Running the tests

    Now, you can run python manage.py harvest --test-server to run the tests you just wrote:

    $ python manage.py harvest --test-server
    Creating test database for alias 'default'...
    Django's builtin server is running at 0.0.0.0:9000
    
    Feature: Polls list
    
      Scenario: Polls list without any polls
        When I access the "polls list" url
        Then I see a text "We didn't find any polls!"
    
      Scenario: Polls list with one poll
        Given a poll with the title "Hello world"
        When I access the "polls list" url
        Then I see a text "Hello world"
        And I do not see a text "We didn't find any polls!"
    
    1 feature (1 passed)
    2 scenarios (2 passed)
    6 steps (6 passed)
    Destroying test database for alias 'default'...
    

    Don't forget the --test-server switch - otherwise Lettuce will run tests against your default database.

    Sources

    You can find some more details on Lettuce and Django integration here: Web development fun with Lettuce and Django.

    Update

    Rather than using the --test-server switch, it's easier and safer to set a flag in your settings.py (suggested by Michel Sabchuk):

    LETTUCE_USE_TEST_DATABASE = True
    

    This way you won't end up accidentally erasing your production database after forgetting to add the --test-server flag.

  • Python doctests and decorators bug

    Now, this might well be a feature, but doctest strings are not executed for decorated functions (at least in version 2.7). However, there is a workaround.

    You need to wrap your decorated function with functools.wraps inside the decorator to carry the docstring over into the decorator's scope.

    #!/usr/bin/env python
    
    from functools import wraps
    
    def decorator(func):
        @wraps(func)
        def wrapper():
            return func()
        return wrapper
    
    @decorator
    def foo():
        """
        >>> foo()
        False
        """
        return True
    
    import doctest
    doctest.testmod()
    

    Now you can see this test failing, where otherwise it would have been ignored:

    **********************************************************************
    File "decorator.py", line 12, in __main__.foo
    Failed example:
        foo()
    Expected:
        False
    Got:
        True
    **********************************************************************
    1 items had failures:
       1 of   1 in __main__.foo
    ***Test Failed*** 1 failures.
    
  • Python tests with doctest and unittest

    When it comes to tests, doctest is a great, simple module for writing tests for your application. However, it is pretty basic and does not have any extended features like, for example, centralized unit tests. If you have multiple modules with doctests (and you probably do) you most likely want to be able to run all doctests recursively from one place. That's where unittest comes in.

    Let's assume we store modules in the lib/ directory:

    $ ls lib/
    __init__.py bar.py foo.py
    

    Here are the contents of foo.py and bar.py respectively:

    def foo():
        """
        >>> foo()
        False
        """
        return False
    
    def bar():
        """
        >>> bar()
        True
        """
        return True
    
    def baz():
        """
        >>> baz()
        False
        """
        return False
    

    Now, to run all the tests we need a wrapper script. Let's call it runtests.py:

    #!/usr/bin/env python
    
    import unittest
    import doctest
    import os
    
    files = []
    root_dir = 'lib/'
    
    for root, _, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename == '__init__.py' or filename[-3:] != '.py':
                continue
            f = os.path.join(root, filename)
            f = f.replace('/', '.')
            f = f[:-3]
            files.append(f)
    
    suite = unittest.TestSuite()
    for module in files:
        suite.addTest(doctest.DocTestSuite(module))
    unittest.TextTestRunner(verbosity=1).run(suite)
    

    This approach invokes the doctest.DocTestSuite method, which converts doctest strings into unittest suites. Time to run our tests:

    $ chmod +x runtests.py
    $ ./runtests.py
    ...
    ----------------------------------------------------------------------
    Ran 3 tests in 0.008s
    
    OK
    

    And just to be sure that the approach actually works, let's make one of the tests fail:

    $ ./runtests.py
    .F.
    ======================================================================
    FAIL: baz (lib.bar)
    Doctest: lib.bar.baz
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/lib/python2.7/doctest.py", line 2201, in runTest
        raise self.failureException(self.format_failure(new.getvalue()))
    AssertionError: Failed doctest test for lib.bar.baz
      File "/home/rosipov/unitdoc/lib/bar.py", line 8, in baz
    
    ----------------------------------------------------------------------
    File "/home/rosipov/unitdoc/lib/bar.py", line 10, in lib.bar.baz
    Failed example:
        baz()
    Expected:
        True
    Got:
        False
    
    ----------------------------------------------------------------------
    Ran 3 tests in 0.009s
    
    FAILED (failures=1)
    
  • pygame.font not found

    I had an issue with pygame not being able to find a dependency for the font module. After quite a time-consuming search, it turned out the missing package was libsdl-ttf2.0-dev.

    Hope this helps someone.

  • Rails and MongoDB with Cygwin

    Setting up Ruby on Rails with MongoDB on a Windows machine.

    You need to have cygwin installed with ruby and git packages (obviously you may want to have more).

    The following commands are executed in the cygwin prompt:

    git clone git://github.com/rubygems/rubygems.git
    cd rubygems/
    ruby setup.rb
    gem install rails
    

    Go to the MongoDB website and download Windows binaries: http://www.mongodb.org/downloads. Extract the content of the bin/ directory to C:\cygwin\usr\local\bin.

    Create a directory for the db files (the default MongoDB db files directory is C:\data\db):

    cd /cygdrive/c
    mkdir data
    mkdir data/db
    

    Done! Both mongo and rails are in your cygwin path now; feel free to tweak it as you see fit.

  • C strtok usage example

    I had some issues with the strtok function. Here's a detailed explanation and a usage example.

    The strtok function is used to tokenize a string and thus separates it into multiple strings divided by a delimiter.

    #include <stdio.h>
    #include <string.h>
    
    int main() {
        int j, i = 0;
        char delim[4];
        char user_input[81], *token[80];
        user_input[0] = 0;  // set first byte to 0 to detect empty string
    
        // validate the string length
        while (strlen(user_input) <= 1 || strlen(user_input) > 82) {
            printf("Feed me a string to tokenize: ");
            fgets(user_input, sizeof(user_input), stdin);
        }
    
        printf("And a delimiter (up to 4 chars): ");
        fgets(delim, sizeof(delim), stdin);
    
        token[0] = strtok(user_input, delim);  // first call returns pointer
                                               // to first part of user_input
                                               // separated by delim
        while (token[i] != NULL) {
            i++;
            token[i] = strtok(NULL, delim);  // every call with NULL uses
                                             // saved user_input value and
                                             // returns next substring
        }
    
        for (j=0; j<=i-1; j++) {
            printf("%s\n", token[j]);
        }
    
        return 0;
    }
    

    Let's compile and execute it:

    Feed me a string to tokenize: foo/bar/baz
    And a delimiter: /
    foo
    bar
    baz
    

    The first call to strtok returns the pointer to the first substring. All the next calls with the first argument being NULL use the string passed at the first call and return the next substring. The function returns NULL if no more substrings are available.

  • Create gitolite repository

    A reminder on how to initialize a fresh gitolite repository, assuming that gitolite has already been set up.

    All actions are performed on a local machine. In this case, ~/gitolite-admin is the admin repository, ~/foo is the desired repository, and rosipov.com is the gitolite hostname. The command vi stands for the text editor, but you may use whichever editor you prefer.

    cd ~/gitolite-admin
    vi conf/gitolite.conf
    

    Add lines (obviously you may want to use individual users instead of @all):

    repo foo
        RW+ = @all
    

    Save it. Next:

    git add conf/gitolite.conf
    git commit -m "Add foo repo for @all"
    git pull --rebase && git push
    mkdir ~/foo
    cd ~/foo
    git init
    git remote add origin git@rosipov.com:foo.git
    

    Add some files at this point. In this example, only .gitkeep is added.

    git add .gitkeep
    git commit -m "Initialize repo"
    git push origin master
    

    The new repository is all set up now.

  • C - Reallocating memory for the array

    I just started delving into C and I had some issues with increasing the memory allocation size for my array. The best way to understand something is to write a detailed explanation of the process. So here we go:

    #include <stdio.h>
    #include <stdlib.h>
    
    // notice that array is a pointer to a pointer
    int append_array_element(int **array, int *array_length, int element) {
        *array_length += 1;
        *array = realloc(*array, *array_length * sizeof(int));
        (*array)[*array_length - 1] = element; // [] has higher priority
                                               // than * in C's order of
        return 0;                              // operations
    }
    

    We have a pointer to the pointer to the array, a pointer to the array's length, and a value to append to the array. The array is passed as a pointer to a pointer so we will be able to 'repoint' the original pointer to a new memory segment.

    Let's call the function a few times and see what it gives us.

    int main() {
        int *array = NULL;
        int array_length = 0;
    
        append_array_element(&array, &array_length, 142);
        printf ("Our array with %d elements\n", array_length);
        printf("%d\n", array[0]);
    
        append_array_element(&array, &array_length, 19);
        printf ("Our array with %d elements\n", array_length);
        printf("%d\n", array[0]);
        printf("%d\n", array[1]);
    
        return 0;
    }
    

    And the output is:

    Our array with 1 elements
    142
    Our array with 2 elements
    142
    19
    

    You can never be sure when you work with memory, so let's add some error handling in case the operation fails.

    int append_array_element(int **array, int *array_length, int element) {
        *array_length += 1;
        // realloc() returns a pointer, or NULL if it fails
        void *_tmp_array = realloc(*array, *array_length * sizeof(int));
        if (!_tmp_array) {
            printf("Error while trying to realloc memory!\n");
            return -1;
        }
        *array = _tmp_array;
        (*array)[*array_length - 1] = element;
    
        return 0;
    }
    

    Now we will be able to handle the failure on our own. But reallocation is not a very fast thing, so let's double the amount of allocated memory every time we run out of it. We'll need one more variable to track the current allocated memory size.

    int append_array_element(int **array, int *array_length,
                             int *array_alloc_length, int element) {
        *array_length += 1;
        if (*array_length > *array_alloc_length) {
            if (*array_alloc_length == 0) {
                *array_alloc_length = 1;
            } else {
                *array_alloc_length *= 2;
            }
    
            void *_tmp_array = realloc(*array, *array_alloc_length * sizeof(int));
            if (!_tmp_array) {
                printf("Error while trying to reallocate memory for an array!\n");
                return -1;
            }
            *array = _tmp_array;
        }
        (*array)[*array_length - 1] = element;
    
        return 0;
    }
    

    Let's call it a few times:

    int main() {
        int *array = NULL;
        int array_length = 0;
        int array_alloc_length = 0;
    
        int i;
        for (i = 0; i < 10; i++) {
            append_array_element(&array, &array_length, &array_alloc_length, i);
        }
    
        printf("Our array with %d elements\n", array_length);
        for (i = 0; i < 10; i++) {
            printf("%d\n", array[i]);
        }
        printf("Array length: %d, allocated elements: %d\n",
               array_length, array_alloc_length);
    
        return 0;
    }
    
    Our array with 10 elements
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    Array length: 10, allocated elements: 16
    

    And for the fun of it, let's do the same with 100 elements:

    Our array with 100 elements
    0
    1
    2
    ...
    97
    98
    99
    Array length: 100, allocated elements: 128
    

    Let's improve the function a little bit more - and we are good to go. Let's say we have a dynamic array with 129 elements - but we will have allocated memory for 256 elements… That is 127 more than we need! We don't want to be greedy, so let's find a way to free up the memory.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h> // I included this just to use boolean values with
                         // neat true/false aliases
    
    int append_array_element(int **array, int *array_length,
                             int *array_alloc_length, int element,
                             bool is_last_element) {
        *array_length += 1;
        if (*array_length > *array_alloc_length || is_last_element) {
            if (is_last_element) {
                *array_alloc_length = *array_length;
            }
            else if (*array_alloc_length == 0) {
                *array_alloc_length = 1;
            } else {
                *array_alloc_length *= 2;
            }
    
            void *_tmp_array = realloc(*array, *array_alloc_length * sizeof(int));
            if (!_tmp_array) {
                printf("Error while trying to reallocate memory for an array!\n");
                return -1;
            }
            *array = _tmp_array;
        }
        (*array)[*array_length - 1] = element;
    
        return 0;
    }
    

    That should be pretty straightforward. And let's call it two more times, with 10 and 100 elements:

    int main() {
        int *array = NULL;
        int array_length = 0;
        int array_alloc_length = 0;
        bool is_last_element;
    
        int i;
        int i_max = 10;
        for (i = 0; i < i_max; i++) {
            if (i == i_max - 1)
            {
                is_last_element = true;
            } else {
                is_last_element = false;
            }
            append_array_element(&array, &array_length, &array_alloc_length,
                                 i, is_last_element);
        }
    
        printf("Our array with %d elements\n", array_length);
        for (i = 0; i < i_max; i++) {
            printf("%d\n", array[i]);
        }
        printf("Array length: %d, allocated elements: %d\n",
               array_length, array_alloc_length);
    
        return 0;
    }
    

    And the results are:

    Our array with 10 elements
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    Array length: 10, allocated elements: 10
    
    Our array with 100 elements
    0
    1
    2
    ...
    97
    98
    99
    Array length: 100, allocated elements: 100
    

    Pretty neat, huh? Thank you for reading!