The road to improving Mipui performance

Alon Mishne, June 2018

On December 2017 I created a poll to get a feel on what was important for the community. I had free-text fields and a recurring complaint there was performance on large maps, so I decided to tackle that. I knew it would be tricky, but I didn't know how much. This post gives a rough overview of the problem, the solution, and why it has difficult.

Background

Mipui is a web app I'm developing, serving as a map maker for top-down grid-based maps, the type used in role-playing games such as D&D. It also has other capabilities - saves to cloud, live updates shared across users, and more - but those are out of scope here.

From an early point in the development, the size of maps has been flexible. It defaults to 20x20, but you can change it to any size using the buttons on the side of the map. Later on I added a dedicated map resize button, which makes it even easier to quickly resize to any size.

On my machine, the performance of 20x20 maps was smooth. By performance I mean the frames per second (FPS) when panning, zooming, hovering and drawing on the map. However things started to slow down as the maps got larger, however. The slowdown was starting to get noticeable on 30x30 maps, and on 50x50 it was already getting into frustrating territory. So, this is the problem I set out to tackle.

I didn't have a clear target when I started, but I was hoping I could get 100x100 maps running smoothly - I would consider that a success.

The Reason it was Slow

Initial profiling indicated what I suspected - the slowdown wasn't really caused by anything in my code, but was due to the time it took the browser to render the dom tree.

Mipui's grid is actually a collection of absolutely-positioned elements (all divs). Because Mipui actually places a wall grid cell in-between every two non-wall cells, the actual number of cells for a 20x20 map is 41x41. Worse, it's not just a single div per cell; every "layer" is assigned a div per cell, where layers are floor layer, wall layer, text layer etc. Empty layer values don't get a div so it's not really that bad, but the minimum number of divs per cell is 2: the floor layer and a "grid" layer, which is an invisible layer which is the one actually interacting with the mouse.

If I estimate an average of 1 non-empty layer per cell, which feels about right to me - so a total of 3 with the wall and grid layers - then the total number of divs for a 20x20 map is 41x41x3 = 5043. For a 50x50 map, that grows to 101x101x3 = 30,603. My aspirational 100x100 map is 121,203 divs, yet another order of magnitude over the already slow 50x50 map.

For reference, at the time of writing the number of elements on my Facebook homepage is 1866 and in my Gmail inbox-with-preview it's 4332, and they certainly don't all respond to mouse events and move around. So no wonder my high div count made things so slow - looks like typical websites don't have so many dynamic and interactive divs on-screen at once.

Side Note: Why Not Canvas?

At the early stages of the project I considered using canvas instead of divs, but ended up ruling that out for various reasons which are outside the scope here. By the time I started the performance work mentioned in this post, changing it from divs to canvas would have been a considerable effort, and I wasn't entirely sure that it would solve the problem, and I was guessing it would also introduce new issues of its own.

Initial Attempts

Even before my December 2017 survey I tried to improve on performance, and started out with some small things. I prevented the map-resize button from moving every time you zoom or pan the map, instead only disappearing and reappearing a second after the zoom/pan was complete (big performance boost); I moved some interaction logic to only occur on each frame with requestAnimationFrame (small boost) and I cached some sizing values that I needed (barely noticeable boost). None of these was enough to really tackle the problem.

The will-change Property

The next improvement attempt was trying to use the will-change property. According to the documentation it's supposed to give the browser a "heads-up" that something is about to change, but through experimentation I found out that in Chrome, what "will-change: contents" really does is cache the element to an image. That's actually the opposite of what I was expecting (if anything, it's anything that isn't about to change that should be cached), but it still seemed to have a lot of potential to be used for improving performance.

I tried applying will-change at several resolutions, and finally found out that applying it to each layer elements resulted in a significant performance boost - around x2 FPS - particularly for zooming operations. Of course there were a few problems I had to solve, for example zooming in would actually lead to a blurry image, because of the image caching. So I had to forcefully remove and re-apply the will-change property after the zoom is complete to force the browser to re-cache the image. Another problem was that the map now had occasionally artifacts, which was very annoying. Still, x2 was too good to skip so I used it.

Seeing the browser cache to image gave me an idea: what if I would tile the map - meaning, I'd divide it into small submaps, sized 5x5 or 10x10 or something like that - and I'd apply will-change on that tile level (or per-layer in that tile), instead of on entire map layers? That way I'll be able to edit small chunks at a time, while everything else is cached to image and has little performance impact.

I did some non-trivial amount of work to implement the tiling approach, but at the end I tested it and it had no performance benefit. Still, that tiling idea felt powerful, so I left the code around.

Ultimate Solution: Caching Tiles to Images

So, along comes December 2017 and the performance complaints, and I'm thinking about the tiling idea again, this time more seriously. I was already using a library that knows how to transform arbitrary dom to image, called dom-to-image, for exporting the map to PNG - could I utilize that with my tiling code to properly cache tiles to images? In theory the performance improvement could be dramatic. With 10x10 tiles, a 100x100 map will be around 100 image elements instead of 120k+ divs! Sure, a bunch of the tiles will be "active" and thus made out of many divs instead of a single image, but I already know that with just a few thousands elements thing run smoothly. Plus, caching myself instead of relying on the flaky "will-change" property means less artifacts and more cross-browser compatibility. So, I decided to go for it.

I did some prototyping with tiling to test the water. I crudely tiled the map and replaced tiles with arbitrary images, just to see how fast the page responds. I discovered that if I just made the underlying tile invisible or hidden then nothing really changes, but if I actually detach the tile node from the DOM tree then performance does indeed get a huge boost! Detaching and reattaching is easy, so this definitely seemed to be the right way to approach this.

Turns out there are two major challenges involved with this approach. The first is getting the tile boundary to be smooth in the presence of content that can cross tile boundaries - either explicitly with multi-cell elements (such as multi-cell text and tokens), or implicitly with the shadow dropped by wall elements, which is a crucial part of the design. The second challenge is performance: the dom-to-image library I was using is way too slow to be used in this manner, taking up to a second for even small tiles, causing noticeable jank.

Challenge 1: Seamless Tile Boundaries

I initially toyed with just letting multi-cell elements expand out of their containing tile. But this approach doesn't allow such a token to be inserted in between layers in the neighboring tile - it has to either be at the top or the bottom... so that was a no-go from the start. This means tiles would have to be hermetic: They must know how to draw their entire content and not let it overflow (unless it's guaranteed to be the top-most content in all neighboring tiles).

Instead, I use what I nicknamed "replicas". When a cell is drawn to a tile, it also calculates which other tiles contain parts of it, and creates a replica of itself in the other tiles, with the same absolute location on the map. This results in two elements in different tiles but that share the same exact absolute location, leading to a smooth transition while still being hermetic per tile.

A similar approach is used for walls and other elements that cast shadows. If an element won't actually extend to a neighbor tile but its shadow will, it's also replicated in that tile. This was actually broken in the Chrome version I was using - shadows of clipped elements were missing - so I opened a Chrome bug on it which was fixed later on.

The last problem was that the browser would often display a single-pixel "seam" in-between neighboring tiles. To counteract that, I made tiles actually expand a single pixels to all sides, and they overlap each other so that the seam is now always over such an overflow, and such invisible.

Challenge 2: Faster DOM to Image

I knew that the dom-to-image library I was using is excruciatingly slow when exporting the entire map to an image, but I was hoping that using tiles that are significantly smaller than the full map - for example, 5x5 tiles - would be fast enough to not be noticeable for the user. Turns out I was wrong - even small tiles could sometimes reach up to a second to convert, and occasionally more, completely unreasonable for constantly caching tiles in the background.

Well, my first idea was that I don't actually need tile caching to be fast, I just need it to be responsive. I looked inside the dom-to-image library and saw it was dominated by a loop iterating over all nodes in the target dom. I added a potential interrupt point in-between loop iterations, and triggered that interrupt on some user interaction actions, for example hovering over a new tile. It seemed to be working initially, but I still got some jank, and it turned out there's an atomic phase in the flow that I could not interrupt - serializing the dom to XML, which still sometimes took over a second. The reason is that the XML is huge, mostly because that library copies all styles (getComputedStyle) for all nodes! It had no other choice, really, since the XML needs all styling information inside it.

However, what if I didn't need to copy all styling information? It would make the XML far smaller and I would have no need to perform a deep-clone of the target node (another time-consuming phase that dom-to-image library had to perform)! So I set about to write my own library, which I called GridImager. I worked around the styling limitation by maintaining a string with the current active CSS rules in a GridImager instance, then prepending that to the generated XML.

The results were dramatic! I could finally get caching small tiles to take under 50ms in most cases, which is low enough to not be too disruptive. However there was nearly an endless parade of issues I had to tackle before I could get it to support everything I needed, particularly around scaling of external images and inline SVGs. I wrote a test suite and slowly progressed through the issues until everything was ironed out, and I ended up with a lightning-fast alternative to dom-to-image, which I then fully replaced.

Conclusion

The final result is still imperfect; in particular, operations that change a large area (such as paint bucket or theme change) now "activate" a lot of tiles and are slow until the number of activated tiles goes back down. I added a UI element which displays "please wait" in these cases, to minimize frustration. And there are sometimes small hickups still. But in general, this solution works well - small-scale work on a 200x200 map is almost as breezy as on a 20x20 map, with pan and zoom also being very fast. By my metrics, that's a success. Woo!