August 3, 2015

Pimp My Code, Book 2: Replacing Loops in Swift

Loops: What do we know about them? Do we have a loop problem? Let's find out.

In 26 years of programming ObjectiveC I haven't re-thought the loop much, except when Apple added "for x in y" style loops to Objective-C (which I will refer to as "for/each" from now on) and the fast enumeration protocol, which was plenty cool. But now that I'm switching to Swift, I'm taking the time to reconsider everything about how I code.

[Since you're such an intelligent and discriminating person I won't spend any time trying to sell you on Swift — you're surely already using it, you handsome land mermaid and/or merman. I will mention my next product is 100% Swift (except third-party libraries, of course), and Swift is fast, concise, and works wonderfully for me except for the stupid CGFloat nonsense.]

What first struck me when learning Swift was how much play the "map()" and "filter()" functions got on the web [also "flatmap()" but I'm not as excited about that one yet because I don't love LISP and it's not 1970]. (With the August 6 release of Swift map() is augmented with forEach() if it's the end of your chain, so I have updated this blog entry to use the latter when appropriate.) I admit I haven't exactly kept current with all the newfangled trends in programming languages; I program for Cocoa only, and I didn't see any point in learning some fancy new language that didn't interface with Cocoa as well as Objective-C and probably had bugs and gotchas that I didn't want to spend time on. I'm a veteran of the first Java wars. (Shudder.) Never forget.

It'd be easy to write off "map" (or "forEach") and "filter" at first glance since they seem likely inefficient compared to traditional looping approaches. But while I was porting over my old (started two years ago) Objective-C version of Delicious Monster's new product I tried to embrace Swift's metaphors.

There's an important idea I hadn't realized until the language boffins made it clear — in a traditional "for each"-style loop, there are usually two different operations at play: filtering a list of objects and performing some operation(s) on the filtered subset. But "for each" leaves you to implement them yourself:

Objective-C: pseudo-code for traditional loop
NSMutableArray <Thing *>* const affectedThings = [NSMutableArray new]; for (Thing *thing in allThings) { if (![thing passessSomeTest]) { continue; } [thing doSomething]; [affectedThings addObject: thing]; }

There's nothing super-wrong with the above code per se (which is Latin for "yes there is but I'm being polite"), but Swift gently encourages you to separate the filtering step from the processing step, making the code more readable and maintainable:

Swift: pseudo-code for new-style loop replacement
let affectedThings = allThings.filter { $0.passessSomeTest() } affectedThings.forEach { $0.doSomething() }

If we didn't want the list of 'affectedObjects' after this snippet we could make both examples smaller, of course, but this collection is also a common pattern — in order to redraw the affected objects, or add an undo action to them, or mark them as needing to be saved.

The first thing to notice is that in my written-just-to-illustrate-my-point example, I turned 8 lines of code into 2. And, as we all know, Shipley's first law is: "Less code is better code, no code is best code." (Shipley's n+1th law is that given the the nth law, the n+1th law is the same but said more menacingly.)

"Wow," you're thinking, "Wil, you are a genius! Your contrived example perfectly makes your point!" Oh shucks, you! I can't disagree, but in this one case the credit actually goes to all the language wonk folks who've been coming up with these ideas for the last thirty years while I dicked around making consumer software that you might consider buying because seriously I don't want to have to get a real job. Please, I'm too pretty to survive out there.

But there's something compelling about the Swift example besides its brevity: it more clearly expresses the two separate operations we're doing. In the Swift code, line 1 is filtering, line 2 is actioning. (Ok, that wasn't great English but you get the idea.)

In traditional "for each"-like loops (whether in Objective-C or Swift), we might have "if" & "continue" statements stuck at various points throughout the loop, so it's not clear at a single glance if the objects are being filtered, and, if they are, exactly how or how many times. But if you first build a list of objects and then actionize them (note to future self, you need to come up with better wordage here when you edit this, genius), then it's very clear what filter operations are being done, and thus it's much easier to maintain the code.

To belabor the point (and what's a blog but a belabor of love?), if I wanted to perform another action on all the filtered objects, in the first example I might accidentally stick my new action above the filtering lines "if / continue,” and it'll affect all the objects, and my code won't work right. (Ok, just squint pretend the loop is much bigger than this, obviously in a three-line example we aren't going to make those kinds of errors as much.) But in the Swift example you don't have as much opportunity to make this mistake. First, you'd have to not notice the word "filter" starting the block, and second, if you don't have the word "return" in your "filter {}" block (as I don't in this method), and you stick another line into it, the code won't even compile.



Contrived examples are all fun and stuff (I'm looking at you, Plato), but let's look at some of my actual real-life code. The two examples I could isolate aren't as clear as the made-up one above one, but it's a poor scientist who throws away data just because it doesn't match their every expectation (I'm looking at you, Andrew Wakefield).

So, first off, I had a straightforward method that finds the closest 2D "Intersection" object to the mouse that's within 'maximumDistance':

Old method to find nearest 2D intersection to mouse
public func intersectionNearPoint(point: CGPoint, within maximumDistance: CGFloat) -> Intersection? { var closestDistance = maximumDistance var closestIntersection: Intersection? for intersection in intersections { let thisDistance = (intersection.position - point).magnitude if thisDistance < closestDistance { closestDistance = thisDistance closestIntersection = intersection } } return closestIntersection }

Again, I don't think this code is horrible. But we strive for more than "not horrible." (Except if you're on a fixed-bid contract. JOKE! Sort of.) So let's bacon up this Swift:

More Swift-y method to find nearest 2D intersection to mouse
public func intersectionNearPoint(point: CGPoint, within maximumDistance: CGFloat) -> Intersection? { func distanceSquaredFromPoint(intersection: Intersection) -> CGFloat { return (intersection.origin - point).magnitudeSquared } return intersections.filter { return distanceSquaredFromPoint($0) <= (maximumDistance * maximumDistance) }.minElement { return distanceSquaredFromPoint($0) < distanceSquaredFromPoint($1) } }

We went from 10 lines of content to what WAS 3 (before I reformatted it for the tiny width of my blog, so now it's 8, dang it), so that's good! Less code, better code, yada yada. Anyhow, it's clearly less wordy, and to my eye it's easier to guess what it does at a glance — the 'filter' and 'minElement' functions give us great clues as to where we're going with the code. But, if you're like me (I know I am!), you're bugged that I'm calling distanceSquaredFromPoint() twice for the Intersections that are close enough to pass the first test. It seems inefficient, dang it!

As it happens, I know that it's very uncommon for Intersections to be clustered together in this application and the maximumDistance I'm looking is about 10 points, so the maximum number of Intersections I'd reasonably expect to pass the filter() is like 4, so we're maybe doing 4 extra hypot() calls behind the scenes. I'd wild-ass guess that even a slow old iPhone could do about forty million hypot() calls a second (iPhone 5 was several hundred MFLOPS, from my quick research). So, meh!

If you're an old fart, you're probably also bugged that you don't really know what's going on inside filter() and minElement(). Are they written efficiently? Does filter() allocate a bunch of memory? Because the first example sure didn't. Memory allocation is slow! What the heck kind of Mickey Mouse outfit do we have here?

Well, you need to calm down, Grandma/Grandpa. Did you see me run Instruments? I don't remember running Instruments. So we're not optimizing this code yet, because we haven't run Instruments on it, and it's not 1980 any more (seriously, check your calendar — I'll wait).

Remember that computers get faster and faster and faster and faster, and there aren't enough good programmers in the world, nor good code. So the most important thing is writing clear code that works well and will be easy to maintain and extend.

Follow this handy mnemonic, Johnny and/or Janey:
  1. Is the code as clear as possible? Yes? Good. Leave it alone.
  2. Did the code even show up in Instruments? No? LEAVE IT THE F ALONE.


Ok, let's do a final example — this one is to find a particular 3D "Piece" that's under the mouse, where each Piece has an associated SCNNode "node." Here's the naïve first version:

Original method to find 3D piece under mouse
public func pieceUnderLocationInView(locationInView: CGPoint) -> [(Piece, SCNHitTestResult)] { var piecesAndHitTestResults = [(Piece, SCNHitTestResult)]() for hit in sceneKitView!.hitTest(locationInView, options: nil) { for piece in pieces { if piece.model.node == hit.node { piecesAndHitTestResults.append((piece, hit)) break } } } return piecesAndHitTestResults }

Ok, that's a fine first pass, but what jumps out is how ugly it is to loop through all our pieces for every hit we get, to see if the hit node matches any piece's node. Let's try again, but this time caching the piece nodes for easier lookup, using some Swift-y goodness:

Slightly better method to find 3D piece under mouse
public func pieceUnderLocationInView(locationInView: CGPoint) -> [(Piece, SCNHitTestResult)] { let pieceNodes = pieces.map { $0.model.node! } var piecesAndHitTestResults = [(piece, SCNHitTestResult)]() for hit in sceneKitView!.hitTest(locationInView, options: nil) { if let pieceIndex = pieceNodes.indexOf(hit.node) { piecesAndHitTestResults.append((pieces[pieceIndex], hit)) } } return piecesAndHitTestResults }

Cool, we got to use the "map" function and we've eliminated a couple lines of code, that's a good day! Plus, IFF this function ever shows up in Instruments because we have, like, 20,000 "pieces", we can just convert 'pieceNodes' to a Set, and it'll essentially be O(#hits) again.

But can we go…deeper? Can we eliminate the loop, and eliminate the mutable array we're collecting values in? Chains of map(), filter(), and forEach() are cool because it makes it very clear what kinds of operation we're doing, in a way looping does not. A loop is a general construct, so if you see one it could be, say to sum 50 numbers, OR it could be to transform 50 numbers into 50 other objects — you don't know unless you look inside the loop. But when you see forEach(), map() or filter(), you know you're doing some kind of action, transform, or narrowing of a list, respectively. They make code more "glanceable," since apparently in this blog entry I'm just making up words.

So I tried to write a method that was even more Swift-y with the map()s and the filter()s, but at first I made kind of a mess:

Worse method to find 3D piece under mouse
public func pieceUnderLocationInView(locationInView: CGPoint) -> [(Piece, SCNHitTestResult)] { let pieceNodes = pieces.map { $0.model.node! } return sceneKitView!.hitTest(locationInView, options: nil).map { (hit) -> (Piece!, SCNHitTestResult) in if let pieceIndex = pieceNodes.indexOf(hit.node) { return (self.pieces[pieceIndex], hit) } else { return (nil, hit) } }.filter { $0 != nil }.map { ($0!, $1) } } }

Yipes. This try isn't a win for Swift or good sense—it's sure "Swift-y" but it's a lot more code and isn't readable. Where did I go wrong? There were two problems that conspired against me to make this code so bad: the first was Swift doesn't have a "filterAndMap()" function, where you can both eliminate some of the items in your list and transform them to something else. The second was that I was trying REALLY hard to not do the lookup of the hit node twice. But, wait, is the lookup really really a problem? I mean, did I run Instruments? Did you see me run Instruments? No. I was just being stupid.

Let's write this beautifully, people:

Best method to find 3D piece under mouse
public func pieceUnderLocationInView(locationInView: CGPoint) -> [(Piece, SCNHitTestResult)] { let pieceNodes = pieces.map { $0.model.node! } return sceneKitView!.hitTest(locationInView, options: nil) .filter { pieceNodes.contains($0.node) } .map { (pieces[pieceNodes.indexOf($0.node)!], $0) } }

Oh man, now we are talking. This is short and it's super-clear. Good jorb, the Swift team.



To conclude: map(), filter(), and forEach() make code both shorter and more readable-at-a-glance, which is good. This pattern can be done in Objective-C, but Swift makes it easy and beautiful.

As a sidebar about optimizing: don't optimize code until you've measured what's slow — just write good code as quickly as you can, so you have lots of time at the end to make the actual critical areas fast. But when first writing code we can't help but think about how slowly it'll run, so try to think of ⑴ how big the potential data set is normally (and in the most insane cases), ⑵ how often the particular code is run, and ⑶ how fast machines are. Then, if you think there will be a problem, feed your algorithm a bunch of data in a real-world situation and get an Instruments log.

For instance, if you are responding to a mouse click, consider how often a human being will click a mouse. Maybe once a second, with the maximum being 100 times a second for a super-mutant. These numbers aren't going to change much. Now, about any processor we use can do at least a billion instructions a second, so we have about 10,000,000 instructions we can use per mouse click. (Of course, if the user is waiting for feedback you won't have the full second — not acceptable to delay the animation of the button being pressed, for example.) Consider this if you're saying to yourself, "I have to optimize this, it's called once every mouseDown()!" Certainly you may have to optimize it. But not before testing.

To contrast, what if you're writing some code that's inside a mouse-tracking event loop? Well, mouse movement events could be in the hundreds per second, and every year we might expect them to be more precise and frequent. So estimate you've lost several orders of magnitude of instructions you can do without delaying the user: you should always Instrument your mouse dragging code.

Now, despite all my pretend egotistical bluster, I'll be the first to admit I'm new at Swift. So maybe you can make my sample code better? Please do! Since I don't love festering cesspools of disgust, I don't accept comments directly on this blog, but you can e-mail me at my initials at delicious-monster. (My middle initial is J. Please don't steal my identity.)

Labels: ,