If you need to symbolicate crash reports for your macOS or iOS app yourself, you’d probably use the atos command-line utility to get the job done. Apple has in-depth documentation on crash report symbolication, and it recommends atos in the section called Symbolicate the Crash Report with the Command Line.

Basically what atos does is:

  • take the dSYM file, the architecture name and the load address of the app’s binary image,
  • take the list of memory addresses to symbolicate,
  • return the list of symbols identifying the memory addresses in question.

If you have ever wanted to automate symbolication without relying on a third-party service, you might have noticed that atos is only available on macOS. This limits your options for server-side symbolication: you need to provision a Mac server, and these machines tend to be expensive and/or cumbersome to maintain.

However, it turns out that you can still symbolicate an Apple-platform crash report on Linux. It only takes a bit more Command Line Fu.

Sample app

In this article we’ll work with a sample macOS application that crashes. It contains just a single window and a button. Clicking the button attempts to update a non-existent text field:

import Cocoa

class ViewController: NSViewController {
    @inline(never)
    @IBAction func buttonAction(_ sender: Any?) {
        updateLabel()
    }

    @inline(never)
    private func updateLabel() {
        label.stringValue = "Boom"
    }

    private weak var label: NSTextField!
}

Note that the @inline(never) attribute is there to stop the compiler from inlining these functions. Inlining would collapse the stack trace and make this example harder to follow. The attribute should not be used like this in real-life code.

A sample crash stack trace (edited for brevity) looks like this:

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   SampleCrashingApp        0x1004aeb3c 0x1004ac000 + 11068
1   SampleCrashingApp        0x1004aeaf4 0x1004ac000 + 10996
2   SampleCrashingApp        0x1004aeab8 0x1004ac000 + 10936
3   AppKit                   0x185c9bbc4 -[NSApplication(NSResponder) sendAction:to:from:] + 460
    [... more AppKit frames]
16  AppKit                   0x185ec0848 -[NSApplication _handleEvent:] + 76
17  AppKit                   0x185a88618 -[NSApplication run] + 636
18  AppKit                   0x185a59d08 NSApplicationMain + 1132
19  SampleCrashingApp        0x1004aecf4 0x1004ac000 + 11508
20  dyld                     0x1006c1088 start + 516
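
As a rough sketch, the frames that still need symbolication can be picked out mechanically: they are the ones that end with a load address, a plus sign and an offset instead of a symbol name. The helper name unsymbolicated_frames is made up for this illustration, and the filter assumes the six-column layout shown above and image names without spaces.

```shell
# Hypothetical helper: read a stack trace on stdin and print
# "index image address load_address offset" for every frame that has a
# load address where a symbol name would normally be.
# Assumes the six-column layout shown above and image names without spaces.
unsymbolicated_frames() {
    awk 'NF == 6 && $3 ~ /^0x[0-9a-fA-F]+$/ && $4 ~ /^0x[0-9a-fA-F]+$/ && $5 == "+" {
        print $1, $2, $3, $4, $6
    }'
}

# Usage: a few frames from the sample stack trace. Only the two
# SampleCrashingApp frames are printed; the AppKit frame is skipped.
unsymbolicated_frames <<'EOF'
2   SampleCrashingApp        0x1004aeab8 0x1004ac000 + 10936
3   AppKit                   0x185c9bbc4 -[NSApplication(NSResponder) sendAction:to:from:] + 460
19  SampleCrashingApp        0x1004aecf4 0x1004ac000 + 11508
EOF
```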

Environment setup

In the absence of atos on Linux, you’ll need the following tools to get the job done:

  • llvm-nm: to find the base image address as stored in the dSYM file,
  • llvm-addr2line: to map symbol addresses to specific lines of code,
  • swift-demangle: to convert mangled Swift symbol names to a human-readable form.

The first two are included in the llvm package, which is available from the main package repositories of most Linux distributions, including Debian-based (e.g. Ubuntu) and RHEL-based (e.g. Amazon Linux) ones. The third one requires Swift to be installed on the system. Swift, in turn, is likely unavailable from your preferred distro’s package manager, so you’ll have to install it manually or use a Docker image (as listed on the Downloads page at swift.org).

Apart from the tools, you’ll need your crash report and your app’s debug symbols (as a dSYM bundle).

Symbolication

Note: always specify architecture

You must take note of the architecture specified in the crash report (x86_64 vs. arm64) and pass it to every symbolication-related command. This is because apps (and their dSYMs) can be fat binaries that contain separate machine code for each supported architecture, and the debug symbol addresses differ between architectures.

Get the binary image load address

Unsymbolicated stack frames are described using the binary image load address and an offset, for instance:

1   SampleCrashingApp        0x1004aeaf4 0x1004ac000 + 10996

Let’s have a look at the numbers on the right-hand side:

  • 0x1004aeaf4 is the symbol address,
  • 0x1004ac000 is the binary image load address,
  • 10996 is the symbol’s offset within the binary image.

The symbol address is the sum of the binary image load address and the symbol offset (hence the plus sign). Note that the left operand is a hexadecimal number while the right operand is decimal… 🤯 The binary image load address varies between application sessions, while symbol offsets are constant per binary image (i.e. the offset for a given symbol will be the same across application sessions).
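
If you want to double-check that relationship, shell arithmetic can do the mixed hexadecimal/decimal math for you. Here is a minimal sketch that recovers the decimal offset from the two hexadecimal addresses; the function name frame_offset is only an illustrative choice.

```shell
# Hypothetical helper: recover the decimal offset of a frame from its
# symbol address and the binary image load address.
# $1: symbol address, $2: load address (both hexadecimal with a 0x prefix)
frame_offset() {
    echo "$(( $1 - $2 ))"
}

frame_offset 0x1004aeaf4 0x1004ac000   # prints 10996, matching the frame above
```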

When using atos you need to pass the load address and the symbol address as parameters. With llvm-addr2line it’s a bit different:

  1. You only pass the symbol address, without the load address.
  2. The symbol address is different from the one specified in the crash report 💥. To be more specific, it’s the load address part that differs. Instead of an arbitrary, session-dependent value, it seems to be fixed to 0x100000000.

Arguably it makes more sense this way: after all, the tool operates on the dSYM file, so it looks up the same addresses for the same symbols every time. It is likely just a convenience of atos that it accepts the addresses as they appear in the crash report; under the hood it presumably subtracts the load address from the symbol address and works on the resulting offset.

Anyway, the 0x100000000 magic number is actually present in the symbol table of the dSYM (and of the app binary) as the address of the __mh_execute_header symbol, and it can be extracted with the following command:

$ llvm-nm -P --arch=arm64 \
    SampleCrashingApp.app.dSYM/Contents/Resources/DWARF/SampleCrashingApp \
    | grep __mh_execute_header

The above would output:

__mh_execute_header T 100000000 0
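
In a script, the base address can be pulled out of that output with a small filter. This is only a sketch: the helper name base_address is hypothetical, and it assumes the output format requested by -P as shown above, where the third column is the symbol value in hexadecimal without a 0x prefix.

```shell
# Hypothetical helper: read `llvm-nm -P` output on stdin and print the value
# of the __mh_execute_header symbol with a 0x prefix.
base_address() {
    awk '$1 == "__mh_execute_header" { print "0x" $3; exit }'
}

# Usage with the sample output from above; in practice you would pipe the
# llvm-nm command into the function instead of using echo.
echo '__mh_execute_header T 100000000 0' | base_address   # prints 0x100000000
```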

Resolve addresses to symbols

Knowing the normalized load address, we can proceed to computing symbol addresses and resolving them. A little calculation is required here. Let’s go back to our stack trace:

0   SampleCrashingApp        0x1004aeb3c 0x1004ac000 + 11068
1   SampleCrashingApp        0x1004aeaf4 0x1004ac000 + 10996
2   SampleCrashingApp        0x1004aeab8 0x1004ac000 + 10936

We no longer care about the absolute addresses and are only interested in the symbol offsets, i.e. 11068, 10996 and 10936 (decimals!). Those need to be added to 0x100000000, giving:

  • 0x100000000 + 11068 = 0x100002b3c
  • 0x100000000 + 10996 = 0x100002af4
  • 0x100000000 + 10936 = 0x100002ab8
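
The same calculation can be left to the shell, which sidesteps converting between hexadecimal and decimal by hand. A minimal sketch, assuming the base address 0x100000000 reported for __mh_execute_header; the function name dsym_address is hypothetical.

```shell
# Hypothetical helper: rebase a decimal frame offset onto the base address
# stored in the dSYM and print the result in hexadecimal.
# $1: base address (hexadecimal with a 0x prefix), $2: decimal offset
dsym_address() {
    printf '0x%x\n' "$(( $1 + $2 ))"
}

# Prints 0x100002b3c, 0x100002af4 and 0x100002ab8, one per line.
for frame_offset_value in 11068 10996 10936; do
    dsym_address 0x100000000 "$frame_offset_value"
done
```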

These addresses can now be passed to llvm-addr2line:

$ llvm-addr2line -f \
    --default-arch=arm64 \
    --obj SampleCrashingApp.app.dSYM/Contents/Resources/DWARF/SampleCrashingApp \
    0x100002af4

The tool outputs the source code location (file and line number) that corresponds to the address of the stack frame. Adding -f makes it also print the function name. Here is a sample output (always two lines per address):

$s17SampleCrashingApp14ViewControllerC11updateLabel33_CEBA0EBCB7099BA3FFA1062F19F801EELLyyF
/Users/user/code/SampleCrashingApp/SampleCrashingApp/ViewController.swift:18

The top line is a mangled Swift symbol, so it needs to be demangled in a separate command:

$ swift-demangle --simplified --compact \
    '$s17SampleCrashingApp14ViewControllerC11updateLabel33_CEBA0EBCB7099BA3FFA1062F19F801EELLyyF'

rendering the actual function name:

ViewController.updateLabel()

Output

After symbolicating all stack frames and gluing the data together, the stack trace symbolicated using LLVM on Linux looks like this:

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   SampleCrashingApp        Swift runtime failure: Unexpectedly found nil while implicitly unwrapping an Optional value (ViewController.swift:0)
1   SampleCrashingApp        ViewController.updateLabel() (ViewController.swift:18)
2   SampleCrashingApp        @objc ViewController.buttonAction(_:) (<compiler-generated>:0)

And here is the same stack trace symbolicated on macOS using atos:

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   SampleCrashingApp        ViewController.updateLabel() (in SampleCrashingApp) (ViewController.swift:18)
1   SampleCrashingApp        ViewController.updateLabel() (in SampleCrashingApp) (ViewController.swift:18)
2   SampleCrashingApp        @objc ViewController.buttonAction(_:) (in SampleCrashingApp) (<compiler-generated>:0)

That’s not bad already!
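
The gluing step itself is mostly string formatting. Below is a minimal sketch of how one output line could be assembled from the pieces produced by the earlier commands. The helper name format_frame and the column widths are my own choices, picked to resemble the layout above.

```shell
# Hypothetical helper: format one symbolicated frame.
# $1: frame index, $2: image name, $3: demangled function name,
# $4: source location as printed by llvm-addr2line (path:line)
format_frame() {
    location=${4##*/}   # strip the directory part, keep "file:line"
    printf '%-4s%-25s%s (%s)\n' "$1" "$2" "$3" "$location"
}

format_frame 1 SampleCrashingApp 'ViewController.updateLabel()' \
    '/Users/user/code/SampleCrashingApp/SampleCrashingApp/ViewController.swift:18'
```

Locations without a directory part, such as <compiler-generated>:0, pass through unchanged.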

Additional remarks

llvm-symbolizer

This tool ships with LLVM alongside llvm-addr2line and does roughly the same thing: it takes similar arguments and produces similar output. I only found it recently and still need to learn the differences between the two. From a quick check, llvm-symbolizer defaults to displaying inlined symbols, and it reports the source code column in addition to the line. Historically, llvm-addr2line was designed as a drop-in replacement for GNU addr2line. Picking one over the other may simply be a matter of preference.

Batch operations

Both llvm-addr2line and llvm-symbolizer accept multiple addresses in a single invocation, so all frames of a crash report can be resolved with one command.
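
When several addresses are passed to llvm-addr2line -f, the two-line records described earlier follow one another in input order. The sketch below joins each pair into a single tab-separated line, which is easier to process further. It assumes exactly two output lines per address (so no inlined frames), and the helper name pair_records is made up for this example.

```shell
# Hypothetical helper: join the two-line records printed by
# `llvm-addr2line -f` into single "function<TAB>location" lines.
# Assumes exactly two output lines per address.
pair_records() {
    paste - -
}

# In practice the input would come from a command along the lines of:
#   llvm-addr2line -f --default-arch=arm64 --obj <path to DWARF file> \
#       0x100002b3c 0x100002af4 0x100002ab8 | pair_records
# Here the sample output shown earlier stands in for the tool.
pair_records <<'EOF'
$s17SampleCrashingApp14ViewControllerC11updateLabel33_CEBA0EBCB7099BA3FFA1062F19F801EELLyyF
/Users/user/code/SampleCrashingApp/SampleCrashingApp/ViewController.swift:18
EOF
```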

Sample Dockerfile

This Dockerfile contains all the software required to symbolicate Apple crash reports on Linux:

FROM swiftlang/swift:nightly-5.6-amazonlinux2
RUN yum -y update && yum -y install llvm

It uses one of the official Swift Docker images.