All Your Go Binaries are Belong to Us
To some, the skill of binary analysis may appear to be reserved for a few undead souls. While reading data structures straight out of a raw hex dump may look like a form of dark arts, it shouldn't even qualify as a party trick. To quote @BizTheDeveloper’s mother, “… reading a hex dump is not that hard…”
Now, the goal of this post is not to turn the reader into a hex dump magician. Instead, I want to show that binary analysis is all about data parsing. If you are a Go developer who is not interested in analyzing Go binaries, this post will still have something for you. Did you know that “Go binary analysts” can tell whether you organize your source code neatly or just dump everything into one file?
In this post, we will see how to extract some of the hidden metadata available in the binaries produced by the Go compiler. With the extracted data, we will look at a few use cases, including how it can be used to determine whether an application uses a vulnerable dependency. The goal is to be able to do this on “production builds,” so we are targeting stripped binaries. This means we can't depend on the debug information or symbols that the compiler normally includes in binaries. It may seem like we need them, but in fact, we don't. Finally, let's limit ourselves to the standard library and, optionally, a “golang.org/x” package.
The Debug Package
A package that anyone who wants to analyze Go binaries should know about is the debug package in the standard library. It provides sub-packages to parse ELF (Linux and Unix), PE (Windows), and Mach-O (macOS) files. In addition, it has useful functions for parsing some of the data structures we need to process. A nice property of these functions is that they do not depend on debug information or symbols, which means we can use them on stripped “production builds.” Finding the structures is a bit harder, but unfortunately there is no way around that.
The metadata table we are going to focus on in this section is the PCLNTAB. The PCLNTAB was added in Go 1.2 and holds the data needed for Go’s panic messages. The table maps between the program counter (the location of an assembly instruction) and the source code file and line number, which makes panic messages more developer-friendly because they include where in the source code the panic occurred. The table also contains information about every function, including functions that have been eliminated by the compiler as part of the dead code elimination step. The goal of parsing this table is to recover all the packages in the Go binary.
Before we can process the table, we need to find it. This is very easy for ELF and Mach-O files because the table is located in its own section, called .gopclntab (__gopclntab in Mach-O files). For PE files, the process is less straightforward. The table is usually located in the .rdata or .text section of the PE file, and it starts with a magic value that can be used to locate it. For binaries compiled with Go 1.2 up to, but excluding, 1.16, the magic value is 0xfffffffb. For Go 1.16 and 1.17, the magic value is 0xfffffffa; it changed again to 0xfffffff0 in Go 1.18 and to 0xfffffff1 in Go 1.20. To make sure a match is not a false positive, we can apply the same checks that the parser function uses to validate the table header.
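The magic-value search can be sketched in Go as follows. findPclntab and plausibleHeader are illustrative names, not library functions, and the sketch assumes a little-endian binary (true for the common amd64, 386, and arm64 targets). The list of magic values covers the four that debug/gosym recognizes, from Go 1.2 through Go 1.20 and later. A hit is only accepted if the header passes the sanity checks debug/gosym applies: two zero bytes after the magic, a PC quantum of 1, 2, or 4, and a pointer size of 4 or 8.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// pclntabMagics lists the known pclntab magic values:
// Go 1.2-1.15, Go 1.16-1.17, Go 1.18-1.19, and Go 1.20 and later.
var pclntabMagics = []uint32{0xfffffffb, 0xfffffffa, 0xfffffff0, 0xfffffff1}

// plausibleHeader applies the sanity checks debug/gosym performs on the
// table header: two zero pad bytes after the magic, a PC quantum of 1, 2,
// or 4, and a pointer size of 4 or 8.
func plausibleHeader(h []byte) bool {
	if len(h) < 16 || h[4] != 0 || h[5] != 0 {
		return false
	}
	quantum, ptrSize := h[6], h[7]
	return (quantum == 1 || quantum == 2 || quantum == 4) &&
		(ptrSize == 4 || ptrSize == 8)
}

// findPclntab returns the offset of the earliest plausible pclntab header in
// data, or -1 if there is none. It assumes a little-endian binary.
func findPclntab(data []byte) int {
	best := -1
	for _, m := range pclntabMagics {
		var magic [4]byte
		binary.LittleEndian.PutUint32(magic[:], m)
		for off := 0; off < len(data); {
			i := bytes.Index(data[off:], magic[:])
			if i < 0 {
				break
			}
			pos := off + i
			if plausibleHeader(data[pos:]) {
				if best < 0 || pos < best {
					best = pos
				}
				break
			}
			off = pos + 1 // false positive: keep searching after it
		}
	}
	return best
}

func main() {
	// Three junk bytes followed by a Go 1.16-style header (quantum 1,
	// pointer size 8) and some zero padding.
	section := []byte{0x90, 0x90, 0x90, 0xfa, 0xff, 0xff, 0xff, 0x00, 0x00, 0x01, 0x08}
	section = append(section, make([]byte, 8)...)
	fmt.Println(findPclntab(section)) // prints 3
}
```

In practice, data would be the contents of a PE section such as .rdata, obtained with the Data method of a *pe.Section, and the returned offset is relative to the start of that section.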
To parse the table, we will use the debug/gosym package. First, we need to create a LineTable with the function NewLineTable. Using the LineTable, we can create a symbol table with the function NewTable. NewTable also accepts a byte slice holding the symbol table. This argument can be nil, which is convenient since the symbol table is not available in stripped binaries.
lt := gosym.NewLineTable(lntabBuf, textStart)
tab, err := gosym.NewTable(nil, lt)
The definition of the returned structure is shown in the code snippet below. For our purposes, only one of its exported fields is really useful: Funcs, which is a slice of Func.
type Table struct {
	Syms  []Sym // nil for Go 1.3 and later binaries
	Funcs []Func
	Files map[string]*Obj // for Go 1.2 and later all files map to one Obj
	Objs  []Obj           // for Go 1.2 and later only one Obj in slice
	// contains filtered or unexported fields
}
The Func type holds information about a single function, including its entry point and where it ends. These are the virtual addresses the code occupies once the file is loaded into memory for execution, not offsets from the beginning of the file. If the function has been eliminated by the compiler, the entry has a value of uint64(0).
The Func structure also includes a pointer to the underlying symbol. Via the Sym, we can access the name of the function via the method BaseName and the package it belongs to via the method PackageName. If the Func is a method attached to a type, the method ReceiverName returns the string name of the receiver. Otherwise, it returns an empty string.
Using this information, we can iterate through all functions to discover which packages are used and which functions are reachable (according to the Go compiler’s dead code elimination logic). Additional heuristics can then sort the packages into the standard library, dependencies, and the main module.
Build Information
With the introduction of the module system, there is another way of enumerating the packages used when compiling a Go program. While this information is only available in binaries compiled with Go modules enabled, the data is richer. The build info structure is available as a separate section in ELF and Mach-O files; in ELF files, the section is named .go.buildinfo. In PE files, the data is stored inside one of the data sections. It is a small, 32-byte data structure, and an example is shown in the hex dump below. The first 16 bytes are the structure header, which starts with the 14-byte magic value \xff Go buildinf:. The next byte is either 0x4 or 0x8 and gives the pointer size in bytes. The last byte in the header indicates the endianness: 0x0 means little-endian, while 0x1 means big-endian.
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x0086e000 ff20 476f 2062 7569 6c64 696e 663a 0800 . Go buildinf:..
0x0086e010 30a3 8800 0000 0000 70a3 8800 0000 0000 0.......p.......
Following the header are two pointers to two Go strings. Under the hood, a Go string is a structure with a pointer to the start of the data as the first field and a length of the data as the second field. The first string is the version of the compiler that was used to compile the binary. The second string is the build information. This string is only available if the project was compiled with go modules enabled.
The runtime/debug package has the logic to parse the build info string into the BuildInfo structure shown in the snippet below. The structure essentially holds the information from the go.mod file plus the checksums from go.sum.
// BuildInfo represents the build information read from
// the running binary.
type BuildInfo struct {
	Path string    // The main package path
	Main Module    // The module containing the main package
	Deps []*Module // Module dependencies
}

// Module represents a module.
type Module struct {
	Path    string  // module path
	Version string  // module version
	Sum     string  // checksum
	Replace *Module // replaced by this module
}
The runtime code for parsing this data structure has seen some changes over the last couple of months. The logic has also been added as a sub-package of the debug package (debug/buildinfo), which can be expected to be part of the Go 1.18 release. The structure itself has changed too: a GoVersion field has been added, along with a Settings field that holds more information about the build environment.
// BuildInfo represents the build information read from a Go binary.
type BuildInfo struct {
	GoVersion string         // Version of Go that produced this binary.
	Path      string         // The main package path
	Main      Module         // The module containing the main package
	Deps      []*Module      // Module dependencies
	Settings  []BuildSetting // Other information about the build.
}

// Module represents a module.
type Module struct {
	Path    string  // module path
	Version string  // module version
	Sum     string  // checksum
	Replace *Module // replaced by this module
}

// BuildSetting describes a setting that may be used to understand how the
// binary was built. For example, VCS commit and dirty status is stored here.
type BuildSetting struct {
	// Key and Value describe the build setting. They must not contain tabs
	// or newlines.
	Key, Value string
}
Until Go 1.18 is released, we can simply copy the parsing code from the runtime and use it to turn the string into a useful data structure. With this information, we get the module versions and can easily detect which packages are part of a dependency module.
Vulnerability Scanner
In this section, we are going to see how the extracted information can be used to design a vulnerability scanner. There is currently an ongoing project to develop a vulnerability database for Go, along with code for checking against it. Instead of working with the source code, our scanner works with the compiled artifacts, allowing users of a Go application to check it for vulnerabilities. For our example, we will use a vulnerability reported in the gopkg.in/yaml.v2 module. The code snippet below shows the data available in the vulnerability database.
module: gopkg.in/yaml.v2
additional_packages:
  # all of the incompatible versions of github.com/go-yaml/yaml
  # are affected
  - module: github.com/go-yaml/yaml
    symbols:
      - decoder.unmarshal
versions:
  - fixed: v2.2.3
description: |
  Due to unbounded alias chasing, a maliciously crafted YAML file
  can cause the system to consume significant system resources. If
  parsing user input, this may be used as a denial of service vector.
published: 2021-04-14T12:00:00Z
credit: "@simonferquel"
symbols:
  - decoder.unmarshal
links:
  pr: https://github.com/go-yaml/yaml/pull/375
  commit: https://github.com/go-yaml/yaml/commit/bb4e33bf68bf89cad44d386192cbed201f35b241
In addition to the module name gopkg.in/yaml.v2, we also have the fixed version v2.2.3 and the affected symbol decoder.unmarshal. From this, we can deduce that the unmarshal method on the decoder type is vulnerable in versions prior to v2.2.3. With this information, we can construct the following logic to check for this vulnerability:
- Extract the build information to see if the binary uses a version of gopkg.in/yaml.v2 earlier than v2.2.3. If not, report it as not vulnerable.
- Search for a function with the package name gopkg.in/yaml.v2, the receiver decoder, and the name unmarshal.
- Check whether the found function has a non-zero Entry field. If it does, report the binary as vulnerable. Otherwise, or if no such function exists, report it as not vulnerable.
Now, this approach is not perfect. We are relying on the Go compiler’s dead code elimination to remove code that is not reachable. It is possible that the code is never executed but the compiler failed to eliminate it, which would result in a false positive. False positives could be reduced further by building a call graph to determine whether the code is actually reachable, but that is out of scope for this post.
Source Code Map
With the data extracted using the debug package, there are some other interesting things we can do. Remember that the PCLN table is used to map a program counter to a specific line in a source code file. This means that the binary contains information about the structure of the source code. All we need to do is extract it and present it in a friendlier way. Since I have described the process in a previous blog post, this will be a summarized version.
In an earlier section, we introduced the Func type, which holds symbol information about a function. Two of its fields, Entry and End, hold the addresses where the function's code starts and ends. This means we know where the first instruction starts and the last instruction ends for the function. The Table structure has a method, PCToLine, for resolving a program counter to both a line number and a source code file name. This gives us a way to determine the first and last line numbers of a function in the source code. One may naively assume that we can just pass the Entry and End values to this method and get what we want. Unfortunately, this is not the case. The entry works fine, but issues sometimes occur with the last instruction. The compiler adds code to the end of each function (this code requests more stack space), which can throw off the information. So the best we can do is estimate where the end is by checking all instructions in the function and assuming that the largest line number in the same file as the starting line is the end of the function. This method isn't perfect, as inlined functions can break this assumption.
The next step is to get the location of each instruction in the function. For x86, this isn't straightforward because instructions can be anywhere from 1 to 15 bytes long. This means that the only reliable way of getting the location of each instruction is to disassemble the code. The Go team maintains a package named arch (golang.org/x/arch) that can disassemble x86, ARM, and PowerPC; it is used by the go tool objdump command. Another hack is to simply assume that each instruction is, for example, four bytes long and resolve the line for every four bytes. The program counter to line mapping function does not care if the passed-in program counter points into the middle of an instruction…
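The estimation described above, combined with the fixed-step hack, can be sketched as follows. lineRange is an illustrative name, and its pcToLine parameter is an assumed stand-in for (*gosym.Table).PCToLine with the *Func result dropped, which keeps the sketch testable without a real binary. It probes the function every step bytes and keeps the largest line found in the same file as the entry point.

```go
package main

import "fmt"

// lineRange estimates the file and the first and last source lines of a
// function spanning [entry, end). Instead of trusting the line of the last
// instruction, it probes every step bytes and keeps the largest line found
// in the same file as the entry point.
func lineRange(entry, end, step uint64, pcToLine func(pc uint64) (string, int)) (file string, first, last int) {
	if step == 0 {
		step = 1
	}
	file, first = pcToLine(entry)
	last = first
	for pc := entry; pc < end; pc += step {
		if f, l := pcToLine(pc); f == file && l > last {
			last = l
		}
	}
	return file, first, last
}

func main() {
	// Hypothetical mapping for a function at 0x1000-0x1040: lines 10-14 of
	// main.go, some inlined code from util.go, and a trailing stack-growth
	// stub that maps back to line 10.
	fake := func(pc uint64) (string, int) {
		switch {
		case pc < 0x1010:
			return "main.go", 10
		case pc < 0x1020:
			return "util.go", 99
		case pc < 0x1030:
			return "main.go", 14
		default:
			return "main.go", 10
		}
	}
	file, first, last := lineRange(0x1000, 0x1040, 4, fake)
	fmt.Printf("%s: lines %d to %d (%d)\n", file, first, last, last-first)
	// prints: main.go: lines 10 to 14 (4)
}
```

Against a real binary, pcToLine can wrap tab.PCToLine (discarding the *Func it also returns), and entry and end come from a Func's Entry and End fields. A step of 1 probes every byte, trading speed for certainty that no instruction is skipped.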
With the file names and line numbers extracted, we can organize the data and present it. Here, for example, is the extracted data for a gofmt binary, showing only the main module. The first line gives the name of the package and the path to its folder at compile time. Next, each file is listed. Under each file, the functions are listed in order of their starting line numbers. Each function line also includes the estimated ending line of the function and the estimated number of lines of code (the starting line is the function definition). In the code snippet below, we can see that one file is named <autogenerated>. This is for code generated by the compiler. Another thing we can see in the output is functions with dwrap in the name, for example, processFile·dwrap·1. These are functions generated by the compiler for defer calls.
Package main: /usr/lib/go/src/cmd/gofmt
File: <autogenerated>
	(*simplifier)Visit Lines: 1 to 1 (0)
File: gofmt.go
	init Lines: 29 to 53 (24)
	usage Lines: 64 to 76 (12)
	isGoFile Lines: 76 to 83 (7)
	processFile Lines: 83 to 163 (80)
	processFile·dwrap·1 Lines: 90 to 166 (76)
	visitFile Lines: 166 to 176 (10)
	main Lines: 176 to 184 (8)
	gofmtMain Lines: 184 to 232 (48)
	gofmtMain·dwrap·2 Lines: 195 to 234 (39)
	diffWithReplaceTempFile Lines: 234 to 251 (17)
	replaceTempFilename Lines: 251 to 276 (25)
	backupFile Lines: 276 to 298 (22)
File: internal.go
	parse Lines: 23 to 94 (71)
	parsefunc1 Lines: 45 to 69 (24)
	parsefunc2 Lines: 69 to 80 (11)
	format Lines: 94 to 175 (81)
File: rewrite.go
	initRewrite Lines: 19 to 32 (13)
	initRewritefunc1 Lines: 31 to 38 (7)
	parseExpr Lines: 38 to 57 (19)
	rewriteFile Lines: 57 to 81 (24)
	rewriteFilefunc1 Lines: 64 to 85 (21)
	set Lines: 85 to 117 (32)
	setfunc1 Lines: 90 to 99 (9)
	apply Lines: 117 to 152 (35)
	isWildcard Lines: 152 to 160 (8)
	match Lines: 160 to 248 (88)
	subst Lines: 248 to 308 (60)
File: simplify.go
	simplifierVisit Lines: 15 to 130 (115)
	simplifiersimplifyLiteral Lines: 102 to 133 (31)
	simplify Lines: 133 to 141 (8)
	removeEmptyDeclGroups Lines: 141 to 152 (11)
	isEmpty Lines: 152 to 164 (12)
One may wonder what this information can be used for. One use is detecting changes in Go applications. Another is the analysis of suspicious Go binaries, for example, identifying whether an application is an open-source program or malware. The snippet below is from a ransomware called Snatch that has been around for a few years. From the function names, we can get an idea of what the binary might be doing: some suggest scanning folders for files, while others suggest encrypting them. One thing this ransomware does is install itself as a Windows service that is started in Safe Mode. After installing itself as a service, it reboots the machine into Safe Mode. We can see function names in the output that suggest this behavior.
Package main: /home/go/src/locker
File: config.go
	init Lines: 39 to 246 (207)
File: dirs.go
	scanDir Lines: 10 to 85 (75)
	scanDirfunc1 Lines: 11 to 13 (2)
File: files.go
	encryptFile Lines: 13 to 120 (107)
	encryptFilefunc1 Lines: 14 to 16 (2)
File: main.go
	main Lines: 24 to 91 (67)
	runInstance Lines: 91 to 107 (16)
File: misc.go
	decodeString Lines: 15 to 37 (22)
	makeFile Lines: 37 to 60 (23)
	makeFilefunc1 Lines: 43 to 68 (25)
	makeBatFile Lines: 60 to 67 (7)
	runBatFile Lines: 67 to 92 (25)
	runBatFilefunc1 Lines: 68 to 70 (2)
	isSafeBoot Lines: 92 to 115 (23)
	deleteShadowCopy Lines: 115 to 135 (20)
	selfRemove Lines: 135 to 158 (23)
	randomBatFileName Lines: 158 to 171 (13)
	Copy Lines: 171 to 188 (17)
File: queue.go
	(*Queue)Push Lines: 20 to 33 (13)
	(*Queue)Pop Lines: 33 to 45 (12)
	runWorkers Lines: 45 to 64 (19)
File: services.go
	(*myService)Execute Lines: 13 to 56 (43)
	getServicesNamesList Lines: 56 to 88 (32)
	stopServices Lines: 88 to 113 (25)
	setupServiceSafeBoot Lines: 113 to 138 (25)
	safeModeEnabled Lines: 138 to 161 (23)
	installService Lines: 161 to 188 (27)
	reboot Lines: 188 to 204 (16)
Wrap-up
There is much more metadata that can be extracted from Go binaries, for example, types and the build ID. If we were to cover all of it, this post would be way too long. Hopefully, this has shown that Go binaries are rich with metadata and that binary analysis isn't that bad. For readers who would like to do some binary analysis of their own but do not want to write all the code to extract the data, there are libraries available. The Go Reverse Engineering Tool Kit can extract all the data covered in this post, leaving only your imagination to limit what you do with it.