Posted Tue Dec 21, 2021 •  Reading time: 27 minutes

Analysing Big Code Bases

In the old days we had only monoliths, which usually contained a lot of spaghetti code that slowly but surely turned any project into a big ball of mud. Now we have micro services everywhere, and they tend to lead to spaghetti architecture. It seems that we just have to live with some sort of spaghetti and can only choose its form.

Micro services are completely independent of each other and usually just held together by some Kubernetes configuration files. Such a system is very hard to analyze and refactor since no shiny IDE with proper tooling exists for it.

Within a service it is much easier to move functionality around and analyze the structure since Go is a proper programming language with great support for tooling. So for the rest of this article we will focus on analysing a single project.

The Directory Tree

We usually start to approach a new project by looking at its Go doc (e.g. for the standard library). Unfortunately most companies don’t run a Go doc server, and most devs would rather look at the comments in the code itself anyway. Only the list page is needed to get an overview.

Fortunately we can create exactly that with a little open source command line tool: spaghetti-analyzer. It can simply be called with

spaghetti-analyzer -t

from the root of the project. This creates a simple text file: dirtree.txt.

The file contains no links or formatting but can be included in a README.md file easily as a code block:

service
├── app -	Package app contains turns the business functionality into an application.
│   ├── services
│   │   ├── metrics -	Package metrics collects and publishes metrics.
│   │   │   ├── collector -	Package collector is a simple collector for metrics.
│   │   │   └── publisher -	Package publisher manages the publishing of metrics.
│   │   │       ├── datadog -	Package datadog provides support for publishing metrics to DD.
│   │   │       └── expvar -	Package expvar manages the publishing of metrics to stdout.
│   │   └── sales-api -	Package sales-api serves the business functionality via a HTTP API.
│   │       ├── handlers -	Package handlers manages the different versions of the API.
│   │       │   ├── debug -	
│   │       │   │   └── checkgrp -	Package checkgrp maintains the group of handlers for health checking.
│   │       │   └── v1 -	Package v1 contains the full set of handler functions and routes supported by the v1 web api.
│   │       │       ├── productgrp -	Package productgrp maintains the group of handlers for product access.
│   │       │       └── usergrp -	Package usergrp maintains the group of handlers for user access.
│   │       └── tests -	
│   │           └── v1 -	Package v1 ...
│   └── tooling
│       ├── logfmt		-	This program takes the structured log output and makes it readable.
│       └── sales-admin -	This program performs administrative tasks for the garage sale service.
│           └── commands -	Package commands contains the functionality for the set of commands currently supported by the CLI tooling.
.
.
.

The result can be changed directly but you would have to maintain those changes long term. It’s usually much better to edit the package level Go documentation if Go files exist or just create a doc.go file in the directory for this.
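A doc.go file needs nothing more than a package comment and the package clause. The following minimal sketch parses such a file from a string and prints the first sentence of its comment. It assumes that the description shown in the directory tree corresponds to that first sentence (the usual Go doc synopsis); the comment text for the debug package is purely hypothetical.

```go
package main

import (
	"fmt"
	"go/doc"
	"go/parser"
	"go/token"
)

// docGo is the content of a hypothetical doc.go file for a directory that
// has no package documentation yet. The comment text is made up.
const docGo = `// Package debug groups the handlers that are only meant for debugging.
//
// More detailed explanations can follow after the first sentence.
package debug
`

// synopsis parses Go source code and returns the first sentence of its
// package comment, or an empty string if there is no package comment.
func synopsis(src string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "doc.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	if f.Doc == nil {
		return "", nil
	}
	return doc.Synopsis(f.Doc.Text()), nil
}

func main() {
	s, err := synopsis(docGo)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s)
}
```

Keeping the first sentence short and meaningful pays off because it is the part that fits on a single line of an overview.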

This gives us a nice overview and it helps a lot to find the right package for some functionality. But it doesn’t tell us how packages relate to each other.

Package Statistics

For getting a first impression of the relationships between packages we can create a package_statistics.md file with the same tool:

spaghetti-analyzer -s

This file is markdown that contains information about how all the packages relate to each other including links to all the related packages. It starts with a table of all the packages that use any other package with links to detailed information. All information is geared towards finding and building packages that encapsulate or hide packages from the rest of the project. This way simple and lean modularization is supported.

You can find the generated file here and it looks like this (the file is shortened and headings are changed so it fits all into this section):


Package Statistics

Legend
  • package - name of the internal package without the part common to all packages.
  • type - type of the package:
  • direct deps - number of internal packages directly imported by this one.
  • all deps - number of transitive internal packages imported by this package.
  • users - number of internal packages that import this one.
  • max score - sum of the numbers of packages hidden from user packages.
  • min score - number of packages hidden from all user packages combined.
Package app/services/metrics
Direct Dependencies (Imports) Of Package app/services/metrics

app/services/metrics/collector, app/services/metrics/publisher, app/services/metrics/publisher/expvar, foundation/logger

All (Including Transitive) Dependencies (Imports) Of Package app/services/metrics

app/services/metrics/collector, app/services/metrics/publisher, app/services/metrics/publisher/expvar, foundation/logger

Package app/services/sales-api
Direct Dependencies (Imports) Of Package app/services/sales-api

app/services/sales-api/handlers, business/sys/auth, business/sys/database, foundation/keystore, foundation/logger

All (Including Transitive) Dependencies (Imports) Of Package app/services/sales-api

app/services/sales-api/handlers, app/services/sales-api/handlers/debug/checkgrp, app/services/sales-api/handlers/v1, app/services/sales-api/handlers/v1/productgrp, app/services/sales-api/handlers/v1/usergrp, business/core/product, business/core/product/db, business/core/user, business/core/user/db, business/sys/auth, business/sys/database, business/sys/metrics, business/sys/validate, business/web/v1, business/web/v1/mid, foundation/keystore, foundation/logger, foundation/web

Package app/services/sales-api/handlers
Direct Dependencies (Imports) Of Package app/services/sales-api/handlers

app/services/sales-api/handlers/debug/checkgrp, app/services/sales-api/handlers/v1, business/sys/auth, business/web/v1/mid, foundation/web

All (Including Transitive) Dependencies (Imports) Of Package app/services/sales-api/handlers

app/services/sales-api/handlers/debug/checkgrp, app/services/sales-api/handlers/v1, app/services/sales-api/handlers/v1/productgrp, app/services/sales-api/handlers/v1/usergrp, business/core/product, business/core/product/db, business/core/user, business/core/user/db, business/sys/auth, business/sys/database, business/sys/metrics, business/sys/validate, business/web/v1, business/web/v1/mid, foundation/web

Packages Using (Importing) Package app/services/sales-api/handlers

app/services/sales-api

Packages Shielded From Users Of Package app/services/sales-api/handlers
Packages Shielded From All Users Of Package app/services/sales-api/handlers

app/services/sales-api/handlers/debug/checkgrp, app/services/sales-api/handlers/v1, app/services/sales-api/handlers/v1/productgrp, app/services/sales-api/handlers/v1/usergrp, business/core/product, business/core/product/db, business/core/user, business/core/user/db, business/sys/metrics, business/sys/validate, business/web/v1, business/web/v1/mid

. . .

The full file contains many links, so we can click around in it to understand the connections and build a mental model. This helps a lot in analyzing a big project and in keeping it maintainable.

Dependency Tables

Once you have discovered and established a good project structure, it is worth documenting it in a simpler way, especially for new team members.

In general there are two well understood ways to document dependencies:

  • Blobs and arrows are very intuitive but don’t scale well over 10 nodes. It just gets too hard to follow the arrows.
  • Dependency tables (aka design structure matrix) are a bit less intuitive but still easy to understand. And they scale much better.

So we want to generate a dependency table, and thankfully they are really easy to create in markdown, too. The spaghetti-analyzer tool helps us again:

spaghetti-analyzer -d "app/services/sales-api"

The parameter is the main Go package that you want to generate documentation for.

You can find the generated file here and it looks like this (the file is shortened and headings are changed so it fits all into this section):


Dependency Table For: github.com/ardanlabs/service/app/services/sales-api

app/services/sales-api/handlers - S | app/services/sales-api/handlers/debug/checkgrp - S | app/services/sales-api/handlers/v1 - S | app/services/sales-api/handlers/v1/productgrp - S | app/services/sales-api/handlers/v1/usergrp - S | business/core/product - S | business/core/product/db - S | business/core/user - S | business/core/user/db - S | business/sys/auth - S | business/sys/database - S | business/sys/metrics - S | business/sys/validate - S | business/web/v1 - S | business/web/v1/mid - S | foundation/keystore - S | foundation/logger - S | foundation/web - S
app/services/sales-api S S S S S
app/services/sales-api/handlers S S S S S
app/services/sales-api/handlers/debug/checkgrp S
app/services/sales-api/handlers/v1 S S S S S S S
app/services/sales-api/handlers/v1/productgrp S S S S
app/services/sales-api/handlers/v1/usergrp S S S S
business/core/product S S S
business/core/product/db S
business/core/user S S S S
business/core/user/db S
business/sys/database S
business/web/v1/mid S S S S S

Legend

  • Rows - Importing packages
  • Columns - Imported packages

So the rows contain the packages that import other packages, like the main package app/services/sales-api, which isn’t imported by any other package. The columns contain the packages being imported, like the tool package business/sys/auth, which doesn’t import any other package of the project. Many packages, like app/services/sales-api/handlers, are both a row entry and a column entry. We don’t make use of the package types since we didn’t provide any configuration. Configuring them would allow us to document our findings a bit better; this is explained in the documentation of the companion spaghetti-cutter project.

The table makes it quite easy to see, for example, that the packages

  • foundation/web,
  • foundation/logger,
  • foundation/keystore,
  • business/web/v1,
  • business/sys/auth,
  • business/sys/metrics and
  • business/sys/validate

are tool packages that don’t import any other package.

On the other hand, the packages

  1. app/services/sales-api,
  2. app/services/sales-api/handlers and
  3. app/services/sales-api/handlers/v1

are at the top of the stack, orchestrating other packages: they import a lot of other packages and each is imported only by the one before it. So app/services/sales-api/handlers/v1 is only imported by app/services/sales-api/handlers, and app/services/sales-api/handlers itself is only imported by app/services/sales-api. That one is the main package, which isn’t imported at all.

Analyzing A Big Code Base

Now let’s see how well this scales to a bigger code base. Unfortunately there isn’t any bigger open source Go code base for business software. Things like the Go compiler or Kubernetes are highly technical and very different from the things most readers and I are working on every day. The best candidate that I could find is the Prometheus monitoring system. I found multiple projects that seem to be Prometheus and chose the one that seems to contain the most complexity.

So the command

spaghetti-analyzer -t

gives us this (full output here):

prometheus -	Package prometheus ...
├── cmd -	
│   ├── prometheus -	The main package for the Prometheus server executable.
│   └── promtool -	Package promtool ...
├── config -	Package config ...
├── console_libraries -	
├── consoles -	
├── discovery -	Package discovery ...
│   ├── aws -	Package aws ...
│   ├── azure -	Package azure ...
│   ├── consul -	Package consul ...
│   ├── digitalocean -	Package digitalocean ...
│   ├── dns -	Package dns ...
│   ├── eureka -	Package eureka ...
│   ├── file -	Package file ...
│   │   └── fixtures -	
│   ├── gce -	Package gce ...
│   ├── hetzner -	Package hetzner ...
│   ├── http -	Package http ...
│   │   └── fixtures -	
│   ├── install -	Package install has the side-effect of registering all builtin service discovery config types.
│   ├── kubernetes -	Package kubernetes ...
│   ├── legacymanager -	Package legacymanager ...
│   ├── linode -	Package linode ...
│   ├── marathon -	Package marathon ...
│   ├── moby -	Package moby ...
│   ├── openstack -	Package openstack ...
│   ├── puppetdb -	Package puppetdb ...
│   │   └── fixtures -	
│   ├── refresh -	Package refresh ...
│   ├── scaleway -	Package scaleway ...
│   ├── targetgroup -	Package targetgroup ...
│   ├── triton -	Package triton ...
│   ├── uyuni -	Package uyuni ...
│   ├── xds -	Package xds ...
│   └── zookeeper -	Package zookeeper ...
.
.
.

So we immediately see … that the project could use more package level documentation. Just like most commercial business software. But that is easy and quick enough to change.

More interesting right now are the other two views. The command

spaghetti-analyzer -s

results in the dependency statistics. Those look quite similar to the ones for the smaller project, so I won’t copy them here.

The dependency table is easy to create, too:

spaghetti-analyzer --doc='cmd/prometheus'

You can find the result here; it looks like this:


Dependency Table For: github.com/prometheus/prometheus/cmd/prometheus

config - S | discovery - S | discovery/aws - S | discovery/azure - S | discovery/consul - S | discovery/digitalocean - S | discovery/dns - S | discovery/eureka - S | discovery/file - S | discovery/gce - S | discovery/hetzner - S | discovery/http - S | discovery/install - S | discovery/kubernetes - S | discovery/legacymanager - S | discovery/linode - S | discovery/marathon - S | discovery/moby - S | discovery/openstack - S | discovery/puppetdb - S | discovery/refresh - S | discovery/scaleway - S | discovery/targetgroup - S | discovery/triton - S | discovery/uyuni - S | discovery/xds - S | discovery/zookeeper - S | notifier - S | pkg/exemplar - S | pkg/gate - S | pkg/labels - S | pkg/logging - S | pkg/pool - S | pkg/relabel - S | pkg/rulefmt - S | pkg/runtime - S | pkg/textparse - S | pkg/timestamp - S | pkg/value - S | prompb - S | promql - S | promql/parser - S | rules - S | scrape - S | storage - S | storage/remote - S | template - S | tsdb - S | tsdb/agent - S | tsdb/chunkenc - S | tsdb/chunks - S | tsdb/encoding - S | tsdb/errors - S | tsdb/fileutil - S | tsdb/goversion - S | tsdb/index - S | tsdb/record - S | tsdb/tombstones - S | tsdb/tsdbutil - S | tsdb/wal - S | util/httputil - S | util/osutil - S | util/stats - S | util/strutil - S | util/teststorage - S | util/testutil - S | util/treecache - S | web - S | web/api/v1 - S | web/ui - S
cmd/prometheus S S S S S S S S S S S S S S S S S S S S
config S S S
discovery S
discovery/aws S S S S
discovery/azure S S S S
discovery/consul S S S
discovery/digitalocean S S S
discovery/dns S S S
discovery/eureka S S S S
discovery/file S S
discovery/gce S S S S
discovery/hetzner S S S S
discovery/http S S S
discovery/install S S S S S S S S S S S S S S S S S S S S S
discovery/kubernetes S S S
discovery/legacymanager S S
discovery/linode S S S
discovery/marathon S S S S
discovery/moby S S S S
discovery/openstack S S S S
discovery/puppetdb S S S S
discovery/refresh S
discovery/scaleway S S S
discovery/triton S S S
discovery/uyuni S S S
discovery/xds S S S S
discovery/zookeeper S S S S
notifier S S S S
pkg/exemplar S
pkg/relabel S
pkg/rulefmt S S S
pkg/textparse S S S
promql S S S S S S S S S S S
promql/parser S S S S S
rules S S S S S S S S S
scrape S S S S S S S S S S S
storage S S S S S S
storage/remote S S S S S S S S S S S S S S
template S S
tsdb S S S S S S S S S S S S S S S S
tsdb/agent S S S S S S S S S
tsdb/chunks S S S
tsdb/index S S S S S S
tsdb/record S S S S S
tsdb/tombstones S S S S
tsdb/tsdbutil S S
tsdb/wal S S S S S S
util/httputil S
util/teststorage S S S S S
web S S S S S S S S S S S S S S S S
web/api/v1 S S S S S S S S S S S S S S S

Legend

  • Rows - Importing packages
  • Columns - Imported packages

Unfortunately this is just too big for GitHub (and even more so for this blog article). It would still be useful if we could use the full width of the monitor for the table or at least scale it down to smaller text. Neither is really an option in practice, and scrolling all the time is at least as tedious as clicking around the statistics page. What we really want is to hide less interesting stuff. And we can do exactly that by creating sub-tables:

spaghetti-analyzer --doc='cmd/prometheus,discovery/**,tsdb/**'

This command creates sub-tables for the discovery package and everything under it, and similarly for the tsdb package and everything under that. With the discovery and the time series database out of the way we get the following package dependencies:


Dependency Table For: github.com/prometheus/prometheus/cmd/prometheus

config - S | discovery - S | discovery/install - S | discovery/legacymanager - S | discovery/targetgroup - S | notifier - S | pkg/exemplar - S | pkg/gate - S | pkg/labels - S | pkg/logging - S | pkg/pool - S | pkg/relabel - S | pkg/rulefmt - S | pkg/runtime - S | pkg/textparse - S | pkg/timestamp - S | pkg/value - S | prompb - S | promql - S | promql/parser - S | rules - S | scrape - S | storage - S | storage/remote - S | template - S | tsdb - S | tsdb/agent - S | tsdb/chunkenc - S | tsdb/chunks - S | tsdb/errors - S | tsdb/index - S | tsdb/record - S | tsdb/tsdbutil - S | tsdb/wal - S | util/httputil - S | util/osutil - S | util/stats - S | util/strutil - S | util/teststorage - S | util/testutil - S | web - S | web/api/v1 - S | web/ui - S
cmd/prometheus S S S S S S S S S S S S S S S S S S S S
config S S S
notifier S S S S
pkg/exemplar S
pkg/relabel S
pkg/rulefmt S S S
pkg/textparse S S S
promql S S S S S S S S S S S
promql/parser S S S S S
rules S S S S S S S S S
scrape S S S S S S S S S S S
storage S S S S S S
storage/remote S S S S S S S S S S S S S S
template S S
util/httputil S
util/teststorage S S S S S
web S S S S S S S S S S S S S S S S
web/api/v1 S S S S S S S S S S S S S S S

Legend

  • Rows - Importing packages
  • Columns - Imported packages

All the individual types of discovery are hidden this way, so the sub-table for discovery does an excellent job. The sub-table for the time series database doesn’t work so well: the storage and storage/remote packages don’t manage to hide the time series database from the rest of the project. This is something I would invest some work into (and I think the Prometheus project has done so already in a different repository). In summary, this still doesn’t look perfect but is much less overwhelming than the big single dependency table.

The sub-table for the discovery is here:


Dependency Table For: github.com/prometheus/prometheus/discovery/**

discovery - S | discovery/aws - S | discovery/azure - S | discovery/consul - S | discovery/digitalocean - S | discovery/dns - S | discovery/eureka - S | discovery/file - S | discovery/gce - S | discovery/hetzner - S | discovery/http - S | discovery/kubernetes - S | discovery/linode - S | discovery/marathon - S | discovery/moby - S | discovery/openstack - S | discovery/puppetdb - S | discovery/refresh - S | discovery/scaleway - S | discovery/targetgroup - S | discovery/triton - S | discovery/uyuni - S | discovery/xds - S | discovery/zookeeper - S | util/osutil - S | util/strutil - S | util/treecache - S
discovery S
discovery/aws S S S S
discovery/azure S S S S
discovery/consul S S S
discovery/digitalocean S S S
discovery/dns S S S
discovery/eureka S S S S
discovery/file S S
discovery/gce S S S S
discovery/hetzner S S S S
discovery/http S S S
discovery/install S S S S S S S S S S S S S S S S S S S S S
discovery/kubernetes S S S
discovery/legacymanager S S
discovery/linode S S S
discovery/marathon S S S S
discovery/moby S S S S
discovery/openstack S S S S
discovery/puppetdb S S S S
discovery/refresh S
discovery/scaleway S S S
discovery/triton S S S
discovery/uyuni S S S
discovery/xds S S S S
discovery/zookeeper S S S S

Legend

  • Rows - Importing packages
  • Columns - Imported packages

We can immediately see that the packages discovery, discovery/refresh and discovery/targetgroup are used by (almost) all of the sub-packages except discovery/install. The first probably contains central types and the other two some basic functionality. The discovery/install package, on the other hand, is using all the other sub-packages.

Finally the sub-table for the time series database is here:


Dependency Table For: github.com/prometheus/prometheus/tsdb/**

config - S | discovery - S | discovery/targetgroup - S | pkg/exemplar - S | pkg/gate - S | pkg/labels - S | pkg/logging - S | pkg/pool - S | pkg/relabel - S | pkg/textparse - S | pkg/timestamp - S | pkg/value - S | prompb - S | scrape - S | storage - S | storage/remote - S | tsdb - S | tsdb/chunkenc - S | tsdb/chunks - S | tsdb/encoding - S | tsdb/errors - S | tsdb/fileutil - S | tsdb/goversion - S | tsdb/index - S | tsdb/record - S | tsdb/tombstones - S | tsdb/tsdbutil - S | tsdb/wal - S | util/osutil - S
config S S S
pkg/exemplar S
pkg/relabel S
pkg/textparse S S S
scrape S S S S S S S S S S S
storage S S S S S S
storage/remote S S S S S S S S S S S S S S
tsdb S S S S S S S S S S S S S S S S
tsdb/agent S S S S S S S S S
tsdb/chunks S S S
tsdb/index S S S S S S
tsdb/record S S S S S
tsdb/tombstones S S S S
tsdb/tsdbutil S S
tsdb/wal S S S S S S

Legend

  • Rows - Importing packages
  • Columns - Imported packages

We can see that the time series database isn’t as well separated from the rest of the service as the discovery. In particular, it isn’t clear how concerns are separated between the time series database and the storage package(s). This is an area I would spend some work on. I think this is even an old version of Prometheus and that the time series database has been extracted into its own repository in a newer version.

So the better modularized the project is, the more you can rely on dependency tables with sub-tables. Otherwise you have to work more with the package statistics (or hack on CSS).

Conclusion

All 3 views help to establish and communicate a good code structure.

  • The directory view helps to find what is where.
  • The statistics help to find and establish relationships between the packages.
  • The dependency table documents the relationships in a way that is easy to understand.

So we don’t have to fear big code bases just because of their size. The complexity has to live somewhere. Instead we can approach big code bases wisely with tools that help not to get lost in an ocean of code. Last but not least clean modularization is important no matter the form or shape used (services or packages).