
Structuring Go Projects: Overcoming Scaling Challenges

Go Serverless Monorepo Dependencies Scaling
Natercio Moniz

Introduction

When I started learning Go, early in 2019, I remember not being very pleased with it. Among my many quarrels with Go’s ideas, I remember struggling with the question “How the hell should I structure my project?”. I remember coming across the infamous golang-standards repo. But relax, this is not that kind of story.

This story is about how scaling up can often expose the shortcomings of early, well-intentioned, and fair decisions. It highlights the importance of structuring your project with your specific context and scalability in mind.

This is based on my experience and I will assume some familiarity with Go as we will get a bit technical towards the end. I also took some liberties in simplifying and renaming some internal concepts to, hopefully, make things a bit more digestible.

The Stage

Not too long ago, in the whimsical realm of Serverless, hundreds of lambdas thrived happily in a monorepo kingdom. These lambdas were continuously built into individual binaries, and while deployments weren’t quite continuous, they happened fairly regularly. Events were triggered, state machines executed, and records were written, all in a beautiful choreography that, most days, went off without a hitch.

But all was not well in this kingdom.

While everyone was going about their business, it seemed like a highly contagious and silent disease was spreading right under everyone’s noses. All lambdas appeared to be changing at every build and, little by little, slowing down deployments until timeouts started to happen and minutes started to run out. And if that wasn’t enough, all lambdas seemed to be getting ever bigger, all growing at the same slow yet steady rate.

Why was this happening?

We had to figure out why this was happening and luckily we didn’t have to dig very deep to find the issues:

  1. With the help of goweight, we learned that some of our cloud provider (AWS) libraries were unexpectedly large and consistently showed up in the top 10 largest dependencies of any sampled lambda.

  2. Perhaps most critically, with godepgraph, we found that all sampled lambdas had more or less the same, very large, dependency set, regardless of what they did.

Naturally, if every lambda depends on everything, then any change, no matter how remote or unrelated, will produce a new binary (different hash), causing our deployment tools to deploy every single lambda every single time. Also, every component we added or library we included in go.mod ended up in every lambda binary, which explained the constant growth.

How did we get there?

The answer lay in some decisions made a long time ago. In particular, two critical decisions that favoured writer convenience over clarity and scalability:

  1. GraphQL type generation

  2. Lambda dependencies initialisation

1. The type generation issue

We had this convention where every model type should be formally declared in GraphQL.

We then had gqlgen (Go library/tool for building GraphQL) generate our Go types from the GraphQL definitions and dump them all into a single Go package inevitably named: types. Perfect!

Well, it probably didn’t feel like a big deal back when there were fewer than 100 types in types, but by now that package alone was +8MB (at this point our average lambda binary was ~50MB, so about 16%). And remember, if a single bit in types changed (which it did often) then anything importing types (which everything did) had to be redeployed.

Sure, it was convenient early on to just generate everything into a single package, and maybe one could even argue it seemed somewhat aligned with the Go standard library’s preference for flat rather than deep file structures (look at net or os, for example). But the lack of focus and sheer breadth of this package was turning it into a huge problem.
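For illustration, the original single-package setup would have come from a gqlgen config along these lines (a hypothetical reconstruction; the file names and paths are mine, not our actual config):

```yaml
# Hypothetical sketch: every GraphQL type generated into one shared package.
schema:
  - schema/*.graphql
model:
  filename: types/types_gen.go # all 100+ types end up in this one file
  package: types
```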

2. The dependencies initialisation issue

Initialising dependencies was conveniently facilitated by this Services type which knew how to initialise everything that a lambda, any lambda, might ever need. Even if the lambda didn’t need it.

This type was everywhere and I must say that it was really convenient for whoever was writing. It allowed for shorter function signatures and the writer didn’t really have to think too much about how to get some component; it was just there.

What this implied, though, is that this type imported pretty much everything, which was evidently becoming a big problem for us. Not to mention that it made it harder for the reader to understand what was actually being used when a Services type was required, which sometimes led to misconfiguration issues.

The Solution

To contain the impact of changes we decided to introduce domains. No, this is not a post about Domain Driven Design! Sure, there might be some overlap but, in our case, a domain was just a way to compartmentalise types and functionality, as well as keep related components (like lambdas, HTTP clients, DB repos) close together to make it easier to understand a system.

The domain packages (the top-level Go package of a domain) had to be as dependency-free as possible, so that when depending on one we could be confident we were only importing types relevant to that domain.

This crucially meant these packages could only contain DTO-like structs and interfaces, which implied that implementations (of potential repos, clients, services, etc…) had to live elsewhere. We decided to create nested packages for each implementation. This approach allowed us to better isolate dependencies even though it doesn’t align with Go’s preference for flat over nested packages.

This proved to be a positive outcome because it invited business logic to depend on the interfaces and types in the domain package rather than on the generically named implementation packages.

Here’s what a foo domain looked like:

go-module/
├── foo/
│   ├── apis.go         # interfaces for components in the domain, for example a Repo interface
│   ├── generate.go     # just the file with a go:generate comment to generate the GraphQL types
│   ├── gqlgen.yml      # type generation config
│   ├── internal/       # shared logic here, accessible only to things in the same domain
│   ├── lambdas/
│   │   ├── createfoo/  # lambda handler here
│   │   └── listfoo/    # lambda handler here
│   ├── repo/           # Repo interface implementation here
│   └── types_gen.go    # generated types

Just by generating types in separate domains and restructuring things a bit we began to observe progress. As expected, brand new lambdas, in brand new domains, with their types in the same domain were not triggering the deployment of other lambdas when changed. Fantastic!

Of course, these lambdas were themselves still being deployed when any other type changed because of the types package, but at least we could see the light at the end of the tunnel and, little by little, the types package was going to be broken down.

Another benefit of this structure was that reading code, you know, the thing we do all day, also became a little nicer. Where before we had types.FooRecord, we now had foo.Record.

While this was great, the most interesting part to me, I have to say, was how we solved the lambda (or perhaps dependencies) initialisation issue.

We wanted to keep it relatively simple for someone (perhaps a new joiner) working on a lambda to use an existing component. This required shifting complexity elsewhere, and this is when we introduced Wire.

Wire is a compile-time dependency injection library. In summary, we declare what components we need and Wire takes care of all the initialisation for us. With a single command, we were able to generate all the boilerplate code to initialise all dependencies for any lambda.
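As a rough sketch of how that looks (InitDependencies and allProviders are illustrative names, not our actual code): you write an injector declaration behind the wireinject build tag, and the wire command replaces it with a generated function that does the real initialisation.

```go
//go:build wireinject

package mylambda

import "github.com/google/wire"

// allProviders would be a wire.NewSet listing the constructors from the
// implementation packages (repos, clients, etc.). This function body is
// never executed; wire reads it and generates an InitDependencies that
// actually wires everything up.
func InitDependencies() (Dependencies, error) {
	wire.Build(allProviders)
	return nil, nil
}
```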

We also wanted to keep our function signatures short and predictable. This was where Go’s implicit interfaces and composition really showed their power.

Perhaps it’s better to start by looking at what a typical lambda handler would look like:

package mylambda

import (...)

type Dependencies interface {
    foo.RepoGetter
    bar.ClientGetter
    ...
}

//go:generate genlambda MyLambda

func MyLambda(ctx context.Context, d Dependencies, input baz.SomeInputType) (baz.SomeOutputType, error) {
    ...
    d.FooRepo().FindFoo(...)
    ...
}

We thought this was pretty neat! The reader knows exactly what components a lambda is using just by looking at the Dependencies interface. In case you’re wondering about those ???Getter types, they are just interfaces with a getter method. For example:

package foo

type RepoGetter interface {
    FooRepo() Repo
}

This means that the Dependencies interface is equivalent to:

type Dependencies interface {
    FooRepo() foo.Repo
    BarClient() bar.Client
    ...
}

The ???Getter interfaces are there just for convenience. I know, here it is again. More on this later on.

Lambdas depend on ???Getter interfaces and we implement those interfaces with a ???Wrapper struct like:

package foo

type RepoWrapper struct {
    x Repo
}

func (w *RepoWrapper) FooRepo() Repo {
    return w.x
}

Then, we compose a type with these ???Wrapper structs in order to satisfy the Dependencies interface.

type wrappers struct {
    *foo.RepoWrapper
    *bar.ClientWrapper
    ...
}

And finally, Wire takes care of generating the wrappers struct initialisation logic for us, making use of the implementation packages from before.

In our approach, Wire is the only thing that knows about those implementation packages, which means it is the only thing that has to deal with the consequences of poorly named packages.

You may have noticed the //go:generate genlambda MyLambda comment. The setup process was a little bit tedious and, luckily, very predictable. So, we created this little genlambda tool to help us create & update a type that satisfies the Dependencies interface. It actually ended up doing a little bit more than just that, but that was its main purpose.

And voilà!

We had controlled the disease and achieved all our goals:

  • Domains helped contain the impact of changes. 🎉

  • We were no longer including unused libraries in each binary. Most lambda binaries shrank by nearly 40%, while the rest saw more modest reductions of around 17%. 🎉

  • It was still easy enough to just use an existing component. If you need the baz.Y component just add baz.YGetter to the Dependencies interface and run go generate. If you forgot to run go generate you’d get a build-time error due to the wrappers struct not satisfying the Dependencies interface, which was also neat. Then you’d be able to get that component like d.BazY(). 🎉

  • This approach also meant that we could still pass Dependencies around to other functions which kept their signatures small and readable. If a function needed foo.Repo then it would expect something that implemented a foo.RepoGetter. 🎉

  • Bonus: because everything depended on interfaces, testing and mocking became easier. 🎉

  • Bonus: streamlined lambda setup with genlambda tool. 🎉

  • Bonus: the size reduction benefits were compounded by the domains’ effect of limiting the propagation of changes. Not only were fewer things being deployed each time, but uploading/downloading each binary was now quicker. 🎉

Learnings

This solution certainly wasn’t perfect (just while writing this article I thought about 1 or 2 things that I would like to try differently) but we managed to solve both issues and we were generally happy with the outcome.

We still created some tech debt for convenience’s sake… Truth is, making things convenient is fine, even desirable, but it certainly should not be done at all costs nor without careful consideration of the future implications. I believe the concessions this time around were more manageable:

  • Little bits of duplication here and there - mostly gqlgen config and the wrappers.

  • The ???Getter interfaces were defined next to the implementation - mostly to avoid having to learn/remember exactly what the getter method looked like and it made life simpler for the genlambda tool.

  • Maintaining an increasingly big file with an exhaustive list of all constructors for Wire was becoming a burden - we could probably generate this too, but as we generally only appended to it, that was low on the priority list.

  • We had to stay vigilant against someone inadvertently importing something that caused a dependency explosion in a top-level domain package - we could probably add some automated check for this later on.

I may have gone a bit nuts on code generation. This is yet another thing we now have to maintain and I’m not entirely sure of how this will pan out. My experience tells me it’s probably going to be fine as long as these tools don’t grow too much.

A new problem that we found ourselves going back and forth on was defining domain boundaries. It sure ain’t an easy task and we had to keep reshuffling things around until it felt right.

Go’s unopinionated stance on how to structure your projects, along with the ease of just importing any package when working in a monorepo context, can lead us down some nasty paths. Perhaps, more critically than in other contexts, we should give more consideration early on to how we are going to structure our Go projects to avoid major setbacks in the future. That said, Go’s package model showed its flexibility by allowing us to refactor and get the outcome we needed to solve our problem.

Thoughts

  • While discussing these problems with the wider team, someone eventually did ask: Why doesn’t Go have something like TypeScript’s tree shaking? I’m not sure, but I believe I have read somewhere that Go does perform some level of dead code elimination; however, because someone could use reflection to call methods that appear unused, the compiler has to include everything that’s in any imported package in the final binary.

  • In order to get these changes implemented I invested a lot of time in tooling to help not only with the migration but also with maintaining the new lambda style and project structure. I still believe that if you want to make something happen you need to get as many obstacles as possible out of the way to get others to follow but I have some reservations that maintaining these tools might become too much of a burden in the long run.

  • Go allows you to structure your project however you need. This could be both a blessing and a curse. Some online resources will recommend good starting points but you shouldn’t take them as gospel. I believe most projects will likely grow into their own thing as they scale.

Acknowledgments

This article is my first attempt at writing for the internet, and I’m excited about the journey ahead. I just want to extend my heartfelt thanks to my partner and friends for all their support and encouragement.