Compliant Infrastructure: Organization Setup

August 19, 2023


Robert Konarskis, Co-founder and CTO, Savages Corp.

Series Introduction

Regulatory compliance is often treated as a necessary evil that costs time and resources without moving the business forward. In the end, however, it is a set of best practices that protects the business and its customers and creates a safer environment for everyone.

“It takes 20 years to build a reputation and five minutes to ruin it. If you think about that, you'll do things differently.” (Warren Buffett)

Designing and operating software systems in regulated environments comes with an additional level of responsibility, complexity, and risk. Our aim is to explain the fundamental concepts and equip you with a set of practical blueprints and best practices, so that you make fewer mistakes and protect your user data and your business from intentional or unintentional harm. The overall approach tries to strike a balance between complexity and usability without compromising security, and aims at an expandable setup that can grow together with the organization without needing a replacement.

The examples used in the series largely focus on cloud-based (AWS) setups, but the concepts hold for any other cloud provider, as well as for on-premises or hybrid solutions. Please note that keeping user data on on-premises infrastructure in regulated environments opens up a whole can of worms related to physical data security, and is generally not recommended for new businesses.

My name is Robert Konarskis. I am one of the co-founders and the CTO of Savages Corp., and I have spent years building and operating B2B and B2C products in industries with strict compliance requirements, both for our own company and together with our customers. I was directly responsible for system design and infrastructure configuration, documentation and audit preparation, as well as passing the audit.

Besides the compliant software development itself, there are generally seven areas to consider before launching a product in a secure environment:

  1. Organization setup
  2. Network configuration
  3. Perimeter security
  4. Data protection
  5. Logging and monitoring
  6. Source code security
  7. Change management and auditing

Organization Setup

The first and the most fundamental step is correctly setting up the (cloud) organization. An organization is an umbrella under which all of the company’s IT infrastructure resources are grouped together. Here are the key requirements that must be covered to ensure security and simplify the compliance process as much as possible, while retaining sufficient flexibility to minimize future rework:

  1. It must support full resource isolation between production and non-production environments (those that do and do not contain real user data). This greatly simplifies access control, as all production resources can be strictly locked down while non-production resources can be made slightly more open to make the life of the engineering teams easier. In addition, it must be possible to grant certain people access to a portion of production resources, such as technical metrics and logs, without necessarily granting them access to user data.
  2. It must come with Single Sign-On (SSO) for access to all resources, or with an option to easily enable it. Centrally managed access to all IT resources not only makes day-to-day operations easier, it also makes it harder to make mistakes when provisioning access, and greatly reduces compliance work when it comes to evidence collection.
  3. It must be possible to isolate production resources from organization access control management and organization security automations. One reason for this is the ability to reuse security automations throughout the whole organization, the other is for added security from internal threats as the organization grows.
  4. The organization must have a central audit log repository with tight access controls, where audit logs from the entire organization are aggregated. Additionally, it must not be possible for anyone in the organization to disable audit logging. This greatly reduces the risk of internal threats and acts as a starting point for any security-related investigations.

Due to the requirements listed above, it is generally recommended to stick with one of the major cloud providers. They are built with compliance in mind and have greater configurability compared to smaller service providers which may not have all the necessary tools in place.

Management Account

Before diving into the topics, make sure your foundation is future-proof. Management account creation is the first step in configuring an organization. If one does not exist yet, create a dedicated corporate email address or alias for this account. Do not use a personal email alias, as these tend to be hard to change later when the organization grows. When creating the account with the chosen provider, it is recommended to call it ‘management’ and set up billing, then store the root user credentials in a password manager or another secure location and enable multi-factor authentication (MFA) where possible. These are the keys to the entire organization and they must be kept safe.

Full Resource Isolation

At its core, full resource isolation means that every environment stage (development, staging, production, etc.) is a fully complete and isolated system on its own without any shared dependencies. Compute resources from one environment can’t access data resources from another environment. In most cloud providers this is achieved by creating separate accounts, such as in the example illustrated below.

This means that everything from applications to network configurations to deployment pipelines must be configured separately. Since our goal is to be able to change all of the environments from a single place, this pretty much mandates all infrastructure resources to be defined as-code with tools like CDK, Terraform, etc., which is often a best practice anyway and becomes a significant time-saver once the tools are familiar.
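
To make the "single place" idea concrete, here is a minimal sketch in plain Python rather than CDK or Terraform. It derives one separate stack definition per environment from a single table and rejects any definition in which two environments share an account, network, or pipeline. The environment names, account IDs, and the names `EnvironmentStack`, `build_stacks`, and `assert_isolated` are all hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass

# Hypothetical environments and account IDs, for illustration only.
ENVIRONMENTS = {
    "development": "111111111111",
    "staging": "222222222222",
    "production": "333333333333",
}


@dataclass(frozen=True)
class EnvironmentStack:
    name: str
    account_id: str
    network_name: str
    pipeline_name: str
    contains_user_data: bool


def build_stacks(environments: dict[str, str]) -> list[EnvironmentStack]:
    """Derive one fully separate stack definition per environment."""
    return [
        EnvironmentStack(
            name=name,
            account_id=account_id,
            network_name=f"{name}-network",
            pipeline_name=f"{name}-deploy",
            # Assumption: only production holds real user data.
            contains_user_data=(name == "production"),
        )
        for name, account_id in environments.items()
    ]


def assert_isolated(stacks: list[EnvironmentStack]) -> None:
    """Fail if any two environments share an account, network, or pipeline."""
    for attribute in ("account_id", "network_name", "pipeline_name"):
        values = [getattr(stack, attribute) for stack in stacks]
        if len(values) != len(set(values)):
            raise ValueError(f"environments share a {attribute}")


stacks = build_stacks(ENVIRONMENTS)
assert_isolated(stacks)
```

In a real setup the same table would feed the infrastructure-as-code tool of choice, so that adding an environment or changing a shared setting is a reviewed change in one file.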

When it comes to compliance benefits, if the infrastructure is defined as-code, you can reuse the same change management policies for application code changes and infrastructure changes. With manually provisioned infrastructure, it is nearly impossible to review changes proactively, so audit logs have to be checked reactively after every change.

Separating production and non-production also simplifies the audit process as it reduces the number of questions that need to be answered about data segregation. It’s sufficient to state that all real-user data is isolated in its own environment with limited access vs having to prove that it’s not mixed with any non-production data to which the engineering team may have unrestricted access.

All of our clients benefit from scoping most of their compliance efforts to the production account and not having to provide evidence for other accounts that do not store or process user data.

Single Sign-On (SSO)

SSO generally solves three problems very well. First, being able to access all of the company’s IT systems from a single place with a single set of credentials is a convenience for the team. Second, being able to create a user once, set appropriate permissions once, and remove permissions from a single place is a convenience for the system administrator. Third, and most importantly, it allows you to configure organization-wide security requirements such as multi-factor authentication (MFA) from a single place, and to centralize all access, including access logs.

From the compliance perspective, this means that most access-related evidence collection is scoped to the SSO system in use, and not to the individual services. SaaS providers such as Okta, GitHub and others know how valuable this is, and price their ‘Enterprise’ offerings with SSO features accordingly.

This does not mean that you must set up an SSO provider and spend hours hooking it up with your cloud service provider on day one. As long as they are compatible, this change can be made at any point in the future. When the pain of managing users in multiple places outgrows the pain of implementing and paying for SSO, you should be able to make the investment.

Security Centralization

Within an IT organization, there are multiple roles that need to be covered, whether by one individual or different people or teams. The most common tasks revolve around:

  1. Operating the application. If all the resources the engineering and/or operations teams need to do this successfully are located in the ‘Workloads’ accounts, their access can be scoped to those accounts only.
  2. Billing or access management. Typically, very few members of an organization can configure user access and set up billing. Only those people need access to the ‘Management’ account of the organization where these tasks can be performed.
  3. Changing security configurations and performing investigations. The security team is usually in charge of ensuring appropriate security incident response that may involve checking audit logs or configuring security automations. Their access can be limited to the ‘Security’ accounts.

The diagram above illustrates an organization structure for a multi-account AWS organization setup according to the industry best practices. The engineering teams can worry about the ‘Workloads’ accounts, the security team operates in the realm of the ‘Security’ accounts, and the management can control the organization from the ‘Management’ account.

Access and responsibility separation described in this section reduces the risk of internal threats as most individuals have their access scoped to specific parts of the organization, which makes it more difficult for any one individual to commit internal fraud. This is the kind of forward thinking that many auditors are looking for when performing compliance assessments, hence AWS made this a best practice with their Control Tower service relatively recently.
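
The separation described in this section can be sketched as a small lookup: each role is scoped to one group of accounts, and everything else is denied. This is only an illustration of the idea, not how a cloud provider evaluates permissions. The account names inside each group and the role names are hypothetical; only the three group names come from the text.

```python
# Account groups as named in the text; the accounts inside them are hypothetical.
ACCOUNT_GROUPS = {
    "Management": {"management"},
    "Security": {"log-archive", "audit"},
    "Workloads": {"development", "staging", "production"},
}

# Hypothetical roles, each scoped to the account groups it needs.
ROLE_SCOPES = {
    "engineering": {"Workloads"},
    "security": {"Security"},
    "administration": {"Management"},
}


def can_access(role: str, account: str) -> bool:
    """Return True only if the account lies in a group the role is scoped to."""
    allowed_groups = ROLE_SCOPES.get(role, set())  # unknown roles get nothing
    return any(account in ACCOUNT_GROUPS[group] for group in allowed_groups)


print(can_access("engineering", "production"))   # engineering works in Workloads
print(can_access("engineering", "log-archive"))  # but cannot reach Security
```

Because unknown roles and unlisted accounts fall through to a denial, a forgotten mapping results in missing access rather than excess access.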

Centrally Aggregated Audit Logs

Every major cloud provider or SaaS system supports audit logs for the operations performed by human users, and, in some cases, by the system itself. When it comes to the IT organization, it’s mandatory to have all audit logs aggregated in a single place with strict access controls. In addition, it’s important that nobody in the organization can disable audit logging from any of the accounts (this comes by default when using AWS Control Tower for setting up the organization in AWS).
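
As an illustration of what "nobody can disable audit logging" can look like on AWS, the sketch below builds a service control policy (SCP) document that denies the CloudTrail actions used to stop, delete, or modify a trail. It is an assumption about one possible policy, not a copy of the controls that Control Tower installs; the function name and the `Sid` value are hypothetical. Note that SCPs do not restrict the management account itself, which is one more reason to keep access to that account minimal.

```python
import json


def deny_audit_log_tampering_policy() -> dict:
    """Build an SCP document that blocks tampering with CloudTrail trails."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyCloudTrailTampering",  # hypothetical identifier
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                ],
                "Resource": "*",
            }
        ],
    }


# Print the policy as JSON, ready to be reviewed and attached to an
# organizational unit through the provider's tooling.
print(json.dumps(deny_audit_log_tampering_policy(), indent=2))
```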

Please note here that not only the audit logs from your cloud provider are relevant, but also logs from other IT systems in use such as git repositories, external SSO providers, even customer support systems. Whenever possible, it’s recommended to set up integrations between all relevant systems ensuring that audit logs from all of them are aggregated in a central place.

In case of a security-related incident, or a suspicion of one, the centrally aggregated log repository is the first place to analyze and reconstruct the events. Most compliance frameworks require these logs to be retained for long periods of time. The exact requirements differ from framework to framework and range from a few months to 6+ years.
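
When several frameworks apply at once, the longest retention requirement wins. A minimal sketch of that rule follows. The framework names and day counts are placeholders, not real requirements, and must be replaced with the values from the frameworks you are actually audited against.

```python
from datetime import date, timedelta

# Placeholder retention requirements in days; NOT taken from any real framework.
RETENTION_DAYS = {
    "framework-a": 90,
    "framework-b": 365,
    "framework-c": 6 * 365,
}


def required_retention_days(frameworks: list[str]) -> int:
    """The strictest (longest) requirement among the applicable frameworks."""
    return max(RETENTION_DAYS[name] for name in frameworks)


def may_delete(log_date: date, today: date, frameworks: list[str]) -> bool:
    """A log may be deleted only once it is older than the required retention."""
    retention = timedelta(days=required_retention_days(frameworks))
    return today - log_date > retention
```

For example, `may_delete(date(2023, 1, 1), date(2023, 8, 19), ["framework-a", "framework-c"])` evaluates against the six-year placeholder, because the longer of the two periods applies.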

In Practice

One of our clients had an employee who had to be dismissed with immediate effect, with all of their access removed as soon as possible. When this happened, the benefits of following industry best practices became apparent.

First of all, we were immediately able to remove access to all IT systems used by the organization, meaning our Single Sign-On (SSO) efforts paid off at that moment. Soon after, we were able to trace that person’s actions precisely across all of the organization’s AWS accounts, with a special focus on the production workloads account, which was possible thanks to complete resource isolation and audit log aggregation in a single place. This was a necessary measure to ensure that no backdoor was left open that could grant them access to the system in the future. We reviewed all of the logs and concluded that the system was safe.
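
A trace of this kind can be sketched as a filter over the aggregated records. The example assumes CloudTrail-style JSON events (`eventTime`, `eventName`, `recipientAccountId`, `userIdentity.arn`) and assumes that the SSO username appears as the last segment of the assumed-role ARN. The sample events, the username `jdoe`, and the function names are invented for illustration and are not the actual investigation tooling.

```python
from collections import defaultdict
from datetime import datetime

# Invented sample events in a CloudTrail-like shape.
EVENTS = [
    {"eventTime": "2023-08-02T09:30:00Z", "eventName": "CreateAccessKey",
     "recipientAccountId": "333333333333",
     "userIdentity": {"arn": "arn:aws:sts::333333333333:assumed-role/Engineer/jdoe"}},
    {"eventTime": "2023-08-01T14:00:00Z", "eventName": "GetObject",
     "recipientAccountId": "111111111111",
     "userIdentity": {"arn": "arn:aws:sts::111111111111:assumed-role/Engineer/jdoe"}},
    {"eventTime": "2023-08-02T10:00:00Z", "eventName": "DescribeInstances",
     "recipientAccountId": "333333333333",
     "userIdentity": {"arn": "arn:aws:sts::333333333333:assumed-role/Engineer/asmith"}},
]


def actions_by_user(events: list[dict], username: str) -> list[dict]:
    """Return the user's events from all accounts, oldest first."""
    matching = [
        event for event in events
        if event.get("userIdentity", {}).get("arn", "").endswith("/" + username)
    ]
    return sorted(
        matching,
        key=lambda e: datetime.fromisoformat(e["eventTime"].replace("Z", "+00:00")),
    )


def summarize_by_account(events: list[dict]) -> dict[str, list[str]]:
    """Group event names by the account in which they were recorded."""
    summary = defaultdict(list)
    for event in events:
        summary[event["recipientAccountId"]].append(event["eventName"])
    return dict(summary)


timeline = actions_by_user(EVENTS, "jdoe")
per_account = summarize_by_account(timeline)
```

Events such as `CreateAccessKey` in the production account are the ones to look at first, since long-lived credentials are a typical way to leave a backdoor behind.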

In the next article, we will look at the network configurations and offer guidance for building secure integrations both within and outside of the IT system.
