Solutions Architect Professional: Why?

Amazon Web Services is the juggernaut of cloud computing. When I work with large tech-first companies, the question isn’t whether they use AWS; it’s whether they use anything else in addition to AWS. AWS is what Microsoft was in the 90’s.

That means when I’m planning out security solutions with customers, they’ll use AWS terminology, concepts and practices. AWS is the lingua franca. Even on Azure, GCP, Kubernetes, or on-prem, the building blocks are going to be their own twists on AWS services.

You could learn AWS by studying on your own, but the certificate gives you a concrete goal to study for and a little proof of your expertise at the end. So why not do it?

Out of all the AWS certificates, I chose the Solutions Architect Professional certificate because I like a challenge. I’m hoping to get equivalent certs on the other two major clouds soon.

The Challenge

This certificate is extremely challenging. AWS recommends two years of experience as a lead cloud architect before even starting to study for it. It covers all areas of AWS, including networking, security, storage, compute, serverless, and the added-value services that AWS has built on top.

I did have a background in AWS before I started. But no more than the average software engineer. I had never been responsible for running a large AWS deployment or using the vast majority of AWS functionality.

I studied throughout summer 2024, and took the test in September.

The Process

I used three primary resources to study for the exam:

  1. AWS Documentation. AWS spends an enormous amount of time crafting high-quality documentation, so it makes sense to use it. It is well-written and even calls out potential “gotcha” areas. But it is a mountain of documentation, so you have to be selective.
  2. ACloudGuru’s courses. I watched all the videos for the associate-level and professional-level exams. They are roughly 30 hours of content, combined, and include practice exams and hands-on labs.
  3. Adrian Cantrill’s course. This one is about 70 hours. It also includes a few practice exams.

Both courses were key. ACloudGuru has many hands-on labs, practice quizzes, and full-length practice tests, but the lectures are not very detailed and I feel you’d be better off just reading the documentation. Adrian Cantrill’s course, on the other hand, has fantastic video lectures that are better than nearly anything else you could find; however, his practice quizzes and labs are not as good. I’d recommend doing what I did and purchasing both.

Watch at 2x speed

I always find it extremely hard to pay full attention to pre-recorded online lectures because of the inconsistent pacing. I recommend going at 2x speed, and then slowing down when you don’t understand something. This is better than trying to pay attention to things that you already know, and getting bored.

Flash cards

Spaced-repetition flash cards are an effective way to remember details. I made my own flash card app and used it a bit, although it wasn’t my main means of studying. I’ll link to it in the future.
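
For anyone unfamiliar with spaced repetition, here is a minimal sketch of the Leitner-box variant of the idea. It is not the app mentioned above; the `Card` class, the `INTERVALS` values, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Review intervals in days for each Leitner box (illustrative values).
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    front: str
    back: str
    box: int = 0          # index into INTERVALS
    due: date = date.min  # new cards are due immediately


def review(card: Card, correct: bool, today: date) -> Card:
    """Promote the card on a correct answer, send it back to box 0 otherwise."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Return the cards that should be shown today."""
    return [c for c in cards if c.due <= today]
```

Cards you keep getting right drift into boxes with longer intervals, so most of each session goes to the details you keep forgetting.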

What to study

History and context

It helps to know a bit of history. S3, EC2, and SQS were the original three AWS services, and EBS, IAM and VPCs were bolted on later. Then other services came after that. This helps you explain why S3 has its own permissions mechanism separate from IAM, or why instance stores exist outside EBS.

VPCs were added in 2009, but only made mandatory in 2021! Before VPCs, instances were connected directly to the Internet. EC2 without a VPC is called “EC2 Classic.” It’s likely that a few non-VPC instances are still running today, and various bits of AWS do not actually depend on you having a VPC.

There have been many high-profile breaches caused by insufficient S3 access permissions. Hence the multiple attempts at adding better security to S3.

AWS initially used Xen virtualization under the hood, but switched completely to their own hypervisor called Nitro around 2016. There are still old instance types and AMIs around that only support Xen.

AWS used to let instances access their own metadata through a very simple service called IMDS. After a high-profile breach, they updated to IMDSv2, which is more secure, but the original IMDS is still available by default.
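
The practical difference is that the original IMDS answers a plain GET, while IMDSv2 first requires a session token obtained with a PUT. Below is a minimal sketch of that two-step flow using only the standard library. The function names are my own, and `fetch_metadata` only works when run on an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"  # link-local metadata address


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """IMDSv2 step 1: a PUT that asks for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(token: str, path: str) -> urllib.request.Request:
    """IMDSv2 step 2: a GET that presents the token in a header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Run both steps. Only works from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = build_metadata_request(token, path)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()
```

On an instance, `fetch_metadata("instance-id")` should return the instance ID. The required PUT and custom header are what make the v2 flow harder to abuse through a simple request-forwarding bug.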

For most of AWS history, you had to SSH into an instance directly. A whole universe of products called “privileged access management” cropped up to restrict and monitor users SSHing into instances. AWS didn’t develop just one alternative – they now have two, Instance Connect and Systems Manager Session Manager (say that one ten times fast).

Also, for most of AWS history, users had static Access Keys and Secret Access Keys. AWS is really trying to get away from that, with multiple different ways to get temporary Access Keys and Secret Access Keys.

Compute

  1. There is less on the exam about core compute than I originally expected.
  2. Para vs. HVM
  3. AMIs
  4. Placement groups
  5. Behavior around rebooting, stopping, and editing instance details
  6. Launch templates
  7. IMDSv2
  8. Reserved instances (convertible and not), and selling reserved instances
  9. Spot instances
  10. Dedicated instances
  11. Fabric
  12. Amazon Linux 2
  13. Graviton
  14. GPU instances
  15. Memory encryption (SEV)

Queues

  1. SQS
  2. SNS (easy to mix up with SQS!)
  3. Amazon MQ
  4. Kinesis
  5. Firehose. Firehose was originally called Kinesis Firehose, which was very confusing since it had little to do with Kinesis.
  6. Managed Kafka
  7. IoT MQTT

Networking

  1. AWS system structure:
    1. Three major partitions (Public, U.S. Federal, and China)
    2. Regions
    3. Availability zones
    4. Local zones
    5. Wavelength (super-local zones)
  2. VPCs
    1. CIDR blocks
    2. Internet gateways (regional)
    3. NAT gateways (zonal) (expensive!)
    4. Gateway endpoints (used for private S3 and DynamoDB access)
    5. VPC Peering
    6. IPv6 addressing and egress
    7. Client VPN endpoint; link aggregation for VPNs; authentication for VPNs
    8. IPAM
    9. Routing tables
    10. Subnet allocation
    11. Splitting subnets
    12. What to do when peering two VPCs with overlapping subnets
    13. Transit through a VPC is (mostly) not allowed
    14. Recommended architecture: VPC per project with a central Shared Services VPC
    15. Overall, need to have a strong understanding of which VPC resources are zonal and which are regional
    16. Flow logs
    17. Mirroring
    18. NO multicast support!
  3. DNS
    1. Customizing VPC DNS through DHCP Option Sets
    2. Split DNS
    3. Configuring DNS with instance names
    4. Route 53 registration and zone hosting
    5. DNS based load balancing based on latency and geography
    6. DNS health checks
  4. Client VPN
  5. Site-to-site VPN
  6. Transit gateway
  7. Global accelerator (GAX)
  8. Direct Connect (DX)
    1. DX is hard/impossible to experiment with on your own!
    2. MACsec
    3. Link aggregation with 2 DX, or
    4. Differences between dedicated DX and managed service provider DX
  9. ENIs
    1. Security groups
      1. Using one security group as an ID in a different security group
      2. Difference between Security Group and NACL
    2. Performance basics: accelerated network interfaces, fabric
    3. Attaching one EC2 instance to multiple VPCs
  10. Shield / Shield Advanced
  11. Auto Scaling Groups
    1. Especially behavior in balancing across zones
  12. Load balancers
    1. NLBs
    2. ALBs
    3. TLS termination
    4. Interaction with Auto Scaling Groups
    5. In practice, you want most things in AWS to be behind some kind of load balancer, as you want fine-grained control
  13. API Gateway
    1. REST mode understands requests and provides more detailed functionality (doesn’t actually HAVE to be REST)
    2. Non-REST mode just uses HTTP
  14. VPC Lattice
    1. This is not actually on the test yet, but it seems to be AWS’ big new networking thing, so it probably will be soon.
  15. CloudFront
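
The CIDR items in the list above (subnet allocation, splitting subnets, and overlapping ranges when peering) are easy to experiment with offline. This is a minimal sketch using Python's `ipaddress` module. The address ranges and the `can_peer` helper are assumptions chosen for illustration, and nothing here talks to AWS.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve the VPC range into /20 subnets: 16 of them, 4096 addresses each
# (AWS reserves 5 addresses in every subnet, so slightly fewer are usable).
subnets = list(vpc.subnets(new_prefix=20))

# Plan a split of the first /20 into two /21 halves.
first_half, second_half = subnets[0].subnets(prefixlen_diff=1)


def can_peer(a: str, b: str) -> bool:
    """VPC peering needs non-overlapping CIDR blocks on both sides."""
    return not ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


print(subnets[1])                                     # second /20 block
print(can_peer("10.0.0.0/16", "10.1.0.0/16"))         # disjoint ranges
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))       # overlapping ranges
```

Working a few of these by hand and checking them against the module is good practice for the subnetting questions.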

Storage

  1. FSx – understand all modes
  2. Storage gateway – again, understand all modes
  3. Instance store
  4. EBS
    1. Basic performance and reliability characteristics
    2. Expanding a volume
    3. Snapshots
    4. RAID arrays of EBS volumes and when to use
  5. S3
    1. S3 is the core storage service for AWS.
    2. S3 gateways in the VPC
    3. Bucket policies & IAM
    4. Origin access control
    5. Replication between regions
    6. Tiering/service levels
    7. Object lifecycle management
    8. Mounting with Amazon’s Mountpoint for S3 (mount-s3) utility, or syncing with the aws command line
    9. Pre-signed requests for upload and download
    10. Transfer
      1. Basically a serverless FTP/SFTP endpoint that can talk to your buckets.
    11. Storage Transfer Service
      1. Sophisticated tool for moving data between S3 and other services on a schedule or driven by events.
    12. Lambda objects
    13. Triggering events from S3 actions (both within S3, and also CloudWatch Events)
    14. S3 Express One Zone
      1. This is a very different service that happens to be under the S3 brand name. It has totally different semantics from S3.
  6. EFS
    1. Use on Linux and Windows
    2. Transit encryption – this actually just uses stunnel at the application layer!
    3. Pricing (this one’s expensive)
  7. Snowball Edge
    1. This started as a storage device, but now has a lot of compute capabilities too.

Databases

  1. RDS
    1. Data migration
    2. Schema migration
    3. Supported databases
    4. MySQL
    5. Postgres
    6. SQL Server
    7. Oracle (it works a bit differently)
    8. DB2 (also a bit odd)
    9. Gaps in functionality between these, e.g. a good deal of stuff is not available on DB2
    10. BabelFish
  2. Replication and promotion
  3. Maintenance windows
  4. Backup RTO and RPO
  5. Aurora
    1. Despite the marketing, Aurora is basically RDS with an improved storage layer.
    2. Global databases
  6. Authentication for these
  7. DynamoDB
    1. Performance quotas
  8. Read the Dynamo paper!
  9. Partition and sort keys
  10. Quotas and performance management options
  11. DAX
    1. Always needs 3 nodes!
  12. DynamoDB gateways in the VPC (again)
  13. Managed ElasticSearch
  14. Athena
  15. Redshift
  16. Redshift Spectrum
  17. Caches:
    1. Managed Memcached
    2. Managed Redis/Valkey
    3. You really always want to be using Redis/Valkey. Memcached is usually a red herring.

Security

IAM is a huge topic on its own. You could write a long book just about IAM. This is the most significant single study area. Truly mastering IAM would probably take years of study. Be forewarned!

  1. Accounts, users, and organizations
  2. ARN format
  3. Cognito
    1. Identity federation for applications
    2. Identity federation into AWS users
  4. IAM
    1. Need to be able to write and read IAM policies fluently.
    2. Understand resource and user policies
    3. Understand roles
    4. Understand using roles across accounts
    5. IAM Federation using SAML and OIDC
    6. Tricky stuff: can never block all services in the us-east-1 region, even if you want to limit users to only one other region!
    7. Auditing and checking policies work as expected
    8. Permissions boundaries
    9. Service control policies
    10. 2FA setup
    11. AWS-managed policies
    12. How to share resources between accounts
    13. PrivateLink interface endpoint permissions
  5. STS
  6. Simple Directory Service (basically Samba)
  7. Managed Active Directory (basically Windows Active Directory)
  8. Federating with your own AD server across DX or VPN
  9. RAM
    1. Share a VPC between accounts
    2. Certain resources can be shared with RAM
    3. Kind of a kludge outside using proper IAM policies to accomplish the same thing
  10. Control Tower: know basics
  11. Landing Zone: know basics (but really you should just use this)
  12. EC2 Instance Login
    1. Login to instances with Instance Connect
    2. Login to instances with Systems Manager Session Manager
    3. Behavior of SSH keys in the EC2 console
  13. GuardDuty
    1. General threat detection service
  14. Macie
    1. Special tool for S3 security only (strange name)
  15. Firewall Manager
  16. Web Application Firewall
  17. Config
    1. Tracks versions for every AWS object in your account
    2. Remediates configuration drift
    3. Expensive
  18. CloudTrail
    1. Logs every action that affects AWS objects
    2. Understand differences between CloudTrail, CloudWatch, and Config
  19. Security Hub
  20. Inspector
  21. Limiting what users can deploy using Service Catalog or CloudFormation Stacks
  22. Label policies
  23. PCA
    1. Run a CA for your own stuff
  24. KMS
    1. Heart of all encryption functionality in the AWS universe
  25. HSM
    1. Really for compliance requirements; very expensive
  26. Roles Anywhere
    1. Access APIs with certificates instead of STS tokens
  27. IRSA
    1. Access roles from Kubernetes
  28. Shared responsibility model
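
Two items from the list above lend themselves to a short sketch: the ARN format, and the region-restriction policy with its us-east-1 wrinkle. The parser follows the documented `arn:partition:service:region:account-id:resource` layout. The policy uses the `aws:RequestedRegion` condition key, but the list of exempted global services is an illustrative assumption, not a complete or recommended set.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str   # aws, aws-us-gov, or aws-cn
    service: str
    region: str      # empty for global services such as IAM
    account_id: str  # empty for S3 bucket ARNs
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split arn:partition:service:region:account-id:resource into fields."""
    prefix, *fields = arn.split(":", 5)
    if prefix != "arn" or len(fields) != 5:
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*fields)


def region_lock_policy(allowed_region: str) -> dict:
    """Deny requests outside one region, exempting a few global services."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOtherRegions",
            "Effect": "Deny",
            # Illustrative exemptions: global services are served out of
            # us-east-1, so a blanket deny would break them.
            "NotAction": ["iam:*", "sts:*", "route53:*", "cloudfront:*", "support:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": allowed_region}
            },
        }],
    }
```

Reading a policy like this aloud ("deny everything except these actions when the region is not X") is the kind of fluency the exam expects.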

Billing

  1. AWS Budgets
  2. AWS Cost Control
  3. Having a single budget account within an org
  4. Marketplace
  5. Billing for APIs that you create using API Gateway

Serverless

  1. ECR
  2. ECS
  3. Fargate
    1. Can run in ECS or lambda modes
    2. Can even run locally now
  4. Lambda
    1. Versioning
    2. 15-minute limit
    3. Attaching to VPC
    4. Attaching to EFS
    5. Roles
    6. Base images
    7. Building new base images
  5. Step Functions
    1. Mini programming language for building state machines out of multiple lambda functions
    2. Similar to state machines
    3. Can build an entire system using lambda and step functions
  6. Batch
  7. Using CloudWatch Events to auto-trigger things
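
To make the Step Functions item above concrete, here is a minimal sketch of a state machine definition in the Amazon States Language, written as a Python dict. The Lambda ARNs, account ID, and state names are placeholders, and `linear_path` is a hypothetical helper that only handles straight-line machines with no Choice or Parallel states.

```python
import json

# Hypothetical Lambda ARNs; the account ID and function names are placeholders.
VALIDATE = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"
CHARGE = "arn:aws:lambda:us-east-1:123456789012:function:charge-card"

state_machine = {
    "Comment": "Two Lambda tasks run in sequence",
    "StartAt": "Validate",
    "States": {
        "Validate": {"Type": "Task", "Resource": VALIDATE, "Next": "Charge"},
        "Charge": {"Type": "Task", "Resource": CHARGE, "End": True},
    },
}


def linear_path(definition: dict) -> list[str]:
    """Follow Next pointers from StartAt until a state marked End."""
    path, name = [], definition["StartAt"]
    while True:
        if name in path:
            raise ValueError(f"cycle at {name}")
        path.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return path
        name = state["Next"]


definition_json = json.dumps(state_machine, indent=2)
```

The JSON in `definition_json` is the shape you would paste into the Step Functions console, once the ARNs point at real functions.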

IoT

  1. Surprisingly, there were several questions on this, even though it’s a very niche area
  2. Provisioning with certificates
  3. Queueing messages with MQTT
  4. Device updates

CloudFormation

  1. You do not have to know how to write a template
  2. But you do have to know how they deploy, how to use variables, how to nest templates
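
As a rough illustration of the two ideas above (variables and nesting), here is a sketch of a parent template built as a Python dict and serialized to JSON. The bucket URL, the `EnvName` parameter, and the child's `VpcId` output are all hypothetical. It assumes a child template exists that accepts that parameter and exposes that output.

```python
import json

# Parent template: takes a parameter and passes it into a nested stack.
parent_template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "EnvName": {"Type": "String", "Default": "dev"},
    },
    "Resources": {
        "NetworkStack": {
            "Type": "AWS::CloudFormation::Stack",
            "Properties": {
                # Hypothetical S3 location of the child template.
                "TemplateURL": "https://example-bucket.s3.amazonaws.com/network.json",
                # Ref resolves the parent's parameter at deploy time.
                "Parameters": {"EnvName": {"Ref": "EnvName"}},
            },
        },
    },
    "Outputs": {
        # Child stack outputs are read with Fn::GetAtt and "Outputs.<name>".
        "VpcId": {"Value": {"Fn::GetAtt": ["NetworkStack", "Outputs.VpcId"]}},
    },
}

template_body = json.dumps(parent_template, indent=2)
```

You won't be asked to write this on the exam, but recognizing `Ref`, `Fn::GetAtt`, and `AWS::CloudFormation::Stack` in an answer choice helps.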

Migration

  1. Know the services AWS provides for migration
  2. Application discovery on VMware
  3. Migration Hub
  4. Mainframe migration tools
  5. 3 R’s, building a migration strategy

Observability

  1. CloudWatch
  2. X-Ray (basics only)
  3. Inspector (basics only)

Other services

  1. Workspaces
  2. EMR basics
  3. Transcode & other video services
  4. Polly
  5. Glue
  6. Marketplace basics
  7. VMware on AWS - why use it?
  8. EKS - extreme basics only, this is not a Kubernetes test
  9. Elastic Beanstalk - getting old but still relevant
  10. Support tiers
  11. Outposts
  12. Look at all the stuff AWS offers for running locally on your own machine
  13. License Manager

Don’t bother learning

  1. CloudFormation details
  2. Terraform, Ansible, Chef, Puppet (although having a basic knowledge of all would be useful)
  3. Specific command lines
  4. More than basic knowledge of using the console
  5. AWS APIs
  6. Pricing details
  7. Names or characteristics of specific instance types
  8. Very advanced topics in networking and performance
  9. HPC

Vestigial bones

Just like humans have a few vestigial bones from back when we had tails and ate raw plants, AWS has some vestigial bones from its early architecture. It’s evolved over time, but you can see traces of the path it took to get there.

Pricing

The test does not cover pricing. This makes some sense, since most companies have negotiated agreements that come with substantial discounts, and AWS changes pricing all the time.

But you always need to remember that AWS wants you to run up as large a bill as possible. Often, the “recommended” approaches will incur hundreds or thousands of dollars a month in unnecessary expenses, when a simpler and cheaper solution is available.

AWS is a bit like Amazon.com: They already have your credit card number, and make it very easy to spend a lot of money that you don’t really need to. And they don’t necessarily want to train their certified experts in how to reduce costs.

For real-world use, remember:

  1. AWS systematically under-prices compute and storage, and then makes up the difference on network bandwidth pricing. I suppose they figure that if your company is just starting out, it doesn’t really need much bandwidth yet; then when it’s successful you’ll have money to spend on bandwidth (and are already locked in). If you have bandwidth-heavy stuff like video though, use something else!
  2. The networking portions cover PrivateLink (Gateway and Interface Endpoints) and Transit Gateways. These both have a significant added cost though, even though they feel like they are just convenience features that shouldn’t cost extra.
  3. Lambda is unbelievably expensive compared to just hosting the same things yourself in EC2. It’s convenient, and useful for lots of smaller projects, but anything that’s going to blow up probably needs to be moved back to traditional hosting. Fargate is a nice middle ground because you can start with Lambda as the runtime, then switch it to run on ECS which is much cheaper once you need that volume.
  4. RDS and Aurora get really expensive, primarily because they need a ton of memory, which is often the most expensive variable in AWS instances. If at all possible, avoid them or minimize their use.
  5. Advanced services around security are all incredibly expensive. For example, I know companies that have accidentally spent tens of thousands of dollars in a month just by turning on Config. Presumably they have customers who really want those services, and are willing to pay big money.

The cloud can be cheaper than running things in a data center, but it’s not automatically cheaper. Many AWS features are designed to lock you into AWS, so your bill can only grow. Any real architect needs to understand the pricing structure and steer away from the money pits.

Most medium-sized AWS customers work with a managed service provider (MSP), which bundles up many AWS customers and negotiates shared discounts with AWS. MSPs can also share expensive resources like DX connections (which start at $10,000 per month), and provide first-line support. Unfortunately, it does usually mean giving the MSP a great deal of control over your account, which may be unacceptable. Larger customers can negotiate discounts independently.

The most important thing

The test is extremely time limited. The questions and answer choices are long, complicated, and often confusing. You only have about two minutes per question, so you’ll have to read and understand very very quickly. Many people who know all this material won’t be quick enough to answer the questions in the time allotted.

It’s vital to take practice tests in order to get into the rhythm of answering these long, winding, multi-part questions. I found ACloudGuru’s practice tests to be very good, and did all of them (some multiple times).

If you’re not a fluent English speaker, and you’re taking the test in English, it is going to be an extreme challenge.

What can we surmise about AWS internals?

Naturally, AWS doesn’t tell us very much about how their system works under the hood. However, it’s fun to think about how AWS must work internally in order to produce the behavior we observe. I’ve made a few educated guesses that help me reason about it.

Is this a useful certificate?

Yes! Understanding the array of options for building things on AWS is extremely helpful. You could easily save a company millions of dollars by going down the right path instead of the wrong one.

It is also a challenging enough certificate that not too many people have it.

However, just taking a test is very different from hands-on experience. You don’t have to find a problem with a failing network configuration, or know how to actually write a query for any of the many databases. You could come out of this test knowing how to whiteboard a complex system perfectly, but without the experience to actually do it.

I wish the material covered modern immutable infrastructure and declarative configuration in more detail. Unless you’re migrating old stuff, there’s absolutely no reason to have traditional always-on services configured through a console in a cloud environment.

Fun things I learned along the way

  • Adrian Cantrill is an incredible teacher. I do my own technical trainings sometimes, and I hope I can make mine half as good!
  • NAT gateways are pretty expensive for personal use (at least $30/month). Can I avoid them by using IPv6 (so no NAT is required)? YES! Although a few key sites like GitHub still don’t work on IPv6.
  • You can create multiple accounts on AWS using username+account1@gmail.com, username.account2@gmail.com, etc. And each gets $300 in AWS starting credits.
  • ChatGPT is a great study buddy. It has very good knowledge of AWS, perhaps because so many people write about it.
  • I posted about passing the test on Twitter, and got what really appears to be a personal congratulations from AWS. I didn’t tag anyone there. Very kind of them!
  • Apparently if you pass all the AWS tests, they send you an exclusive gold jacket. This would be a fun challenge, especially if you can get someone else to pay for all the courses and exam fees!

Cost of getting the certification

I was paying for all of this out-of-pocket, so cost was a concern. In total, I spent around $580:

  1. Around $80 for Adrian Cantrill’s course (one time fee)
  2. $35/month for ACloudGuru access, adding up to about $150
  3. $300 exam fee
  4. About $50 of AWS fees on top of the $300 in free starting credits they give you (I accidentally left something running for a month)

I think it’s worth it, but I understand not everyone has that much cash available up-front. If you can, see if your employer will pay for these expenses.

If you pass one exam from AWS, they give you a 50% discount on your next exam. So by taking an easier (and cheaper) exam first, you may be able to save $150.
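
For anyone budgeting their own attempt, the arithmetic behind the line items above can be sketched like this. The amounts are the rounded figures from the list, and the voucher calculation ignores the cost of the earlier exam you would need to pass first.

```python
# Line items as listed above, rounded to whole dollars.
costs = {
    "Adrian Cantrill course": 80,
    "ACloudGuru subscription": 150,
    "Exam fee": 300,
    "AWS usage beyond the free credits": 50,
}

total = sum(costs.values())

# A 50% voucher from an earlier exam applies to the exam fee only.
voucher_saving = costs["Exam fee"] // 2
total_with_voucher = total - voucher_saving

print(f"total: ${total}, with voucher: ${total_with_voucher}")
```

Swap in your own subscription length and AWS usage to get a personal estimate.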

Conclusion

This was a fantastic journey, and I’m glad I did it. I went from having detailed knowledge about a few specific AWS areas, to deep knowledge about almost every user-facing service they provide. It took me around 200 hours of study over a few months to get to that level. I would compare it to roughly the same effort as an advanced-level college course.

I’m not sure if I’ll ever be responsible for AWS in an operational capacity in the future, but I’m confident I can at least communicate with the people who are, and speak the same language.

Plus it was a fun challenge!

What’s next?

I’m hoping to get high-level certifications on all 3 clouds, and do the CKA, and do a few security certs! So it’s a long road ahead for me.