Wilson Mar

Dynamically audit AWS resource configurations using OPA Rego, send Lambda alerts using SNS through EventBridge, then automatically remediate (via SSM)

Overview

Security threats are becoming ever more clever and insidious, so remediation needs to be quicker and more thorough.

NOTE: Content here is my personal opinion, and is not intended to represent any employer (past or present). “PROTIP:” here highlights information I haven’t seen elsewhere on the internet because it is hard-won, little-known but significant facts based on my personal research and experience.

There are four mechanisms:

  1. Scan IaC (Infrastructure as Code) (Terraform and CloudFormation) using TFSec and similar IaC scan tools, before they are used to create resources in AWS. This is a “shift left” approach while on laptops. This is needed because by the time that vulnerabilities are spotted in AWS Config and AWS Security Hub, the vulnerability already exists. This evaluation can be repeated as part of every CI/CD infra deploy pipeline run.

  2. Scan for vulnerable conditions within AWS resources. This is done by AWS Config, which logs every change to AWS configurations made through the GUI, CLI, or API.

  3. When a vulnerability is identified, trigger an alert for manual follow-up. This is what AWS Security Hub does.

  4. When a vulnerability is identified, trigger automated remediations (where appropriate) for a quick response rather than waiting days or weeks for manual work. This also reduces the toil of manual effort. PROTIP: Start by automating just SGs (Security Groups), so that you’ll be prepared to add other automated remediations for responding quickly to vulnerabilities.

We will do it all:

  1. Define, as policy as code (PaC), the logic that identifies concerns in infrastructure. This is done by Terraform Enterprise Cloud.

  2. One-click cross-account remediation. When vulnerabilities are identified in one account, immediately analyze other accounts for similar concerns. Easily deploy the solution across primary and member accounts.


aws-auto-remediation-sns

From https://github.com/cloudconformity/auto-remediate

  1. A user makes an S3 bucket publicly readable via S3 Access Control Lists (ACLs)
  2. Cloud Conformity identifies the risk in real-time
  3. Cloud Conformity publishes a message to the specified SNS Topic
  4. The SNS topic triggers the Orchestrator Lambda function, which in turn calls the S3 bucket auto-remediate function
  5. S3 BucketPublicReadAccess Auto Remediate Function (AutoRemediateS3-001) updates the S3 bucket ACL and closes the security gap
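
The five steps above can be sketched as a small orchestrator. This is only an illustration, not the code in the cloudconformity/auto-remediate repository: the message fields `ruleId` and `resource`, the `S3-001` routing key, and the function names are assumptions made for the example. The S3 client is passed in so the logic can be exercised without AWS; in a real Lambda it would be `boto3.client("s3")`.

```python
import json


def remediate_s3_public_read(resource, s3_client):
    """Reset the bucket ACL to private, closing the public-read gap."""
    s3_client.put_bucket_acl(Bucket=resource, ACL="private")
    return f"S3 bucket {resource} ACL set to private"


# Hypothetical routing table: rule ID in the SNS message -> remediation function.
REMEDIATIONS = {"S3-001": remediate_s3_public_read}


def orchestrate(event, s3_client):
    """Handle an SNS-triggered Lambda event and dispatch each finding."""
    outcomes = []
    for record in event.get("Records", []):
        # SNS delivers the published message as a JSON string.
        finding = json.loads(record["Sns"]["Message"])
        rule_id = finding.get("ruleId")
        handler = REMEDIATIONS.get(rule_id)
        if handler is None:
            outcomes.append(f"no remediation for rule {rule_id}")
            continue
        outcomes.append(handler(finding["resource"], s3_client))
    return outcomes
```

Keeping the routing table separate from the handlers means a new remediation is one more entry in `REMEDIATIONS` rather than a change to the orchestrator.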

IaC Scan Tools

Several mechanisms automatically assess, audit, evaluate, and even remediate the configuration of AWS resources (whether created manually in the GUI or by Terraform/CloudFormation Infrastructure as Code):

Can be performed on a Mac/Windows laptop:

As a service:

Policies

Within AWS:

  • IAM Policies
  • CloudFormation Stack Policies
  • Application Autoscaling Policies
  • Backup Vault Access Policies
  • CloudFront Cache Policies
  • CloudFront Origin Request Policies
  • CloudWatch Resource Policies
  • CloudWatch Retention Policies
  • ECR Lifecycle Policies, Registry Policies, Repository Policies
  • EMR Managed Scaling Policies
  • KMS Key Policies


Automatic Detection by TFSec, etc.

TFSec for AWS

At time of writing, there were 70 rules at https://tfsec.dev/docs/aws/home. TODO: Compare against AWS rules.

  • aws-api-gateway-enable-access-logging - API Gateway stages for V1 and V2 should have access logging enabled
  • aws-api-gateway-enable-cache-encryption - API Gateway must have cache enabled
  • aws-api-gateway-enable-tracing - API Gateway must have X-Ray tracing enabled
  • aws-api-gateway-no-public-access - No public access to API Gateway methods
  • aws-api-gateway-use-secure-tls-policy - API Gateway domain name uses outdated SSL/TLS protocols.

  • aws-athena-enable-at-rest-encryption - Athena databases and workgroup configurations are created unencrypted at rest by default, they should be encrypted
  • aws-athena-no-encryption-override - Athena workgroups should enforce configuration to prevent client disabling encryption

  • aws-autoscaling-enable-at-rest-encryption - Launch configuration with unencrypted block device.
  • aws-autoscaling-no-public-ip - A resource has a public IP address.

  • aws-cloudfront-enable-logging - Cloudfront distribution should have Access Logging configured
  • aws-cloudfront-enable-waf - CloudFront distribution does not have a WAF in front.
  • aws-cloudfront-enforce-https - CloudFront distribution allows unencrypted (HTTP) communications.
  • aws-cloudfront-use-secure-tls-policy - CloudFront distribution uses outdated SSL/TLS protocols.

  • aws-cloudtrail-enable-all-regions - Cloudtrail should be enabled in all regions regardless of where your AWS resources are generally homed
  • aws-cloudtrail-enable-at-rest-encryption - Cloudtrail should be encrypted at rest to secure access to sensitive trail data
  • aws-cloudtrail-enable-log-validation - Cloudtrail log validation should be enabled to prevent tampering of log data

  • aws-cloudwatch-log-group-customer-key - CloudWatch log groups should be encrypted using CMK

  • aws-codebuild-enable-encryption - CodeBuild Project artifacts encryption should not be disabled

  • aws-config-aggregate-all-regions - Config configuration aggregator should be using all regions for source

  • aws-documentdb-enable-log-export - DocumentDB logs export should be enabled
  • aws-documentdb-enable-storage-encryption - DocumentDB storage must be encrypted
  • aws-documentdb-encryption-customer-key - DocumentDB encryption should use Customer Managed Keys

  • aws-dynamodb-enable-at-rest-encryption - DAX Cluster should always encrypt data at rest
  • aws-dynamodb-enable-recovery - Point in time recovery should be enabled to protect DynamoDB table
  • aws-dynamodb-table-customer-key - DynamoDB tables should use at rest encryption with a Customer Managed Key

  • aws-ebs-enable-volume-encryption - EBS volumes must be encrypted
  • aws-ebs-encryption-customer-key - EBS volume encryption should use Customer Managed Keys

  • aws-ec2-enforce-http-token-imds - aws_instance should activate session tokens for Instance Metadata Service.
  • aws-ec2-no-secrets-in-user-data - User data for EC2 instances must not contain sensitive AWS keys

  • aws-ecr-enable-image-scans - ECR repository has image scans disabled.
  • aws-ecr-enforce-immutable-repository - ECR images tags shouldn’t be mutable.
  • aws-ecr-no-public-access - ECR repository policy must block public access
  • aws-ecr-repository-customer-key - ECR Repository should use customer managed keys to allow more control

  • aws-ecs-enable-container-insight - ECS clusters should have container insights enabled
  • aws-ecs-enable-in-transit-encryption - ECS Task Definitions with EFS volumes should use in-transit encryption
  • aws-ecs-no-plaintext-secrets - Task definition defines sensitive environment variable(s).

  • aws-efs-enable-at-rest-encryption - EFS Encryption has not been enabled

  • aws-eks-enable-control-plane-logging - EKS Clusters should have cluster control plane logging turned on
  • aws-eks-encrypt-secrets - EKS should have the encryption of secrets enabled
  • aws-eks-no-public-cluster-access - EKS Clusters should have the public access disabled
  • aws-eks-no-public-cluster-access-to-cidr - EKS cluster should not have open CIDR range for public access

  • aws-elastic-search-enable-domain-logging - Domain logging should be enabled for Elastic Search domains
  • aws-elastic-search-enable-in-transit-encryption - Elasticsearch domain uses plaintext traffic for node to node communication.
  • aws-elastic-search-enable-logging - AWS ES Domain should have logging enabled
  • aws-elastic-search-enforce-https - Elasticsearch doesn’t enforce HTTPS traffic.
  • aws-elastic-search-use-secure-tls-policy - Elasticsearch domain endpoint is using outdated TLS policy.
  • aws-elastic-service-enable-domain-encryption - Elasticsearch domain isn’t encrypted at rest.
  • aws-elastic-search-encrypt-replication-group - Unencrypted Elasticache Replication Group.

  • aws-elasticache-add-description-for-security-group - Missing description for security group/security group rule.
  • aws-elasticache-enable-backup-retention - Redis cluster should have backup retention turned on
  • aws-elasticache-enable-in-transit-encryption - Elasticache Replication Group uses unencrypted traffic.

  • aws-elb-drop-invalid-headers - Load balancers should drop invalid headers
  • aws-elbv2-alb-not-public - Load balancer is exposed to the internet.
  • aws-elbv2-http-not-used - Use of plain HTTP

  • aws-iam-block-kms-policy-wildcard - IAM customer managed policies should not allow decryption actions on all KMS keys
  • aws-iam-no-password-reuse - IAM Password policy should prevent password reuse.
  • aws-iam-no-policy-wildcards - IAM policy should avoid use of wildcards and instead apply the principle of least privilege
  • aws-iam-require-lowercase-in-passwords - IAM Password policy should have requirement for at least one lowercase character.
  • aws-iam-require-numbers-in-passwords - IAM Password policy should have requirement for at least one number in the password.
  • aws-iam-require-symbols-in-passwords - IAM Password policy should have requirement for at least one symbol in the password.
  • aws-iam-require-uppercase-in-passwords - IAM Password policy should have requirement for at least one uppercase character.
  • aws-iam-set-max-password-age - IAM Password policy should have expiry less than or equal to 90 days.
  • aws-iam-set-minimum-password-length - IAM Password policy should have minimum password length of 14 or more characters.

  • aws-kinesis-enable-in-transit-encryption - Kinesis stream is unencrypted.

  • aws-kms-auto-rotate-keys - A KMS key is not configured to auto-rotate.

  • aws-lambda-enable-tracing - Lambda functions should have X-Ray tracing enabled
  • aws-lambda-restrict-source-arn - Ensure that lambda function permission has a source arn specified

  • aws-launch-no-sensitive-info - Ensure all data stored in the Launch configuration EBS is securely encrypted
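
When a scan like this runs in a CI/CD pipeline, its findings usually need to be summarized and turned into a pass/fail decision. The sketch below assumes the scanner was asked for JSON output and that the report has a top-level `results` list whose entries carry `long_id` (a rule ID such as those listed above) and `severity`. Those field names are my assumption about the report shape and should be checked against the output of the tfsec version in use. The function name and the default threshold are hypothetical.

```python
import json
from collections import Counter

# Severity labels from least to most severe (assumed labels).
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def summarize_findings(report_text, fail_at="HIGH"):
    """Count findings per rule and decide whether the pipeline should fail."""
    report = json.loads(report_text)
    # A report with no findings may carry null instead of an empty list.
    results = report.get("results") or []
    by_rule = Counter(r.get("long_id", "unknown") for r in results)
    threshold = SEVERITY_ORDER.index(fail_at)
    blocking = [
        r for r in results
        if r.get("severity") in SEVERITY_ORDER
        and SEVERITY_ORDER.index(r["severity"]) >= threshold
    ]
    return {"by_rule": dict(by_rule), "should_fail": bool(blocking)}
```

A pipeline step could exit non-zero when `should_fail` is true, which is what makes the scan a gate rather than a report.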

Automate Remediations

Where appropriate, automated remediation of policy non-conformance reduces the toil of manual effort. One example is removing the AWS Security Groups that AWS automatically creates with EC2 and RDS, but which persist after those resources are deleted, no longer associated with any resource.
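
To make "no longer associated with any resource" concrete, here is a minimal sketch that works on the lists returned by the EC2 `describe_security_groups` and `describe_network_interfaces` calls (the `SecurityGroups` and `NetworkInterfaces` arrays). The function name is hypothetical. It treats a group as unused when no network interface references it, and it skips groups named `default` because AWS does not allow those to be deleted. It does not check whether another group's rules reference the group, which would also block deletion.

```python
def find_unattached_security_groups(security_groups, network_interfaces):
    """Return IDs of security groups that no network interface uses."""
    # Every group attached to any ENI counts as in use.
    in_use = {
        group["GroupId"]
        for eni in network_interfaces
        for group in eni.get("Groups", [])
    }
    return sorted(
        sg["GroupId"]
        for sg in security_groups
        if sg["GroupId"] not in in_use and sg.get("GroupName") != "default"
    )
```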


Delete SG using Terraform

This approach involves running Terraform after it provisions AWS resources.

This blog makes use of this main.tf Terraform file.

  1. Resource “aws_config_config_rule” checks if all security groups are attached.

  2. Resource “aws_iam_role” is used by the remediation action. To remediate the non-compliant security groups, the role needs to execute an SSM Automation document, and it needs to be able to describe and delete a security group. Here the least privilege principle is used.

  3. Resource “aws_iam_policy” defines Allow permissions for the SSM (Systems Manager) role.

  4. Resource “aws_iam_policy_attachment” attaches the policy to the role.

  5. Resource “aws_config_remediation_configuration” defines the remediation action, which triggers the AWS-managed SSM automation document to delete the unused security group.
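
As an illustration of the least-privilege idea in steps 2 and 3, the sketch below builds the two JSON documents such a role typically needs: a trust policy that lets Systems Manager assume the role, and a permissions policy limited to starting the automation and describing and deleting security groups. This is not the content of the main.tf file referenced above. The function names are hypothetical, and the automation document ARN is supplied by the caller rather than guessed.

```python
import json


def remediation_trust_policy():
    """Trust policy allowing the Systems Manager service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ssm.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def remediation_permissions_policy(automation_document_arn):
    """Permissions limited to what the remediation action needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Run only the named automation document.
                "Sid": "RunAutomation",
                "Effect": "Allow",
                "Action": ["ssm:StartAutomationExecution"],
                "Resource": automation_document_arn,
            },
            {   # Look up and delete security groups, nothing else in EC2.
                "Sid": "ManageSecurityGroups",
                "Effect": "Allow",
                "Action": ["ec2:DescribeSecurityGroups", "ec2:DeleteSecurityGroup"],
                "Resource": "*",
            },
        ],
    }


def to_policy_json(policy):
    """Serialize a policy dict the way IAM expects it (a JSON string)."""
    return json.dumps(policy, indent=2)
```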

CAUTION: The above must be run manually in the console until HashiCorp resolves this issue.

https://stackoverflow.com/questions/66868470/lambda-security-group-deletion-hanging-and-cant-be-deleted-in-aws-console

AWS Systems Manager via AWS Security Hub

AWS proposes this approach in its Security Hub Automated Response and Remediation solution, described at https://aws.amazon.com/solutions/implementations/aws-security-hub-automated-response-and-remediation/ and https://docs.aws.amazon.com/solutions/latest/aws-security-hub-automated-response-and-remediation/welcome.html, with code at https://github.com/aws-solutions/aws-security-hub-automated-response-and-remediation and a PDF guide at https://docs.aws.amazon.com/solutions/latest/aws-security-hub-automated-response-and-remediation/aws-security-hub-automated-response-and-remediation.pdf

aws-security-hub-automated-response-architecture

  1. Detect: This command identifies

  2. Ingest.

  3. Remediate.

    Removal of resources can be done by AWS Systems Manager (aka SSM), the same technology also used to apply patches to AWS resources.

    AWS Security Hub Automated Response and Remediation:

    So, instead use CloudFormation

Call Lambda directly

Alternatively, custom AWS Lambda functions can be used:

aws-config-auto-remediate-1776x1036

Call Lambda via EventBridge

A more in-depth explanation is this diagram from Adrian Cantrill’s excellent video certification prep courses:

aws-config-cantrill-1449x754

The above diagram illustrates how Config rules can call on Amazon EventBridge, a serverless event bus that also streams events from SaaS applications.
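
A minimal sketch of the Lambda side of that flow follows. `EVENT_PATTERN` is the kind of EventBridge rule pattern that selects AWS Config compliance changes to NON_COMPLIANT, and `build_alert` turns a matching event into the `Subject` and `Message` arguments for an SNS publish call. The function name and the choice of fields in the alert are assumptions for illustration. Publishing itself (for example `sns.publish(TopicArn=..., **alert)` with a boto3 client) is left out so the logic can be run without AWS.

```python
import json

# EventBridge rule pattern: only Config rule evaluations that became NON_COMPLIANT.
EVENT_PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {"newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}},
}


def build_alert(event):
    """Return SNS Subject/Message for a non-compliant event, else None."""
    detail = event.get("detail", {})
    result = detail.get("newEvaluationResult", {})
    if event.get("source") != "aws.config":
        return None
    if result.get("complianceType") != "NON_COMPLIANT":
        return None
    subject = f"NON_COMPLIANT: {detail.get('configRuleName', 'unknown rule')}"
    message = {
        "rule": detail.get("configRuleName"),
        "resourceType": detail.get("resourceType"),
        "resourceId": detail.get("resourceId"),
        "account": event.get("account"),
        "region": event.get("region"),
    }
    # SNS subjects are limited to 100 characters.
    return {"Subject": subject[:100], "Message": json.dumps(message, sort_keys=True)}
```

The check inside `build_alert` repeats the rule pattern on purpose, so the function stays safe if it is ever wired to a broader rule.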


Setup OPA

BLOG: AWS Config custom rules can be written in the Rego language, which is processed by the general-purpose Open Policy Agent (OPA).

https://github.com/open-policy-agent/opa

Lambda can use an OPA (Open Policy Agent) Engine to evaluate rules stored in S3.
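
One way a Lambda function might drive the OPA binary is to shell out to `opa eval` and parse its JSON output. The sketch below only builds the command line and extracts the values from the output, so it runs without OPA installed. The helper names and the example query are hypothetical, and fetching the policy from S3 to a local path is assumed to have happened already. The command could be executed with `subprocess.run(cmd, capture_output=True, text=True)`.

```python
import json


def opa_eval_command(policy_path, input_path, query, opa_binary="opa"):
    """Build an `opa eval` command that emits JSON for the given query."""
    return [
        opa_binary, "eval",
        "--format", "json",
        "--data", policy_path,    # Rego policy file downloaded beforehand
        "--input", input_path,    # JSON document to evaluate, e.g. a config item
        query,                    # e.g. "data.example.deny" (hypothetical)
    ]


def extract_values(opa_output_text):
    """Collect expression values from `opa eval --format json` output."""
    output = json.loads(opa_output_text)
    # An undefined query yields an object without a "result" key.
    return [
        expression["value"]
        for result in output.get("result", [])
        for expression in result.get("expressions", [])
    ]
```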

OPA is purpose-built for reasoning about information represented in structured documents.

OPA is a binary of about 20MB that can run as close as possible to the software needing policy decisions, often as a sidecar.

Open Policy Agent provides a unified policy language that can be enforced across the cloud-native stack (Kubernetes, Kafka, Spinnaker CI/CD, Terraform, Kong, etc.)

Rego is also used within the Kubernetes Admission Controller to validate logic in deployment descriptors before applying them to a cluster. One example of this is creating a policy that only allows deployments that reference containers from trusted repositories.
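
The trusted-repository example can be mirrored in a few lines of Python to show the decision being made. This is not Rego and not Gatekeeper code, only the same logic in another language. The registry prefix and function name are hypothetical, and only `spec.template.spec.containers` is inspected (init containers are ignored here).

```python
# Hypothetical allow-list of registry prefixes.
TRUSTED_REGISTRIES = ("registry.example.com/",)


def untrusted_images(deployment, trusted=TRUSTED_REGISTRIES):
    """Return container images in a Deployment that are outside the allow-list."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    images = [c.get("image", "") for c in pod_spec.get("containers", [])]
    # str.startswith accepts a tuple and matches any of its prefixes.
    return [image for image in images if not image.startswith(tuple(trusted))]


def admit(deployment):
    """Allow the deployment only when every image comes from a trusted registry."""
    return not untrusted_images(deployment)
```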

Conftest is the name of the subproject that runs OPA policies against files on a file system.

Gatekeeper is the subproject that integrates OPA with Kubernetes admission control.

OPA has 50+ built-in functions (strings, numbers, regexps, network CIDRs, JWTs, arrays, objects, sets, etc.).

https://blog.styra.com/blog/origin-of-open-policy-agent-rego

https://github.com/aws-samples/aws-management-and-governance-samples.git (from AWS) is a collection of code samples for the Management and Governance services which includes: CloudWatch, CloudFormation, Cloudtrail, Config, Systems Manager, and more.

The repository contains: cfn_templates (CloudFormation templates) to deploy the Lambda function and AWS Config rules; lambda_sources, the source file for the Lambda function and the OPA binary that is deployed as a layer for the Lambda function (packaged sources are under the packaged_lambda_assets directory); and opa_policies, the Rego policies that correspond to the rules deployed by the CloudFormation templates.

Rego language

Rego is the native query language of OPA, designed for processing nested documents.

Rego is declarative: queries are assertions on data stored in OPA. So policy authors can focus on what queries should return rather than how queries should be executed. These queries are simpler and more concise than the equivalent in an imperative language.

The queries define policies that enumerate instances of data that violate the expected state of the system.

If all the statements in the body of a rule hold true, the rule returns its value, which is “ground” (i.e., a constant).
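
The idea of a rule body whose statements must all hold, and of a query that enumerates violating instances, can be approximated with a Python set comprehension. This is an analogy, not Rego: each `if`/`and` condition plays the role of one statement in a rule body, and a security group ID lands in the result only when every condition holds for some combination of rule and CIDR range. The input shape follows the `IpPermissions` structure returned by the EC2 `describe_security_groups` call; the function name and the choice of port 22 are illustrative.

```python
def ssh_open_to_world(security_groups):
    """Enumerate security groups that allow port 22 from 0.0.0.0/0."""
    return {
        sg["GroupId"]
        for sg in security_groups
        for perm in sg.get("IpPermissions", [])
        for ip_range in perm.get("IpRanges", [])
        # Statement 1: the rule applies to TCP or to all protocols ("-1").
        if perm.get("IpProtocol") in ("tcp", "-1")
        # Statement 2: port 22 is inside the rule's port range
        # (all-protocol rules carry no ports, so default to the full range).
        and perm.get("FromPort", 0) <= 22 <= perm.get("ToPort", 65535)
        # Statement 3: the source is the whole internet.
        and ip_range.get("CidrIp") == "0.0.0.0/0"
    }
```

As in Rego, nothing here says how to iterate or when to stop; the comprehension only states which combinations count as a violation.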

STAR: Take the OPA Policy Authoring course by Tim Hinrichs (@tlhinrichs), CTO of Styra, OPA’s inventor.

  • https://medium.com/@mathurvarun98/how-to-write-great-rego-policies-dc6117679c9f
  • https://www.fugue.co/blog/5-tips-for-using-the-rego-language-for-open-policy-agent-opa

Resources

Security Group
