AWS Automatic remediations (of Vulnerabilities in AWS)

Dynamically audit AWS resource configurations using OPA Rego, send Lambda alerts using SNS through EventBridge, then automatically remediate (via SSM)

Overview

IaC Scan Tools
Policies
Automatic Detection by TFSec, etc.
TFSec for AWS
Automate Remediations
Delete SG using Terraform
AWS Systems Manager via AWS Security Hub
Call Lambda directly
Call Lambda via EventBridge
Setup OPA
Rego language
Resources
Security Group
More on Amazon

Security threats are getting more and more clever and insidious, so a quicker and more thorough response to remediate them is necessary in today’s world.

NOTE: Content here is my personal opinion, and not intended to represent any employer (past or present). “PROTIP:” here highlights information I haven’t seen elsewhere on the internet because it consists of hard-won, little-known but significant facts based on my personal research and experience.

There are four mechanisms:

Scan IaC (Infrastructure as Code) (Terraform and CloudFormation) using TFSec and similar IaC scan tools, before they are used to create resources in AWS. This is a “shift left” approach while on laptops. This is needed because by the time that vulnerabilities are spotted in AWS Config and AWS Security Hub, the vulnerability already exists. This evaluation can be repeated as part of every CI/CD infra deploy pipeline run.
Scan for vulnerable conditions within AWS resources. This is done by AWS Config, which logs every change to AWS configurations in the GUI, CLI, or API.
When a vulnerability is identified, trigger an alert for manual follow-up. This is what AWS Security Hub does.
When a vulnerability is identified, trigger automated remediations (where appropriate) for quick response rather than waiting days or weeks for manual work. This also reduces the toil of manual effort. PROTIP: Start with automating just SGs (Security Groups), so that you’ll be prepared to do other automated remediations for responding quickly to vulnerabilities.

We will do it all:

Define the logic to identify concerns in infrastructure as policies, defined as code (PaC), and evaluate them. This is done by Terraform Enterprise Cloud.
One-click cross-account remediation. When vulnerabilities are identified in one account, immediately analyze other accounts for similar concerns. Easily deploy the solution across primary and member accounts.

aws-auto-remediation-sns

From https://github.com/cloudconformity/auto-remediate

A user makes an S3 bucket publicly readable via S3 Access Control Lists (ACLs)
Cloud Conformity identifies the risk in real-time
Cloud Conformity publishes a message to the specified SNS Topic
The SNS topic triggers the Orchestrator Lambda function, which in turn calls the S3 bucket auto-remediate function
S3 BucketPublicReadAccess Auto Remediate Function (AutoRemediateS3-001) updates the S3 bucket ACL and closes the security gap
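
The flow above can be sketched as a small dispatcher. This is only an illustration, not the code in the cloudconformity/auto-remediate repository: the `Records[].Sns.Message` envelope is the standard shape of an SNS-triggered Lambda event, but the message fields `ruleId` and `resource`, the registry, and the placeholder remediation function are assumptions made up for this example.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical type: a remediation function takes a finding and returns a summary.
Remediator = Callable[[Dict[str, Any]], str]


def remediate_s3_public_read(finding: Dict[str, Any]) -> str:
    """Placeholder standing in for AutoRemediateS3-001.

    A real function would reset the bucket ACL; this one only reports the intent.
    """
    return f"would set ACL of bucket {finding['resource']} to private"


# Hypothetical registry mapping a rule ID to its remediation function.
REMEDIATORS: Dict[str, Remediator] = {"S3-001": remediate_s3_public_read}


def orchestrate(
    event: Dict[str, Any], remediators: Dict[str, Remediator] = REMEDIATORS
) -> List[str]:
    """Route each SNS record to the remediation registered for its rule."""
    outcomes = []
    for record in event.get("Records", []):
        # SNS delivers the published message as a JSON string under Sns.Message.
        finding = json.loads(record["Sns"]["Message"])
        handler = remediators.get(finding.get("ruleId", ""))
        if handler is None:
            outcomes.append(f"no remediation registered for {finding.get('ruleId')}")
            continue
        outcomes.append(handler(finding))
    return outcomes
```

Keeping the registry separate from the dispatcher means a new remediation can be added without touching the orchestration logic.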

IaC Scan Tools

Several mechanisms automatically assess, audit, evaluate, and even remediate the configuration of AWS resources (whether created manually in the GUI or by Terraform/CloudFormation Infrastructure as Code):

Can be performed on a Mac/Windows laptop:

TFSec.dev (by Aqua Security) for shifting security left onto developer laptops.
Terraform open source (with Atlantis Policy-as-Code) on laptop/server

As a service:

Terraform Enterprise Cloud SaaS with Sentinel Policy-as-code (from Hashicorp)
AWS Security Hub SaaS uses only pre-defined AWS-managed static rules
AWS Config SaaS using Lambda to invoke an OPA engine processing custom static rules written in Rego

Policies

Within AWS:

IAM Policies
CloudFormation Stack Policies
Application Autoscaling Policies
Backup Vault Access Policies
CloudFront Cache Policies
CloudFront Origin Request Policies
CloudWatch Resource Policies
CloudWatch Retention Policies
ECR Lifecycle Policies, Registry Policies, Repository Policies
EMR Managed Scaling Policies
KMS Key Policies

Automatic Detection by TFSec, etc.

TFSec for AWS

At time of writing, there were 70 rules at https://tfsec.dev/docs/aws/home. TODO: Compare against AWS rules.

aws-api-gateway-enable-access-logging - API Gateway stages for V1 and V2 should have access logging enabled
aws-api-gateway-enable-cache-encryption - API Gateway must have cache enabled
aws-api-gateway-enable-tracing - API Gateway must have X-Ray tracing enabled
aws-api-gateway-no-public-access - No public access to API Gateway methods
aws-api-gateway-use-secure-tls-policy - API Gateway domain name uses outdated SSL/TLS protocols.
aws-athena-enable-at-rest-encryption - Athena databases and workgroup configurations are created unencrypted at rest by default, they should be encrypted
aws-athena-no-encryption-override - Athena workgroups should enforce configuration to prevent client disabling encryption
aws-autoscaling-enable-at-rest-encryption - Launch configuration with unencrypted block device.
aws-autoscaling-no-public-ip - A resource has a public IP address.
aws-cloudfront-enable-logging - Cloudfront distribution should have Access Logging configured
aws-cloudfront-enable-waf - CloudFront distribution does not have a WAF in front.
aws-cloudfront-enforce-https - CloudFront distribution allows unencrypted (HTTP) communications.
aws-cloudfront-use-secure-tls-policy - CloudFront distribution uses outdated SSL/TLS protocols.
aws-cloudtrail-enable-all-regions - Cloudtrail should be enabled in all regions regardless of where your AWS resources are generally homed
aws-cloudtrail-enable-at-rest-encryption - Cloudtrail should be encrypted at rest to secure access to sensitive trail data
aws-cloudtrail-enable-log-validation - Cloudtrail log validation should be enabled to prevent tampering of log data
aws-cloudwatch-log-group-customer-key - CloudWatch log groups should be encrypted using CMK
aws-codebuild-enable-encryption - CodeBuild Project artifacts encryption should not be disabled
aws-config-aggregate-all-regions - Config configuration aggregator should be using all regions for source
aws-documentdb-enable-log-export - DocumentDB logs export should be enabled
aws-documentdb-enable-storage-encryption - DocumentDB storage must be encrypted
aws-documentdb-encryption-customer-key - DocumentDB encryption should use Customer Managed Keys
aws-dynamodb-enable-at-rest-encryption - DAX Cluster should always encrypt data at rest
aws-dynamodb-enable-recovery - Point in time recovery should be enabled to protect DynamoDB table
aws-dynamodb-table-customer-key - DynamoDB tables should use at rest encryption with a Customer Managed Key
aws-ebs-enable-volume-encryption - EBS volumes must be encrypted
aws-ebs-encryption-customer-key - EBS volume encryption should use Customer Managed Keys
aws-ec2-enforce-http-token-imds - aws_instance should activate session tokens for Instance Metadata Service.
aws-ec2-no-secrets-in-user-data - User data for EC2 instances must not contain sensitive AWS keys
aws-ecr-enable-image-scans - ECR repository has image scans disabled.
aws-ecr-enforce-immutable-repository - ECR images tags shouldn’t be mutable.
aws-ecr-no-public-access - ECR repository policy must block public access
aws-ecr-repository-customer-key - ECR Repository should use customer managed keys to allow more control
aws-ecs-enable-container-insight - ECS clusters should have container insights enabled
aws-ecs-enable-in-transit-encryption - ECS Task Definitions with EFS volumes should use in-transit encryption
aws-ecs-no-plaintext-secrets - Task definition defines sensitive environment variable(s).
aws-efs-enable-at-rest-encryption - EFS Encryption has not been enabled
aws-eks-enable-control-plane-logging - EKS Clusters should have cluster control plane logging turned on
aws-eks-encrypt-secrets - EKS should have the encryption of secrets enabled
aws-eks-no-public-cluster-access - EKS Clusters should have the public access disabled
aws-eks-no-public-cluster-access-to-cidr - EKS cluster should not have open CIDR range for public access
aws-elastic-search-enable-domain-logging - Domain logging should be enabled for Elastic Search domains
aws-elastic-search-enable-in-transit-encryption - Elasticsearch domain uses plaintext traffic for node to node communication.
aws-elastic-search-enable-logging - AWS ES Domain should have logging enabled
aws-elastic-search-encrypt-replication-group - Unencrypted Elasticache Replication Group.
aws-elastic-search-enforce-https - Elasticsearch doesn’t enforce HTTPS traffic.
aws-elastic-search-use-secure-tls-policy - Elasticsearch domain endpoint is using outdated TLS policy.
aws-elastic-service-enable-domain-encryption - Elasticsearch domain isn’t encrypted at rest.
aws-elasticache-add-description-for-security-group - Missing description for security group/security group rule.
aws-elasticache-enable-backup-retention - Redis cluster should have backup retention turned on
aws-elasticache-enable-in-transit-encryption - Elasticache Replication Group uses unencrypted traffic.
aws-elb-drop-invalid-headers - Load balancers should drop invalid headers
aws-elbv2-alb-not-public - Load balancer is exposed to the internet.
aws-elbv2-http-not-used - Use of plain HTTP
aws-iam-block-kms-policy-wildcard - IAM customer managed policies should not allow decryption actions on all KMS keys
aws-iam-no-password-reuse - IAM Password policy should prevent password reuse.
aws-iam-no-policy-wildcards - IAM policy should avoid use of wildcards and instead apply the principle of least privilege
aws-iam-require-lowercase-in-passwords - IAM Password policy should have requirement for at least one lowercase character.
aws-iam-require-numbers-in-passwords - IAM Password policy should have requirement for at least one number in the password.
aws-iam-require-symbols-in-passwords - IAM Password policy should have requirement for at least one symbol in the password.
aws-iam-require-uppercase-in-passwords - IAM Password policy should have requirement for at least one uppercase character.
aws-iam-set-max-password-age - IAM Password policy should have expiry less than or equal to 90 days.
aws-iam-set-minimum-password-length - IAM Password policy should have minimum password length of 14 or more characters.
aws-kinesis-enable-in-transit-encryption - Kinesis stream is unencrypted.
aws-kms-auto-rotate-keys - A KMS key is not configured to auto-rotate.
aws-lambda-enable-tracing - Lambda functions should have X-Ray tracing enabled
aws-lambda-restrict-source-arn - Ensure that lambda function permission has a source arn specified
aws-launch-no-sensitive-info - Ensure all data stored in the Launch configuration EBS is securely encrypted
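
To use rules like these as a gate in a CI/CD pipeline, the scanner's JSON report can be filtered by severity. The sketch below assumes the report was produced with `tfsec . --format json` and that each entry under `results` carries `long_id` and `severity` fields; check the field names against the tfsec version in use. The blocking threshold is an arbitrary choice for illustration.

```python
import json
from typing import Any, Dict, List

# Assumption: findings at these severities should fail the pipeline.
BLOCKING_SEVERITIES = frozenset({"HIGH", "CRITICAL"})


def blocking_findings(
    report_json: str, blocking: frozenset = BLOCKING_SEVERITIES
) -> List[Dict[str, Any]]:
    """Return the findings in a tfsec JSON report that should block a deploy."""
    report = json.loads(report_json)
    # Treat a missing or null "results" key as "no findings".
    results = report.get("results") or []
    return [r for r in results if str(r.get("severity", "")).upper() in blocking]


def exit_code(report_json: str) -> int:
    """Exit status for a CI step: non-zero when any blocking finding exists."""
    return 1 if blocking_findings(report_json) else 0
```

A pipeline step could read the report file, print the `long_id` of each blocking finding, and call `sys.exit(exit_code(text))`.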

Automate Remediations

BLOG: https://awesomeopensource.com/project/servian/aws-auto-remediate

Where appropriate, automated remediation of policy non-conformance reduces the toil of manual effort. One example is removing AWS Security Groups that AWS automatically creates with EC2 and RDS, but that persist after those resources are deleted and are no longer associated with any resource.

Delete SG using Terraform
AWS Systems Manager via AWS Security Hub
Call Lambda directly
Call Lambda via EventBridge

Delete SG using Terraform

This approach involves running Terraform again after it has provisioned resources in AWS.

This blog makes use of this main.tf Terraform file.

Resource “aws_config_config_rule” checks if all security groups are attached.
Resource “aws_iam_role” is used by the remediation action. To remediate the non-compliant security groups, the role needs to execute an SSM Automation document, and it needs to be able to describe and delete a security group. Here the least privilege principle is used.
Resource “aws_iam_policy” defines Allow permissions for the SSM (Systems Manager) role.
Resource “aws_iam_policy_attachment” attaches that policy to the role.
Resource “aws_config_remediation_configuration” defines the remediation action, which triggers the AWS-managed SSM automation document to delete the unused security group.

CAUTION: The above must be run manually in the console until Hashicorp implements this issue.
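
For comparison, the remediation configuration from the resource list above can also be expressed as the request body for the AWS Config `PutRemediationConfigurations` API. This is a sketch, not the blog's main.tf: the SSM document name `AWSConfigRemediation-DeleteUnusedSecurityGroup` and its `GroupId` parameter are assumptions to verify in your account, and the retry values are arbitrary.

```python
from typing import Any, Dict


def build_remediation_configuration(
    config_rule_name: str,
    role_arn: str,
    # Assumption: name of the AWS-managed SSM Automation document to run.
    document_name: str = "AWSConfigRemediation-DeleteUnusedSecurityGroup",
) -> Dict[str, Any]:
    """Build one entry for put_remediation_configurations."""
    return {
        "ConfigRuleName": config_rule_name,
        "TargetType": "SSM_DOCUMENT",
        "TargetId": document_name,
        "Parameters": {
            # Role that SSM Automation assumes to describe and delete the group.
            "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
            # RESOURCE_ID tells Config to pass the non-compliant resource's ID.
            # Assumption: the document's parameter is named GroupId.
            "GroupId": {"ResourceValue": {"Value": "RESOURCE_ID"}},
        },
        "Automatic": True,
        "MaximumAutomaticAttempts": 3,  # arbitrary values for illustration
        "RetryAttemptSeconds": 60,
    }


# To apply it (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("config").put_remediation_configurations(
#       RemediationConfigurations=[build_remediation_configuration(rule, role)]
#   )
```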

https://stackoverflow.com/questions/66868470/lambda-security-group-deletion-hanging-and-cant-be-deleted-in-aws-console

AWS Systems Manager via AWS Security Hub

AWS proposes this approach at https://aws.amazon.com/solutions/implementations/aws-security-hub-automated-response-and-remediation/ and https://docs.aws.amazon.com/solutions/latest/aws-security-hub-automated-response-and-remediation/welcome.html, using the code at https://github.com/aws-solutions/aws-security-hub-automated-response-and-remediation and the PDF guide at https://docs.aws.amazon.com/solutions/latest/aws-security-hub-automated-response-and-remediation/aws-security-hub-automated-response-and-remediation.pdf

Detect: This command identifies
Ingest.
Remediate.

Removal of resources can be done by AWS Systems Manager (aka SSM), the same technology also used to apply patches to AWS resources.

AWS Security Hub Automated Response and Remediation:

So, instead use CloudFormation

Call Lambda directly

Alternately, custom AWS Lambda functions can be used:
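
As a minimal sketch of such a function, assuming the Lambda is invoked with a hypothetical event of the form `{"group_id": "sg-..."}`, the handler below deletes a security group only when no network interface references it. The two EC2 calls (`describe_network_interfaces` with a `group-id` filter, and `delete_security_group`) are standard boto3 operations; the event shape and function names are made up for illustration.

```python
from typing import Any, Dict, Optional


def delete_if_unused(ec2: Any, group_id: str) -> bool:
    """Delete the security group only when no network interface uses it."""
    attached = ec2.describe_network_interfaces(
        Filters=[{"Name": "group-id", "Values": [group_id]}]
    )["NetworkInterfaces"]
    if attached:
        return False  # still in use: leave it alone
    ec2.delete_security_group(GroupId=group_id)
    return True


def lambda_handler(
    event: Dict[str, Any], context: Any, ec2: Optional[Any] = None
) -> Dict[str, Any]:
    """Lambda entry point; `ec2` can be injected for testing."""
    if ec2 is None:
        import boto3  # imported lazily so the logic can be tested without AWS

        ec2 = boto3.client("ec2")
    group_id = event["group_id"]  # hypothetical event field
    return {"group_id": group_id, "deleted": delete_if_unused(ec2, group_id)}
```

The delete call can still fail, for example for a default security group or one referenced by another group's rules, so a production version would need to catch and report those errors.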

Call Lambda via EventBridge

A more in-depth explanation is this diagram from Adrian Cantrill’s excellent video certification prep courses:

The above diagram illustrates how Config rules can call on the AWS EventBridge SaaS streaming serverless event bus.
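
A rule on the event bus selects which Config events reach the remediation Lambda. The sketch below shows an event pattern for non-compliant Config rule evaluations, plus a deliberately simplified local matcher for checking sample events; the matcher handles only exact-value patterns and is not EventBridge's full matching logic.

```python
from typing import Any, Dict

# Pattern selecting AWS Config compliance changes that became NON_COMPLIANT.
NON_COMPLIANT_PATTERN: Dict[str, Any] = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {"newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: every pattern key must exist in the event, and the
    event's value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# To register the pattern (requires boto3 and AWS credentials), the rule name
# here being hypothetical:
#   import json, boto3
#   boto3.client("events").put_rule(
#       Name="config-non-compliant",
#       EventPattern=json.dumps(NON_COMPLIANT_PATTERN),
#   )
```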

Setup OPA

BLOG: AWS Config custom rules are written in the Rego language processed by the general-purpose Open Policy Agent (OPA).

https://github.com/open-policy-agent/opa

Lambda can use an OPA (Open Policy Agent) Engine to evaluate rules stored in S3.
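
One way a Lambda function might drive the OPA binary is through the `opa eval` command line. The sketch below assumes the policy file has already been downloaded from S3 to local disk and that the `opa` binary is on the path (for example, from a Lambda layer); the query string and file paths passed in are up to the caller and are hypothetical here.

```python
import json
import subprocess
from typing import Any, List


def opa_eval_command(
    policy_path: str, input_path: str, query: str, opa_bin: str = "opa"
) -> List[str]:
    """Build the `opa eval` command line for one query."""
    return [
        opa_bin, "eval",
        "--format", "json",
        "--data", policy_path,
        "--input", input_path,
        query,
    ]


def first_value(opa_output: str) -> Any:
    """Return the first expression's value, or None when the query is undefined."""
    results = json.loads(opa_output).get("result") or []
    if not results:
        return None
    return results[0]["expressions"][0]["value"]


def evaluate(policy_path: str, input_path: str, query: str, opa_bin: str = "opa") -> Any:
    """Run OPA as a subprocess and return the decision for `query`."""
    completed = subprocess.run(
        opa_eval_command(policy_path, input_path, query, opa_bin),
        capture_output=True, text=True, check=True,
    )
    return first_value(completed.stdout)
```

Separating command construction and output parsing from the subprocess call keeps both pieces testable without the binary installed.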

OPA is purpose-built for reasoning about information represented in structured documents.

OPA is a binary of about 20MB that can be run as close as possible to the software needing policy decisions, often as a sidecar.

Open Policy Agent provides a unified policy language that can be enforced across the cloud-native stack (Kubernetes, Kafka, Spinnaker CI/CD, Terraform, Kong, etc.)

Rego is also used within the Kubernetes Admission Controller to validate logic in deployment descriptors before applying them to a cluster. One example of this is creating a policy that only allows deployments that reference containers from trusted repositories.
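
The trusted-repository check would normally be written in Rego. As a rough Python illustration of the same logic only, the sketch below lists container images in a Deployment manifest that do not come from an allowed registry prefix. The prefix `registry.example.com/` is a made-up placeholder.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical allow-list of registry prefixes.
TRUSTED_REGISTRIES = ("registry.example.com/",)


def untrusted_images(
    deployment: Dict[str, Any], trusted: Sequence[str] = TRUSTED_REGISTRIES
) -> List[str]:
    """Return images in a Deployment manifest that are not from a trusted registry."""
    containers = (
        deployment.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    return [
        c["image"] for c in containers
        if not c["image"].startswith(tuple(trusted))
    ]
```

An admission decision would then reject the deployment whenever the returned list is non-empty.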

Conftest is the name of the subproject that runs OPA policies against files on a file system.

Gatekeeper is the subproject that integrates OPA with Kubernetes admission control.

OPA has 50+ built-in functions (strings, numbers, regexps, network CIDRs, JWTs, arrays, objects, sets, etc.).

https://blog.styra.com/blog/origin-of-open-policy-agent-rego

https://github.com/aws-samples/aws-management-and-governance-samples.git (from AWS) is a collection of code samples for the Management and Governance services which includes: CloudWatch, CloudFormation, Cloudtrail, Config, Systems Manager, and more.

The repository contains cfn_templates (CloudFormation templates to deploy the Lambda function and AWS Config rules); lambda_sources (the source file for the Lambda function and the OPA binary that is deployed as a layer for the Lambda function), with packaged sources under the packaged_lambda_assets directory; and opa_policies (Rego policies that correspond to the rules deployed by the CloudFormation templates).

Rego language

Rego is the native query language of OPA, designed for processing nested documents.

Rego is declarative: queries are assertions on data stored in OPA. So policy authors can focus on what queries should return rather than how queries should be executed. These queries are simpler and more concise than the equivalent in an imperative language.

The queries define policies that enumerate instances of data that violate the expected state of the system.
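
To make "enumerating violations" concrete, here is a Python analogue (not Rego) of such a query: it lists every ingress rule open to the whole internet. The input shape mirrors the `SecurityGroups` list returned by EC2's `describe_security_groups`; the choice of `0.0.0.0/0` ingress as the violation is an assumption for illustration.

```python
from typing import Any, Dict, List, Tuple


def open_ingress_violations(groups: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Enumerate (group ID, from-port) pairs whose ingress allows 0.0.0.0/0.

    Like a Rego rule, this states which data counts as a violation and
    returns every instance, rather than a single yes/no answer.
    """
    return [
        (group["GroupId"], perm.get("FromPort", -1))  # -1 when no port is given
        for group in groups
        for perm in group.get("IpPermissions", [])
        for ip_range in perm.get("IpRanges", [])
        if ip_range.get("CidrIp") == "0.0.0.0/0"
    ]
```

An empty result means the data matches the expected state; each returned pair is one thing to remediate.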

If all the statements in the body hold true, the return value is “ground” (i.e., a constant).

STAR: Take the OPA Policy Authoring course by Tim Hinrichs (@tlhinrichs), CTO of Styra, OPA’s inventor.

https://medium.com/@mathurvarun98/how-to-write-great-rego-policies-dc6117679c9f
https://www.fugue.co/blog/5-tips-for-using-the-rego-language-for-open-policy-agent-opa

Resources

Security Group

More on Amazon

This is one of a series on Amazon:

Wilson Mar