How our Security Operations Center (SOC) detects and responds to security threats using software for SIEM (Security Information and Event Management) and SOAR (Security Orchestration, Automation, and Response).
Overview
This article is phrased as descriptions for presentation to auditors.
NOTE: Content here reflects my personal opinions and is not intended to represent any employer (past or present). “PROTIP:” here highlights information I haven’t seen elsewhere on the internet because it is hard-won, little-known but significant facts based on my personal research and experience.
Our Cyber Security Operations Center (CSOC) provides a central (enterprise-wide) capability to assess events from various activities impacting the security posture of our organization.
“Security is always excessive until it’s not enough.” -- Robbie Sinclair
The Chief Information Security Officer (CISO) or Chief Security Officer (CSO) defines the Risk (Acceptance) Management strategy/policy in relation to a budget for the SOC department. SOC teams typically have a corporate-wide 24/7/365 scope of responsibility.
This diagram (available) summarizes the text to follow:
A “Hunt Team” performs “Threat Intelligence” by hunting for possible threats. These professionally paranoid pessimists perform Threat Modeling and monitor Threat Feeds from the cybersecurity industry about known and emerging threats such as CVEs (Common Vulnerabilities and Exposures), malware, phishing attacks, and common coding errors called CWEs (Common Weakness Enumerations). The team (and its AI programs) identifies Indicators of Attack (IoA) resulting in Indicators of Compromise (IoC). Specialists such as in OT (Operational Technologies) within MSSPs (Managed Security Service Providers) such as Dragos enable organizations to reach High Maturity quickly with fewer mistakes.
Their work includes configuration of Intrusion Prevention Systems (IPS) (such as VIDEO: SecurityOnion) and Intrusion Detection Systems (IDS) that provide real-time analysis of traffic generated by applications and network systems, looking for patterns and specific keywords within content.
The team (and its AI programs) then defines logic rules used to automatically detect threats within the organization’s systems. Such processing requires a collective memory about incidents that have occurred before, kept in a “Security Data Warehouse” so that investigative queries are based on a complete set of security-related data about the activities of users, cloud operations, network devices (firewalls), servers, domain controllers, IoT devices, etc. Collected from each component are logs of events and metrics (about CPU, memory, network capacity usage, responsiveness, etc.). Sometimes traces of internal processing are stored for root-cause troubleshooting.
The “Risk Management” team estimates the consequences of technical risks along with other risks to the organization, such as financial, legal, and reputational. Thus, they also incorporate in their analysis potential events in society at large based on sentiment analysis of social media and profiles of terrorists and other malicious actors. Their total view is what should be the basis for prioritizing the SOC team’s work.
The common objective: recognize potential security threats and vulnerabilities before they have a chance to disrupt business operations.
The system to document and automate response to incidents is called SOAR (Security Orchestration, Automation, and Response). Remediations include immediately “locking down” portions of the system to prevent further damage by changing permissions to limit access by specific users and system components. SOAR tools can automate incident response actions using Bash shell scripts, ChatGPT prompts, Python, SQL, and other programming code.
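As a rough illustration of how a SOAR tool might map an incident type to a sequence of “lock down” steps, here is a minimal sketch. The `Incident` fields, the playbook names, and the step functions are all hypothetical; a real SOAR product would call an identity provider, EDR, or firewall API where this sketch only records what it would have done.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Incident:
    incident_id: str
    kind: str   # hypothetical category, e.g. "compromised_account"
    user: str
    host: str
    actions_taken: list = field(default_factory=list)


# Hypothetical containment steps. Each one only records its intent here;
# a real integration would call the relevant vendor API instead.
def revoke_user_sessions(incident: Incident) -> None:
    incident.actions_taken.append(f"revoked sessions for {incident.user}")


def disable_user(incident: Incident) -> None:
    incident.actions_taken.append(f"disabled account {incident.user}")


def isolate_host(incident: Incident) -> None:
    incident.actions_taken.append(f"isolated host {incident.host}")


# A playbook is an ordered list of steps for one kind of incident.
PLAYBOOKS = {
    "compromised_account": [revoke_user_sessions, disable_user],
    "malware_on_host": [isolate_host],
}


def run_playbook(incident: Incident) -> list:
    """Run every step of the matching playbook, or hand off to a human."""
    steps = PLAYBOOKS.get(incident.kind)
    if steps is None:
        incident.actions_taken.append("no playbook: escalate to an analyst")
        return incident.actions_taken
    for step in steps:
        step(incident)
    return incident.actions_taken
```

Keeping each step as a small function makes the same step reusable across playbooks, and the fallback branch keeps unknown incident types in front of an analyst rather than silently dropped.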
The system that collects and analyzes security data is called a Security Information and Event Management (SIEM) system. SIEM is an industry-wide acronym. According to Gartner [1]:
Security information and event management (SIEM) technology supports threat detection, compliance, and security incident management through the collection and analysis (both near real-time and historical) of security events, as well as a wide variety of other event and contextual data sources. The core capabilities are a broad scope of log event collection and management, the ability to analyze log events and other data across disparate sources, and operational capabilities (such as incident management, dashboards, and reporting).
SIEM and other security systems are set up and operated by our “platform team” of DevSecOps technicians.
The team installs on each endpoint (server, desktop, laptop, mobile device) EDR (Endpoint Detection and Response) agents that collect and analyze data about the endpoint’s activities before sending alerts to the SIEM system. The platform team also installs on each endpoint EPP (Endpoint Protection Platform) agents that prevent malware from being installed on the endpoint.
The system maintains a “Correlation Database” used by a Real-Time Correlation Engine which contains Alert Rules used to trigger alerts.
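To make the idea of a correlation engine evaluating alert rules more concrete, here is a minimal sketch. The `Event` fields, the single rule, and its threshold of three failures are assumptions made for illustration; they are not taken from any particular SIEM product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    timestamp: float   # seconds since epoch
    source_ip: str
    action: str        # "login_failed" or "login_succeeded" in this sketch


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    matches: Callable  # takes a list of Event, returns bool


def failures_then_success(events, min_failures=3):
    """True when at least min_failures failed logins precede a success."""
    failures = 0
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.action == "login_failed":
            failures += 1
        elif event.action == "login_succeeded" and failures >= min_failures:
            return True
    return False


# Hypothetical rule set; a real engine would load many rules from storage.
RULES = [AlertRule("possible brute force success", "high", failures_then_success)]


def correlate(events):
    """Group events by source IP, then evaluate every rule on each group."""
    by_ip = {}
    for event in events:
        by_ip.setdefault(event.source_ip, []).append(event)
    alerts = []
    for ip, group in by_ip.items():
        for rule in RULES:
            if rule.matches(group):
                alerts.append(f"{rule.severity}: {rule.name} from {ip}")
    return alerts
```

The point of correlation is that no single event here is alarming on its own; the alert comes from the relationship between several events that share a source.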
The whole SOC department works proactively with technical teams to ensure that all automated CI/CD workflows scan each type of coding for known vulnerabilities.
The SOC department also works with other departments to adjust their security controls and policies (especially incident management and recovery) based on lessons learned from actual incidents and “Game Days” that simulate incidents instigated by a “Red Team” (internal but production penetration testers) or “Purple Team” who operate on pre-planned targets across environments.
“Level 1” Security Analysts prioritize work based on the severity (scope of the potential “blast radius”) of the threat and the value of the asset at risk. Increasingly, such work is handled by AI.
- “User and Entity Behavior Analytics (UEBA)” (using Machine Learning) highlights significant changes in behavior to better understand event context and recognize intent within specific scenarios. It models the behavior of both humans and machines within the network, offering advanced threat detection.
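The two ideas above, ranking alerts by severity and asset value and flagging behavior that departs from an entity’s own baseline, can be sketched as follows. The 1-to-5 scales, the multiplication used as a score, and the three-standard-deviation threshold are assumptions for illustration. Production UEBA tools use far richer models than a single z-score.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Alert:
    alert_id: str
    severity: int      # assumed scale: 1 (contained) to 5 (wide blast radius)
    asset_value: int   # assumed scale: 1 (low value) to 5 (critical asset)


def priority_score(alert: Alert) -> int:
    """Assumed scoring: severity multiplied by the value of the asset."""
    return alert.severity * alert.asset_value


def triage_queue(alerts):
    """Highest score first; ties keep arrival order because sorted is stable."""
    return sorted(alerts, key=priority_score, reverse=True)


def is_anomalous(history, today, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from baseline.

    `history` holds past daily values for one user or machine, for example
    the number of files downloaded per day.
    """
    if len(history) < 2:
        return False   # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today != baseline
    return abs(today - baseline) / spread > threshold
```

Comparing each entity against its own history, rather than against a global limit, is what lets a quiet service account stand out when it suddenly behaves like a busy user.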
“Level 2” Security Analysts investigate and, if appropriate, define incidents to alert others about further analysis or remediation.
Senior SOC professionals (“incident managers”) lead the lifecycle of incident management and recovery efforts within other departments. They escalate reports to management about incidents internally. They maintain the system that escalates alerts when response is not timely.
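A minimal sketch of the escalation check such a system might run is shown below. The response targets per severity are invented for illustration; actual targets would come from the organization’s policies.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed response-time targets per severity (hypothetical values).
RESPONSE_TARGETS = {
    "high": timedelta(minutes=15),
    "medium": timedelta(hours=1),
    "low": timedelta(hours=8),
}


def needs_escalation(
    severity: str,
    raised_at: datetime,
    acknowledged_at: Optional[datetime],
    now: datetime,
) -> bool:
    """True when an alert is still unacknowledged past its response target."""
    if acknowledged_at is not None:
        return False   # someone has already picked it up
    return now > raised_at + RESPONSE_TARGETS[severity]
```

A scheduler could call `needs_escalation` for every open alert each minute and page the next person in the chain whenever it returns `True`.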
The GRC (Governance, Risk, and Compliance) team works with the SOC team, management (including Public Relations and Legal), auditors, and external law enforcement to ensure that the organization’s security policies and procedures are up to date and followed, per the organization’s policies to meet data compliance standards including SOC2, ISO 27001, PCI DSS, GDPR, HIPAA, HITRUST, SOX, etc.
NOTE: In smaller companies, some roles may be performed by the same person.
Metrics of effectiveness
Our approach to collecting and displaying metrics on the speed and effectiveness of our SOC is based on the SANS Institute (SysAdmin, Audit, Network, Security) “Defining Metrics for the Security Operations Center” by Christopher Crowley and Mark Orlando.
PROTIP: There is a cost to collect each metric. So the number of metrics should be limited to those that are most important to the organization and SMART (Specific, Measurable, Attainable, Relevant, and Timely).
VIDEO “For every one person working on metrics, they will be able to get 3-4 metrics out.”
EXAMPLES: The SOC generates online displays and reports on the volume and responsiveness of work, based on these categories:
-
Collect data from various sources (network devices, servers, domain controllers, etc.)
-
Normalize and aggregate collected data so it can be analyzed
- Volume of logs, metrics, and traces collected for each part of the organization, per day/week/month/quarter/year over time (to project future storage capacity and costs)
-
Identify and categorize incidents (as sessions) to detect threats
SIEM sorts this data into categories, for example: malware activity and failed and successful logins. When SIEM identifies a threat through network security monitoring, it generates an alert and defines a threat level based on predetermined rules. For example, someone trying to log into an account 10 times in 10 minutes is ok, while 100 times in 10 minutes might be flagged as an attempted attack.
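The login-attempt example above amounts to a threshold over a sliding time window. Here is a minimal sketch using the numbers from the text (100 attempts in 10 minutes); the class name and its interface are hypothetical.

```python
from collections import deque


class LoginAttemptMonitor:
    """Counts login attempts per account inside a sliding time window."""

    def __init__(self, max_attempts=100, window_seconds=600.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = {}   # account name -> deque of attempt timestamps

    def record(self, account, timestamp):
        """Record one attempt; return True once the threshold is reached."""
        window = self._attempts.setdefault(account, deque())
        window.append(timestamp)
        # Drop attempts that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.max_attempts
```

With the defaults, ten attempts spread over ten minutes never return `True`, while the hundredth attempt inside the same ten minutes does.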
The difficulty of anomaly detection rises from trivial hash values, to known bad IP addresses, to domain names, to network/host artifacts, to tools, and finally to analyzing TTPs (Tactics, Techniques, and Procedures).
Relevance to 18 CIS Controls:
- Inventory and Control of Enterprise Assets
- Inventory and Control of Software Assets
- Data Protection
- Secure Configuration of Enterprise Assets and Software
- Account Management
- Access Control Management
- Continuous Vulnerability Management
- Audit Log Management
- Email and Web Browser Protections
- Malware Defenses
- Data Recovery
- Network Infrastructure Management
- Network Monitoring and Defense
- Security Awareness and Skills Training
- Service Provider Management
- Application Software Security
- Incident Response Management
- Penetration Tests (and Red Team Exercises)
-
Pinpoint security breaches and enable organizations to investigate generated alerts (forensics analysis)
SIEM software matches events against rules in analytics engines, which index logs to enable searches and event correlation.
- Accuracy of blacklists blocking DNS, IP address, ports, strings, etc.
- The percentage of false positives (versus true positives) is a challenge to efficiency. The objective is not 100% accuracy, but to reduce the number of false positives to a manageable level.
- Accuracy of PAP (Permissible Actions Protocol) classification to avoid being detected by attackers
- Accuracy of TLP (Traffic Light Protocol) classification about limits to sharing of information about incidents
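One way to track the false positive percentage mentioned above is to compute it per alert rule, so the noisiest rules can be tuned first. This is a sketch under the assumption that analysts record a true/false disposition when closing each alert; the rule names in the usage note are made up.

```python
from collections import Counter


def false_positive_percentage_by_rule(dispositions):
    """Return {rule_name: percent of closed alerts that were false positives}.

    `dispositions` is a list of (rule_name, was_true_positive) pairs,
    one pair per closed alert.
    """
    totals = Counter()
    false_alarms = Counter()
    for rule, was_true_positive in dispositions:
        totals[rule] += 1
        if not was_true_positive:
            false_alarms[rule] += 1
    return {rule: 100.0 * false_alarms[rule] / totals[rule] for rule in totals}
```

For example, a hypothetical rule named `impossible-travel` with three false alarms out of four closed alerts would report 75.0, marking it as a candidate for tuning.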
-
Respond to incidents automatically and manually (using SOAR software to apply automation and structure manual procedures defined in “playbooks”).
- Mean Time To Detect (MTTD) IoA occurring and IoC
-
Mean Time To Repair (MTTR)
-
Threat count and percentage by stage in MITRE’s ATT&CK framework of TTPs (Tactics + Techniques + Procedures of attack) in the lifecycle of attacks, to provide context to SOC analysts.
- Mean Time To Internal Report (MTTIR) – time from incident occurrence to report to internal management, as defined by corporate policies and procedures.
-
Mean Time To External Report (MTTER) – time from incident occurrence to report to law enforcement (CISA) within one hour, as required by state and federal laws. Incidents are shared with CISA (US Cybersecurity and Infrastructure Security Agency) per NIST 800-61 Rev 2. See VIDEO: CISA Incident Response Playbooks. CISA works with the US Cyber Coordination Group within the US Department of Homeland Security (DHS), which coordinates with the FBI, NSA, and other agencies/departments.
- Mean Time to Fix (MTTF) - Time from request to update in production, for all levels of the tech stack, because that is what is needed to stay ahead of attackers. Types of fix or update include changing the version of TLS (Transport Layer Security) and other encryption algorithms, changing the version of software libraries, and changing the version of operating systems and other software components. VIDEO.
The above addresses Challenges to OT Vulnerability Management.
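The “mean time” metrics listed above are all averages of the gap between two timestamps per incident. A minimal sketch for MTTD and MTTR follows. The `IncidentTimeline` record is hypothetical, and measuring MTTR from detection (rather than from occurrence) is an assumption; use whichever start point your policy defines.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentTimeline:
    occurred: datetime
    detected: datetime
    repaired: datetime


def mean_duration(durations):
    """Average a non-empty list of timedelta values."""
    if not durations:
        raise ValueError("at least one incident is required")
    return sum(durations, timedelta()) / len(durations)


def mean_time_to_detect(incidents):
    return mean_duration([i.detected - i.occurred for i in incidents])


def mean_time_to_repair(incidents):
    # Assumption: the repair clock starts at detection, not at occurrence.
    return mean_duration([i.repaired - i.detected for i in incidents])
```

The same `mean_duration` helper works for MTTIR, MTTER, and MTTF once the corresponding timestamps (internal report, external report, fix deployed) are recorded for each incident.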
Dashboards
The SOC team provides dashboards to management and other departments to provide visibility into the security posture of the organization.
A sample dashboard (based on the NIST Cybersecurity Framework) [4]:
PROTIP: It’s not enough to show counts. Show trends over time, compared against activity levels (such as the number of programs, etc.) for extrapolating future levels.
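As a sketch of what “compared against activity levels” and “extrapolating” could look like in practice, the code below divides each period’s incident count by an activity measure and then fits a straight line through the resulting rates. The choice of a simple least-squares line is an assumption; a real dashboard might use a different forecasting method.

```python
def normalized_rates(incident_counts, activity_levels):
    """Incidents per unit of activity (e.g. per program) for each period."""
    return [count / activity for count, activity in zip(incident_counts, activity_levels)]


def linear_forecast(values, periods_ahead=1):
    """Fit a least-squares line through (0, v0), (1, v1), ... and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + periods_ahead)
```

For instance, counts of 10, 24, and 42 against 100, 120, and 140 programs give rates of 0.1, 0.2, and 0.3 incidents per program, which this straight-line fit extends to roughly 0.4 for the next period. Raw counts alone would overstate the increase because activity grew too.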
Threat Intelligence Feeds
https://www.youtube.com/watch?v=Lu-5E-AhDwU Elite SOC Teams Rely on These 4 Steps for Defensive Success | Pt.1 SOC Success SANS Institute
References
[1] https://www.gartner.com/en/information-technology/glossary/security-information-and-event-management-siem
[2] https://www.ibm.com/topics/siem
[3] https://www.wikiwand.com/en/Security_information_and_event_management
[4] https://www.youtube.com/watch?v=6PRmCvRCKTQ - Fundamentals: 11 Strategies of a World-Class SOC | SANS Blueprint Podcast Season 4 Intro (in the SANS Cyber Defense channel)
[5] VIDEO: PDF: EPRI: “Creating Security Metrics for the Electric Sector” by Jason Christopher
https://github.com/praetorian-inc/purple-team-attack-automation using Metasploit to attack based on Mitre ATT&CK TTPs.