Wilson Mar
Indexing and visualization of logs, metrics, and other data - with AI



Cisco Acquisition

On Sep 21, 2023, Cisco announced it will acquire Splunk for $28 billion in cash and stock. The deal is expected to close in the second half of 2024.

Splunk’s host names

  • https://splunk.com is Splunk’s marketing page

  • https://splunk.com/blogs

  • https://splk.it/ is Splunk’s URL shortener
    • https://splk.it/SplunkCloudServDesc
  • https://docs.splunk.com/Documentation

  • https://bots.splunk.com for hands-on experiences using Splunk security products.

  • https://splunkbase.splunk.com for downloading files

  • https://github.com/splunk contains open source repos

  • https://github.com/StudyClubForSplunk/ is referenced in
  • https://splunk.studyclub.community/ by volunteer helpers. Its leaders wear fez hats at Splunk conferences

  • Cloud Monitoring Console (CMC) is used by administrators to view Splunk system usage and health.

https://testsafebrowsing.appspot.com/ provides simulated attacks on browsers.


NOTE: Content here reflects my personal opinions, and is not intended to represent any employer (past or present). "PROTIP:" highlights hard-won, little-known but significant facts from my personal research and experience that I haven't seen elsewhere on the internet.

The company Splunk

Splunk was founded in 2003 by Michael Baum, Rob Das, and Erik Swan.

The company mascot is called “Buttercup”.

People who work in the company Splunk are called “Splunkers”. https://www.wikiwand.com/en/Splunk notes that according to Glassdoor, it was the fourth highest-paying company for employees in the United States in April 2017.

Splunk, Inc. is headquartered at 270 Brannan St, San Francisco, California 94107. +1 415.848-8400.

It IPO’d in 2012 as ticker SPLK.

In 2020, Splunk was named to the Fortune 1000 list.

As of September 2020, Splunk’s client list includes 92 companies on the Fortune 100 list.[34]

Splunk’s Value-Added


PROTIP: Although each cloud vendor has services that also do what Splunk does, many choose Splunk to avoid cloud vendor lock-in while going multi-cloud.

Splunk is like “Google” for machine-generated data, especially logs from servers, applications, and networks.

Splunk is now firmly entrenched in many data centers because Splunk works on almost all technologies to handle high volume, high variety data generated at high velocity.

Splunk is a software utility for machine log data collection, indexing, and visualization for “operational intelligence”. Splunk can ingest almost all technologies (on-prem, clouds, databases, etc.) for use by SOC (Security Operations Centers) who correlate what’s going on across the vast landscape of technologies.

  • Collect and Index Log Data: Index streaming log data from all your distributed systems regardless of format or location.

  • Visualize Trends

  • Zoom in and out on timelines to automatically reveal trends, spikes and patterns and click to drill down into search results.

  • Issue alerts (based on AIOps)

On June 11, 2018, Splunk announced its acquisition of VictorOps, a DevOps incident management startup, for US$120 million. The VictorOps product was renamed "Splunk On-Call".

In October 2019, Splunk announced the integration of its security tools - including security information and event management (SIEM), user behavior analytics (UBA), and security orchestration, automation, and response (Splunk Phantom) — into the new Splunk Mission Control.

In 2019, Splunk introduced an application performance monitoring (APM) platform, SignalFx Microservices APM, that pairs “no-sample” monitoring and analysis features with Omnition’s full-fidelity tracing capabilities. Splunk also announced that a capability called Kubernetes Navigator would be available through their product, SignalFx Infrastructure Monitoring.

Also in 2019, Splunk announced new Data Fabric Search and Data Stream Processor. Data Fabric Search that combines into a single view datasets across different data stores, including those that are not Splunk-based. The required data structure is only created when a query is run. The real-time Data Stream Processor collects data from various sources and then distributes results to Splunk or other destinations. It allows role-based access to create alerts and reports based on data that is relevant for each individual.[61] In 2020, it was updated to allow it to access, process, and route real-time data from multiple cloud services.[62]

Also in 2019, Splunk rolled out Splunk Connected Experiences, which extends its data processing and analytics capabilities to augmented reality (AR), mobile devices, and mobile applications.

In 2020, Splunk announced Splunk Enterprise 8.1 and the Splunk Cloud edition. They include stream processing, machine learning, and multi-cloud capabilities.

Per https://www.glassdoor.com/Reviews/Splunk-Reviews-E117313.htm at time of writing, 77% of employee responders would recommend Splunk to a friend, which is rated "Good" (based on 1,000+ reviews).

Product Names

“Splunk can be used for ALL your data needs.” It is a data platform. It is a data lake. It is a data warehouse. It is a data lakehouse. ;)

Splunk offers proprietary products:

  1. Splunk Application Performance Monitoring (APM)
  2. Splunk Cloud Platform
  3. Splunk Connected Experiences (Mobile, AR, VR, TV)
  4. Splunk Data Stream Processor
  5. Splunk Edge Hub
  6. Splunk Enterprise
  7. Splunk Enterprise Security
  8. Splunk Incident Intelligence
  9. Splunk Infrastructure Monitoring
  10. Splunk IT Service Intelligence
  11. Splunk Machine Learning Toolkit
  12. Splunk Mission Control
  13. Splunk Observability Cloud
  14. Splunk On-Call (formerly VictorOps)
  15. Splunk Real User Monitoring (RUM)
  16. Splunk Security, Orchestration, Automation and Response (SOAR)
  17. Splunk Synthetic Monitoring
  18. Splunk Threat Intelligence Management
  19. Splunk User Behavior Analytics
  20. Splunk Web Optimization
  21. OpenTelemetry

Use cases

On the marketing website:

The full list also has:

  • SLI/SLO Monitoring - centralize performance tracking visualizations and smart alerting to better manage cloud KPIs across environments.

Splunk SOAR

https://splunk.com/soar for free trial (100 trans/day)

Splunk “Mission Control” presents Analytics and Case Management to unify SIEM and SOAR.

Verizon DBIR 2022 found the number of incidents going down but the number of breaches going up.

d3fend.mitre.org (MITRE D3FEND) is the defensive counterpart to MITRE ATT&CK, cataloging response techniques such as detonating files, quarantining hosts, disabling users, and revoking tokens. (Peter Kaloroumakis, D3FEND Lead.) Splunk maps such techniques to step-by-step Playbooks via Enrichment and Response Packs within an Analytics Story.


https://research.splunk.com/playbook explorer

Transparent Huge Pages (THP)


TAs (Technology Add-ons) extend the functionality of Splunk.

C# and Python make calls to Splunk’s APIs.
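For example, here is a hedged Python sketch of how a script might build (but not send) a request for splunkd's documented REST search endpoint on management port 8089. The host name and query are placeholders, not real endpoints.

```python
# Sketch: construct a POST request for splunkd's /services/search/jobs
# REST endpoint. Host and query here are illustrative placeholders.
from urllib.parse import urlencode

def build_search_job_request(host, spl_query, earliest="-24h", latest="now"):
    """Return (url, body) for POSTing a new search job to splunkd on port 8089."""
    url = f"https://{host}:8089/services/search/jobs"
    body = urlencode({
        "search": f"search {spl_query}",  # REST API expects the leading 'search' command
        "earliest_time": earliest,
        "latest_time": latest,
        "output_mode": "json",
    })
    return url, body

url, body = build_search_job_request("splunk.example.com", "index=_internal | head 5")
print(url)  # https://splunk.example.com:8089/services/search/jobs
```

An actual client would then POST this body with credentials and poll the returned search ID for results.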

Splunk’s UCC (Universal Configuration Console)

    pip install splunk-packaging-toolkit
    ucc-gen --help

SOAR (Security Orchestration, Automation and Response)

Splunk SOAR (Security Orchestration, Automation and Response) is a cloud-based platform that automates security tasks, orchestrates workflows, and reduces incident response time.

It reduces “alert fatigue” by automating the triage and remediation of security alerts, and automating the response to security incidents.

The hype:

  • Investigate and respond to threats faster
  • Increase SOC efficiency and productivity
  • Eliminate analyst grunt work so you can stop working hard and start working smarter
  • Go from overwhelmed to in-control of your security operations

Splunk SOAR’s Main Dashboard provides a “single pane of glass”, with an overview of all analyst data and activity, notable events, playbooks, connections with other security tools, workloads, ROI, and more.

Splunk offers a free Community Edition of SOAR (to automate tasks, orchestrate workflows, and reduce incident response time). Fill out a form for their approval before they allow you to download:


    NOTE: "Phantom" was the previous name for the SOAR (cloud) product.

NOTE: Splunk SOAR (Cloud) does not allow access from the Splunk Connected Experiences mobile apps.

Splunk SOAR (Cloud) supports SAML2 authentication.

SOAR marketing page references:

  • https://www.splunk.com/en_us/products/splunk-security-orchestration-and-automation.html
  • https://www.splunk.com/en_us/software/splunk-security-orchestration-and-automation.html
  • https://www.splunk.com/en_us/data-insider/what-is-soar.html
  • https://www.splunk.com/en_us/blog/security/soaring-to-the-clouds-with-splunk-soar.html
  • https://docs.splunk.com/Documentation/SOAR/current/ServiceDescription/SplunkSOARService

SOAR Playbooks

  • VIDEO: https://www.splunk.com/en_us/software/splunk-security-orchestration-and-automation/features.html

By defining and automating the sequence of actions, SOAR playbooks enable swifter responses to triggers, thus reducing incident response time.

It pulls in data from SIEM, EDR, firewall, and threat intelligence feeds.

Post-incident, SOAR playbooks automate remediation of the attack and case management.

SOAR automates tasks and orchestrates workflows:

  • Splunk SOAR comes with 100+ pre-made playbooks out of the box.

  • Users can build and edit playbooks in the original horizontal visual playbook editor or the vertical visual playbook editor introduced August 2021.

  • Splunk SOAR (Cloud) is provisioned with 600GB of disk space and 600GB of PostgreSQL database storage.
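A playbook is essentially an ordered sequence of actions applied to an incoming alert. A minimal Python sketch of the idea (the action names and alert fields are made up, not Splunk SOAR's API):

```python
# Toy playbook engine: apply each action in order, collecting results
# into a case record. Illustrative only; not Splunk SOAR's actual API.
def run_playbook(alert, actions):
    case = {"alert": alert["name"], "steps": []}
    for name, action in actions:
        case["steps"].append((name, action(alert)))  # sequenced, automated steps
    return case

actions = [
    ("enrich",  lambda a: f"reputation lookup for {a['src_ip']}"),
    ("contain", lambda a: f"quarantine host {a['host']}"),
    ("notify",  lambda a: "ticket opened"),
]
case = run_playbook({"name": "phishing", "src_ip": "203.0.113.7", "host": "wks42"}, actions)
print([step for step, _ in case["steps"]])  # ['enrich', 'contain', 'notify']
```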

SOAR Enrichments

SOAR uses Splunk Intelligence Management (formerly TruSTAR) normalized indicator enrichment, captured within the notes of a container. It enables an analyst to view details and specify subsequent actions directly within a single Splunk SOAR prompt for rapid manual response.

The "Suspicious Email Domain Enrichment" playbook uses Cisco Umbrella Investigate to add a risk score, risk status, and domain category to the security event in Splunk SOAR. This enables faster recognition of the purpose of the email, and the domain enrichment also provides a connection point to take further action on the output.

SOC Implementation Phases

with CMM (Capability Maturity Model)

  1. Define scope at CMM Level 1 (Ad Hoc State) - Processes are unpredictable and inconsistent.
  2. Implement Technologies
  3. Hire and Build Team
  4. Develop Policies, Processes, Procedures to reach CMM Level 2 (Repeatable State)
  5. Reach CMM Level 3 (Defined State)
  6. Develop KPI (Quantitative) and KRI (Qualitative) Metrics
  7. Automate to reach CMM Level 4 (Optimizing State)

Different Proprietary Editions

Not available in free versions:

  • Authentication
  • Distributed search. Scheduled searches and alerting
  • Forwarding in TCP/HTTP (to non-Splunk)
  • Deployment management

PROTIP: A free Splunk Enterprise license allows indexing of up to 500MB per day for 60 days. After that, convert to a perpetual Free license or purchase an Enterprise license.

The master quota aggregates license pools; each pool (per index/source type) has its own sub-quota.

./splunk is “short-hand” for the splunk executable in $SPLUNK_HOME/bin/splunk

  • On Unix, this is by default /opt/splunk/bin/splunk
  • On Windows it is c:\program files\splunk\bin\splunk
  • On macOS it is /Applications/Splunk/bin/splunk

Use Splunk SaaS cloud using just a browser

There is a different set of installation and set-up instructions depending on the edition.

  • 14-day Splunk Cloud Platform Trial:


  • 60-day @ 500MB/day Splunk Enterprise Trial on a server:


    Splunk Enterprise support for macOS (versions 10.14 and 10.15) has been deprecated.

  • 6 months @ 50 GB/day Dev/Test for customers with an enterprise support license.

Install Splunk locally

  1. Go to the Download page:


  2. Fill out the form.
  3. Open “Confirm your email address” email and click the link.
  4. At https://www.splunk.com/en_us/download/splunk-cloud/cloud-trial.html
  5. Wait for email “Welcome to Splunk Cloud Platform!” providing
    • Splunk Cloud Platform URL: https://prd-c-xxxx.splunkcloud.com
    • User Name: sc_admin
    • Temporary Password: xxxxxxxx
  6. Login to Splunk Cloud Platform at https://prd-c-xxxx.splunkcloud.com
  7. Copy and paste the temporary password into the password field.
  8. Create a new password and save it in a secure location.
  9. Accept terms.
  10. Read what’s new
  11. Apps menu:

  12. In the menu, notice the “prd-…” in the URL to the Search Manual, Pivot Manual, Dashboard & Visualizations Manual
  13. "Take the free Splunk Fundamentals course" at https://www.splunk.com/en_us/training/free-courses.html:
    A. What is Splunk (45 min)
    B. Intro to Splunk
    C. Using Fields

Terraform to install Splunk server

To manage your Splunk infrastructure as code using Terraform:

Created September 2020, https://registry.terraform.io/providers/splunk/splunk/latest at https://github.com/splunk/terraform-provider-splunk



  1. INPUT

Fundamentals Curriculum

Splunk has some 160 search commands; eval is among the most important.

  • Search & Reporting
  • Dashboards
  • Alerts
  • Apps
  • Settings
  • Help


  1. Use the Safari browser to download
    • tutorialdata.zip
    • Prices.csv.zip
    Do not unzip them, because Splunk unzips them automatically when you install the app.

PDF: Dashboards and Visualizations Manual

Two different visualization frameworks:

  • The Classic Splunk dashboards and visualizations framework uses Simple XML as the source code and has a limited user interface.

  • The Splunk Dashboard Studio framework uses JSON-formatted stanzas as the source code for the objects in a dashboard, and for the entire dashboard. Add visualizations directly to a dashboard and wire them to searches (aka data sources), without entering the source editor or using Search & Reporting. No Trellis & 3rd party visualizations. Adds Choropleth SVG images.
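For a sense of what JSON-formatted stanzas look like, here is a hedged Python sketch of a minimal dashboard definition wiring one visualization to one search data source. The field names are illustrative, not the exact Dashboard Studio schema.

```python
# Sketch of a Dashboard-Studio-style JSON definition: a search data
# source plus a visualization referencing it. Keys are illustrative.
import json

dashboard = {
    "dataSources": {
        "ds_errors": {
            "type": "ds.search",
            "options": {"query": "index=_internal log_level=ERROR | stats count"},
        },
    },
    "visualizations": {
        "viz_errors": {
            "type": "splunk.singlevalue",
            "dataSources": {"primary": "ds_errors"},  # wired to the search above
        },
    },
}
print(json.dumps(dashboard, indent=2))
```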

Splunk Fundamentals Course

Module 1 – Introduction

Module 2 – What is Splunk?

  • Splunk components
  • Installing Splunk
  • Getting data into Splunk

Module 3 – Introduction to Splunk’s User Interface

  • Understand the uses of Splunk
  • Define Splunk Apps
  • Customizing your user settings
  • Learn basic navigation in Splunk

Module 4 – Basic Searching

  • Run basic searches
  • Use autocomplete to help build a search
  • Set the time range of a search
  • Identify the contents of search results
  • Refine searches
  • Use the timeline
  • Work with events
  • Control a search job
  • Save search results

Module 5 – Using Fields in Searches

  • Understand fields
  • Use fields in searches
  • Use the fields sidebar

Module 6 – Search Language Fundamentals

  • Review basic search commands and general search practices
  • Examine the search pipeline
  • Specify indexes in searches
  • Use autocomplete and syntax highlighting
  • Use the following commands to perform searches: o tables o rename o fields o dedup o sort

Module 7 – Using Basic Transforming Commands

  • The top command
  • The rare command
  • The stats command

Module 8 – Creating Reports and Dashboards

  • Save a search as a report
  • Edit reports
  • Create reports that include visualizations such as charts and tables
  • Create a dashboard
  • Add a report to a dashboard
  • Edit a dashboard

Module 9 – Datasets and the Common Information Model

  • Naming conventions
  • What are datasets?
  • What is the Common Information Model (CIM)?

Module 10 – Creating and Using Lookups

  • Describe lookups
  • Create a lookup file and create a lookup definition
  • Configure an automatic lookup

Module 11 – Creating Scheduled Reports and Alerts

  • Describe scheduled reports
  • Configure scheduled reports
  • Describe alerts
  • Create alerts
  • View fired alerts

Module 12 - Using Pivot tables and charts with SPL

  • Describe Pivot
  • Understand the relationship between data models and pivot
  • Select a data model object
  • Create a pivot report
  • Create an instant pivot from a search
  • Add a pivot report to a dashboard

The Pivot tool is a drag-and-drop UI that lets you report on Datasets without the Splunk Search Processing Language (SPL™).

Each dataset exists within a data model, which defines a subset of the dataset represented by the data model as a whole. Each data model consists of one or more data model datasets.

Data model datasets have a hierarchical relationship with each other (have parent-child relationships). Data models can contain multiple dataset hierarchies.

Child datasets have inheritance. Data model datasets are defined by characteristics that mostly break down into constraints and fields. Child datasets inherit constraints and fields from their parent datasets and have additional constraints and fields of their own.

The types of dataset hierarchies: event, search, transaction, child:

  • Event datasets represent a set of events. Root event datasets are defined by constraints (see below).
  • Transaction datasets represent transactions - groups of events that are related in some way, such as events related to a firewall intrusion incident, or the online reservation of a hotel room by a single customer.
  • Search datasets represent the results of an arbitrary search. Search datasets are typically defined by searches that use transforming or streaming commands to return results in table format, and they contain the results of those searches.
  • Child datasets can be added to any dataset. They represent a subset of the dataset encompassed by their parent dataset. You may want to base a pivot on a child dataset because it represents a specific chunk of data–exactly the chunk you need to work with for a particular report.
    See https://docs.splunk.com/Documentation/Splunk/9.0.4/Knowledge/Aboutdatamodels
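The inheritance rules above can be sketched in Python: a child dataset's effective constraints and fields are its parent's plus its own. The dataset names below are made up.

```python
# Sketch of data model dataset inheritance: children inherit their
# parent's constraints and fields and add their own.
class Dataset:
    def __init__(self, name, constraints=(), fields=(), parent=None):
        self.name, self.parent = name, parent
        self._constraints, self._fields = list(constraints), list(fields)

    @property
    def constraints(self):
        inherited = self.parent.constraints if self.parent else []
        return inherited + self._constraints

    @property
    def fields(self):
        inherited = self.parent.fields if self.parent else []
        return inherited + self._fields

web = Dataset("Web", constraints=["sourcetype=access_combined"], fields=["clientip", "status"])
errors = Dataset("Web_Errors", constraints=["status>=500"], fields=["uri"], parent=web)
print(errors.constraints)  # ['sourcetype=access_combined', 'status>=500']
```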

BOTS database

logs and use cases


NOTE: Infographics published by FinancesOnline (https://financesonline.com) indicated that humans created, captured, copied, and consumed about 74 zettabytes of data in 2021. That number is estimated to grow to 149 zettabytes in 2024.

SPL (Search Processing Language)

SPL2 also handles ANSI SQL.

Search interface, Anatomy of a Search, Logical expressions, Using Pipe, Using fields

Functions (Critical commands)

  • stats functions
  • eval
  • eventstats and streamstats
  • timechart



  • Create and manage reports, dashboards
  • Schedule a report
  • Schedule a dashboard for PDF delivery

Technology reports:

  • Malware Summary: No. of Infections, Hosts infected, Users, Malware Type/Name, Action by AV, Files
  • Firewall Summary: Inbound/Outbound, Source/Destination, Protocol, Action, Bytes, Packets
  • Account Management Summary: Account Creation, Account Modification, Account Deletion, Lockouts, Password Resets
  • Authentication Summary: Successful/Failed logins, logouts, Account Logons/Lockouts
  • Proxy Summary: Top 10: users, URLs, domains, IP addresses, Malware, Malicious URLs, Malicious domains, Malicious IP addresses, Malicious/Normal downloads and Action
  • Email Summary: Top 10: senders, Recipients, Sender domains, IP addresses, Mail blocking reasons, Malicious/Normal downloads and Action
  • Threat Intelligence Summary: Inbound/Outbound, Source/Destination, Protocol, Action, Bytes, Packets

SIEM Performance reports:

  • SIEM Performance Summary
  • SIEM Performance Summary by Source
  • SIEM Performance Summary by Destination
  • SIEM Performance Summary by Source and Destination

SOC (Security Operations Center)

Security Models: In-house, MSSP (Managed Security Service Provider)/MSP (Managed Service Provider): Dedicated or Shared.

A SOC team correlates and analyzes security events from multiple sources, including network traffic.

From Infosec Institute What does a SOC analyst do?

    Security operations center (SOC) analysts are responsible for analyzing and monitoring network traffic, threats and vulnerabilities within an organization’s IT infrastructure. This includes monitoring, investigating and reporting security events and incidents from security information and event management (SIEM) systems. SOC analysts also monitor firewall, email, web and DNS logs to identify and mitigate intrusion attempts.
  • Threat Intelligence, Threat Hunter, Forensic Investigator,
  • Incident Handler
  • Incident Response Automation Engineer
  • Red Team Specialist, Lead
  • SOC Engineer, Manager

Incident Response Process

NIST SP 800-61 & SANS (SysAdmin, Audit, Network, Security) Institute’s Incident Response Process:

  1. Preparation
  2. Identification
  3. Containment
  4. Eradication
  5. Recovery
  6. Lessons Learned

Metrics of activity:

  • No. of Log Sources: 2,800 - 3,000
  • No. of Log Events/day: 100,000 - 1,000,000
  • No. of Alerts/day: 100 - 200
  • No. of incidents/day: 2 - 5


SLA: time to identify and report suspicious activity:

  • P1: up to 30 minutes
  • P2: 1 hour
  • P3: 2 hours
  • P4: 4 hours

SOC Analyst Interview Q&A, from https://www.socexperts.com

https://www.youtube.com/watch?v=AtRTliJ4Fe0 The Roles and Responsibilities of a Security Operations Center (SOC) by Mike Worth

https://www.youtube.com/watch?v=YVQriOVHl18 A TYPICAL Day in the LIFE of a SOC Analyst


Ticketing tools: Service Now (SNOW), Jira, BMC Remedy, RSA Archer

  1. Reported By
  2. Incident ID assigned by ticketing tool
  3. Detected Time
  4. Incident Description/Details
  5. Assigned To
  6. Occurred Time
  7. Incident Name (Summary)
  8. Priority
  9. Severity
  10. What’s Affected: systems, hosts, IP addresses, User, Business unit, etc.
  11. Evidence
  12. Analysis
  13. Status
  14. Resolution Date

Analysis Tools

  • VirusTotal.com
  • Wireshark
  • MXToolBox
  • CVE Details
  • IBM X-Force/Threat Crowd

Process Explorer


OReilly trainings

https://learning.oreilly.com/live-events/beginning-splunk/0636920372424/ Video course: Beginning Splunk


Fundamentals courses are FREE at https://education.splunk.com/catalog?category=splunk-fundamentals-part-1

https://www.javatpoint.com/splunk text tutorial

VIDEO: Explaining Splunk Architecture Basics

Other downloads:

NOTE: Index 500 MB/Day.

  1. Previously: to download:

    wget -O splunk-7.1.0-2e75b3406c5b-darwin-64.tgz 'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86&platform=macos&version=7.1.0&product=splunk&filename=splunk-7.1.0-2e75b3406c5b-darwin-64.tgz&wget=true'

    The MD5 is at, for the version at time of writing: https://download.splunk.com/products/splunk/releases/7.1.0/osx/splunk-7.1.0-2e75b3406c5b-darwin-64.tgz.md5

  2. If you don’t have an account, register.

  3. You may have to copy and paste the URL from above to get back to the page.

    • https://www.splunk.com/en_us/training/videos/all-videos.html

    • http://docs.splunk.com/Documentation/Splunk/latest/Installation

    • https://www.splunk.com/pdfs/solution-guides/splunk-quick-reference-guide.pdf

For release notes, refer to the Known issues in the Release Notes manual:

  • http://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/Knownissues
  • http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Commandsbycategory


Splunk offers ingestion in streaming mode (not batch).

Splunk stores data in indexes organized in directories and files.

Splunk compresses data in flat files using its own proprietary format, which is read via SPL (Search Processing Language).

Splunk apps have a preconfigured visual app UI.
Splunk add-ons do not have a preconfigured visual UI app (headless).


Splunk default configurations are stored at $SPLUNK_HOME/etc/system/default

  1. To disable Splunk launch messages, edit splunk-launch.conf


Components, Licenses, Default Ports

Splunk licenses charge by how much data can be indexed per calendar day (midnight to midnight).
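A toy Python illustration of per-calendar-day metering against a quota; the 500 MB/day figure matches the trial license described earlier, and the event data is made up:

```python
# Toy model of per-calendar-day license metering (midnight to midnight):
# sum bytes indexed per day and flag days over a quota.
from collections import defaultdict

def daily_indexed_mb(events):
    """events: (iso_timestamp, bytes) pairs; returns MB indexed per day."""
    totals = defaultdict(float)
    for ts, nbytes in events:
        day = ts[:10]                       # YYYY-MM-DD calendar bucket
        totals[day] += nbytes / (1024 * 1024)
    return dict(totals)

usage = daily_indexed_mb([
    ("2024-05-01T09:00:00", 300 * 1024 * 1024),
    ("2024-05-01T23:59:59", 250 * 1024 * 1024),
    ("2024-05-02T00:00:01", 100 * 1024 * 1024),
])
over_quota = {day: mb for day, mb in usage.items() if mb > 500}
print(over_quota)  # only day one exceeds a 500 MB/day quota
```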

Deployment Server manages Splunk components in a distributed environment.

Each Cluster Member for index replication is licensed.

Each forwarder forwards logs to the Splunk Indexer:

  • Universal Forwarder (UF)
  • Heavy Forwarder (HF) parses data before forwarding (heavier resource usage, so Universal Forwarders are generally preferred in production)

  • 8000 - Splunk web port - the Search Head providing GUI for distributed searching

    • Search head cluster is more reliable and efficient than (older) search head pooling (to be deprecated).
    • Search head cluster is managed by a captain, which controls its slaves (legacy terminology)

  • 8080 - Splunk Index Replication port by the Indexer
  • 8089 - Splunk management port (splunkd REST API)

  • 8191 - Splunk KV Store

  • 9997 - Splunk Indexing port
  • 514 - Splunk Network port


https://splunkbase.splunk.com/app/3138/ 3D Scatterplot - Custom Visualization is built with plotly.js, which combines WebGL and d3.js. So you can zoom, rotate, and orbit around the points, change aspect ratios, colors, sizes, opacity, labels, etc.

Currently, this visualization supports 50,000 points and does not limit your categorical values. Download the app to see some examples.

  1. Disable boot-start (so the daemon does not start at boot):

    $SPLUNK_HOME/bin/splunk disable boot-start
  2. Enable boot-start (start the daemon at boot):

    $SPLUNK_HOME/bin/splunk enable boot-start
  3. Start

    splunk start splunkweb

    Start daemon:

    splunk start splunkd
  4. Verify process started (by name):

    ps aux | grep splunk

To reset admin password (v7.1+):

  1. stop Splunk process.

  2. Find the passwd file and rename it “passwd.bk”.

  3. In directory: $SPLUNK_HOME/etc/system/local
  4. Create file user-seed.conf containing:

  5. Create file ui-prefs.conf containing the following so that all Search app users see "today" as the default time range:

    dispatch_earliest_time = @d
    dispatch_latest_time = now
  6. Start the server.


  1. System local directory has highest priority

  2. App local directories
  3. App default directories
  4. System default directory has lowest priority
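This precedence can be modeled as layered dictionary merges, from lowest priority to highest; the setting names below are illustrative:

```python
# Model of Splunk .conf precedence: merge from lowest priority (system
# default) to highest (system local), so higher-priority layers win.
def effective_config(system_default, app_default, app_local, system_local):
    merged = {}
    for layer in (system_default, app_default, app_local, system_local):
        merged.update(layer)                # later (higher-priority) layers override
    return merged

cfg = effective_config(
    {"maxKBps": "0", "host": "default"},    # system default
    {"maxKBps": "256"},                     # app default
    {"host": "app01"},                      # app local
    {"maxKBps": "1024"},                    # system local
)
print(cfg)  # {'maxKBps': '1024', 'host': 'app01'}
```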

Sample data (up to 500 MB) can be obtained free from Kaggle, such as Titanic passengers. See https://www.javatpoint.com/splunk-data-ingestion

Ingestion (Fishbucket) to avoid duplicate indexing

The fishbucket prevents Splunk from re-ingesting files it has already processed (such as a directory full of logs).

Access them through the GUI by searching for:


Splunk UFs & HFs track which files they have ingested - via monitor, batch, or oneshot - through an internal index called the fishbucket, in the default folder:

    /opt/splunk/var/lib/splunk

That folder contains seek pointers and CRCs for the files being indexed, so splunkd can tell whether each has been read.

The fishbucket index contains pairs of file paths & checksums of ingested files, as well as some metadata.

It also prevents re-ingestion if a first ingestion has somehow gone wrong (wrong index, wrong parsing, etc.).
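The idea behind the fishbucket can be sketched in Python: record a checksum per file path and skip files already recorded. This is a simplification of the real seek-pointer/CRC mechanism.

```python
# Simplified fishbucket: remember a CRC per path; skip unchanged files.
import zlib

def should_ingest(path, content, fishbucket):
    crc = zlib.crc32(content)
    if fishbucket.get(path) == crc:
        return False                        # already ingested, unchanged
    fishbucket[path] = crc                  # record checksum for next time
    return True

fb = {}
print(should_ingest("/var/log/app.log", b"line1\n", fb))  # True: first sight
print(should_ingest("/var/log/app.log", b"line1\n", fb))  # False: duplicate
```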

  1. To bypass this limitation, it is possible to delete the entire fishbucket off the filesystem. But this is very much less than ideal - it may cause other files to be re-ingested. Instead, there is a ‘clean’ command that can excise the record of a particular file. Run this while logged in as the splunk user:

    splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file /path/to/file.log --reset
  2. After removal, re-ingest the file:

    splunk add oneshot -source /path/to/file.log -sourcetype ST -index IDX -host HOSTNAME

The most common use case for this is testing index-time props/transforms - date/time extraction, line-breaking, etc. Using a sample log, ingest the file, check the parsing logic via search, then either fix the props and clean & reingest as necessary, or continue onboarding normally.

https://docs.splunk.com/Documentation/AddOns/released/Linux/Configure collectd_html format



Splunk places indexed data in buckets (physical directories) each containing events of a specific period.

Over time, each bucket changes stages as it ages:

  • One or more buckets are hot when newly indexed and open for writing.
  • Warm buckets contain data rolled out of hot buckets
  • Cold buckets contain data rolled out of warm buckets
  • Frozen buckets contain data rolled out of cold buckets. They are not searchable, and are deleted (or archived) by the indexer
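A toy Python model of this lifecycle:

```python
# Buckets age hot -> warm -> cold -> frozen; only non-frozen buckets
# remain searchable.
STAGES = ["hot", "warm", "cold", "frozen"]

def age(stage):
    i = STAGES.index(stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]

def searchable(stage):
    return stage != "frozen"

stage = "hot"
for _ in range(3):
    stage = age(stage)
print(stage, searchable(stage))  # frozen False
```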

To troubleshoot Splunk performance issues

  • Watch Splunk metrics log in real time:

    index="_internal" source="metrics.log" group="per_sourcetype_thruput"
    | eval MB=kb/1024 | chart sum(MB)

    Alternately, watch everything, split by source type:

    index="_internal" source="metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps) over series
  • Check splunkd.log for errors.

  • Check for server performance metrics (CPU, memory usage, disk I/O, etc.)

  • The SOS (Splunk on Splunk) app is installed to check for warnings and errors (on the dashboard)

  • Too many saved searches can consume excessive system resources.

  • Install and enable Firebug browser extension to reveal what happens when logging into Splunk. Then enable and switch to the “Net” panel to view time spent in HTTP requests and responses

  • use btool to troubleshoot configuration files.
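The eval/chart pipeline in the first search above can be mimicked in Python over sample metrics records; the records below are made up:

```python
# Equivalent of `eval MB=kb/1024 | chart sum(MB) over series`:
# convert kb to MB and sum per source type ("series").
from collections import defaultdict

def thruput_by_series(records):
    totals = defaultdict(float)
    for rec in records:
        totals[rec["series"]] += rec["kb"] / 1024   # eval MB=kb/1024
    return dict(totals)                              # chart sum(MB) over series

rows = [{"series": "access_combined", "kb": 2048},
        {"series": "syslog", "kb": 512},
        {"series": "access_combined", "kb": 1024}]
print(thruput_by_series(rows))  # {'access_combined': 3.0, 'syslog': 0.5}
```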

Each search is recorded as a .csv of search results and a search.log in a folder within: $SPLUNK_HOME/var/run/splunk/dispatch

The waiting period before each dispatch directory is deleted is controlled by limits.conf.

If a user requests that results be saved, the dispatch directory is instead kept for 7 days before deletion.

To add folder access logs:

  1. Enable Object Access Audit through group policy on the Windows machine on which the folder is located.
  2. Enable auditing on the specific folder for which logs are monitored.
  3. Install Splunk Universal Forwarder on the Windows machine.
  4. Configure Universal Forwarder to send security logs to Splunk Indexer.

Search terms

Ideally, use stats commands when unique IDs are available, because they perform better.

But sometimes the unique ID (from one or more fields) alone is not sufficient to discriminate among transactions, such as when web sessions are identified by a cookie/client IP. In that case, use transaction commands, which can reference raw text serving as message identifiers to mark the beginning and end of each transaction.

stats commands generate summary statistics of existing fields in search results, as values in new fields.

eventstats aggregates statistics and appends them to the original raw events as new fields.
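A Python sketch contrasting the two grouping strategies: by unique ID (stats-style) versus by begin/end marker text (transaction-style). The events and markers are made up.

```python
# stats-style: group events by a unique ID field.
def group_by_id(events):
    groups = {}
    for e in events:
        groups.setdefault(e["session_id"], []).append(e["msg"])
    return groups

# transaction-style: group events between begin/end marker text.
def group_by_markers(events, starts_with, ends_with):
    txns, current = [], None
    for e in events:
        if e["msg"].startswith(starts_with):
            current = [e["msg"]]            # begin a new transaction
        elif current is not None:
            current.append(e["msg"])
            if e["msg"].startswith(ends_with):
                txns.append(current)        # close the transaction
                current = None
    return txns

events = [{"session_id": "a", "msg": "login ok"},
          {"session_id": "a", "msg": "logout"}]
print(group_by_id(events))                       # {'a': ['login ok', 'logout']}
print(group_by_markers(events, "login", "logout"))  # [['login ok', 'logout']]
```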

Regular Expressions

There are two ways to do Regular Expressions, such as extracting an IP address:

  • rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"

  • rex field=_raw "(?<ip_address>([0-9]{1,3}[.]){3}[0-9]{1,3})"
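The second pattern can be checked in Python, which uses (?P<name>...) for the named groups that SPL's rex writes as (?<name>...):

```python
# Verify the IP-extraction regex against a sample log line.
import re

pattern = re.compile(r"(?P<ip_address>(?:[0-9]{1,3}\.){3}[0-9]{1,3})")
m = pattern.search("Failed login from 203.0.113.45 port 22")
print(m.group("ip_address"))  # 203.0.113.45
```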

  1. To disable search history, delete file:


Map Reduce Algorithm

The mechanism that enables Splunk's fast data searching is that Splunk adapted the map() and reduce() algorithms, which functional programming uses for large-scale batch parallel processing, to streaming data.
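A minimal Python sketch of the split: each indexer computes partial statistics locally (map), and the search head merges the partials (reduce). The event data is made up.

```python
# Map phase: each indexer counts statuses over its local events.
# Reduce phase: the search head merges the partial counts.
from collections import Counter

def map_phase(events):
    return Counter(e["status"] for e in events)

def reduce_phase(partials):
    total = Counter()
    for p in partials:
        total += p
    return total

indexer1 = [{"status": 200}, {"status": 500}]
indexer2 = [{"status": 200}, {"status": 200}]
totals = reduce_phase([map_phase(indexer1), map_phase(indexer2)])
print(dict(totals))  # {200: 3, 500: 1}
```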


Splunk Operational Intelligence Cookbook By Josh Diakun, Paul R Johnson, Derek Mock

Installs and Configures Splunk forwarders and servers



https://github.com/cerner/cerner_splunk https://github.com/search?utf8=%E2%9C%93&q=splunk&type=


Blazemeter has additional software that only works on their cloud platform.

Splunk certifications

Splunk Enterprise Security Certified Admin is among the most expensive of all certifications: $4,625.

Splunk Cloud Fundamentals 1

Splunk offers a free class called Fundamentals 1 to get people to get started in Splunk and get certified in Core User

Get FREE Access for one month at https://education.splunk.com/course/splunk-infrastructure-overview

https://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements

  • Splunk Web front-end runs on port 8000 for modern browsers
  • Splunk listens on port 8065 bound to the loopback interface
  • KV store uses port 8191

https://docs.splunk.com/Documentation/Splunk/latest/installation/RunSplunkasadifferentornon-rootuser

Splunk can be installed in any directory (/opt/ by default on Linux).

Splunk server components all run splunkd (written in C/C++), which uses SSL port 8089 for management:

  • Forwarders, on the servers where data originates, forward data to Splunk indexers
  • Indexers receive events and metrics to store in a Splunk index, whose directories are organized by age
  • Search heads handle the search request language, distribute searches to indexers, then consolidate results into reports and dashboards of visualizations. Knowledge Objects extract additional fields and transform data.

A single instance suffices when indexing less than 20GB per day for under 20 users with a small number of forwarders.

  • Indexers: 2 64-bit CPU with 6x2GHz cores, 12GB RAM, 1GbE NIC, 800 IOPS
  • Search heads: 4 64-bit CPU with 4x2GHz cores, 12GB RAM, 1GbE NIC, 2 x 10K RPM 300GB SAS drives - RAID-1
  • Forwarders: 1 64-bit CPU with 2x1.5GHz cores, 1GB RAM

A search head supported by 3 indexers handles up to 100GB per day for up to 100 users using several hundred forwarders. Add 3 search heads in a search head cluster to distribute requests.
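The scaling arithmetic can be sanity-checked with the figures above:

```python
# Back-of-envelope capacity check for the reference topology above
daily_volume_gb = 100
indexers = 3
gb_per_indexer = daily_volume_gb / indexers  # ~33.3 GB/day per indexer,
# which exceeds the 20 GB/day single-instance guideline -- hence the
# distributed topology rather than one combined instance
print(round(gb_per_indexer, 1))  # 33.3
```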

An indexer cluster replicates index data, which promotes availability and prevents data loss. See https://docs.splunk.com/Documentation/Splunk/latest/installation/ChoosetheuserSplunkshouldrunas


	./splunk help
	./splunk enable boot-start -user ubuntu
	./splunk start --accept-license
	./splunk stop
	./splunk restart

User: admin/changeme

Turn off Transparent Huge Pages (THP) on Linux hosts; Splunk's documentation warns that THP degrades performance.

Other components:

  • License master
  • Deployment server
  • Cluster manager

Splunk scales by distributing its processing pipeline phases across instances: Input, Parsing, Indexing, and Searching.

Forwarder Management

  1. Introduction
  2. What is Splunk?
  3. Introduction to Splunk’s interface
  4. Basic searching
  5. Using fields in searches
  6. Search fundamentals
  7. Transforming commands
  8. Creating reports and dashboards
  9. Datasets
  10. The Common Information Model (CIM)
  11. Creating and using lookups
  12. Scheduled Reports
  13. Alerts
  14. Using Pivot


Datadog is a SaaS-only offering.

Sumo Logic provides a paid alternative only as a public cloud-based service.

The ELK stack (Elasticsearch, Logstash for log gathering, and Kibana for visualization) provides both free on-premises and paid cloud offerings.

Loggly and LogLogic are also alternatives.

For visualization there are also Graphite, Librato, and Datadog.

QRadar is a SIEM product from IBM.

Video tutorials


Installing and Configuring Splunk

Pluralsight video course: Optimizing Fields, Tags, and Event Types in Splunk [1h 36m] 28 Feb 2019 by Joe Abraham (@jobabrh, jobabrahamtech.com) is based on Splunk version 7.2.1

Performing Basic Splunk Searches

Analyzing Machine Data with Splunk


By Intellipaat:

By Kinney Group:

Gerald Auger, PhD, of the Simply Cyber YouTube channel references the home lab build by Eric Capuano of the Recon Infosec YouTube channel, which uses LimaCharlie for threat hunting.



https://twitter.com/splunk #TurnDataIntoDoing

Splunk online documentation: http://docs.splunk.com/Documentation/Splunk

Splunkbase community: https://community.splunk.com

Splunk Community Slack (splk.it/slack)

Splunk User Groups (usergroups.splunk.com)

http://conf.splunk.com/ July 17-20, 2023 | Las Vegas https://twitter.com/hashtag/splunkconf23 $1,695



SplunkTrust community

Hot buckets

Splunk puts data into buckets. Each bucket has a time span.

If there is an existing bucket defined for a time span that includes the time of the data, Splunk puts the data into that bucket.


How many buckets?

maxHotBuckets = 3 is the default: 2 normal hot buckets open at a time, plus a final bucket slot reserved for quarantined data.

Quarantined data consists of events whose timestamps are anomalously far in the past or future; Splunk isolates them in a quarantine bucket so they don't stretch the time span of the normal buckets.

Use a larger number of hot buckets for a high-volume index, or if there are regular delays in event delivery. The more hot buckets Splunk has open, the greater the time range covered for incoming events.

Splunk creates a new hot bucket whenever a slot is available for one.

PROTIP: Bringing in old data (back in time) may give your buckets indigestion.

Buckets contain raw and tsidx data.

maxHotSpanSecs = 7776000 is the default, i.e. 90 days, so 3 hot buckets can span 270 days. Change it if event timespans are regularly larger than expected. Splunk may override this value.

minHotIdleSecsBeforeForceRoll = auto lets Splunk autotune the value (starting at 600 seconds).

maxDataSize = auto (750MB) by default; auto_high_volume (10GB) is intended for high-volume indexes.

Normally, Splunk places the event in the bucket with the closest timestamp.

But if an event does not fit into an existing hot bucket and at least one bucket has been idle long enough to be allowed to roll, Splunk closes the hot bucket with the longest idle time before creating a new hot bucket.
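A toy model of the rolling behavior described above (made-up spans and timestamps; real bucket management is far more involved):

```python
# Toy model: assign events to hot buckets by timestamp span; when no bucket
# fits and all slots are full, roll the bucket that has been idle longest.
MAX_HOT_BUCKETS = 3
SPAN = 100  # each hot bucket covers timestamps [start, start + SPAN)

hot, warm = [], []  # each hot bucket: {"start": int, "last_write": int}

def index_event(ts):
    for b in hot:
        if b["start"] <= ts < b["start"] + SPAN:
            b["last_write"] = ts          # fits an existing hot bucket
            return
    if len(hot) >= MAX_HOT_BUCKETS:       # no slot free: roll the hot
        idle = min(hot, key=lambda b: b["last_write"])
        hot.remove(idle)                  # bucket idle the longest
        warm.append(idle)                 # ...to warm
    hot.append({"start": ts - ts % SPAN, "last_write": ts})

for ts in [10, 150, 250, 20, 999]:  # 999 fits no bucket, forcing a roll
    index_event(ts)

print(len(hot), len(warm))  # 3 1
```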

Boss of the SOC

https://github.com/splunk/botsv3/ contains a sample pre-indexed security dataset used in CTF (Capture The Flag) competitions. Its data SourceTypes include aws:cloudtrail, among many others. Required software includes AWS GuardDuty, CiscoNVM, the Code42 App for Splunk, and a Splunk Add-on for each cloud.


https://www.splunk.com/en_us/pdfs/resources/whitepaper/detecting-supply-chain-attacks.pdf Detecting Supply Chain Attacks Using Splunk and JA3/s hashes to detect malicious activity on critical servers

https://learning.oreilly.com/library/view/sams-teach-yourself/9780135182925/ SQL in 10 Minutes, 5th Edition, 2019, by Ben Forta:

https://learning.oreilly.com/library/view/unix-and-linux/9780134278308/ Unix and Linux System Administration Handbook, 5th Edition, by Evi Nemeth et al

More on Security

This is one of a series on Security in DevSecOps:

  1. Security actions for teamwork and SLSA
  2. DevSecOps

  3. Code Signing on macOS
  4. Transport Layer Security

  5. Git Signing
  6. GitHub Data Security
  7. Encrypt all the things

  8. Azure Security-focus Cloud Onramp
  9. Azure Networking

  10. AWS Onboarding
  11. AWS Security (certification exam)
  12. AWS IAM (Identity and Access Management)
  13. AWS Networking

  14. SIEM (Security Information and Event Management)
  15. Intrusion Detection Systems (Google/Palo Alto)
  16. Chaos Engineering

  17. SOC2
  18. FedRAMP
  19. CAIQ (Consensus Assessment Initiative Questionnaire) by cloud vendors

  20. AKeyless cloud vault
  21. Hashicorp Vault
  22. Hashicorp Terraform
  23. OPA (Open Policy Agent)

  24. SonarQube
  25. WebGoat (deliberately insecure app) and vulnerability scanners
  26. Test for OWASP using ZAP on the Broken Web App

  27. Security certifications
  28. Details about Cyber Security

  29. Quantum Supremacy can break encryption in minutes
  30. Pen Testing
  31. Kali Linux

  32. Threat Modeling
  33. WebGoat (deliberately insecure Java app)