
Indexing and visualization of logs, metrics, and other data - with AI


Overview

The company Splunk

Splunk has been around since 2003 and is firmly entrenched in many datacenters. As of September 2020, Splunk’s client list included 92 companies on the Fortune 100 list.

NOTE: Content here reflects my personal opinions, and is not intended to represent any employer (past or present). “PROTIP:” flags information I haven’t seen elsewhere on the internet because it is hard-won, little-known but significant, based on my personal research and experience.

People who work in the company Splunk are called “Splunkers”.

Splunk is like “Google” for machine-generated data, especially logs from servers, applications, and networks.

Splunk is a software utility for machine log data collection, indexing, and visualization for “operational intelligence”. Splunk can ingest almost all technologies (on-prem, clouds, databases, etc.) for use by SOC (Security Operations Centers) who correlate what’s going on across the vast landscape of technologies.

  • Collect and Index Log Data: Index streaming log data from all your distributed systems regardless of format or location.

  • Visualize Trends

  • Zoom in and out on timelines to automatically reveal trends, spikes and patterns and click to drill down into search results.

  • Issue alerts (based on AIOps)

Splunk, Inc. is headquartered at 270 Brannan St, San Francisco, California 94107. +1 415.848-8400.

It IPO’d in 2012 as ticker SPLK.

https://www.wikiwand.com/en/Splunk notes that, according to Glassdoor, Splunk was the fourth highest-paying company for employees in the United States in April 2017. In 2020, Splunk was named to the Fortune 1000 list.

On June 11, 2018, Splunk announced its acquisition of VictorOps, a DevOps incident management startup, for US$120 million. The VictorOps product was renamed “Splunk On-Call”.

In October 2019, Splunk announced the integration of its security tools, including security information and event management (SIEM), user behavior analytics (UBA), and security orchestration, automation, and response (Splunk Phantom), into the new Splunk Mission Control.

In 2019, Splunk introduced an application performance monitoring (APM) platform, SignalFx Microservices APM, that pairs “no-sample” monitoring and analysis features with Omnition’s full-fidelity tracing capabilities. Splunk also announced that a capability called Kubernetes Navigator would be available through their product, SignalFx Infrastructure Monitoring.

Also in 2019, Splunk announced the new Data Fabric Search and Data Stream Processor. Data Fabric Search combines datasets across different data stores, including those that are not Splunk-based, into a single view; the required data structure is only created when a query is run. The real-time Data Stream Processor collects data from various sources and then distributes results to Splunk or other destinations. It allows role-based access to create alerts and reports based on data that is relevant for each individual. In 2020, it was updated to allow it to access, process, and route real-time data from multiple cloud services.

Also in 2019, Splunk rolled out Splunk Connected Experiences, which extends its data processing and analytics capabilities to augmented reality (AR), mobile devices, and mobile applications.

In 2020, Splunk announced Splunk Enterprise 8.1 and the Splunk Cloud edition. They include stream processing, machine learning, and multi-cloud capabilities.

Per https://www.glassdoor.com/Reviews/Splunk-Reviews-E117313.htm at time of writing, 77% of employees responding would recommend Splunk to a friend, which is rated “Good” (based on 1,000+ reviews).

https://github.com/splunk

Product Names

Splunk can be used for ALL your data needs. It is a data platform. It is a data lake. It is a data warehouse. It is a data lakehouse. It is a data lakehousehouse. It is a data lakehousehousehouse. ;)

Splunk offers proprietary products.

  1. Splunk Application Performance Monitoring (APM)
  2. Splunk Cloud Platform
  3. Splunk Connected Experiences (Mobile, AR, VR, TV)
  4. Splunk Data Stream Processor
  5. Splunk Edge Hub
  6. Splunk Enterprise
  7. Splunk Enterprise Security
  8. Splunk Incident Intelligence
  9. Splunk Infrastructure Monitoring
  10. Splunk IT Service Intelligence
  11. Splunk Machine Learning Toolkit
  12. Splunk Mission Control
  13. Splunk Observability Cloud
  14. Splunk On-Call (formerly VictorOps)
  15. Splunk Real User Monitoring (RUM)
  16. Splunk Security Orchestration, Automation and Response (SOAR)
  17. Splunk Synthetic Monitoring
  18. Splunk Threat Intelligence Management
  19. Splunk User Behavior Analytics
  20. Splunk Web Optimization
  21. OpenTelemetry

Industry terms

Gartner’s Hype Cycle for Monitoring, Observability and Cloud Operations, 2022 PDF: monitoring-Hype_Cycle_2022-1170x771.png

APM (Application Performance Monitoring) is a superset of RUM (Real User Monitoring), to monitor and troubleshoot performance issues in web applications.

RUM (Real User Monitoring) is a subset of APM (Application Performance Monitoring), to monitor and troubleshoot performance issues in web applications.

eBPF (extended Berkeley Packet Filter) is a subset of APM (Application Performance Monitoring), to monitor and troubleshoot performance issues in Linux and Windows applications.

VDIM (Virtual Desktop Infrastructure Monitoring) is a subset of APM (Application Performance Monitoring), to monitor and troubleshoot performance issues in virtual desktops.

EDR (Endpoint Detection and Response) is a subset of an EPP (Endpoint Protection Platform), which detects and automatically remediates attacks on endpoints. It’s a quality log source for SIEM. EDR protects against malware threats with “behavioral” protection that adapts better than signature-based anti-virus.

DEX (Data Exfiltration) is a subset of EDR (Endpoint Detection and Response), to detect and respond to data exfiltration.

NDR (Network Detection and Response) is a superset of EDR, to also detect and respond to network security incidents.

XDR (eXtended Detection and Response) is a superset of EPP and EDR, to also ingest, detect, and respond to network, email, and cloud security (any asset) with “advanced” threat detection and response.

  • MDR (Managed Detection and Response) is a superset of XDR, to also provide 24/7/365 monitoring and response to security incidents, with guided response by 3rd-party human investigators. It prioritizes alerts through automation to rule out false positives, escalating only the most critical alerts to the MDR team for investigation and remediation. It also assists in cleanup to bring conditions back to their pre-attack state.

Splunk Security Orchestration, Automation and Response (SOAR) - see below.

SOAR (Security Orchestration, Automation and Response)

Splunk SOAR (Security Orchestration, Automation and Response) is a cloud-based platform that automates security tasks, orchestrates workflows, and reduces incident response time.

It reduces “alert fatigue” by automating the triage and remediation of security alerts, and automating the response to security incidents.

The hype:

  • Investigate and respond to threats faster
  • Increase SOC efficiency and productivity
  • Eliminate analyst grunt work so you can stop working hard and start working smarter
  • Go from overwhelmed to in-control of your security operations

Splunk SOAR’s Main Dashboard provides a “single pane of glass”, with an overview of all analyst data and activity, notable events, playbooks, connections with other security tools, workloads, ROI, and more.

Splunk offers a free Community Edition of SOAR (to automate tasks, orchestrate workflows, and reduce incident response time). Fill out a form and wait for approval before you are allowed to download:

https://my.phantom.us/signup/

    NOTE: "Phantom" was the previous name for the SOAR (cloud) product.

NOTE: Splunk SOAR (Cloud) does not allow access from the Splunk Connected Experiences mobile apps.

Splunk SOAR (Cloud) supports SAML2 authentication.

SOAR marketing page references:

  • https://www.splunk.com/en_us/products/splunk-security-orchestration-and-automation.html
  • https://www.splunk.com/en_us/software/splunk-security-orchestration-and-automation.html
  • https://www.splunk.com/en_us/data-insider/what-is-soar.html
  • https://www.splunk.com/en_us/blog/security/soaring-to-the-clouds-with-splunk-soar.html
  • https://docs.splunk.com/Documentation/SOAR/current/ServiceDescription/SplunkSOARService

SOAR Playbooks

  • VIDEO: https://www.splunk.com/en_us/software/splunk-security-orchestration-and-automation/features.html

By defining and automating the sequence of actions, SOAR playbooks enable swifter response to triggers, thus reducing incident response time.

It pulls in data from SIEM, EDR, firewall, and threat intelligence feeds.

Post-incident, SOAR playbooks automate remediation of the attack and case management.

SOAR automates tasks, orchestrate workflows:

  • Splunk SOAR comes with 100+ pre-made playbooks out of the box.

  • Users can build and edit playbooks in the original horizontal visual playbook editor or the vertical visual playbook editor introduced August 2021.

  • Splunk SOAR (Cloud) is provisioned with 600GB of disk space and 600GB of PostgreSQL database storage.

SOAR Enrichments

SOAR uses Splunk Intelligence Management (formerly TruSTAR) normalized indicator enrichment, captured within the notes of a container. It enables an analyst to view details and specify subsequent actions directly within a single Splunk SOAR prompt for rapid manual response.

The “Suspicious Email Domain Enrichment” playbook uses Cisco Umbrella Investigate to add a risk score, risk status, and domain category to the security event in Splunk SOAR. This enables faster recognition of the purpose of the email, and the domain enrichment also provides a connection point for taking further action on the output.


SOC Implementation Phases

with CMM (Capability Maturity Model)

  1. Define scope at CMM Level 1 (Ad Hoc State) - Processes are unpredictable and inconsistent.
  2. Implement Technologies
  3. Hire and Build Team
  4. Develop Policies, Processes, Procedures to reach CMM Level 2 (Repeatable State)
  5. Reach CMM Level 3 (Defined State)
  6. Develop KPI (Quantitative) and KRI (Qualitative) Metrics
  7. Automate to reach CMM Level 4 (Optimizing State)

Different Proprietary Editions

Not available in free versions:

  • Authentication
  • Distributed search
  • Scheduled searches and alerting
  • Forwarding in TCP/HTTP (to non-Splunk)
  • Deployment management

PROTIP: A Splunk Enterprise trial license allows indexing of up to 500MB per day for 60 days. After that, convert to a perpetual Free license or purchase an Enterprise license.

The master pool quota aggregates license pools for each index/source type, each with its own sub-quota.

./splunk is “short-hand” for the splunk executable at $SPLUNK_HOME/bin/splunk, as shown below.

  • On Unix, this is by default /opt/splunk/bin/splunk
  • On Windows it is C:\Program Files\Splunk\bin\splunk.exe
  • On macOS it is /Applications/Splunk/bin/splunk
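
PROTIP: Adding $SPLUNK_HOME/bin to your PATH avoids typing the path each time. A minimal sketch, assuming a default /opt/splunk install on Linux:

    export SPLUNK_HOME=/opt/splunk
    export PATH="$SPLUNK_HOME/bin:$PATH"
    splunk version   # confirm the CLI is reachable
    splunk status    # show whether splunkd is running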

Use the Splunk SaaS cloud with just a browser

There is a different set of installation and set-up instructions depending on the edition.

  • 14-day Splunk Cloud Platform Trial:

    https://www.splunk.com/en_us/download/splunk-cloud.html

  • 60-day @ 500MB/day Splunk Enterprise Trial on a server:

    https://www.splunk.com/en_us/download/splunk-enterprise.html

    Support for macOS (versions 10.14 and 10.15) has been deprecated in recent Splunk Enterprise releases.

  • 6 months @ 50 GB/day Dev/Test for customers with an enterprise support license.

Install Splunk locally

  1. Go to the Download page:

    https://www.splunk.com/en_us/download.html

  2. Fill out the form.
  3. Open “Confirm your email address” email and click the link.
  4. At https://www.splunk.com/en_us/download/splunk-cloud/cloud-trial.html
  5. Wait for email “Welcome to Splunk Cloud Platform!” providing
    • Splunk Cloud Platform URL: https://prd-c-xxxx.splunkcloud.com
    • User Name: sc_admin
    • Temporary Password: xxxxxxxx
  6. Login to Splunk Cloud Platform at https://prd-c-xxxx.splunkcloud.com
  7. Copy and paste the temporary password into the password field.
  8. Create a new password and save it in a secure location.
  9. Accept terms.
  10. Read what’s new
  11. Apps menu:

  12. In the menu, notice the “prd-…” in the URL to the Search Manual, Pivot Manual, Dashboard & Visualizations Manual
  13. “Take the free Splunk Fundamentals course” at https://www.splunk.com/en_us/training/free-courses.html
    A. What is Splunk (45 min)
    B. Intro to Splunk
    C. Using Fields

Terraform to install Splunk server

To manage your Splunk infrastructure as code using Terraform:

Created September 2020: https://registry.terraform.io/providers/splunk/splunk/latest at https://github.com/splunk/terraform-provider-splunk
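
A minimal sketch of what such a configuration might look like (this is not the provider’s official example; the splunk_indexes resource and provider arguments are from my reading of the provider docs, so verify them there):

    cat > main.tf <<'EOF'
    terraform {
      required_providers {
        splunk = { source = "splunk/splunk" }
      }
    }
    provider "splunk" {
      url                  = "localhost:8089"  # splunkd management port
      username             = "admin"
      password             = "changeme"        # use a variable in practice
      insecure_skip_verify = true              # lab instance, self-signed cert
    }
    # hypothetical resource: declare an index named "scratch"
    resource "splunk_indexes" "scratch" {
      name = "scratch"
    }
    EOF
    terraform init && terraform plan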


Architecture

Fundamentally

  1. INPUT
  2. PARSING
  3. INDEXING

Fundamentals Curriculum

Splunk has 160 search commands; eval is among the most important (see the example below).
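
For example, a minimal sketch using Splunk’s always-present internal index (so it runs on any instance), where eval computes an MB field from the built-in kb field:

    index=_internal source=*metrics.log group=per_sourcetype_thruput
    | eval MB=kb/1024
    | stats sum(MB) AS total_MB BY series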

  • Search & Reporting
  • Dashboards
  • Alerts
  • Apps
  • Settings
  • Help

Dashboards

  1. Use the Safari browser to download
    • tutorialdata.zip
    • Prices.csv.zip
    Do not unzip them, because Splunk unzips them automatically during upload.

PDF: Dashboards and Visualizations Manual

Two different visualization frameworks:

  • The Classic Splunk dashboards and visualizations framework uses Simple XML as the source code and has a limited user interface.

  • The Splunk Dashboard Studio framework uses JSON-formatted stanzas as the source code for the objects in a dashboard, and for the entire dashboard. Add visualizations directly to a dashboard and wire them to searches (aka data sources), without entering the source editor or using Search & Reporting. It lacks Trellis layouts and 3rd-party visualizations, but adds Choropleth SVG images.


Splunk Fundamentals Course

Module 1 – Introduction

Module 2 – What is Splunk?

  • Splunk components
  • Installing Splunk
  • Getting data into Splunk

Module 3 – Introduction to Splunk’s User Interface

  • Understand the uses of Splunk
  • Define Splunk Apps
  • Customizing your user settings
  • Learn basic navigation in Splunk

Module 4 – Basic Searching

  • Run basic searches
  • Use autocomplete to help build a search
  • Set the time range of a search
  • Identify the contents of search results
  • Refine searches
  • Use the timeline
  • Work with events
  • Control a search job
  • Save search results

Module 5 – Using Fields in Searches

  • Understand fields
  • Use fields in searches
  • Use the fields sidebar

Module 6 – Search Language Fundamentals

  • Review basic search commands and general search practices
  • Examine the search pipeline
  • Specify indexes in searches
  • Use autocomplete and syntax highlighting
  • Use the following commands to perform searches: table, rename, fields, dedup, sort (see the example below)
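
A sketch chaining those commands against Splunk’s internal index (the field names host, source, and sourcetype are built-in):

    index=_internal sourcetype=splunkd
    | dedup host
    | sort - _time
    | fields host, source, sourcetype
    | rename host AS server
    | table server, source, sourcetype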

Module 7 – Using Basic Transforming Commands

  • The top command
  • The rare command
  • The stats command (see the examples below)
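
Sketches of each against the internal index (component and log_level are standard fields in splunkd logs):

    index=_internal sourcetype=splunkd log_level=ERROR
    | top limit=5 component

    index=_internal
    | rare limit=5 sourcetype

    index=_internal
    | stats count BY sourcetype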

Module 8 – Creating Reports and Dashboards

  • Save a search as a report
  • Edit reports
  • Create reports that include visualizations such as charts and tables
  • Create a dashboard
  • Add a report to a dashboard
  • Edit a dashboard

Module 9 – Datasets and the Common Information Model

  • Naming conventions
  • What are datasets?
  • What is the Common Information Model (CIM)?

Module 10 – Creating and Using Lookups

  • Describe lookups
  • Create a lookup file and create a lookup definition
  • Configure an automatic lookup

Module 11 – Creating Scheduled Reports and Alerts

  • Describe scheduled reports
  • Configure scheduled reports
  • Describe alerts
  • Create alerts
  • View fired alerts

Module 12 - Using Pivot tables and charts with SPL

  • Describe Pivot
  • Understand the relationship between data models and pivot
  • Select a data model object
  • Create a pivot report
  • Create an instant pivot from a search
  • Add a pivot report to a dashboard

The Pivot tool is a drag-and-drop UI that lets you report on Datasets without the Splunk Search Processing Language (SPL™).

Each dataset exists within a data model, which defines a subset of the dataset represented by the data model as a whole. Each data model consists of one or more data model datasets.

Data model datasets have a hierarchical relationship with each other (have parent-child relationships). Data models can contain multiple dataset hierarchies.

Child datasets have inheritance. Data model datasets are defined by characteristics that mostly break down into constraints and fields. Child datasets inherit constraints and fields from their parent datasets and have additional constraints and fields of their own.

The types of dataset hierarchies are event, search, transaction, and child:

  • Event datasets represent a set of events. Root event datasets are defined by constraints (see below).
  • Transaction datasets represent transactions: groups of events that are related in some way, such as events related to a firewall intrusion incident, or the online reservation of a hotel room by a single customer.
  • Search datasets represent the results of an arbitrary search. Search datasets are typically defined by searches that use transforming or streaming commands to return results in table format, and they contain the results of those searches.
  • Child datasets can be added to any dataset. They represent a subset of the dataset encompassed by their parent dataset. You may want to base a pivot on a child dataset because it represents a specific chunk of data: exactly the chunk you need to work with for a particular report.
    See https://docs.splunk.com/Documentation/Splunk/9.0.4/Knowledge/Aboutdatamodels
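
A data model can also be queried directly with tstats. A minimal sketch, assuming the CIM “Web” data model is installed and populated (otherwise it returns no rows):

    | tstats count FROM datamodel=Web BY Web.status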

SPL (Search Processing Language)

Search interface, anatomy of a search, logical expressions, using pipes, using fields.

Functions

  • stats functions
  • eventstats and streamstats
  • timechart (see the sketches below)
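
Sketches of each, again using the always-present internal index:

    index=_internal source=*metrics.log group=per_sourcetype_thruput
    | timechart span=1h sum(kb) AS kb_indexed BY series

    index=_internal sourcetype=splunkd
    | eventstats count AS events_per_host BY host

    index=_internal sourcetype=splunkd
    | streamstats count AS running_total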

Reports

Tasks:

  • Create and manage reports, dashboards
  • Schedule a report
  • Schedule a dashboard for PDF delivery

Technology reports:

  • Malware Summary: No. of Infections, Hosts infected, Users, Malware Type/Name, Action by AV, Files
  • Firewall Summary: Inbound/Outbound, Source/Destination, Protocol, Action, Bytes, Packets
  • Account Management Summary: Account Creation, Account Modification, Account Deletion, Lockouts, Password Resets
  • Authentication Summary: Successful/Failed logins, logouts, Account Logons/Lockouts
  • Proxy Summary: Top 10: users, URLs, domains, IP addresses, Malware, Malicious URLs, Malicious domains, Malicious IP addresses, Malicious/Normal downloads and Action
  • Email Summary: Top 10: senders, Recipients, Sender domains, IP addresses, Mail blocking reasons, Malicious/Normal downloads and Action
  • Threat Intelligence Summary: Inbound/Outbound, Source/Destination, Protocol, Action, Bytes, Packets

SIEM Performance reports:

  • SIEM Performance Summary
  • SIEM Performance Summary by Source
  • SIEM Performance Summary by Destination
  • SIEM Performance Summary by Source and Destination

SOC (Security Operations Center)

Security Models: In-house, MSSP (Managed Security Service Provider)/MSP (Managed Service Provider): Dedicated or Shared.

A SOC team correlates and analyzes security events from multiple sources, including network traffic.

From Infosec Institute What does a SOC analyst do?

    Security operations center (SOC) analysts are responsible for analyzing and monitoring network traffic, threats and vulnerabilities within an organization’s IT infrastructure. This includes monitoring, investigating and reporting security events and incidents from security information and event management (SIEM) systems. SOC analysts also monitor firewall, email, web and DNS logs to identify and mitigate intrusion attempts.
  • Threat Intelligence, Threat Hunter, Forensic Investigator
  • Incident Handler
  • Incident Response Automation Engineer
  • Red Team Specialist, Lead
  • SOC Engineer, Manager

Incident Response Process

NIST SP 800-61 & SANS (SysAdmin, Audit, Network, Security) Institute’s Incident Response Process:

  1. Preparation
  2. Identification
  3. Containment
  4. Eradication
  5. Recovery
  6. Lessons Learned

Metrics of activity:

  • No. of Log Sources: 2,800 - 3,000
  • No. of Log Events/day: 100,000 - 1,000,000
  • No. of Alerts/day: 100 - 200
  • No. of incidents/day: 2 - 5

SLA

SLA: time to identify and report suspicious activity:

  • P1: up to 30 minutes
  • P2: 1 hour
  • P3: 2 hours
  • P4: 4 hours

SOC Analyst Interview Q&A, from https://www.socexperts.com

https://www.youtube.com/watch?v=AtRTliJ4Fe0 The Roles and Responsibilities of a Security Operations Center (SOC) by Mike Worth

https://www.youtube.com/watch?v=YVQriOVHl18 A TYPICAL Day in the LIFE of a SOC Analyst

Incidents

Ticketing tools: ServiceNow (SNOW), Jira, BMC Remedy, RSA Archer

  1. Reported By
  2. Incident ID assigned by ticketing tool
  3. Detected Time
  4. Incident Description/Details
  5. Assigned To
  6. Occurred Time
  7. Incident Name (Summary)
  8. Priority
  9. Severity
  10. What’s Affected: systems, hosts, IP addresses, User, Business unit, etc.
  11. Evidence
  12. Analysis
  13. Status
  14. Resolution Date

Analysis Tools

  • VirusTotal.com
  • IPVOID
  • Wireshark
  • MXToolBox
  • CVE Details
  • US-CERT
  • IBM X-Force/Threat Crowd

  • Process Explorer
  • tools4noobs.com


OReilly trainings

https://learning.oreilly.com/live-events/beginning-splunk/0636920372424/ Video course: Beginning Splunk

Tutorials

Fundamentals courses are FREE at https://education.splunk.com/catalog?category=splunk-fundamentals-part-1

https://www.javatpoint.com/splunk text tutorial

VIDEO: Explaining Splunk Architecture Basics

Other downloads:

NOTE: The free download indexes up to 500 MB/Day.

  1. Previously, to download:

    wget -O splunk-7.1.0-2e75b3406c5b-darwin-64.tgz 'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86&platform=macos&version=7.1.0&product=splunk&filename=splunk-7.1.0-2e75b3406c5b-darwin-64.tgz&wget=true'

    The MD5 for the version at time of writing is at: https://download.splunk.com/products/splunk/releases/7.1.0/osx/splunk-7.1.0-2e75b3406c5b-darwin-64.tgz.md5

  2. If you don’t have an account, register.

  3. You may have to copy and paste the URL from above to get back to the page.

    • https://www.splunk.com/en_us/training/videos/all-videos.html

    • http://docs.splunk.com/Documentation/Splunk/latest/Installation

    • https://www.splunk.com/pdfs/solution-guides/splunk-quick-reference-guide.pdf

For release notes, refer to the Known issues in the Release Notes manual:

  • http://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/Knownissues
  • http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Commandsbycategory

Architecture

Splunk offers ingestion in streaming mode (not batch).

Splunk stores data in indexes organized in directories and files.

Splunk compresses data into flat files using its own proprietary format, which is read using SPL (Search Processing Language).

Splunk apps have a preconfigured visual app UI.
Splunk add-ons do not have a preconfigured visual UI app (headless).

Configuration

Splunk default configurations are stored at $SPLUNK_HOME/etc/system/default

  1. To disable Splunk launch messages, set in splunk-launch.conf:

    OFFENSIVE=Less

Components, Licenses, Default Ports

Splunk licenses charge by how much data can be indexed per calendar day (midnight to midnight).

Deployment Server manages Splunk components in a distributed environment.

Each Cluster Member for index replication is licensed.

Each forwarder forwards logs to the Splunk Indexer:

  • Universal Forwarder (UF)
  • Heavy Forwarder (HWF) parses data before forwarding (so it is not recommended for production systems)

  • 8000 - Splunk Web port - the Search Head providing the GUI for distributed searching

    • A search head cluster is more reliable and efficient than (older) search head pooling (to be deprecated).
    • A search head cluster is managed by a captain, which controls its slaves (legacy terminology)

  • 8080 - Splunk index replication port used by the Indexer
  • 8089 - splunkd management port (REST API), e.g.:

    https://yoursplunkhost:8089/services/admin/inputstatus
  • 8191 - Splunk KV Store

  • 9997 - Splunk indexing port (forwarders send data here)
  • 514 - Splunk network input port (syslog)
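
These defaults can be changed by editing the matching .conf file and restarting. A minimal sketch for $SPLUNK_HOME/etc/system/local/web.conf (setting names per the web.conf spec file):

    [settings]
    httpport = 8000
    mgmtHostPort = 127.0.0.1:8089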

Add-ons

https://splunkbase.splunk.com/app/3138/ 3D Scatterplot - Custom Visualization is built with plotly.js, which combines WebGL and d3.js. So you can zoom, rotate, and orbit around the points, change aspect ratios, colors, sizes, opacity, labels, etc.

Currently, this visualization supports 50,000 points and does not limit your categorical values. Download the app to see some examples.

  1. Disable starting the daemon at boot:

    $SPLUNK_HOME/bin/splunk disable boot-start
  2. Enable starting the daemon at boot:

    $SPLUNK_HOME/bin/splunk enable boot-start
  3. Start

    splunk start splunkweb

    Start daemon:

    splunk start splunkd
  4. Verify process started (by name):

    ps aux | grep splunk

To reset admin password (v7.1+):

  1. Stop the Splunk process.

  2. Find the passwd file ($SPLUNK_HOME/etc/passwd) and rename it “passwd.bk”.

  3. In directory: $SPLUNK_HOME/etc/system/local
  4. Create file user-seed.conf containing (USERNAME names the account being reset):

    [user_info]
    USERNAME = admin
    PASSWORD = NEW_PASSWORD
    
  5. Create file ui-prefs.conf containing this so all Search app users see “today” as the default time range:

    [search]
    dispatch_earliest_time = @d
    dispatch_latest_time = now
    
  6. Start the server.

Precedence:

  1. System local directory has highest priority

  2. App local directories
  3. App default directories
  4. System default directory has lowest priority (verify with btool, below)
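
btool prints the merged, effective configuration, and with --debug shows which file each setting came from, which is how to verify this precedence:

    $SPLUNK_HOME/bin/splunk btool inputs list --debug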

Sample data (up to 500 MB) can be obtained free from Kaggle, such as Titanic passengers. See https://www.javatpoint.com/splunk-data-ingestion

Ingestion (Fishbucket) to avoid duplicate indexing

The fishbucket prevents Splunk from re-ingesting files it has already processed (such as a directory full of logs).

Access them through the GUI by searching for:

index=_thefishbucket

Splunk UFs & HFs track which files they have ingested - via monitor, batch, or oneshot - through an internal index called the fishbucket, in default folder:

    /opt/splunk/var/lib/splunk

That folder contains seek pointers and CRCs for files being indexed, so splunkd can tell whether each has been read.

The fishbucket index contains pairs of file paths & checksums of ingested files, as well as some metadata.

It also prevents re-ingestion if a first ingestion has somehow gone wrong (wrong index, wrong parsing, etc.).

  1. To bypass this limitation, it is possible to delete the entire fishbucket off the filesystem. But this is very much less than ideal: it may cause other files to be re-ingested. Instead, there is a command (btprobe) that can excise the record of a particular file. Run this while logged in as the splunk user:

    splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file /path/to/file.log --reset
    
  2. After removal, re-ingest the file:

    splunk add oneshot -source /path/to/file.log -sourcetype ST -index IDX -host HOSTNAME
    

The most common use case for this is testing index-time props/transforms - date/time extraction, line-breaking, etc. Using a sample log, ingest the file, check the parsing logic via search, then either fix the props and clean & reingest as necessary, or continue onboarding normally.

Configure collectd (HTML format): https://docs.splunk.com/Documentation/AddOns/released/Linux/Configure

Forwarder

http://docs.splunk.com/Documentation/Splunk/6.2.5/Data/Setupcustominputs

Splunk places indexed data in buckets (physical directories) each containing events of a specific period.

Over time, each bucket changes stages as it ages:

  • One or more buckets are hot when newly indexed and open for writing.
  • Warm buckets contain data rolled out of hot buckets.
  • Cold buckets contain data rolled out of warm buckets.
  • Frozen buckets contain data rolled from cold buckets. They are not searchable, and are deleted (or archived) by the indexer (see the indexes.conf sketch below).
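
Bucket locations and aging are controlled per index in indexes.conf. A minimal sketch with a hypothetical index name (paths follow the spec-file defaults; 7776000 seconds = 90 days before data rolls to frozen):

    [myindex]
    homePath   = $SPLUNK_DB/myindex/db
    coldPath   = $SPLUNK_DB/myindex/colddb
    thawedPath = $SPLUNK_DB/myindex/thaweddb
    frozenTimePeriodInSecs = 7776000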

To troubleshoot Splunk performance issues

  • Watch Splunk metrics log in real time:

    index="_internal" source="metrics.log" group="per_sourcetype_thruput"
    series="<your_sourcetype_here&T;"
    eval MB=kb/1024 | chart sum(MB)
      

    Alternately, watch everything, split by source type:

    index="_internal" source="*metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps) over series
      
  • Check splunkd.log for errors.

  • Check for server performance metrics (CPU, memory usage, disk I/O, etc.)

  • The SOS (Splunk on Splunk) app is installed to check for warnings and errors (on the dashboard)

  • Too many saved searches can consume excessive system resources.

  • Install and enable the Firebug browser extension to reveal what happens when logging into Splunk. Then enable and switch to its “Net” panel to view time spent in HTTP requests and responses.

  • Use btool to troubleshoot configuration files.

Each search is recorded as a .csv of search results and a search.log in a folder within: $SPLUNK_HOME/var/run/splunk/dispatch

The waiting period before each dispatch directory is deleted is controlled by limits.conf.

If the user requests that results be saved, they are deleted after 7 days.
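
Both lifetimes are ttl settings under the [search] stanza of limits.conf; a sketch showing what I understand to be the defaults (verify against your version’s limits.conf spec):

    [search]
    # unsaved dispatch directories expire after 10 minutes
    ttl = 600
    # saved results are kept for 7 days (604800 seconds)
    default_save_ttl = 604800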

To add folder access logs:

  1. Enable Object Access Audit through group policy on the Windows machine on which the folder is located.
  2. Enable auditing on the specific folder for which logs are monitored.
  3. Install Splunk Universal Forwarder on the Windows machine.
  4. Configure the Universal Forwarder to send security logs to the Splunk Indexer (see the sketch below).
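
On the forwarder, this amounts to two small .conf files. A minimal sketch (the indexer hostname is a placeholder):

    # inputs.conf - collect the Windows Security event log
    [WinEventLog://Security]
    disabled = 0

    # outputs.conf - forward to the indexer's listening port (9997 by default)
    [tcpout:primary_indexers]
    server = splunk-indexer.example.com:9997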

Search terms

Ideally, stats commands are used when unique IDs are available, because stats performs better.

But sometimes the unique ID (from one or more fields) alone is not sufficient to discriminate among transactions, such as when web sessions are identified by a cookie/client IP. In that case, the transaction command, referencing raw text that serves as a message identifier, may be used to mark where transactions begin and end.

stats commands generate summary statistics of all existing fields in search results and output them as values in new fields.

The eventstats command computes the same aggregates but appends them to each original raw event; streamstats computes running statistics in event order.
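
A sketch of the transaction approach, using field and marker names from Splunk’s web tutorial data (adjust for your own events); transaction adds a duration field, the seconds between each group’s first and last event:

    index=web sourcetype=access_combined
    | transaction clientip JSESSIONID startswith="view" endswith="purchase"
    | stats avg(duration) AS avg_session_secs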

Regular Expressions

There are two ways to write the regular expression, such as when extracting an IP address:

  • rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"

  • rex field=_raw "(?<ip_address>([0-9]{1,3}\.){3}[0-9]{1,3})"

  1. To disable search history, delete file:

    $SPLUNK_HOME/var/log/splunk/searches.log

Map Reduce Algorithm

The mechanism that enables Splunk’s fast data searching is that Splunk adapted the map() and reduce() algorithms, which functional programming uses for large-scale batch parallelization, to streaming.


Book

Splunk Operational Intelligence Cookbook By Josh Diakun, Paul R Johnson, Derek Mock

It covers installing and configuring Splunk forwarders and servers.

REST API

http://dev.splunk.com/restapi
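
A quick sanity test of the REST API from the command line, assuming a local instance with default management port and credentials (the response returns a search job SID that you then poll for results):

    curl -k -u admin:changeme https://localhost:8089/services/search/jobs \
         -d search="search index=_internal | head 5"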

https://github.com/cerner/cerner_splunk https://github.com/search?utf8=%E2%9C%93&q=splunk&type=

Blazemeter

Blazemeter has additional software that works only on its cloud platform.


Splunk Cloud Fundamentals 1

Splunk offers a free class called Fundamentals 1 to help people get started with Splunk and get certified as a Core User.

Get FREE Access for one month at https://education.splunk.com/course/splunk-infrastructure-overview

https://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements

  • Splunk Web front-end runs on port 8000 for modern browsers
  • The Splunk app server listens on port 8065, bound to the loopback interface
  • KV store uses port 8191
  • Splunk installs in any directory (/opt/ by default)

https://docs.splunk.com/Documentation/Splunk/latest/installation/RunSplunkasadifferentornon-rootuser

Splunk server components all run splunkd, written in C/C++, using SSL port 8089 for management.

  • Forwarders, on the servers where data originates, forward data to Splunk indexers
  • Indexers receive data to store in a Splunk index containing directories organized by age
  • Search heads handle the search request language, distribute searches to indexers, then consolidate results into reports and dashboards of visualizations
  • Knowledge Objects extract additional fields and transform data

Each instance indexes less than 20GB per day for under 20 users using a small number of forwarders.

  • Indexers: 2 64-bit CPU with 6x2GHz cores, 12GB RAM, 1GbE NIC, 800 IOPS
  • Search heads: 4 64-bit CPU with 4x2GHz cores, 12GB RAM, 1GbE NIC, 2 x 10K RPM 300GB SAS drives - RAID-1
  • Forwarders: 1 64-bit CPU with 2x1.5GHz cores, 1GB RAM

A search head with 3 indexers supports up to 100GB per day and up to 100 users using several hundred forwarders. Add 3 search heads in a search head cluster to distribute requests.

An index cluster replicates index data, which promotes availability and prevents data loss. https://docs.splunk.com/Documentation/Splunk/latest/installation/ChoosetheuserSplunkshouldrunas

Commands:

	./splunk help
	./splunk enable boot-start -user ubuntu
	./splunk start --accept-license
	./splunk stop
	./splunk restart

User: admin/changeme

Turn off transparent huge pages (THP), which degrade Splunk performance on Linux.
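
A sketch for checking and disabling THP until the next reboot (persist the change via your init system or a tuned profile):

    cat /sys/kernel/mm/transparent_hugepage/enabled
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag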

Other components:

  • License master
  • Deployment server
  • Cluster manager

Splunk scales instances: Input, Parsing, Indexing, Searching

Forwarder Management

  1. Introduction
  2. What is Splunk?
  3. Introduction to Splunk’s interface
  4. Basic searching
  5. Using fields in searches
  6. Search fundamentals
  7. Transforming commands
  8. Creating reports and dashboards
  9. Datasets
  10. The Common Information Model (CIM)
  11. Creating and using lookups
  12. Scheduled Reports
  13. Alerts
  14. Using Pivot


Competitors

Datadog is a SaaS alternative.

Sumo Logic provides a paid alternative only as a public cloud-based service.

The ELK stack (Elasticsearch, Logstash for log gathering, and Kibana for visualization) provides both free on-premises and paid cloud offerings.

Loggly and LogLogic are also alternatives.

For visualization there is also Graphite, Librato, and DataDog.

QRadar is a SIEM product from IBM.


Video tutorials

https://www.tutorialspoint.com/splunk/index.htm

Installing and Configuring Splunk

Pluralsight video course: Optimizing Fields, Tags, and Event Types in Splunk [1h 36m] 28 Feb 2019 by Joe Abraham (@jobabrh, jobabrahamtech.com) is based on Splunk version 7.2.1

Performing Basic Splunk Searches

Analyzing Machine Data with Splunk


References


Gerald Auger, PhD - Simply Cyber YouTube channel references home lab build by Eric Capuano of Recon Infosec YouTube channel. Uses Lima Charlie for threat hunting.

Social

https://www.linkedin.com/company/splunk/

https://twitter.com/splunk #TurnDataIntoDoing

Splunk online documentation: http://docs.splunk.com/Documentation/Splunk

Splunkbase community: https://community.splunk.com

Splunk Community Slack (splk.it/slack)

Splunk User Groups (usergroups.splunk.com)

http://conf.splunk.com/ July 17-20, 2023 | Las Vegas https://twitter.com/hashtag/splunkconf23 $1,695

References

https://www.devopsschool.com/tutorial/splunk/labs/fundamental/SplunkFundamentals1_module4.pdf