
Logging, indexing, and visualization

The company

Splunk has been around since 2003 and is firmly entrenched in many datacenters. As of September 2020, Splunk’s client list includes 92 companies on the Fortune 100 list.[34]

NOTE: Content here is my personal opinion, and not intended to represent any employer (past or present). “PROTIP:” items here highlight information I haven’t seen elsewhere on the internet: hard-won, little-known but significant facts based on my personal research and experience.

Splunk is “Google” for machine-generated data. Splunk is a software utility for machine log data collection, indexing, and visualization for “operational intelligence”. Splunk can ingest almost all technologies (on-prem, clouds, databases, etc.) for use by SOC (Security Operations Centers) who correlate what’s going on across the vast landscape of technologies.

  • Collect and Index Log Data: Index streaming log data from all your distributed systems regardless of format or location.

  • Visualize Trends: Zoom in and out on timelines to automatically reveal trends, spikes, and patterns, then click to drill down into search results.

  • Issue alerts (based on AIOps)

Splunk, Inc. is headquartered at 270 Brannan St, San Francisco, CA 94107. +1 (415) 848-8400.

It IPO’d in 2012 as ticker SPLK.

https://www.wikiwand.com/en/Splunk notes that according to Glassdoor, it was the fourth highest-paying company for employees in the United States in April 2017. In 2020, Splunk was named to the Fortune 1000 list.

On June 11, 2018, Splunk announced its acquisition of VictorOps, a DevOps incident management startup, for US$120 million. The VictorOps product was later renamed “Splunk On-Call”.

In October 2019, Splunk announced the integration of its security tools - including security information and event management (SIEM), user behavior analytics (UBA), and security orchestration, automation, and response (Splunk Phantom) — into the new Splunk Mission Control.

In 2019, Splunk introduced an application performance monitoring (APM) platform, SignalFx Microservices APM, that pairs “no-sample” monitoring and analysis features with Omnition’s full-fidelity tracing capabilities. Splunk also announced that a capability called Kubernetes Navigator would be available through their product, SignalFx Infrastructure Monitoring.

Also in 2019, Splunk announced the new Data Fabric Search and Data Stream Processor. Data Fabric Search combines datasets across different data stores, including those that are not Splunk-based, into a single view. The required data structure is only created when a query is run. The real-time Data Stream Processor collects data from various sources and then distributes results to Splunk or other destinations. It allows role-based access to create alerts and reports based on data that is relevant for each individual.[61] In 2020, it was updated to allow it to access, process, and route real-time data from multiple cloud services.[62]

Also in 2019, Splunk rolled out Splunk Connected Experiences, which extends its data processing and analytics capabilities to augmented reality (AR), mobile devices, and mobile applications.

In 2020, Splunk announced Splunk Enterprise 8.1 and the Splunk Cloud edition. They include stream processing, machine learning, and multi-cloud capabilities.

Competitors

Sumo Logic provides a paid alternative only as a public cloud-based service.

The ELK stack (Elasticsearch for indexing and search, Logstash for log gathering, and Kibana for visualization) provides both free on-premises and paid cloud offerings.

Loggly and LogLogic offer alternatives as well.

For visualization there are also Graphite, Librato, and Datadog.

Social

https://twitter.com/splunk #TurnDataIntoDoing

Splunk online documentation: http://docs.splunk.com/Documentation/Splunk

Splunkbase community: https://community.splunk.com

http://conf.splunk.com/ in June https://twitter.com/hashtag/splunkconf22

OReilly.com had a video tutorial.

https://www.linkedin.com/company/splunk/

Tutorials

Fundamentals courses are FREE at https://education.splunk.com/catalog?category=splunk-fundamentals-part-1

https://www.javatpoint.com/splunk text tutorial

VIDEO: Explaining Splunk Architecture Basics

Different Proprietary Editions

Splunk offers proprietary products.

https://github.com/splunk

There is a different set of installation and set-up instructions depending on the edition.

  1. Go to the Download page:

    https://www.splunk.com/en_us/download.html

    Note that Splunk now offers 14-day trials of its products rather than a freemium edition.

    SOAR (Orchestration and Automation)

  2. Splunk offers a free Community Edition of SOAR (to automate tasks, orchestrate workflows, and reduce incident response time), but you must fill out a form for their approval before you can download it.

    https://my.phantom.us/signup/

    NOTE: “Phantom” was the previous name for the SOAR (cloud) product.

    NOTE: Splunk SOAR (Cloud) does not allow access from the Splunk Connected Experiences mobile apps.

    Splunk SOAR (Cloud) supports SAML2 authentication.

  3. SOAR marketing pages and references:

    • https://www.splunk.com/en_us/software/splunk-security-orchestration-and-automation.html
    • https://www.splunk.com/en_us/data-insider/what-is-soar.html
    • https://www.splunk.com/en_us/blog/security/soaring-to-the-clouds-with-splunk-soar.html
    • https://docs.splunk.com/Documentation/SOAR/current/ServiceDescription/SplunkSOARService

    • Investigate and respond to threats faster
    • Increase SOC efficiency and productivity
    • Eliminate analyst grunt work so you can stop working hard and start working smarter
    • Go from overwhelmed to in-control of your security operations

    Splunk SOAR’s Main Dashboard provides an overview of all your data and activity, notable events, playbooks, connections with other security tools, workloads, ROI, and so much more.

    SOAR Playbooks

    VIDEO: https://www.splunk.com/en_us/software/splunk-security-orchestration-and-automation/features.html

    Users can build and edit playbooks in the original horizontal visual playbook editor or the vertical visual playbook editor introduced August 2021.

    Automation is defined in Splunk SOAR playbooks which execute a sequence of actions.

    Splunk SOAR comes with 100+ pre-made playbooks out of the box.

    Splunk SOAR (Cloud) is provisioned with 600GB of disk space and 600GB of PostgreSQL database storage.

    Enrichments

    SOAR uses Splunk Intelligence Management (formerly TruSTAR) normalized indicator enrichment, captured within the notes of a container. It enables an analyst to view details and specify subsequent actions directly within a single Splunk SOAR prompt for rapid manual response.

    The “Suspicious Email Domain Enrichment” playbook uses Cisco Umbrella Investigate to add a risk score, risk status, and domain category to the security event in Splunk SOAR. This enables faster recognition of the purpose of the email, and the domain enrichment also provides a connection point for taking further action on the output.

Other downloads:

NOTE: The free download indexes up to 500 MB/day.

  1. Previously: to download:

    wget -O splunk-7.1.0-2e75b3406c5b-darwin-64.tgz 'https://www.splunk.com/bin/splunk/DownloadActivityServlet?architecture=x86&platform=macos&version=7.1.0&product=splunk&filename=splunk-7.1.0-2e75b3406c5b-darwin-64.tgz&wget=true'

    The MD5 for the version at time of writing is at: https://download.splunk.com/products/splunk/releases/7.1.0/osx/splunk-7.1.0-2e75b3406c5b-darwin-64.tgz.md5
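The download can then be verified against that MD5 file. Below is a minimal sketch of the pattern, using a locally generated stand-in file so it is runnable anywhere; note that the real published .md5 file may use a different line format (e.g. `MD5 (<file>) = <hash>`), so adjust the field extraction accordingly:

```shell
# Create a stand-in tarball and its checksum file (in reality, both are downloaded).
printf 'demo payload\n' > splunk-demo.tgz
md5sum splunk-demo.tgz | awk '{print $1}' > splunk-demo.tgz.md5

# Compare the published hash against the local file's hash.
expected=$(cat splunk-demo.tgz.md5)
actual=$(md5sum splunk-demo.tgz | awk '{print $1}')   # on macOS use: md5 -q splunk-demo.tgz
if [ "$expected" = "$actual" ]; then echo "checksum OK"; else echo "checksum MISMATCH"; fi
```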

  2. If you don’t have an account, register.

  3. You may have to copy and paste the URL from above to get back to the page.

    • https://www.splunk.com/en_us/training/videos/all-videos.html

    • http://docs.splunk.com/Documentation/Splunk/latest/Installation

    • https://www.splunk.com/pdfs/solution-guides/splunk-quick-reference-guide.pdf

For release notes, refer to the Known issues in the Release Notes manual:

  • http://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/Knownissues
  • http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Commandsbycategory

Architecture

Splunk offers ingestion in streaming mode (not batch).

Splunk stores data in indexes organized in directories and files.

Splunk compresses data in flat files using its own proprietary format, which is queried using SPL (Search Processing Language).

Splunk apps have a preconfigured visual app UI.
Splunk add-ons do not have a preconfigured visual UI app (headless).

Configuration

Splunk default configurations are stored at $SPLUNK_HOME/etc/system/default

  1. To disable excessive Splunk launch messages, set this in splunk-launch.conf:

    OFFENSIVE=Less

Components, Licenses, Default Ports

Splunk licenses charge by how much data can be indexed per calendar day (midnight to midnight).

Deployment Server manages Splunk components in a distributed environment.

Each Cluster Member for index replication is licensed.

Each forwarder forwards logs to the Splunk Indexer:

  • Universal Forwarder (UF)
  • Heavyweight Forwarder (HWF) parses data before forwarding, which consumes more resources (so not recommended on production systems)

  • 8000 - Splunk web port - the Search Head providing GUI for distributed searching

    • Search head clustering is more reliable and efficient than (older) search head pooling, which is being deprecated.
    • A search head cluster is managed by a captain, which coordinates the other members (historically called “slaves”)

  • 8080 - Splunk Index Replication port by the Indexer
  • 8089 - Splunk management port (splunkd REST API)

    https://yoursplunkhost:8089/services/admin/inputstatus
  • 8191 - Splunk KV Store

  • 9997 - Splunk Indexing port
  • 514 - Splunk Network port

Not available in free versions:

  • Authentication
  • Distributed search. Scheduled searches and alerting
  • Forwarding in TCP/HTTP (to non-Splunk)
  • Deployment management

PROTIP: A free Splunk Enterprise license allows indexing of up to 500MB per day for 60 days. After that, convert to a perpetual Free license or purchase an Enterprise license.

The master license pool quota aggregates the license pools for each index/source type, each with its own sub-quota.

Add-ons

https://splunkbase.splunk.com/app/3138/ 3D Scatterplot - Custom Visualization is built with plotly.js, which combines WebGL and d3.js. So you can zoom, rotate, and orbit around the points, change aspect ratios, colors, sizes, opacity, labels, etc.

Currently, this visualization supports 50,000 points and does not limit your categorical values. Download the app to see some examples.

  1. Disable starting the daemon at boot:

    $SPLUNK_HOME/bin/splunk disable boot-start
  2. Enable starting the daemon at boot:

    $SPLUNK_HOME/bin/splunk enable boot-start
  3. Start

    splunk start splunkweb

    Start daemon:

    splunk start splunkd
  4. Verify process started (by name):

    ps aux | grep splunk

To reset admin password (v7.1+):

  1. Stop the Splunk process.

  2. Find the passwd file (under $SPLUNK_HOME/etc) and rename it to “passwd.bk”.

  3. Change to directory: $SPLUNK_HOME/etc/system/local
  4. Create file user-seed.conf containing (the USERNAME line names the account being reset):

    [user_info]
    USERNAME = admin
    PASSWORD = NEW_PASSWORD
    
  5. Create file ui-prefs.conf containing the following, so that all Search app users default to a time range of today:

    [search]
    dispatch_earliest_time = @d
    dispatch_latest_time = now
    
  6. Start the server.

Precedence:

  1. System local directory has highest priority

  2. App local directories
  3. App default directories
  4. System default directory has lowest priority
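This precedence order can be illustrated with a toy shell sketch (not Splunk code; the directory layout mirrors Splunk's, but resolving a single key by "first layer wins" is a simplification — Splunk actually merges settings per-attribute, and `splunk btool <conf> list --debug` shows the effective result):

```shell
# Create the four configuration layers with a conflicting value for one key.
mkdir -p etc/system/local etc/apps/search/local etc/apps/search/default etc/system/default
echo "maxKBps = 0"    > etc/system/local/limits.conf        # 1. system local (highest)
echo "maxKBps = 256"  > etc/apps/search/local/limits.conf   # 2. app local
echo "maxKBps = 512"  > etc/apps/search/default/limits.conf # 3. app default
echo "maxKBps = 1024" > etc/system/default/limits.conf      # 4. system default (lowest)

# Resolve the key by walking layers in priority order; first definition wins.
for f in etc/system/local/limits.conf etc/apps/search/local/limits.conf \
         etc/apps/search/default/limits.conf etc/system/default/limits.conf; do
  if grep -q '^maxKBps' "$f" 2>/dev/null; then
    echo "effective: $(grep '^maxKBps' "$f") (from $f)"
    break
  fi
done
```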

Sample data (up to 500 MB) can be obtained free from Kaggle, such as Titanic passengers. See https://www.javatpoint.com/splunk-data-ingestion

Ingestion (Fishbucket) to avoid duplicate indexing

Splunk avoids re-ingesting files it has already processed (such as a directory full of logs).

Access the tracking index through the GUI by searching for:

index=_thefishbucket

Splunk UFs & HFs track which files they have ingested (via monitor, batch, or oneshot inputs) through an internal index called the fishbucket, stored by default in:

    /opt/splunk/var/lib/splunk

That folder contains seek pointers and CRCs for the files being indexed, so splunkd can tell whether each has been read.

The fishbucket index contains pairs of file paths & checksums of ingested files, as well as some metadata.

It also prevents re-ingestion if a first ingestion has somehow gone wrong (wrong index, wrong parsing, etc.).

  1. To bypass this limitation, it is possible to delete the entire fishbucket off the filesystem. But this is far from ideal: it may cause other files to be re-ingested. Instead, the btprobe command can excise the record of a particular file. Run this while logged in as the splunk user:

    splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file /path/to/file.log --reset
    
  2. After removal, re-ingest the file:

    splunk add oneshot -source /path/to/file.log -sourcetype ST -index IDX -host HOSTNAME
    

The most common use case for this is testing index-time props/transforms - date/time extraction, line-breaking, etc. Using a sample log, ingest the file, check the parsing logic via search, then either fix the props and clean & reingest as necessary, or continue onboarding normally.

https://docs.splunk.com/Documentation/AddOns/released/Linux/Configure collectd_html format

Forwarder

http://docs.splunk.com/Documentation/Splunk/6.2.5/Data/Setupcustominputs

Splunk places indexed data in buckets (physical directories) each containing events of a specific period.

Over time, each bucket changes stages as it ages:

  • One or more buckets are hot when newly indexed and open for writing.
  • Warm buckets contain data rolled out of hot buckets
  • Cold buckets contain data rolled out of warm buckets
  • Frozen buckets contain data rolled out of cold buckets; they are not searchable, and are deleted (or archived) by the indexer
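One concrete detail worth knowing (hedged; verify against the docs for your Splunk version): warm and cold bucket directories are conventionally named `db_<newestTime>_<oldestTime>_<localID>` with epoch-second timestamps, so a bucket's time span can be read straight from its directory name:

```shell
# Decode the time span of a hypothetical warm-bucket directory name.
bucket="db_1588550400_1585958400_12"
newest=$(echo "$bucket" | cut -d_ -f2)   # newest event time (epoch seconds)
oldest=$(echo "$bucket" | cut -d_ -f3)   # oldest event time (epoch seconds)
id=$(echo "$bucket" | cut -d_ -f4)
echo "bucket $id spans epoch $oldest .. $newest"
# GNU date converts epochs: date -u -d @1585958400   (BSD/macOS: date -r 1585958400)
```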

To troubleshoot Splunk performance issues:

  • Watch Splunk metrics log in real time:

    index="_internal" source="metrics.log" group="per_sourcetype_thruput"
      series="<your_sourcetype_here>"
      | eval MB=kb/1024 | chart sum(MB)
      

    Alternately, watch everything, split by source type:

    index="_internal" source="metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps) over series
      
  • Check splunkd.log for errors.

  • Check for server performance metrics (CPU, memory usage, disk I/O, etc.)

  • The SOS (Splunk on Splunk) app is installed to check for warnings and errors (on the dashboard)

  • Too many saved searches can consume excessive system resources.

  • Install and enable Firebug browser extension to reveal what happens when logging into Splunk. Then enable and switch to the “Net” panel to view time spent in HTTP requests and responses

  • use btool to troubleshoot configuration files.

Each search is recorded as a .csv of search results and a search.log in a folder within: $SPLUNK_HOME/var/run/splunk/dispatch

The waiting period before each dispatch directory is deleted is controlled by limits.conf.

If a user requests that results be saved, they are instead deleted after 7 days.
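A hedged sketch of the relevant limits.conf settings (setting names as I understand them; confirm against the spec file for your version): `ttl` under the `[search]` stanza controls how long an unsaved dispatch directory lives, and the saved-artifact lifetime corresponds to the 7 days (604,800 seconds) mentioned above:

```
[search]
# Seconds an unsaved search artifact is retained after the job finishes:
ttl = 600
# Seconds saved search artifacts are retained (7 days):
default_save_ttl = 604800
```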

To add folder access logs:

  1. Enable Object Access Audit through group policy on the Windows machine on which the folder is located.
  2. Enable auditing on the specific folder for which logs are monitored.
  3. Install Splunk Universal Forwarder on the Windows machine.
  4. Configure Universal Forwarder to send security logs to Splunk Indexer.
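A minimal sketch of the Universal Forwarder configuration for steps 3–4 (the `WinEventLog://` stanza follows Splunk's Windows input conventions; the index name and indexer address are assumptions for illustration):

```
# inputs.conf on the Windows Universal Forwarder: collect the Security
# event log, which contains the object-access audit events enabled above.
[WinEventLog://Security]
disabled = 0
index = wineventlog

# outputs.conf: send to the indexer's receiving port (9997 by default).
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer.example.com:9997
```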

Search terms

Ideally, stats commands are used when unique IDs are available for use because they have higher performance.

But sometimes a unique ID (from one or more fields) alone is not sufficient to discriminate between transactions, such as when web sessions are identified by a cookie/client IP. In that case, the transaction command, referencing raw text that serves as start and end markers, may be used to delimit transactions.

stats commands generate summary statistics of all existing fields in search results as values in new fields.

The eventstats command computes the same summary statistics but appends them as fields to each original raw event.
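Hedged SPL sketches of the three commands (the sourcetype, field names, and start/end markers are assumptions for illustration):

```
# stats: collapses events into one summary row per clientip
sourcetype=access_combined | stats count avg(bytes) AS avg_bytes BY clientip

# eventstats: computes the same aggregate but appends it to every original event
sourcetype=access_combined | eventstats avg(bytes) AS avg_bytes BY clientip

# transaction: groups events when no single unique ID exists, using text markers
sourcetype=access_combined | transaction clientip startswith="signon" endswith="purchase"
```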

Regular Expressions

There are two ways to write such a Regular Expression, for example extracting an IP address:

  • rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"

  • rex field=_raw "(?<ip_address>([0-9]{1,3}[.]){3}[0-9]{1,3})"
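The class-based pattern can be sanity-checked outside Splunk with `grep -E` (POSIX ERE has no `\d`, so the second, character-class form is used here; the sample log line is made up):

```shell
# Extract IPv4-looking tokens from a sample log line.
echo '10.2.3.4 - - "GET /index.html" referrer 192.168.1.77' \
  | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
# prints:
# 10.2.3.4
# 192.168.1.77
```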

  1. To disable search history, delete file:

    $SPLUNK_HOME/var/log/splunk/searches.log

MapReduce Algorithm

The mechanism that enables Splunk’s fast data searching is that Splunk adapted the map() and reduce() algorithms, which functional programming uses for large-scale batch parallel processing, to streaming data.
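The idea can be loosely illustrated with a shell pipeline: the early stages "map" over the stream (extracting a field from each line) and the final stages "reduce" it to aggregates, analogous to search peers mapping in parallel while the search head reduces. This is an analogy only, not Splunk's actual implementation:

```shell
# map: extract the status-code field from each log line;
# reduce: count occurrences per code, most frequent first.
printf '%s\n' "GET / 200" "GET /a 404" "GET /b 200" "POST /c 500" "GET /d 200" \
  | awk '{print $3}' \
  | sort | uniq -c | sort -rn
# top line of output: "      3 200"
```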


Book

Splunk Operational Intelligence Cookbook By Josh Diakun, Paul R Johnson, Derek Mock

Installs and Configures Splunk forwarders and servers

REST API

http://dev.splunk.com/restapi

https://github.com/cerner/cerner_splunk https://github.com/search?utf8=%E2%9C%93&q=splunk&type=

Blazemeter

Blazemeter has additional software that only works on their cloud platform.

Video tutorials

https://www.tutorialspoint.com/splunk/index.htm

Installing and Configuring Splunk

Pluralsight video course: Optimizing Fields, Tags, and Event Types in Splunk [1h 36m] 28 Feb 2019 by Joe Abraham (@jobabrh, jobabrahamtech.com) is based on Splunk version 7.2.1

Performing Basic Splunk Searches

Analyzing Machine Data with Splunk

Splunk Fundamentals 1

Splunk offers a free class called Fundamentals 1 to help people get started with Splunk and prepare for the Splunk Core Certified User exam.

Get FREE Access for one month at https://education.splunk.com/course/splunk-infrastructure-overview

https://docs.splunk.com/Documentation/Splunk/latest/Installation/Systemrequirements

  • Splunk Web front-end runs on port 8000 for modern browsers
  • Splunk listens on port 8065, bound to the loopback interface
  • KV store uses port 8191

https://docs.splunk.com/Documentation/Splunk/latest/installation/RunSplunkasadifferentornon-rootuser

Splunk can be installed in any directory (/opt/ is common).

Splunk server components all run splunkd (written in C/C++), with SSL port 8089 for management:

  • Forwarders, on the servers where data originates, forward data to Splunk indexers
  • Indexers receive data and store it in a Splunk index containing directories organized by age
  • Search heads handle the search language, distribute searches to indexers, then consolidate results into reports and dashboards of visualizations; Knowledge Objects extract additional fields and transform data

Each instance indexes less than 20GB per day for under 20 users using a small number of forwarders.

  • Indexers: 2 64-bit CPU with 6x2GHz cores, 12GB RAM, 1GbE NIC, 800 IOPS
  • Search heads: 4 64-bit CPU with 4x2GHz cores, 12GB RAM, 1GbE NIC, 2 x 10K RPM 300GB SAS drives - RAID-1
  • Forwarders: 1 64-bit CPU with 2x1.5GHz cores, 1GB RAM

To support up to 100GB per day and up to 100 users with several hundred forwarders, use a search head with 3 indexers. Add 3 search heads in a search head cluster to distribute requests.

An index cluster replicates index data to promote availability and prevent data loss. https://docs.splunk.com/Documentation/Splunk/latest/installation/ChoosetheuserSplunkshouldrunas

Commands:

	./splunk help
	./splunk enable boot-start -user ubuntu
	./splunk start --accept-license
	./splunk stop
	./splunk restart

User: admin/changeme

Turn off Transparent Huge Pages (THP), which can degrade Splunk performance.

Other components:

  • License master
  • Deployment server
  • Cluster manager

Splunk scales by distributing pipeline stages across instances: Input, Parsing, Indexing, Searching

Forwarder Management

  1. Introduction
  2. What is Splunk?
  3. Introduction to Splunk’s interface
  4. Basic searching
  5. Using fields in searches
  6. Search fundamentals
  7. Transforming commands
  8. Creating reports and dashboards
  9. Datasets
  10. The Common Information Model (CIM)
  11. Creating and using lookups
  12. Scheduled Reports
  13. Alerts
  14. Using Pivot

Module 1 – Introduction
  • Overview of Buttercup Games Inc.

Module 2 – What is Splunk?
  • Splunk components
  • Installing Splunk
  • Getting data into Splunk

Module 3 – Introduction to Splunk’s User Interface
  • Understand the uses of Splunk
  • Define Splunk Apps
  • Customizing your user settings
  • Learn basic navigation in Splunk

Module 4 – Basic Searching
  • Run basic searches
  • Use autocomplete to help build a search
  • Set the time range of a search
  • Identify the contents of search results
  • Refine searches
  • Use the timeline
  • Work with events
  • Control a search job
  • Save search results

Module 5 – Using Fields in Searches
  • Understand fields
  • Use fields in searches
  • Use the fields sidebar

Module 6 – Search Language Fundamentals
  • Review basic search commands and general search practices
  • Examine the search pipeline
  • Specify indexes in searches
  • Use autocomplete and syntax highlighting
  • Use the following commands to perform searches: table, rename, fields, dedup, sort

Module 7 – Using Basic Transforming Commands
  • The top command
  • The rare command
  • The stats command

Module 8 – Creating Reports and Dashboards
  • Save a search as a report
  • Edit reports
  • Create reports that include visualizations such as charts and tables
  • Create a dashboard
  • Add a report to a dashboard
  • Edit a dashboard

Module 9 – Datasets and the Common Information Model
  • Naming conventions
  • What are datasets?
  • What is the Common Information Model (CIM)?

Module 10 – Creating and Using Lookups
  • Describe lookups
  • Create a lookup file and create a lookup definition
  • Configure an automatic lookup

Module 11 – Creating Scheduled Reports and Alerts
  • Describe scheduled reports
  • Configure scheduled reports
  • Describe alerts
  • Create alerts
  • View fired alerts

Module 12 – Using Pivot
  • Describe Pivot
  • Understand the relationship between data models and pivot
  • Select a data model object
  • Create a pivot report
  • Create an instant pivot from a search
  • Add a pivot report to a dashboard
