Analyze without agents
Overview
Here are my notes toward building an “unsupervised” machine-learning framework to identify patterns in various logs.
Logs are produced by various program components:
- Operating system logs
- Web Server logs
- Perfmon
- Linux top
- Custom application logs that record specific events, such as an invoice being sent or another business transaction being processed.
Logagent (https://sematext.github.io/logagent-js/parser/) detects log formats based on a pattern library (a YAML file) and converts each log line into a JSON object.
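Logagent's actual pattern file and API are not reproduced here. As a minimal sketch of the idea, assuming a tiny hypothetical pattern library held in a Python dict instead of a YAML file, each line can be tried against each named regular expression, and the first match is emitted as JSON:

```python
import json
import re

# Hypothetical pattern library; a real tool would load its patterns from a YAML file.
PATTERNS = {
    "apache_common": re.compile(
        r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    ),
    "syslog": re.compile(
        r'(?P<time>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) '
        r'(?P<program>[^:\[]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)'
    ),
}


def parse_line(line):
    """Return the line as a JSON string tagged with the first matching format, or None."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(line)
        if match:
            record = {"log_format": name, **match.groupdict()}
            return json.dumps(record)
    return None  # no pattern in the library recognized this line
```

Lines that match nothing return None, which is where a real framework would route them for new-pattern discovery.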
Trend Visualizations
The value of keeping logs is the insight they provide into what is being logged.
That insight is usually about patterns (trends) over time.
SIEM systems collect and analyze logs over time to detect persistent threats.
Google’s Site Reliability Engineering book identifies these kinds of quantitative data about a system:
- Server lifetimes
- Processing times = Latency
- Traffic volume
- Query counts and types = Saturation (to capacity)
- Error counts and types
Each of the four “golden signals” consists of measurements taken at various points in the system.
Latency is measured at different points in the system:
- Time to first response
- Page load by end-users
- Requests queuing waiting for a thread
- Query duration
- Service response time
- Transaction duration
- Time to complete data return
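Because a few slow requests can hide behind a healthy average, latency measured at any of these points is usually summarized as percentiles. A minimal sketch, assuming durations in milliseconds have already been extracted from the logs (the function names are illustrative), using the nearest-rank definition of a percentile:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number percentiles stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Summarize one latency measurement point (e.g. query duration) by percentiles."""
    return {
        "p50": percentile(durations_ms, 50),
        "p95": percentile(durations_ms, 95),
        "p99": percentile(durations_ms, 99),
        "max": max(durations_ms),
    }
```

For durations 1 through 100 ms, the summary is p50 = 50, p95 = 95, p99 = 99, max = 100.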
Traffic is a key denominator for calculating infrastructure spend.
- Dollars cost per transaction
- HTTP requests per second
- Number of transactions per second
- Number of retrievals per second from the database
- Network I/O
- Number of concurrent sessions
- Number of active requests
- Number of active connections
- Number of write ops
- Number of read ops
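Two of the traffic measures above reduce to simple arithmetic once timestamps and costs are known. This sketch assumes request timestamps are already parsed into datetime objects and that a period's infrastructure cost is known; both helper names are hypothetical:

```python
from collections import Counter


def requests_per_second(timestamps):
    """Count requests in each one-second bucket; timestamps are datetime objects."""
    # Truncating microseconds maps every timestamp to the start of its second.
    return Counter(ts.replace(microsecond=0) for ts in timestamps)


def cost_per_transaction(infra_cost_dollars, transaction_count):
    """Dollars of infrastructure spend divided by the transactions it served."""
    if transaction_count == 0:
        raise ValueError("no transactions in period")
    return infra_cost_dollars / transaction_count
```

For example, $1,200 of spend over 400,000 transactions is $0.003 per transaction.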
Saturation metrics measure the utilization of the capacity in various components of the system:
- % memory utilization
- % thread pool utilization
- % cache utilization
- % disk utilization
- % CPU utilization
- % disk free space
- Disk quota
- Memory quota
- Number of available connections
- Number of users on the system
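Disk utilization and free space are among the few saturation measures that Python's standard library can read directly. A minimal sketch using shutil.disk_usage (the function name disk_saturation is illustrative); other resources such as thread pools or caches would need application-specific counters:

```python
import shutil


def disk_saturation(path="."):
    """Return (% disk utilization, % disk free space) for the filesystem holding path."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    used_pct = 100.0 * usage.used / usage.total
    # On Linux, "free" excludes blocks reserved for root, so used + free can be below 100%.
    free_pct = 100.0 * usage.free / usage.total
    return used_pct, free_pct
```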
Errors:
- Incorrect content or wrong answers
- Number of HTTP errors (400 & 500 series)
- Number of failed requests
- Number of exceptions
- Number of stack traces generated
- Number of servers that fail liveness checks
- Number of dropped connections in the network
- Each SLI (Service Level Indicator) is the ratio (as a percent) of good events to all valid events, as in 99% good.
- Each SLO (Service Level Objective) is an internal target (expectation) set for employees
- Each SLA (Service Level Agreement) is an agreement with customers
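A sketch of counting HTTP errors and turning status codes into an SLI, assuming each log entry has already been reduced to an integer status code. Which events count as valid is a policy decision; here, as an assumption, 4xx responses are treated as caller errors and excluded, 5xx responses are bad events, and the 99% SLO target is a hypothetical value:

```python
from collections import Counter

SLO_TARGET = 99.0  # hypothetical internal objective, in percent


def error_counts(status_codes):
    """Count HTTP client (4xx) and server (5xx) errors."""
    classes = Counter(code // 100 for code in status_codes)
    return {"4xx": classes[4], "5xx": classes[5]}


def sli(status_codes):
    """Percent of good events among valid events.

    Assumption of this sketch: 4xx responses are excluded from valid events,
    and 5xx responses count as bad events.
    """
    valid = [c for c in status_codes if not 400 <= c < 500]
    if not valid:
        return 100.0
    good = sum(1 for c in valid if c < 500)
    return 100.0 * good / len(valid)


def meets_slo(status_codes, target=SLO_TARGET):
    return sli(status_codes) >= target
```

With 990 successes, 5 "404" responses, and 5 "500" responses, the SLI is 990/995, about 99.5%, which meets a 99% objective.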
Alerts from Monitoring
Knowing trends enables detection of anomalies as they occur, such as violations of predefined SLOs and SLAs.
Monitoring also enables analysis of incident response.
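These notes do not prescribe a detection algorithm. As one simple unsupervised baseline (a swapped-in technique, not the framework itself), each new value can be compared with the mean and standard deviation of a trailing window, flagging values more than a chosen number of standard deviations away. The window size and threshold below are arbitrary assumptions:

```python
import statistics


def zscore_anomalies(values, window=10, threshold=3.0):
    """Flag indexes whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Flat history: treat any change at all as unusual.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies
```

Applied to errors per minute such as [2, 3, 2, 4, 3, 2, 3, 2, 3, 2, 3, 40, 2], only the spike at index 11 is flagged; once the spike enters the window, it widens the deviation so the next normal value is not flagged.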
System logs
Microsoft System Logs can be parsed using http://logparserplus.com/Article
Web server logs
Web servers such as Apache, IIS, and NGINX store an entry for each HTTP request for a file (resource).
Apache and others create logs in a W3C-defined format.
A trivial sample is provided at data/apache.access.log.
A fuller example is provided at http://www.monitorware.com/en/logsamples/apache.php
For a parser and model of the log file, see ApacheAccessLog.java.
See https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/spark.html
A configuration file specifies what fields are output in the log.
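In Apache, the LogFormat directive sets those fields; the predefined "combined" format is %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i". A minimal Python sketch of a parser for that format, assuming the server uses it unchanged (a customized LogFormat needs a matching regular expression), which converts fields to typed values:

```python
import re
from datetime import datetime

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Parse one combined-format line into typed fields, or return None.

    Malformed request lines (e.g. "%r" logged as "-") do not match and return None.
    """
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # %t looks like 10/Oct/2000:13:55:36 -0700 (month names assume an English/C locale).
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # %b logs "-" when no body bytes were sent.
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec
```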
https://github.com/rory/apache-log-parser is written in Python. A code review of a hand-written server log parser is at http://codereview.stackexchange.com/questions/68846/someone-thinks-poorly-of-my-server-log-parser
https://awstats.sourceforge.io/ is written in Perl with an architecture that enables plug-ins for additional functionality.
https://wiki.jenkins-ci.org/display/JENKINS/Log+Parser+Plugin
http://alvinalexander.com/scala/scala-apache-access-log-parser-library-java-jvm
https://easyengine.io/tutorials/nginx/log-parsing/
MS Log Parser for SQL
Microsoft Log Parser provides SQL-like query access to text-based data such as log files, XML files, and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. It was created for Windows 2000, Windows Server 2003, and Windows XP Professional Edition.
Log Parser Lizard ($31, http://lizard-labs.com/log_parser_lizard.aspx) provides a GUI over this command-line “Swiss Army Knife”.
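Log Parser's own query syntax and input formats are not reproduced here. As an analogous sketch of the same idea, SQL over parsed log records, Python's built-in sqlite3 module can load records into an in-memory table and query them; the (path, status) record shape and the query are assumptions:

```python
import sqlite3


def top_error_paths(records, limit=5):
    """Load (path, status) records into an in-memory SQLite table and
    return the paths with the most 5xx responses, most frequent first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE requests (path TEXT, status INTEGER)")
        conn.executemany("INSERT INTO requests VALUES (?, ?)", records)
        rows = conn.execute(
            """
            SELECT path, COUNT(*) AS hits
            FROM requests
            WHERE status >= 500
            GROUP BY path
            ORDER BY hits DESC, path
            LIMIT ?
            """,
            (limit,),
        ).fetchall()
    finally:
        conn.close()
    return rows
```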
- https://blogs.msdn.microsoft.com/carlosag/2010/03/25/analyze-your-iis-log-files-favorite-log-parser-queries/
- https://blog.codinghorror.com/microsoft-logparser/
- http://www.symantec.com/connect/articles/forensic-log-parsing-microsofts-logparser
- https://technet.microsoft.com/en-us/library/ee692659.aspx
- https://www.codeproject.com/articles/13504/simple-log-parsing-using-ms-log-parser-in-c-ne
- Microsoft Log Parser Toolkit: A Complete Toolkit for Microsoft’s by Gabriele Giuseppini, Mark Burnett
- https://www.simple-talk.com/blogs/using-logparser-part-1/
QUESTION: What is its equivalent for Linux?
Perfmon logs
MS PAL (Performance Analysis of Logs)
https://pal.codeplex.com/
It requires PowerShell v2.0 or greater and Microsoft Chart Controls for Microsoft .NET Framework 3.5 Service Pack 1.
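The core of what PAL does, comparing performance counters against thresholds, can be sketched in a few lines. This assumes the counter log has already been exported to CSV with timestamps in the first column and one column per counter path (such as \\WEB01\Processor(_Total)\% Processor Time); the 80% threshold and the function name are hypothetical, and PAL's own threshold files are not modeled:

```python
import csv
import io


def threshold_breaches(csv_text, counter_suffix, threshold):
    """Return (timestamp, value) pairs where any column whose header ends with
    counter_suffix exceeds threshold. Blank samples are skipped."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    breaches = []
    for row in reader:
        for i in columns:
            if i >= len(row):
                continue  # short row: no sample for this counter
            cell = row[i].strip()
            if cell and float(cell) > threshold:
                breaches.append((row[0], float(cell)))
    return breaches
```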
Custom application logs
Code to output logs
https://www.arcgis.com/home/item.html?id=90134fb0f1c148a48c65319287dde2f7
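One way to make custom business events easy to parse later is to write them as one JSON object per line. A minimal sketch using Python's standard logging module; the logger name "billing", the event name, and the convention of passing business fields via extra={"fields": ...} are all hypothetical:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Business fields passed via extra={"fields": {...}} (a convention of this sketch).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("billing")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers[:] = [handler]  # replace any handlers from earlier calls
    logger.propagate = False
    return logger
```

Usage: logger.info("invoice_sent", extra={"fields": {"invoice_id": "INV-1001", "amount": 250.0}}) writes a single JSON line containing event, invoice_id, and amount.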
Log gathering
Because logs grow large, systems “rotate” them: when a file has used up its allocated disk space, logging “rolls over” to a new file name.
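Python's standard library implements size-based rotation in logging.handlers.RotatingFileHandler: when the current file would exceed maxBytes, it is renamed with a numeric suffix (.1, .2, ...) and a new file is started, keeping at most backupCount old files. The sketch below, with arbitrarily small sizes, forces a few rollovers and then gathers the files oldest first, which a collector must do to keep events in order. The file name app.log and the helper names are hypothetical:

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def demo_rotation(directory, max_bytes=200, backups=3, lines=50):
    """Write enough lines to force rollovers; return the resulting file names."""
    path = os.path.join(directory, "app.log")
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("rotation-demo")
    logger.setLevel(logging.INFO)
    logger.handlers[:] = [handler]
    logger.propagate = False
    for i in range(lines):
        logger.info("event %03d", i)
    handler.close()
    return sorted(os.listdir(directory))


def read_in_order(directory, base="app.log"):
    """Concatenate rotated files oldest first: base.N, ..., base.1, then base."""
    rotated = [n for n in os.listdir(directory) if n.startswith(base + ".")]
    rotated.sort(key=lambda n: int(n.rsplit(".", 1)[1]), reverse=True)
    lines = []
    for name in rotated + [base]:
        with open(os.path.join(directory, name)) as f:
            lines.extend(f.read().splitlines())
    return lines
```

If more rollovers occur than backupCount allows, the oldest file is deleted, so a gatherer must collect files before they age out.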
Log parsing
http://stackoverflow.com/questions/3328688/need-some-ideas-on-how-to-code-my-log-parser
Utah parser (Java)
https://github.com/sonalake/utah-parser is a Java library for parsing semi-structured text files into JSON maps, based on an XML configuration ‘template’ file whose rules are applied to lines that satisfy specific regular expressions.
https://github.com/google/textfsm (TextFSM) is a Python module that parses semi-structured text using template files.
GoogleRefine.org
http://openrefine.org/ (formerly Google Refine) is a free, open source, powerful program for working with messy data. It runs on your desktop (not as a SaaS web service).
Clean up field values
Text facets group together cells with identical values and provide a convenient way to merge variant values into a single one.
The tool also has a way to apply common transforms such as removing trailing spaces.
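Grouping and merging variant values can be sketched in Python. The fingerprint function below follows the spirit of OpenRefine's "fingerprint" clustering key (trim, lowercase, fold accents to ASCII, drop punctuation, then sort and deduplicate tokens), but it is a simplified approximation, not OpenRefine's code; the function names are illustrative:

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    """Clustering key in the spirit of OpenRefine's fingerprint method (simplified)."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accent marks
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation
    return " ".join(sorted(set(text.split())))


def text_facet(values):
    """Count identical values, like a text facet."""
    return Counter(values)


def clusters(values):
    """Group distinct values that share a fingerprint; keep only groups of 2 or more."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

For example, "Invoice Sent", "invoice sent " (with a trailing space), and "sent, invoice" share the key "invoice sent" and would be offered for merging into one value.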
Reference: Using OpenRefine (Packt book), by Ruben Verborgh and Max De Wilde.
More on Security
This is one of a series on Security in DevSecOps:
- Security actions for teamwork and SLSA
- Code Signing on macOS
- Git Signing
- GitHub Data Security
- Azure Security-focus Cloud Onramp
- AWS Onboarding
- AWS Security (certification exam)
- AWS IAM (Identity and Access Management)
- SIEM (Security Information and Event Management)
- Intrusion Detection Systems (Google/Palo Alto)
- SOC2
- FedRAMP
- CAIQ (Consensus Assessment Initiative Questionnaire) by cloud vendors
- AKeyless cloud vault
- Hashicorp Vault
- Hashicorp Terraform
- SonarQube
- WebGoat known insecure PHP app and vulnerability scanners
- Security certifications
- Quantum Supremacy can break encryption in minutes
- Pen Testing
- Threat Modeling
- WebGoat (deliberately insecure Java app)