Wilson Mar

Analyze without agents

Here are my notes toward building an “unsupervised” machine-learning framework to identify patterns in various logs.

Logs are produced by each program component.

https://sematext.github.io/logagent-js/parser/ detects log formats based on a pattern library (a YAML file) and converts each log line into a JSON object.
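To make the idea concrete, here is a minimal Python sketch of pattern-library detection. It assumes a hypothetical in-code list of named regular expressions in place of logagent-js's YAML file; the names PATTERNS and parse_line are illustrative and not part of logagent-js. Each pattern is tried in order, and the named groups of the first match become the fields of a JSON object tagged with the pattern's name.

```python
import json
import re

# Hypothetical pattern library (logagent-js keeps its real patterns in YAML).
PATTERNS = [
    ("level_message", re.compile(
        r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
        r"(?P<level>[A-Z]+) (?P<message>.*)$")),
    ("syslog_like", re.compile(
        r"^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
        r"(?P<host>\S+) (?P<program>[^:\[]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$")),
]


def parse_line(line):
    """Return a JSON string for the first matching pattern, or None if none match."""
    for name, pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            fields = match.groupdict()  # unmatched optional groups become None (null)
            fields["_type"] = name      # record which format was detected
            return json.dumps(fields)
    return None
```

Lines that match no pattern return None, so they can be counted and used to decide which pattern to add to the library next.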

Trend Visualizations

The value of keeping logs is the insight they provide into what is being logged.

That insight is usually about patterns (trends) over time.
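A trend starts as counts per time bucket. This sketch assumes timestamps have already been extracted as strings in "%Y-%m-%d %H:%M:%S" format (an assumption, since real logs vary) and counts events per minute with a hypothetical helper, events_per_minute.

```python
from collections import Counter
from datetime import datetime


def events_per_minute(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count log events in one-minute buckets, returned in time order."""
    minutes = Counter(
        # Truncate each timestamp to the start of its minute.
        datetime.strptime(ts, fmt).replace(second=0, microsecond=0)
        for ts in timestamps
    )
    return sorted(minutes.items())
```

Plotting the resulting (minute, count) pairs gives the trend line; wider buckets (hours, days) smooth out noise at the cost of detail.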

SIEM systems collect and analyze logs over time to detect persistent threats.

Google’s Site Reliability Engineering book identifies these kinds of quantitative data about a system:

  • Server lifetimes
  • Processing times = Latency
  • Traffic volume
  • Query counts and types = Saturation (to capacity)
  • Error counts and types

Each of the 4 “golden signals” consists of measures taken at various points in the system.

Latency is measured at different points in the system:

  • Time to first response
  • Page load by end-users
  • Requests queuing waiting for a thread
  • Query duration
  • Service response time
  • Transaction duration
  • Time to complete data return
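Latency measures such as these are usually summarized as percentiles rather than averages, because a few slow requests hide inside a mean. Below is a minimal nearest-rank percentile sketch, assuming durations in milliseconds have already been parsed from the logs; the function name and the sample values in the note after it are made up for illustration.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]
```

With the ten made-up durations [120, 95, 300, 101, 88, 2500, 110, 97, 105, 99], percentile(durations, 50) returns 101 while percentile(durations, 99) returns 2500, which is the tail an average would blur.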

Traffic is a key denominator for calculating infrastructure spend.

  • Dollars cost per transaction
  • HTTP requests per second
  • Number of transactions per second
  • Number of retrievals per second from the database

  • Network I/O
  • Number of concurrent sessions
  • Number of active requests
  • Number of active connections

  • Number of write ops
  • Number of read ops
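Using traffic as a denominator is plain division. A minimal sketch, assuming the transaction count and the infrastructure cost cover the same time window; the function name traffic_summary is hypothetical.

```python
def traffic_summary(transaction_count, window_seconds, cost_dollars):
    """Throughput and unit cost for one time window (count and cost must share the window)."""
    if transaction_count <= 0 or window_seconds <= 0:
        raise ValueError("count and window must be positive")
    return {
        "transactions_per_second": transaction_count / window_seconds,
        "dollars_per_transaction": cost_dollars / transaction_count,
    }
```

For example, 1,800,000 transactions in a one-hour window (3600 seconds) is 500 per second; if that hour cost $4,500 of infrastructure, each transaction cost $0.0025.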

Saturation metrics measure the utilization of the capacity in various components of the system:

  • % memory utilization
  • % thread pool utilization
  • % cache utilization
  • % disk utilization
  • % CPU utilization
  • % disk free space

  • Disk quota
  • Memory quota
  • Number of available connections
  • Number of users on the system
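Each saturation metric is a ratio of used to available capacity. This sketch shows the generic percentage plus one concrete data source, Python's shutil.disk_usage, for % disk utilization and % disk free space. The function names utilization_pct and disk_saturation are illustrative.

```python
import shutil


def utilization_pct(used, capacity):
    """Generic saturation: percent of capacity in use."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100 * used / capacity


def disk_saturation(path="."):
    """% disk utilization and % disk free space for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return {
        "pct_used": utilization_pct(usage.used, usage.total),
        "pct_free": utilization_pct(usage.free, usage.total),
    }
```

The same utilization_pct applies to a thread pool (45 busy threads of 50 is 90.0%) or a cache. On Unix, "free" reflects space available to unprivileged users, so pct_used and pct_free can sum to slightly less than 100.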

Errors are counted in several forms:

  • Incorrect content or wrong answers
  • Number of HTTP errors (400 & 500 series)
  • Number of failed requests
  • Number of exceptions
  • Number of stack traces generated
  • Number of servers that fail liveness checks
  • Number of dropped connections in the network

  • Each SLI (Service Level Indicator) is a ratio (percent) of good events divided by all valid events, as in “99% good!”
  • Each SLO (Service Level Objective) is an internal target for an SLI that employees are expected to meet
  • Each SLA (Service Level Agreement) is an agreement with customers
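Error counts and the SLI definition combine naturally. This sketch assumes, hypothetically, that every HTTP response is a valid event and that only 5xx responses are bad (4xx responses are counted but treated as good, since the client caused them). Real SLI definitions choose their own good and valid events, and the 99.0 default SLO is arbitrary.

```python
from collections import Counter


def status_summary(status_codes, slo_percent=99.0):
    """Count 4xx/5xx errors and compute an availability SLI from HTTP status codes.

    Hypothetical definition: every response is valid; only 5xx responses are bad.
    """
    if not status_codes:
        raise ValueError("no valid events")
    series = Counter()
    for code in status_codes:
        if 400 <= code < 500:
            series["4xx"] += 1
        elif 500 <= code < 600:
            series["5xx"] += 1
    valid = len(status_codes)
    good = valid - series["5xx"]
    sli = 100 * good / valid  # percent of good events among valid events
    return {
        "4xx": series["4xx"],
        "5xx": series["5xx"],
        "sli": sli,
        "meets_slo": sli >= slo_percent,
    }
```

With 995 responses of 200, 3 of 404, and 2 of 500, the SLI is 99.8%, which meets a 99.0% SLO but would miss a 99.9% one.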

Alerts from Monitoring

Knowing trends enables detection of anomalies as they occur, such as violations of predefined SLOs and SLAs.
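One simple way to flag an anomaly against a trend, swapped in here for illustration rather than taken from any particular monitoring product, is a trailing-window z-score: compare each new value with the mean and standard deviation of the values just before it. The function rolling_anomalies, its window size of 5, and the threshold of 3 are all assumptions to tune.

```python
import statistics


def rolling_anomalies(series, window=5, threshold=3.0):
    """Return indexes whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline)
        # A flat baseline (stdev 0) is skipped rather than divided by zero.
        if stdev > 0 and abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies
```

For per-minute error counts [2, 3, 2, 4, 3, 2, 3, 40, 3, 2], it flags index 7 (the spike to 40). Because the spike then enters the baseline, the minutes right after it are not flagged.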

Monitoring also enables analysis of incident response.

System logs

Microsoft System Logs can be parsed using http://logparserplus.com/Article

Web server logs

Web servers such as Apache, IIS, and NGINX store an entry for each HTTP request for a file (resource).

Apache and others create logs in a W3C-defined format.

A trivial sample is provided at data/apache.access.log.

A fuller example is provided at http://www.monitorware.com/en/logsamples/apache.php

See ApacheAccessLog.java for a parser and model of the log file.

See https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/spark.html

A configuration file specifies which fields are output in the log (in Apache, through the LogFormat directive).
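Because the format is configurable, a parser can be generated from the format string instead of being written by hand. Below is a minimal sketch of that approach supporting only a hypothetical subset of Apache LogFormat directives (%h, %l, %u, %t, %r, %>s, %b). Any other directive is left as literal text, so a format using one will not match real log lines.

```python
import re

# Regex fragments for a small subset of Apache LogFormat directives (hypothetical subset).
DIRECTIVES = {
    "%h": r"(?P<host>\S+)",            # remote host
    "%l": r"(?P<logname>\S+)",         # remote logname, usually "-"
    "%u": r"(?P<user>\S+)",            # authenticated user
    "%t": r"\[(?P<time>[^\]]+)\]",     # time the request was received
    "%r": r'(?P<request>[^"]*)',       # first line of the request
    "%>s": r"(?P<status>\d{3})",       # final status
    "%b": r"(?P<bytes>\d+|-)",         # response size, "-" when zero
}
TOKEN = re.compile(r"%>s|%[hlutrb]")


def format_to_regex(log_format):
    """Translate a LogFormat string into a compiled regex with named groups."""
    parts, pos = [], 0
    for m in TOKEN.finditer(log_format):
        parts.append(re.escape(log_format[pos:m.start()]))  # literal text between directives
        parts.append(DIRECTIVES[m.group()])
        pos = m.end()
    parts.append(re.escape(log_format[pos:]))
    return re.compile("^" + "".join(parts) + "$")
```

Applied to Apache's Common Log Format string %h %l %u %t "%r" %>s %b, the generated regex parses the line 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 into named fields such as user = frank and status = 200.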

https://github.com/rory/apache-log-parser is written in Python. See also the code review discussion at http://codereview.stackexchange.com/questions/68846/someone-thinks-poorly-of-my-server-log-parser

https://awstats.sourceforge.io/ is written in Perl with an architecture that enables plug-ins for additional functionality.




MS Log Parser for SQL

Microsoft Log Parser provides SQL-like query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. It was created for Windows 2000, Windows Server 2003, and Windows XP Professional Edition.
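The idea of running SQL over parsed log rows can be sketched on any platform. This example swaps in Python's built-in sqlite3 module with an in-memory table; it is not Log Parser's query dialect. It assumes the rows have already been parsed into (path, status) pairs, and the table and function names are illustrative.

```python
import sqlite3


def hits_per_status(rows):
    """Load (path, status) rows into an in-memory table and count hits per status."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE access (path TEXT, status INTEGER)")
        conn.executemany("INSERT INTO access VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT status, COUNT(*) AS hits FROM access "
            "GROUP BY status ORDER BY hits DESC, status"
        ).fetchall()
    finally:
        conn.close()
```

Once the rows are in a table, any other question (top URLs, errors per hour) is just another SELECT.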

Log Parser Lizard ($31, http://lizard-labs.com/log_parser_lizard.aspx) provides a GUI over Log Parser, the command-line “Swiss Army Knife”.


  • https://blogs.msdn.microsoft.com/carlosag/2010/03/25/analyze-your-iis-log-files-favorite-log-parser-queries/

  • https://blog.codinghorror.com/microsoft-logparser/

  • http://www.symantec.com/connect/articles/forensic-log-parsing-microsofts-logparser

  • https://technet.microsoft.com/en-us/library/ee692659.aspx

  • https://www.codeproject.com/articles/13504/simple-log-parsing-using-ms-log-parser-in-c-ne

  • Microsoft Log Parser Toolkit: A Complete Toolkit for Microsoft’s by Gabriele Giuseppini, Mark Burnett

  • https://www.simple-talk.com/blogs/using-logparser-part-1/

QUESTION: What is its equivalent for Linux?

Perfmon logs

MS PAL (Performance Analysis of Logs)


PAL requires PowerShell v2.0 or greater, plus Microsoft Chart Controls for Microsoft .NET Framework 3.5 Service Pack 1.

Custom application logs

Code to output logs
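As one possible starting point, here is a minimal sketch using Python's standard logging module that writes one JSON object per line, which makes the output easy for a parser to consume later. The class JsonFormatter, the function make_logger, and the chosen fields are illustrative.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style arguments
        }
        return json.dumps(entry)


def make_logger(name, stream=sys.stdout):
    """Create a logger whose handler writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("demo.app")
    log.info("user %s logged in", "alice")
```

Calling make_logger twice with the same name attaches a second handler, so every message would be written twice; create each logger once.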


Log gathering

Because logs grow large, systems “rotate” them: when a file uses up its allocated disk space, logging “rolls over” to a new file name.
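Python's standard library includes size-based rotation in logging.handlers.RotatingFileHandler. In this sketch (the path and sizes are arbitrary assumptions), once a write would bring app.log to max_bytes, the file is renamed to app.log.1, older backups shift up (app.log.1 becomes app.log.2), backups beyond backup_count are deleted, and a fresh app.log is opened.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(path, max_bytes=1_000_000, backup_count=3):
    """Return a logger that rolls `path` over to path.1, path.2, ... by size."""
    logger = logging.getLogger(f"rotating:{path}")
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

Time-based rollover (for example, a new file each midnight) is also available in the same module as TimedRotatingFileHandler.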

Log parsing


Utah parser (Java)

The Utah parser is a Java library that parses semi-structured text files into JSON maps, based on an XML configuration “template” file whose rules are applied to lines that satisfy a specific regular expression.

uses Python.


http://openrefine.org/ (originally Google Refine) is a free, open-source, powerful program for working with messy data. It runs on your desktop (it is not a SaaS web service).

Clean-up Field values

A text facet groups together cells that share a value, providing a convenient way to merge various values into a single one.

The tool also has a way to apply common transforms such as removing trailing spaces.
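OpenRefine's clustering goes a step beyond facets by grouping values that are probably the same. The sketch below approximates its “fingerprint” key-collision idea in Python, simplified to: trim, lowercase, strip accents and punctuation, then sort the unique words. It is not OpenRefine's actual code, and the function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key (an approximation of OpenRefine's method)."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop accents
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group distinct values whose fingerprints collide; singletons are omitted."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

For example, “New York”, “new york ” (with a trailing space) and “York, New” share one key, as do “Montréal” and “Montreal”; each cluster can then be merged into a single chosen value.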

Reference: Packt book: Using OpenRefine, by Ruben Verborgh and Max De Wilde.

More on Security

This is one of a series on Security in DevSecOps:

  1. Azure Security-focus Cloud Onramp

  2. AWS Security (certification exam)
  3. AWS IAM (Identity and Access Management)


  5. SOC2
  6. FedRAMP
  7. CAIQ (Consensus Assessment Initiative Questionnaire) by cloud vendors

  8. Git Signing
  9. Hashicorp Vault
  10. OPA (Open Policy Agent)

  11. SonarQube
  12. WebGoat known insecure PHP app and vulnerability scanners
  13. Test for OWASP using ZAP on the Broken Web App

  14. Encrypt all the things

  15. Cyber Security
  16. Security certifications