Making sense of semi-structured data
Overview
Utah parser (https://github.com/sonalake/utah-parser) is a Java library for parsing semi-structured text files into JSON maps.
It is driven by an XML configuration ‘template’ file, whose rules are applied to lines that match a specific regular expression.
TextFSM (https://github.com/google/textfsm) is a comparable template-based parser written in Python.
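The idea of applying regular-expression rules to matching lines can be sketched in plain Python. This is only an illustration of the general technique, not the Utah parser or TextFSM API: the `RECORD_START` and `FIELD_RULES` patterns, the `parse_records` function, and the sample input are all hypothetical.

```python
import re

# Hypothetical "template": one regex marks the start of a record,
# the others capture named fields from lines inside that record.
RECORD_START = re.compile(r"^Interface (?P<name>\S+)")
FIELD_RULES = [
    re.compile(r"^\s+address: (?P<address>\S+)"),
    re.compile(r"^\s+status: (?P<status>\w+)"),
]


def parse_records(text):
    """Return one dict per record found in semi-structured text."""
    records = []
    current = None
    for line in text.splitlines():
        start = RECORD_START.match(line)
        if start:
            # A new record begins; store the previous one, if any.
            if current is not None:
                records.append(current)
            current = dict(start.groupdict())
            continue
        if current is None:
            continue  # Skip lines that appear before the first record.
        for rule in FIELD_RULES:
            found = rule.match(line)
            if found:
                current.update(found.groupdict())
                break
    if current is not None:
        records.append(current)
    return records


# Made-up sample input for illustration only.
SAMPLE = """\
Interface eth0
  address: 10.0.0.1
  status: up
Interface eth1
  address: 10.0.0.2
  status: down
"""
```

Calling `parse_records(SAMPLE)` yields a list of dictionaries, which `json.dumps` from the standard library can serialize if JSON output is needed. Lines that match no rule are simply ignored, which is what makes the approach tolerant of noisy input.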
Clean up field values
OpenRefine (http://openrefine.org/), formerly Google Refine, is a free, open source, powerful program for working with messy data. It runs on your desktop (not a SaaS web service).
Text facets group cells with the same value together and provide a convenient way to merge several variant values into a single one.
The tool also has a way to apply common transforms such as removing trailing spaces.
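To make the facet-and-merge idea concrete, here is a minimal Python sketch. It is not OpenRefine's implementation: `facet`, `fingerprint`, and `merge_variants` are hypothetical helpers, and the key function is a simplified take on the key-collision approach (trim, lowercase, drop punctuation, sort the unique tokens).

```python
import re
from collections import Counter, defaultdict


def facet(values):
    """Count how often each distinct value occurs, like a text facet."""
    return Counter(values)


def fingerprint(value):
    """Build a simplified key so near-duplicate spellings collide."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def merge_variants(values):
    """Map every value to the most common trimmed spelling sharing its key."""
    groups = defaultdict(Counter)
    for value in values:
        groups[fingerprint(value)][value.strip()] += 1
    canonical = {
        key: counts.most_common(1)[0][0] for key, counts in groups.items()
    }
    return [canonical[fingerprint(value)] for value in values]


# Made-up column values; note the trailing space in the second cell.
CELLS = ["Acme Inc.", "Acme Inc. ", "acme inc", "Globex"]
```

Here `facet(CELLS)` reports four distinct values, while `facet(merge_variants(CELLS))` reports only two, because the three "Acme" spellings share one key. Picking the most common spelling as the canonical form is an assumption made for this sketch; in an interactive tool the user would normally choose the target value.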
Packt book: Using OpenRefine, by Ruben Verborgh and Max De Wilde.