Wilson Mar

Making sense of semi-structured data

Overview

Utah parser

The Utah parser (https://github.com/sonalake/utah-parser) is a Java library that parses semi-structured text files into JSON maps. It is driven by an XML configuration "template" file, whose rules are applied to each line that matches a specific regular expression.

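TextFSM (https://github.com/google/textfsm) is a Python module that takes a similar template-driven approach, using regular expressions to pull values out of semi-structured text.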
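Both libraries rest on the same idea: a template of regular expressions is run against each line, and the captured values are collected into records. A minimal sketch of that idea in plain Python follows. It is not the API of either library; the rule list, the sample text, and the function name parse_records are all made up for illustration, and it assumes a record begins on each "Interface" line.

```python
import json
import re

# Hypothetical "template": one regular expression per kind of line.
# This only mimics the idea; it is not the Utah parser or TextFSM API.
RULES = [
    re.compile(r"^Interface (?P<name>\S+) is (?P<status>up|down)$"),
    re.compile(r"^\s+address (?P<address>\d+\.\d+\.\d+\.\d+)$"),
]


def parse_records(text):
    """Return a list of dicts, starting a new record on each 'Interface' line."""
    records = []
    for line in text.splitlines():
        for rule in RULES:
            match = rule.match(line)
            if not match:
                continue
            fields = match.groupdict()
            if "name" in fields:
                records.append({})  # the first rule marks a new record
            if records:
                records[-1].update(fields)
            break  # stop at the first rule that matches this line
    return records


SAMPLE = """\
Interface eth0 is up
  address 10.0.0.1
some unrelated noise
Interface eth1 is down
  address 10.0.0.2
"""

if __name__ == "__main__":
    print(json.dumps(parse_records(SAMPLE), indent=2))
```

Lines that match no rule are skipped, which is what lets this style of parser cope with noise between the values of interest.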

Clean-up Field values

OpenRefine (http://openrefine.org/), formerly Google Refine, is a free, open source, powerful program for working with messy data. It runs on your own machine (not as a SaaS web service).

A text facet groups together the cells in a column that hold the same value, which provides a convenient way to spot variant spellings and merge them into a single value.
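The effect of a text facet can be sketched in plain Python, assuming the column is just a list of strings. This is an illustration of the concept only, not how OpenRefine implements it; the function names and the sample city values are hypothetical.

```python
from collections import Counter


def text_facet(values):
    """Count how many cells hold each distinct value, most common first."""
    return Counter(values).most_common()


def merge_values(values, variants, canonical):
    """Replace every value listed in `variants` with `canonical`."""
    variants = set(variants)
    return [canonical if v in variants else v for v in values]


cities = ["New York", "NYC", "new york", "Boston", "New York"]

# The facet exposes the variants that should really be one value.
facet = text_facet(cities)

# Merging the variants is the scripted equivalent of editing a facet entry.
cleaned = merge_values(cities, {"NYC", "new york"}, "New York")

if __name__ == "__main__":
    print(facet)
    print(text_facet(cleaned))
```

Listing the distinct values with their counts is what makes the rare variants stand out against the dominant spelling.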

The tool also has a way to apply common transforms such as removing trailing spaces.
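As a rough picture of what such transforms do to a column, here is a short Python sketch. OpenRefine applies its transforms through its own interface; the helper names and sample values below are assumptions made for illustration.

```python
import re


def trim(value):
    """Remove leading and trailing whitespace."""
    return value.strip()


def collapse_whitespace(value):
    """Replace each run of whitespace with a single space."""
    return re.sub(r"\s+", " ", value)


def apply_transforms(values, transforms):
    """Apply each transform, in order, to every value in the column."""
    cleaned = []
    for value in values:
        for transform in transforms:
            value = transform(value)
        cleaned.append(value)
    return cleaned


raw = ["Acme Corp  ", "  Acme   Corp", "Acme Corp"]
cleaned = apply_transforms(raw, [trim, collapse_whitespace])

if __name__ == "__main__":
    print(cleaned)
```

After these transforms, values that differed only in spacing become identical, so they fall into the same group when faceted.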

Packt BOOK: Using OpenRefine, by Ruben Verborgh and Max De Wilde,