Wilson Mar

Making sense of semi-structured data

Overview

Utah parser

https://github.com/sonalake/utah-parser
is a Java library that parses semi-structured text files into JSON-style maps. Parsing is driven by an XML configuration ‘template’ file whose rules are applied to lines that match specific regular expressions.
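To make the template-driven idea concrete, here is a minimal Python sketch of the same approach. It is not Utah's API or XML format. The rules are hypothetical regular expressions with named groups, a hypothetical `RECORD_START` pattern marks where each new record (map) begins, and the sample text is invented interface output.

```python
import json
import re
from typing import Dict, List

# Hypothetical "template": field names come from named regex groups.
# RECORD_START also marks the first line of each new record.
RECORD_START = re.compile(r"^Interface\s+(?P<interface>\S+)\s+is\s+(?P<status>\w+)")
FIELD_RULES = [
    re.compile(r"^\s+MTU\s+(?P<mtu>\d+)\s+bytes"),
    re.compile(r"^\s+Description:\s+(?P<description>.+)$"),
]

def parse(text: str) -> List[Dict[str, str]]:
    """Turn semi-structured text into a list of maps, one per record."""
    records: List[Dict[str, str]] = []
    for line in text.splitlines():
        start = RECORD_START.match(line)
        if start:
            # A delimiter line opens a new record.
            records.append(dict(start.groupdict()))
            continue
        if not records:
            continue  # ignore lines that appear before the first record
        for rule in FIELD_RULES:
            m = rule.match(line)
            if m:
                # Add this line's named groups to the current record.
                records[-1].update(m.groupdict())
                break
    return records

# Invented sample input for illustration only.
SAMPLE = """\
Interface eth0 is up
  MTU 1500 bytes
  Description: uplink
Interface eth1 is down
  MTU 9000 bytes
"""

if __name__ == "__main__":
    print(json.dumps(parse(SAMPLE), indent=2))
```

Running it prints a JSON list with one map per interface. Fields that a record lacks (here `description` for `eth1`) are simply absent rather than null.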

https://github.com/google/textfsm
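is a Python module that uses templates to drive a state machine for parsing semi-formatted text. It was originally developed to parse the command-line output of networking devices.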

Clean up field values

http://openrefine.org/ (originally Google Refine) is a free, open-source, powerful program for working with messy data. It runs on your own machine rather than as a SaaS web service.

A text facet groups together cells that share the same value, and its clustering feature provides a convenient way to merge variant spellings into a single value.
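One common way to find values that belong together is key collision: compute a normalized key for each value and group values whose keys match. The sketch below is a simplified key in the spirit of OpenRefine's "fingerprint" keying method (lowercase, strip accents and punctuation, then sort the unique tokens). It is an illustration, not OpenRefine's actual implementation, and the `fingerprint` and `cluster` names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Dict, List

def fingerprint(value: str) -> str:
    """Simplified normalization key; similar spellings map to the same key."""
    s = value.strip().lower()
    # Strip accents: decompose characters, then drop combining marks.
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    s = s.translate(str.maketrans("", "", string.punctuation))
    # Keep unique tokens, sorted, so word order does not matter.
    return " ".join(sorted(set(s.split())))

def cluster(values: List[str]) -> Dict[str, List[str]]:
    """Group values by key; keep only keys shared by 2+ distinct spellings."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(set(vs)) > 1}

if __name__ == "__main__":
    print(cluster(["New York", "new york ", "York, New", "Boston"]))
```

Here "New York", "new york " and "York, New" all share the key `new york` and form one cluster, while "Boston" stands alone and is left out. Choosing one spelling for the whole cluster is then the "merge into a single value" step.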

The tool also offers common transforms, such as trimming leading and trailing whitespace.
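As a rough illustration of what such transforms do to a column, here is a small Python sketch that trims each cell and collapses runs of internal whitespace to a single space. The function names are hypothetical; in OpenRefine itself you would pick the equivalent transforms from the column menu instead of writing code.

```python
import re
from typing import List

_WS_RUN = re.compile(r"\s+")

def trim(value: str) -> str:
    """Remove leading and trailing whitespace."""
    return value.strip()

def collapse_whitespace(value: str) -> str:
    """Replace each run of whitespace (spaces, tabs, newlines) with one space."""
    return _WS_RUN.sub(" ", value)

def clean_column(column: List[str]) -> List[str]:
    # Apply both transforms to every cell, like a bulk edit on one column.
    return [collapse_whitespace(trim(cell)) for cell in column]

if __name__ == "__main__":
    print(clean_column(["  Acme Corp ", "Acme\t Corp", "Widgets  Inc  "]))
```

All three messy cells come out normalized, so "Acme Corp" now appears twice with identical spelling. That makes a later facet or cluster step more effective.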

Packt book: Using OpenRefine, by Ruben Verborgh and Max De Wilde.