About this blog

BASHing data is a companion blog to A Data Cleaner's Cookbook. It continues a series of data-related posts I contributed to Andrew Powell's Linux Rain blog from 2014 to 2018.

This is a place for demonstrations and trials of command-line data "ops". The operations might include analysing, archiving, auditing, cleaning, de-duplicating, encoding, entering, migrating, querying, reformatting, reporting, storing, etc.

Want to comment?

Email me. If your remarks are on-topic and helpful, they'll be edited straight into the relevant post, not buried in a list of comments at the bottom of the webpage.

Want notice of future posts?

Copy the RSS link into your feed reader: RSS

About me

I'm a part-time data auditor and incompletely retired zoologist.

Robert Mesibov, West Ulverstone, Tasmania, Australia

List of posts:

How to validate ISO 8601 dates without regex (2018-10-05)
     Check for format and content errors in YYYY-MM-DD fields with AWK

Fightin' fields (2018-09-30)
     Finding disagreements between data fields can be challenging

Fuzzy matching in practice (2018-09-23)
     Tips for approximate matching with tre-agrep

Data on clay (2018-09-20)
     Cheap data storage for thousands of years? Check. Ancient glyphs in your terminal? Check.

iconv and illegal input sequences (2018-09-13)
     Getting around a roadblock in changing the character encoding of a file

Displaying data from table fragments (2018-09-06)
     One way to build a tidy table from a jumble of data

SCI and 62;c62;c62;c... (2018-08-25)
     A control character causes strange behaviour in GUI terminals

A record pager built with YAD (2018-08-18; updated 2018-09-09)
     How to turn a YAD dialog into a GUI viewer/pager for records in a data table

48 sea levels and a trope for your terminal (2018-08-11)
     A bulk string replacement with AWK, and that ACCESS DENIED thing

Mojibake detective work (2018-08-06)
     A close look at some character encoding problems

Pseudo-blank ("empty") records and fields (2018-08-04)
     How to find not-quite-empty rows and columns in a data table

GUI ways to view and edit big text files (2018-07-31)
     glogg, gvim, Geany and csvpad, but not spreadsheets

Question marks that aren't really question marks (2018-07-27)
     Some question marks are signs that a program doesn't understand a character's encoding

Time series ops (2018-07-23)
     Using AWK to summarise time series data

Curse of the CSV monster (2018-07-18)
     CSV with broken records to TSV

Partial duplicates (2018-07-14)
     One way to find "pseudoduplicated" records

Fun with BOM data (2018-07-11)
     Weather watching with wget and gnuplot

Truncated data items (2018-07-04)
     Detecting truncations, such as a 100-character string clipped to 50 characters in a database

Too many lat/lon digits (2018-06-30)
     Rounding off latitude/longitude data to an appropriate number of significant figures

Embedded newlines (2018-06-23)
     How to safely remove embedded newlines

Combo characters (2018-06-09)
     How to deal with Unicode's combining characters

Pivoting airlines (2018-06-03)
     Using arrays of arrays to build a pivot table with AWK

A surprising AWK trick (2018-05-27)
     A clever way to avoid using a flag in AWK

Compare parts of strings (2018-05-22)
     How to use AWK's "split" function to compare parts of strings

YAD repeat and edit (2018-05-21)
     How to avoid re-entering data in a YAD data entry form