About this blog

BASHing data is a companion blog to A Data Cleaner's Cookbook. It continues a series of data-related posts I contributed from 2014 to 2018 to Andrew Powell's Linux Rain blog.

This is a place for demonstrations and trials of command-line data "ops". The operations might include analysing, archiving, auditing, cleaning, de-duplicating, encoding, entering, migrating, querying, reformatting, reporting, storing etc.

Want to comment?

Email me. If your remarks are on-topic and helpful, they'll be edited straight into the relevant post, not buried in a list of comments at the bottom of the webpage.

Want notice of future posts?

Copy the RSS link into your feed reader: RSS

About me

I'm a part-time data auditor and incompletely retired zoologist.

Robert Mesibov, West Ulverstone, Tasmania, Australia
robert.mesibov@gmail.com


List of posts:

Putting information into a table from the table's filename (2018-12-13)
     The example adds a date from the filename to each record in the table

Finding changepoints in a list, revisited (2018-12-06)
     Using AWK to find where values change in a list

Unwrap your fasta (2018-12-01)
     How to concatenate the sequence lines in FASTA files

Avoiding senior moments with command-line functions (2018-11-13)
     The trick is to make the documentation available on the CLI

How to find distances between lat/lons for geochecking (2018-11-07)
     When you're looking for big differences, an approximate method is fine

Mapping with gnuplot (2018-10-31)
     How to use gnuplot to put data points on a basemap

Repair job: separate the tandem repeats (2018-10-26)
     How to split a tandem repeat between fields

Bird watching with AWK and grep (2018-10-24)
     Showing off the fastest way to search a text file for strings in another file

How to enter nothing in a database (2018-10-18)
     If you have nothing to say, say nothing

How to validate ISO 8601 dates without regex (2018-10-05)
     Check for format and content errors in YYYY-MM-DD fields with AWK

Fightin' fields (2018-09-30)
     Finding disagreements between data fields can be challenging

Fuzzy matching in practice (2018-09-23)
     Tips for approximate matching with tre-agrep

Data on clay (2018-09-20)
     Cheap data storage for thousands of years? Check. Ancient glyphs in your terminal? Check.

iconv and illegal input sequences (2018-09-13)
     Getting around a roadblock in changing the character encoding of a file

Displaying data from table fragments (2018-09-06)
     One way to build a tidy table from a jumble of data

SCI and 62;c62;c62;c... (2018-08-25)
     A control character causes strange behaviour in GUI terminals

A record pager built with YAD (2018-08-18; updated 2018-09-09)
     How to turn a YAD dialog into a GUI viewer/pager for records in a data table

48 sea levels and a trope for your terminal (2018-08-11)
     A bulk string replacement with AWK, and that ACCESS DENIED thing

Mojibake detective work (2018-08-06)
     A close look at some character encoding problems

Pseudo-blank ("empty") records and fields (2018-08-04)
     How to find not-quite-empty rows and columns in a data table

GUI ways to view and edit big text files (2018-07-31)
     glogg, gvim, Geany and csvpad, but not spreadsheets

Question marks that aren't really question marks (2018-07-27)
     Some question marks are signs that a program doesn't understand a character's encoding

Time series ops (2018-07-23)
     Using AWK to summarise time series data

Curse of the CSV monster (2018-07-18)
     CSV with broken records to TSV

Partial duplicates (2018-07-14)
     One way to find "pseudoduplicated" records

Fun with BOM data (2018-07-11)
     Weather watching with wget and gnuplot

Truncated data items (2018-07-04)
     Detecting truncations, such as a 100-character string clipped to 50 characters in a database

Too many lat/lon digits (2018-06-30)
     Rounding off latitude/longitude data to an appropriate number of significant figures

Embedded newlines (2018-06-23)
     How to safely remove embedded newlines

Combo characters (2018-06-09)
     How to deal with Unicode's combining characters

Pivoting airlines (2018-06-03)
     Using arrays of arrays to build a pivot table with AWK

A surprising AWK trick (2018-05-27)
     A clever way to avoid using a flag in AWK

Compare parts of strings (2018-05-22)
     How to use AWK's "split" function to compare parts of strings

YAD repeat and edit (2018-05-21)
     How to avoid re-entering data in a YAD data entry form