does not scan

I’ve seen this too many times:
> data<-read.table('filename.txt')
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line X did not have Y elements

Every time I see and solve this problem, I’m going to keep a note of what it was and put it here. Hopefully this will not turn out to be an exhaustive list…

Most recently, the cause was this default in read.table (read.csv defaults to comment.char = "" instead):
comment.char = "#"
One of the strings in my file had a “#” in it, which resulted in the rest of the line being “commented out” and the above error.
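
A quick way to check whether a file is affected is to look for lines containing “#” before loading it. The following is only a sketch: the temporary file and its contents are made up for illustration and stand in for filename.txt. If any lines turn up, passing comment.char = "" to read.table turns off comment handling.

```shell
# Hypothetical sample file standing in for filename.txt.
file=$(mktemp)
printf 'id name\n1 foo\n2 bar#baz qux\n' > "$file"

# List every line that contains '#', prefixed with its line number.
hits=$(grep -n '#' "$file")
echo "$hits"

rm -f "$file"
```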

Printing a range of bytes from a specific line in a file

To read the first B bytes from the Nth line of a file using pipes in bash:

cat filename | head -N | tail -1 | head -c B

The first pipe takes the output of cat filename (which is simply a printout of filename) and feeds it as input to head -N, which produces the first N lines of filename. tail -1 then keeps only the last of those lines, which is line N, and head -c B pulls out its first B bytes.

So, to read from byte B1 to B2 (inclusive) on the Nth line:

cat filename | head -N | tail -1 | tail -c +B1 | head -c B2-B1+1

This is cheating a bit because you have to work out B2-B1+1 first (the +1 is there because the range includes both ends). If you type it as-is you will not succeed. The +B1 bit in the second tail command tells it to start at byte B1 counting from the start of its input (normally tail counts backwards from the end).
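
The byte count does not have to be worked out by hand: the shell can do it with arithmetic expansion. Below is a minimal sketch, assuming a made-up sample file and made-up values for N, B1 and B2. An inclusive range from B1 to B2 covers B2-B1+1 bytes, which is what the last head is given.

```shell
# Hypothetical sample file and values, for illustration only.
file=$(mktemp)
printf 'alpha\nbravo-charlie\ndelta\n' > "$file"
N=2; B1=3; B2=7

# Bytes B1..B2 (inclusive) of line N; $(( )) computes the length.
range=$(head -n "$N" "$file" | tail -n 1 | tail -c +"$B1" | head -c "$((B2 - B1 + 1))")
echo "$range"

rm -f "$file"
```

Here line 2 is “bravo-charlie”, and bytes 3 to 7 of it are “avo-c”.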

Or you could do it much more “easily” using sed and cut. To start, we get the Nth line of the file:

sed 'Nq;d' filename

Knowing nothing about sed, this took a while to understand. I’m still not entirely sure if my intuition is correct, but it goes as follows: sed scans the file line by line and, by default, prints each line once the script has run on it. The q command is triggered only on line N. On every earlier line, d deletes the line and moves straight on to the next one, so we see nothing. When sed gets to line N, q makes it print the line and quit before d is ever reached. Another way to print only line N is with

sed -n 'Np' filename

What happens here is that -n tells sed not to print anything by default, while Np says ‘print line N’. The difference between this and the previous one is that sed will continue to the end of the file, quietly not printing anything. That’s sort of a waste of time, so, as per usual, the less comprehensible version is faster.
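
To try the two forms side by side, here is a small sketch with a made-up four-line file. The N in the commands above is a placeholder, so with a shell variable the script has to go in double quotes. The third command is a variant not discussed above: it combines -n with p and q, so it prints line N and then stops reading.

```shell
# Hypothetical sample file, for illustration only.
file=$(mktemp)
printf 'one\ntwo\nthree\nfour\n' > "$file"
N=3

a=$(sed "${N}q;d" "$file")        # quit at line N; earlier lines are deleted
b=$(sed -n "${N}p" "$file")       # print only line N, but read the whole file
c=$(sed -n "${N}{p;q;}" "$file")  # print line N, then quit immediately
echo "$a / $b / $c"

rm -f "$file"
```

All three print “three” for this file.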

And now we can simply do

sed 'Nq;d' filename | cut -b B1-B2

Here B1-B2 can be written literally, because cut takes a range as its argument, and the range includes both ends.
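
As a sketch with concrete values (the sample file and the values of N, B1 and B2 are made up for illustration), the sed and cut version looks like this:

```shell
# Hypothetical sample file and values, for illustration only.
file=$(mktemp)
printf 'alpha\nbravo-charlie\ndelta\n' > "$file"
N=2; B1=3; B2=7

# Line N via sed, then bytes B1..B2 (inclusive) via cut.
bytes=$(sed "${N}q;d" "$file" | cut -b "${B1}-${B2}")
echo "$bytes"

rm -f "$file"
```

This prints “avo-c”, bytes 3 to 7 of “bravo-charlie”.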

We could have also used awk to extract a line, instead of sed:

awk 'NR==N' filename
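
In the awk command, N is again a placeholder for a number. One way to pass it in from the shell is with -v, sketched below on a made-up file; the variable name n is just a choice for this example. The exit is an addition that stops awk after the line is printed, in the same spirit as q in sed.

```shell
# Hypothetical sample file, for illustration only.
file=$(mktemp)
printf 'one\ntwo\nthree\nfour\n' > "$file"
N=2

# Print line N and stop reading the rest of the file.
line=$(awk -v n="$N" 'NR == n { print; exit }' "$file")
echo "$line"

rm -f "$file"
```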

I am particularly interested in learning sed though, so I’ll try to stick to sed solutions where possible.