I’ve seen this too many times:
> data<-read.table('filename.txt')
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line X did not have Y elements
Every time I see and solve this problem, I’m going to keep a note of what it was and put it here. Hopefully this will not turn out to be an exhaustive list…
Most recently, the reason for this was this default in read.table (read.csv is different): comment.char = "#"
One of the strings in my file had a “#” in it, which resulted in the rest of the line being “commented out” and the above error.
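Here is a minimal sketch of the failure and the fix, using a throwaway temporary file (the file contents and column names are invented for illustration). count.fields is a handy way to hunt down the offending line, because it shares the same comment.char default.

```r
# Write a small whitespace-separated file where one string contains "#".
path <- tempfile(fileext = ".txt")
writeLines(c("id name value",
             "1 foo#bar 3",
             "2 baz 4"), path)

# count.fields also defaults to comment.char = "#",
# so the second line appears to have only two fields.
fields_default   <- count.fields(path)
fields_nocomment <- count.fields(path, comment.char = "")

# With the default, read.table stops with "line ... did not have ... elements".
broken <- tryCatch(read.table(path, header = TRUE), error = function(e) e)

# Switching comments off reads the "#" as ordinary text.
fixed <- read.table(path, header = TRUE, comment.char = "", as.is = TRUE)
print(fixed)
```

If your file genuinely contains comment lines as well, this fix will not be enough on its own, and you would have to clean the file some other way.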
I follow Humans of New York on Facebook, and I often skim the top comments on interesting posts. Today there was a story of a woman who works as a counsellor for HIV-positive teenagers after overcoming a lot of personal suffering. At the time, one of the top comments described her as ‘beautiful’. Given what I know of HONY commenters, the poster very likely meant she has a ‘beautiful soul’. But still, you wouldn’t call a man beautiful like that… or would you?
I used the portal to all of human knowledge (Google) to test a series of hypotheses. My resting assumption, given what I know of the world, of feminist writing, of the internet etc., is that certain qualities/words tend to be attributed more to men than to women (and vice versa). So I created bigrams (two-word phrases) from ‘adjective’ + ‘man’ or ‘woman’ and tested how many hits I got. I’ll put the raw data at the bottom of this post.
First of all, what is the unigram (single-word) prevalence of ‘man’ and ‘woman’?
Not terribly surprising given ‘man’ also functions as ‘person’ in the English language. Given this, we might expect ‘adjective man’ to appear overall much more frequently than ‘adjective woman’, all else being equal. But all else is not equal.
If I wanted to test for some association between ‘man/woman’ and ‘adjective’, I could also look up the unigram frequencies of each ‘adjective’, but that’s not the objective here. The idea would be to treat these results as estimates of bigram probabilities: if the components of a bigram are independent, its probability should be the product of the individual probabilities.
Immediately it appears that the original adjective of interest (‘beautiful’) really stands out for women, which corresponds to my existing hypothesis. Unsurprisingly, ‘pregnant woman’ and ‘handsome man’ are more common than their counterparts.
Since I’m more interested in comparisons, let’s look at the ratio of ‘adjective man’ to ‘adjective woman’, taking care to divide out the background frequencies of ‘man’ and ‘woman’ as we do so. Specifically, I take the counts of ‘adjective man’ and divide by ‘total man’, for each adjective, and do similarly for ‘adjective woman’. Then I can actually compare their counts, knowing the greater abundance of ‘man’ won’t disturb the results.
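As a rough sketch of that normalisation in R, here it is for three of the adjectives, with the counts copied from the data at the bottom of this post. The variable names are mine and this is not the script that made the graphs.

```r
# Unigram totals and a few bigram hit counts from the data in this post.
total <- c(man = 512000000, woman = 204000000)
counts <- data.frame(
  adjective = c("beautiful", "handsome", "successful"),
  man       = c(772000, 1630000, 360000),
  woman     = c(3380000, 289000, 2130000)
)

# Share of all 'man' (or 'woman') hits that carry each adjective.
man_share   <- counts$man / total[["man"]]
woman_share <- counts$woman / total[["woman"]]

# Normalised ratio: above 1 leans towards 'man', below 1 towards 'woman'.
counts$ratio <- man_share / woman_share

# Raw ratio for comparison; it differs only by the constant total man / total woman.
counts$raw_ratio <- counts$man / counts$woman
print(counts)
```

Dividing by the totals first keeps the comparison about the adjective rather than about how often ‘man’ appears in general.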
This is more interpretable, although I could definitely have plotted this better (as deviations from 1, rather than growing from 0, as well as using a neutral shade for ratios close enough to one, but I want to eat a burrito soon). Once again ‘handsome’, ‘beautiful’ and ‘pregnant’ behave as expected.
‘Successful’ is the most woman-specific adjective from this set. This obviously does not imply that women are more successful than men (or men are more dead than women…), but it could suggest that we talk about the success of women more. It seems that we prefer most of these adjectives for women, in fact. I tried to go for words which might have some kind of subtle gender/sex bias (… and ‘pregnant’), but if it wasn’t for the very obvious ‘handsome’ and the potentially-weird ‘dead’, I didn’t do a very even job of it. My initial assumptions about bigram frequencies (I mean word usage) were clearly flawed.
A word of caution: without controlling for the background frequency of ‘man’ and ‘woman’, the previous graph looks like this. (An observant reader may note that the only change is in the y-axis, which results from the normalisation scaling everything by a factor of ‘woman’/‘man’ frequency.)
This might agree with what you expect to see (if you are, as I am, constantly inundated with strong, friendly, homeless, handsome, crying, dead men) but is based on a flawed analysis.
I’m not sure what to take from all of this. I would like to find adjectives which are more commonly associated with men and then draw wild conclusions about the nature of society, but that will have to wait for another day.
Data: (#: I excluded ‘lovely’ for being too obscure and ‘old’ for wrecking the axes on the graphs)
adjective count sex
#none 204000000 woman
#none 512000000 man
beautiful 3380000 woman
beautiful 772000 man
handsome 289000 woman
handsome 1630000 man
brave 710000 woman
brave 614000 man
strong 989000 woman
strong 1450000 man
inspiring 313000 woman
inspiring 216000 man
intelligent 1150000 woman
intelligent 398000 man
smart 642000 man
smart 1300000 woman
successful 360000 man
successful 2130000 woman
lonely 495000 man
lonely 1340000 woman
angry 847000 woman
angry 585000 man
sexy 748000 man
sexy 1550000 woman
#old 20400000 man
#old 5970000 woman
dead 459000 woman
dead 2660000 man
homeless 1180000 man
homeless 814000 woman
crying 343000 man
crying 339000 woman
pregnant 2630000 woman
pregnant 468000 man
#lovely 355000 man
#lovely 491000 woman
friendly 809000 man
friendly 313000 woman
One of the original unstated goals for this blog (at the time of its inception) was to write up solutions to all the problems in Peskin and Schroeder’s An Introduction to Quantum Field Theory. It was a reasonable idea for a theoretical physics student. Despite having since abandoned theoretical physics, I still occasionally entertain the idea of continuing with the project, an impulse driven perhaps by nostalgia, or some masochistic tendency which drives me towards involved integral calculations. Thankfully, someone has already gone to the trouble, and I can spare myself the hours of pointless TeXing that this particular indulgence may have cost me.
It is important to watch one’s figure. To obsess over it, inspecting every detail, carefully erasing unpleasant angles, making sure all elements are perfectly aligned. The hours I spend caring about my figures may one day be my downfall, but until that time I shall continue to worry about things like inner margins, font embedding and what shade of cornflower blue to use.*
A few weeks ago** I decided to use R for some data analysis. It transpires that the R I thought I knew was in fact a semi-incomprehensible amalgamation of gnuplot and Octave, so I was forced to go back to the beginning. A lot of language-learning boils down to reading documentation (still looking for the official documentation for Mandarin, though) but that’s inefficient when you have a specific task to do. I’ve waded around in the help files, and here’s what I dredged up for basic (but not ugly) plotting. Note: as far as I can tell, ggplot2 is the main R package for plotting. This post is not about that. That comes later.
This’ll suck your datafile into R. The sep option should contain whatever separates fields in your file. Default is whitespace. The as.is=TRUE option stops R from turning fields containing text into ‘factors’, which may not be what you want. Note: Try to name your variables informatively, unlike me.
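A sketch of what such a call can look like, assuming a tab-separated file with a header row. The temporary file, its column names and header=TRUE are all my inventions for the example.

```r
# Stand-in for a real data file: tab-separated, with one text column.
path <- tempfile(fileext = ".txt")
writeLines(c("pos\tname\tscore",
             "100\tfoo\t0.5",
             "200\tbar\t-1.2"), path)

# sep = "\t" says fields are tab-separated; as.is = TRUE keeps text as character.
data <- read.table(path, header = TRUE, sep = "\t", as.is = TRUE)
str(data)
```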
Let’s suppose we want to plot some function of the fourth column of the data frame (data[,4]) against the first column (data[,1]). Note: if you wanted the fifth row for example, you would write data[5,]. The i,jth element is just data[i,j].
The first line is figuring out how many rows are in our data frame in two equivalent ways. Note that length(data) just gives you the number of columns. The second line creates a new column whose values are a function of the (standard normal) cumulative distribution function (pnorm) of the absolute value of whatever is in the fourth column of the data frame. The third line appends (columnwise, with cbind) it to the data frame. Note: I think this might mess up the column names of the data-frame, so there must be a better way to do this. The last line is just defining a variable we’ll use later.
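Roughly, those steps might look like the following. This is a reconstruction on synthetic data and not the original script; the column names, the value of chro and the exact p-value formula are assumptions.

```r
set.seed(1)
# Synthetic stand-in: positions in column 1, z-scores in column 4.
data <- data.frame(pos = sort(runif(50, 1e7, 5e7)), a = 1, b = 2, z = rnorm(50))

# Two equivalent ways of counting rows; length(data) would count columns.
n_rows       <- nrow(data)
n_rows_again <- length(data[, 1])

# A function of the standard normal CDF of |z| (here a two-sided p-value).
p_col <- 2 * (1 - pnorm(abs(data[, 4])))

# Append the new column; the new column takes the name of the variable.
data <- cbind(data, p_col)

chro <- 21  # hypothetical chromosome number, used later in a plot title
```

Writing data$p_col <- p_col instead of using cbind does the same job and names the new column explicitly, which sidesteps any worry about column names.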
Now let’s use some other functions to create (useless***) vectors.
x is just a vector of real numbers starting at min(data[,1]) and ending with max(data[,1]), with evenly spaced intervals so its length is 1000 (seq creates a sequence). y is also a vector of length 1000. runif(100) creates a vector of 100 numbers between 0 and 1 taken from a uniform distribution. rep then repeats this 10 times. If you wanted the repetition to be elementwise (eg [a,a,b,b] instead of [a,b,a,b]) just write each=10.
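A small sketch of these three functions, with a short made-up vector pos standing in for data[,1]:

```r
set.seed(1)
pos <- c(2, 5, 11)  # stand-in for data[,1]

# 1000 evenly spaced values from the minimum to the maximum.
x <- seq(min(pos), max(pos), length.out = 1000)

# 100 uniform draws between 0 and 1, with the whole block repeated 10 times.
u <- runif(100)
y <- rep(u, 10)

# each = 10 repeats element by element instead: a,a,...,b,b,...
y_each <- rep(u, each = 10)
```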
We’re creating the plot environment here. What is going on is as follows: we’ve got three plots to arrange. Each one of these corresponds to a number in the matrix, which determines where it will go. Note: you can clearly try to give nonsensical layouts here, like demanding the first plot exist in both upper left and bottom right corners. This produces… interesting results.
The matrix command takes a vector (c(1,1,2,3)) and matrix…es it. Here we’re telling it to have 2 rows and 2 columns, and to fill the matrix with the elements of the vector row by row (byrow=TRUE – the default is by column).
We then inform layout that we want the columns of the matrix to have relative widths c(1,3) (note the second plot on the second line is 3 times as wide as the first) and the rows to have relative heights c(1.75,1). Many joyous minutes can be spent tweaking these values.
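Put together, the layout described above can be sketched like this. pdf(NULL) is only there so the example runs without opening a window or writing a file; leave it (and dev.off) out to see the result on screen.

```r
pdf(NULL)  # throwaway graphics device

# Plot 1 spans the whole top row; plots 2 and 3 share the bottom row.
m <- matrix(c(1, 1, 2, 3), nrow = 2, ncol = 2, byrow = TRUE)
n_plots <- layout(m, widths = c(1, 3), heights = c(1.75, 1))

# Draw the outline of each numbered region to check the arrangement.
layout.show(n_plots)
invisible(dev.off())
```

layout.show is a quick way to check an arrangement before committing any real plots to it.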
par lets you globally set graphical parameters. A lot of its options are the same as for plot or hist, but it saves you having to write them for every individual plot. Here we’re setting the font family and bty, which determines the box style for the plots (eg the presence of axis lines). The options for these are depictions of the results: L, 7, C, U, ], etc. Note: I love Avenir, but using ‘strange’ fonts like this can cause problems if/when you go to save these plots directly to pdf.
I thoroughly recommend perusing the options of par. They will open your eyes to the scope of basic R plotting. In particular mar and omi lend themselves well to tweaking, and mfrow is like a poor-man’s layout.
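A small sketch of par in action. I use the generic "sans" family here, not Avenir, because named fonts are not guaranteed to exist on every device, and the margin values are arbitrary.

```r
pdf(NULL)  # throwaway graphics device so the example runs anywhere

# par returns the previous values, which makes them easy to restore.
old <- par(family = "sans", bty = "l", mar = c(4, 4, 2, 1))
bty_now <- par("bty")

# mfrow: one row, two columns, filled in plotting order.
par(mfrow = c(1, 2))
plot(1:10, main = "Left panel")
hist(c(1, 2, 2, 3, 3, 3), main = "Right panel")

par(old)
invisible(dev.off())
```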
plot(data[,1]/(1e07), data[,4], type="h", col=ifelse(abs(data[,4])>2.5, "darkorange1","black"), ylim=c(1.03*min(data[,4], na.rm=TRUE), 1.03*max(data[,4], na.rm=TRUE)), main=paste("Some data on chromosome", chro), xlab="Physical Distance (e+07)", ylab="Test Statistic")
I’m plotting data[,4] against data[,1]/(1e07). In my case, the elements of data[,1] are all of the order 10 million, so the division just rescales the x-axis to make it easier on the eye. R understands scientific notation like this, which is nice.
type: "p" plots points, "h" plots vertical bars (as in a histogram).
pch: (see later) is point style (if we had them). Tiny dots are ".", bigger dots are 20 (a number, not a string), see a diagram for more options.
col: (colour) you could write something simple like col="cornflowerblue" to make all of your points (or vertical lines as the case is here) that colour. But that’s boring. What we have here is "darkorange1" if abs(data[,4])>2.5, and "black" otherwise. Note: you may have noticed that my data[,4] column consists of z-scores, and I’m highlighting the ‘unusual’ ones.
ylim: we’re manually setting the limits on the y axis here, so it’s ylim = c(ymin, ymax). I’m going a little bit beyond the actual range of the data, and making sure to ignore missing values (NA/NaN) in the min/max functions with na.rm=TRUE. Note: if your data is nice unlike mine, you shouldn’t need to do this.
paste: lets me combine text with other variables in the environment. In this case, it’s chro, which we defined earlier.
abline: this just adds a line to the plot. In this case, because a y-position was specified (with h), it’s horizontal (actually, I added two lines, because I specified two positions). Alternatively, use abline(v=a) for a vertical line.
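To try the plot call above without my data, here is a self-contained sketch on synthetic z-scores. The data frame, the value of chro and the abline positions are stand-ins I made up.

```r
set.seed(1)
# Synthetic stand-in: positions of order 10 million, z-scores in column 4.
data <- data.frame(pos = sort(runif(200, 1e7, 9e7)), a = 0, b = 0, z = rnorm(200))
chro <- 21  # hypothetical chromosome number

pdf(NULL)  # throwaway device; omit to draw on screen

# Orange for 'unusual' z-scores, black otherwise.
highlight <- ifelse(abs(data[, 4]) > 2.5, "darkorange1", "black")
plot(data[, 1] / 1e07, data[, 4], type = "h", col = highlight,
     ylim = c(1.03 * min(data[, 4], na.rm = TRUE),
              1.03 * max(data[, 4], na.rm = TRUE)),
     main = paste("Some data on chromosome", chro),
     xlab = "Physical Distance (e+07)", ylab = "Test Statistic")

# Two horizontal guide lines at the highlighting threshold (positions assumed).
abline(h = c(-2.5, 2.5), lty = 2)
invisible(dev.off())
```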
For the second plot (a histogram, drawn with hist), the two lines after the hist call produce a normal density curve and overlay it onto the histogram. (Both points and lines will do this, if you need to add more data to an existing plot.) breaks is a histogram option suggesting how many individual bars we want, so you can adjust the ‘resolution’ of the histogram (so tweakable!). dnorm gives the probability density function for a standard normal, which we compare against the histogram.
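A sketch of a histogram with a normal curve on top, using synthetic z-scores. Note the assumption that freq=FALSE is set: it puts the histogram on a density scale, which is what makes the comparison with dnorm meaningful.

```r
set.seed(1)
z <- rnorm(500)  # stand-in for the z-score column

pdf(NULL)  # throwaway device; omit to draw on screen

# freq = FALSE scales the bars so their total area is 1.
h <- hist(z, breaks = 30, freq = FALSE,
          main = "Histogram of z-scores", xlab = "z")

# Standard normal density on a fine grid, drawn over the histogram.
grid <- seq(-4, 4, length.out = 200)
lines(grid, dnorm(grid), col = "cornflowerblue")
invisible(dev.off())
```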
And now for the final plot. Assuming the values in data[,4] come from a random variable with (standard) normal distribution, then p_col = 2*(1-pnorm(abs(data[,4]))) is the probability of getting a more extreme value. (pnorm(abs(data[,4])) is just the cumulative distribution function evaluated at the absolute value of the variable, so 1-pnorm is the probability of getting a higher value. The multiplication by 2 compensates for our having taken an absolute value.) We can then take the -log (base 10) of this, as is the style. Plotting is simple enough:
Here I’ve used cex to make the plot points a bit smaller.
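A self-contained sketch of this kind of plot on synthetic data. The axis labels, the cex value and the dashed line at p = 0.05 are my choices for the example.

```r
set.seed(1)
z   <- rnorm(200)                    # stand-in for data[,4]
pos <- sort(runif(200, 1e7, 9e7))    # stand-in for data[,1]

# Two-sided p-value under a standard normal, then -log10 of it.
p_col     <- 2 * (1 - pnorm(abs(z)))
neg_log_p <- -log10(p_col)

pdf(NULL)  # throwaway device; omit to draw on screen
plot(pos / 1e07, neg_log_p, pch = 20, cex = 0.5,
     xlab = "Physical Distance (e+07)", ylab = "-log10(p)")
abline(h = -log10(0.05), lty = 2)    # reference line at p = 0.05
invisible(dev.off())
```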
Statistical note: depending on the nature of the analysis you’re doing, a ‘p-value’ of 0.05 may not be significant. You may be doing multiple hypothesis testing and will need to account for this. The example shown here is purely for demonstrative purposes and has no scientific legitimacy.
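To make the multiple testing point concrete, here is a sketch using p.adjust from base R on purely random z-scores. Bonferroni is only one possible correction; I picked it because it is the simplest.

```r
set.seed(1)
# 1000 tests where nothing is really going on.
p <- 2 * (1 - pnorm(abs(rnorm(1000))))

# Some will still fall below 0.05 by chance alone.
n_raw <- sum(p < 0.05)

# Bonferroni multiplies each p-value by the number of tests (capped at 1).
p_bonf <- p.adjust(p, method = "bonferroni")
n_bonf <- sum(p_bonf < 0.05)

cat("raw:", n_raw, " after correction:", n_bonf, "\n")
```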
And we are done! For this particular example, I would be inclined to spend additional time tweaking the various inner/outer/inter-plot margins to reduce the amount of whitespace in the figure, but that’s just me.
*technical point: R only has one shade of cornflower blue.
**It seems I started this post over a month ago. Oops!
***I was originally planning on including the full complexity of my plotting script here, which involved vectors like this. Now they serve only as examples of what is possible.
The courier, the donkey, the crow, chicken. The ever inscrutable, ever occupied courier. It comes in many forms (pictured above: badger), but the default in Dota 2 is a donkey (pictured below).
Dota 2 is a shopping game, and the courier delivers the goods. It costs 150 gold and is invaluable to the team, yet nobody wants to buy it, so if you spend your Dota 2 career playing with pub scrubs, then you may never actually see one of these.
If you are lucky enough to have semi-competent teammates (or be semi-competent yourself), you need to know how to use the courier.
There are three shops on each side of the map. One in the fountain (that’s where you spawn), one in the easy lane (“side shop”), and another one called the “secret shop”. Either team can use any shop, but trying to purchase one’s items from the enemy fountain is not usually advisable.
Here’s the deal with items: you can open the shop tab at any time and buy items (unless they are secret shop items – see below). If you happen to be adjacent to the appropriate shop at the time of purchase, the item goes into your inventory and you can begin using it – hurray! If you are not adjacent, the item goes into your stash.
Think of the stash as a locker you have back at the fountain, where your wayward purchases end up. To get items out of the stash, you have to return to the fountain. This is a major pain/waste of time, so you should use THE COURIER to do it for you. Grabbing stuff from the stash and bringing it to you is basically the point of the courier.
Note: you can also use teammates to courier items for you, although they will probably just steal them and mock you in a foreign language. Teammates can’t take items from your stash, but you can “drop from stash” (right click on the item in the stash to see this option), which causes the item to appear on the floor in the fountain. A teammate/courier can then pick this up and bring it to you. Only consumable items (potions, wards, dust, smoke, etc.) which weren’t purchased by a teammate can actually be used by them, so they have no reason to steal your Manta Style/whatever beyond malice and spite (you will never see your precious Manta Style again). This also means rich teammates can’t buy items for the poor ones – there will be no redistribution of wealth in Dota 2!
Using the courier.
I have highlighted the main courier skills (there are a total of six. Learning the courier should not be challenging.)
A: “Go to the fountain.” Courier will walk (or fly, if it has been upgraded into a flying courier) back to the fountain. Note: the ability after this (W in my keybinds) sends the courier to the secret shop.
B: “Get items from stash.” This transfers items from your stash into the courier’s inventory, if it is at the fountain. Unfortunately the courier cannot use (most) items, so its inventory is purely for storage. Note: the courier can actually use healing potions/clarities on heroes, and Smoke Of Deceit more generally.
C: “Give me my goddamn items.” The courier will fly to you and automatically give you any items in its inventory which belong to you, and then fly back to the fountain. If it’s not carrying anything of yours, it will still fly to you (and back), so be warned. I always hit “get items from stash” before using this command, to prevent accidental empty-couriering. If you don’t have space for all/any of the items it wants to give you, it will give as many as it can before returning to the fountain. I don’t know how it decides which item to give, but experience suggests it is “whichever item is least useful right now”. Be warned.
D: “Fly fast, little one.” This ability is only available if the courier has been upgraded into its flying form. It causes the courier to move at maximum speed for 20 seconds, and has a 40-second cool-down.
And so, I use the courier by ensuring it’s at the fountain, then mashing D,F,R*, and continuing on my merry way.
*If you share my key-binds, you should make sure you have the courier selected before you do this. Many’s the accidental ult has been triggered (usually bound to R) by careless courier-use. Just hope nobody witnesses your shame.
You can also buy items directly using the courier, if you send it to a shop and buy your items while it is selected.
A final note about the courier: see its hitpoints? It can die. It dies very easily (although it is magic immune). If it dies, it gives the enemy team 175 gold each. It won’t drop/lose its precious items, but they will be inaccessible until it respawns three minutes later. If you/your team are soon to engage in a fight, or otherwise go into a dangerous area, be careful about using the courier! Getting the courier killed can be even more shameful than wasting your ult.
The secret shop.
What is this mysterious secret shop, I hear you cry? Well, some items are awkward and require a specialist merchant. You can tell an item can only be purchased from the secret shop by hovering over it in the shop tab (or noticing the red mark on its icon):
As I mentioned, secret shop items cannot be remotely purchased and placed into the stash. (The stash is located at the fountain, and the secret shop is in a jungle somewhere! What merchant would risk transporting valuable stock through the dangerous jungles? The side shops are okay, because the goods can be transported via the darkness outside beyond the map, clearly.) Either you or the courier have to go there in person to shop. Sorry! On the upside, items from the secret shop tend to be a little bit exotic, so you shouldn’t need to buy from there too frequently.
More shopping: building items and quick-buy.
Many items are made of more basic items. In order to build one of these composite items, you simply have to have all the required ingredients in your(/the courier’s) inventory. If you left-click on an item in the shop, it will tell you what (if any) other items are required to build it. Note: the “piece of paper” item depicted here is a recipe. Recipes can only be purchased from the fountain, and are useless beyond their role in building items.
Also highlighted in this screenshot is the quick-buy area. Dragging an item from the shop tab into this region will place all of its ingredients here, ready to be purchased with a simple right-click. If you have enough gold to purchase any of these parts, they get a golden border (in my incredibly illustrative example, I have enough money to buy all of them). They don’t get automatically purchased for you. Quick-buy is useful if you fear sudden and unexpected death, as it allows you to frantically spend your money before you die (and lose some of it). I play Dota 2 a bit like StarCraft 2, so I buy items as soon as I can, and try to spread creep(s) evenly across the map.
Apologies for wildly varying image size and quality in this post. Since starting it, I changed operating system and lost my Adobe software. I struggle on with GIMP.
Critwhale posted this map in the Dota 2 group on g+ (whoever knew g+ would turn out to be useful!). I haven’t tested it so I can’t vouch for its accuracy, but if nothing else it’s a really great high-res game map.
For the uninitiated:
“to juke” refers to the act of evading an enemy who is pursuing you by taking an unexpected route. Or in practice, “dodging around a tree”. I imagine the jungles of Dota 2 to be filled with incredibly dense foliage, because hiding behind a single tree can sometimes completely shield you from vision.
A spawn box is the region in which a creep camp can be blocked from spawning. The neutral camps respawn on the minute every minute (with the exception of their initial spawn, which is at 30s on the game clock), but only if the spawn box is empty of heroes, creeps, and wards. You can use this offensively to impede the enemy team (if a camp doesn’t spawn, it can’t be farmed!), but accidentally blocking a friendly camp is also a danger. I usually give creep camps a wide berth around the minute mark, but knowing exactly where the box is is clearly the superior option.
I have some more guides in the pipeline (notably “how to shop” and “how to ward”), but I cannot promise graphics as beautiful as this.
The Korean alphabet (Hangul) is – so far – my favourite writing system. It is logical and efficient. It pleases my sense of style. Since starting this post over a month ago I took up learning Mandarin so my feelings towards Hanzi are liable to threaten Hangul’s dominance in the future, but for now I side with space-robot alphabet. Because that’s what Hangul is.
At first glance one may assume that Hangul consists of logograms – characters representing words rather than phonemes, but this is not the case. The alphabet is very much phonetic. Each “block” is a single syllable, so for example Hangul(한글) is Han(한)+gul(글).
Since syllables are made of phonemes, it is not surprising that the blocks consist of sub-components representing these phonemes. (It was surprising the first time I learned of this, because such an elegant solution to written language had not occurred to me – though upon further reflection, the trick is just “writing words more compactly” so it’s not as novel as it is aesthetically pleasing.) Some insane person wrote a Wikipedia page documenting every possible syllabic block in Korean, so all you need to do is memorise all ten thousand of these (give or take a few thousand) and reading Korean will become trivial. End of post. If this idea is appealing to you, I might suggest going to Cambridge to do Part III of the Mathematical Tripos.
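For the curious, the number of blocks can be counted exactly. The sketch below assumes the Unicode layout of precomposed Hangul syllables (19 initial consonants, 21 vowels, and 28 final options including "no final", starting at code point U+AC00). The function name and the zero-based indices follow Unicode ordering and are not anything specific to this post.

```r
# Counts of jamo available in each slot of a precomposed syllable block.
n_initial <- 19
n_vowel   <- 21
n_final   <- 28  # 27 final consonants plus "no final"

n_blocks <- n_initial * n_vowel * n_final

# Build a block from zero-based jamo indices (Unicode ordering).
compose <- function(initial, vowel, final = 0) {
  intToUtf8(0xAC00 + (initial * n_vowel + vowel) * n_final + final)
}

han <- compose(18, 0, 4)  # initial h, vowel a, final n
gul <- compose(0, 18, 8)  # initial g, vowel eu, final l
cat(n_blocks, "possible blocks; Hangul is", paste0(han, gul), "\n")
```

So "ten thousand, give or take" is about right, and every block is just three small numbers in disguise.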
The more elegant solution is to learn the alphabet. Each letter is called a “jamo”, but they only occur inside blocks, sort of like quarks. Unlike quarks, we can still look at them individually. I’ll include the IPA in [square brackets], and a ‘translation’ of IPA into my accent (mileage may vary). For pronunciation purposes, text is no replacement for audio, so I would suggest finding some videos, like this one, for example.
Simple vowels: Simple vowels are made of horizontal or vertical lines and short strokes.
ㅣ [i] (“ee” in “tree”)
ㅏ [a] (“a” in “mad”)
ㅓ [ʌ] (“u” in “mud”)
ㅡ [ɯ] (somewhere between the “oo” in “cool” and the “eu” in “eugh” – I have a really hard time differentiating this from ㅜ)
ㅗ [o] (“o” in “bowl”)
ㅜ [u] (“oo” in “too”)
Complex vowels: Combinations of simple vowels (including diphthongs). I’m not going to include all combinations because many of them are self-evident given the simple vowels.
These ones are less obvious:
ㅐ [ɛ] (“e” in “bed”)
ㅔ [e] (“e” in “grey”)
Generally, ㅗ or ㅜ combined with another vowel gives a “w-” sound, so for example ㅘ is “wah”, ㅙ is “weh”, and ㅟ is “wee”.
There’s no letter for “y” in Korean, so if you want to “y” up a vowel, double up on short strokes (I believe this process is called ‘iotation’. You can do something similar in Slavic languages with ь – Cyrillic comes a close second in the space-robot race.) This produces
ㅑ [ja] (“yah”)
ㅕ [jʌ] (“yuh”)
ㅛ [jo] (“yoh”)
ㅠ [ju] (“yoo”)
We can extend this to the complex vowels, to get ㅒ for “yeh” and ㅖ for a slightly different “yeh”.
Syllables are usually a consonant-vowel sandwich, so consonants can be “initial”, “medial”, or “final” (I’ll write [i/m/f]), and the placement makes a (small) difference to the pronunciation of the letter.
ㄱ [k/g/k̚] (“k” as in “Kant”, “g” as in “gravity”, “k̚” as in “quark”)
ㄴ [n/n/n] (“n” as in “neutron”)
ㄷ [t/d/t̚] (“t” as in “tachyon”, “d” as in “down”, “t̚” as in “cat”)
ㅅ [s/s/t̚] (“s” as in “strange”)
ㅁ [m/m/m] (“m” as in “mass”)
ㅂ [p/b/p̚] (“p” as in “point”, “b” as in “baryon”, “p̚” as in “top”)
ㅇ [-/ŋ/ŋ] (This is just a silent placeholder in the initial position. In all others it’s “ng”, as in “ping”)
ㄹ [ɾ/ɾ/l] (“ɾ” as in “alveolar tap”, a sound which is neither “r” nor “l”)
Some consonants are obtained from others by aspiration. Aspiration is basically just adding air to the sound – so imagine trying to sneak a “h-” sound in after the consonant. In Hangul, the addition of a horizontal line seems to denote this aspiration, or a general ‘softening’ or alteration of the sound (in the case of the letter I like to think of as “j”). This produces:
ㄱ > ㅋ [kʰ/kʰ/k̚] (“kʰ” is an aspirated “k”, oddly enough)
ㄷ > ㅌ [tʰ/tʰ/t̚]
ㅅ > ㅈ [tɕ/dʑ/t̚] (“tɕ” as in “charm”, “dʑ” as in “jam”)
ㅈ > ㅊ [tɕʰ/tɕʰ/t̚] (“tɕʰ” as in “oh god send help”)
ㅂ > ㅍ [pʰ/pʰ/p̚] (“pʰ” as in *strangling noises*)
ㅇ > ㅎ [h/ɦ/-] (“h” as in “hello”, “ɦ” as in “cool whip”)
There are also “double letters”: ㄲ, ㄸ, ㅃ, ㅆ, ㅉ which are “tense”, so they’re pronounced a bit like you’re after spending the last hour reading articles about phonetics and just realised it’s too late to watch Breaking Bad. “Damn it!” ~ “땀읻!”
I should stress that this entire post has very little to do with the Korean language. I don’t know any Korean, but transliteration can be fun, and this article was largely about IPA. Trying to cram English into a foreign language really makes you appreciate phonetic differences.