Variations on the word "worry"
Dec. 23rd, 2020 12:08 pm
1) I had meant to go to my parents today after a week of quarantining as best one can in an apartment with other people, but I feel a very slight tickle in my throat, which could be my imagination, but it makes me worried. Better wait a day and see. /o\
2) Occasionally I realize I've been using a word that is not allowable in 1750 and have to go back and edit it out of all my Flight of the Heron fic. Today's word is "worry" in the sense of "feeling anxious or troubled". Sigh, that word was a transitive verb meaning "to annoy or vex someone" back then (from an earlier stronger meaning of "to slay, kill or injure by biting and shaking the throat"), and I have now replaced it with variations of concern/trouble/fret/fear, which are all fine. I make copious use of the Online Etymology Dictionary. I mean, it's not that I imagine myself to actually be writing as they did in 1750, nor is that necessarily my goal, but at least I can try to avoid anachronisms.
And indeed, there is not a single use of "worry" in canon! Yeah, Broster did her research. But hey, she was born in 1877, and the intransitive use of "worry" to mean "feeling anxious or troubled" is actually only attested from 1860, though the use "to cause mental distress or trouble" is attested from 1822.
Another error I've recently discovered is that Anglicans/Episcopalians do not pray for the souls of the dead! This is/was a Catholic thing to do. Ugh, I have a character do this at least at one point, I suppose I should fix it. I should thank Naomi Mitchison for this discovery.
(no subject)
Date: 2020-12-23 02:00 pm (UTC)
(no subject)
Date: 2020-12-23 08:15 pm (UTC)
The site's creator has open-sourced all the code and datasets here, which is pretty cool. Looks like they're taking all words and 2-grams (two words occurring sequentially) in the user's text and checking which of them don't appear in a selection of historical texts (the ones listed here). They only have a limited number of texts for each decade, I guess because they must have had to clean the text of each novel by hand, which must have taken some time.
So there's going to be false positives like the examples you saw (words that get flagged only because they're not in the website's corpus, even though they were actually in use in that decade) and false negatives (words that should have been flagged as anachronisms, but don't get flagged because they were in use in that decade with a different meaning, like your "worry" example). You could easily reduce the first problem just by including more texts in the corpus, but I think the second problem is basically impossible to solve with our current levels of natural language processing technology.
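To make that concrete, here's a rough sketch of how I imagine the check working. This is my own guess written from the description above, not the site's actual code, and the names (tokens, ngrams, flag_unattested) and the one-sentence "corpus" are all made up for illustration.

```python
import re


def tokens(text):
    # Crude tokenizer (an assumption): lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(text):
    # All single words plus all 2-grams (pairs of adjacent words) in the text.
    words = tokens(text)
    pairs = [" ".join(pair) for pair in zip(words, words[1:])]
    return set(words) | set(pairs)


def flag_unattested(user_text, corpus_texts):
    # Flag every word or 2-gram of the user's text that never occurs
    # anywhere in the period corpus.
    attested = set()
    for corpus_text in corpus_texts:
        attested |= ngrams(corpus_text)
    return sorted(ngrams(user_text) - attested)


# Hypothetical one-sentence "corpus" standing in for the cleaned period novels.
tiny_corpus = ["The dog did worry the sheep all night."]
flagged = flag_unattested("Do not worry, the heron flew.", tiny_corpus)
```

With that toy corpus, "heron" ends up in flagged only because the corpus is too small (the first problem), while "worry" is not flagged, because the check only sees that the word occurs and not that it occurs in the old sense (the second problem).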
It looks like they're not using the Google Ngram data at all (besides linking to the Google Ngram website), probably because the Google dataset is not great quality for this kind of thing. A word attested in the dataset in 1890, for instance, might come from a reprint of something originally published in 1810, which itself might be a story set in some previous century. Whereas I see this Wordsworth person has made an effort to only use novels which are both originally published in the given decade and also set in the given decade.
I guess maybe the Google Ngram dataset is still pretty useful for seeing when words first appear? Even if not so useful for seeing their variation or decline over time...
Can you tell I've spent a lot of time thinking about this? *g*
(no subject)
Date: 2020-12-24 10:17 am (UTC)
Merry Christmas, if you celebrate it. : )