KaylaDot7
3 min read · Feb 2, 2021

The structural parts of language contain information about authorship and style.

(Copy Paste)

Language also carries information about who we are as individuals. Regardless of what a document is about, different sorts of people have a different writing style or a different dialect. Why? The reason is that the grammar of a language gives us millions of different choices for saying the exact same thing. Look at the examples below in (1) to (5). Each of these means the same thing. But the linguistic form is a bit different for each sentence. The content words are highlighted in yellow. These are mostly the same across the sentences: go, puddles, bike, cycle, work, commute. But the other words, the function words in green, change quite a bit from one sentence to the next.

We can use these differences to predict information about the text’s author: gender, dialect, native language, and sometimes even age and class. This information is contained in the grammatical structure of the document. The basic idea is that we all (unconsciously) prefer different variants. A variant here is an alternation, like “go around” vs. “avoid” or like “on the way to work” vs. “commuting.” We have many choices like this in every sentence that we say, tens of thousands of choices in fact. But, of course, we only choose one sentence to say. It turns out that another of the fundamental properties of language is that our choices between variants are governed by social attributes (like age or dialect). So when AI learns properties of style and authorship, it is uncovering these underlying linguistic choices.

· Function Words as Style. In languages like English, function words are a good representation for style and authorship. Function words are things like pronouns (you, me), conjunctions (and, or), prepositions (in, on), auxiliary verbs (was, were), and wh-words (who, what). Of course, the grammar of English has a lot more going on than just these individual function words. But these words indicate parts of the grammar of the sentence. And, unlike a full syntactic analysis like the dialect study we saw in Section 3.1, function words are easy to find.
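To make this concrete, here is a minimal sketch of how function words could be counted in a text. The word list below is a tiny illustrative sample chosen for this example (real function-word lists run to a few hundred words), and the names `FUNCTION_WORDS`, `tokenize`, and `function_word_profile` are hypothetical, not from any particular library.

```python
import re
from collections import Counter

# A small illustrative sample of function words (an assumption for this
# sketch; a realistic list would be much longer).
FUNCTION_WORDS = {
    "you", "me", "i", "we",                      # pronouns
    "and", "or", "but",                          # conjunctions
    "in", "on", "to", "of",                      # prepositions
    "was", "were", "am", "is", "will", "have",   # auxiliary verbs
    "who", "what",                               # wh-words
    "the", "a",                                  # articles
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def function_word_profile(text):
    """Return each function word's share of all tokens in the text."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    return {word: count / len(tokens) for word, count in counts.items()}

profile = function_word_profile("We will meet in the office and you will talk")
```

Content words like "meet" and "office" are ignored here. Only the relative frequency of each function word is kept, which is why no syntactic parsing is needed.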

· Style and Authorship. Style offers a glimpse into demographics. In other words, one of the fundamental properties of language is that it encodes social attributes (age, gender, etc.). From this perspective, each of us belongs to several groups, each defined by a different combination of social attributes. Some of our linguistic patterns come from larger groups (dialects) and some of them are specific to us as individual writers. In and of itself, each stylistic feature is meaningless. But taken together, the structure of a text provides a pointer to the individual who produced it.
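One way to picture "taken together" is to treat a text as a vector of function-word frequencies and compare whole vectors rather than single features. The sketch below is only an illustration under several assumptions: the word list is a small sample, the texts are invented toy examples, and cosine similarity is one possible comparison that we picked for simplicity, not necessarily what a real authorship system uses. All function names are hypothetical.

```python
import math
from collections import Counter

# Illustrative sample of function words (an assumption for this sketch).
FUNCTION_WORDS = ["i", "you", "me", "we", "and", "if",
                  "will", "the", "for", "am", "have"]

def profile(text):
    """Frequency of each function word, in a fixed order, as a vector."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def closest_author(unknown_text, known_texts):
    """Return the known author whose profile is most similar to the text."""
    target = profile(unknown_text)
    return max(known_texts,
               key=lambda name: cosine(target, profile(known_texts[name])))

# Invented toy data, only to exercise the functions.
known = {
    "author_a": "i am wondering if you have time for me",
    "author_b": "we will meet and we will discuss the report",
}
guess = closest_author("we will talk and we will decide", known)
```

No single word decides the outcome. The comparison uses the whole pattern of function-word frequencies at once.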

· Style and Context. We also use different language in different contexts. For example, people writing an email to someone in authority (like their boss) would say something like this: “I’m wondering if you have time for a meeting with me tomorrow?” But people writing an email to someone under their authority (like an employee) are more likely to say something like this: “We will meet tomorrow afternoon to discuss the report.” There are significant differences here in the use of pronouns and other function words!

· N-Grams. So far, we’ve counted the frequency of individual words. From this perspective, the order of words in a sentence doesn’t matter. But we know this isn’t true. For example, “I am going to the store” and “Am I going to the store” have very different meanings. So “I am” vs. “am I” makes a big difference. An n-gram is a way of counting words while retaining order. So far, we’ve worked with unigrams (sequences of just one word at a time). But now we’ll add more n-grams: if we count sequences of two words, we call each one a bigram (like “I am”); and if we count sequences of three words, we call each one a trigram (like “would have been”). When we use n-grams, we count each sequence of words as if it were just one unit. In other words, there is a column that counts “am I” and another column that counts “I am.”
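Here is a minimal sketch of n-gram counting using the two example sentences above. The function names `ngrams` and `ngram_counts` are hypothetical, and splitting on whitespace is a simplifying assumption (a real tokenizer would also handle punctuation).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into one unit."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n):
    """Count the n-grams in a text, using lowercase whitespace tokens."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))

statement = "I am going to the store"
question = "Am I going to the store"

# Unigram counts are identical: word order is invisible.
same_unigrams = ngram_counts(statement, 1) == ngram_counts(question, 1)

# Bigram counts differ: "i am" and "am i" land in separate columns.
statement_bigrams = ngram_counts(statement, 2)
question_bigrams = ngram_counts(question, 2)
```

With unigrams the two sentences look the same. With bigrams, the statement has a count for "i am" and none for "am i", while the question is the reverse.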
