Who's blogging now?: linguistic features and authorship analysis in sports blogs
The field of authorship determination, once largely confined to literary analysis but now a major subfield of forensic linguistics, has grown substantially over the last two decades. As its body of research and its record of successful forensic application grow, so does the demand for its application. However, methods that have been rigorously tested for reliability and replicability, and that can therefore meet the strict Daubert criteria applied by US courts, have not yet been truly established.
In this study, I set out to investigate how a list of parameters, many of them common in previous researchers' methodologies, would perform when used to test documents by bloggers from a sports blog, Winging It in Motown. Three prolific bloggers were chosen from the site, and a corpus of posts was compiled for each blogger and examined for each of the chosen parameters. For each of the three bloggers, one test document not included in that blogger's corpus was then selected from the blog page and examined for each parameter using the same methodologies applied to the corpora. Once data for the corpora and all three test documents had been obtained, the results were compared for similarity, and an authorship determination was made for each test document along each parameter.
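The per-parameter comparison described above can be pictured with a minimal sketch. It assumes, for illustration only, that each parameter reduces to one number per author corpus and one per test document, and that "similarity" means the smallest absolute difference; the study's actual comparison methods may differ. The parameter names (`mean_sentence_length`, `comma_rate`) and all values are hypothetical placeholders, not data from the study. With three candidate authors, a random guess is correct one time in three, so a parameter beats chance here only if its accuracy exceeds 1/3.

```python
# Hypothetical sketch: nearest-corpus attribution, one parameter at a time.
# Parameter names and numbers are invented placeholders, not study data.

def attribute(test_value, corpus_values):
    """Return the author whose corpus value is closest to the test value."""
    return min(corpus_values, key=lambda author: abs(corpus_values[author] - test_value))


def accuracy_per_parameter(corpora, tests, true_authors):
    """Fraction of test documents attributed correctly, per parameter.

    corpora:      {parameter: {author: corpus value}}
    tests:        {test_id: {parameter: test value}}
    true_authors: {test_id: actual author}
    """
    results = {}
    for param, corpus_values in corpora.items():
        correct = sum(
            attribute(tests[t][param], corpus_values) == true_authors[t]
            for t in tests
        )
        results[param] = correct / len(tests)
    return results


# Toy values for three hypothetical authors A, B, C.
corpora = {
    "mean_sentence_length": {"A": 18.0, "B": 22.0, "C": 15.0},
    "comma_rate": {"A": 0.05, "B": 0.03, "C": 0.04},
}
tests = {
    "t1": {"mean_sentence_length": 17.5, "comma_rate": 0.04},
    "t2": {"mean_sentence_length": 21.0, "comma_rate": 0.031},
    "t3": {"mean_sentence_length": 16.0, "comma_rate": 0.049},
}
true_authors = {"t1": "A", "t2": "B", "t3": "C"}

results = accuracy_per_parameter(corpora, tests, true_authors)
chance = 1 / len(set(true_authors.values()))  # 1/3 with three candidate authors
above_chance = {p: acc > chance for p, acc in results.items()}
print(results, above_chance)
```

In this toy run, the hypothetical sentence-length parameter attributes all three test documents correctly, while the comma-rate parameter gets only one of three right, which is no better than chance.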
The findings indicated that, overall, the parameters were largely unsuccessful at determining the authorship of these test documents based on the author corpora developed for the study. Only two parameters identified the authors of the test documents at a rate higher than chance, and other factors may be driving these successful identifications, so further research is needed to confirm their validity as parameters for authorship work.