Quantitative, Computational, Digital: Musing on Definitions and History

I recently ran across a trifecta of adjectives: “quantitative, computational, and digital” history. It intrigued me enough that I did an internet search, which gave me precisely four hits, three of which were for the same job posting. Clearly this isn’t mainstream yet.

That said, the phrasing resonated with me on a number of levels and kept haunting me until I finally decided it was worth writing about at some length.

I am, at the end of the day, a quantitative historian – numbers are integral to both my sources and many of my methods. When I first encountered demographic history in grad school, I instinctively called it “history by numbers” and critiqued sample sizes while interrogating authors’ calculations. My dissertation and first book project analyze early modern British numeracy and quantitative thinking, while my current DH project involves quantification at a massive scale (Death by Numbers: building a database out of the London Bills of Mortality so that I can examine, among other things, early modern people’s addition skills).

As needed, I am also a computational historian – methodologically, I use statistics and computer programming on a semi-regular basis. My work on the Six Degrees of Francis Bacon project involved statistical work in R, as well as less quantitative programming in PostgreSQL, Ruby/Rails, JavaScript, HTML, and a sprinkling of Python for good measure. The bibliometric work I’ve done on Identifying Early Modern Books was also fundamentally computational, as is much of the work I’m doing on Death by Numbers (I’m not calculating with nearly a million numbers by hand!). And my newest project, the Bridges of Pittsburgh, will involve a variety of existing software as well as, probably, some bespoke programming for the graph theory aspects. Some of these computational methods are clearly also quantitative, but not all of them.

Lastly, by my actual title and job description, I am a digital historian – under whatever contested definition we give DH. Increasingly, my colleagues in the Pittsburgh area and I have been scoping DH and digital scholarship projects using the criterion of being web-facing, which plays out interestingly against the other two terms above. By these definitions, the digital is often but not always computational. An Omeka exhibit or WordPress site is digital but not particularly computational (in either the quantitative or programmatic sense). And if we define digital as web-facing, then the computational is not always digital: any computational project that ends in a traditional article or monograph rather than a sustained digital project illustrates this disjunction.

Cue the Venn diagram to visualize the way I’ve been thinking about these similarities and differences… c’mon, you knew this was coming, didn’t you?

[Image: Venn diagram of quantitative, computational, and digital history]

So where does this leave DH (and Humanities Computing, Quantitative History, and the like)? I haven’t a clue, which is why I call this a “musings” post. This will certainly not be the last (virtual) ink spilled on this contested and interesting subject of definitions. In the meantime, I will continue to enjoy my liminality and try on adjectives to suit my research objectives of the moment – be they qualitative, quantitative, computational, digital, or something else entirely.

Twitter at the Big Three: Global Network Stats

Every year in the break between Fall and Spring academic semesters, tens of thousands of scholars from across the world descend on an American city for several caffeine-fueled days of panels, receptions, job interviews, and social networking. Actually, this happens more than once, as members of the American Historical Association, the Modern Language Association, and the American Library Association all meet in January. And while most of their social networking happens face-to-face, some of it happens on Twitter where enterprising digital humanists armed with Martin Hawksey’s TAGS can collect conference tweets and analyze them for fun and profit.

Posts in this (intended) series include (and will be linked as they are published):

  1. Global Network Stats
  2. Bipartite Network Analysis
  3. Directed Network Analysis
  4. Preliminary Conclusions (TL;DR)
  5. The Methods Post

So without further ado, here are some initial stats about the networks I constructed from the three official conference Twitter hashtags: #aha17, #mla17, and #alamw17.

The AHA network is the smallest at 2,826 nodes (people who either tweeted or whose Twitter handle showed up in another person’s tweets) and 6,945 edges (connections generated by said tweets). These edges have been weighted so that if Person A mentions Person B 14 times in tweets, the edge from Person A to Person B has weight 14. If Person A mentions Person C only once, the edge from Person A to Person C has weight 1. The average degree is 2.5 (number of edges divided by number of nodes), but when weight is factored in (each edge is multiplied by its weight before the edges are summed and divided by the number of nodes), the average weighted degree is 3.9.
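To make the weighting concrete, here is a minimal Python sketch, assuming each tweet is available as an (author, text) pair and that a directed edge is created whenever the author @-mentions someone. The sample tweets and the helper names (`weighted_edges`, `nodes_of`) are hypothetical; this illustrates the definitions above rather than reproducing the pipeline used to build the conference networks.

```python
import re
from collections import Counter

# Hypothetical sample tweets as (author, text) pairs; invented for illustration.
TWEETS = [
    ("alice", "Great panel on archives @bob @carol #aha17"),
    ("alice", "More from @bob on parish registers #aha17"),
    ("bob", "Thanks @alice! #aha17"),
    ("dave", "Session starting soon #aha17"),
]

MENTION = re.compile(r"@(\w+)")  # handles are letters, digits, and underscores


def weighted_edges(tweets):
    """Directed author -> mentioned-user edges; weight = number of mentions."""
    weights = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            weights[(author.lower(), mentioned.lower())] += 1
    return weights


def nodes_of(tweets, weights):
    """Nodes: everyone who tweeted or whose handle appeared in a tweet."""
    nodes = {author.lower() for author, _ in tweets}
    for source, target in weights:
        nodes.update((source, target))
    return nodes


weights = weighted_edges(TWEETS)
nodes = nodes_of(TWEETS, weights)

# Average degree: number of distinct edges divided by number of nodes.
average_degree = len(weights) / len(nodes)
# Average weighted degree: sum of edge weights divided by number of nodes.
average_weighted_degree = sum(weights.values()) / len(nodes)

print(len(nodes), len(weights), average_degree, average_weighted_degree)
# prints: 4 3 0.75 1.0
```

Here alice mentions bob twice, so that single edge carries weight 2: it counts once toward the average degree but twice toward the average weighted degree.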

There are 74 connected components (subnetworks with no connection to the rest of the network), with the largest connected component containing 90% of the nodes and 96% of the edges in the overall network. This component has diameter 10 (the length of the shortest path between the two people furthest apart from each other) and average path length 4.3 (the average length of the shortest path between every pair of people in the component).
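The component and path statistics can be sketched with breadth-first search. This hypothetical example treats mention ties as undirected (an assumption; network tools such as Gephi can also compute these measures on directed paths) and runs on an invented edge list rather than the conference data.

```python
from collections import deque

# Hypothetical mention ties, treated as undirected for this sketch.
EDGES = [
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
    ("erin", "frank"),  # a small component cut off from the rest
]


def adjacency(edges, isolated=()):
    """Undirected adjacency sets; `isolated` adds people with no ties at all."""
    adj = {node: set() for node in isolated}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, start):
    """Shortest-path length (in edges) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def connected_components(adj):
    """Each component is the set of nodes reachable from some unvisited node."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


def diameter_and_average_path(adj, component):
    """Longest shortest path, and mean shortest path over ordered pairs of distinct nodes."""
    lengths = []
    for node in component:
        lengths.extend(d for other, d in bfs_distances(adj, node).items() if other != node)
    if not lengths:  # a single-node component has no paths
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


adj = adjacency(EDGES)
components = connected_components(adj)
largest = max(components, key=len)
diameter, average_path = diameter_and_average_path(adj, largest)
print(len(components), len(largest), diameter, round(average_path, 2))
# prints: 2 4 3 1.67
```

Running BFS from every node is quadratic in the component size, which is acceptable for networks of a few thousand people but would call for sampling or a dedicated library on much larger graphs.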

The MLA network is slightly bigger and slightly more connected:

  • nodes: 3,538
  • edges: 10,178
  • average degree: 2.9
  • average weighted degree: 5.2
  • connected components: 70
  • largest connected component:
    • nodes: 94.2%
    • edges: 97.8%
  • diameter: 12
  • average path length: 4.4

The ALA network is the largest and most connected:

  • nodes: 7,851
  • edges: 20,505
  • average degree: 2.6
  • average weighted degree: 3.9
  • connected components: 99
  • largest connected component:
    • nodes: 96.1%
    • edges: 98.9%
  • diameter: 14
  • average path length: 5.4
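As a quick consistency check, the average degree definition used here (edges divided by nodes) can be applied to the reported counts for the three conference networks. This sketch simply reuses the published figures; the dictionary name is illustrative.

```python
# Node and edge counts exactly as reported above.
REPORTED = {
    "#aha17": (2826, 6945),
    "#mla17": (3538, 10178),
    "#alamw17": (7851, 20505),
}

# Average degree as defined in this post: edges divided by nodes, to one decimal place.
average_degrees = {tag: round(edges / nodes, 1) for tag, (nodes, edges) in REPORTED.items()}
print(average_degrees)
# prints: {'#aha17': 2.5, '#mla17': 2.9, '#alamw17': 2.6}
```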

So what happens when we put it all together?

[Image: the merged “Big Three” network]
Green edges = #aha17 hashtag. Red edges = #mla17 hashtag. Blue edges = #alamw17 hashtag.

Merging the three networks creates some overlap of nodes (people on Twitter during more than one conference) and edges (people tweeting to the same people at more than one conference), but the three networks remain largely discrete. The ForceAtlas2 layout I employed in Gephi created more overlap between the AHA and MLA networks than with the ALA network, but in general disciplinarity is the rule of the day.

While some of this is likely an artifact of most scholars’ inability to physically attend multiple conferences (the AHA and MLA, in particular, occurred at the same time, in Colorado and Pennsylvania respectively), scholars can still interact via Twitter with conferences they aren’t attending. The co-occurrence of the AHA and MLA could have – theoretically – increased connectivity between the two conferences if similar themes and conversations arose at both and were then connected via social media. Alas, I don’t have the 2015 metrics (the last time these conferences didn’t co-occur) to make a comparison, but if anyone has them and wants to share, I’d love to see them!
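The merge described above can be sketched in Python, assuming each network is stored as a `Counter` that maps (source, target) pairs to weights. People and ties that appear in more than one hashtag network are what produce the overlap. Summing the weights of a shared edge is an assumption made here for illustration; the merged network may have handled weights differently, and the edge lists are invented.

```python
from collections import Counter

# Hypothetical weighted edge lists, one per hashtag network: (source, target) -> weight.
AHA = Counter({("alice", "bob"): 2, ("bob", "carol"): 1})
MLA = Counter({("dave", "erin"): 3, ("alice", "bob"): 1})
ALA = Counter({("frank", "gina"): 1})


def nodes(edges):
    """Every person appearing at either end of an edge."""
    return {person for edge in edges for person in edge}


def merge(*networks):
    """Union of all edges; weights of an edge shared across networks are summed (an assumption)."""
    merged = Counter()
    for network in networks:
        merged.update(network)  # Counter.update adds counts rather than replacing them
    return merged


merged = merge(AHA, MLA, ALA)
shared_people = nodes(AHA) & nodes(MLA)  # on Twitter during both conferences
shared_ties = AHA.keys() & MLA.keys()    # same mention tie at both conferences
print(len(nodes(merged)), len(merged), sorted(shared_people), sorted(shared_ties))
# prints: 7 4 ['alice', 'bob'] [('alice', 'bob')]
```

Because shared people and shared ties are counted only once, the merged network has fewer nodes and edges than the three networks added together.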

In general, the merged “Big Three” network stats clearly derive from their constituent conferences’ stats:

  • nodes: 13,489
  • edges: 37,308
  • average degree: 2.8
  • average weighted degree: 4.5
  • connected components: 203
  • largest connected component:
    • nodes: 95%
    • edges: 98.3%
  • diameter: 16
  • average path length: 5.9

One of these numbers, however, immediately jumped out at me as not like the others: the number of connected components. If the largest connected components of the three conference networks had been the only components to connect in the Big Three network, there should have been 74+(70-1)+(99-1)=241 connected components. Instead, there are 203, which means 38 of the small components in the conference networks appear to have merged with another component (either the largest connected component or another small component).

This is encouraging to me, as it implies that an interdisciplinary scholarly community emerges on Twitter, not just in the dense “center” of the network but also in the disconnected “margins.” It is not (yet?) clear whether this interdisciplinary community is generated by digital humanists, librarians, geographical proximity, common interests, or – most likely – some combination of factors, including some I haven’t considered.
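The component-count reasoning in the previous paragraphs can be illustrated with a small union-find (disjoint-set) sketch. The three toy edge lists are hypothetical: each has two components, and their largest components share a single node (`hub`), so the formula c1 + (c2 - 1) + (c3 - 1) predicts 4 components after merging. Because two small components also share a node (`s2`), the actual count comes out one lower. The sketch ignores isolated nodes, which would each add a component of their own.

```python
class DisjointSet:
    """Minimal union-find structure for counting connected components."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def count_components(*edge_lists):
    """Components formed by all edges in the given lists (isolated nodes not included)."""
    ds = DisjointSet()
    for edges in edge_lists:
        for a, b in edges:
            ds.union(a, b)
    return len({ds.find(x) for x in list(ds.parent)})


# Hypothetical networks: each has two components; the largest ones share "hub".
AHA_EDGES = [("hub", "a1"), ("hub", "a2"), ("s1", "s2")]
MLA_EDGES = [("hub", "m1"), ("m1", "m2"), ("s2", "s3")]  # "s2" also appears in AHA's small component
ALA_EDGES = [("hub", "l1"), ("l1", "l2"), ("t1", "t2")]

separate = [count_components(edges) for edges in (AHA_EDGES, MLA_EDGES, ALA_EDGES)]
# Expectation if only the largest components join: c1 + (c2 - 1) + (c3 - 1).
expected = separate[0] + sum(c - 1 for c in separate[1:])
actual = count_components(AHA_EDGES, MLA_EDGES, ALA_EDGES)
print(separate, expected, actual, expected - actual)
# prints: [2, 2, 2] 4 3 1
```

The difference between the expected and actual counts plays the same role here as the 241 − 203 = 38 merged small components in the real data.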

Regardless of the cause, something is going on. In the interests of exploring it, next time I’m going to restructure my data as a bipartite network to see if anything else interesting emerges.

 

I Tweet Therefore I Am Paying Attention

While I’m not on Twitter daily, I am a very active conference tweeter. I’m one of those people sitting by the electrical outlet with my laptop, hastily typing as the speakers present. To give you a better sense of the dichotomy between my everyday and conference tweeting, I present a screenshot of my July Twitter analytics:

[Image: screenshot of my July Twitter analytics (tweet activity)]

Can you guess when SHARP 2016 occurred?

I’ve had a few people ask me about conference tweeting. What am I doing? Why? And – most importantly – how can I listen and tweet at the same time?

Conference Tweeting 101

The idea is straightforward. As you listen to a speaker, you extract the main ideas, themes, questions, and illuminating examples. You then tweet these things, ideally each one in a single tweet but breaking it across multiple tweets is also an option if the idea is especially complex.

Because all good academics cite their sources, the format of these tweets tends to be something like “Name: idea expounded here #conferencehashtag” or “idea expounded here @speakersTwitterHandle #conferencehashtag”. At more Twitter-active conferences, session hashtags sometimes emerge to separate out the conversations happening around each panel.

[Image: example conference tweet]
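Breaking an idea across several tweets in the “Name: idea #conferencehashtag” format can be sketched as a small Python helper. It assumes the 140-character limit in force at the time and counts characters with `len()`, which is a simplification: Twitter’s own counting treats links and some characters differently. The function name and example text are hypothetical.

```python
import textwrap

TWEET_LIMIT = 140  # Twitter's character limit in 2016 and early 2017


def conference_tweets(speaker, idea, hashtag, limit=TWEET_LIMIT):
    """Split an idea into "Speaker: ... #hashtag" tweets that each fit within `limit`.

    Uses len() as a naive character count (an assumption, see above).
    """
    prefix = f"{speaker}: "
    suffix = f" {hashtag}"
    width = limit - len(prefix) - len(suffix)
    if width <= 0:
        raise ValueError("speaker name and hashtag leave no room for the idea")
    # textwrap.wrap breaks on whitespace and keeps every chunk within `width`.
    return [prefix + chunk + suffix for chunk in textwrap.wrap(idea, width)]


idea = "A complex argument that will not fit in a single tweet, so it continues. " * 3
for tweet in conference_tweets("Speaker", idea, "#conferencehashtag"):
    print(len(tweet), tweet)
```

Repeating the speaker’s name and the hashtag in every piece keeps each tweet attributable and searchable on its own, at the cost of some characters per tweet.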

Whenever possible, it’s best to include the speaker’s Twitter handle because this means they will be automatically notified of your tweet (and be able to see other Twitter users’ interest in their ideas). They will also be included in any conversations that happen because someone responds to your tweet. HOWEVER, for that to happen, the speaker needs to tell the audience what their Twitter handle is.

Pro Tip: if you have A/V and want your talk to be tweeted, it’s best to include your Twitter handle at the bottom of every one of your conference slides.

The Benefits of Conference Tweeting

… are legion. Because I want to keep this short, I’ll stick to my top two.

You can’t be in more than one panel at a time, assuming you can even afford to attend the conference in the first place. Conference tweeting allows you to “peek” into other panels, spot synchronicity of themes across multiple panels, and virtually attend far more scholarly events than even the most generous professional development stipend could allow.

[Image: social network visualization]

Furthermore, conference tweeting is a fantastic way to network – to find people with like interests and spark conversations that begin online, continue in receptions, and last after the conference ends. I’ve had collaborations and future conference panels emerge organically from these conversations, in a way they never would have if I’d sat alone in the back of a conference room, then quickly escaped that reception full of strangers.

The Concentration Question

This is the question I get asked most often, so I’ll give a longer version of my usual response. We train throughout our academic careers to take notes while listening to lectures and other auditory events. In fact, I’ve practiced this skill for so long that I have to take notes in order to actively listen to a talk. If I’m not taking notes, I tune out. And for me, conference tweeting is a form of note-taking.

When the Internet connection’s bad, I still take notes in a text document, but I vastly prefer note-taking via Twitter. First, I almost never go back to look at my old notes, but I do continually reengage with old tweets, either because someone’s liked or retweeted something or because I’m analyzing datasets of old conference tweets.

[Image: archive of conference tweets]
Tweets Captured with TAGS v6.0 ns

Second, Twitter essentially functions as a communal note-taking platform, enabling me to see what other people are getting out of the same talk. Third, the public nature of this note-taking leads to immediate conversations with other conference tweeters, in which we dissect, analyze, and expand on the ideas we’re hearing together.

So the next time you’re sitting in a conference next to me or anyone else who has Twitter open in the browser, you’ll know: I tweet therefore I am paying attention.