Ben Whishaw, Broadway, the RADA and Wikidata (in English and with updates)

Harmonia_Amanda Wikidata Français

Hello everyone! This is Harmonia Amanda, squatting on Ash_Crow’s blog. Several people told me repeatedly that I should write about some of what I have done on Wikidata these last few months, e.g. all my work on the RADA (Royal Academy of Dramatic Art) and other things. And after I wrote it in French, some people told me I should write it again in English. So here we are! To ensure that no one will read it, I wrote a long text, stuffed with footnotes,[1] and even with real SPARQL queries[2] here and there. No need to thank me.[3]

How it begins: The Hollow Crown

It’s all because of Ben Whishaw. I was quietly watching the BBC’s Shakespeare adaptations (and if you haven’t watched The Hollow Crown, I suggest you do) and I was thinking that the actor playing Richard II deserved an award for his role, because he was simply extraordinary.[4] [5] So I went lurking on his French Wikipedia page[6] and, as a good Wikimedian,[7] I decided to make it a little bit better. For now[8] I’ve mostly cleaned up the wikicode and dealt with accessibility for screen readers. As I couldn’t instantaneously make it a featured article, I thought it could be fun to complete his Wikidata entry. That was the beginning. As I said, it’s all because of Ben Whishaw.

Ben Whishaw in 2008 by KikeValencia – CC-BY-SA

Wikidata: the easy beginning

Wikidata is a free knowledge database with some twenty million entries, under a free license. It isn’t made to be read directly by humans (although they can read it)[9] but to be machine-readable, and to be used in other projects through visualisation tools.[10] I am an experienced Wikidatian by now so, at first, working on Whishaw’s entry seemed easy.

I just had to add more precise occupations (he isn’t just an “actor”, he is a “stage actor”, a “television actor”, a “film actor”…). He has received many awards, which should all be listed (P166), each with the year it was awarded (P585), the work it was awarded for (P1686) and sometimes even the people the award was shared with (P1706). And I could do the same for all the awards he was nominated for (P1411) but didn’t receive. Then I could also list all his roles, which we don’t add to his Wikidata entry but to the works’ entries, using “P161 (cast member)” with “Q342617 (Ben Whishaw)” as value. Sometimes we can even use qualifiers, like “P453 (character role)” when the characters themselves have a Wikidata entry (like Q in James Bond).[11]

Wikidata screenshot

So far, so easy. Well, the thing is, Whishaw is primarily a stage actor. I mean, he became well-known for his heartbreaking interpretation of Hamlet at the Old Vic at 23.[12] It’s a bit strange to see all his TV and film roles listed and not his theatrical ones (Mojo, Bakkhai…). So I started digging into theatre on Wikidata and let me tell you… it’s at least as neglected and messy as on Wikipedia! Which is saying something.[13]

Photograph of the front entrance of the Old Vic Theatre

Old Vic Theatre by MrsEllacott – CC-BY-SA 3.0.

This would be the perfect place to talk about ontologies, the semantic web and the questions of knowledge organisation, but the consensus among my beta-readers is that my article is already too long and that I should focus on the RADA (which is a long time coming) and talk about everything else another time.[14]

The Internet Broadway Database

While I was thinking about the relations between “art”, “work”, “genre” and “performance”,[15] I learned that Whishaw is now[16] on Broadway, where he plays John Proctor in Arthur Miller’s The Crucible, directed by Ivo van Hove.[17] What’s interesting for all of us Wikimedians is that Broadway already has an excellent database (IBDB, Internet Broadway Database). Well made, decently complete, with a limited number of errors;[18] oh joy! And even better: Wikidata already has properties to link to this database (and not only for people; properties also exist for venues, works and productions).[19]

Luminous ad in front of a theatre

Walter Kerr Theatre, ad for Grey Garden – Michael J Owens CC-BY 2.0

Of course, no one had properly exploited this database before and there were many errors in how it was used on Wikidata. So I’ve cleaned up each and every use of these properties on Wikidata.[20] And on Wikipedia, because that’s where the errors came from.[21] I complained about the Wikipedians who add absurd references (or worse, don’t add references at all), who aren’t philosophically unnerved when they add a production identifier to a work entry, or who even seem to think that the IBDB identifier is the same as the IMDB (Internet Movie Database) identifier (oh hell NO!)[22] but, as I am a Wikimedian, I cleaned up nevertheless.

I came to the conclusion that it would be better if, instead of having a few correct links, we linked all the entries: going from “I-worked-on-Ben-Whishaw-so-I-searched-his-IBDB-identifier” to “this is the complete list of IBDB identifiers, we should find the matching Wikidata entries”. Luckily for us, there is a truly marvellous tool called Mix n’ Match.[23] Here again I could give a detailed presentation of this tool, but to stay within the scope of this article I’ll just say that it needs the complete list of valid identifiers before it can work; therefore I started hoarding them all.[24] As it wasn’t an instantaneous process,[25] I needed something else to do in the meantime. If you are willing to give a hand, you can help me match IBDB entries to Wikidata entries: you can do it for works or for people. Do it carefully and if you are not sure, don’t. Thank you, any help is always appreciated.

Back to when all my scripts were running, I didn’t know exactly what to do to occupy myself, so I went again to Whishaw’s entry[26] and noticed he was a RADA (Royal Academy of Dramatic Art) alumnus.[27]

The RADA

Presentation

The cool thing about Wikidata[28] is that not only can we add where people studied (P69) but we can even add numerous details: when they started studying there (P580), when they stopped (P582), what degree they were preparing (P512), their academic major (P812)… There were no references. I didn’t like that at all. I searched for them. I thought: why not try the school’s website? And then… RADA!

Front entrance of the RADA Theatre

RADA Theatre, Malet Street, London – CC-BY-SA 2.0

Yes. The RADA had put the profiles of its alumni online. Here is Whishaw’s page for the curious.[29] Anyway, I was seeking a source and I had found a goldmine. My inner Wikimedian went a little dizzy with happiness[30] and I told myself that now I not only had a reference for Whishaw, I had references for all RADA alumni, with their year of graduation, their degree, everything, and that I could do mad statistics based on SPARQL queries![31] (And that it would give me something to do while I retrieved the identifiers of everyone who ever worked on a Broadway show.)[32]

Naively, I thought that the RADA didn’t have so many alumni (approximately a hundred a year in recent years) and so it wouldn’t take me too much time…[33]

Identification of the relevant entries

On Wikidata

To start, I tried to find out what already existed on Wikidata. I wrote a little query to find all the existing Wikidata entries with P69:Q523926 (educated at the Royal Academy of Dramatic Art). I cross-checked with the English category. Actually someone had, a few months earlier, added P69:Q523926 to all the entries categorised as “Alumni of the Royal Academy of Dramatic Art”.[34] Anyway, at that time I had no intention of writing this blog post, so I didn’t bother writing down the actual number anywhere, but it was around 650, with a very small gap between the Wikidata query and the English category (so my working hypothesis was that only a few Wikidata entries had no article on the English Wikipedia). There were more entries returned by the Wikidata query than there were articles in the category (which is logical) but all the categorised articles were correctly present in the Wikidata list. Not too bad as a start.

To follow my progress, I only had to do two queries: the first one to list all RADA alumni and the second one to list all RADA alumni with a year of graduation (which would mean that someone (me) had added the necessary information).

So beware the first SPARQL queries of this article:

SELECT DISTINCT ?student ?studentLabel
WHERE {
  ?student wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

link to query.

and

SELECT ?student ?studentLabel 
WHERE {
  ?student wdt:P31 wd:Q5 . # human
  ?student p:P69 ?statement .  # Student of...
  ?statement ps:P69 wd:Q523926 .        # ...RADA
  ?statement pq:P582 ?x . # with end date
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

link to query.

Easy, as I said.

There were already four or five students for whom we had the “end date” information, but either without a reference or with a reference other than the RADA. I decided not to worry about it and to treat these cases at the same time as the others.

On Wikipedia

I had already noted that the whole English category “Alumni of the Royal Academy of Dramatic Art” had the property P69 “educated at” with the RADA value (Q523926) on Wikidata. I knew there were more entries on Wikidata than in the category: where did the difference come from? From uncategorized English articles? [35] From Wikidata entries without a matching article in English?[36]

The category also exists on Wikipedias in other languages: in Spanish, Arabic, French, Latin, Polish, Russian, Simple English, Turkish and Chinese. But if you visit these pages, you will see they are considerably less complete than the English one (which is logical for a London school) and that they would probably not help me much.[37]

However, the category isn’t the only way to spot students. The English Wikipedia also has a list (List of RADA alumni). This list[38] is interesting because it contains, between brackets, the year of graduation, information missing in the category.

Assuming that all articles present in the category were also on the list, or that all the entries in the list were categorized, was too big of a hope, it seems. Once more, Wikipedia dazzles us with its incomplete management; if there are two systems, of course they won’t match!

Identification: From RADA to Wikidata

I thought the easiest way to begin was to go through the RADA database and search for matching entries on Wikidata and Wikipedia. Many RADA alumni are indeed well known enough to have a Wikipedia article, but not all of them, let’s not exaggerate. In an ideal world where Wikidata and Wikipedia had reached completion, once I had verified all the RADA database entries, I would have formally identified the approximately 670 Wikidata entries previously spotted. But as we don’t live in an ideal world and as neither Wikidata nor Wikipedia claims to be complete, I knew before I started that it would very probably not be so easy.

Manual research name by name

At first I thought I would simply search Wikidata for each and every student name listed in the RADA database and hope to find a match, starting with 1906, the first year with graduates listed,[39] as the school opened in 1904.

Very quickly the problems appeared with this painstakingly slow method.

In 1907, for example, the only student listed is “H Bentley”. When asked for this name, the internal Wikidata search engine only returns entries labelled “H Bentley” or “H. Bentley”, not “Henry Bentley”, “Harriet Bentley” or whatever. If I had been lucky, someone would have added “H Bentley” as an alias on the Wikidata entry and the search engine would have yielded a result. As I was unlucky but stubborn, I still tried a query like this one (not a SPARQL one; it’s adapted for Autolist, an old Wikidata tool):[40]

FIND H% Bentley in Labels in Alias

(link to the autolist query) and hoped it would work.[41] I can also be really dedicated, search for “Bentley” and quickly read through all the entries… Not as easy as I hoped at first, then.
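The kind of match hoped for here can be sketched in a few lines of Python. This is only an illustration, not how Autolist or the Wikidata search engine works: the function name `matches_abbreviated` and the sample labels are invented for the example, and it only covers the case where given names are shortened to initials or prefixes.

```python
def matches_abbreviated(short_name: str, label: str) -> bool:
    """Return True if `label` could be the full form of `short_name`.

    Surnames must be equal; each given-name token in `short_name` must be a
    prefix of the corresponding token in `label` ("H" matches "Henry").
    """
    short = short_name.replace(".", "").casefold().split()
    full = label.replace(".", "").casefold().split()
    if not short or not full or short[-1] != full[-1]:
        return False
    short_given, full_given = short[:-1], full[:-1]
    if len(short_given) > len(full_given):
        return False
    return all(f.startswith(s) for s, f in zip(short_given, full_given))


# Hypothetical labels, as they might come back from a broad search on "Bentley".
labels = ["Henry Bentley", "Harriet Bentley", "Thomas Bentley", "H. Bentley"]
candidates = [label for label in labels if matches_abbreviated("H Bentley", label)]
print(candidates)
```

A filter like this only narrows the list of candidates; each one still has to be verified by hand against the sources.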

Typos and database errors

Moreover, the RADA database isn’t immune to typographical errors: I’m reasonably certain that Joan Mibourrrne doesn’t really have three Rs in her last name or Dorothy Reeeve three Es.

Desmond Llewellyn[42] is for example listed on the RADA database as Desmond Wilkinson (Wikipedia says he is called “Desmond Wilkinson Llewellyn”). In fact, that’s not entirely true: he is listed both as “Desmond Llewellyn” (here) and as “Desmond Wilkinson”. Yay duplicate entries![43]

Desmond Llewellyn in 1983 – Towpilot CC-BY-SA 3.0

Actually there are many duplicates in the RADA database. I find it far-fetched that there would really be two different students called “Alison James” and “Allison James” who both graduated in 1954…
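Suspicious pairs like this one could be flagged automatically. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name `near_duplicates` and the 0.9 threshold are arbitrary choices made for this example, not something I actually ran on the RADA database.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(names, threshold=0.9):
    """Return pairs of distinct names whose similarity ratio reaches `threshold`."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
        if a != b and ratio >= threshold:
            pairs.append((a, b))
    return pairs


# Names quoted in this article, grouped together only for illustration.
sample_names = ["Alison James", "Allison James", "Rose Hersee"]
print(near_duplicates(sample_names))
```

Run year by year, such a check would only produce a list of pairs to look at: two students with nearly identical names can of course be two different people.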

Disambiguation?

Even without typographical errors, a match between a name in the RADA database and a name on Wikidata needs verification. The Rose Hersee who graduated in 1908 isn’t the same Rose Hersee as the singer born in 1845.[44] Verification is really necessary! In many cases that means that I had to read the Wikipedia article (which sometimes cites the RADA! Sometimes even with references!) and, most importantly, the sources used in these articles (honestly, for the first half of the 20th century, it meant reading dozens of obituaries). Sometimes (yay!) I could confirm the match. Sometimes (yay too!) I could confirm that it wasn’t the same person. But often I didn’t succeed with just a short search because the RADA profiles before 1999 are, let’s say, a little bare.

Several students can have the same name, and some people followed several courses (particularly in postgraduate technical studies). On Wikidata, many items share the same label (well, what would you expect from a name like “John Jones”?…), so it is often necessary to sift through several hundred results to find the most probable person (and I sincerely thank every Wikimedian who ever completed Wikidata descriptions).[45]

Pseudonyms

They have pseudonyms! Aaaaahhh! An impressive number of women became famous under their spouse’s name, and nobody thought of adding their birth name as an alias on Wikidata. And of course, their RADA entry lists only their original name. Another impressive number of students used pseudonyms (Conrad Havord became known as “Conrad Phillips”, for example). Sometimes it’s even the opposite: the RADA lists the pseudonym they used when they were at the school, or their married name if they were married, or their nickname, and Wikipedia still uses the birth name (for example, June Flewett is listed in the RADA database as Jill Freud, her nickname and her husband’s family name). I also very much like Priya Rajvansh, listed on RADA as Vera Singh. Each of these cases can only be identified if someone has thought of adding the aliases on Wikidata.[46] And sometimes we even get a combo: pseudonyms and typographical errors! We can cite Kay Hammond (pseudonym), whose birth name is “Dorothy Katherine Standing” but who is listed on the RADA website as “Kathrine Standing”. The missing “e” is enough to keep her from being returned by a query or a search on Wikidata. Finding her was not easy at all and it was more luck than anything else.

Is Jean Rhys, born “Ella Gwendolen Rees Williams” in 1890 and known for using numerous pseudonyms, the same person as Ella Reeve, the RADA student who graduated in 1909?[47] Vern Agopsowicz became famous under the name John Vernon… I could continue like that for a long time. I went over a hundred “maybe it is them/maybe not” early in April.

Henry Darrow and John Vernon – NBC Television, public domain in the USA

Arkanosis helps me!

By then (late March 2016), several Wikimedians had already helped me, most notably with my Internet Broadway Database work (ahah, had you forgotten?), but one evening in Cléry (Wikimedia France has a welcoming space for Wikimedians on rue de Cléry in Paris and we can be found there regularly), Arkanosis saw me manually searching the RADA entries and took pity on me. He wrote me a beautiful Linux shell script (later amended by Ash_Crow to become even easier to use):

#! /bin/sh

if [ $# -ne 2 ]; then
    echo 'Usage: rada.sh <profile> <year>'
    exit 1
fi

profile=$1
year=$2

echo "<html><head><title>Year $year</title></head><body><ul>" > list-$profile-$year.html
wget -q 'https://www.rada.ac.uk/profiles?search='$profile'&yr-acting='$year'&yr-technicaltheatrearts='$year'&crs-technicaltheatrearts=&yr-theatrelab='$year'&yr-directing='$year'&crs-directing=&fn=&sn=' -O - | \
  sed -n 's@.*fn=\([^&]*\).*sn=\([^"&]*\).*@\1 \2@p' | \
  while read firstname lastname; do
    echo "<li><a href=\"https://www.rada.ac.uk/profiles?aos=$profile&yr=$year&fn=$firstname&sn=$lastname\">$firstname $lastname</a> <a href=\"https://www.wikidata.org/w/index.php?search=&search=$firstname+$lastname&title=Special%3ASearch&go=Lire\">wikidata</a>"
    wget -q 'https://www.wikidata.org/w/api.php?action=query&list=search&srwhat=text&srsearch='$firstname'+'$lastname -O - | \
      sed -n 's@.*title&.*\(Q[0-9]\+\)&.*@\1@p' | \
      while read qid; do
        if grep -q "$qid" unhandled.lst; then
            echo " <a href=\"http://www.wikidata.org/wiki/$qid\">$qid</a>"
        fi
      done
    echo "</li>"
  done >> list-$profile-$year.html
echo "</ul></body></html>" >> list-$profile-$year.html

The RADA URLs are systematically built like this: year/given name/surname.[48] Arkanosis simply extracted listings by year, one row per student, like this:

  • Student’s name (link to the RADA entry) / Wikidata (link to the search page for that name) / possibly a Qid[49] found through the second link which also appears in the existing list of P69:Q523926 (entries already marked as RADA students)

For example a row for a student of the “acting” course in 1947 looks like:
harold goodwin wikidata Q1585750
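For readers who prefer Python to shell, here is a minimal sketch of how the two links of such a row are put together. The URL patterns are copied from the script above and reflect the RADA website as it was in 2016 (they may no longer resolve); the function name `row_links` is made up for this example.

```python
from urllib.parse import urlencode

RADA_PROFILES = "https://www.rada.ac.uk/profiles"
WIKIDATA_SEARCH = "https://www.wikidata.org/w/index.php"


def row_links(profile: str, year: int, firstname: str, lastname: str):
    """Return (RADA profile URL, Wikidata search URL) for one student row."""
    rada = RADA_PROFILES + "?" + urlencode(
        {"aos": profile, "yr": year, "fn": firstname, "sn": lastname}
    )
    search = WIKIDATA_SEARCH + "?" + urlencode(
        {"search": f"{firstname} {lastname}", "title": "Special:Search"}
    )
    return rada, search


rada_url, search_url = row_links("acting", 1947, "harold", "goodwin")
print(rada_url)
print(search_url)
```

Using `urlencode` rather than plain string concatenation takes care of escaping spaces and other special characters in the names.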

Not all rows have an associated Qid (those that did were a tiny minority, honestly, as by then only ~650 students were listed and the RADA has had many more than 650 students). Not all Qids lead to correct matches either: as I said, some people share the same name at the RADA; or the Wikidata search engine was, for once, too generous and yielded combinations of given names and surnames that didn’t match the RADA entry (for example, a search for Romany Evens offers George Bramwell Evens on Wikidata). Nevertheless, the majority of the suggested Qids led to matches, which was a far better result than for the rows without a Qid. Thank you Arkanosis and Ash_Crow!

Even with these listings, and having only to click on the search links instead of doing dozens of copy-pastes, I still needed to verify each and every entry manually.[50] The problem with using the names from the URLs is the lack of apostrophes and blank spaces. A search for peter otoole on Wikidata doesn’t yield Peter O’Toole, for example. So you still need to add the apostrophes and blank spaces by hand instead of just clicking and reading the results.
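When candidate labels have already been retrieved by some other means, this particular mismatch can be neutralised by comparing normalised keys instead of raw names. A minimal Python sketch, where `name_key` is a helper invented for this example:

```python
import unicodedata


def name_key(name: str) -> str:
    """Reduce a name to lowercase letters and digits only, for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalnum()).casefold()


print(name_key("peter otoole") == name_key("Peter O’Toole"))
```

This drops spaces, apostrophes, hyphens and accents on both sides. It does not help the search engine itself, which still needs a properly spelled name, and it does nothing for pseudonyms or married names.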

From RADA to Wikipedia: a temporary conclusion

I spent the end of March, April and early May doing this work. At the end of it, I had identified exactly 835 entries but, of course, the vast majority of alumni didn’t have matches (which was to be expected) and a strangely high number yielded only uncertain results. I have 442 rows in a spreadsheet, each with a RADA entry and a possibly matching Wikidata entry. I’ll need to dig deeper to confirm (or not) the matches.

Digging deeper – Hans Hillewaert CC-BY-SA 4.0

Identification: from Wikipedia to the RADA

When I finished identifying alumni from the RADA database, I had a problem: there were people listed in the Wikipedia category “Alumni of the RADA” who weren’t on my done list on Wikidata. In a perfect world, once my work with the scripts was finished, the number of Wikidata entries with “studied at:RADA” and the number of Wikidata entries with “studied at:RADA, endtime:something” (and with a RADA reference: the query for that) should have been the same. As it isn’t a perfect world, there were people that Wikipedia listed as alumni whom I couldn’t find on the RADA website. There was some overlap with my “maybe yes/maybe no” list[51] but not much: that list is mostly made up of people whose drama school I don’t know at all, if they even went to one.

Using PetScan, I searched for the Wikipedia articles categorized as RADA alumni which weren’t returned by the query “studied at RADA with an end time”. Link to the automatically updated PetScan query.

I found 132 results, which I, again, treated manually. I identified 23 additional articles (mostly cases of pseudonyms or maiden names not present as aliases on Wikidata, which is why they weren’t returned in searches).[52]

At the end of April, the English category listed 907 articles, Wikidata had 953 entries and only 850 of them had been correctly completed with a decent reference. And we mustn’t forget that not all Wikidata entries have a matching English Wikipedia article: some actresses and actors have articles in other languages (Norwegian, Italian, German, Romanian…) and about a dozen don’t have a Wikipedia article at all, only a Wikidata entry without sitelinks. Their entries were created so that Wikidata could list the full cast of a film.

So we query to find the Wikidata entries of RADA alumni without an end date:

SELECT ?student ?studentLabel
WHERE {
  ?student wdt:P31 wd:Q5 .
  ?student p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  FILTER NOT EXISTS { ?statement pq:P582 ?x .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

link to the query.

Usually SPARQL is pretty understandable by humans because it was made for querying semantic data. However, Wikidata is a multilingual database, which consequently uses numeric identifiers.[53] I should comment all my queries but I’m lazy and I take advantage of the Wikidata endpoint, which declares the needed PREFIXes itself and even offers comments: if you hover over a Pid or Qid, you’ll see the name and description in your language. And you can change this language in the top right corner. So I’m entitled to laziness.[54]

Inconsistencies

This list contains mostly entries with sitelinks to the English Wikipedia: the SPARQL query above (on Wikidata but without an end date for RADA studies) yielded 112 results at the end of April, while the PetScan query (in the English category of RADA alumni but without an end time on Wikidata) gave us 110 results. Of the two extra results, one is an article deleted on the English Wikipedia after someone imported the category into Wikidata and the other is about a French actress. So 111 of these “maybe errors, maybe not, but in all cases lacking references” on Wikidata came from the English Wikipedia. I SEE YOU ENWIKI!

The work now is to find under which name the person was registered at the RADA (beware of typographical errors…) or to find out why they were categorized as students when they weren’t. For example, Ash_Crow corrected the article on George Bernard Shaw, who was listed as a student instead of under “people associated with” the RADA. He was very involved in the school and even left it part of his estate[55] but never studied there. As for Armaan Kirmani, his IMDB entry says that he was the student of a RADA professor… but that doesn’t mean he went to RADA itself.

George Bernard Shaw in 1915 – Public domain in the USA.

In these dozens of problematic cases, there is a little bit of everything, from articles that don’t mention the RADA at all (why were they ever categorized?), to articles that clearly state that the person was a student (but without any sources),[56] to articles that even have sources, but sources that aren’t so explicit… The RADA doesn’t only offer degree courses; it also organizes numerous workshops and internships. If an actor or an actress took part in a two-day workshop at RADA, they won’t appear on the RADA website as a student but they could sincerely say in interviews that they learned something at RADA… We are only a step away from an enthusiastic Wikipedian deciding they are alumni.

For example, Ash_Crow found a source (in French and not of really great quality) saying that Émilie Rault studied at RADA. She is nowhere in the database because it’s very likely she only did workshops there, as she was also studying musicology at the Sorbonne for her master’s degree at the time. This should lead us to question the limits we want to set for the “studied at” property on Wikidata: do we want to use it exclusively for long degree courses or accept everything, including workshops of only a few days?

Differences between the list and the category

As I already said, the “List of RADA alumni” doesn’t match the articles listed in the category. Systematically, every time I identified someone on Wikidata (and subsequently found their Wikipedia article), I added their name to the list and I added the category. So I’ve reduced the gap between the two. The article-list should be more complete than the category, since it can hold red links for articles existing on other Wikipedias.

Xavier Combelle was kind enough to list the differences between the category and the list. In early May, the thirty problematic cases mentioned above remained (missing from the list) and in the list, in addition to the usual red links, we found eighteen uncategorised articles. None of them bore any obvious connection to RADA, except for Xenia Kalogeropoulou, who could be identified as Xenia Calogeropoulos and was thus categorised. Among those cases, some Wikipedia articles explicitly described the training at RADA as consisting of workshops or internships. We go back to the question: what courses warrant being considered an alumnus?
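I don’t know exactly how Xavier produced his listing, but once both sets of titles have been exported, the comparison itself boils down to two set differences. A minimal Python sketch, with placeholder titles standing in for the real exports and a function name (`compare_titles`) invented for the example:

```python
def compare_titles(category_titles, list_titles):
    """Return (only_in_category, only_in_list) as sorted lists of titles."""
    category, listed = set(category_titles), set(list_titles)
    return sorted(category - listed), sorted(listed - category)


# Placeholder titles standing in for the real exports.
category_titles = ["Article A", "Article B", "Article C"]
list_titles = ["Article B", "Article C", "Article D"]

only_in_category, only_in_list = compare_titles(category_titles, list_titles)
print("In the category but not on the list:", only_in_category)
print("On the list but not in the category:", only_in_list)
```

The titles have to be spelled identically on both sides for this to work, so redirects and red links still need a human eye.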

Problems with the RADA database

Having listed the issues on the Wikipedia and Wikidata sides (which amount to: “people add information without references and that information spreads everywhere like an epidemic”), we have to face the fact that some of the problems stem from the RADA database itself.

Completeness of data

As we have already seen, the database is littered with double entries, each pseudonym or name spelling yielding a new page instead of centralising these entries on a unique page associated with the student. This is obviously a problem if you are interested in the number of students for a given year, for instance.

From a Wikidata point of view, this prevents us from resorting to the simple solution of creating one entry for each student, regardless of whether a Wikipedia article exists or not. The Cambridge database, for example, associates a fixed identifier with every student, which enabled us to import these identifiers into Wikidata, creating new entries as needed (P1599: ID of the Cambridge Alumni Database).[57] If the RADA had chosen the solution of one identifier per student instead of a URL of the form diploma/year/first name/last name, it would have been easier to import it in its entirety.

Which brings us to the next problem: we have no certainty that the database is complete at all. Nothing on the site says that it is. A visit to the Internet Archive’s Wayback Machine shows that the database has only been online since 2015, and that before that date only the current students had a profile on the site. While recent data seem complete (from 1999 on, where profiles are detailed and come with photographs), the profiles of the earlier years are sometimes quite patchy. And in particular, some years seem suspiciously poor in students, such as 1988 and every year before 1922.[58]

Could it be that among the dozens of RADA alumni without a match in the database, some have been forgotten? One typical case is that of Noel Streatfeild who, according to her website, attended as a student starting in 1919. I did find a “Noel Goodwin” who graduated in 1922, but is that her?

An even more explicit example is Dora Mavor Moore, who was the first Canadian to go to RADA, per this biographical article, and who graduated in 1912. The problem is that on the RADA website only one student is listed as graduating in 1912, and “Leonard Notcutt” isn’t a known pseudonym of Dora Mavor Moore.

Data reliability

The most glaring problem is that some alumni listed in the RADA database left the RADA before graduating. Someone like Harold Pinter has a RADA profile which says he was part of the class of 1949. In fact, Pinter entered RADA in 1948 and left the course in 1949, before graduation. Does the RADA list every student, whether they actually graduated or didn’t finish? On Wikidata we can record this by using the degree qualifier (P512) with “no value”, instead of an actual diploma, on the “studied at” statement.

Wikidata screenshot of the “studied at” statement of Harold Pinter’s entry
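To make the distinction concrete, here is a small Python sketch of how such a statement could be read by a script. The dictionary is a simplified shape, only loosely modelled on the JSON that Wikibase exposes (real snaks carry a structured “datavalue” object rather than the plain `value` strings used here), and `degree_status` is a name invented for this example.

```python
# Simplified "educated at" statement, loosely modelled on Wikibase JSON.
# The plain "value" strings are an assumption made to keep the example short.
pinter_statement = {
    "mainsnak": {"snaktype": "value", "property": "P69", "value": "Q523926"},
    "qualifiers": {
        "P582": [{"snaktype": "value", "property": "P582", "value": "1949"}],
        "P512": [{"snaktype": "novalue", "property": "P512"}],
    },
}


def degree_status(statement):
    """Classify the degree qualifier (P512): 'degree', 'none' or 'unknown'."""
    snaks = statement.get("qualifiers", {}).get("P512", [])
    if any(snak["snaktype"] == "value" for snak in snaks):
        return "degree"  # an actual degree is recorded
    if any(snak["snaktype"] == "novalue" for snak in snaks):
        return "none"  # explicitly recorded as leaving without a degree
    return "unknown"  # nothing said, or only an unspecified value


print(degree_status(pinter_statement))
```

The point of “no value” is exactly this third state: an entry without the qualifier tells us nothing, while an entry with “no value” says that someone checked and found no degree.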

It’s a little bit problematic if we can’t trust the official school website to know who graduated there…

I have another problem with the RADA entry of Sheila Terry, whom I think I can match to the Wikipedia article Sheila Terry. It’s very likely she didn’t go to London during her studies; according to Wikipedia, she went to the Dickson-Kenwin academy, “a school affiliated with London’s Royal Academy”. Does that mean the Dickson-Kenwin academy was then delivering RADA diplomas? (Before the 2000 reform, the RADA delivered its own diplomas.) I lack information.

I also have a Jack May of the 1943 class whose Wikipedia article states explicitly that he was admitted to RADA and never went.

Never so easy, even when the matchings are done!

What am I doing now?

I still do many other things on Wikidata. This article summarises some of my work but not all of it, far from it. But to stay somewhat on topic I’ll only talk here about what I do in relation to the RADA and theatre in general.

For example, people rightly pointed out that Wikidata has a property to indicate a person’s birth name, which should always be present (but isn’t in reality) and is particularly useful for women known under their married name. So I’m working on adding these birthnames-in-property as aliases to make future identification easier. It’s a lot of fun with little scripts, SPARQL and a healthy use of QuickStatements, a tool made to facilitate bulk edits on Wikidata.
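As an illustration (this is a sketch, not my actual script), here is how a few lines of Python could turn a mapping of Qids to birth names into QuickStatements commands in the tab-separated V1 syntax, where `Aen` means “add an English alias”. The function name `alias_commands` and the sample pairing are made up for the example; in practice the mapping would come from a SPARQL query.

```python
def alias_commands(birth_names, lang="en"):
    """Build QuickStatements V1 lines adding each birth name as an alias.

    `birth_names` maps a Qid to a birth name; each output line is
    tab-separated: item, "A" + language code, quoted string.
    """
    lines = []
    for qid, name in sorted(birth_names.items()):
        if '"' in name:
            continue  # quoting rules for such names are outside this sketch
        lines.append(f'{qid}\tA{lang}\t"{name}"')
    return "\n".join(lines)


# Q4115189 is the Wikidata sandbox item; the name is taken from this article
# purely as sample text, not as a real statement about that item.
print(alias_commands({"Q4115189": "Dorothy Katherine Standing"}))
```

The output can be pasted into QuickStatements after a manual check, which is worth doing: an alias added in bulk to the wrong entry is exactly the kind of error this whole article complains about.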

I’m also still working on Mix n’ Match to add the correct IBDB identifiers to Wikidata entries about people and works. You can help me, as I already said above. And it’s not just for the pleasure of having identifiers; when we have enough of them, we will be able to add a lot of information about Broadway productions to Wikidata. And that will be fantastic!

I started adding data about theatrical awards too, which is long and somewhat repetitive, but immediately useful. The English Wikipedia mostly already has articles about the most important awards, but many smaller Wikipedias don’t. I’m working on a Lua module to be able to generate a Wikinews article based on Wikidata data:[59] in practice, that will mean watching the Tony Awards ceremony, adding the data to Wikidata and, immediately after the end, having a complete table with links and everything just by using a template.[60] And that in dozens of languages. Great, no?

I still have to reduce my two lists of RADA students:

  1. one with people categorized as alumni but whom I didn’t find in the RADA website (errors? workshops? missing?): ~112
  2. one with Wikidata entries that I think match a RADA student but for which I have no definite proof: ~400

Solving these two lists should help me reduce the gap between the English category and the English list. And by the way I’m very proud of my French list of RADA alumni, which has names, date, course, diploma, nationality and even some pictures!

I wrote to the RADA archivist in June to at least inform him of the typographical errors found in their database but he hasn’t written back yet. Which can probably be explained by the fact that the RADA archives are moving to a new building this summer. They are probably pretty busy!

And of course, from a purely Wikidatian point of view, I officially launched WikiProject:Theatre this week. It’s for every Wikidatian, new or experienced, who wants to join me in my mad quest.

Curious and fun queries and statistics

All that being said, we still have an interesting sample with ~850 entries. It’s only a small percentage of all RADA alumni (and the technical courses are vastly under-represented) but it’s enough to start having fun with SPARQL queries. We can ask pretty much anything!

If you want to see the results of the queries, click on the links, then on “Run”, and a few seconds later you will be able to explore the answers yourself!
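And if you would rather run the queries from a script than click, here is a minimal sketch in Python using only the standard library. It assumes the public Wikidata endpoint at https://query.wikidata.org/sparql and its “format=json” parameter; the function names and the User-Agent string are placeholders I made up, so replace the latter with something that identifies you.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(query):
    """Return the GET URL asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def run_query(query, user_agent="rada-queries-example/0.1 (placeholder contact)"):
    """Send the query and return the list of result rows (needs network access)."""
    request = urllib.request.Request(build_url(query), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Building the URL does not touch the network; run_query() does.
example_url = build_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
print(example_url)
```

Calling run_query() with any of the queries in this section should return one dictionary per result row, keyed by the variable names of the SELECT clause.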

Number of RADA students with a Wikidata entry by year

Well, to start easy, maybe we don’t want the list of RADA alumni but only the number of them with an entry, by year of graduation:

SELECT ?year (COUNT(?student) AS ?number)
WHERE {
  ?student wdt:P31 wd:Q5 .
  ?student p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  ?statement pq:P582 ?endtime .
  BIND(YEAR(?endtime) as ?year) . 
} GROUP BY ?year ORDER BY ?year

Query link.

Screenshot of the SPARQL query

We can then do this beautiful graph:

RADA alumni with a Wikidata entry by year of graduation

Average age at graduation

Maybe we can go further. Now that we know when they graduated, can we know at what age they did it? This means our sample will be reduced to the entries with a birthdate of course.

SELECT ?endYear (AVG(?age) AS ?averageAge)
WHERE {
 ?person p:P69 ?radaStatement .
 ?radaStatement ps:P69 wd:Q523926 .
 ?radaStatement pq:P582 ?endDate .
 ?person wdt:P569 ?birthDate .
 BIND(YEAR(?endDate) AS ?endYear)
 BIND(?endYear - YEAR(?birthDate) AS ?age)
} GROUP BY ?endYear ORDER BY ?endYear

Query link

Screenshot of the SPARQL query

Or even something more fun: the average age at graduation (in whole numbers only, this time), by year and by gender (only “male” and “female” in our sample, but the query could handle others) and, to give an appearance of seriousness, the number of people in the sample:

SELECT ?endYear ?genderLabel (ROUND(AVG(?age)) AS ?averageAge) (COUNT(?person) AS ?number)
WHERE {
    ?person p:P69 ?radaStatement .
    ?person wdt:P21 ?gender .
    ?gender rdfs:label ?genderLabel filter (lang(?genderLabel) = "en") .
    ?radaStatement ps:P69 wd:Q523926 .
    ?radaStatement pq:P582 ?endDate .
    ?person wdt:P569 ?birthDate .
    BIND(YEAR(?endDate) AS ?endYear)
    BIND(?endYear - YEAR(?birthDate) AS ?age)
} GROUP BY ?endYear ?genderLabel ORDER BY ?endYear

Query link. I should really do an age pyramid but I’m suffering from a fit of laziness.[61]

screenshot of the SPARQL query

Timeline of graduates

That was fun but I want something more human-readable, like a timeline with pictures!

#defaultView:Timeline
SELECT DISTINCT ?person ?personLabel ?personDescription (SAMPLE(?GraduationDate) AS ?date) (SAMPLE(?photo) AS ?pic)
WHERE {
  ?person wdt:P31 wd:Q5 .
  ?person p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  ?statement pq:P582 ?GraduationDate .
  OPTIONAL { ?person wdt:P18 ?photo . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} GROUP BY ?person ?personLabel ?personDescription ORDER BY ?date

Query link. You just needed to ask! Beware that this query is heavy and can slow down your browser.

Screenshot of the SPARQL query – timeline

How many nationalities were represented in RADA?

We can do a query to list all the nationalities, and for each the number of students involved, in decreasing order.

SELECT ?nationality ?nationalityLabel (COUNT(?student) AS ?number) {
  ?student p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  ?statement pq:P582 ?endtime .
  ?student wdt:P27 ?nationality .
  ?nationality rdfs:label ?nationalityLabel filter (lang(?nationalityLabel) = "en") .
} GROUP BY ?nationality ?nationalityLabel ORDER BY desc(?number)

Query link. Surprisingly[62] the most frequent is the… British. But hey! More than thirty nationalities!

Screenshot of the SPARQL query

If we just add, as the first line of the query:

#defaultView:BubbleChart

we obtain the results as a bubble chart.[63] It’s explicit:

Screenshot of the bubble chart

Map of birth places of RADA students

I don’t really care what nationalities the alumni are… but I would love to see a map of birthplaces! And I can do that directly in SPARQL!

#defaultView:Map
SELECT DISTINCT ?coords ?birthplaceLabel ?person ?personLabel
WHERE {
  ?person wdt:P31 wd:Q5 .
  ?person wdt:P69 wd:Q523926 .
  ?person wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coords .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Query link. If you run the query, you can zoom!

Map of RADA alumni birth places

Layered map of birth places of RADA students by graduation date

Just a simple map? But why not make a layered one? We could ask for a map showing the birth places of RADA graduates, one layer by decade of graduation!

#defaultView:Map
SELECT DISTINCT ?coords (floor(year(?endtime)/10)*10 as ?layer) ?birthplaceLabel ?student ?studentLabel {
  ?student wdt:P31 wd:Q5 .
  ?student p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  ?statement pq:P582 ?endtime .
  ?student wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coords.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Query link. Run it and play with the layers!

Map of RADA alumni birth places by decade of graduation
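The layer is computed by the expression floor(year(?endtime)/10)*10 in the query, which rounds each graduation year down to the start of its decade. Here is the same arithmetic as a small Python sketch; the function name is mine and the years are only sample inputs.

```python
def decade_layer(year):
    """Round a year down to the first year of its decade, like floor(year/10)*10."""
    return (year // 10) * 10


for year in (1908, 1912, 1999):
    print(year, "->", decade_layer(year))
```

Every graduate from 1910 to 1919 therefore lands in the same “1910” layer of the map.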

Number of RADA alumni cast in a James Bond film

Do you remember that Whishaw and Llewelyn both played Q? Exactly how many RADA students have played in a James Bond film?

SELECT DISTINCT ?actor ?actorLabel
WHERE {
  ?film wdt:P179 wd:Q2484680 .
  ?film wdt:P161 ?actor .
  ?actor wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} ORDER BY ?actorLabel

Query link.
More than forty!

Screenshot of the query results

RADA alumni by James Bond films, ordered by date

Hsarrazin liked my James Bond query but she wanted more: she wanted to know which former student was cast in which James Bond film, and to order the results by the release date of the film. Of course, this means that people who worked in several films are listed several times.

SELECT DISTINCT ?actor ?actorLabel ?film ?filmLabel ?year
WHERE {
  ?film wdt:P179 wd:Q2484680 .
  ?film wdt:P577 ?date .
  ?film wdt:P161 ?actor .
  ?actor wdt:P69 wd:Q523926 .
  # BIND comes after the pattern that binds ?date, so YEAR() has a value to work on
  BIND(YEAR(?date) AS ?year)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} ORDER BY ?year

Query link.

Screenshot of the query results

RADA alumni by James Bond films, in a graph

I don’t actually know why Hsarrazin wanted a table, when we could have all RADA alumni playing in a James Bond film as a graph:

#defaultView:Graph
SELECT DISTINCT ?actor ?actorLabel (concat("24890D") as ?rgb) ?film ?filmLabel ?year
WHERE {
  ?film wdt:P179 wd:Q2484680 .
  ?film wdt:P577 ?date .
  ?film wdt:P161 ?actor .
  ?actor wdt:P69 wd:Q523926 .
  # BIND comes after the pattern that binds ?date, so YEAR() has a value to work on
  BIND(YEAR(?date) AS ?year)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} ORDER BY ?year

Query link. Beware, this query is heavy, even more than the timeline one.[64]

Screenshot of the graph generated by the query

And in all films?

Well, James Bond is great, but why limit ourselves to it? Can’t we just have all the films on Wikidata, ordered by the number of cast members who studied at RADA?

SELECT DISTINCT ?film ?filmLabel (COUNT(?actors) AS ?nbActors)
WHERE {
  ?film wdt:P31/wdt:P279* wd:Q11424 .
  ?film wdt:P161 ?actors .
  ?actors wdt:P69 wd:Q523926 .
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} GROUP BY ?film ?filmLabel ORDER BY DESC(?nbActors)

Query link.

Screenshot of the query results

Films by rate of actors and actresses who studied at RADA

That’s fun but I want even more fun: it’s not quite the same thing to have five RADA actors and actresses in a cast of eight as in a cast of one hundred. I want all films with at least five people in the cast, ordered by rate of RADA students!

SELECT DISTINCT ?film ?filmLabel ((xsd:float(?nbRadaActors)/xsd:float(?totalNbActors)) AS ?rate)
WHERE {
  {
    SELECT DISTINCT ?film (COUNT(?actors) AS ?nbRadaActors) {
      ?film wdt:P31/wdt:P279* wd:Q11424 .
      ?film wdt:P161 ?actors .
      ?actors wdt:P69 wd:Q523926 .
    } GROUP BY ?film
  }
  {
    SELECT DISTINCT ?film (COUNT(?actors) AS ?totalNbActors) {
      ?film wdt:P31/wdt:P279* wd:Q11424 .
      ?film wdt:P161 ?actors .
    } GROUP BY ?film HAVING (?totalNbActors >= 5)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} ORDER BY DESC(?rate)

Query link. Honestly, the highest-rated ones are probably films with an incomplete cast, but I love this query anyway.

Screenshot of the query results
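To see why incomplete casts float to the top, here is the rate computation as a tiny Python sketch. It mirrors the idea of the query (RADA cast members divided by all cast members, ignoring films with fewer than five people listed); the function and parameter names are mine, and the numbers are just the example from above plus one made-up incomplete cast.

```python
def rada_rate(rada_cast, total_cast, min_cast=5):
    """Share of listed cast members who studied at RADA, or None for tiny casts."""
    if total_cast < min_cast:
        return None  # same cut-off as HAVING (?totalNbActors >= 5)
    return rada_cast / total_cast


print(rada_rate(5, 8))    # 0.625
print(rada_rate(5, 100))  # 0.05
print(rada_rate(5, 5))    # 1.0
print(rada_rate(3, 4))    # None
```

A film where only five cast members have been entered on Wikidata, all of them RADA alumni, gets a perfect 1.0, whatever the size of the real cast.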

RADA alumni who worked on Broadway

Remember when we worked on the Broadway database? How many RADA students ever worked on Broadway? (We are considering that “working on Broadway” means “having an Internet Broadway Database identifier”.) Well, at least…

SELECT DISTINCT ?human ?humanLabel
WHERE {
  ?human wdt:P31 wd:Q5 .
  ?human p:P1220 ?ID .
  ?human wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Query link. 276 listed mid-August 2016!

Screenshot of the query results

RADA alumni with a Tony award win or nomination

So… if RADA alumni worked on Broadway… how many of them were nominated for or received a Tony Award?

SELECT DISTINCT ?human ?humanLabel ?reason ?distinctionLabel (year(?date) as ?year)
WHERE {
  ?human wdt:P69 wd:Q523926 .
  ?human ?prop ?distinctionStatement .
  ?distinctionStatement ?propS ?distinction .

  VALUES (?prop ?propS ?reason) {
    (p:P1411 ps:P1411 "nominated for")
    (p:P166 ps:P166 "award received")
  }

  ?distinction wdt:P31*/wdt:P279 wd:Q191874 .

  OPTIONAL { ?distinctionStatement pq:P585 ?date . }
  ?human wdt:P31 wd:Q5 .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }

} ORDER BY ?humanLabel ?distinction ?year

Query link. More than one hundred!

Screenshot of the query results

And by the way, we should verify that all people nominated for or awarded a Tony Award have an IBDB identifier! If all is right in the world, this query should return “No matching records found”:

SELECT DISTINCT ?human ?humanLabel
WHERE {
  ?human wdt:P31 wd:Q5 .
  ?human wdt:P1411*/wdt:P279 wd:Q191874 .
  FILTER NOT EXISTS { ?human wdt:P1220 ?ibdb . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Query link.

All Tony Awards!

Hey! Can we list all people nominated for/awarded a Tony, by win/nomination, by award and by year? Like everyone ever? Well yes, of course, it’s SPARQL!

SELECT ?human ?humanLabel ?reason ?distinctionLabel (year(?date) as ?year)
WHERE {
  ?human ?prop ?statement .
  ?statement pq:P805 ?ceremony .
  ?ceremony wdt:P31 wd:Q24569309 .
  ?statement ?propS ?distinction .
  
  VALUES (?prop ?propS ?reason) {
    (p:P1411 ps:P1411 "nominated for") 
    (p:P166 ps:P166 "award received")
  }
  OPTIONAL { ?statement pq:P585 ?date . }
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }

} ORDER BY ?year ?distinctionLabel

Query link.

Screenshot of the query results

Conclusion

  1. I’m not done;
  2. I hope the RADA archivist will be as kind as he seems;
  3. People, seriously, you should add aliases on Wikidata items;
  4. And sources. Sources are great;
  5. And you should also take photographs of Ben Whishaw, we are clearly lacking free pictures of Whishaw;
  6. Isn’t SPARQL a lot of fun? Whatever your question, someone can ask Wikidata for the answer![65]

Footnotes

  1. Like this one.

  2. If you came for the SPARQL queries, you should know that they were written for the Wikidata endpoint, where most prefixes are already declared.

  3. But if you have already read it in French, you can still read it again: I updated it!

  4. He isn’t the only one: Rory Kinnear is also excellent as Bolingbroke (and Patrick Stewart makes a terrific Gaunt ♥), but Bolingbroke is interesting in Shakespeare’s play, whereas Richard II is the guy who declaims overly long monologues without clearly being a good guy we could get attached to or a bad guy we could unashamedly hate. In my experience, in the previous adaptations I saw, it was either a very tiring character or a character played so badly it became funny. He really develops as a character in the fourth act, which is a bit late, to be honest. But Whishaw as a child-king-turned-into-an-adult-but-not-really, sometimes capricious, sometimes Christlike, always mercurial, made me believe in this character long before the fourth act. I could write an entire blog article about The Hollow Crown, its actors, costumes and sets (and the cinematography! it was really good too) but I’m theoretically here to talk about Wikidata and the RADA, and you’ll see it’ll be a long time coming…

  5. Actually, he did receive a BAFTA award for this role so my opinion was somewhat shared, it seems.

  6. Yes, I’m French. In case the fact I wrote the first version of this text in French escaped you.

  7. Yes, I’m not only French but a Wikimedian one at that.

  8. Yes, I’m collecting sources and references to rewrite the article completely, but first of all, I have never written an article about a living person, and secondly, I have been somewhat occupied since then, as you’ll see if you keep reading.

  9. Actually, if you want to read Wikidata and not modify it, I strongly suggest you use Reasonator, whose slogan is “Wikidata in pretty”.

  10. If you don’t know Wikidata at all and are curious, this Commons category hosts several presentations.

  11. Yes, because Whishaw also played in James Bond and, by the way, there are many Shakespearean actresses and actors in the latest James Bond films.

  12. Yes, because Richard II wasn’t even his first Shakespearean role, and not even his second, as he played Ariel in the film adaptation of The Tempest with Helen Mirren as Prosper(a). It isn’t a really good film, frankly, but it needs to be seen because Mirren and Whishaw.

  13. It’s very marginally better on the English Wikipedia than it is on the French Wikipedia, but my point still stands.

  14. And honestly you don’t really need to understand all of that to understand what I do, or even to do the same yourself. It’s fascinating and compelling and if you are interested that’s pretty great, but don’t be frightened or put out if you aren’t.

  15. If by chance you are a specialist in upper ontologies, I would be honoured to talk to you.

  16. Well, uh, he was when I first wrote this article, but the run has ended now.

  17. You are maybe totally indifferent to this information but I would have so loved to go to Broadway to see it…

  18. Not only does the IBDB have very few errors, they also correct them in less than a week when we tell them.

  19. Yes, I created the Wikidata entry of The Crucible production.

  20. Not sure if it’ll make you laugh, but it was the first time I saw a property where all uses were wrong. P1218 has earned an entry in my personal “wtf?” ratings. It’s cleared up now, obviously.

  21. In my experience, if there is an error on Wikidata, it nearly always comes from Wikipedia and the IBDB was no exception.

  22. Catalan Wikipedia, I see you!

  23. Which also exists in a mobile version for people wanting to complete Wikidata on their mobile phones (and who have a mobile phone which can access the internet).

  24. With help from fellow Wikimedians. They are, objectively, fantastic people.

  25. But I’m done now for this English version of the article! Yay me!

  26. It’s still his fault.

  27. Who said “finally!”?

  28. In fact, there are many cool things about Wikidata.

  29. Where we learn he listed “cat breeding” as a special interest: very important information, admit it, which I couldn’t add to Wikidata. I am disappointment incarnate.

  30. I will deny that I cooed at my computer seeing that. Well, maybe not deny. But it was in a dignified sort of way!

  31. SPARQL is the language with which you can ask questions to a semantic database and it answers. And I love SPARQL.

  32. Did you know that more than 150 000 people have worked on Broadway?

  33. I’m a deeply optimistic kind of person.

  34. Action also known as “how to upload Wikipedias’ errors on Wikidata”, see previous note.

  35. Spoiler: Yes, in part.

  36. Spoiler: Yes, that too.

  37. Since then, I did a little bit of work on the French one, so it’s not as bad now as it was back then.

  38. A list which is organized manually and not as a sortable table! It’s totally stupid, as we can’t easily sort by year, for example, or by diploma. Urrrgh. (I created the French list at the end of July 2016 as a sortable, referenced table, just because.)

  39. In this precise case, only one.

  40. I should try to do that one in SPARQL too!

  41. Spoiler: no. I still don’t know who is “H Bentley”. They very probably don’t have a Wikidata entry yet.

  42. Because Whishaw isn’t even the first RADA student to have played Q in James Bond.

  43. And here disappears the hope to know what % of RADA alumni have a Wikidata entry. Sigh.

  44. Which the RADA kindly confirmed on twitter. I love them!

  45. Descriptions are good! Descriptions are great! We love descriptions!

  46. Aliases are good! Aliases are great! We love aliases too!

  47. I really think so but I haven’t found a reference to back this claim yet. Some cases are much more difficult.

  48. Thank you, RADA technicians, for having chosen to do that consistently!

  49. Wikidatians call the Wikidata identifier of an item its “Qid” because the identifier always begins with “Q”.

  50. Finally, the RADA has had too many alumni. Not optimistic any more.

  51. Which supports the “maybe yes” hypothesis.

  52. Haven’t I already said how much I love aliases? It bears repeating.

  53. Which are strangely less understandable if you don’t happen to memorize that “Q523926” means “Royal Academy of Dramatic Art”.

  54. So there.

  55. Which was very kind of him.

  56. Like Margaret Rutherford: someone added summarily the information in 2008, then the page was categorized in 2010…

  57. Mix n’ Match can tag an identifier as needing a Wikidata entry to be created.

  58. With the exception of 1908 and 1909, all other years have only a single student listed.

  59. But frankly, my Lua isn’t good enough right now.

  60. Tpt created a similar Lua module for me when I worked on sled dog races. You can see it in use in this French Wikinews article: the table is dynamic and shows the content of the race’s Wikidata entry.

  61. Yes, again.

  62. Or not.

  63. No surprise here.

  64. Am I the only weird one out there who likes to poke the graph and see it rearrange itself all around my screen?

  65. And I can write footnotes, which are clearly the other fun part of this article, hands down.

Comments

Charlie Sept. 2, 2016, 4:49 p.m.

Thanks for the great article and all the work you've put in! I enjoyed reading it a lot and learned some new things today :D

Harmonia Amanda Sept. 2, 2016, 7:53 p.m.

Thank you for your kind comment!

David McDonell Sept. 5, 2016, 4:51 p.m.

Wow! Amazing!! Thank you for both the work and the educational & entertaining narrative;-) Hats off!

Barbara Cohen-Stratyner May 24, 2017, 12:10 a.m.

Leonard Notcutt, who you mention as having graduated in 1912, was an actor with the Beerbohm-Tree company and made at least one film. He won the Bancroft Gold Medal. His career was short because he joined up and was killed in WWI action, May 1917. I know about him because he was married to a photographer, Florence Vandamm. I would very much like to find a photograph of him. Question -- is the Noel Streatfielld that you also found the author of that name?

Harmonia Amanda May 24, 2017, 8:16 p.m.

Thank you very much for the information about Leonard Notcutt! Yes, the Noel Streatfield who graduated RADA is the author. I discovered it during my search.
