Ben Whishaw, Broadway, the RADA and Wikidata (in English and with updates)
Hello everyone! Harmonia Amanda here, squatting Ash_Crow’s blog. Some people told me repeatedly I should write about some of what I did these last few months on Wikidata, e.g. all my work about the RADA (Royal Academy of Dramatic Art) and other things. And after I wrote it in French, some people told me I should write it again in English. So here we are! To ensure that no one will read it, I wrote a long text, stuffed with footnotes,[1] and even with real SPARQL queries[2] here and there. No need to thank me.[3]
How it begins: The Hollow Crown
Everything is because of Ben Whishaw. I was quietly watching the BBC’s Shakespeare adaptations (and if you haven’t watched The Hollow Crown, I suggest you do so) and I was thinking that the actor playing Richard II deserved an award for his role, because he was simply extraordinary.[4] [5] So I went lurking on his French Wikipedia page[6] and, as a good Wikimedian,[7] I decided to make it a little bit better. For now[8] I’ve mostly cleaned up the wikicode and dealt with accessibility for screen readers. As I couldn’t instantaneously make it a featured article, I thought it could be fun to complete his Wikidata entry. That was the beginning. As I said, everything is because of Ben Whishaw.
Wikidata: the easy beginning
Wikidata is a free knowledge database with some twenty million entries, under a free license. It’s not made to be directly read by humans (although it can be)[9] but to be machine-readable, and to be used in other projects through visualisation tools.[10] I am an experienced Wikidatian by now so, at first, working on Whishaw’s entry seemed easy.
I just had to add more precise occupations (he isn’t just an “actor”, he is a “stage actor”, a “television actor”, a “film actor”…). He received many awards, which should all be listed (P166), along with, for each of them, the year it was awarded (P585), for which work (P1686) and sometimes even with whom the award was shared (P1706). And I could do the same work for all the awards he was nominated for (P1411) but didn’t receive. Then I could also list all his roles, which we don’t add to his Wikidata entry but to the works’ entries, using “P161 (cast member)” with “Q342617 (Ben Whishaw)” as value. Sometimes we can even use qualifiers, like “P453 (character role)” when the characters themselves have a Wikidata entry (like Q in James Bond).[11]
So far, so easy. Well, the thing is, Whishaw is primarily a stage actor. I mean, he became well-known for his heartbreaking interpretation of Hamlet at 23 at the Old Vic.[12] It’s a bit strange to see all his TV and film roles listed and not his theatrical ones (Mojo, Bakkhai…). So I started digging into theatre on Wikidata and let me tell you… it’s at least as under-treated and messy as on Wikipedia! Which is saying something.[13]
Here would be the perfect place to speak about ontologies, the semantic web and the questions of knowledge organisation, but the consensus among my beta-readers is that my article is already too long and that I should focus on the RADA (which is a long time coming) and speak of everything else another time.[14]
The Internet Broadway Database
While I was thinking about the relations between “art”, “work”, “genre” and “performance”,[15] I learned that Whishaw is now[16] on Broadway, where he plays John Proctor in Arthur Miller’s The Crucible directed by Ivo van Hove.[17] What’s interesting for all of us Wikimedians is that Broadway already has an excellent database (IBDB, Internet Broadway Database). Well done, decently complete, with a limited number of errors;[18] oh joy! And even better: Wikidata already has properties to link to this database (and not only for people; the properties also exist for venues, works and productions).[19]
Of course, no one had properly exploited this database before and there were many errors in the way these properties were used on Wikidata. So I’ve cleaned up each and every use of these properties on Wikidata.[20] And on Wikipedia, because that’s where the errors came from.[21] I complained about the Wikipedians who add absurd references (or worse, don’t add references at all), who aren’t philosophically unnerved when they add a production identifier to a work entry, or who even seem to think that the IBDB identifier is the same one as the IMDB (Internet Movie Database) identifier (oh hell NO!)[22] but, as I am a Wikimedian, I cleaned up nevertheless.
I came to the conclusion that it would be better if, instead of having some correct links, we linked all the entries. Going from “I-worked-on-Ben-Whishaw-so-I-searched-his-IBDB-identifier” to “this is the complete list of IBDB identifiers, we should find the matching Wikidata entries”. To our joy, there is a truly marvellous tool called Mix n’ Match.[23] Here again I could do a detailed presentation of this tool, but to keep to the scope of this article I’ll just say it needs the complete list of valid identifiers before it can work; therefore I started hoarding them all.[24] As it wasn’t an instantaneous process,[25] I needed to do something besides that. For those of you willing to give a hand, you can help me match IBDB entries to Wikidata entries: you can do it for works or for people. Do it carefully and if you are not sure, don’t. Thank you, any help is always appreciated.
Back to when all my scripts were running, I didn’t know exactly what to do to occupy myself, so I went again to Whishaw’s entry[26] and noticed he was a RADA (Royal Academy of Dramatic Art) alumnus.[27]
The RADA
Presentation
The cool thing about Wikidata[28] is that not only can we add where people studied (P69) but we can even add numerous details: when they started studying there (P580), when they stopped (P582), what degree they were preparing (P512), their academic major (P812)… There were no references. I didn’t like that at all. I searched for them. I thought: why not try the school’s website? And then… RADA!
Yes. The RADA had put the profiles of its alumni online. Here is Whishaw’s page for the curious ones.[29] Anyway, I was seeking a source and I’ve found a goldmine. My inner Wikimedian went a little dizzy with happiness[30] and I told myself that now, I not only had a reference for Whishaw, I had references for all RADA alumni, with their year of graduation, their degree, everything, and that I could do mad statistics based on SPARQL queries![31] (and that it would give me an occupation when I retrieved the identifiers of all people who ever worked in a Broadway show).[32]
Naively, I thought that the RADA didn’t have so many alumni (approximately a hundred a year in recent years) and so it wouldn’t take me too much time…[33]
Identification of the relevant entries
On Wikidata
To start, I tried to find out what already existed on Wikidata. I wrote a little query to find all the existing Wikidata entries with P69:Q523926 (educated at the Royal Academy of Dramatic Art). I cross-checked with the English category. Actually someone had, a few months ago, added P69:Q523926 on all the entries categorised as “Alumni of the Royal Academy of Dramatic Art”.[34] Anyway, at that time I had no intention of writing this blog post, so I didn’t bother writing down the actual number anywhere, but it was about 650, with a very small gap between the Wikidata query and the English category (so, as a working hypothesis, only a few Wikidata entries without articles on the English Wikipedia). There were more entries listed by the Wikidata query than there were articles in the category (which is logical) but all the categorised articles were correctly present in the Wikidata list. Not too bad as a start.
To follow my progress, I only had to do two queries: the first one to list all RADA alumni and the second one to list all RADA alumni with a year of graduation (which would mean that someone (me) had added the necessary information).
So beware the first SPARQL queries of this article:
SELECT DISTINCT ?student ?studentLabel WHERE {
  ?student wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
and
SELECT ?student ?studentLabel WHERE {
  ?student wdt:P31 wd:Q5 .          # human
  ?student p:P69 ?statement .       # Student of...
  ?statement ps:P69 wd:Q523926 .    # ...RADA
  ?statement pq:P582 ?x .           # with end date
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Easy, as I said.
There were already four or five students for which we had the “end date” information, but we didn’t have a reference, or a reference other than the RADA. I decided not to care and that I would treat these cases at the same time as the others.
On Wikipedia
I had already noted that the whole English category “Alumni of the Royal Academy of Dramatic Art” had the property P69 “educated at” with the RADA value (Q523926) on Wikidata. I knew there were more entries on Wikidata than in the category: where did the difference come from? From uncategorized English articles? [35] From Wikidata entries without a matching article in English?[36]
The category also exists on Wikipedias in other languages: it exists in Spanish, in Arabic, in French, in Latin, in Polish, in Russian, in Simple English, in Turkish and in Chinese. But if you visit these pages, you will see they are considerably less complete than the English one (which is logical for a London school) and that they would probably not help me much.[37]
However, the category isn’t the only way to spot students. The English Wikipedia also has a list (List of RADA alumni). This list[38] is interesting because it contains, between brackets, the year of graduation, information missing in the category.
Assuming that all articles present in the category were also on the list, or that all the entries in the list were categorized, was too big of a hope, it seems. Once more, Wikipedia dazzles us with its incomplete management; if there are two systems, of course they won’t match!
Identification: From RADA to Wikidata
I thought the easiest way to begin was to go through the RADA database and search for matching entries on Wikidata and Wikipedia. There are indeed many RADA alumni well known enough to have a Wikipedia article, but not all of them, let’s not exaggerate. In an ideal world where Wikidata and Wikipedia would have reached completion, once I had verified all the RADA database entries, I should have formally identified the approximately 670 Wikidata entries previously spotted. But as we don’t live in an ideal world and as neither Wikidata nor Wikipedia claims to be complete, I knew before I started that it would very probably not be so easy.
Manual research name by name
At first I thought I would simply search Wikidata for each and every student name listed in the RADA database and hope to find a match. Starting with 1906, the first year with graduates listed,[39] as the school opened in 1904.
Very quickly, problems appeared with this painstakingly slow method.
In 1907 for example, the only student listed is “H Bentley”. For a search with this name, the Wikidata internal search engine only returns “H Bentley” and “H. Bentley”. Not “Henry Bentley”, “Harriet Bentley” or whatever. If I had been lucky, someone would have added “H Bentley” as an alias on the Wikidata entry and the search engine would have yielded a result. As I was unlucky but stubborn, I still tried a query like this (not a SPARQL one, it’s an adaptation for Autolist, an old Wikidata tool):[40]
FIND H% Bentley in Labels in Alias
(link to the autolist query) and hoped it would work.[41] I can also be really dedicated, search for “Bentley” and read quickly all entries… Not as easy as I hoped at first, then.
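For the curious, the “initial only” problem can be sketched in a few lines of Python. This is just an illustration, not the Autolist query: the function name matches_initial and the list of candidate labels are made up for the example (in practice the candidates would come from a search on the surname).

```python
def matches_initial(rada_name, label):
    """Return True if `label` could be the full form of an abbreviated name.

    Words are compared position by position; a one-letter word
    (with or without a trailing dot) is treated as an initial.
    """
    short = rada_name.replace(".", "").lower().split()
    full = label.replace(".", "").lower().split()
    if len(short) != len(full):
        return False
    for s, f in zip(short, full):
        if len(s) == 1:
            # An initial only has to match the first letter.
            if not f.startswith(s):
                return False
        elif s != f:
            return False
    return True


# Hypothetical candidate labels, e.g. what a search on "Bentley" might return.
candidates = ["Henry Bentley", "Harriet Bentley", "Thomas Bentley", "H. Bentley"]
possible = [c for c in candidates if matches_initial("H Bentley", c)]
print(possible)
```

Every candidate kept this way still needs manual verification, of course: an initial narrows the list, it doesn’t identify anyone.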
Typos and database errors
Moreover, the RADA database isn’t immune to typographical errors: I’m reasonably certain that Joan Mibourrrne doesn’t really have three Rs in her last name or Dorothy Reeeve three Es.
Desmond Llewellyn[42] is for example listed on the RADA database as Desmond Wilkinson (Wikipedia says he is called “Desmond Wilkinson Llewellyn”). In fact, that’s not entirely true: he is listed both as “Desmond Llewellyn” (here) and as “Desmond Wilkinson”. Yay duplicate entries![43]
Actually there are many duplicates in the RADA database. I find it far-fetched that there would really be two different students called “Alison James” and “Allison James” who both graduated in 1954…
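Spotting this kind of probable duplicate within one graduation year can be sketched with Python’s standard difflib module. This is only an illustration: the function name likely_duplicates, the 0.9 threshold and the third name in the sample are assumptions made for the example, not something taken from the RADA data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def likely_duplicates(names, threshold=0.9):
    """Return the pairs of names whose similarity ratio reaches `threshold`."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


# Only the two "James" spellings come from the text; the third name is made up.
class_of_1954 = ["Alison James", "Allison James", "Example Student"]
print(likely_duplicates(class_of_1954))
```

A high ratio only flags a pair for a human to look at; two genuinely different students with similar names would be flagged too.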
Disambiguation?
Even without typographical errors, if we find a match between a name in the RADA database and a name in Wikidata, it needs verification. The Rose Hersee who graduated in 1908 isn’t the same Rose Hersee as the singer born in 1845.[44] Verification is really necessary! In many cases that means that I had to read the Wikipedia article (which sometimes cites the RADA! Sometimes even with references!) and most importantly the sources used in these articles (honestly, for the first half of the 20th century, it meant reading dozens of obituaries). Sometimes (yay!) I could confirm the match. Sometimes (yay too!) I could confirm that it wasn’t the same person. But often I didn’t succeed with just a short search because the RADA profiles before 1999 are, let’s say, a little bare.
Several students can have the same name, or some people followed several courses (particularly in postgraduate technical studies). On Wikidata, many items share the same label (well, what would you expect from a name like “John Jones”?…), so it is often necessary to filter several hundreds of results to find the most probable person (and I sincerely thank every Wikimedian who ever completed Wikidata descriptions).[45]
Pseudonyms
They have pseudonyms! Aaaaahhh! And an impressive number of women attained celebrity under their spouse’s name; nobody thought of adding their birth name as an alias on Wikidata. And of course, their RADA entry lists only their original name. Another impressive number of students used pseudonyms (Conrad Havord became known as “Conrad Phillips” for example). Sometimes, it’s even the opposite: the RADA lists the pseudonym they used when they were at the school, or their married name if they were married, or their nickname, and Wikipedia still uses the birth name (for example, June Flewett is listed on the RADA database as Jill Freud, her nickname and husband’s family name). I also like very much Priya Rajvansh listed on RADA as Vera Singh. Each of these cases can only be identified if someone has thought of adding the aliases on Wikidata.[46] And sometimes we even have a combo: pseudonyms and typographical errors! We can cite Kay Hammond (pseudonym), whose birth name is “Dorothy Katherine Standing” but who is listed on the RADA website as “Kathrine Standing”. The missing “e” is enough to keep her from being returned by a query or a search on Wikidata. Finding her was not easy at all and it was more luck than anything else.
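To show why aliases matter so much, here is a minimal Python sketch of a lookup that first tries an exact match on labels and aliases, then falls back on an approximate match. Everything in it is an assumption for the sake of illustration: the items dictionary is a hand-made extract with placeholder identifiers (not real Qids), and the function names and the 0.7 cutoff are arbitrary.

```python
from difflib import get_close_matches

# Hand-made extract: label plus aliases. Identifiers are placeholders, not Qids.
items = {
    "item-1": {"label": "Kay Hammond", "aliases": ["Dorothy Katherine Standing"]},
    "item-2": {"label": "June Flewett", "aliases": ["Jill Freud"]},
}


def build_index(items):
    """Map every lowercased label and alias to its item identifier."""
    index = {}
    for item_id, data in items.items():
        for name in [data["label"], *data["aliases"]]:
            index[name.lower()] = item_id
    return index


def find_item(name, index, cutoff=0.7):
    """Exact lookup first, then the closest known name above `cutoff`."""
    key = name.lower()
    if key in index:
        return index[key]
    close = get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


index = build_index(items)
print(find_item("Jill Freud", index))         # exact match on an alias
print(find_item("Kathrine Standing", index))  # approximate match despite the typo
```

Without the alias on the item, neither lookup would have anything to match against, which is exactly the situation described above.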
Is Jean Rhys, born “Ella Gwendolen Rees Williams” in 1890 and known for using numerous pseudonyms, the same person as Ella Reeve, the RADA student who graduated in 1909?[47] Vern Agopsowicz became famous under the name John Vernon… I could continue like that for a long time. I went over a hundred “maybe it is them/maybe not” early in April.
Arkanosis helps me!
By then (late March 2016), several Wikimedians had already helped me, most notably on my Internet Broadway Database work (ahah, had you forgotten?) but one evening in Cléry (Wikimedia France has a welcoming space for Wikimedians on rue de Cléry in Paris and we can be found there regularly) Arkanosis saw me manually searching the RADA entries and took pity on me. He wrote me a beautiful Linux shell script (later amended by Ash_Crow to become even easier to use):
#! /bin/sh

if [ $# -ne 2 ]; then
    echo 'Usage: rada.sh <profile> <year>'
    exit 1
fi

profile=$1
year=$2

echo "<html><head><title>Year $year</title></head><body><ul>" > list-$profile-$year.html

wget -q 'https://www.rada.ac.uk/profiles?search='$profile'&yr-acting='$year'&yr-technicaltheatrearts='$year'&crs-technicaltheatrearts=&yr-theatrelab='$year'&yr-directing='$year'&crs-directing=&fn=&sn=' -O - | \
sed -n 's@.*fn=\([^&]*\).*sn=\([^"&]*\).*@\1 \2@p' | \
while read firstname lastname; do
    echo "<li><a href=\"https://www.rada.ac.uk/profiles?aos='$profile'&yr=$year&fn=$firstname&sn=$lastname\">$firstname $lastname</a> <a href=\"https://www.wikidata.org/w/index.php?search=&search=$firstname+$lastname&title=Special%3ASearch&go=Lire\">wikidata</a>"
    wget -q 'https://www.wikidata.org/w/api.php?action=query&list=search&srwhat=text&srsearch='$firstname'+'$lastname -O - | \
    sed -n 's@.*title&.*\(Q[0-9]\+\)&.*@\1@p' | \
    while read qid; do
        if grep -q $qid unhandled.lst; then
            echo " <a href=\"http://www.wikidata.org/wiki/$qid\">$qid</a>"
        fi
    done
    echo "</li>"
done >> list-$profile-$year.html

echo "</ul></body></html>" >> list-$profile-$year.html
The RADA URLs are systematically constructed like this: year/given name/surname.[48] Arkanosis simply extracted listings by year, one row per student, like this:
- Student’s name (link to the RADA entry) / Wikidata (link to the search page with the name) / possibly one or more Qids[49] found through the second link and which also appear in the existing list of P69:Q523926 (entries already marked as RADA students)
For example a row for a student of the “acting” course in 1947 looks like:
harold goodwin wikidata Q1585750
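The way a row is derived from a profile URL can be sketched in Python with the standard urllib.parse module. This is an illustration rather than a port of the script: the parameter names (aos, yr, fn, sn) are the ones visible in the script above, the function name parse_profile_url is made up, and the example URL is simply assembled from the row shown above.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def parse_profile_url(url):
    """Extract course, year and name from a RADA profile URL."""
    query = parse_qs(urlparse(url).query)
    return {
        "course": query["aos"][0],
        "year": int(query["yr"][0]),
        "given_name": query["fn"][0],
        "surname": query["sn"][0],
    }


# URL assembled from the parameters used in the script (acting course, 1947).
url = "https://www.rada.ac.uk/profiles?aos=acting&yr=1947&fn=harold&sn=goodwin"
profile = parse_profile_url(url)

# Build the matching Wikidata search link for the same name.
search_url = "https://www.wikidata.org/w/index.php?" + urlencode(
    {"search": f"{profile['given_name']} {profile['surname']}", "title": "Special:Search"}
)
print(profile)
print(search_url)
```

Since the name is the only thing the URL carries about the person, two students sharing a name in the same year and course cannot be told apart this way.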
Not all rows have a Qid associated (those that did were a tiny minority, honestly, as by then only ~650 students were listed and the RADA has had many more than 650 students). Not all Qids lead to correct matches either: as I said, there are some people sharing the same name at the RADA; or the Wikidata search engine was, for once, too generous and yielded combinations of given names/surnames not matching the RADA entry (for example a search for Romany Evens offers George Bramwell Evens on Wikidata). Nevertheless, the majority of the suggested Qids lead to matches, which was a way better result than for the rows without a Qid. Thank you Arkanosis and Ash_Crow!
Even with these listings, having only to click on the search links instead of doing dozens of copy/pastes, I still needed to manually verify each and every entry.[50] The problem when we use the names from the URLs is the lack of apostrophes and blank spaces. A search for peter otoole on Wikidata doesn’t yield Peter O’Toole for example. So you still need to add the apostrophes and blank spaces back, not just click and read the results.
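When comparing names outside the search engine (in a spreadsheet export, say), one way around the apostrophe problem is to reduce both sides to the same bare key before comparing. A minimal Python sketch, assuming a helper called comparison_key (a made-up name) that lowercases, strips accents and drops everything that isn’t a letter or a digit:

```python
import unicodedata


def comparison_key(name):
    """Lowercase, strip accents, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalnum()).lower()


print(comparison_key("Peter O’Toole"))
print(comparison_key("peter otoole"))
print(comparison_key("Peter O’Toole") == comparison_key("peter otoole"))
```

This only helps when both lists are at hand; it does nothing for the Wikidata search box itself, which still wants the apostrophe.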
From RADA to Wikipedia: a temporary conclusion
I spent the end of March, April and early May doing this work. At the end of it, I had identified exactly 835 entries, but of course, the vast majority of alumni didn’t have matches (which was to be expected) and a strangely high number yielded only uncertain results. I have 442 rows in a spreadsheet, each with a RADA entry and a possibly matching Wikidata entry. I’ll need to dig deeper to confirm (or not) the matches.
Identification: from Wikipedia to the RADA
When I finished identifying alumni from the RADA database, I had a problem: there were people listed in the Wikipedia category “Alumni of the RADA” who weren’t on my done list on Wikidata. In a perfect world, at the end of the work on my scripts, the number of Wikidata entries with “studied at:RADA” and the number of Wikidata entries with “studied at:RADA, endtime:something” (and with a RADA reference; there is a query for that) should have been the same. As it isn’t a perfect world, I had people that Wikipedia listed as alumni that I didn’t find on the RADA website. There was some overlap with my “maybe yes/maybe no” list[51] but not that much: my list is mostly composed of people whose drama school I don’t know at all, if they even went to one.
Using PetScan I searched for the list of Wikipedia articles categorized as RADA alumni but which didn’t respond to the query “studied at RADA with an end time”. Link to the automatically updated PetScan query.
I found 132 results, which I—again—treated manually. I identified 23 additional articles (mostly it was cases of pseudonyms or maiden names not present as aliases on Wikidata: they weren’t returned in searches as a result).[52]
At the end of April, the English category listed 907 articles, Wikidata 953 entries and only 850 of them had been correctly completed with a decent reference. And we mustn’t forget that not all Wikidata entries have a matching English Wikipedia article: some actresses and actors have articles in other languages (Norwegian, Italian, German, Romanian…) and about a dozen don’t have a Wikipedia article at all, only the Wikidata entry without sitelinks. Their entry was created so Wikidata could list the full cast of a film.
So we query to find the Wikidata entries of RADA alumni without an end date:
SELECT ?student ?studentLabel WHERE {
  ?student wdt:P31 wd:Q5 .
  ?student p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  FILTER NOT EXISTS { ?statement pq:P582 ?x . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
link to the query.
Usually SPARQL is pretty understandable by humans because it is made for querying semantic data. However, Wikidata is a multilingual database, which consequently uses numeric identifiers.[53]
I should comment all my queries but I’m lazy and I take advantage of the Wikidata endpoint, which declares the needed PREFIXes itself and even offers comments: if you hover over a Pid or Qid, you’ll see the name and description in your language. And you can change this language in the top right corner. So I’m entitled to laziness.[54]
Inconsistencies
This list contains mostly entries with sitelinks to the English Wikipedia: the SPARQL query above (on Wikidata but without end date for RADA studies) yielded 112 results at the end of April when the PetScan query (in the English category of RADA alumni but without end time on Wikidata) gave us 110 results. Of the two extra entries, one is an article deleted on the English Wikipedia after someone imported the category on Wikidata and the other is about a French actress. So all 111 of these “maybe errors, maybe not, but in all cases lacking references” on Wikidata came from the English Wikipedia. I SEE YOU ENWIKI!
The work now is to find under which name the person was registered at the RADA (beware typographical errors…) or to find why they were categorized as students when they weren’t. For example Ash_Crow corrected the article on George Bernard Shaw, listed as a student instead of under “people associated with” the RADA. He was very involved in the school and even left them part of his estate[55] but never studied there. For Armaan Kirmani, his IMDB entry says that he was the student of a RADA professor… but that doesn’t mean he went to RADA itself.
In these dozens of problematic cases, there is a little bit of everything, from articles that don’t mention the RADA at all (why were they ever categorized?), to articles that clearly state that the person was a student (but without any sources),[56] to articles that even have sources but these sources aren’t so explicit… The RADA doesn’t exclusively offer degree courses; they also organize numerous workshops and internships. If an actor or an actress participated in a two-day workshop at RADA, they won’t appear on the RADA website as a student but they could sincerely say in interviews they learned something at RADA… We are only a step away from an enthusiastic Wikipedian deciding they are alumni.
For example Ash_Crow found a source (in French and not of really great quality) saying that Émilie Rault studied at RADA. She is nowhere in the database because it’s very likely she only did workshops there, as she was also studying musicology at the Sorbonne for her master’s degree at the time. This should lead us to question the limits we want to set on the “studied at” property on Wikidata: do we want to use it exclusively for long courses leading to diplomas or accept everything, including workshops of only a few days?
Differences between the list and the category
Like I already said, the “List of RADA alumni” doesn’t match the articles listed in the category. Systematically, every time I identified someone on Wikidata (and subsequently found their Wikipedia article), I added their name to the list and I added the category. So I’ve reduced the gap between the two. The article-list should be more complete than the category, since it can hold red links for articles existing on other Wikipedias.
Xavier Combelle has been kind enough to list the differences between the category and the list. In early May, the thirty problematic cases mentioned above remained (missing from the list) and in the list, in addition to the usual red links, we found eighteen uncategorised articles. None of them bore any obvious connection to RADA, except for Xenia Kalogeropoulou, who could be identified as Xenia Calogeropoulos and was thus categorised. Among those cases, some Wikipedia articles explicitly mentioned training at RADA as consisting of workshops or internships. We go back to the question: what courses warrant being considered an alumnus?
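Listing the differences between the category and the list boils down to two set differences. A minimal Python sketch of the idea, not the actual comparison that was run: the two sets below are tiny hand-made samples using names from this article and do not reflect the real state of either page.

```python
# Tiny hand-made samples, for illustration only.
category = {"Ben Whishaw", "Harold Pinter", "George Bernard Shaw"}
alumni_list = {"Ben Whishaw", "Harold Pinter", "Xenia Kalogeropoulou"}

# Categorised articles missing from the list.
only_in_category = sorted(category - alumni_list)
# Entries of the list that are not categorised (red links included).
only_in_list = sorted(alumni_list - category)

print(only_in_category)
print(only_in_list)
```

Both leftovers then need a human look: the first may be wrongly categorised, the second may be a red link or an article under another spelling.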
Problems with the RADA database
Having listed the issues on the Wikipedia and Wikidata sides (which amount to: “people add information without references and that information spreads everywhere like an epidemic”), we have to face the fact that some of the problems stem from the RADA database itself.
Completeness of data
As we have already seen, the database is littered with double entries, each pseudonym or name spelling yielding a new page instead of centralising these entries on a unique page associated with the student. This is obviously a problem if you are interested in the number of students for a given year, for instance.
From a Wikidata point of view, this prevents resorting to the simple solution of creating one entry for each student, regardless of whether a Wikipedia article exists or not. The Cambridge database, for example, associates a fixed identifier with every student, which enabled us to import these identifiers on Wikidata, creating new entries as needed (P1599: ID of the Cambridge Alumni Database).[57] If the RADA had chosen the solution of one identifier per student instead of a URL of the form diploma/year/first name/last name, it would have been easier to import it in its entirety.
Which brings us to the next problem: we have no certainty that the database is complete at all. Nothing to support that is said on the site. A visit to the Internet Archive’s Wayback Machine shows that the database has only been online since 2015, and that before that date only the current students had a profile on the site. If recent data seem complete (from 1999 on, where profiles are detailed and come with photographs), the profiles of the earlier years are sometimes quite patchy. And in particular, some years seem suspiciously poor in students, such as 1988 and every year before 1922.[58]
Could it be that among the dozens of RADA alumni without a match in the database, some have been forgotten? One typical case is that of Noel Streatfeild who, according to her website, attended as a student starting in 1919. I did find a “Noel Goodwin” who graduated in 1922, but is that her?
An even more explicit example is Dora Mavor Moore, who was the first Canadian who went to RADA, per this biographical article, and who graduated in 1912. The problem is, on the RADA website only one student is listed as graduating in 1912 and “Leonard Notcutt” isn’t a known pseudonym of Dora Mavor Moore.
Data reliability
The most glaring problem is that some alumni listed in the RADA database left the RADA before graduation. Someone like Harold Pinter has a RADA profile which says he was part of the 1949 class. In fact, Pinter went to RADA in 1948 and left the course in 1949, before graduation. Does the RADA list every student, no matter whether they actually graduated or didn’t finish? In Wikidata we can use the property “diploma” with “no value” instead of the actual diploma in the qualifiers for the “studied at” property.
It’s a little bit problematic if we can’t trust the official school website to know who graduated there…
I have another problem with the RADA entry of Sheila Terry, whom I think I can match to the Wikipedia article Sheila Terry. It’s very likely she didn’t go to London during her studies; according to Wikipedia, she went to the Dickson-Kenwin academy, “a school affiliated with London’s Royal Academy”. Does that mean the Dickson-Kenwin academy was then delivering the RADA diplomas? (before the 2000 reform, the RADA delivered its own diplomas). I lack information.
I also have a Jack May of the 1943 class whose Wikipedia article states explicitly he was admitted to RADA and never went…
Never so easy, even when the matchings are done!
What am I doing now?
I still do many other things on Wikidata. This article summarises some of my work but not all, far from it. But to stay somehow on topic I’ll only speak here of what I do in relation to the RADA and theatre in general.
For example, people rightly pointed out that Wikidata has a property to indicate the birth name of a person, which should always be present (but isn’t in reality) and is useful in particular in cases of women known under their married name. So I’m working to add these birthnames-in-property as aliases to facilitate future identification. It’s a lot of fun with little scripts, SPARQL and a healthy use of QuickStatements, a tool made to facilitate bulk edits on Wikidata.
I’m also still working on Mix n’ Match to add the correct IBDB identifiers to Wikidata entries about people and works. You can help me, as I already said above. And it’s not just for the pleasure of having identifiers; when we have enough of them, we will be able to add a lot of information about Broadway productions on Wikidata. And that will be fantastic!
I started adding data about theatrical awards too, which is long, somewhat repetitive, but is immediately useful. The English Wikipedia mostly already has articles about the most important awards, but many smaller Wikipedias don’t. I’m working on a Lua module to be able to generate a Wikinews article based on Wikidata data:[59] in practice, that will mean watching the Tony awards ceremony, adding the data on Wikidata and immediately after the end being able to have a complete table with links and everything just using a template.[60] And that in dozens of languages. Great, no?
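To give an idea of what such a little script can look like, here is a hedged Python sketch that turns (Qid, birth name) pairs into alias commands. It rests on several assumptions: the function name alias_commands is made up, the tab-separated “Qid, Aen, quoted text” line is the QuickStatements V1 syntax for adding an English alias as I understand it (check the tool’s help page before pasting anything), and the Qid in the sample is a placeholder (assumed to be the Wikidata sandbox item), paired with a made-up name.

```python
def alias_commands(birth_names, language="en"):
    """Build one tab-separated alias command per (Qid, birth name) pair."""
    lines = []
    for qid, name in birth_names:
        # Assumed V1 syntax: item, "A" + language code, quoted alias text.
        lines.append(f'{qid}\tA{language}\t"{name}"')
    return "\n".join(lines)


# Placeholder pair: in practice these would come from a query on the birth name property.
birth_names = [("Q4115189", "Example Birth Name")]
print(alias_commands(birth_names))
```

The pairs themselves would come from a SPARQL query listing the items that have a birth name not yet present among their aliases.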
I still have to reduce my two lists of RADA students:
- one with people categorized as alumni but whom I didn’t find in the RADA website (errors? workshops? missing?): ~112
- one with Wikidata entries I think match a RADA student but for which I don’t have definite proof: ~400
Solving these two lists should help me reduce the gap between the English category and the English list. And by the way I’m very proud of my French list of RADA alumni, which has names, date, course, diploma, nationality and even some pictures!
I wrote to the RADA archivist in June to at least inform him of the typographical errors found in their database but he hasn’t written back yet. Which can probably be explained by the fact that the RADA archives are moving to a new building this summer. They are probably pretty busy!
And of course, from a purely Wikidatian point of view, I officially launched the WikiProject:Theatre this week. That’s for every Wikidatian, new or confirmed, who wants to join me in my mad quest.
Curious and fun queries and statistics
Everything being said, we still have an interesting sample with ~850 entries. It’s only a small percent of all RADA alumni (and the technical courses are vastly under-represented) but it’s enough to start to have fun with SPARQL queries. We can ask pretty much anything!
If you want to see the results of the queries, click on the links, then on “Run”, and a few seconds later you will be able to explore the answers yourself!
Number of RADA students with a Wikidata entry by year
Well, starting easily, maybe we don’t want the list of RADA alumni but only the number of them with an entry by year of graduation:
SELECT ?year (COUNT(?student) AS ?number)
WHERE {
  ?student wdt:P31 wd:Q5 .
  ?student p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  ?statement pq:P582 ?endtime .
  BIND(YEAR(?endtime) AS ?year) .
}
GROUP BY ?year
ORDER BY ?year
We can then do this beautiful graph:
Average age at graduation
Maybe we can go further. Now that we know when they graduated, can we know at what age they did it? This means our sample will be reduced to the entries with a birthdate of course.
SELECT ?endYear (AVG(?age) AS ?averageAge)
WHERE {
  ?person p:P69 ?radaStatement .
  ?radaStatement ps:P69 wd:Q523926 .
  ?radaStatement pq:P582 ?endDate .
  ?person wdt:P569 ?birthDate .
  BIND(YEAR(?endDate) AS ?endYear)
  BIND(?endYear - YEAR(?birthDate) AS ?age)
}
GROUP BY ?endYear
ORDER BY ?endYear
Or even something more fun: the average age at graduation (rounded to whole numbers, this time), by year and by gender (only “male” and “female” in our sample, but the query could handle others) and, to give an appearance of seriousness, the number of people in the sample:
SELECT ?endYear ?genderLabel (ROUND(AVG(?age)) AS ?averageAge) (COUNT(?person) AS ?number)
WHERE {
  ?person p:P69 ?radaStatement .
  ?person wdt:P21 ?gender .
  ?gender rdfs:label ?genderLabel filter (lang(?genderLabel) = "en") .
  ?radaStatement ps:P69 wd:Q523926 .
  ?radaStatement pq:P582 ?endDate .
  ?person wdt:P569 ?birthDate .
  BIND(YEAR(?endDate) AS ?endYear)
  BIND(?endYear - YEAR(?birthDate) AS ?age)
}
GROUP BY ?endYear ?genderLabel
ORDER BY ?endYear
Query link. I should really do an age pyramid but I’m suffering from a fit of laziness.[61]
Timeline of graduates
That was fun but I want something more human-readable, like a timeline with pictures!
#defaultView:Timeline
SELECT DISTINCT ?person ?personLabel ?personDescription (SAMPLE(?GraduationDate) AS ?date) (SAMPLE(?photo) AS ?pic)
WHERE {
  ?person wdt:P31 wd:Q5 .
  ?person p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  ?statement pq:P582 ?GraduationDate .
  OPTIONAL { ?person wdt:P18 ?photo . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?person ?personLabel ?personDescription
ORDER BY ?date
Query link. You just needed to ask! Beware that this query is heavy and can slow down your browser.
How many nationalities were represented in RADA?
We can do a query to list all the nationalities, and for each the number of students involved, in decreasing order.
SELECT ?nationality ?nationalityLabel (COUNT(?student) AS ?number) {
  ?student p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  ?statement pq:P582 ?endtime .
  ?student wdt:P27 ?nationality .
  ?nationality rdfs:label ?nationalityLabel filter (lang(?nationalityLabel) = "en") .
}
GROUP BY ?nationality ?nationalityLabel
ORDER BY desc(?number)
Query link. Surprisingly[62] the most frequent is the… British. But hey! More than thirty nationalities!
If we just add, as the first line of the query:
#defaultView:BubbleChart
we obtain the results as a bubble chart.[63] It’s explicit:
Map of birth places of RADA students
I don’t really care what nationalities the alumni are… but I would love to see a map of birthplaces! And I can do that directly in SPARQL!
#defaultView:Map
SELECT DISTINCT ?coords ?birthplaceLabel ?person ?personLabel
WHERE {
  ?person wdt:P31 wd:Q5 .
  ?person wdt:P69 wd:Q523926 .
  ?person wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coords.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Query link. If you run the query, you can zoom!
Layered map of birth places of RADA students by graduation date
Just a simple map? But why not make a layered one? We could ask for a map showing the birth places of RADA graduates, one layer by decade of graduation!
#defaultView:Map
SELECT DISTINCT ?coords (floor(year(?endtime)/10)*10 as ?layer) ?birthplaceLabel ?student ?studentLabel {
  ?student wdt:P31 wd:Q5 .
  ?student p:P69 ?statement .
  ?statement ps:P69 wd:Q523926 .
  ?statement pq:P582 ?endtime .
  ?student wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coords.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Query link. Run it and play with the layers!
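If the floor(year(?endtime)/10)*10 expression in the layered map query looks cryptic, here is a minimal sketch that applies the same arithmetic to a few hand-picked years (the three values are made up for illustration, they are not taken from the RADA data). Dividing by ten, rounding down and multiplying by ten again should send 1912 to the 1910 layer, 1987 to 1980 and 2003 to 2000:

```sparql
# Illustration only: ?year values are arbitrary examples, not RADA data.
SELECT ?year (FLOOR(?year / 10) * 10 AS ?layer)
WHERE {
  VALUES ?year { 1912 1987 2003 }
}
```

Each distinct ?layer value then becomes one toggleable layer on the map.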
Number of RADA alumni cast in a James Bond film
Do you remember that Whishaw and Llewelyn played Q? Exactly how many RADA students played in a James Bond film?
SELECT DISTINCT ?actor ?actorLabel
WHERE {
  ?film wdt:P179 wd:Q2484680 .
  ?film wdt:P161 ?actor .
  ?actor wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?actorLabel
Query link.
More than forty!
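To get the number itself instead of counting the rows of the list, a counting variant along these lines should do (a sketch reusing the same three triple patterns as the James Bond list; ?number is simply a variable name I picked, and DISTINCT is there so that someone cast in several films is only counted once):

```sparql
# Sketch: counts each RADA alumnus once, however many Bond films they appear in.
SELECT (COUNT(DISTINCT ?actor) AS ?number)
WHERE {
  ?film wdt:P179 wd:Q2484680 .   # film is part of the James Bond series
  ?film wdt:P161 ?actor .        # cast member
  ?actor wdt:P69 wd:Q523926 .    # educated at RADA
}
```

The result depends on the state of Wikidata on the day you run it, so it will drift as casts and alumni are completed.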
RADA alumni by James Bond films, ordered by date
Hsarrazin liked my James Bond query but she wanted more: she wanted to know which former student was cast in which James Bond film, and to order the results by publication date of the film. Of course this means that people who worked in several films are listed several times.
SELECT DISTINCT ?actor ?actorLabel ?film ?filmLabel ?year
WHERE {
  ?film wdt:P179 wd:Q2484680 .
  ?film wdt:P577 ?date .
  # BIND comes after the pattern that binds ?date
  BIND(YEAR(?date) AS ?year)
  ?film wdt:P161 ?actor .
  ?actor wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?year
RADA alumni by James Bond films, in a graph
I don’t actually know why Hsarrazin wanted a table, when we could have all RADA alumni playing in a James Bond film as a graph:
#defaultView:Graph
SELECT DISTINCT ?actor ?actorLabel (concat("24890D") as ?rgb) ?film ?filmLabel ?year
WHERE {
  ?film wdt:P179 wd:Q2484680 .
  ?film wdt:P577 ?date .
  # BIND comes after the pattern that binds ?date
  BIND(YEAR(?date) AS ?year)
  ?film wdt:P161 ?actor .
  ?actor wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?year
Query link. Beware, this query is heavy, even more than the timeline one.[64]
And in all films?
Well, James Bond is great, but why limit ourselves to it? Can’t we just have all films on Wikidata with more than 5 actors or actresses listed in the cast, ordered by the number of them who studied at RADA?
SELECT DISTINCT ?film ?filmLabel (COUNT(?actors) AS ?nbActors)
WHERE {
  ?film wdt:P31/wdt:P279* wd:Q11424 .
  ?film wdt:P161 ?actors .
  ?actors wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?film ?filmLabel
ORDER BY DESC(?nbActors)
Films by rate of actors and actresses who studied at RADA
That’s fun but I want even more fun: it’s not exactly the same if there are five RADA actors or actresses in a cast of eight or in a cast of one hundred. I want all films with at least five people in the cast, ordered by rate of RADA students!
SELECT DISTINCT ?film ?filmLabel ((xsd:float(?nbRadaActors)/xsd:float(?totalNbActors)) AS ?rate)
WHERE {
  {
    SELECT DISTINCT ?film (COUNT(?actors) AS ?nbRadaActors) {
      ?film wdt:P31/wdt:P279* wd:Q11424 .
      ?film wdt:P161 ?actors .
      ?actors wdt:P69 wd:Q523926 .
    }
    GROUP BY ?film
  }
  {
    SELECT DISTINCT ?film (COUNT(?actors) AS ?totalNbActors) {
      ?film wdt:P31/wdt:P279* wd:Q11424 .
      ?film wdt:P161 ?actors .
    }
    GROUP BY ?film
    HAVING (?totalNbActors >= 5)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?rate)
Query link. Honestly, the highest-rated ones are probably films with an incomplete cast, but I love this query anyway.
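One way to spot those incomplete casts is to display the two raw counts next to the rate. The following is only a sketch of that idea, not the query behind the link: it keeps the same two subqueries, but I switched to COUNT(DISTINCT …), renamed the variable to ?actor and wrote the HAVING clause with the aggregate spelled out, all of which are my own choices. Like the original, it scans every film on Wikidata and may be slow or time out.

```sparql
# Sketch: same rate as above, with the raw counts shown alongside it.
SELECT ?film ?filmLabel ?nbRadaActors ?totalNbActors
       ((xsd:float(?nbRadaActors) / xsd:float(?totalNbActors)) AS ?rate)
WHERE {
  {
    # Cast members who studied at RADA, per film
    SELECT ?film (COUNT(DISTINCT ?actor) AS ?nbRadaActors) {
      ?film wdt:P31/wdt:P279* wd:Q11424 .
      ?film wdt:P161 ?actor .
      ?actor wdt:P69 wd:Q523926 .
    }
    GROUP BY ?film
  }
  {
    # All listed cast members, per film, keeping casts of five or more
    SELECT ?film (COUNT(DISTINCT ?actor) AS ?totalNbActors) {
      ?film wdt:P31/wdt:P279* wd:Q11424 .
      ?film wdt:P161 ?actor .
    }
    GROUP BY ?film
    HAVING (COUNT(DISTINCT ?actor) >= 5)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?rate)
```

A film showing a rate of 1 with a total of exactly five cast members is a good candidate for a cast list that simply hasn’t been completed yet.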
RADA alumni who worked on Broadway
Remember when we worked on the Broadway database? How many RADA students ever worked on Broadway? (We are considering that “working on Broadway” means “having an Internet Broadway Database identifier”.) Well at least…
SELECT DISTINCT ?human ?humanLabel
WHERE {
  ?human wdt:P31 wd:Q5 .
  ?human p:P1220 ?ID .
  ?human wdt:P69 wd:Q523926 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Query link. 276 listed mid-August 2016!
RADA alumni with a Tony award win or nomination
So… if RADA alumni worked on Broadway… how many of them were nominated for or received a Tony Award?
SELECT DISTINCT ?human ?humanLabel ?reason ?distinctionLabel (year(?date) as ?year)
WHERE {
  ?human wdt:P69 wd:Q523926 .
  ?human ?prop ?distinctionStatement .
  ?distinctionStatement ?propS ?distinction .
  VALUES (?prop ?propS ?reason) {
    (p:P1411 ps:P1411 "nominated for")
    (p:P166 ps:P166 "award received")
  }
  ?distinction wdt:P31*/wdt:P279 wd:Q191874 .
  OPTIONAL { ?distinctionStatement pq:P585 ?date . }
  ?human wdt:P31 wd:Q5 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?humanLabel ?distinction ?year
Query link. More than one hundred!
And by the way we should verify that all people nominated for or awarded a Tony Award have an IBDB identifier! (If all is right in the world, this query should give you a “No matching records found”):
SELECT DISTINCT ?human ?humanLabel
WHERE {
  ?human wdt:P31 wd:Q5 .
  # covers both "nominated for" (P1411) and "award received" (P166)
  ?human (wdt:P1411|wdt:P166)/wdt:P279 wd:Q191874 .
  FILTER NOT EXISTS { ?human wdt:P1220 ?ibdb . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
All Tony Awards!
Hey! Can we list all people nominated for/awarded a Tony, by win/nomination, by award and by year? Like everyone ever? Well yes, of course, it’s SPARQL!
SELECT ?human ?humanLabel ?reason ?distinctionLabel (year(?date) as ?year)
WHERE {
  ?human ?prop ?statement .
  ?statement pq:P805 ?ceremony .
  ?ceremony wdt:P31 wd:Q24569309 .
  ?statement ?propS ?distinction .
  VALUES (?prop ?propS ?reason) {
    (p:P1411 ps:P1411 "nominated for")
    (p:P166 ps:P166 "award received")
  }
  OPTIONAL { ?statement pq:P585 ?date . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ?year ?distinctionLabel
Conclusion
- I’m not done;
- I hope the RADA archivist will be as kind as he seems;
- People, seriously, you should add aliases on Wikidata items;
- And sources. Sources are great;
- And you should also take photographs of Ben Whishaw, we are clearly lacking free pictures of Whishaw;
- Isn’t SPARQL a lot of fun? Whatever your question, someone can ask Wikidata for the answer![65]
Header image:
Pediment of the RADA building in Gower Street, by Chemical Engineer, CC-BY-SA 3.0
Charlie Sept. 2, 2016, 4:49 p.m.
Thanks for the great article and all the work you've put in! I enjoyed reading it a lot and learned some new things today :D
Harmonia Amanda Sept. 2, 2016, 7:53 p.m.
Thank you for your kind comment!
David McDonell Sept. 5, 2016, 4:51 p.m.
Wow! Amazing!! Thank you for both the work and the educational & entertaining narrative;-) Hats off!
Barbara Cohen-Stratyner May 24, 2017, 12:10 a.m.
Leonard Notcutt, who you mention as having graduated in 1912, was an actor with the Beerbohm-Tree company and made at least one film. He won the Bancroft Gold Medal. His career was short because he joined up and was killed in WWI action, May 1917. I know about him because he was married to a photographer, Florence Vandamm. I would very much like to find a photograph of him. Question -- is the Noel Streatfeild that you also found the author of that name?
Harmonia Amanda May 24, 2017, 8:16 p.m.
Thank you very much for the information about Leonard Notcutt! Yes, the Noel Streatfeild who graduated from RADA is the author. I discovered it during my search.