Using a word cloud generator to check students' work

Click the cover to see this book on Amazon (affiliate link).

Warning: ridiculously long sentence coming up.

When I first started suggesting to writers and teachers that a good way of checking whether a piece of writing addressed the stated objective, or as a shortcut way of discovering what the piece was about without going to the trouble of actually reading it, or as a sort of executive summary to prime you for the reading experience you were about to enjoy (or endure), would be to use a word cloud, I thought I was being terribly original.

However, I have been reading If On A Winter’s Night A Traveller, by Italo Calvino, and have been disabused of my belief that I am an unsung genius. For in this novel, published in 1979, one of the characters says:

She explained to me that a suitably programmed computer can read a novel in a few minutes and record the list of all the words contained in the text, in order of frequency. “That way I can have an already completed reading at hand,” Lotaria says, “with an incalculable saving of time. What is the reading of a text, in fact, except the recording of certain thematic recurrences, certain insistences of forms and meanings? An electronic reading supplies me with a list of the frequencies, which I have only to glance at to form an idea of the problems the book suggests to my critical study. Naturally, at the highest frequencies the list records countless articles, pronouns, particles, but I don’t pay them any attention. I head straight for the words richest in meaning; they can give me a fairly precise notion of the book.”
— Italo Calvino

This was, of course, long before the existence of word cloud generators. It was also long before the widespread use of computers by “ordinary” people or in school classrooms. That is why I find the passage so extraordinary: it shows remarkable foresight.

The kind of word frequency analysis Calvino was talking about may be exemplified by this breakdown of my article about Ofsted:

Order  Word (unfiltered)  Occurrences  Percentage (%)
1. in 13 3.4483
2. the 13 3.4483
3. to 12 3.1830
4. i 11 2.9178
5. was 11 2.9178
6. that 10 2.6525
7. a 9 2.3873
8. of 7 1.8568
9. and 7 1.8568
10. subject 7 1.8568
11. so 6 1.5915
12. this 5 1.3263
13. ofsted 5 1.3263
14. have 5 1.3263
15. for 5 1.3263
16. when 5 1.3263
17. an 4 1.0610
18. by 4 1.0610
19. inspectors 4 1.0610
20. are 4 1.0610
21. be 3 0.7958
22. if 3 0.7958
23. it 3 0.7958
24. me 3 0.7958
25. been 3 0.7958
26. it's 3 0.7958
27. his 3 0.7958
28. what 3 0.7958
29. sure 2 0.5305
30. as 2 0.5305
31. is 2 0.5305
32. my 2 0.5305
33. secondly 2 0.5305
34. person 2 0.5305
35. told 2 0.5305
36. case 2 0.5305
37. good 2 0.5305
38. being 2 0.5305
39. education 2 0.5305
40. really 2 0.5305
41. asked 2 0.5305
42. head 2 0.5305
43. things 2 0.5305
44. because 2 0.5305
45. used 2 0.5305
46. inspections 2 0.5305
47. kids' 2 0.5305
48. would 2 0.5305
49. firstly 2 0.5305
50. their 2 0.5305
51. knows 2 0.5305
52. had 2 0.5305
53. how 2 0.5305
54. inspector 2 0.5305
55. work 2 0.5305
56. which 2 0.5305
57. computing 2 0.5305
58. senior 2 0.5305
59. e 1 0.2653
60. taught 1 0.2653
61. looks 1 0.2653
62. at 1 0.2653
63. do 1 0.2653
64. on 1 0.2653
65. us 1 0.2653
66. actually 1 0.2653
67. inspect 1 0.2653
68. technology 1 0.2653
69. invested 1 0.2653
70. against 1 0.2653
71. learning 1 0.2653
72. kids 1 0.2653
73. tell 1 0.2653
74. training 1 0.2653
75. newsletter 1 0.2653
76. provision 1 0.2653
77. knew 1 0.2653
78. they 1 0.2653
79. them 1 0.2653
80. then 1 0.2653
81. addressed 1 0.2653
82. authority 1 0.2653
83. taken 1 0.2653
84. colleagues 1 0.2653
85. i've 1 0.2653
86. trained 1 0.2653
87. schools 1 0.2653
88. teacher 1 0.2653
89. past 1 0.2653
90. they're 1 0.2653
91. working 1 0.2653
92. meeting 1 0.2653
93. thought 1 0.2653
94. enough 1 0.2653
95. encouraged 1 0.2653
96. nicely 1 0.2653
97. ofsted's 1 0.2653
98. ridiculous 1 0.2653
99. plus 1 0.2653
100. observing 1 0.2653
101. pictures 1 0.2653
102. they'd 1 0.2653
103. like 1 0.2653
104. your 1 0.2653
105. announced 1 0.2653
106. misgivings 1 0.2653
107. shelved 1 0.2653
108. standards 1 0.2653
109. start 1 0.2653
110. staff 1 0.2653
111. look 1 0.2653
112. judged 1 0.2653
113. questions 1 0.2653
114. surveys 1 0.2653
115. formatted 1 0.2653
116. adviser 1 0.2653
117. myself 1 0.2653
118. didn't 1 0.2653
119. given 1 0.2653
120. reasons 1 0.2653
121. mentioned 1 0.2653
122. office 1 0.2653
123. lessons 1 0.2653
124. response 1 0.2653
125. make 1 0.2653
126. across 1 0.2653
127. fairness 1 0.2653
128. decades 1 0.2653
129. reaction 1 0.2653
130. fancy 1 0.2653
131. different 1 0.2653
132. inserted 1 0.2653
133. about 1 0.2653
134. benchmark 1 0.2653
135. very 1 0.2653
136. pretty 1 0.2653
137. seriously 1 0.2653
138. again 1 0.2653
139. english 1 0.2653
140. doing 1 0.2653
141. idea 1 0.2653
142. obvious 1 0.2653
143. hands 1 0.2653
144. looking 1 0.2653
145. much 1 0.2653
146. whether 1 0.2653
147. anyway 1 0.2653
148. could 1 0.2653
149. something 1 0.2653
150. curriculum 1 0.2653
151. means 1 0.2653
152. inspection 1 0.2653
153. into 1 0.2653
154. documents 1 0.2653
155. couple 1 0.2653
156. ago 1 0.2653
157. any 1 0.2653
158. ask 1 0.2653
159. exercise 1 0.2653
160. summaries 1 0.2653
161. but 1 0.2653
162. can 1 0.2653
163. think 1 0.2653
164. thing 1 0.2653
165. going 1 0.2653
166. hardly 1 0.2653
167. news 1 0.2653
168. mixed 1 0.2653
169. i'm 1 0.2653
170. wail 1 0.2653
171. has 1 0.2653
172. measure 1 0.2653
173. able 1 0.2653
174. ict 1 0.2653
175. nice 1 0.2653
176. well 1 0.2653
177. situation 1 0.2653
178. lot 1 0.2653
179. someone 1 0.2653
180. now 1 0.2653
181. one 1 0.2653
182. own 1 0.2653
183. will 1 0.2653
184. put 1 0.2653
185. undertaken 1 0.2653
186. also 1 0.2653
187. say 1 0.2653
188. two 1 0.2653
189. use 1 0.2653
190. computers 1 0.2653
191. who 1 0.2653
192. why 1 0.2653
193. you 1 0.2653
194. subjects 1 0.2653
195. local 1 0.2653
196. acquaintance 1 0.2653
197. england 1 0.2653
198. side 1 0.2653
199. whole 1 0.2653
200. just 1 0.2653
201. sort 1 0.2653
202. since 1 0.2653
203. published 1 0.2653
204. headings 1 0.2653

(Created using an online word frequency analyser.) I find this hard to derive meaning from, which is why the modern version, a word cloud, is so much better in my opinion:

Word cloud, by Terry Freedman

(Generated using WordCloud.) You can see at a glance that the article was about subject inspection, and in particular Computing.
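
If you would like to try the underlying count yourself, here is a minimal Python sketch of the kind of frequency analysis shown in the table above, with the "articles, pronouns, particles" that Lotaria ignores filtered out. It is only an illustration: the function name and the very short stop-word list are my own assumptions, not how any particular analyser or word cloud generator works.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real tools use much longer lists).
STOP_WORDS = {"a", "an", "and", "the", "in", "to", "of", "i", "was", "that",
              "so", "this", "for", "it", "is", "by", "are", "be"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Return (word, occurrences, percentage) tuples, most frequent first.

    The percentage is the word's share of ALL words in the text,
    including the stop words that are left out of the result.
    """
    # Lower-case the text and treat curly apostrophes like straight ones.
    cleaned = text.lower().replace("’", "'")
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", cleaned)
    total = len(words)
    counts = Counter(w for w in words if w not in stop_words)
    return [(w, n, round(100 * n / total, 4)) for w, n in counts.most_common()]


if __name__ == "__main__":
    sample = "The inspectors asked the subject head about the subject."
    for word, occurrences, percentage in word_frequencies(sample):
        print(word, occurrences, percentage)
```

Note that this sketch counts "inspection" and "inspections" as different words, just as the table does. A word cloud generator is, in effect, a way of drawing the top of this filtered list.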

Now you could check students’ essays using a tool like this. For example, in The New DfE Education Technology Strategy: A Textual Analysis I ran the Department for Education’s educational technology strategy through a word cloud generator to see if it really was about educational technology. I discovered that it was, which meant that in theory it was going to be worth my while spending valuable time reading it. (In the event I wished I hadn’t, because it had been written so badly, full of corporate guff which made it virtually unreadable, but that’s another matter.)

In the same way, if you set students an essay, you could use a word cloud generator to check whether they had, broadly speaking, written about what you had asked them to write about. Better still, get them to use it themselves as they go along, and certainly before handing their work in.
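
For anyone who prefers a rough-and-ready script to a picture, the same check can be sketched in Python. The example below reports which of the topic words you expected actually appear among an essay's most frequent meaningful words. Everything in it is a hypothetical illustration: the function name, the tiny stop-word list and the cut-off of ten words are my own assumptions, and it is no substitute for reading the essay.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; extend it to suit your texts).
STOP_WORDS = {"a", "an", "and", "the", "in", "to", "of", "i", "was", "that",
              "so", "this", "for", "it", "is", "by", "are", "be"}


def on_topic_report(essay, expected_words, top_n=10, stop_words=STOP_WORDS):
    """Map each expected word to True if it is among the essay's
    top_n most frequent non-stop words, otherwise False."""
    cleaned = essay.lower().replace("’", "'")
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", cleaned)
    counts = Counter(w for w in words if w not in stop_words)
    top = {w for w, _ in counts.most_common(top_n)}
    return {w: (w.lower() in top) for w in expected_words}


if __name__ == "__main__":
    essay = ("Computing inspections judge how computing is taught. "
             "Inspectors observe computing lessons.")
    print(on_topic_report(essay, ["computing", "inspectors", "history"]))
```

A word that comes back as False does not prove the essay is off target, but it is a prompt to look more closely.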

I have done this myself, on occasion, when I’ve been asked to write an article about a subject I’m not overly familiar with. I have found that it’s easy to get carried away and write and write, gradually going off course without quite realising it. A word cloud generator provides an objective check on whether I am still on target, and shows me when I need to make adjustments.