Introduce data mining to your History course in 2 minutes

Background

Mashape offers Application Programming Interfaces (APIs) that perform common text analysis tasks, including the one we’ll use for this tutorial: Named Entity Recognition (NER). Text analysis is a good way to introduce students to data mining. You can submit either a text document or a URL.


The aim is to introduce non-technical historians and students to data mining. The idea is to present students with a purposefully naive example of data mining, on the assumption that it will prompt them to engage critically with data mining as a method. The questions they explore are in many ways more important than the results you present them with.

Step 1: Get the NER results

  1. Go to the Fordham Modern History Sourcebook and find a document that works for your class. After blindly clicking I’ve chosen ‘Instructions for the Virginia Colony 1606’. Copy the URL.
  2. Sign up to Mashape.
  3. Go to the Mashape text analysis API. Scroll down until you find the Named Entity Recognition tool, copy your document URL into the appropriate field and click Test Endpoint.
  4. At this point you’ll need to enter credit card details. Be sure to subscribe to their ‘Freemium’ service! After your credit card has been accepted click Subscribe. You’ll be returned to the Text Analysis API page.
  5. Scroll back down to the Named Entity Recognition tool, copy your document URL into the appropriate field again and click Test Endpoint.
  6. The results of the query will appear. Copy them (I’ve included my results below). You could get your students to sign up for the service and generate the results themselves, but since a credit card is required it’s probably better to just provide the results for them. At any rate, this is the content of your tutorial.
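
If you would rather script the query than click Test Endpoint, the request could be sketched roughly as follows in Python. Treat everything specific here as an assumption: the endpoint URL is a placeholder, the query parameter name `url` and the `X-Mashape-Key` header name are guesses to check against the API's own page, and `build_ner_request` is just a name I've made up for illustration. The function only builds the request; it doesn't send anything.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint (an assumption): copy the real one from the API's page.
NER_ENDPOINT = "https://example.invalid/text-analysis/ner"


def build_ner_request(document_url, api_key, endpoint=NER_ENDPOINT):
    """Build (but do not send) a GET request asking for entities in a document."""
    # Assumed parameter name "url"; the document address is percent-encoded.
    query = urllib.parse.urlencode({"url": document_url})
    request = urllib.request.Request(f"{endpoint}?{query}")
    # Assumed header name for the API key; check the service's documentation.
    request.add_header("X-Mashape-Key", api_key)
    request.add_header("Accept", "application/json")
    return request


request = build_ner_request("http://example.org/doc.html", "YOUR-KEY")
print(request.full_url)
```

Once the placeholder values are replaced with real ones, passing the request to `urllib.request.urlopen` and reading the body with `json.load` should give you the same kind of result as the web form.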

Step 2: Produce some questions

So, entity extraction for historians – useful? Yes, but. If you’ve got a body of thousands of documents (or millions of emails, perhaps), it can give you an idea of the type of content and tone of the corpus. It can be used to test commonly held opinions about a given document or corpus that were formed when people could only read a subset because of time or labour constraints. In some cases the document set might simply be too large to analyse manually. It can also help to develop secondary websites that present the content using maps and timelines, or to enrich online publications with contextual links.
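
To make the "thousands of documents" case a little more concrete, here is a minimal sketch of how NER results for several documents might be tallied. It assumes each result has already been parsed into a Python dictionary shaped like the result at the end of this post (an `entities` object with lists under keys such as `person`). The function name `tally_entities` is hypothetical, and the second toy document is invented purely for illustration.

```python
from collections import Counter


def tally_entities(responses, category="person"):
    """Count in how many documents each entity of one category appears."""
    counts = Counter()
    for response in responses:
        entities = response.get("entities", {}).get(category, [])
        # A set ensures a name repeated within one document is counted once.
        counts.update(set(entities))
    return counts


responses = [
    # Names taken from the result at the end of this post.
    {"entities": {"person": ["Richard Hakluyt", "Captaine Newport"]}},
    # Invented second document, for illustration only.
    {"entities": {"person": ["Richard Hakluyt"]}},
]

for name, documents in tally_entities(responses).most_common():
    print(name, documents)
```

Counting documents rather than raw mentions is a design choice: it stops one long document from dominating the tally, which is itself a point students could debate.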

Different data mining algorithms offer different perspectives, of course. Latent Dirichlet allocation (LDA), for example, returns common topics. The algorithms tend to have been developed for the sciences, so they work well with factual prose but increasingly poorly the more subjective the text gets (although those cases produce interesting, unexpected results anyway). Data mining algorithms benefit from ‘training’, so the results below are fairly raw.

Questions for students are obvious enough:

  1. ‘Are you confident that the source document used in this tutorial is adequate for our analysis? How could we ensure it’s a perfect copy of the original?’
  2. ‘How useful is data mining to historians?’
  3. ‘Is data mining using an algorithm superior or inferior to manual approaches?’
  4. ‘Should you draw any conclusions from the results without understanding what the data mining algorithm is designed to do?’
  5. ‘Do the results suggest a fully successful analysis, or do some things need tweaking?’
  6. ‘What are the implications of using a machine and algorithms to conduct historical research?’
  7. ‘Will these techniques ever render historians redundant?’
  8. ‘Does it mean students should be taught how algorithms work, so humanists aren’t reliant on ones designed for scientists?’
  9. etc
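
For the question about whether the results need tweaking, students with a little Python could look for keywords the service returned in more than one form. The sketch below is one naive way to do that, under my own assumptions: it lower-cases each phrase and strips a single leading article, and the helper names `normalise` and `group_variants` are hypothetical. The sample keywords are copied from the results at the end of this post.

```python
from collections import defaultdict

LEADING_WORDS = ("the ", "a ", "an ")


def normalise(phrase):
    """Lower-case a phrase and drop one leading article, if present."""
    cleaned = phrase.strip().lower()
    for word in LEADING_WORDS:
        if cleaned.startswith(word):
            return cleaned[len(word):]
    return cleaned


def group_variants(phrases):
    """Map each normalised phrase to the raw variants that collapse onto it."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[normalise(phrase)].append(phrase)
    # Keep only phrases the service returned in more than one form.
    return {key: raw for key, raw in groups.items() if len(raw) > 1}


keywords = [
    "the Virginia Company",
    "Virginia Company",
    "the Privy Seal",
    "Privy Seal",
    "a safe port",
]
print(group_variants(keywords))
```

A rule this simple will not notice that ‘Captain Newport’ and ‘Captaine Newport’ are the same person, which is a useful prompt for discussing early modern spelling and what ‘tweaking’ an algorithm would actually involve.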

Step 3: Further Reading

Named Entity Recognition and History

Crane, Gregory, and Alison Jones. “The Challenge of Virginia Banks: An Evaluation of Named Entity Analysis in a 19th-Century Newspaper Collection.” In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, 31–40. JCDL ’06. New York, NY, USA: ACM, 2006.

Further afield

The readings below will get the students thinking about the use of data mining in historical research. Most of the work I’ve come across has used topic modeling and particularly the Mallet tool, which would be a good next step for anyone interested in taking things further. For projects specific to your historical sub-discipline, take a look at

Blevins, Cameron. “Topic Modeling Historical Sources: Analyzing the Diary of Martha Ballard.” DH2011 Conference, Stanford University, 2011.

Nelson, Robert K. Mining the Dispatch.

Cohen, D., F. Gibbs, T. Hitchcock, G. Rockwell, J. Sander, R. Shoemaker, S. Sinclair, S. Takats, W.J. Turkel, and C. Briquet. Data Mining with Criminal Intent, 2011.

Graham, Shawn, Scott Weingart, and Ian Milligan. “Getting Started with Topic Modeling and MALLET.” The Programming Historian.

Step 4: Next Steps

Experiment with the other text analysis tools offered by Mashape and try different data mining tools, like Mallet.

Results (scroll down to see the entities returned)


Connection: keep-alive
Content-Length: 12883
Content-Type: application/json;charset=UTF-8
Date: Fri, 21 Feb 2014 19:53:04 GMT
Server: nginx/1.4.1 (Ubuntu)
X-Mashape-Proxy-Response: false
X-Mashape-Version: 3.1.11



“text”: “Instructions for the Virginia Colony 1606 < 1600-1650 < Documents < American History From Revolution To Reconstruction and beyond. In the first decade of the seventeenth century England began a second round of colonizing attempts. This time jointstock companies were used as the vehicle to plant settlements rather than giving extensive grants to a landed proprietor such as Gilbert or Raleigh, whose attempts at colonization in the 1570s and 1580s had failed. The founding of Virginia marked the beginning of a twenty-five year period in which every colony in the New World was established by means of a joint-stock company. A variety of motives intensified the colonizing impulse – international rivalry, propagation of religion, enlarged opportunity for individual men – but none exceeded that of trade and profit. The companies were created to make a profit; their in vestments in the colonies were based on this assumption. Early in the 1630’s merchants and investors discovered that they could employ their money in other more rewarding enterprises. After 1631, therefore, no colony was founded by mercantile enterprise, but by that date the enterprisers had left a legacy of colonization that was to endure. In these instructions for the Virginia Company, the power of Spain and the fear derived from past failures invade every line. The detail and precision of the instructions reflect the work of experienced men; Richard Hakluyt, the younger, for example, probably had a hand in writing them.  As we doubt not but you will have especial care to observe the ordinances set down by the King’s Majesty and delivered unto you under the Privy Seal; so for your better directions upon your first landing we have thought good to recommend unto your care these instructions and articles following.  
When it shall please God to send you on the coast of Virginia, you shall do your best endeavour to find out a safe port in the entrance of some navigable river, making choice of such a one as runneth farthest into the land, and if you happen to discover divers portable rivers, and amongst them any one that hath two main branches, if the difference be not great, make choice of that which bendeth most toward the North-West for that way you shall soonest find the other sea.  When you have made choice of the river on which you mean to settle, be not hasty in landing your victuals and munitions; but first let Captain Newport discover how far that river may be found navigable, that you make election of the strongest, most wholesome and fertile place; for if you make many removes, besides the loss of time, you shall greatly spoil your victuals and your caske, and with great pain transport it in small boats.  But if you choose your place so far up as a bark of fifty tuns will float, then you may lay all your provisions ashore with ease, and the better receive the trade of all the countries about you in the land; and such a place you may perchance find a hundred miles from the river’s mouth, and the further up the better. For if you sit down near the entrance, except it be in some island that is strong by nature, an enemy that may approach you on even ground, may easily pull you out; and if he be driven to seek you a hundred miles [in] the land in boats, you shall from both sides of the river where it is narrowest, so beat them with your muskets as they shall never be able to prevail against you.  And to the end that you be not surprised as the French were in Florida by Melindus, and the Spaniard in the same place by the French, you shall do well to make this double provision. 
First, erect a little stoure at the mouth of the river that may lodge some ten men; with whom you shall leave a light boat, that when any fleet shall be in sight, they may come with speed to give you warning. Secondly, you must in no case suffer any of the native people of the country to inhabit between you and the sea coast; for you cannot carry yourselves so towards them, but they will grow discontented with your habitation, and be ready to guide and assist any nation that shall come to invade you; and if you neglect this, you neglect your safety.  When you have discovered as far up the river as you mean to plant yourselves, and landed your victuals and munitions; to the end that every man may know his charge, you shall do well to divide your six score men into three parts; whereof one party of them you may appoint to fortifie and build, of which your first work must be your storehouse for victuals; the other you may imploy in preparing your ground and sowing your corn and roots; the other ten of these forty you must leave as centinel at the haven’s mouth. The other forty you may imploy for two months in discovery of the river above you, and on the country about you; which charge Captain Newport and Captain Gosnold may undertake of these forty discoverers. When they do espie any high lands or hills, Captain Gosnold may take twenty of the company to cross over the lands, and carrying a half dozen pickaxes to try if they can find any minerals. The other twenty may go on by river, and pitch up boughs upon the bank’s side, by which the other boats shall follow them by the same turnings. You may also take with them a wherry, such as is used here in the Thames; by which you may send back to the President for supply of munition or any other want, that you may not be driven to return for every small defect.  You must observe if you can, whether the river on which you plant doth spring out of mountains or out of lakes. 
If it be out of any lake, the passage to the other sea will be more easy, and [it] is like enough, that out of the same lake you shall find some spring which run[s] the contrary way towards the East India Sea; for the great and famous rivers of Volga, Tan[a]is and Dwina have three heads near joynd; and yet the one falleth into the Caspian Sea, the other into the Euxine Sea, and the third into the Paelonian Sea.  In all your passages you must have great care not to offend the naturals [natives], if you can eschew it; and imploy some few of your company to trade with them for corn and all other . . . victuals if you have any; and this you must do before that they perceive you mean to plant among them; for not being sure how your own seed corn will prosper the first year, to avoid the danger of famine, use and endeavour to store yourselves of the country corn.  Your discoverers that pass over land with hired guides, must look well to them that they slip not from them: and for more assurance, let them take a compass with them, and write down how far they go upon every point of the compass; for that country having no way nor path, if that your guides run from you in the great woods or desert, you shall hardly ever find a passage back.  And how weary soever your soldiers be, let them never trust the country people with the carriage of their weapons; for if they run from you with your shott, which they only fear, they will easily kill them all with their arrows. And whensoever any of yours shoots before them, be sure they may be chosen out of your best marksmen; for if they see your learners miss what they aim at, they will think the weapon not so terrible, and thereby will be bould to assault you.  Above all things, do not advertize the killing of any of your men, that the country people may know it; if they perceive that they are but common men, and that with the loss of many of theirs they diminish any part of yours, they will make many adventures upon you. 
If the country be populous, you shall do well also, not to let them see or know of your sick men, if you have any; which may also encourage them to many enterprizes.  You must take especial care that you choose a seat for habitation that shall not be over burthened with woods near your town; for all the men you have, shall not be able to cleanse twenty acres a year; besides that it may serve for a covert for your enemies round about.  Neither must you plant in a low or moist place, because it will prove unhealthfull. You shall judge of the good air by the people; for some part of that coast where the lands are low, have their people blear eyed, and with swollen bellies and legs; but if the naturals be strong and clean made, it is a true sign of a wholesome soil.  You must take order to draw up the pinnace that is left with you, under the fort: and take her sails and anchors ashore, all but a small kedge to ride by; least some ill-dispositioned persons slip away with her.  You must take care that your marriners that go for wages, do not mar your trade; for those that mind not to inhabite, for a little gain will debase the estimation of exchange, and hinder the trade for ever after; and therefore you shall not admit or suffer any person whatsoever, other than such as shall be appointed by the President and Counsel there, to buy any merchandizes or other things whatsoever.  It were necessary that all your carpenters and other such like workmen about building do first build your storehouse and those other rooms of publick and necessary use before any house be set up for any private person: and though the workman may belong to any private persons yet let them all work together first for the company and then for private men.  
And seeing order is at the same price with confusion, it shall be adviseably done to set your houses even and by a line, that your street may have a good breadth, and be carried square about your market place and every street’s end opening into it; that from thence, with a few field pieces, you may command every street throughout; which market place you may also fortify if you think it needfull.  You shall do well to send a perfect relation by Captaine Newport of all that is done, what height you are seated, how far into the land, what commodities you find, what soil, woods and their several kinds, and so of all other things else to advertise particularly; and to suffer no man to return but by pasport from the President and Counsel, nor to write any letter of anything that may discourage others.  Lastly and chiefly the way to prosper and achieve good success is to make yourselves all of one mind for the good of your country and your own, and to serve and fear God the Giver of all Goodness, for every plantation which our Heavenly Father hath not planted shall be rooted out.”,

“entities”: {

“location”: [






“India Sea”,

“Caspian Sea”,

“Euxine Sea”


“keyword”: [


“the Virginia Colony 1606 < 1600-1650 < Documents”,

“American History”,



“Virginia Colony”,

“American History From Revolution To Reconstruction”,

“the seventeenth century England”,

“colonizing attempts”,

“jointstock companies”,

“plant settlements”,

“extensive grants”,

“a landed proprietor”,

“Gilbert or Raleigh”,


“a twenty-five year period”,

“a joint-stock company”,

“New World”,


“the colonizing impulse”,

“international rivalry”,




“individual men”,

“trade and profit”,


“merchants and investors”,

“mercantile enterprise”,

“the Virginia Company”,

“Virginia Company”,

“experienced men”,

“the younger”,

“especial care”,


“the Privy Seal”,

“Privy Seal”,


“a safe port”,

“some navigable river”,


“runneth farthest”,


“portable rivers”,

“hath two main branches”,


“Captain Newport”,


“the strongest , most wholesome and fertile place”,


“great pain transport”,

“small boats”,

“fifty tuns”,






“this double provision”,

“some ten men”,

“a light boat”,




“the native people”,

“the sea coast”,

“whereof one party”,





“Captain Gosnold”,

“these forty discoverers”,

“any high lands or hills”,







“every small defect”,

“doth spring”,



“[it ]”,

“the East India Sea”,

“the great and famous rivers”,

“Volga , Tan[a]is and Dwina”,


“the Caspian Sea”,

“the Euxine Sea”,

“the Paelonian Sea”,

“East India Sea”,

“Paelonian Sea”,

“great care”,



“the country corn”,



“hired guides”,

“the great woods or desert”,

“the country people”,


“common men”,

“your sick men”,



“twenty acres”,

“the good air”,

“swollen bellies and legs”,

“a true sign”,

“a wholesome soil”,


“a small kedge”,

“some ill-dispositioned persons”,






“any private person”,

“any private persons”,

“private men”,


“a good breadth”,


“your market place”,

“market place”,

“a perfect relation”,


“Lastly and chiefly”,

“good success”,

“our Heavenly Father hath”,

“Heavenly Father”


“date”: [







“money”: [



“person”: [


“Richard Hakluyt”,

“Captaine Newport”


“phone”: [