Friday, 28 October 2011

Chronicling America: Historic American Newspapers, 1836-1922


Yesterday we had a visit from Deborah Thomas from the National Digital Newspaper Program at the Library of Congress, who gave a presentation on the Library's Chronicling America website. I thought I'd write up my notes from her presentation here for anyone who was unable to attend and also for future reference.

About Chronicling America and the National Digital Newspaper Program

Chronicling America (http://chroniclingamerica.loc.gov) is a website that provides access to digitised pages of selected American newspapers from 1836-1922. It is hosted and maintained by the Library of Congress as part of the National Digital Newspaper Program, a partnership between the Library and the National Endowment for the Humanities that has been running since 2005. The program funds state projects to digitise historic newspapers from this time period (each grant funds the digitisation of 100,000 pages), which are then made available through the Chronicling America site. It builds on the earlier United States Newspaper Program, which inventoried, catalogued and microfilmed some 75 million pages from 140,000 historic American newspapers over a period of 25 years. The current program aims to make a representative sample of those pages available online - right now there are almost 4.3 million pages from 25 states (and the District of Columbia), and it is hoped that eventually (within 15-20 years) all states will be covered. The latest three states to join are Indiana, North Dakota, and West Virginia, and their content will start to be made available next spring.

This state-specific approach is driven by the nature of newspaper publishing and newspaper collections in the United States - local institutions have the most comprehensive collections for their area, and there is no single national newspaper collection of the kind found in many other countries. However, aggregating the content at a national level on the website provides a much better and more accessible online service to researchers, and this is one of the main goals of the project.

Each state is responsible for selecting the newspapers and editions that it will digitise, and so the type of content that you will find within the site varies from state to state. Some have picked long runs of a few large papers; others have picked a lot of smaller titles - Kentucky, for example, wanted to ensure that each of its 120 counties was represented in its sample. As part of the process, the contributing institutions were asked to write a short essay for each newspaper title that they digitised, explaining why they chose that title and its significance and context. These essays are all available as part of the bibliographic record for the title within the US Newspaper Directory (of which more later in this post), so you can see the rationale behind what could otherwise seem like a fairly arbitrary sample.

The material being digitised is all within the date range of 1836-1922.  The end date is determined by US copyright law, whose threshold is 1923 - all content from before that date is in the public domain and may therefore be freely digitised and made available by the Library in this way, whereas anything more recent would require the permission of a complicated network of rights-holders.  This does mean that you are able to download, copy and re-use any of the content from the site entirely freely yourself!  The start date is rather more arbitrary, but was chosen because of the increasing difficulty of digitising the older material and because the format of newspapers changes and is less consistent the further back in time you go.

Searching the site

The Chronicling America interface provides several different routes into the content. On the main page you will find a search bar which defaults to the basic full-text search - this allows you to search for terms and restrict your search to a specific state and/or date range. The search box here acts as a proximity search, so if you enter more than one term it will search for instances where those terms appear within five words of each other. The digitised pages have been converted for full-text search using Optical Character Recognition (OCR), which has a variable success rate - accuracy is always less than 100%, and in some instances can even be less than 50%. However, chances are that if you are searching for significant terms, those words will turn up more than once in an article, which does compensate for this a little!
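To make the idea of a proximity search a bit more concrete, here is a minimal sketch in Python of what "within five words of each other" means. This is purely my own illustration and not how the site's search engine is implemented: the function name within_proximity, the simple tokenising rule and the sample sentence are all assumptions made for the example.

```python
import re
from itertools import product


def within_proximity(text, terms, max_distance=5):
    """Return True if all terms occur within max_distance words of each other.

    Illustrative only: a real search engine would use an index rather
    than scanning the text like this.
    """
    # Split the text into lower-case words (hyphens act as word breaks).
    words = re.findall(r"[a-z0-9']+", text.lower())

    # For each search term, record every position at which it occurs.
    positions = [
        [i for i, word in enumerate(words) if word == term.lower()]
        for term in terms
    ]

    # No terms given, or at least one term never appears: no match.
    if not terms or not all(positions):
        return False

    # Try every combination of one position per term and check whether
    # the furthest-apart pair is close enough.
    return any(
        max(combo) - min(combo) <= max_distance
        for combo in product(*positions)
    )


# Hypothetical sample sentence, invented for this example.
sample = "The ping-pong craze has reached the city and every parlor table is now a court"

print(within_proximity(sample, ["craze", "city"]))   # four words apart
print(within_proximity(sample, ["ping", "court"]))   # fourteen words apart
```

The first call matches because "craze" and "city" are four words apart, while the second does not because "ping" and "court" sit at opposite ends of the sentence.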


The advanced search gives you more options for constructing your search terms. It allows you to restrict your search to more than one state and/or newspaper title, and to enter date ranges by day and month as well as by year.


The next tab (All Digitized Newspapers) is the means by which you can browse the collection, either in its entirety, or by state, ethnicity, or language.  The newspapers that are included are almost entirely English-language, though there are two in French, two Hawaiian, three Spanish and one Choctaw title and more French and Spanish content will be added later this year. 

Once you have found some results, the page image interface is really nice and smooth, with scrolling zoom and click and drag, and really high quality scans. You can navigate through the full edition of that paper, and view or download each page as text or as a PDF (with the bibliographic information attached, ready for referencing). You can also 'clip' images to get an image of part of a page - the 'clip image' button will essentially cut out whatever is visible in the viewer at that moment. All the page images, and any clipped images that you make, have persistent URLs, which means you can bookmark or save the link and you will always be able to get back to it.


US Newspaper Directory, 1690-


As well as the full-text material in Chronicling America, the site also provides access to the full US Newspaper Directory.  This contains bibliographic information about all 140,000 titles that were catalogued as part of the United States Newspaper Program, along with information about which libraries hold them. These bibliographic records also include links to full-text digitised versions both within Chronicling America and on other sites where available.  Quite often if something is in Chronicling America it will also be available online from the institution which scanned it in the first place, so you are likely to find some titles duplicated.  For the titles that are included in Chronicling America, this is where you will find the contextual essays written by the people that selected the title for inclusion.

Topics in Chronicling America

The Library of Congress also provides around 100 topic guides to the site, which are designed to give you starting points for researching a whole variety of events, subjects and themes within Chronicling America. The topics available are quite a random selection, ranging from Presidential administrations and elections, to events such as the Haymarket Affair or the Annexation of Hawaii, to people like Booker T. Washington and Nikola Tesla, as well as all sorts of other things like the ping-pong craze of 1900 to 1902. Each topic guide gives you a bit of the background as well as links to sample articles and (perhaps most usefully) suggested search terms to use. You can see a full list of the topics at http://www.loc.gov/rr/news/topics/topics.html, but check back again in the future as more are being created and added.

Keeping up-to-date with the site

Chronicling America is growing rapidly and continually, so it's worth going back again and again to see if new material has been added that is relevant to your research.  The site also provides you with ways to keep up with the new content as it becomes available and to be alerted when there are additions.  You can subscribe to general updates (of content as well as points of interest and research) both by RSS and email, or subscribe to an RSS feed of just the new content as it is added, from the subscribe link available throughout the site.


And finally...

All the data from the site is freely available for re-use and can be obtained via the site's API. To see an example of one of the innovative uses of the data from the US Newspaper Directory, see Stanford University's data visualisation of the growth of newspapers across the United States from 1690-2011.
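For anyone who wants to experiment with the API, here is a small, hedged sketch of how a request URL for a full-text page search might be put together in Python. The path /search/pages/results/ and the andtext and format parameters are based on my understanding of the site's API documentation rather than anything covered in the presentation, so treat them as assumptions and check the current documentation before relying on them. The helper name page_search_url is my own. The sketch only builds the URL and does not contact the site.

```python
from urllib.parse import urlencode

# Assumed base address of the site (the post above gives the http:// form).
BASE_URL = "https://chroniclingamerica.loc.gov"


def page_search_url(terms, response_format="json"):
    """Build a URL for a full-text page search.

    The path and the 'andtext' and 'format' parameter names are
    assumptions taken from the site's API documentation as I understand
    it; they may have changed.
    """
    query = urlencode({"andtext": terms, "format": response_format})
    return f"{BASE_URL}/search/pages/results/?{query}"


print(page_search_url("ping pong"))
```

Pasting the resulting URL into a browser, or fetching it with any HTTP client, should be enough to see what the service returns.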