Making linguistic analysis accessible

The Research and Development Unit for English Studies is a corpus linguistic research group, working to uncover real-world linguistic patterns and trends.

City Centre Campus and Eastside Park

Research summary

The Research and Development Unit for English Studies has built an international reputation through our work to develop novel computational, statistical, and linguistic methods. We have released a range of free-to-use software to support detailed linguistic analysis.

Research background

The group has undertaken a range of projects, culminating in a suite of linguistic tools.

WebCorp Live was designed to test the hypothesis that the web could provide evidence of rare, new, and changing language use, complementing evidence from offline texts. Previous linguistic research on the web required manual processing of web pages, but WebCorp Live accesses web pages automatically via commercial search engines such as Google and produces examples of words and phrases for study. Because it can also search in multiple languages, it can play a key role in supporting language teaching and translation.

A sister project, the WebCorp Linguist’s Search Engine, saw the team build a bespoke, large-scale collection of web texts for advanced linguistic and statistical analysis. This sample of the web covered a variety of document formats, content types and subjects. It was followed by the launch of WebCorp Learn, a tool designed specifically to support interactive English language learning, which has been integrated into courses at German secondary schools.

This was taken a step further with the release of OurSurveySays, a management tool for open-text survey analysis and insight. It provides a web-based visualisation package that can be used by non-specialists – for example marketing strategists, academic planners, and course directors – to analyse text-based survey responses. This allows organisations to undertake detailed analysis and make tactical interventions in response to findings.

Addressing a different problem, the eMargin project tackled the limitations of traditional close reading in modern teaching settings: notes on physical texts become cluttered and are not easily shared or reused, and recreating class-based close reading for distance-learning students is particularly challenging. With no existing solution readily available, the team developed eMargin, a web-based annotation tool. The tool not only enables collaboration and discussion across multiple locations but also retains a digital record of students’ progress. Although designed for teaching, it can also be used for collaborative textual annotation in other settings.

Outcomes and impact 

Our online WebCorp, eMargin and OurSurveySays tools bridge significant gaps in textual analysis, enabling novel teaching practices and enhanced insights from otherwise unmanageably large datasets. 

With over 5,000 monthly users in 190 countries, our software has:

  • Facilitated data-driven language teaching in higher education institutions around the world
  • Augmented English language teaching in German secondary schools
  • Enriched the teaching of literary analysis and textual interpretation in higher education, further education, and schools, with particular growth during the COVID-19 pandemic
  • Improved the accuracy and efficiency of professional translation services
  • Enabled the Belgian and Alicante chapters of Podemos to formulate policy collaboratively
  • Informed decision-making by university planners and management at five UK institutions.

The WebCorp tools have also featured in over 1,700 publications by researchers across a range of disciplines.

REF 2021

Birmingham City University's hub for everything related to the Research Excellence Framework.



Dr Andrew Kehoe

Associate Professor / Director of Research

Andrew Kehoe is Associate Professor in Linguistics and Director of Research in English. He studied at the University of Liverpool, gaining qualifications in both English and Computer Science. For over 20 years he has worked on and led a series of UKRI-funded projects in the field of Corpus Linguistics: the automated analysis of patterns and trends in large text collections to discover how language is used by different groups in society. Andrew leads the Research & Development Unit for English Studies (RDUES), an interdisciplinary team developing software tools which are used by hundreds of thousands of people worldwide to analyse textual data in research, teaching and commercial contexts.


Matt Gee

Research Fellow in Linguistics

Matt Gee develops research and teaching tools in the Research and Development Unit for English Studies (RDUES). This work includes the creation of WebCorp LSE, a fully-fledged search engine designed to treat the web as a source of linguistic data. WebCorp LSE can download and clean up texts from the web and present them through a search interface for corpus-linguistic analysis, including wildcard and part-of-speech search, concordancing, collocation, and change over time.
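For readers unfamiliar with concordancing and collocation, the Python sketch below illustrates the basic idea. A keyword-in-context (KWIC) concordance lists each occurrence of a search term with a few words of surrounding context, and a simple collocate count tallies which words appear near it. This is a minimal illustration under stated assumptions, not WebCorp LSE's implementation: the helper names `kwic` and `collocates`, the regex tokenisation and the three-word window are all hypothetical choices. The collocate count uses raw co-occurrence frequency rather than the statistical association measures (such as mutual information) typically used in corpus linguistics.

```python
import re
from collections import Counter


def kwic(text, keyword, width=3):
    """Hypothetical helper: return (left, node, right) concordance lines,
    with `width` tokens of context either side of each case-insensitive
    match of `keyword`. Tokenisation is a simple regex (an assumption)."""
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(text, keyword, width=3):
    """Hypothetical helper: count lower-cased words occurring within
    `width` tokens of `keyword` (raw frequency, no association measure)."""
    counts = Counter()
    for left, _, right in kwic(text, keyword, width):
        counts.update(w.lower() for w in (left + " " + right).split())
    return counts


sample = "The web is a corpus. Linguists search the web for rare words on the web."

# Print a right-aligned concordance with the node word in brackets.
for left, node, right in kwic(sample, "web"):
    print(f"{left:>25} [{node}] {right}")

# Most frequent words in the context window around "web".
print(collocates(sample, "web").most_common(2))
```

Aligning the node word in a fixed column, as the print loop does, is the conventional KWIC layout: it lets a reader scan the left and right context for recurring patterns.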