Presented by: Ben Showers (Jisc), Joy Palmer (Mimas) and Graham Stone (University of Huddersfield)
LAMP blog: http://jisclamp.mimas.ac.uk
Slideshare link: http://www.slideshare.net/UKSG/qs1-group-a
Ben’s role at this breakout session was to introduce the topic and the two speakers. He pointed out that most of the session's topics aren’t absolutely new; Jisc has been working with library data since 2007 (projects with Huddersfield, Mimas etc.). Jisc LAMP (the Library Analytics and Metrics Project), or LAMP, has the same aims but a much bigger scale. The terminology has also changed: 'activity data' has become 'analytics'. The aims include supporting service development and improving user experience, as well as lowering the barriers to use of this data.
Graham Stone then gave context for LAMP and an update on the project's last year. Dave Pattern has been in post at Huddersfield for about 10 years, experimenting with the added-value features of sites like Amazon and Tesco and wondering why library sites can’t be more like them. Four years ago, Graham and the Huddersfield library director presented data from their library equality review (and from the Huddersfield registry), stating that they believed they'd found a link between library usage (analytics) and student attainment; however, others wanted more proof.
Not long after this, they were funded by Jisc for the Library Impact Data Project (across 8 universities). The hypothesis was that there was a significant correlation across universities between library activity data and student attainment, examined by looking at final grade achieved, books borrowed, e-resource accesses, school or faculty of each student, etc. The project showed a statistically significant relationship between grade and library usage across the project (33,000 students included). It is important to note that this is NOT a cause-and-effect relationship! Nor, strictly speaking, is it a correlation: the hypothesis wasn’t workable as originally worded, and the wording has been considered more carefully since. They then got Phase 2 money from Jisc to delve more deeply into the Huddersfield data for 2,000 full-time undergraduates. The additional data for Phase 2 covered demographics, academic disciplines, retention data, on/off-campus use, breadth and depth of e-resource use, and UCAS points (entry data) - all of which supported the findings of Phase 1. Their conclusions: there is statistical significance for demographics such as age, gender, ethnicity and country of origin (for example, Chinese students used e-resources less); there is statistical significance across top-level subjects and within these disciplines; there is a connection between library use and retention; and the depth/breadth of a collection may make a difference. At the same time, other surveys were looking at the importance of analytics to academic libraries, as well as the issues of sharing usage data with other institutions [most institutions (91%) do not mind if institutions are anonymised or benchmarked].
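The distinction the speakers drew - a statistically significant relationship between usage and grades, not a cause-and-effect claim - can be illustrated with a toy calculation. This is a minimal sketch on invented numbers, not the project's actual data or method:

```python
# Illustrative sketch on invented numbers - NOT the project's real data or test.
# The project looked at library usage across degree-grade groups; here we
# simply compute the mean number of loans per grade band.
from statistics import mean

# (final grade, books borrowed) - synthetic student records
records = [
    ("First", 62), ("First", 55), ("2:1", 40), ("2:1", 38),
    ("2:2", 21), ("2:2", 25), ("Third", 9), ("Third", 12),
]

# Group the loan counts by grade band
by_grade = {}
for grade, loans in records:
    by_grade.setdefault(grade, []).append(loans)

for grade in ("First", "2:1", "2:2", "Third"):
    print(f"{grade}: mean loans = {mean(by_grade[grade]):.1f}")
```

A real analysis would apply a formal significance test across the groups and, as the speakers stressed, even a significant result would not show causation.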
All of this was the jumping-off point for LAMP, a joint project between Mimas and Huddersfield (with assistance from Ellen Collins). Joy then came in to talk about LAMP. She is also involved with Copac at Mimas and has studied the use of activity data to drive functionality.
The main question of the project is: can they collect data from institutions and create tools that allow libraries to analyse how their resources are being used, when, and by whom? Many services include dashboards that let subscribers see analytics, but these don't really answer this question. LAMP is also looking at what can be automated to assist users, and at the benefits of scale. Scale is especially important for benchmarking against other institutions (about which there are some reservations); there can be no benchmarking without a critical mass of data, and it’s too early for that yet. What data can they use or get hold of? UCAS data, loan data, e-resource logins etc., but not data on usage of individual items… yet. They need to look at this for the future (as item-level data would be important for collections management), but for the moment the focus is on users, not items.
Six institutions are contributing data: Huddersfield, Salford, Wolverhampton, Exeter, De Montfort and Manchester. There are barriers to providing data; it is a very mixed picture across the UK of where data sits at an institution and how easy it is to provide it for a project like this (business cases should incentivise provision). Also, the data wrangling required - getting, analysing, cleaning and providing - was not glamorous.
Joy added a brief (important) word on ethics: should they be holding and analysing this type of data? There are concerns about Big Brother and data protection issues, and about telling the wrong kinds of stories… also, all students pay the same fees - shouldn’t they be treated the same? But what if they didn’t do this? What would the reaction be if institutions had this data but didn’t act on it? Institutions have a duty of care for the individual wellbeing of their students. The project can also show how it complies with data protection requirements.
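One concrete data-protection safeguard in this space (a common disclosure-control technique, not necessarily what LAMP itself implements) is small-cohort suppression: any aggregated figure drawn from fewer than k individuals is withheld, so small groups cannot be re-identified. A minimal sketch with invented cohort names and counts:

```python
# Small-cohort suppression - a common disclosure-control technique.
# Cohort names and counts are invented for illustration.
K = 5  # minimum cohort size before a figure is released

cohorts = {
    "overseas PG, e-resource users": 3,   # too small: could identify individuals
    "home UG, e-resource users": 412,
}

# Replace any count below the threshold with None (i.e. suppressed)
released = {name: (n if n >= K else None) for name, n in cohorts.items()}
print(released)
```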
Working with an API to present the data also brings up questions: how should users work with data? What do they want to be able to do with it? Also: what do users do? What does the system do? There are many 'epic user stories' and use cases to cite; they don’t always help with the day-to-day uses of the data but they can still lead in the right direction.
Joy showed sketches of ideas for the interface - not glamorous, but some interesting ideas came out. There are also 'job stories' - step-by-step workflows showing how people in different positions might want to work with the data. Working with data from these first universities showed that the idea of a shared library analytics service is feasible. They are also continuing to demonstrate correlations between usage and attainment, usage and cohort, and attainment and cohort.
Charts and graphs from the included data were shown, including a pie chart using live data of male vs. female students and numbers of library loans. This can be taken out of context to tell a certain story, so is it always accurate? They can potentially signal whether findings are statistically significant or not. Where exactly does the user journey or workflow begin? How much do we assume users are analysing the data? Joy went through a test query created by Ellen Collins: ‘Here’s a simple question: how do humanities and social science students use books?’ It showed the different ways you can pitch this question, including by mean book borrowing, or just by discipline (regardless of the relative size of each discipline), which can inform different types of decisions (other factors may need to be considered, including who’s not borrowing, part-time vs. full-time students, etc.).
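Ellen Collins' test query is a good example of how the same data can be pitched two ways. In this sketch on invented figures (not LAMP data), raw borrowing totals favour the bigger cohort while per-student means favour the smaller one - and each framing supports a different decision:

```python
# Invented loan counts per student, grouped by discipline - not LAMP data.
from statistics import mean

loans = {
    "Humanities": [30, 42, 18],               # smaller cohort
    "Social Sciences": [25, 20, 22, 28, 30],  # larger cohort
}

for discipline, per_student in loans.items():
    # The raw total ignores the relative size of each discipline;
    # the mean per student controls for it.
    print(f"{discipline}: total = {sum(per_student)}, "
          f"mean per student = {mean(per_student):.1f}")
```

Here Social Sciences borrows more in total, but Humanities borrows more per student - which is why the speakers noted that the way the question is pitched can inform different types of decisions.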
There is now funding for Phase 2 of LAMP, and they are testing the ‘ugly prototype’ and working through ideas. LAMP wants to make data beautiful and compelling, and the dashboard UI will be created through iterative testing and development. Other issues being looked at include ‘profiling’ individuals (what are the ethical or legal issues?), whether Shibboleth can be used to more fully understand e-resource usage, integration of NSS (student satisfaction) data and SCONUL statistics, data literacy (what does it mean in this context? who needs it? what needs to be automated and what needs to be taught as a skill set?) and benchmarking (is it the killer app? is there a business case for the service if it doesn’t provide the capability to compare across institutions? if you do it, how does it work?). LAMP will be holding a workshop with SCONUL on 7 May, and will continue to work on the business case after that. For now, they are focusing on the user interface but looking for more data contributors (and holding more LAMP workshops in future).
There was a question and answer session at the end which was very lively - I will include a few queries here. When asked about using RAPTOR, Graham said he had thought it could be key, but it anonymises too much via Shibboleth (EZProxy is less anonymising) - it could work if tweaked slightly; Ben said there was an attempt to bring JUSP and RAPTOR together but it didn’t work fully. Another person mentioned the flip side: re-presenting the anonymised data back to students could be very useful - had the team thought of this? Graham mentioned Library Game; Joy talked about knowledge infrastructures and visualisations as well as routes through information - this is ambitious, but there is so much more that can be done for research purposes; Ben said the student view on data had been considered since the beginning, but the current project needs to be tackled first. A publisher asked: would the data be fed back to publishers? This is what they would love to have and are unable to get! Graham said this was not the first publisher to say that; Joy agreed that they can start to think about how this would connect with the REF and the provision of resources. Someone asked about tracking the impact of materials in repositories, OA resources etc.; Joy mentioned that IRUS is aggregating access to Open Access resources. The repository is still seen by many academics as a step or hurdle rather than a benefit - you would need other metrics to measure that impact in the brave new world of the web. Finally, one person asked whether there are certain types of data that would be too difficult to track, with too many confounding factors - how do you identify the ‘right’ kind of data? Joy said this comes back to data literacy - it is always very important to highlight that data is NOT truth and is not a scientific object in its own right.
That’s why it’s important to train, share, discuss etc. - we have to be more like social scientists in considering variances and how watertight the visualisations are. Ben pointed out that LAMP isn’t the answer, more part of the dialogue; Graham agreed, pointing out that none of this is absolute - don’t cancel a journal immediately if the stats are low, you must ask why first!