In this interview Chris Csonka and Megan Carroll talk with Data Curator and Metadata Librarians Brendan Honick and Jackie Uranic about their work and projects at PSC.
CC: Brendan and Jackie, welcome. What is your role here at PSC? What’s your wheelhouse?
BH: My two projects are HuBMAP and SenNet. I work internally with software developers and make sure all of our various systems are in place to take in new data and publish them. I also work externally with data providers from different institutions.
There are universities across the country that have their own research centers focused on different biomedical applications or experiments. Working with them requires not just me, but also our Project Managers (PMs) like Diane [Eshelman] or Holly [Klinesmith] to meet with these data providers and understand their needs. There’s a balance between working with our own internal teams and working with all of these third-party partners.
MC: Jackie, what about you? What’s your wheelhouse here at PSC?
JU: I work primarily with submission of HuBMAP and SenNet data to external repositories, namely dbGaP and CFDE. This is kind of abstract, but I also work on design, in the sense that business processes, data structures, etc., need a design. Thinking about how to do that efficiently, but in a way that makes sense for people. Thinking about the user experience and how it can work for them. Trying to foresee issues that might come up later and design around them. Knowing that you can’t prevent everything, but can minimize the impact on people and make the best use of the available resources.
MC: So you have the same title but you each have your own specialty.
BH: I’ll give you an example of that in SenNet. Over the last year we’ve been developing a new metadata schema to describe whole mice as experimental subjects. I come at this from a metadata background and don’t have a biology background, but Mariah [Kenney] does. So we had this collaboration where we were both meeting with the same data providers, but we were asking different types of questions, ultimately for the same goal.
CC: Let’s talk about some of the other projects you have been working on recently.
BH: Absolutely. In mid-May for the HuBMAP Annual Meeting, Jackie and I designed and led a workshop for anyone in HuBMAP who wanted to learn more about the process of data submission and ingestion. That involved a lot of thinking about how to convey a good user experience that would be easy to follow for everyone in the audience, knowing that they were coming at data submission from different perspectives. Not just the various rules but their own familiarity with the processes that we have in place at HuBMAP.
JU: What Brendan brought to the workshop was more than just a technical explanation. Because of his in-depth experience with metadata, he was able to convey why it’s important to do these things and the impact it has for data providers. There’s a bigger perspective on why this is important, why it works for HuBMAP and how it fits together.
BH: What I’ve really appreciated about PSC is that we get a lot of support for doing these kinds of things, not just our projects. I did research on metadata in grad school and I continue to do my own metadata research, specifically with a tool called OpenRefine. I’ve also been able to present my research at conferences. I gave a guest lecture at the Master of Data Analytics for Science program here, when Joel [Welling] was teaching a class this spring. In a few months, I’m giving another lecture at the University of Hawaii’s Master’s of Library Science program.
I’m also part of the Academy of Health Information Professionals (AHIP), which I only joined after I started at PSC. I knew I wanted to become a certified medical librarian, and there’s a whole process through the academy for doing that. If I were at a different employer, I might not have had that same support. That’s something I’ve really appreciated here at PSC, the ability to do stuff like that and expand my professional horizons.
CC: Your title, Metadata Librarian, tell me more about that.
BH: I went to library school where I studied metadata. They interviewed me and asked me, “Brendan, where do you see yourself after grad school?” And I said, “metadata librarian in a research setting.” I knew I liked metadata or the concept of cataloging or describing pieces of information. Librarians do so many different things in public and school libraries and academic settings. I have my own niche with metadata and I took some classes in grad school about data curation. I didn’t think that I was going to go into it immediately.
It was really interesting to me and the opportunity came up at PSC. I saw this as an intriguing venue where I wasn’t just doing metadata librarianship. I was also expanding into a different but related field, curation. So, the metadata librarian part is thinking about how we organize and describe these data sets … data curators are people obsessed with data, learning about it and what we can do with it.
CC: How does your work here compare with other places that you have worked before?
BH: This is my first job after grad school, other than various internships … My last real job was a barista at Starbucks, 5 years ago.
MC: How about you, Jackie?
JU: I haven’t worked in research before. My previous work experience is mostly in higher education, and also some nonprofit and corporate environments. So the difference here is about research and the fact that everything that we do helps people and involves groups working together, and that there’s a larger focus. I don’t want to say loftier ideals because that sounds a little cheesy, but, you know, that’s part of it, and I would say that’s the big difference to me.
I had heard about PSC before I started working here. Actually about a month before I saw the listing for this job, I was on a tour of the Large Scale Systems Museum in New Kensington where they have old computer equipment. They have one of the [computer] cores from Bridges that they used for COVID research. I said, “Wow, I can’t believe you have something from PSC here!” and the museum director was telling me about how he acquired it. A few weeks later when I saw this job I thought, that’s so crazy, just the synchronicity of that. But the interview was fun, I got to gush about the fact that I was a PSC fan.
And it’s been very helpful working with and learning about biology from Mariah. When we were just at the CMU Ethics and AI conference, during one of the presenters’ talks they were explaining transcriptomics, GeoMx, etc., and they were going through some of the concepts pretty rapidly. I asked Mariah, what did they just say? And she said, it’s proteomics, and spelled it out for me so I could look it up. That’s very helpful.
CC: It’s clear what you find rewarding about your work … What are some things you find more challenging?
BH: There are a lot of moving parts for any of our roles. Something that I got used to doing was organizing all the various tasks and issues that need to be addressed from day to day. I use my own labeling system for different tasks and issues. I try to keep things organized and that’s where I’ve been appreciative of the work that our various project managers do. It’s been challenging, but it’s also been great working cooperatively with everyone at PSC. We’ve addressed the challenges in a good way and I’m happy to look back over the last year.
JU: There are a lot of things here that change constantly, either that’s the nature of them or they literally are new — a new consortium, new projects, etc. So just keeping up with that is the challenge. But again, that’s also the fun of it. The challenge is what makes it exciting. I don’t have any repetitive days at work here, which is really a great thing.
CC: Brendan and Jackie, where do you see your work down the road?
BH: We’ve had some discussions over the last few months about metadata, specifically across disciplines. The beauty and the curse of metadata is that everyone has a different way of organizing how they describe information. There are a ton of different schemas and standards for other disciplines and domains. So there’s this idea of a metadata crosswalk. This is a term used throughout information science where you’re taking various metadata schemas and trying to match up the equivalent elements between them. There are opportunities for further research to start consolidating these standards and schemas. It’s great that everyone has a way to describe stuff, but it also makes accessibility harder. When everyone is describing stuff in different ways, it makes it harder for the end user, the scientific community, and the public to search for stuff. There are a lot of avenues for exploration at play there.
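[Editor’s illustration: the crosswalk idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how HuBMAP, SenNet, or any partner implements it. The two source schemas, every field name, and the sample records are hypothetical and were made up for this example.]

```python
# Hypothetical crosswalk: element names in two made-up source schemas,
# each mapped onto one shared target vocabulary. None of these field
# names come from HuBMAP, SenNet, or CEDAR.
CROSSWALK = {
    "schema_a": {"donor_age": "age", "organ": "tissue_type", "assay": "assay_type"},
    "schema_b": {"subject_age_years": "age", "tissue": "tissue_type", "method": "assay_type"},
}


def apply_crosswalk(record, schema):
    """Rename a record's elements to the shared vocabulary.

    Returns (mapped, unmapped): elements that have a known equivalent,
    and elements the crosswalk does not cover, kept for a curator to review.
    """
    mapping = CROSSWALK[schema]
    mapped, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            mapped[mapping[field]] = value
        else:
            unmapped[field] = value
    return mapped, unmapped


# Two made-up records describing similar samples in different terms.
record_a = {"donor_age": 54, "organ": "kidney", "assay": "CODEX"}
record_b = {"subject_age_years": 61, "tissue": "kidney", "method": "CODEX", "freezer_id": "F-12"}

mapped_a, unmapped_a = apply_crosswalk(record_a, "schema_a")
mapped_b, unmapped_b = apply_crosswalk(record_b, "schema_b")

# Once both records share element names, a single query covers both sources.
kidney_records = [m for m in (mapped_a, mapped_b) if m["tissue_type"] == "kidney"]
```

[Elements with no equivalent in the target vocabulary, such as the made-up `freezer_id`, are returned separately instead of being dropped, since deciding what to do with them is a curation judgment and not something a lookup table can settle.]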
We’ve been doing some work in that regard in HuBMAP which will continue. We’re working with a group at Stanford led by Mark Musen, and with their CEDAR tools. Their work can create metadata templates that tie together all these various schemas, and that’s where I see a lot of interesting avenues for research, much broader than just HuBMAP and SenNet. I want to see more. Our colleague Mariah is really focused on accessibility and so am I.
I also want to see a greater volume of data in both consortia. HuBMAP has 1,887 published data sets, a bit less than 2,000, and we’re definitely going to have a few thousand more by the end of the second phase of HuBMAP. SenNet is still at the very early stages and we’re able to modify things and learn from HuBMAP in new ways, where things in HuBMAP are more or less set in stone. And there are a lot of great opportunities for cross-collaboration between us and the teams that we work with, especially the University of Pittsburgh and the developers there. That’s what I’m thinking of in terms of the future.
JU: I think about how people will start to use that data when it’s available, what kind of novel approaches there might be, or improvements to existing applications. HuBMAP and SenNet have a lot of potential for users outside of the consortium. At the HuBMAP Annual Meeting, seeing some of the visualizations that have been developed was really amazing. As a lay person who is not a biologist I could see there was so much potential for a deeper understanding of cells in the body and how they’re structured, and how that might impact the future of medicine and disease prevention. Having “good” data, accurate data, is really the first and one of the most important steps. So the data curation part of things will only increase as more data becomes available, and more people are interested in using it in different ways.
BH: That’s a really good point. HuBMAP is still taking in data, but I know there’ve been pushes to get different groups of end users involved, whether it’s students or researchers, etc.
JU: That’s when we’ll get feedback that shows what things we can improve.