How One HPC Center Learned to Count

John West

Early this summer I wrote in this space that #HPCmatters because our field enables the science and engineering that makes the world a safer, cleaner, and healthier place for all of its inhabitants. Given the impact of HPC, it is important that we do everything we can to ensure that we are including the best solutions possible in the technologies that we build, and for that we need to be sure we are asking the broadest sampling of people for their best ideas.

The homogeneity of the HPC workforce and student populations today provides evidence that we are not, in fact, sampling broadly enough to be sure that we are building the best tools possible.

But how big is the gap we are trying to close? Computer science (CS) is related to HPC, and we already know that field is struggling in terms of diversity. But CS may not be a good proxy for the HPC workforce, as we tend to collect computationally-minded professionals from many fields in science, mathematics, and engineering. Are we doing better or worse than CS? Today, we just don’t know, and it is irrational to ask for funding for HPC diversity efforts when we cannot even measure success.

I proposed that every organization in the HPC community – vendors, centers, research organizations, and so on – publish its diversity results. Essentially, I’m suggesting we crowdsource our community demographics. And, as general chair of SC16, I promised to start by publishing SC16’s committee diversity statistics.

My employer has also taken up this challenge, and asked me to lead the effort to report our own demographics. I am part of the Texas Advanced Computing Center at the University of Texas in Austin. Getting this done in a university context wasn’t a Herculean task, but it did involve some unexpected twists and turns. My hope in documenting my process for getting this done is that it may serve as a rough template as you champion this effort in your own organizations.


Phase 1: Denial

My first thought was that this is easy: we have an HR department, we know who works for us and what their backgrounds are already. We get a spreadsheet, do some cross-tabulation of numbers, and we’re done.

Uh, no.

It turns out that my organization is committed to the privacy of its employees. This is a good thing. While they can and do publish demographic information at the university scale (tens of thousands of employees), publishing such information at the scale of a single center (order one hundred employees) creates “small cell” problems that could threaten the anonymity of the data. The university agreed in general that we could pursue this, but any published data had to be voluntarily surrendered by center staff and properly anonymized.
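
To make the “small cell” idea concrete, here is a minimal sketch of how counts might be masked before publication. The threshold of 5 and the function name are assumptions for illustration only; they are not the university's actual rule, and your own institution will have its own.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer respondents than this are masked.
# The right value is whatever your institution's privacy office requires.
MIN_CELL = 5

def suppressed_counts(responses, min_cell=MIN_CELL):
    """Count responses per category, replacing small cells with None."""
    counts = Counter(responses)
    return {
        category: (n if n >= min_cell else None)
        for category, n in counts.items()
    }
```

Note that masking a single cell is not enough if the total is also published, since the hidden value can be recovered by subtraction. That is one reason to have the final tables reviewed before they go public.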


Phase 2: Surveys for fun and profit

At this point I had two options: go door to door and ask people to volunteer their data, or do an electronic survey. The first approach would have been easier, but I worried that such a direct, personal interaction would have a coercive effect on people, causing them to provide their data even if they didn’t want to in order to avoid an awkward social situation. This certainly isn’t in the spirit of volunteered data, and could have created resentment that would undercut any future diversity efforts.

The electronic survey presented its own challenges, as the university considers the data I needed to collect to be in a category that needs to be protected by reasonable measures until it is anonymized by aggregation; using a free web tool wasn’t an option.

Fortunately, the university had an established relationship with an outside web-based survey provider that it had already certified as being appropriate for this data. Once I got connected with the right people who knew the details of this relationship, the survey was easy to create and distribute.

The survey we created has three questions designed to capture the same information reported by the large tech companies (see for an example):

•    What is your gender?
•    What is your race and ethnicity?
•    What team are you on at TACC?

For race and ethnicity, I focused on the very broad categories that the tech companies used, and for gender I only offered two choices (male or female). I was advised during my review with the office of Institutional Equity that this is not a best practice approach for either of these questions (more below); I pushed forward without changes under the philosophy that “better is the enemy of good enough” – let’s get something done, then we can improve it later. After launch I had very animated conversations with a few people at TACC who felt passionately that my approach disenfranchised individuals and continued long-standing patterns of discrimination. That was not my intention, but I fully understand the reaction. The next time we do a survey we will do a better job ensuring the instrument reflects the best practice knowledge available to us. Lesson learned.

Let me add that I am aware that the US approach to the race and ethnicity question is not a universal approach. Other countries do not track diversity using the same markers the US does; the SC16 committee is struggling with this right now in deciding how to report diversity in a way that is relevant internationally.

We included the last question so that we can map respondents to three categories to get a view of how diversity varies throughout the organization: “Administrative”, “Technical”, and “Leadership”.
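
As a rough sketch of that mapping step, the snippet below rolls team names up into the three categories and cross-tabulates them against one survey answer. The team names in the lookup table are hypothetical placeholders, not TACC's actual teams, and the function name is my own.

```python
from collections import Counter

# Hypothetical team names, for illustration only; substitute your own.
TEAM_TO_CATEGORY = {
    "Finance": "Administrative",
    "HPC Systems": "Technical",
    "Directorate": "Leadership",
}

def cross_tabulate(rows):
    """Count (category, answer) pairs from an iterable of (team, answer) rows."""
    table = Counter()
    for team, answer in rows:
        # A KeyError here means a team is missing from the lookup table.
        category = TEAM_TO_CATEGORY[team]
        table[(category, answer)] += 1
    return table
```

Rolling teams up into a few broad categories also helps with the small-cell problem, because each published cell covers more people.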


Phase 3: The office of institutional what now?

The office of Institutional Equity is the part of the University of Texas that watches after matters of diversity. This project falls squarely in their domain, so we needed to visit with them to ensure they knew what we were doing, that we weren’t violating any university policies by doing the survey or publishing the data, and to get their professional feedback on the survey instrument we had designed.

This was an instructive process: they offered very helpful advice on our survey, which we will incorporate as we refine our approach while continuing to refine the data we collected. They also gave us approval to proceed.


Phase 4: Wait…you aren’t working with humans, are you?

I am new to the university environment. It turns out that universities are sensitive to anything that might be an experiment on humans. This probably explains why we don’t have a real-life Spider-Man.

So, yes, we are collecting and reporting data on humans, and that means we had to at least consider the Institutional Review Board (IRB). Fortunately, the IRB at the University of Texas has good resources to help investigators determine what level of review is needed, if any. In our case, we weren’t interacting directly with individuals (good thing we decided not to go knock on office doors), we were using a survey instrument, the data we collected was not associated in the survey system with identifiers that tracked to individuals, we weren’t collecting data on prisoners or children, and disclosure of the data would not create a criminal or civil liability for respondents. So, no formal IRB review was needed in my situation – you will need to work with your own IRB to determine whether you fall outside their guidelines.


Phase 5: Let the begging begin

With all approvals in hand, we launched the survey with an email that included a very brief summary of the effort, a pointer to the blog post I wrote describing in more detail why this effort is important, and a link to the survey. This email gathered about 40% of all those who would ever respond. Over the next three weeks I would renew our call to action three times.

It turns out that our staff either take action when they first see the email or, to a first approximation, never take action on that email. But they may take action on another email. In all I sent three email calls to action; 85% of all my responses were gathered on the days the emails were sent (40%, 32%, 13%), with the rest coming in a thin stream between emails.
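
The 85% figure is just the sum of the three send-day shares (40% + 32% + 13%). If your survey tool exports a date for each response, a calculation along these lines would give the same kind of number; the function name and the shape of the inputs are assumptions for illustration, not a description of the tool we used.

```python
from datetime import date

def share_on_send_days(response_dates, send_dates):
    """Fraction of responses that arrived on a day an email call went out."""
    if not response_dates:
        return 0.0
    send_days = set(send_dates)
    on_send_day = sum(1 for d in response_dates if d in send_days)
    return on_send_day / len(response_dates)
```

Called with a list of response dates and the list of send dates, it returns a value between 0 and 1.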

In the end we had a 92% response rate. We could have continued to ask, but I was concerned that more requests would be perceived as coercive by those who had not responded (remember that our data had to be given voluntarily), and analysis of the data we did gather shows that it is representative of the whole.


And now we publish

As I write this, we have the data analyzed and cross-tabulated and have delivered it to our web team, who are designing a page that will be ready to launch in the next month or so. Don’t forget to allow time for this step! Unless you are writing the web page yourself, you are going to have to ask folks who already have a plate full of assignments to help you, and you’ll have to fit into their schedule.

Depending on where you work, you may have more or fewer steps to go through than we did. But I do hope that you will encourage your organization to report its diversity numbers. We cannot get the data we need to move forward as a community without them. And you’ll get free publicity at SC15.


Free publicity?

Yup, free. I will be using my SC16 introduction on Thursday during SC15 this November to highlight those centers that have published their data. I hope I can include your organization! Send an email to me at if your org will have publicly published data by SC15 and you’d like to be included.
