Today, data of all kinds is generated at a nearly incomprehensible rate, creating endless opportunities for advances in fields from health care to education. Yet plucking meaningful insights out of an overwhelming amount of information can be challenging, leading many researchers to seek out a firmer understanding of data science.
“On one hand, we have tons and tons of information, and on the other hand, we get lost in this information,” explains Zhibo Zhang, associate professor of physics, “so the question is how we can get something meaningful, something useful, from this ocean of data.”
Zhang and three UMBC colleagues will help researchers address this issue head-on with a unique interdisciplinary course for graduate students, postdocs, and junior faculty that will launch in spring 2018. The project is funded by a three-year National Science Foundation grant, with the expectation that after an initial on-campus test run, the program will be refined and expanded to reach researchers nationwide.
The students in the course will spend at least one-third of their time working on interdisciplinary research projects with colleagues from other fields, under the mentorship of one of the faculty. “In this program, we are teaching the latest technology to the students, but also letting them do research together,” explains Jianwu Wang, assistant professor of information systems and the principal investigator on the grant.
“Often we use a siloed approach, where each discipline sticks to itself,” says Aryya Gangopadhyay, professor and chair of information systems. Instead, this team’s approach creates opportunities “to leverage the multidisciplinary nature of problem-solving that is required” to address challenging, complex questions in the real world.
The new course will accept students studying atmospheric physics, information systems, and applied mathematics with a focus on high-performance computing. Moving forward, the team hopes to include a wider variety of researchers, with the goal of complementing training in their disciplines with stronger computing and complex data analysis skills.
“Data and computing are becoming two more pillars of science in all fields,” explains Wang, who sees strengthening collaborations across disciplines as beneficial to all. “We need each other,” he says. “For us, in information systems, we can get more concrete challenges to solve, and for scientists in other fields, they can learn from the techniques. Together, we can solve the multi-pronged challenges the world is facing today.”
In atmospheric physics, for example, pulling climate trends out of mountains of data from NASA and other sources is important, but it’s impossible to process vast quantities of complex climate data effectively and efficiently with outdated tools and training. “I used to be able to do everything on my laptop,” remembers Zhang, “but now you can’t put your data on a laptop. You have to put it on a huge data server.”
That’s where collaborations with information systems researchers and high-performance computing experts come in. “In traditional analytics, data is loaded in the memory of a computer, but in big data analytics, the data might be much larger than the machine can handle,” explains Gangopadhyay. “So how do you deal with that? New algorithms must be designed.”
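To make that idea concrete, here is a minimal Python sketch of one common way to handle data that will not fit in memory: process it in small chunks and keep only a running summary. It is an illustration only and is not drawn from the course. The function names (read_in_chunks, streaming_mean) and the made-up temperature readings are assumptions chosen for the example.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks so at most `chunk_size` values are held at once."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def streaming_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute the mean chunk by chunk, keeping only a running sum and count."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("cannot take the mean of an empty stream")
    return total / count


if __name__ == "__main__":
    # Hypothetical stand-in for a large file of temperature readings:
    # the generator produces values lazily, so the full series is never stored.
    readings = (15.0 + (i % 10) * 0.1 for i in range(1_000_000))
    print(streaming_mean(readings, chunk_size=10_000))
```

The same pattern extends to other summaries that can be updated incrementally, such as counts, minimums, and maximums. Analyses that need all of the data at once are harder to adapt, which is one reason new algorithms are needed.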
The fourth participating faculty member, Matthias Gobbert, professor of mathematics and faculty lead of UMBC’s High-Performance Computing Facility initiative, notes that NSF’s full funding of the grant proposal indicates this area is a priority for the agency, and his collaborators explain why. “There are no such training programs nationwide,” says Wang. Compared with industry, he says, “for scientific research, we have specific challenges.” Gangopadhyay adds, “Big data analytics is arguably one of the most sought-after skill sets in today’s IT world.”
“In a way, we are training a new type of scientist, a new generation of scientists,” Zhang says. These scientists will access and analyze data in new ways, often relying on computing techniques previously unnecessary in their fields. And now that the four colleagues have begun working together, Zhang sees future collaboration opportunities in every new paper he reads. “They’re just popping up everywhere.”
The team acknowledges that there will be hurdles in changing how researchers understand and use data science. First and foremost is the task of teaching a group of students who are all already highly trained, but in vastly different fields. Zhang shares, “I think the challenge will be fitting it all together.”
Still, the team is excited and optimistic. “This type of work is only possible by people who have an adventurous spirit,” says Gangopadhyay. “Sometimes, taking risks is the only way to do something meaningful. And we are ready for the challenge.”
Check cybertraining.umbc.edu for updates on the program.
Image: Data servers at the climate computing facility at Goddard Space Flight Center in Greenbelt, Maryland. Photo by Flickr user ep_jhu, used under license CC BY-NC 2.0.