[Tuning In] Peter Bol on creating the China Biographical Database and the power of digital humanities

KrASIA
Updated September 25, 2020, 09:42 • Published September 25, 2020, 01:42 • AJ Cortese

Dr. Peter Bol is the Charles H. Carswell Professor of East Asian Languages and Civilizations at Harvard University. His research centers on the history of China's cultural elites at the national and local levels from the 7th to the 17th century.

Professor Bol directs the China Biographical Database project, which is maintained by Harvard University, Academia Sinica, and Peking University. This online relational database currently contains some 350,000 historical figures and is being expanded to include all biographical data in China’s historical records from the last 2,000 years.

Peter Bol, professor and database director

KrASIA (Kr): Can you tell us about the origins of the China Biographical Database (CBDB)?

Peter Bol (PB): It began in the 1990s with some initial work by a social historian named Robert Hartwell, whom I knew. He decided that if he should pass away, he would give everything to Harvard, and I promised to continue his work. I didn't see that as a likely prospect, so I said yes because you don't want to insult somebody.

Then he upped and passed away quite unexpectedly and the materials came to us. In around 2005, almost 10 years after his death, I decided we really needed to take the 20,000 people in the database that he had compiled and somehow try and create something that other people could use and build upon.

But I also knew that he tended to make a lot of mistakes. So I arranged with colleagues at Academia Sinica in Taiwan and at Peking University to collaborate, clean up the database, and release it to the world.

Kr: How did you organize all of that data so that it became a functional database?

PB: We did that and found enormous numbers of mistakes, but along the way, I happened to talk to one of my colleagues in computer science, and he asked whether I had heard of regular expressions, named entity recognition, and other things like this. I hadn't. He said that if you have digital text, there are ways of mining it automatically, so you don't have to do it manually.

That's where it really took off. We brought in Michael Fuller, who studies Chinese literature at the University of California, Irvine.

We brought in a graduate student in computer science who could write these regular expressions, and one of my graduate students took on the job as a project manager. Today, we have almost 1 million people in the pipeline to be disambiguated and contextualized in the database.
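To give a sense of what "mining" digital text with regular expressions means, here is a minimal sketch, assuming a biography that opens in the common dynastic-history pattern "[name]字[courtesy name]". The function name `extract_courtesy_names`, the pattern, and the assumed name lengths (two to three characters, courtesy names of one to two) are illustrative assumptions, not the project's actual code. Real sources vary far more, which is why named entity recognition and human review are also needed.

```python
import re

# Hypothetical pattern: a 2-3 character personal name, an optional comma,
# the character 字 ("courtesy name"), then a 1-2 character courtesy name
# ending at a comma or full stop. Only the basic CJK block is matched.
COURTESY_NAME = re.compile(r"([\u4e00-\u9fff]{2,3})，?字([\u4e00-\u9fff]{1,2})[，。]")


def extract_courtesy_names(text: str) -> list[tuple[str, str]]:
    """Return (personal name, courtesy name) pairs found in the text."""
    return [(m.group(1), m.group(2)) for m in COURTESY_NAME.finditer(text)]


# Sample lines in the style of Song dynasty biographies (illustrative only).
sample = "蘇軾字子瞻，眉州眉山人。王安石字介甫，撫州臨川人。"
print(extract_courtesy_names(sample))
```

A pattern this simple also produces false positives in ordinary prose, which is one reason machine-extracted records still need editors to look them over and correct them.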

Kr: What are the data-sharing norms with your international colleagues?

PB: At the Institute of History and Philology at Academia Sinica, which is where the premodern historians reside, colleagues saw some value in this and agreed to do two things. First of all, they gave us some important, very reliable digital texts, and we shared those digital texts with the group at the Institute for Ancient Chinese History at Peking University.

One of my colleagues at Peking University had some graduate students who could meet some university obligations by working on the project. So we put together an editorial group at the university, and we had our computer science group here at Harvard who would generate texts for our colleagues at Peking University to look over and make corrections. We were also iterating the whole time and trying to improve computational methods.

Then the Institute of History and Philology at Academia Sinica funded the creation of online databases around 2008, so we not only had it as a standalone database on individual computers, but you could also go on the web and perform searches. So it has really been a wonderful tripartite collaboration with everyone contributing.

Kr: How has the database evolved? 

PB: This is a relational database. We think that is important because it is a way of modeling people's lives by asking complex queries. Here's an example: How many people passed the civil service examination in a certain period, and how many of those people were related to each other by marriage? You can ask that kind of question. It is not meant to function as a biographical dictionary; it is meant to function as a way of looking at large amounts of data to see change over time and space.
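The kind of question Bol describes maps naturally onto SQL. The sketch below uses Python's built-in `sqlite3` module with a hypothetical, heavily simplified schema (`person`, `exam_pass`, `marriage_tie`) and invented toy rows; CBDB's real tables and codes are different. It first counts examination passers in a year range, then counts how many of those passers are tied by marriage to another passer in the same range.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical, simplified schema and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        -- one row per examination pass, with the year it was obtained
        CREATE TABLE exam_pass (person_id INTEGER REFERENCES person(id), year INTEGER);
        -- one row per marriage connection between two people (spouses or in-laws)
        CREATE TABLE marriage_tie (person_a INTEGER REFERENCES person(id),
                                   person_b INTEGER REFERENCES person(id));
        """
    )
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "A"), (2, "B"), (3, "C"), (4, "D")])
    conn.executemany("INSERT INTO exam_pass VALUES (?, ?)",
                     [(1, 1050), (2, 1057), (3, 1120), (4, 1061)])
    conn.executemany("INSERT INTO marriage_tie VALUES (?, ?)", [(1, 2), (2, 3)])
    return conn


def count_passers(conn: sqlite3.Connection, start: int, end: int) -> int:
    """How many distinct people passed between start and end (inclusive)?"""
    sql = "SELECT COUNT(DISTINCT person_id) FROM exam_pass WHERE year BETWEEN ? AND ?"
    return conn.execute(sql, (start, end)).fetchone()[0]


def count_married_passers(conn: sqlite3.Connection, start: int, end: int) -> int:
    """How many of those passers are tied by marriage to another passer in the range?"""
    sql = """
        WITH passers AS (
            SELECT DISTINCT person_id FROM exam_pass WHERE year BETWEEN ? AND ?
        )
        SELECT COUNT(DISTINCT p.person_id)
        FROM passers AS p
        JOIN marriage_tie AS m ON p.person_id IN (m.person_a, m.person_b)
        -- the other party to the tie must also be a passer in the same range
        JOIN passers AS q ON q.person_id =
            CASE WHEN m.person_a = p.person_id THEN m.person_b ELSE m.person_a END
    """
    return conn.execute(sql, (start, end)).fetchone()[0]


conn = build_toy_db()
print(count_passers(conn, 1050, 1070), count_married_passers(conn, 1050, 1070))
```

With these toy rows, people 1, 2, and 4 passed between 1050 and 1070, and the tie between 1 and 2 links two of them, so the two calls return 3 and 2.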

We currently have built-in export functions so you can export to a geographic information system (GIS) program, a mapping program, or to social network analysis programs. In principle, we should be able to do that online, so you can do a query online and you can map it right away to see your distribution and download the data as you choose. Developing these capabilities online is going to be important.

There are different kinds of visualization. In fact, there is a group at a university in China that is very interested in visualization. They have been using CBDB to run experiments and figure out how multiple kinds of visualization can yield more information. For example, they used various kinds of network analysis to discover likely connections between people that are not already recorded in the database.
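Bol does not say which network methods that group used. One standard and very simple technique for suggesting missing ties is common-neighbor link prediction: two people who are not linked in the data but share many contacts are flagged as likely connected. The sketch below is a stand-in illustration of that technique, with hypothetical people and ties, not the group's actual method.

```python
from itertools import combinations


def common_neighbor_scores(edges: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Score every unlinked pair of people by how many contacts they share.

    Higher scores suggest a tie the source data may have missed; the result
    is ordered from highest to lowest score.
    """
    graph: dict[str, set[str]] = {}
    for a, b in edges:  # treat every recorded tie as undirected
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    scores: dict[tuple[str, str], int] = {}
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked in the data, nothing to predict
        shared = len(graph[u] & graph[v])
        if shared:
            scores[(u, v)] = shared
    return dict(sorted(scores.items(), key=lambda item: -item[1]))


# Hypothetical ties (kinship, correspondence, shared office) among five people.
ties = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C"), ("E", "A")]
print(common_neighbor_scores(ties))
```

Here A and D are not linked but share the contacts B and C, so they rank at the top, alongside B and C, which share A and D.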

Kr: What are the future plans for the project?

PB: We are cautiously moving into the area of crowdsourcing to see if we can figure out ways of developing a following of people who are interested in history and have the knowledge to contribute information.

The biggest obstacle we have is not one of collaboration between people, universities, and countries, but of disambiguation. The Chinese language has an enormous number of characters, so initially we thought this wouldn't be a big problem because people would have unique names.

It turns out at certain points in Chinese history, certain kinds of names are very popular. Since China has a much smaller number of surnames than we find in Western languages, we would run into a situation where we would have 50–60 people in a 100-year period with the same surname and given name. Unless you had complete data on all of those people, figuring out who is who is much more of a challenge than we originally thought.
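Why incomplete data makes "who is who" so hard can be shown with a small example. The sketch below is a hypothetical rule, not CBDB's method: two records with the same name may refer to one person only if their native places do not conflict and all their attested years fit within an assumed maximum span (60 years here). The name "Wang Jie", the places, and the years are invented placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One mention of a person in a source (hypothetical fields)."""
    name: str
    native_place: Optional[str]  # None when the source does not say
    first_year: int              # earliest year the person is attested
    last_year: int               # latest year the person is attested


def could_be_same_person(a: Record, b: Record, max_span: int = 60) -> bool:
    """Return True if nothing in the two records rules out a single identity."""
    if a.name != b.name:
        return False
    # Two known but different native places rule out a match; a missing one does not.
    if a.native_place and b.native_place and a.native_place != b.native_place:
        return False
    # All attested years together must fit within one assumed span of activity.
    span = max(a.last_year, b.last_year) - min(a.first_year, b.first_year)
    return span <= max_span


# Hypothetical records that share one name.
r1 = Record("Wang Jie", "Place A", 1050, 1070)
r2 = Record("Wang Jie", None, 1065, 1080)  # native place not recorded
r3 = Record("Wang Jie", "Place B", 1060, 1062)
print(could_be_same_person(r1, r2), could_be_same_person(r2, r3),
      could_be_same_person(r1, r3))
```

Here r2 is compatible with both r1 and r3, yet r1 and r3 cannot be the same person, so without more complete data the rule alone cannot decide where r2 belongs.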

Kr: What is the significance of digital humanities?

PB: How do we run the research cycle for the humanities in a digital environment? If we think about how we find topics to study or discuss, how we define questions, how we gather data, how we store it, how we analyze it, how we disseminate it, all of this can happen in a digital environment with digital methods.

Digital humanities are simply the humanities being conducted in a digital environment. There’s so much we can do—particularly since March, when universities and libraries shut down, we have had to do a lot of work in a digital environment, both in research and teaching.

Digital humanities include various free, open-access tools that people can use to study the past. They allow us to answer questions that were impractical to answer before. We are not good at looking at large quantities of things at once, but now we are able to do so with digital methods.

Kr: Two-thirds of public libraries in the US don’t have a formal digitization strategy. What are the biggest challenges to digitization that libraries face?

PB: It would be so good for public libraries if they had complete access to digital collections.

If you think about academic publishing and university presses, they were created to disseminate knowledge coming out of universities that was not commercially viable. Over time, the model changed to the point where these university presses were instead expected to generate profits.

I remember a meeting with people from the university press at which a professor said to them, "You were originally a means to disseminate knowledge. You are now an obstacle to its dissemination." That kind of sums it up.

Academic work should be open access, particularly academic work that is funded by grants or foundations. The reason it can't be made freely available is that we haven't set up the institutional mechanisms to do that. Meanwhile, publishing still costs money. You have editors, quality control, peer review, and so on.

Looking into the future, the dissemination of knowledge favors a digital environment. The marginal cost after the initial editing of a digital work is minimal compared to creating and shipping physical book copies.

Kr: How has China’s education system been able to digitize academic resources? 

PB: The explosion of university education and quality education in the country meant that there was desperate demand for access to intellectual resources. Digitization was a solution to that.
