Building User Friendly, Searchable Databases for Meeting Abstracts
J Song, A Chang. University of Chicago, Chicago, IL
Background: International pathology conferences, such as the annual meetings of the International Academy of Pathology (IAP) and the United States and Canadian Academy of Pathology (USCAP), present thousands of scientific abstracts every year. These abstracts are typically published in a supplement issue of a journal and may also be accessible on the internet. Abstracts offer an important forum for sharing the most recent advances in pathology. However, their efficient use has been hindered by several factors: abstracts are not indexed in PubMed, they lack references to provide background information, and they are often not searchable. In addition, it can be tedious to find out whether an abstract has resulted in a full publication in a peer-reviewed journal. The usability of abstracts could be greatly enhanced through data mining and transformation. The goal of our project is to build a user-friendly, searchable database for meeting abstracts.
Design: We developed a Perl program to automatically process abstracts compiled in electronic files in PDF or HTML format. The program runs in 3 steps. (1) It parses the files to retrieve the title, authors, institutions and main text of each abstract. (2) For each abstract, it searches PubMed by either author names or title keywords, using the Entrez Programming Utilities. The PubMed search results (articles) are retrieved in XML format and parsed for keywords matching the query abstract. These articles serve as surrogate references. When a single search returns multiple articles, they are sorted by their likelihood of being a true match. (3) The program creates an HTML file to display the query abstract and its reference articles. Matched keywords are highlighted in color to facilitate reading. Links to relevant online resources, such as Entrez searches and full-text articles, can be added. Finally, all batch-generated HTML files are compiled into a file-system-based database. A Perl CGI interface was developed to provide search functionality and access to the database through a web server.
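As an illustration of step (2) and the keyword highlighting in step (3), the following is a minimal sketch written in Python rather than the Perl used for the actual program. The ESearch endpoint and its `db`, `term` and `retmax` parameters belong to the public Entrez Programming Utilities; everything else is an assumption made for illustration, including the stopword list, the `[Title]` query strategy, the keyword-overlap score used as the "likelihood of being a true match", the `<mark>` tag used for highlighting, and all function names.

```python
import html
import re
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint. The query strategy below is an assumption,
# not necessarily the one used by the program described above.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def title_keywords(title):
    """Lowercased title words, minus stopwords and tokens of 1-2 characters."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]


def esearch_url(title, retmax=20):
    """Build an ESearch URL that ANDs the title keywords in the [Title] field."""
    term = " AND ".join(f"{w}[Title]" for w in title_keywords(title))
    query = urlencode({"db": "pubmed", "term": term, "retmax": retmax})
    return ESEARCH_URL + "?" + query


def match_score(abstract_title, article_title):
    """Fraction of abstract keywords that also occur in the article title."""
    query = set(title_keywords(abstract_title))
    if not query:
        return 0.0
    return len(query & set(title_keywords(article_title))) / len(query)


def rank_articles(abstract_title, article_titles):
    """Sort candidate article titles so the most likely match comes first."""
    return sorted(
        article_titles,
        key=lambda t: match_score(abstract_title, t),
        reverse=True,
    )


def highlight(text, keywords):
    """HTML-escape text and wrap whole-word keyword hits in <mark> tags."""
    if not keywords:
        return html.escape(text)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    # With one capture group, re.split alternates: non-match, match, non-match, ...
    parts = pattern.split(text)
    return "".join(
        f"<mark>{html.escape(p)}</mark>" if i % 2 else html.escape(p)
        for i, p in enumerate(parts)
    )
```

A real pipeline would fetch the URL, parse the returned XML, and probably combine author and title evidence when ranking; the sketch leaves out all network access so that it stays self-contained.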
Results: A test database of 2005-2007 USCAP abstracts was built using this program. The database can be accessed through an intranet or the internet.
Conclusions: We developed software to automatically create searchable databases for abstracts. In addition to enhancing the usability of abstracts, the software can also be used to systematically analyze abstract characteristics. For example, we used the test database to quickly review all 2005-2007 USCAP abstracts and found that their overall publication rate (full publication in a peer-reviewed journal within 3 years of follow-up) was 36% (1725/4824).
Monday, March 22, 2010 1:00 PM
Poster Session II # 171, Monday Afternoon