Semantic knowledge discovery
2021 (English) Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]
Since many databases lack relevance ranking, a citation-based approach can be a valuable complement: citation data can indicate centrality, relevance, or visibility in the research community. However, applying bibliometric methods in the humanities is often challenging because much of the research literature is not indexed in the traditional citation databases generally used for bibliometric mapping.
We introduce a combined bibliometric and semantic approach that extends a network of bibliographic records by incorporating a larger set of records that lack bibliometric features, based on the semantic similarities between their titles. To expand the set of identified relevant articles, we used the Universal Sentence Encoder (USE), developed by Google Research, to generate semantic vectors for the titles.
We searched several different databases, some of which include citation data, to create a pool C of candidate documents within the selected subject area. A set A of documents was obtained from a citation database to generate the initial network of articles. We then calculated the bibliographic coupling between articles, quantified by the number of references they share.
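As an illustration only, bibliographic coupling strength can be sketched as the number of references two documents share. The function name, the input layout (a mapping from document id to its cited reference ids), and the toy data below are assumptions made for this example, not details taken from the study.

```python
from itertools import combinations


def bibliographic_coupling(references):
    """Return {(doc_a, doc_b): shared reference count} for every coupled pair.

    `references` maps a document id to the reference ids it cites
    (hypothetical layout, assumed for this sketch).
    """
    coupling = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared > 0:  # keep only pairs with at least one shared reference
            coupling[(a, b)] = shared
    return coupling


# Hypothetical toy data standing in for the set A.
toy_references = {
    "a1": ["r1", "r2", "r3"],
    "a2": ["r2", "r3", "r4"],
    "a3": ["r5"],
}
coupling = bibliographic_coupling(toy_references)
print(coupling)
```

Pairs with no shared references are left out, so the resulting dictionary can be read directly as a weighted edge list for the initial network.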
We manually selected a small set S1 ⊂ A of documents representing different topical clusters as a seed for the expansion based on semantic similarities. For each document d ∈ S1, we ranked the documents in C in ascending order of the cosine distance between their title vectors and the title vector of d, and then selected the k documents closest to d. This procedure gave us a set S2 ⊂ C of documents to read.
The results were evaluated using qualitative analysis to determine whether they were thematically relevant to the present information needs.
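The seed expansion can be sketched as follows, assuming the title vectors have already been computed (for example with USE) and are non-zero. The short two-dimensional vectors, the function names, and the choice to skip a seed that also appears in C are assumptions made for this illustration, not details confirmed by the abstract.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def expand_seed(seed_vectors, candidate_vectors, k):
    """Collect the k candidates closest to each seed document into one set."""
    selected = set()
    for seed_id, seed_vec in seed_vectors.items():
        # Rank candidates in ascending order of cosine distance to the seed.
        # Skipping the seed itself is an assumption of this sketch.
        ranked = sorted(
            (doc_id for doc_id in candidate_vectors if doc_id != seed_id),
            key=lambda doc_id: cosine_distance(seed_vec, candidate_vectors[doc_id]),
        )
        selected.update(ranked[:k])
    return selected


# Hypothetical toy vectors standing in for title embeddings.
seed_vectors = {"s1": [1.0, 0.0]}                      # plays the role of S1
candidate_vectors = {                                  # plays the role of C
    "c1": [0.9, 0.1],
    "c2": [0.0, 1.0],
    "c3": [0.7, 0.7],
}
s2 = expand_seed(seed_vectors, candidate_vectors, k=2)
print(sorted(s2))
```

Because the per-seed results are merged into a set, a candidate that is close to several seeds appears only once in the reading list.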
Place, publisher, year, edition, pages
2021.
Keywords [en]
citation analysis, machine learning, semantic modelling, bibliographic networks
National Category
Information Studies Natural Language Processing
Research subject
Library and Information Science
Identifiers
URN: urn:nbn:se:hb:diva-27158 OAI: oai:DiVA.org:hb-27158 DiVA, id: diva2:1626216
Conference
26th Nordic Workshop on Bibliometrics and Research Policy (NWB2021), Odense, Denmark, 3-5 November 2021.
Projects
Data as Impact Lab
2022-01-10 2022-01-10 2025-02-01 Bibliographically approved