DEKDIV Help Document

Data Enrichment, Knowledge Discovery and Interactive Visualization Tool

Introduction

Welcome to DEKDIV, a Linked-Data-driven Web portal for the field of learning analytics. The portal lets users browse linked datasets, search for researchers, interact with dynamic visualizations, and perform in-depth analysis. Starting from the datasets provided for the Learning Analytics and Knowledge (LAK) and Educational Data Mining (EDM) conferences, we first enriched the data with the geographic locations of research institutes and with study topics extracted from the papers.
The functions of this Web portal were designed and implemented on top of the enriched RDF data and fall into two groups. The first group provides dynamic visualization and animation of the LAK/EDM linked datasets; the "Conference Participants" and "Reference Map" modules are examples. The second group supports advanced analysis and knowledge discovery, and is accessible through modules such as "Scholar Similarity" and "Reviewer Recommendation". DEKDIV was designed from the ground up to be highly modular, so it can be easily migrated and reused in other projects.

Hot Topics for Conferences

This module shows the hot topics of a conference. Topics are ranked by the number of publications on each topic (i.e., the top topics have the most related publications). Clicking on a topic shows the papers and researchers associated with it, and clicking on a paper or a researcher shows detailed information.
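The ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not the portal's actual implementation: the `papers` records, their field names, and the helpers `rank_topics` and `papers_on_topic` are hypothetical, and the portal itself reads its data from the enriched RDF datasets.

```python
from collections import Counter

# Hypothetical sample records; field names are assumptions for illustration.
papers = [
    {"title": "Paper A", "topics": ["MOOCs", "dropout prediction"]},
    {"title": "Paper B", "topics": ["MOOCs"]},
    {"title": "Paper C", "topics": ["knowledge tracing", "MOOCs"]},
    {"title": "Paper D", "topics": ["knowledge tracing"]},
]


def rank_topics(papers):
    """Return (topic, paper_count) pairs, most-published topic first."""
    # set() ensures a paper is counted at most once per topic.
    counts = Counter(t for p in papers for t in set(p["topics"]))
    # Sort by descending count, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def papers_on_topic(papers, topic):
    """Titles that would be listed when a user clicks on a topic."""
    return [p["title"] for p in papers if topic in p["topics"]]


ranking = rank_topics(papers)
```

In this sketch, ties between topics with equal counts are broken alphabetically; the source does not say how the portal orders ties.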

Active Scholars in Conferences

This module identifies the most productive researchers based on their number of publications. We select the 30 authors with the highest number of papers and display them in the center of the visualization canvas. The number after each author's name denotes the number of papers that the author has published in this conference. In addition to the most active scholars, we also find and visualize the most popular topics based on the number of publications on each topic. Hovering the mouse over an author shows the topics related to that author. Similarly, hovering over a topic shows all the authors who have publications on that topic.
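The selection of top authors and the two-way hover lookup can be sketched as follows. This is only an illustration under assumed inputs: the `papers` tuples and the helpers `top_authors` and `build_hover_index` are hypothetical names, not part of DEKDIV.

```python
from collections import Counter, defaultdict

# Hypothetical (author list, topic list) pairs, one per paper.
papers = [
    (["Ann", "Bob"], ["MOOCs"]),
    (["Ann"], ["knowledge tracing"]),
    (["Cy", "Ann"], ["MOOCs", "discourse analysis"]),
    (["Bob"], ["MOOCs"]),
]


def top_authors(papers, n=30):
    """Return the n authors with the most papers as (name, count) pairs."""
    counts = Counter(a for authors, _ in papers for a in set(authors))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def build_hover_index(papers):
    """Map each author to their topics and each topic to its authors."""
    topics_of = defaultdict(set)
    authors_of = defaultdict(set)
    for authors, topics in papers:
        for author in authors:
            for topic in topics:
                topics_of[author].add(topic)
                authors_of[topic].add(author)
    return topics_of, authors_of


leaders = top_authors(papers, n=2)
topics_of, authors_of = build_hover_index(papers)
```

Building both lookups once means each hover event needs only a dictionary access, whichever side (author or topic) the mouse is on.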

Collaborative Institutes in Conferences

This module is designed to help users explore the spatial distribution of co-authorship. We visualize the institutes as small red dots on a world-scale map, and a blue link between two institutes indicates research collaboration between authors from these institutes. By moving the mouse over links and nodes, users can see additional details about each institute and the related publications. This geo-visualization reveals spatial patterns of research collaboration. For example, for certain conferences researchers tend to collaborate domestically rather than internationally. This is evident in the institute-collaboration pattern of the EDM 2013 dataset, which shows primarily US-based collaborations. Conversely, conferences such as LAK 2013 have more internationally oriented authors.
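One way to derive the links between institutes, and to put a number on the domestic-versus-international pattern, is sketched below. The input layout, the institute names, and the helpers `institute_links` and `domestic_share` are assumptions made for illustration; the source does not describe how the portal computes its links.

```python
from collections import Counter
from itertools import combinations

# Hypothetical papers: each lists the (institute, country) of every author.
papers = [
    [("Institute A", "US"), ("Institute B", "US")],
    [("Institute A", "US"), ("Institute C", "BE")],
    [("Institute A", "US"), ("Institute B", "US"), ("Institute A", "US")],
    [("Institute D", "UK")],
]


def institute_links(papers):
    """Count joint papers for each unordered pair of distinct institutes."""
    links = Counter()
    for affiliations in papers:
        # Deduplicate so two authors from one institute do not form a link.
        institutes = sorted(set(affiliations))
        for a, b in combinations(institutes, 2):
            links[(a, b)] += 1
    return links


def domestic_share(links):
    """Fraction of links whose two institutes are in the same country."""
    if not links:
        return 0.0
    domestic = sum(1 for a, b in links if a[1] == b[1])
    return domestic / len(links)


links = institute_links(papers)
share = domestic_share(links)
```

A share close to 1 would correspond to the mostly domestic pattern described for EDM 2013, and a lower share to a more international one.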

Conference Participants

This module was developed to provide an interactive visualization and animation of the geographic distributions of EDM and LAK conference authors. The size of the yellow circles represents the number of participants that attended the conference; the larger the circle, the more attendees from the mapped institute. Hovering the mouse over a circle on the map displays the total number of authors who participated in the conference. From a purely visual perspective, it appears that the geographic distribution of participants is strongly biased towards the region in which the event took place. For example, LAK 2013 was held in Leuven, Belgium, and attracted many European researchers, while many participants of EDM 2013, which was held in Memphis, USA, are from the United States.

Academic Network

This module shows an interactive visualization of an author's academic network based on co-author links. A node-link graph approach (often found in social network analysis) connects authors with each other through their LAK/EDM publications. This technique views the academic network as a set of relationships composed of nodes and links. Each author is presented as a node in the network, and a link connects two nodes if the two authors have co-authored a paper. The total number of co-authorships between two authors is recorded as an attribute of the link and is visually represented through the stroke width of the link. In addition to the visualization of the academic network, measurements of the network are also provided. For example, the centrality of a node represents its relative importance within the network, and four types of centrality are presented visually when a node is selected.
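The construction of weighted co-author links and one centrality measure can be sketched as follows. The source does not name the four centrality types, so this example assumes degree centrality as one plausible member of that set; the data and the helpers `coauthor_links` and `degree_centrality` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical author lists, one per paper.
papers = [
    ["Ann", "Bob"],
    ["Ann", "Bob", "Cy"],
    ["Cy", "Dee"],
]


def coauthor_links(papers):
    """Return {(author_a, author_b): number_of_joint_papers}."""
    weights = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


def degree_centrality(weights):
    """Degree of each node divided by (n - 1), the maximum possible degree."""
    neighbors = defaultdict(set)
    for a, b in weights:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


weights = coauthor_links(papers)      # link weight could drive stroke width
centrality = degree_centrality(weights)
```

Here "Cy" is connected to every other author and therefore gets the maximum centrality of 1.0, while the link between "Ann" and "Bob" has weight 2 and would be drawn with the widest stroke.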

Scholar Similarity

This module measures the similarity among authors in the LAK dataset using a Multidimensional Scaling (MDS) approach. The full-text publications of each author are used as input for a Latent Dirichlet Allocation (LDA) model with 20 topics. Given these topics, each author can be described as a distribution across the topics, which we call a signature. Using the Jensen-Shannon divergence, each author in the dataset is compared with every other author, producing a dissimilarity measure bounded between 0 and 1. The resulting matrix of dissimilarity values is used as input to MDS, which produces a 2D representation of the authors in space. Authors that are closer together in this space are more similar in their research topics than those further apart.
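The dissimilarity step can be illustrated with a short sketch. It assumes base-2 logarithms, which is what bounds the Jensen-Shannon divergence between 0 and 1, and it uses made-up 3-topic signatures rather than the 20-topic LDA output; the helper names are hypothetical. The MDS projection itself is not shown: the resulting matrix would be passed to an MDS routine such as the one in scikit-learn.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence with base-2 logs, bounded in [0, 1]."""
    # m is the midpoint distribution; m_i > 0 wherever p_i or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def dissimilarity_matrix(signatures):
    """Pairwise JSD between all authors, in alphabetical author order."""
    names = sorted(signatures)
    matrix = [
        [jensen_shannon(signatures[a], signatures[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical 3-topic signatures (the module itself uses 20 topics).
signatures = {
    "Ann": [0.8, 0.1, 0.1],
    "Bob": [0.8, 0.1, 0.1],
    "Cy": [1.0, 0.0, 0.0],
    "Dee": [0.0, 1.0, 0.0],
}
names, matrix = dissimilarity_matrix(signatures)
```

Identical signatures ("Ann" and "Bob") yield a dissimilarity of 0, while signatures with no topic in common ("Cy" and "Dee") yield the maximum of 1.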

Coauthor TreeMap

This TreeMap visualization shows the co-authors of a researcher, as well as their institutes. Clicking on a university shows the individual authors with pictures. The size of each colored field indicates the number of co-authors from that institution. The module is not limited to the LAK data; it also collects co-author information from Microsoft's Academic Search. While the overall data quality is good, errors in Microsoft's identity resolution will also appear in the TreeMap.

Citation Map

This module allows users to visually explore the spatial patterns of citations. We integrate Microsoft's Academic Search and OpenStreetMap, and allow users to search for publications and their corresponding citations by topic keywords or authors' names. We then geolocate publications using the first author's institution, and dynamically retrieve and map citation information from all over the world. Each citing author is visualized as an icon on the map, and a link indicates that this author has cited the searched paper. Hovering the mouse over a link shows the number of citations from that author. Clicking on an icon shows detailed information about the citation. This module also shows the top 10 authors who have cited the searched paper most frequently.

Key Concepts of a Paper

This module visualizes the important concepts of a paper, which are extracted using the Alchemy API. Each concept is represented as a bubble and is associated with a value indicating its relevance to the content of the paper. Hovering the mouse over a bubble shows the relevance value. By visualizing these concepts, we hope that users can grasp the general topic of the paper without having to read the paper itself.

Reference Map

This module presents an animated and interactive visualization of the geographic distribution of a paper's references. Using the reference data provided by the LAK dataset, we extract the names of the first authors in the reference records, and use Microsoft's Academic Search (MAS) to find the institutions of these authors. We then visualize these institutions as smaller bubbles on a global map, and create animated links from these bubbles to the location of the published paper, where a larger bubble is created. The message that we are trying to deliver through this visualization is: scientific publications (i.e., the smaller bubbles) emerged at different locations of the world, their ideas were then absorbed (i.e., the animated links) by a researcher, and a new paper (i.e., the bigger red bubble) was created and published. When hovering over the bubbles and links, users can see additional information about authors, institutions, and each reference.

Reviewer Recommendation

This module is built on the idea that researchers who have already published on a topic have the potential to be good reviewers for new papers on the same topic. The tool uses LDA to generate a set of topics from a corpus of full-text publications. The authors of these publications are then each described as a distribution across the topic space. Given a new submission (in the form of an abstract), the topic distribution of this material can be inferred from the existing LDA topics. Using the Jensen-Shannon divergence (JSD), the topic distribution of the new material is compared with the topic distributions of all authors, and authors are then ranked by their similarity value (defined as 1 minus the JSD dissimilarity value). To avoid conflicts of interest, we compare the list of potential reviewers with the researcher's previous co-authors. Those co-authors are then highlighted to reflect the conflicts.
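The ranking and conflict check can be sketched as follows. This is a minimal illustration under stated assumptions: the topic distributions are made up rather than inferred by LDA, base-2 logarithms are assumed so that the divergence lies between 0 and 1, and `recommend_reviewers` is a hypothetical helper name.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value is in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the divergence.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def recommend_reviewers(submission, author_topics, coauthors):
    """Rank authors by 1 - JSD and flag the submitter's previous co-authors."""
    ranked = []
    for name, distribution in author_topics.items():
        similarity = 1.0 - jensen_shannon(submission, distribution)
        ranked.append((name, similarity, name in coauthors))
    # Most similar first; names break ties for a stable order.
    ranked.sort(key=lambda r: (-r[1], r[0]))
    return ranked


# Hypothetical topic distributions over 3 topics.
author_topics = {
    "Ann": [0.7, 0.2, 0.1],
    "Bob": [0.1, 0.8, 0.1],
    "Cy": [0.6, 0.3, 0.1],
}
submission = [0.7, 0.2, 0.1]
result = recommend_reviewers(submission, author_topics, coauthors={"Ann"})
```

In this sketch "Ann" ranks first but carries the conflict flag, so "Cy" would be the best unconflicted candidate. Flagging rather than removing conflicted candidates mirrors the highlighting behavior described above.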

Potential Collaborators

This analytical function is designed to find researchers who have similar research interests but may never have co-authored a paper (i.e., researchers who could potentially become collaborators). The similarity between researchers is calculated using the cosine similarity measure, based on researchers' expertise imported from Arnetminer. We then calculate the shortest network distance based on the co-authorships in the Academic Network module. In the visualization, the blue icon represents the current researcher and the surrounding grey icons represent his/her potential collaborators. Researchers who have already co-authored a paper are connected by a link, while the absence of a link indicates that no co-authored paper exists in the LAK dataset.
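The two measures named above, cosine similarity and shortest network distance, can be sketched as follows. The expertise vectors, the co-author graph, the 0.8 similarity threshold, and the helper names are all assumptions for illustration; the source does not state how the portal combines the two measures.

```python
import math
from collections import deque


def cosine_similarity(u, v):
    """Cosine of the angle between two expertise vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shortest_distance(graph, start, goal):
    """Breadth-first search: number of co-author hops, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def potential_collaborators(person, expertise, graph, threshold=0.8):
    """List (name, similarity, network_distance) for sufficiently similar researchers."""
    found = []
    for other, vector in expertise.items():
        if other == person:
            continue
        similarity = cosine_similarity(expertise[person], vector)
        if similarity >= threshold:
            distance = shortest_distance(graph, person, other)
            found.append((other, round(similarity, 3), distance))
    return sorted(found, key=lambda r: (-r[1], r[0]))


# Hypothetical expertise vectors and co-author adjacency lists.
expertise = {"Ann": [1, 1, 0], "Bob": [1, 1, 0], "Cy": [2, 2, 0], "Dee": [0, 0, 1]}
graph = {"Ann": ["Bob"], "Bob": ["Ann", "Cy"], "Cy": ["Bob"], "Dee": []}
candidates = potential_collaborators("Ann", expertise, graph)
```

A distance of 1 means the two researchers have already co-authored a paper and would be drawn with a link; a larger distance, or None, marks a candidate who has not yet collaborated with the current researcher.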

Research Topic Trends

This module shows how research topics in the LAK or EDM conferences trend over time. We extract the top 10 topics for each year, ranked by the total number of papers that contain the topic keywords. The EDM conferences, for instance, have 46 distinct topics extracted from 2008 to 2013. Frequencies for all topics were calculated across all years as time series for further visualization and analysis. The interactive stream graph makes it easy to see the decline of certain topics, the emergence of new topics, and the expansion of trends over time. Hovering the mouse over the graph reports the chosen topic along with the number of papers associated with the topic in each year.
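The extraction of per-year top topics and their time series can be sketched as follows. The `papers` records and the helpers `yearly_counts` and `topic_series` are hypothetical, and the example uses k=1 on a tiny dataset instead of the top 10 used by the module.

```python
from collections import Counter, defaultdict

# Hypothetical (year, topic keywords) records, one per paper.
papers = [
    (2011, ["MOOCs"]),
    (2012, ["MOOCs", "knowledge tracing"]),
    (2012, ["MOOCs"]),
    (2013, ["knowledge tracing"]),
]


def yearly_counts(papers):
    """Count, for each year, how many papers mention each topic."""
    per_year = defaultdict(Counter)
    for year, topics in papers:
        per_year[year].update(set(topics))
    return per_year


def topic_series(papers, k=10):
    """Take the top-k topics of each year and build a series over all years."""
    per_year = yearly_counts(papers)
    years = sorted(per_year)
    selected = set()
    for year in years:
        ranked = sorted(per_year[year].items(), key=lambda kv: (-kv[1], kv[0]))
        selected.update(topic for topic, _ in ranked[:k])
    # A Counter returns 0 for topics that do not occur in a given year.
    series = {t: [per_year[y][t] for y in years] for t in sorted(selected)}
    return years, series


years, series = topic_series(papers, k=1)
```

Because a topic that is in the top k in any one year is tracked across all years, the number of distinct topics can exceed k, which is consistent with the 46 distinct EDM topics mentioned above. Each list in `series` corresponds to one layer of a stream graph.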