Cells and Topics: Similar papers are clustered into one cell, and the topic with the largest font is the major one shared by most papers in that cell. Topics with smaller fonts are shared by fewer papers but are related to the major topic. Topics located close to the center of the cell are closer to the major topic, whereas those on the boundary are related to both the major topic and topics in neighboring cells. Some popular keywords (e.g., "data") may show up in multiple cells in different roles (e.g., as the major topic or a secondary topic).
Landscape Metaphor: As SOM is also a clustering method, a landscape visualization has been employed to represent the significance of the clustering result. "Mountains" indicate clusters that have a high significance value, while "valleys" and "rivers" represent clusters that are less significant.
Web@25 is a collaborative project of IOS Press and UCSB STKO Lab to celebrate the 25th birthday of the Web. This project is based on the bibliographic data from the World Wide Web conference (the top conference on Web-related topics) between 1994 and 2014. The current web portal consists of three major components.
A Timeline of Topics: This component highlights the top 100 keywords per WWW conference year. Different font sizes depict the relative importance (frequency) of each keyword in that conference year. While some obvious variations of the same keyword were merged (e.g., the singular and plural versions), other syntactic variations were kept deliberately. For instance, it is interesting to see how the term "World Wide Web" is increasingly replaced by "WWW" or simply "Web". This timeline component also shows the evolution of Web-related research topics, with some new topics emerging and some old topics disappearing.
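The frequency-to-font-size mapping described above can be sketched as follows. This is only an illustration, not the portal's actual code: the function names, the point-size range, and the naive plural-folding rule are all assumptions made for the example.

```python
from collections import Counter


def merge_variants(keyword):
    """Fold obvious variants together (hypothetical, deliberately naive rule).

    Lowercases the keyword and strips a trailing plural "s". Other syntactic
    variants ("World Wide Web", "WWW", "Web") are left as separate terms.
    """
    k = keyword.strip().lower()
    if len(k) > 3 and k.endswith("s") and not k.endswith(("ss", "is", "us")):
        return k[:-1]
    return k


def font_sizes(keywords, top_n=100, min_pt=10, max_pt=48):
    """Map the top_n most frequent keywords of one year to font sizes.

    Sizes are scaled linearly between min_pt and max_pt (assumed values).
    """
    counts = Counter(merge_variants(k) for k in keywords)
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = hi - lo
    sizes = {}
    for term, n in top:
        scale = (n - lo) / span if span else 1.0
        sizes[term] = round(min_pt + scale * (max_pt - min_pt))
    return sizes
```

Calling `font_sizes` once per conference year, with that year's keyword list, would yield one word cloud per year; a real implementation would likely use a more careful normalization step than the plural rule shown here.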
Geographic Mapping: Moving away from a purely temporal perspective, the second component shows the spatial and spatiotemporal characteristics of the WWW bibliographic data. A world map is used to visualize the geographic distribution of topics. One can access this geographic map by double-clicking a term in the timeline word clouds, which maps that specific term-year pair. It is also possible to change the year range and to add or remove keywords. The new range can be mapped using the "Map" button, or the year-by-year change can be shown gradually using the "Animation" button. It is worth mentioning that the list of mappable keywords includes all extracted keywords, not just the top 100 from the timeline.
Self-Organizing Maps: Despite the word "Map" in its name, a Self-Organizing Map (SOM) is actually a type of artificial neural network. A SOM projects high-dimensional data onto a 2D space based on the similarity among data records. In this project, we use SOM to reveal the hidden relations among Web-related research topics, i.e., which topics are more closely related than others. We group the WWW papers into five-year periods and use SOM to determine their positions on the 2D plane. Papers with similar topics are clustered together, and their shared topics are shown as labels. Labels with larger fonts indicate higher-level topics, which are shared by a larger number of papers. We also employ a landscape metaphor to indicate the significance of the clustering results. "Mountain" areas indicate a strong significance of the derived clusters, while "valleys" and "rivers" represent less significant clusters.
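To make the idea concrete, here is a minimal SOM training sketch in plain Python. It is a generic textbook-style formulation, not the implementation used for this portal: the grid size, learning-rate schedule, Gaussian neighborhood, and the representation of papers as numeric topic vectors are all assumptions.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid cell whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda cell: sum((w - x) ** 2 for w, x in zip(weights[cell], vector)),
    )


def train_som(vectors, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Train a small rows x cols SOM on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One randomly initialized weight vector per grid cell.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks over time
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            bmu = best_matching_unit(weights, v)
            for cell, w in weights.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights
```

After training, `best_matching_unit(weights, paper_vector)` gives the cell in which a paper would be placed; papers with similar vectors tend to land in the same or neighboring cells, which is the property the cell labels rely on.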
This web portal is built on the Semantic Web technology stack:
SPARQL endpoint: Jena Fuseki triple store.
RDF converter: Written in Java, using the Apache Jena API to create RDF statements.
Ontology design: CMap and Protégé.
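As an illustration of how a client might talk to such a SPARQL endpoint, the sketch below builds a standard SPARQL Protocol GET request using only the Python standard library. The endpoint URL, the dataset name, and the `ex:` vocabulary are hypothetical placeholders; the portal's actual dataset name and ontology terms are not given here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: a local Fuseki server with a dataset named "web25".
ENDPOINT = "http://localhost:3030/web25/query"


def build_keyword_query(year):
    """Build a SPARQL query for the most frequent keywords of one year.

    The ex: namespace and its properties are placeholders, not the real ontology.
    """
    return f"""
PREFIX ex: <http://example.org/web25#>
SELECT ?keyword (COUNT(?paper) AS ?n)
WHERE {{
  ?paper ex:year {int(year)} ;
         ex:keyword ?keyword .
}}
GROUP BY ?keyword
ORDER BY DESC(?n)
LIMIT 100
""".strip()


def build_request(endpoint, query):
    """Wrap the query in a GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Passing the request to `urllib.request.urlopen` would send it; with the JSON results format, the rows are found under `results` and then `bindings` in the decoded response.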