In May 2018, Wikidata added support for a new kind of entity called lexemes, or L-items. An important concept in lexicography and linguistic analysis, a lexeme is a unit of language that groups together words related through inflection. For example, the English verb run is a lexeme that refers to the set of words including run, runs, ran, and running, all of which share the same meaning. Capturing such lexicographical information is important, and doing so using linked data greatly increases the utility of the resulting dataset.
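The idea of a lexeme grouping its inflected forms can be sketched in a few lines of Python. This is only an illustration of the data model, not Wikidata's actual representation; the lexeme ID below is a placeholder, not a real L-item identifier.

```python
# A minimal sketch of a lexeme: one lemma grouping its inflected forms.
# "L-XXX" is a placeholder; real Wikidata lexeme IDs look like "L123".
lexeme = {
    "id": "L-XXX",                 # placeholder lexeme identifier
    "lemma": "run",
    "language": "English",
    "lexical_category": "verb",
    "forms": ["run", "runs", "ran", "running"],
}

def forms_of(lex):
    """Return the set of word forms grouped under a lexeme."""
    return set(lex["forms"])

print(forms_of(lexeme))  # all inflections that share the same meaning
```

The point of the grouping is that any of the four surface forms can be resolved back to the single lexeme that carries the shared meaning.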
As efforts to fight the ongoing pandemic of COVID-19 caused by SARS-CoV-2 are ramping up across the world, more and more authoritative and high-quality datasets are becoming available for research and analysis.
Official COVID-19 datasets are published by governments in a variety of formats and rarely share a common structure. Aggregating them is essential for getting a unified, global view of the pandemic.
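The aggregation step amounts to mapping each source's structure onto one shared schema. The sketch below, using made-up excerpts in two common publication formats (CSV and JSON) with invented figures and field names, shows the normalisation idea; it is an assumption about what such feeds might look like, not real government data.

```python
import csv
import io
import json

# Hypothetical excerpts from two official feeds with different structures.
# The numbers and field names are invented for illustration only.
csv_feed = "date,country,cases\n2020-04-01,IT,105792\n"
json_feed = '[{"day": "2020-04-01", "iso": "DE", "confirmed": 67366}]'

def normalise(record_date, country, cases):
    """Map a record from any source onto one shared schema."""
    return {"date": record_date, "country": country, "cases": int(cases)}

unified = []
# Source 1: CSV with date/country/cases columns.
for row in csv.DictReader(io.StringIO(csv_feed)):
    unified.append(normalise(row["date"], row["country"], row["cases"]))
# Source 2: JSON with day/iso/confirmed keys.
for item in json.loads(json_feed):
    unified.append(normalise(item["day"], item["iso"], item["confirmed"]))

print(unified)  # every record now follows the same structure
```

Once every record follows the same schema, cross-country queries and time-series analysis become straightforward.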
Documents are at the center of many business processes. Scanned pages and PDFs are ubiquitous and contain large amounts of information represented as forms and tables.
Historically, this information could only be analysed and used after manual data re-entry, a process that is slow and prone to error, because traditional optical character recognition (OCR) systems have not been able to analyse such data and preserve its inherent structure in their output.
Document understanding is concerned with advancing document intelligence by supporting the retrieval of structured data in addition to plain text. A process that relies heavily on machine learning, it has proven key to automating structured data extraction and unlocking its full potential by making the extracted data readily accessible for subsequent processing and analysis.
Organisations deal with vast amounts of text-based data on a daily basis. Text and unstructured data in general are involved in some of the core parts of every business: customer communication, reference documentation, and reporting, to name a few.
Natural language processing (NLP) and other emerging cognitive technologies are capable of analysing textual information in ways never before possible and are becoming widely used to enhance, scale, and automate various business processes. Examples of such processes include customer support and service, internal knowledge base search, content management, and research.
These and many other examples often require reading through large amounts of text and finding exact answers to given questions. The area of NLP that is concerned with building systems capable of automating these tasks is called machine reading comprehension, or more narrowly, question answering.
Since 1901, the Nobel Prizes and the Prize in Economic Sciences have been awarded 597 times to 950 people and organisations in the following categories: Physics, Chemistry, Physiology or Medicine, Literature, Peace, and Economic Sciences.
The official Nobel Prize Linked Data dataset is an authoritative source of information about Nobel Prizes and laureates. Importantly, a Nobel Prize is often shared between multiple people, and the same person or organisation can receive multiple Nobel Prizes. RDF is particularly well suited to representing such many-to-many relationships.
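To see why a graph model fits, consider the 1903 Nobel Prize in Physics, shared by Marie Curie, Pierre Curie, and Henri Becquerel, and Marie Curie's second prize, in Chemistry in 1911. The sketch below represents RDF-style triples as plain Python tuples; the prefixes and the ex:awardedTo predicate are illustrative placeholders, not the actual Nobel Prize Linked Data vocabulary.

```python
# RDF-style triples as (subject, predicate, object) tuples.
# Prefixes and predicate names are illustrative, not the real vocabulary.
triples = [
    ("prize:Physics1903", "ex:awardedTo", "person:MarieCurie"),
    ("prize:Physics1903", "ex:awardedTo", "person:PierreCurie"),
    ("prize:Physics1903", "ex:awardedTo", "person:HenriBecquerel"),
    ("prize:Chemistry1911", "ex:awardedTo", "person:MarieCurie"),
]

def laureates(prize):
    """All people or organisations a given prize was awarded to."""
    return {o for s, p, o in triples if s == prize and p == "ex:awardedTo"}

def prizes_of(person):
    """All prizes received by a given person or organisation."""
    return {s for s, p, o in triples if o == person and p == "ex:awardedTo"}
```

Because each award is just another edge in the graph, shared prizes and repeat laureates fall out of the same simple pattern, with no schema changes needed.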
SPARQL has been widely adopted since it was first proposed as the query language for the Semantic Web. There are many SPARQL endpoints available today, both public and private, exposing various interlinked data sources that are all part of the global RDF data cloud.
SPARQL federation offers a mechanism for integrating RDF data distributed across multiple sources. It allows data consumers to retrieve and join data from those sources with a single query, in a simple and elegant way that effectively exposes the data as one integrated RDF graph.
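At the query level, federation uses the SERVICE keyword from the SPARQL 1.1 Federated Query specification: the enclosed graph pattern is evaluated against a remote endpoint and its results are joined with the rest of the query. The sketch below assembles such a query as a Python string; the endpoint URL and the ex:/rdfs: patterns are illustrative placeholders, not a real deployment.

```python
# Build a federated SPARQL query as a string. The SERVICE keyword is part
# of SPARQL 1.1 Federated Query; the endpoint and patterns are placeholders.
ENDPOINT = "https://example.org/sparql"  # hypothetical remote endpoint

query = f"""
SELECT ?item ?label WHERE {{
  ?item a ex:Thing .              # pattern evaluated at the local endpoint
  SERVICE <{ENDPOINT}> {{
    ?item rdfs:label ?label .     # pattern evaluated at the remote endpoint
  }}
}}
"""

print(query)
```

The join on ?item across the two patterns is what makes the distributed sources behave, from the consumer's point of view, like a single graph.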