
Data exploration on linked COVID-19 datasets

An overview of the available RDF datasets and discovery tools for COVID-19.
Updated 31 Aug 2020
Anton Vasetenkov

As efforts to fight the ongoing pandemic of COVID-19 caused by SARS-CoV-2 are ramping up across the world, more and more authoritative and high-quality datasets are becoming available for research and analysis.

Official COVID-19 datasets are published by governments in a variety of different formats and normally do not follow the same structure. Aggregating them is essential for getting a unified, global view of the pandemic.

When published as linked data in the RDF format, datasets "automatically" become part of the global data graph that connects all linked data sources. The interconnected data can then be viewed and analysed as a single dataset, which is key to revealing new information and generating new insights.

A number of linked datasets covering different aspects of the disease and the pandemic are available for COVID-19, and they can be queried jointly using SPARQL.

The data, tools, and sample queries

A popular source of linked data, including data related to the pandemic, is Wikidata. Wikidata is the central storage for the structured data used by Wikipedia and other Wikimedia projects. It can be queried interactively using the Wikidata Query Service, and it also offers a SPARQL endpoint for tapping into the data remotely and joining it with local RDF datasets.

The following query returns the number of COVID-19 cases recorded globally over time and is a good starting point for exploring the COVID-19 data stored in Wikidata:

SELECT ?date ?cases
WHERE {
    wd:Q81068910 p:P1603 ?casesNode .
    ?casesNode ps:P1603 ?cases ;
               pq:P585 ?date .
}
ORDER BY ASC(?date)

Try this query using the Wikidata Query Service

Results (truncated):

date        cases
20/01/2020  282
21/01/2020  314
...         ...
01/08/2020  17,396,943
02/08/2020  17,660,523
...         ...
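
The same query can also be sent to the SPARQL endpoint from a script. The following is a minimal Python sketch rather than an official client: it assumes the public endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format, spells out the prefixes that the Wikidata Query Service normally predefines, and uses hypothetical helper names (build_request, parse_bindings, fetch_rows) and a placeholder User-Agent value.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# The query from the article, with the prefixes declared explicitly so that it
# does not rely on the defaults of the Wikidata Query Service.
CASES_QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>

SELECT ?date ?cases
WHERE {
    wd:Q81068910 p:P1603 ?casesNode .
    ?casesNode ps:P1603 ?cases ;
               pq:P585 ?date .
}
ORDER BY ASC(?date)
"""


def build_request(query, endpoint=WIKIDATA_ENDPOINT):
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Placeholder value (assumption): replace with your own contact details.
        "User-Agent": "covid-linked-data-example/0.1 (example@example.org)",
    }
    return urllib.request.Request(url, headers=headers)


def parse_bindings(payload):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


def fetch_rows(query, endpoint=WIKIDATA_ENDPOINT):
    """Run the query over HTTP (needs network access) and return flat rows."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return parse_bindings(json.load(response))
```

Calling fetch_rows(CASES_QUERY) should return one dictionary per data point with the keys date and cases. All values arrive as strings, so the counts need converting to numbers before any arithmetic.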

Another useful resource is the RDFised version of The New York Times' COVID-19 dataset provided by Stardog. This dataset contains the cumulative counts of coronavirus cases in the United States at the state and county level. The RDF dataset is quite extensive (over 2 million triples) and is available in Stardog Studio.

For example, this is how the dataset represents the fact that, as of 1 August 2020, 190,693 cases had been recorded in Los Angeles County, California (FIPS code 06-037):

@prefix : <http://api.stardog.com/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<urn:uuid:648d5c0b-89b9-42f6-b06b-b65109d25a3e>
    a :Report ;
    :date "2020-08-01"^^xsd:date ;
    :county :CountyLos%20Angeles-California ;
    :cases "190693"^^xsd:integer .
:CountyLos%20Angeles-California
    a :County ;
    rdfs:label "Los Angeles, California" ;
    :state :California ;
    :fips "06037" .
:California
    a :State ;
    rdfs:label "California" .

To get the number of cases in the county over time, this query can be used:

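PREFIX : <http://api.stardog.com/>

# Cumulative case counts for Los Angeles County (FIPS 06037), oldest first
SELECT ?date ?cases
WHERE {
    ?report a :Report ;
            :date ?date ;
            :county ?county ;
            :cases ?cases .
    ?county :fips "06037" .
}
ORDER BY ASC(?date)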

Results (truncated):

date        cases
26/01/2020  1
27/01/2020  1
...         ...
01/08/2020  190,693
02/08/2020  192,167
...         ...

Because it is a linked dataset, it can be joined with data from Wikidata by means of SPARQL federation. For example, the population of each county stored in Wikidata can be joined with the case statistics from The New York Times' dataset, which allows the occurrence (cumulative cases divided by population) to be calculated:

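PREFIX : <http://api.stardog.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?countyLabel ?cases ?population (?cases / ?population AS ?occurrence)
WHERE {
    # Local data: reports for California counties on 1 August 2020
    ?report a :Report ;
        :date "2020-08-01"^^xsd:date ;
        :county ?county ;
        :cases ?cases .
    ?county rdfs:label ?countyLabel ;
        :state :California ;
        :fips ?fips .
    {
        # Remote data: FIPS code (P882) and population (P1082) from Wikidata
        SELECT ?fips ?population
        WHERE {
            SERVICE <https://query.wikidata.org/sparql> {
                ?countyWd wdt:P882 ?fips ;
                    wdt:P1082 ?population .
            }
        }
    }
}
ORDER BY DESC(?occurrence)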

Results (truncated):

countyLabel           cases   population  occurrence
Imperial, California  9,409   181,215     0.051922
Kings, California     4,380   152,940     0.028639
Kern, California      20,061  900,202     0.022285
Tulare, California    9,454   466,195     0.020279
Lassen, California    598     30,573      0.019560
...                   ...     ...         ...
Trinity, California   6       12,285      0.000488
Sierra, California    1       3,005       0.000333
Modoc, California     2       8,841       0.000226

The results show that, as at 1 August 2020, Imperial, Kings, and Kern had the highest cumulative COVID-19 case rates of the 58 California counties.
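
The occurrence column is simply the cumulative case count divided by the population. As a small worked example, the Python sketch below recomputes and ranks it for four of the rows listed in the results. The function names are illustrative only, and the extra per-100,000 figure is an added convenience that is not part of the query.

```python
# (county label, cumulative cases, population) copied from the results above.
ROWS = [
    ("Kern, California", 20061, 900202),
    ("Imperial, California", 9409, 181215),
    ("Modoc, California", 2, 8841),
    ("Kings, California", 4380, 152940),
]


def occurrence(cases, population):
    """Cumulative cases per resident, i.e. ?cases / ?population in the query."""
    return cases / population


def rank_by_occurrence(rows):
    """Return (label, occurrence) pairs, highest occurrence first."""
    ranked = [(label, occurrence(cases, pop)) for label, cases, pop in rows]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


for label, rate in rank_by_occurrence(ROWS):
    # Show the rate as a share and as cases per 100,000 residents.
    print(f"{label}: {rate:.6f} ({rate * 100_000:.0f} per 100,000)")
```

Sorting in descending order mirrors the ORDER BY DESC(?occurrence) clause, so Imperial County comes first and Modoc County last.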

Data on COVID-19 in New Zealand

No linked RDF datasets covering the COVID-19 pandemic in New Zealand in detail can be found online. The existing data can, however, be RDFised and interlinked with other datasets by using Wikidata identifiers (URIs) to specify regions and district health boards. This makes it possible to reliably join and augment the data with geographic, demographic, relief, and mobility data from Wikidata and other data providers. The same result can be achieved by using local URIs and consistently linking them to their external counterparts with owl:sameAs instead of using Wikidata URIs or other external URIs directly.

As an example, this is how the fact that 3 new confirmed COVID-19 cases were recorded on 24 August 2020 by Auckland District Health Board (wd:Q24189683) can be represented in RDF:

@prefix : <http://example.org/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:r0
    a :Report ;
    :date "2020-08-24"^^xsd:date ;
    :dhb wd:Q24189683 ;
    :newCases "3"^^xsd:integer .

This data can be linked with the facts available in Wikidata, such as the fact that Auckland District Health Board is located in Auckland (wd:Q37100), which in 2018 had a population of 1,467,800.
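
The RDFising step described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the http://example.org/ namespace and the :Report, :date, :dhb, and :newCases terms come from the example above, while the function name report_to_turtle, the tuple layout of the input records, and the simple check on the Wikidata item ID are hypothetical choices made for this sketch.

```python
from datetime import date

PREFIXES = """@prefix : <http://example.org/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def report_to_turtle(report_id, day, dhb_qid, new_cases):
    """Serialise one daily DHB report as a Turtle fragment."""
    # Guard against malformed identifiers ending up in the output.
    if not (dhb_qid.startswith("Q") and dhb_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {dhb_qid!r}")
    return (
        f":{report_id}\n"
        f"    a :Report ;\n"
        f'    :date "{day.isoformat()}"^^xsd:date ;\n'
        f"    :dhb wd:{dhb_qid} ;\n"
        f'    :newCases "{int(new_cases)}"^^xsd:integer .\n'
    )


# One source record: Auckland DHB (wd:Q24189683), 3 new cases on 24 August 2020.
records = [("r0", date(2020, 8, 24), "Q24189683", 3)]
document = PREFIXES + "\n" + "".join(report_to_turtle(*r) for r in records)
print(document)
```

Plain string formatting keeps the sketch free of dependencies. For anything beyond a handful of records, an RDF library that handles escaping and serialisation would be the safer choice.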

In conclusion

The more COVID-19 datasets are published as linked data, the more data integration and enrichment techniques become possible. SPARQL's built-in federation capabilities make it easier to query such interlinked datasets, which facilitates comprehensive analysis of the pandemic both in New Zealand and globally.

Last updated on 31 Aug 2020 by Anton Vasetenkov.