Example: Zinc Powder from a Supplier

Let’s describe an instance of some zinc powder with a set of properties defined in the specification sheet from the manufacturer!

This example covers a few topics:

  • How to describe a resource using ontology terms and JSON-LD [Basic]

  • How to run a simple query using SPARQL [Moderate]

  • How to use the ontology to fetch more information from other sources [Advanced]

A live version of this notebook is available on Google Colab.

Describe the powder using ontology terms in JSON-LD [Basic]

In this example, we will describe a commercially available zinc powder sold by the chemical company Sigma-Aldrich. We will first describe some general information (e.g. what it is, who manufactured it, and where more information can be found) and then we will describe a simple property of the powder.

The JSON-LD description that we will use is:

[31]:
jsonld = {
  "@context": "https://w3id.org/emmo/domain/electrochemistry/context",
  "@type": ["Zinc", "Powder"],
  "schema:manufacturer": {
      "@id": "https://www.wikidata.org/wiki/Q680841",
      "schema:name": "Sigma-Aldrich"
  },
  "schema:productID": "324930",
  "schema:url": "https://www.sigmaaldrich.com/NO/en/product/aldrich/324930",
  "hasProperty": [
      {
        "@type": ["D95ParticleSize", "ConventionalProperty"],
        "hasNumericalPart": {
              "@type": "Real",
              "hasNumericalValue": 150
        },
        "hasMeasurementUnit": "emmo:MicroMetre",
        "dc:source": "https://www.sigmaaldrich.com/NO/en/product/aldrich/324930"
      }
  ]
}

Let’s break this description down, little by little.

@context: the context is a hallmark of JSON-LD descriptions that provides a kind of dictionary to help translate between human-readable labels and machine-readable IRIs. You can use the generic context available from the domain ontology, or define your own.
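
To make the "dictionary" idea concrete, here is a minimal sketch of how a context turns a short label into a full IRI. It uses a hand-written toy context with placeholder IRIs (the `example.org` entries and the `expand_term` helper are assumptions for illustration, not the real EMMO identifiers or a real library function). Real JSON-LD expansion handles many more cases, so treat this as a picture of the idea rather than the actual algorithm.

```python
# A hand-written, heavily reduced context for illustration only.
# The example.org IRIs are placeholders, not the real EMMO identifiers.
toy_context = {
    "schema": "https://schema.org/",
    "Zinc": "https://example.org/toy#Zinc",
    "Powder": "https://example.org/toy#Powder",
}


def expand_term(term, context):
    """Expand a label or prefixed name to a full IRI using the context."""
    if term in context:                      # plain label, e.g. "Zinc"
        return context[term]
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:            # prefixed name, e.g. "schema:name"
        return context[prefix] + local
    return term                              # anything else is left as-is


for label in ["Zinc", "schema:manufacturer", "https://schema.org/name"]:
    print(label, "->", expand_term(label, toy_context))
```

A full IRI such as `https://schema.org/name` passes through unchanged because `https` is not a prefix defined in the toy context.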

@type: describes the parent class(es) of the individual thing we are describing. While most things will only have one parent, it is possible to inherit from multiple parents. In this example, our thing inherits from both Zinc and Powder - becoming a Zinc Powder.

The next three properties use the schema.org vocabulary to make some generic statements about the product, such as its manufacturer, product ID, and URL. Notice that the value of schema:manufacturer is a node with an @id value of https://www.wikidata.org/wiki/Q680841 and described by a property schema:name. This allows us to consistently refer to the manufacturer by a persistent and unique identifier (e.g. its Wikidata URL) and describe it with a human-readable name (“Sigma-Aldrich”).

We use the EMMO term hasProperty to attach a quantity, with its value and unit, to the powder. We again use @type to state what kind of property we are defining. In this case, we again use multiple inheritance to state that this is a D95ParticleSize and that it is a ConventionalProperty. A ConventionalProperty is an EMMO term that indicates that the property is obtained by convention (e.g. from a manufacturer’s specification sheet) rather than being explicitly measured.

Quantities always have two parts: a number and a unit. EMMO expresses this with hasNumericalPart and hasMeasurementUnit. In this case, we say that the numerical part is a real number with a value of 150 and the measurement unit is the micrometre. Finally, we use the Dublin Core term dc:source to record where this information came from, such as a URL or another persistent and unique identifier (e.g. a DOI).
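
If you need to describe several properties, the number-plus-unit pattern can be wrapped in a small helper. The sketch below is one possible way to do it; the function name `make_quantity` and its parameters are hypothetical conveniences, while the keys it emits are the ones used in the description above.

```python
import json


def make_quantity(property_types, value, unit, source=None):
    """Build a JSON-LD property node with a numerical part and a unit."""
    node = {
        "@type": list(property_types),
        "hasNumericalPart": {"@type": "Real", "hasNumericalValue": value},
        "hasMeasurementUnit": unit,
    }
    if source is not None:
        # Optional provenance, using the same dc:source key as above
        node["dc:source"] = source
    return node


d95 = make_quantity(
    ["D95ParticleSize", "ConventionalProperty"],
    150,
    "emmo:MicroMetre",
    source="https://www.sigmaaldrich.com/NO/en/product/aldrich/324930",
)
print(json.dumps(d95, indent=2))
```

The resulting dictionary can be placed directly in the hasProperty list of the description.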

But JSON-LD just offers a handy way to generate human-readable linked data graphs. To really take advantage of its potential, we need to convert it into a machine-readable graph for querying and other operations.

Parse this description into a graph

Now let’s see how a machine would process this data by reading it into a Graph! First, we install and import the Python dependencies that we need for this example.

[32]:
# Install and import dependencies
!pip install jsonschema rdflib requests matplotlib > /dev/null

import json, rdflib, requests, sys
from IPython.display import Image, display
import matplotlib.pyplot as plt

We create the graph using a very handy Python package called rdflib, which lets us parse our JSON-LD data, run queries in the SPARQL language, and serialize the graph in any RDF-compatible format (e.g. JSON-LD, Turtle, etc.).

[33]:
# Create a new graph
g = rdflib.Graph()

# Parse our json-ld data into the graph
g.parse(data=json.dumps(jsonld), format="json-ld")
[33]:
<Graph identifier=Nb5552bc1d06041c895f1772067933789 (<class 'rdflib.graph.Graph'>)>

Query the graph using SPARQL [Moderate]

Now, let’s write a SPARQL query to get back some information…like what is the name of the manufacturer?

SPARQL queries reflect the same basic subject-predicate-object (node-edge-node) structure of triples. Variables can be used in place of any of the three parts to query for triples that match the pattern. Multiple lines can be combined to yield more advanced queries.

In this example, we say that we are looking for the value of some variable ?manufacturerName that matches the patterns:

?thing schema:manufacturer ?manufacturer .

&

?manufacturer schema:name ?manufacturerName .

Roughly translated into English, the query reads: select all values for the variable manufacturerName, where a thing is manufactured by some manufacturer and the manufacturer has the name manufacturerName. We can execute this query on the graph using rdflib:

[34]:
query_txt = """
PREFIX schema: <https://schema.org/>

SELECT ?manufacturerName
WHERE {
  ?thing schema:manufacturer ?manufacturer .
  ?manufacturer schema:name ?manufacturerName .
}
"""

# Execute the SPARQL query
results = g.query(query_txt)

# Print the results
for row in results:
    print(row)

(rdflib.term.Literal('Sigma-Aldrich'),)

We can now see that the name of the manufacturer is ‘Sigma-Aldrich’.
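
To see what the two patterns are doing, the same join can be mimicked in plain Python. This is a toy sketch over hand-written tuples, not how rdflib evaluates SPARQL; the node label `_:powder` and the helper `match` are made up for illustration.

```python
# Hand-written triples mirroring the statements the query relies on.
# "_:powder" is a made-up label for the unnamed zinc powder node.
SCHEMA = "https://schema.org/"
SIGMA = "https://www.wikidata.org/wiki/Q680841"
triples = [
    ("_:powder", SCHEMA + "manufacturer", SIGMA),
    ("_:powder", SCHEMA + "productID", "324930"),
    (SIGMA, SCHEMA + "name", "Sigma-Aldrich"),
]


def match(pattern, triples):
    """Yield triples matching a pattern; None plays the role of a variable."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


# First pattern binds ?manufacturer, second pattern reuses that binding
names = [
    name
    for _, _, manufacturer in match((None, SCHEMA + "manufacturer", None), triples)
    for _, _, name in match((manufacturer, SCHEMA + "name", None), triples)
]
print(names)  # ['Sigma-Aldrich']
```

The shared variable is what links the two lines of the WHERE clause: the object found by the first pattern becomes the subject of the second.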

Fetch additional information from other sources [Advanced]

Ontologies contain a lot of information about the meaning of things, but they don’t always contain an exhaustive list of all the properties. Instead, they often point to other sources where that information exists rather than duplicating it. Let’s see how you can use the ontology to fetch additional information from other sources.

Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others. EMMO ontologies often include links to Wikidata terms where applicable to provide users with additional information. Let’s give it a try!

First, we parse the ontology into the knowledge graph and retrieve the IRIs for the terms that we are interested in. In this case, we want to retrieve more information about Zinc from Wikidata, so we query the ontology to find Zinc’s Wikidata ID.

[35]:
# Parse the ontology into the knowledge graph
ontology = "https://w3id.org/emmo/domain/electrochemistry/inferred"
g.parse(ontology, format='turtle')

# Fetch the context
context_url = 'https://w3id.org/emmo/domain/electrochemistry/context'
response = requests.get(context_url)
context_data = response.json()

# Look for the IRI of Zinc in the context
zinc_iri = context_data.get('@context', {}).get('Zinc')
wikidata_iri = context_data.get('@context', {}).get('wikidataReference')

# Query the ontology to find the wikidata id for zinc
query_txt = """
SELECT ?wikidataId
WHERE {
    <%s> <%s> ?wikidataId .
}
""" % (zinc_iri, wikidata_iri)

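results = g.query(query_txt)

# Keep the match; stays None if the ontology has no Wikidata link for Zinc
wikidata_url = None
for row in results:
    wikidata_url = str(row.wikidataId)

if wikidata_url is None:
    raise LookupError("No Wikidata reference found for Zinc in the ontology")

# The ID is the last path segment of the URL
wikidata_id = wikidata_url.split('/')[-1]
print(f"The Wikidata ID of Zinc: {wikidata_id}")
print(f"The whole URL is: {wikidata_url}")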
The Wikidata ID of Zinc: Q758
The whole URL is: https://www.wikidata.org/wiki/Q758
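
Splitting on "/" works for the URL shown here. If you want a stricter check before sending the ID to another service, a small validator is one option. The sketch below is an assumption-laden convenience (the helper name `extract_qid` and the rule that an item ID is "Q" followed by digits at the end of the path are choices made for this illustration).

```python
import re


def extract_qid(iri):
    """Return the Wikidata item ID (e.g. 'Q758') from the end of an IRI."""
    found = re.search(r"/(Q[1-9]\d*)$", iri.rstrip("/"))
    if found is None:
        raise ValueError(f"No Wikidata item ID found in {iri!r}")
    return found.group(1)


print(extract_qid("https://www.wikidata.org/wiki/Q758"))  # Q758
```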

Now that we have the Wikidata ID for Zinc, we can query the Wikidata SPARQL endpoint to retrieve one of its properties. Let’s ask it for the atomic mass.

[36]:
# Query the Wikidata knowledge graph for more information about zinc
wikidata_endpoint = "https://query.wikidata.org/sparql"

# SPARQL query to get the atomic mass of zinc (Q758)
query = """
SELECT ?mass WHERE {
  wd:%s wdt:P2067 ?mass .
}
""" % wikidata_id

# Execute the request
response = requests.get(wikidata_endpoint, params={'query': query, 'format': 'json'})
data = response.json()

# Extract and print the mass value
mass = data['results']['bindings'][0]['mass']['value']
print(f"Wikidata says the atomic mass of zinc is: {mass}")
Wikidata says the atomic mass of zinc is: 65.38
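
The endpoint answers in the SPARQL JSON results format, where every value arrives as a string nested under results and bindings. The sketch below shows one cautious way to unpack it, using a hand-written sample instead of a live request; the `first_value` helper is hypothetical and the datatype in the sample is an assumption about what the endpoint returns.

```python
from decimal import Decimal

# Hand-written sample shaped like the SPARQL JSON results format.
# The datatype shown is an assumption; the live response may differ.
sample = {
    "head": {"vars": ["mass"]},
    "results": {
        "bindings": [
            {
                "mass": {
                    "type": "literal",
                    "datatype": "http://www.w3.org/2001/XMLSchema#decimal",
                    "value": "65.38",
                }
            }
        ]
    },
}


def first_value(result, var):
    """Return the first row's value for var, or None if nothing matched."""
    bindings = result.get("results", {}).get("bindings", [])
    if not bindings or var not in bindings[0]:
        return None
    return bindings[0][var]["value"]


raw = first_value(sample, "mass")
# Values are strings in this format, so convert before doing arithmetic
mass = Decimal(raw) if raw is not None else None
print(mass)
```

Returning None for an empty result avoids the IndexError that direct indexing with [0] would raise when a query matches nothing.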

We can also retrieve more complex data. For example, let’s ask Wikidata to show us an image of zinc.

[37]:
# SPARQL query to get the image of zinc (Q758)
query = """
SELECT ?image WHERE {
  wd:%s wdt:P18 ?image .
}
""" % wikidata_id

# Execute the request
response = requests.get(wikidata_endpoint, params={'query': query, 'format': 'json'})
data = response.json()

# Extract and display the image URL
if data['results']['bindings']:
    image_url = data['results']['bindings'][0]['image']['value']
    print(f"Image of Zinc: {image_url}")
    display(Image(url=image_url, width=300))  # Adjust width as needed
else:
    print("No image found for Zinc.")
Image of Zinc: http://commons.wikimedia.org/wiki/Special:FilePath/Zinc%20fragment%20sublimed%20and%201cm3%20cube.jpg

Isn’t that cool?! And we’re just scratching the surface of what is possible with linked data, ontologies, and knowledge graphs. Keep checking out more examples to explore the possibilities!