{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# Example: Simple Battery Cell Metadata\n", "\n", "Let's describe an instance of a simple CR2032 coin cell with a capacity defined in a specification sheet from the manufacturer!\n", "\n", "This example covers a few topics: \n", "\n", "- How to describe a resource using ontology terms and JSON-LD \n", "- How machines convert JSON-LD into triples \n", "- What is the meaning of the subject, predicate, and object identifiers \n", "- How to run a simple query using SPARQL **[Moderate]** \n", "- How to use the ontology to fetch more information from other sources **[Advanced]** \n", "\n", "A live version of this notebook is available on Google Colab [here](https://colab.research.google.com/drive/10F5YRAnO5ubY4Ut3uEjv5rLqvr_GRFC5?usp=sharing)\n" ], "metadata": { "id": "1wseTQGaB4x9" } }, { "cell_type": "markdown", "source": [ "## Describe the powder using ontology terms in JSON-LD format\n", "The JSON-LD data that we will use is:" ], "metadata": { "id": "jcTVz9-DEh3m" } }, { "cell_type": "code", "source": [ "jsonld = {\n", " \"@context\": \"https://raw.githubusercontent.com/emmo-repo/domain-battery/master/context.json\",\n", " \"@type\": \"CR2032\",\n", " \"schema:name\": \"My CR2032 Coin Cell\",\n", " \"schema:manufacturer\": {\n", " \"@id\": \"https://www.wikidata.org/wiki/Q3041255\",\n", " \"schema:name\": \"SINTEF\"\n", " },\n", " \"hasProperty\": {\n", " \"@type\": [\"NominalCapacity\", \"ConventionalProperty\"],\n", " \"hasNumericalPart\": {\n", " \"@type\": \"Real\",\n", " \"hasNumericalValue\": 230\n", " },\n", " \"hasMeasurementUnit\": \"emmo:MilliAmpereHour\"\n", " }\n", " }" ], "metadata": { "id": "gohQKEBrF2QP" }, "execution_count": 42, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Parse this description into a graph\n", "Now let's see how a machine would process this data by reading it into a Graph!\n", "\n", "First, we install and import the python dependencies that we need for this example." ], "metadata": { "id": "in30p-x4H91Y" } }, { "cell_type": "code", "source": [ "# Install and import dependencies\n", "!pip install jsonschema rdflib requests matplotlib > /dev/null\n", "\n", "import json\n", "import rdflib\n", "import requests\n", "import sys\n", "from IPython.display import Image, display\n", "import matplotlib.pyplot as plt" ], "metadata": { "id": "wk4sFl_eA2ML" }, "execution_count": 43, "outputs": [] }, { "cell_type": "markdown", "source": [ "We create the graph using a very handy python package called [rdflib](https://rdflib.readthedocs.io/en/stable/), which provides us a way to parse our json-ld data, run some queries using the language [SPARQL](https://en.wikipedia.org/wiki/SPARQL), and serialize the graph in any RDF compatible format (e.g. JSON-LD, Turtle, etc.)." ], "metadata": { "id": "lotp-0QABV-2" } }, { "cell_type": "code", "source": [ "# Create a new graph\n", "g = rdflib.Graph()\n", "\n", "# Parse our json-ld data into the graph\n", "g.parse(data=json.dumps(jsonld), format=\"json-ld\")\n", "\n", "# Create a SPARQL query to return all the triples in the graph\n", "query_all = \"\"\"\n", "SELECT ?subject ?predicate ?object\n", "WHERE {\n", " ?subject ?predicate ?object\n", "}\n", "\"\"\"\n", "\n", "# Execute the SPARQL query\n", "all_the_things = g.query(query_all)\n", "\n", "# Print the results\n", "for row in all_the_things:\n", " print(row)\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zWibLw6NIrrq", "outputId": "6be74891-73f3-43ff-a4d1-29b6697f8b11" }, "execution_count": 44, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "(rdflib.term.BNode('N4c3bba051ecb4cb7a8336502c67cf29b'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('file:///content/NominalCapacity'))\n", "(rdflib.term.BNode('N4c52ea3012a7451c8194bcd5f42b1679'), rdflib.term.URIRef('https://schema.org/manufacturer'), rdflib.term.URIRef('https://www.wikidata.org/wiki/Q3041255'))\n", "(rdflib.term.BNode('Nc3ad291a291c481481cd4df5c311af50'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://emmo.info/emmo#EMMO_18d180e4_5e3e_42f7_820c_e08951223486'))\n", "(rdflib.term.BNode('N4c3bba051ecb4cb7a8336502c67cf29b'), rdflib.term.URIRef('http://emmo.info/emmo#EMMO_bed1d005_b04e_4a90_94cf_02bc678a8569'), rdflib.term.URIRef('http://emmo.info/emmo#MilliAmpereHour'))\n", "(rdflib.term.BNode('N4c3bba051ecb4cb7a8336502c67cf29b'), rdflib.term.URIRef('http://emmo.info/emmo#EMMO_8ef3cd6d_ae58_4a8d_9fc0_ad8f49015cd0'), rdflib.term.BNode('Nc3ad291a291c481481cd4df5c311af50'))\n", "(rdflib.term.BNode('N4c52ea3012a7451c8194bcd5f42b1679'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://emmo.info/battery#battery_b61b96ac_f2f4_4b74_82d5_565fe3a2d88b'))\n", "(rdflib.term.BNode('Nc3ad291a291c481481cd4df5c311af50'), rdflib.term.URIRef('http://emmo.info/emmo#EMMO_faf79f53_749d_40b2_807c_d34244c192f4'), rdflib.term.Literal('230', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))\n", "(rdflib.term.BNode('N4c52ea3012a7451c8194bcd5f42b1679'), rdflib.term.URIRef('https://schema.org/name'), rdflib.term.Literal('My CR2032 Coin Cell'))\n", "(rdflib.term.BNode('N4c3bba051ecb4cb7a8336502c67cf29b'), rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), rdflib.term.URIRef('http://emmo.info/emmo#EMMO_d8aa8e1f_b650_416d_88a0_5118de945456'))\n", "(rdflib.term.BNode('N4c52ea3012a7451c8194bcd5f42b1679'), rdflib.term.URIRef('http://emmo.info/emmo#EMMO_e1097637_70d2_4895_973f_2396f04fa204'), rdflib.term.BNode('N4c3bba051ecb4cb7a8336502c67cf29b'))\n", "(rdflib.term.URIRef('https://www.wikidata.org/wiki/Q3041255'), rdflib.term.URIRef('https://schema.org/name'), rdflib.term.Literal('SINTEF'))\n" ] } ] }, { "cell_type": "markdown", "source": [ "You can see that our human-readable JSON-LD file has been transformed into some nasty looking (but machine-readable!) triples. Let's look at a couple in more detail to understand what's going on.

\n", "\n", "## Examine and explore the triples\n", "\n", "Let's start with this one:\n", "\n", "|   |   |\n", "|-----------|--------------------------------------|\n", "| subject | https://www.wikidata.org/wiki/Q3041255 |\n", "| predicate | https://schema.org/name |\n", "| object | ‘SINTEF |\n", "\n", "This tells the machine that something with a wikidata identifier has a property called 'name' from the schema.org vocabulary with a literal value '**SINTEF**'. These identifiers serve not only as persistent and unique identifiers for the concepts, but also point to a place where a machine can go to learn more about what it is. Try it yourself! Click on one and see where it takes you!

\n", "\n", "\n", "*Neat, right?!* Let's look at another one:\n", "\n", "|   |   |\n", "|-----------|--------------------------------------|\n", "| subject | 'Nb9d4bdc220954548a09b8b56f95d9cf3' |\n", "| predicate | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |\n", "| object | http://emmo.info/battery#battery_b61b96ac_f2f4_4b74_82d5_565fe3a2d88b |\n", "\n", "\n", "\n", "This tells the machine that a certain node in the graph is a a type of some thing that exists in the EMMO domain 'battery'. And this gets to one of the difficult bits for humans: many ontologies (like EMMO) use UUIDs for term names to ensure that they are universally unique. It works, but it sacrifices the human readability. Luckily we can get around this by assigning human-readable annotations to that term and/or mapping the IRI to a human readable label in a JSON-LD context like we did above.\n", "\n", "Go ahead, click the link and see if you can figure out what this thing is...\n", "\n", "...*it's a CR2016!* Now we can see how our simple description in the JSON-LD file has now been converted to a machine-readable IRI.

\n", "\n", "## Query the graph using SPARQL [Moderate]\n", "\n", "Now, let's write a SPARQL query to get back some specific thing...like what is the name of the manufacturer?" ], "metadata": { "id": "C-w1TbxkI4W5" } }, { "cell_type": "code", "source": [ "query = \"\"\"\n", "PREFIX schema: \n", "\n", "SELECT ?manufacturerName\n", "WHERE {\n", " ?thing schema:manufacturer ?manufacturer .\n", " ?manufacturer schema:name ?manufacturerName .\n", "}\n", "\"\"\"\n", "\n", "# Execute the SPARQL query\n", "results = g.query(query)\n", "\n", "# Print the results\n", "for row in results:\n", " print(row)\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6bXHGG4cI-kr", "outputId": "5c79fa6e-50a4-4fc2-c513-149bd8cd9170" }, "execution_count": 45, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "(rdflib.term.Literal('SINTEF'),)\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Fetch additional information from other sources [Advanced]\n", "Ontologies contain a lot of information about the meaning of things, but they don't always contain an exhaustive list of all the properties. Instead, they often point to other sources where that information exists rather than duplicating it. Let's see how you can use the ontology to fetch additional information from other sources.\n", "\n", "First, we parse the ontology into the knowledge graph and retrieve the IRIs for the terms that we are interested in. In this case, we want to retrieve more information about CR2032 from Wikidata, so we query the ontology to find CR2032's Wikidata ID." ], "metadata": { "id": "b7LJC8BubFce" } }, { "cell_type": "code", "source": [ "# Parse the ontology into the knowledge graph\n", "ontology = \"https://raw.githubusercontent.com/emmo-repo/domain-battery/master/inferred_version/battery-inferred.ttl\"\n", "g.parse(ontology, format='turtle')\n", "\n", "# Fetch the context\n", "context_url = 'https://raw.githubusercontent.com/emmo-repo/domain-battery/master/context.json'\n", "response = requests.get(context_url)\n", "context_data = response.json()\n", "\n", "# Look for the IRI of CR2032 in the context\n", "cr2032_iri = context_data.get('@context', {}).get('CR2032')\n", "wikidata_iri = context_data.get('@context', {}).get('wikidataReference')\n", "\n", "# Query the ontology to find the wikidata id for CR2032\n", "query = \"\"\"\n", "SELECT ?wikidataId\n", "WHERE {\n", " <%s> <%s> ?wikidataId .\n", "}\n", "\"\"\" % (cr2032_iri, wikidata_iri)\n", "\n", "qres = g.query(query)\n", "for row in qres:\n", " wikidata_id = row.wikidataId.split('/')[-1]\n", "\n", "print(f\"The Wikidata ID of CR2032: {wikidata_id}\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ntT1Rf_yM6sZ", "outputId": "7eb1b90f-c97e-4d1e-b311-ca9355501c2e" }, "execution_count": 46, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The Wikidata ID of CR2032: Q5013811\n" ] } ] }, { "cell_type": "markdown", "source": [ "Now that we have the Wikidata ID for CR2032, we can query their SPARQL endpoint to retrieve some property. Let's ask it for the thickness." ], "metadata": { "id": "XGXFrNa5dKSr" } }, { "cell_type": "code", "source": [ "# Query the Wikidata knowledge graph for more information about zinc\n", "wikidata_endpoint = \"https://query.wikidata.org/sparql\"\n", "\n", "# SPARQL query to get the thickness of a CR2032 cell\n", "query = \"\"\"\n", "SELECT ?value ?unit WHERE {\n", " wd:%s p:P2386 ?statement .\n", " ?statement ps:P2386 ?value .\n", " OPTIONAL {\n", " ?statement psv:P2386 ?valueNode .\n", " ?valueNode wikibase:quantityUnit ?unit .\n", " }\n", "}\n", "\n", "\"\"\" % wikidata_id\n", "\n", "# Execute the request\n", "response = requests.get(wikidata_endpoint, params={'query': query, 'format': 'json'})\n", "data = response.json()\n", "\n", "# Extract and print the thickness value\n", "thickness = data['results']['bindings'][0]['value']['value']\n", "unit = data['results']['bindings'][0]['unit']['value']\n", "print(f\"Wikidata says the thickness of a CR2032 cell is: {thickness} {unit}\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zTBOZAf-dWQQ", "outputId": "9f9d1c00-d74f-4c76-ceb5-b58b21853c41" }, "execution_count": 47, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Wikidata says the thickness of a CR2032 cell is: 20 http://www.wikidata.org/entity/Q174789\n" ] } ] }, { "cell_type": "markdown", "source": [ "We can also retrieve more complex data. For example, let's ask Wikidata to show us an image of a CR2032." ], "metadata": { "id": "-xdSIS6Idy5m" } }, { "cell_type": "code", "source": [ "# SPARQL query to get the image of the CR2032 cell (Q758)\n", "query = \"\"\"\n", "SELECT ?image WHERE {\n", " wd:%s wdt:P18 ?image .\n", "}\n", "\"\"\" % wikidata_id\n", "\n", "# Execute the request\n", "response = requests.get(wikidata_endpoint, params={'query': query, 'format': 'json'})\n", "data = response.json()\n", "\n", "# Extract and display the image URL\n", "if data['results']['bindings']:\n", " image_url = data['results']['bindings'][0]['image']['value']\n", " print(f\"Image of a CR2032- cell: {image_url}\")\n", " display(Image(url=image_url, width=300)) # Adjust width and height as needed\n", "\n", "else:\n", " print(\"No image found.\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 339 }, "id": "T7bkBY0sNqNY", "outputId": "c9c3bcf4-d278-4acd-a93b-5a7d553d66fd" }, "execution_count": 48, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Image of a CR2032- cell: http://commons.wikimedia.org/wiki/Special:FilePath/CR2032%20battery%2C%20KTS-2728.jpg\n" ] }, { "output_type": "display_data", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "Finally, let's retireve the id for CR2032 in the Google Knowledge Graph and see what it has to say!" ], "metadata": { "id": "mRcFo-MBDVBW" } }, { "cell_type": "code", "source": [ "# SPARQL query to get the Google Knowledge Graph ID of the CR2032 cell\n", "query = \"\"\"\n", "SELECT ?id WHERE {\n", " wd:%s wdt:P2671 ?id .\n", "}\n", "\"\"\" % wikidata_id\n", "\n", "# Execute the request\n", "response = requests.get(wikidata_endpoint, params={'query': query, 'format': 'json'})\n", "data = response.json()\n", "\n", "# Extract and display the Google Knowledge Graph ID\n", "if data['results']['bindings']:\n", " gkgid = data['results']['bindings'][0]['id']['value']\n", " gkgns = 'https://www.google.com/search?kgmid='\n", " gkg = gkgns + gkgid\n", " print(f\"The Google Knowledge Graph entry for a CR2032 cell: {gkg}\")\n", "\n", "else:\n", " print(\"None found.\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "nAAC5bo8FLD6", "outputId": "d3543deb-ce22-4d90-f054-6b705c94fb49" }, "execution_count": 49, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "The Google Knowledge Graph entry for a CR2032 cell: https://www.google.com/search?kgmid=/g/11bc5qf2g9\n" ] } ] }, { "cell_type": "code", "source": [], "metadata": { "id": "T1qUAeCDVNq3" }, "execution_count": 49, "outputs": [] } ] }