JSON-LD: A Simple Introduction Using a Person#

JSON-LD (JavaScript Object Notation for Linked Data) is a lightweight syntax to express linked data using JSON. It allows you to add semantic meaning to data by referencing concepts from ontologies or controlled vocabularies like schema.org.

In this notebook, we use the example of a person to demonstrate how to:

  • Enrich regular JSON with semantic context

  • Link data to external definitions using URIs

  • Enable data sharing in a machine-readable, interoperable format

You can think of JSON-LD as “JSON + semantics”.

From JSON to JSON-LD#

Here’s what JSON-LD adds to regular JSON:

  • @context: A mapping between your terms (e.g. "firstName") and standardized URIs that define their meaning (e.g. "schema:givenName").

  • @id: A globally unique identifier (IRI) for the entity being described.

  • @type: A type indicator, often from a vocabulary like schema:Person.

These additions allow machines—not just humans—to understand what your data is about.
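
As a rough illustration of what the @context mapping does, the sketch below resolves local keys to full IRIs. The key lastName, the helper expand_term, and the flat prefix-based context are assumptions made for this example only; real JSON-LD processors implement a much fuller expansion algorithm.

```python
# Plain JSON: the keys mean something only to whoever wrote them.
plain = {"firstName": "Simon", "lastName": "Clark"}

# The same record as JSON-LD: @context maps each local key to a schema.org IRI.
json_ld = {
    "@context": {
        "schema": "http://schema.org/",
        "firstName": "schema:givenName",
        "lastName": "schema:familyName",
    },
    "@id": "https://orcid.org/0000-0002-8758-6109",
    "@type": "schema:Person",
    **plain,
}


def expand_term(term, context):
    """Resolve a key or compact IRI to a full IRI (simplified, illustrative only)."""
    value = context.get(term, term)  # local key -> compact IRI, if mapped
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:  # compact IRI such as "schema:givenName"
        return context[prefix] + suffix
    return value  # already absolute, or not covered by this context


for key in plain:
    print(key, "->", expand_term(key, json_ld["@context"]))
```

Two systems that use different local keys but map them to the same IRI are then describing the same property, which is the point of adding a context.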

A Person in JSON-LD#

Below is an example of a person described using JSON-LD. This is not just a person named “Simon Clark” — it is a semantically described entity with a globally unique identifier (@id), and relationships (such as their employer) that are also fully structured as linked data.

The @context block maps the terms used in the JSON document to well-defined concepts from external vocabularies.

For example:

"@context": "https://schema.org/"

This indicates that the terms used (like givenName, birthDate, or affiliation) come from the schema.org vocabulary. This mapping enables software agents and data systems to interpret the data consistently, beyond just reading key-value pairs.

Using JSON-LD in this way makes the data both human-readable and machine-interpretable, opening the door to powerful integration, validation, and reasoning across systems.

[1]:
import jsonschema
from jsonschema import validate
import json
import rdflib

# JSON-LD representation of a person (plain JSON plus @context, @id and @type)
person_data = {
    "@context": "https://schema.org/",
    "@id": "https://orcid.org/0000-0002-8758-6109",
    "@type": "Person",
    "givenName": "Simon",
    "familyName": "Clark",
    "gender": {"@type": "Male"},
    "birthDate": "1987-04-23",
    "affiliation": {
        "@id": "https://ror.org/01f677e56",
        "name": "SINTEF",
        "@type": "ResearchOrganization"
    }
}

Validating JSON-LD Structure with a JSON Schema#

While JSON-LD enriches data with semantic meaning, it is still fundamentally JSON — which means we can use JSON Schema to validate its structure.

In the code below, we define a JSON Schema to validate the structure of a person object. This schema enforces that:

  • givenName and familyName are required strings,

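  • birthDate must match the YYYY-MM-DD pattern (checked by a regex; the "format": "date" keyword is only enforced when a format checker is passed to the validator),

  • affiliation must be an object, and gender must be an object when present.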

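The validate_json() function uses the jsonschema Python package to validate the person_data object against this schema. It returns a (bool, message) pair: True with a confirmation message if the data is valid, otherwise False with the message of the validation error. The calling code then prints that message.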

This is especially useful when:

  • Receiving data from users or external systems

  • Validating linked data before publishing or storage

  • Integrating structured data into APIs or semantic pipelines

[2]:
person_schema = {
    "type": "object",
    "properties": {
        "@context": {
            "type": ["string", "object"]  # object form if using inline mappings
        },
        "@type": {
            "type": "string",
        },
        "@id": {
            "type": "string",
            "format": "uri"
        },
        "givenName": {
            "type": "string"
        },
        "familyName": {
            "type": "string",
            "minLength": 1
        },
        "birthDate": {
            "type": "string",
            "format": "date",
            "pattern": "^[0-9]{4}-[0-1][0-9]-[0-3][0-9]$"
        },
        "gender": {
            "type": "object"
        },
        "affiliation": {
            "type": "object"
        }
    },
    "required": ["@context", "@type", "@id", "givenName", "familyName", "birthDate", "affiliation"]
}

# Function to validate JSON data against the schema
def validate_json(data, schema):
    try:
        validate(instance=data, schema=schema)
        return True, "JSON data is valid according to the schema."
    except jsonschema.exceptions.ValidationError as ve:
        return False, ve.message

# Validate the sample JSON data
is_valid, message = validate_json(person_data, person_schema)
print(message)
JSON data is valid according to the schema.
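
One caveat about the schema above: the jsonschema package treats "format" as an annotation unless a format checker is supplied, so "format": "date" is not enforced by a plain validate() call, and the regex only checks the shape of the string. The sketch below shows the difference using a reduced schema (birth_schema), a helper (is_valid), and an impossible date, all of which are assumptions introduced for illustration. The first call is expected to accept the record and the second to reject it.

```python
from jsonschema import FormatChecker, ValidationError, validate

# Reduced schema for illustration; same birthDate rules as person_schema.
birth_schema = {
    "type": "object",
    "properties": {
        "birthDate": {
            "type": "string",
            "format": "date",
            "pattern": "^[0-9]{4}-[0-1][0-9]-[0-3][0-9]$",
        }
    },
    "required": ["birthDate"],
}

# Matches the regex, but month 19 and day 39 do not exist.
bad_record = {"birthDate": "1987-19-39"}


def is_valid(record, **kwargs):
    """Return True if the record passes validation, False otherwise."""
    try:
        validate(instance=record, schema=birth_schema, **kwargs)
        return True
    except ValidationError:
        return False


print(is_valid(bad_record))                                  # pattern only
print(is_valid(bad_record, format_checker=FormatChecker()))  # pattern + date format
```

Passing format_checker=FormatChecker() to the validate() call inside validate_json() would apply the same stricter check to person_data.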

Querying JSON-LD Data with SPARQL and RDFLib#

In this section, we demonstrate how to use rdflib to work with JSON-LD data and execute SPARQL queries against it.

Step 1: Create an RDF Graph#

We start by creating an RDF graph using rdflib.Graph(), which serves as a container for all the triples (subject-predicate-object statements) derived from our data.

Step 2: Load Schema.org Vocabulary#

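We load the full Schema.org vocabulary into the graph from its latest official JSON-LD release (the http variant, whose terms live under the http://schema.org/ namespace used in the queries below). This step requires network access. It gives us the class hierarchy and definitions used in our person data, including terms like schema:Person and schema:Organization.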

Step 3: Load JSON-LD Person Data#

We convert the person_data dictionary into a JSON string and parse it into the RDF graph. This integrates our structured data with the schema definitions, allowing us to query both vocabulary and instance data together.

Step 4: Run a SPARQL Query#

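We execute a SPARQL query to retrieve schema:Organization and all of its subclasses (direct or indirect) using the rdfs:subClassOf* property path. The * operator means "zero or more steps", so the class itself is included, and LIMIT 20 caps the number of rows returned. This is useful when you want to identify all organization-related types defined in Schema.org.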

Output#

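The result is a list of up to 20 IRIs: schema:Organization itself plus types that are (transitively) its subclasses. Without an ORDER BY clause, which rows are returned and in what order is not guaranteed, but the list could include types like schema:EducationalOrganization, schema:Corporation, or schema:ResearchOrganization.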

This approach demonstrates how JSON-LD + Schema.org + SPARQL can provide a powerful way to:

  • Enrich data with formal semantics

  • Query both vocabulary and data in a unified RDF graph

  • Integrate data across schemas and domains

[3]:
# Step 1: Create an RDF Graph
g = rdflib.Graph()

# Step 2: Load Schema.org Vocabulary
g.parse("https://schema.org/version/latest/schemaorg-current-http.jsonld", format="json-ld")

# Step 3: Load JSON-LD Person Data
person_data_str = json.dumps(person_data)
g.parse(data=person_data_str, format="json-ld")

# Step 4: Run a SPARQL Query
sparql_query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?type WHERE {
  ?type rdfs:subClassOf* schema:Organization .
}
LIMIT 20
"""

# Execute the SPARQL query
results = g.query(sparql_query)

# Print the results
for row in results:
    print(row)
(rdflib.term.URIRef('http://schema.org/Organization'),)
(rdflib.term.URIRef('http://schema.org/GovernmentOrganization'),)
(rdflib.term.URIRef('http://schema.org/Consortium'),)
(rdflib.term.URIRef('http://schema.org/PerformingGroup'),)
(rdflib.term.URIRef('http://schema.org/TheaterGroup'),)
(rdflib.term.URIRef('http://schema.org/MusicGroup'),)
(rdflib.term.URIRef('http://schema.org/DanceGroup'),)
(rdflib.term.URIRef('http://schema.org/OnlineBusiness'),)
(rdflib.term.URIRef('http://schema.org/OnlineStore'),)
(rdflib.term.URIRef('http://schema.org/LibrarySystem'),)
(rdflib.term.URIRef('http://schema.org/SearchRescueOrganization'),)
(rdflib.term.URIRef('http://schema.org/PoliticalParty'),)
(rdflib.term.URIRef('http://schema.org/Corporation'),)
(rdflib.term.URIRef('http://schema.org/Project'),)
(rdflib.term.URIRef('http://schema.org/FundingAgency'),)
(rdflib.term.URIRef('http://schema.org/ResearchProject'),)
(rdflib.term.URIRef('http://schema.org/NewsMediaOrganization'),)
(rdflib.term.URIRef('http://schema.org/MedicalOrganization'),)
(rdflib.term.URIRef('http://schema.org/Dentist'),)
(rdflib.term.URIRef('http://schema.org/MedicalClinic'),)

Querying Instances of schema:Organization#

In this example, we go one step further by querying for actual instances of schema:Organization (or any of its subclasses) present in the RDF graph.

What This SPARQL Query Does#

This SPARQL query performs two key operations:

  1. It uses:

    ?subclass rdfs:subClassOf* schema:Organization .
    

    to find schema:Organization and all of its subclasses. The * means zero or more rdfs:subClassOf steps, so it matches the class itself as well as its direct and indirect subclasses.

  2. It then finds:

    ?instance rdf:type ?subclass .
    

    all instances in the graph whose rdf:type is one of these subclasses — meaning they are some kind of organization.

Why This Matters#

This allows us to extract not just definitions (as in the previous example), but real data entries that correspond to organizations — such as companies, research institutes, or educational organizations — described in your JSON-LD.

Since our person_data includes an affiliation field that references a schema:ResearchOrganization, this query will match that and return it.

Output#

The output is a list of IRIs identifying each organization instance in the graph. This provides a powerful way to:

  • Discover all known organizations in your data

  • Use these IRIs for follow-up queries (e.g., get their name, address, or related persons)

  • Analyze structured relationships between people and institutions

This pattern is central to working with linked data: describing entities with types, and then querying them using semantic relationships.

[4]:
# Define and execute a SPARQL query for all instances of Organization
sparql_query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT ?instance WHERE {
    ?subclass rdfs:subClassOf* schema:Organization .
    ?instance rdf:type ?subclass .
}
LIMIT 10
"""

# Execute the SPARQL query
results = g.query(sparql_query)

# Print the results
for row in results:
    print(row[0])
https://ror.org/01f677e56

Querying Birth Dates of Persons#

In this example, we execute a SPARQL query to retrieve the birth dates of individuals in the graph who are typed as schema:Person.

What the Query Does#

This SPARQL query looks for:

  1. Individuals explicitly typed as schema:Person:

    ?subject rdf:type schema:Person .
    
  2. The associated birth date of each person using the schema:birthDate property:

    ?subject schema:birthDate ?bday .
    
  3. It selects and returns only the ?bday values, which represent literal dates.

  4. The query includes:

    LIMIT 10
    

    to restrict the results to the first 10 entries (useful for inspection or previewing large datasets).

Why This Matters#

This kind of query is useful when you want to extract attribute values from structured data. In this case, we’re retrieving dates of birth for people in the graph. These values can then be used for analytics, filtering, or even plotting demographics.

Assumptions#

  • It assumes that schema:birthDate is used directly with a literal (e.g., "1987-04-23").

  • If the birth date is represented as a nested object or typed node, additional handling would be required in the query.

Result#

The cell below runs the query and prints the birth dates (as literals) of up to 10 individuals defined in your RDF graph.

[5]:
# Define and execute a SPARQL query for the birth dates of all persons
sparql_query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

SELECT ?bday WHERE {
    ?subject rdf:type schema:Person .
    ?subject schema:birthDate ?bday .
}
LIMIT 10
"""

# Execute the SPARQL query
results = g.query(sparql_query)

# Print the results
for row in results:
    print(row[0])
1987-04-23
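
As a small example of the analytics mentioned above, the sketch below turns birth date strings into ages. The helper age_on, the fixed reference day, and the hard-coded birth_dates list (standing in for the query results) are assumptions made for illustration.

```python
from datetime import date


def age_on(birth_date, today):
    """Whole years elapsed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


# Stand-in for [str(row[0]) for row in results] from the query above.
birth_dates = ["1987-04-23"]
reference_day = date(2025, 1, 1)  # fixed so that the result is reproducible

for text in birth_dates:
    born = date.fromisoformat(text)  # expects the YYYY-MM-DD form used in person_data
    print(text, "->", age_on(born, reference_day))
```

Converting the literal with str() first keeps the sketch independent of how the RDF library represents the literal's datatype.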

Summary#

In this notebook, we explored how JSON-LD can transform regular JSON into semantically enriched, machine-readable data using well-defined vocabularies like schema.org.

Key Concepts Covered#

  • JSON-LD Basics: We structured a Person object with fields like @context, @type, and @id, connecting each field to a formal semantic definition.

  • JSON Schema Validation: We used jsonschema to check that our JSON-LD documents have the expected structure (required keys and value types) before graph conversion.

  • RDF Graph Construction: Using rdflib, we parsed the JSON-LD data and the schema.org vocabulary into a single RDF graph that can be queried with SPARQL, including traversal of the class hierarchy via property paths.

  • SPARQL Queries: We demonstrated several SPARQL queries to:

    • Retrieve all types derived from schema:Organization

    • Find all instances of those types

    • List birth dates of individuals

By combining JSON-LD, RDFLib, and SPARQL:

  • You can enrich your data with standardized semantics

  • Enable interoperability across systems and domains

  • Perform structured, meaningful queries over data

  • Integrate your metadata with larger knowledge graphs (e.g., Wikidata, Google Knowledge Graph)

This notebook serves as a practical introduction to semantic data modeling and querying — a foundational component of linked data applications and the Semantic Web.