
excelparser

Module for parsing an Excel file and creating an ontology from it.

The Excel file is read with pandas, and the resulting DataFrame must have the following column names: prefLabel, altLabel, Elucidation, Comments, Examples, subClassOf, Relations.

Note that correct case is mandatory.
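Since the column names are matched case-sensitively, it can help to check a sheet's header before building the ontology. The sketch below uses a hypothetical helper, `missing_columns`, which is not part of ontopy; it compares a list of header names against the names listed above. With pandas you would pass `list(df.columns)`.

```python
# Hypothetical pre-check, not part of ontopy.
REQUIRED_COLUMNS = (
    "prefLabel", "altLabel", "Elucidation", "Comments",
    "Examples", "subClassOf", "Relations",
)


def missing_columns(columns):
    """Return the required column names absent from `columns`.

    The comparison is case-sensitive, so "preflabel" does not
    satisfy the requirement for "prefLabel".
    """
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Invented header with a lower-case typo in the first column name.
header = ["preflabel", "altLabel", "Elucidation", "Comments",
          "Examples", "subClassOf", "Relations"]
print(missing_columns(header))  # ['prefLabel']
```

An empty list means every expected column is present with the exact spelling.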

ExcelError

Bases: EMMOntoPyException

Raised on errors in Excel file.

Source code in ontopy/excelparser.py
class ExcelError(EMMOntoPyException):
    """Raised on errors in Excel file."""

create_ontology_from_excel(excelpath, concept_sheet_name='Concepts', metadata_sheet_name='Metadata', imports_sheet_name='ImportedOntologies', base_iri='http://emmo.info/emmo/domain/onto#', base_iri_from_metadata=True, imports=None, catalog=None, force=False)

Creates an ontology from an Excel-file.

Parameters:

Name Type Description Default
excelpath str

Path to Excel workbook.

required
concept_sheet_name str

Name of sheet where concepts are defined. The second row of this sheet should contain column names that are supported. Currently these are 'prefLabel', 'altLabel', 'Elucidation', 'Comments', 'Examples', 'subClassOf', 'Relations'. Multiple entries are separated with ';'.

'Concepts'
metadata_sheet_name str

Name of sheet where metadata are defined. The first row contains column names 'Metadata name' and 'Value'. Supported 'Metadata names' are: 'Ontology IRI', 'Ontology version IRI', 'Ontology version Info', 'Title', 'Abstract', 'License', 'Comment', 'Author', 'Contributor'. Multiple entries are separated with a semi-colon (;).

'Metadata'
imports_sheet_name str

Name of sheet where imported ontologies are defined. Column name is 'Imported ontologies'. Fully resolvable URL or path to imported ontologies provided one per row.

'ImportedOntologies'
base_iri str

Base IRI of the new ontology.

'http://emmo.info/emmo/domain/onto#'
base_iri_from_metadata bool

Whether to use base IRI defined from metadata.

True
imports list

List of imported ontologies.

None
catalog dict

Imported ontologies with (name, full path) key/value-pairs.

None
force bool

Force creation of the ontology by skipping concepts that are erroneously defined and ignoring other errors in the Excel sheet.

False

Returns:

Type Description
Tuple[ontopy.ontology.Ontology, dict, dict]

A tuple with the:

  • created ontology
  • associated catalog of ontology names and resolvable path as dict
  • a dictionary with lists of concepts that raise errors, with the following keys:

    • "already_defined": These are concepts that are already in the ontology, either because they were already added in a previous line of the excelfile/pandas dataframe, or because they are already defined in the imported ontologies.
    • "in_imported_ontologies": Concepts that are defined in the excel, but already exist in the imported ontologies. This is a subset of the 'already_defined'.
    • "wrongly_defined": Concepts that are given an invalid prefLabel (e.g. with a space in the name).
    • "missing_parents": Concepts that are missing parents. These concepts are added directly under owl:Thing.
    • "invalid_parents": Concepts with invalidly defined parents. These concepts are added directly under owl:Thing.
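    • "nonadded_concepts": List of all concepts that are not added, either because the prefLabel is invalid, or because the concept has already been added once or already exists in an imported ontology.
    • "errors_in_properties": Concepts for which an entry in the 'Relations' column could not be parsed or evaluated.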
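As a rough illustration of how these keys relate, the sketch below mimics the final post-processing step with plain Python data. Each list is converted to a set, and "in_imported_ontologies" is the intersection of "already_defined" with the labels found in the imported ontologies. The concept names are invented for the example.

```python
# Invented example data; only the set logic mirrors the description above.
concepts_with_errors = {
    "already_defined": ["Atom", "Pattern", "Atom"],
    "wrongly_defined": ["My Concept"],
    "nonadded_concepts": ["Pattern"],
}
imported_concepts = {"Atom", "Molecule"}  # labels from imported ontologies

# Duplicates collapse when the lists are converted to sets.
report = {key: set(value) for key, value in concepts_with_errors.items()}

# Concepts that were rejected because an imported ontology already has them.
report["in_imported_ontologies"] = (
    report["already_defined"] & imported_concepts
)

print(sorted(report["in_imported_ontologies"]))  # ['Atom']
```

Because the values end up as sets, callers that need a stable order should sort them before printing or comparing.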
Source code in ontopy/excelparser.py
def create_ontology_from_excel(  # pylint: disable=too-many-arguments
    excelpath: str,
    concept_sheet_name: str = "Concepts",
    metadata_sheet_name: str = "Metadata",
    imports_sheet_name: str = "ImportedOntologies",
    base_iri: str = "http://emmo.info/emmo/domain/onto#",
    base_iri_from_metadata: bool = True,
    imports: list = None,
    catalog: dict = None,
    force: bool = False,
) -> Tuple[ontopy.ontology.Ontology, dict, dict]:
    """
    Creates an ontology from an Excel-file.

    Arguments:
        excelpath: Path to Excel workbook.
        concept_sheet_name: Name of sheet where concepts are defined.
            The second row of this sheet should contain column names that are
            supported. Currently these are 'prefLabel', 'altLabel',
            'Elucidation', 'Comments', 'Examples', 'subClassOf', 'Relations'.
            Multiple entries are separated with ';'.
        metadata_sheet_name: Name of sheet where metadata are defined.
            The first row contains column names 'Metadata name' and 'Value'.
            Supported 'Metadata names' are: 'Ontology IRI',
            'Ontology version IRI', 'Ontology version Info', 'Title',
            'Abstract', 'License', 'Comment', 'Author', 'Contributor'.
            Multiple entries are separated with a semi-colon (`;`).
        imports_sheet_name: Name of sheet where imported ontologies are
            defined.
            Column name is 'Imported ontologies'.
            Fully resolvable URL or path to imported ontologies provided one
            per row.
        base_iri: Base IRI of the new ontology.
        base_iri_from_metadata: Whether to use base IRI defined from metadata.
        imports: List of imported ontologies.
        catalog: Imported ontologies with (name, full path) key/value-pairs.
        force: Force creation of the ontology by skipping concepts that are
            erroneously defined and ignoring other errors in the Excel sheet.

    Returns:
        A tuple with the:

            * created ontology
            * associated catalog of ontology names and resolvable path as dict
            * a dictionary with lists of concepts that raise errors, with the
              following keys:

                - "already_defined": These are concepts that are already in the
                    ontology, either because they were already added in a
                    previous line of the excelfile/pandas dataframe, or because
                    they are already defined in the imported ontologies.
                - "in_imported_ontologies": Concepts that are defined in the
                    excel, but already exist in the imported ontologies.
                    This is a subset of the 'already_defined'.
                - "wrongly_defined": Concepts that are given an invalid
                    prefLabel (e.g. with a space in the name).
                - "missing_parents": Concepts that are missing parents.
                    These concepts are added directly under owl:Thing.
                - "invalid_parents": Concepts with invalidly defined parents.
                    These concepts are added directly under owl:Thing.
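                - "nonadded_concepts": List of all concepts that are not added,
                    either because the prefLabel is invalid, or because the
                    concept has already been added once or already exists in an
                    imported ontology.
                - "errors_in_properties": Concepts for which an entry in the
                    'Relations' column could not be parsed or evaluated.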

    """
    # Get imported ontologies from the optional imports sheet
    # (named by `imports_sheet_name`, "ImportedOntologies" by default)
    if not imports:
        imports = []
    try:
        imports_frame = pd.read_excel(
            excelpath, sheet_name=imports_sheet_name, skiprows=[1]
        )
    except ValueError:
        pass
    else:
        # Strip leading and trailing white spaces in path
        imports.extend(
            imports_frame["Imported ontologies"].str.strip().to_list()
        )

    # Read datafile TODO: Some magic to identify the header row
    conceptdata = pd.read_excel(
        excelpath, sheet_name=concept_sheet_name, skiprows=[0, 2]
    )
    metadata = pd.read_excel(excelpath, sheet_name=metadata_sheet_name)
    return create_ontology_from_pandas(
        data=conceptdata,
        metadata=metadata,
        imports=imports,
        base_iri=base_iri,
        base_iri_from_metadata=base_iri_from_metadata,
        catalog=catalog,
        force=force,
    )
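The `skiprows=[0, 2]` argument used for the concept sheet means that the first and third rows of the sheet are dropped, so the second row becomes the header and data start on the fourth row. The sketch below imitates that row selection with plain lists; `apply_skiprows` is a hypothetical helper and the sheet contents are invented. It assumes the default pandas behaviour in which the first row that is not skipped is used as the header.

```python
def apply_skiprows(rows, skiprows):
    """Drop the 0-based row indices in `skiprows`.

    Returns the first surviving row as the header and the rest as data.
    """
    skipped = set(skiprows)
    kept = [row for i, row in enumerate(rows) if i not in skipped]
    return kept[0], kept[1:]


# Invented layout of a "Concepts" sheet.
sheet = [
    ["Instructions for filling in the sheet"],  # row 0: skipped
    ["prefLabel", "subClassOf"],                # row 1: header
    ["(required)", "(required)"],               # row 2: skipped
    ["Pattern", "Thing"],                       # row 3: first data row
]
header, data = apply_skiprows(sheet, skiprows=[0, 2])
print(header)  # ['prefLabel', 'subClassOf']
print(data)    # [['Pattern', 'Thing']]
```

By the same logic, `skiprows=[1]` on the imports sheet keeps the first row as the header and drops the row directly below it.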

create_ontology_from_pandas(data, metadata, imports, base_iri='http://emmo.info/emmo/domain/onto#', base_iri_from_metadata=True, catalog=None, force=False)

Create an ontology from a pandas DataFrame.

Check 'create_ontology_from_excel' for complete documentation.

Source code in ontopy/excelparser.py
def create_ontology_from_pandas(  # pylint:disable=too-many-locals,too-many-branches,too-many-statements,too-many-arguments
    data: pd.DataFrame,
    metadata: pd.DataFrame,
    imports: list,
    base_iri: str = "http://emmo.info/emmo/domain/onto#",
    base_iri_from_metadata: bool = True,
    catalog: dict = None,
    force: bool = False,
) -> Tuple[ontopy.ontology.Ontology, dict, dict]:
    """
    Create an ontology from a pandas DataFrame.

    Check 'create_ontology_from_excel' for complete documentation.
    """

    # Remove lines with empty prefLabel
    data = data[data["prefLabel"].notna()]
    # Convert all data to string, remove spaces, and finally remove
    # additional rows with empty prefLabel.
    data = data.astype(str)
    data["prefLabel"] = data["prefLabel"].str.strip()
    data = data[data["prefLabel"].str.len() > 0]
    data.reset_index(drop=True, inplace=True)

    # Make new ontology
    onto, catalog = get_metadata_from_dataframe(
        metadata, base_iri, imports=imports
    )
    # Get a set of imported concepts
    imported_concepts = {
        concept.prefLabel.first() for concept in onto.get_entities()
    }

    # Set given or default base_iri if base_iri_from_metadata is False.
    if not base_iri_from_metadata:
        onto.base_iri = base_iri

    labels = set(data["prefLabel"])
    for altlabel in data["altLabel"].str.strip():
        if altlabel != "nan":
            labels.update(altlabel.split(";"))

    # Dictionary with lists of concepts that raise errors
    concepts_with_errors = {
        "already_defined": [],
        "in_imported_ontologies": [],
        "wrongly_defined": [],
        "missing_parents": [],
        "invalid_parents": [],
        "nonadded_concepts": [],
        "errors_in_properties": [],
    }

    onto.sync_python_names()
    with onto:
        remaining_rows = set(range(len(data)))
        all_added_rows = []
        while remaining_rows:
            added_rows = set()
            for index in remaining_rows:
                row = data.loc[index]
                name = row["prefLabel"]
                try:
                    onto.get_by_label(name)
                    if not force:
                        raise ExcelError(
                            f'Concept "{name}" already in ontology'
                        )
                    warnings.warn(
                        f'Ignoring concept "{name}" since it is already in '
                        "the ontology."
                    )
                    concepts_with_errors["already_defined"].append(name)
                    # What to do if we want to add info to this concept?
                    # Should that be not allowed?
                    # If it should be allowed the index has to be added to
                    # added_rows
                    continue
                except (ValueError, TypeError) as err:
                    warnings.warn(
                        f'Ignoring concept "{name}". '
                        f'The following error was raised: "{err}"'
                    )
                    concepts_with_errors["wrongly_defined"].append(name)
                    continue
                except NoSuchLabelError:
                    pass

                if row["subClassOf"] == "nan":
                    if not force:
                        raise ExcelError(f"{name} has no subClassOf")
                    parent_names = []  # Should be "owl:Thing"
                    concepts_with_errors["missing_parents"].append(name)
                else:
                    parent_names = str(row["subClassOf"]).split(";")

                parents = []
                invalid_parent = False
                for parent_name in parent_names:
                    try:
                        parent = onto.get_by_label(parent_name.strip())
                    except (NoSuchLabelError, ValueError) as exc:
                        if parent_name not in labels:
                            if force:
                                warnings.warn(
                                    f'Invalid parents for "{name}": '
                                    f'"{parent_name}".'
                                )
                                concepts_with_errors["invalid_parents"].append(
                                    name
                                )
                                break
                            raise ExcelError(
                                f'Invalid parents for "{name}": {exc}\n'
                                "Have you forgotten an imported ontology?"
                            ) from exc
                        invalid_parent = True
                        break
                    else:
                        parents.append(parent)

                if invalid_parent:
                    continue

                if not parents:
                    parents = [owlready2.Thing]

                concept = onto.new_entity(name, parents)
                added_rows.add(index)
                # Add elucidation
                try:
                    _add_literal(
                        row,
                        concept.elucidation,
                        "Elucidation",
                        only_one=True,
                    )
                except AttributeError as err:
                    if force:
                        _add_literal(
                            row,
                            concept.comment,
                            "Elucidation",
                            only_one=True,
                        )
                        warnings.warn("Elucidation added as comment.")
                    else:
                        raise ExcelError(
                            f"Not able to add elucidations. {err}."
                        ) from err

                # Add examples
                try:
                    _add_literal(
                        row, concept.example, "Examples", expected=False
                    )
                except AttributeError:
                    if force:
                        warnings.warn(
                            "Not able to add examples. "
                            "Did you forget to import an ontology?"
                        )

                # Add comments
                _add_literal(row, concept.comment, "Comments", expected=False)

                # Add altLabels
                try:
                    _add_literal(
                        row, concept.altLabel, "altLabel", expected=False
                    )
                except AttributeError as err:
                    if force is True:
                        _add_literal(
                            row,
                            concept.label,
                            "altLabel",
                            expected=False,
                        )
                        warnings.warn("altLabel added as rdfs.label.")
                    else:
                        raise ExcelError(
                            f"Not able to add altLabels. {err}."
                        ) from err

            remaining_rows.difference_update(added_rows)

            # Detect infinite loop...
            if not added_rows and remaining_rows:
                unadded = [data.loc[i].prefLabel for i in remaining_rows]
                if force is True:
                    warnings.warn(
                        f"Not able to add the following concepts: {unadded}."
                        " Will continue without these."
                    )
                    remaining_rows = set()  # Empty set ends the while loop
                    concepts_with_errors["nonadded_concepts"] = unadded
                else:
                    raise ExcelError(
                        f"Not able to add the following concepts: {unadded}."
                    )
            all_added_rows.extend(added_rows)

    # Add properties in a second loop

    for index in all_added_rows:
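        row = data.loc[index]
        # Bind `name` to the current row so that errors recorded below are
        # not attributed to the last concept handled in the first loop.
        name = row["prefLabel"]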
        properties = row["Relations"]
        if properties == "nan":
            properties = None
        if isinstance(properties, str):
            try:
                concept = onto.get_by_label(row["prefLabel"].strip())
            except NoSuchLabelError:
                # No concept to attach relations to; skip this row
                continue
            props = properties.split(";")
            for prop in props:
                try:
                    concept.is_a.append(evaluate(onto, prop.strip()))
                except pyparsing.ParseException as exc:
                    warnings.warn(
                        f"Error in Property assignment for: '{concept}'. "
                        f"Property to be Evaluated: '{prop}'. "
                        f"{exc}"
                    )
                    concepts_with_errors["errors_in_properties"].append(name)
                except NoSuchLabelError as exc:
                    msg = (
                        f"Error in Property assignment for: {concept}. "
                        f"Property to be Evaluated: {prop}. "
                        f"{exc}"
                    )
                    if force is True:
                        warnings.warn(msg)
                        concepts_with_errors["errors_in_properties"].append(
                            name
                        )
                    else:
                        raise ExcelError(msg) from exc

    # Synchronise Python attributes to ontology
    onto.sync_attributes(
        name_policy="uuid", name_prefix="EMMO_", class_docstring="elucidation"
    )
    onto.dir_label = False
    concepts_with_errors = {
        key: set(value) for key, value in concepts_with_errors.items()
    }
    concepts_with_errors["in_imported_ontologies"] = concepts_with_errors[
        "already_defined"
    ].intersection(imported_concepts)
    return onto, catalog, concepts_with_errors
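The `while remaining_rows` loop above adds concepts in several passes, so a row may list a parent that is defined further down in the sheet. The following is a simplified, self-contained sketch of that idea using plain dictionaries; `resolve_in_passes` is a hypothetical helper and not part of ontopy. Unlike the real function, it does not distinguish invalid parents from parents that are merely not added yet, and it creates no ontology entities.

```python
def resolve_in_passes(parents_by_name, known):
    """Add concepts whose parents are all known; repeat until no progress.

    Returns the names in the order they could be added and the names
    that could never be added.
    """
    added = []
    known = set(known)
    remaining = set(parents_by_name)
    while remaining:
        ready = {
            name for name in remaining
            if set(parents_by_name[name]) <= known
        }
        if not ready:
            # No progress in this pass: stop instead of looping forever.
            break
        added.extend(sorted(ready))
        known |= ready
        remaining -= ready
    return added, sorted(remaining)


# Invented rows: "Child" appears before its parent, "Orphan" has an
# unknown parent.
rows = {
    "Child": ["Parent"],
    "Parent": ["Thing"],
    "Orphan": ["Missing"],
}
added, unadded = resolve_in_passes(rows, known={"Thing"})
print(added)    # ['Parent', 'Child']
print(unadded)  # ['Orphan']
```

The leftover names play the role of the "nonadded_concepts" entry: with `force=True` the real function warns and continues without them, otherwise it raises `ExcelError`.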

english(string)

Returns string as an English localized string.

Source code in ontopy/excelparser.py
def english(string):
    """Returns `string` as an English localized string."""
    return owlready2.locstr(string, lang="en")

get_metadata_from_dataframe(metadata, base_iri, base_iri_from_metadata=True, imports=(), catalog=None)

Create an ontology with metadata from a pandas DataFrame.

Source code in ontopy/excelparser.py
def get_metadata_from_dataframe(  # pylint: disable=too-many-locals,too-many-branches,too-many-statements
    metadata: pd.DataFrame,
    base_iri: str,
    base_iri_from_metadata: bool = True,
    imports: Sequence = (),
    catalog: dict = None,
) -> Tuple[ontopy.ontology.Ontology, dict]:
    """Create an ontology with metadata from a pandas DataFrame."""

    # base_iri from metadata if it exists and base_iri_from_metadata
    if base_iri_from_metadata:
        try:
            base_iris = _parse_literal(metadata, "Ontology IRI", metadata=True)
            if len(base_iris) > 1:
                warnings.warn(
                    "More than one Ontology IRI given. The first was chosen."
                )
            base_iri = base_iris[0] + "#"
        except (TypeError, ValueError, AttributeError, IndexError):
            pass

    # Create new ontology
    onto = get_ontology(base_iri)

    # Add imported ontologies
    catalog = {} if catalog is None else catalog
    locations = set()
    for location in imports:
        if not pd.isna(location) and location not in locations:
            imported = onto.world.get_ontology(location).load()
            onto.imported_ontologies.append(imported)
            catalog[imported.base_iri.rstrip("#/")] = location
            locations.add(location)

    with onto:
        # Add title
        try:
            _add_literal(
                metadata,
                onto.metadata.title,
                "Title",
                metadata=True,
                only_one=True,
            )
        except AttributeError:
            pass

        # Add license
        try:
            _add_literal(
                metadata, onto.metadata.license, "License", metadata=True
            )
        except AttributeError:
            pass

        # Add authors/creators
        try:
            _add_literal(
                metadata, onto.metadata.creator, "Author", metadata=True
            )
        except AttributeError:
            pass

        # Add contributors
        try:
            _add_literal(
                metadata,
                onto.metadata.contributor,
                "Contributor",
                metadata=True,
            )
        except AttributeError:
            pass

        # Add versionInfo
        try:
            _add_literal(
                metadata,
                onto.metadata.versionInfo,
                "Ontology version Info",
                metadata=True,
                only_one=True,
            )
        except AttributeError:
            pass

    return onto, catalog
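Two small string manipulations in the function above are easy to overlook: the base IRI taken from the metadata gets a trailing `#` appended, and catalog keys are the imported base IRIs with trailing `#` and `/` characters removed. The sketch below reproduces only those two steps with hypothetical helper names and invented IRIs; it does not read any metadata sheet.

```python
DEFAULT_BASE_IRI = "http://emmo.info/emmo/domain/onto#"


def base_iri_from_values(values, default=DEFAULT_BASE_IRI):
    """Use the first 'Ontology IRI' value plus '#', else fall back."""
    try:
        return values[0] + "#"
    except (TypeError, IndexError):
        # No usable value: keep the given or default base IRI.
        return default


def catalog_key(base_iri):
    """Strip trailing '#' and '/' characters from a base IRI."""
    return base_iri.rstrip("#/")


print(base_iri_from_values(["http://example.com/onto"]))
# http://example.com/onto#
print(base_iri_from_values([]))
# http://emmo.info/emmo/domain/onto#
print(catalog_key("http://example.com/imported#"))
# http://example.com/imported
```

One consequence of the first step is that an 'Ontology IRI' value in the metadata sheet should be given without a trailing `#`, since one is appended automatically.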