Josef Hardi edited this page Jul 18, 2018 · 7 revisions

CAML is a recursive acronym for "CAML is Another Mapping Language". It is a native mapping language used by the schemaorg-pipeline library to write a data map between schema.org terms and the source data.

A data map in CAML is composed of one or more mapping definitions. A mapping definition is written as a key-value pair separated by a colon and at least one space, where the key is a schema.org term (or one of the reserved keywords listed below) and the value is a data path, a data object, or a constant value.
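The definition syntax can be illustrated with a small Python sketch. It is not part of schemaorg-pipeline: parse_definition and its kind labels are hypothetical names chosen for illustration. The sketch looks at one line at a time, so it cannot tell whether a data path opens a data object (that depends on the indented lines below it). It also recognizes the pair and =concat forms described under Extended Expressions.

```python
import re

# Key: a run of non-space, non-colon characters (e.g. "name", "@type").
# Separator: a colon followed by at least one space. The value is the rest of the line.
DEFINITION = re.compile(r"^\s*([^\s:]+):\s+(.+?)\s*$")

def parse_definition(line):
    """Split one CAML mapping definition into (key, value, kind) (hypothetical helper)."""
    match = DEFINITION.match(line)
    if match is None:
        raise ValueError(f"not a mapping definition: {line!r}")
    key, value = match.groups()
    if value.startswith("/"):
        kind = "path"          # data path, possibly the root path of a data object
    elif value.startswith("'"):
        kind = "constant"      # single-quoted constant value
    elif value.startswith("("):
        kind = "pair"          # ('prefix', 'namespace') pair
    elif value.startswith("="):
        kind = "function"      # e.g. =concat(...)
    else:
        kind = "unknown"
    return key, value, kind

print(parse_definition("name:         /Dataset/Title"))
print(parse_definition("@type:       'Dataset'"))
```

Because the separator requires at least one space, a line such as "name:/Dataset/Title" is rejected with a ValueError in this sketch.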

Basic Expressions

Data Path

A data path identifies where a value is located in the source data. The notation always starts with a slash (/) followed by a node name, and the slash also serves as the delimiter between successive node names in the path. For example:

name:         /Dataset/Title
description:  /Dataset/Description
keywords:     /Dataset/Keywords/Keyword

(Note again that the left-hand side is a schema.org term and the right-hand side is the data path.)
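For an XML source, one way to picture data path resolution is the Python sketch below. It assumes that the first path segment names the XML root element and that each following segment names a child element; resolve_path is a hypothetical helper, not the library's resolver, and it uses only the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

def resolve_path(root, path):
    """Return the text of every node reached by an absolute data path (hypothetical).

    Assumption: "/Dataset/Title" means root element <Dataset>, then child <Title>.
    """
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return []                      # path does not start at this document's root
    if len(steps) == 1:
        nodes = [root]
    else:
        # ElementTree's limited XPath: "Keywords/Keyword" walks child elements.
        nodes = root.findall("/".join(steps[1:]))
    return [(node.text or "").strip() for node in nodes]

doc = ET.fromstring(
    "<Dataset><Title>Example</Title>"
    "<Keywords><Keyword>a</Keyword><Keyword>b</Keyword></Keywords></Dataset>"
)
print(resolve_path(doc, "/Dataset/Title"))             # ['Example']
print(resolve_path(doc, "/Dataset/Keywords/Keyword"))  # ['a', 'b']
```

A path can match several nodes, as /Dataset/Keywords/Keyword does here; the sketch simply returns all of them.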

Data Object

A data object is a group of mapping definitions at the same indentation level, nested under a parent mapping definition. For schema.org compatibility, every data object must have a type definition, given by the @type keyword followed by the corresponding schema.org type name.

distribution:   /Dataset/Distributions/Distribution
   @type:       'DataDownload'
   contentUrl:  /AccessUrl
   fileFormat:  /Format
   publisher:   /Source

(Lines 2-5 form the data object with the type DataDownload.)

When a data object is nested inside another data object, its parent mapping definition must first specify a root data path, and the data object is then attached under it. Consequently, all data paths inside the nested data object are resolved relative to that root path. For example, in the example above, the DataDownload object has the root path /Dataset/Distributions/Distribution, and the succeeding data paths /AccessUrl, /Format, and /Source are resolved against it to get the content URL, file format, and publisher.
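The root-path rule can be sketched in Python for an XML source. This is a hypothetical evaluator (build_objects is an illustrative name, not the library's API): it assumes that every node reached by the root path yields one object, that relative paths are looked up under that node only, and, for brevity, that each relative path takes its first match and constants are used without escape handling.

```python
import xml.etree.ElementTree as ET

def build_objects(root, root_path, definitions):
    """Build one dict per node reached by root_path (hypothetical evaluator).

    definitions: (key, value) pairs from the nested block. Values starting with
    "/" are paths relative to each root node; anything else is a quoted constant.
    """
    steps = root_path.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    parents = root.findall("/".join(steps[1:])) if len(steps) > 1 else [root]
    objects = []
    for parent in parents:
        obj = {}
        for key, value in definitions:
            if value.startswith("/"):
                # Relative path: searched under the current root node, first match only.
                child = parent.find(value.strip("/"))
                if child is not None and child.text:
                    obj[key] = child.text.strip()
            else:
                # Constant such as 'DataDownload': quotes removed, escapes ignored.
                obj[key] = value.strip("'")
        objects.append(obj)
    return objects

# Hypothetical source document with two distributions; the second has no <Source>.
source = ET.fromstring(
    "<Dataset><Distributions>"
    "<Distribution><AccessUrl>http://example.org/a.csv</AccessUrl>"
    "<Format>CSV</Format><Source>Example Org</Source></Distribution>"
    "<Distribution><AccessUrl>http://example.org/a.json</AccessUrl>"
    "<Format>JSON</Format></Distribution>"
    "</Distributions></Dataset>"
)
downloads = build_objects(source, "/Dataset/Distributions/Distribution", [
    ("@type", "'DataDownload'"),
    ("contentUrl", "/AccessUrl"),
    ("fileFormat", "/Format"),
    ("publisher", "/Source"),
])
for item in downloads:
    print(item)
```

In this sketch a property whose relative path finds nothing (publisher in the second object) is simply left out of that object.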

Constant Value

A constant value is any text enclosed in single quotation marks. Inside the text, a backslash is used as the escape character. For example:

@type:       'Dataset'
inLanguage:  'EN'
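A minimal Python sketch of decoding a constant value follows. It assumes that a backslash makes the next character literal (so a backslash before a single quote yields a quote); parse_constant is a hypothetical helper, and the exact escape rules of schemaorg-pipeline may differ.

```python
def parse_constant(token):
    """Decode a single-quoted constant such as 'EN' (hypothetical helper).

    Assumption: a backslash makes the next character literal.
    """
    if len(token) < 2 or token[0] != "'" or token[-1] != "'":
        raise ValueError(f"not a constant value: {token!r}")
    out = []
    chars = iter(token[1:-1])          # text between the enclosing quotes
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, None)
            if escaped is None:
                raise ValueError(f"dangling escape in {token!r}")
            out.append(escaped)        # keep the escaped character verbatim
        else:
            out.append(ch)
    return "".join(out)

print(parse_constant("'Dataset'"))
print(parse_constant(r"'Crohn\'s disease'"))
```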

Array

An array can be constructed by writing multiple mapping definitions with the same key. For example:

identifier:  /Dataset/Identifier
identifier:  /Dataset/SecondaryIdentifier
identifier:  /Dataset/Others/PMID
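The array rule amounts to grouping definitions by key. In the hypothetical Python helper below, a key that appears once keeps a single value and a repeated key collects its values into a list in order of appearance. Whether the library emits a one-element array or a plain value for a single definition is an assumption of this sketch.

```python
def group_definitions(pairs):
    """Collapse repeated keys into arrays (hypothetical helper).

    pairs: (key, value) tuples in the order they appear in the data map.
    """
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)   # dicts keep insertion order
    return {key: values[0] if len(values) == 1 else values
            for key, values in grouped.items()}

result = group_definitions([
    ("identifier", "/Dataset/Identifier"),
    ("identifier", "/Dataset/SecondaryIdentifier"),
    ("identifier", "/Dataset/Others/PMID"),
    ("name", "/Dataset/Title"),
])
print(result)
```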

Extended Expressions

Pair

A pair is two constant values separated by a comma and enclosed in round brackets. It is useful for assigning two strings in a single mapping definition. For example:

@prefix:  ('schema', 'http://schema.org/')
@prefix:  ('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#')
@prefix:  ('rdfs', 'http://www.w3.org/2000/01/rdf-schema#')

(For now, this expression is used only by the @prefix keyword.)
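As an illustration, the @prefix pairs can be read into a prefix-to-namespace table with the Python sketch below. parse_pair is a hypothetical helper; it accepts exactly two single-quoted constants and, for brevity, does not handle backslash escapes inside them.

```python
import re

# Two single-quoted constants, comma-separated, inside round brackets.
PAIR = re.compile(r"^\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)$")

def parse_pair(text):
    """Return the two strings of a pair such as ('schema', 'http://schema.org/')."""
    match = PAIR.match(text.strip())
    if match is None:
        raise ValueError(f"not a pair: {text!r}")
    return match.group(1), match.group(2)

prefixes = dict(parse_pair(v) for v in [
    "('schema', 'http://schema.org/')",
    "('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#')",
    "('rdfs', 'http://www.w3.org/2000/01/rdf-schema#')",
])
print(prefixes["schema"])
```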

Concat

Concat is a function that concatenates a value from the source data with one or more constant strings. Example usages:

=concat(/dataset/identifier, '-ID')
=concat('ID-', /dataset/identifier)
=concat('http://identifier.org/mesh/', /dataset/identifier, '-ID')

Assuming /dataset/identifier contains the value "12345", the three calls above produce the following outputs:

"12345-ID"
"ID-12345"
"http://identifier.org/mesh/12345-ID"
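The concat behaviour above can be reproduced with a short Python sketch. evaluate_concat is hypothetical, and the lookup argument stands in for whatever resolves a data path against the source (here just a dictionary). Arguments are assumed to be either single-quoted constants or data paths, joined in the order written.

```python
import re

# Each argument is a single-quoted constant (group 1) or a data path (group 2).
TOKEN = re.compile(r"'((?:[^'\\]|\\.)*)'|(/[^\s,)]+)")

def evaluate_concat(expression, lookup):
    """Evaluate =concat(...) given lookup(path) -> value (hypothetical)."""
    prefix, suffix = "=concat(", ")"
    if not (expression.startswith(prefix) and expression.endswith(suffix)):
        raise ValueError(f"not a concat expression: {expression!r}")
    body = expression[len(prefix):-len(suffix)]
    parts = []
    for constant, path in TOKEN.findall(body):
        if path:
            parts.append(lookup(path))                     # value from the source data
        else:
            parts.append(re.sub(r"\\(.)", r"\1", constant))  # decode backslash escapes
    return "".join(parts)

lookup = {"/dataset/identifier": "12345"}.get
for expr in ["=concat(/dataset/identifier, '-ID')",
             "=concat('ID-', /dataset/identifier)",
             "=concat('http://identifier.org/mesh/', /dataset/identifier, '-ID')"]:
    print(evaluate_concat(expr, lookup))
```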

Reserved Keywords

  • @id: (optional) to indicate the instance's unique identifier. If present, its value is used for filtering in the data extraction step.
  • @type: (mandatory) to indicate the instance's schema.org type.
  • @prefix: (optional) to specify the prefix definition used by the source data.
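A data map checker could enforce these keywords on each data object, as in the Python sketch below. check_reserved is a hypothetical helper. Flagging any other @-prefixed key is an assumption of this sketch, stricter than anything the list above states.

```python
RESERVED = {"@id", "@type", "@prefix"}

def check_reserved(definitions):
    """Return problems found in one data object's (key, value) definitions.

    Hypothetical check: @type is mandatory, and @-keys outside RESERVED are reported.
    """
    keys = [key for key, _ in definitions]
    problems = []
    if "@type" not in keys:
        problems.append("missing mandatory @type")
    for key in keys:
        if key.startswith("@") and key not in RESERVED:
            problems.append(f"unknown reserved keyword {key}")
    return problems

print(check_reserved([("@type", "'Dataset'"), ("@id", "/Dataset/Identifier")]))
print(check_reserved([("name", "/Dataset/Title")]))
```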

Data Map Example

The example below shows a data map used to generate schema.org markup from XML documents on the ClinicalTrials.gov website.

@type:                       'MedicalTrial'
name:                        /clinical_study/official_title
alternateName:               /clinical_study/brief_title
alternateName:               /clinical_study/acronym
identifier:                  /clinical_study/id_info/org_study_id
identifier:                  /clinical_study/id_info/nct_id
identifier:                  /clinical_study/id_info/secondary_id
status:                      /clinical_study/overall_status
description:                 /clinical_study/detailed_description/textblock
studySubject:                /clinical_study/condition
phase:                       /clinical_study/phase
code:                        /clinical_study/condition_browse
    @type:                   'MedicalCode'
    codeValue:               /mesh_term
    codingSystem:            'MeSH'
sponsor:                     /clinical_study/sponsors/lead_sponsor
    @type:                   'Organization'
    name:                    /agency
    additionalType:          'Lead Sponsor'
sponsor:                     /clinical_study/sponsors/collaborator
    @type:                   'Organization'
    name:                    /agency
    additionalType:          'Collaborator'
studyLocation:               /clinical_study/location/facility
    @type:                   'AdministrativeArea'
    name:                    /name
    additionalType:          'Facility'
    address:                 /address
        @type:               'PostalAddress'
        addressLocality:     /city
        addressRegion:       /state
        postalCode:          /zip
        addressCountry:      /country

Please visit the playground (Try Example > Example CAML: Annotate ClinicalTrials.gov XML document) to see the full-length map and a live demo of evaluating this mapping.
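The nesting in a data map like the one above follows from indentation alone. The Python sketch below is a hypothetical parser (parse_caml is not the library's API). It assumes space indentation and that a line indented deeper than the previous definition belongs to that definition's data object, and it turns the text into nested (key, value, children) tuples.

```python
def parse_caml(text):
    """Parse CAML text into nested (key, value, children) tuples (hypothetical).

    Assumption: indentation uses spaces; blank lines are skipped.
    """
    root = []
    stack = [(-1, root)]  # (indent of the owning definition, list receiving children)
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        key, _, value = raw.strip().partition(":")   # split at the first colon only
        while indent <= stack[-1][0]:
            stack.pop()                              # close deeper or sibling blocks
        children = []
        stack[-1][1].append((key, value.strip(), children))
        stack.append((indent, children))
    return root

sample = """\
@type:                       'MedicalTrial'
name:                        /clinical_study/official_title
studyLocation:               /clinical_study/location/facility
    @type:                   'AdministrativeArea'
    name:                    /name
    address:                 /address
        @type:               'PostalAddress'
        addressLocality:     /city
"""
tree = parse_caml(sample)
for key, value, children in tree:
    print(key, value, len(children))
```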