API Reference

class nightingale.config.Output(directory)[source]

Bases: object

Parameters:

directory (Path)

directory: Path
class nightingale.config.Datasource(connection)[source]

Bases: object

Parameters:

connection (str)

connection: str
class nightingale.config.Publishing(publisher, base_uri, version='', publisher_uid='', publisher_scheme='', publisher_uri='', license='', publicationPolicy='')[source]

Bases: object

Parameters:
  • publisher (str)

  • base_uri (str)

  • version (str)

  • publisher_uid (str)

  • publisher_scheme (str)

  • publisher_uri (str)

  • license (str)

  • publicationPolicy (str)

publisher: str
base_uri: str
version: str = ''
publisher_uid: str = ''
publisher_scheme: str = ''
publisher_uri: str = ''
license: str = ''
publicationPolicy: str = ''
class nightingale.config.Mapping(file, ocid_prefix, selector, force_publish=False, codelists=None, milestone_lookup_sql=None, split_milestone_codes=False, codelist_passthrough_paths=())[source]

Bases: object

Parameters:
  • file (Path)

  • ocid_prefix (str)

  • selector (str)

  • force_publish (bool | None)

  • codelists (Path | None)

  • milestone_lookup_sql (str | None)

  • split_milestone_codes (bool)

  • codelist_passthrough_paths (tuple[str, ...])

file: Path
ocid_prefix: str
selector: str
force_publish: bool | None = False
codelists: Path | None = None
milestone_lookup_sql: str | None = None

SQL query to load milestone code metadata. Must return code, title and description columns. Used to enrich milestones with titles and descriptions, and to identify known codes for deduplication.
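As a rough illustration of the expected query shape, the sketch below runs a query against a hypothetical milestone_codes table in an in-memory SQLite database and builds a code-to-metadata lookup. The table name, the helper load_milestone_lookup and the use of sqlite3 are assumptions for illustration; how nightingale executes this query internally is not shown here.

```python
import sqlite3

# Hypothetical query in the shape milestone_lookup_sql expects:
# it must return code, title and description columns.
MILESTONE_LOOKUP_SQL = "SELECT code, title, description FROM milestone_codes"


def load_milestone_lookup(conn: sqlite3.Connection, sql: str) -> dict:
    """Build a code -> {title, description} lookup from the query result (illustrative)."""
    lookup = {}
    for code, title, description in conn.execute(sql):
        lookup[code] = {"title": title, "description": description}
    return lookup


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE milestone_codes (code TEXT, title TEXT, description TEXT)")
conn.execute("INSERT INTO milestone_codes VALUES ('CA', 'Contract award', 'Award signed')")
lookup = load_milestone_lookup(conn, MILESTONE_LOOKUP_SQL)
known_codes = set(lookup)  # the set of known codes could then drive deduplication
```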

split_milestone_codes: bool = False

Whether to split space-separated milestone codes into individual milestone objects. When enabled, a value like "CA AT AU" in the milestone code field produces three separate milestones. Requires the source data to encode multiple codes in a single field; do not enable if milestone codes can legitimately contain spaces.
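A minimal sketch of the splitting behaviour described above, assuming each milestone is a dict with a "code" key. The helper split_milestone is hypothetical, and how nightingale assigns distinct milestone ids to the resulting copies is not shown.

```python
def split_milestone(milestone: dict) -> list:
    """Split a milestone whose code field holds space-separated codes (illustrative)."""
    codes = milestone.get("code", "").split()
    if len(codes) <= 1:
        # Nothing to split: keep the milestone unchanged.
        return [milestone]
    # One copy per code; all other fields are shared between the copies.
    return [{**milestone, "code": code} for code in codes]


parts = split_milestone({"code": "CA AT AU", "status": "met"})
```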

codelist_passthrough_paths: tuple[str, ...] = ()

OCDS paths at which to keep source values that aren’t in the codelist, instead of discarding them. Useful when values are derived via SQL logic (e.g. CASE expressions) and are absent from the codelist file.
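The sketch below shows one way the passthrough rule could work, assuming a codelist is a dict from source values to OCDS codes and paths are slash-separated strings. The function map_with_passthrough and the sample values are hypothetical; they are not nightingale's map_codelist_value.

```python
def map_with_passthrough(path: str, value: str, codelist: dict, passthrough_paths: tuple):
    """Map a source value through a codelist, keeping unknown values at passthrough paths."""
    if value in codelist:
        return codelist[value]
    if path in passthrough_paths:
        return value  # not in the codelist, but kept as-is
    return None  # not in the codelist and not passed through: discarded


codelist = {"ABIERTO": "open"}
passthrough_paths = ("/tender/procurementMethod",)
```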

class nightingale.config.Config(datasource, mapping, publishing, output)[source]

Bases: object

Parameters:
  • datasource (Datasource)

  • mapping (Mapping)

  • publishing (Publishing)

  • output (Output)

datasource: Datasource
mapping: Mapping
publishing: Publishing
output: Output
classmethod from_file(config_file)[source]
Parameters:

config_file (Path)

Return type:

Config

class nightingale.loader.DataLoader(config, connection=None)[source]

Bases: object

Load data from a database using a SQL query.

load(selector)[source]
get_cursor()[source]
get_connection()[source]
close()[source]
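For orientation, here is a stand-in loader with a similar shape (load, close), assuming an SQLite connection string and rows returned as dicts. SketchLoader is hypothetical; the database driver and row format DataLoader actually uses are not documented here.

```python
import sqlite3
from collections.abc import Iterator


class SketchLoader:
    """Illustrative stand-in for DataLoader, assuming an SQLite connection string."""

    def __init__(self, connection: str):
        self.connection = sqlite3.connect(connection)
        # sqlite3.Row lets each row be converted to a dict keyed by column name.
        self.connection.row_factory = sqlite3.Row

    def load(self, selector: str) -> Iterator[dict]:
        """Run the selector query and yield each row as a dict."""
        cursor = self.connection.cursor()
        try:
            cursor.execute(selector)
            for row in cursor:
                yield dict(row)
        finally:
            cursor.close()

    def close(self) -> None:
        self.connection.close()


loader = SketchLoader(":memory:")
loader.connection.execute("CREATE TABLE t (ocid TEXT, amount INTEGER)")
loader.connection.execute("INSERT INTO t VALUES ('a', 1), ('b', 2)")
rows = list(loader.load("SELECT ocid, amount FROM t ORDER BY ocid"))
```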
nightingale.mapper.LARGE_RELEASE_ROW_THRESHOLD = 500000

Number of rows between progress log messages. Releases built from at least this many rows are also reported.

nightingale.mapper.SLOW_RELEASE_SECONDS = 18000

Report releases that take longer than this many seconds to process.
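The two thresholds can be read as the checks below (18000 seconds is 5 hours). The helper names progress_due and should_report are hypothetical; only the constant values and the comparison rules come from the descriptions above.

```python
LARGE_RELEASE_ROW_THRESHOLD = 500_000
SLOW_RELEASE_SECONDS = 18_000  # 5 hours


def progress_due(rows_seen: int) -> bool:
    """True when a progress message would be logged (every threshold rows)."""
    return rows_seen > 0 and rows_seen % LARGE_RELEASE_ROW_THRESHOLD == 0


def should_report(row_count: int, elapsed_seconds: float) -> bool:
    """True for releases that meet the row threshold or exceed the time limit."""
    return row_count >= LARGE_RELEASE_ROW_THRESHOLD or elapsed_seconds > SLOW_RELEASE_SECONDS
```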

class nightingale.mapper.OCDSDataMapper(config, writer=None)[source]

Bases: object

Maps data from a source to the OCDS format.

Parameters:
  • config (Config) – Configuration object containing settings for the mapper.

  • writer (DataWriter | None)

produce_ocid(value)[source]

Produce an OCID based on the given value.

Parameters:

value (str) – The value to use for generating the OCID.

Returns:

The produced OCID.

Return type:

str
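An OCDS OCID is a registered prefix followed by an identifier, conventionally joined with a hyphen. The sketch below assumes produce_ocid combines the configured ocid_prefix with the value in that way; whether nightingale expects the hyphen to be part of ocid_prefix is an assumption, so the hypothetical join_ocid tolerates both.

```python
def join_ocid(ocid_prefix: str, value: str) -> str:
    """Join an OCID prefix and a source identifier with a single hyphen (illustrative)."""
    return f"{ocid_prefix.rstrip('-')}-{value}"
```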

map(loader, *, validate_mapping=False)[source]

Map data from the loader to the OCDS format.

Parameters:
  • loader (Any) – Data loader object.

  • validate_mapping (bool)

Returns:

List of mapped release dictionaries.

Return type:

list[dict[str, Any]]

transform_data(data, mapping, codelists=None)[source]

Transform the input data to the OCDS format.

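Parameters:
  • data – Input data to transform.

  • mapping (MappingTemplate) – Mapping configuration object.

  • codelists (CodelistsMapping | None)
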
Returns:

List of transformed release dictionaries.

Return type:

list[dict[str, Any]]

finish_release(curr_ocid, curr_release, mapped, release_date)[source]
transform_row(input_data, mapping_config, flattened_schema, result=None, array_counters=None, codelists=None, curr_release_dates=None)[source]

Transform a single row of input data to the OCDS format.

Parameters:
  • input_data (dict[Any, Any]) – Dictionary of input data.

  • mapping_config (MappingTemplate) – Mapping configuration object.

  • flattened_schema (dict[str, Any]) – Flattened schema dictionary.

  • result (dict, optional) – Existing result dictionary to update.

  • array_counters (dict | None)

  • codelists (CodelistsMapping | None)

  • curr_release_dates (set[str] | None)

Returns:

Transformed row dictionary.

Return type:

dict

shift_current_array(current, array_path, array_counters)[source]
make_release_id(curr_row)[source]

Generate and set a unique ID for the release based on its content.

Parameters:

curr_row (dict) – The current release row dictionary.

Return type:

None
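One common way to derive an ID from content is to hash a canonical serialization of the release. The sketch below assumes a SHA-256 hash of the release without its "id" field; the helper set_content_hash_id is hypothetical, and which fields nightingale actually hashes (for example, whether the date is included) is not stated here.

```python
import hashlib
import json


def set_content_hash_id(release: dict) -> None:
    """Set release["id"] to a hash of the release content (illustrative)."""
    content = {key: value for key, value in release.items() if key != "id"}
    # sort_keys makes the hash independent of key order; default=str handles dates.
    serialized = json.dumps(content, sort_keys=True, ensure_ascii=False, default=str)
    release["id"] = hashlib.sha256(serialized.encode("utf-8")).hexdigest()


a = {"ocid": "ocds-abc123-1", "tender": {"id": "t1"}}
b = {"tender": {"id": "t1"}, "ocid": "ocds-abc123-1"}
set_content_hash_id(a)
set_content_hash_id(b)
```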

date_release(curr_row, curr_date)[source]

Set the release date to the current date and time.

Parameters:
  • curr_row (dict) – The current release row dictionary.

  • curr_date (str | None)

Return type:

None

tag_initiation_type(curr_row)[source]

Tag the initiation type of the release as ‘tender’ if applicable.

Parameters:

curr_row (dict) – The current release row dictionary.

Return type:

None
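In OCDS, "tender" is the only value of the initiationType codelist. The sketch assumes "if applicable" means the release has a non-empty tender section; set_initiation_type is a hypothetical helper, not the method's source.

```python
def set_initiation_type(release: dict) -> None:
    """Set initiationType to 'tender' when the release has a tender section (illustrative)."""
    if release.get("tender"):
        release["initiationType"] = "tender"


with_tender = {"tender": {"id": "t1"}}
without_tender = {"awards": []}
set_initiation_type(with_tender)
set_initiation_type(without_tender)
```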

tag_ocid(curr_row, curr_ocid)[source]

Set the OCID for the release.

Parameters:
  • curr_row (dict) – The current release row dictionary.

  • curr_ocid (str) – The OCID value to set.

Return type:

None

generate_tags(release_data)[source]

Generate the release tag(s) based on the current release data, without considering prior releases.

Exclude ‘update’ tags, ‘cancellation’ tags and the ‘compiled’ tag.

Parameters:

release_data – The current release data (dict).

Returns:

A list of tags (list of str).

Return type:

list[str]
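A minimal sketch of stage-based tagging using release tag codes from the OCDS codelist, assuming a tag is emitted when the corresponding section is present. The helper stage_tags is hypothetical, and amendment and termination tags are omitted for brevity.

```python
def stage_tags(release: dict) -> list:
    """Derive stage tags from the sections present in one release (illustrative)."""
    tags = []
    if release.get("planning"):
        tags.append("planning")
    if release.get("tender"):
        tags.append("tender")
    if release.get("awards"):
        tags.append("award")
    contracts = release.get("contracts") or []
    if contracts:
        tags.append("contract")
    if any(contract.get("implementation") for contract in contracts):
        tags.append("implementation")
    # No 'update', 'cancellation' or 'compiled' tags, matching the description above.
    return tags
```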

remove_empty_id_arrays(data)[source]

Recursively remove arrays that do not contain an ‘id’ field.

Parameters:

data (dict[str, Any]) – The data dictionary to process.

Return type:

Any

map_codelist_value(keys, schema, codelists, value)[source]
nightingale.mapper.find_array_element_by_id(current, array_element_id)[source]

Find the first dictionary in a list that contains the given ‘id’ value.

If no dictionary with the matching ‘id’ is found, return the last dictionary in the list. If the list is empty, return None.

Parameters:
  • current – List[Dict], a list of dictionaries to search.

  • array_element_id – Any, the target ‘id’ value to search for.

Returns:

Dict, the dictionary with the matching ‘id’ value, the last dictionary if none matches, or None if the list is empty.

Examples:
>>> dict_list = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}, {'id': 3, 'name': 'Charlie'}]
>>> find_array_element_by_id(dict_list, 2)
{'id': 2, 'name': 'Bob'}
>>> find_array_element_by_id(dict_list, 4)
{'id': 3, 'name': 'Charlie'}
>>> find_array_element_by_id(dict_list, 3)
{'id': 3, 'name': 'Charlie'}
>>> find_array_element_by_id([], 1) is None
True
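The documented behaviour, including the empty-list case, can be reproduced with a short loop. find_by_id_or_last is a hypothetical reimplementation checked against the doctests above, not the library's source.

```python
from __future__ import annotations

from typing import Any


def find_by_id_or_last(current: list, array_element_id: Any) -> dict | None:
    """Return the first dict whose 'id' matches, else the last dict, else None."""
    for element in current:
        if element.get("id") == array_element_id:
            return element
    # No match: fall back to the last element, or None for an empty list.
    return current[-1] if current else None


dict_list = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Charlie"}]
```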
class nightingale.mapping_template.v09.MappingTemplate(config)[source]

Bases: object

get_schema_sheet()[source]
normmalize_mapping_column(mappings)[source]

Normalize the mapping column by collapsing each run of consecutive spaces into a single space.
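For a single mapping value, the normalization can be sketched as below. The helper normalize_mapping_value is hypothetical; it treats any whitespace (tabs and newlines included) as a separator and trims the ends, which may be broader than what the method does.

```python
def normalize_mapping_value(value: str) -> str:
    """Collapse runs of whitespace into one space and trim the ends (illustrative)."""
    return " ".join(value.split())
```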

read_mapping_sheet(sheet)[source]
read_mappings()[source]
get_element_by_mapping(for_mapping)[source]
read_data_elements_sheet(sheet)[source]
read_extenions_info()[source]
get_data_elements()[source]
read_schema_sheet()[source]
enforce_mapping_structure(mappings)[source]
get_mappings()[source]
get_mapping_for(path)[source]
get_paths_for_mapping(key, *, force_publish=False)[source]
is_array_path(path)[source]
get_arrays()[source]
get_schema()[source]
get_ocid_mapping()[source]
get_containing_array_path(path)[source]
get_datetime_fields()[source]

Return a list of paths that are marked as ‘date-time’ in the ‘values’ column in the schema.

class nightingale.codelists.CodelistsMapping(config)[source]

Bases: object

normmalize_mapping_column(mappings)[source]

Normalize the mapping column by collapsing each run of consecutive spaces into a single space.

get_mapping_for_codelist(name)[source]
Parameters:

name (str)

load_codelists_mapping()[source]
read_codelists_sheet(sheet)[source]
class nightingale.publisher.DataPublisher(config, mapping)[source]

Bases: object

Packs an array of releases into a release package.

Parameters:
  • config (Publishing)

  • mapping

produce_uri()[source]

Produce a URI for the package based on the current date.

Returns:

The produced URI.

Return type:

str

package(data)[source]

Package the given data into a release package.

Parameters:

data (list[dict[str, Any]]) – List of release dictionaries to be packaged.

Returns:

A dictionary representing the release package.

Return type:

dict[str, Any]
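The sketch below wraps releases in the top-level fields of an OCDS release package (uri, version, publishedDate, publisher, releases, and optionally extensions, license and publicationPolicy). The function build_release_package and its keyword arguments are hypothetical; how DataPublisher maps the Publishing fields (for example publisher_uid and publisher_scheme) onto the publisher object is not shown.

```python
from collections.abc import Iterable
from datetime import datetime, timezone


def build_release_package(releases: Iterable, *, uri: str, publisher_name: str,
                          version: str = "1.1", extensions: Iterable = (),
                          license_url: str = "", publication_policy: str = "") -> dict:
    """Wrap releases in an OCDS release package (illustrative)."""
    package = {
        "uri": uri,
        "version": version,
        "publishedDate": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "publisher": {"name": publisher_name},
        "releases": list(releases),
    }
    # Optional fields are only added when a value is configured.
    extensions = list(extensions)
    if extensions:
        package["extensions"] = extensions
    if license_url:
        package["license"] = license_url
    if publication_policy:
        package["publicationPolicy"] = publication_policy
    return package


pkg = build_release_package([{"ocid": "ocds-abc123-1"}],
                            uri="https://example.com/package.json",
                            publisher_name="Example")
```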

get_publisher()[source]
get_extensions()[source]
get_version()[source]
nightingale.writer.new_name(package)[source]

Generate a new name for the package based on its published date.

Parameters:

package (dict | list) – The release package dictionary or list of releases.

Returns:

The generated package name.

Return type:

str

class nightingale.writer.DataWriter(config)[source]

Bases: object

Writes a release package to disk.

Parameters:

config (Output)

make_dirs()[source]

Create the necessary directories for storing the release package.

Returns:

The base directory path.

Return type:

Path

get_output_path(package)[source]

Get the output path for the release package.

Parameters:

package (dict | list) – The release package dictionary or list of releases.

Returns:

The path where the package will be written.

Return type:

Path

write(package)[source]

Write the release package to disk in a single operation.

Parameters:

package (dict | list) – The release package dictionary or list of releases.

Return type:

None

start_package_stream(package_metadata)[source]

Start a streaming write session: write the package metadata and prepare to write releases.

Parameters:

package_metadata (dict)

Return type:

None

stream_release(release)[source]

Write a single release to the open package file stream.

Parameters:

release (dict)

Return type:

None

end_package_stream()[source]

Finalize the streaming write session by closing the JSON array and file.

This method is safe to call even if the stream was not started or already closed.

Return type:

None

is_streaming()[source]

Check if the writer is currently in a streaming session.

Return type:

bool
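The streaming methods above suggest a familiar pattern: write the package metadata with an open "releases" array, append releases one at a time, then close the array and the object. The sketch below implements that pattern against an in-memory text stream. StreamingPackageWriter and StreamNotStarted are hypothetical stand-ins for DataWriter and StreamNotStartedError, and the exact bytes nightingale writes may differ.

```python
import io
import json
from typing import TextIO


class StreamNotStarted(Exception):
    """Stand-in for nightingale.exceptions.StreamNotStartedError."""


class StreamingPackageWriter:
    """Write package metadata, then releases one at a time, as one JSON object."""

    def __init__(self, fp: TextIO):
        self.fp = fp
        self.streaming = False
        self.count = 0

    def start_package_stream(self, package_metadata: dict) -> None:
        # Serialize the metadata and drop its closing brace so "releases" can follow.
        head = json.dumps(package_metadata)[:-1]
        separator = ", " if package_metadata else ""
        self.fp.write(f'{head}{separator}"releases": [')
        self.streaming = True
        self.count = 0

    def stream_release(self, release: dict) -> None:
        if not self.streaming:
            raise StreamNotStarted("call start_package_stream() first")
        if self.count:
            self.fp.write(", ")
        self.fp.write(json.dumps(release))
        self.count += 1

    def end_package_stream(self) -> None:
        # Safe to call when no stream is open, mirroring the documented behaviour.
        if self.streaming:
            self.fp.write("]}")
            self.streaming = False

    def is_streaming(self) -> bool:
        return self.streaming


buf = io.StringIO()
writer = StreamingPackageWriter(buf)
writer.start_package_stream({"uri": "https://example.com/package.json"})
writer.stream_release({"ocid": "ocds-abc123-1"})
writer.stream_release({"ocid": "ocds-abc123-2"})
writer.end_package_stream()
writer.end_package_stream()  # second call is a no-op
package = json.loads(buf.getvalue())
```

Writing releases as they are produced keeps memory use flat for large exports, at the cost of the output only being valid JSON once end_package_stream() has run.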

nightingale.util.produce_package_name(date)[source]
Return type:

str

nightingale.util.remove_dicts_without_id(data)[source]
nightingale.util.get_iso_now()[source]
nightingale.util.is_new_array(array_counters, child_path, array_key, array_value, array_path)[source]

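Check whether a new array item should be created: in the examples below, this is the case when the mapped field is the array's ‘id’ and its value differs from the one tracked in array_counters.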

Parameters:
  • array_counters (dict) – Dictionary keeping track of array counters.

  • child_path (str) – The child path in the schema.

  • array_key (str) – The key in the array.

  • array_value (str) – The value associated with the array key.

  • array_path (str) – The path of the array.

Returns:

True if a new array should be created, False otherwise.

Return type:

bool

>>> array_counters = {'/object/field2/array_field': '1'}
>>> child_path = '/id'
>>> array_key = 'id'
>>> array_value = '2'
>>> array_path = '/object/field2/array_field'
>>> is_new_array(array_counters, child_path, array_key, array_value, array_path)
True
>>> array_counters = {'/object/field2/array_field': '1'}
>>> child_path = '/id'
>>> array_key = 'id'
>>> array_value = '1'
>>> array_path = '/object/field2/array_field'
>>> is_new_array(array_counters, child_path, array_key, array_value, array_path)
False
>>> array_counters = {'/object/field2/array_field': '1'}
>>> child_path = '/name'
>>> array_key = 'name'
>>> array_value = 'example'
>>> array_path = '/object/field2/array_field'
>>> is_new_array(array_counters, child_path, array_key, array_value, array_path)
False
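A rule consistent with the three doctest cases can be written in a few lines. is_new_array_item is a hypothetical sketch inferred only from those examples; it ignores child_path, and the real function may use child_path and may treat an untracked array_path differently (here it counts as new).

```python
def is_new_array_item(array_counters: dict, child_path: str, array_key: str,
                      array_value: str, array_path: str) -> bool:
    """True when the array's id value differs from the one being tracked (illustrative)."""
    if array_key != "id":
        # Only the identifying key can start a new array item.
        return False
    return array_counters.get(array_path) != array_value


counters = {"/object/field2/array_field": "1"}
```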
nightingale.util.get_longest_array_path(arrays, path)[source]
nightingale.util.group_contiguous_mappings(mapping_list)[source]

Group consecutive mapping items that share the same block into one (block, items) pair; items with the same block that are not adjacent end up in separate groups.

Parameters:

mapping_list (list[dict])

Return type:

list[tuple[str, list[dict]]]
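Contiguous grouping is what itertools.groupby does when the input is not pre-sorted. The sketch assumes each mapping item stores its block under a hypothetical "block" key; the actual field nightingale groups on is not documented here.

```python
from itertools import groupby


def group_consecutive(mapping_list: list, key: str = "block") -> list:
    """Group consecutive items sharing the same key value into (value, items) pairs."""
    return [(block, list(items)) for block, items in groupby(mapping_list, key=lambda item: item[key])]


items = [
    {"block": "tender", "path": "/tender/id"},
    {"block": "tender", "path": "/tender/title"},
    {"block": "awards", "path": "/awards/id"},
    {"block": "tender", "path": "/tender/status"},
]
groups = group_consecutive(items)
```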

nightingale.util.sort_group_by_parent_and_id(group)[source]

Sort a contiguous group of mapping items so that ‘/id’ paths come first within each parent.

Split the group into subgroups that share the same parent (i.e. everything before the final ‘/’). Then, for each subgroup, sort so that any item whose path ends with ‘/id’ comes first. The sorted subgroups are then concatenated in the original order.

Parameters:

group (list[dict])

Return type:

list[dict]
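The steps above map onto a dict of subgroups keyed by parent path plus a stable sort. The sketch assumes each item keeps its path under a "path" key and that items with the same parent are merged even if not adjacent; sort_ids_first_by_parent is a hypothetical name.

```python
def sort_ids_first_by_parent(group: list) -> list:
    """Within each parent path, move '/id' items first; keep parents in first-seen order."""
    subgroups = {}
    for item in group:
        parent = item["path"].rsplit("/", 1)[0]
        subgroups.setdefault(parent, []).append(item)
    result = []
    for items in subgroups.values():
        # sorted() is stable, so non-id items keep their relative order.
        result.extend(sorted(items, key=lambda item: not item["path"].endswith("/id")))
    return result


group = [
    {"path": "/tender/title"},
    {"path": "/tender/id"},
    {"path": "/tender/items/description"},
    {"path": "/tender/items/id"},
]
ordered = [item["path"] for item in sort_ids_first_by_parent(group)]
```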

exception nightingale.exceptions.NightingaleError[source]

Bases: Exception

Base class for exceptions from within this package.

exception nightingale.exceptions.StreamNotStartedError[source]

Bases: NightingaleError

Raised when a streaming method is called before start_package_stream().