Concepts

Immutable vs. Mutable

Originally, Configuration sought to be a drop-in replacement for dict, so that json.dumps() would just work. This goal was abandoned as unmaintainable in version 2.0. With the MutableMapping interface of dict no longer required, and in order to add caching, it was decided that a mutable configuration was dangerous and that immutability should be the default.

As such, Configuration and LazyLoadConfiguration were changed from MutableMapping to Mapping, and loaded YAML sequences were changed from list to tuple/Sequence by default. Immutability also makes them thread-safe.

For compatibility, mutable configuration support was added explicitly, as MutableConfiguration and MutableLazyLoadConfiguration, both of which simply add the MutableMapping interface. In mutable mode, YAML sequences are loaded as list/MutableSequence and caching is disabled. Modifying a MutableConfiguration is not thread-safe. Documentation will reference Configuration or LazyLoadConfiguration, but all concepts apply to their mutable counterparts, unless noted in the Code Specification.

You should strongly consider using an immutable configuration in your code.
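To make the Mapping versus MutableMapping distinction concrete, here is a minimal sketch that uses only the standard library, not this library's classes. MappingProxyType and tuple are stand-ins chosen for illustration; the variable names are hypothetical.

```python
from collections.abc import Mapping, MutableMapping
from types import MappingProxyType

# Stand-ins only: a read-only proxy plays the role of an immutable
# configuration, a plain dict plays the role of a mutable one.
immutable_like = MappingProxyType({"servers": ("a", "b")})
mutable_like = {"servers": ["a", "b"]}

# The read-only view satisfies Mapping but not MutableMapping.
assert isinstance(immutable_like, Mapping)
assert not isinstance(immutable_like, MutableMapping)
assert isinstance(mutable_like, MutableMapping)

# Writing through the read-only view fails ...
try:
    immutable_like["servers"] = ()
    write_failed = False
except TypeError:
    write_failed = True

# ... while the mutable stand-in can change underneath any reader,
# which is why sharing it across threads needs extra care.
mutable_like["servers"].append("c")
```

A value that cannot change after loading is also safe to cache and to share, which is the trade-off the section describes.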


Lifecycle

  1. Import Time: LazyLoadConfiguration instances are defined (CONFIG = LazyLoadConfiguration(...)).

    • So long as the next step does not occur, all “identical immutable configurations”[1] are marked as using the same configuration cache.

      • Loading a configuration clears its marks from the cache, meaning if another identical immutable configuration is created, it will be loaded separately.

  2. First Fetch: Configuration is fetched for the first time (through CONFIG.value, CONFIG["value"], CONFIG.config, and such)

    1. Load Time:

      1. The file system is scanned for specified configuration files.

        • Paths are expanded (expanduser()) and resolved (resolve()) at Import Time, but checked for existence and read during Load Time.

      2. Each file that exists is read and loaded.

    2. Merge Time:

      1. Any Tags defined at the root of the file are run (i.e. the file begins with a tag: !ParseFile ... or !Merge ...).

      2. The loaded Configuration instances are merged in-order into one Configuration.

        • Any files that do not define a Mapping are filtered out.

          • "str" is valid YAML, but not a Mapping.

          • Everything being filtered out results in an empty Configuration.

        • Mappings are merged recursively. Any non-mapping overrides. Newer values override older values. (See Merging for more)

          • {"a": {"b": 1}} + {"a": {"b": {"c": 1}}} → {"a": {"b": {"c": 1}}}

          • {"a": {"b": {"c": 1}}} + {"a": {"b": {"c": 2}}} → {"a": {"b": {"c": 2}}}

          • {"a": {"b": {"c": 2}}} + {"a": {"b": {"d": 3}}} → {"a": {"b": {"c": 2, "d": 3}}}

          • {"a": {"b": {"c": 2, "d": 3}}} + {"a": {"b": 1}} → {"a": {"b": 1}}

    3. Build Time:

      1. The Base Path is applied.

      2. The Base Paths for any LazyLoadConfiguration that shared this identical immutable configuration are applied.

      3. LazyLoadConfiguration no longer holds a reference to the Root configuration (see Root for a more detailed definition).

        • If no tags depend on the Root, it will be freed.

          • !Ref is an example of a tag that holds a reference to the Root until it is run.

        • If an exception occurs, the Root is unavoidably caught in the exception's frame.

  3. Fetching a Lazy Tag:

    1. Upon first get of the LazyEval object, the underlying function is called.

    2. The result replaces the LazyEval in the Configuration, so the LazyEval runs exactly once.
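The "Fetching a Lazy Tag" step above can be sketched in a few lines. This is an illustration of the run-once-then-replace idea only; LazySketch and ConfigSketch are hypothetical names and do not reflect the library's actual LazyEval or Configuration internals.

```python
from typing import Any, Callable


class LazySketch:
    """Hypothetical stand-in for a lazy tag: wraps a zero-argument function."""

    def __init__(self, func: Callable[[], Any]) -> None:
        self.func = func


class ConfigSketch:
    """Hypothetical holder that resolves lazy values on first get."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def __getitem__(self, key: str) -> Any:
        value = self._data[key]
        if isinstance(value, LazySketch):
            value = value.func()     # first get: call the underlying function
            self._data[key] = value  # replace the lazy object with its result
        return value


calls = []


def expensive() -> str:
    calls.append("ran")
    return "resolved"


config = ConfigSketch({"setting": LazySketch(expensive)})
first = config["setting"]   # triggers expensive()
second = config["setting"]  # served from the stored result
```

Because the result overwrites the lazy object, later fetches never see it again, which is what guarantees a single run in this sketch.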


Making Copies

When making copies, it is important to note that LazyEval instances do not copy with either copy() or deepcopy() (they return themselves). This helps each LazyEval run exactly once, prevents deep copies of Root from creating branches that might never run their LazyEval instances, and avoids unexpected memory use.

This means that a deepcopy() of a Configuration or MutableConfiguration instance can share state with the original, if any LazyEval is present, despite that breaking the definition of a deep copy.
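The sharing described above follows directly from how Python's copy protocol works. The sketch below uses a hypothetical SelfCopying class, not the library's LazyEval, to show an object that returns itself from both copy hooks.

```python
import copy


class SelfCopying:
    """Hypothetical stand-in for an object that returns itself when copied."""

    def __copy__(self):
        return self

    def __deepcopy__(self, memo):
        return self


lazy = SelfCopying()
original = {"plain": [1, 2], "lazy": lazy}
clone = copy.deepcopy(original)

# Ordinary containers are duplicated by deepcopy ...
plain_is_shared = clone["plain"] is original["plain"]

# ... but the self-copying object is the same instance in both trees.
lazy_is_shared = clone["lazy"] is original["lazy"]
```

Any state reachable through the shared object is therefore visible from both the original and the copy.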

Mitigation


Merging

Merging is the heart of this library. With it, you gain the ability to have settings defined in multiple possible locations and the ability to override settings based on a consistent pattern.

See Merge Equivalency for examples using merge.

Describing Priority

As a sentence

Mappings are merged, and everything else is replaced, with last-in winning.

As a table with code

From First-in.yaml    From Next-in.yaml    Outcome
------------------    -----------------    -------------------------------
Value                 *                    Next-in replaces First-in
Scalar                *                    Next-in replaces First-in
Sequence              *                    Next-in replaces First-in
Mapping               Value                Next-in replaces First-in
Mapping               Scalar               Next-in replaces First-in
Mapping               Sequence             Next-in replaces First-in
Mapping               Mapping              Next-in is merged into First-in

Code:

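# Option 1: one LazyLoadConfiguration loading both files
CONFIG = LazyLoadConfiguration("First-in.yaml", "Next-in.yaml")

# Option 2: merge()
CONFIG = merge("First-in.yaml", "Next-in.yaml")

# Option 3: a file that merges both with !Merge
CONFIG = LazyLoadConfiguration("merge.yaml")

# merge.yaml
!Merge
- !ParseFile First-in.yaml
- !ParseFile Next-in.yaml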

As Explicit Examples

First-in        +   Next-in         Result

a:              +   a:              a:
  b: 1                b:              b:
                        c: 1            c: 1

a:              +   a:              a:
  b:                  b:              b:
    c: 1                c: 2            c: 2

a:              +   a:              a:
  b:                  b:              b:
    c: 2                d: 3            c: 2
                                        d: 3

a:              +   a:              a:
  b:                  b: 1            b: 1
    c: 2
    d: 3
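The four explicit examples can be reproduced with a small recursive merge over plain dicts. This is a minimal sketch of the stated rule (mappings merge, everything else is replaced, last-in wins), not the library's implementation; merge_two and merge_all are hypothetical helper names.

```python
from collections.abc import Mapping
from functools import reduce
from typing import Any


def merge_two(first: Any, nxt: Any) -> Any:
    """Merge mappings key by key; anything else is replaced by `nxt`."""
    if isinstance(first, Mapping) and isinstance(nxt, Mapping):
        merged = dict(first)
        for key, value in nxt.items():
            merged[key] = merge_two(merged[key], value) if key in merged else value
        return merged
    return nxt


def merge_all(*configs: Any) -> dict:
    """Merge in order, skipping inputs that are not mappings."""
    mappings = [c for c in configs if isinstance(c, Mapping)]
    return reduce(merge_two, mappings, {})


step1 = merge_all({"a": {"b": 1}}, {"a": {"b": {"c": 1}}})
step2 = merge_all(step1, {"a": {"b": {"c": 2}}})
step3 = merge_all(step2, {"a": {"b": {"d": 3}}})
step4 = merge_all(step3, {"a": {"b": 1}})
```

Each step feeds the previous result back in as First-in, mirroring the rows of the table. Filtering non-mappings in merge_all follows the Merge Time note that inputs which do not define a Mapping are dropped.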

Merge Equivalency

The following options result in the same Configuration:

Case

Notes

CONFIG = LazyLoadConfiguration(
    "file1.yaml",
    "file2.yaml",
)

LazyLoadConfiguration

  • "file1.yaml" and "file2.yaml" are read during the “Load Time” of the “First Fetch”.

  • The merge occurs as a part of the “Merge Time” merge.

  • Best option.

CONFIG = LazyLoadConfiguration(
    "merged.yaml",
)
# merged.yaml
!Merge
- !OptionalParseFile file1.yaml
- !OptionalParseFile file2.yaml

!Merge

  • "file1.yaml" and "file2.yaml" are read during the “Load Time” of the “First Fetch”.

  • The merge occurs before the “Merge Time” merge.

    • The !Merge must be evaluated fully, in order to be merged into the final configuration.

  • This is less efficient than merging with LazyLoadConfiguration.

CONFIG = merge(
    "file1.yaml",
    "file2.yaml"
)

merge()

  • "file1.yaml" and "file2.yaml" are read immediately.

  • "file1.yaml" and "file2.yaml" are loaded as separate LazyLoadConfiguration with individual Load Boundaries.

  • This is far less efficient than merging with LazyLoadConfiguration.

  • Exists for merging a framework configuration with a library-specific configuration.

    • The explicit case was for a pytest sub-plugin that was a part of a framework plugin.

    • Using merge() allows users to set settings in the framework configuration without requiring the framework configuration to know about the sub-plugin.


JSON Path/Pointer, !Ref, & Root

!Ref and !Sub have the concept of querying other sections of your configuration for values. This was added in response to a request to make deployment configuration simpler.

Cases discussed included:

  • Using env_location_var_name from LazyLoadConfiguration, you define environment-specific files and use the environment variable to select the associated file. A common config then pulls strings from the environment config, reducing copy-and-paste problems.

    # config.yaml
    common_base_path:
      settings:
        setting1: !Sub ${$.common_base_path.lookup.environment.name} is cool
    
    # dev.yaml
    common_base_path:
      lookup:
        environment:
          name: dev
    
    # test.yaml
    common_base_path:
      lookup:
        environment:
          name: test
    
    # Getting the deployed setting
    LazyLoadConfiguration(
        "config.yaml",
        base_path="/common_base_path/settings",
        env_location_var_name="CONFIG_LOCATION"
    ).config.setting1
    
  • Using !Ref to select environment settings from a mapping of environments.

    # config.yaml
    common_base_path:
      all_setting:
        dev:
          setting1: dev is cool
        test:
          setting1: test is cooler
      settings: !Ref /common_base_path/all_setting/${ENVIRONMENT_NAME}
    
    # Getting the deployed setting
    LazyLoadConfiguration(
        "config.yaml",
        base_path="/common_base_path/settings"
    ).config.setting1
    

In order to avoid creating a doubly-linked structure or losing base_path's ability to dereference settings that are fenced out, it was decided to use root-oriented syntax.

“Root” refers to the configuration output after the Merge Time step, before base_path is applied. Within your configuration, you must explicitly include your base_path when querying.

JSON Path was selected as the syntax because it is an open standard (and familiar). JSON Pointer was added when python-jsonpath was selected as the JSON Path implementation, because that library already supports it. JSON Pointer is the more correct choice, as it can only be a reference.
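To illustrate why queries must include the base_path, here is a minimal JSON Pointer (RFC 6901) resolver over plain dicts and lists. It is a sketch for illustration only: resolve_pointer is a hypothetical helper, not the python-jsonpath API, and the sample data is a simplified form of the configuration shown earlier.

```python
from typing import Any


def resolve_pointer(root: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer against nested dicts and lists."""
    if pointer == "":
        return root
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    current = root
    for token in pointer[1:].split("/"):
        # Unescape in this order: "~1" is "/", then "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


root = {
    "common_base_path": {
        "all_setting": {"dev": {"setting1": "dev is cool"}},
        "settings": {"setting1": "placeholder"},
    }
}

# What a base_path exposes to the application: only the fenced-in subtree.
visible = resolve_pointer(root, "/common_base_path/settings")

# What a reference can reach: anything under the Root, via the full path.
referenced = resolve_pointer(root, "/common_base_path/all_setting/dev")
```

Since every pointer starts from the Root, a reference can reach sections that the base_path hides from application code, without the subtree needing a link back to its parent.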

About Types

If you explore the code or need to add a custom tag, Root and RootType represent Root as a type. LazyRoot is used during Build Time to allow delayed reference of Root until after it has been created.

About Memory

base_path will remove a reference count toward Root, but any Tag needing Root will hold a reference until evaluated. !Sub checks if it needs Root before holding a reference.

Load Boundary Limitations

A load boundary is created by Root. You cannot query outside the Root and every load event is an independent Root.

In more concrete terms, every LazyLoadConfiguration has an independent Root.

Where this matters is when merging configurations. !ParseFile passes the Root to whatever it loads, so !Merge does not introduce Load Boundaries.

However, merge() does introduce Load Boundaries.

Working with an example

We have the following three files in ASSET_DIR / "ref_cannot_cross_loading_boundary/":

# 1.yaml
test:
  1: !Ref /ref
ref: I came from 1.yaml
# 2.yaml
test:
  2: !Ref /ref
ref: I came from 2.yaml
# 3.yaml
test:
  3: !Ref /ref
ref: I came from 3.yaml

With the following code:

files = (
    ASSET_DIR / "ref_cannot_cross_loading_boundary/1.yaml",
    ASSET_DIR / "ref_cannot_cross_loading_boundary/2.yaml",
    ASSET_DIR / "ref_cannot_cross_loading_boundary/3.yaml",
)

# Merging three separate `LazyLoadConfiguration` instances
config = merge(files)

assert config.as_dict() == {
    "test": {
        1: "I came from 1.yaml",
        2: "I came from 2.yaml",
        3: "I came from 3.yaml",
    },
    "ref": "I came from 3.yaml",
}

# One `LazyLoadConfiguration` merging three files
config = LazyLoadConfiguration(*files).config

assert config.as_dict() == {
    "test": {
        1: "I came from 3.yaml",
        2: "I came from 3.yaml",
        3: "I came from 3.yaml",
    },
    "ref": "I came from 3.yaml",
}

In the merge() case, merging works as expected. However, the three !Ref /ref ended up referencing three different Roots, which is unexpected when using !Ref.

In the LazyLoadConfiguration case, the three !Ref /ref reference the same Root, as is generally desired and expected of !Ref.

For completeness’ sake, merging with !Merge has the same result as the LazyLoadConfiguration case.

# ref_cannot_cross_loading_boundary.yaml
!Merge
- !ParseFile ref_cannot_cross_loading_boundary/1.yaml
- !ParseFile ref_cannot_cross_loading_boundary/2.yaml
- !ParseFile ref_cannot_cross_loading_boundary/3.yaml
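The difference between the two outcomes can be simulated without the library. In this sketch, RefSketch is a hypothetical marker standing in for !Ref /ref, and references are resolved eagerly for simplicity (the library resolves them lazily); only the choice of Root differs between the two runs.

```python
from collections.abc import Mapping
from typing import Any


class RefSketch:
    """Hypothetical marker for a reference to a top-level key of the Root."""

    def __init__(self, key: str) -> None:
        self.key = key


def merge_two(first: Any, nxt: Any) -> Any:
    """Merge mappings key by key; anything else is replaced by `nxt`."""
    if isinstance(first, Mapping) and isinstance(nxt, Mapping):
        merged = dict(first)
        for key, value in nxt.items():
            merged[key] = merge_two(merged[key], value) if key in merged else value
        return merged
    return nxt


def resolve(node: Any, root: Mapping) -> Any:
    """Replace every RefSketch under `node` with the value it names in `root`."""
    if isinstance(node, RefSketch):
        return root[node.key]
    if isinstance(node, Mapping):
        return {key: resolve(value, root) for key, value in node.items()}
    return node


def make_files() -> list:
    return [
        {"test": {n: RefSketch("ref")}, "ref": f"I came from {n}.yaml"}
        for n in (1, 2, 3)
    ]


# Three Roots: each file's references see only that file, then results merge.
separate: Any = {}
for loaded in make_files():
    separate = merge_two(separate, resolve(loaded, loaded))

# One Root: the files are merged first, then references see the merged Root.
single_root: Any = {}
for loaded in make_files():
    single_root = merge_two(single_root, loaded)
shared = resolve(single_root, single_root)
```

The first run matches the merge() result and the second matches the LazyLoadConfiguration and !Merge results.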

Loading Loops

Because !ParseFile, !OptionalParseFile, and !ParseEnv load data from an external source (i.e. files and environment variables), they introduce the risk of circularly loading these sources.

Note

!ParseEnvSafe does not include support for tags, so it does not have this risk, as it can only ever be an end to the chain.

In order to prevent looping, each load of a file or environment variable is tracked per chain, and a ParsingTriedToCreateALoop exception is raised just before a previously loaded (in chain) source would be loaded again.

This does not prevent the same source from being loaded more than once if it is in multiple chains.
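Per-chain tracking can be sketched by passing the chain of already-visited sources down through each nested load. This is an illustration under simplifying assumptions: the sources are an in-memory dict, loading is eager rather than lazy, and LoopDetected, SOURCES, and load are hypothetical names rather than the library's own.

```python
from typing import Any, Dict, Tuple


class LoopDetected(Exception):
    """Hypothetical stand-in for the library's loop exception."""


# Hypothetical sources: a value of ("load", name) plays the role of a
# loading tag that pulls in another source.
SOURCES: Dict[str, Dict[str, Any]] = {
    "1.yaml": {"chain1": ("load", "$VAR"), "chain2": ("load", "$VAR")},
    "$VAR": {"inner": ("load", "2.yaml")},
    "2.yaml": {"key": "value"},
    "loop-a": {"next": ("load", "loop-b")},
    "loop-b": {"next": ("load", "loop-a")},
}

load_counts: Dict[str, int] = {}


def load(source: str, chain: Tuple[str, ...] = ()) -> Dict[str, Any]:
    """Load `source`, tracking the chain of sources that led to it."""
    if source in chain:
        # Raised just before a source already in this chain would load again.
        raise LoopDetected(" -> ".join(chain + (source,)))
    load_counts[source] = load_counts.get(source, 0) + 1
    chain = chain + (source,)
    result: Dict[str, Any] = {}
    for key, value in SOURCES[source].items():
        if isinstance(value, tuple) and value[0] == "load":
            result[key] = load(value[1], chain)
        else:
            result[key] = value
    return result


# Two separate chains may each load the same sources.
config = load("1.yaml")

# A source that reappears within one chain is rejected.
try:
    load("loop-a")
    loop_raised = False
except LoopDetected:
    loop_raised = True
```

Because the chain is an immutable tuple extended on each call, sibling branches never see each other's history, which is why the same source can legitimately load once per chain.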

Example of Multiple Chains

Environment:

VAR=!ParseFile 2.yaml

Configuration:

# 1.yaml
chain1: !ParseEnv VAR
chain2: !ParseEnv VAR
# 2.yaml
key: value

Code:

CONFIG = LazyLoadConfiguration("1.yaml")

assert CONFIG.chain1.key == "value"  # 1.yaml→$VAR→2.yaml
assert CONFIG.chain2.key == "value"  # 1.yaml→$VAR→2.yaml

Sources $VAR and 2.yaml are loaded twice. Once for CONFIG.chain1 and once for CONFIG.chain2.

(Note: Using !Ref chain1 for chain2 would have prevented the second load)

Looping Example with Environment Variables

The following is an example of a catastrophic loop, using !ParseEnv:

Environment:

VAR1=!ParseEnv VAR2
VAR2=!ParseEnv VAR3
VAR3=!ParseEnv VAR1

Configuration:

# config.yaml
setting1: !ParseEnv VAR1

Code:

CONFIG = LazyLoadConfiguration("config.yaml")

CONFIG.setting1 # Would cause an infinite loop without detection.
                # Note: This is not recursion, because a new LazyEval
                #       instance is created every load.
                #       You would be waiting to run out of memory or stack.

Looping Example with Files

The following is an example of a loop, using !ParseFile:

Configuration:

# 1.yaml
safe: 1.yaml
next: !ParseFile 2.yaml
# 2.yaml
safe: 2.yaml
next: !ParseFile 3.yaml
# 3.yaml
safe: 3.yaml
next: !ParseFile 1.yaml

Code:

CONFIG = LazyLoadConfiguration("1.yaml")

CONFIG.safe           # "1.yaml"
CONFIG.next.safe      # "2.yaml"
CONFIG.next.next.safe # "3.yaml"
CONFIG.next.next.next # Would load `1.yaml` again without detection.

# Without detection, `.next` could be appended endlessly
CONFIG.next.next.next                # 1.yaml→2.yaml→3.yaml→1.yaml
CONFIG.next.next.next.next           # 1.yaml→2.yaml→3.yaml→1.yaml→2.yaml
CONFIG.next.next.next.next.next      # 1.yaml→2.yaml→3.yaml→1.yaml→2.yaml→3.yaml
CONFIG.next.next.next.next.next.next # 1.yaml→2.yaml→3.yaml→1.yaml→2.yaml→3.yaml→1.yaml