Quick Start
See the usage, design, and development documentation for more details on the Data stage.
Imports
# Import SuPAErnova
import supaernova
# Import pathlib, used to define paths required by SNPAE
from pathlib import Path
# Import Pretty Printing, only used for demonstration
from pprint import pp
Configuration
To run the Data stage you need to provide SNPAE with a DataStepConfig.
Warning: These configuration options are likely to change in the future.
To begin, we will create a dictionary which will store our Data configurations.
config = {}
config["data"] = {}
In addition to per-stage configuration, SNPAE also requires a number of global configurations which we will set here.
# Set logging verbosity
# If True, then debug messages will be written to STDOUT
# These messages are written to the log files regardless, so you usually don't need to enable this
verbose = False
# Force SNPAE to rerun everything, even when it could normally reuse old results
# Usually only needed if you have changed your config after running it, however here we set it to True for demonstration purposes.
force = True
# Sets the base path to which all other paths are relative
# If base_path is relative, it is assumed to be relative to CWD
# Here we set it to the `examples/` directory
base_path = Path.cwd().parent
# Determines where all output (logs, checkpoints, plots, etc...) will go
out_path = base_path / "outputs" / "data" / "quick_start"
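The relationship between these paths can be sketched with plain pathlib. The helper below, `resolve_relative`, is hypothetical and is not part of SuPAErnova; it only illustrates the rule described in the comments above, that a relative path is interpreted against a parent directory while an absolute path is left unchanged.

```python
from pathlib import Path


def resolve_relative(path: Path, relative_to: Path) -> Path:
    """Hypothetical helper: anchor `path` at `relative_to` unless it is already absolute."""
    path = Path(path)
    if path.is_absolute():
        return path
    return (relative_to / path).resolve()


cwd = Path.cwd()
# A relative base path is taken to be relative to the current working directory
example_base = resolve_relative(Path(".."), cwd)
# Output paths are then built on top of the base path
example_out = example_base / "outputs" / "data" / "quick_start"

print(f"base: {example_base}")
print(f"out:  {example_out}")
```

Note that this sketch never creates any directories, whereas the traceback later on this page suggests SNPAE itself may create the output directories (`mkdir=True`).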
We now have our global configuration, and an empty Data stage configuration. Before we start configuring the Data stage, why don't we try running SNPAE:
snpae = supaernova.prepare_config(
    config,
    verbose=verbose,
    force=force,
    base_path=base_path,
    out_path=out_path
)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 snpae = supaernova.prepare_config(
      2     config,
      3     verbose=verbose,
      4     force=force,
      5     base_path=base_path,
      6     out_path=out_path
      7 )

File ~/checkouts/readthedocs.org/user_builds/supaernova/checkouts/latest/src/supaernova/supaernova.py:59, in prepare_config(input_config, verbose, force, base_path, out_path, plots_path)
     47 base_path = PathConfig.resolve_path(
     48     base_path or user_config["paths"].get("base"),
     49     default_path=Path.cwd(),
     50     relative_path=Path.cwd(),
     51 )
     52 out_path = PathConfig.resolve_path(
     53     out_path or user_config["paths"].get("out"),
     54     default_path=base_path / "output",
     55     relative_path=base_path,
     56     mkdir=True,
     57 )
     58 plots_path = PathConfig.resolve_path(
---> 59     plots_path or user_config["paths"].get("plots"),
     60     default_path=out_path / "plots",
     61     relative_path=out_path,
     62     mkdir=True,
     63 )
     64 log_path = PathConfig.resolve_path(
     65     user_config["paths"].get("logs") or out_path / "logs",
     66     default_path=out_path / "logs",
     67     relative_path=base_path,
     68     mkdir=True,
     69 )
     70 user_config["paths"] = PathConfig.from_config(
     71     {},
     72     base_path=base_path,
   (...)
     75     log_path=log_path,
     76 )

KeyError: 'paths'
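SNPAE does its best to warn you when your provided configuration has any problems. With an empty Data stage configuration, we are missing a number of required keys in our DataStepConfig. Note that the traceback shown above stops earlier than that: it is a KeyError for 'paths', raised while resolving plots_path, before the Data configuration is checked. Let's fill in the Data keys now.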
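The traceback points at the expression `plots_path or user_config["paths"].get("plots")`. Here is a minimal, self-contained sketch of why that line fails while the two similar lines before it do not. Python's `or` only evaluates its right-hand side when the left-hand side is falsy, so the missing "paths" key is only looked up for the argument we did not pass. The `pick` function and `user_config` dictionary below are illustrative stand-ins, not SuPAErnova code.

```python
def pick(explicit, user_config, key):
    """Illustrative stand-in: prefer an explicit argument, else fall back to the config."""
    # The right-hand side runs only if `explicit` is falsy (e.g. None)
    return explicit or user_config["paths"].get(key)


user_config = {"data": {}}  # no "paths" entry, as in the configuration above

# An explicit value short-circuits, so the missing key is never touched
print(pick("/some/out/path", user_config, "out"))

# With no explicit value the lookup runs and raises KeyError: 'paths'
try:
    pick(None, user_config, "plots")
except KeyError as err:
    print(f"KeyError: {err}")

# Once a "paths" mapping exists, the fallback returns None for an unset key
user_config["paths"] = {}
print(pick(None, user_config, "plots"))
```

Whether an empty "paths" entry is the intended fix for SuPAErnova is not confirmed by this page; the sketch only explains the failure mode seen in the traceback.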
config["data"] = {
    "data_dir": base_path.parent / "data",
    "meta": "meta.csv",
    "idr": "IDR_eTmax.txt",
    "mask": "mask_info_wmin_wmax.txt",
    "colourlaw": "colourlaws/F99_colourlaw.txt"
}
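Before running, it can be useful to check that the configured files actually exist. The sketch below assumes that every key other than "data_dir" names a file relative to data_dir; that is an assumption based on the values above, not documented behaviour. The `missing_data_files` helper is hypothetical and not part of SuPAErnova.

```python
from pathlib import Path
import tempfile


def missing_data_files(data_config: dict) -> list[str]:
    """Hypothetical helper: list configured files that do not exist under data_dir."""
    data_dir = Path(data_config["data_dir"])
    # Assumption: every other key names a file relative to data_dir
    file_keys = [key for key in data_config if key != "data_dir"]
    return [key for key in file_keys if not (data_dir / data_config[key]).is_file()]


# Demonstrate with a throwaway directory containing only one of the files
with tempfile.TemporaryDirectory() as tmp:
    demo_config = {
        "data_dir": Path(tmp),
        "meta": "meta.csv",
        "idr": "IDR_eTmax.txt",
    }
    (Path(tmp) / "meta.csv").write_text("placeholder\n")
    demo_missing = missing_data_files(demo_config)
    print(demo_missing)
```

Calling `missing_data_files(config["data"])` on the real configuration would then list any keys whose files are not where the configuration says they are.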
With everything fully configured, let's try running SNPAE again.
snpae = supaernova.prepare_config(
    config,
    verbose=verbose,
    force=force,
    base_path=base_path,
    out_path=out_path
)
print("snpae:")
pp(snpae.model_dump())
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[6], line 1
----> 1 snpae = supaernova.prepare_config(
      2     config,
      3     verbose=verbose,
      4     force=force,
      5     base_path=base_path,
      6     out_path=out_path
      7 )
      8 print("snpae:")
      9 pp(snpae.model_dump())

File ~/checkouts/readthedocs.org/user_builds/supaernova/checkouts/latest/src/supaernova/supaernova.py:59, in prepare_config(input_config, verbose, force, base_path, out_path, plots_path)
     47 base_path = PathConfig.resolve_path(
     48     base_path or user_config["paths"].get("base"),
     49     default_path=Path.cwd(),
     50     relative_path=Path.cwd(),
     51 )
     52 out_path = PathConfig.resolve_path(
     53     out_path or user_config["paths"].get("out"),
     54     default_path=base_path / "output",
     55     relative_path=base_path,
     56     mkdir=True,
     57 )
     58 plots_path = PathConfig.resolve_path(
---> 59     plots_path or user_config["paths"].get("plots"),
     60     default_path=out_path / "plots",
     61     relative_path=out_path,
     62     mkdir=True,
     63 )
     64 log_path = PathConfig.resolve_path(
     65     user_config["paths"].get("logs") or out_path / "logs",
     66     default_path=out_path / "logs",
     67     relative_path=base_path,
     68     mkdir=True,
     69 )
     70 user_config["paths"] = PathConfig.from_config(
     71     {},
     72     base_path=base_path,
   (...)
     75     log_path=log_path,
     76 )

KeyError: 'paths'
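The output shown above stops at the same KeyError for 'paths', so the dump is not printed here. When prepare_config succeeds, the snpae object has a lot more than the few keys we configured. Logging has been set up, relevant paths have been created, and a DataStep object has been created, ready to be run.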
Execution
Let's run our snpae object.
snpae.run()
print("snpae:")
pp(snpae.model_dump())
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 snpae.run()
      2 print("snpae:")
      3 pp(snpae.model_dump())

NameError: name 'snpae' is not defined
Once the run completes, a DataStep object is stored in snpae.data_step. This is where the results of the run are stored, ready to be used by later stages. The most important attributes are snpae.data_step.data, which contains the complete dataset, and snpae.data_step.train_data and snpae.data_step.test_data, which are the training and testing kfolds respectively.
print(f"data keys: {list(snpae.data_step.data.model_dump().keys())}")
print(f"Number of training kfolds: {len(snpae.data_step.train_data)}")
print(f"Number of SNe per training kfold: {[kfold.sn_name.shape[0] for kfold in snpae.data_step.train_data]}")
print(f"Number of testing kfolds: {len(snpae.data_step.test_data)}")
print(f"Number of SNe per testing kfold: {[kfold.sn_name.shape[0] for kfold in snpae.data_step.test_data]}")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 print(f"data keys: {list(snpae.data_step.data.model_dump().keys())}")
      2 print(f"Number of training kfolds: {len(snpae.data_step.train_data)}")
      3 print(f"Number of SNe per training kfold: {[kfold.sn_name.shape[0] for kfold in snpae.data_step.train_data]}")

NameError: name 'snpae' is not defined
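To give a feel for what training and testing kfolds are, here is a generic k-fold split over a list of made-up supernova names. It is a sketch of the general technique only. The `kfold_split` function is hypothetical, and SuPAErnova's own splitting may differ, for example by shuffling or by using a different number of folds.

```python
def kfold_split(names, n_folds):
    """Illustrative k-fold split: each fold holds out a different slice for testing."""
    # Deal the names into n_folds disjoint groups
    folds = [names[i::n_folds] for i in range(n_folds)]
    train, test = [], []
    for i, held_out in enumerate(folds):
        test.append(held_out)
        # Training data for fold i is everything outside the held-out group
        train.append([n for j, fold in enumerate(folds) if j != i for n in fold])
    return train, test


sn_names = [f"SN{i:02d}" for i in range(10)]  # made-up names, not real data
train_folds, test_folds = kfold_split(sn_names, n_folds=3)

print(f"Number of training kfolds: {len(train_folds)}")
print(f"Number of SNe per training kfold: {[len(f) for f in train_folds]}")
print(f"Number of testing kfolds: {len(test_folds)}")
print(f"Number of SNe per testing kfold: {[len(f) for f in test_folds]}")
```

In this sketch each name appears in exactly one testing fold, and the training and testing sets of a given fold never overlap.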