
Recent Releases of GEOPM

GEOPM - Version 3.1.0

Fri May 17 2024 Christopher M Cantalupo v3.1.0

  • Official v3.1.0 release tag
  • ABI bump moving so-version from 2.0.0 -> 2.1.0 with backward compatibility for release v3.0
  • Support for building on non-x86 CPU architectures
  • Support for CPU frequency metrics and controls through standard Linux cpufreq sysfs interfaces
  • Support GPU features through standard Linux DRM sysfs interfaces
  • Support for LevelZero RAS signals #3155
  • Update packaging to comply with standards
  • Support for Rocky Linux packaging
  • Implement versioning solution for python packages that works with python v3.6 - v3.11
  • Setting the GEOPM_PROGRAM_FILTER environment variable is now a requirement for libgeopm to register a process for profiling
  • Clarify copyright documentation
  • Improve and publish OpenSSF scorecard
  • Documentation and web page improvements
  • Release distinct packages for documentation
  • Improved error messaging
  • Update IOGroup and Agent tutorial
  • Remove dated runtime tutorial
  • Reorganize source code repository directory structure
  • Improve github CI automation
  • Run Coverity static analysis as part of CI workflow
  • Add package.sh script to build all of the repository packages
  • Remove all use of autotools in python build and packaging
  • Update integration tests to run on a wider range of systems
  • Allow push_signal/control() for previous requests after read/write_batch()
  • Use libz crc32 implementation to replace direct call to intrinsic
  • Add performance test for the GEOPM Service
  • Add upstream openmp.m4 macro from fossies
  • Fix issues by deleting topology cache file when geopmd starts up
  • Fix issues with installed headers: removing unwanted dependencies and specifying public symbol visibility
  • Fix issue when --disable-systemd configure option is provided #3289
  • Fix issue with SaveControl class in cases where controls are pruned from support at runtime
  • Fix issues running geopmctl as root #3352 (not regression from 3.0.1)
  • Fix SysfsIOGroup batch write issue #3388 (not regression from 3.0.1)
  • Fix static analysis issues (not regression from 3.0.1)
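
The move to the libz crc32 implementation listed above swaps a direct call to a CPU intrinsic for a portable library call. As a rough illustration only, written in Python rather than GEOPM's C++ and not taken from the GEOPM sources, the same zlib routine is reachable through Python's standard `zlib` module. The helper name `checksum` is hypothetical.

```python
import zlib


def checksum(data: bytes) -> int:
    """Return the zlib CRC-32 of data as an unsigned 32-bit integer.

    Hypothetical helper for illustration; GEOPM itself calls libz from C++.
    """
    # Mask to 32 bits so the result is unsigned on every platform.
    return zlib.crc32(data) & 0xFFFFFFFF


# "123456789" is the conventional check string for CRC algorithms.
check_value = checksum(b"123456789")
```

Because the checksum comes from a library rather than an instruction specific to one architecture, the same call builds on non-x86 CPUs, which fits the non-x86 build support listed for this release.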

Consumption - Computation and Communication - C++
Published by cmcantalupo 11 months ago

GEOPM - Version 3.0.1

  • Hotfix for v3.0.0 release.
  • Fix missing systemd dependency on the msr-safe systemd service. This bug could cause MSRs to be unavailable from the GEOPM Service if load order is incorrect.
  • Fix systemd unit definition to maintain same model for GPUs/chip topology when linked against versions of libze_loader.so where "COMPOSITE" is not the default.
  • Fix security issue where UID 0 was being used to indicate privilege, switched to using libcap for capabilities checks instead.
  • Fix bug in startup that was causing long delays when initializing batch interface of PlatformIO
  • Fix potential lock when creating PlatformTopo object as user with CAP_SYS_ADMIN.
  • Fix several build and packaging issues that could cause problems when dependency packages are not installed to standard locations.
  • Fix "make coverage" build target dependency
  • Fix issue with sphinx documentation generation
  • Fix regression in support for client Intel platforms.
  • Fix install failures on some SLES systems by modifying helper install script to prefer the zypper command to the rpm command.
  • Add documentation for non-MPI application integration test for GEOPM Runtime.
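
The security fix above replaces a "UID 0 means privileged" test with a capabilities check. GEOPM does this through libcap in C++; the sketch below shows only the underlying idea in Python, by reading the effective capability mask that Linux reports in the `CapEff:` line of `/proc/<pid>/status`. The function name `has_capability` is hypothetical; the bit index 21 for CAP_SYS_ADMIN is the value defined by the Linux kernel headers.

```python
CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in Linux capability masks


def has_capability(status_text: str, cap_bit: int = CAP_SYS_ADMIN) -> bool:
    """Return True if the CapEff mask in the given status text has cap_bit set.

    status_text is the content of /proc/<pid>/status. This is an illustrative
    helper, not the libcap-based check that GEOPM implements.
    """
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal string without a 0x prefix.
            mask = int(line.split()[1], 16)
            return bool((mask >> cap_bit) & 1)
    # No CapEff line found: treat the process as unprivileged.
    return False
```

On a Linux system the function could be fed the text of `/proc/self/status`. The decision depends on the capability mask alone, so a non-root process that has been granted CAP_SYS_ADMIN passes, and a UID 0 process with the capability dropped does not.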

Published by cmcantalupo over 1 year ago

GEOPM - Version 3.0.0

  • Official v3.0.0 release tag.
  • GEOPM Runtime support for non-MPI applications.
  • Integration with OpenPBS through plugins and launcher support.
  • Security improvements and bug fixes.
  • Additional GEOPM Service DBus APIs to support application profiling.
  • Communication between controller and application is managed by GEOPM Service.
  • Creation of topo-cache and responsibility for determining system topology is managed by GEOPM Service.
  • Update C++ standard requirement to C++17.
  • Add more signals and controls including GPU and platform features.
  • ConstConfigIOGroup uses JSON file to define constant settings/configurations as signals.
  • Increase the sample period of the monitor agent from 5 ms to 200 ms to reduce default CPU requirements of runtime.
  • Add Sapphire Rapids server (SPR) as a supported platform.
  • Removal of libgeopmpolicy.so, use libgeopm.so instead.
  • Removal of geopmdpy.runtime module: no support for python based agents.
  • GEOPM_PERIOD / --geopm-period sets the sample period for controller in units of seconds.
  • GEOPM_INIT_CONTROL / --geopm-init-control to write a batch of controls at application startup.
  • GEOPM_CTL_LOCAL / --geopm-ctl-local disables the controller's use of MPI.
  • GEOPM_PROGRAM_FILTER / --geopm-program-filter to select processes for profiling.
  • GEOPM_NUM_PROC sets number of processes per node for controller process to track.
  • geopmlaunch support for PALS.
  • geopmlaunch --geopm-preload option required for ld preloading libgeopm.so, not on by default.
  • Default for --geopm-ctl is now "application".
  • geopmlaunch does not control application CPU affinity by default (--geopm-affinity-enable now required).
  • Debian / Ubuntu packaging support.
  • Renamed runtime packages for all distros.
  • Improvements for NVML and LevelZero support for GPUs.
  • Documentation improvements including "Quick Start Guide"
  • Improved error and warning messages.
  • ABI so-version for libgeopm and libgeopmd increased to 2.0.0.
  • Added --direct option for geopmaccess.
  • Add GPU-CA agent for beta testing.
  • Add FFNet agent for beta testing.
  • Add CPU-CA agent for beta testing.
  • FrequencyMapAgent can now control GPU frequency.
  • Configuration and plugin directories for GEOPM renamed and combined.
  • Add PBS integration for power capping clusters.
  • Fuzz test integration and support for sanitizer builds.
  • The environment of controller determines output file paths, not the application environment.
  • Support for liburing for batching kernel I/O.
  • Python interface for endpoint in beta.
  • Program name is no longer the default profile name, "default" is used instead.
  • Track time spent in MPI_Init*() by the application.
  • Removed nearly all use of the /tmp directory (topo-cache still created in /tmp if GEOPM Service is not running)
  • More detailed and accurate reporting of GEOPM overhead, MPI overhead, and controller startup time.
  • Generic runner for GEOPM experiment infrastructure.
  • MSR, NVML and LevelZero IOGroups not loaded except when user has CAP_SYS_ADMIN or through the GEOPM Service.
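
Several of the items above are environment variables that the runtime reads at startup. The following is a minimal sketch of preparing such an environment before launching an application. The variable names GEOPM_PROGRAM_FILTER and GEOPM_PERIOD come from the list above, and GEOPM_PERIOD is given in seconds as stated there. The helper `geopm_env` is hypothetical, and using a bare program name as the filter value is an assumption that should be checked against the GEOPM documentation.

```python
import os


def geopm_env(program_name: str, period_s: float, base_env=None) -> dict:
    """Return a copy of an environment with two GEOPM variables set.

    Hypothetical helper. The filter value format (a program name) is an
    assumption; the period is in seconds per the release notes.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GEOPM_PROGRAM_FILTER"] = program_name
    env["GEOPM_PERIOD"] = str(period_s)
    return env


# 0.2 s matches the 200 ms monitor agent sample period noted above.
example_env = geopm_env("my_app", 0.2, base_env={})
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the application.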

Published by cmcantalupo over 1 year ago

GEOPM - Version 2.0.2

  • Hot fix 2 for release 2.0.
  • Add security.md doc for vulnerability reporting.
  • Align behavior of secure_make_dirs() to documentation w.r.t. intermediate directories.
  • Includes bug fixes and documentation improvements.
  • Fix constness of return value from dcgm_device_pool().
  • Fix warning from recent gcc about uninitialized variables.
  • Use PALSLauncher on australis.
  • PALSLauncher: use list option to cpu-bind
  • Fix for suppressed error reporting.
  • Fix for SST kernel driver on SLES 15.3.
  • Fix for issue where missing data can cause Controller crash.
  • Update copyright year to 2023.
  • Fix LevelZero exception location.
  • Fix error when GPUs are supported by service but not client.
  • Swap load order of msr and service iogroups.
  • Resolve service integration test issues.

Published by bgeltz about 2 years ago

GEOPM - Version 2.0.1

  • Hot fix 1 for release 2.0.
  • Includes bug fixes and documentation improvements.
  • Fix install and packaging of plugin directory (#2823).
  • Fixes for IMPI mpiexec launch wrapper (#2822, #2820)
  • Fix issues discovered with recent Clang and in the Ubuntu 22 environment (#2829, #2740).
  • Better error reporting from geopmd signal handler (#2789).
  • Fix for supporting LevelZero when MPI also initializes LevelZero (#2802).
  • Better error reporting when application handshake fails (#2801).
  • Use multi-user.target in systemd unit file rather than default.
  • Fix overwrite of access list with --force option (#2712).
  • Use control access list to generate signal list (#2707).
  • Fix spelling errors in documentation (#2644).
  • Support for recent LevelZero implementations which require user to zero call by reference parameters.
  • Better error reporting with LevelZero topology failures.
  • Update spec file to make LevelZero inclusion parameterized and to incorporate suggestions from SUSE maintainers.
  • Enable CNLIOGroup by default.
  • Fix potential memory issue with CircularBuffer (not exposed by current implementation).
  • Use more robust method to obtain sticker frequency.
  • Use SKX MSR definitions for newer architectures.

Published by cmcantalupo about 2 years ago

GEOPM - Version 2.0.0

  • Official v2.0.0 release tag.
  • Provides the GEOPM Systemd Service.
  • Removes Python 2 support, only supporting Python 3.
  • Support for GPUs from Intel and NVIDIA.
  • Support for the isst_interface driver.
  • Support for new server processors including Sky Lake, Cascade Lake and Ice Lake.
  • Support for Cray Linux energy counters.
  • Higher performance / lower latency profile interface.
  • More consistent naming scheme for PlatformIO signals and controls.
  • Extended set of signals and controls provided by PlatformIO.
  • Removed msr-safe requirement through GEOPM Service features.
  • Support for new HPC runtime launchers (pals, impi).
  • Flexible YAML report generation and parsing that may contain arbitrary content.
  • Extended python interface support including Reporter features.
  • Python based agents for prototyping runtime algorithms that do not require application feedback.
  • Removed Energy Efficient Agent (will be replaced in a future release).
  • Documentation and web page improvements.
  • Other improvements and feature additions.

Published by cmcantalupo over 2 years ago

GEOPM - GEOPM 2.0.0 Release Candidate 3

  • Release candidate 3 for version 2.0
  • This is a pre-release version of GEOPM that has all features that will be present in the v2.0.0 release.
  • No changes other than documentation and possible bug fixes are expected prior to v2.0.0.
  • This represents a code freeze and version 2.0 is anticipated soon after this release.
  • All feedback about this release candidate is appreciated: https://geopm.github.io/contrib.html

Published by cmcantalupo over 2 years ago

GEOPM - GEOPM 2.0.0 Release Candidate 2

  • Release candidate 2 for version 2.0
  • This is a pre-release version of GEOPM that has all features that will be present in the v2.0.0 release.
  • The names of signals and controls provided by the PIO interface have changed for rc2 as described here: https://github.com/geopm/geopm/issues/1671
  • Chapter 7 man page documentation has been added for the PlatformIO interface and supported signals and controls.
  • Other changes required for version 2.0 have also been made.
  • All feedback about this release candidate is appreciated: https://geopm.github.io/contrib.html

Published by cmcantalupo almost 3 years ago

GEOPM - GEOPM 2.0.0 Release Candidate 1

Published by cmcantalupo almost 3 years ago

GEOPM - GEOPM 1.1.0

  • Tue Nov 5 2019 Diana Guttman v1.1.0
  • Release overview:
    • Support for Python 3.6 has been added.
    • Support for Python 2.7 continues but will be removed in a future release.
    • New features targeting integration with resource managers.
    • Enhancements to EnergyEfficientAgent.
    • Improved support for automatic OpenMP region detection.
    • Support for launching with OpenMPI.
    • Bug fixes, new and updated tests, and updates to documentation.
  • New features:
    • GEOPM environment variables can now be initialized from a JSON file.
    • Add geopm_agent_enforce_policy() function and Agent::enforce_policy() to public interface.
    • Add tracing for the profile table log with GEOPM_TRACE_PROFILE.
    • Add REGION_COUNT signal to get the number of times a region has been seen.
    • Add REGION_COUNT signal to default trace columns.
    • Add python wrappers for geopm_pio_c, geopm_topo_c, geopm_error_c, and geopm_agent_c interfaces.
    • Add format_function() method to IOGroups to get a formatting function from a signal name.
    • Add IOGroup for Compute Node Linux PM counters.
    • Allow the frequency map used by FrequencyMapAgent to come from the agent's policy rather than the deprecated environment variable.
    • Add launcher for OpenMPI.
  • New beta features:
    • Add geopmconvertreport script to convert report file into yaml and json.
    • Add a new error type for data store errors.
    • Add PolicyStore class to map agents and profiles to policies.
    • Introduce new Endpoint API, which replaces and extends the ManagerIO.
    • Implement geopm_endpoint_c API.
  • Modified implementations and interfaces:
    • Add CSV class to support CSV files created by GEOPM.
    • Modify Tracer and ProfileTracer to use the CSV class.
    • Add trace_formats() method to Agents.
    • Change freq_sweep analysis to use system max frequency for default max.
    • Move geopmpy package to 'production' status.
    • Minimize set of functions in Environment C interface.
    • Change Environment class variable names for better readability.
    • Update FrequencyMapAgent to use Environment class for its environment variable.
    • Add TEMPERATURE_* signals to list shown by geopmread.
    • Change REGION_RUNTIME signal to reflect time of outer region only.
    • Add MSR turbo ratio limit for KNL.
    • Use max turbo ratio limit for platform max frequency.
    • Remove ability to write turbo ratio limit.
    • Add MPI_Barrier before entering all2all model region.
    • Increase problem size of FFT to D class.
    • Add IMPI support to tutorials.
    • Add feature to geopmagent and Agent interface where partial policies will be completed with NANs.
    • Add SLURM -bootstrap option for IMPI.
    • Add geopm_time_to_string() to convert a time structure into a string.
    • Add write_file() helper function.
    • Add value of policy to report, or DYNAMIC when policy comes from an Endpoint.
    • Separate Agent creation time from init() in Controller.
    • Add DebugIOGroup for extending trace with internal Agent values.
    • Add pthread mutex to beginning of SharedMemory regions, with get_scoped_lock() as the only method to lock the mutex.
    • Remove pthread mutex from ManagerIO struct.
    • Use git ls-tree to generate the MANIFEST in any git repo.
    • Remove m_request_s from PlatformIO public interface.
    • Change RPM to build libgeopmpolicy only and remove check step.
    • Add get_hostnames() method to Controller.
    • Add unlink() method to SharedMemory.
    • Update VERSION with each call to autogen.sh.
    • Do not markup anything in geopmbench if all regions are suffixed with '-unmarked'.
    • Update OMPT interface to newest standard.
    • Use libdl and libelf to map instruction address to symbol name.
    • Remove hard requirement for hosts file usage in tutorials.
    • Remove MacOS portability.
    • Remove signal handling logic from Controller.
    • Change board power min/max/tdp to use sum aggregation.
    • Change power cap policy of PowerGovernorAgent and PowerBalancerAgent to POWER_PACKAGE_LIMIT_TOTAL.
    • Change "mpi-time" in report to "network-time" and change time to include all network time.
    • Rename EPOCH_RUNTIME_MPI signal to EPOCH_RUNTIME_NETWORK.
    • Move Environment class definition to header.
    • Split geopm_pmpi.c into C/C++ parts.
    • Clean up build and run scripts for tutorials.
    • Remove region entry and exit lines from the trace by default; they can be added with --enable-bloat.
  • Improved error messages and warnings:
    • Make prefix of runtime warning strings consistently start with "Warning: ".
    • Improve error message when msr driver can't be loaded.
    • Print a proper message on failure to launch lscpu job.
    • Add more verbose geopm plugin load failure warning.
    • Add more detailed description to geopm_error_message() based on last exception thrown.
    • Change throw to warning for PowerBalancerAgent running on a single node.
    • Fix error message when MSR read fails.
  • Extensive changes to EnergyEfficientAgent algorithm:
    • Change EE Agent to learn separately for each control domain.
    • Add max filtering to EnergyEfficientRegion.
    • Use sticker when passing NaN in the policy.
    • Add PERF_MARGIN as a policy for EnergyEfficientAgent.
    • Do not set frequency for regions shorter than 50 ms or unmarked.
    • Have EE Agent always use min frequency for network regions.
    • Update EE agent to use region count to detect adjacent regions with same hash.
    • Add separate max frequency to use for static policy.
    • Bug fixes and refactoring in EnergyEfficientAgent.
  • Updates to integration tests:
    • Increase iterations for EnergyEfficientAgent test.
    • Decrease margin in test for geopm python wrapper measuring time.
    • Add an integration test checking that chosen frequencies increase monotonically with CPU-bound time in regions.
    • Update integration tests to use new trace file format.
    • Add imbalance to power_balancer integration test.
    • Refactor report mock functions in integration tests.
    • Move integration test helpers into util.py.
    • Add integration test for the epoch data in report.
    • Add msr save and restore calls to test launcher.
  • Updates to unit tests:
    • Add unit tests for EnergyEfficientAgent.
    • Cleanup environment variables in unit tests.
    • Add unit tests for the geopmpy.io module.
    • Add unit tests for the geopmpy.launcher module.
    • Make profile tests work with different task sets.
    • Fix TestAffinity to check for OMP_NUM_THREADS in test setup.
    • Fix ExceptionTest to account for extra char in error message.
  • Updates to documentation:
    • Add Daniel Wilson to the AUTHORS file.
    • Change CONTRIBUTING instructions on how to get version.
    • Add version to geopm man pages.
    • Update man pages and README to describe Environment changes and integration with resource managers.
    • Fix PlatformTopo C++ man page to match new interfaces.
    • Add section to README about user environment for non-standard install.
    • Modify frequency_map man page to use floating point frequencies.
    • Rename geopm_pio_c man page to show its section number.
    • Add man page for Endpoint class.
    • Update endpoint_c man page.
    • Remove references to uninstalled man pages from geopm.7.
    • Remove specific list of available launchers from geopm.7.
    • Add documentation to README for Ubuntu support.
    • Add example for systems programmers using PlatformIO.
    • Fix typos in documentation.
  • Bug fixes:
    • Fix paths for building tutorial from module environment.
    • Fix Tracer handling of # signals from environment.
    • Fix Tracer handling of region hash and hint integers.
    • Fix a bug where regions with the same name as the profile did not appear in the report.
    • Fix trace file cache loading print in io.py.
    • Rename and fix analysis for EE and frequency map agents.
    • Fix a bug where LD_PRELOAD was always set.
    • Update geopmplotter to use agents and cosmetic fixes to plots.
    • Fix geopm::string_split() so it works with multi-character delimiters.
    • Fix build when using --disable-openmp.
    • Fix build when using --disable-mpi.
    • Fix a bug where launcher did not use srun reservation for geopmread cache.
    • Fix placement of verbose flag for geopmbench.
    • Fix epoch reporting when there are no regions.
    • Fix generation of report hdf5 cache.
    • Fix date generation in geopm_time.h.
    • Only overwrite roff pages with ronn if the roff page is missing.
    • Avoid a buffer overrun when copying cpusets.
    • Check if MPI has been finalized before freeing the comm.
    • Fix stderr piping in autogen.sh.
    • Fix build errors from gcc8.
    • Fixes to allow installed headers to be used out of source.
    • Fix a bug where tutorial tarball was not built when docs are disabled.
    • Remove DRAM power from PowerGovernorAgent samples.
    • Avoid loss of precision when converting policies to json strings.
    • Do not use GEOPM_REGION_HASH_INVALID in Agent implementations.
    • Remove '0x' from IMPI affinity mask.
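
One item in this release states that partial policies are completed with NANs. The idea can be sketched as padding a shorter list of policy values up to the number of values the agent expects. The function name `complete_policy` and its arguments are hypothetical and do not reflect the actual geopmagent or Agent interface.

```python
import math


def complete_policy(partial, num_policy):
    """Pad a partial policy with NaN up to num_policy values.

    Hypothetical illustration of "partial policies will be completed with
    NANs"; not the GEOPM implementation.
    """
    if len(partial) > num_policy:
        raise ValueError("policy has more values than the agent accepts")
    # Missing trailing values become NaN, leaving the given values untouched.
    return list(partial) + [math.nan] * (num_policy - len(partial))


padded = complete_policy([150.0], 3)
```

An agent receiving such a policy can treat NaN as "not specified" and fall back to a default, which is consistent with the EnergyEfficientAgent item above about using the sticker frequency when NaN is passed in the policy.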

Published by cmcantalupo over 5 years ago

GEOPM - GEOPM 1.0.0

  • Tue Apr 16 2019 Christopher M. Cantalupo v1.0.0
  • Release overview:
    • The official 1.0 release of the GEOPM software!
    • Primary changes are bug fixes and documentation updates since release candidate 3.
  • Updates to integration tests:
    • Fix test_runtime_regulator integration test which had improper tolerances for sleep() interface.
    • Update some integration tests to print errors when platform read/write fails.
  • Updates to unit tests:
    • Add more unit tests for launcher affinity.
  • Updates to documentation:
    • Clean up geopm_pio_c(3) and geopm_topo_c(3) man pages.
    • Remove references to Comm man pages that are not installed.
    • Add include and linking instructions to geopm_pio.3.ronn.
  • Installed header clean up:
    • Update PlatformTopo singleton to return const reference.
    • Clean up forward declaration in public header.
  • Bug fixes:
    • Fix tprof API calls when Controller is not present to avoid segmentation fault.
    • Fix issue by removing call to EnergyEfficientRegion::update_freq_range().
    • Fix issue where FrequencyGovernor was being used but not created by agents above the leaf.
    • Fix missing hidden header dependencies.
    • Fix OMP_NUM_THREADS calculation when --geopm-hyperthreads-disable option is provided to launcher.
    • Fix IOGroup and Agent tutorials to use new Agent interfaces.
    • Fix domain for frequency signal/control on some x86 platforms.

Published by cmcantalupo about 6 years ago

GEOPM - GEOPM 1.0.0 Release Candidate 3

  • Wed Apr 3 2019 Christopher M. Cantalupo v1.0.0+rc3
  • Modified implementations and interfaces:
    • Finalized interfaces for 1.0.0 release.
    • Changed class naming scheme to drop "I" prefix from interface base classes and add "Imp" suffix to implementation classes.
    • Replaced ascend() and descend() Agent methods with more fine grained interface.
    • Modified MSRIOGroup to use JSON to store MSR data.
    • Updated utility classes for Agent interface changes.
    • Removed use of raw pointers from MSRIOGroup.
    • Added Helper function to list files in a directory.
    • Renamed split_string() to string_split().
    • Removed sort call from table dump since no longer needed.
    • Removed samples sent up tree from MonitorAgent.
    • Moved "PlatformTopo::m_domain_e" to a C enum "geopm_domain_e" in geopm_topo.h.
    • Changed GEOPM_DOMAIN_INVALID to -1 and shifted all other domain values by one.
    • Renamed all references to the PlatformTopo::m_domain_e enum to use geopm_domain_e.
    • Removed PlatformIO::num_signal() and PlatformIO::num_control() from public interface.
    • Renamed PlatformIO method is_domain_within() to is_nested_domain().
    • Moved geopm_region_info_s to geopm.h.
    • Renamed Agent::report_node() to report_host().
    • Removed ProfileIOGroup from installed headers.
    • Renamed CircularBufferImp to CircularBuffer.
    • Moved MSRSignal and MSRControl into their own files.
    • Moved Imp classes for installed classes to own non-installed header.
    • Moved SharedMemory and SharedMemoryUser classes into separate headers.
    • Introduced FrequencyGovernor that holds common code for setting frequency.
    • Updated EnergyEfficientAgent and FrequencyMapAgent to use FrequencyGovernor.
    • Replaced ascend() and descend() methods in all built in agents to use new APIs.
    • Removed num_signal_pushed() and num_control_pushed() from public PlatformIO APIs.
    • Made tutorial shell scripts compatible with more shell variants.
  • Updated features:
    • Implemented and documented C wrappers for the PlatformIO class: geopm_pio_c(3).
    • Implemented and documented C wrappers for the PlatformTopo class: geopm_topo_c(3).
    • Changed implementation to stop sending messages about MPI regions nested inside of network hint regions.
    • Added command line option to geopmread(1) and geopmwrite(1) to create topology cache file.
    • Added make_unique and make_shared factory methods to all installed C++ header classes.
    • Added check for RAPL lock bit when using power controls
    • Added UNCORE_RATIO_LIMIT MSR support for HSX, BDX, and SKX.
    • Added per-region power to Report.
    • Enabled MSRIOGroup to extend MSRs through JSON file at runtime located in GEOPM_PLUGIN_PATH.
    • Added MSR methods for parsing function and units strings.
    • Introduced FrequencyMapAgent which runs regions at specified frequencies.
    • Added --enable-beta configure flag which installs beta features with make install target.
  • Updated and extended integration tests:
    • Ignore failures for missing python packages.
    • Added feature to save/restore power limit and frequency between each integration test.
  • Updated unit tests:
    • Added more unit tests for Helper.
    • Fixed AgentFactoryTest.
  • Updates to documentation:
    • Added documentation on MPI requirements for geopm_prof_c(3) APIs.
    • Removed references to endpoint in documentation since this is still a beta feature.
    • Added documentation about Agent report/trace extension name conventions.
    • Add man page for geopm_pio_c(3) and geopm_topo_c(3).
    • Add man page for geopm_agent_frequency_map(7).
  • Bug fixes:
    • Fixed EnergyEfficientAgent so it actually functions properly.
    • Fixed issue with using temporary script in launcher to execute lscpu.
    • Fixed missing input parameter checks in PlatformTopo and PlatformIO.
    • Fixed Fortran build and missing dependency that could break parallel builds.

Published by cmcantalupo about 6 years ago

GEOPM - GEOPM 1.0.0 Release Candidate 2

  • Fri Feb 22 2019 Christopher M. Cantalupo v1.0.0+rc2
  • Modified implementations and interfaces:
    • Rename GEOPM_PROFILE_TIMEOUT environment variable to GEOPM_TIMEOUT.
    • Modify default behavior when using geopmlaunch: --geopm-ctl=process --geopm-report=geopm.report.
    • Introduce --geopm-disable-ctl CLI option for geopmlaunch to preserve passthrough behavior.
    • Remove geopm_prof_init() interface from installed header.
    • Fix geopmhash example command line tool.
    • Update plugin loading implementation to use C++.
    • Refactor IOGroup lookup in PlatformIO.
    • Modify analysis power sweep to consider multiple packages.
    • Support lscpu versions that omit 0x from hex values.
    • Do not install Comm.hpp or MPIComm.hpp.
    • Modify time signal to be scoped to the CPU.
    • Rename M_UNITS_HZ to M_UNITS_HERTZ
    • Add tables module to Python requirements.
    • Change MSR names to match names in Intel (R) Software Developers Manual.
    • Make end bit of MSR bitfield inclusive.
    • Add descriptions for built-in signals and controls.
    • Align launcher names and programmatically generate list of supported launchers.
    • Modified Agent::validate_policy() interface.
    • Add stricter domain checks in TimeIOGroup and CpuinfoIOGroup
    • Fix configuration and build issues with ompt.
    • Disable python unit testing in RPM check target.
    • Remove uninstalled files from spec file.
  • Updated features:
    • Update tracer to enable user specified column signals to also specify domain.
    • Update reporter to enable user specified signals and domains.
    • Add REGION_HASH and REGION_HINT signals.
    • Remove all references to the region_id from public interfaces.
    • Add domain aggregation for read_signal and write_control.
    • Add TEMPERATURE as default trace column.
    • Add split_string() helper function.
    • Install geopm_hash.h and add man page.
    • Add helper function to replace gethostname().
    • Improve trace column header names for PowerBalancerAgent.
    • Modify how epoch totals are calculated.
  • Updated and extended integration tests:
    • Fix fence-post problem in test_trace_runtimes.
    • Skip EnergyEfficientAgent integration test on non-BDX platforms.
  • Updated unit tests:
    • Fix timing issue with PowerGovernorAgentTest.wait test.
    • Fix geopmagent CLI test.
    • Clean up PlatformIOTest.
    • Update to googletest v1.8.1.
    • Optimize Travis CI build.
  • Updates to documentation:
    • Update man pages to reflect environment extension of report and trace.
    • Update man pages for Agg, CircularBuffer, IOGroup, Exception, Helper, RegionAggregator, SharedMemory, PluginFactory, MSR, MSRIO, and MSRIOGroup classes.
    • Update geopm_region_id_c.3 man page.
    • Update geopm_sched.3.ronn.
    • Clean up geopmlaunch man page.
    • Update man pages for IOGroups
    • Add tutorial about plugin loading order.
    • Add missing links to geopm(7) man page.
    • Update copyright date to 2019.
    • Use BLURB in geopm.7 man page.
    • Sync spec file for OpenHPC with the one published with OpenHPC.
    • Change die.net links to man7.org
  • Bug fixes:
    • Fix all timeouts for usages of SharedMemoryUser to reflect geopm_env_profile_timeout().
    • Fix energy status units for DRAM on Haswell and Broadwell.
    • Fix energy reporting on multi-socket systems.
    • Fix issue when application calls MPI_Init_thread() to increase thread level to match GEOPM requirements.
    • Fix broken build when configured with --enable-overhead.
    • Fix issues detected with clang.
    • Fix launcher args for IMPI.
    • Fix throw in Tracer when reading hash and hint which are allowed to be zero.
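
The change above that makes the end bit of an MSR bitfield inclusive affects how a field's width is computed: a field spanning bits begin through end contains end - begin + 1 bits. A minimal sketch of that arithmetic follows. The function name `extract_field` is hypothetical and is not the GEOPM MSR class interface.

```python
def extract_field(raw: int, begin_bit: int, end_bit: int) -> int:
    """Extract bits begin_bit..end_bit (both inclusive) from a 64-bit value.

    Hypothetical helper illustrating an inclusive end bit.
    """
    if not 0 <= begin_bit <= end_bit <= 63:
        raise ValueError("bit range must satisfy 0 <= begin <= end <= 63")
    # With an inclusive end bit the width is one more than the difference.
    width = end_bit - begin_bit + 1
    return (raw >> begin_bit) & ((1 << width) - 1)


# Bits 4..7 of 0xABCD are the second hexadecimal digit from the right.
second_nibble = extract_field(0xABCD, 4, 7)
```

With an exclusive end bit the same nibble would be described as bits 4 to 8, so definitions written under one convention read one bit off under the other.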

Published by cmcantalupo about 6 years ago

GEOPM - GEOPM 1.0.0 Release Candidate 1

  • Release overview:
    • This is the first candidate for the v1.0.0 release of the GEOPM package.
    • The version 1.0 is significant in that semantic versioning https://semver.org/ is intended for all subsequent releases.
    • The APIs defined by all installed header files and the documented behavior of those interfaces shall remain compatible with linking applications until version 2.0.
    • The documented definition for all built in signals and controls supported by PlatformIO is not intended to change prior to version 2.0.
  • Expected changes prior to v1.0.0 release:
    • The documentation included in this release candidate will be improved upon prior to the actual v1.0.0 release.
    • Man pages which currently link to doxygen will be filled in.
    • The definition of the high order bits in the REGION_ID# signal supported by PlatformIO may be changed in the way documented in the PlatformIO(3) man page to split into two signals (REGION_ID and REGION_HINT).
    • It is possible that interface classes currently prefixed with "I" may be renamed to exclude the "I" (e.g. IPlatformIO -> PlatformIO).
    • In this case the concrete implementation would be appended with "Imp" (e.g. PlatformIO -> PlatformIOImp).
    • The appearance of the epoch signal in the REGION_ID column of the trace will be removed.
    • The EPOCH_COUNT signal will be added to the default set of traced signals to enable tracking of epoch calls.
  • High level summary of changes since v0.6.1:
    • With this release we have removed all references to the Policy, Decider, Platform and PlatformImp objects.
    • These have been replaced by the PlatformIO / IOGroup / Agent class interactions.
    • The Kontroller object which was supporting the new code path has been renamed Controller.
    • The legacy Controller implementation has been removed.
    • GEOPM no longer depends on the hwloc library and instead relies on running lscpu on the compute node.
  • Modified implementations and interfaces:
    • Rename launcher to geopmlaunch.
    • Do not install geopmanalysis and geopmplotter command line utilities.
    • The command line interfaces for these tools will be changing.
    • Once they are committed, we will begin installing them again.
    • Remove unused error codes from geopm_error.h.
    • Remove some deprecated interfaces and files.
    • Remove legacy artifacts from Reporter and Tracer.
    • Remove legacy structures from geopm_message.h.
    • Remove deprecated API headers.
    • Remove CtlConf Python object.
    • Remove region ID memory from derivative for power signals; this is a feature for the agent to implement.
    • Remove unused arguments from the geopmctl_main.
    • Remove push_combined_signal() from PlatformIO interface.
    • Remove NAN check for policy in Controller. Agents are responsible for handling NAN.
    • Remove IPlatformTopo::define_cpu_group(). This method is not implemented and not used.
    • Remove MPI bit from region ID in report.
    • Remove install of geopm_message.h and geopm_plugin.h.
    • Remove environment variables for min/max frequency used by EnergyEfficientAgent: this functionality is provided through the policy as documented.
    • Fixes for online mode of EnergyEfficientAgent: ignore 0.0 when sampling runtime, fix min/max frequency range in analysis.py, fix final requested frequency printed in report.
    • EnergyEfficientAgent no longer considers DRAM energy in its optimization.
    • Change default frequency for hints from min to max in EnergyEfficientAgent.
    • Implement EnergyEfficientAgent analysis using hints only.
    • Change meaning of EPOCH_RUNTIME signal: MPI and ignore time are reported explicitly and separately.
    • Install many C++ headers into /usr/include/geopm.
    • Move geopmbench source files from tutorial directory into src.
    • Don't copy any files from src into tutorials.
    • Update tutorials to use Agent code path.
    • Throw if multiple hints given to geopm_prof_region.
    • Allow writing controls for containing domains: the same value will be written to every subdomain.
    • Update EpochRuntimeRegulator accounting: PKG and DRAM energy dissociated from rank.
    • Updated to report pre-epoch MPI and ignore runtime.
    • Make TreeComm fan out configurable with environment variable.
    • Per thread progress is supported by the 'REGION_THREAD_PROGRESS' signal.
    • Align command line options to the launcher and the environment variables used by the controller.
    • Merge tutorial Makefiles into one and remove duplicate scripts.
    • Rename runtime related APIs.
    • Merge ProfileIO into ProfileIOSample.
    • Refactor analysis.py command line parsing to use argparse, etc.
    • Move some header includes from headers into source files when possible.
    • Change "POWER_PACKAGE" control name to "POWER_PACKAGE_LIMIT".
    • Expose MSR PKG_POWER_LIMIT fields as signals.
    • Reorder directory search in plugin load: load plugins from right to left so that the leftmost plugin wins when more than one IOGroup provides the same name for controls and signals.
    • Use accumulator member in EpochRuntimeRegulator for MPI runtime.
    • Changes to the launcher for using mpiexec with hydra.
    • Move set_policy_defaults to Agent interface
    • Aggregation functions have been moved out of PlatformIO and into their own class: Agg.
    • Implement agg_function for IOGroups, including tutorial.
    • Do not stop integration test in looper if one test fails.
    • Increase shmem table size to 2MB per rank to reduce risk of overflow.
    • Remove hash table structure in ProfileTable; all regions now use the same table entry.
    • Change CpuinfoIOGroup to throw in constructor if cpuinfo could not be parsed.
    • In python analysis do not parse traces if total size is more than half of memory.
    • Remove redundant HDF5 cache from analysis.py.
    • Remove TURBO_RATIO_LIMIT2 control for platforms where it is not in whitelist.
    • Read multiple samples for a short time in geopmread to support POWER signals.
    • Narrow scope of warning message about cpufreq governor: only print warning when an attempt is made to write to a control that begins with POWER or FREQUENCY.
    • Prevent MSRIOGroup from throwing when saving MSRs.
    • Implement and use AgentConf in python code to create agent policies.
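
The right-to-left plugin search order noted in the list above can be illustrated with a minimal Python sketch. This is not GEOPM code: `resolve_plugins` and the `available` mapping are hypothetical, and the sketch assumes a colon-separated search path in which a later registration of a name replaces an earlier one. Under those assumptions, scanning the directories from right to left lets the leftmost directory win.

```python
def resolve_plugins(plugin_path, available):
    """Map each signal/control name to the directory that provides it.

    plugin_path: colon-separated list of directories (assumed format).
    available: hypothetical dict of directory -> list of names it provides.
    """
    registry = {}
    # Visit directories from right to left; a later registration of the
    # same name overwrites the earlier one, so the leftmost directory wins.
    for directory in reversed(plugin_path.split(":")):
        for name in available.get(directory, []):
            registry[name] = directory
    return registry


available = {
    "/opt/site/plugins": ["POWER", "TEMPERATURE"],
    "/home/user/plugins": ["POWER"],
}
winners = resolve_plugins("/home/user/plugins:/opt/site/plugins", available)
print(winners["POWER"])        # /home/user/plugins
print(winners["TEMPERATURE"])  # /opt/site/plugins
```

Names that only one directory provides are unaffected by the ordering; only conflicting names depend on it.
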
  • Updated features:
    • Add timestamp counter to available signals.
    • Add --info option to geopmread and geopmwrite.
    • Add check for invalid GEOPM_CTL values.
    • Add temperature signals.
    • Add Imbalancer interface to libgeopm and libgeopmpolicy: Imbalancer_*() -> geopm_imbalancer_*().
    • Add some placeholder descriptions to MSRIOGroup and TimeIOGroup to support integration tests.
    • Add methods to RegionAggregator to get region IDs and signals.
    • Add methods to PlatformIO to provide signal/control descriptions: this will be used to augment geopmread/write with descriptions.
    • Add description APIs for IOGroup: allows IOGroups to provide a user-friendly description of signals/controls.
    • Add GEOPM_TIME_REF constant for use with geopm_time_*() APIs.
    • Add INSTRUCTIONS_RETIRED alias signal.
    • Add TIMESTAMP_COUNTER alias for MSRIOGroup.
    • Add signal to enable reading of the RAPL lock bit.
    • Add PKG_POWER_LIMIT MSR fields as a signal.
    • Add expect_same aggregation function that returns NAN if any elements of the vector differ.
    • Add average node frequency to EnergyEfficientAgent tree samples.
    • Add support for POWER_* as signals that give meaningful results without runtime.
    • Add module conflict of darshan to theta module file.
    • Add psutils python dependency.
    • Add warnings for system misconfiguration.
    • Add read_file() to Helper.hpp.
    • Add job start in Trace and Report headers.
    • Add outlier detector script.
    • Add handling of NAN for default policy values to all agents.
    • Add parsing for overhead fields to io.py.
    • Add reading of the thread table through PlatformIO.
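
The behavior described for the expect_same aggregation function (return NAN if any elements of the vector differ) can be sketched in Python as follows. This is an illustration, not the GEOPM implementation; the handling of an empty input is an assumption that the release notes do not cover.

```python
import math


def expect_same(values):
    """Return the shared value, or NAN if any elements differ."""
    if not values:
        # Assumption: an empty input has no common value to report.
        return math.nan
    first = values[0]
    if all(value == first for value in values):
        return first
    return math.nan


print(expect_same([2.1e9, 2.1e9, 2.1e9]))  # 2100000000.0
print(expect_same([2.1e9, 1.8e9]))         # nan
```

An aggregation of this kind suits signals that are expected to be identical across a domain, where a mismatch is better reported as NAN than averaged away.
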
  • Updated and extended integration tests:
    • Ignore misconfigured system warnings in integration test.
    • Remove ignore of multiple plugin load warnings that stopped occurring after removal of legacy code.
    • Do not test epoch runtime in test_region_runtimes.
    • Add all2all to power_balancer integration test.
    • Adjust power_balancer test logic to compare Governor and Balancer relatively.
    • Fix EnergyEfficientAgent integration test.
    • Test decorators implemented to use launcher. This forces the checks to be run on the compute nodes.
    • Update integration tests to reflect removal of legacy code path.
    • Update test_power_consumption to use PowerGovernor.
    • Fix integration test to exclude MPI and model-init regions from tests using traces.
    • Fix integration test to use assertNear to account for new MPI region markup.
    • Move GEOPM_EXEC_WRAPPER functionality into integration test.
  • Updated unit tests:
    • Add tests of domain aggregation for pushed signals.
    • Add test for geopmread signal aggregation.
    • Stop the unit tests from littering files.
    • Fixed signed / unsigned comparison issue in PlatformIO test.
    • Update unit tests to reflect removal of legacy code path.
    • Add test of IOGroup factory that checks that an IOGroup's list of signal/control names are all valid.
  • Updates to documentation:
    • Update GEOPM main README.
    • Add doxygen target for public interface files.
    • Add man pages for all C++ headers that are now installed to support plugin development.
    • Full man pages have been added for PluginFactory, PlatformIO, PlatformTopo, Agent, and IOGroup.
    • Add documentation about aliasing signals and controls.
    • Update launcher ronn to include references to env vars.
    • Add README for outlier_detection.
    • Update the tutorial README.md to reference geopmbench and point out the agent and iogroup subdirectories.
    • Document how to build GEOPM with Intel Toolchain.
    • Fix example source code in geopm_prof_c.3 man page.
    • Add man pages for geopm_time.h and geopm_imbalancer.h.
    • Update Doxygen to reflect removal of legacy code path.
    • Remove alpha and beta labels from documentation.
  • Bug fixes:
    • Fix how starting energy counters are recorded in EpochRuntimeRegulator.
    • Fix timestamp issue with Tracer.
    • Fix region handling in Reporter hints.
    • Fix OMPT enabled pthread launch with Controller/Agent.
    • Fix for invalid function for some MSR signals.
    • Fix for EnergyEfficientAgent policy: initialize min and max frequency to NAN.
    • Fix EnergyEfficientAgent offline analysis parsing.
    • Fix geopmbench stream benchmark which was using too little memory.
    • Fix python tests to print better warnings and avoid print command.
    • Fix for MPI region entry: MPI regions used in GEOPM startup were given a region ID of 0.
    • Fix initialization of per rank ignore and mpi runtime.
    • Fix default policy generated by geopmagent to properly represent NAN.
    • Fix reporting of MPI and ignore runtime prior to first epoch for report totals.

Consumption - Computation and Communication - C++
Published by cmcantalupo over 6 years ago

GEOPM - GEOPM 0.6.1

  • Hotfix for v0.6.0 release.
    • Fix MPI functions called during startup getting assigned region 0.
    • Fix missing profiling of some MPI functions when called from fortran.
    • Fix performance regression due to attempt to profile non-blocking MPI calls.
    • Fix to remove unsupported MSR from skylake platform definition (TURBO_RATIO_LIMIT2).
    • Fix to prevent throw when trying to save/restore MSRs that are not supported on the system.

Consumption - Computation and Communication - C++
Published by cmcantalupo over 6 years ago

GEOPM - GEOPM 0.6.0

  • Stabilized Agent code path.
  • Last release with Decider/Platform/PlatformImp support.
  • Modified implementations and interfaces:
    • Modify PowerGovernor to ignore DRAM power and tune parameters for power balancer.
    • Profile larger set of MPI functions including non-blocking routines.
    • Removed push_region_signal_total() and sample_region_total() from PlatformIO.
    • This functionality is available to Agents by creating an instance of RegionAggregator.
    • Redesigned geopmanalysis command line interface so that the first argument selects the analysis type.
    • Add options to geopmanalysis for min and max frequency for frequency sweep analysis types.
    • Remove geopmanalysis --level option and replace with --summary and --plot.
    • This allows summaries and/or plots to be generated separately.
    • Add option to use agent code path to geopmanalysis (use_agent).
    • Change EnergyEfficientAgent frequency map to use JSON format.
    • Introducing GEOPM_EXEC_WRAPPER environment variable useful for inserting a debugger into the integration tests.
    • Reuse same idx val for repeated pushes of signals/controls.
    • Cat lscpu output to /tmp prior to running job and avoid popen call inside of MPI app.
    • Change PowerGovernorAgent::wait() to use time instead of RAPL updates.
    • Get rid of C-string from ProfileTable implementation.
    • Add max_level() to TreeComm.
    • Introducing the PowerGovernor class.
    • Introducing Agent::aggregate_sample() static helper function for Agents.
    • Add agent field to io.py dataframe index. Note: this will break compatibility with scripts that use the old index.
    • Rename RAPL related MSR names: SOFT_POWER_LIMIT to PL1_POWER_LIMIT and HARD_POWER_LIMIT to PL2_POWER_LIMIT.
    • Add geopm_time_since() method.
    • Update the analysis.py energy references.
    • Add RegionAggregator class for per-region signal totals.
    • Update Reporter to use RegionAggregator.
    • Changed region counts to start at -1 before first entry.
    • Get rid of unused and undocumented environment variable GEOPM_REPORT_VERBOSITY.
    • Modify launcher to set LD_PRELOAD only for application.
    • Change some AppOutput methods to return pandas Dataframes instead of Report/Region objects.
    • Add barrier in MPI_Init prior to GEOPM startup.
    • Have RootRole throw if bad power cap is set.
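
The item above about reusing the same index value for repeated pushes of signals/controls can be pictured with a small Python sketch. The `BatchRequests` class, its `push_signal` signature, and the domain names are hypothetical stand-ins chosen for illustration, not the PlatformIO interface itself.

```python
class BatchRequests:
    """Hypothetical stand-in for a batch interface with pushed requests."""

    def __init__(self):
        self._index = {}     # request key -> batch index
        self._requests = []  # request keys in the order first pushed

    def push_signal(self, name, domain_type, domain_idx):
        """Return the batch index for a request, reusing it on repeats."""
        key = (name, domain_type, domain_idx)
        if key not in self._index:
            self._index[key] = len(self._requests)
            self._requests.append(key)
        return self._index[key]


batch = BatchRequests()
first = batch.push_signal("EPOCH_RUNTIME", "board", 0)
repeat = batch.push_signal("EPOCH_RUNTIME", "board", 0)
other = batch.push_signal("TIMESTAMP_COUNTER", "cpu", 0)
print(first, repeat, other)  # 0 0 1
```

Keying on the full (name, domain type, domain index) tuple means two callers that push the same request share one slot in the batch.
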
  • Updated features:
    • Introducing the new PowerBalancer agent with many commits since v0.5.1 that tweak the algorithm.
    • Ignore epoch calls when made inside of a region marked with the ignore hint.
    • Add MSRIOGroup signals that return the raw value of an MSR.
    • Use slurm option to select the performance power governor when using GEOPM.
    • Add a spec file for building GEOPM for ALCF Theta.
    • Add profile name and agent to trace header.
    • Add CYCLES_THREAD and CYCLES_REFERENCE to trace.
    • Add Agent support in python scripts.
    • Add CORAL 2 version of AMG to examples.
    • Update markup for miniFE example to set region ID once per region.
    • Update nekbone patches for scaling studies.
    • Suppress OMP warnings in launcher when using Intel toolchain.
    • Add PowerSweepAnalysis type to geopmanalysis.
    • Add BalancerAnalysis type to geopmanalysis.
    • Add NodeEfficiencyAnalysis type to geopmanalysis.
    • Add NodePowerAnalysis type to geopmanalysis.
    • Introduce a plotter method to generate histograms.
    • Have ManagerIO skip policy file parsing if agent has no policies.
    • Add HDF5 caching for parsed reports and traces to io.py.
    • Add summary features to analysis where summarized data is written to files in ascii tables.
  • Updated and extended integration tests:
    • Updates to integration tests to support the Agent / PlatformIO code path are a major feature of this release.
    • Adding back integration test for power balancer with increased time limit.
    • Automatically infer architecture based on hostname.
    • Add monitor as available agent to run integration tests.
    • Use regular runtime for epoch in test_region_runtimes.
    • Require balancer test to run in an allocation.
    • Check that the average power limit across nodes is under the cap in test_power_balancer.
    • Add integration test that runs GEOPM, but does not generate reports.
  • Updates to documentation:
    • Add documentation to the README about the scaling_governor.
    • Add documentation of constructor attribute for plugins to geopm(7) man page.
    • Add documentation for hint ignore interaction with geopm_prof_epoch().
    • Add documentation for all of the supported region hints.
    • Remove documentation about node barrier enforced by epoch call, this is no longer true.
    • Remove reference to MPIEXEC from spec file.
    • Add missing launcher options to help text.
  • Updated unit tests:
    • Add PowerBalancer unit tests.
    • Add PowerBalancerAgent unit tests.
    • Add analysis.py unit tests.
    • Add more detailed checks of TreeComm calls to KontrollerTest.
    • Add tests of geopmanalysis CLI.
    • Fix tests for ControlMessage.
  • Bug fixes:
    • Fix catch-value warning from GCC 8.
    • Fix possible C string truncation.
    • Fix for null characters sometimes appearing in report header.
    • Fix string sizing for strncpy and snprintf for gnu8.
    • Fix null termination in case of string overflow.
    • Fix in PowerGovernorAgent where fan_in could be accessed out of bounds.
    • Fix Kontroller index into Agent array; the level 0 Agent should not do descend() or ascend().
    • Fix issue where second region runtime is longer than first: move region exit barrier after call to sample.
    • Fix geopmagent so it can create empty json files.
    • Fix launcher to handle --cpu-bind as well as --cpu_bind.
    • Fix failure to restore fixed counter MSRs at end of GEOPM runtime.
    • Fix epoch region ID detection in io.py.
    • Fix for test_trace_runtimes with agent code path.
    • Fix performance issue: if power will be controlled, adjust one CPU per package.
    • Fix EnergyEfficientAgent init().
    • Fix issue where geopm would try to restore MSR MISC_ENABLE which is read only.
    • Fix test_power_consumption to measure socket power only.
    • Fix order of MSR save / agent init() to avoid failure to restore time window setting.
    • Fix --enable-overhead configure option.
    • Fix pthread launch for Agent code path.
    • Fix Fortran comm initialization.
    • Fix handling of bad OMP masks.
    • Fix for klocwork error: missing null check.
    • Fix pthread launch when using MPICH by enabling MPI_THREAD_MULTIPLE in environment.
    • Fix pthread launch issue in Cray Linux by using secure versions of the CPU_SET macros.
    • Fix hang when runtime is active but report has not been requested.
    • Fix python scripts to support old data missing separate dram energy in report.
    • Fix python scripts to handle new agent field in parsed header.
    • Fix race in ControlMessage that could cause hang at GEOPM runtime start up.
    • Fix for ompt region names in Reporter.
    • Fix issue where slack was calculated prior to adding in extra power in PowerBalancerAgent.

Consumption - Computation and Communication - C++
Published by cmcantalupo over 6 years ago

GEOPM - GEOPM 0.5.1

  • Introduce the PowerGovernorAgent. This agent is implemented and fully featured.
  • Restoring the MSR values at the end of a run is now best effort since the system whitelist may prevent the write from being allowed.
  • Allow min/max frequencies to be specified in the EnergyEfficientAgent's policy.
  • Fix geopmread usages for tutorial.
  • Fix MSR overflow logic, performance counter initialization, and MSR encode/decode functions.
  • Fix integration tests for geopmwrite use cases.

Consumption - Computation and Communication - C++
Published by bgeltz almost 7 years ago

GEOPM - GEOPM 0.5.0

  • Community updates:

  • Modified implementations and interfaces:

    • Major refactor of the controller and plugin architecture is provided as an optional new code path.
    • Most of the changes made to the implementation for this release modify the new code path.
    • The old code path is still available for users as long as the controller is run without the GEOPM_AGENT environment variable set.
    • The new code path will be active if the user selects an agent by name with the GEOPM_AGENT environment variable when launching the controller.
    • The old code path is maintained in the current Controller object along with the Decider / Platform / PlatformImp plugins.
    • The new code path is maintained in a replacement for the Controller which has been temporarily named the Kontroller.
    • The Kontroller will be renamed the Controller after this release, and the old code path will no longer be available.
    • Similar to the Kontroller/Controller replacement, the KprofileIOGroup, KprofileIOSample, and KruntimeRegulator are temporary replacements for their non-K counterparts and will be renamed.
    • The beta release enables a new set of plugin interfaces named the IOGroup, Agent, and Comm.
    • It is through the IOGroup, Agent and Comm plugins that the GEOPM runtime can be extended.
    • The Decider / Platform / PlatformImp plugin extensions are deprecated and will be removed after this release.
    • The IOGroup plugin enables a user to add new signal and control mechanisms for an Agent to read and write.
    • The Agent plugin enables a user to add new monitor and control algorithms to the GEOPM runtime.
    • MPI use by the GEOPM runtime which is not linked by application has been completely encapsulated in the Comm object.
    • The tutorial has been extended with two new directories: tutorial/agent and tutorial/iogroup.
    • The tutorial/iogroup directory documents how to write an IOGroup plugin.
    • The tutorial/agent directory documents how to write an Agent plugin.
    • The interface to the resource manager has been made much more flexible for supporting the new Agent interfaces.
    • The resource manager interface is documented in the geopm_agent_c(3) and geopm_endpoint_c(3) man pages.
    • Additionally command line tools have been proposed and partially implemented to support the interfaces documented in those man pages.
    • The geopm_agent_c(3) APIs and geopmagent(1) CLI have software support.
    • The endpoint interfaces are a work in progress that has not yet been integrated into the mainline source.
    • The PlatformIO object provides the interface to the IOGroups.
    • The PlatformIO C++ object will soon have an associated C interface documented as geopm_platformio_c(3).
    • The geopmread and geopmwrite tools provide a CLI to the PlatformIO features.
    • Introducing the MSRIOGroup which provides an implementation of the IOGroup for MSRs.
    • Introducing the TimeIOGroup which provides an IOGroup for the time signal.
    • Introducing the CpuinfoIOGroup which provides data from /proc/cpuinfo as signals.
    • Introducing the ProfileIOGroup which provides profile data collected from the main compute application through the geopm_prof_c(3) APIs.
    • The release includes three new installed binaries: geopmread, geopmwrite, and geopmagent.
    • Each of these command line interfaces is documented with a man page and there is a man page for a future command line tool called geopmendpoint.
    • Deprecate the geopm_policy_*() interfaces, which have been replaced by the geopm_agent_*() and geopm_endpoint_*() APIs.
    • Introducing the first three Agent implementations: MonitorAgent, PowerBalancerAgent, and EnergyEfficientAgent.
    • Introducing PlatformTopo, replacement for PlatformTopology.
    • Introducing DefaultProfile singleton which supports geopm_prof_c(3) APIs for profiling.
    • Added documentation for monitor, energy_efficient, and power_balancer Agents, but the implementation is not currently aligned.
    • The monitor agent is implemented and fully featured.
    • The energy_efficient agent will soon be extended to match the man page, and currently use of the network is not enabled.
    • The existing implementation of the energy_efficient agent does currently provide similar functionality to the efficient_freq Decider.
    • The power_balancer agent is a work in progress that is not well aligned with the man page, but will be feature complete soon.
    • Reports and traces generated by Agent code path are designed to be backward compatible with reports and traces generated with the Decider code path.
    • New environment variables documented in geopm(7): GEOPM_ENDPOINT, GEOPM_AGENT, GEOPM_TRACE_SIGNALS, and GEOPM_DISABLE_HYPERTHREADS.
    • Remove GEOPM_ERROR_AFFINITY_IGNORE environment variable, no longer required for testing.
    • New plugin registration mechanism has been put in place and new factory has been implemented.
    • Replace independent factories with single templated class the PluginFactory.
    • No longer register a plugin using a half instantiated object.
    • Removed call to dlsym, and plugins now use __attribute__((constructor)) to specify a callback target used when plugin is loaded.
    • In this callback the plugin should register with its respective factory.
    • Each plugin type has a make_plugin() static method that creates the plugin object and returns a pointer to the base class.
    • The make_plugin() function pointer is what is registered with the factory.
    • Extend the PluginFactory to require the registration of a dictionary (map<string,string>) to enable queries of plugin capabilities.
    • Use stricter criterion for selecting plugin files to load, name must be of the form libgeopmpi*.so.0.0.0 where 0.0.0 is the GEOPM ABI version.
    • Moved geopm_plugin_description_s definition to geopm.h.
    • Add a configure option to enable use of the msr-safe ioctl interface for writing with PlatformIO.
    • The msr-safe ioctl interface should not be used for writing unless the system has an msr-safe installation that has fixed https://github.com/LLNL/msr-safe/issues/38.
    • Added APIs for manipulating hint bits in region id hash.
    • Many changes were made to modernize the use of C++.
    • Change protected members of all classes to private where possible.
    • Replace all raw pointer usage with C++11 smart pointers if possible.
    • Use default keyword for constructors and destructors where appropriate.
    • Use delete keyword rather than throw to avoid copy constructor.
    • Add override keyword to derived classes.
    • Use forward declaration of classes rather than include one header inside of another.
    • Add and integrate make_unique implementation for C++11.
    • Confirmed const correctness for all class methods.
    • Add public interface to register IOGroups with PlatformIO which enables IOGroups to be created at runtime.
    • Standardize the IOGroup signal and control names so that they are prefixed by the IOGroup name and two colons.
    • Agents should generally use high level aliases rather than these low level signals and controls.
    • Introduce functions for converting between signals and bit-fields to allow for PlatformIO to provide full 64 bit integer signals like the region ID.
    • Add overflow function type to MSR class.
    • Change frequency APIs to use Hz to enforce uniform use of SI units.
    • Use instruction offset in OMPT derived region name; this resolves a name ambiguity when more than one OpenMP region is discovered within the same function.
    • Use gmock archive uploaded to the geopm organization on github.
    • PlatformTopo is built on top of lscpu and does not require hwloc.
    • Throw on GlobalPolicy misconfiguration earlier in the runtime execution.
    • Rename SimpleFreqDecider to EfficientFreqDecider which will be replaced by EnergyEfficientAgent.
    • Update to efficient Decider and Agent related environment variables according to above name changes.
    • The json-c library is no longer a dependency, all references have been removed.
    • Now using the json11 library which is distributed in the "contrib" sub-directory.
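
The plugin registration scheme described in the list above (a single factory, a make_plugin() function registered by name, and a string dictionary for capability queries) can be sketched as a Python analogue. This is an illustration only: the real mechanism is C++ and is triggered by a constructor attribute at load time, whereas here the registration call is made explicitly. The `register_plugin` method name, the `ExampleAgent` class, and the dictionary contents are hypothetical.

```python
class PluginFactory:
    """Hypothetical Python analogue of one factory shared by plugin types."""

    def __init__(self):
        self._makers = {}        # plugin name -> make_plugin callable
        self._dictionaries = {}  # plugin name -> dict of str -> str

    def register_plugin(self, name, make_plugin, dictionary):
        """Register the creation function, never a half-built object."""
        if name in self._makers:
            raise ValueError(f"plugin already registered: {name}")
        self._makers[name] = make_plugin
        self._dictionaries[name] = dict(dictionary)

    def make_plugin(self, name):
        """Create a new plugin object by calling the registered function."""
        return self._makers[name]()

    def dictionary(self, name):
        """Return the capability dictionary registered for a plugin."""
        if name not in self._dictionaries:
            raise KeyError(f"plugin has not been registered: {name}")
        return self._dictionaries[name]


class ExampleAgent:
    @staticmethod
    def make_plugin():
        return ExampleAgent()


factory = PluginFactory()
# Stands in for the callback that runs when the plugin is loaded.
factory.register_plugin("example", ExampleAgent.make_plugin,
                        {"description": "illustrative plugin"})
agent = factory.make_plugin("example")
print(type(agent).__name__)                       # ExampleAgent
print(factory.dictionary("example")["description"])  # illustrative plugin
```

Registering the function rather than an instance means no plugin object exists until one is requested, which matches the stated goal of no longer registering a half instantiated object.
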
  • Updated features:

    • Enable Agent to augment report and trace.
    • Enable user to augment trace through environment variable GEOPM_TRACE_SIGNALS in new code path.
    • Changes to PlatformIO to support non-CPU domains.
    • Added MSR save/restore functionality to PlatformIO save/reset interfaces.
    • Allow loading PlatformIO when some IOGroups fail to load.
    • Add aggregation functions to PlatformIO to encode how to combine signals.
    • Add PlatformTopo methods for converting domain to string and vice-versa.
    • Add signal_names() and control_names() to PlatformIO and IOGroup.
    • Add Skylake server (SKX) as a supported platform.
    • Add Haswell and SandyBridge MSRs to PlatformIO interface.
    • OMPT report region names include instruction offset, now two OpenMP regions within the same function can be distinguished.
    • Add region runtime as default trace column.
    • Simpler column names in trace; print some columns using old names.
    • Change region ID to hex in report and trace.
    • Order regions in report by runtime.
    • Add application total ignore time to report.
    • Replace tabs with spaces for report formatting.
    • Enable PlatformIO to support Epoch based signals.
    • Add power signals to PlatformIO using derivative calculation previously done in Region object.
    • Add PlatformIO aliases for region ID, progress, frequency and energy.
    • Add CombinedSignal class which is used to combine signals from different IOGroups.
    • Allow for a user provided number of experiment iterations (loops) to perform for each geopmanalysis type.
    • Enable geopmanalysis to provide more detailed information about the results.
    • Allow turbo to be skipped by geopmanalysis when determining the best per-region frequencies.
    • Updates to geopmanalysis python script to bypass trace parsing if requested and in debug plot ignore check for multiple profile names.
    • Use hyphen instead of underscore in geopmanalysis options for consistency with other interfaces.
    • Don't require -n and -N with geopmanalysis when skipping launch.
    • Pass output_dir through to plotter when using geopmanalysis.
    • Changes to analysis.py for SC17 data: multiply energy percent by 100, have frequency sweep plots use frequencies from profile name.
    • Add geopmanalysis option to specify controller launch method.
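
The list above mentions power signals obtained from a derivative calculation. As a rough illustration of the idea, power can be estimated as the change in energy divided by the change in time. The sketch below uses only the first and last samples of a window; this is an assumption made for brevity, and GEOPM's own calculation may differ. The function name and sample layout are hypothetical.

```python
import math


def power_from_energy(samples):
    """Estimate average power in watts from (time_sec, energy_joule) pairs.

    Uses a simple finite difference between the first and last samples.
    """
    if len(samples) < 2:
        return math.nan  # not enough history to form a derivative
    t_begin, e_begin = samples[0]
    t_end, e_end = samples[-1]
    if t_end == t_begin:
        return math.nan  # avoid division by zero
    return (e_end - e_begin) / (t_end - t_begin)


history = [(0.0, 100.0), (0.5, 160.0), (1.0, 215.0)]
print(power_from_energy(history))  # 115.0
```

A single sample yields NAN here, which is consistent with the need to read multiple samples before a power value becomes meaningful.
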
  • Updated and extended integration tests:

    • Integration tests validated with the GEOPM_AGENT set to test new code path.
    • A few problems with the new code path exposed by integration tests have been added to github issues.
    • A few changes to support integration tests with new code path have been integrated.
    • Change io.py and integration tests: Allow hex numbers for region ID in report, skip extra lines in report.
    • Remove Platform plugin registration.
    • Update EfficientFreqDecider to use new runtime metric for performance.
    • Update EfficientFreqDecider to use PlatformIO directly and remove method from Policy object for adjusting frequency.
  • Updated unit tests:

    • Many unit tests have been added to accompany the new code path which has many new classes.
    • The new classes were specifically designed to enable unit testing of the poorly covered code that they refactor.
    • Refactor Profile constructor into testable functions.
    • Add unit tests for Profile class.
    • Simple profile class in test directory for testing and debug: enables profiling of the GEOPM runtime itself.
    • More detailed checks of messages in unit tests when exceptions are thrown.
    • Fix test-license to assert that files in MANIFEST.EXEMPT exist.
    • Remove TestPlugin code that is not used by tests.
    • Add make check target to tutorial build.
  • Bug fixes:

    • Update GEOPM runtime C APIs to print to standard error instead of having the controller suppress error messages.
    • Handle exceptions that occur during app/controller handshake.
    • Enable timeout rather than hang if Controller or application fail during execution.
    • Fix for package-scoped MSRs that will write to all CPUs in a package rather than just one.
    • Fix HSX and SKX frequency control MSRs to core domain.
    • Fix issue when running on systems with offline CPUs.
    • Do not report a completed send if policy or sample contains a NAN.
    • Fix lscpu parsing for offline CPUs.
    • Exclude regions with 0 count from report, except unmarked region, which is always 0.
    • Add verbose error message when PluginFactory::dictionary() is called with plugin name that has not been registered.
    • Fix get_alloc_nodes for slurm in geopmpy launcher.
    • Fix for test_power_consumption to check the current platform cpuid to decide power budget.
    • Fix geopmpy.launcher for Intel's mpiexec: does not accept -- as a separator for positional arguments.
    • Fix for when GEOPM_PLUGIN_PATH contains multiple paths.
    • Fix tutorial tarball so that it will build out of place.
    • Fix shared memory issues during start-up when launching the Controller as a separate application.
    • Remove erroneous double split of the Controller's comm; the ppn1 comm is already passed into the constructor.
    • Fix test to use in-memory file system to avoid adding missing msync() calls.
    • Fix resource leak in TreeCommunicator constructor.
    • Fix tracing capability with geopmanalysis.
    • Leave -- separator in list of arguments to avoid parsing command line arguments intended for application as launcher arguments.
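
The last fix above concerns keeping the -- separator so that options meant for the application are never parsed as launcher options. A minimal sketch of that idea follows. It is not the geopmpy.launcher implementation: the function name split_at_separator is hypothetical, and the value passed to --geopm-rm is only a placeholder.

```python
def split_at_separator(argv):
    """Return (launcher_args, app_args), splitting at the first '--'.

    The '--' token is kept at the front of app_args so that later stages can
    still see where the application's own arguments begin.
    Hypothetical helper for illustration; not part of geopmpy.
    """
    if "--" not in argv:
        # No separator: everything is treated as launcher arguments.
        return list(argv), []
    idx = argv.index("--")
    return list(argv[:idx]), list(argv[idx:])


# The second '--geopm-rm' belongs to the application and must not be parsed
# by the launcher. The value 'srun' is a placeholder.
argv = ["--geopm-rm", "srun", "-n", "4", "--", "./app", "--geopm-rm", "ignored"]
launcher_args, app_args = split_at_separator(argv)
print(launcher_args)
print(app_args)
```

Only launcher_args would then be handed to the launcher's option parser, while app_args is passed through untouched.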

Consumption - Computation and Communication - C++
Published by cmcantalupo almost 7 years ago

GEOPM - GEOPM 0.4.0

  • Modified implementations and interfaces:
    • Updated algorithm for choosing CPU affinity in the launcher: fill application CPUs from back to front, and never share physical cores between MPI ranks.
    • Created new abstraction for interfacing with MSRs and more broadly for abstracting hardware IO (PlatformIO, MSRIO, and MSR classes).
    • Application region hints are now properly exposed to the decider.
    • Added geopmanalysis executable to the geopmpy package; this executable runs applications and performs analysis of power and performance based on GEOPM report and trace data.
    • Added geopmbench to the installed binaries; this is simply an installed version of the tutorial_6 executable.
    • Added GEOPM_RM environment variable and --geopm-rm command line option to select geopmpy.launcher's back end resource manager.
    • Updated man pages to include geopmanalysis and geopmbench.
    • Removed handling of SIGCHLD signal in GEOPM runtime (commonly raised in non-error conditions when using popen(3)).
    • Launcher will guess the correct number of OpenMP threads if the user has not specified it.
    • Added warning message at start up if report and trace files will not be created due to permissions issues.
    • Added better error handling to tutorial sources.
    • Added support for geopmctl to be run as a different user than application.
    • Added support for user-provided shmkeys that do not begin with '/'.
    • Added error checking in launcher for when the user requests more ranks per node than there are cores per node.
    • Added more robust error checking for command line issues in launcher.
    • Added command line option to launcher to exclude use of hyperthreads: --geopm-disable-hyperthreads.
    • If a plugin fails at registration time, do not bring down the controller; a warning is printed if debug is enabled.
    • Remove -s parameter from geopmctl CLI (was being ignored).
    • Encapsulated use of MPI by GEOPM inside of a class abstraction (IComm), but controller has not been modified to use the new class due to deadlock bug.
    • Encapsulated in a class the handshake interface between the controller and the application across shared memory.
    • General clean up of the geopmpy.plotter implementation.
    • Added more error checking in Controller.
    • Some fixes for issues exposed by static analysis.
  • Updated features:
    • Added new decider called "simple_freq" that adjusts CPU frequency to save energy with a small impact to performance; name will likely change to "efficient_freq" in the future.
    • Added region runtime reporting to traces and Region objects based on the average execution time of a region by all of the ranks on a node.
    • Added a method to the Region object to give access to the telemetry time stamps to the decider.
    • Added online learning approach to energy efficient frequency decider.
    • Added support to geopmpy.launcher for launching with Intel(R) MPI's mpiexec.
    • Added option to plotter to use all samples or just epoch samples.
    • Modified the tutorials to enable use of the geopmpy launcher.
    • Improved tutorial Makefile to allow user override of GNU Make standard variables.
    • Added an RPM spec file for use with the OpenHPC distribution.
  • Updated and extended integration tests:
    • Moved Controller death test from the unit tests to the integration tests.
    • Added integration tests for pthread and application launch of the controller.
    • Added an isolated hardware test for RAPL power limit functionality.
    • Updated documentation: both man pages and doxygen have been reviewed and cleaned up.
  • Updated unit tests:
    • Added unit test for SubsetOptionParser.
    • Reduced dependence of unit tests on MPI runtime.
    • Removed MPIProfileTest unit test which is covered by integration tests, and not really a unit test.
    • Removed unused MPIControllerTest.
    • Removed MVAPICH2 Fortran tests.
  • Bug fixes:
    • Fixed broken build in tutorials (tutorial_region.c).
    • Fixed faulty argument parsing by the geopmpy launcher.
    • Fixed error reporting when using geopmpy with python 3.x.
    • Fixed issues with affinity when launching the controller as a pthread.
    • Fixed issue in passing power budgets down a multi-level tree.
    • Fixed issue in platform choice when head node architecture differs from the compute nodes.
    • Fixed broken build if --disable-doc configuration option is passed.
    • Fixed decider setup code to correctly propagate power bounds down tree.
    • Fixed the way RAPL time window is set.
    • Fixed the use of cached data by geopmpy.plotter.
    • Fixed integration test issues related to systems with multiple cluster node partitions.
    • Fixed process CPU affinity implementation (don't use hwloc) and added unit tests for this.
    • Fixed potential overflow issue with error messages in PlatformImp.cpp.
    • Fixed race in SharedMemory test.
    • Fixed markup patch for MiniFE.
    • Fixed launcher when user explicitly requests OMP_NUM_THREADS=1.
    • Fixed MPIInterfaceTests so it uses only mocked MPI interfaces, and does not explicitly require MPI.
    • Fixed memory leaks in GlobalPolicy.
    • Fixed linking order of libgeopm and libmpi.
    • Fixed non-performance mode integration test launcher.
    • Fixed issue where libgeopmpolicy had false dependence on OMPT.cpp.
    • Fixed rpm Makefile target to avoid the rpmbuild -t option so that it does not try to use the OpenHPC spec file.
    • Fixed issue where platform topology could be determined from nodes other than the ones that run the job.
    • Fixed Intel(R) MPI launcher's use of host files and the --ppn CLI.
    • Fixed incompatibility between MVAPICH2 affinity and srun affinity.
    • Fixed test_progress_exit integration test to account for extrapolation error.
    • Fixed integration test for MPI time accounting.
    • Fixed launcher problem when node is listed in multiple queues by sinfo.
    • Fixed and improved affinity assignment in corner cases.
    • Fixed use of sched_getcpu() for Mac OS X.
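
The affinity rule described in this release (fill application CPUs from back to front, never share physical cores between MPI ranks) can be sketched as follows. This is an illustration under stated assumptions, not the launcher's actual algorithm: the function assign_cores and its parameters are hypothetical, and it works on physical core indices only, ignoring hyperthread numbering.

```python
def assign_cores(num_core, num_rank, core_per_rank=1):
    """Give each rank a disjoint set of physical cores, highest index first.

    Hypothetical helper for illustration; not part of geopmpy.
    Raises ValueError when the request needs more cores than the node has,
    since ranks are never allowed to share a physical core.
    """
    if num_rank * core_per_rank > num_core:
        raise ValueError("more cores requested than are available on the node")
    result = []
    next_core = num_core - 1  # start from the back of the core list
    for _ in range(num_rank):
        cores = range(next_core, next_core - core_per_rank, -1)
        result.append(sorted(cores))
        next_core -= core_per_rank
    return result


# Two ranks with two cores each on an eight-core node.
layout = assign_cores(num_core=8, num_rank=2, core_per_rank=2)
print(layout)  # [[6, 7], [4, 5]]
```

Filling from the back leaves the lowest-numbered cores unassigned when the application does not use the whole node.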

Consumption - Computation and Communication - C++
Published by cmcantalupo over 7 years ago

GEOPM - GEOPM Alpha Release

Consumption - Computation and Communication - C++
Published by cmcantalupo over 7 years ago