[profiling] Gather low-level profile data for the various benchmarks. #76

Closed
@ericsnowcurrently

Description

In #75 we presented profile data for ./python -c pass, gathered using the Linux perf tool and formatted as a flame graph. That data helps identify the code likely to have the biggest impact on startup/shutdown when optimized.

While improved performance in startup/shutdown is a benefit to all uses of CPython, the focus of this project extends beyond that to real-world workloads (simulated by benchmarks). So profile data for our target workloads (benchmarks) would help us make better (data-driven) decisions about where to focus our efforts.


To Do

Profile Data

metadata

  • ID (e.g. "pass-perf-freq100000-nosite", "bm_genshi-perf-freq1000x10"):
    • workload
    • tool used to produce the data
    • sample frequency
    • number of combined runs (if any)
  • tags (e.g. "nosite")
  • timestamp
  • host
  • python commandline
  • python build
    • git remote
    • git branch
    • git revision
    • build options
  • workload
    • name
    • version?
  • profiling
    • tool?
    • sample frequency
    • number of combined runs
  • ...
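
A metadata record like the one outlined above could be represented as a small JSON document. The sketch below is illustrative only: the field names, the host name, and the exact layout are assumptions, not a settled schema. It also shows how the ID could be derived from the other fields, per the naming convention in the examples ("pass-perf-freq100000-nosite", "bm_genshi-perf-freq1000x10").

```python
import json

# Hypothetical metadata record for one set of profile data.
# All field names and values here are illustrative placeholders.
metadata = {
    "id": "bm_genshi-perf-freq1000x10",
    "timestamp": "2021-05-01T12:00:00+00:00",
    "host": "example-builder",  # placeholder host name
    "python_commandline": "./python -m pyperformance run --bench bm_genshi",
    "python_build": {
        "git_remote": "https://github.com/python/cpython",
        "git_branch": "main",
        "git_revision": "0000000",  # placeholder revision
        "build_options": ["--enable-optimizations"],
    },
    "workload": {"name": "bm_genshi", "version": None},
    "profiling": {"tool": "perf", "sample_frequency": 1000, "combined_runs": 10},
    "tags": [],  # e.g. ["nosite"]
}

def derive_id(meta):
    """Rebuild the ID from the metadata fields, per the assumed convention:
    <workload>-<tool>-freq<frequency>[x<runs>][-<tag>...]"""
    prof = meta["profiling"]
    id_ = f'{meta["workload"]["name"]}-{prof["tool"]}-freq{prof["sample_frequency"]}'
    if prof.get("combined_runs", 1) > 1:
        id_ += f'x{prof["combined_runs"]}'
    for tag in meta.get("tags", []):
        id_ += f"-{tag}"
    return id_

print(derive_id(metadata))       # bm_genshi-perf-freq1000x10
print(json.dumps(metadata, indent=2))
```

Keeping the ID derivable from the other fields (rather than free-form) would make it cheap to validate that a file's name and its metadata agree.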

views

Profiling tools typically produce one or more raw data files. Such files aren't directly useful for analysis, but the raw data can be transformed into more human-friendly formats, AKA "views".

Useful views:

  • flamegraph
  • flat, sorted by # of samples
  • call tree (showing the aggregate # of samples for the subtree of each root)
  • tree rooted at shared object files
  • ...
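
As a concrete illustration of one of these views, the sketch below builds the "flat, sorted by # of samples" view from data in the folded-stacks format emitted by the FlameGraph project's stackcollapse-perf.pl (one semicolon-separated stack plus a sample count per line). The sample stacks are made up for illustration.

```python
from collections import Counter

# Made-up folded-stack data, in the format produced by stackcollapse-perf.pl:
#   frame1;frame2;...;leaf <sample-count>
folded = """\
python;main;pymain_run;run_file 40
python;main;pymain_run;run_module 10
python;main;Py_Initialize 25
swapper;secondary_startup_64 5
"""

def flat_view(folded_text):
    """Aggregate sample counts per leaf function, sorted descending."""
    counts = Counter()
    for line in folded_text.splitlines():
        stack, _, samples = line.rpartition(" ")
        leaf = stack.rsplit(";", 1)[-1]          # innermost frame
        counts[leaf] += int(samples)
    return counts.most_common()

for func, samples in flat_view(folded):
    print(f"{samples:>6}  {func}")
```

The same folded input is what flamegraph.pl consumes, so one raw-to-folded transformation can feed both the flat view and the flame graph. (Note the "swapper" line, which relates to the open question below about stripping non-Python commands.)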

data files

  • data files are to be uploaded to https://github.com/faster-cpython/ideas/tree/main/profile-data
  • the filename of each data file identifies:
    • ID
    • data kind (e.g. raw, flamegraph)
    • file type (AKA ext, e.g. "svg")
  • a set of profile data is identified by having the same ID in the filenames
  • every set of data files will include one file with raw data, along with files that provide views
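
The filename convention above might look like `<ID>.<data-kind>.<ext>` (e.g. "pass-perf-freq100000-nosite.flamegraph.svg"). The exact separators are an assumption; the sketch below just shows how such names could be split apart and grouped into sets by shared ID.

```python
# Assumed filename convention: <ID>.<data-kind>.<ext>
# e.g. "pass-perf-freq100000-nosite.flamegraph.svg"

def split_filename(filename):
    """Split a profile-data filename into (ID, data kind, file type)."""
    id_, kind, ext = filename.rsplit(".", 2)
    return id_, kind, ext

def group_by_id(filenames):
    """Group filenames into sets of profile data sharing the same ID."""
    groups = {}
    for name in filenames:
        id_, kind, ext = split_filename(name)
        groups.setdefault(id_, []).append((kind, ext))
    return groups

files = [
    "pass-perf-freq100000-nosite.raw.data",
    "pass-perf-freq100000-nosite.flamegraph.svg",
    "bm_genshi-perf-freq1000x10.raw.data",
]
print(group_by_id(files))
```

Grouping this way also makes it easy to check the invariant stated above: every set should contain exactly one "raw" file plus one or more view files.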

Open Questions

  • upload a metadata file for each set of profile data, in addition to data files?
  • are all target workloads sufficiently covered by the available benchmarks?
  • remove profiling tool from profile data?
  • remove "swapper" (and other non-python "commands") from at least some of the profile data?
  • collect file accesses?
  • memory profiling?
