Description
In #75 we presented profile data for ./python -c pass, gathered using the Linux perf tool and formatted as a flame graph. That information helps identify the code likely to have the biggest impact on startup/shutdown when optimized.
While improved startup/shutdown performance benefits every use of CPython, this project's focus extends beyond that to real-world workloads (simulated by benchmarks). Profile data for those target workloads would therefore help us make better, data-driven decisions about where to focus our efforts.
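For context, a flame-graph-style perf capture is driven by a "perf record" invocation. This sketch only builds that command line; the frequency, paths, and workload arguments are placeholders, not settled choices:

```python
import shlex

def perf_record_command(python="./python", script_args="-c pass",
                        freq=100000, out="perf.data"):
    # Build the argv for a call-graph perf capture:
    #   -F: sampling frequency, -g: record call graphs, -o: output file.
    return (["perf", "record", "-F", str(freq), "-g", "-o", out, "--", python]
            + shlex.split(script_args))

print(" ".join(perf_record_command()))
# perf record -F 100000 -g -o perf.data -- ./python -c pass
```

The resulting perf.data is the "raw" file discussed under Profile Data below; views such as flame graphs are derived from it.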
To Do
- investigate combining multiple runs to stabilize/normalize the profile samples
  - figure out how to combine multiple perf runs into a single data file
  - decide if this really helps stabilize results
- ensure we have sufficient tooling
  - ? modify pyperformance to gather profile data (instead of producing pyperf output)
  - ? script to generate meaningful views of the data
    - there will be many runs, so manual effort may not be feasible for timely results
- run the profiler
- generate the views
- upload all the (unique) data to https://github.com/faster-cpython/ideas/tree/main/profile-data
- post (initial) analysis here
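There is no single built-in perf command for combining runs; one common approach is to collapse each run's stacks (e.g. with FlameGraph's stackcollapse-perf.pl) and then sum the sample counts per unique stack. A minimal sketch of that merge step, assuming collapsed-stack input (the function name is hypothetical):

```python
from collections import Counter

def merge_collapsed(*runs):
    # Each run is collapsed-stack text: one "frame;frame;... count" line
    # per unique stack.  Summing the counts merges the runs.
    totals = Counter()
    for text in runs:
        for line in text.splitlines():
            if line.strip():
                stack, _, count = line.rpartition(" ")
                totals[stack] += int(count)
    # Re-emit in the same format, heaviest stacks first.
    return "\n".join(f"{s} {n}" for s, n in totals.most_common())

run1 = "python;main;PyRun_SimpleString 10\npython;main;Py_Initialize 5"
run2 = "python;main;PyRun_SimpleString 7"
print(merge_collapsed(run1, run2))
# python;main;PyRun_SimpleString 17
# python;main;Py_Initialize 5
```

The merged output can be fed straight to flamegraph.pl, which is one reason collapsed stacks are a convenient interchange format here.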
Profile Data
metadata
- ID (e.g. "pass-perf-freq100000-nosite", "bm_genshi-perf-freq1000x10"):
  - workload
  - tool used to produce the data
  - sample frequency
  - number of combined runs (if any)
  - tags (e.g. "nosite")
- timestamp
- host
- python commandline
- python build
  - git remote
  - git branch
  - git revision
  - build options
- workload
  - name
  - version?
- profiling
  - tool?
  - sample frequency
  - number of combined runs
- ...
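If we do upload a metadata file per data set (see Open Questions below), the outline above could serialize to JSON along these lines. The field names and values here are purely illustrative, not a settled schema:

```python
import json

# Hypothetical metadata record for one set of profile data; the keys
# simply mirror the outline above.
metadata = {
    "id": "bm_genshi-perf-freq1000x10",
    "tags": [],
    "workload": {"name": "bm_genshi", "version": None},
    "profiling": {"tool": "perf", "frequency": 1000, "runs": 10},
}

print(json.dumps(metadata, indent=2))
```

A flat JSON file per ID would keep the metadata greppable alongside the data files in the repo.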
views
Profiling tools typically produce one or more raw data files. Such files aren't directly useful for analysis, but the raw data can be transformed into more human-friendly formats, AKA "views".
Useful views:
- flamegraph
- flat, sorted by # samples
- call tree view (showing aggregate # samples for the subtree under each root)
- tree rooted by shared object files
- ...
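The "flat, sorted by # samples" view, for example, can be computed directly from collapsed-stack data (one "frame;frame;... count" line per stack, as produced by FlameGraph's stackcollapse-perf.pl). A sketch, with hypothetical names:

```python
from collections import Counter

def flat_view(collapsed):
    # Attribute each sample to the leaf frame of its stack and sort by
    # total samples, descending.
    totals = Counter()
    for line in collapsed.splitlines():
        stack, _, count = line.rpartition(" ")
        totals[stack.rsplit(";", 1)[-1]] += int(count)
    return totals.most_common()

print(flat_view("main;eval_frame 30\nmain;gc_collect 12\nstartup;eval_frame 5"))
# [('eval_frame', 35), ('gc_collect', 12)]
```

The call-tree and per-shared-object views would aggregate over stack prefixes instead of leaves, but follow the same pattern.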
data files
- data files are to be uploaded to https://github.com/faster-cpython/ideas/tree/main/profile-data
- the filename of each data file identifies:
  - ID
  - data kind (e.g. raw, flamegraph)
  - file type (AKA ext, e.g. "svg")
- a set of profile data is identified by having the same ID in the filenames
- every set of data files will include one file with raw data, along with files that provide views
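As a concrete (hypothetical) reading of that convention, assuming the parts are joined as ID-kind.ext:

```python
def parse_data_filename(filename):
    # Assumes the hypothetical convention "<ID>-<kind>.<ext>"; the ID
    # itself may contain hyphens, so split from the right.
    stem, _, ext = filename.rpartition(".")
    ident, _, kind = stem.rpartition("-")
    return ident, kind, ext

print(parse_data_filename("pass-perf-freq100000-nosite-flamegraph.svg"))
# ('pass-perf-freq100000-nosite', 'flamegraph', 'svg')
```

Splitting from the right means the kind must never contain a hyphen; if that's too fragile, a different separator between ID and kind would be safer.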
Open Questions
- upload a metadata file for each set of profile data, in addition to data files?
- are all target workloads sufficiently covered by the available benchmarks?
- remove profiling tool from profile data?
- remove "swapper" (and other non-python "commands") from at least some of the profile data?
- collect file accesses?
- memory profiling?