# Profile-guided optimization

`rustc` supports doing profile-guided optimization (PGO).
This chapter describes what PGO is and how the support for it is
implemented in `rustc`.

## What is profile-guided optimization?

The basic concept of PGO is to collect data about the typical execution of
a program (e.g. which branches it is likely to take) and then use this data
to inform optimizations such as inlining, machine-code layout,
register allocation, etc.

There are different ways of collecting data about a program's execution.
One is to run the program inside a profiler (such as `perf`) and another
is to create an instrumented binary, that is, a binary that has data
collection built into it, and run that.
The latter usually provides more accurate data.
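
To make "data collection built into it" concrete, here is a minimal hand-written
sketch of the idea. It is only an analogy: real instrumentation is inserted by
the compiler rather than written in source code, and the names `BranchProfile`
and `classify` are hypothetical.

```rust
/// Hypothetical profile record: one execution counter per branch of `classify`.
#[derive(Debug, Default)]
struct BranchProfile {
    small: u64,
    large: u64,
}

/// The function being profiled, with the counter updates written out by hand.
fn classify(x: u32, profile: &mut BranchProfile) -> &'static str {
    if x < 100 {
        profile.small += 1; // "instrumentation": count the then-branch
        "small"
    } else {
        profile.large += 1; // "instrumentation": count the else-branch
        "large"
    }
}

fn main() {
    let mut profile = BranchProfile::default();
    // Stand-in for a typical workload.
    for x in [3, 17, 250, 42, 8] {
        classify(x, &mut profile);
    }
    // A real instrumented binary would persist its counters to a file instead.
    println!("{:?}", profile);
}
```

The resulting counts tell an optimizer which branch is the common case, which is
the kind of information the optimizations mentioned above rely on.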

## How is PGO implemented in `rustc`?

`rustc`'s current PGO implementation relies entirely on LLVM.
LLVM actually [supports multiple forms][clang-pgo] of PGO:

[clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

- Sampling-based PGO where an external profiling tool like `perf` is used
  to collect data about a program's execution.
- GCOV-based profiling, where code coverage infrastructure is used to collect
  profiling information.
- Front-end based instrumentation, where the compiler front-end (e.g. Clang)
  inserts instrumentation intrinsics into the LLVM IR it generates (but see the
  note[^note-instrument-coverage]).
- IR-level instrumentation, where LLVM inserts the instrumentation intrinsics
  itself during optimization passes.

`rustc` supports only the last approach, IR-level instrumentation, mainly
because it is almost exclusively implemented in LLVM and needs little
maintenance on the Rust side. Fortunately, it is also the most modern approach,
yielding the best results.

So, we are dealing with an instrumentation-based approach, i.e. profiling data
is generated by a specially instrumented version of the program that's being
optimized. Instrumentation-based PGO has two components: a compile-time
component and a runtime component, and one needs to understand the overall
workflow to see how they interact.

[^note-instrument-coverage]: Note: `rustc` now supports front-end-based coverage
instrumentation, via the experimental option
[`-C instrument-coverage`](./llvm-coverage-instrumentation.md), but using these
coverage results for PGO has not been attempted at this time.

### Overall workflow

Generating a PGO-optimized program involves the following four steps:

1. Compile the program with instrumentation enabled (e.g. `rustc -C profile-generate main.rs`).
2. Run the instrumented program (e.g. `./main`), which generates a `default-<id>.profraw` file.
3. Convert the `.profraw` file into a `.profdata` file using LLVM's `llvm-profdata` tool
   (e.g. `llvm-profdata merge -o merged.profdata default-<id>.profraw`).
4. Compile the program again, this time making use of the profiling data
   (e.g. `rustc -C profile-use=merged.profdata main.rs`).
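
The four steps can be sketched as a small Rust driver. This is an illustration
under assumptions, not part of `rustc`: the function name `pgo_workflow`, the
paths, and the choice to pass a directory to `-C profile-generate` and to
`llvm-profdata merge` are all picked for the example. The driver only builds
the command lines and prints them; it does not run anything.

```rust
use std::path::Path;
use std::process::Command;

/// Builds the four commands of the PGO workflow without running them.
/// `src`, `exe` and `dir` are placeholders chosen by the caller.
fn pgo_workflow(src: &str, exe: &str, dir: &Path) -> Vec<Command> {
    let merged = dir.join("merged.profdata");

    // Step 1: instrumented build; `.profraw` files will be written into `dir`.
    let mut instrument = Command::new("rustc");
    instrument
        .arg("-O")
        .arg(format!("-Cprofile-generate={}", dir.display()))
        .args(["-o", exe, src]);

    // Step 2: run the instrumented binary on a representative workload.
    let run = Command::new(exe);

    // Step 3: merge the raw profiles found in `dir` into one `.profdata` file.
    let mut merge = Command::new("llvm-profdata");
    merge.arg("merge").arg("-o").arg(&merged).arg(dir);

    // Step 4: optimized build that reads the merged profile.
    let mut optimize = Command::new("rustc");
    optimize
        .arg("-O")
        .arg(format!("-Cprofile-use={}", merged.display()))
        .args(["-o", exe, src]);

    vec![instrument, run, merge, optimize]
}

fn main() {
    for cmd in pgo_workflow("main.rs", "./main", Path::new("/tmp/pgo-data")) {
        // Printing instead of calling `cmd.status()` keeps the sketch free of side effects.
        println!("{:?}", cmd);
    }
}
```

To actually execute the steps, one would call `status()` on each command in
order and stop at the first failure. Note that the `llvm-profdata` binary needs
to be compatible with the LLVM version used by `rustc`.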

### Compile-time aspects

Depending on which step in the above workflow we are in, two different things
can happen at compile time:

#### Create binaries with instrumentation

As mentioned above, the profiling instrumentation is added by LLVM.
`rustc` instructs LLVM to do so [by setting the appropriate][pgo-gen-passmanager]
flags when creating LLVM `PassManager`s:

```C
    // `PMBR` is an `LLVMPassManagerBuilderRef`
    unwrap(PMBR)->EnablePGOInstrGen = true;
    // Instrumented binaries have a default output path for the `.profraw` file
    // hard-coded into them:
    unwrap(PMBR)->PGOInstrGen = PGOGenPath;
```

`rustc` also has to make sure that some of the symbols from LLVM's profiling
runtime are not removed, [by marking them with the right export level][pgo-gen-symbols].

[pgo-gen-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L412-L416
[pgo-gen-symbols]: https://github.com/rust-lang/rust/blob/1.34.1/src/librustc_codegen_ssa/back/symbol_export.rs#L212-L225


#### Compile binaries where optimizations make use of profiling data

In the final step of the workflow described above, the program is compiled
again, with the compiler using the gathered profiling data in order to drive
optimization decisions. `rustc` again leaves most of the work to LLVM here,
basically [just telling][pgo-use-passmanager] the LLVM `PassManagerBuilder`
where the profiling data can be found:

```C
    unwrap(PMBR)->PGOInstrUse = PGOUsePath;
```

[pgo-use-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L417-L420

LLVM does the rest (e.g. setting branch weights and marking functions with
`cold` or `inlinehint`).
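
For comparison, the `cold` and `inlinehint` attributes can also be requested by
hand in Rust source through `#[cold]` and `#[inline]`. The sketch below shows
that manual form; with PGO, LLVM applies such hints based on measured execution
counts instead of programmer guesses. The function names are hypothetical.

```rust
/// Rarely executed error path; `#[cold]` becomes LLVM's `cold` attribute.
#[cold]
fn report_invalid(byte: u8) -> u32 {
    eprintln!("ignoring non-digit byte {:#04x}", byte);
    0
}

/// Small hot helper; `#[inline]` becomes LLVM's `inlinehint` attribute.
#[inline]
fn digit_value(byte: u8) -> Option<u32> {
    (byte as char).to_digit(10)
}

/// Sums the decimal digits in `input`, treating other bytes as zero.
fn sum_digits(input: &[u8]) -> u32 {
    input
        .iter()
        .map(|&b| match digit_value(b) {
            Some(d) => d,
            None => report_invalid(b),
        })
        .sum()
}

fn main() {
    println!("{}", sum_digits(b"2024"));
}
```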


### Runtime aspects

Instrumentation-based approaches always have a runtime component: once we
have an instrumented program, it needs to be run in order to generate
profiling data, and collecting and persisting that data requires some
infrastructure.

In the case of LLVM, these runtime components are implemented in
[compiler-rt][compiler-rt-profile] and statically linked into any instrumented
binaries.
The `rustc` version of this can be found in `library/profiler_builtins`, which
basically packs the C code from `compiler-rt` into a Rust crate.

In order for `profiler_builtins` to be built, `profiler = true` must be set
in `rustc`'s `bootstrap.toml`.

[compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/main/compiler-rt/lib/profile

## Testing PGO

Since the PGO workflow spans multiple compiler invocations, most testing happens
in [run-make tests][rmake-tests] (the relevant tests have `pgo` in their name).
There is also a [codegen test][codegen-test] that checks that some expected
instrumentation artifacts show up in LLVM IR.

[rmake-tests]: https://github.com/rust-lang/rust/tree/master/tests/run-make
[codegen-test]: https://github.com/rust-lang/rust/blob/master/tests/codegen-llvm/pgo-instrumentation.rs

## Additional information

Clang's documentation contains a good overview of [PGO in LLVM][llvm-pgo].

[llvm-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
