_2025-06-03_15:08:38_ | 2025-06-03 15:08:38 | BOLT: A Brief Introduction to Post-Link Optimization - Zhihu | TAGs: High Performance | Summary: BOLT is a post-link optimizer developed at Facebook. It improves performance by optimizing the code layout of binaries. It uses sampling-based profile information to further speed up binaries that have already been built with feedback-driven optimization (FDO) and link-time optimization (LTO). BOLT is particularly beneficial for data center applications, where code locality is crucial for performance. FDO, also known as profile-guided optimization (PGO), usually relies on instrumentation, which makes it complex and resource-intensive to deploy. BOLT instead uses sampling-based profiles, which lowers these costs while improving profile accuracy and matching. Because BOLT operates directly on the final binary, the profile maps onto the exact machine code being optimized. Compiler-based PGO, by contrast, must apply the profile before code generation and LTO, which makes optimization harder and less precise. By optimizing code layout, BOLT can speed up the GCC and Clang compiler binaries by up to 20.4%. It can also optimize third-party libraries without access to their source code. BOLT is now part of the LLVM repository, which hosts its official documentation.
BOLT skips functions it cannot handle, such as functions using unsupported constructs or functions whose control-flow graph cannot be reconstructed reliably. It may also increase total code size, because splitting cold paths out of functions adds extra jump instructions, although this keeps the hot code more compact. BOLT uses LLVM libraries to disassemble the binary, construct control-flow graphs, rewrite the binary, and analyze references to functions and internal symbols. It also uses profile information in its data-flow analysis. Besides the layout passes reorder-bbs and reorder-functions, BOLT provides other passes such as strip-rep-ret, ICF (identical code folding), ICP (indirect call promotion), and simplify-ro-loads. Together, these passes optimize code layout, improve I-cache and I-TLB performance, and simplify code. However, BOLT has limitations: it cannot take advantage of distributed build systems, and optimizing large binaries requires significant time and memory. To address these limitations, tools such as Lightning BOLT and Propeller perform the optimization in parallel while still optimizing the whole program based on profile information. | |
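The layout idea behind reorder-bbs and hot/cold splitting can be illustrated with a toy Python sketch. It assumes a simplified Pettis-Hansen-style greedy chain merge, not the algorithm BOLT actually ships. The function `layout_blocks`, the block names, and the edge counts are all hypothetical. The sketch visits hot CFG edges first, so that a block's most frequent successor becomes its fall-through. Blocks that never executed are then moved to a separate cold list.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def layout_blocks(entry: str, blocks: List[str],
                  edge_counts: Dict[Edge, int]) -> Tuple[List[str], List[str]]:
    """Return (hot_layout, cold_blocks) for one function (toy model, not BOLT's)."""
    # Each block starts in its own chain; chain_of maps a block to its chain id.
    chains: Dict[int, List[str]] = {i: [b] for i, b in enumerate(blocks)}
    chain_of: Dict[str, int] = {b: i for i, b in enumerate(blocks)}

    # Visit edges from hottest to coldest and glue chains together so that
    # the hot successor becomes the fall-through of its predecessor.
    for (src, dst), count in sorted(edge_counts.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        if count == 0 or a == b or dst == entry:
            continue  # cold edge, same chain, or would displace the entry block
        if chains[a][-1] != src or chains[b][0] != dst:
            continue  # src must end its chain and dst must start its chain
        chains[a].extend(chains[b])
        for blk in chains[b]:
            chain_of[blk] = a
        del chains[b]

    # Block weight: total profile count on edges touching the block.
    weight: Dict[str, int] = {b: 0 for b in blocks}
    for (src, dst), count in edge_counts.items():
        weight[src] += count
        weight[dst] += count

    # Entry chain first, then the remaining chains from hottest to coldest.
    entry_id = chain_of[entry]
    rest = sorted((c for cid, c in chains.items() if cid != entry_id),
                  key=lambda c: -sum(weight[blk] for blk in c))
    layout = chains[entry_id] + [blk for c in rest for blk in c]

    # Hot/cold splitting: never-executed blocks go to a separate cold list.
    hot = [blk for blk in layout if blk == entry or weight[blk] > 0]
    cold = [blk for blk in layout if blk != entry and weight[blk] == 0]
    return hot, cold


if __name__ == "__main__":
    blocks = ["entry", "err", "loop", "exit"]  # original source order
    counts = {
        ("entry", "err"): 0,
        ("entry", "loop"): 100,
        ("loop", "loop"): 900,
        ("loop", "exit"): 100,
        ("err", "exit"): 0,
    }
    print(layout_blocks("entry", blocks, counts))
    # prints (['entry', 'loop', 'exit'], ['err'])
```

In this example the never-taken error path `err` is moved out of the middle of the function. The hot path `entry -> loop -> exit` then becomes a straight fall-through sequence. Real BOLT emits the cold part into a separate region, and that region needs the extra jumps mentioned above.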
|
_2025-06-03_16:07:18_ | 2025-06-03 16:07:18 | llvm-project/bolt at main · llvm/llvm-project | TAGs: High Performance | Summary: This page describes BOLT, which is hosted in the llvm-project repository on GitHub. BOLT is a post-link optimizer that speeds up applications by optimizing their code layout based on execution profiles gathered by sampling profilers. It supports x86-64 and AArch64 ELF binaries. It requires an unstripped symbol table, and binaries should be linked with relocations preserved to achieve maximum performance gains. Before optimizing, BOLT disassembles functions and reconstructs their control-flow graphs, relying on heuristics for this task. The project is heavily based on LLVM libraries and can be built manually or with a Docker image. Linking BOLT against a memory allocation library with good concurrency support can speed up BOLT itself. The page also explains how to use BOLT with different types of executables and services. BOLT is licensed under the Apache License v2.0 with LLVM Exceptions. | |
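The control-flow-graph reconstruction step described above can be sketched with the classic leader-based algorithm. This toy Python version assumes that instructions are already decoded into hypothetical `(address, kind, target)` tuples and that every direct branch target lies inside the function. It deliberately ignores indirect branches, jump tables, and data embedded in code, which are the cases where BOLT needs heuristics. `build_cfg` and the instruction kinds are illustrative names, not BOLT APIs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical decoded instruction: (address, kind, branch target), where kind
# is "jcc" (conditional branch), "jmp" (unconditional), "ret" or "other".
Insn = Tuple[int, str, Optional[int]]


def build_cfg(insns: List[Insn]) -> Dict[int, List[int]]:
    """Map each basic block's start address to its successors' start addresses."""
    addrs = [addr for addr, _, _ in insns]

    # 1. Leaders: the function entry, every direct branch target, and the
    #    instruction that follows any branch or return.
    leaders = {addrs[0]}
    for i, (_, kind, target) in enumerate(insns):
        if kind in ("jcc", "jmp") and target is not None:
            leaders.add(target)
        if kind in ("jcc", "jmp", "ret") and i + 1 < len(insns):
            leaders.add(addrs[i + 1])

    # 2. Cut the instruction stream at leaders and add successor edges.
    cfg: Dict[int, List[int]] = {}
    start = addrs[0]
    for i, (_, kind, target) in enumerate(insns):
        next_addr = addrs[i + 1] if i + 1 < len(insns) else None
        if next_addr is not None and next_addr not in leaders:
            continue  # the current block continues with the next instruction
        succs: List[int] = []
        if kind in ("jcc", "jmp") and target is not None:
            succs.append(target)  # taken-branch edge
        if kind not in ("jmp", "ret") and next_addr is not None:
            succs.append(next_addr)  # fall-through edge
        cfg[start] = succs
        if next_addr is not None:
            start = next_addr
    return cfg


if __name__ == "__main__":
    insns = [
        (0x00, "other", None),
        (0x04, "jcc", 0x10),   # if (...) goto 0x10
        (0x08, "other", None),
        (0x0C, "jmp", 0x14),   # goto 0x14
        (0x10, "other", None),
        (0x14, "ret", None),
    ]
    print(build_cfg(insns))
    # prints {0: [16, 8], 8: [20], 16: [20], 20: []}
```

A layout pass such as the block reordering described in the BOLT summary above would then run on this graph, together with per-edge profile counts.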
|
_2025-05-30_16:09:15_ | 2025-05-30 16:09:15 | Inside openEuler sysboost: The Technology Behind Database Performance Optimization | TAGs: High Performance | Summary: The openEuler open-source community introduced the sysboost performance optimization technology in version 22.03 LTS to improve database performance with solutions that are easy to use and broadly applicable. This article explains the basic implementation principles of sysboost. It optimizes the program startup process, reduces memory usage, and uses automatic feedback to improve performance in specific scenarios. The technology improved MySQL performance by 16.68% in a TPC-C scenario. It has also boosted the performance of nginx, memcached, and other non-database applications. | |
|
|