How to Debug Apache Gluten's C++ Code

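Apache Gluten is an open-source project that brings high-performance vectorized execution to big-data processing frameworks such as Apache Spark. Its core goal is to optimize the low-level execution of data processing, cutting CPU and memory overhead so that complex analytical queries run significantly faster. Gluten integrates Velox (Meta's open-source vectorized execution library) as its default backend and relies on a columnar in-memory format and SIMD instructions for efficient computation. It stays compatible with Spark's existing APIs, so users can migrate without changing their code.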

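Apache Spark is a compute engine written in Scala and Java, whereas Velox is written in C++. Gluten therefore relies heavily on JNI, with a large amount of Java code calling into C++. That C++ code is compiled into .so files that are packaged inside the JAR, which makes debugging Gluten awkward and prone to all kinds of problems. This article focuses on how to debug Gluten's C++ code and presents two practical approaches: GDB and CLion.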

Preparation

First, build Gluten's C++ code in debug mode:

[iteblog@VM cpp]$ ./dev/buildbundle-veloxbe.sh --spark_version=3.5 --build_type=Debug

When the command finishes, it produces two important libraries: libvelox.so and libgluten.so.

Next, start the spark-sql client on the local machine in local mode, then use the jps command to find the client's process ID:

[iteblog@VM cpp]$ jps
3892287 Main
3893411 RemoteMavenServer36
3903443 Main
415645 SparkSubmit
421419 Jps

As shown, the spark-sql client (listed as SparkSubmit) has process ID 415645. Run any query in the spark-sql client first; after that we can start debugging Gluten's C++ code.
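
When several JVMs are running, picking the right PID by eye is error-prone. The following is a minimal sketch of automating that lookup; `find_spark_pid` is a hypothetical helper name, and it assumes the spark-sql client shows up as `SparkSubmit` in the jps listing, as it does in the output above.

```shell
# Hypothetical helper: read jps-style output on stdin and print the PID
# of the first JVM whose main class is listed as SparkSubmit.
find_spark_pid() {
  awk '$2 == "SparkSubmit" { print $1; exit }'
}

# Typical use on the machine running spark-sql:
#   pid=$(jps | find_spark_pid)

# Self-contained demo that replays the jps output shown above.
sample_jps='3892287 Main
3893411 RemoteMavenServer36
3903443 Main
415645 SparkSubmit
421419 Jps'

pid=$(printf '%s\n' "$sample_jps" | find_spark_pid)
echo "spark-sql PID: $pid"
```

If more than one SparkSubmit process is running, this sketch returns only the first one, so the result should still be checked before attaching a debugger.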

Debugging Gluten Code with GDB

Step 1: attach GDB to the running Java process. In a terminal, run:

[iteblog@VM cpp]$ gdb -p 415645
Type "apropos word" to search for commands related to "word".
Attaching to process 415645
[New LWP 415703]
[New LWP 415704]
[New LWP 418594]
[New LWP 418634]
[Thread debugging using libthread_db enabled]

Step 2: use info sharedlibrary to list the paths of the loaded .so files:

(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007f053bf5f180  0x00007f053bf60ee8  Yes (*)     /lib64/libtdsp_block.so
0x00007f053bd5bee0  0x00007f053bd5d888  Yes         /lib64/libtdsp.so
0x00007f053c387210  0x00007f053c3884f3  Yes (*)     /lib64/libonion.so
0x00007f053bb41840  0x00007f053bb4fd15  Yes (*)     /lib64/libpthread.so.0
0x00007f053b937e70  0x00007f053b938a82  Yes (*)     /lib64/libdl.so.2
0x00007f053b582c40  0x00007f053b6df4bd  Yes (*)     /lib64/libc.so.6
0x00007f053c163080  0x00007f053c187337  Yes         /lib64/ld-linux-x86-64.so.2
0x00007f053a3b1520  0x00007f053a45080a  Yes (*)     /lib64/libm.so.6
0x00007f053a19f430  0x00007f053a1a2770  Yes (*)     /lib64/librt.so.1
0x00007f0539cc8b80  0x00007f0539ccf126  Yes (*)     /lib64/libnss_sss.so.2
0x00007f0539ab67b0  0x00007f0539abcd32  Yes (*)     /lib64/libnss_files.so.2
0x00007f05397dfd90  0x00007f05398749d4  Yes (*)     /lib64/libnss_systemd.so.2
0x00007f05395cc690  0x00007f05395cf72a  Yes (*)     /lib64/libcap.so.2
0x00007f053937eab0  0x00007f05393b5032  Yes (*)     /lib64/libmount.so.1
0x00007f053915ae00  0x00007f053916ba55  Yes (*)     /lib64/libgcc_s.so.1
0x00007f0538f10740  0x00007f0538f420d2  Yes (*)     /lib64/libblkid.so.1
0x00007f0538cfeac0  0x00007f0538d027e1  Yes (*)     /lib64/libuuid.so.1
0x00007f0538ad9a80  0x00007f0538af17bf  Yes (*)     /lib64/libselinux.so.1
0x00007f0538850380  0x00007f05388ab973  Yes (*)     /lib64/libpcre2-8.so.0
0x00007f0538509a50  0x00007f05385eee48  Yes         /lib64/libnss_tjj.so.2
0x00007f05381f6b90  0x00007f05382acae2  Yes (*)     /lib64/libstdc++.so.6
0x00007f04d97fa210  0x00007f04d97fd7a3  Yes (*)     /lib64/libnss_dns.so.2
0x00007f04d95e4d60  0x00007f04d95f116e  Yes (*)     /lib64/libresolv.so.2
0x00007f04d9342520  0x00007f04d93ac124  Yes (*)     /lib64/libnss_myhostname.so.2
0x00007f04ca331700  0x00007f04caee0190  Yes         /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libgluten.so
0x00007f04d0096c10  0x00007f04d00ada33  Yes (*)     /usr/local/lib64/libglog.so.1
0x00007f04d005b690  0x00007f04d0072a44  Yes (*)     /usr/local/lib64/libgflags.so.2.2
0x00007f031e1e7a90  0x00007f032456f304  Yes         /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so
0x00007f04c979eb80  0x00007f04c97e76d4  Yes (*)     /lib64/libre2.so.0
0x00007f04d8018090  0x00007f04d80183f1  Yes (*)     /usr/local/lib/libboost_context.so.1.84.0
0x00007f04d0022a10  0x00007f04d003599c  Yes (*)     /usr/local/lib/libboost_filesystem.so.1.84.0
0x00007f04c9741be0  0x00007f04c97830a8  Yes (*)     /usr/local/lib/libboost_program_options.so.1.84.0
0x00007f04c96e75b0  0x00007f04c971fdb8  Yes (*)     /usr/local/lib/libboost_regex.so.1.84.0
0x00007f04d8013040  0x00007f04d8013101  Yes (*)     /usr/local/lib/libboost_system.so.1.84.0
0x00007f04c96ca660  0x00007f04c96d8998  Yes (*)     /usr/local/lib/libboost_thread.so.1.84.0
0x00007f04d80090e0  0x00007f04d800a002  Yes (*)     /usr/local/lib/libboost_atomic.so.1.84.0
0x00007f04c94b08d0  0x00007f04c94bbeaf  Yes (*)     /lib64/libdouble-conversion.so.3
0x00007f04c9265a20  0x00007f04c9297eb1  Yes (*)     /lib64/libevent-2.1.so.6
0x00007f04c903f740  0x00007f04c904cb27  Yes (*)     /lib64/libz.so.1
0x00007f04c8dc2670  0x00007f04c8e131ea  Yes (*)     /lib64/libssl.so.1.1
0x00007f04c892b000  0x00007f04c8ad6700  Yes (*)     /lib64/libcrypto.so.1.1
0x00007f04c868b5d0  0x00007f04c86a2516  Yes (*)     /lib64/liblzma.so.5
0x00007f04c866b550  0x00007f04c86826eb  Yes (*)     /lib64/liblz4.so.1
0x00007f04c83c9e70  0x00007f04c845667a  Yes (*)     /lib64/libzstd.so.1
0x00007f04c81725a0  0x00007f04c81ab80e  Yes (*)     /lib64/libdwarf.so.1
0x00007f04c3db4b40  0x00007f04c3dec5f9  Yes (*)     /lib64/libsodium.so.23
0x00007f04c21fe4a0  0x00007f04c21fe559  Yes (*)     /lib64/libicudata.so.60
0x00007f04c1e0cdc0  0x00007f04c1f6945e  Yes (*)     /lib64/libicui18n.so.60
0x00007f04c19d9fc0  0x00007f04c1aab2ea  Yes (*)     /lib64/libicuuc.so.60
0x00007f04c1762660  0x00007f04c1772e7f  Yes (*)     /lib64/libelf.so.1
0x00007f04c154f940  0x00007f04c155c5e6  Yes (*)     /lib64/libbz2.so.1
0x00007f02c42d76f0  0x00007f02c42f5ff8  Yes (*)     /tmp/liblz4-java-6335363052964923188.so
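
The listing is long, and only the libvelox.so (or libgluten.so) row matters for the next step. As a rough sketch, the row can be filtered out of saved `info sharedlibrary` output with awk; `lib_load_info` is a hypothetical helper name, and the sample rows are copied from the listing above.

```shell
# Hypothetical helper: read `info sharedlibrary`-style output on stdin and
# print "<From address> <path>" for the library whose file name is $1.
lib_load_info() {
  awk -v lib="$1" '
    {
      n = split($NF, parts, "/")   # the last field is the library path
      if (parts[n] == lib) { print $1, $NF; exit }
    }'
}

# Self-contained demo using three rows from the listing above.
sample='0x00007f053915ae00  0x00007f053916ba55  Yes (*)     /lib64/libgcc_s.so.1
0x00007f04ca331700  0x00007f04caee0190  Yes         /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libgluten.so
0x00007f031e1e7a90  0x00007f032456f304  Yes         /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so'

info=$(printf '%s\n' "$sample" | lib_load_info libvelox.so)
addr=${info%% *}   # text before the first space: the From address
path=${info#* }    # text after the first space: the library path

# Print the GDB command that the next step types by hand.
echo "add-symbol-file $path $addr"
```

Against a live process, the same helper could be fed from `gdb -p <pid> -batch -ex 'info sharedlibrary'`, keeping in mind that attaching pauses the JVM while GDB is connected.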

Step 3: load the symbol table of the .so, using the address from the From column of the info sharedlibrary output. To debug Velox code, for example, enter:

(gdb) add-symbol-file /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so 0x00007f031e1e7a90
add symbol table from file "/tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so" at
        .text_addr = 0x7f031e1e7a90
(y or n) y
Reading symbols from /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so...
(gdb) 

Step 4: set a breakpoint at a specific file and line. For example, to break at line 159 of velox/dwio/parquet/reader/ParquetReader.cpp:

(gdb) break ParquetReader.cpp:159
Breakpoint 1 at 0x7f03232d4cee: file /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp, line 159.
(gdb) 

GDB reports that the breakpoint was set successfully.

Step 5: run continue to resume execution. Once a query executed in the spark-sql client reaches line 159 of ParquetReader.cpp, the program stops:

(gdb) continue
Continuing.
[Switching to Thread 0x7f02a38fe700 (LWP 443416)]

Thread 144 "Executor task l" hit Breakpoint 1, facebook::velox::parquet::ReaderBase::ReaderBase (this=0x7f04300064c0, Python Exception <class 'ValueError'> Unsupported implementation for unique_ptr: std::__uniq_ptr_data<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput>, true, true>: 
input=..., options=...) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp:159
159       loadFileMetaData();
(gdb)

Step 6: once execution has stopped, use print to inspect the values of variables in ParquetReader.cpp. For example:

(gdb) print fileLength_
$1 = 2215
(gdb)

We can also use the backtrace command to view the call stack:

(gdb) backtrace
#0  facebook::velox::parquet::ReaderBase::ReaderBase (this=0x7f130c02e0a0, Python Exception <class 'ValueError'> Unsupported implementation for unique_ptr: std::__uniq_ptr_data<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput>, true, true>: 
input=..., options=...) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp:159
#1  0x00007f13832e752b in std::_Construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (__p=0x7f130c02e0a0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:119
#2  0x00007f13832e6c91 in std::allocator_traits<std::allocator<void> >::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (
    __p=0x7f130c02e0a0) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:635
#3  0x00007f13832e5f60 in std::_Sp_counted_ptr_inplace<facebook::velox::parquet::ReaderBase, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (this=0x7f130c02e090, __a=...) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:604
#4  0x00007f13832e4ba8 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<facebook::velox::parquet::ReaderBase, std::allocator<void>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (this=0x7f130c04e390, __p=@0x7f130c04e388: 0x0, __a=...) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:971
#5  0x00007f13832e3320 in std::__shared_ptr<facebook::velox::parquet::ReaderBase, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (this=0x7f130c04e388, __tag=...) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1712
#6  0x00007f13832e19c1 in std::shared_ptr<facebook::velox::parquet::ReaderBase>::shared_ptr<std::allocator<void>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (
    this=0x7f130c04e388, __tag=...) at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr.h:464
#7  0x00007f13832df857 in std::make_shared<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> ()
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr.h:1010
#8  0x00007f13832d88df in facebook::velox::parquet::ParquetReader::ParquetReader (this=0x7f130c04e380, Python Exception <class 'ValueError'> Unsupported implementation for unique_ptr: std::__uniq_ptr_data<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput>, true, true>: 
input=..., options=...) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp:992
#9  0x00007f13832d1a90 in std::make_unique<facebook::velox::parquet::ParquetReader, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> ()
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_ptr.h:1065
#10 0x00007f13832d19c4 in facebook::velox::parquet::ParquetReaderFactory::createReader (this=0x7f15b398cb80, Python Exception <class 'ValueError'> Unsupported implementation for unique_ptr: std::__uniq_ptr_data<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput>, true, true>: 
input=..., options=...) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/./velox/dwio/parquet/reader/ParquetReader.h:119
#11 0x00007f137e6b2506 in facebook::velox::connector::hive::SplitReader::createReader (this=0x7f130c005370, metadataFilter=std::shared_ptr<facebook::velox::common::MetadataFilter> (empty) = {...}, rowIndexColumn=std::shared_ptr<facebook::velox::connector::hive::HiveColumnHandle> (empty) = {...})
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/SplitReader.cpp:261
#12 0x00007f137e6b1c5a in facebook::velox::connector::hive::SplitReader::prepareSplit (this=0x7f130c005370, metadataFilter=std::shared_ptr<facebook::velox::common::MetadataFilter> (empty) = {...}, runtimeStats=..., 
    rowIndexColumn=std::shared_ptr<facebook::velox::connector::hive::HiveColumnHandle> (empty) = {...}) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/SplitReader.cpp:149
#13 0x00007f137e68be8b in facebook::velox::connector::hive::HiveDataSource::addSplit (this=0x7f130c04ec60, split=std::shared_ptr<facebook::velox::connector::ConnectorSplit> (use count 5, weak count 0) = {...})
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/HiveDataSource.cpp:263
#14 0x00007f13826d26b7 in facebook::velox::exec::TableScan::getOutput (this=0x7f130c042a60) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/TableScan.cpp:196
#15 0x00007f13824a0115 in facebook::velox::exec::Driver::runInternal (this=0x7f130c02f850, self=std::shared_ptr<facebook::velox::exec::Driver> (use count 2, weak count 3) = {...}, blockingState=std::shared_ptr<facebook::velox::exec::BlockingState> (empty) = {...}, result=
    std::shared_ptr<facebook::velox::RowVector> (empty) = {...}) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp:637
#16 0x00007f138249e2b9 in facebook::velox::exec::Driver::next (this=0x7f130c02f850, blockingState=std::shared_ptr<facebook::velox::exec::BlockingState> (empty) = {...}) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp:365
#17 0x00007f1382554012 in facebook::velox::exec::Task::next (this=0x7f130c03bde0, future=0x0) at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:646
#18 0x00007f137e22cb08 in gluten::WholeStageResultIterator::next (this=0x7f130c03ae80) at /home/iteblog/data/code/apache/incubator-gluten/cpp/velox/compute/WholeStageResultIterator.cc:197
#19 0x00007f155b1c2d95 in gluten::ResultIterator::getNext (this=0x7f130c03a810) at /home/iteblog/data/code/apache/incubator-gluten/cpp/core/compute/ResultIterator.h:83
#20 0x00007f155b1c2bba in gluten::ResultIterator::hasNext (this=0x7f130c03a810) at /home/iteblog/data/code/apache/incubator-gluten/cpp/core/compute/ResultIterator.h:45
#21 0x00007f155b1ba22b in Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext (env=0x7f1408003a70, wrapper=0x7f1529bf7050, iterHandle=60129542148) at /home/iteblog/data/code/apache/incubator-gluten/cpp/core/jni/JniWrapper.cc:412
#22 0x00007f159ee353ef in ?? ()
#23 0x00007f1529bf7030 in ?? ()
#24 0x00007f15b4d4aca0 in CallInfo::set_common(KlassHandle, KlassHandle, methodHandle, methodHandle, CallInfo::CallKind, int, Thread*) () from /usr/lib/jvm/TencentKona-8.0.19-422/jre/lib/amd64/server/libjvm.so
#25 0x00007f159ee1c8f0 in ?? ()
#26 0x0000000000000000 in ?? ()

Those are the basic steps for debugging Gluten's C++ code with GDB. They let us see how the code actually runs and help us locate and fix potential problems.
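
For repeated debugging sessions, the interactive commands from steps 4 to 6 can be kept in a GDB command file and replayed. The following is only a sketch under that assumption: `write_gdb_script` is a hypothetical helper, and the breakpoint location and variable name are the ones used in the walkthrough above.

```shell
# Hypothetical helper: write a GDB command file that replays the
# breakpoint, continue, print and backtrace commands from the steps above.
#   $1: path of the command file to create
#   $2: breakpoint location, e.g. ParquetReader.cpp:159
write_gdb_script() {
  cat > "$1" <<EOF
break $2
continue
print fileLength_
backtrace
EOF
}

script=$(mktemp)
write_gdb_script "$script" "ParquetReader.cpp:159"
cat "$script"

# On the machine running spark-sql, the file could then be replayed with:
#   gdb -p 415645 -batch -x "$script"
```

With `-batch`, GDB exits after the last command in the file, so this form suits capturing a single backtrace rather than stepping through code interactively.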

Debugging Gluten Code with CLion

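GDB is powerful, but debugging through a graphical interface is often more convenient. This section shows how to debug Gluten's code with CLion.
As before, use jps to find the process ID of the spark-sql client. Then, in CLion, choose Run -> Attach to Process... from the menu (or press Ctrl+Alt+F5). The following dialog appears: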

[Screenshot: CLion Attach to Process dialog]

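Enter the spark-sql client's process ID in that dialog and click the Attach with Bundled GDB button. CLion's console then prints the following message, which means we are attached to the spark-sql client process: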

Debugger attached to process 415645

As before, set a breakpoint at line 159 of ParquetReader.cpp and then run a SQL query. When execution reaches line 159, the program stops at that line:

[Screenshot: CLion stopped at the breakpoint on line 159 of ParquetReader.cpp]

The screenshot shows the call stack leading to line 159 of ParquetReader.cpp, along with the values of the variables in that file.

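Unless otherwise stated, all articles on this blog are original.
Copyright of original articles belongs to 过往记忆大数据 (过往记忆); reproduction without permission is prohibited.
Link to this article: 【How to Debug Apache Gluten's C++ Code】(https://www.iteblog.com/archives/10236.html)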