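Apache Gluten is an open-source plugin that brings high-performance vectorized execution to big-data frameworks such as Apache Spark. Its core goal is to optimize the low-level execution path, reducing CPU and memory overhead and thereby speeding up complex analytical workloads. Gluten integrates Velox (the vectorized computation library open-sourced by Meta) as its default backend and uses a columnar in-memory format and SIMD instructions for efficient computation. It stays compatible with Spark's existing APIs, so users can migrate without changing their code.
Apache Spark is a compute engine written in Scala and Java, whereas Velox is written in C++. Gluten therefore relies heavily on JNI, with a lot of Java code calling into C++. That C++ code is compiled into .so files that are packaged inside the Jar, which tends to cause all sorts of trouble when debugging Gluten. This article shows how to debug Gluten's C++ code with two practical tools: GDB and CLion.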
Preparation
First, build Gluten's C++ code in debug mode:
[iteblog@VM cpp]$ ./dev/buildbundle-veloxbe.sh --spark_version=3.5 --build_type=Debug
This produces two important libraries, libvelox.so and libgluten.so.
Next, start a spark-sql client locally in local mode, then use jps to find the process ID of the spark-sql client:
[iteblog@VM cpp]$ jps
3892287 Main
3893411 RemoteMavenServer36
3903443 Main
415645 SparkSubmit
421419 Jps
The spark-sql client (the SparkSubmit entry) has process ID 415645. Run any query in the spark-sql client; after that, we can start debugging Gluten's C++ code.
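Picking the PID out of the jps listing by eye is error-prone when several JVMs are running. The sketch below filters the listing for the SparkSubmit entry. The function name spark_submit_pid is hypothetical, and the sketch assumes only one spark-sql client is running on the machine (it prints the first match).

```shell
# spark_submit_pid (hypothetical helper): read `jps` output on stdin and
# print the PID of the first process whose main class is SparkSubmit.
spark_submit_pid() {
  awk '$2 == "SparkSubmit" { print $1; exit }'
}
```

Typical use would be `jps | spark_submit_pid`, whose result can be passed to `gdb -p`.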
Debugging Gluten with GDB
Step 1: attach GDB to the running Java process. In a terminal, run:
[iteblog@VM cpp]$ gdb -p 415645
Type "apropos word" to search for commands related to "word".
Attaching to process 415645
[New LWP 415703]
[New LWP 415704]
[New LWP 418594]
[New LWP 418634]
[Thread debugging using libthread_db enabled]
Step 2: use info sharedlibrary to list the paths of the loaded .so files:
(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007f053bf5f180  0x00007f053bf60ee8  Yes (*)     /lib64/libtdsp_block.so
0x00007f053bd5bee0  0x00007f053bd5d888  Yes         /lib64/libtdsp.so
0x00007f053c387210  0x00007f053c3884f3  Yes (*)     /lib64/libonion.so
0x00007f053bb41840  0x00007f053bb4fd15  Yes (*)     /lib64/libpthread.so.0
0x00007f053b937e70  0x00007f053b938a82  Yes (*)     /lib64/libdl.so.2
0x00007f053b582c40  0x00007f053b6df4bd  Yes (*)     /lib64/libc.so.6
0x00007f053c163080  0x00007f053c187337  Yes         /lib64/ld-linux-x86-64.so.2
0x00007f053a3b1520  0x00007f053a45080a  Yes (*)     /lib64/libm.so.6
0x00007f053a19f430  0x00007f053a1a2770  Yes (*)     /lib64/librt.so.1
0x00007f0539cc8b80  0x00007f0539ccf126  Yes (*)     /lib64/libnss_sss.so.2
0x00007f0539ab67b0  0x00007f0539abcd32  Yes (*)     /lib64/libnss_files.so.2
0x00007f05397dfd90  0x00007f05398749d4  Yes (*)     /lib64/libnss_systemd.so.2
0x00007f05395cc690  0x00007f05395cf72a  Yes (*)     /lib64/libcap.so.2
0x00007f053937eab0  0x00007f05393b5032  Yes (*)     /lib64/libmount.so.1
0x00007f053915ae00  0x00007f053916ba55  Yes (*)     /lib64/libgcc_s.so.1
0x00007f0538f10740  0x00007f0538f420d2  Yes (*)     /lib64/libblkid.so.1
0x00007f0538cfeac0  0x00007f0538d027e1  Yes (*)     /lib64/libuuid.so.1
0x00007f0538ad9a80  0x00007f0538af17bf  Yes (*)     /lib64/libselinux.so.1
0x00007f0538850380  0x00007f05388ab973  Yes (*)     /lib64/libpcre2-8.so.0
0x00007f0538509a50  0x00007f05385eee48  Yes         /lib64/libnss_tjj.so.2
0x00007f05381f6b90  0x00007f05382acae2  Yes (*)     /lib64/libstdc++.so.6
0x00007f04d97fa210  0x00007f04d97fd7a3  Yes (*)     /lib64/libnss_dns.so.2
0x00007f04d95e4d60  0x00007f04d95f116e  Yes (*)     /lib64/libresolv.so.2
0x00007f04d9342520  0x00007f04d93ac124  Yes (*)     /lib64/libnss_myhostname.so.2
0x00007f04ca331700  0x00007f04caee0190  Yes         /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libgluten.so
0x00007f04d0096c10  0x00007f04d00ada33  Yes (*)     /usr/local/lib64/libglog.so.1
0x00007f04d005b690  0x00007f04d0072a44  Yes (*)     /usr/local/lib64/libgflags.so.2.2
0x00007f031e1e7a90  0x00007f032456f304  Yes         /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so
0x00007f04c979eb80  0x00007f04c97e76d4  Yes (*)     /lib64/libre2.so.0
0x00007f04d8018090  0x00007f04d80183f1  Yes (*)     /usr/local/lib/libboost_context.so.1.84.0
0x00007f04d0022a10  0x00007f04d003599c  Yes (*)     /usr/local/lib/libboost_filesystem.so.1.84.0
0x00007f04c9741be0  0x00007f04c97830a8  Yes (*)     /usr/local/lib/libboost_program_options.so.1.84.0
0x00007f04c96e75b0  0x00007f04c971fdb8  Yes (*)     /usr/local/lib/libboost_regex.so.1.84.0
0x00007f04d8013040  0x00007f04d8013101  Yes (*)     /usr/local/lib/libboost_system.so.1.84.0
0x00007f04c96ca660  0x00007f04c96d8998  Yes (*)     /usr/local/lib/libboost_thread.so.1.84.0
0x00007f04d80090e0  0x00007f04d800a002  Yes (*)     /usr/local/lib/libboost_atomic.so.1.84.0
0x00007f04c94b08d0  0x00007f04c94bbeaf  Yes (*)     /lib64/libdouble-conversion.so.3
0x00007f04c9265a20  0x00007f04c9297eb1  Yes (*)     /lib64/libevent-2.1.so.6
0x00007f04c903f740  0x00007f04c904cb27  Yes (*)     /lib64/libz.so.1
0x00007f04c8dc2670  0x00007f04c8e131ea  Yes (*)     /lib64/libssl.so.1.1
0x00007f04c892b000  0x00007f04c8ad6700  Yes (*)     /lib64/libcrypto.so.1.1
0x00007f04c868b5d0  0x00007f04c86a2516  Yes (*)     /lib64/liblzma.so.5
0x00007f04c866b550  0x00007f04c86826eb  Yes (*)     /lib64/liblz4.so.1
0x00007f04c83c9e70  0x00007f04c845667a  Yes (*)     /lib64/libzstd.so.1
0x00007f04c81725a0  0x00007f04c81ab80e  Yes (*)     /lib64/libdwarf.so.1
0x00007f04c3db4b40  0x00007f04c3dec5f9  Yes (*)     /lib64/libsodium.so.23
0x00007f04c21fe4a0  0x00007f04c21fe559  Yes (*)     /lib64/libicudata.so.60
0x00007f04c1e0cdc0  0x00007f04c1f6945e  Yes (*)     /lib64/libicui18n.so.60
0x00007f04c19d9fc0  0x00007f04c1aab2ea  Yes (*)     /lib64/libicuuc.so.60
0x00007f04c1762660  0x00007f04c1772e7f  Yes (*)     /lib64/libelf.so.1
0x00007f04c154f940  0x00007f04c155c5e6  Yes (*)     /lib64/libbz2.so.1
0x00007f02c42d76f0  0x00007f02c42f5ff8  Yes (*)     /tmp/liblz4-java-6335363052964923188.so
Step 3: load the symbol table of the .so, passing the address shown in the From column of info sharedlibrary. Taking the Velox code as an example:
(gdb) add-symbol-file /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so 0x00007f031e1e7a90
add symbol table from file "/tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so" at
    .text_addr = 0x7f031e1e7a90
(y or n) y
Reading symbols from /tmp/gluten-b373c4c4-8323-4d38-b98f-a4ebd0a1e234/jni/cbe3bc4f-472f-4e63-83f2-2c8397032b09/gluten-7729880650374993107/libvelox.so...
(gdb)
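Because the libraries are extracted to a randomly named directory under /tmp, the path and load address have to be looked up again on every run. As a rough sketch, the lookup can be scripted by filtering the info sharedlibrary listing. The helper name so_load_addr is hypothetical, and it assumes the listing has the column layout shown above (From address first, library path last).

```shell
# so_load_addr NAME (hypothetical helper): read `info sharedlibrary` output
# on stdin and print "<From address> <path>" for the first row whose path
# ends in /NAME. Header and non-address lines are skipped.
so_load_addr() {
  awk -v name="$1" '
    $1 ~ /^0x/ {
      path = $NF
      suffix = "/" name
      start = length(path) - length(suffix) + 1
      if (start >= 1 && substr(path, start) == suffix) {
        print $1, path
        exit
      }
    }'
}
```

One possible way to feed it is `gdb -p 415645 -batch -ex 'info sharedlibrary' 2>/dev/null | so_load_addr libvelox.so`; the two printed fields are the arguments to add-symbol-file, in reverse order.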
Step 4: set a breakpoint at a given file and line. For example, to break at line 159 of velox/dwio/parquet/reader/ParquetReader.cpp:
(gdb) break ParquetReader.cpp:159
Breakpoint 1 at 0x7f03232d4cee: file /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp, line 159.
(gdb)
GDB reports that the breakpoint was set.
Step 5: run continue to resume the process. Execution stops once a query in the spark-sql client reaches line 159 of ParquetReader.cpp:
(gdb) continue
Continuing.
[Switching to Thread 0x7f02a38fe700 (LWP 443416)]

Thread 144 "Executor task l" hit Breakpoint 1, facebook::velox::parquet::ReaderBase::ReaderBase (this=0x7f04300064c0, Python Exception <class 'ValueError'> Unsupported implementation for unique_ptr: std::__uniq_ptr_data<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput>, true, true>: input=..., options=...)
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp:159
159       loadFileMetaData();
(gdb)
Step 6: once the process has stopped, use print to inspect the values of variables in ParquetReader.cpp. For example:
(gdb) print fileLength_
$1 = 2215
(gdb)
We can also view the call stack with the backtrace command:
(gdb) backtrace
#0  facebook::velox::parquet::ReaderBase::ReaderBase (this=0x7f130c02e0a0, Python Exception <class 'ValueError'> Unsupported implementation for unique_ptr: std::__uniq_ptr_data<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput>, true, true>: input=..., options=...)
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp:159
#1  0x00007f13832e752b in std::_Construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (__p=0x7f130c02e0a0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/stl_construct.h:119
#2  0x00007f13832e6c91 in std::allocator_traits<std::allocator<void> >::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (__p=0x7f130c02e0a0)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/alloc_traits.h:635
#3  0x00007f13832e5f60 in std::_Sp_counted_ptr_inplace<facebook::velox::parquet::ReaderBase, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (this=0x7f130c02e090, __a=...)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:604
#4  0x00007f13832e4ba8 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<facebook::velox::parquet::ReaderBase, std::allocator<void>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (this=0x7f130c04e390, __p=@0x7f130c04e388: 0x0, __a=...)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:971
#5  0x00007f13832e3320 in std::__shared_ptr<facebook::velox::parquet::ReaderBase, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<void>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (this=0x7f130c04e388, __tag=...)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr_base.h:1712
#6  0x00007f13832e19c1 in std::shared_ptr<facebook::velox::parquet::ReaderBase>::shared_ptr<std::allocator<void>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> (this=0x7f130c04e388, __tag=...)
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr.h:464
#7  0x00007f13832df857 in std::make_shared<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> ()
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/shared_ptr.h:1010
#8  0x00007f13832d88df in facebook::velox::parquet::ParquetReader::ParquetReader (this=0x7f130c04e380, Python Exception <class 'ValueError'> Unsupported implementation for unique_ptr: std::__uniq_ptr_data<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput>, true, true>: input=..., options=...)
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/dwio/parquet/reader/ParquetReader.cpp:992
#9  0x00007f13832d1a90 in std::make_unique<facebook::velox::parquet::ParquetReader, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&> ()
    at /opt/rh/gcc-toolset-12/root/usr/include/c++/12/bits/unique_ptr.h:1065
#10 0x00007f13832d19c4 in facebook::velox::parquet::ParquetReaderFactory::createReader (this=0x7f15b398cb80, Python Exception <class 'ValueError'> Unsupported implementation for unique_ptr: std::__uniq_ptr_data<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput>, true, true>: input=..., options=...)
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/./velox/dwio/parquet/reader/ParquetReader.h:119
#11 0x00007f137e6b2506 in facebook::velox::connector::hive::SplitReader::createReader (this=0x7f130c005370, metadataFilter=std::shared_ptr<facebook::velox::common::MetadataFilter> (empty) = {...}, rowIndexColumn=std::shared_ptr<facebook::velox::connector::hive::HiveColumnHandle> (empty) = {...})
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/SplitReader.cpp:261
#12 0x00007f137e6b1c5a in facebook::velox::connector::hive::SplitReader::prepareSplit (this=0x7f130c005370, metadataFilter=std::shared_ptr<facebook::velox::common::MetadataFilter> (empty) = {...}, runtimeStats=..., rowIndexColumn=std::shared_ptr<facebook::velox::connector::hive::HiveColumnHandle> (empty) = {...})
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/SplitReader.cpp:149
#13 0x00007f137e68be8b in facebook::velox::connector::hive::HiveDataSource::addSplit (this=0x7f130c04ec60, split=std::shared_ptr<facebook::velox::connector::ConnectorSplit> (use count 5, weak count 0) = {...})
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/HiveDataSource.cpp:263
#14 0x00007f13826d26b7 in facebook::velox::exec::TableScan::getOutput (this=0x7f130c042a60)
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/TableScan.cpp:196
#15 0x00007f13824a0115 in facebook::velox::exec::Driver::runInternal (this=0x7f130c02f850, self=std::shared_ptr<facebook::velox::exec::Driver> (use count 2, weak count 3) = {...}, blockingState=std::shared_ptr<facebook::velox::exec::BlockingState> (empty) = {...}, result=std::shared_ptr<facebook::velox::RowVector> (empty) = {...})
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp:637
#16 0x00007f138249e2b9 in facebook::velox::exec::Driver::next (this=0x7f130c02f850, blockingState=std::shared_ptr<facebook::velox::exec::BlockingState> (empty) = {...})
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Driver.cpp:365
#17 0x00007f1382554012 in facebook::velox::exec::Task::next (this=0x7f130c03bde0, future=0x0)
    at /home/iteblog/data/code/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp:646
#18 0x00007f137e22cb08 in gluten::WholeStageResultIterator::next (this=0x7f130c03ae80)
    at /home/iteblog/data/code/apache/incubator-gluten/cpp/velox/compute/WholeStageResultIterator.cc:197
#19 0x00007f155b1c2d95 in gluten::ResultIterator::getNext (this=0x7f130c03a810)
    at /home/iteblog/data/code/apache/incubator-gluten/cpp/core/compute/ResultIterator.h:83
#20 0x00007f155b1c2bba in gluten::ResultIterator::hasNext (this=0x7f130c03a810)
    at /home/iteblog/data/code/apache/incubator-gluten/cpp/core/compute/ResultIterator.h:45
#21 0x00007f155b1ba22b in Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext (env=0x7f1408003a70, wrapper=0x7f1529bf7050, iterHandle=60129542148)
    at /home/iteblog/data/code/apache/incubator-gluten/cpp/core/jni/JniWrapper.cc:412
#22 0x00007f159ee353ef in ?? ()
#23 0x00007f1529bf7030 in ?? ()
#24 0x00007f15b4d4aca0 in CallInfo::set_common(KlassHandle, KlassHandle, methodHandle, methodHandle, CallInfo::CallKind, int, Thread*) () from /usr/lib/jvm/TencentKona-8.0.19-422/jre/lib/amd64/server/libjvm.so
#25 0x00007f159ee1c8f0 in ?? ()
#26 0x0000000000000000 in ?? ()
Those are the basic steps for debugging Gluten's C++ code with GDB. They let us see how the code actually runs and help us locate and fix problems.
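When the same breakpoint is needed across many debugging sessions, the interactive commands can be collected into a GDB command file. The following is only a sketch: the function name write_gdb_script and the output path used in the usage note are hypothetical, and it assumes the symbols for the target library are already available to GDB (otherwise an add-symbol-file line with the current path and address has to be added by hand).

```shell
# write_gdb_script LOCATION (hypothetical helper): print a GDB command file
# that sets a breakpoint at LOCATION (for example ParquetReader.cpp:159),
# resumes the process, and dumps the call stack when the breakpoint is hit.
write_gdb_script() {
  cat <<EOF
set pagination off
set breakpoint pending on
break $1
continue
backtrace
EOF
}
```

For example, `write_gdb_script ParquetReader.cpp:159 > /tmp/velox.gdb` followed by `gdb -p 415645 -x /tmp/velox.gdb` attaches and then waits in continue until a query in the spark-sql client reaches the breakpoint.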
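Debugging Gluten with CLion
GDB is powerful, but debugging through a graphical UI is often more convenient. This section shows how to debug Gluten's code with CLion.
As before, use jps to find the process ID of the spark-sql client. Then, in CLion, choose Run -> Attach to process... from the menu (or press Ctrl+Alt+F5). An attach dialog pops up.
Enter the spark-sql client's process ID in that dialog and click the Attach with Bundled GDB button. CLion's console then prints the following message, indicating that we are attached to the spark-sql client process: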
Debugger attached to process 415645
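As before, set a breakpoint at line 159 of ParquetReader.cpp and then run a SQL query. When execution reaches line 159, the program stops at that line in CLion.
CLion's debugger view shows the call stack leading to line 159 of ParquetReader.cpp, as well as the values of the variables in ParquetReader.cpp.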
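Unless otherwise stated, all articles on this blog are original. Copyright of original articles belongs to 过往记忆大数据 (过往记忆); reproduction without permission is prohibited.
Article link: 【如何调试 Apache Gluten 的 C++ 代码】 (How to Debug Apache Gluten's C++ Code) (https://www.iteblog.com/archives/10236.html)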