ChatGPT Took My Job! (Part Four)

The Intriguing Case of an Inefficient Byte Swap

Aug 06, 2025

Once we had all our bases covered in Part One (strlen benchmarks), Part Two (reversal benchmarks), and Part Three (assembly inspection), we are ready to dive into what ChatGPT did when confronted about its weird optimization guesses.

Just in case they got better or fixed their issues with user feedback, I have ran the questions again. Here is a summary.

My question was:

Can you please write a C++ function to reverse a C-string?

to which ChatGPT answered:

#include <iostream>
#include <cstring> // For strlen

void reverseCString(char* str) {
    if (str == nullptr) return;

    size_t len = std::strlen(str);
    size_t i = 0;
    size_t j = len - 1;

    while (i < j) {
        std::swap(str[i], str[j]);
        ++i;
        --j;
    }
}

Then I asked it if it could write a more optimized version of this algorithm. The answer was:

void reverseCString(char* str) {
    if (str == nullptr) return;

    char* end = str;
    while (*end) {
        ++end;
    }
    --end; // Move back from null terminator to last character

    // Swap characters from both ends
    while (str < end) {
        std::swap(*str, *end);
        ++str;
        --end;
    }
}

with the explanation:

Yes — we can optimize the C-string reversal function by:
Avoiding the use of strlen(), which scans the entire string to compute the length (O(n)) — we can instead scan once from both ends.
Combining length computation and reversal into a single pass, saving time and improving cache locality.

We saw from the first article that computing the string length the trivial way is very bad for any size that’s not a tiny string - less than 8 bytes. For any other case, the slowdown is greater than 600%! That was the first blunder.

Then we saw in the second article that the indexing method did not matter at all. If we did it using double pointers (this case) or indexing, the results were practically the same.

According to the third article, what really made a huge difference was the compiler finding out exactly what we wanted and simplifying the loop, which we only got to do with Clang 12+, never with GCC.

So to the point I’m making, ChatGPT’s second optimization was completely misguided and based on decades old assumptions, not taking in consideration the huge advancements that took place on compiler optimizations over the past 20 years due to advances in Data Flow Analysis, in particular SSA (Static Single Assignment).

Among the seminal papers from this era, we can cite:

Kildall, G. A. (1973). "A Unified Approach to Global Program Optimization." laid the groundwork by introducing iterative data flow analysis and its application to optimization.
Kam, J. B., & Ullman, J. D. (1976). "Global Data Flow Analysis and Iterative Algorithms." provided a rigorous mathematical framework, generalizing the approach to cover a wide range of problems.
Kam, J. B., & Ullman, J. D. (1977). "Monotone Data Flow Analysis Frameworks." established a general theory of monotone frameworks, which became the standard for designing data flow-based optimizations in compilers.
Hecht, M. S., & Ullman, J. D. (1977). "A Simple Algorithm for Global Data Flow Analysis Problems." made the algorithms practical for compiler implementation.
Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., & Zadeck, F. K. (1989). "An Efficient Method of Computing Static Single Assignment Form." extended data flow analysis by introducing SSA, which became a critical tool for modern compilers due to its ability to simplify and enhance data flow-based optimizations.

At this point I asked ChatGPT to compare its own string function with the CEO’s XOR-based string reversal algorithm, repeated below

char* reverseCStringMine(char *string_to_reverse) {
    char *pointer1, *pointer2;
    int length = strlen(string_to_reverse);
    for (pointer1 = string_to_reverse,
         pointer2 = string_to_reverse + length - 1;
         pointer2 > pointer1; ++pointer1, --pointer2) {
        *pointer1 ^= *pointer2;
        *pointer2 ^= *pointer1;
        *pointer1 ^= *pointer2;
    }
    return string_to_reverse;
}

To which ChatGPT answered:

Pros:
Compact and clever: Uses bitwise XOR to swap characters without a temp variable.
Reverses the string in place.
Cons:
Two passes: One to compute the length (strlen is O(n)), another to reverse (O(n/2) swaps).
Bitwise XOR swapping is harder for CPUs to optimize than a simple std::swap. Modern compilers will often turn std::swap(a, b) into optimal instructions.
XOR swap can cause undefined behavior or data corruption if both pointers point to the same memory (e.g., if the string has repeated characters at midpoint).
Readability is lower, and some compilers may not optimize this as efficiently as expected.

That was a much better answer than the original one - ChatGPT agreed that the XOR version was much better. Maybe it is learning?

Still, it misses the point with the one pass/two passes debacle as it does not matter at all to this algorithm.

Its point about std::swap is only slightly true as we saw in article three that the swap method matters only in the direct use of the swap function. Both modern compilers (GCC 10+ and Clang 12+) were able to figure out perfectly what both the XOR and the trivial temp-swap functions were trying to do.

Then I confronted ChatGPT with

Your last point is incorrect. strlen from glibc is highly optimized and uses SIMD instructions to rapidly scan through the string and find its length.

And ChatGPT’s answer was

You're correct: the strlen() function in glibc (and most modern C standard libraries) is highly optimized, often using:
SIMD instructions (e.g., SSE2, AVX2)
Word-aligned scanning
Zero-byte detection with bit tricks (like has_zero())
When Your Version Might Win
Your version can actually be faster when:
The string is long enough for strlen to fully leverage SIMD,
And the XOR operations are faster due to tight inner loop and no extra instructions.

Oh it was doing so well! It recognized the error but right after added that “the XOR operations are faster”. Bummer. And that was right after it had (correctly) stated that “Bitwise XOR swapping is harder for CPUs to optimize”.

And with that I close this series of four articles with a short one. Were three introductory articles needed to the present one? Absolutely! For some other writers, staying at rhetoric level stating that “A is better than B” and citing some obscure rule while not actually producing any assembly code or running any benchmarks, that would be acceptable. But It tend to avoid that, not only because it is pedant but also because I have been wrong many times with respect to compiler optimization in the past. Technology, both hardware technology and software technology, change much faster than we can cope with, with our busy agendas.

I sincerely hope that this walkthrough had some usefulness for you out there in the jungle trying to survive your day job and climb the mythical corporate ladder.

ChatGPT Took My Job! (Part Four)

The Intriguing Case of an Inefficient Byte Swap

When Your Version Might Win

Discussion about this post