VuGen String Comparison Behaviour

Anyone who works with VuGen should know to compare strings with the standard C function strcmp() rather than the equality operator (==).

In the example below, there are three string variables that each contain “hello world”. Comparing the strings using strcmp() shows that all the strings are the same, but comparing them using “==” gives TRUE for string1==string2, but FALSE for string1==string3.

I leave it as a challenge to the reader to explain this behaviour (please leave a comment below).

Here is the example code:

extern char* strcat(char *dest, const char *source);
 
Action()
{
  char* string1 = "hello world";
  char* string2 = "hello world";
  char buf[12];
  char* string3;
 
  strcpy(buf, "hello "); // strcpy, not strcat: buf is uninitialised at this point
  string3 = (char*)strcat(buf, "world");
  lr_output_message("string3: %s", string3);
 
  // Compare two identical strings using strcmp
  if (strcmp(string1, string2) == 0) {
    lr_output_message("string1 and string2 match using strcmp");
  } else {
    lr_output_message("string1 and string2 do not match using strcmp");
  }
 
  // Compare two identical strings using "=="
  if (string1 == string2) {
    lr_output_message("string1 and string2 match using \"==\"");
  } else {
    lr_output_message("string1 and string2 do not match using \"==\"");
  }
 
  // Compare two identical strings using strcmp
  if (strcmp(string1, string3) == 0) {
    lr_output_message("string1 and string3 match using strcmp");
  } else {
    lr_output_message("string1 and string3 do not match using strcmp");
  }
 
  // Compare two identical strings using "=="
  if (string1 == string3) {
    lr_output_message("string1 and string3 match using \"==\"");
  } else {
    lr_output_message("string1 and string3 do not match using \"==\"");
  }
 
  return 0;
}

Here is the output from the code:

Running Vuser...
Starting iteration 1.
Starting action Action.
Action.c(12): string3: hello world
Action.c(18): string1 and string2 match using strcmp
Action.c(25): string1 and string2 match using "=="
Action.c(32): string1 and string3 match using strcmp
Action.c(41): string1 and string3 do not match using "=="
Ending action Action.
Ending iteration 1.
Ending Vuser...

6 comments

Thanks Stuart, very well explained.

-MS

Stuart Moncrieff

DING DING DING. We have a winner!

Doug’s comment is spot-on.

The == operator compares the two pointers, that is, the memory address of the first character of each string, while strcmp() compares the strings character by character.

Here is some code to demonstrate:

Action()
{
  char* string1 = "hello world";
  char* string2 = "hello world";
  char buf[12];
  char* string3;
 
  strcpy(buf, "hello "); // strcpy, not strcat: buf is uninitialised at this point
  string3 = (char*)strcat(buf, "world");
  lr_output_message("string3: %s", string3);
 
  // Print memory addresses
  lr_output_message("string1 memory address: %u", string1);
  lr_output_message("string2 memory address: %u", string2);
  lr_output_message("string3 memory address: %u", string3);
 
  return 0;
}

Here is the output from the code:

Starting action Action.
Action.c(12): string3: hello world
Action.c(15): string1 memory address: 28181591
Action.c(16): string2 memory address: 28181591
Action.c(17): string3 memory address: 29622316
Ending action Action.
Ending iteration 1.

string1 and string2 have the same memory address because of a compiler optimisation. The compiler sees that the two string literals are identical, so it stores only one copy of the character array in memory and puts the address of its first element in both string1 and string2.

In Java, you would use the term “string pool”. More generally, the concept is known as “string interning”, and is common to many languages/compilers.
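
The difference between the two comparisons can be sketched in plain C. This is a minimal illustration rather than part of the original script: the helper names same_address and same_text are made up for this example.

```c
#include <string.h>

/* Returns 1 when both pointers hold the same address.
   This is all that the == operator tests on two char pointers. */
int same_address(const char *a, const char *b)
{
    return a == b;
}

/* Returns 1 when both strings contain the same characters.
   strcmp() walks both strings and returns 0 when they are equal. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Called from Action() with the variables from the script above, same_text(string1, string3) should report a match while same_address(string1, string3) should not, because buf lives in its own block of memory. Whether same_address(string1, string2) is true depends on whether the compiler merges identical literals, so it is not something to rely on.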

Doug Sayer-Jones

Hi,

The reason that string1==string2 is true is a product of compiler optimisation and nothing else.

The two “strings” are defined as char arrays, and the compiler will notice that they are in fact the same string. Because they are initialised at definition, the compiler will create only one array in memory and will place the memory address of the start of that array into both char pointers (string1 & string2).

In fact early C compilers did not do this optimisation and would have blindly created two arrays and put the appropriate starting memory addresses into each char pointer, and in this case string1==string2 would return false.

The reason string1==string3 is false is that the buf array is uninitialised at creation time so the compiler cannot create the buffer at the same memory location as that used to hold the “hello world” array.

The reason they never become true is that C does not have the concept of singleton string literals (or strings at all of course) that some modern languages do. So even after making the content of the buf array contain the same values, this does not move it in memory. So the start of the buf array (what is held in the pointer string3) will never be the same as the start of the array created with “hello world” in it.

Also please be aware that this should be ANSI C, not C++, so there is no concept of “objects”. A char array is just a piece of memory holding char values, and the static array between quotes is stored with ASCII zero (NUL) as its final character so that the standard C string library functions work.
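
The point about the terminating zero also explains why buf is declared with 12 elements: “hello world” has 11 visible characters plus the final NUL. The following is a rough sketch in plain C, with the hypothetical helper names storage_needed and build_hello_world invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Bytes needed to store s as a C string: its characters plus the final NUL. */
size_t storage_needed(const char *s)
{
    return strlen(s) + 1;
}

/* Builds "hello world" in buf.
   Returns 0 on success, or -1 if buf is too small to hold the result. */
int build_hello_world(char *buf, size_t size)
{
    if (size < storage_needed("hello world")) {
        return -1;
    }
    strcpy(buf, "hello ");  /* strcpy first: buf may contain garbage */
    strcat(buf, "world");   /* strcat appends and re-terminates with NUL */
    return 0;
}
```

With a 12-byte buffer the call succeeds and leaves no spare room, so the buffer size has no bearing on the result of either comparison.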

Right, string “hello ” is actually ‘hello ’ and when it’s concatenated to “world” the overall string becomes “hello world”. That’s why strcmp doesn’t work. If you replace the existing code with strncat(buf, "hello ", 6); in line 10, it should work.

[Stuart’s Reply: Nope. The strings are clearly identical when using strcmp, but using == shows inequality. Why?]

The == operator is used to compare primitive data types like int, char and double. It can also be used to compare two objects, but it will only check to see if they are the same objects, not if they hold the same contents.

so,

string1 == string2 --> checks only whether they are the same object (string1 and string2 are the same object).
string1 == string3 --> here they are different objects, as string3 is a buffer.

[Stuart’s Reply: Nice try, but the == operator does not compare types]

I think it’s because you have allocated more space to buf (12) and so when this is used to marry up with string3 it actually has more space in reserve although you can’t see it. By using the == it is able to tell the difference.

Alternatively, the == is somehow able to see the difference between how the two strings have been put together.

The big question is “Am I correct?”.

[Stuart’s Reply: No, you are not correct. :) ]
