Why would you ever store a reference to a Perl string?

| | TrackBacks (0)
To use less memory! You may think that

my $x = 'foo'; my $y = $x;

would store the string 'foo' in one place, and then when $y gets changed, use the new string. It doesn't. To see this, we can use Devel::Peek from the perl debugger. We start it up with

perl -de0

which I have aliased to p in vim, because yes, I am that lazy and I loves me some debugger. Once we are in the debugger you have to use the module. Note that the debugger output is bolded.

 DB<1> use Devel::Peek

Then define a variable

 DB<2> $x = 'foo'

PROTIP: You can't use my $x in the interactive debugger because each line of input to the debugger gets wrapped in it's own lexical scope, a lexical variable (i.e one defined with my) will not exist outside of this inner scope.

Now we use the "x" debugger command, which prints out the result of executing some Perl code. We give it the code Dump $x, which calls Devel::Peek's Dump method on $x, which prints out low-level information about the internal properites of a variable.

 DB<3> x Dump $x
SV = PV(0x9542ca4) at 0x9565924
 PV = 0x95731b8 "foo"\0
 CUR = 3
 LEN = 4
 empty array

SV is Perl-internals-speak for "scalar value" and the PV means that it is a string. The reference count is 1, which means only one thing, itself, points to itself. When a variable goes out of scope, the REFCNT goes to 0 and the Perl garbage collector recycles it. The FLAGS containing POK tells us that is a valid string. Take note that the PV line shows us the memory address that the string is stored at, as well as it's value and that it is null-terminated.

Now let's set a new variable equal to $x.

 DB<4> $y = $x

Now let's take a look at it:

 DB<5> x Dump $y
SV = PV(0x9542c2c) at 0x958ab70
 PV = 0x958b968 "foo"\0
 CUR = 3
 LEN = 4
 empty array

The PV line is the most interesting. If you compare it to above, you will see that it is a different memory address! Not just the overhead of each Perl variable is being stored, but the string 'foo' is being stored in two different places!

Now imagine that our string is a few hundred megabytes, and you are manipulating it. You are really going to notice having a few extra copies of it around, aren't you! If you know that you have a lot of data structures that have the same string in them, use a reference to save memory. If you are running up against "perl Out of memory!", this could be a trick that gets you out of that bind without buying more RAM.

Let's check out the memory savings with Devel::Size, a nice module which can tell you how much memory your Perl variables are using.
DB<1> use Devel::Size qw/size total_size/              

Next we create a longish string.

DB<2> $x = 'i like eating memory' x 1000                                

We can use the size() function to see how much memory it is using:

DB<3> x size($x)                                                                                           
0  20036

Now let us dump a hash which has this string repeated as two values. We use the total_size() function, which follows references:

DB<4> x total_size( { a => "foo", b => $x, c => $x } )                             
0  40253

Now we do the same, but we store references to the strings:

DB<5> x total_size( { a => "foo", b => \$x, c => \$x } )                    
0  20249

Roughly half the size!

And that is why you would use references to Perl strings. Why else would you?

0 TrackBacks

Listed below are links to blogs that reference this entry: Why would you ever store a reference to a Perl string?.

TrackBack URL for this entry: http://leto.net/mt/mt-tb.cgi/152

About this Entry

This page contains a single entry by Jonathan Leto published on June 5, 2009 8:12 PM.

Test::More eating your memory? was the previous entry in this blog.

The Perl Foundation GSoC2009 Roundup is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.

Clicky Web Analytics 42