Counter to the advice in PythonSpeed/PerformanceTips, on Python 2.4 the following string concatenation is almost twice as fast:
as:
On the win32 Python 2.4 I'm seeing the join sample above complete in less than half the time of the concatenating sample.
- -db
Usually the join() is located outside the loop; that code makes this extremely hard, though, because the generated string refers back to itself. But that situation is not the norm. -- JürgenHermann 2005-08-01 06:07:51
Are you guys kidding? The whole page is contrived. A correct implementation using "join" is:
{{{#!python
from time import time

t = time()
s = 'lksdajflakjdsflku09uweoir'
r = [s]
for x in range(40):
    # collect the pieces in a list inside the loop
    r.append(s[len(s)/2:])
# join once, outside the loop
s = "".join(r)
print 'duration:', time()-t
}}}
which gives on PythonWin 2.4 (#60, Nov 30 2004, 09:34:21) [MSC v.1310 32 bit (Intel)] on win32 execution times:
1st duration: 54.4060001373 Last duration: 0.0160000324249
-- MikeRovner 2005-08-02 10:19:06
Mike, that code generates a very different (and much shorter) s. Note how the original code takes the half of the preconcatenated s, making the size grow exponentially (which generates megabytes of data). -- JürgenHermann 2005-08-30 18:44:05
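To make the size difference concrete, here is a minimal sketch that tracks only the string lengths instead of building the strings. It assumes the original loop had the form `s += s[len(s)/2:]`; that sample is not reproduced on this page, so this is a guess based on the description above. It is written for Python 3 (hence `//`), and the function names are made up for illustration.

```python
def self_referencing_length(initial_len, rounds):
    """Length of s after `rounds` steps of s += s[len(s)//2:] (assumed form)."""
    length = initial_len
    for _ in range(rounds):
        # s[len(s)//2:] contains len(s) - len(s)//2 characters
        length += length - length // 2
    return length


def join_outside_loop_length(initial_len, rounds):
    """Length when the same fixed half of the original s is appended each round."""
    half = initial_len - initial_len // 2
    return initial_len + rounds * half


if __name__ == "__main__":
    start = len('lksdajflakjdsflku09uweoir')
    for rounds in (1, 2, 3, 10, 40):
        print(rounds,
              self_referencing_length(start, rounds),
              join_outside_loop_length(start, rounds))
```

The first figure grows by roughly a factor of 1.5 per round while the second grows linearly, so the two timings are not measuring the same amount of work.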
-- DavidFord 2005-10-18 10:19:06 A few notes (your mileage may vary; this is a 4 MB file being stripped of unprintable characters):
- Regex replacement, rather than creating a list and joining it, is 2.5x faster than the tooling above.
- This is far slower than the equivalent Java code (around 4x slower) using String.charAt() and StringBuffers.
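For anyone who wants to repeat that comparison, a rough sketch of the two approaches follows. It is not the code used for the figures above. The definition of "printable" (ASCII 0x20 to 0x7E plus tab, newline and carriage return), the sample data and the function names are all assumptions, and it targets Python 3.

```python
import re
import timeit

# Assumption: "printable" means ASCII 0x20-0x7E plus tab, newline, carriage return.
UNPRINTABLE_RE = re.compile(r'[^\x20-\x7e\t\n\r]+')
KEEP = set(chr(c) for c in range(0x20, 0x7f)) | set('\t\n\r')


def strip_with_regex(text):
    """Remove unprintable characters with a single regex substitution."""
    return UNPRINTABLE_RE.sub('', text)


def strip_with_join(text):
    """Remove unprintable characters by filtering into a list and joining it."""
    return ''.join([ch for ch in text if ch in KEEP])


if __name__ == "__main__":
    # Made-up sample data; substitute the contents of a real file to compare.
    sample = 'some text\x00\x01 with\x07 junk\n' * 20000
    assert strip_with_regex(sample) == strip_with_join(sample)
    for func in (strip_with_regex, strip_with_join):
        best = min(timeit.repeat(lambda: func(sample), repeat=3, number=1))
        print('%-16s %.4f s' % (func.__name__, best))
```

The ratio between the two will depend on the Python version and on how much of the input is actually unprintable, so treat any single number with caution.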
I tend to create a stream of strings in a loop that should be concatenated. I wrote a script to test join vs += performance on randomly generated data and found that, for 100,000 strings of up to ten characters each, join is maybe 20% faster than +=. It certainly was not an order of magnitude faster. The results varied from one pass of the outer loop to the next, even though I tried to control garbage collection and made sure my Windows XP machine was 95% idle apart from running the script.
{{{#!python
from time import time
import random, gc

'''
Check speed of string concatenation vs joining in different versions of Python


Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.

****************************************************************
Personal firewall software may warn about the connection IDLE
makes to its subprocess using this computer's internal loopback
interface. This connection is not visible on any external
interface and no data is sent to or received from the Internet.
****************************************************************

IDLE 2.6.1
>>> ================================ RESTART ================================
>>>
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
jointime = 0.062 concattime = 0.0780001 join/concat = 79.49%
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
jointime = 0.062 concattime = 0.0780001 join/concat = 79.49%
jointime = 0.062 concattime = 0.0780001 join/concat = 79.49%
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
jointime = 0.062 concattime = 0.0779998 join/concat = 79.49%
jointime = 0.062 concattime = 0.0780001 join/concat = 79.49%
jointime = 0.062 concattime = 0.0940001 join/concat = 65.96%
jointime = 0.0469999 concattime = 0.0780001 join/concat = 60.26%


Python 2.5.3 (r253:67855, Dec 19 2008, 16:58:30) [MSC v.1310 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.

****************************************************************
Personal firewall software may warn about the connection IDLE
makes to its subprocess using this computer's internal loopback
interface. This connection is not visible on any external
interface and no data is sent to or received from the Internet.
****************************************************************

IDLE 1.2.3
>>> ================================ RESTART ================================
>>>
jointime = 0.0779998 concattime = 0.063 join/concat = 123.81%
jointime = 0.0780001 concattime = 0.0780001 join/concat = 100.00%
jointime = 0.109 concattime = 0.0939999 join/concat = 115.96%
jointime = 0.0780001 concattime = 0.062 join/concat = 125.81%
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
jointime = 0.0780001 concattime = 0.0779998 join/concat = 100.00%
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
jointime = 0.062 concattime = 0.172 join/concat = 36.05%
jointime = 0.079 concattime = 0.0779998 join/concat = 101.28%
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
>>>


PythonWin 2.4.3 - Enthought Edition 1.0.0 (#69, Aug 2 2006, 12:09:59) [MSC v.1310 32 bit (Intel)] on win32.
Portions Copyright 1994-2004 Mark Hammond (mhammond@skippinet.com.au) - see 'Help/About PythonWin' for further copyright information.
>>> jointime = 0.062 concattime = 0.0940001 join/concat = 65.96%
jointime = 0.0929999 concattime = 0.0780001 join/concat = 119.23%
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
jointime = 0.062 concattime = 0.0780001 join/concat = 79.49%
jointime = 0.062 concattime = 0.062 join/concat = 100.00%
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
jointime = 0.0780001 concattime = 0.0780001 join/concat = 100.00%
jointime = 0.0940001 concattime = 0.0940001 join/concat = 100.00%
jointime = 0.063 concattime = 0.0780001 join/concat = 80.77%
jointime = 0.0780001 concattime = 0.0780001 join/concat = 100.00%

'''

def stringstotest(n=100000, rmin=0, rmax=10):
    ' Returns a list of random strings of between rmin to rmax characters in length'
    allchars = 'qwertyuiopasdfghjklzxcvbnm'
    allchars += allchars.upper()

    return [ "".join( random.choice(allchars)
                      for i in xrange(random.randint(rmin, rmax)) )
             for j in xrange(n) ]

strings = stringstotest()

for i in xrange(10):
    gc.collect()
    gc.disable()
    # JOIN
    t0 = time()
    l = []  # list to "".join()
    for string in strings:
        l.append(string)
    joined = "".join(l)
    jointime = time() - t0

    gc.enable()
    del l, joined
    gc.collect()
    gc.disable()

    # CONCATENATION
    t0 = time()
    s = ''  # string to +=, concatenate
    for string in strings:
        s += string
    concattime = time() - t0

    del s

    print " jointime = %10g concattime = %10g join/concat = %6.2f%%" % (
        jointime, concattime, jointime/float(concattime)*100 )

    gc.enable()
}}}
-- Paddy3118 2009-01-01 09:48:00
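One possible reason for the run-to-run variation: the timings in the listing above all sit close to multiples of roughly 0.0156 s (0.047, 0.062, 0.078, 0.094, ...), which suggests that the resolution of `time()` on that machine was coarse compared with what was being measured. The following is a sketch of the same comparison using the standard `timeit` module and taking the best of several repeats. It is written for Python 3, and the function names and the fixed random seed are illustrative choices, not part of the original script.

```python
import random
import string
import timeit


def make_strings(n=100000, rmin=0, rmax=10, seed=0):
    """Return n random strings of rmin..rmax letters (seeded for repeatability)."""
    rng = random.Random(seed)
    letters = string.ascii_letters
    return ["".join(rng.choice(letters) for _ in range(rng.randint(rmin, rmax)))
            for _ in range(n)]


def build_with_join(strings):
    """Append to a list in the loop, then join once at the end."""
    parts = []
    for s in strings:
        parts.append(s)
    return "".join(parts)


def build_with_concat(strings):
    """Grow the result with += inside the loop."""
    result = ""
    for s in strings:
        result += s
    return result


def best_time(func, strings, repeat=5, number=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    return min(timeit.repeat(lambda: func(strings),
                             repeat=repeat, number=number)) / number


if __name__ == "__main__":
    data = make_strings()
    jointime = best_time(build_with_join, data)
    concattime = best_time(build_with_concat, data)
    print("jointime = %10g concattime = %10g join/concat = %6.2f%%"
          % (jointime, concattime, jointime / concattime * 100))
```

Taking the minimum over several repeats reduces the influence of whatever else the machine happens to be doing, but the ratio should still be expected to differ between Python versions and platforms.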