PERLTODO(1)           Perl Programmers Reference Guide           PERLTODO(1)

NAME
       perltodo - Perl TO-DO List

DESCRIPTION
       This is a list of wishes for Perl. The most up-to-date version of this
       file is at
       http://perl5.git.perl.org/perl.git/blob_plain/HEAD:/pod/perltodo.pod

       The tasks we think are smaller or easier are listed first. Anyone is
       welcome to work on any of these, but it's a good idea to first contact
       perl5-porters@perl.org to avoid duplication of effort, and to learn
       from any previous attempts. By all means contact a pumpking privately
       first if you prefer.

       Whilst patches to make the list shorter are most welcome, ideas to add
       to the list are also encouraged. Check the perl5-porters archives for
       past ideas, and any discussion about them. One set of archives may be
       found at:

           http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

       What can we offer you in return? Fame, fortune, and everlasting glory?
       Maybe not, but if your patch is incorporated, then we'll add your name
       to the AUTHORS file, which ships in the official distribution. How many
       other programming languages offer you 1 line of immortality?

Tasks that only need Perl knowledge
   Improve Porting/cmpVERSION.pl to work from git tags
       See Porting/release_managers_guide.pod for a bit more detail.

   Migrate t/ from custom TAP generation
       Many tests below t/ still generate TAP by "hand", rather than using
       library functions. As explained in "Writing a test" in perlhack, tests
       in t/ are written in a particular way to test that more complex
       constructions actually work before using them routinely. Hence they
       don't use "Test::More", but instead there is an intentionally simpler
       library, t/test.pl. However, quite a few tests in t/ have not been
       refactored to use it. Refactoring any of these tests, one at a time, is
       a useful thing TODO.

       The subdirectories base, cmd and comp, which contain the most basic
       tests, should be excluded from this task.

   Test that regen.pl was run
       There are various generated files shipped with the perl distribution,
       for things like header files generated from data. The generation
       scripts are written in perl, and all can be run by regen.pl. However,
       because they're written in perl, we can't run them before we've built
       perl. We can't run them as part of the Makefile, because changing files
       underneath make confuses it completely, and we don't want to run them
       automatically anyway, as they change files shipped by the distribution,
       something we seek not to do.

       If someone changes the data, but forgets to re-run regen.pl, then the
       generated files are out of sync. It would be good to have a test in
       t/porting that checks that the generated files are in sync, and fails
       otherwise, to alert someone before they make a poor commit. I suspect
       that this would require either adapting the scripts run from regen.pl
       to have dry-run options and invoking them with these, or refactoring
       them into a library that does the generation, which can be called by
       the scripts and by the test.

   Automate perldelta generation
       The perldelta file accompanying each release summarises the major
       changes. It's mostly manually generated currently, but some of that
       could be automated with a bit of perl, specifically the generation of

           Modules and Pragmata
           New Documentation
           New Tests

       See Porting/how_to_write_a_perldelta.pod for details.

   Remove duplication of test setup.
       Schwern notes that there's duplication of code: lots and lots of tests
       have some variation on the big block of $Is_Foo checks. We can safely
       put this into a file, change it to build an %Is hash and require it.
       Maybe just put it into test.pl. Throw in the handy tainting
       subroutines.

   POD -> HTML conversion in the core still sucks
       This is crazy, given just how simple POD purports to be, and how simple
       HTML can be. It's not actually as simple as it sounds, particularly
       with the flexibility POD allows for "=item", but it would be good to
       improve the visual appeal of the HTML generated, and to avoid it having
       any validation errors. See also "make HTML install work", as the layout
       of the installation tree is needed to improve the cross-linking.

       The addition of "Pod::Simple" and its related modules may make this
       task easier to complete.

   Make ExtUtils::ParseXS use strict;
       lib/ExtUtils/ParseXS.pm contains this line

           # use strict; # One of these days...

       Simply uncomment it, and fix all the resulting issues :-)

       The more practical approach, to break the task down into manageable
       chunks, is to work your way through the code from bottom to top, if
       necessary adding extra "{ ... }" blocks and turning on strict within
       them.

   Make Schwern poorer
       We should have tests for everything. When all the core's modules are
       tested, Schwern has promised to donate $500 to TPF. We may need
       volunteers to hold him upside down and shake vigorously in order to
       actually extract the cash.

   Improve the coverage of the core tests
       Use Devel::Cover to ascertain the core modules' test coverage, then add
       tests that are currently missing.

   test B
       A full test suite for the B module would be nice.

   A decent benchmark
       "perlbench" seems impervious to any recent changes made to the perl
       core. It would be useful to have a reasonable general benchmarking
       suite that roughly represented what current perl programs do, and
       measurably reported whether tweaks to the core improve, degrade or
       don't really affect performance, to guide people attempting to optimise
       the guts of perl. Gisle would welcome new tests for perlbench.

   fix tainting bugs
       Fix the bugs revealed by running the test suite with the "-t" switch
       (via "make test.taintwarn").

   Dual life everything
       As part of the "dists" plan, anything that doesn't belong in the
       smallest perl distribution needs to be dual lifed. Anything else can be
       too. Figure out what changes would be needed to package that module and
       its tests up for CPAN, and do so. Test it with older perl releases, and
       fix the problems you find.

       To make a minimal perl distribution, it's useful to look at
       t/lib/commonsense.t.

   Move dual-life pod/*.PL into ext
       Nearly all the dual-life modules have been moved to ext. However, we
       still need to move pod/*.PL into their respective directories in ext/.
       They're referenced by (at least) "plextract" in Makefile.SH and "utils"
       in win32/Makefile and win32/makefile.ml, and listed explicitly in
       win32/pod.mak, vms/descrip_mms.template and utils.lst.

   POSIX memory footprint
       Ilya observed that "use POSIX;" eats memory like there's no tomorrow,
       and at various times worked to cut it down. There is probably still fat
       to cut out: for example, POSIX passes Exporter some very memory-hungry
       data structures.

   embed.pl/makedef.pl
       There is a script embed.pl that generates several header files to
       prefix all of Perl's symbols in a consistent way, to provide some
       semblance of namespace support in "C". Functions are declared in
       embed.fnc, variables in interpvar.h. Quite a few of the functions and
       variables are conditionally declared there, using "#ifdef". However,
       embed.pl doesn't understand the C macros, so the rules about which
       symbols are present when are duplicated in makedef.pl. Writing things
       twice is bad, m'kay. It would be good to teach "embed.pl" to
       understand the conditional compilation, and hence remove the
       duplication, and the mistakes it has caused.

   use strict; and AutoLoad
       Currently if you write

           package Whack;
           use AutoLoader 'AUTOLOAD';
           use strict;
           1;
           __END__
           sub bloop {
               print join (' ', No, strict, here), "!\n";
           }

       then "use strict;" isn't in force within the autoloaded subroutines. It
       would be more consistent (and less surprising) to arrange for all
       lexical pragmas in force at the __END__ block to be in force within
       each autoloaded subroutine.

       There's a similar problem with SelfLoader.

   profile installman
       The installman script is slow. All it is doing is text processing,
       which we're told is something Perl is good at. So it would be nice to
       know what it is doing that is taking so much CPU, and where possible
       address it.

   enable lexical enabling/disabling of individual warnings
       Currently, warnings can only be enabled or disabled by category. There
       are times when it would be useful to quash a single warning, not a
       whole category.

Tasks that need a little sysadmin-type knowledge
       Or if you prefer, tasks that you would learn from, and broaden your
       skills base...

   make HTML install work
       There is an "installhtml" target in the Makefile. It's marked as
       "experimental". It would be good to get this tested, make it work
       reliably, and remove the "experimental" tag. This would include

       1.  Checking that cross-linking between various parts of the
           documentation works. In particular, that links work between the
           modules (files with POD in lib/) and the core documentation (files
           in pod/).

       2.  Working out how to split "perlfunc" into chunks, preferably one
           per function group, preferably with general case code that could
           be used elsewhere. Challenges here are correctly identifying the
           groups of functions that go together, and making the right named
           external cross-links point to the right page. Things to be aware
           of are "-X", groups such as "getpwnam" to "endservent", two or
           more "=item"s giving the different parameter lists, such as

               =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
               =item substr EXPR,OFFSET,LENGTH
               =item substr EXPR,OFFSET

           and different parameter lists having different meanings (e.g.
           "select").

   compressed man pages
       Be able to install them. This would probably need a configure test to
       see how the system does compressed man pages (same directory/different
       directory? same filename/different filename), as well as tweaking the
       installman script to compress as necessary.

   Add a code coverage target to the Makefile
       Make it easy for anyone to run Devel::Cover on the core's tests. The
       steps to do this manually are roughly

       ·   do a normal "Configure", but include Devel::Cover as a module to
           install (see INSTALL for how to do this)

       ·   make perl

       ·   cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

       ·   Process the resulting Devel::Cover database

       This just gives you the coverage of the .pms. To also get the C level
       coverage you need to

       ·   Additionally tell "Configure" to use the appropriate C compiler
           flags for "gcov"

       ·   make perl.gcov

           (instead of "make perl")

       ·   After running the tests, run "gcov" to generate all the .gcov
           files (including those down in the subdirectories of ext/).

       ·   (From the top level perl directory) run "gcov2perl" on all the
           ".gcov" files to get their stats into the cover_db directory.

       ·   Then process the Devel::Cover database

       It would be good to add a single switch to "Configure" to specify that
       you wanted to perform perl level coverage, and another to specify C
       level coverage, and have "Configure" and the Makefile do all the right
       things automatically.

   Make Config.pm cope with differences between built and installed perl
       Quite often vendors ship a perl binary compiled with their (pay-for)
       compilers. People install a free compiler, such as gcc. To work out
       how to build extensions, Perl interrogates %Config, so in this
       situation %Config describes compilers that aren't there, and extension
       building fails. This forces people into choosing between re-compiling
       perl themselves using the compiler they have, or only using modules
       that the vendor ships.

       It would be good to find a way to teach "Config.pm" about the
       installation setup, possibly involving probing at install time or
       later, so that the %Config in a binary distribution better describes
       the installed machine, when the installed machine differs from the
       build machine in some significant way.

   linker specification files
       Some platforms mandate that you provide a list of a shared library's
       external symbols to the linker, so the core already has the
       infrastructure in place to do this for generating shared perl
       libraries. My understanding is that the GNU toolchain can accept an
       optional linker specification file, and restrict visibility just to
       symbols declared in that file. It would be good to extend makedef.pl to
       support this format, and to provide a means within "Configure" to
       enable it. This would allow Unix users to test that the export list is
       correct, and to build a perl that does not pollute the global namespace
       with private symbols.

   Cross-compile support
       Currently "Configure" understands the "-Dusecrosscompile" option. This
       option arranges for building "miniperl" for the TARGET machine; this
       "miniperl" is then assumed to be copied to the TARGET machine and used
       as a replacement for the full "perl" executable.

       This could be done a little differently. Namely, "miniperl" should be
       built for the HOST, and then the full "perl" with extensions should be
       compiled for the TARGET. This, however, might require extra trickery
       for %Config: we have one config first for the HOST and then another
       for the TARGET. Tools like MakeMaker will be mightily confused. Having
       two different types of executables and libraries (HOST and TARGET)
       around makes life interesting for Makefiles and shell (and Perl)
       scripts. There is $Config{run}, normally empty, which can be used as
       an execution wrapper. Also note that in some
       cross-compilation/execution environments the HOST and the TARGET do
       not see the same filesystem(s), so $Config{run} may need to do some
       file/directory copying back and forth.

   roffitall
       Make pod/roffitall be updated by pod/buildtoc.

   Split "linker" from "compiler"
       Right now, Configure probes for two commands, and sets two variables:

       ·   "cc" (in cc.U)

           This variable holds the name of a command to execute a C compiler
           which can resolve multiple global references that happen to have
           the same name. Usual values are cc and gcc. Fervent ANSI
           compilers may be called c89. AIX has xlc.

       ·   "ld" (in dlsrc.U)

           This variable indicates the program to be used to link libraries
           for dynamic loading. On some systems, it is ld. On ELF systems,
           it should be $cc. Mostly, we'll try to respect the hint file
           setting.

       There is an implicit historical assumption, dating from around Perl
       5.000alpha-something, that $cc is also the correct command for linking
       object files together to make an executable. This may be true on Unix,
       but it's not true on other platforms, and there is a maze of
       workarounds in other places (such as Makefile.SH) to cope with this.

       Ideally, we should create a new variable to hold the name of the
       executable linker program, probe for it in Configure, and centralise
       all the special case logic there or in hints files.

       A small bikeshed issue remains: what to call it, given that $ld is
       already taken (arguably for the wrong thing now, but on SunOS 4.1 it is
       the command for creating dynamically-loadable modules) and $link could
       be confused with the Unix command line executable of the same name,
       which does something completely different. Andy Dougherty makes the
       counter argument "In parrot, I tried to call the command used to link
       object files and libraries into an executable link, since that's what
       my vaguely-remembered DOS and VMS experience suggested. I don't think
       any real confusion has ensued, so it's probably a reasonable name for
       perl5 to use."

       "Alas, I've always worried that introducing it would make things worse,
       since now the module building utilities would have to look for
       $Config{link} and institute a fall-back plan if it weren't found."
       That said, I can see $Config{link} being confusing, given that
       $Config{d_link} is true when (hard) links are available.

   Configure Windows using PowerShell
       Currently, Windows uses hard-coded config files to build the config.h
       for compiling Perl. Makefiles are also hard-coded and need to be hand
       edited prior to building Perl. While this makes it easy to create a
       perl.exe that works across multiple Windows versions, being able to
       accurately configure a perl.exe for a specific Windows version and
       Visual C++ version would be a nice enhancement. With PowerShell
       available on Windows XP and up, this may now be possible. Step 1 might
       be to investigate whether this is possible and use this to clean up
       our current makefile situation. Step 2 would be to see if there would
       be a way to use our existing metaconfig units to configure a Windows
       Perl or whether we go in a separate direction and make it so. Of
       course, we all know what step 3 is.

   decouple -g and -DDEBUGGING
       Currently Configure automatically adds "-DDEBUGGING" to the C compiler
       flags if it spots "-g" in the optimiser flags. The preprocessor macro
       "DEBUGGING" enables perl's command line "-D" options, but in the
       process makes perl slower. It would be good to disentangle this logic,
       so that C-level debugging with "-g" and Perl-level debugging with "-D"
       can easily be enabled independently.

Tasks that need a little C knowledge
       These tasks would need a little C knowledge, but don't need any
       specific background or experience with XS, or how the Perl interpreter
       works.

   Weed out needless PERL_UNUSED_ARG
       The C code uses the macro "PERL_UNUSED_ARG" to stop compilers warning
       about unused arguments. Often the arguments can't be removed, as there
       is an external constraint that determines the prototype of the
       function, so this approach is valid. However, there are some cases
       where "PERL_UNUSED_ARG" could be removed. Specifically:

       ·   The prototypes of (nearly all) static functions can be changed.

       ·   Unused arguments generated by shortcut macros are wasteful; the
           shortcut macro used can be changed.

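       The two cases above can be made concrete with a minimal C sketch. All
       names here (UNUSED_ARG_SKETCH, visit_fn, count_positive, clamp_before,
       clamp_after) are hypothetical stand-ins, not code from the perl
       sources. UNUSED_ARG_SKETCH only assumes the common cast-to-void idiom
       for what PERL_UNUSED_ARG does; perl.h's real definition varies by
       compiler.

```c
#include <stddef.h>

/* Hypothetical stand-in for perl.h's PERL_UNUSED_ARG: casting the
   argument to void is the usual way to silence the warning. */
#define UNUSED_ARG_SKETCH(x) ((void)(x))

/* Case 1: the prototype is dictated from outside (here a hypothetical
   callback type), so the unused parameter has to stay and the macro is
   the right tool. */
typedef int (*visit_fn)(void *ctx, int value);

static int count_positive(void *ctx, int value) {
    UNUSED_ARG_SKETCH(ctx);          /* required by visit_fn, unused here */
    return value > 0;
}

static int visit_all(const int *values, size_t n, visit_fn fn, void *ctx) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += fn(ctx, values[i]);
    return total;
}

/* Case 2, before: a static helper, so every caller is in this file,
   yet it still carries a parameter it never reads. */
static int clamp_before(void *ctx, int value, int limit) {
    UNUSED_ARG_SKETCH(ctx);
    return value > limit ? limit : value;
}

/* Case 2, after: the dead parameter is dropped from the prototype (and
   the callers updated), so no macro is needed at all. */
static int clamp_after(int value, int limit) {
    return value > limit ? limit : value;
}
```

       The callback keeps its macro because visit_fn fixes its prototype. The
       static helper computes the same result without one once the parameter
       is gone.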
   Modernize the order of directories in @INC
       The way @INC is laid out by default, one cannot upgrade core
       (dual-life) modules without overwriting files. This causes problems for
       binary package builders. One possible proposal is laid out in this
       message:
       http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html

   -Duse32bit*
       Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
       On these systems, it might be the default compilation mode, and there
       is currently no guarantee that passing no use64bitall option to the
       Configure process will build a 32-bit perl. Implementing -Duse32bit*
       options would be nice for perl 5.12.

   Profile Perl - am I hot or not?
       The Perl source code is stable enough that it makes sense to profile
       it, identify and optimise the hotspots. It would be good to measure the
       performance of the Perl interpreter using free tools such as
       cachegrind, gprof, and dtrace, and work to reduce the bottlenecks they
       reveal.

       As part of this, consider pp_hot.c, which is meant to contain the hot
       ops: the ops that are most commonly used. The idea is that by grouping
       them, their object code will be adjacent in the executable, so they
       have a greater chance of already being in the CPU cache (or swapped in)
       due to being near another op already in use.

       However, it's not clear whether these really are the most commonly
       used ops. So, as part of exercising your skills with coverage and
       profiling tools, you might want to determine which ops really are the
       most commonly used, and in turn suggest evictions and promotions to
       achieve a better pp_hot.c.

       One piece of Perl code that might make a good testbed is installman.

   Allocate OPs from arenas
       Currently all new OP structures are individually malloc()ed and
       free()d. All "malloc" implementations have space overheads, and are
       not as fast as custom allocators, so it would use both less memory and
       less CPU to allocate the various OP structures from arenas. The SV
       arena code can probably be re-used for this.

       Note that Configuring perl with "-Accflags=-DPL_OP_SLAB_ALLOC" will use
       Perl_Slab_alloc() to pack optrees into a contiguous block, which is
       probably superior to the use of OP arenas, especially from a cache
       locality standpoint. See "Profile Perl - am I hot or not?".

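       The following minimal C sketch shows what allocating from arenas,
       rather than making one malloc() call per OP, could look like.
       toy_op, op_pool, op_alloc(), op_free() and ARENA_SLOTS are
       hypothetical names. Real OPs come in several sizes. The free list
       threaded through freed slots is only assumed to resemble the SV arena
       approach mentioned above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size stand-in for an OP. */
typedef struct toy_op {
    struct toy_op *op_next;
    unsigned short op_type;
    unsigned char  op_flags;
} toy_op;

/* A free slot stores the free-list link in its own storage. */
typedef union slot {
    union slot *next_free;
    toy_op      op;
} slot;

#define ARENA_SLOTS 64   /* assumption: OPs per arena chunk */

typedef struct arena {
    struct arena *prev;          /* chain of chunks, for teardown */
    slot slots[ARENA_SLOTS];
} arena;

typedef struct {
    arena *chunks;
    slot  *free_list;
    size_t chunk_count;
} op_pool;

/* One malloc() provides ARENA_SLOTS OPs. */
static int pool_grow(op_pool *p) {
    arena *a = malloc(sizeof *a);
    if (!a)
        return 0;
    a->prev = p->chunks;
    p->chunks = a;
    p->chunk_count++;
    for (size_t i = 0; i < ARENA_SLOTS; i++) {
        a->slots[i].next_free = p->free_list;
        p->free_list = &a->slots[i];
    }
    return 1;
}

toy_op *op_alloc(op_pool *p) {
    if (!p->free_list && !pool_grow(p))
        return NULL;
    slot *s = p->free_list;
    p->free_list = s->next_free;
    return &s->op;
}

/* Freeing is O(1): push the slot back on the free list. The cast is
   valid because op is a member of the union. */
void op_free(op_pool *p, toy_op *op) {
    slot *s = (slot *)op;
    s->next_free = p->free_list;
    p->free_list = s;
}

void pool_destroy(op_pool *p) {
    while (p->chunks) {
        arena *prev = p->chunks->prev;
        free(p->chunks);
        p->chunks = prev;
    }
    p->free_list = NULL;
    p->chunk_count = 0;
}
```

       With 64 slots per arena, allocating 100 OPs costs two malloc() calls
       instead of 100. A freed OP is handed straight back by the next
       op_alloc().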
   Improve win32/wince.c
       Currently, numerous functions look virtually, if not completely,
       identical in both the "win32/wince.c" and "win32/win32.c" files, which
       can't be good.

   Use secure CRT functions when building with VC8 on Win32
       Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the
       basis that they were "unsafe" and introduced differently named secure
       versions of them as replacements, e.g. instead of writing

           FILE* f = fopen(__FILE__, "r");

       one should now write

           FILE* f;
           errno_t err = fopen_s(&f, __FILE__, "r");

       Currently, the warnings about these deprecations have been disabled by
       adding -D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to
       remove that warning suppressant and actually make use of the new secure
       CRT functions.

       There is also a similar issue with POSIX CRT function names like fileno
       having been deprecated in favour of ISO C++ conformant names like
       _fileno. These warnings are also currently suppressed by adding
       -D_CRT_NONSTDC_NO_DEPRECATE. It might be nice to do as Microsoft
       suggest here too, although, unlike the secure functions issue, there is
       presumably little or no benefit in this case.

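       A thin wrapper is one way to adopt the secure functions without losing
       other compilers. The following C sketch makes two assumptions:
       sketch_fopen() is a hypothetical name, not a perl function, and 1400
       is the _MSC_VER value of Visual C++ 2005 (VC8). It keeps the
       fopen_s() convention of returning an error code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical wrapper: use the secure CRT call on VC8 or later, plain
   fopen() elsewhere, and report failure as an errno value either way. */
static int sketch_fopen(FILE **fp, const char *path, const char *mode) {
#if defined(_MSC_VER) && _MSC_VER >= 1400
    return (int)fopen_s(fp, path, mode);
#else
    *fp = fopen(path, mode);
    if (*fp)
        return 0;
    return errno ? errno : EINVAL;   /* never report success on failure */
#endif
}
```

       Callers then test one return code on every platform. MSVC builds never
       reach the deprecated fopen().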
   Fix POSIX::access() and chdir() on Win32
       These functions currently take no account of DACLs and therefore do not
       behave correctly in situations where access is restricted by DACLs (as
       opposed to the read-only attribute).

       Furthermore, POSIX::access() behaves differently for directories having
       the read-only attribute set depending on what CRT library is being
       used. For example, the _access() function in the VC6 and VC7 CRTs
       (wrongly) claims that such directories are not writable, whereas in
       fact all directories are writable unless access is denied by DACLs.
       (In the case of directories, the read-only attribute actually only
       means that the directory cannot be deleted.) This CRT bug is fixed in
       the VC8 and VC9 CRTs (but, of course, the directory may still not
       actually be writable if access is indeed denied by DACLs).

       For the chdir() issue, see ActiveState bug #74552:
       http://bugs.activestate.com/show_bug.cgi?id=74552

       Therefore, DACLs should be checked both for consistency across CRTs and
       for the correct answer.

       (Note that perl's -w operator should not be modified to check DACLs. It
       has been written so that it reflects the state of the read-only
       attribute, even for directories (whatever CRT is being used), for
       symmetry with chmod().)

   strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()
       Maybe create a utility that checks after each libperl.a creation that
       none of the above (nor, *SHUDDER*, gets()) ever creeps back into
       libperl.a.

           nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'

       Note, of course, that this will only tell whether your platform is
       using those naughty interfaces.

   -D_FORTIFY_SOURCE=2, -fstack-protector
       Recent glibcs support "-D_FORTIFY_SOURCE=2" and recent gcc (4.1
       onwards?) supports "-fstack-protector", both of which give protection
       against various kinds of buffer overflow problems. These should
       probably be used for compiling Perl whenever available; Configure
       and/or hints files should be adjusted to probe for the availability of
       these features and enable them as appropriate.

   Arenas for GPs? For MAGIC?
       "struct gp" and "struct magic" are both currently allocated by
       "malloc". It might be a speed or memory saving to change to using
       arenas. Or it might not. It would need some suitable benchmarking
       first. In particular, "GP"s can probably be changed with minimal
       compatibility impact (probably nothing outside of the core, or even
       outside of gv.c, allocates them), but they probably aren't
       allocated/deallocated often enough for a speed saving. "MAGIC", on
       the other hand, is allocated/deallocated more often, but in turn is
       also something more externally visible, so changing the rules here may
       bite external code.

   Shared arenas
       Several SV body structs are now the same size, notably PVMG and PVGV,
       PVAV and PVHV, and PVCV and PVFM. It should be possible to allocate and
       return same-sized bodies from the same actual arena, rather than
       maintaining one arena for each. This could save 4-6K of memory per
       thread, memory that would otherwise be tied up in the not-yet-allocated
       part of an arena.

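       The following minimal C sketch shares arenas by body size rather than
       by body type. The body sizes are invented for illustration. All the
       names (body_type, body_pools, body_alloc(), body_free()) are
       hypothetical. Perl's real body structs and arena code differ.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical body types with made-up sizes; two pairs are deliberately
   the same size, mirroring PVMG/PVGV and PVAV/PVHV. */
enum body_type { BT_PVMG, BT_PVGV, BT_PVAV, BT_PVHV, BT_COUNT };

static const size_t body_size[BT_COUNT] = {
    [BT_PVMG] = 48, [BT_PVGV] = 48,   /* same size: can share */
    [BT_PVAV] = 32, [BT_PVHV] = 32,   /* same size: can share */
};

#define MAX_SIZE_CLASSES 8
#define BODIES_PER_ARENA 32

typedef struct free_body { struct free_body *next; } free_body;

/* One free list per distinct body size, not one per body type. */
typedef struct {
    size_t     size[MAX_SIZE_CLASSES];
    free_body *free_list[MAX_SIZE_CLASSES];
    size_t     n_classes;
    size_t     arenas_allocated;
} body_pools;

/* Finds (or registers) the size class for a type; assumes there are no
   more than MAX_SIZE_CLASSES distinct sizes. */
static size_t class_for(body_pools *bp, enum body_type t) {
    size_t want = body_size[t];
    for (size_t i = 0; i < bp->n_classes; i++)
        if (bp->size[i] == want)
            return i;
    bp->size[bp->n_classes] = want;
    bp->free_list[bp->n_classes] = NULL;
    return bp->n_classes++;
}

void *body_alloc(body_pools *bp, enum body_type t) {
    size_t c = class_for(bp, t);
    if (!bp->free_list[c]) {
        /* Carve a fresh arena into bodies of this size. Arenas are never
           released in this sketch. */
        char *arena = malloc(bp->size[c] * BODIES_PER_ARENA);
        if (!arena)
            return NULL;
        bp->arenas_allocated++;
        for (size_t i = 0; i < BODIES_PER_ARENA; i++) {
            free_body *b = (free_body *)(arena + i * bp->size[c]);
            b->next = bp->free_list[c];
            bp->free_list[c] = b;
        }
    }
    free_body *b = bp->free_list[c];
    bp->free_list[c] = b->next;
    return b;
}

void body_free(body_pools *bp, enum body_type t, void *body) {
    size_t c = class_for(bp, t);
    free_body *b = body;
    b->next = bp->free_list[c];
    bp->free_list[c] = b;
}
```

       Freeing a PVAV body and then allocating a PVHV body returns the same
       memory, so the second type never needs an arena of its own. The
       unused tail of such a per-type arena is the memory the item hopes to
       save.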
Tasks that need a knowledge of XS
       These tasks would need C knowledge, and roughly the level of knowledge
       of the perl API that comes from writing modules that use XS to
       interface to C.

   Write an XS cookbook
       Create pod/perlxscookbook.pod with short, task-focused 'recipes' in XS
       that demonstrate common tasks and good practices. (Some of these might
       be extracted from perlguts.) The target audience should be XS novices,
       who need more examples than perlguts but something less overwhelming
       than perlapi. Recipes should provide "one pretty good way to do it"
       instead of TIMTOWTDI.

       Rather than focusing on interfacing Perl to C libraries, such a
       cookbook should probably focus on how to optimize Perl routines by
       rewriting them in XS. This will likely be more motivating to those who
       mostly work in Perl but are looking to take the next step into XS.

       Deconstructing and explaining some simpler XS modules could be one way
       to bootstrap a cookbook. (List::Util? Class::XSAccessor?
       Tree::Ternary_XS?) Another option could be deconstructing the
       implementation of some simpler functions in op.c.

   Allow XSUBs to inline themselves as OPs
       For a simple XSUB, often the subroutine dispatch takes more time than
       the XSUB itself. The tokeniser already has the ability to inline
       constant subroutines; it would be good to provide a way to inline
       other subroutines.

       Specifically, the simplest approach looks to be to allow an XSUB to
       provide an alternative implementation of itself as a custom OP. A new
       flag bit in "CvFLAGS()" would signal to the peephole optimiser to take
       an optree such as this:

           b  <@> leave[1 ref] vKP/REFC ->(end)
           1     <0> enter ->2
           2     <;> nextstate(main 1 -e:1) v:{ ->3
           a     <2> sassign vKS/2 ->b
           8        <1> entersub[t2] sKS/TARG,1 ->9
           -           <1> ex-list sK ->8
           3              <0> pushmark s ->4
           4              <$> const(IV 1) sM ->5
           6              <1> rv2av[t1] lKM/1 ->7
           5                 <$> gv(*a) s ->6
           -              <1> ex-rv2cv sK ->-
           7                 <$> gv(*x) s/EARLYCV ->8
           -        <1> ex-rv2sv sKRM*/1 ->a
           9           <$> gvsv(*b) s ->a

       perform the symbol table lookup of "rv2cv" and "gv(*x)", locate the
       pointer to the custom OP that provides the direct implementation, and
       rewrite the optree something like:

           b  <@> leave[1 ref] vKP/REFC ->(end)
           1     <0> enter ->2
           2     <;> nextstate(main 1 -e:1) v:{ ->3
           a     <2> sassign vKS/2 ->b
           7        <1> custom_x ->8
           -           <1> ex-list sK ->7
           3              <0> pushmark s ->4
           4              <$> const(IV 1) sM ->5
           6              <1> rv2av[t1] lKM/1 ->7
           5                 <$> gv(*a) s ->6
           -              <1> ex-rv2cv sK ->-
           -                 <$> ex-gv(*x) s/EARLYCV ->7
           -        <1> ex-rv2sv sKRM*/1 ->a
           8           <$> gvsv(*b) s ->a

       i.e. the gv(*x) OP has been nulled and spliced out of the execution
       path, and the "entersub" OP has been replaced by the custom op.

       This approach should provide a measurable speed-up to simple XSUBs
       inside tight loops. Initially one would have to write the OP
       alternative implementation by hand, but it's likely that this should be
       reasonably straightforward for the type of XSUB that would benefit the
       most. Longer term, once the run-time implementation is proven, it
       should be possible to progressively update ExtUtils::ParseXS to
       generate OP implementations for some XSUBs.

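       The splice described above can be modelled with a toy C sketch. Every
       name in it (toy_op, toy_cv, TOY_CVf_HAS_CUSTOM_OP, peep(), demo()) is
       hypothetical, and the model is far simpler than perl's OP and CV
       structures and peephole optimiser. It only shows the gv op dropping
       out of the execution chain and the entersub op being given a direct
       implementation.

```c
#include <stddef.h>

/* Toy model only; not perl's OP, CV or CvFLAGS(). */
typedef struct toy_cv toy_cv;
typedef struct toy_op toy_op;

typedef struct {
    long    acc;      /* stands in for the argument stack */
    toy_cv *cur_cv;   /* set by a gv op, consumed by entersub */
    int     ops_run;  /* how many ops were dispatched */
} toy_vm;

typedef void (*toy_pp)(toy_op *op, toy_vm *vm);

struct toy_op {
    const char *name;
    toy_pp      pp;     /* like op_ppaddr */
    toy_op     *next;   /* execution order, like op_next */
    long        arg;    /* operand for const ops */
    toy_cv     *cv;     /* target sub for gv ops */
};

#define TOY_CVf_HAS_CUSTOM_OP 0x1   /* hypothetical new flag bit */

struct toy_cv {
    unsigned flags;
    long   (*xsub)(long);   /* generic XSUB entry point */
    toy_pp custom_pp;       /* hand-written direct OP implementation */
};

static long x_xsub(long v) { return 2 * v + 1; }

static void pp_const(toy_op *op, toy_vm *vm) { vm->acc = op->arg; }
static void pp_gv(toy_op *op, toy_vm *vm)    { vm->cur_cv = op->cv; }
static void pp_entersub(toy_op *op, toy_vm *vm) {
    (void)op;
    vm->acc = vm->cur_cv->xsub(vm->acc);   /* generic dispatch */
}
static void pp_custom_x(toy_op *op, toy_vm *vm) {
    (void)op;
    vm->acc = 2 * vm->acc + 1;             /* same result, no dispatch */
}

static long run_ops(toy_op *start, toy_vm *vm) {
    for (toy_op *op = start; op; op = op->next) {
        op->pp(op, vm);
        vm->ops_run++;
    }
    return vm->acc;
}

/* Peephole pass: where a gv op for a flagged cv feeds an entersub,
   splice the gv op out of the execution path and turn the entersub
   into the custom op. */
static void peep(toy_op *start) {
    for (toy_op *op = start; op && op->next; op = op->next) {
        toy_op *gv = op->next;
        toy_op *call = gv->next;
        if (gv->pp == pp_gv && call && call->pp == pp_entersub
            && (gv->cv->flags & TOY_CVf_HAS_CUSTOM_OP)) {
            op->next = call;               /* gv no longer executed */
            call->pp = gv->cv->custom_pp;
            call->name = "custom_x";
        }
    }
}

/* Builds const(20) -> gv(*x) -> entersub, optionally optimises it, runs
   it, and reports how many ops were dispatched. */
long demo(int optimise, int *ops_run) {
    static toy_cv x = { TOY_CVf_HAS_CUSTOM_OP, x_xsub, pp_custom_x };
    toy_op call  = { "entersub", pp_entersub, NULL,  0,  NULL };
    toy_op gv    = { "gv(*x)",   pp_gv,       &call, 0,  &x };
    toy_op konst = { "const",    pp_const,    &gv,   20, NULL };
    toy_vm vm = { 0, NULL, 0 };
    if (optimise)
        peep(&konst);
    long result = run_ops(&konst, &vm);
    *ops_run = vm.ops_run;
    return result;
}
```

       Both demo(0, &n) and demo(1, &n) compute x(20) = 41, but the
       optimised chain dispatches two ops instead of three.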
   Remove the use of SVs as temporaries in dump.c
       dump.c contains debugging routines to dump out the contents of perl
       data structures, such as "SV"s, "AV"s and "HV"s. Currently, the dumping
       code uses "SV"s for its temporary buffers, which was a logical initial
       implementation choice, as they provide ready-made memory handling.

       However, they also lead to a lot of confusion when it happens that what
       you're trying to debug is seen by the code in dump.c, correctly or
       incorrectly, as a temporary scalar it can use for a temporary buffer.
       It's also not possible to dump scalars before the interpreter is
       properly set up, such as during ithreads cloning. It would be good to
       progressively replace the use of scalars as string accumulation buffers
       with something much simpler, directly allocated by "malloc". The dump.c
       code is (or should be) only producing 7-bit US-ASCII, so output
       character sets are not an issue.

       Producing and proving an internal simple buffer allocation would make
       it easier to rewrite the internals of the PerlIO subsystem to avoid
       using "SV"s for its buffers, use of which can cause problems similar to
       those of dump.c, at similar times.

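       The following minimal C sketch shows the kind of "much simpler"
       buffer the item asks for. dumpbuf and its functions are hypothetical
       names, not part of perl. The sketch relies only on malloc()/realloc()
       and C99 vsnprintf(), so it needs no interpreter state.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical malloc()-backed string accumulator for 7-bit output. */
typedef struct {
    char  *buf;
    size_t len;   /* bytes used, not counting the trailing NUL */
    size_t cap;   /* bytes allocated */
} dumpbuf;

/* Make room for `extra` more bytes plus the NUL, doubling as needed. */
static int dumpbuf_reserve(dumpbuf *d, size_t extra) {
    size_t need = d->len + extra + 1;
    if (need <= d->cap)
        return 1;
    size_t cap = d->cap ? d->cap : 64;
    while (cap < need)
        cap *= 2;
    char *p = realloc(d->buf, cap);
    if (!p)
        return 0;
    d->buf = p;
    d->cap = cap;
    return 1;
}

int dumpbuf_append(dumpbuf *d, const char *s) {
    size_t n = strlen(s);
    if (!dumpbuf_reserve(d, n))
        return 0;
    memcpy(d->buf + d->len, s, n + 1);   /* copy includes the NUL */
    d->len += n;
    return 1;
}

/* printf-style append: measure with vsnprintf(NULL, 0, ...), grow, then
   format directly into the buffer. */
int dumpbuf_appendf(dumpbuf *d, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (n < 0 || !dumpbuf_reserve(d, (size_t)n))
        return 0;
    va_start(ap, fmt);
    vsnprintf(d->buf + d->len, d->cap - d->len, fmt, ap);
    va_end(ap);
    d->len += (size_t)n;
    return 1;
}

void dumpbuf_free(dumpbuf *d) {
    free(d->buf);
    d->buf = NULL;
    d->len = d->cap = 0;
}
```

       Because the output is 7-bit ASCII, plain char buffers and byte counts
       are all the bookkeeping needed.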
651 safely supporting POSIX SA_SIGINFO
652 Some years ago Jarkko supplied patches to provide support for the POSIX
653 SA_SIGINFO feature in Perl, passing the extra data to the Perl signal
654 handler.
655
656 Unfortunately, it only works with "unsafe" signals, because under safe
657 signals, by the time Perl gets to run the signal handler, the extra
658 information has been lost. Moreover, it's not easy to store it
659 somewhere, as you can't call mutexs, or do anything else fancy, from
660 inside a signal handler.
661
662 So it strikes me that we could provide safe SA_SIGINFO support
663
664 1. Provide global variables for two file descriptors
665
666 2. When the first request is made via "sigaction" for "SA_SIGINFO",
667 create a pipe, store the reader in one, the writer in the other
668
669 3. In the "safe" signal handler
670 ("Perl_csighandler()"/"S_raise_signal()"), if the "siginfo_t"
671 pointer non-"NULL", and the writer file handle is open,
672
673 1. serialise signal number, "struct siginfo_t" (or at least
674 the parts we care about) into a small auto char buff
675
676 2. "write()" that (non-blocking) to the writer fd
677
678 1. if it writes 100%, flag the signal in a counter
679 of "signals on the pipe" akin to the current
680 per-signal-number counts
681
682 2. if it writes 0%, assume the pipe is full. Flag
683 the data as lost?
684
685 3. if it writes partially, croak a panic, as your
686 OS is broken.
687
688 4. in the regular "PERL_ASYNC_CHECK()" processing, if there are
689 "signals on the pipe", read the data out, deserialise, build the
690 Perl structures on the stack (code in "Perl_sighandler()", the
691 "unsafe" handler), and call as usual.
692
693 I think that this gets us decent "SA_SIGINFO" support, without the
694 current risk of running Perl code inside the signal handler context
695 (with all the dangers of things like "malloc" corruption that that
696 currently offers us).
697
698 For more information see the thread starting with this message:
699 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html
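
The data flow of steps 2 to 4 can be modelled at Perl level. This is a minimal sketch under stated assumptions, not the proposed C code: "record_signal()" stands in for the safe C handler, "dispatch_signals()" for the "PERL_ASYNC_CHECK()" processing, and the record layout (signal number, si_code, si_pid, si_uid packed as four native ints) is an assumed choice of "the parts we care about".

```perl
use strict;
use warnings;
use IO::Handle;

# Assumed record layout: signal number, si_code, si_pid, si_uid.
my $RECORD_FORMAT = 'i4';
my $RECORD_SIZE   = length pack($RECORD_FORMAT, 0, 0, 0, 0);

# Step 2: one pipe, reader and writer kept in "global" variables.
pipe(my $sig_reader, my $sig_writer) or die "pipe: $!";
$_->blocking(0) for $sig_reader, $sig_writer;    # never block

my $signals_on_pipe = 0;    # analogue of the per-signal-number counts
my $signals_lost    = 0;

# Step 3 (stand-in for the safe C handler): serialise, then write.
sub record_signal {
    my ($signo, $code, $pid, $uid) = @_;
    my $buf = pack $RECORD_FORMAT, $signo, $code, $pid, $uid;
    my $n = syswrite $sig_writer, $buf;
    if (defined $n && $n == $RECORD_SIZE) {
        $signals_on_pipe++;                  # wrote 100%
    } elsif (!defined $n && $!{EAGAIN}) {
        $signals_lost++;                     # wrote 0%: pipe is full
    } else {
        die "panic: partial or failed write to signal pipe";
    }
    return;
}

# Step 4 (stand-in for PERL_ASYNC_CHECK()): read, deserialise, dispatch.
sub dispatch_signals {
    my ($callback) = @_;
    while ($signals_on_pipe > 0) {
        my $got = sysread $sig_reader, my $buf, $RECORD_SIZE;
        die "panic: short read from signal pipe"
            unless defined $got && $got == $RECORD_SIZE;
        $signals_on_pipe--;
        $callback->(unpack $RECORD_FORMAT, $buf);
    }
    return;
}
```

In C the handler side would call write(2) directly on the raw descriptor; what carries over is the fixed-size record and the 100%/0%/partial decision.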
700
701 autovivification
702 Make all autovivification consistent with respect to LVALUE/RVALUE
703 context and strict/no strict.
704
705 This task is incremental - even a little bit of work on it will help.
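
Two well-known cases illustrate the inconsistency: "exists" on a nested element looks like a pure rvalue test but creates the intermediate hash, and "foreach" over a dereferenced, nonexistent element creates an empty array, because "foreach" aliases its list and so puts it in lvalue context. A small demonstration; the variable names are illustrative:

```perl
use strict;
use warnings;

my %h;

# An rvalue-looking test still autovivifies the intermediate level.
my $found         = exists $h{outer}{inner};
my $outer_created = exists $h{outer};       # $h{outer} is now {}

# foreach aliases its elements, so the dereference is in lvalue
# context and $h{list} springs into existence as [].
for my $item (@{ $h{list} }) { }
my $list_created = ref $h{list} eq 'ARRAY';

printf "found=%d outer_created=%d list_created=%d\n",
    $found ? 1 : 0, $outer_created ? 1 : 0, $list_created ? 1 : 0;
```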
706
707 Unicode in Filenames
708 chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
709 opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
710 system, truncate, unlink, utime, -X. All of these could potentially
711 accept Unicode filenames as input or return them as output (and, in the
712 case of system and qx, Unicode in general as input to or output from
713 the shell). Whether a given filesystem/operating system pair understands
714 Unicode in filenames varies.
715
716 Known combinations that have some level of understanding include
717 Microsoft NTFS, Apple HFS+ (in Mac OS 9 and X) and Apple UFS (in Mac OS
718 X); NFS v4 is rumored to be Unicode, and of course Plan 9. How to
719 create Unicode filenames, what forms of Unicode are accepted and used
720 (UCS-2, UTF-16, UTF-8), what normalization form (if any) is used,
721 and so on, varies. Finding the right level of interfacing to Perl
722 requires some thought. Remember that an OS does not imply a particular
723 filesystem.
724
725 (The Windows -C command flag "wide API support" has been at least
726 temporarily retired in 5.8.1, and the -C has been repurposed, see
727 perlrun.)
728
729 Most probably the right way to do this is the approach described in
730 "Virtualize operating system access".
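
Until something like that exists, a program that wants Unicode filenames has to pick an encoding itself and pass byte strings to the filesystem functions. A minimal sketch, assuming the filesystem takes and returns UTF-8 encoded names (common on Linux, not guaranteed elsewhere); "create_unicode_file" and "list_unicode_dir" are hypothetical helpers:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Spec;

# Assumption: this filesystem takes and returns UTF-8 encoded names.
my $FS_ENCODING = 'UTF-8';

# Create an empty file whose name is a (decoded) character string.
sub create_unicode_file {
    my ($dir, $name) = @_;
    my $path = File::Spec->catfile($dir, encode($FS_ENCODING, $name));
    open my $fh, '>', $path or die "Can't create $path: $!";
    close $fh or die "Can't close $path: $!";
    return $path;
}

# readdir() returns byte strings; decode them back into characters.
sub list_unicode_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't opendir $dir: $!";
    my @names = map  { decode($FS_ENCODING, $_) }
                grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return sort @names;
}
```

The hard-wired $FS_ENCODING is exactly the per-filesystem, per-OS knowledge that Perl would need to discover for itself.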
731
732 Unicode in %ENV
733 Currently the %ENV entries are always byte strings. See "Virtualize
734 operating system access".
735
736 Unicode and glob()
737 Currently glob patterns and filenames returned from File::Glob::glob()
738 are always byte strings. See "Virtualize operating system access".
739
740 use less 'memory'
741 Investigate the trade-offs of switching out perl's choices on memory usage.
742 In particular, perl should be able to give memory back.
743
744 This task is incremental - even a little bit of work on it will help.
745
746 Re-implement ":unique" in a way that is actually thread-safe
747 The old implementation made bad assumptions on several levels. A good
748 90% solution might be just to make ":unique" work to share the string
749 buffer of SvPVs. That way large constant strings can be shared between
750 ithreads, such as the configuration information in Config.
751
752 Make tainting consistent
753 Tainting would be easier to use if it didn't take documented shortcuts
754 and allow taint to "leak" everywhere within an expression.
755
756 readpipe(LIST)
757 system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
758 running a shell. readpipe() (the function behind qx//) could be
759 similarly extended.
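
The list form of a piped "open" already runs a program without a shell, so the proposed behaviour can be approximated in Perl. A sketch; "readpipe_list" is a hypothetical name, and it insists on at least one argument because a pipe "open" with only a command may still hand that command to the shell:

```perl
use strict;
use warnings;

# Hypothetical helper: like qx// / readpipe(), but takes a LIST and
# runs the program directly, so arguments reach it verbatim.
sub readpipe_list {
    my @cmd = @_;
    die "readpipe_list: need a program and at least one argument\n"
        unless @cmd > 1;
    open(my $pipe, '-|', @cmd) or die "Can't run $cmd[0]: $!";
    my @lines = <$pipe>;
    close $pipe;    # like qx//, leaves the child's exit status in $?
    return wantarray ? @lines : join('', @lines);
}

# Shell metacharacters arrive as plain argument text.
my $out = readpipe_list($^X, '-e', 'print join q(|), @ARGV', 'a b', '$HOME;*');
print "$out\n";    # prints: a b|$HOME;*
```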
760
761 Audit the code for destruction ordering assumptions
762 Change 25773 notes
763
764 /* Need to check SvMAGICAL, as during global destruction it may be that
765 AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
766 is now part of the linked list of SV heads, rather than pointing to
767 the original body. */
768 /* FIXME - audit the code for other bugs like this one. */
769
770 when it added the "SvMAGICAL" check to
771
772 if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
773 MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);
774
775 Go through the core and look for similar assumptions that SVs have
776 particular types, as all bets are off during global destruction.
777
778 Extend PerlIO and PerlIO::Scalar
779 PerlIO::Scalar doesn't know how to truncate(). Implementing this would
780 require extending the PerlIO vtable.
781
782 Similarly the PerlIO vtable doesn't know about formats (write()), or
783 about stat(), or chmod()/chown(), utime(), or flock().
784
785 (For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
786 would mean.)
787
788 PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
789 opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
790 readlink().
791
792 See also "Virtualize operating system access".
793
794 -C on the #! line
795 It should be possible to make -C work correctly if found on the #!
796 line, given that all perl command line options are strict ASCII, and -C
797 changes only the interpretation of non-ASCII characters, and does not
798 apply to the script file handle. Making it work needs some investigation of the
799 ordering of function calls during startup, and (by implication) a bit
800 of tweaking of that order.
801
802 Organize error messages
803 Perl's diagnostics (error messages, see perldiag) could use
804 reorganizing and formalizing so that each error message has its own
805 unique id, stable for all eternity, categorized by severity, type, and
806 subsystem. (The error messages would be listed in a datafile outside
807 of the Perl source code, and the source code would only refer to the
808 messages by the id.) This clean-up and regularizing should apply to
809 all croak() messages.
810
811 This would enable all sorts of things: easier translation/localization
812 of the messages (though please do keep in mind the caveats of
813 Locale::Maketext about too straightforward approaches to translation),
814 filtering by severity, and instead of grepping for a particular error
815 message one could look for a stable error id. (Of course, changing the
816 error messages by default would break all the existing software
817 depending on some particular error message...)
818
819 This kind of functionality is known as message catalogs. Look for
820 inspiration for example in the catgets() system, possibly even use it
821 if available, but only if available, since not all platforms have
822 catgets().
823
824 For the really pure at heart, consider extending this item to cover
825 also the warning messages (see perllexwarn, "warnings.pl").
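
A message catalogue keyed by stable ids might look roughly like this in Perl. It is a sketch only: the ids, the "severity"/"subsystem" fields and the "diagnostic"/"croak_id" helpers are all hypothetical, and the catalogue is an in-memory hash rather than the external datafile proposed above:

```perl
use strict;
use warnings;

# Hypothetical catalogue: in the proposal this would be a datafile
# outside the source. The ids, fields and texts are illustrative only.
my %CATALOG = (
    'IO-0001'  => { severity => 'fatal',   subsystem => 'io',
                    text => q{Can't open %s: %s} },
    'NUM-0002' => { severity => 'warning', subsystem => 'numeric',
                    text => q{Argument "%s" isn't numeric} },
);

# Build a diagnostic from its stable id; the text is only a rendering.
sub diagnostic {
    my ($id, @args) = @_;
    my $entry = $CATALOG{$id} or die "unknown message id '$id'\n";
    return {
        id        => $id,
        severity  => $entry->{severity},
        subsystem => $entry->{subsystem},
        message   => sprintf($entry->{text}, @args),
    };
}

# Die with a structured error instead of a bare string.
sub croak_id { die diagnostic(@_) }

# Callers test the stable id, not the wording of the message.
eval { croak_id('IO-0001', '/no/such/file', 'No such file or directory') };
my $err = $@;
if (ref $err && $err->{id} eq 'IO-0001') {
    print "caught $err->{id} ($err->{severity}): $err->{message}\n";
}
```

Translations would then be alternative "text" entries per id, and filtering by severity becomes a hash lookup instead of a regexp over the message text.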
826
828 These tasks would need C knowledge, and knowledge of how the
829 interpreter works, or a willingness to learn.
830
831 forbid labels with keyword names
832 Currently "goto keyword" "computes" the label value:
833
834 $ perl -e 'goto print'
835 Can't find label 1 at -e line 1.
836
837 It is controversial whether the right way to avoid the confusion is to
838 forbid labels with keyword names, or if it would be better to always
839 treat bareword expressions after a "goto" as a label and never as a
840 keyword.
841
842 truncate() prototype
843 The prototype of truncate() is currently $$. It should probably be "*$"
844 instead. (This is changed in opcode.pl)
845
846 decapsulation of smart match argument
847 Currently "$foo ~~ $object" will die with the message "Smart matching a
848 non-overloaded object breaks encapsulation". It would be nice to allow
849 bypassing this by explicitly using the syntax "$foo ~~ %$object" or
850 "$foo ~~ @$object".
851
852 error reporting of [$a ; $b]
853 Using ";" inside brackets is a syntax error, and we don't propose to
854 change that by giving it any meaning. However, it's not reported very
855 helpfully:
856
857 $ perl -e '$a = [$b; $c];'
858 syntax error at -e line 1, near "$b;"
859 syntax error at -e line 1, near "$c]"
860 Execution of -e aborted due to compilation errors.
861
862 It should be possible to hook into the tokeniser or the lexer, so that
863 when a ";" is parsed where it is not legal as a statement terminator
864 (i.e. inside "{}" used as a hashref, "[]" or "()") it issues an error
865 along the lines of "';' isn't legal inside an expression - if you need
866 multiple statements use a do {...} block". See the thread starting at
867 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-09/msg00573.html
868
869 lexicals used only once
870 This warns:
871
872 $ perl -we '$pie = 42'
873 Name "main::pie" used only once: possible typo at -e line 1.
874
875 This does not:
876
877 $ perl -we 'my $pie = 42'
878
879 Logically all lexicals used only once should warn, if the user asks for
880 warnings. An unworked RT ticket (#5087) has been open for almost seven
881 years for this discrepancy.
882
883 UTF-8 revamp
884 The handling of Unicode is unclean in many places. For example, the
885 regexp engine matches in Unicode semantics whenever the string or the
886 pattern is flagged as UTF-8, but that should not be dependent on an
887 internal storage detail of the string.
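
A short demonstration of that storage dependence (often called "the Unicode bug"), assuming the default regexp semantics, i.e. no "use locale" and no "unicode_strings" feature in effect: two strings that compare equal match "\w" differently depending only on whether they are internally UTF-8.

```perl
use strict;
use warnings;

# The same one-character string, stored two ways internally.
my $native   = "\xE9";       # LATIN SMALL LETTER E WITH ACUTE, as a byte
my $upgraded = $native;
utf8::upgrade($upgraded);    # same characters, now flagged as UTF-8

my $same_string      = $native eq $upgraded;
my $native_is_word   = $native   =~ /\w/ ? 1 : 0;
my $upgraded_is_word = $upgraded =~ /\w/ ? 1 : 0;

print "eq: ", ($same_string ? 1 : 0),
      ", native =~ /\\w/: $native_is_word",
      ", upgraded =~ /\\w/: $upgraded_is_word\n";
```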
888
889 Properly Unicode safe tokeniser and pads.
890 The tokeniser isn't actually very UTF-8 clean. "use utf8;" is a hack -
891 variable names are stored in stashes as raw bytes, without the UTF-8
892 flag set. The pad API only takes a "char *" pointer, so that's all
893 bytes too. The tokeniser ignores the UTF-8-ness of "PL_rsfp", or any
894 SVs returned from source filters. All this could be fixed.
895
896 state variable initialization in list context
897 Currently this is illegal:
898
899 state ($a, $b) = foo();
900
901 In Perl 6, "state ($a) = foo();" and "(state $a) = foo();" have
902 different semantics, which is tricky to implement in Perl 5 as
903 currently they produce the same opcode trees. The Perl 6 design is
904 firm, so it would be good to implement the necessary code in Perl 5.
905 There are comments in "Perl_newASSIGNOP()" that show the code paths
906 taken by various assignment constructions involving state variables.
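
Until then, one workaround is to keep the list in a single state scalar holding an array reference, which is initialised only once. A sketch; "foo()" is a stand-in that counts its calls, and "get_pair" is a hypothetical name:

```perl
use strict;
use warnings;
use feature 'state';

my $calls = 0;
sub foo { $calls++; return (42, 'answer') }    # stand-in for foo()

sub get_pair {
    # Wanted (illegal in 5.12):  state ($x, $y) = foo();
    # Workaround: a single state scalar holding the list, which is
    # initialised only on the first call.
    state $saved = [ foo() ];
    my ($x, $y) = @$saved;
    return ($x, $y);
}

my @first  = get_pair();
my @second = get_pair();
```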
907
908 Implement $value ~~ 0 .. $range
909 It would be nice to extend the syntax of the "~~" operator to also
910 understand numeric (and maybe alphanumeric) ranges.
911
912 A does() built-in
913 Like ref(), only useful. It would call the "DOES" method on objects; it
914 would also tell whether something can be dereferenced as an
915 array/hash/etc., or used as a regexp, etc.
916 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html
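
Much of the proposed behaviour can be approximated with existing tools: "UNIVERSAL::DOES" for objects, "Scalar::Util::reftype" for what an unblessed reference can be dereferenced as, "overload::Method" for objects that overload dereferencing, and "re::is_regexp" for compiled regexps. A sketch; "does_role" is a hypothetical name chosen so as not to suggest it is the proposed built-in:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
require overload;

# Dereference overloads that let an object stand in for a container.
my %DEREF_OVERLOAD = (
    ARRAY  => '@{}', HASH => '%{}', CODE => '&{}',
    SCALAR => '${}', GLOB => '*{}',
);

# Hypothetical does(): true if $thing DOES $role (objects), or can be
# used as the given container type, or as a regexp.
sub does_role {
    my ($thing, $role) = @_;
    return 0 unless ref $thing;
    if (blessed $thing) {
        return 1 if $thing->DOES($role);
        my $op = $DEREF_OVERLOAD{$role};
        return 1 if $op && overload::Method($thing, $op);
    }
    return 1 if $role eq 'Regexp' && re::is_regexp($thing);
    return reftype($thing) eq $role ? 1 : 0;
}
```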
919
920 Tied filehandles and write() don't mix
921 There is no method on tied filehandles to allow them to be called back
922 by formats.
923
924 Propagate compilation hints to the debugger
925 Currently a debugger started with -dE on the command-line doesn't see
926 the features enabled by -E. More generally hints ($^H and "%^H") aren't
927 propagated to the debugger. Probably it would be a good thing to
928 propagate hints from the innermost non-"DB::" scope: this would make
929 code eval'ed in the debugger see the features (and strictures, etc.)
930 currently in scope.
931
932 Attach/detach debugger from running program
933 The old perltodo notes "With "gdb", you can attach the debugger to a
934 running program if you pass the process ID. It would be good to do this
935 with the Perl debugger on a running Perl program, although I'm not sure
936 how it would be done." ssh and screen do this with named pipes in /tmp.
937 Maybe we can too.
938
939 LVALUE functions for lists
940 The old perltodo notes that lvalue functions don't work for list or
941 hash slices. This would be good to fix.
942
943 regexp optimiser optional
944 The regexp optimiser is not optional. It should be configurable, to
945 allow its performance to be measured, and its bugs to be easily
946 demonstrated.
947
948 "/w" regex modifier
949 That flag would make it possible to match whole words, and also to interpolate
950 arrays as alternations. With it, "/P/w" would be roughly equivalent to:
951
952 do { local $"='|'; /\b(?:P)\b/ }
953
954 See
955 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html
957 for the discussion.
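
The "do" expansion can be packaged as a function today. A sketch of the intended semantics; "match_words" is a hypothetical name, and, as in the "do" block, the array elements are interpolated as regexp fragments rather than as literal strings:

```perl
use strict;
use warnings;

# Hypothetical emulation of the proposed /w: true if any element of
# @alternatives matches as a whole word in $string.
sub match_words {
    my ($string, @alternatives) = @_;
    return 0 unless @alternatives;    # an empty alternation would match anywhere
    local $" = '|';                   # interpolate the array as an alternation
    return $string =~ /\b(?:@alternatives)\b/ ? 1 : 0;
}
```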
958
959 optional optimizer
960 Make the peephole optimizer optional. Currently it performs two tasks
961 as it walks the optree - genuine peephole optimisations, and necessary
962 fixups of ops. It would be good to find an efficient way to switch out
963 the optimisations whilst keeping the fixups.
964
965 You WANT *how* many
966 Currently contexts are void, scalar and list. split has a special
967 mechanism in place to pass in the number of return values wanted. It
968 would be useful to have a general mechanism for this, backwards
969 compatible and with little speed hit. This would allow proposals such as
970 short circuiting sort to be implemented as a module on CPAN.
971
972 lexical aliases
973 Allow lexical aliases (maybe via the syntax "my \$alias = \$foo").
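
The closest existing construct is the alias that "foreach" creates: inside the loop body the loop variable is the aliased scalar itself. The proposal would in effect make such an alias available through an ordinary declaration instead of only for a loop body. A minimal illustration:

```perl
use strict;
use warnings;

my $foo = 1;

# Inside the loop body, $alias *is* $foo, so assigning to one
# changes the other.
for my $alias ($foo) {
    $alias = 42;
}

print "$foo\n";    # prints 42
```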
974
975 entersub XS vs Perl
976 At the moment pp_entersub is huge, and has code to deal with entering
977 both perl and XS subroutines. Subroutine implementations rarely change
978 between perl and XS at run time, so investigate using 2 ops to enter
979 subs (one for XS, one for perl) and swap between them if a sub is redefined.
980
981 Self-ties
982 Self-ties are currently illegal because they caused too many segfaults.
983 Maybe the causes of these could be tracked down and self-ties on all
984 types reinstated.
985
986 Optimize away @_
987 The old perltodo notes "Look at the "reification" code in "av.c"".
988
989 Virtualize operating system access
990 Implement a set of "vtables" that virtualizes operating system access
991 (open(), mkdir(), unlink(), readdir(), getenv(), etc.) At the very
992 least these interfaces should take SVs as "name" arguments instead of
993 bare char pointers; probably the most flexible and extensible way would
994 be for the Perl-facing interfaces to accept HVs. The system needs to
995 be per-operating-system and per-file-system hookable/filterable,
996 preferably both from XS and Perl level ("Files and Filesystems" in
997 perlport is good reading at this point; in fact, all of perlport is.)
998
999 This has actually already been implemented (but only for Win32); take a
1000 look at iperlsys.h and win32/perlhost.h. While all Win32 variants go
1001 through a set of "vtables" for operating system access, non-Win32
1002 systems currently go straight for the POSIX/Unix-style system/library
1003 calls. A similar system should be implemented for all
1004 platforms. The existing Win32 implementation probably does not need to
1005 survive alongside this proposed new implementation; the approaches
1006 could be merged.
1007
1008 What would this give us? One often-asked-for feature this would enable
1009 is using Unicode for filenames, and other "names" like %ENV, usernames,
1010 hostnames, and so forth. (See "When Unicode Does Not Happen" in
1011 perlunicode.)
1012
1013 But this kind of virtualization would also allow for things like
1014 virtual filesystems, virtual networks, and "sandboxes" (though as long
1015 as dynamic loading of random object code is allowed, not very safe
1016 sandboxes, since external code of course knows nothing of Perl's vtables).
1017 An example of a smaller "sandbox" is that this feature can be used to
1018 implement per-thread working directories: Win32 already does this.
1019
1020 See also "Extend PerlIO and PerlIO::Scalar".
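
The layering idea can be sketched at Perl level: a table of code refs per operation, and a filter that wraps another table. This is illustrative only; the table keys, "with_private_cwd" and the choice of operations are assumptions, not the iperlsys.h interface, and a real implementation would sit below the Perl ops in C:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical OS "vtable": one code ref per operation. Names arrive as
# ordinary Perl scalars (SVs), not char pointers.
my %native_os = (
    mkdir  => sub { CORE::mkdir($_[0]) },
    rmdir  => sub { CORE::rmdir($_[0]) },
    exists => sub { -e $_[0] },
    getenv => sub { $ENV{ $_[0] } },
);

# A filtering layer stacked on another layer: relative names are
# resolved against a private working directory (one way to get a
# per-thread cwd), then passed down unchanged.
sub with_private_cwd {
    my ($lower, $cwd) = @_;
    my $resolve = sub {
        my ($name) = @_;
        return File::Spec->file_name_is_absolute($name)
            ? $name
            : File::Spec->catfile($cwd, $name);
    };
    my %layer = %$lower;                  # inherit everything else
    for my $op (qw(mkdir rmdir exists)) {
        my $next = $lower->{$op};
        $layer{$op} = sub { $next->($resolve->($_[0])) };
    }
    return \%layer;
}
```

Stacking layers like this is what would make the system hookable per operating system and per filesystem; the private-cwd layer mirrors the per-thread working directories that Win32 already provides.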
1021
1022 Investigate PADTMP hash pessimisation
1023 The peephole optimiser converts constants used for hash key lookups to
1024 shared hash key scalars. Under ithreads, something is undoing this
1025 work. See
1026 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
1027
1028 Store the current pad in the OP slab allocator
1029 Currently we leak ops in various cases of parse failure. I suggested
1030 that we could solve this by always using the op slab allocator, and
1031 walking it to free ops. Dave comments that as some ops are already
1032 freed during optree creation one would have to mark which ops are
1033 freed, and not double free them when walking the slab. He notes that
1034 one problem with this is that for some ops you have to know which pad
1035 was current at the time of allocation, which does change. I suggested
1036 storing a pointer to the current pad in the memory allocated for the
1037 slab, and swapping to a new slab each time the pad changes. Dave thinks
1038 that this would work.
1039
1040 repack the optree
1041 Repacking the optree after execution order is determined could allow
1042 removal of NULL ops, and optimal ordering of OPs with respect to cache-
1043 line filling. The slab allocator could be reused for this purpose. I
1044 think that the best way to do this is to make it an optional step just
1045 before the completed optree is attached to anything else, and to use
1046 the slab allocator unchanged, so that freeing ops is identical whether
1047 or not this step runs. Note that the slab allocator allocates ops
1048 downwards in memory, so one would have to actually "allocate" the ops
1049 in reverse-execution order to get them contiguous in memory in
1050 execution order.
1051
1052 See
1053 http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html
1054
1055 Note that running this copy, and then freeing all the old location ops
1056 would cause their slabs to be freed, which would eliminate possible
1057 memory wastage if the previous suggestion is implemented, and we swap
1058 slabs more frequently.
1059
1060 eliminate incorrect line numbers in warnings
1061 This code
1062
1063 use warnings;
1064 my $undef;
1065
1066 if ($undef == 3) {
1067 } elsif ($undef == 0) {
1068 }
1069
1070 used to produce this output:
1071
1072 Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
1073 Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
1074
1075 where the line of the second warning was misreported - it should be
1076 line 5. Rafael fixed this - the problem arose because there was no
1077 nextstate OP between the execution of the "if" and the "elsif", hence
1078 "PL_curcop" still reports that the currently executing line is line 4.
1079 The solution was to inject a nextstate OP for each "elsif", although
1080 it turned out that the nextstate OP needed to be a nulled OP, rather
1081 than a live nextstate OP, else other line numbers became misreported.
1082 (Jenga!)
1083
1084 The problem is more general than "elsif" (although the "elsif" case is
1085 the most common and the most confusing). Ideally this code
1086
1087 use warnings;
1088 my $undef;
1089
1090 my $a = $undef + 1;
1091 my $b
1092 = $undef
1093 + 1;
1094
1095 would produce this output
1096
1097 Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
1098 Use of uninitialized value $undef in addition (+) at wrong.pl line 7.
1099
1100 (rather than lines 4 and 5), but this would seem to require every OP to
1101 carry (at least) line number information.
1102
1103 What might work is to have an optional line number in memory just
1104 before the BASEOP structure, with a flag bit in the op to say whether
1105 it's present. Initially during compile every OP would carry its line
1106 number. Then add a late pass to the optimiser (potentially combined
1107 with "repack the optree") which looks at the two ops on every edge of
1108 the graph of the execution path. If the line number changes, flag the
1109 destination OP with this information. Once all paths are traced,
1110 replace every op with the flag with a nextstate-light op (that just
1111 updates "PL_curcop"), which in turn then passes control on to the true
1112 op. All ops would then be replaced by variants that do not store the
1113 line number. (Which is, logically, why it would work best in conjunction
1114 with "repack the optree", as that is already copying/reallocating all
1115 the OPs.)
1116
1117 (Although I should note that we're not certain that doing this for the
1118 general case is worth it)
1119
1120 optimize tail-calls
1121 Tail-calls present an opportunity for broadly applicable optimization;
1122 anywhere that "return foo(...)" is called, the outer return can be
1123 replaced by a goto, and foo will return directly to the outer caller,
1124 saving (conservatively) 25% of perl's call-and-return cost, which is
1125 relatively higher than in C. Scheme implementations are required by the
1126 language standard to do this. B::Concise provides good insight into where
1127 this optimization is possible, i.e. anywhere an entersub, leavesub op sequence occurs.
1128
1129 perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'
1130
1131 Bottom line on this is probably a new pp_tailcall function which
1132 combines the code in pp_entersub, pp_leavesub. This should probably be
1133 done first in XS, using B::Generate to patch the new OP into the
1134 optrees.
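
Perl code can already make a tail call by hand with "goto &NAME", which replaces the current subroutine frame instead of nesting a new one; the proposed pp_tailcall would in effect do this automatically for "return foo(...)". A sketch that checks the stack depth stays constant (the names are illustrative; it shows the frame reuse, not a speed-up, since "goto &NAME" is not necessarily faster than an ordinary call in current perls):

```perl
use strict;
use warnings;

my %depths_seen;

# Accumulator-style sum: the recursive call is in tail position, so
# "goto &sum_to" can replace the current frame instead of nesting.
sub sum_to {
    my ($n, $acc) = @_;
    my $depth = 0;
    $depth++ while caller($depth);    # count frames on the call stack
    $depths_seen{$depth} = 1;
    return $acc if $n == 0;
    @_ = ($n - 1, $acc + $n);
    goto &sum_to;
}

my $total = sum_to(100_000, 0);
```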
1135
1137 Tasks that will get your name mentioned in the description of the
1138 "Highlights of 5.12"
1139
1140 make ithreads more robust
1141 Generally make ithreads more robust. See also "iCOW".
1142
1143 This task is incremental - even a little bit of work on it will help,
1144 and will be greatly appreciated.
1145
1146 One bit would be to write the missing code in sv.c:Perl_dirp_dup.
1147
1148 Fix Perl_sv_dup, et al so that threads can return objects.
1149
1150 iCOW
1151 Sarathy and Arthur have a proposal for an improved Copy On Write which
1152 specifically will be able to COW new ithreads. If this can be
1153 implemented it would be a good thing.
1154
1155 (?{...}) closures in regexps
1156 Fix (or rewrite) the implementation of the "/(?{...})/" closures.
1157
1158 A re-entrant regexp engine
1159 This will allow the use of a regex from inside (?{ }), (??{ }) and
1160 (?(?{ })|) constructs.
1161
1162 Add class set operations to regexp engine
1163 Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
1164
1165 demerphq has this on his todo list, but right at the bottom.
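
Without class set operations, intersection and difference of character classes are usually written with lookahead. A sketch of that idiom, which set operations would make unnecessary (the pattern names are illustrative):

```perl
use strict;
use warnings;

# Difference: [a-z] minus [aeiou], i.e. lowercase ASCII consonants.
my $consonant = qr/(?![aeiou])[a-z]/;

# Intersection: uppercase letters that are also ASCII.
my $ascii_upper = qr/(?=\p{Lu})[\x00-\x7F]/;

my $consonants = join '', 'programming'   =~ /($consonant)/g;
my $uppers     = join '', "ABC\x{C9}def"  =~ /($ascii_upper)/g;
```
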
1166
1168 [ Each and every one of these may be obsolete, but they were listed
1169 in the old Todo.micro file]
1170
1171 make creating uconfig.sh automatic
1172 make creating Makefile.micro automatic
1173 do away with fork/exec/wait?
1174 (system, popen should be enough?)
1175
1176 some of the uconfig.sh really needs to be probed (using cc) in buildtime:
1177 (uConfigure? :-) native datatype widths and endianness come to mind
1178
1179
1180
1181perl v5.12.4 2011-06-07 PERLTODO(1)