PERLTODO(1)            Perl Programmers Reference Guide            PERLTODO(1)

NAME

       perltodo - Perl TO-DO List

DESCRIPTION

       This is a list of wishes for Perl. The most up to date version of this
       file is at
       http://perl5.git.perl.org/perl.git/blob_plain/HEAD:/pod/perltodo.pod

       The tasks we think are smaller or easier are listed first. Anyone is
       welcome to work on any of these, but it's a good idea to first contact
       perl5-porters@perl.org to avoid duplication of effort, and to learn
       from any previous attempts. By all means contact a pumpking privately
       first if you prefer.

       Whilst patches to make the list shorter are most welcome, ideas to add
       to the list are also encouraged. Check the perl5-porters archives for
       past ideas, and any discussion about them. One set of archives may be
       found at:

           http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

       What can we offer you in return? Fame, fortune, and everlasting glory?
       Maybe not, but if your patch is incorporated, then we'll add your name
       to the AUTHORS file, which ships in the official distribution. How many
       other programming languages offer you 1 line of immortality?

Tasks that only need Perl knowledge

   Remove macperl references from tests
       MacPerl is gone. The references to it in the tests don't need to be
       there.

   Remove duplication of test setup.
       Schwern notes that there's duplication of code - lots and lots of
       tests have some variation on the big block of $Is_Foo checks.  We can
       safely put this into a file, change it to build an %Is hash and require
       it.  Maybe just put it into test.pl. Throw in the handy tainting
       subroutines.

   POD -> HTML conversion in the core still sucks
       Which is crazy given just how simple POD purports to be, and how simple
       HTML can be. It's not actually as simple as it sounds, particularly
       with the flexibility POD allows for "=item", but it would be good to
       improve the visual appeal of the HTML generated, and to avoid it having
       any validation errors. See also "make HTML install work", as the layout
       of the installation tree is needed to improve the cross-linking.

       The addition of "Pod::Simple" and its related modules may make this
       task easier to complete.

   Make ExtUtils::ParseXS use strict;
       lib/ExtUtils/ParseXS.pm contains this line:

           # use strict;  # One of these days...

       Simply uncomment it, and fix all the resulting issues :-)

       The more practical approach, to break the task down into manageable
       chunks, is to work your way through the code from bottom to top, if
       necessary adding extra "{ ... }" blocks and turning on strict within
       them.

   Parallel testing
       (This probably impacts much more than the core: also the Test::Harness
       and TAP::* modules on CPAN.)

       All of the tests in t/ can now be run in parallel, if $ENV{TEST_JOBS}
       is set. However, tests within each directory in ext and lib are still
       run in series, with directories run in parallel. This is an adequate
       heuristic, but it might be possible to relax it further, and get more
       throughput. Specifically, it would be good to audit all of lib/*.t, and
       make them use "File::Temp".

   Make Schwern poorer
       We should have tests for everything. When all the core's modules are
       tested, Schwern has promised to donate $500 to TPF. We may need
       volunteers to hold him upside down and shake vigorously in order to
       actually extract the cash.

   Improve the coverage of the core tests
       Use Devel::Cover to ascertain the core modules' test coverage, then
       add tests that are currently missing.

   test B
       A full test suite for the B module would be nice.

   A decent benchmark
       "perlbench" seems impervious to any recent changes made to the perl
       core. It would be useful to have a reasonable general benchmarking
       suite that roughly represented what current perl programs do, and
       measurably reported whether tweaks to the core improve, degrade or
       don't really affect performance, to guide people attempting to optimise
       the guts of perl. Gisle would welcome new tests for perlbench.

   fix tainting bugs
       Fix the bugs revealed by running the test suite with the "-t" switch
       (via "make test.taintwarn").

   Dual life everything
       As part of the "dists" plan, anything that doesn't belong in the
       smallest perl distribution needs to be dual lifed. Anything else can be
       too. Figure out what changes would be needed to package that module and
       its tests up for CPAN, and do so. Test it with older perl releases, and
       fix the problems you find.

       To make a minimal perl distribution, it's useful to look at
       t/lib/commonsense.t.

   Bundle dual life modules in ext/
       For maintenance (and branch merging) reasons, it would be useful to
       move some architecture-independent dual-life modules from lib/ to ext/,
       if this has no negative impact on the build of perl itself.

   POSIX memory footprint
       Ilya observed that "use POSIX;" eats memory like there's no tomorrow,
       and at various times worked to cut it down. There is probably still fat
       to cut out - for example POSIX passes Exporter some very memory hungry
       data structures.

   embed.pl/makedef.pl
       There is a script embed.pl that generates several header files to
       prefix all of Perl's symbols in a consistent way, to provide some
       semblance of namespace support in "C". Functions are declared in
       embed.fnc, variables in interpvar.h. Quite a few of the functions and
       variables are conditionally declared there, using "#ifdef". However,
       embed.pl doesn't understand the C macros, so the rules about which
       symbols are present when are duplicated in makedef.pl. Writing things
       twice is bad, m'kay.  It would be good to teach "embed.pl" to
       understand the conditional compilation, and hence remove the
       duplication, and the mistakes it has caused.

   use strict; and AutoLoad
       Currently if you write

           package Whack;
           use AutoLoader 'AUTOLOAD';
           use strict;
           1;
           __END__
           sub bloop {
               print join (' ', No, strict, here), "!\n";
           }

       then "use strict;" isn't in force within the autoloaded subroutines. It
       would be more consistent (and less surprising) to arrange for all
       lexical pragmas in force at the __END__ block to be in force within
       each autoloaded subroutine.

       There's a similar problem with SelfLoader.

   profile installman
       The installman script is slow. All it is doing is text processing,
       which we're told is something Perl is good at. So it would be nice to
       know what it is doing that is taking so much CPU, and where possible
       address it.

Tasks that need a little sysadmin-type knowledge

       Or if you prefer, tasks that you would learn from, and broaden your
       skills base...

   make HTML install work
       There is an "installhtml" target in the Makefile. It's marked as
       "experimental". It would be good to get this tested, make it work
       reliably, and remove the "experimental" tag. This would include

       1.  Checking that cross linking between various parts of the
           documentation works.  In particular that links work between the
           modules (files with POD in lib/) and the core documentation (files
           in pod/).

       2.  Working out how to split "perlfunc" into chunks, preferably one per
           function group, preferably with general case code that could be
           used elsewhere.  Challenges here are correctly identifying the
           groups of functions that go together, and making the right named
           external cross-links point to the right page. Things to be aware of
           are "-X", groups such as "getpwnam" to "endservent", two or more
           "=items" giving the different parameter lists, such as

               =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
               =item substr EXPR,OFFSET,LENGTH
               =item substr EXPR,OFFSET

           and different parameter lists having different meanings (e.g.
           "select").

   compressed man pages
       Be able to install them. This would probably need a configure test to
       see how the system does compressed man pages (same directory/different
       directory?  same filename/different filename?), as well as tweaking the
       installman script to compress as necessary.

   Add a code coverage target to the Makefile
       Make it easy for anyone to run Devel::Cover on the core's tests. The
       steps to do this manually are roughly

       ·   do a normal "Configure", but include Devel::Cover as a module to
           install (see INSTALL for how to do this)

       ·       make perl

       ·       cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

       ·   Process the resulting Devel::Cover database

       This just gives you the coverage of the .pms. To also get the C level
       coverage you need to

       ·   Additionally tell "Configure" to use the appropriate C compiler
           flags for "gcov"

       ·       make perl.gcov

           (instead of "make perl")

       ·   After running the tests run "gcov" to generate all the .gcov files.
           (Including down in the subdirectories of ext/)

       ·   (From the top level perl directory) run "gcov2perl" on all the
           ".gcov" files to get their stats into the cover_db directory.

       ·   Then process the Devel::Cover database

       It would be good to add a single switch to "Configure" to specify that
       you wanted to perform perl level coverage, and another to specify C
       level coverage, and have "Configure" and the Makefile do all the right
       things automatically.

   Make Config.pm cope with differences between built and installed perl
       Quite often vendors ship a perl binary compiled with their (pay-for)
       compilers.  People install a free compiler, such as gcc. To work out
       how to build extensions, Perl interrogates %Config, so in this
       situation %Config describes compilers that aren't there, and extension
       building fails. This forces people into choosing between re-compiling
       perl themselves using the compiler they have, or only using modules
       that the vendor ships.

       It would be good to find a way to teach "Config.pm" about the
       installation setup, possibly involving probing at install time or
       later, so that the %Config in a binary distribution better describes
       the installed machine, when the installed machine differs from the
       build machine in some significant way.

   linker specification files
       Some platforms mandate that you provide a list of a shared library's
       external symbols to the linker, so the core already has the
       infrastructure in place to do this for generating shared perl
       libraries. My understanding is that the GNU toolchain can accept an
       optional linker specification file, and restrict visibility just to
       symbols declared in that file. It would be good to extend makedef.pl to
       support this format, and to provide a means within "Configure" to
       enable it. This would allow Unix users to test that the export list is
       correct, and to build a perl that does not pollute the global namespace
       with private symbols.

   Cross-compile support
       Currently "Configure" understands the "-Dusecrosscompile" option. This
       option arranges for building "miniperl" for the TARGET machine, so this
       "miniperl" is assumed then to be copied to the TARGET machine and used
       as a replacement for the full "perl" executable.

       This could be done a little differently. Namely "miniperl" should be
       built for HOST and then full "perl" with extensions should be compiled
       for TARGET.  This, however, might require extra trickery for %Config:
       we have one config first for HOST and then another for TARGET.  Tools
       like MakeMaker will be mightily confused.  Having around two different
       types of executables and libraries (HOST and TARGET) makes life
       interesting for Makefiles and shell (and Perl) scripts.  There is
       $Config{run}, normally empty, which can be used as an execution
       wrapper.  Also note that in some cross-compilation/execution
       environments the HOST and the TARGET do not see the same filesystem(s),
       so $Config{run} may need to do some file/directory copying back and
       forth.

   roffitall
       Make pod/roffitall be updated by pod/buildtoc.

   Split "linker" from "compiler"
       Right now, Configure probes for two commands, and sets two variables:

       ·   "cc" (in cc.U)

           This variable holds the name of a command to execute a C compiler
           which can resolve multiple global references that happen to have
           the same name.  Usual values are cc and gcc.  Fervent ANSI
           compilers may be called c89.  AIX has xlc.

       ·   "ld" (in dlsrc.U)

           This variable indicates the program to be used to link libraries
           for dynamic loading.  On some systems, it is ld.  On ELF systems,
           it should be $cc.  Mostly, we'll try to respect the hint file
           setting.

       There is an implicit historical assumption from around Perl5.000alpha
       something, that $cc is also the correct command for linking object
       files together to make an executable. This may be true on Unix, but
       it's not true on other platforms, and there is a maze of workarounds
       in other places (such as Makefile.SH) to cope with this.

       Ideally, we should create a new variable to hold the name of the
       executable linker program, probe for it in Configure, and centralise
       all the special case logic there or in hints files.

       A small bikeshed issue remains - what to call it, given that $ld is
       already taken (arguably for the wrong thing now, but on SunOS 4.1 it is
       the command for creating dynamically-loadable modules) and $link could
       be confused with the Unix command line executable of the same name,
       which does something completely different. Andy Dougherty makes the
       counter argument "In parrot, I tried to call the command used to link
       object files and libraries into an executable link, since that's what
       my vaguely-remembered DOS and VMS experience suggested. I don't think
       any real confusion has ensued, so it's probably a reasonable name for
       perl5 to use."

       "Alas, I've always worried that introducing it would make things worse,
       since now the module building utilities would have to look for
       $Config{link} and institute a fall-back plan if it weren't found."
       Although I can see that as confusing, given that $Config{d_link} is
       true when (hard) links are available.

   Configure Windows using PowerShell
       Currently, Windows uses hard-coded config files to build the config.h
       for compiling Perl.  Makefiles are also hard-coded and need to be hand
       edited prior to building Perl. While this makes it easy to create a
       perl.exe that works across multiple Windows versions, being able to
       accurately configure a perl.exe for a specific Windows version and VS
       C++ would be a nice enhancement.  With PowerShell available on Windows
       XP and up, this may now be possible.  Step 1 might be to investigate
       whether this is possible and use this to clean up our current makefile
       situation.  Step 2 would be to see if there would be a way to use our
       existing metaconfig units to configure a Windows Perl or whether we go
       in a separate direction and make it so.  Of course, we all know what
       step 3 is.

   decouple -g and -DDEBUGGING
       Currently Configure automatically adds "-DDEBUGGING" to the C compiler
       flags if it spots "-g" in the optimiser flags. The pre-processor
       directive "DEBUGGING" enables perl's command line "-D" options, but in
       the process makes perl slower. It would be good to disentangle this
       logic, so that C-level debugging with "-g" and Perl level debugging
       with "-D" can easily be enabled independently.

Tasks that need a little C knowledge

       These tasks would need a little C knowledge, but don't need any
       specific background or experience with XS, or how the Perl interpreter
       works.

   Weed out needless PERL_UNUSED_ARG
       The C code uses the macro "PERL_UNUSED_ARG" to stop compilers warning
       about unused arguments. Often the arguments can't be removed, as there
       is an external constraint that determines the prototype of the
       function, so this approach is valid. However, there are some cases
       where "PERL_UNUSED_ARG" could be removed. Specifically

       ·   The prototypes of (nearly all) static functions can be changed

       ·   Unused arguments generated by short cut macros are wasteful - the
           short cut macro used can be changed.

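       The two situations can be told apart as in the sketch below. It is
       only an illustration: "MY_UNUSED_ARG", "compare_len", "is_blank" and
       "count_blanks_if_longer" are hypothetical names, not taken from the
       perl source, and the macro merely imitates the general idea of
       "PERL_UNUSED_ARG".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PERL_UNUSED_ARG: it does no work, but marks
   the argument as used so that the compiler does not warn about it. */
#define MY_UNUSED_ARG(x) ((void)(x))

/* Case 1: the prototype is fixed by an external constraint (here, a
   callback type), so the unused argument has to stay. */
typedef int (*compare_fn)(const char *a, const char *b, void *context);

static int compare_len(const char *a, const char *b, void *context)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    MY_UNUSED_ARG(context);
    return (la > lb) - (la < lb);
}

/* Case 2: a static helper that only this file can call. An earlier
   version might have taken an extra "void *context" argument and silenced
   the warning with MY_UNUSED_ARG(context). Because every caller is
   visible here, the argument can simply be dropped from the prototype. */
static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/* Example caller exercising both functions: counts the blanks in a, but
   only when the comparison callback says that a sorts after b. */
int count_blanks_if_longer(const char *a, const char *b, compare_fn cmp)
{
    int n = 0;
    assert(a != NULL && b != NULL && cmp != NULL);
    if (cmp(a, b, NULL) <= 0)
        return 0;
    for (; *a; a++)
        n += is_blank(*a);
    return n;
}
```

       Only the second case is a candidate for this task; the callback has
       to keep its full argument list.
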
   Modernize the order of directories in @INC
       The way @INC is laid out by default, one cannot upgrade core
       (dual-life) modules without overwriting files. This causes problems for
       binary package builders.  One possible proposal is laid out in this
       message:
       <http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html>.

   -Duse32bit*
       Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
       On these systems, it might be the default compilation mode, and there
       is currently no guarantee that passing no use64bitall option to the
       Configure process will build a 32bit perl. Implementing -Duse32bit*
       options would be nice for perl 5.12.

   Profile Perl - am I hot or not?
       The Perl source code is stable enough that it makes sense to profile
       it, identify and optimise the hotspots. It would be good to measure the
       performance of the Perl interpreter using free tools such as
       cachegrind, gprof, and dtrace, and work to reduce the bottlenecks they
       reveal.

       As part of this, the idea of pp_hot.c is that it contains the hot ops,
       the ops that are most commonly used. The idea is that by grouping them,
       their object code will be adjacent in the executable, so they have a
       greater chance of already being in the CPU cache (or swapped in) due to
       being near another op already in use.

       Except that it's not clear if these really are the most commonly used
       ops. So as part of exercising your skills with coverage and profiling
       tools you might want to determine what ops really are the most commonly
       used. And in turn suggest evictions and promotions to achieve a better
       pp_hot.c.

       One piece of Perl code that might make a good testbed is installman.

   Allocate OPs from arenas
       Currently all new OP structures are individually malloc()ed and
       free()d.  All "malloc" implementations have space overheads, and are
       not as fast as custom allocators, so it would both use less memory and
       less CPU to allocate the various OP structures from arenas. The SV
       arena code can probably be re-used for this.

       Note that configuring perl with "-Accflags=-DPL_OP_SLAB_ALLOC" will use
       Perl_Slab_alloc() to pack optrees into a contiguous block, which is
       probably superior to the use of OP arenas, esp. from a cache locality
       standpoint.  See "Profile Perl - am I hot or not?".

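       For readers unfamiliar with the technique, the following is a minimal
       sketch of a fixed-size arena allocator with a free list. It is not the
       SV arena code and not Perl_Slab_alloc(); every name in it ("slot_t",
       "op_pool_t", "pool_alloc" and so on) is hypothetical, and the "op"
       member is only a placeholder for an OP-sized structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_SLOTS 64   /* objects per arena; an arbitrary choice */

/* One slot is either a live object or a link in the free list. */
typedef union slot {
    union slot *next_free;
    struct {
        int   type;
        void *next;
        void *sibling;
    } op;                /* placeholder for an OP-like structure */
} slot_t;

/* An arena is one malloc()ed block holding many slots. */
typedef struct arena {
    struct arena *next;
    slot_t        slots[ARENA_SLOTS];
} arena_t;

typedef struct op_pool {
    arena_t *arenas;     /* all arenas, for bulk release */
    slot_t  *free_list;  /* slots currently available */
} op_pool_t;

/* Obtain one more arena and thread all of its slots onto the free list. */
static int pool_grow(op_pool_t *pool)
{
    size_t i;
    arena_t *a = malloc(sizeof *a);
    if (a == NULL)
        return -1;
    a->next = pool->arenas;
    pool->arenas = a;
    for (i = 0; i < ARENA_SLOTS; i++) {
        a->slots[i].next_free = pool->free_list;
        pool->free_list = &a->slots[i];
    }
    return 0;
}

/* Allocation is a pointer pop; malloc() is called once per arena. */
void *pool_alloc(op_pool_t *pool)
{
    slot_t *s;
    assert(pool != NULL);
    if (pool->free_list == NULL && pool_grow(pool) != 0)
        return NULL;
    s = pool->free_list;
    pool->free_list = s->next_free;
    return s;
}

/* Freeing pushes the slot back; no per-object header is needed. */
void pool_free(op_pool_t *pool, void *ptr)
{
    slot_t *s = ptr;
    assert(pool != NULL && ptr != NULL);
    s->next_free = pool->free_list;
    pool->free_list = s;
}

/* Release every arena at once. */
void pool_destroy(op_pool_t *pool)
{
    assert(pool != NULL);
    while (pool->arenas != NULL) {
        arena_t *next = pool->arenas->next;
        free(pool->arenas);
        pool->arenas = next;
    }
    pool->free_list = NULL;
}
```

       A pool starts out zero-initialised and grows on first use. The
       per-object cost is a pointer push or pop, which is where the hoped-for
       CPU and memory savings over individual malloc() calls would come from.
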
   Improve win32/wince.c
       Currently, numerous functions look virtually, if not completely,
       identical in both "win32/wince.c" and "win32/win32.c" files, which
       can't be good.

   Use secure CRT functions when building with VC8 on Win32
       Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the
       basis that they were "unsafe" and introduced differently named secure
       versions of them as replacements, e.g. instead of writing

           FILE* f = fopen(__FILE__, "r");

       one should now write

           FILE* f;
           errno_t err = fopen_s(&f, __FILE__, "r");

       Currently, the warnings about these deprecations have been disabled by
       adding -D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to
       remove that warning suppressant and actually make use of the new secure
       CRT functions.

       There is also a similar issue with POSIX CRT function names like fileno
       having been deprecated in favour of ISO C++ conformant names like
       _fileno. These warnings are also currently suppressed by adding
       -D_CRT_NONSTDC_NO_DEPRECATE. It might be nice to do as Microsoft
       suggest here too, although, unlike the secure functions issue, there is
       presumably little or no benefit in this case.

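       One way such a change could be kept portable is a small wrapper that
       selects the secure variant only where it exists. The sketch below is an
       assumption about how that might look, not code from the perl source;
       "open_for_reading" is a hypothetical name, and the version test relies
       on VC8 defining "_MSC_VER" as 1400.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: open a file for reading, using fopen_s() when
   building with VC8 or later and plain fopen() everywhere else.
   Returns NULL on failure in both cases. */
FILE *open_for_reading(const char *path)
{
    assert(path != NULL);
#if defined(_MSC_VER) && _MSC_VER >= 1400
    {
        FILE *f = NULL;
        /* fopen_s() returns zero on success and stores the stream in f. */
        return fopen_s(&f, path, "r") == 0 ? f : NULL;
    }
#else
    return fopen(path, "r");
#endif
}
```

       With wrappers of this kind in place, -D_CRT_SECURE_NO_DEPRECATE could
       be dropped without touching every call site.
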
   Fix POSIX::access() and chdir() on Win32
       These functions currently take no account of DACLs and therefore do not
       behave correctly in situations where access is restricted by DACLs (as
       opposed to the read-only attribute).

       Furthermore, POSIX::access() behaves differently for directories having
       the read-only attribute set depending on what CRT library is being
       used. For example, the _access() function in the VC6 and VC7 CRTs
       (wrongly) claims that such directories are not writable, whereas in
       fact all directories are writable unless access is denied by DACLs. (In
       the case of directories, the read-only attribute actually only means
       that the directory cannot be deleted.) This CRT bug is fixed in the VC8
       and VC9 CRTs (but, of course, the directory may still not actually be
       writable if access is indeed denied by DACLs).

       For the chdir() issue, see ActiveState bug #74552:
       http://bugs.activestate.com/show_bug.cgi?id=74552

       Therefore, DACLs should be checked both for consistency across CRTs and
       for the correct answer.

       (Note that perl's -w operator should not be modified to check DACLs. It
       has been written so that it reflects the state of the read-only
       attribute, even for directories (whatever CRT is being used), for
       symmetry with chmod().)

   strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()
       Maybe create a utility that checks after each libperl.a creation that
       none of the above (nor, *SHUDDER*, gets()) ever creep back into
       libperl.a.

         nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'

       Note, of course, that this will only tell whether your platform is
       using those naughty interfaces.

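       As a reminder of why these interfaces are unwelcome, here is a sketch
       of the bounded style that would replace a strcpy()/strcat() pair. It
       uses the standard snprintf(); "join_path" is a hypothetical helper
       invented for this example and is not part of perl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "dir/file" into dst without ever writing
   more than dstsize bytes. Returns 0 on success and -1 if the result did
   not fit (dst then holds a truncated, NUL-terminated string). */
int join_path(char *dst, size_t dstsize, const char *dir, const char *file)
{
    int n;
    assert(dst != NULL && dstsize > 0 && dir != NULL && file != NULL);
    /* snprintf() returns the length the full result would have had, so a
       value >= dstsize means the output was truncated. */
    n = snprintf(dst, dstsize, "%s/%s", dir, file);
    return (n >= 0 && (size_t)n < dstsize) ? 0 : -1;
}
```

       Unlike the unbounded functions, the caller learns about truncation
       from the return value instead of from a corrupted stack.
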
   -D_FORTIFY_SOURCE=2, -fstack-protector
       Recent glibcs support "-D_FORTIFY_SOURCE=2" and recent gcc (4.1
       onwards?) supports "-fstack-protector", both of which give protection
       against various kinds of buffer overflow problems.  These should
       probably be used for compiling Perl whenever available; Configure
       and/or hints files should be adjusted to probe for the availability of
       these features and enable them as appropriate.

   Arenas for GPs? For MAGIC?
       "struct gp" and "struct magic" are both currently allocated by
       "malloc".  It might be a speed or memory saving to change to using
       arenas. Or it might not. It would need some suitable benchmarking
       first. In particular, "GP"s can probably be changed with minimal
       compatibility impact (probably nothing outside of the core, or even
       outside of gv.c allocates them), but they probably aren't
       allocated/deallocated often enough for a speed saving. Whereas "MAGIC"
       is allocated/deallocated more often, but in turn, is also something
       more externally visible, so changing the rules here may bite external
       code.

   Shared arenas
       Several SV body structs are now the same size, notably PVMG and PVGV,
       PVAV and PVHV, and PVCV and PVFM. It should be possible to allocate and
       return same sized bodies from the same actual arena, rather than
       maintaining one arena for each. This could save 4-6K per thread, of
       memory no longer tied up in the not-yet-allocated part of an arena.

Tasks that need a knowledge of XS

       These tasks would need C knowledge, and roughly the level of knowledge
       of the perl API that comes from writing modules that use XS to
       interface to C.

   Remove the use of SVs as temporaries in dump.c
       dump.c contains debugging routines to dump out the contents of perl
       data structures, such as "SV"s, "AV"s and "HV"s. Currently, the dumping
       code uses "SV"s for its temporary buffers, which was a logical initial
       implementation choice, as they provide ready made memory handling.

       However, they also lead to a lot of confusion when it happens that what
       you're trying to debug is seen by the code in dump.c, correctly or
       incorrectly, as a temporary scalar it can use for a temporary buffer.
       It's also not possible to dump scalars before the interpreter is
       properly set up, such as during ithreads cloning. It would be good to
       progressively replace the use of scalars as string accumulation buffers
       with something much simpler, directly allocated by "malloc". The dump.c
       code is (or should be) only producing 7 bit US-ASCII, so output
       character sets are not an issue.

       Producing and proving an internal simple buffer allocation would make
       it easier to re-write the internals of the PerlIO subsystem to avoid
       using "SV"s for its buffers, use of which can cause problems similar to
       those of dump.c, at similar times.

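       A "much simpler" accumulation buffer might look something like the
       sketch below. It is an assumption about one possible shape, not a
       proposal for the actual API; "strbuf_t", "strbuf_append" and
       "strbuf_free" are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer backed directly by malloc/realloc.
   A zero-initialised strbuf_t is a valid empty buffer. */
typedef struct strbuf {
    char  *data;   /* NUL-terminated contents, or NULL while empty */
    size_t len;    /* length, excluding the terminating NUL */
    size_t cap;    /* allocated size of data */
} strbuf_t;

/* Append the NUL-terminated string s. Returns 0 on success, or -1 if
   memory could not be obtained (the buffer is then left unchanged). */
int strbuf_append(strbuf_t *b, const char *s)
{
    size_t n;
    assert(b != NULL && s != NULL);
    n = strlen(s);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;
        while (cap < b->len + n + 1)
            cap *= 2;                 /* grow geometrically */
        p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n + 1);   /* copies the NUL as well */
    b->len += n;
    return 0;
}

/* Release the storage and return the buffer to its empty state. */
void strbuf_free(strbuf_t *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

       Because nothing here touches interpreter state, a buffer of this kind
       could be used before the interpreter is fully set up, which is the
       situation where "SV"-based temporaries cause trouble.
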
   safely supporting POSIX SA_SIGINFO
       Some years ago Jarkko supplied patches to provide support for the POSIX
       SA_SIGINFO feature in Perl, passing the extra data to the Perl signal
       handler.

       Unfortunately, it only works with "unsafe" signals, because under safe
       signals, by the time Perl gets to run the signal handler, the extra
       information has been lost. Moreover, it's not easy to store it
       somewhere, as you can't lock mutexes, or do anything else fancy, from
       inside a signal handler.

       So it strikes me that we could provide safe SA_SIGINFO support as
       follows:

       1.  Provide global variables for two file descriptors

       2.  When the first request is made via "sigaction" for "SA_SIGINFO",
           create a pipe, store the reader in one, the writer in the other

       3.  In the "safe" signal handler
           ("Perl_csighandler()"/"S_raise_signal()"), if the "siginfo_t"
           pointer is non-"NULL", and the writer file handle is open,

           1.      serialise the signal number and the "siginfo_t" (or at
                   least the parts we care about) into a small auto char
                   buffer

           2.      "write()" that (non-blocking) to the writer fd

                   1.          if it writes 100%, flag the signal in a counter
                               of "signals on the pipe" akin to the current
                               per-signal-number counts

                   2.          if it writes 0%, assume the pipe is full. Flag
                               the data as lost?

                   3.          if it writes partially, croak a panic, as your
                               OS is broken.

       4.  in the regular "PERL_ASYNC_CHECK()" processing, if there are
           "signals on the pipe", read the data out, deserialise, build the
           Perl structures on the stack (code in "Perl_sighandler()", the
           "unsafe" handler), and call as usual.

       I think that this gets us decent "SA_SIGINFO" support, without the
       current risk of running Perl code inside the signal handler context.
       (With all the dangers of things like "malloc" corruption that that
       currently offers us.)

       For more information see the thread starting with this message:
       http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html

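       The steps above can be sketched in plain C as follows. This is a
       standalone illustration under stated assumptions, not code from the
       perl core: all names ("sig_record_t", "siginfo_setup", "siginfo_next",
       "sig_pending", "sigs_lost") are hypothetical, only three fields of
       "siginfo_t" are carried across, a simple flag stands in for the
       "signals on the pipe" counter, and a short write is counted as lost
       data where the proposal would panic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* The parts of siginfo_t this sketch chooses to keep. */
typedef struct sig_record {
    int   signo;
    int   code;
    pid_t pid;    /* only meaningful for signals sent by a process */
} sig_record_t;

/* Step 1: globals for the two descriptors, [0] reader and [1] writer. */
static int sig_pipe[2] = { -1, -1 };
static volatile sig_atomic_t sig_pending = 0;  /* data may be on the pipe */
static volatile sig_atomic_t sigs_lost = 0;    /* records we had to drop */

/* Step 3: the handler only serialises and write()s; both are safe here. */
static void safe_handler(int signo, siginfo_t *info, void *ucontext)
{
    int saved_errno = errno;
    (void)ucontext;
    if (info != NULL && sig_pipe[1] >= 0) {
        sig_record_t rec;
        rec.signo = signo;
        rec.code  = info->si_code;
        rec.pid   = info->si_pid;
        if (write(sig_pipe[1], &rec, sizeof rec) == (ssize_t)sizeof rec)
            sig_pending = 1;
        else
            sigs_lost++;          /* pipe full (or a short write) */
    }
    errno = saved_errno;
}

/* Step 2: create the non-blocking pipe on first use, then install the
   handler with SA_SIGINFO. Returns 0 on success, -1 on failure. */
int siginfo_setup(int signo)
{
    struct sigaction sa;
    int i;
    if (sig_pipe[0] < 0) {
        if (pipe(sig_pipe) != 0)
            return -1;
        for (i = 0; i < 2; i++) {
            int flags = fcntl(sig_pipe[i], F_GETFL);
            if (flags < 0 ||
                fcntl(sig_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
                return -1;
        }
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = safe_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Step 4: called from normal context, in the role of PERL_ASYNC_CHECK().
   Returns 1 and fills *out if a record was read, 0 if the pipe is empty. */
int siginfo_next(sig_record_t *out)
{
    assert(out != NULL);
    if (sig_pipe[0] < 0)
        return 0;
    return read(sig_pipe[0], out, sizeof *out) == (ssize_t)sizeof *out;
}
```

       A caller would clear "sig_pending" and then call "siginfo_next()" in a
       loop until it returns 0, dispatching each record outside the signal
       handler context.
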
586   autovivification
587       Make all autovivification consistent w.r.t LVALUE/RVALUE and strict/no
588       strict;
589
590       This task is incremental - even a little bit of work on it will help.
591
592   Unicode in Filenames
593       chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
594       opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
595       system, truncate, unlink, utime, -X.  All these could potentially
596       accept Unicode filenames either as input or output (and in the case of
597       system and qx Unicode in general, as input or output to/from the
598       shell).  Whether a filesystem - an operating system pair understands
599       Unicode in filenames varies.
600
601       Known combinations that have some level of understanding include
602       Microsoft NTFS, Apple HFS+ (In Mac OS 9 and X) and Apple UFS (in Mac OS
603       X), NFS v4 is rumored to be Unicode, and of course Plan 9.  How to
604       create Unicode filenames, what forms of Unicode are accepted and used
605       (UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
606       and so on, varies.  Finding the right level of interfacing to Perl
607       requires some thought.  Remember that an OS does not implicate a
608       filesystem.
609
610       (The Windows -C command flag "wide API support" has been at least
611       temporarily retired in 5.8.1, and the -C has been repurposed, see
612       perlrun.)
613
614       Most probably the right way to do this would be this: "Virtualize
615       operating system access".
616
617   Unicode in %ENV
618       Currently the %ENV entries are always byte strings.  See "Virtualize
619       operating system access".
620
621   Unicode and glob()
622       Currently glob patterns and filenames returned from File::Glob::glob()
623       are always byte strings.  See "Virtualize operating system access".
624
625   Unicode and lc/uc operators
626       Some built-in operators ("lc", "uc", etc.) behave differently, based on
627       what the internal encoding of their argument is. That should not be the
628       case. Maybe add a pragma to switch behaviour.
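
       A small demonstration of the problem, assuming no locale and no
       feature that changes string semantics is in effect. The two strings
       hold the same single character and differ only in internal storage.

```perl
use strict;
use warnings;

my $native   = "\xE9";          # e-acute, stored as a single byte
my $upgraded = "\xE9";
utf8::upgrade($upgraded);       # same character, UTF-8 internal storage

my $same_string = $native eq $upgraded ? 1 : 0;
my $same_uc     = uc($native) eq uc($upgraded) ? 1 : 0;

print "strings compare equal: $same_string\n";
print "uc() results compare equal: $same_uc\n";
```

       The strings compare equal, yet their uc() results do not: only the
       upgraded copy is case-mapped with Unicode rules.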
629
630   use less 'memory'
631       Investigate trade-offs to switch out perl's choices on memory usage.
632       Particularly perl should be able to give memory back.
633
634       This task is incremental - even a little bit of work on it will help.
635
636   Re-implement ":unique" in a way that is actually thread-safe
637       The old implementation made bad assumptions on several levels. A good
638       90% solution might be just to make ":unique" work to share the string
639       buffer of SvPVs. That way large constant strings can be shared between
640       ithreads, such as the configuration information in Config.
641
642   Make tainting consistent
643       Tainting would be easier to use if it didn't take documented shortcuts
644       and allow taint to "leak" everywhere within an expression.
645
646   readpipe(LIST)
647       system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
648       running a shell. readpipe() (the function behind qx//) could be
649       similarly extended.
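
       The effect can be approximated today with the list form of a piped
       open, which also bypasses the shell. readpipe_list() below is a
       hypothetical helper name, not an existing function, and the sketch
       assumes a platform where list-form pipe opens are supported.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a list-form readpipe(): run a command
# without a shell and return its output.
sub readpipe_list {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    my @lines = <$fh>;
    close $fh;
    return wantarray ? @lines : join('', @lines);
}

# The argument contains shell metacharacters, but no shell ever sees it.
my $out = readpipe_list($^X, '-e', 'print $ARGV[0]', '$HOME; echo oops');
print "$out\n";
```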
650
651   Audit the code for destruction ordering assumptions
652       Change 25773 notes
653
654           /* Need to check SvMAGICAL, as during global destruction it may be that
655              AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
656              is now part of the linked list of SV heads, rather than pointing to
657              the original body.  */
658           /* FIXME - audit the code for other bugs like this one.  */
659
660       adding the "SvMAGICAL" check to
661
662           if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
663               MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);
664
665       Go through the core and look for similar assumptions that SVs have
666       particular types, as all bets are off during global destruction.
667
668   Extend PerlIO and PerlIO::Scalar
669       PerlIO::Scalar doesn't know how to truncate().  Implementing this would
670       require extending the PerlIO vtable.
671
672       Similarly the PerlIO vtable doesn't know about formats (write()), or
673       about stat(), or chmod()/chown(), utime(), or flock().
674
675       (For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
676       would mean.)
677
678       PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
679       opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
680       readlink().
681
682       See also "Virtualize operating system access".
683
684   -C on the #! line
685       It should be possible to make -C work correctly if found on the #!
686       line, given that all perl command line options are strict ASCII, and -C
687       changes only the interpretation of non-ASCII characters, and not for
688       the script file handle. To make it work needs some investigation of the
689       ordering of function calls during startup, and (by implication) a bit
690       of tweaking of that order.
691
692   Duplicate logic in S_method_common() and Perl_gv_fetchmethod_autoload()
693       A comment in "S_method_common" notes
694
695               /* This code tries to figure out just what went wrong with
696                  gv_fetchmethod.  It therefore needs to duplicate a lot of
697                  the internals of that function.  We can't move it inside
698                  Perl_gv_fetchmethod_autoload(), however, since that would
699                  cause UNIVERSAL->can("NoSuchPackage::foo") to croak, and we
700                  don't want that.
701               */
702
703       If "Perl_gv_fetchmethod_autoload" gets rewritten to take (more) flag
704       bits, then it ought to be possible to move the logic from
705       "S_method_common" to the "right" place. When making this change it
706       would probably be good to also pass in at least the method name length,
707       if not also pre-computed hash values when known. (I'm contemplating a
708       plan to pre-compute hash values for common fixed strings such as "ISA"
709       and pass them in to functions.)
710
711   Organize error messages
712       Perl's diagnostics (error messages, see perldiag) could use
713       reorganizing and formalizing so that each error message has its stable-
714       for-all-eternity unique id, categorized by severity, type, and
715       subsystem.  (The error messages would be listed in a datafile outside
716       of the Perl source code, and the source code would only refer to the
717       messages by the id.)  This clean-up and regularizing should apply for
718       all croak() messages.
719
720       This would enable all sorts of things: easier translation/localization
721       of the messages (though please do keep in mind the caveats of
722       Locale::Maketext about too straightforward approaches to translation),
723       filtering by severity, and instead of grepping for a particular error
724       message one could look for a stable error id.  (Of course, changing the
725       error messages by default would break all the existing software
726       depending on some particular error message...)
727
728       This kind of functionality is known as message catalogs.  Look for
729       inspiration in, for example, the catgets() system, and possibly even
730       use it if available, but only if available: not all platforms will
731       have catgets().
732
733       For the really pure at heart, consider extending this item to cover
734       also the warning messages (see perllexwarn, "warnings.pl").
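
       As a rough illustration of the idea only, a catalog could be a table
       keyed by stable id. Everything in this sketch is invented for the
       example: the ids, the field names and the message() helper. The
       proposal itself keeps the table in a data file outside the source.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical catalog; ids and fields are made up for illustration.
my %CATALOG = (
    'PERL-0001' => { severity => 'fatal',   subsystem => 'io',
                     text     => "Can't open %s: %s" },
    'PERL-0002' => { severity => 'warning', subsystem => 'numeric',
                     text     => 'Argument "%s" isn\'t numeric' },
);

# Format a message by id, so callers can match on the id, not the text.
sub message {
    my ($id, @args) = @_;
    my $entry = $CATALOG{$id} or croak "Unknown message id $id";
    return sprintf("[%s] $entry->{text}", $id, @args);
}

my $msg = message('PERL-0001', 'foo.txt', 'No such file or directory');
print "$msg\n";
```

       Code that currently greps for message text could then match on the
       bracketed id, which would survive rewording or translation.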
735

Tasks that need a knowledge of the interpreter

737       These tasks would need C knowledge, and knowledge of how the
738       interpreter works, or a willingness to learn.
739
740   forbid labels with keyword names
741       Currently "goto keyword" "computes" the label value:
742
743           $ perl -e 'goto print'
744           Can't find label 1 at -e line 1.
745
746       It would be nice to forbid labels with keyword names, to avoid
747       confusion.
748
749   truncate() prototype
750       The prototype of truncate() is currently $$. It should probably be "*$"
751       instead. (This is changed in opcode.pl)
752
753   decapsulation of smart match argument
754       Currently "$foo ~~ $object" will die with the message "Smart matching a
755       non-overloaded object breaks encapsulation". It would be nice to allow
756       this to be bypassed by explicitly using the syntax "$foo ~~ %$object"
757       or "$foo ~~ @$object".
758
759   error reporting of [$a ; $b]
760       Using ";" inside brackets is a syntax error, and we don't propose to
761       change that by giving it any meaning. However, it's not reported very
762       helpfully:
763
764           $ perl -e '$a = [$b; $c];'
765           syntax error at -e line 1, near "$b;"
766           syntax error at -e line 1, near "$c]"
767           Execution of -e aborted due to compilation errors.
768
769       It should be possible to hook into the tokeniser or the lexer, so that
770       when a ";" is parsed where it is not legal as a statement terminator
771       (i.e. inside "{}" used as a hashref, "[]" or "()") it issues an error
772       something like "';' isn't legal inside an expression - if you need
773       multiple statements use a do {...} block". See the thread starting at
774       http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-09/msg00573.html
775
776   lexicals used only once
777       This warns:
778
779           $ perl -we '$pie = 42'
780           Name "main::pie" used only once: possible typo at -e line 1.
781
782       This does not:
783
784           $ perl -we 'my $pie = 42'
785
786       Logically all lexicals used only once should warn, if the user asks for
787       warnings.  An unworked RT ticket (#5087) has been open for almost seven
788       years for this discrepancy.
789
790   UTF-8 revamp
791       The handling of Unicode is unclean in many places. For example, the
792       regexp engine matches in Unicode semantics whenever the string or the
793       pattern is flagged as UTF-8, but that should not be dependent on an
794       internal storage detail of the string. Likewise, case folding behaviour
795       is dependent on the UTF8 internal flag being on or off.
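
       The regexp half of this can be observed directly. The sketch assumes
       default matching rules (no locale, no explicit charset modifier on
       the pattern).

```perl
use strict;
use warnings;

my $s = "\xE9";                     # e-acute as a single native byte

my $before = $s =~ /\w/ ? 1 : 0;    # byte string: non-Unicode semantics
utf8::upgrade($s);                  # change the storage, not the content
my $after  = $s =~ /\w/ ? 1 : 0;    # UTF-8 flagged: Unicode semantics

print "matches \\w before upgrade: $before\n";
print "matches \\w after upgrade:  $after\n";
```

       The string holds the same character throughout; only the internal
       flag changed between the two matches.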
796
797   Properly Unicode safe tokeniser and pads.
798       The tokeniser isn't actually very UTF-8 clean. "use utf8;" is a hack -
799       variable names are stored in stashes as raw bytes, without the utf-8
800       flag set. The pad API only takes a "char *" pointer, so that's all
801       bytes too. The tokeniser ignores the UTF-8-ness of "PL_rsfp", or any
802       SVs returned from source filters.  All this could be fixed.
803
804   state variable initialization in list context
805       Currently this is illegal:
806
807           state ($a, $b) = foo();
808
809       In Perl 6, "state ($a) = foo();" and "(state $a) = foo();" have
810       different semantics, which is tricky to implement in Perl 5 as
811       currently they produce the same opcode trees. The Perl 6 design is
812       firm, so it would be good to implement the necessary code in Perl 5.
813       There are comments in "Perl_newASSIGNOP()" that show the code paths
814       taken by various assignment constructions involving state variables.
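
       Until list initialization is supported, one workaround is to
       initialize a single scalar state variable with an array reference.
       This is only a sketch of that idiom; foo() and pair() are placeholder
       names.

```perl
use strict;
use warnings;
use feature 'state';

my $calls = 0;
sub foo { $calls++; return (10, 20) }

sub pair {
    # A scalar state variable may be initialized; wrapping the list in
    # an array reference keeps foo() to a single call.
    state $init = [ foo() ];
    return @$init;
}

my ($x, $y) = pair();
pair();
print "x=$x y=$y calls=$calls\n";
```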
815
816   Implement $value ~~ 0 .. $range
817       It would be nice to extend the syntax of the "~~" operator to also
818       understand numeric (and maybe alphanumeric) ranges.
819
820   A does() built-in
821       Like ref(), only useful. It would call the "DOES" method on objects; it
822       would also tell whether something can be dereferenced as an
823       array/hash/etc., or used as a regexp, etc.
824       <http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>
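
       A pure-Perl approximation, to make the intended behaviour concrete.
       does_sketch() is a hypothetical name; the sketch ignores overloaded
       dereferencing, which a real built-in would presumably have to
       consider.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# does_sketch() is a hypothetical name, not an existing built-in.
sub does_sketch {
    my ($thing, $what) = @_;
    return 0 unless ref $thing;
    # Objects get asked first, via the DOES method (5.10 and later).
    return 1 if blessed($thing) && $thing->DOES($what);
    # Otherwise compare against the underlying reference type.
    return reftype($thing) eq $what ? 1 : 0;
}

my $obj = bless {}, 'Some::Class';
print "array ref does ARRAY: ",   does_sketch([],   'ARRAY'),       "\n";
print "object does its class: ",  does_sketch($obj, 'Some::Class'), "\n";
print "object does HASH: ",       does_sketch($obj, 'HASH'),        "\n";
print "hash ref does ARRAY: ",    does_sketch({},   'ARRAY'),       "\n";
```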
825
826   Tied filehandles and write() don't mix
827       There is no method on tied filehandles to allow them to be called back
828       by formats.
829
830   Propagate compilation hints to the debugger
831       Currently a debugger started with -dE on the command-line doesn't see
832       the features enabled by -E. More generally hints ($^H and "%^H") aren't
833       propagated to the debugger. Probably it would be a good thing to
834       propagate hints from the innermost non-"DB::" scope: this would make
835       code eval'ed in the debugger see the features (and strictures, etc.)
836       currently in scope.
837
838   Attach/detach debugger from running program
839       The old perltodo notes "With "gdb", you can attach the debugger to a
840       running program if you pass the process ID. It would be good to do this
841       with the Perl debugger on a running Perl program, although I'm not sure
842       how it would be done." ssh and screen do this with named pipes in /tmp.
843       Maybe we can too.
844
845   LVALUE functions for lists
846       The old perltodo notes that lvalue functions don't work for list or
847       hash slices. This would be good to fix.
848
849   regexp optimiser optional
850       The regexp optimiser is not optional. It should be configurable, to
851       allow its performance to be measured, and its bugs to be easily
852       demonstrated.
853
854   delete &function
855       Allow functions to be deleted. One can already undef them, but they're
856       still in the stash.
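
       The current situation can be seen with a throwaway sub (doomed() is
       just an example name). The last step shows the blunt tool available
       today, deleting the whole stash entry.

```perl
use strict;
use warnings;

sub doomed { 42 }

undef &doomed;                                   # throw away the body
my $defined  = defined &doomed        ? 1 : 0;   # body is gone
my $in_stash = exists $main::{doomed} ? 1 : 0;   # name is still there

# Deleting the stash entry removes everything stored under that name,
# so a scalar, array or hash called "doomed" would vanish as well.
delete $main::{doomed};
my $after = exists $main::{doomed} ? 1 : 0;

print "defined=$defined in_stash=$in_stash after_delete=$after\n";
```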
857
858   "/w" regex modifier
859       That flag would enable whole-word matching, and would also interpolate
860       arrays as alternations. With it, "/P/w" would be roughly equivalent to:
861
862           do { local $"='|'; /\b(?:P)\b/ }
863
864       See
865       <http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
866       for the discussion.
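
       Spelled out by hand with an array in the pattern, the equivalent
       looks like this. It is a sketch of the existing idiom, not of the
       proposed flag; note that the array elements are interpolated as
       regexp source, not quoted.

```perl
use strict;
use warnings;

my @words = qw(cat dog);

# Join the array with "|" while the pattern is built, and anchor the
# alternation on word boundaries.
my $re = do { local $" = '|'; qr/\b(?:@words)\b/ };

my $whole_word = 'a dog barks'  =~ $re ? 1 : 0;
my $embedded   = 'hotdog stand' =~ $re ? 1 : 0;

print "whole word matched: $whole_word\n";
print "embedded word matched: $embedded\n";
```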
867
868   optional optimizer
869       Make the peephole optimizer optional. Currently it performs two tasks
870       as it walks the optree - genuine peephole optimisations, and necessary
871       fixups of ops. It would be good to find an efficient way to switch out
872       the optimisations whilst keeping the fixups.
873
874   You WANT *how* many
875       Currently contexts are void, scalar and list. split has a special
876       mechanism in place to pass in the number of return values wanted. It
877       would be useful to have a general mechanism for this that is backwards
878       compatible and has little speed hit.  This would allow proposals such as
879       short circuiting sort to be implemented as a module on CPAN.
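
       For reference, this is everything a Perl-level sub can currently
       learn about its calling context. context() is an illustrative name.

```perl
use strict;
use warnings;

# wantarray distinguishes only three cases.
sub context {
    return wantarray          ? 'list'
         : defined(wantarray) ? 'scalar'
         :                      'void';
}

# The caller wants exactly two values here, but the callee can only
# see "list".
my ($first, $second) = context();
my $only             = context();

print "two values wanted: callee saw '$first'\n";
print "one value wanted: callee saw '$only'\n";
```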
880
881   lexical aliases
882       Allow lexical aliases (maybe via the syntax "my \$alias = \$foo").
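
       The nearest existing idiom is the aliasing that foreach already
       performs on its loop variable. This sketch shows that behaviour, not
       the proposed syntax.

```perl
use strict;
use warnings;

my $foo = 1;

# Inside the loop body $alias is another name for $foo, not a copy.
for my $alias ($foo) {
    $alias += 41;
}

print "foo is now $foo\n";
```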
883
884   entersub XS vs Perl
885       At the moment pp_entersub is huge, and has code to deal with entering
886       both perl and XS subroutines. Subroutine implementations rarely change
887       between perl and XS at run time, so investigate using 2 ops to enter
888       subs (one for XS, one for perl) and swap between if a sub is redefined.
889
890   Self-ties
891       Self-ties are currently illegal because they caused too many segfaults.
892       Maybe the causes of these could be tracked down and self-ties on all
893       types reinstated.
894
895   Optimize away @_
896       The old perltodo notes "Look at the "reification" code in "av.c"".
897
898   The yada yada yada operators
899       Perl 6's Synopsis 3 says:
900
901       The ... operator is the "yada, yada, yada" list operator, which is used
902       as the body in function prototypes. It complains bitterly (by calling
903       fail) if it is ever executed. Variant ??? calls warn, and !!! calls
904       die.
905
906       Those would be nice to add to Perl 5. That could be done without new
907       ops.
908
909   Virtualize operating system access
910       Implement a set of "vtables" that virtualizes operating system access
911       (open(), mkdir(), unlink(), readdir(), getenv(), etc.)  At the very
912       least these interfaces should take SVs as "name" arguments instead of
913       bare char pointers; probably the most flexible and extensible way would
914       be for the Perl-facing interfaces to accept HVs.  The system needs to
915       be per-operating-system and per-file-system hookable/filterable,
916       preferably both from XS and Perl level ("Files and Filesystems" in
917       perlport is good reading at this point, in fact, all of perlport is.)
918
919       This has actually already been implemented (but only for Win32); take a
920       look at iperlsys.h and win32/perlhost.h.  While all Win32 variants go
921       through a set of "vtables" for operating system access, non-Win32
922       systems currently go straight for the POSIX/UNIX-style system/library
923       call.  A system similar to the Win32 one should be implemented for all
924       platforms.  The existing Win32 implementation probably does not need to
925       survive alongside this proposed new implementation; the approaches
926       could be merged.
927
928       What would this give us?  One often-asked-for feature this would enable
929       is using Unicode for filenames, and other "names" like %ENV, usernames,
930       hostnames, and so forth.  (See "When Unicode Does Not Happen" in
931       perlunicode.)
932
933       But this kind of virtualization would also allow for things like
934       virtual filesystems, virtual networks, and "sandboxes" (though as long
935       as dynamic loading of random object code is allowed, not very safe
936       sandboxes, since external code knows nothing of Perl's vtables).
937       An example of a smaller "sandbox" is that this feature can be used to
938       implement per-thread working directories: Win32 already does this.
939
940       See also "Extend PerlIO and PerlIO::Scalar".
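
       To make the shape of the idea concrete, here is a toy Perl-level
       dispatch table. All names are hypothetical, and the real tables in
       iperlsys.h are C structures; the sketch only shows how swapping the
       table changes what a "system call" does.

```perl
use strict;
use warnings;

# Hypothetical backends: one talks to the host, one is an in-memory
# "sandbox" that never touches the real filesystem.
my %files;

my %host_ops = (
    mkdir  => sub { mkdir $_[0] },
    unlink => sub { unlink $_[0] },
    exists => sub { -e $_[0] ? 1 : 0 },
);
my %sandbox_ops = (
    mkdir  => sub { $files{$_[0]} = 'dir'; 1 },
    unlink => sub { defined(delete $files{$_[0]}) ? 1 : 0 },
    exists => sub { exists $files{$_[0]} ? 1 : 0 },
);

# Every operation goes through whichever table is current.
my $ops = \%sandbox_ops;
sub os_call {
    my ($op, @args) = @_;
    return $ops->{$op}->(@args);
}

os_call(mkdir => '/tmp/example');
my $seen = os_call(exists => '/tmp/example');
os_call(unlink => '/tmp/example');
my $gone = os_call(exists => '/tmp/example') ? 0 : 1;

print "seen in sandbox: $seen, gone after unlink: $gone\n";
```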
941
942   Investigate PADTMP hash pessimisation
943       The peephole optimiser converts constants used for hash key lookups to
944       shared hash key scalars. Under ithreads, something is undoing this
945       work.  See
946       http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
947
948   Store the current pad in the OP slab allocator
949       Currently we leak ops in various cases of parse failure. I suggested
950       that we could solve this by always using the op slab allocator, and
951       walking it to free ops. Dave comments that as some ops are already
952       freed during optree creation one would have to mark which ops are
953       freed, and not double free them when walking the slab. He notes that
954       one problem with this is that for some ops you have to know which pad
955       was current at the time of allocation, which does change. I suggested
956       storing a pointer to the current pad in the memory allocated for the
957       slab, and swapping to a new slab each time the pad changes. Dave thinks
958       that this would work.
959
960   repack the optree
961       Repacking the optree after execution order is determined could allow
962       removal of NULL ops, and optimal ordering of OPs with respect to cache-
963       line filling.  The slab allocator could be reused for this purpose.  I
964       think that the best way to do this is to make it an optional step just
965       before the completed optree is attached to anything else, and to use
966       the slab allocator unchanged, so that freeing ops is identical whether
967       or not this step runs.  Note that the slab allocator allocates ops
968       downwards in memory, so one would have to actually "allocate" the ops
969       in reverse-execution order to get them contiguous in memory in
970       execution order.
971
972       See
973       http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html
974
975       Note that running this copy, and then freeing all the old location ops
976       would cause their slabs to be freed, which would eliminate possible
977       memory wastage if the previous suggestion is implemented, and we swap
978       slabs more frequently.
979
980   eliminate incorrect line numbers in warnings
981       This code
982
983           use warnings;
984           my $undef;
985
986           if ($undef == 3) {
987           } elsif ($undef == 0) {
988           }
989
990       used to produce this output:
991
992           Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
993           Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
994
995       where the line of the second warning was misreported - it should be
996       line 5.  Rafael fixed this - the problem arose because there was no
997       nextstate OP between the execution of the "if" and the "elsif", hence
998       "PL_curcop" still reports that the currently executing line is line 4.
999       The solution was to inject a nextstate OP for each "elsif", although
1000       it turned out that the nextstate OP needed to be a nulled OP, rather
1001       than a live nextstate OP, else other line numbers became misreported.
1002       (Jenga!)
1003
1004       The problem is more general than "elsif" (although the "elsif" case is
1005       the most common and the most confusing). Ideally this code
1006
1007           use warnings;
1008           my $undef;
1009
1010           my $a = $undef + 1;
1011           my $b
1012             = $undef
1013             + 1;
1014
1015       would produce this output
1016
1017           Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
1018           Use of uninitialized value $undef in addition (+) at wrong.pl line 7.
1019
1020       (rather than lines 4 and 5), but this would seem to require every OP to
1021       carry (at least) line number information.
1022
1023       What might work is to have an optional line number in memory just
1024       before the BASEOP structure, with a flag bit in the op to say whether
1025       it's present.  Initially during compile every OP would carry its line
1026       number. Then add a late pass to the optimiser (potentially combined
1027       with "repack the optree") which looks at the two ops on every edge of
1028       the graph of the execution path. If the line number changes, flag the
1029       destination OP with this information.  Once all paths are traced,
1030       replace every op with the flag with a nextstate-light op (that just
1031       updates "PL_curcop"), which in turn then passes control on to the true
1032       op. All ops would then be replaced by variants that do not store the
1033       line number. (Which, logically, is why it would work best in conjunction
1034       with "repack the optree", as that is already copying/reallocating all
1035       the OPs)
1036
1037       (Although I should note that we're not certain that doing this for the
1038       general case is worth it)
1039
1040   optimize tail-calls
1041       Tail-calls present an opportunity for broadly applicable optimization;
1042       anywhere that "return foo(...)" is called, the outer return can be
1043       replaced by a goto, and foo will return directly to the outer caller,
1044       saving (conservatively) 25% of perl's call&return cost, which is
1045       relatively higher than in C.  The Scheme language is known to do this
1046       heavily.  B::Concise provides good insight into where this optimization
1047       is possible, i.e. anywhere an entersub, leavesub op sequence occurs.
1048
1049        perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'
1050
1051       The bottom line on this is probably a new pp_tailcall function which
1052       combines the code in pp_entersub and pp_leavesub.  This should probably
1053       be done first in XS, using B::Generate to patch the new OP into the
1054       optrees.
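
       The frame-replacing semantics can already be had by hand with "goto
       &sub". The sketch below only shows that the intermediate frame
       disappears; it makes no claim about speed, and the sub names are
       illustrative.

```perl
use strict;
use warnings;

# Count the call frames visible from inside this sub.
sub depth {
    my $d = 0;
    $d++ while caller($d);
    return $d;
}

sub plain_call { return depth() }   # depth() runs one frame deeper
sub tail_call  { goto &depth }      # our frame is replaced by depth()

my $saved = plain_call() - tail_call();
print "frames saved by goto: $saved\n";
```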
1055
1056   "\N"
1057       It should be possible to add a "\N" regex assertion, meaning "every
1058       character except "\n"", independently of the context. That would of
1059       course imply that "\N" couldn't be followed by an opening "{".
1060

Big projects

1062       Tasks that will get your name mentioned in the description of the
1063       "Highlights of 5.12"
1064
1065   make ithreads more robust
1066       Generally make ithreads more robust. See also "iCOW".
1067
1068       This task is incremental - even a little bit of work on it will help,
1069       and will be greatly appreciated.
1070
1071       One bit would be to write the missing code in sv.c:Perl_dirp_dup.
1072
1073       Fix Perl_sv_dup, et al so that threads can return objects.
1074
1075   iCOW
1076       Sarathy and Arthur have a proposal for an improved Copy On Write which
1077       specifically will be able to COW new ithreads. If this can be
1078       implemented it would be a good thing.
1079
1080   (?{...}) closures in regexps
1081       Fix (or rewrite) the implementation of the "/(?{...})/" closures.
1082
1083   A re-entrant regexp engine
1084       This will allow the use of a regex from inside (?{ }), (??{ }) and
1085       (?(?{ })|) constructs.
1086
1087   Add class set operations to regexp engine
1088       Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
1089
1090       demerphq has this on his todo list, but right at the bottom.
1091

Tasks for microperl

1093       [ Each and every one of these may be obsolete, but they were listed
1094         in the old Todo.micro file]
1095
1096   make creating uconfig.sh automatic
1097   make creating Makefile.micro automatic
1098   do away with fork/exec/wait?
1099       (system, popen should be enough?)
1100
1101   some of the uconfig.sh really needs to be probed (using cc) at build time:
1102       (uConfigure? :-) native datatype widths and endianness come to mind
1103
1104
1105
1106perl v5.10.1                      2009-08-10                       PERLTODO(1)