PERLTODO(1)        Perl Programmers Reference Guide        PERLTODO(1)

6 perltodo - Perl TO-DO List
7
9 This is a list of wishes for Perl. The most up to date version of this
10 file is at
11 http://perl5.git.perl.org/perl.git/blob_plain/HEAD:/pod/perltodo.pod
12
13 The tasks we think are smaller or easier are listed first. Anyone is
14 welcome to work on any of these, but it's a good idea to first contact
15 perl5-porters@perl.org to avoid duplication of effort, and to learn
16 from any previous attempts. By all means contact a pumpking privately
17 first if you prefer.
18
19 Whilst patches to make the list shorter are most welcome, ideas to add
20 to the list are also encouraged. Check the perl5-porters archives for
21 past ideas, and any discussion about them. One set of archives may be
22 found at:
23
24 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
25
26 What can we offer you in return? Fame, fortune, and everlasting glory?
27 Maybe not, but if your patch is incorporated, then we'll add your name
28 to the AUTHORS file, which ships in the official distribution. How many
29 other programming languages offer you 1 line of immortality?
30
32 Remove macperl references from tests
33 MacPerl is gone. The tests don't need to be there.
34
35 Remove duplication of test setup.
Schwern notes that there's duplication of code: lots and lots of
tests have some variation on the big block of $Is_Foo checks. We can
safely put this into a file, change it to build an %Is hash and require
it. Maybe just put it into test.pl. Throw in the handy tainting
subroutines.
41
42 POD -> HTML conversion in the core still sucks
43 Which is crazy given just how simple POD purports to be, and how simple
44 HTML can be. It's not actually as simple as it sounds, particularly
45 with the flexibility POD allows for "=item", but it would be good to
46 improve the visual appeal of the HTML generated, and to avoid it having
47 any validation errors. See also "make HTML install work", as the layout
48 of installation tree is needed to improve the cross-linking.
49
50 The addition of "Pod::Simple" and its related modules may make this
51 task easier to complete.
52
53 Make ExtUtils::ParseXS use strict;
54 lib/ExtUtils/ParseXS.pm contains this line
55
56 # use strict; # One of these days...
57
58 Simply uncomment it, and fix all the resulting issues :-)
59
The more practical approach is to break the task down into manageable
chunks: work your way through the code from bottom to top, if
necessary adding extra "{ ... }" blocks and turning on strict within
them.
64
65 Parallel testing
66 (This probably impacts much more than the core: also the Test::Harness
67 and TAP::* modules on CPAN.)
68
69 All of the tests in t/ can now be run in parallel, if $ENV{TEST_JOBS}
70 is set. However, tests within each directory in ext and lib are still
71 run in series, with directories run in parallel. This is an adequate
72 heuristic, but it might be possible to relax it further, and get more
73 throughput. Specifically, it would be good to audit all of lib/*.t, and
74 make them use "File::Temp".
75
76 Make Schwern poorer
We should have tests for everything. When all the core's modules are
tested, Schwern has promised to donate $500 to TPF. We may need
volunteers to hold him upside down and shake vigorously in order to
actually extract the cash.
81
82 Improve the coverage of the core tests
Use Devel::Cover to ascertain the core modules' test coverage, then
add tests that are currently missing.
85
86 test B
87 A full test suite for the B module would be nice.
88
89 A decent benchmark
90 "perlbench" seems impervious to any recent changes made to the perl
91 core. It would be useful to have a reasonable general benchmarking
92 suite that roughly represented what current perl programs do, and
93 measurably reported whether tweaks to the core improve, degrade or
94 don't really affect performance, to guide people attempting to optimise
95 the guts of perl. Gisle would welcome new tests for perlbench.
96
97 fix tainting bugs
98 Fix the bugs revealed by running the test suite with the "-t" switch
99 (via "make test.taintwarn").
100
101 Dual life everything
102 As part of the "dists" plan, anything that doesn't belong in the
103 smallest perl distribution needs to be dual lifed. Anything else can be
104 too. Figure out what changes would be needed to package that module and
105 its tests up for CPAN, and do so. Test it with older perl releases, and
106 fix the problems you find.
107
108 To make a minimal perl distribution, it's useful to look at
109 t/lib/commonsense.t.
110
111 Bundle dual life modules in ext/
112 For maintenance (and branch merging) reasons, it would be useful to
113 move some architecture-independent dual-life modules from lib/ to ext/,
114 if this has no negative impact on the build of perl itself.
115
116 POSIX memory footprint
117 Ilya observed that use POSIX; eats memory like there's no tomorrow, and
118 at various times worked to cut it down. There is probably still fat to
119 cut out - for example POSIX passes Exporter some very memory hungry
120 data structures.
121
122 embed.pl/makedef.pl
There is a script embed.pl that generates several header files to
prefix all of Perl's symbols in a consistent way, to provide some
semblance of namespace support in "C". Functions are declared in
embed.fnc, variables in interpvar.h. Quite a few of the functions and
variables are conditionally declared there, using "#ifdef". However,
embed.pl doesn't understand the C macros, so the rules about which
symbols are present when are duplicated in makedef.pl. Writing things
twice is bad, m'kay. It would be good to teach "embed.pl" to
understand the conditional compilation, and hence remove the
duplication, and the mistakes it has caused.
133
134 use strict; and AutoLoad
135 Currently if you write
136
137 package Whack;
138 use AutoLoader 'AUTOLOAD';
139 use strict;
140 1;
141 __END__
142 sub bloop {
143 print join (' ', No, strict, here), "!\n";
144 }
145
146 then "use strict;" isn't in force within the autoloaded subroutines. It
147 would be more consistent (and less surprising) to arrange for all
148 lexical pragmas in force at the __END__ block to be in force within
149 each autoloaded subroutine.
150
151 There's a similar problem with SelfLoader.
152
153 profile installman
The installman script is slow. All it is doing is text processing,
which we're told is something Perl is good at. So it would be nice to
know what it is doing that is taking so much CPU, and where possible
address it.
158
Tasks that need a little sysadmin-type knowledge

Or if you prefer, tasks that you would learn from, and broaden your
skills base...
162
163 make HTML install work
164 There is an "installhtml" target in the Makefile. It's marked as
165 "experimental". It would be good to get this tested, make it work
166 reliably, and remove the "experimental" tag. This would include
167
168 1. Checking that cross linking between various parts of the
169 documentation works. In particular that links work between the
170 modules (files with POD in lib/) and the core documentation (files
171 in pod/)
172
173 2. Work out how to split "perlfunc" into chunks, preferably one per
174 function group, preferably with general case code that could be
175 used elsewhere. Challenges here are correctly identifying the
176 groups of functions that go together, and making the right named
177 external cross-links point to the right page. Things to be aware of
178 are "-X", groups such as "getpwnam" to "endservent", two or more
179 "=items" giving the different parameter lists, such as
180
181 =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
182 =item substr EXPR,OFFSET,LENGTH
183 =item substr EXPR,OFFSET
184
185 and different parameter lists having different meanings. (eg
186 "select")
187
188 compressed man pages
189 Be able to install them. This would probably need a configure test to
190 see how the system does compressed man pages (same directory/different
191 directory? same filename/different filename), as well as tweaking the
192 installman script to compress as necessary.
193
194 Add a code coverage target to the Makefile
195 Make it easy for anyone to run Devel::Cover on the core's tests. The
196 steps to do this manually are roughly
197
· do a normal "Configure", but include Devel::Cover as a module to
  install (see INSTALL for how to do this)

· make perl

· cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

· Process the resulting Devel::Cover database

This just gives you the coverage of the .pms. To also get the C level
coverage you need to

· Additionally tell "Configure" to use the appropriate C compiler
  flags for "gcov"

· make perl.gcov (instead of "make perl")

· After running the tests run "gcov" to generate all the .gcov files
  (including down in the subdirectories of ext/).

· (From the top level perl directory) run "gcov2perl" on all the
  ".gcov" files to get their stats into the cover_db directory.

· Then process the Devel::Cover database
233
234 It would be good to add a single switch to "Configure" to specify that
235 you wanted to perform perl level coverage, and another to specify C
236 level coverage, and have "Configure" and the Makefile do all the right
237 things automatically.
238
239 Make Config.pm cope with differences between built and installed perl
240 Quite often vendors ship a perl binary compiled with their (pay-for)
241 compilers. People install a free compiler, such as gcc. To work out
242 how to build extensions, Perl interrogates %Config, so in this
243 situation %Config describes compilers that aren't there, and extension
244 building fails. This forces people into choosing between re-compiling
245 perl themselves using the compiler they have, or only using modules
246 that the vendor ships.
247
It would be good to find a way to teach "Config.pm" about the
installation setup, possibly involving probing at install time or
later, so that the %Config in a binary distribution better describes
the installed machine, when the installed machine differs from the
build machine in some significant way.
253
254 linker specification files
255 Some platforms mandate that you provide a list of a shared library's
256 external symbols to the linker, so the core already has the
257 infrastructure in place to do this for generating shared perl
258 libraries. My understanding is that the GNU toolchain can accept an
259 optional linker specification file, and restrict visibility just to
260 symbols declared in that file. It would be good to extend makedef.pl to
261 support this format, and to provide a means within "Configure" to
262 enable it. This would allow Unix users to test that the export list is
263 correct, and to build a perl that does not pollute the global namespace
264 with private symbols.
265
266 Cross-compile support
Currently "Configure" understands the "-Dusecrosscompile" option. This
option arranges for "miniperl" to be built for the TARGET machine, on
the assumption that this "miniperl" will then be copied to the TARGET
machine and used as a replacement for the full "perl" executable.

This could be done a little differently. Namely, "miniperl" should be
built for HOST and then the full "perl" with extensions should be
compiled for TARGET. This, however, might require extra trickery for
%Config: we have one config first for HOST and then another for
TARGET. Tools like MakeMaker will be mightily confused. Having two
different types of executables and libraries (HOST and TARGET) around
makes life interesting for Makefiles and shell (and Perl) scripts.
There is $Config{run}, normally empty, which can be used as an
execution wrapper. Also note that in some cross-compilation/execution
environments the HOST and the TARGET do not see the same
filesystem(s), so the $Config{run} wrapper may need to do some
file/directory copying back and forth.
284
285 roffitall
286 Make pod/roffitall be updated by pod/buildtoc.
287
288 Split "linker" from "compiler"
289 Right now, Configure probes for two commands, and sets two variables:
290
291 · "cc" (in cc.U)
292
293 This variable holds the name of a command to execute a C compiler
294 which can resolve multiple global references that happen to have
295 the same name. Usual values are cc and gcc. Fervent ANSI
296 compilers may be called c89. AIX has xlc.
297
298 · "ld" (in dlsrc.U)
299
300 This variable indicates the program to be used to link libraries
301 for dynamic loading. On some systems, it is ld. On ELF systems,
302 it should be $cc. Mostly, we'll try to respect the hint file
303 setting.
304
There is an implicit historical assumption from around Perl5.000alpha
something, that $cc is also the correct command for linking object
files together to make an executable. This may be true on Unix, but
it's not true on other platforms, and there is a maze of workarounds
in other places (such as Makefile.SH) to cope with this.
310
311 Ideally, we should create a new variable to hold the name of the
312 executable linker program, probe for it in Configure, and centralise
313 all the special case logic there or in hints files.
314
315 A small bikeshed issue remains - what to call it, given that $ld is
316 already taken (arguably for the wrong thing now, but on SunOS 4.1 it is
317 the command for creating dynamically-loadable modules) and $link could
318 be confused with the Unix command line executable of the same name,
319 which does something completely different. Andy Dougherty makes the
320 counter argument "In parrot, I tried to call the command used to link
321 object files and libraries into an executable link, since that's what
322 my vaguely-remembered DOS and VMS experience suggested. I don't think
323 any real confusion has ensued, so it's probably a reasonable name for
324 perl5 to use."
325
326 "Alas, I've always worried that introducing it would make things worse,
327 since now the module building utilities would have to look for
328 $Config{link} and institute a fall-back plan if it weren't found."
329 Although I can see that as confusing, given that $Config{d_link} is
330 true when (hard) links are available.
331
332 Configure Windows using PowerShell
Currently, Windows uses hard-coded config files to build the config.h
for compiling Perl. Makefiles are also hard-coded and need to be hand
edited prior to building Perl. While this makes it easy to create a
perl.exe that works across multiple Windows versions, being able to
accurately configure a perl.exe for a specific Windows version and VS
C++ would be a nice enhancement. With PowerShell available on Windows
XP and up, this may now be possible. Step 1 might be to investigate
whether this is possible and use this to clean up our current makefile
situation. Step 2 would be to see if there would be a way to use our
existing metaconfig units to configure a Windows Perl or whether we go
in a separate direction and make it so. Of course, we all know what
step 3 is.
345
346 decouple -g and -DDEBUGGING
Currently Configure automatically adds "-DDEBUGGING" to the C compiler
flags if it spots "-g" in the optimiser flags. The pre-processor
directive "DEBUGGING" enables perl's command line "-D" options, but in
the process makes perl slower. It would be good to disentangle this
logic, so that C-level debugging with "-g" and Perl level debugging
with "-D" can easily be enabled independently.
353
Tasks that need a little C knowledge

These tasks would need a little C knowledge, but don't need any
specific background or experience with XS, or how the Perl interpreter
works.
358
359 Weed out needless PERL_UNUSED_ARG
360 The C code uses the macro "PERL_UNUSED_ARG" to stop compilers warning
361 about unused arguments. Often the arguments can't be removed, as there
362 is an external constraint that determines the prototype of the
363 function, so this approach is valid. However, there are some cases
364 where "PERL_UNUSED_ARG" could be removed. Specifically
365
366 · The prototypes of (nearly all) static functions can be changed
367
368 · Unused arguments generated by short cut macros are wasteful - the
369 short cut macro used can be changed.
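
As a rough illustration of the two situations, here is a small
sketch. It does not use perl's headers: "MY_UNUSED_ARG", "visit_fn"
and the functions below are hypothetical stand-ins invented for this
example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PERL_UNUSED_ARG (assumption: not perl's header). */
#define MY_UNUSED_ARG(x) ((void)(x))

/* Case 1: the prototype is dictated by an external constraint (here a
 * callback type), so the unused argument has to stay. */
typedef int (*visit_fn)(void *ctx, int value);

static int count_only(void *ctx, int value)
{
    assert(ctx != NULL);
    MY_UNUSED_ARG(value);       /* valid use: visit_fn fixes the signature */
    return ++*(int *)ctx;
}

/* Case 2: a static helper whose callers all live in this file.
 * Before:  static int twice(void *ctx, int value) { MY_UNUSED_ARG(ctx); ... }
 * After:   the unused parameter is dropped from the prototype. */
static int twice(int value)
{
    return 2 * value;
}

/* Exercises both cases: two callback invocations plus the trimmed helper. */
int run_demo(void)
{
    int calls = 0;
    visit_fn fn = count_only;

    fn(&calls, 10);
    fn(&calls, 20);
    return calls + twice(4);
}
```

The callback keeps its marker because its signature is not ours to
change; the static helper loses both the argument and the marker.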
370
371 Modernize the order of directories in @INC
372 The way @INC is laid out by default, one cannot upgrade core (dual-
373 life) modules without overwriting files. This causes problems for
374 binary package builders. One possible proposal is laid out in this
375 message:
376 <http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html>.
377
378 -Duse32bit*
379 Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
380 On these systems, it might be the default compilation mode, and there
381 is currently no guarantee that passing no use64bitall option to the
382 Configure process will build a 32bit perl. Implementing -Duse32bit*
383 options would be nice for perl 5.12.
384
385 Profile Perl - am I hot or not?
386 The Perl source code is stable enough that it makes sense to profile
387 it, identify and optimise the hotspots. It would be good to measure the
388 performance of the Perl interpreter using free tools such as
389 cachegrind, gprof, and dtrace, and work to reduce the bottlenecks they
390 reveal.
391
392 As part of this, the idea of pp_hot.c is that it contains the hot ops,
393 the ops that are most commonly used. The idea is that by grouping them,
394 their object code will be adjacent in the executable, so they have a
395 greater chance of already being in the CPU cache (or swapped in) due to
396 being near another op already in use.
397
398 Except that it's not clear if these really are the most commonly used
399 ops. So as part of exercising your skills with coverage and profiling
400 tools you might want to determine what ops really are the most commonly
401 used. And in turn suggest evictions and promotions to achieve a better
402 pp_hot.c.
403
404 One piece of Perl code that might make a good testbed is installman.
405
406 Allocate OPs from arenas
Currently all new OP structures are individually malloc()ed and
free()d. All "malloc" implementations have space overheads, and are
not as fast as custom allocators, so it would use both less memory and
less CPU to allocate the various OP structures from arenas. The SV
arena code can probably be re-used for this.
412
413 Note that Configuring perl with "-Accflags=-DPL_OP_SLAB_ALLOC" will use
414 Perl_Slab_alloc() to pack optrees into a contiguous block, which is
415 probably superior to the use of OP arenas, esp. from a cache locality
416 standpoint. See "Profile Perl - am I hot or not?".
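
To make the arena idea concrete, here is a minimal sketch of a
fixed-size arena allocator with a free list. It is not perl's SV arena
code: the "node" type, the slot count and every function name are
assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size structure; perl's real OP structs differ. */
typedef struct node {
    struct node *next_free;     /* link used while the slot is on the free list */
    int          payload[6];
} node;

#define ARENA_SLOTS 64          /* arbitrary size chosen for the sketch */

typedef struct arena {
    struct arena *prev;         /* older arenas, kept so they can be released */
    node          slots[ARENA_SLOTS];
} arena;

typedef struct node_pool {
    arena *arenas;              /* most recently allocated arena */
    node  *free_list;           /* slots ready for (re)use */
} node_pool;

/* One malloc() per ARENA_SLOTS nodes: carve the arena into slots and
 * push them all onto the free list. */
static int pool_grow(node_pool *pool)
{
    arena *a = malloc(sizeof *a);
    size_t i;

    if (!a)
        return -1;
    a->prev = pool->arenas;
    pool->arenas = a;
    for (i = 0; i < ARENA_SLOTS; i++) {
        a->slots[i].next_free = pool->free_list;
        pool->free_list = &a->slots[i];
    }
    return 0;
}

/* Pop a slot off the free list, growing the pool if it is empty. */
node *pool_alloc(node_pool *pool)
{
    node *n;

    assert(pool != NULL);
    if (!pool->free_list && pool_grow(pool) != 0)
        return NULL;
    n = pool->free_list;
    pool->free_list = n->next_free;
    return n;
}

/* Return a slot to the pool; no call into free() is made. */
void pool_free(node_pool *pool, node *n)
{
    assert(pool != NULL && n != NULL);
    n->next_free = pool->free_list;
    pool->free_list = n;
}

/* Release every arena in one go. */
void pool_destroy(node_pool *pool)
{
    assert(pool != NULL);
    while (pool->arenas) {
        arena *prev = pool->arenas->prev;
        free(pool->arenas);
        pool->arenas = prev;
    }
    pool->free_list = NULL;
}
```

Allocation and release are a couple of pointer operations each, and
the per-allocation "malloc" bookkeeping is paid once per arena rather
than once per structure.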
417
418 Improve win32/wince.c
419 Currently, numerous functions look virtually, if not completely,
420 identical in both "win32/wince.c" and "win32/win32.c" files, which
421 can't be good.
422
423 Use secure CRT functions when building with VC8 on Win32
424 Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the
425 basis that they were "unsafe" and introduced differently named secure
426 versions of them as replacements, e.g. instead of writing
427
428 FILE* f = fopen(__FILE__, "r");
429
430 one should now write
431
432 FILE* f;
433 errno_t err = fopen_s(&f, __FILE__, "r");
434
435 Currently, the warnings about these deprecations have been disabled by
436 adding -D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to
437 remove that warning suppressant and actually make use of the new secure
438 CRT functions.
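
One way this might look in practice is a small wrapper that picks the
secure variant only where the CRT provides it. This is a sketch, not
existing perl code: "open_for_read" is a hypothetical name, and the
"_MSC_VER >= 1400" test assumes VC8 is the first compiler shipping
"fopen_s".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a file for reading, preferring fopen_s()
 * when building with VC8 (_MSC_VER 1400) or later. */
FILE *open_for_read(const char *path)
{
    assert(path != NULL);
#if defined(_MSC_VER) && _MSC_VER >= 1400
    {
        FILE *f = NULL;

        /* fopen_s() returns zero on success, an error code otherwise */
        if (fopen_s(&f, path, "r") != 0)
            return NULL;
        return f;
    }
#else
    return fopen(path, "r");
#endif
}
```

Callers see the same interface on every platform, so the deprecation
suppressant could be dropped without scattering "#ifdef"s through the
code.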
439
440 There is also a similar issue with POSIX CRT function names like fileno
441 having been deprecated in favour of ISO C++ conformant names like
442 _fileno. These warnings are also currently suppressed by adding
443 -D_CRT_NONSTDC_NO_DEPRECATE. It might be nice to do as Microsoft
444 suggest here too, although, unlike the secure functions issue, there is
445 presumably little or no benefit in this case.
446
447 Fix POSIX::access() and chdir() on Win32
448 These functions currently take no account of DACLs and therefore do not
449 behave correctly in situations where access is restricted by DACLs (as
450 opposed to the read-only attribute).
451
452 Furthermore, POSIX::access() behaves differently for directories having
453 the read-only attribute set depending on what CRT library is being
used. For example, the _access() function in the VC6 and VC7 CRTs
(wrongly) claims that such directories are not writable, whereas in fact
456 all directories are writable unless access is denied by DACLs. (In the
457 case of directories, the read-only attribute actually only means that
458 the directory cannot be deleted.) This CRT bug is fixed in the VC8 and
459 VC9 CRTs (but, of course, the directory may still not actually be
460 writable if access is indeed denied by DACLs).
461
462 For the chdir() issue, see ActiveState bug #74552:
463 http://bugs.activestate.com/show_bug.cgi?id=74552
464
465 Therefore, DACLs should be checked both for consistency across CRTs and
466 for the correct answer.
467
468 (Note that perl's -w operator should not be modified to check DACLs. It
469 has been written so that it reflects the state of the read-only
470 attribute, even for directories (whatever CRT is being used), for
471 symmetry with chmod().)
472
473 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()
Maybe create a utility that checks after each libperl.a creation that
none of the above (nor *SHUDDER* gets()) ever creeps back into
libperl.a.
477
478 nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'
479
480 Note, of course, that this will only tell whether your platform is
481 using those naughty interfaces.
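
For reference, the kind of bounded replacement this is aiming at looks
roughly like the sketch below. It assumes a C99 "snprintf" is
available, which is not guaranteed on every platform perl builds on,
and "join_path" is a hypothetical helper rather than anything in the
core.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build "dir/name" in a caller-supplied buffer.
 * Returns 0 on success, or -1 if the result would not fit (or on an
 * output error), in which case dst is left as an empty string. */
int join_path(char *dst, size_t dst_size, const char *dir, const char *name)
{
    int needed;

    assert(dst != NULL && dst_size > 0 && dir != NULL && name != NULL);

    /* The unbounded form this replaces would be:
     *     strcpy(dst, dir); strcat(dst, "/"); strcat(dst, name);      */
    needed = snprintf(dst, dst_size, "%s/%s", dir, name);
    if (needed < 0 || (size_t)needed >= dst_size) {
        dst[0] = '\0';          /* never hand back a truncated path */
        return -1;
    }
    return 0;
}
```

The point is that the destination size travels with the buffer and
truncation is reported to the caller instead of silently overrunning.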
482
483 -D_FORTIFY_SOURCE=2, -fstack-protector
Recent glibcs support "-D_FORTIFY_SOURCE=2" and recent gcc (4.1
onwards?) supports "-fstack-protector", both of which give protection
against various kinds of buffer overflow problems. These should
probably be used for compiling Perl whenever available. Configure
and/or hints files should be adjusted to probe for the availability of
these features and enable them as appropriate.
490
491 Arenas for GPs? For MAGIC?
492 "struct gp" and "struct magic" are both currently allocated by
493 "malloc". It might be a speed or memory saving to change to using
494 arenas. Or it might not. It would need some suitable benchmarking
495 first. In particular, "GP"s can probably be changed with minimal
496 compatibility impact (probably nothing outside of the core, or even
497 outside of gv.c allocates them), but they probably aren't
498 allocated/deallocated often enough for a speed saving. Whereas "MAGIC"
499 is allocated/deallocated more often, but in turn, is also something
500 more externally visible, so changing the rules here may bite external
501 code.
502
503 Shared arenas
504 Several SV body structs are now the same size, notably PVMG and PVGV,
505 PVAV and PVHV, and PVCV and PVFM. It should be possible to allocate and
506 return same sized bodies from the same actual arena, rather than
507 maintaining one arena for each. This could save 4-6K per thread, of
508 memory no longer tied up in the not-yet-allocated part of an arena.
509
Tasks that need a knowledge of XS

These tasks would need C knowledge, and roughly the level of knowledge
of the perl API that comes from writing modules that use XS to
interface to C.
514
515 Remove the use of SVs as temporaries in dump.c
dump.c contains debugging routines to dump out the contents of perl
517 data structures, such as "SV"s, "AV"s and "HV"s. Currently, the dumping
518 code uses "SV"s for its temporary buffers, which was a logical initial
519 implementation choice, as they provide ready made memory handling.
520
521 However, they also lead to a lot of confusion when it happens that what
522 you're trying to debug is seen by the code in dump.c, correctly or
523 incorrectly, as a temporary scalar it can use for a temporary buffer.
524 It's also not possible to dump scalars before the interpreter is
525 properly set up, such as during ithreads cloning. It would be good to
526 progressively replace the use of scalars as string accumulation buffers
527 with something much simpler, directly allocated by "malloc". The dump.c
528 code is (or should be) only producing 7 bit US-ASCII, so output
529 character sets are not an issue.
530
531 Producing and proving an internal simple buffer allocation would make
532 it easier to re-write the internals of the PerlIO subsystem to avoid
533 using "SV"s for its buffers, use of which can cause problems similar to
534 those of dump.c, at similar times.
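
A "much simpler" accumulation buffer could be as small as the sketch
below. Nothing here exists in the core: "strbuf", "strbuf_cat" and
"strbuf_free" are hypothetical names, and the growth policy is just
one reasonable choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer backed directly by malloc/realloc. */
typedef struct strbuf {
    char  *data;    /* NUL-terminated once non-NULL */
    size_t len;     /* bytes used, excluding the trailing NUL */
    size_t cap;     /* bytes allocated */
} strbuf;

/* Append a NUL-terminated string, doubling the capacity as needed.
 * Returns 0 on success, -1 if memory could not be obtained (the
 * existing contents are left untouched in that case). */
int strbuf_cat(strbuf *b, const char *s)
{
    size_t add, need;

    assert(b != NULL && s != NULL);
    add = strlen(s);
    need = b->len + add + 1;
    if (need > b->cap) {
        size_t new_cap = b->cap ? b->cap : 64;
        char *p;

        while (new_cap < need)
            new_cap *= 2;
        p = realloc(b->data, new_cap);
        if (!p)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, s, add + 1);   /* copies the NUL as well */
    b->len += add;
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
void strbuf_free(strbuf *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Because it touches no interpreter state, a buffer like this would be
usable before the interpreter is set up and could never be mistaken
for the scalar being dumped.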
535
536 safely supporting POSIX SA_SIGINFO
537 Some years ago Jarkko supplied patches to provide support for the POSIX
538 SA_SIGINFO feature in Perl, passing the extra data to the Perl signal
539 handler.
540
Unfortunately, it only works with "unsafe" signals, because under safe
signals, by the time Perl gets to run the signal handler, the extra
information has been lost. Moreover, it's not easy to store it
somewhere, as you can't lock mutexes, or do anything else fancy, from
inside a signal handler.

So it strikes me that we could provide safe SA_SIGINFO support as
follows:
548
549 1. Provide global variables for two file descriptors
550
551 2. When the first request is made via "sigaction" for "SA_SIGINFO",
552 create a pipe, store the reader in one, the writer in the other
553
3. In the "safe" signal handler
("Perl_csighandler()"/"S_raise_signal()"), if the "siginfo_t"
pointer is non-"NULL", and the writer file handle is open,
557
558 1. serialise signal number, "struct siginfo_t" (or at least
559 the parts we care about) into a small auto char buff
560
561 2. "write()" that (non-blocking) to the writer fd
562
563 1. if it writes 100%, flag the signal in a counter
564 of "signals on the pipe" akin to the current
565 per-signal-number counts
566
567 2. if it writes 0%, assume the pipe is full. Flag
568 the data as lost?
569
570 3. if it writes partially, croak a panic, as your
571 OS is broken.
572
573 4. in the regular "PERL_ASYNC_CHECK()" processing, if there are
574 "signals on the pipe", read the data out, deserialise, build the
575 Perl structures on the stack (code in "Perl_sighandler()", the
576 "unsafe" handler), and call as usual.
577
I think that this gets us decent "SA_SIGINFO" support, without the
current risk of running Perl code inside the signal handler context
(with all the dangers of things like "malloc" corruption that it
currently exposes us to).
582
583 For more information see the thread starting with this message:
584 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html
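
The scheme outlined above can be sketched in plain POSIX C as follows.
This is only an illustration of the pipe idea, not perl's signal code:
every name is hypothetical, a simple flag stands in for the "signals
on the pipe" counter, only three "siginfo_t" fields are kept, and it
relies on POSIX treating pipe writes of at most PIPE_BUF bytes as
atomic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Step 1: two file descriptors (hypothetical globals). */
static int siginfo_rd = -1;
static int siginfo_wr = -1;
static volatile sig_atomic_t siginfo_pending = 0;   /* data is on the pipe */

/* Step 3.1: the parts of siginfo_t this sketch chooses to keep. */
typedef struct sig_record {
    int   signo;
    int   code;
    pid_t pid;
} sig_record;

/* Step 3: runs in signal context, so it only assigns and calls write(). */
static void safe_handler(int signo, siginfo_t *info, void *uctx)
{
    int saved_errno = errno;

    (void)uctx;
    if (info && siginfo_wr >= 0) {
        sig_record rec;

        rec.signo = signo;
        rec.code  = info->si_code;
        rec.pid   = info->si_pid;
        if (write(siginfo_wr, &rec, sizeof rec) == (ssize_t)sizeof rec)
            siginfo_pending = 1;
        /* otherwise the pipe is full and the data is lost; real code
         * would want to flag that somewhere */
    }
    errno = saved_errno;
}

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Step 2: create the pipe on first use, then install the handler. */
int install_safe_siginfo(int signo)
{
    struct sigaction sa;

    if (siginfo_wr < 0) {
        int fds[2];

        if (pipe(fds) != 0)
            return -1;
        if (set_nonblocking(fds[0]) != 0 || set_nonblocking(fds[1]) != 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
        siginfo_rd = fds[0];
        siginfo_wr = fds[1];
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = safe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Step 4: called from ordinary context, in the role PERL_ASYNC_CHECK()
 * would play. Returns 1 and fills *out if a record was read, else 0. */
int next_siginfo(sig_record *out)
{
    assert(out != NULL);
    if (!siginfo_pending)
        return 0;
    siginfo_pending = 0;        /* clear first: a signal arriving now re-sets it */
    if (read(siginfo_rd, out, sizeof *out) == (ssize_t)sizeof *out) {
        siginfo_pending = 1;    /* there may be more records queued */
        return 1;
    }
    return 0;
}
```

The handler never allocates or runs interpreter code; everything that
builds Perl-level structures would happen after "next_siginfo()"
returns, in normal context.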
585
586 autovivification
587 Make all autovivification consistent w.r.t LVALUE/RVALUE and strict/no
588 strict;
589
590 This task is incremental - even a little bit of work on it will help.
591
592 Unicode in Filenames
chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
system, truncate, unlink, utime, -X. All these could potentially
accept Unicode filenames either as input or output (and in the case of
system and qx, Unicode in general, as input or output to/from the
shell). Whether a given filesystem/operating system pair understands
Unicode in filenames varies.

Known combinations that have some level of understanding include
Microsoft NTFS, Apple HFS+ (in Mac OS 9 and X), Apple UFS (in Mac OS
X) and of course Plan 9; NFS v4 is rumored to be Unicode. How to
create Unicode filenames, what forms of Unicode are accepted and used
(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
and so on, varies. Finding the right level of interfacing to Perl
requires some thought. Remember that an OS does not imply a
filesystem.
609
610 (The Windows -C command flag "wide API support" has been at least
611 temporarily retired in 5.8.1, and the -C has been repurposed, see
612 perlrun.)
613
614 Most probably the right way to do this would be this: "Virtualize
615 operating system access".
616
617 Unicode in %ENV
618 Currently the %ENV entries are always byte strings. See "Virtualize
619 operating system access".
620
621 Unicode and glob()
622 Currently glob patterns and filenames returned from File::Glob::glob()
623 are always byte strings. See "Virtualize operating system access".
624
625 Unicode and lc/uc operators
626 Some built-in operators ("lc", "uc", etc.) behave differently, based on
627 what the internal encoding of their argument is. That should not be the
628 case. Maybe add a pragma to switch behaviour.
629
630 use less 'memory'
631 Investigate trade offs to switch out perl's choices on memory usage.
632 Particularly perl should be able to give memory back.
633
634 This task is incremental - even a little bit of work on it will help.
635
636 Re-implement ":unique" in a way that is actually thread-safe
637 The old implementation made bad assumptions on several levels. A good
638 90% solution might be just to make ":unique" work to share the string
639 buffer of SvPVs. That way large constant strings can be shared between
640 ithreads, such as the configuration information in Config.
641
642 Make tainting consistent
643 Tainting would be easier to use if it didn't take documented shortcuts
644 and allow taint to "leak" everywhere within an expression.
645
646 readpipe(LIST)
647 system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
648 running a shell. readpipe() (the function behind qx//) could be
649 similarly extended.
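
At the C level, the shell-free behaviour being asked for amounts to a
pipe, a fork and an "execvp". The sketch below shows that shape under
POSIX; "capture_argv" is a hypothetical helper, not a proposed
interface, and it ignores niceties such as EINTR handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its arguments directly via
 * execvp(), with no shell in between, and capture up to cap-1 bytes of
 * its standard output in buf (always NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
long capture_argv(char *const argv[], char *buf, size_t cap)
{
    int fds[2];
    pid_t pid;
    size_t used = 0;
    ssize_t n;
    int status;

    assert(argv != NULL && argv[0] != NULL && buf != NULL && cap > 0);
    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                         /* child */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0)
                _exit(127);
            close(fds[1]);
        }
        execvp(argv[0], argv);              /* arguments are passed verbatim */
        _exit(127);                         /* reached only if exec failed */
    }
    close(fds[1]);                          /* parent keeps the read end */
    while (used < cap - 1 &&
           (n = read(fds[0], buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (long)used;
}
```

Because the argument vector goes straight to "execvp", characters such
as "$" or ";" reach the program untouched, which is the whole
attraction of a LIST form.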
650
651 Audit the code for destruction ordering assumptions
652 Change 25773 notes
653
654 /* Need to check SvMAGICAL, as during global destruction it may be that
655 AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
656 is now part of the linked list of SV heads, rather than pointing to
657 the original body. */
658 /* FIXME - audit the code for other bugs like this one. */
659
660 adding the "SvMAGICAL" check to
661
662 if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
663 MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);
664
665 Go through the core and look for similar assumptions that SVs have
666 particular types, as all bets are off during global destruction.
667
668 Extend PerlIO and PerlIO::Scalar
669 PerlIO::Scalar doesn't know how to truncate(). Implementing this would
670 require extending the PerlIO vtable.
671
672 Similarly the PerlIO vtable doesn't know about formats (write()), or
673 about stat(), or chmod()/chown(), utime(), or flock().
674
675 (For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
676 would mean.)
677
678 PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
679 opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
680 readlink().
681
682 See also "Virtualize operating system access".
683
684 -C on the #! line
685 It should be possible to make -C work correctly if found on the #!
686 line, given that all perl command line options are strict ASCII, and -C
687 changes only the interpretation of non-ASCII characters, and not for
688 the script file handle. To make it work needs some investigation of the
689 ordering of function calls during startup, and (by implication) a bit
690 of tweaking of that order.
691
692 Duplicate logic in S_method_common() and Perl_gv_fetchmethod_autoload()
693 A comment in "S_method_common" notes
694
695 /* This code tries to figure out just what went wrong with
696 gv_fetchmethod. It therefore needs to duplicate a lot of
697 the internals of that function. We can't move it inside
698 Perl_gv_fetchmethod_autoload(), however, since that would
699 cause UNIVERSAL->can("NoSuchPackage::foo") to croak, and we
700 don't want that.
701 */
702
703 If "Perl_gv_fetchmethod_autoload" gets rewritten to take (more) flag
704 bits, then it ought to be possible to move the logic from
705 "S_method_common" to the "right" place. When making this change it
706 would probably be good to also pass in at least the method name length,
707 if not also pre-computed hash values when known. (I'm contemplating a
708 plan to pre-compute hash values for common fixed strings such as "ISA"
709 and pass them in to functions.)
710
711 Organize error messages
712 Perl's diagnostics (error messages, see perldiag) could use
713 reorganizing and formalizing so that each error message has its stable-
714 for-all-eternity unique id, categorized by severity, type, and
715 subsystem. (The error messages would be listed in a datafile outside
716 of the Perl source code, and the source code would only refer to the
717 messages by the id.) This clean-up and regularizing should apply for
718 all croak() messages.
719
720 This would enable all sorts of things: easier translation/localization
721 of the messages (though please do keep in mind the caveats of
722 Locale::Maketext about too straightforward approaches to translation),
723 filtering by severity, and instead of grepping for a particular error
724 message one could look for a stable error id. (Of course, changing the
725 error messages by default would break all the existing software
726 depending on some particular error message...)
727
728 This kind of functionality is known as message catalogs. Look for
729 inspiration for example in the catgets() system, possibly even use it
730 if available, but only if available: not all platforms will have
731 catgets().
732
733 For the really pure at heart, consider extending this item to cover
734 also the warning messages (see perllexwarn, "warnings.pl").
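
As a rough illustration of the message catalog idea, here is a sketch in Perl. The ids (E0001, W0001), the field names, and the function format_message are all hypothetical, and the message texts are abbreviated. In the proposal the table would live in a data file outside the source rather than in a hash:

```perl
use strict;
use warnings;

# Hypothetical catalog keyed by a stable id.
my %CATALOG = (
    E0001 => { severity => 'fatal',   subsystem => 'loader',
               format   => q{Can't locate %s in @INC} },
    W0001 => { severity => 'warning', subsystem => 'numeric',
               format   => q{Argument "%s" isn't numeric} },
);

# Build the user-visible text from an id plus arguments.
sub format_message {
    my ($id, @args) = @_;
    my $entry = $CATALOG{$id} or die "unknown message id '$id'";
    return sprintf('[%s] ' . $entry->{format}, $id, @args);
}

# Filtering by severity becomes a lookup instead of a grep over text.
sub ids_with_severity {
    my ($severity) = @_;
    return grep { $CATALOG{$_}{severity} eq $severity } sort keys %CATALOG;
}

print format_message('E0001', 'Foo.pm'), "\n";
print join(',', ids_with_severity('warning')), "\n";
```

Code that wants to recognise an error can then compare the id, and the wording is free to change or be translated.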
735
737 These tasks would need C knowledge, and knowledge of how the
738 interpreter works, or a willingness to learn.
739
740 forbid labels with keyword names
741 Currently "goto keyword" "computes" the label value:
742
743 $ perl -e 'goto print'
744 Can't find label 1 at -e line 1.
745
746 It would be nice to forbid labels with keyword names, to avoid
747 confusion.
748
749 truncate() prototype
750 The prototype of truncate() is currently $$. It should probably be "*$"
751 instead. (This is changed in opcode.pl)
752
753 decapsulation of smart match argument
754 Currently "$foo ~~ $object" will die with the message "Smart matching a
755 non-overloaded object breaks encapsulation". It would be nice to allow
756 bypassing this by explicitly using the syntax "$foo ~~ %$object" or
757 "$foo ~~ @$object".
758
759 error reporting of [$a ; $b]
760 Using ";" inside brackets is a syntax error, and we don't propose to
761 change that by giving it any meaning. However, it's not reported very
762 helpfully:
763
764 $ perl -e '$a = [$b; $c];'
765 syntax error at -e line 1, near "$b;"
766 syntax error at -e line 1, near "$c]"
767 Execution of -e aborted due to compilation errors.
768
769 It should be possible to hook into the tokeniser or the lexer, so that
770 when a ";" is parsed where it is not legal as a statement terminator
771 (ie inside "{}" used as a hashref, "[]" or "()") it issues an error
772 something like ';' isn't legal inside an expression - if you need
773 multiple statements use a do {...} block. See the thread starting at
774 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-09/msg00573.html
775
776 lexicals used only once
777 This warns:
778
779 $ perl -we '$pie = 42'
780 Name "main::pie" used only once: possible typo at -e line 1.
781
782 This does not:
783
784 $ perl -we 'my $pie = 42'
785
786 Logically all lexicals used only once should warn, if the user asks for
787 warnings. An unworked RT ticket (#5087) has been open for almost seven
788 years for this discrepancy.
789
790 UTF-8 revamp
791 The handling of Unicode is unclean in many places. For example, the
792 regexp engine matches in Unicode semantics whenever the string or the
793 pattern is flagged as UTF-8, but that should not be dependent on an
794 internal storage detail of the string. Likewise, case folding behaviour
795 is dependent on the UTF8 internal flag being on or off.
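
The dependence on the internal flag can be seen from Perl code. The sketch below assumes the legacy semantics are in effect (no "unicode_strings" feature, no locale); the helper name probe is hypothetical. The two strings compare equal, yet behave differently:

```perl
use strict;
use warnings;
no feature 'unicode_strings';   # make sure the legacy behaviour applies

# Hypothetical helper: report how a string behaves under uc() and \w.
sub probe {
    my ($string) = @_;
    return {
        uc_changes => (uc($string) ne $string ? 1 : 0),
        is_word    => ($string =~ /\w/ ? 1 : 0),
    };
}

my $native   = "\xE9";          # e-acute, stored as a single byte
my $upgraded = "\xE9";
utf8::upgrade($upgraded);       # same character, stored internally as UTF-8

for my $case ([native => $native], [upgraded => $upgraded]) {
    my ($label, $string) = @$case;
    my $result = probe($string);
    print "$label: uc_changes=$result->{uc_changes} is_word=$result->{is_word}\n";
}
print "strings equal: ", ($native eq $upgraded ? "yes" : "no"), "\n";
```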
796
797 Properly Unicode safe tokeniser and pads.
798 The tokeniser isn't actually very UTF-8 clean. "use utf8;" is a hack -
799 variable names are stored in stashes as raw bytes, without the utf-8
800 flag set. The pad API only takes a "char *" pointer, so that's all
801 bytes too. The tokeniser ignores the UTF-8-ness of "PL_rsfp", or any
802 SVs returned from source filters. All this could be fixed.
803
804 state variable initialization in list context
805 Currently this is illegal:
806
807 state ($a, $b) = foo();
808
809 In Perl 6, "state ($a) = foo();" and "(state $a) = foo();" have
810 different semantics, which is tricky to implement in Perl 5 as
811 currently they produce the same opcode trees. The Perl 6 design is
812 firm, so it would be good to implement the necessary code in Perl 5.
813 There are comments in "Perl_newASSIGNOP()" that show the code paths
814 taken by various assignment constructions involving state variables.
815
816 Implement $value ~~ 0 .. $range
817 It would be nice to extend the syntax of the "~~" operator to also
818 understand numeric (and maybe alphanumeric) ranges.
819
820 A does() built-in
821 Like ref(), only useful. It would call the "DOES" method on objects; it
822 would also tell whether something can be dereferenced as an
823 array/hash/etc., or used as a regexp, etc.
824 <http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>
825
826 Tied filehandles and write() don't mix
827 There is no method on tied filehandles to allow them to be called back
828 by formats.
829
830 Propagate compilation hints to the debugger
831 Currently a debugger started with -dE on the command-line doesn't see
832 the features enabled by -E. More generally hints ($^H and "%^H") aren't
833 propagated to the debugger. Probably it would be a good thing to
834 propagate hints from the innermost non-"DB::" scope: this would make
835 code eval'ed in the debugger see the features (and strictures, etc.)
836 currently in scope.
837
838 Attach/detach debugger from running program
839 The old perltodo notes "With "gdb", you can attach the debugger to a
840 running program if you pass the process ID. It would be good to do this
841 with the Perl debugger on a running Perl program, although I'm not sure
842 how it would be done." ssh and screen do this with named pipes in /tmp.
843 Maybe we can too.
844
845 LVALUE functions for lists
846 The old perltodo notes that lvalue functions don't work for list or
847 hash slices. This would be good to fix.
848
849 regexp optimiser optional
850 The regexp optimiser is not optional. It should be configurable, to
851 allow its performance to be measured, and its bugs to be easily
852 demonstrated.
853
854 delete &function
855 Allow functions to be deleted. One can already undef them, but they're
856 still in the stash.
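
A short sketch of the current situation, using a throwaway package Demo::DeleteSub (a hypothetical name). It shows that undef removes the body but leaves the stash entry, and that the only way to drop the entry today is to delete the whole glob, which also takes any same-named scalar, array, or hash with it:

```perl
use strict;
use warnings;

our @states;
{
    package Demo::DeleteSub;

    sub greet { return "hello" }

    # Report whether greet has a body and whether the stash knows the name.
    sub state_of_greet {
        return sprintf "defined=%d in_stash=%d",
            (defined &greet ? 1 : 0),
            (exists $Demo::DeleteSub::{greet} ? 1 : 0);
    }

    push @states, state_of_greet();     # defined=1 in_stash=1
    undef &greet;
    push @states, state_of_greet();     # defined=0 in_stash=1
    delete $Demo::DeleteSub::{greet};   # removes the glob, not just the sub
    push @states, state_of_greet();     # defined=0 in_stash=0
}
print "$_\n" for @states;
```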
857
858 "/w" regex modifier
859 That flag would enable matching whole words, and also interpolating
860 arrays as alternations. With it, "/P/w" would be roughly equivalent to:
861
862 do { local $"='|'; /\b(?:P)\b/ }
863
864 See
865 <http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
866 for the discussion.
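
The equivalence can be tried out today with an ordinary function. This is only a sketch of the proposed behaviour, with a hypothetical helper name; it additionally applies quotemeta, on the assumption that the words are meant as literals:

```perl
use strict;
use warnings;

# Emulates the proposed /w behaviour for a list of literal words.
sub matches_whole_word {
    my ($text, @words) = @_;
    my @escaped = map { quotemeta } @words;   # treat words as literals
    local $" = '|';                           # join array elements with '|'
    return $text =~ /\b(?:@escaped)\b/ ? 1 : 0;
}

print matches_whole_word("the cat sat", qw(cat dog)), "\n";   # 1
print matches_whole_word("concatenate", qw(cat dog)), "\n";   # 0
```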
867
868 optional optimizer
869 Make the peephole optimizer optional. Currently it performs two tasks
870 as it walks the optree - genuine peephole optimisations, and necessary
871 fixups of ops. It would be good to find an efficient way to switch out
872 the optimisations whilst keeping the fixups.
873
874 You WANT *how* many
875 Currently contexts are void, scalar and list. split has a special
876 mechanism in place to pass in the number of return values wanted. It
877 would be useful to have a general mechanism for this that is backwards
878 compatible and incurs little speed hit. This would allow proposals such as
879 short circuiting sort to be implemented as a module on CPAN.
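
To make the limitation concrete, the following sketch records what a subroutine can learn from wantarray. The function name context is hypothetical. A call assigning to two scalars and a call assigning to an array both just report list context:

```perl
use strict;
use warnings;

my @seen;

# Record the calling context as reported by wantarray.
sub context {
    my $ctx = wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
    push @seen, $ctx;
    return;
}

context();                          # void
my $one = context();                # scalar
my ($first, $second) = context();   # list: no hint that two values are wanted
my @all = context();                # list: indistinguishable from the line above

print "@seen\n";                    # void scalar list list
```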
880
881 lexical aliases
882 Allow lexical aliases (maybe via the syntax "my \$alias = \$foo").
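
For comparison, the closest thing available without new syntax is the aliasing that foreach already performs on its loop variable. The sketch below uses a hypothetical function name and shows only that existing idiom, not the proposed syntax:

```perl
use strict;
use warnings;

# Inside the loop, $alias is another name for $value, not a copy.
sub increment_through_alias {
    my ($value) = @_;
    for my $alias ($value) {
        $alias++;
    }
    return $value;
}

print increment_through_alias(41), "\n";   # 42
```

The alias only exists for the duration of the loop body, which is why a declaration-level form would be more convenient.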
883
884 entersub XS vs Perl
885 At the moment pp_entersub is huge, and has code to deal with entering
886 both perl and XS subroutines. Subroutine implementations rarely change
887 between perl and XS at run time, so investigate using 2 ops to enter
888 subs (one for XS, one for perl) and swap between if a sub is redefined.
889
890 Self-ties
891 Self-ties are currently illegal because they caused too many segfaults.
892 Maybe the causes of these could be tracked down and self-ties on all
893 types reinstated.
894
895 Optimize away @_
896 The old perltodo notes "Look at the "reification" code in "av.c"".
897
898 The yada yada yada operators
899 Perl 6's Synopsis 3 says:
900
901 The ... operator is the "yada, yada, yada" list operator, which is used
902 as the body in function prototypes. It complains bitterly (by calling
903 fail) if it is ever executed. Variant ??? calls warn, and !!! calls
904 die.
905
906 Those would be nice to add to Perl 5. That could be done without new
907 ops.
908
909 Virtualize operating system access
910 Implement a set of "vtables" that virtualizes operating system access
911 (open(), mkdir(), unlink(), readdir(), getenv(), etc.) At the very
912 least these interfaces should take SVs as "name" arguments instead of
913 bare char pointers; probably the most flexible and extensible way would
914 be for the Perl-facing interfaces to accept HVs. The system needs to
915 be per-operating-system and per-file-system hookable/filterable,
916 preferably both from XS and Perl level ("Files and Filesystems" in
917 perlport is good reading at this point, in fact, all of perlport is.)
918
919 This has actually already been implemented (but only for Win32), take a
920 look at iperlsys.h and win32/perlhost.h. While all Win32 variants go
921 through a set of "vtables" for operating system access, non-Win32
922 systems currently go straight for the POSIX/UNIX-style system/library
923 call. A system similar to the Win32 one should be implemented for all
924 platforms. The existing Win32 implementation probably does not need to
925 survive alongside this proposed new implementation, the approaches
926 could be merged.
927
928 What would this give us? One often-asked-for feature this would enable
929 is using Unicode for filenames, and other "names" like %ENV, usernames,
930 hostnames, and so forth. (See "When Unicode Does Not Happen" in
931 perlunicode.)
932
933 But this kind of virtualization would also allow for things like
934 virtual filesystems, virtual networks, and "sandboxes" (though as long
935 as dynamic loading of random object code is allowed, not very safe
936 sandboxes since external code of course knows nothing of Perl's vtables).
937 An example of a smaller "sandbox" is that this feature can be used to
938 implement per-thread working directories: Win32 already does this.
939
940 See also "Extend PerlIO and PerlIO::Scalar".
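
The shape of such a table can be sketched in Perl itself, although the real work would be in C. Everything here is hypothetical (the table layout, make_sandbox, home_directory); it only illustrates callers going through a replaceable table instead of touching the operating system directly:

```perl
use strict;
use warnings;

# Hypothetical default table: each entry forwards to the real facility.
my %default_os = (
    getenv => sub { my ($name) = @_; return $ENV{$name} },
    getcwd => sub { require Cwd; return Cwd::getcwd() },
);

# Hypothetical "sandbox": same interface, but nothing comes from the host.
sub make_sandbox {
    my (%fake_env) = @_;
    return {
        %default_os,
        getenv => sub { my ($name) = @_; return $fake_env{$name} },
        getcwd => sub { return '/sandbox' },
    };
}

# Callers use the table rather than %ENV.
sub home_directory {
    my ($os) = @_;
    return $os->{getenv}->('HOME');
}

my $sandbox = make_sandbox(HOME => '/sandbox/home');
print home_directory($sandbox), "\n";    # /sandbox/home
print $sandbox->{getcwd}->(), "\n";      # /sandbox
```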
941
942 Investigate PADTMP hash pessimisation
943 The peephole optimiser converts constants used for hash key lookups to
944 shared hash key scalars. Under ithreads, something is undoing this
945 work. See
946 http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
947
948 Store the current pad in the OP slab allocator
949 Currently we leak ops in various cases of parse failure. I suggested
950 that we could solve this by always using the op slab allocator, and
951 walking it to free ops. Dave comments that as some ops are already
952 freed during optree creation one would have to mark which ops are
953 freed, and not double free them when walking the slab. He notes that
954 one problem with this is that for some ops you have to know which pad
955 was current at the time of allocation, which does change. I suggested
956 storing a pointer to the current pad in the memory allocated for the
957 slab, and swapping to a new slab each time the pad changes. Dave thinks
958 that this would work.
959
960 repack the optree
961 Repacking the optree after execution order is determined could allow
962 removal of NULL ops, and optimal ordering of OPs with respect to cache-
963 line filling. The slab allocator could be reused for this purpose. I
964 think that the best way to do this is to make it an optional step just
965 before the completed optree is attached to anything else, and to use
966 the slab allocator unchanged, so that freeing ops is identical whether
967 or not this step runs. Note that the slab allocator allocates ops
968 downwards in memory, so one would have to actually "allocate" the ops
969 in reverse-execution order to get them contiguous in memory in
970 execution order.
971
972 See
973 http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html
974
975 Note that running this copy, and then freeing all the old location ops
976 would cause their slabs to be freed, which would eliminate possible
977 memory wastage if the previous suggestion is implemented, and we swap
978 slabs more frequently.
979
980 eliminate incorrect line numbers in warnings
981 This code
982
983 use warnings;
984 my $undef;
985
986 if ($undef == 3) {
987 } elsif ($undef == 0) {
988 }
989
990 used to produce this output:
991
992 Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
993 Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
994
995 where the line of the second warning was misreported - it should be
996 line 5. Rafael fixed this - the problem arose because there was no
997 nextstate OP between the execution of the "if" and the "elsif", hence
998 "PL_curcop" still reports that the currently executing line is line 4.
999 The solution was to inject a nextstate OP for each "elsif", although
1000 it turned out that the nextstate OP needed to be a nulled OP, rather
1001 than a live nextstate OP, else other line numbers became misreported.
1002 (Jenga!)
1003
1004 The problem is more general than "elsif" (although the "elsif" case is
1005 the most common and the most confusing). Ideally this code
1006
1007 use warnings;
1008 my $undef;
1009
1010 my $a = $undef + 1;
1011 my $b
1012 = $undef
1013 + 1;
1014
1015 would produce this output
1016
1017 Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
1018 Use of uninitialized value $undef in addition (+) at wrong.pl line 7.
1019
1020 (rather than lines 4 and 5), but this would seem to require every OP to
1021 carry (at least) line number information.
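
To see which lines a given perl actually reports, the warnings can be captured rather than read off the terminal. This is a sketch with a hypothetical helper name; it compiles the example via a string eval and uses a "#line" directive so that the reported file name and numbering match the listing above:

```perl
use strict;
use warnings;

# Run the multi-line example and return the warnings it produces.
sub collect_warnings {
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    eval <<'CODE';
#line 1 "wrong.pl"
use warnings;
my $undef;

my $a = $undef + 1;
my $b
    = $undef
    + 1;
CODE
    die $@ if $@;
    return @warnings;
}

print for collect_warnings();
```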
1022
1023 What might work is to have an optional line number in memory just
1024 before the BASEOP structure, with a flag bit in the op to say whether
1025 it's present. Initially during compile every OP would carry its line
1026 number. Then add a late pass to the optimiser (potentially combined
1027 with "repack the optree") which looks at the two ops on every edge of
1028 the graph of the execution path. If the line number changes, flag the
1029 destination OP with this information. Once all paths are traced,
1030 replace every op with the flag with a nextstate-light op (that just
1031 updates "PL_curcop"), which in turn then passes control on to the true
1032 op. All ops would then be replaced by variants that do not store the
1033 line number. (Which, logically, is why it would work best in conjunction
1034 with "repack the optree", as that is already copying/reallocating all
1035 the OPs)
1036
1037 (Although I should note that we're not certain that doing this for the
1038 general case is worth it)
1039
1040 optimize tail-calls
1041 Tail-calls present an opportunity for broadly applicable optimization;
1042 anywhere that "return foo(...)" is called, the outer return can be
1043 replaced by a goto, and foo will return directly to the outer caller,
1044 saving (conservatively) 25% of perl's call&return cost, which is
1045 relatively higher than in C. The Scheme language is known to do this
1046 heavily. B::Concise provides good insight into where this optimization
1047 is possible, i.e. anywhere an entersub, leavesub op sequence occurs.
1048
1049 perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'
1050
1051 Bottom line on this is probably a new pp_tailcall function which
1052 combines the code in pp_entersub, pp_leavesub. This should probably be
1053 done 1st in XS, and using B::Generate to patch the new OP into the
1054 optrees.
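
Until something like that exists, the same effect has to be requested by hand with "goto &sub". The sketch below is the existing manual idiom, not the proposed optimisation, and the function name sum_to is hypothetical:

```perl
use strict;
use warnings;

# Sum the integers 1..$n with an accumulator, re-entering the sub
# through goto so that the call stack does not grow.
sub sum_to {
    my ($n, $acc) = @_;
    $acc //= 0;
    return $acc if $n <= 0;
    @_ = ($n - 1, $acc + $n);
    goto &sum_to;    # replaces the current call frame instead of nesting
}

print sum_to(10), "\n";         # 55
print sum_to(200_000), "\n";    # runs without a "Deep recursion" warning
```

Written with a plain "return sum_to(...)", the second call would nest 200,000 frames deep and trigger the recursion warning, which is the cost an automatic tail-call op would remove.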
1055
1056 "\N"
1057 It should be possible to add a "\N" regex assertion, meaning "every
1058 character except "\n"", independently of the context. That would of
1059 course imply that "\N" couldn't be followed by an opening "{".
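
The point about context is that "." stops excluding newline under /s, whereas a negated class does not care. A small sketch of the difference, with hypothetical helper names, using "[^\n]" as a stand-in for the proposed assertion:

```perl
use strict;
use warnings;

# "." matches newline once /s is in effect ...
sub dot_matches   { return $_[0] =~ /a.b/s     ? 1 : 0 }
# ... while the explicit class never does, whatever the modifiers.
sub class_matches { return $_[0] =~ /a[^\n]b/s ? 1 : 0 }

for my $text ("axb", "a\nb") {
    (my $shown = $text) =~ s/\n/\\n/;
    printf "%-5s dot=%d class=%d\n", $shown, dot_matches($text), class_matches($text);
}
```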
1060
1062 Tasks that will get your name mentioned in the description of the
1063 "Highlights of 5.12"
1064
1065 make ithreads more robust
1066 Generally make ithreads more robust. See also "iCOW"
1067
1068 This task is incremental - even a little bit of work on it will help,
1069 and will be greatly appreciated.
1070
1071 One bit would be to write the missing code in sv.c:Perl_dirp_dup.
1072
1073 Fix Perl_sv_dup, et al so that threads can return objects.
1074
1075 iCOW
1076 Sarathy and Arthur have a proposal for an improved Copy On Write which
1077 specifically will be able to COW new ithreads. If this can be
1078 implemented it would be a good thing.
1079
1080 (?{...}) closures in regexps
1081 Fix (or rewrite) the implementation of the "/(?{...})/" closures.
1082
1083 A re-entrant regexp engine
1084 This will allow the use of a regex from inside (?{ }), (??{ }) and
1085 (?(?{ })|) constructs.
1086
1087 Add class set operations to regexp engine
1088 Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
1089
1090 demerphq has this on his todo list, but right at the bottom.
1091
1093 [ Each and every one of these may be obsolete, but they were listed
1094 in the old Todo.micro file]
1095
1096 make creating uconfig.sh automatic
1097 make creating Makefile.micro automatic
1098 do away with fork/exec/wait?
1099 (system, popen should be enough?)
1100
1101 some of the uconfig.sh really needs to be probed (using cc) in buildtime:
1102 (uConfigure? :-) native datatype widths and endianness come to mind
1103
1104
1105
1106perl v5.10.1 2009-08-10 PERLTODO(1)