ltx2crossrefxml(1)            LATEX CROSSREFWARE            ltx2crossrefxml(1)

NAME

       ltx2crossrefxml.pl - create XML files for submitting to crossref.org

SYNOPSIS

       ltx2crossrefxml [-c config_file] [-o output_file] [-rpi-is-xml]
                       latex_file1 latex_file2 ...

OPTIONS

       -c config_file
           Configuration file.  If this file is absent, defaults are used.
           See below for its format.

       -o output_file
           Output file.  If this option is not used, the XML is written to
           stdout.

       -rpi-is-xml
           Do not transform author and title input strings; assume they are
           valid XML.

       The usual "--help" and "--version" options are also supported. Options
       can begin with either "-" or "--", and can be given in any order.

DESCRIPTION

       For each given latex_file, this script reads ".rpi" and (if they exist)
       ".bbl" files and outputs corresponding XML that can be uploaded to
       Crossref (<https://crossref.org>). Any extension of latex_file is
       ignored, and latex_file itself is not read (and need not even exist).

       Each ".rpi" file specifies the metadata for a single article to be
       uploaded to Crossref (a "journal_article" element in their schema); an
       example is below. These files are output by the "resphilosophica"
       package (<https://ctan.org/pkg/resphilosophica>), but (as always) can
       also be created by hand or by whatever other method you implement.

       Any ".bbl" files present are used for the citation information in the
       output XML. See the CITATIONS section below.

       Unless "--rpi-is-xml" is specified, for all text (authors, title,
       citations), standard TeX control sequences are replaced with plain text
       or UTF-8 or eliminated, as appropriate. The "LaTeX::ToUnicode::convert"
       routine is used for this (<https://ctan.org/pkg/bibtexperllibs>).
       Tricky TeX control sequences will almost surely not be handled
       correctly. If "--rpi-is-xml" is given, the author and title strings
       from the rpi files are output as-is, assuming they are valid XML; no
       checking is done. Citation text from ".bbl" files is always converted
       from LaTeX to plain text.

       This script just writes an XML file. It's up to you to actually do the
       uploading to Crossref; for example, you can use their Java tool
       "crossref-upload-tool.jar"
       (<https://www.crossref.org/education/member-setup/direct-deposit-xml/https-post>).
       For the definition of their schema, see
       <https://data.crossref.org/reports/help/schema_doc/4.4.2/index.html>
       (this is the schema version currently followed by this script).

CONFIGURATION FILE FORMAT

       The configuration file is read as Perl code. Thus, comment lines
       starting with "#" and blank lines are ignored. The other lines are
       typically assignments in the form (spaces are optional):

           $variable = value ;

       Usually the value is a "string" enclosed in ASCII double-quote or
       single-quote characters, per Perl syntax. The idea is to specify the
       user-specific and journal-specific values needed for the Crossref
       upload. The variables which are used are these:

           $depositorName = "Depositor Name";
           $depositorEmail = 'depositor@example.org';
           $registrant = 'Registrant';  # organization name
           $fullTitle = "FULL TITLE";   # journal name
           $issn = "1234-5678";         # required
           $abbrevTitle = "ABBR. TTL."; # optional
           $coden = "CODEN";            # optional

       For a given run, all ".rpi" data read is assumed to belong to the
       journal that is specified in the configuration file. More precisely,
       the configuration data is written as a "journal_metadata" element, with
       given "full_title", "issn", etc., and then each ".rpi" is written as
       "journal_issue" plus "journal_article" elements.

       The configuration file can also define one Perl function:
       "LaTeX_ToUnicode_convert_hook". If it is defined, it is called at the
       beginning of the procedure that converts LaTeX text to Unicode, which
       is done with the LaTeX::ToUnicode module, from the "bibtexperllibs"
       package (<https://ctan.org/pkg/bibtexperllibs>). The function must
       accept one string (the LaTeX text), and return one string (presumably
       the transformed string). The standard conversions are then applied to
       the returned string, so the configured function need only handle
       special cases, such as control sequences particular to the journal at
       hand.

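       As an illustration only, a complete configuration file might look
       like the following sketch. The variable names and the hook name are
       those described above; every value is a placeholder, and
       "\journalname" is a hypothetical journal-specific macro invented for
       this example (it is not defined by this script or by
       LaTeX::ToUnicode).

```perl
# Hypothetical configuration file, e.g. myconfig.cfg; all values are placeholders.
$depositorName  = "Depositor Name";
$depositorEmail = 'depositor@example.org';
$registrant     = 'Registrant';      # organization name
$fullTitle      = "FULL TITLE";      # journal name
$issn           = "1234-5678";       # required
$abbrevTitle    = "ABBR. TTL.";      # optional
$coden          = "CODEN";           # optional

# Optional hook: receives the LaTeX text and returns the transformed text.
# The standard conversions are applied afterwards to whatever is returned.
# \journalname is an assumed macro, used here only as an example.
sub LaTeX_ToUnicode_convert_hook {
    my ($string) = @_;
    $string =~ s/\\journalname\b/Journal Name/g;
    return $string;
}

1;  # a final true value is conventional for Perl files loaded by other code
```

       Such a file is passed to the script with the "-c" option.
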
RPI FILE FORMAT

       Here's the (relevant part of the) ".rpi" file corresponding to the
       "rpsample.tex" example in the "resphilosophica" package
       (<https://ctan.org/pkg/resphilosophica>):

         %authors=Boris Veytsman\and A. U. Th{\o }r\and C. O. R\"espondent
         %title=A Sample Paper:\\ \emph  {A Template}
         %year=2012
         %volume=90
         %issue=1--2
         %startpage=1
         %endpage=1
         %doi=10.11612/resphil.A31245
         %paperUrl=http://borisv.lk.net/paper12
         %publicationType=full_text

       Other lines, some not beginning with %, are ignored (and not shown).
       For more details on processing, see the code.

       The %paperUrl value is what will be associated with the given %doi
       (output as the "resource" element). Crossref strongly recommends that
       the URL be for a so-called landing page, and not directly for a PDF
       (<https://www.crossref.org/education/member-setup/creating-a-landing-page/>).
       Special case: if the URL is not specified, and the journal is
       Res Philosophica, a special-purpose search URL using pdcnet.org is
       used.  Any other journal must always specify %paperUrl.

       The %authors field is split at "\and" (ignoring whitespace before and
       after), and output as the "contributors" element, using
       "sequence="first"" for the first listed, "sequence="additional"" for
       the remainder.

       If the %publicationType is not specified, it defaults to "full_text",
       since that has historically been the case; "full_text" can also be
       given explicitly. The other values allowed by the Crossref schema are
       "abstract_only" and "bibliographic_record". Finally, if the value is
       "omit", the "publication_type" attribute is omitted entirely from the
       given "journal_article" element.

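       The defaulting rule for %publicationType amounts to a small mapping,
       sketched here in Perl. The helper name
       "publication_type_attribute" is hypothetical, and the sketch does
       not check the value against the Crossref schema.

```perl
use strict;
use warnings;

# Illustrative helper (hypothetical name): return the attribute text for
# a %publicationType value, or an empty string when it is to be omitted.
sub publication_type_attribute {
    my ($type) = @_;
    $type = 'full_text' unless defined $type && length $type;
    return '' if $type eq 'omit';
    return qq{publication_type="$type"};
}

for my $value (undef, 'abstract_only', 'omit') {
    my $shown = defined $value ? $value : '(unspecified)';
    printf "%-15s -> [%s]\n", $shown, publication_type_attribute($value);
}
```
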
       Each ".rpi" must contain information for only one article, but multiple
       files can be read in a single run. It would not be difficult to support
       multiple articles in a single ".rpi" file, but it makes debugging and
       error correction easier when each uploaded XML contains a single
       article.

   MORE ABOUT AUTHOR NAMES
       The three recognized name formats are (not coincidentally) the same
       as BibTeX's:

          First von Last
          von Last, First
          von Last, Jr., First

       The forms can be freely intermixed within a single %authors line,
       separated with "\and" (including the backslash). Commas as name
       separators are not supported, unlike BibTeX.

       In short, you may almost always use the first form; you shouldn't if
       either there's a Jr part, or the Last part has multiple tokens but
       there's no von part. See the "btxdoc" document ("BibTeXing" by Oren
       Patashnik) for details.

       In the %authors line of a ".rpi" file, some secondary directives are
       recognized, indicated by "|" characters. This is easiest to explain
       with an example:

         %authors=|organization|\LaTeX\ Project Team \and Alex Brown|orcid=123

       Thus: 1) if "|organization|" is specified, the author name will be
       output as an "organization" contributor, instead of the usual
       "person_name", as the Crossref schema requires.

       2) If "|orcid=value|" is specified, the value is output as an "ORCID"
       element for that "person_name".

       These two directives, "|organization|" and "|orcid|", are mutually
       exclusive, because that's how the Crossref schema defines them. The "="
       sign after "orcid" is required, while all spaces after the "orcid"
       keyword are ignored. Other than that, the ORCID value is output
       literally. (E.g., the ORCID value of 123 above is clearly invalid, but
       it would be output anyway, with no warning.)

       Extra "|" characters, at the beginning or end of the entire %authors
       string, or doubled in the middle, are accepted and ignored. Whitespace
       is ignored around all "|" characters.

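       The "|" directive rules for a single author entry (one piece of
       %authors after splitting at "\and") can be sketched in Perl as
       below. This is an illustration, not the script's implementation:
       the helper name "parse_author_entry" and the returned hash layout
       are assumptions, and spaces after the "=" are trimmed here although
       the text above does not say so.

```perl
use strict;
use warnings;

# Illustrative helper (hypothetical name): separate one author entry into
# its name and any "|...|" directives.
sub parse_author_entry {
    my ($entry) = @_;
    my %author = (name => '', organization => 0);
    for my $part (split /\s*\|\s*/, $entry) {
        $part =~ s/^\s+|\s+$//g;
        next if $part eq '';               # extra or doubled "|" are ignored
        if ($part eq 'organization') {
            $author{organization} = 1;
        } elsif ($part =~ /^orcid\s*=\s*(.*)$/) {
            $author{orcid} = $1;           # kept literally, not validated
        } else {
            $author{name} = $part;
        }
    }
    return \%author;
}

for my $entry ('|organization|\LaTeX\ Project Team', 'Alex Brown|orcid=123') {
    my $author = parse_author_entry($entry);
    my $kind   = $author->{organization} ? 'organization' : 'person_name';
    my $orcid  = defined $author->{orcid} ? ", ORCID $author->{orcid}" : '';
    print "$kind: $author->{name}$orcid\n";
}
```
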
CITATIONS

       Each ".bbl" file corresponding to an input ".rpi" file is read and used
       to output a "citation_list" element for that "journal_article" in the
       output XML. If no ".bbl" file exists for a given ".rpi", no
       "citation_list" is output for that article.

       The ".bbl" processing is rudimentary: only so-called
       "unstructured_citation" references are produced for Crossref, that is,
       the content of each citation (each paragraph in the ".bbl") is dumped
       as a single flat string without markup.

       Bibliography text is unconditionally converted from TeX to XML, via the
       method described above. It is not unusual for the conversion to be
       incomplete or incorrect.  It is up to you to check for this; e.g., if
       any backslashes remain in the output, it is most likely an error.

       Furthermore, it is assumed that the ".bbl" file contains a sequence of
       references, each starting with "\bibitem{KEY}" (which itself must be at
       the beginning of a line, preceded only by whitespace), and the whole
       bibliography ending with "\end{thebibliography}" (similarly at the
       beginning of a line). A bibliography not following this format will not
       produce useful results. Bibliographies can be created by hand, or with
       BibTeX, or any other method.

       The "key" attribute for the "citation" element is taken as the KEY
       argument to the "\bibitem" command. The sequential number of the
       citation (1, 2, ...) is appended. The argument to "\bibitem" can be
       empty ("\bibitem{}"), and the sequence number will then be used on its
       own. Although TeX will not handle empty "\bibitem" keys, they can be
       convenient when creating a ".bbl" purely for Crossref.

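       The assumed ".bbl" layout and the key rule can be illustrated with
       the following Perl sketch. It is not the script's code: the helper
       name "bbl_citations" is hypothetical, KEY and the sequence number
       are joined with no separator (an assumption), runs of whitespace
       are collapsed to single spaces, and an optional "[label]" argument
       to "\bibitem" is not handled.

```perl
use strict;
use warnings;

# Illustrative helper (hypothetical name): split .bbl text into entries
# that start with "\bibitem{KEY}" at the beginning of a line, stopping at
# "\end{thebibliography}".
sub bbl_citations {
    my ($bbl) = @_;
    # Drop the end marker and anything after it.
    $bbl =~ s/^\s*\\end\{thebibliography\}.*//ms;
    my @citations;
    my $count = 0;
    while ($bbl =~ /^\s*\\bibitem\{([^}]*)\}(.*?)(?=^\s*\\bibitem\{|\z)/msg) {
        my ($key, $text) = ($1, $2);
        $text =~ s/\s+/ /g;        # flatten the entry to one line
        $text =~ s/^ | $//g;
        push @citations, { key => $key . ++$count, text => $text };
    }
    return @citations;
}

my $bbl = <<'END';
\begin{thebibliography}{9}
\bibitem{texbook}
D. E. Knuth. The TeXbook.
Addison-Wesley, 1984.

\bibitem{}
L. Lamport. LaTeX: A Document Preparation System.
\end{thebibliography}
END

print "$_->{key}: $_->{text}\n" for bbl_citations($bbl);
```

       The text of each entry is left as raw TeX here; the conversion to
       plain text is a separate step.
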
       The ".rpi" file is also checked for the bibliography information, in
       this same format.

       Feature request: if anyone is interested in figuring out how to
       generate structured citations
       (<https://data.crossref.org/reports/help/schema_doc/4.4.2/schema_4_4_2.html#citation>)
       instead of these flat text dumps, that would be great.

EXAMPLES

         ltx2crossrefxml.pl ../paper1/paper1.tex ../paper2/paper2.tex \
                             -o result.xml

         ltx2crossrefxml.pl -c myconfig.cfg paper.tex -o paper.xml

AUTHOR

       Boris Veytsman <https://github.com/borisveytsman/crossrefware>

COPYRIGHT AND LICENSE

       Copyright (C) 2012-2022  Boris Veytsman

       This is free software.  You may redistribute copies of it under the
       terms of the GNU General Public License (any version)
       <https://www.gnu.org/licenses/gpl.html>.  There is NO WARRANTY, to the
       extent permitted by law.



                                 2022-10-18                ltx2crossrefxml(1)