ltx2crossrefxml(1)            LATEX CROSSREFWARE            ltx2crossrefxml(1)

NAME
ltx2crossrefxml.pl - create XML files for submitting to crossref.org

SYNOPSIS
ltx2crossrefxml [-c config_file] [-o output_file] [-rpi-is-xml]
    latex_file1 latex_file2 ...

OPTIONS
-c config_file
Configuration file. If this file is absent, defaults are used.
See below for its format.

-o output_file
Output file. If this option is not used, the XML is written to
standard output.

-rpi-is-xml
Do not transform the author and title input strings; assume they are
valid XML.

The usual "--help" and "--version" options are also supported. Options
can begin with either "-" or "--", and can be given in any order.

DESCRIPTION
For each given latex_file, this script reads the corresponding ".rpi"
file and, if it exists, the ".bbl" file, and outputs XML that can be
uploaded to Crossref (<https://crossref.org>). Any extension of
latex_file is ignored, and latex_file itself is not read (and need not
even exist).

Each ".rpi" file specifies the metadata for a single article to be
uploaded to Crossref (a "journal_article" element in their schema); an
example is below. These files are output by the "resphilosophica"
package (<https://ctan.org/pkg/resphilosophica>), but (as always) can
also be created by hand or by whatever other method you implement.

Any ".bbl" files present are used for the citation information in the
output XML. See the CITATIONS section below.

By default, for all text (authors, title, citations), standard TeX
control sequences are replaced with plain text or UTF-8 characters, or
removed, as appropriate. The "LaTeX::ToUnicode::convert" routine is
used for this (<https://ctan.org/pkg/bibtexperllibs>). Tricky TeX
control sequences will almost surely not be handled correctly. If
"--rpi-is-xml" is given, the author and title strings from the ".rpi"
files are output as-is, on the assumption that they are valid XML; no
checking is done. Citation text from ".bbl" files is always converted
from LaTeX to plain text.

This script just writes an XML file. It's up to you to actually do the
uploading to Crossref; for example, you can use their Java tool
"crossref-upload-tool.jar"
(<https://www.crossref.org/education/member-setup/direct-deposit-xml/https-post>).
For the definition of their schema, see
<https://data.crossref.org/reports/help/schema_doc/4.4.2/index.html>
(this is the schema version currently followed by this script).

CONFIGURATION FILE FORMAT
The configuration file is read as Perl code. Thus, comment lines
starting with "#" and blank lines are ignored. The other lines are
typically assignments in the form (spaces are optional):

    $variable = value ;

Usually the value is a "string" enclosed in ASCII double-quote or
single-quote characters, per Perl syntax. The idea is to specify the
user-specific and journal-specific values needed for the Crossref
upload. The variables used are:

    $depositorName  = "Depositor Name";
    $depositorEmail = 'depositor@example.org';
    $registrant     = 'Registrant';   # organization name
    $fullTitle      = "FULL TITLE";   # journal name
    $issn           = "1234-5678";    # required
    $abbrevTitle    = "ABBR. TTL.";   # optional
    $coden          = "CODEN";        # optional

For a given run, all ".rpi" data read is assumed to belong to the
journal that is specified in the configuration file. More precisely,
the configuration data is written as a "journal_metadata" element, with
given "full_title", "issn", etc., and then each ".rpi" is written as
"journal_issue" plus "journal_article" elements.

The configuration file can also define one Perl function:
"LaTeX_ToUnicode_convert_hook". If it is defined, it is called at the
beginning of the procedure that converts LaTeX text to Unicode, which
is done with the LaTeX::ToUnicode module, from the "bibtexperllibs"
package (<https://ctan.org/pkg/bibtexperllibs>). The function must
accept one string (the LaTeX text), and return one string (presumably
the transformed string). The standard conversions are then applied to
the returned string, so the configured function need only handle
special cases, such as control sequences particular to the journal at
hand.
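
For example, a journal whose articles use a private macro could put a
hook like the one below in its configuration file. The macro name
"\RPjournal" and its expansion are hypothetical. Only the function
name and its contract (one string in, one string out) come from the
description above.

```perl
use strict;
use warnings;

# Hypothetical hook for a configuration file. \RPjournal stands in for a
# journal-specific control sequence that the standard conversion would
# not know about.
sub LaTeX_ToUnicode_convert_hook {
    my ($latex) = @_;    # the LaTeX text about to be converted
    # Replace "\RPjournal{}" or "\RPjournal" (but not a longer control
    # word such as "\RPjournals") with the journal name.
    $latex =~ s/\\RPjournal(?:\{\}|\b)/Res Philosophica/g;
    return $latex;       # the standard conversions are applied afterwards
}
```

Because the standard conversions still run on the returned string, the
hook only needs to handle what LaTeX::ToUnicode would otherwise miss.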

RPI FILE FORMAT
Here's the (relevant part of the) ".rpi" file corresponding to the
"rpsample.tex" example in the "resphilosophica" package
(<https://ctan.org/pkg/resphilosophica>):

    %authors=Boris Veytsman\and A. U. Th{\o }r\and C. O. R\"espondent
    %title=A Sample Paper:\\ \emph {A Template}
    %year=2012
    %volume=90
    %issue=1--2
    %startpage=1
    %endpage=1
    %doi=10.11612/resphil.A31245
    %paperUrl=http://borisv.lk.net/paper12
    %publicationType=full_text

Other lines, some of which do not begin with %, are ignored (and are
not shown here). For more details on processing, see the code.
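
The "%key=value" lines shown above can be read with something as
simple as the following sketch. It is not the script's own parser. It
assumes that each field fits on one line. It also lets a repeated key
override an earlier one, which is an assumption of the sketch.

```perl
use strict;
use warnings;

# Sketch: collect the %key=value metadata lines from the text of an .rpi
# file into a hash. Any other line is skipped.
sub parse_rpi {
    my ($text) = @_;
    my %meta;
    for my $line (split /\n/, $text) {
        # The key is a word such as "doi" or "paperUrl"; trailing
        # whitespace (including a stray carriage return) is dropped.
        next unless $line =~ /^%(\w+)=(.*?)\s*$/;
        $meta{$1} = $2;    # assumption: a later duplicate key wins
    }
    return \%meta;
}
```

Applied to the sample above, this would yield entries for authors,
title, year, volume, issue, startpage, endpage, doi, paperUrl and
publicationType. The name parse_rpi is illustrative.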

The %paperUrl value is what will be associated with the given %doi
(output as the "resource" element). Crossref strongly recommends that
the URL point to a so-called landing page, not directly to a PDF
(<https://www.crossref.org/education/member-setup/creating-a-landing-page/>).
Special case: if no URL is specified and the journal is
Res Philosophica, a special-purpose search URL using pdcnet.org is
used instead. For any other journal, %paperUrl must always be given.

The %authors field is split at "\and" (ignoring whitespace before and
after), and output as the "contributors" element, using
sequence="first" for the first listed author and sequence="additional"
for the remainder.
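
A minimal sketch of the splitting and sequence numbering just
described follows. The function name split_authors is hypothetical.
The "\b" after "\and" is an assumption of the sketch: it keeps a
longer control word such as "\andy" from being treated as a separator.

```perl
use strict;
use warnings;

# Sketch: split an %authors value at \and, ignoring surrounding
# whitespace, and pair each name with its contributor sequence value.
sub split_authors {
    my ($authors) = @_;
    $authors =~ s/^\s+|\s+$//g;    # trim the whole value
    my @names = grep { length } split /\s*\\and\b\s*/, $authors;
    my @result;
    for my $i (0 .. $#names) {
        push @result, [ $names[$i], $i == 0 ? 'first' : 'additional' ];
    }
    return @result;
}
```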

If the %publicationType is not specified, it defaults to "full_text",
since that has historically been the case; "full_text" can also be
given explicitly. The other values allowed by the Crossref schema are
"abstract_only" and "bibliographic_record". Finally, if the value is
"omit", the "publication_type" attribute is omitted entirely from the
given "journal_article" element.
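
The %publicationType rules can be summarized as a small function. This
is a sketch. In particular, rejecting any other value with an error is
an assumption, since the behavior for invalid values is not described
here.

```perl
use strict;
use warnings;

# Sketch of the %publicationType rules: returns the value to use for
# the publication_type attribute, or undef when it is to be omitted.
my %ALLOWED = map { $_ => 1 } qw(full_text abstract_only bibliographic_record);

sub publication_type {
    my ($value) = @_;
    return 'full_text' unless defined $value;    # historical default
    return undef if $value eq 'omit';            # leave the attribute out
    # Assumption: any other unknown value is treated as an error.
    die "invalid publicationType: $value\n" unless $ALLOWED{$value};
    return $value;
}
```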

Each ".rpi" must contain information for only one article, but multiple
files can be read in a single run. It would not be difficult to support
multiple articles in a single ".rpi" file, but it makes debugging and
error correction easier when each uploaded XML contains a single
article.

MORE ABOUT AUTHOR NAMES
The three recognized name formats are (not coincidentally) the same as
in BibTeX:

    First von Last
    von Last, First
    von Last, Jr., First

The forms can be freely intermixed within a single %authors line,
separated with "\and" (including the backslash). Commas cannot be used
as name separators.

In short, you can almost always use the first form. The exceptions are
when there is a Jr part, or when the Last part has multiple tokens but
there is no von part. See the "btxdoc" document ("BibTeXing" by Oren
Patashnik) for details.
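
To see why the first form fails for a multi-token Last part without a
von part, here is a simplified sketch of the BibTeX splitting rules. It
is not the script's code. It treats any word starting with a lowercase
ASCII letter as part of "von", and it ignores braces and accents, which
BibTeX handles more carefully. The example names are hypothetical.

```perl
use strict;
use warnings;

# Simplified sketch of BibTeX-style name splitting into
# (First, von, Last, Jr). A word counts as "von" if it starts with a
# lowercase ASCII letter; braces and accents are not analyzed.
sub is_von_word { return $_[0] =~ /^[a-z]/ }

sub split_name {
    my ($name) = @_;
    $name =~ s/^\s+|\s+$//g;
    my @parts = split /\s*,\s*/, $name;
    my @w = split ' ', $parts[0];
    # Candidate von words; the final word always belongs to Last.
    my @lc = grep { is_von_word($w[$_]) } 0 .. $#w - 1;
    my (@first, @von, @last);
    my $jr = '';
    if (@parts == 1) {                    # First von Last
        if (@lc) {
            @first = @w[0 .. $lc[0] - 1];
            @von   = @w[$lc[0] .. $lc[-1]];
            @last  = @w[$lc[-1] + 1 .. $#w];
        } else {                          # no von: only the final word is Last
            @first = @w[0 .. $#w - 1];
            @last  = ($w[-1]);
        }
    } else {                              # von Last, [Jr,] First
        my $end = @lc ? $lc[-1] : -1;     # von runs up to the last lowercase word
        @von  = @w[0 .. $end];
        @last = @w[$end + 1 .. $#w];
        if (@parts == 2) {
            @first = split ' ', $parts[1];
        } else {
            $jr    = $parts[1];
            @first = split ' ', $parts[2];
        }
    }
    return (join(' ', @first), join(' ', @von), join(' ', @last), $jr);
}
```

With these rules, "Juan Garcia Lopez" gives First "Juan Garcia" and
Last "Lopez". This is why a multi-token Last part without a von part
needs the comma form, "Garcia Lopez, Juan".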

In the %authors line of a ".rpi" file, some secondary directives are
recognized, indicated by "|" characters. This is easiest to explain
with an example:

    %authors=|organization|\LaTeX\ Project Team \and Alex Brown|orcid=123

Thus: 1) if "|organization|" is specified, the author name will be
output as an "organization" contributor, instead of the usual
"person_name", as the Crossref schema requires.

2) If "|orcid=value|" is specified, the value is output as an "ORCID"
element for that "person_name".

These two directives, "|organization|" and "|orcid=...|", are mutually
exclusive, because that's how the Crossref schema defines them. The "="
sign after "orcid" is required, while all spaces after the "orcid"
keyword are ignored. Other than that, the ORCID value is output
literally. (E.g., the ORCID value of 123 above is clearly invalid, but
it would be output anyway, with no warning.)

Extra "|" characters, at the beginning or end of the entire %authors
string, or doubled in the middle, are accepted and ignored. Whitespace
is ignored around all "|" characters.
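
The directive rules above can be sketched as follows for a single
author, that is, the text between two "\and" separators. The function
name and the returned hash are hypothetical. Treating the combination
of organization and orcid as a fatal error is also an assumption; the
description says only that the two directives are mutually exclusive.

```perl
use strict;
use warnings;

# Sketch: interpret the |...| directives of a single author. Empty
# fields (from extra or doubled "|") are skipped, and whitespace around
# every "|" is dropped.
sub parse_author_directives {
    my ($author) = @_;
    $author =~ s/^\s+|\s+$//g;
    my %info = (name => '', organization => 0, orcid => undef);
    for my $field (grep { length } split /\s*\|\s*/, $author) {
        if ($field eq 'organization') {
            $info{organization} = 1;
        } elsif ($field =~ /^orcid\s*=\s*(.*)$/) {
            $info{orcid} = $1;    # output literally; no validity check
        } else {
            $info{name} = $field;
        }
    }
    # Assumption: the sketch treats the combination as a fatal error.
    die "organization and orcid are mutually exclusive\n"
        if $info{organization} && defined $info{orcid};
    return \%info;
}
```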

CITATIONS
Each ".bbl" file corresponding to an input ".rpi" file is read and used
to output a "citation_list" element for that "journal_article" in the
output XML. If no ".bbl" file exists for a given ".rpi", no
"citation_list" is output for that article.

The ".bbl" processing is rudimentary: only so-called
"unstructured_citation" references are produced for Crossref; that is,
the contents of each citation (each paragraph in the ".bbl") are dumped
as a single flat string without markup.

Bibliography text is unconditionally converted from TeX to plain text,
via the method described above. It is not unusual for the conversion to
be incomplete or incorrect. It is up to you to check for this; e.g., if
any backslashes remain in the output, it is most likely an error.
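
One way to do that check is sketched below. It lists the line numbers
of the generated XML that still contain a backslash. The function name
is hypothetical, and reading the output file into a string is left to
the caller.

```perl
use strict;
use warnings;

# Sketch: return the (1-based) line numbers of generated XML that still
# contain a backslash, a likely sign of an unconverted TeX control
# sequence.
sub lines_with_backslashes {
    my ($xml) = @_;
    my @lines = split /\n/, $xml;
    return grep { index($lines[$_ - 1], '\\') >= 0 } 1 .. scalar(@lines);
}
```

To check, say, result.xml, read the whole file into a string and print
the returned numbers. An empty list means no backslashes remain.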

Furthermore, it is assumed that the ".bbl" file contains a sequence of
references, each starting with "\bibitem{KEY}" (which itself must be at
the beginning of a line, preceded only by whitespace), and the whole
bibliography ending with "\end{thebibliography}" (similarly at the
beginning of a line). A bibliography not following this format will not
produce useful results. Bibliographies can be created by hand, or with
BibTeX, or any other method.

The "key" attribute for the "citation" element is taken as the KEY
argument to the "\bibitem" command, with the sequential number of the
citation (1, 2, ...) appended. The argument to "\bibitem" can be empty
("\bibitem{}"), in which case the sequence number is used on its own.
Although TeX will not handle empty "\bibitem" keys, they can be
convenient when creating a ".bbl" purely for Crossref.
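
The ".bbl" format and key rules above can be sketched as follows. This
is an illustration, not the script's code. It skips the TeX-to-text
conversion and ignores any text before the first "\bibitem". It also
treats "appended" as plain concatenation of KEY and the sequence
number, which is an assumption.

```perl
use strict;
use warnings;

# Sketch: split .bbl text into citations. Each one starts with
# \bibitem{KEY} at the beginning of a line (after optional whitespace),
# and the list ends at \end{thebibliography}.
sub parse_bbl {
    my ($bbl) = @_;
    my (@cites, $cur);
    for my $line (split /\n/, $bbl) {
        last if $line =~ /^\s*\\end\{thebibliography\}/;
        if ($line =~ /^\s*\\bibitem\{([^}]*)\}\s*(.*)$/) {
            $cur = { key => $1, text => $2 };
            push @cites, $cur;
        } elsif ($cur) {
            $cur->{text} .= " $line";    # continuation of the current entry
        }
    }
    my $seq = 0;
    for my $c (@cites) {
        $seq++;
        $c->{text} =~ s/\s+/ /g;         # flatten to a single line
        $c->{text} =~ s/^\s+|\s+$//g;
        # Assumption: the sequence number is appended directly to KEY, so
        # an empty key yields just the number.
        $c->{key} .= $seq;
    }
    return @cites;
}
```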

The ".rpi" file is also checked for the bibliography information, in
this same format.

Feature request: if anyone is interested in figuring out how to
generate structured citations
(<https://data.crossref.org/reports/help/schema_doc/4.4.2/schema_4_4_2.html#citation>)
instead of these flat text dumps, that would be great.

EXAMPLES
    ltx2crossrefxml.pl ../paper1/paper1.tex ../paper2/paper2.tex \
      -o result.xml

    ltx2crossrefxml.pl -c myconfig.cfg paper.tex -o paper.xml

AUTHOR
Boris Veytsman <https://github.com/borisveytsman/crossrefware>

COPYRIGHT AND LICENSE
Copyright (C) 2012-2022 Boris Veytsman

This is free software. You may redistribute copies of it under the
terms of the GNU General Public License (any version)
<https://www.gnu.org/licenses/gpl.html>. There is NO WARRANTY, to the
extent permitted by law.


2022-10-18                                                ltx2crossrefxml(1)