/*********************************************************************** @C-file{ author = "Nelson H. F. Beebe", version = "2.08 WBL", date = "13 January 1995", time = "13:49:45 MST", filename = "bibclean.c", address = "Center for Scientific Computing Department of Mathematics University of Utah Salt Lake City, UT 84112 USA", telephone = "+1 801 581 5254", FAX = "+1 801 581 4148", checksum = "61000 7095 24304 187134", email = "beebe@math.utah.edu (Internet)", codetable = "ISO/ASCII", keywords = "prettyprint, bibliography", supported = "yes", docstring = {Prettyprint one or more BibTeX files on stdin, or specified files, to stdout, and check the brace balance and value strings as well. Text outside @item-type{...} BibTeX entries is passed through verbatim, except that trailing blanks are trimmed. BibTeX items are formatted into a consistent structure with one field = "value" pair per line, and the initial @ and trailing right brace in column 1. Long values are split at a blank and continued onto the next line with leading indentation. Tabs are expanded into blank strings; their use is discouraged because they inhibit portability, and can suffer corruption in electronic mail. Braced strings are converted to quoted strings. This format facilitates the later application of simple filters to process the text for extraction of items, and also is the one expected by the GNU Emacs BibTeX support functions. Usage: bibclean [ -author ] [ -error-log filename ] [ -help ] [ '-?' 
] [ -init-file filename ] [ -max-width # ] [ -[no-]check-values ] [ -[no-]delete-empty-values ] [ -[no-]file-position ] [ -[no-]fix-font-changes ] [ -[no-]fix-initials ] [ -[no-]fix-names ] [ -[no-]par-breaks ] [ -[no-]prettyprint ] [ -[no-]print-patterns ] [ -[no-]read-init-files ] [ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ] [ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -version ] outfile The checksum field above contains a CRC-16 checksum as the first value, followed by the equivalent of the standard UNIX wc (word count) utility output of lines, words, and characters. This is produced by Robert Solovay's checksum utility.}, } ***********************************************************************/ /*********************************************************************** The formatting should perhaps be user-customizable; that is left for future work. The major goal has been to convert entries to the standard form @item-type{citation-key, field = "value", field = "value", ... } while applying heuristics to permit early error detection. If the input file is syntactically correct for BibTeX and LaTeX, this is reasonably easy. If the file has errors, error recovery is attempted, but cannot be guaranteed to be successful; however, the output file, and stderr, will contain an error message that should localize the error to a single entry where a human can find it more easily than a computer can. To facilitate error checking and recovery, the following conditions are used: @ starts a BibTeX entry only if it occurs at brace level 0 and is not preceded by non-blank text on the same line. " is significant only at brace level 1. {} are expected to occur at @-level 1 or higher } at beginning of line ends a BibTeX entry Backslashes preceding these 4 characters remove their special significance. These heuristics are needed to deal with legal value strings like {..."...} "...{..}..." and will flag as errors strings like "...{..." "...}..." 
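
As an illustration only (this sketch is not part of bibclean, and the helper name value_is_balanced() is hypothetical), the brace and quote rules above can be approximated for a single value string as follows. It assumes the value is either quoted or braced, treats a backslash as removing the special meaning of the next character, and uses // comments so that it can sit inside this block comment:

```c
#include <stddef.h>

// Hypothetical helper, not taken from bibclean: return 1 if the value
// string s is a well-formed quoted or braced value, else 0.
static int value_is_balanced(const char *s)
{
    int level = 0;      // brace depth inside the value
    int in_quotes = 0;  // nonzero if the value starts with a quote
    size_t k = 0;

    if (s[0] == '"')
    {
        in_quotes = 1;
        k = 1;
    }

    for (; s[k] != '\0'; ++k)
    {
        if (s[k] == '\\' && s[k + 1] != '\0')
        {
            ++k;        // a backslash removes special significance
            continue;
        }
        if (s[k] == '{')
            ++level;
        else if (s[k] == '}')
        {
            if (--level < 0)
                return 0;   // close brace without a matching open brace
        }
        else if (s[k] == '"' && in_quotes && level == 0)
            return (s[k + 1] == '\0');  // closing quote must end the value
    }
    // A quoted value that reaches here never saw its closing quote at
    // brace level 0; an unquoted value is fine if its braces balance.
    return (!in_quotes && level == 0);
}
```

Under these assumptions, the two legal forms shown above are accepted, and the two error forms are rejected.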
The special treatment of @ and } at beginning of line attempts to
detect errors in entries before the rest of the file is swallowed up
in an attempt to complete an unclosed entry.

The output bibliography file should be processed by BibTeX and LaTeX
without errors before discarding the original bibliography file.

We do our own output and line buffering here so as to be able to trim
trailing blanks, and output data in rather large blocks for efficiency
(in filters of this type, I/O accounts for the bulk of the processing,
so large output buffers offer significant performance gains).

The -scribe option enables recognition of the extended syntax used by
the Scribe document formatting system, originally developed by Brian
Reid at Carnegie-Mellon University, and now marketed by Unilogic, Ltd.
I have followed the syntax description in the Scribe Introductory
User's Manual, 3rd Edition, May 1980.

Scribe extensions include these features:

(1) Letter case is not significant in field names and entry names,
    but case is preserved in value strings.

(2) In field/value pairs, the field and value may be separated by one
    of three characters: =, /, or space.  Space may optionally
    surround these separators.

(3) Value delimiters are any of these seven pairs:
    { }   [ ]   ( )   < >   ' '   " "   ` `

(4) Value delimiters may not be nested, even though, with the first
    four delimiter pairs, nested balanced delimiters would be
    unambiguous.

(5) Delimiters can be omitted around values that contain only
    letters, digits, sharp (#), ampersand (&), period (.), and
    percent (%).

(6) A literal at-sign (@) is represented by doubled at-signs (@@).

(7) Bibliography entries begin with @name, as for BibTeX, but any of
    the seven Scribe value delimiters may be used to surround the
    field/value pairs.  As in (4), nested delimiters are forbidden.

(8) Arbitrary space may separate entry names from the following
    delimiters.

(9) @Comment is a special command whose delimited value is discarded.
    As in (4), nested delimiters are forbidden.

(10) The special form
         @Begin{comment}
         ...
         @End{comment}
     permits encapsulating arbitrary text containing any characters
     or delimiters, other than "@End{comment}".  Any of the seven
     delimiters may be used around the word comment following the
     @begin or @end.

(11) The "key" field name is required in each bibliography entry.

(12) Semicolons may be used in place of "and" in author lists
     (undocumented, but observed in practice).

Because of this loose syntax, error detection heuristics are less
effective, and consequently, Scribe mode input is not the default; it
must be explicitly requested.

========================================================================

***********************************************************************/

#define BIBCLEAN_VERSION "bibclean Version 2.08 WBL [13-Jan-1995]"

/***********************************************************************

Revision history (reverse time order):

[[04-Jun-1994]--[18-Oct-1994] 2.10 WBL hack.....
    Based on message from "Nelson H. F. Beebe"
    sent Thu, 12 Jan 95 18:56:30 MST
    ...
    Add World-Wide Web URI, URL, and URN names to the list of field
    names that are forced to uppercase.

    Disable code in out_s() that breaks lines at punctuation
    characters, because this can introduce unwanted line breaks in
    file names and WWW names.
    ...

24-Sep-1993] 2.08
    Update bibclean.c to handle characters in range 128..255 when
    char is a signed data type, and extend testbib1.bib with more
    tests of the complete character set.

    Augment Makefile with convenience targets for each host
    environment.

    Add private input character pushback support, because ANSI/ISO
    Standard C only guarantees one character of pushback, and
    bibclean needs at least 3.

[04-Jun-1993] 2.07
    Update *.h files from DVI development to include support for
    DEC Alpha 3000/500x OSF 1.2 (cc, c89, cxx), and HP 9000/735
    HP-UX A.09.01 (cc, c89, CC, gcc, g++).
    Private memset() below is not compiled on DEC Alpha to avoid
    conflicts with vendor-supplied version.

[30-Nov-1992 -- 29-Jan-1993] 2.06
    [1] Extend fix_author() to handle conversion of
        "Smith, Jr., A. B." to "A. B. {Smith, Jr.}", and
        "Smith Jr., A. B." to "A. B. {Smith Jr.}".
        Introduce new auxiliary function check_junior() called by
        fix_author().

    [2] Extend month_pair[] table to include abbreviated month names.
        Add new function month_token() and rewrite fix_month() to use
        it to do a token-based parse of the current_value[] string in
        order to be able to handle conversions like these:

            "January"           --> jan
            "Jan."              --> jan
            "January 24"        --> jan # " 24"
            "24 Jan."           --> "24 " # jan
            "May/June"          --> may # "/" # jun
            "February and May"  --> feb # " and " # may

        New test input files have been added, with new expected
        output and error files.

    [3] Add -[no-]fix-font-changes switch and new function
        opt_fix_font_changes() to supply additional braces around
        font changes in titles to prevent letter case conversion by
        some BibTeX styles.  The real work is done in the new
        function brace_font_changes() called at the end of
        fix_title().

    [4] Fix bug in out_value() with -delete-empty-fields requested.
        The code incorrectly assumed that a string value of 2 or
        fewer characters was an empty string.  That is correct if
        the string value is a quoted string, but wrong if the string
        value is a 1- or 2-character macro.  The code now correctly
        checks explicitly for empty string, rather than using its
        length to make the deletion decision.

        Thanks to Manfred Aben for reporting this bug!

    [5] Fix bug in get_token(); the code must test for a non-NULL
        pointer before calling SKIP_SPACE().

        Thanks to Gil Webster for reporting this bug!

        [And thanks to the Internet funding agencies, who make
        worldwide collaboration like this possible.]

    [6] Add support for Roman numeral matching in match.c.  New file
        romtol.c contains romtol(), isroman(), and a test program.

    [7] Add calls to perror() at file open failures.
[8] Fix stupid error in brace_font_changes(); s[] was not NUL-terminated before the final call to strcpy(). [9] In format(), make newmsg[] an internal static array instead of reusing shared_string; otherwise, a warning message from check_length() wipes out the current string value. [10] In main(), initialize the_file.input.filename to an empty string, to avoid dereferencing a NULL pointer in warning() during command-line option parsing. [11] In do_args(), add code to display erroneous option switches. [12] In apply_function(), make string comparisons use longer of the minimum match length and the option switch length, so that all characters of the option switch are tested. Previously, option misspellings after the minimum match length went undetected. [13] Add -[no-]prettyprint support, with new functions do_string(), opt_prettyprint(), out_at(), out_close_brace(), out_comma(), out_newline(), out_open_brace(), out_other(), and out_token(), and new data type token_t. Move do_comma() processing out of do_field_value_pair() so that it can be used by do_string(). [14] Add BYTE_VAL() macro for printing of characters with octal formats. [15] Change tag/key to key/field to agree with Appendix B of LaTeX User's Guide and Reference Manual. Change format items %t/%k to %k/%f to match key/field terminology. The user impact of this rather large source code and documentation change should be minimal, and the removal of the disagreement with the LaTeX book needs to be done now, rather than later. bibclean.ini does not use either of the changed format items. [16] Change keyboard uses of key to keyboard, so key now refers exclusively to bibliography citation keys. [17] Change MAX_COLUMN to MAX_WIDTH and add -max-width support, with new function opt_max_width() and new variable max_width. Include "xlimits.h". [18] Add BIBCLEAN_EXT and BIBCLEAN_INI to define environment variables that can supply alternate initialization file extension and name. 
Add GETDEFAULT() macro to simplify coding. [19] Rename put_char() to out_c(), and remove macro out_c(). [20] Add line wrapping support for lexical analysis output in out_c(). [21] Add ``# line nnn "filename"'' output from out_token() for lexical analysis output. [22] Add strdup macro to redefine that name as strdup_private to avoid problems with incorrect declarations of that function (for DEC Alpha OSF/1). [23] Add out_input_position() and token_start so that out_token() can record both starting and ending line numbers of a multi-line value token in lexical analyzer output. [24] Add check for non-zero brace level at end-of-file in do_one_file(). [25] Add support for @Include{...}, a proposed extension of BibTeX. [26] Add append_value(), do_newline(), do_optional_inline_comment(), do_optional_space(), do_space(), get_inline_comment(), get_optional_space(), and out_complex_value() so that intervening space can be output when lexing, and so that we can support in-line comments as well as horizontal and vertical space between lexical items, according to the proposed grammar for BibTeX. Add do_preamble() to do rigorous parse of @Preamble{...}. Revise do_BibTeX_value() to support recognition of optional space between tokens of a string expression, splitting it into two separate functions do_BibTeX_value_1() and do_BibTeX_value_2() to handle the two cases of prettyprinting and lexical analysis. Add out_string() to localize test for output style. [27] Add checks for end-of-file in quoted and braced strings so these errors get reported. Suppress pattern value checking for empty values; some of the check_xxx() functions did this already, but some did not. Now the test is localized in one place, in out_value(), for all of them. [16-Nov-1992 -- 24-Nov-1992] 2.05 Add Makefile steps to automatically extract help() text from output of manual pages into new file bibclean.h, so the built-in documentation stays up-to-date. 
The usage messages still need manual adjustment if switches are added or changed. Add missing test of check_values in check_patterns(). Add support for optional warning messages with patterns from initialization files. New function: get_token(). New parsing code in do_new_pattern() to handle optional warning message strings. Add message argument to add_pattern(). Remove strip_comments() since comment processing is now handled by get_token() and do_new_pattern(). This permits unescaped comment characters inside quoted strings. Write bibclean.reg, an initialization file similar to bibclean.ini, but with regular expressions. Replace cascaded if statements for regular expression testing with loop over patterns in check_patterns(). Move inclusion of match.h to after definition of typedef YESorNO, and change type of match_pattern() from int to YESorNO. Add do_fileinit() and code in main() to call do_fileinit() for each named input file with an extension, replacing that extension with INITFILE_EXT (default .ini). This adds a bibliography-specific initialization capability to the system-wide, user-wide, and job-wide files already supported. Change -keep-initials and -keep-names to -fix-initials and -fix-names, making them positive, rather than negative, options. Also, make them independent by moving invocations of fix_period() outside of fix_author(), and by checking fix_names in fix_author() instead of at start of fix_namelist(). Add -[no-]read-init-files option to allow control over which initialization files are read. Add -[no-]trace-file-opening option to allow easy tracing of file opening attempts by the program. A similar feature in my DVI drivers has proved enormously valuable in tracking down problems of missing files. Rename entry_name[] to current_entry_name[], key[] to current_key[], tag[] to current_tag[], and value[] to current_value[] to get more distinctive names for those global variables. 
Include the value string matching code selection in the version() message; this is needed so that users can prepare initialization files with the correct pattern syntax. Make several MAX_xxx symbolic constants definable at compile time. Add MAX_PATTERN_NAMES constant, and increase pattern_names[] table to that size, leaving empty slots for expansion. Extend add_pattern() so that unrecognized key names result in creation of new entries in pattern_names[], making the set of key/value pairs extensible without modification of the bibclean source code. Add check_other() to handle checking of other keywords. Add unexpected() to localize issuing of unexpected value warnings. Repair next_s() in match.c to skip past TeX control sequence; it was stopping one character early. Revise upper-case letter bracing code in fix_title() to handle more cases. Rewrite space collapsing code in fix_pages() to only collapse space around en-dashes. The previous code was too aggressive, so that "319 with 30 illustrations" became "319 with30illustrations". Add check_tag() called from do_tag_name(), and add second argument, value, to check_patterns(). Add format() called from error() and warning() to expand %e (@entry name), %k (key), %t (tag), %v (value), and %% (percent) format items in messages. This feature is needed so user-defined messages in initialization files can get key, tag, and value into messages. It also simplifies, and improves, calls to warning() and error(). Add some missing (void) typecasts before str***() calls. Change word_length() to return one more than true length at end of string. Change tests in out_s() to > MAX_COLUMN instead of >= MAX_COLUMN. Previously, if a line ended exactly at column MAX_COLUMN, bibclean could produce a spurious blank line, and would sometimes wrap a line earlier than necessary. Add additional punctuation wrap points in out_s(), and remove tests for non-blank whitespace in switch() statement. 
Change type of all string index variables from int to size_t. In get_simple_string(), use enum type for type codes if NEW_STYLE. In check_year(), validate all sequences of 1 or more digits. Use the C preprocessor to define memmove() to be Memmove(), so we always use our own version. Too many C and C++ implementations were found to be lacking it, sigh... Similarly, we provide our own version of strtol() (in a separate file) from the DVI 3.0 development, because it too is missing from older UNIX systems. Complete port to IBM PC DOS with Turbo C 2.0, and Turbo C and C++ 3.0. This required economization of storage for arrays of size [MAX_TOKEN_SIZE] to get global data below 64KB without having to reduce MAX_TOKEN. Added code in do_more() and preprocessor conditionals in out_lines() to handle character-at-a-time input for help paging on IBM PC DOS. Keyboard function keys PgUp, PgDn, End, Home, Up arrow and Down arrow are also recognized. This was easy to do because most PC DOS C compilers provide getch() to get a keyboard character without echo. No fiddling of terminal modes is needed like it is on other systems. The IBM PC DOS port exposed a problem in findfile(), where it was assumed that an environment variable would not be longer than the longest filename. Turbo C sets the latter to 80 characters, but environment variables can be set that are almost 128 characters long. Microsoft C 5.0 also sets it to 80, but C 5.1 sets it to 144, and C 6.0 and C and C++ 7.0 set it to 260. This has been handled by defining MAXPATHLEN at compile time, overriding the built-in defaults. Add support for character-at-a-time input for help paging on VAX VMS, and for getting the screen size in get_screen_lines(). Rename do_more_init() to kbopen(), do_more_term() to kbclose(), and use kbget() in do_more() to conceal the heavily-O/S dependent details of the kbxxx() functions. Introduce STREQUAL() macro to simplify coding. 
Introduce KEY_FUNCTION_ENTRY type and apply_function() to simplify coding, and use it in do_args(), do_preargs(), and out_value(). Argument actions are moved into separate functions, opt_xxx(). Rename show_author() to opt_author(), and help() to opt_help(). Rename do_file() to do_one_file(), and move file loop code from main() into new do_files(). Split large body of get_simple_string() into four new functions, get_braced_string(), get_digit_string(), get_quoted_string(), and get_identifier_string(). Add check_inodes() to determine whether stdlog and stdout are the same file. If so, we need to ensure that each warning message begins a new line, without double spacing unnecessarily when they are different files. Add memset() implementation for SunOS 4.1.1 CC (C++) and BSD 4.3 UNIX because it is missing from their run-time libraries. Replace fopen() by macro FOPEN() to work around erroneous fopen() prototype for SunOS 4.1.1 CC (C++). Complete port to IBM PC DOS with Microsoft C 5.1 and 6.0 compilers. Minor source changes (the CONST macro below) needed to work around compiler errors. [15-Nov-1992] 2.04 Minor changes to complete successful VAX VMS installation and test. [15-Nov-1992] 2.03 Add match_pattern() support for consistent pattern matching in the check_xxx() functions, using new code defined separately in match.c. Add support for run-time redefinition of patterns via one or more initialization file(s) found in the PATH (system-defined) and BIBINPUTS (user-defined) search paths. New functions: add_pattern(), check_patterns(), do_initfile(), do_new_pattern(), do_single_arg(), enlarge_table(), get_line(), strdup(), strip_comments(), and trim_value(). New C preprocessor symbols: HAVE_OLDCODE, HAVE_PATTERNS, HAVE_RECOMP, and HAVE_REGEXP. One of these should be defined at compile time; if none are, then HAVE_PATTERNS is the default. 
Since options can now be specified in initialization files, they each need negations so the command line can override values from an initialization file. Change all YES/NO flags to new type, YESorNO, for better type checking. Add do_more(), do_more_init(), and do_more_term(), for pausing during help output; a private version of screen paging is used instead of a pager invoked by system() for better portability across systems. Set SCREEN_LINES to 0 at compile time to suppress this feature. In fix_title(), add code to brace upper-case letters for cases like: "X11" -> "{X11}" "Standard C Library" -> "Standard {C} Library" "C++ Book" -> "{C}++ Book" leaving "A xxx" unchanged. [11-Nov-1992] 2.02 Add bad_ISBN(), bad_ISSN(), check_ISBN(), and check_ISSN() for validation of ISBN and ISSN values. ISBN == "International Standard Book Number", and ISSN = "International Standard Serial Number". Add testisxn.bib and testisxn.bok to the test collection, with steps in the Makefile to run the test. Add support for embedded \" in Scribe value strings (forgotten in 2.01 revision); they are converted from \"x to {\"x}. [10-Nov-1992] 2.01 Add support for conversion of level-0 \"x to {\"x} and x"y to x{"}y in value strings. Such input is illegal for BibTeX, and causes hard-to-find errors, since BibTeX raises an error at the line where it runs out of string collection space, rather than at the beginning of the collection point. [06-Nov-1992] 2.00 Add full Scribe .bib file input compatibility with -scribe command-line option. Add support for multiple .bib file arguments on command line, with new do_file() function to process them. Allow slash as well as hyphen for introducing command-line options on VAX VMS and IBM PC DOS. Add argument summary to help() (text extracted verbatim from the manual pages). Add new -delete-empty-fields, -keep-names, -no-parbreaks, -remove-OPT-prefixes, and -no-warnings command-line options and support code. 
    Add new out_with_error() and out_with_parbreak_error()
    functions, and APPEND_CHAR() and EMPTY_STRING() macros to
    shorten and clarify coding.

    Add flush_inter_entry_space() function to standardize line
    spacing.

    Increase array sizes to MAX_TOKEN_SIZE (= MAX_TOKEN + 3) to
    reduce array bounds checking in inner loops.

    Add additional file position tracking to enhance error
    localization (structures IO_PAIR and POSITION, and functions
    new_io_pair(), new_position(), out_position(), and
    out_status()).  Error messages are parsable by GNU Emacs
    M-x next-error (C-x `) when bibclean is run from Emacs by the
    command
        M-x compile<ret>bibclean foo.bib >foo.new

    Use arrays of constant strings for multiple string output via
    new function out_lines(), instead of multiple calls to
    fprintf().

    Add additional checking via check_chapter(), check_month(),
    check_number(), check_pages(), check_volume(), check_year(),
    and match_regexp().

    Supply implementation of memmove() library function missing
    from g++ 2.2.2 library.

[03-Oct-1992] 1.06
    Correct logic error in do_comma() that prevented correct
    recognition of @name(key = "value") where the last key/value
    pair did not have a trailing comma.

    Add C++ support.

    Add key_pair[] and entry_pair[] tables for standardization of
    letter case usage, and use the new NAME_PAIR type in
    fix_month().

    Update author address.

    Rename author() to show_author() to avoid shadowing global
    names.

    Fix two assignments of constant strings to char* pointers.

    Remove variable at_line_number which was defined, but never
    used.

[01-Aug-1992] 1.05
    Add -keep-initials switch support (thanks to Karl Berry).
    Internationalize telephone and FAX numbers.

[02-Jan-1992] 1.04
    Modify fix_title() to ignore macros.  Modify fix_author() to
    ignore author lists with parentheses (e.g.
    author = "P. D. Q. Bach (113 MozartStrasse, Vienna, Austria)").

[31-Dec-1991] 1.03
    Add fix_title() to supply braces around unbraced upper-case
    acronyms in titles, and add private definition of MAX().

[15-Nov-1991] 1.02
    Handle @String(...) and @Preamble(...), converting outer
    parentheses to braces.  Insert spaces after author and editor
    initials, and normalize names to form "P. D. Q. Bach" instead
    of "Bach, P. D. Q.".

[10-Oct-1991] 1.01
    Increase MAX_TOKEN to match enlarged BibTeX, and add check
    against STD_MAX_TOKEN.  Output ISBN and ISSN in upper case.
    Always surround = by blanks in key = "value".

[19-Dec-1990] 1.00 (version number unchanged)
    Install Sun386i bug fix.

[08-Oct-1990] 1.00
    Original version.
***********************************************************************/

/* Make a preliminary sanity check on which pattern matching we will use */

#if defined(HAVE_REGEXP)
#if defined(HAVE_RECOMP) || defined(HAVE_PATTERNS) || defined(HAVE_OLDCODE)
?? Define only one of HAVE_OLDCODE, HAVE_PATTERNS, HAVE_REGEXP, and HAVE_RECOMP
#endif
#endif

#if defined(HAVE_RECOMP)
#if defined(HAVE_REGEXP) || defined(HAVE_PATTERNS) || defined(HAVE_OLDCODE)
?? Define only one of HAVE_OLDCODE, HAVE_PATTERNS, HAVE_REGEXP, and HAVE_RECOMP
#endif
#endif

#if defined(HAVE_PATTERNS)
#if defined(HAVE_RECOMP) || defined(HAVE_REGEXP) || defined(HAVE_OLDCODE)
?? Define only one of HAVE_OLDCODE, HAVE_PATTERNS, HAVE_REGEXP, and HAVE_RECOMP
#endif
#endif

#if defined(HAVE_OLDCODE)
#if defined(HAVE_PATTERNS) || defined(HAVE_RECOMP) || defined(HAVE_REGEXP)
?? Define only one of HAVE_OLDCODE, HAVE_PATTERNS, HAVE_REGEXP, and HAVE_RECOMP
#endif
#endif

#if !(defined(HAVE_REGEXP) || defined(HAVE_RECOMP))
#if !(defined(HAVE_PATTERNS) || defined(HAVE_OLDCODE))
#define HAVE_PATTERNS 1
#endif
#endif

/***********************************************************************
We want this code to be compilable with C++ compilers as well as C
compilers, in order to get better compile-time checking.  We therefore
must declare all function headers in both old Kernighan-and-Ritchie
style, as well as in new Standard C and C++ style.  Although Standard C
also allows K&R style, C++ does not.

For functions with no argument, we just use VOID which expands to
either void, or nothing.

Older C++ compilers predefined the symbol c_plusplus, but that was
changed to __cplusplus in 1989 to conform to ISO/ANSI Standard C
conventions; we allow either.

It is regrettable that the C preprocessor language is not powerful
enough to transparently handle the generation of either style of
function declaration.
***********************************************************************/

#include "os.h"
#include "xstdlib.h"
#include "xstring.h"
#include "xctype.h"
#include "xlimits.h"
#include "xstat.h"
#include "unixlib.h"

RCSID("$Id: bibclean.c,v 1.32 1993/06/04 17:42:55 beebe Exp beebe $")

#if defined(memmove)
#undef memmove                  /* at least one system defines this */
#endif
#define memmove Memmove         /* we want our private version */
                                /* see 2.05 change log above for why */

#if (defined(__cplusplus) || defined(__STDC__) || defined(c_plusplus))
#define NEW_STYLE 1
#else
#define NEW_STYLE 0
#endif

#if NEW_STYLE
#define VOID void
#else /* K&R style */
#define VOID
#endif /* NEW_STYLE */

#if NEW_STYLE
typedef enum { NO = 0, YES = 1 } YESorNO;
#else /* K&R style */
#define NO 0                    /* must be FALSE (zero) */
#define YES 1                   /* must be TRUE (non-zero) */
typedef int YESorNO;
#endif /* NEW_STYLE */

#include "match.h"              /* must come after YESorNO typedef */

#if defined(M_I86)
#define CONST                   /* bug workaround for IBM PC Microsoft C compilers */
#else /* NOT M_I86 */
#define CONST const
#endif /* M_I86 */

typedef struct s_option_function_entry
{
    const char *name;           /* option name */
    size_t min_match;           /* minimum length string match */
    void (*function)(VOID);     /* function to call when option matched */
} OPTION_FUNCTION_ENTRY;

typedef struct s_name_pair
{
    const char *old_name;
    const char *new_name;
} NAME_PAIR;

typedef struct s_position
{
    const char *filename;
    long byte_position;
    long last_column_position;
    long column_position;
    long line_number;
} POSITION;

typedef struct s_io_pair
{
    POSITION input;
    POSITION output;
} IO_PAIR;

typedef struct s_pattern_table
{
    MATCH_PATTERN *patterns;
    int current_size;
    int maximum_size;
} PATTERN_TABLE;

typedef struct s_pattern_names
{
    const char *name;
    PATTERN_TABLE *table;
} PATTERN_NAMES;

#define strdup strdup_private   /* want our private version to avoid */
                                /* clash with inconsistent arguments */
                                /* in system versions */

#if defined(sun386)
/* Sun386i run-time library bug in fputs(): only first line in s is written! */
#define fputs(s,fp) fwrite(s,1,strlen(s),fp)
#endif

#define APPEND_CHAR(s,n,c) (s[n] = (char)c, s[n+1] = (char)'\0')
                                /* append c and NUL to s[] */

#if !defined(BIBCLEAN_INITFILE)
#define BIBCLEAN_INI "BIBCLEANINI"      /* environment variable */
#endif

#if !defined(BIBCLEAN_SUFFIX)
#define BIBCLEAN_EXT "BIBCLEANEXT"      /* environment variable */
#endif

#define BIBTEX_COMMENT_PREFIX '%'       /* comment character in BibTeX files */
                                /* (I hope this will be standard in BibTeX 1.0) */

#define BYTE_VAL(c) ((unsigned int)((c) & 0xff))

#define COMMENT_PREFIX '%'      /* comment character in initialization files */

#define CTL(X) (X & 037)        /* make ASCII control character */

#define DELETE_CHAR (EOF - 1)   /* magic char value for out_c() */

#define DELETE_LINE (EOF - 2)   /* magic char value for out_c() */

#define EMPTY_STRING(s) (s[0] = (char)'\0', s)
                                /* for return (EMPTY_STRING(foo))*/

#define ERROR_PREFIX "??"       /* this prefixes all error messages */

#if !defined(EXIT_FAILURE)
#define EXIT_FAILURE 1
#endif

#if !defined(EXIT_SUCCESS)
#define EXIT_SUCCESS 0
#endif

#ifdef FOPEN
#undef FOPEN
#endif

#if defined(__SUNCC__)
#define FOPEN(a,b) fopen((char*)(a),(char*)(b))
/* bug workaround: wrong type for fopen() args with SunOS 4.1.2 CC */
#else /* NOT defined(__SUNCC__) */
#define FOPEN(a,b) fopen((a),(b))
#endif /* defined(__SUNCC__) */

#define GETDEFAULT(envname,default) \
    ((getenv(envname) != (char *)NULL) ? \
    getenv(envname) : strdup(default))

#if !defined(INITFILE_EXT)
#define INITFILE_EXT ".ini"     /* file extension for initialization files */
#endif

#define ISBN_DIGIT_VALUE(c) ((((c) == 'X') || ((c) == 'x')) ? 10 : \
    ((c) - '0'))                /* correct only if digits are valid; */
                                /* the code below ensures that */

#define ISSN_DIGIT_VALUE(c) ISBN_DIGIT_VALUE(c)
                                /* ISSN digits are just like ISBN digits */

#define FIELD_INDENTATION 2     /* how far to indent "field = value," pairs */

#define LAST_SCREEN_LINE (-2)   /* used in opt_help() and do_more() */

#if defined(MAX)
#undef MAX
#endif

#define MAX(a,b) (((a) > (b)) ? (a) : (b))

#if !defined(MAX_BUFFER)
#define MAX_BUFFER 8192         /* output buffer size; this does NOT */
                                /* limit lengths of input lines */
#endif /* !defined(MAX_BUFFER) */

#if !defined(MAX_WIDTH)
#define MAX_WIDTH 72            /* length of longest entry line; */
                                /* non-BibTeX entry text is output verbatim */
#endif /* !defined(MAX_WIDTH) */

#if !defined(MAX_FIELD_LENGTH)
#define MAX_FIELD_LENGTH 12     /* "howpublished" */
#endif /* !defined(MAX_FIELD_LENGTH) */

#if !defined(MAX_LINE)
#define MAX_LINE 10240          /* maximum line length in initialization file */
#endif /* !defined(MAX_LINE) */

#if !defined(MAX_PATTERN_NAMES)
#define MAX_PATTERN_NAMES 100   /* maximum number of field/pattern types; */
                                /* 100 is far more than ever likely to be */
                                /* needed, but we only waste 8 bytes each for */
                                /* unused entries */
#endif /* !defined(MAX_PATTERN_NAMES) */

#if !defined(MAX_TOKEN)
#define MAX_TOKEN 4093          /* internal buffer size; no BibTeX string
                                   value may be larger than this. */
#endif /* !defined(MAX_TOKEN) */

#define MAX_TOKEN_SIZE (MAX_TOKEN + 3)  /* Arrays are always dimensioned
                                   MAX_TOKEN_SIZE, so as to have space
                                   for an additional pair of braces and a
                                   trailing NUL, without tedious subscript
                                   checking in inner loops.
*/ #define META(X) (X | 0200) /* make GNU Emacs meta character */ #define NOOP /* dummy statement */ #if defined(HAVE_PATTERNS) #define PATTERN_MATCHES(string,pattern) (match_pattern(string,pattern) == YES) #else /* NOT defined(HAVE_PATTERNS) */ #define PATTERN_MATCHES(string,pattern) match_regexp(string,pattern) #endif /* defined(HAVE_PATTERNS) */ #if !defined(SCREEN_LINES) #if OS_PCDOS #define SCREEN_LINES 25 /* set 0 to disable pausing in out_lines() */ #else /* NOT OS_PCDOS */ #define SCREEN_LINES 24 /* set 0 to disable pausing in out_lines() */ #endif /* OS_PCDOS */ #endif /* !defined(SCREEN_LINES) */ #define SKIP_SPACE(p) while (isspace((unsigned char)*p)) ++p #define STD_MAX_TOKEN ((size_t)1000) /* Standard BibTeX limit */ #define STREQUAL(a,b) (strcmp(a,b) == 0) #define TABLE_CHUNKS 25 /* how many table entries to allocate at once */ #define TOLOWER(c) (isupper(((unsigned char)c)) ? \ tolower(((unsigned char)c)) : (((unsigned char)c))) #define VALUE_INDENTATION (FIELD_INDENTATION + MAX_FIELD_LENGTH + 3) /* where item values are output; allow space */ /* for "< = >" */ #define WARNING_PREFIX "%%" /* this prefixes all warning messages */ /* Operating system-specific customizations. */ #if OS_UNIX #if !defined(INITFILE) #define INITFILE ".bibcleanrc" #endif #if !defined(SYSPATH) #define SYSPATH "PATH" #endif #if !defined(USERPATH) #define USERPATH "BIBINPUTS" #endif #define isoptionprefix(c) ((c) == '-') #endif /* OS_UNIX */ #if OS_VAXVMS #if !defined(INITFILE) #define INITFILE "bibclean.ini" #endif #if !defined(SYSPATH) #define SYSPATH "SYS$SYSTEM" #endif #if !defined(USERPATH) #define USERPATH "BIBINPUTS" #endif #define isoptionprefix(c) (((c) == '-') || ((c) == '/')) #endif /* OS_VAXVMS */ #if OS_PCDOS #define isoptionprefix(c) (((c) == '-') || ((c) == '/')) #endif /* OS_PCDOS */ /* For any that are undefined, default to values suitable for OS_PCDOS. 
*/ #if !defined(INITFILE) #define INITFILE "bibclean.ini" #endif #if !defined(SYSPATH) #define SYSPATH "PATH" #endif #if !defined(USERPATH) #define USERPATH "BIBINPUTS" #endif #if NEW_STYLE typedef enum token_list { TOKEN_UNKNOWN = 0, TOKEN_ABBREV = 1, /* alphabetical order, starting at 1 */ TOKEN_AT, TOKEN_COMMA, TOKEN_COMMENT, TOKEN_ENTRY, TOKEN_EQUALS, TOKEN_FIELD, TOKEN_INCLUDE, TOKEN_INLINE, TOKEN_KEY, TOKEN_LBRACE, TOKEN_LITERAL, TOKEN_NEWLINE, TOKEN_PREAMBLE, TOKEN_RBRACE, TOKEN_SHARP, TOKEN_SPACE, TOKEN_STRING, TOKEN_VALUE } token_t; #else /* K&R style */ typedef int token_t; #define TOKEN_UNKNOWN 0 #define TOKEN_ABBREV 1 /* alphabetical order, starting at 1 */ #define TOKEN_AT 2 #define TOKEN_COMMA 3 #define TOKEN_COMMENT 4 #define TOKEN_ENTRY 5 #define TOKEN_EQUALS 6 #define TOKEN_FIELD 7 #define TOKEN_INCLUDE 8 #define TOKEN_INLINE 9 #define TOKEN_KEY 10 #define TOKEN_LBRACE 11 #define TOKEN_LITERAL 12 #define TOKEN_NEWLINE 13 #define TOKEN_PREAMBLE 14 #define TOKEN_RBRACE 15 #define TOKEN_SHARP 16 #define TOKEN_SPACE 17 #define TOKEN_STRING 18 #define TOKEN_VALUE 19 #endif /* NEW_STYLE */ const char *type_name[] = { /* must be indexable by TOKEN_xxx */ "UNKNOWN", "ABBREV", /* alphabetical order, starting at 1 */ "AT", "COMMA", "COMMENT", "ENTRY", "EQUALS", "FIELD", "INCLUDE", "INLINE", "KEY", "LBRACE", "LITERAL", "NEWLINE", "PREAMBLE", "RBRACE", "SHARP", "SPACE", "STRING", "VALUE", }; /* All functions except main() are static to overcome limitations on external name lengths in ISO/ANSI Standard C. Please keep them in ALPHABETICAL order, ignoring letter case. 
*/ static void add_one_pattern ARGS((PATTERN_TABLE *pt_, const char *fieldname_, const char *pattern_, const char *msg_)); static void add_pattern ARGS((const char *fieldname_, const char *pattern_, const char *msg_)); static void append_value ARGS((const char *s_)); static YESorNO apply_function ARGS((const char *option_, OPTION_FUNCTION_ENTRY table_[])); static void bad_ISBN ARGS((char ISBN_[11])); static void bad_ISSN ARGS((char ISSN_[9])); static void brace_font_changes ARGS((void)); static void check_chapter ARGS((void)); static void check_inodes ARGS((void)); static void check_ISBN ARGS((void)); static void check_ISSN ARGS((void)); static YESorNO check_junior ARGS((const char *last_name_)); static void check_key ARGS((void)); static void check_length ARGS((size_t n_)); static void check_month ARGS((void)); static void check_number ARGS((void)); static void check_other ARGS((void)); static void check_pages ARGS((void)); static YESorNO check_patterns ARGS((PATTERN_TABLE *pt_,const char *value_)); static void check_volume ARGS((void)); static void check_year ARGS((void)); static void do_args ARGS((int argc_, char *argv_[])); static void do_at ARGS((void)); static void do_BibTeX_entry ARGS((void)); static void do_BibTeX_value ARGS((void)); static void do_BibTeX_value_1 ARGS((void)); static void do_BibTeX_value_2 ARGS((void)); static void do_close_brace ARGS((void)); static void do_comma ARGS((void)); static void do_entry_name ARGS((void)); static void do_equals ARGS((void)); static void do_escapes ARGS((char *s_)); static void do_field ARGS((void)); static YESorNO do_field_value_pair ARGS((void)); static void do_files ARGS((int argc_, char *argv_[])); static void do_fileinit ARGS((const char *bibfilename_)); static void do_group ARGS((void)); static void do_initfile ARGS((const char *pathlist_,const char *name_)); static void do_key_name ARGS((void)); #if (SCREEN_LINES > 0) static int do_more ARGS((FILE *fpout_, int line_, int pause_after_)); #endif /* 
(SCREEN_LINES > 0) */ static void do_new_pattern ARGS((char *s_)); static void do_newline ARGS((void)); static void do_one_file ARGS((FILE *fp_)); static void do_open_brace ARGS((void)); static void do_optional_inline_comment ARGS((void)); static void do_optional_space ARGS((void)); static void do_other ARGS((void)); static void do_preamble ARGS((void)); static void do_preargs ARGS((int argc_, char *argv_[])); static void do_Scribe_block_comment ARGS((void)); static void do_Scribe_close_delimiter ARGS((void)); static void do_Scribe_comment ARGS((void)); static void do_Scribe_entry ARGS((void)); static void do_Scribe_open_delimiter ARGS((void)); static void do_Scribe_separator ARGS((void)); static void do_Scribe_value ARGS((void)); static void do_single_arg ARGS((char *s_)); static void do_space ARGS((void)); static void do_string ARGS((void)); static void enlarge_table ARGS((PATTERN_TABLE *table_)); static void error ARGS((const char *msg_)); static void fatal ARGS((const char *msg_)); char *findfile ARGS((const char *pathlist_, const char *name_)); static char *fix_author ARGS((char *author_)); static void fix_month ARGS((void)); static void fix_namelist ARGS((void)); static void fix_pages ARGS((void)); static char *fix_periods ARGS((char *author_)); static void fix_title ARGS((void)); static void flush_inter_entry_space ARGS((void)); static char *format ARGS((const char *msg_)); static char *get_braced_string ARGS((void)); static int get_char ARGS((void)); static char *get_digit_string ARGS((void)); static char *get_identifier_string ARGS((void)); static char *get_inline_comment ARGS((void)); static char *get_line ARGS((FILE *fp_)); static int get_next_non_blank ARGS((void)); static char *get_optional_space ARGS((void)); static char *get_quoted_string ARGS((void)); #if (SCREEN_LINES > 0) static int get_screen_lines ARGS((void)); #endif /* (SCREEN_LINES > 0) */ static char *get_Scribe_delimited_string ARGS((void)); static char *get_Scribe_identifier_string 
                ARGS((void));
static char *get_Scribe_string ARGS((void));
static char *get_simple_string ARGS((void));
static char *get_token ARGS((char *s_, char **nextp_,
                const char *terminators_));

#define isfieldvalueseparator(c) (((c) == '=') || ((c) == ':'))

static int isidchar ARGS((int c_));

/* We need the isxxx() functions/macros from <ctype.h> to work correctly
for 8-bit characters, but regrettably, those in many C implementations
fail to do so if char is a signed data type, and the character is out of
the range 0..127.  If your compiler lacks an unsigned char data type,
then you will have to change (unsigned char)(c) to
(int)(0xff & (unsigned int)(c)).  With this change, it is important that
none of these be invoked with c == EOF. */

#define Isalnum(c) isalnum((unsigned char)(c))
#define Isalpha(c) isalpha((unsigned char)(c))
#define Isdigit(c) isdigit((unsigned char)(c))
#define Isgraph(c) isgraph((unsigned char)(c))
#define Islower(c) islower((unsigned char)(c))
#define Isprint(c) isprint((unsigned char)(c))
#define Isspace(c) isspace((unsigned char)(c))
#define Isupper(c) isupper((unsigned char)(c))

#if (SCREEN_LINES > 0)

#if NEW_STYLE
typedef enum keyboard_code
{
    KEYBOARD_EOF = EOF,
    KEYBOARD_UNKNOWN = 0,
    KEYBOARD_AGAIN,
    KEYBOARD_DOWN,
    KEYBOARD_END,
    KEYBOARD_HELP,
    KEYBOARD_HOME,
    KEYBOARD_PGDN,
    KEYBOARD_PGUP,
    KEYBOARD_QUIT,
    KEYBOARD_UP
} keyboard_code_t;
#else /* K&R style */
#define KEYBOARD_EOF EOF
#define KEYBOARD_UNKNOWN 0
#define KEYBOARD_AGAIN 1
#define KEYBOARD_DOWN 2
#define KEYBOARD_END 3
#define KEYBOARD_HELP 4
#define KEYBOARD_HOME 5
#define KEYBOARD_PGDN 6
#define KEYBOARD_PGUP 7
#define KEYBOARD_QUIT 8
#define KEYBOARD_UP 9
typedef int keyboard_code_t;
#endif /* NEW_STYLE */

static void kbclose ARGS((void));
static keyboard_code_t kbcode ARGS((void));
static int kbget ARGS((void));
static void kbinitmap ARGS((void));
static void kbopen ARGS((void));

#define MAX_CHAR 256
keyboard_code_t keymap[MAX_CHAR];
#endif /* (SCREEN_LINES > 0) */

int main ARGS((int argc_, char
*argv_[])); #if (defined(HAVE_REGEXP) || defined(HAVE_RECOMP)) static int match_regexp ARGS((const char *string_,const char *pattern_)); #endif /* (defined(HAVE_REGEXP) || defined(HAVE_RECOMP)) */ /* NB: memmove() is a private version known as Memmove() to the compiler */ static void memmove ARGS((void *target_, const void *source_, size_t n_)); const char *month_token ARGS((const char *s_, size_t *p_len_)); static void new_entry ARGS((void)); static void new_io_pair ARGS((IO_PAIR *pair_)); static void new_position ARGS((POSITION *position_)); static void opt_author ARGS((void)); static void opt_check_values ARGS((void)); static void opt_delete_empty_values ARGS((void)); static void opt_error_log ARGS((void)); static void opt_file_position ARGS((void)); static void opt_fix_font_changes ARGS((void)); static void opt_fix_initials ARGS((void)); static void opt_fix_names ARGS((void)); static void opt_help ARGS((void)); static void opt_init_file ARGS((void)); static void opt_max_width ARGS((void)); static void opt_parbreaks ARGS((void)); static void opt_prettyprint ARGS((void)); static void opt_print_patterns ARGS((void)); static void opt_read_init_files ARGS((void)); static void opt_remove_OPT_prefixes ARGS((void)); static void opt_scribe ARGS((void)); static void opt_trace_file_opening ARGS((void)); static void opt_version ARGS((void)); static void opt_warnings ARGS((void)); static void out_at ARGS((void)); static void out_c ARGS((int c_)); static void out_close_brace ARGS((void)); static void out_comma ARGS((void)); static void out_complex_value ARGS((void)); static void out_equals ARGS((void)); static void out_error ARGS((FILE *fpout_, const char *s_)); static void out_field ARGS((void)); static void out_flush ARGS((void)); static void out_input_position ARGS((IO_PAIR *pair_)); static void out_lines ARGS((FILE *fpout_,const char *lines_[], YESorNO pausing_)); static void out_newline ARGS((void)); static void out_number ARGS((long n_)); static void out_open_brace 
ARGS((void)); static void out_other ARGS((const char *s_)); static void out_position ARGS((FILE *fpout_,const char *msg_, IO_PAIR *the_location_)); static void out_s ARGS((const char *s_)); static void out_spaces ARGS((int n_)); static void out_status ARGS((FILE *fpout_, const char *prefix_)); static void out_string ARGS((token_t type_, const char *token_)); static void out_token ARGS((token_t type_, const char *token_)); static void out_value ARGS((void)); static void out_with_error ARGS((const char *s_,const char *msg_)); static void out_with_parbreak_error ARGS((char *s_)); static void prt_pattern ARGS((const char *fieldname_, const char *pattern_, const char *msg_)); static void put_back ARGS((int c_)); static void put_back_string ARGS((const char *s_)); static void resync ARGS((void)); char *strdup ARGS((const char *s_)); int stricmp ARGS((const char *s1_, const char *s2_)); int strnicmp ARGS((const char *s1_, const char *s2_, size_t n_)); static FILE *tfopen ARGS((const char *filename_, const char *mode_)); static void trim_value ARGS((void)); static void unexpected ARGS((void)); static void usage ARGS((void)); static void version ARGS((void)); static void warning ARGS((const char *msg_)); static int word_length ARGS((const char *s_)); static void wrap_line ARGS((void)); static YESorNO YESorNOarg ARGS((void)); /**********************************************************************/ /* All global variables are static to keep them local to this file, and to overcome limitations on external name lengths in ISO/ANSI Standard C. Please keep them in ALPHABETICAL order, ignoring letter case. 
*/ static int at_level = 0; /* @ nesting level */ static int brace_level = 0; /* curly brace nesting level */ static YESorNO check_values = YES; /* NO: suppress value checks */ static int close_char = EOF; /* BibTeX entry closing; may */ /* be right paren or brace */ static char current_entry_name[MAX_TOKEN_SIZE]; /* entry name */ static int current_index; /* argv[] index in do_args() */ static char current_field[MAX_TOKEN_SIZE]; /* field name */ static char *current_option; /* set by do_args() */ static char current_key[MAX_TOKEN_SIZE]; /* citation key */ static char current_value[MAX_TOKEN_SIZE]; /* string value */ static YESorNO delete_empty_values = NO; /* YES: delete empty values */ static YESorNO discard_next_comma = NO; /* YES: deleting field/value */ static YESorNO eofile = NO; /* set to YES at end-of-file */ static int error_count = 0; /* used to decide exit code */ /* normalizing names */ static YESorNO fix_initials = YES; /* reformat A.U. Thor? */ static YESorNO fix_names = YES; /* reformat Bach, P.D.Q? */ static YESorNO fix_font_changes = NO; /* brace {\em E. Coli}? 
*/ #if defined(DEBUG) static FILE *fpdebug; /* for debugging */ #endif /* defined(DEBUG) */ static FILE *fpin; /* input file pointer */ static char *initialization_file_name; static YESorNO in_preamble = NO; /* YES: in @Preamble{...} */ static YESorNO in_string = NO; /* YES: in @String{...} */ static YESorNO is_parbreak = NO; /* get_next_non_blank() sets */ static long max_width = MAX_WIDTH; static NAME_PAIR month_pair[] = { {"January", "jan"}, {"February", "feb"}, {"March", "mar"}, {"April", "apr"}, {"May", "may"}, {"June", "jun"}, {"July", "jul"}, {"August", "aug"}, {"September", "sep"}, {"October", "oct"}, {"November", "nov"}, {"December", "dec"}, {"Jan.", "jan"}, {"Feb.", "feb"}, {"Mar.", "mar"}, {"Apr.", "apr"}, {"Jun.", "jun"}, {"Jul.", "jul"}, {"Aug.", "aug"}, {"Sep.", "sep"}, {"Sept.", "sep"}, {"Oct.", "oct"}, {"Nov.", "nov"}, {"Dec.", "dec"}, {"Jan", "jan"}, {"Feb", "feb"}, {"Mar", "mar"}, {"Apr", "apr"}, {"Jun", "jun"}, {"Jul", "jul"}, {"Aug", "aug"}, {"Sep", "sep"}, {"Sept", "sep"}, {"Oct", "oct"}, {"Nov", "nov"}, {"Dec", "dec"}, {(const char*)NULL, (const char*)NULL}, }; static char *next_option; /* set in do_args() */ static int non_white_chars = 0; /* used to test for legal @ */ static YESorNO parbreaks = YES; /* NO: parbreaks forbidden */ /* in strings and entries */ static YESorNO prettyprint = YES; /* NO: do lexical analysis */ static YESorNO print_patterns = NO; /* YES: print value patterns */ static char *program_name; /* set to argv[0] */ static PATTERN_TABLE pt_chapter = { (MATCH_PATTERN*)NULL, 0, 0 }; static PATTERN_TABLE pt_month = { (MATCH_PATTERN*)NULL, 0, 0 }; static PATTERN_TABLE pt_number = { (MATCH_PATTERN*)NULL, 0, 0 }; static PATTERN_TABLE pt_pages = { (MATCH_PATTERN*)NULL, 0, 0 }; static PATTERN_TABLE pt_volume = { (MATCH_PATTERN*)NULL, 0, 0 }; static PATTERN_TABLE pt_year = { (MATCH_PATTERN*)NULL, 0, 0 }; static PATTERN_NAMES pattern_names[MAX_PATTERN_NAMES] = { {"chapter", &pt_chapter}, {"month", &pt_month}, {"number", &pt_number}, 
{"pages", &pt_pages}, {"volume", &pt_volume}, {"year", &pt_year}, #if _AIX370 {NULL, NULL}, /* CC compiler cannot handle correct cast */ #else /* NOT _AIX370 */ {(CONST char*)NULL, (PATTERN_TABLE*)NULL}, /* entry terminator */ #endif /* _AIX370 */ /* remaining slots may be initialized at run time */ }; #define MAX_PUSHBACK 10 static int n_pushback = 0; static int pushback_buffer[MAX_PUSHBACK]; static YESorNO read_initialization_files = YES;/* -[no-]read-init-files sets */ static YESorNO remove_OPT_prefixes = NO; /* YES: remove OPT prefix */ static YESorNO rflag = NO; /* YES: resynchronizing */ static int screen_lines = SCREEN_LINES;/* kbopen() and out_lines() reset */ static YESorNO Scribe = NO; /* Scribe format input */ static char Scribe_open_delims[] = "{[(<'\"`"; static char Scribe_close_delims[] = "}])>'\"`"; /* In all memory models from tiny to huge, Turbo C on IBM PC DOS will not permit more than 64KB of global constant data. Therefore, we use a global scratch array shared between the functions fix_title(), format(), get_Scribe_identifier_string() and get_Scribe_delimited_string(). The code has been carefully examined to make sure that this space is not overwritten while still in use. Oh, the pain of the Intel segmented memory architecture! 
*/ static char shared_string[MAX_TOKEN_SIZE]; static YESorNO show_file_position = NO; /* messages usually brief */ static FILE *stdlog; /* usually stderr */ static long space_count = 0L; /* count of spaces in do_optional_space() */ YESorNO stdlog_on_stdout = YES; /* NO for separate files */ #if OS_PCDOS unsigned int _stklen = 0xF000; /* stack size for Turbo C */ #endif /* OS_PCDOS */ static IO_PAIR token_start; /* used for # line output */ static IO_PAIR the_entry; /* used in error messages */ static IO_PAIR the_file; /* used in error messages */ static IO_PAIR the_value; /* used in error messages */ static YESorNO trace_file_opening = NO; /* -[no-]trace-file-opening sets */ static YESorNO warnings = YES; /* NO: suppress warnings */ static YESorNO wrapping = YES; /* NO: verbatim output */ /**********************************************************************/ #if NEW_STYLE static void add_one_pattern(PATTERN_TABLE *pt, const char *fieldname, const char *pattern, const char *message) #else /* K&R style */ static void add_one_pattern(pt,fieldname,pattern,message) PATTERN_TABLE *pt; const char *fieldname; const char *pattern; const char *message; #endif /* NEW_STYLE */ { int m; /* index into pt->patterns[] */ if (STREQUAL(pattern,"")) /* then clear pattern table */ { for (m = 0; m < pt->current_size; ++m) { /* free old pattern memory */ if (pt->patterns[m].pattern != (char*)NULL) free((char*)pt->patterns[m].pattern); /* NB: (void*) cast fails with Sun C++ */ if (pt->patterns[m].message != (char*)NULL) free((char*)pt->patterns[m].message); } pt->current_size = 0; } else /* otherwise add new pattern */ { if (pt->current_size == pt->maximum_size) /* then table full */ enlarge_table(pt); for (m = 0; m < pt->current_size; ++m) { /* Make sure this is not a duplicate; if it is, and its message */ /* is the same, then we just ignore the request. Duplicates */ /* are possible when the user and system search paths overlap. 
*/ if (STREQUAL(pattern,pt->patterns[m].pattern)) { /* duplicate pattern found */ if (((pt->patterns[m].message) != (char*)NULL) && (message != (char*)NULL) && (STREQUAL(message,pt->patterns[m].message))) return; /* messages duplicate too */ pt->patterns[m].message = (message == (char*)NULL) ? message : (const char*)strdup(message); /* replace message string */ prt_pattern(fieldname,pattern,message); return; } } /* We have a new and distinct pattern and message, so save them */ pt->patterns[pt->current_size].pattern = strdup(pattern); pt->patterns[pt->current_size++].message = (message == (char*)NULL) ? message : (const char*)strdup(message); } prt_pattern(fieldname,pattern,message); } #if NEW_STYLE static void add_pattern(const char *fieldname, const char *pattern, const char *message) #else /* K&R style */ static void add_pattern(fieldname,pattern,message) const char *fieldname; const char *pattern; const char *message; #endif /* NEW_STYLE */ { int k; /* index into pattern_names[] */ for (k = 0; pattern_names[k].name != (const char*)NULL; ++k) { /* find the correct pattern table */ if (stricmp(pattern_names[k].name,fieldname) == 0) { /* then found the required table */ add_one_pattern(pattern_names[k].table,fieldname,pattern,message); return; } } /* If we get here, then the pattern name is not in the built-in list, so create a new entry in pattern_names[] if space remains */ if (k >= (int)((sizeof(pattern_names)/sizeof(pattern_names[0]) - 1))) { /* too many pattern types */ (void)fprintf(stdlog, "%s Out of memory for pattern name [%s] -- pattern ignored\n", WARNING_PREFIX, fieldname); } else { /* sufficient table space remains */ pattern_names[k].name = strdup(fieldname); /* add new table entry */ pattern_names[k].table = (PATTERN_TABLE*)malloc(sizeof(PATTERN_TABLE)); if (pattern_names[k].table == (PATTERN_TABLE*)NULL) fatal("Out of memory for pattern tables"); pattern_names[k].table->patterns = (MATCH_PATTERN*)NULL; pattern_names[k].table->current_size = 0; 
        pattern_names[k].table->maximum_size = 0;
        add_one_pattern(pattern_names[k].table,fieldname,pattern,message);
        pattern_names[k+1].name = (char*)NULL; /* mark new end of table */
        pattern_names[k+1].table = (PATTERN_TABLE*)NULL;
    }
}


#if NEW_STYLE
static void
append_value(const char *s)
#else /* K&R style */
static void
append_value(s)
const char *s;
#endif /* NEW_STYLE */
{
    size_t n_cv = strlen(current_value);
    size_t n_s = strlen(s);

    if ((n_cv + n_s) < MAX_TOKEN)
        (void)strcpy(&current_value[n_cv],s);
    else                /* string too long; concatenate into parts */
    {
        out_s(current_value);
        (void)strcpy(current_value,s);
        out_with_error(" # ","Value too long for field ``%f''");
    }
}


#if NEW_STYLE
static YESorNO
apply_function(const char *option, OPTION_FUNCTION_ENTRY table[])
#else /* K&R style */
static YESorNO
apply_function(option,table)
const char *option;
OPTION_FUNCTION_ENTRY table[];
#endif /* NEW_STYLE */
{       /* return YES if function matching option was invoked, otherwise NO */
    int k;                      /* index into table[] */
    size_t n = strlen(option);  /* all chars of option[] will be examined */

    for (k = 0; table[k].name != (const char*)NULL; ++k)
    {
        if (strnicmp(option,table[k].name,MAX(n,table[k].min_match)) == 0)
        {
            table[k].function();
            return (YES);
        }
    }
    return (NO);
}


#if NEW_STYLE
static void
bad_ISBN(char ISBN[11])
#else /* K&R style */
static void
bad_ISBN(ISBN)
char ISBN[11];
#endif /* NEW_STYLE */
{
    static char fmt[] =
        "Invalid checksum for ISBN %c-%c%c%c%c%c-%c%c%c-%c in ``%%f = %%v''";
    char msg[sizeof(fmt)];

    (void)sprintf(msg, fmt,
                  (int)ISBN[1], (int)ISBN[2], (int)ISBN[3], (int)ISBN[4],
                  (int)ISBN[5], (int)ISBN[6], (int)ISBN[7], (int)ISBN[8],
                  (int)ISBN[9], (int)ISBN[10]);
    error(msg);
}


#if NEW_STYLE
static void
bad_ISSN(char ISSN[9])
#else /* K&R style */
static void
bad_ISSN(ISSN)
char ISSN[9];
#endif /* NEW_STYLE */
{
    static char fmt[] =
        "Invalid checksum for ISSN %c%c%c%c-%c%c%c%c in ``%%f = %%v''";
    char msg[sizeof(fmt)];

    (void)sprintf(msg, fmt,
                  (int)ISSN[1], (int)ISSN[2], (int)ISSN[3],
(int)ISSN[4], (int)ISSN[5], (int)ISSN[6], (int)ISSN[7], (int)ISSN[8]); error(msg); } static void brace_font_changes(VOID) { int b_level; /* brace level */ size_t k; /* index into current_value[] */ size_t m; /* index into s[] */ YESorNO need_close_brace; char *p; /* pointer into current_value[] */ char *s = shared_string; /* memory-saving device */ /******************************************************************* If the user has coded a title string like "Signal-transducing {G} proteins in {\em Dictyostelium Discoideum}" or "Signal-transducing {G} proteins in {\em {D}ictyostelium {D}iscoideum}" BibTeX styles that downcase titles will downcase the name Dictyostelium Discoideum, even WITH the protecting braces around the D's. The solution offered by this function is to rewrite the title string as "Signal-transducing {G} proteins in {{\em Dictyostelium Discoideum}}" This action cannot be taken without forethought, because there are many cases where the downcasing inside font changes is consistent, so the default run-time option is -no-fix-font-changes. 
    *******************************************************************/

    for (b_level = 0, k = 0, m = 0, need_close_brace = NO;
         current_value[k] ; ++k, ++m)
    {
        switch (current_value[k])
        {
        case '{':                       /* '}' for balance */
            b_level++;
            if (b_level == 1)
            {
                p = &current_value[k+1];
                SKIP_SPACE(p);
                if (*p == '{')          /* '}' for balance */
                    break;      /* already have extra brace level */
                if (
                    (strncmp(p,"\\bf",3) == 0) ||
                    (strncmp(p,"\\em",3) == 0) ||
                    (strncmp(p,"\\it",3) == 0) ||
                    (strncmp(p,"\\rm",3) == 0) ||
                    (strncmp(p,"\\sf",3) == 0) ||
                    (strncmp(p,"\\sl",3) == 0) ||
                    (strncmp(p,"\\tt",3) == 0)
                   )
                {
                    s[m++] = '{';       /* '}' for balance */
                    need_close_brace = YES;
                }
            }
            break;

                                        /* '{' for balance */
        case '}':
            if ((b_level == 1) && (need_close_brace == YES))
            {                           /* '{' for balance */
                s[m++] = '}';
                need_close_brace = NO;
            }
            b_level--;
            break;

        default:
            break;
        }
        s[m] = current_value[k];
    }
    s[m] = '\0';                /* terminate collected string */
    (void)strcpy(current_value, s);
}


static void
check_chapter(VOID)
{
#if defined(HAVE_OLDCODE)
    size_t k;
    size_t n = strlen(current_value) - 1;

    /* match patterns like "23" and "23-1" */
    for (k = 1; k < n; ++k)
    {   /* omit first and last characters -- they are quotation marks */
        if (!(Isdigit(current_value[k]) || (current_value[k] == '-')))
            break;
    }
    if (k == n)
        return;
#else /* NOT defined(HAVE_OLDCODE) */
    if (check_patterns(&pt_chapter,current_value) == YES)
        return;
#endif /* defined(HAVE_OLDCODE) */

    unexpected();
}


static void
check_inodes(VOID)
{
    struct stat buflog;
    struct stat bufout;

    stdlog_on_stdout = YES;     /* assume the worst initially */

    (void)fstat(fileno(stdlog),&buflog);
    (void)fstat(fileno(stdout),&bufout);

#if OS_UNIX
    stdlog_on_stdout = (buflog.st_ino == bufout.st_ino) ? YES : NO;
#endif /* OS_UNIX */

#if OS_PCDOS
    /* No inodes, so use creation times instead */
    stdlog_on_stdout = (buflog.st_ctime == bufout.st_ctime) ?
YES : NO; #endif /* OS_PCDOS */ #if OS_VAXVMS /* Inode field is 3 separate values */ stdlog_on_stdout = ((buflog.st_ino[0] == bufout.st_ino[0]) && (buflog.st_ino[1] == bufout.st_ino[1]) && (buflog.st_ino[2] == bufout.st_ino[2])) ? YES : NO; #endif /* OS_VAXVMS */ } static void check_ISBN(VOID) { int checksum; char ISBN[11]; /* saved ISBN for error messages */ /* (use slots 1..10 instead of 0..9) */ int k; /* index into ISBN[] */ size_t n; /* index into current_value[] */ YESorNO new_ISBN; /* YES: start new ISBN */ /******************************************************************* ISBN numbers are 10-character values from the set [0-9Xx], with a checksum given by (sum(k=1:9) digit(k) * k) mod 11 == digit(10) where digits have their normal value, X (or x) as a digit has value 10, and spaces and hyphens are ignored. The sum is bounded from above by 10*(1 + 2 + ... + 9) = 450, so even short (16-bit) integers are sufficient for the accumulation. We allow multiple ISBN numbers separated by arbitrary characters other than [0-9Xx], and check each one of them. 
    *******************************************************************/

    for (checksum = 0, k = 0, new_ISBN = YES, n = 1; current_value[n+1]; ++n)
    {                           /* loop skips surrounding quotes */
        if (new_ISBN == YES)
        {
            /* Fill all 11 slots with '?' for error messages.  No NUL is
               stored, because bad_ISBN() prints slots 1..10 with %c;
               copying an 11-character string literal plus its NUL
               would write one byte past the end of ISBN[11]. */
            (void)memset(ISBN,'?',sizeof(ISBN));
            checksum = 0;       /* new checksum starting */
            k = 0;              /* no digits collected yet */
            new_ISBN = NO;      /* initialization done */
        }
        switch (current_value[n])
        {
        case ' ':
        case '-':
            break;              /* ignore space and hyphen */

        case '0':
        case '1':
        case '2':
        case '3':
        case '4':
        case '5':
        case '6':
        case '7':
        case '8':
        case '9':
        case 'X':
        case 'x':               /* valid ISBN digit */
            k++;
            if (k < 10)
            {
                ISBN[k] = current_value[n];
                checksum += ISBN_DIGIT_VALUE(ISBN[k]) * k;
                break;
            }
            else if (k == 10)
            {
                ISBN[k] = current_value[n];
                if ((checksum % 11) != ISBN_DIGIT_VALUE(ISBN[k]))
                    bad_ISBN(ISBN);
                new_ISBN = YES;
                break;
            }
            /* k > 10: FALL THROUGH for error */

        default:                /* ignore all other characters */
            if (k > 0)          /* then only got partial ISBN */
            {
                bad_ISBN(ISBN);
                new_ISBN = YES; /* start new checksum */
            }
            break;
        }                       /* end switch (current_value[n]) */
    }                           /* end for (loop over current_value[]) */

    if ((k > 0) && (new_ISBN == NO))    /* too few digits in last ISBN */
        bad_ISBN(ISBN);
}


static void
check_ISSN(VOID)
{
    int checksum;
    char ISSN[9];               /* saved ISSN for error messages */
                                /* (use slots 1..8 instead of 0..7) */
    int k;                      /* index into ISSN[] */
    size_t n;                   /* index into current_value[] */
    YESorNO new_ISSN;           /* YES: start new ISSN */

    /*******************************************************************
    ISSN numbers are 8-character values from the set [0-9Xx], with a
    checksum given by

                (sum(k=1:7) digit(k) * (k+2)) mod 11 == digit(8)

    where digits have their normal value, X (or x) as a digit has value
    10, and spaces and hyphens are ignored.  The sum is bounded from
    above by 10*(3 + 4 + ... + 9) = 420, so even short (16-bit) integers
    are sufficient for the accumulation.
    We allow multiple ISSN numbers separated by arbitrary characters
    other than [0-9Xx], and check each one of them.
    *******************************************************************/

    for (checksum = 0, k = 0, new_ISSN = YES, n = 1; current_value[n+1]; ++n)
    {                           /* loop skips surrounding quotes */
        if (new_ISSN == YES)
        {
            /* Fill all 9 slots with '?' for error messages.  No NUL is
               stored, because bad_ISSN() prints slots 1..8 with %c;
               copying a 9-character string literal plus its NUL would
               write one byte past the end of ISSN[9]. */
            (void)memset(ISSN,'?',sizeof(ISSN));
            k = 0;              /* no digits collected yet */
            checksum = 0;       /* new checksum starting */
            new_ISSN = NO;      /* initialization done */
        }
        switch (current_value[n])
        {
        case ' ':
        case '-':
            break;              /* ignore space and hyphen */

        case '0':
        case '1':
        case '2':
        case '3':
        case '4':
        case '5':
        case '6':
        case '7':
        case '8':
        case '9':
        case 'X':
        case 'x':               /* valid ISSN digit */
            k++;
            if (k < 8)
            {
                ISSN[k] = current_value[n];
                checksum += ISSN_DIGIT_VALUE(ISSN[k]) * (k + 2);
                break;
            }
            else if (k == 8)
            {
                ISSN[k] = current_value[n];
                if ((checksum % 11) != ISSN_DIGIT_VALUE(ISSN[k]))
                    bad_ISSN(ISSN);
                new_ISSN = YES;
                break;
            }
            /* k > 8: FALL THROUGH for error */

        default:                /* ignore all other characters */
            if (k > 0)          /* then only got partial ISSN */
            {
                bad_ISSN(ISSN);
                new_ISSN = YES; /* start new checksum */
            }
            break;
        }                       /* end switch (current_value[n]) */
    }                           /* end for (loop over current_value[]) */

    if ((k > 0) && (new_ISSN == NO))    /* too few digits in last ISSN */
        bad_ISSN(ISSN);
}


#if NEW_STYLE
static YESorNO
check_junior(const char *last_name)
#else /* K&R style */
static YESorNO
check_junior(last_name)
const char *last_name;
#endif /* NEW_STYLE */
{                               /* return YES: name is Jr.-like, else: NO */
    int b_level;                /* brace level */
    static const char *juniors[] =
    {                           /* name parts that parse like "Jr." */
        "Jr",
        "Jr.",
        "Sr",
        "Sr.",
        "SJ",
        "S.J.",
        "S. J.",
        (const char*)NULL,      /* list terminator */
    };
    int k;                      /* index into juniors[] */
    int n;                      /* index into last_name[] */

    for (n = 0, b_level = 0; last_name[n]; ++n)
    {                           /* check for "Smith, Jr" and "Smith Jr" and */
        switch (last_name[n])   /* convert to "{Smith, Jr}" and "{Smith Jr}" */
        {
        case '{':
            b_level++;
            break;

        case '}':
            b_level--;
            break;

        case ',':
            if (b_level == 0)
                return (YES);
            break;

        case ' ':               /* test for Jr.-like name */
            if (b_level == 0)
            {
                for (k = 0; juniors[k] != (const char*)NULL; ++k)
                {
                    if (strnicmp(&last_name[n+1],juniors[k],
                                 strlen(juniors[k])) == 0)
                        return (YES);
                }               /* end for (k...) */
                if (strcspn(&last_name[n+1],"IVX") == 0)
                    return (YES); /* probably small upper-case Roman number */
            }
            break;

        default:
            break;
        }                       /* end switch (last_name[n]) */
    }                           /* end for (n = 0,...) */
    return (NO);
}


static void
check_key(VOID)
{
    int k;                      /* index into pattern_names[] */

    for (k = 0; pattern_names[k].name != (const char*)NULL; ++k)
    {
        if (stricmp(pattern_names[k].name,current_key) == 0)
        {                       /* then found the required table */
            if (check_patterns(pattern_names[k].table,current_key) == NO)
                warning("Unexpected citation key ``%k''");
            return;
        }
    }
}


#if NEW_STYLE
static void
check_length(size_t n)
#else /* K&R style */
static void
check_length(n)
size_t n;
#endif /* NEW_STYLE */
{
    if ((check_values == YES) && (n >= STD_MAX_TOKEN))
        warning("String length exceeds standard BibTeX limit for ``%f'' entry");
}


static void
check_month(VOID)
{
    int m;                      /* month index */
    size_t n = strlen(current_value);

    if (n == 3)                 /* check for match against standard abbrevs */
    {
        for (m = 0; month_pair[m].old_name != (const char*)NULL; ++m)
        {
            if (stricmp(month_pair[m].new_name,current_value) == 0)
                return;
        }
    }

    /* Hand coding for the remaining patterns is too ugly to contemplate,
    so we only provide the checking when real pattern matching is
    available.
    */

#if !defined(HAVE_OLDCODE)
    if (check_patterns(&pt_month,current_value) == YES)
        return;
#endif /* !defined(HAVE_OLDCODE) */

    unexpected();
}


static void
check_number(VOID)
{
#if defined(HAVE_OLDCODE)
    size_t k;
    size_t n = strlen(current_value) - 1;

    /* We expect the value string to match the regexp
       "[0-9a-zA-Z---,/ ()]+" to handle values like
       "UMIACS-TR-89-11, CS-TR-2189, SRC-TR-89-13", "RJ 3847 (43914)",
       "{STAN-CS-89-1256}", "UMIACS-TR-89-3.1, CS-TR-2177.1",
       "TR\#89-24", "23", "23-27", and "3+4". */

    for (k = 1; k < n; ++k)
    {                           /* omit first and last characters -- */
                                /* they are quotation marks */
        if (!( Isalnum(current_value[k])
               || Isspace(current_value[k])
               || (current_value[k] == '-')
               || (current_value[k] == '+')
               || (current_value[k] == ',')
               || (current_value[k] == '.')
               || (current_value[k] == '/')
               || (current_value[k] == '#')
               || (current_value[k] == '\\')
               || (current_value[k] == '(')
               || (current_value[k] == ')')
               || (current_value[k] == '{')
               || (current_value[k] == '}') ))
            break;
    }
    if (k == n)
        return;
#else /* NOT defined(HAVE_OLDCODE) */
    if (check_patterns(&pt_number,current_value) == YES)
        return;
#endif /* defined(HAVE_OLDCODE) */

    unexpected();
}


static void
check_other(VOID)
{
    int k;                      /* index into pattern_names[] */

    for (k = 0; pattern_names[k].name != (const char*)NULL; ++k)
    {
        if (stricmp(pattern_names[k].name,current_field) == 0)
        {                       /* then found the required table */
            if (check_patterns(pattern_names[k].table,current_value) == NO)
                unexpected();
            return;
        }
    }
}


static void
check_pages(VOID)
{
    /* Need to handle "B721--B729" as well as "721--729"; some
       physics journals use an initial letter in page number.
    */

#if defined(HAVE_OLDCODE)
    int number = 1;
    size_t k;
    size_t n = strlen(current_value) - 1;

    /* We expect the value string to match the regexps [0-9]+ or
       [0-9]+--[0-9]+ */
    for (k = 1; k < n; ++k)
    {                           /* omit first and last characters -- */
                                /* they are quotation marks */
        switch (current_value[k])
        {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            if (number > 2)
            {
                warning("More than 2 page numbers in ``%f = %v''");
                return;
            }
            break;

        case '-':
            number++;
            if (current_value[k+1] != '-')      /* expect -- */
            {
                warning(
                    "Use en-dash, --, to separate page numbers in ``%f = %v''");
                return;
            }
            ++k;
            if (current_value[k+1] == '-')      /* should not have --- */
            {
                warning(
                    "Use en-dash, --, to separate page numbers in ``%f = %v''");
                return;
            }
            break;

        case ',':
            number++;
            break;

        default:
            unexpected();
            return;
        }
    }
    return;                     /* every character was acceptable, so do */
                                /* not fall through to unexpected() */
#else /* NOT defined(HAVE_OLDCODE) */
    if (check_patterns(&pt_pages,current_value) == YES)
        return;
#endif /* defined(HAVE_OLDCODE) */

    unexpected();
}


#if (defined(HAVE_PATTERNS) || defined(HAVE_REGEXP) || defined(HAVE_RECOMP))
#if NEW_STYLE
static YESorNO
check_patterns(PATTERN_TABLE* pt,const char *value)
#else /* K&R style */
static YESorNO
check_patterns(pt,value)
PATTERN_TABLE* pt;
const char *value;
#endif /* NEW_STYLE */
{
    /* Return YES if current_value[] matches a pattern, or there are
       no patterns, and NO if there is a match failure.  Any message
       associated with a successfully-matched pattern is printed before
       returning. */

    int k;

    for (k = 0; k < pt->current_size; ++k)
    {
        if (PATTERN_MATCHES(value,pt->patterns[k].pattern))
        {
            if (pt->patterns[k].message != (const char*)NULL)
            {
                if (pt->patterns[k].message[0] == '?') /* special error flag */
                    error(pt->patterns[k].message + 1);
                else            /* just normal warning */
                    warning(pt->patterns[k].message);
            }
            return (YES);
        }
    }
    return ((pt->current_size == 0) ?
            YES : NO);
}
#endif /* (defined(HAVE_PATTERNS) || defined(HAVE_REGEXP) ||
          defined(HAVE_RECOMP)) */


static void
check_volume(VOID)
{
#if defined(HAVE_OLDCODE)
    size_t k;
    size_t n = strlen(current_value) - 1;

    /* Match patterns like "27", "27A", "27/3", "27A 3", "SMC-13",
       "VIII", "B", "{IX}", "1.2", "Special issue A", and "11 and 12".
       However, NEVER match pattern like "11(5)", since that is
       probably an erroneous incorporation of issue number into the
       volume value. */

    for (k = 1; k < n; ++k)
    {                           /* omit first and last characters -- */
                                /* they are quotation marks */
        if (!( Isalnum(current_value[k])
               || (current_value[k] == '-')
               || (current_value[k] == '/')
               || (current_value[k] == '.')
               || (current_value[k] == ' ')
               || (current_value[k] == '{')
               || (current_value[k] == '}') ))
        {
            unexpected();
            return;
        }
    }
    return;                     /* every character was acceptable, so do */
                                /* not fall through to unexpected() */
#else /* NOT defined(HAVE_OLDCODE) */
    if (check_patterns(&pt_volume,current_value) == YES)
        return;
#endif /* defined(HAVE_OLDCODE) */

    unexpected();
}


static void
check_year(VOID)
{
    char *p;
    char *q;
    long year;

#if defined(HAVE_OLDCODE)
    size_t k;
    size_t n;

    /* We expect the value string to match the regexp [0-9]+ */
    for (k = 1, n = strlen(current_value) - 1; k < n; ++k)
    {                           /* omit first and last characters -- */
                                /* they are quotation marks */
        if (!Isdigit(current_value[k]))
        {
            warning("Non-digit found in field value of ``%f = %v''");
            return;
        }
    }
#else /* NOT defined(HAVE_OLDCODE) */
    if (check_patterns(&pt_year,current_value) == YES)
        return;
    unexpected();
#endif /* defined(HAVE_OLDCODE) */

    for (p = current_value; *p ; ) /* now validate all digit strings */
    {
        if (Isdigit(*p))        /* then have digit string */
        {                       /* now make sure year is `reasonable' */
            year = strtol(p,&q,10);
            if ((year < 1800L) || (year > 2099L))
                warning("Suspicious year in ``%f = %v''");
            p = q;
        }
        else                    /* ignore other characters */
            p++;
    }
}


#if NEW_STYLE
static void
do_args(int argc, char *argv[])
#else /* K&R style */
static void
do_args(argc,argv)
int argc;
char *argv[];
#endif /* NEW_STYLE */
{
    int k;                      /* index into argv[] */
#define MSG_PREFIX      "Unrecognized option switch: "
#define MAX_OPTION_LENGTH       100
    char msg[sizeof(MSG_PREFIX) + MAX_OPTION_LENGTH + 1];
                                /* for error messages */
    int nfiles;                 /* number of files found in argv[] */
    static OPTION_FUNCTION_ENTRY options[] =
    {
        {"?",                           1,      opt_help},
        {"author",                      1,      opt_author},
        {"check-values",                1,      opt_check_values},
        {"delete-empty-values",         1,      opt_delete_empty_values},
        {"error-log",                   1,      opt_error_log},
        {"file-position",               3,      opt_file_position},
        {"fix-font-changes",            5,      opt_fix_font_changes},
        {"fix-initials",                5,      opt_fix_initials},
        {"fix-names",                   5,      opt_fix_names},
        {"help",                        1,      opt_help},
        {"init-file",                   1,      opt_init_file},
        {"max-width",                   1,      opt_max_width},
        {"no-check-values",             4,      opt_check_values},
        {"no-delete-empty-values",      4,      opt_delete_empty_values},
        {"no-file-position",            6,      opt_file_position},
        {"no-fix-font-changes",         8,      opt_fix_font_changes},
        {"no-fix-initials",             8,      opt_fix_initials},
        {"no-fix-names",                8,      opt_fix_names},
        {"no-parbreaks",                5,      opt_parbreaks},
        {"no-prettyprint",              6,      opt_prettyprint},
        {"no-print-patterns",           6,      opt_print_patterns},
        {"no-read-init-files",          6,      opt_read_init_files},
        {"no-remove-OPT-prefixes",      6,      opt_remove_OPT_prefixes},
        {"no-scribe",                   4,      opt_scribe},
        {"no-trace-file-opening",       4,      opt_trace_file_opening},
        {"no-warnings",                 4,      opt_warnings},
        {"parbreaks",                   2,      opt_parbreaks},
        {"prettyprint",                 3,      opt_prettyprint},
        {"print-patterns",              3,      opt_print_patterns},
        {"read-init-files",             3,      opt_read_init_files},
        {"remove-OPT-prefixes",         3,      opt_remove_OPT_prefixes},
        {"scribe",                      1,      opt_scribe},
        {"trace-file-opening",          1,      opt_trace_file_opening},
        {"version",                     1,      opt_version},
        {"warnings",                    1,      opt_warnings},
        {(const char*)NULL,             0,      (void (*)(VOID))NULL},
    };

    for (nfiles = 1, k = 1; k < argc; ++k)
    {
        if ( (argv[k][1] != '\0') && isoptionprefix(argv[k][0]) )
        {                       /* then process command-line switch */
            current_index = k;  /* needed by opt_init_file() and */
            next_option = argv[k+1]; /* opt_error_log() */
            current_option = argv[k]; /* needed by YESorNOarg() */
            if
                (apply_function(current_option+1,options) == NO)
            {
                (void)sprintf(msg, "%s%.*s", MSG_PREFIX, MAX_OPTION_LENGTH,
                              current_option);
                warning(msg);
                usage();
                exit(EXIT_FAILURE);
            }
            k = current_index;  /* some opt_xxx() functions update it */
        }
        else                    /* save file names */
            argv[nfiles++] = argv[k];   /* shuffle file names down */
    }
    argv[nfiles] = (char*)NULL; /* terminate new argument list */
}


static void
do_at(VOID)                     /* parse @name{...} */
{
    int c;

    token_start = the_file;     /* remember location of token start */
    c = get_char();
    the_entry = the_file;
    if ((non_white_chars == 1) && (c == '@'))
    {
        at_level++;
        out_at();
        if (brace_level != 0)
        {
            error(
    "@ begins line, but brace level is not zero after entry ``@%e{%k,''");
            brace_level = 0;
        }
    }
    else if (c != EOF)
    {
        out_c(c);
        out_with_error("", "Expected @name{...} after entry ``@%e{%k,''");
    }
}


static void
do_BibTeX_entry(VOID)
{
    /*************************************************************
     Parse a BibTeX entry, one of:
       @entry-name{key,field=value,field=value,...,}
       @string{name=value}
       @preamble{...}
    *************************************************************/

    new_entry();

    do_at();
    if ((rflag == YES) || (eofile == YES))
        return;

    do_optional_space();

    do_entry_name();
    if (rflag == YES)
        return;

    if (STREQUAL(current_entry_name,"Include"))
        do_group();
    else if (STREQUAL(current_entry_name,"Preamble"))
        do_preamble();
    else if (STREQUAL(current_entry_name,"String"))
        do_string();
    else                        /* expect @name{key, field = value, ...
                                   } */
    {
        do_optional_space();

        do_open_brace();
        if (rflag == YES)
            return;

        do_optional_space();

        do_key_name();
        if (rflag == YES)
            return;

        do_optional_space();

        do_comma();
        if (rflag == YES)
            return;

        do_optional_space();

        while (do_field_value_pair() == YES)
        {
            do_optional_space();
            do_comma();         /* this supplies any missing optional comma */
            if ((rflag == YES) || (eofile == YES))
                return;
            do_optional_space();
        }
        if (rflag == YES)
            return;

        do_optional_space();

        do_close_brace();
    }
    flush_inter_entry_space();
}


/***********************************************************************
BibTeX field values can take several forms, as illustrated by this
simple BNF grammar:

BibTeX-value-string:
        simple-string |
        simple-string # BibTeX-value-string

simple-string:
        "quoted string" |
        {braced-string} |
        digit-sequence |
        alpha-sequence |
***********************************************************************/

static void
do_BibTeX_value(VOID)           /* process BibTeX value string */
{
    if (prettyprint == YES)
        do_BibTeX_value_1();
    else
        do_BibTeX_value_2();
}


static void
do_BibTeX_value_1(VOID)         /* process BibTeX value string */
{                               /* for prettyprinted output */
    /* In order to support string value checking, we need to collect
       complete values, including intervening inline comments, which
       we bracket by magic delimiters so they can be ignored during
       pattern matching, and restored on output.  Space between values
       is simply discarded. */

    int c;

    the_value = the_file;
    current_value[0] = '\0';
    append_value(get_simple_string());
    do_optional_inline_comment();
    while ((c = get_char()) == '#')
    {
        append_value(" # ");
        do_optional_inline_comment();
        append_value(get_simple_string());
        do_optional_inline_comment();
    }
    put_back(c);
    out_value();
}


static void
do_BibTeX_value_2(VOID)         /* process BibTeX value string */
{                               /* for lexical analysis output */
    int c;

    the_value = the_file;
    (void)strcpy(current_value,get_simple_string());
    out_string((current_value[0] == '"') ?
               TOKEN_VALUE : TOKEN_ABBREV, current_value);
    do_optional_space();
    while ((c = get_char()) == '#')
    {
        out_string(TOKEN_SPACE," ");
        out_string(TOKEN_SHARP,"#");
        out_string(TOKEN_SPACE," ");
        do_optional_space();
        (void)strcpy(current_value,get_simple_string());
        out_string((current_value[0] == '"') ? TOKEN_VALUE : TOKEN_ABBREV,
                   current_value);
        do_optional_space();
    }
    put_back(c);
}


static void
do_close_brace(VOID)    /* parse level 1 closing brace or parenthesis */
{
    int c;

    c = get_char();
    if (c == EOF)
        return;
    else if (c == close_char)
    {
        if (c == ')')
            brace_level--;      /* get_char() could not do this for us */
        out_close_brace();      /* standardize parenthesis to brace */
        if (brace_level != 0)
            out_with_error("",
    "Non-zero brace level after @name{...} processed. Last key = ``%k''");
    }
    else                        /* raise error and try to resynchronize */
    {
        out_c(c);
        out_with_error("",
            "Expected closing brace or parenthesis in entry ``@%e{%k,''");
    }
}


static void
do_comma(VOID)
{
    int c;

    /* Parse a comma, or an optional comma before a closing brace or
       parenthesis;  an omitted legal comma is supplied explicitly.
       A newline is output after the comma so that field = value pairs
       appear on separate lines. */

    the_value = the_file;

    c = get_char();
    if (c == EOF)
        NOOP;
    else if (c == ',')
    {
        if (discard_next_comma == NO)
        {
            out_comma();
            out_newline();
        }
    }
    else if (c == close_char)
    {                   /* supply missing comma for last field = value pair*/
        if (c == ')')
            brace_level--;      /* get_char() could not do this for us */
        if (brace_level == 0)   /* reached end of bibliography entry */
        {
            if (c == ')')
                brace_level++;  /* put_back() could not do this for us */
            put_back(c);
            if (discard_next_comma == NO)
            {
                out_comma();
                out_newline();
            }
        }
        else            /* no comma, and still in bibliography entry */
        {
            out_c(c);
            out_with_error("","Non-zero brace level after @name{...} \
processed. Last entry = ``@%e{%k,''");
        }
    }
    else                        /* raise error and try to resynchronize */
    {
        out_c(c);
        out_with_error("", "Expected comma after last field ``%f''");
    }
    discard_next_comma = NO;
}


static void
do_entry_name(VOID)             /* process BibTeX entry name */
{
    int c;
    size_t k;
    int n;
    static NAME_PAIR entry_pair[] =
    {                           /* entry name case change table */
        { "Deathesis",          "DEAthesis" },
        { "Inbook",             "InBook" },
        { "Incollection",       "InCollection" },
        { "Inproceedings",      "InProceedings" },
        { "Mastersthesis",      "MastersThesis" },
        { "Phdthesis",          "PhdThesis" },
        { "Techreport",         "TechReport" },
    };

    token_start = the_file;     /* remember location of token start */
    for (k = 0; ((c = get_char()) != EOF) && isidchar(c); ++k)
    {                           /* store capitalized entry name */
        if ((k == 0) && !Isalpha(c))
            error("Non-alphabetic character begins an entry name");
        if ((k == 0) && Islower(c))
            c = toupper(c);
        else if ((k > 0) && Isupper(c))
            c = tolower(c);
        if ((parbreaks == NO) && (is_parbreak == YES))
        {
            APPEND_CHAR(current_entry_name,k,c);
            out_with_parbreak_error(current_entry_name);
            return;
        }
        if (k >= MAX_TOKEN)
        {
            APPEND_CHAR(current_entry_name,k,c);
            out_with_error(current_entry_name, "@entry_name too long");
            return;
        }
        current_entry_name[k] = (char)c;
    }
    current_entry_name[k] = (char)'\0';
    if (c != EOF)
        put_back(c);

    /* Substitute a few entry names that look better in upper case */
    for (n = 0; n < (int)(sizeof(entry_pair)/sizeof(entry_pair[0])); ++n)
        if (STREQUAL(current_entry_name,entry_pair[n].old_name))
            (void)strcpy(current_entry_name,entry_pair[n].new_name);

    if (prettyprint == YES)
        out_s(current_entry_name);
    else if (STREQUAL(current_entry_name,"Include"))
        out_token(TOKEN_INCLUDE, current_entry_name);
    else if (STREQUAL(current_entry_name,"Preamble"))
        out_token(TOKEN_PREAMBLE, current_entry_name);
    else if (STREQUAL(current_entry_name,"String"))
        out_token(TOKEN_STRING, current_entry_name);
    else
        out_token(TOKEN_ENTRY, current_entry_name);
    check_length(k);
}


static void
do_equals(VOID)                 /* process = in field = value */
{
    int
        c;

    the_value = the_file;
    token_start = the_file;     /* remember location of token start */

    c = get_char();
    if (c == EOF)
        NOOP;
    else if (c == '=')
        out_equals();
    else
    {
        out_c(c);
        out_with_error("", "Expected \"=\" after field ``%f''");
    }
    out_spaces((int)(VALUE_INDENTATION - the_file.output.column_position));
                                /* supply leading indentation */
}


#if NEW_STYLE
static void
do_escapes(char *s)
#else /* K&R style */
static void
do_escapes(s)
char *s;
#endif /* NEW_STYLE */
{                               /* reduce escape sequences in s[] */
    int base;                   /* number base for strtol() */
    char *endptr;               /* pointer returned by strtol() */
    char *p;                    /* pointer into output s[] */

    if (s == (char*)NULL)       /* nothing to do if no string */
        return;

    for (p = s ; *s ; ++s)
    {
        if (*s == '\\')         /* have escaped character */
        {
            base = 8;           /* base is tentatively octal */
            switch (*++s)
            {
            case 'a':   *p++ = CTL('G'); break;
            case 'b':   *p++ = CTL('H'); break;
            case 'f':   *p++ = CTL('L'); break;
            case 'n':   *p++ = CTL('J'); break;
            case 'r':   *p++ = CTL('M'); break;
            case 't':   *p++ = CTL('I'); break;
            case 'v':   *p++ = CTL('K'); break;

            case '0':
                if (TOLOWER(s[1]) == 'x') /* 0x means hexadecimal */
                    base = 16;
                /* FALL THROUGH */
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
                *p++ = (char)strtol((const char*)s,&endptr,base);
                s = endptr - 1; /* point to last used character */
                break;

            default:            /* \x becomes x for all other x */
                *p++ = *s;
                break;
            }
        }
        else                    /* not escaped, so just copy it */
            *p++ = *s;
    }
    *p = '\0';                  /* terminate final string */
}


#if NEW_STYLE
static void
do_fileinit(const char *bibfilename) /* process one initialization file */
#else /* K&R style */
static void
do_fileinit(bibfilename)        /* process one initialization file */
const char *bibfilename;
#endif /* NEW_STYLE */
{
    char *p;
    char *ext;

    ext = GETDEFAULT(BIBCLEAN_EXT,INITFILE_EXT);
    if (strrchr(bibfilename,'.') != (char*)NULL) /* then have file extension */
    {   /* convert foo.bib to foo.ini and then process it as an init file */
        if ((p = (char*)malloc(strlen(bibfilename)
                               + strlen(ext) + 1)) != (char*)NULL)
        {
            (void)strcpy(p,bibfilename);
            (void)strcpy(strrchr(p,'.'),ext);
            do_initfile((char*)NULL,p);
            free(p);
        }
    }
}


static void
do_field(VOID)                  /* process BibTeX field name */
{
    int c;
    size_t k;
    int n;
    static NAME_PAIR field_pair[] =
    {                           /* field name case change table */
        { "ansi-standard-number",       "ANSI-standard-number" },
        { "ieee-standard-number",       "IEEE-standard-number" },
        { "isbn",                       "ISBN" },
        { "iso-standard-number",        "ISO-standard-number" },
        { "issn",                       "ISSN" },
        { "lccn",                       "LCCN" },
    };

    the_value = the_file;
    token_start = the_file;     /* remember location of token start */
    for (k = 0, c = get_char(); (c != EOF) && isidchar(c);
         c = get_char(), k++)
    {
        if (k >= MAX_TOKEN)
        {
            APPEND_CHAR(current_field,k,c);
            out_with_error(current_field, "Entry field name too long");
            return;
        }
        else if ((k == 0) && !Isalpha(c))
            error("Non-alphabetic character begins a field name");
        current_field[k] = (char)(((in_string == NO) && Isupper(c)) ?
                                  tolower(c) : c);
    }
    if (c != EOF)
        put_back(c);
    current_field[k] = (char)'\0';

    if (in_string == NO)        /* @String{...} contents untouched */
    {
        /* Substitute a few field names that look better in upper case */
        for (n = 0; n < (int)(sizeof(field_pair)/sizeof(field_pair[0])); ++n)
            if (STREQUAL(current_field,field_pair[n].old_name))
                (void)strcpy(current_field,field_pair[n].new_name);

        if (strncmp("opt",current_field,3) == 0)
        {                       /* Emacs bibtex.el expects OPT */
            (void)strncpy(current_field,"OPT",3);
        }
    }

    if (k > 0)
        out_field();
    check_length(k);
}


static YESorNO
do_field_value_pair(VOID)       /* process field = value pair */
{
    if (eofile == YES)
        return (NO);

    do_field();
    if ((rflag == YES) || (eofile == YES) || (current_field[0] == '\0'))
        return (NO);

    space_count = 0L;           /* examined in do_Scribe_separator() */
    do_optional_space();        /* and set here */

    if (Scribe == YES)
        do_Scribe_separator();
    else
        do_equals();
    if ((rflag == YES) || (eofile == YES))
        return (NO);

    do_optional_space();

    if (Scribe == YES)
        do_Scribe_value();
    else
        do_BibTeX_value();
    if ((rflag
         == YES) || (eofile == YES))
        return (NO);

    return (YES);
}


#if NEW_STYLE
static void
do_files(int argc, char *argv[])
#else /* K&R style */
static void
do_files(argc,argv)
int argc;
char *argv[];
#endif /* NEW_STYLE */
{
    FILE *fp;
    int k = argc;               /* index into argv[] */
        /* set to argc to remove optimizer complaints about unused argument */

    if (argv[1] == (char*)NULL) /* no files specified, so use stdin */
    {
        the_file.input.filename = "stdin";
        do_one_file(stdin);
    }
    else                /* else use command-line files left in argv[] */
    {
        for (k = 1; argv[k] != (char*)NULL; ++k)
        {
            if (STREQUAL(argv[k],"-"))
            {
                /* A filename of "-" is conventionally interpreted in
                   the UNIX world as a synonym for stdin, since that
                   system otherwise lacks true filenames for stdin,
                   stdout, and stdlog.  We process stdin with
                   do_one_file(), but never close it so that subsequent
                   read attempts will silently, and harmlessly, fail
                   at end-of-file. */
                the_file.input.filename = "stdin";
                do_one_file(stdin);
            }
            else if ((fp = tfopen(argv[k], "r")) == (FILE*)NULL)
            {
                (void)fprintf(stdlog,
                    "\n%s Ignoring open failure on file [%s]\n",
                    ERROR_PREFIX, argv[k]);
                perror("perror() says");
            }
            else                /* open succeeded, so process file */
            {
                if (k > 1)      /* supply blank line between */
                    out_newline(); /* entries at file boundaries */
                the_file.input.filename = argv[k];
                if (read_initialization_files == YES)
                    do_fileinit(the_file.input.filename);
                do_one_file(fp);
                (void)fclose(fp); /* close to save file resources */
            }
        }
    }
}


static void
do_group(VOID)                  /* copy a braced group verbatim */
{
    int c;
    char *s = shared_string;    /* memory-saving device */
    size_t k;                   /* index into s[] */

    do_optional_space();

    if (prettyprint == YES)
    {
        do_open_brace();
        if (rflag == YES)
            return;
        while ((c = get_char()) != EOF)
        {
            if ((brace_level == 1) && (close_char == ')') &&
                (c == close_char))
            {                   /* end of @entry(...)
                                */
                brace_level = 0;
                c = '}';
            }
            if ((non_white_chars == 1) && (c == '@'))
                error("@ begins line, but brace level is not zero after \
entry ``@%e{%k,''");
            if ((brace_level == 0) && (c == '}'))
                out_close_brace();
            else
                out_c(c);
            if (brace_level == 0)
                break;
        }
    }
    else                        /* prettyprint == NO */
    {                   /* output entire braced group as one literal*/
        token_start = the_file; /* remember location of token start */
        c = get_char();
        if (c == '{')
            close_char = '}';
        else if (c == '(')
        {
            close_char = ')';
            brace_level++;      /* get_char() could not do this for us */
        }
        else                    /* raise error and try to resynchronize */
        {
            s[0] = c;
            s[1] = '\0';
            out_token(TOKEN_LITERAL,s);
            error(
            "Expected open brace or parenthesis. Last entry = ``@%e{%k,''");
            return;
        }
        s[0] = '{';             /* standardize to outer braces */
        for (k = 1; c != EOF;)
        {
            c = get_char();
            if (k >= MAX_TOKEN)
            {
                error("Braced literal string too long for entry ``%e''");
                s[k] = '\0';
                out_token(TOKEN_LITERAL, s);
                return;
            }
            s[k++] = c;
            if ((c == close_char) && (c == ')'))
                brace_level--;  /* get_char() could not do this for us */
            if (brace_level == 0)
                break;          /* here's the normal loop exit */
        }
        s[k-1] = '}';           /* standardize to outer braces */
        s[k] = '\0';            /* terminate string */
        out_token(TOKEN_LITERAL, s);
    }
}


#if NEW_STYLE
static void
do_initfile(const char *pathlist, const char *name)
#else /* K&R style */
static void
do_initfile(pathlist,name)
const char *pathlist;
const char *name;
#endif /* NEW_STYLE */
{
    FILE *fp;
    char *p;

    if ((initialization_file_name = findfile(pathlist,name)) == (char*)NULL)
        return;                 /* silently ignore missing files */

    if ((fp = tfopen(initialization_file_name,"r")) == (FILE*)NULL)
        return;                 /* silently ignore missing files */

    while ((p = get_line(fp)) != (char *)NULL)
    {                           /* process init file lines */
        SKIP_SPACE(p);
        if (isoptionprefix(*p))
            do_single_arg(p);   /* then expect -option [value] */
        else
            do_new_pattern(p);  /* else expect field = "value" */
    }
    (void)fclose(fp);
}


static void
do_key_name(VOID)               /* process BibTeX citation key */
{
    int
        c;
    size_t k;

    token_start = the_file;     /* remember location of token start */
    for (k = 0, c = get_char();
         (c != EOF) && (c != ',') && !Isspace(c);
         c = get_char(), k++)
    {
        if (k >= MAX_TOKEN)
        {
            APPEND_CHAR(current_key,k,c);
            out_with_error(current_key, "Citation key too long");
            return;
        }
        current_key[k] = (char)c;
    }
    current_key[k] = (char)'\0';
    if (c != EOF)
        put_back(c);
    if (check_values == YES)
        check_key();
    out_string(TOKEN_KEY, current_key);
    check_length(k);
}


#if (SCREEN_LINES > 0)
#if NEW_STYLE
static int
do_more(FILE *fpout,int line_number, int pause_after)
#else /* K&R style */
static int
do_more(fpout, line_number, pause_after)
FILE *fpout;
int line_number;
int pause_after;
#endif /* NEW_STYLE */
{
#if OS_PCDOS
#define MORE_HELP \
"More? f)orward b)ackward e)nd q)uit r)efresh t)op \030 \031 PgUp PgDn Home \
End\n\r"
#else /* NOT OS_PCDOS */
#define MORE_HELP \
"More? f)orward b)ackward d)own e)nd q)uit r)efresh t)op u)p\n\r"
#endif /* OS_PCDOS */

    (void)fputs(MORE_HELP,fpout);
    (void)fflush(fpout);        /* make screen up-to-date */
    for (;;)            /* loop until a valid input code is received */
    {
        switch (kbcode())
        {
        case KEYBOARD_PGUP:     /* backward screen */
            return (MAX(0,line_number + 1 - 2*pause_after));

        case KEYBOARD_DOWN:     /* go down 1 line (scroll up 1 line) */
            return (line_number + 2 - pause_after);

        case KEYBOARD_END:      /* end */
            return (LAST_SCREEN_LINE);

        case KEYBOARD_PGDN:     /* forward screen */
            return (line_number + 1);

        case KEYBOARD_HELP:
            (void)fputs(MORE_HELP,fpout);
            break;

        case KEYBOARD_EOF:
        case KEYBOARD_QUIT:
            return (EOF);

        case KEYBOARD_AGAIN:    /* refresh */
            return (MAX(0,line_number + 1 - pause_after));

        case KEYBOARD_HOME:     /* top */
            return (0);

        case KEYBOARD_UP:       /* go up 1 line (scroll down 1 line) */
            return (line_number + 0 - pause_after);

        case KEYBOARD_UNKNOWN:
        default:                /* anything else produces */
            fputc('\007',fpout); /* an error beep */
            break;
        }                       /* end switch (c...)
                                */
    }                           /* end for (;;) */
}
#endif /* (SCREEN_LINES > 0) */


#if NEW_STYLE
static void
do_new_pattern(char *s)
#else /* K&R style */
static void
do_new_pattern(s)
char *s;
#endif /* NEW_STYLE */
{
    char *field;
    char *p = s;
    YESorNO saw_space;
    char *value;

    /*******************************************************************
      We expect s[] to contain
        field = "value"
        field : "value"
        field   "value"
        field = "value" "message"
        field : "value" "message"
        field   "value" "message"

      Empty lines are silently ignored.
    *******************************************************************/

    field = get_token(p,&p,"=: \t\v\f");

    if (field == (char*)NULL)
        return;                 /* then we have an empty line */

    if (p != (char*)NULL)       /* then we have more text */
    {
        saw_space = Isspace(*p) ? YES : NO;
        SKIP_SPACE(p);
        if ((saw_space == YES) || isfieldvalueseparator(*p))
        {
            if (isfieldvalueseparator(*p))
                ++p;            /* then move past separator */
            SKIP_SPACE(p);
            if (*p == '"')      /* then have quoted value */
            {
                value = get_token(p,&p," \t\v\f");
                if (value != (char*)NULL)
                {
                    SKIP_SPACE(p);
                    if (*p == '"') /* then have quoted message */
                    {
                        add_pattern(field,value,get_token(p,&p," \t\v\f"));
                        return;
                    }
                    else if ((*p == '\0') || (*p == COMMENT_PREFIX))
                    {           /* have end of string s[] */
                        add_pattern(field,value,(char*)NULL);
                        return;
                    }
                }
            }
        }
    }
    (void)fprintf(stdlog,"%s Bad line [%s] in initialization file [%s]\n",
                  ERROR_PREFIX, s, initialization_file_name);
    exit(EXIT_FAILURE);
}


static void
do_newline(VOID)
{
    int c;

    /* Newlines are standardized by bibclean inside bibliographic entries, */
    /* so we only output a newline here if we are outside such an entry.
    */

    c = get_char();
    if (c == '\n')
    {
        if (brace_level == 0)
            out_newline();
    }
    else
        put_back(c);
}


#if NEW_STYLE
static void
do_one_file(FILE *fp)           /* process one input file on fp */
#else /* K&R style */
static void
do_one_file(fp)                 /* process one input file on fp */
FILE *fp;
#endif /* NEW_STYLE */
{
    fpin = fp;          /* save file pointer globally for get_char() */

    new_io_pair(&the_file);
    eofile = NO;
    new_entry();
    while (eofile == NO)
    {
        do_optional_space();
        do_other();
        if (Scribe == YES)
            do_Scribe_entry();
        else
            do_BibTeX_entry();
    }
    out_flush();                /* flush all buffered output */
    if (brace_level != 0)
        error("Non-zero brace level at end-of-file");
}


static void
do_open_brace(VOID)             /* process open brace or parenthesis */
{
    int c;

    c = get_char();

    if (c == EOF)
        return;
    else if (c == '{')
    {
        close_char = '}';
        out_open_brace();
    }
    else if (c == '(')
    {
        close_char = ')';
        brace_level++;          /* get_char() could not do this for us */
        out_open_brace();       /* standardize parenthesis to brace */
    }
    else                        /* raise error and try to resynchronize */
    {
        out_c(c);
        out_with_error("",
        "Expected open brace or parenthesis. Last entry = ``@%e{%k,''");
    }
}


static void
do_optional_inline_comment(VOID)
{
    size_t n;
    char *s;

    for (;;)
    {
        s = get_optional_space();
        switch ((int)s[0])
        {
        case BIBTEX_COMMENT_PREFIX:
            n = strlen(s);
            memmove(s+1,s,n);
            s[0] = BIBTEX_HIDDEN_DELIMITER;
            s[n+1] = BIBTEX_HIDDEN_DELIMITER;
            s[n+2] = '\0';
            append_value(s);
            break;

        case '\n':              /* newline or */
        case ' ':               /* horizontal space token */
        case '\f':
        case '\r':
        case '\t':
        case '\v':
            break;              /* discard white space */

        default:                /* no more space or inline comments */
            return;             /* here's the loop exit */
        }
    }
}


static void
do_optional_space(VOID)
{   /* skip over optional horizontal space, newline, and in-line comments */
    YESorNO save_wrapping;
    char *s;

    for (;;)
    {
        s = get_optional_space();
        switch (s[0])
        {
        case '\n':              /* newline token */
            space_count++;
            put_back((int)s[0]);
            do_newline();
            break;

        case ' ':               /* horizontal space token */
        case '\f':
        case '\r':
        case '\t':
        case '\v':
            space_count++;
            put_back((int)s[0]);
            do_space();
            break;

        case BIBTEX_COMMENT_PREFIX: /* in-line comment token */
            save_wrapping = wrapping;
            wrapping = NO;      /* inline comments are never line wrapped */
            out_string(TOKEN_INLINE,s);
            wrapping = save_wrapping;
            break;

        default:                /* not optional space */
            return;             /* here's the loop exit */
        }
    }
}


static void
do_other(VOID)                  /* copy non-BibTeX text verbatim */
{
    int c;                      /* current input character */
    size_t k;                   /* index into s[] */
    YESorNO save_wrapping;
    char *s = shared_string;    /* memory-saving device */

    save_wrapping = wrapping;
    wrapping = NO;

    /* For the purposes of lexical analysis (-no-prettyprint), we
       collect complete lines, rather than single characters.
    */

    for (k = 0, s[0] = (char)'\0'; (c = get_char()) != EOF; )
    {
        if ((c == '@') && (non_white_chars == 1))
        {                       /* new entry found */
            put_back(c);
            break;
        }
        if (k >= MAX_TOKEN)
        {               /* buffer full, empty it and start a new one */
            APPEND_CHAR(s,k,c);
            out_other(s);
            k = 0;
        }
        else if (c == '\n')     /* end of line */
        {
            s[k] = (char)'\0';
            out_other(s);       /* output line contents */
            out_newline();      /* and then separate newline token */
            k = 0;
        }
        else if (Isspace(s[0])) /* then collecting whitespace */
        {
            if (Isspace(c))     /* still collecting whitespace */
                s[k++] = (char)c;
            else                /* end of whitespace */
            {
                s[k] = (char)'\0';
                out_other(s);   /* output whitespace token */
                k = 0;
                s[k++] = (char)c; /* and start new one */
            }
        }
        else
            s[k++] = (char)c;
    }
    s[k] = (char)'\0';
    out_other(s);
    wrapping = save_wrapping;
}


static void
do_preamble(VOID)
{
    do_optional_space();

    do_open_brace();
    if (rflag == YES)
        return;

    do_optional_space();

    in_preamble = YES;
    do_BibTeX_value();
    in_preamble = NO;
    if (rflag == YES)
        return;

    do_optional_space();

    do_close_brace();
}


#if NEW_STYLE
static void
do_preargs(int argc, char *argv[])
#else /* K&R style */
static void
do_preargs(argc,argv)
int argc;
char *argv[];
#endif /* NEW_STYLE */
{
    int k;
    static OPTION_FUNCTION_ENTRY options[] =
    {
        {"no-print-patterns",           5,      opt_print_patterns},
        {"no-read-init-files",          6,      opt_read_init_files},
        {"no-trace-file-opening",       4,      opt_trace_file_opening},
        {"print-patterns",              2,      opt_print_patterns},
        {"read-init-files",             3,      opt_read_init_files},
        {"trace-file-opening",          1,      opt_trace_file_opening},
        {(const char*)NULL,             0,      (void (*)(VOID))NULL},
    };

    for (k = 1; k < argc; ++k)
    {
        /* Do argument scan for options that must be known BEFORE
           initializations are attempted.
        */

        if ( (argv[k][1] != '\0') && isoptionprefix(argv[k][0]) )
        {                       /* then process command-line switch */
            current_index = k;
            current_option = argv[k];
            next_option = argv[k+1];
            (void)apply_function(current_option+1,options);
        }
    }
}


static void
do_Scribe_block_comment(VOID)
{
    int b_level = 0;            /* brace level */
    int c;
    int k;
    char *p;
    char s[3+1];                /* to hold "end" */

    p = get_Scribe_string();    /* expect to get "comment" */

    if (stricmp(p,"\"comment\"") == 0)
    {                           /* found start of @Begin{comment} */
        for (k = 6; k > 0; --k)
            out_c(DELETE_CHAR); /* delete "@Begin" from output */
                                /* that was output by do_entry_name() */
        out_s("@Comment{");     /* convert to BibTeX `comment' */
        while ((c = get_char()) != EOF)
        {
            switch (c)
            {
            case '@':           /* lookahead for "@End" */
                s[0] = (char)get_char();
                s[1] = (char)get_char();
                s[2] = (char)get_char();
                s[3] = (char)'\0';
                if (stricmp(s,"end") == 0)
                {               /* then we have @End */
                    p = get_Scribe_string(); /* so get what follows */
                    if (stricmp(p,"\"Comment\"") == 0)
                    {
                        out_close_brace();/* found @End{comment}, so finish
                                             conversion to @Comment{...} */
                        return; /* block comment conversion done!
                                */
                    }
                    else        /* false alarm, just stuff lookahead */
                    {           /* back into input stream */
                        put_back_string(p);
                        put_back_string(s);
                    }
                }
                else            /* lookahead was NOT "@End" */
                    put_back_string(s);
                break;

            case '{':
                b_level++;
                break;

            case '}':
                if (b_level <= 0)
                    out_open_brace(); /* keep output braces balanced */
                else
                    b_level--;
                break;
            }                   /* end switch(c) */
            out_c(c);           /* copy one comment character */
        }                       /* end while ((c = ...)) */
    }
    else                        /* was not @Begin{comment} after all */
        put_back_string(p);
}


static void
do_Scribe_close_delimiter(VOID)
{
    int c;
    static char fmt[] = "Expected Scribe close delimiter `%c' [8#%03o], but \
found `%c' [8#%03o] instead for field ``%%f''";
    char msg[sizeof(fmt)];

    c = get_char();

    if ((parbreaks == NO) && (is_parbreak == YES))
    {
        APPEND_CHAR(msg,0,c);
        out_with_parbreak_error(msg);
        return;
    }

    if (c == EOF)
        return;
    else if (c == close_char)
    {
        if (c == ')')
            brace_level--;      /* get_char() could not do this for us */
        out_close_brace();      /* standardize parenthesis to brace */
    }
    else                        /* raise error and try to resynchronize */
    {
        out_c(c);
        (void)sprintf(msg, fmt, close_char, BYTE_VAL(close_char),
                      (int)(Isprint(c) ? c : '?'), BYTE_VAL(c));
        out_with_error("", msg);
    }
}


static void
do_Scribe_comment(VOID)
{
    int c;
    int b_level = 0;            /* brace level */

    /* BibTeX does not yet have a comment syntax, so we just output the
       Scribe comment in braces, ensuring that internal braces are
       balanced.
*/ do_optional_space(); do_Scribe_open_delimiter(); /* this outputs an opening brace */ if (rflag == YES) return; for (c = get_char(); (c != EOF) && (c != close_char); c = get_char()) { if (c == '{') b_level++; else if (c == '}') { b_level--; if (b_level < 0) { out_open_brace(); /* force matching internal braces */ b_level++; } } out_c(c); } for (; b_level > 0; b_level--) out_close_brace(); /* force matching internal braces */ out_close_brace(); } static void do_Scribe_entry(VOID) { /************************************************************* Parse a Scribe entry, one of: @entry-name{key,field=value,field=value,...,} @string{name=value} @comment{...} @begin{comment}...@end{comment} The = separator in field/value pairs may also be a space or a slash. Any of the seven Scribe delimiters can be used to surround the value(s) following @name, and to surround values of field value pairs. *************************************************************/ int save_close_char; new_entry(); do_at(); if ((rflag == YES) || (eofile == YES)) return; do_optional_space(); do_entry_name(); if (rflag == YES) return; if (STREQUAL(current_entry_name,"Comment")) do_Scribe_comment(); else if (STREQUAL(current_entry_name,"Begin")) do_Scribe_block_comment(); else if (STREQUAL(current_entry_name,"String")) do_string(); else { do_optional_space(); do_Scribe_open_delimiter(); if (rflag == YES) return; save_close_char = close_char; brace_level = 1; /* get_char() cannot do this for us */ do_optional_space(); do_key_name(); if (rflag == YES) return; do_optional_space(); do_comma(); if (rflag == YES) return; do_optional_space(); while (do_field_value_pair() == YES) { do_optional_space(); do_comma(); /* this supplies any missing optional comma */ if ((rflag == YES) || (eofile == YES)) return; do_optional_space(); } if (rflag == YES) return; do_optional_space(); close_char = save_close_char; do_Scribe_close_delimiter(); } flush_inter_entry_space(); } static void do_Scribe_open_delimiter(VOID) /* 
process open delimiter */ { int c; char *p; c = get_char(); if (c == EOF) return; else { p = strchr(Scribe_open_delims,c); if (p == (char*)NULL) { out_c(c); out_with_error("", "Expected Scribe open delimiter, one of { [ ( < ' \" ` for field ``%f''"); return; } close_char = Scribe_close_delims[(int)(p - Scribe_open_delims)]; out_open_brace(); /* standardize open delimiter to brace */ } } static void do_Scribe_separator(VOID) { int c; YESorNO saw_space = NO; the_value = the_file; saw_space = (space_count > 0L) ? YES : NO; c = get_char(); if ((parbreaks == NO) && (is_parbreak == YES)) { char msg[2]; APPEND_CHAR(msg,0,c); out_with_parbreak_error(msg); return; } if (c == EOF) NOOP; else if ((c == '=') || (c == '/')) out_equals(); else if (saw_space == YES) /* have field value with no binary operator */ { out_equals(); /* supply the missing = operator */ put_back(c); /* this is first character of value string */ } else /* looks like run-together fieldvalue */ { out_c(c); out_with_error("", "Expected Scribe separator \"=\", \"/\", or \" \" for field ``%f''"); } out_spaces((int)(VALUE_INDENTATION - the_file.output.column_position)); /* supply leading indentation */ } /*********************************************************************** Scribe field values can take several forms, as illustrated by this simple BNF grammar: Scribe-value-string: * | * ***********************************************************************/ static void do_Scribe_value(VOID) /* process Scribe value string */ { the_value = the_file; (void)strcpy(current_value,get_Scribe_string()); if ((rflag == YES) || (eofile == YES)) out_s(current_value); else out_value(); } #if NEW_STYLE static void do_single_arg(char *s) #else /* K&R style */ static void do_single_arg(s) char *s; #endif /* NEW_STYLE */ { /* expect -option or -option value */ char *temp_argv[4]; /* "program" "-option" "value" NULL */ int temp_argc; /* temporary argument count */ temp_argv[0] = program_name; /* 0th argument always program 
name */ temp_argv[1] = get_token(s,&s," \t\v\f"); /* option */ temp_argv[2] = get_token(s,&s," \t\v\f"); /* value */ temp_argv[3] = (char *)NULL; temp_argc = (temp_argv[2] == (char*)NULL) ? 2 : 3; do_args(temp_argc,temp_argv); } static void do_space(VOID) { int c; char *s = shared_string; /* memory-saving device */ size_t k; /* index into s[] */ token_start = the_file; /* remember location of token start */ c = get_char(); s[0] = '\0'; for (k = 0; (c != EOF) && Isspace(c) && (c != '\n'); ) { if (k >= MAX_TOKEN) { /* split long comments into multiple ones */ s[k] = '\0'; if (prettyprint == NO) out_token(TOKEN_SPACE,s); /* else discard: spaces are standardized during prettyprinting */ k = 0; } s[k++] = c; c = get_char(); } s[k] = '\0'; /* terminate token string */ if (prettyprint == NO) out_token(TOKEN_SPACE,s); /* else discard: spaces are standardized during prettyprinting */ put_back(c); /* restore lookahead */ } static void do_string(VOID) /* process @String{abbrev = "value"} */ { do_optional_space(); in_string = YES; do { /* one trip loop */ do_open_brace(); if (rflag == YES) break; do_optional_space(); if (do_field_value_pair() == NO) break; if (rflag == YES) break; do_optional_space(); do_close_brace(); if (rflag == YES) break; } while (0); in_string = NO; } #if NEW_STYLE static void enlarge_table(PATTERN_TABLE *table) #else /* K&R style */ static void enlarge_table(table) PATTERN_TABLE *table; #endif /* NEW_STYLE */ { if (table->maximum_size == 0) table->patterns = (MATCH_PATTERN*)malloc(sizeof(MATCH_PATTERN) * TABLE_CHUNKS); else table->patterns = (MATCH_PATTERN*)realloc((char*)table->patterns, sizeof(MATCH_PATTERN) * (table->maximum_size + TABLE_CHUNKS)); /* NB: Sun C++ requires (char*) cast */ if (table->patterns == (MATCH_PATTERN*)NULL) fatal("Out of memory for pattern table space"); table->maximum_size += TABLE_CHUNKS; } #if NEW_STYLE static void /* issue an error message */ error(const char *msg) /* default provided if this is NULL */ #else /* K&R style 
*/ static void error(msg) /* issue an error message */ const char *msg; /* default provided if this is NULL */ #endif /* NEW_STYLE */ { char *p; error_count++; out_flush(); /* flush all buffered output */ at_level = 0; /* suppress further messages */ /* until we have resynchronized */ p = format(msg); (void)fprintf(stdlog,"%s \"%s\", line %ld: %s.\n", ERROR_PREFIX, the_file.input.filename, the_value.input.line_number, p); /* UNIX-style error message for */ /* GNU Emacs M-x compile to parse */ out_status(stdlog, ERROR_PREFIX); (void)fflush(stdlog); out_error(stdout, "\n"); /* make sure we start a newline */ out_error(stdout, ERROR_PREFIX); out_error(stdout, " "); out_error(stdout, p); out_error(stdout, ".\n"); out_status(stdout, ERROR_PREFIX); out_flush(); /* flush all buffered output */ } #if NEW_STYLE static void /* issue an error message and die */ fatal(const char *msg) #else /* K&R style */ static void fatal(msg) /* issue an error message and die */ const char *msg; #endif /* NEW_STYLE */ { (void)fprintf(stdlog,"%s %s\n", ERROR_PREFIX, msg); exit(EXIT_FAILURE); } #if NEW_STYLE static char * /* normalize author names and return */ fix_author(char *author) /* new string from static space */ #else /* K&R style */ static char * fix_author(author) /* normalize author names and return */ char *author; /* new string from static space */ #endif /* NEW_STYLE */ { size_t a; /* index into author[] */ int b_level; /* brace level */ char *p; /* pointer into author[] */ char *pcomma; /* pointer to last unbraced comma in author[] */ static char s[MAX_TOKEN_SIZE]; /* returned to caller */ /* Convert "Smith, J.K." to "J. K. Smith" provided "," and "." are */ /* at brace level 0 */ if (fix_names == NO) return (author); /* Leave untouched entries like: */ /* author = "P. D. Q. Bach (113 MozartStrasse, Vienna, Austria)" */ if (strchr(author,'(') != (char*)NULL) return (author); /******************************************************************* We now have a tricky job. 
Some names have additional parts, which BibTeX calls "von" and "Jr.". It permits them to be input as (see L. Lamport, ``LaTeX User's Guide and Reference Manual'', pp. 141--142) Brinch Hansen, Per OR Per {Brinch Hansen} Ford, Jr., Henry OR Henry {Ford, Jr.} {Steele Jr.}, Guy L. OR Guy L. {Steele Jr.} von Beethoven, Ludwig OR Ludwig von Beethoven {von Beethoven}, Ludwig OR Ludwig {von Beethoven} The last two lines are NOT equivalent; the first will be alphabetized under Beethoven, and the second under von. Other examples include names like Charles XII, King OR King Charles XII Ford, Sr., Henry OR Henry {Ford, Sr.} Vallee Poussin, C. L. X. J. de la OR C. L. X. J. de la Vallee Poussin van der Waerden, Bartel Leendert OR Bartel Leendert van der Waerden These transformations conform to the general patterns B, A --> A B B C, A --> A B C (von case) B C, A --> A {B C} (Brinch Hansen case) B, C, A --> A {B, C} (Jr. case) A, B, and C represent one or more space-separated words, or brace-delimited strings with arbitrary contents. Notice the conflict: the von case differs from Brinch Hansen in that braces may NOT be inserted when the name is reordered, because this changes the alphabetization. In order to deal with this ambiguity, we supply braces in the "B C, A" case ONLY when the C part matches something like Jr (see the juniors[] table above), or when it looks like a small number in Roman numerals. The latter case is uncommon, and we therefore don't bother to attempt to parse it to determine whether it is a valid number. The "B, C, A" case (multiple level-zero commas) is unambiguous, and can be converted to the form "A {B, C}". 
*******************************************************************/ for (a = 0, b_level = 0, pcomma = (char*)NULL; author[a]; ++a) { /* convert "Smith, John" to "John Smith" */ switch (author[a]) { case '{': b_level++; break; case '}': b_level--; break; case ',': if (b_level == 0) pcomma = &author[a]; /* remember last unbraced comma */ break; default: break; } } if (pcomma == (char*)NULL) /* no commas, so nothing more to do */ return (author); *pcomma = '\0'; /* terminate "Smith" */ /* have "Smith, J.K." or "Smith, Jr., J.K." */ p = pcomma + 1; SKIP_SPACE(p); (void)strcpy(s,p); /* s <- "J.K." */ (void)strcat(s," "); /* s <- "J.K. " */ if (check_junior(author) == YES) { (void)strcat(s,"{"); (void)strcat(s,author); /* s <- "J.K. {Smith, Jr.}" */ (void)strcat(s,"}"); } else (void)strcat(s,author); /* s <- "J.K. Smith" */ return (strcpy(author,s)); } static void fix_month(VOID) /* convert full month names to macros*/ { /* for better style-file customization */ size_t k; /* index into month_pair[] and s[] */ size_t token_length; /* token length */ const char *p; /* pointer to current_value[] */ char *s = shared_string; /* memory-saving device */ const char *token; /* pointer into current_value[] */ for (p = current_value; (token = month_token(p,&token_length)) != (const char*)NULL; p = (const char*)NULL) { if (token_length == 1) /* just copy single-char tokens */ *s++ = *token; else { for (k = 0; month_pair[k].old_name != (const char*)NULL; ++k) { if ((strlen(month_pair[k].old_name) == token_length) && (strnicmp(month_pair[k].old_name,token,token_length) == 0)) { /* change "January" to jan etc. */ (void)strcpy(s,"\" # "); (void)strcat(s,month_pair[k].new_name); (void)strcat(s," # \""); s = strchr(s,'\0'); token_length = 0; /* so we don't copy twice at loop end */ break; } } /* end for (k = 0, ...) 
*/
            (void)strncpy(s,token,token_length); /* no definition, just copy */
            s += token_length;
        }
    }
    *s = '\0';                          /* supply string terminator */
    k = (size_t)(s - shared_string);
    s = shared_string;
    if (STREQUAL(&s[k-5]," # \"\""))
        s[k-5] = '\0';                  /* discard final empty string */
    if (strncmp(s,"\"\" # ",5) == 0)
        (void)strcpy(current_value,&s[5]); /* discard initial empty string */
    else
        (void)strcpy(current_value,s);
}

static void
fix_namelist(VOID)                      /* normalize list of personal names */
{                                       /* leaving it in global current_value[] */
    size_t m;                   /* index of start of author in current_value[]*/
    size_t n;                   /* length of current_value[], less 1 */
    char namelist[MAX_TOKEN_SIZE];      /* working copy of current_value[] */
    size_t v;                   /* loop index into current_value[] */

    /* Convert "Smith, J.K. and Brown, P.M." to */
    /*         "J. K. Smith and P. M. Brown" */
    /* We loop over names separated by " and ", and hand each off */
    /* to fix_author() */

    n = strlen(current_value) - 1;      /* namelist = "\"...\"" */

    if ((current_value[0] != '"') || (current_value[n] != '"')) /* sanity check */
        return;                 /* not quoted string, may be macro */

    (void)strcpy(namelist,"\"");/* supply initial quotation mark */
    current_value[n] = (char)'\0';      /* clobber final quotation mark */
    for (v = 1, m = 1; v < n; ++v)      /* start past initial quotation mark */
    {
        if (strncmp(" and ",&current_value[v],5) == 0)
        {
            current_value[v] = (char)'\0';
            (void)strcat(namelist,fix_periods(fix_author(&current_value[m])));
            (void)strcat(namelist," and ");
            current_value[v] = (char)' ';
            v += 4;
            m = v + 1;
        }
        else if ((Scribe == YES) && (current_value[v] == ';'))
        {                               /* expand semicolons to " and " */
            current_value[v] = (char)'\0';
            (void)strcat(namelist,fix_periods(fix_author(&current_value[m])));
            (void)strcat(namelist," and ");
            current_value[v] = (char)' ';
            m = v + 1;
        }
    }
    (void)strcat(namelist,fix_periods(fix_author(&current_value[m])));
                                        /* handle last author */
    (void)strcat(namelist,"\"");        /* supply final quotation mark */
    (void)strcpy(current_value,namelist);
}

static void
fix_pages(VOID) { size_t k; /* index into current_value[] */ size_t m; /* index into new_value[] */ char new_value[MAX_TOKEN_SIZE]; /* working copy of new_value[] */ for (m = 0, k = 0; current_value[k]; ++k) { /* squeeze out spaces around hyphens */ /* and convert hyphen runs to en-dashes */ if (current_value[k] == '-') { /* convert hyphens to en-dash */ for ( ; (m > 0) && Isspace(new_value[m-1]) ; ) --m; /* discard preceding spaces */ for ( ; current_value[k+1] == '-'; ) ++k; for ( ; Isspace(current_value[k+1]); ) ++k; /* discard following spaces */ new_value[m++] = (char)'-'; /* save an en-dash */ new_value[m++] = (char)'-'; } else new_value[m++] = current_value[k]; } new_value[m] = (char)'\0'; (void)strcpy(current_value,new_value); } #if NEW_STYLE static char * fix_periods(char *author) #else /* K&R style */ static char * fix_periods(author) char *author; #endif /* NEW_STYLE */ { int b_level; /* brace level */ size_t a; /* index in author[] */ size_t n; /* index in name[] */ char *name = shared_string; /* memory-saving device */ if (fix_initials == NO) return author; /* Convert "J.K. Smith" to "J. K. Smith" if "." 
at brace level 0 */ for (b_level = 0, a = 0, n = 0; /* NO-OP (exit below) */ ; ++a, ++n) { name[n] = author[a]; /* copy character */ if (author[a] == '\0') break; /* here's the loop exit */ switch (author[a]) { case '{': b_level++; break; case '}': b_level--; break; case '.': if (b_level == 0) { if ((a > 0) && Isupper(author[a-1]) && Isupper(author[a+1])) name[++n] = (char)' '; /* supply space between initials */ } break; } } return (name); } static void fix_title(VOID) /* protect upper-case acronyms */ { YESorNO brace_letter; int b_level; /* brace level */ size_t k; /* index into s[] */ char *s = shared_string; /* memory-saving device */ size_t t; /* index into title[] */ if (current_value[0] != '\"') return; /* leave macros alone */ for (k = 0, b_level = 0, t = 0; current_value[t]; ) { switch (current_value[t]) { case '{': b_level++; s[k++] = current_value[t++]; break; case '}': b_level--; s[k++] = current_value[t++]; break; default: if (b_level > 0) brace_letter = NO; /* already braced, so no changes */ else if (Isupper(current_value[t])) /* maybe brace + */ { /* or */ if (Isupper(current_value[t+1]) || Isdigit(current_value[t+1])) brace_letter = YES; /* XY -> {XY}, X11 -> {X11} */ else if (!Isalpha(current_value[t+1])) { if ((t == 1) && (current_value[t] == 'A')) brace_letter = NO; /* "A gnat" -> "A gnat" */ else brace_letter = YES; /* "The C book" -> "The {C} Book" */ } else brace_letter = NO; /* everything else unchanged */ } else brace_letter = NO; if (brace_letter) { /* Convert XWS to {XWS} and X11 to {X11} */ s[k++] = (char)'{'; while (Isupper(current_value[t]) || Isdigit(current_value[t])) s[k++] = current_value[t++]; s[k++] = (char)'}'; } else s[k++] = current_value[t++]; break; } } s[k] = (char)'\0'; check_length(k); (void)strcpy(current_value,s); if (fix_font_changes == YES) brace_font_changes(); } static void flush_inter_entry_space(VOID) /* standardize to 1 blank line between entries */ { int c; put_back((c = get_next_non_blank())); if (c != EOF) 
        out_newline();
    out_newline();
}

#if NEW_STYLE
static char *
format(const char *msg)
#else /* K&R style */
static char *
format(msg)
const char *msg;
#endif /* NEW_STYLE */
{   /* expand %e, %f, %k, %v, and %% items in msg[], return pointer to new copy */
    size_t k;
    size_t len;
    size_t n;
    static char newmsg[MAX_TOKEN_SIZE]; /* static because we return it */

/* Shorthand for writable copy of msg[] with guaranteed NUL termination */
#define ORIGINAL_MESSAGE (strncpy(newmsg,msg,MAX_TOKEN_SIZE), \
                newmsg[MAX_TOKEN_SIZE-1] = (char)'\0', newmsg)

    for (k = 0, n = 0; msg[k]; ++k)
    {
        switch (msg[k])
        {
        case '%':                       /* expect valid format item */
            switch (msg[++k])
            {
            case 'e':                   /* %e -> current_entry_name */
                len = strlen(current_entry_name);
                if ((n + len) >= MAX_TOKEN)
                    return (ORIGINAL_MESSAGE); /* no space left*/
                (void)strcpy(&newmsg[n],current_entry_name);
                n += len;
                break;

            case 'f':                   /* %f -> current_field */
                len = strlen(current_field);
                if ((n + len) >= MAX_TOKEN)
                    return (ORIGINAL_MESSAGE); /* no space left*/
                (void)strcpy(&newmsg[n],current_field);
                n += len;
                break;

            case 'k':                   /* %k -> current_key */
                len = strlen(current_key);
                if ((n + len) >= MAX_TOKEN)
                    return (ORIGINAL_MESSAGE); /* no space left*/
                (void)strcpy(&newmsg[n],current_key);
                n += len;
                break;

            case 'v':                   /* %v -> current_value */
                len = strlen(current_value);
                if ((n + len) >= MAX_TOKEN)
                    return (ORIGINAL_MESSAGE); /* no space left*/
                (void)strcpy(&newmsg[n],current_value);
                n += len;
                break;

            case '%':                   /* %% -> % */
                newmsg[n++] = (char)'%';
                break;

            default:
                return (ORIGINAL_MESSAGE); /* unrecognized format item */
            }
            break;

        default:
            if (n >= MAX_TOKEN)
                return (ORIGINAL_MESSAGE); /* no space left*/
            newmsg[n++] = msg[k];
            break;
        }
    }
    newmsg[n] = (char)'\0';             /* terminate string */
    return (newmsg);
}

static char *
get_braced_string(VOID)
{
    int b_level = 0;                    /* brace level */
    int c;                              /* current input character */
    size_t k;                           /* index into s[] */
    size_t n;                           /* index into t[] */
    char *s = shared_string;            /* memory-saving device */
    char t[MAX_TOKEN_SIZE];             /* working
area for braced string */ for (c = get_char(), k = 0; c != EOF; ) { if ((parbreaks == NO) && (is_parbreak == YES)) { APPEND_CHAR(s,k,c); out_with_parbreak_error(s); return (EMPTY_STRING(s)); } else if (k >= MAX_TOKEN) { APPEND_CHAR(s,k,c); out_with_error(s, "BibTeX string too long for field ``%f''"); return (EMPTY_STRING(s)); } else { if (Isspace(c)) c = ' '; /* change whitespace to real space */ else if (c == '{') b_level++; else if (c == '}') b_level--; s[k++] = (char)c; if (b_level == 0) break; /* here's the loop exit */ c = Isspace(c) ? get_next_non_blank() : get_char(); } } s[k] = (char)'\0'; /* Now convert braced string to quoted string */ for (b_level = 0, k = 0, n = 0; s[k]; ++k) { if (s[k] == '{') b_level++; else if (s[k] == '}') b_level--; if ((s[k] == '"') && (b_level == 1)) /* k > 0 if this is true */ { /* so we can omit that check */ if (s[k-1] == '\\') /* change \"xy to {\"x}y */ n--, t[n++] = (char)'{', t[n++] = (char)'\\', t[n++] = (char)'"', t[n++] = s[++k], t[n++] = (char)'}'; else /* change x" to x{"} */ t[n++] = (char)'{', t[n++] = (char)'"', t[n++] = (char)'}'; } else t[n++] = s[k]; } t[0] = (char)'"'; /* change initial and final */ APPEND_CHAR(t,n-1,'"'); /* braces to quotes */ check_length(n); if (c == EOF) error("End-of-file in braced string"); return (strcpy(s,t)); } static int get_char(VOID) /* all input is read through this function */ { int c; /* NB: this is the ONLY place where the input file is read! */ c = (n_pushback > 0) ? 
pushback_buffer[--n_pushback] : getc(fpin); the_file.input.byte_position++; /* Adjust global status and position values */ if (c == EOF) eofile = YES; else if (c == '\n') { the_file.input.line_number++; the_file.input.last_column_position = the_file.input.column_position; the_file.input.column_position = 0L; non_white_chars = 0; } else if (!Isspace(c)) { the_file.input.last_column_position = the_file.input.column_position; the_file.input.column_position++; non_white_chars++; } else if (c == '\t') { the_file.input.last_column_position = the_file.input.column_position; the_file.input.column_position = (the_file.input.column_position + 8L) & ~07L; } else { the_file.input.last_column_position = the_file.input.column_position; the_file.input.column_position++; } if (c == '{') brace_level++; else if (c == '}') brace_level--; #if defined(DEBUG) if (fpdebug) (void)fprintf(fpdebug,"[%c] %5ld %4ld %2ld\n", c, the_file.input.byte_position, the_file.input.line_number, the_file.input.column_position); #endif /* defined(DEBUG) */ return (c); } static char * get_digit_string(VOID) { int c; /* current input character */ size_t k; /* index into s[] */ char *s = shared_string; /* memory-saving device */ k = 0; s[k++] = (char)'"'; /* convert to quoted string */ for (c = get_char(); (c != EOF) && Isdigit(c); ) { if (k >= MAX_TOKEN) { APPEND_CHAR(s,k,c); out_with_error(s, "BibTeX string too long for field ``%f''"); return (EMPTY_STRING(s)); } else { s[k++] = (char)c; c = get_char(); } } put_back(c); /* we read past end of digit string */ s[k++] = (char)'"'; /* supply terminating quote */ s[k] = (char)'\0'; check_length(k); return (s); } static char * get_identifier_string(VOID) { int c; /* current input character */ size_t k; /* index into s[] */ char *s = shared_string; /* memory-saving device */ for (c = get_char(), k = 0; (c != EOF) && isidchar(c); ) { if (k >= MAX_TOKEN) { APPEND_CHAR(s,k,c); out_with_error(s, "BibTeX string too long for field ``%f''"); return (EMPTY_STRING(s)); } 
else { s[k++] = (char)c; c = get_char(); } } put_back(c); /* we read past end of identifier string */ s[k] = (char)'\0'; check_length(k); return (s); } static char * get_inline_comment(VOID) { int c; /* current input character */ size_t k; /* index into s[] */ int newlines; char *s = shared_string; /* memory-saving device */ s[0] = '\0'; c = get_char(); if ((Scribe == NO) && (c == BIBTEX_COMMENT_PREFIX)) { token_start = the_file; /* remember location of token start */ for (s[0] = BIBTEX_COMMENT_PREFIX, c = get_char(), k = 1, newlines = 0; (c != EOF); ) { /* collect up to newline, plus following horizontal space */ if ((newlines == 1) && !Isspace(c)) break; /* here's a loop exit */ if (k >= MAX_TOKEN) { /* split long comments into multiple ones */ s[k++] = '\n'; put_back(c); /* restore lookahead */ c = BIBTEX_COMMENT_PREFIX; /* we put this back too later */ break; /* here's a loop exit */ } if (c == '\n') newlines++; if (newlines > 1) break; /* here's a loop exit */ s[k++] = c; c = get_char(); } s[k] = '\0'; /* terminate token string */ } put_back(c); /* restore lookahead */ return (s); } #if NEW_STYLE static char * get_line(FILE *fp) #else /* K&R style */ static char * get_line(fp) FILE *fp; #endif /* NEW_STYLE */ { /* return a complete line to the caller, discarding backslash-newlines */ /* on consecutive lines, and discarding the final newline. At EOF, */ /* return (char*)NULL instead. */ static char line[MAX_LINE]; static char *p; static char *more; more = &line[0]; line[0] = (char)'\0'; /* must set in case we hit EOF */ while (fgets(more,(int)(&line[MAX_LINE] - more),fp) != (char *)NULL) { p = strchr(more,'\n'); if (p != (char*)NULL) /* did we get a complete line? 
*/
        {                               /* yes */
            *p = '\0';                  /* zap final newline */
            if ((p > &line[0]) && (*(p-1) == '\\'))
                                        /* then have backslash-newline; the */
                                        /* p > line test avoids reading */
                                        /* before line[] on an empty line */
                more = p - 1;           /* so get another line */
            else                        /* otherwise have normal newline */
                break;                  /* so return the current line */
        }
        else                            /* no, return partial line */
            break;
    }
    return ((line[0] == '\0' && feof(fp)) ? (char*)NULL : &line[0]);
}

static int
get_next_non_blank(VOID)
{
    int c;
    int ff = 0;
    int nl = 0;

    while (((c = get_char()) != EOF) && Isspace(c))
    {
        switch (c)
        {
        case '\n':
            nl++;
            break;
        case '\f':
            ff++;
            break;
        }
    }
    is_parbreak = ((nl > 1) || (ff > 0)) ? YES : NO;
    return (c);
}

static char *
get_optional_space(VOID)
{
    int c;                              /* current input character */
    char *s = shared_string;            /* memory-saving device */

    /* Space tokens are returned as single-character values, because */
    /* do_optional_space() pushes them back into the input stream before */
    /* calling do_newline() and do_space() for further processing.  However, */
    /* inline comments are returned as multiple-character values */

    c = get_char();
    switch (c)
    {
    case '\n':                          /* newline token */
    case ' ':                           /* horizontal space token */
    case '\f':
    case '\r':
    case '\t':
    case '\v':
        s[0] = c;
        s[1] = '\0';
        break;

    case BIBTEX_COMMENT_PREFIX:         /* in-line comment token */
        put_back(c);
        s = get_inline_comment();
        break;

    default:                            /* not optional space */
        put_back(c);
        s[0] = '\0';
        break;
    }
    return (s);
}

static char *
get_quoted_string(VOID)
{
    int b_level = 0;                    /* brace level */
    int c;                              /* current input character */
    size_t k;                           /* index into s[] */
    char *s = shared_string;            /* memory-saving device */

    for (c = get_char(), k = 0; c != EOF; )
    {
        if ((parbreaks == NO) && (is_parbreak == YES))
        {
            APPEND_CHAR(s,k,c);
            out_with_parbreak_error(s);
            return (EMPTY_STRING(s));
        }
        else if (k >= MAX_TOKEN)
        {
            APPEND_CHAR(s,k,c);
            out_with_error(s, "BibTeX string too long for field ``%f''");
            return (EMPTY_STRING(s));
        }
        else
        {
            if (Isspace(c))
                c = ' ';                /* change whitespace to real space */
            else if (c == '{')
                b_level++;
            else if (c == '}')
                b_level--;
            s[k++] = (char)c;
if ((c == '"') && (k > 1) && (b_level == 0)) { if (s[k-2] == '\\') { /* convert \"x inside string at brace-level 0 to {\"x}: */ /* illegal, but hand-entered bibliographies have it */ c = get_char(); if (c != EOF) { k = k - 2; s[k++] = (char)'{'; s[k++] = (char)'\\'; s[k++] = (char)'"'; s[k++] = (char)c; s[k++] = (char)'}'; } } else break; /* here's the loop exit */ } c = Isspace(c) ? get_next_non_blank() : get_char(); } } s[k] = (char)'\0'; check_length(k); if (c == EOF) error("End-of-file in quoted string"); return (s); } static char * get_Scribe_delimited_string(VOID) { int c; int close_delim; size_t k; int last_c = EOF; char *p; char *s = shared_string; /* memory-saving device */ c = get_char(); p = strchr(Scribe_open_delims,c); /* maybe delimited string? */ if (p == (char*)NULL) { APPEND_CHAR(s,0,c); out_with_error(s,"Expected Scribe value string for field ``%f''"); return (EMPTY_STRING(s)); } /* We have a delimited string */ close_delim = Scribe_close_delims[(int)(p - Scribe_open_delims)]; c = get_next_non_blank(); /* get first character in string */ /* ignoring leading space */ for (k = 0, s[k++] = (char)'"'; (c != EOF) && !((last_c != '\\') && (c == close_delim)) && (k < MAX_TOKEN); k++) { if ((parbreaks == NO) && (is_parbreak == YES)) { APPEND_CHAR(s,k,c); out_with_parbreak_error(s); return (EMPTY_STRING(s)); } if (c == '"') /* protect quotes inside string */ { if (s[k-1] == '\\') { /* then TeX accent in Scribe string */ last_c = c; c = get_char(); if (c == '{') /* change \"{ to {\" */ { s[k-1] = (char)'{'; s[k] = (char)'\\'; s[++k] = (char)'"'; } else /* change \". to {\".} (. = any) */ { s[k-1] = (char)'{'; s[k] = (char)'\\'; s[++k] = (char)'"'; s[++k] = (char)c; s[++k] = (char)'}'; } } else { s[k] = (char)'{'; s[++k] = (char)'"'; s[++k] = (char)'}'; } } else if (Isspace(c)) s[k] = (char)' '; /* change whitespace to real space */ else s[k] = (char)c; last_c = c; c = Isspace(c) ? 
get_next_non_blank() : get_char(); } APPEND_CHAR(s,k,'"'); /* append close delimiter */ if (k >= MAX_TOKEN) { out_with_error(s, "Scribe string too long for field ``%f''"); return (EMPTY_STRING(s)); } check_length(k); return (s); } static char * get_Scribe_identifier_string(VOID) /* read undelimited identifier */ { /* and return quoted equivalent */ int c; size_t k; char *s = shared_string; /* memory-saving device */ c = get_char(); for (k = 0, s[k++] = (char)'"'; isidchar(c) && (k < MAX_TOKEN); k++, c = get_char()) { s[k] = (char)c; } put_back(c); /* put back lookahead */ APPEND_CHAR(s,k,'"'); if (k >= MAX_TOKEN) { out_with_error(s, "Scribe number string too long for field ``%f''"); return (EMPTY_STRING(s)); } ++k; check_length(k); return (s); } static char * get_Scribe_string(VOID) /* read Scribe string */ { int c; do_optional_space(); c = get_char(); /* peek ahead one character */ put_back(c); return (isidchar(c) ? get_Scribe_identifier_string() : get_Scribe_delimited_string()); } static char * get_simple_string(VOID) /* read simple BibTeX string */ { int c; /* current input character */ char *s = shared_string; /* memory-saving device */ c = get_next_non_blank(); /* peek ahead to next non-blank */ if (c == EOF) return (EMPTY_STRING(s)); else if ((parbreaks == NO) && (is_parbreak == YES)) { APPEND_CHAR(s,0,c); out_with_parbreak_error(s); return (EMPTY_STRING(s)); } put_back(c); /* put back lookahead */ token_start = the_file; /* remember location of string start */ if (c == '{') return (get_braced_string()); else if (Isdigit(c)) return (get_digit_string()); else if (c == '"') return (get_quoted_string()); else if (Isalpha(c)) return (get_identifier_string()); else { out_with_error("", "Expected BibTeX value string for field ``%f''"); return (EMPTY_STRING(s)); } } #if NEW_STYLE static char * get_token(char *s, char **nextp, const char *terminators) #else /* K&R style */ static char * get_token(s,nextp,terminators) char *s; char **nextp; const char *terminators; 
#endif /* NEW_STYLE */ { char *t = s; char *token; /******************************************************************* Ignoring leading space, find the next token in s[], stopping at end-of-string, or one of the characters in terminators[], whichever comes first. Replace the terminating character in s[] by a NUL. Set *nextp to point to the next character in s[], or to (char*)NULL if end-of-string was reached. Return (char*)NULL if no token was found, or else a pointer to its start in s[]. The job is terminated with an error message if a syntax error is detected. Quoted strings are correctly recognized as valid tokens, and returned with their surrounding quotes removed, and embedded escape sequences expanded. The comment character is recognized outside quoted strings, but not inside. *******************************************************************/ if (t != (char*)NULL) SKIP_SPACE(t); if ((t == (char*)NULL) || (*t == '\0') || (*t == COMMENT_PREFIX)) { /* initial sanity check */ t = (char*)NULL; /* save for *nextp later */ token = (char*)NULL; } else if (*t == '"') /* then collect quoted string */ { token = ++t; /* drop leading quote */ for ( ; *t && (*t != '"'); ++t) { /* find ending quote */ /* step over escape sequences; it doesn't matter if we have */ /* \123, since we are only looking for the ending quote */ if (*t == '\\') ++t; } if (*t == '"') /* then found valid string */ { *t++ = '\0'; /* terminate token */ do_escapes(token); /* and expand escape sequences */ } else { (void)fprintf(stdlog, "%s Bad line [%s] in initialization file [%s]\n", ERROR_PREFIX, s, initialization_file_name); exit(EXIT_FAILURE); } } else /* else collect unquoted string */ { for (token = t; *t && (*t != COMMENT_PREFIX) && (strchr(terminators,*t) == (char*)NULL); ++t) NOOP; /* scan over token */ if ((*t == '\0') || (*t == COMMENT_PREFIX)) /* then hit end of s[] */ t = (char*)NULL; /* save for *nextp later */ else /* else still inside s[] */ *t++ = '\0'; /* terminate token */ } *nextp 
        = t;                            /* set continuation position */
    return (token);
}

#if NEW_STYLE
static int
isidchar(int c)
#else /* K&R style */
static int
isidchar(c)
int c;
#endif /* NEW_STYLE */
{
    /* See LaTeX User's Guide and Reference Manual, Section B.1.3, for the
       rules of what characters can be used in a BibTeX word value.
       Section 30 of BibTeX initializes id_class[] to match this, but
       curiously, allows ASCII DELete (0x7f), as an identifier character.
       This irregularity was reported to Oren Patashnik on [06-Oct-1990].
       We disallow it here.

       The Scribe syntax is simpler: letters, digits, ., #, &, and %. */

    return ((Scribe == YES) ?
            (Isalnum(c) || (c == '.') || (c == '#') ||
             (c == '&') || (c == '%') ) :
            (Isgraph(c) && (strchr("\"#%'(),={}",c) == (char*)NULL)) );
}

#if (OS_PCDOS && (SCREEN_LINES > 0))

#include <conio.h>                      /* needed for getch() declaration */

static int
get_screen_lines(VOID)
{
    return (SCREEN_LINES);
}

static void
kbclose(VOID)
{
}

static keyboard_code_t
kbcode(VOID)
{
    int c;

    c = kbget();                        /* get from keyboard without echo */
    if ((c == 0) || (c == 0xe0))        /* then have IBM PC function key */
    {
        c = kbget();                    /* function key code */
        switch (c)                      /* convert key code to character */
        {
        case 71:                        /* HOME */
            return (KEYBOARD_HOME);
        case 72:                        /* UP arrow */
            return (KEYBOARD_UP);
        case 73:                        /* PGUP */
            return (KEYBOARD_PGUP);
        case 79:                        /* END */
            return (KEYBOARD_END);
        case 80:                        /* DOWN arrow */
            return (KEYBOARD_DOWN);
        case 81:                        /* PGDN */
            return (KEYBOARD_PGDN);
        default:
            return (KEYBOARD_UNKNOWN);
        }
    }
    else if (c == EOF)
        return (KEYBOARD_EOF);
    else
        return (keymap[(unsigned)c]);
}

static int
kbget(VOID)
{
    return (getch());
}

static void
kbopen(VOID)
{
    kbinitmap();
}
#endif /* (OS_PCDOS && (SCREEN_LINES > 0)) */

#if (OS_UNIX && (SCREEN_LINES > 0))

/* One of HAVE_SGTTY_H, HAVE_TERMIO_H, or HAVE_TERMIOS_H can be defined
   at compile time.  If more than one is set, we use the first one set
   in that order.
Usually, the UNIX_BSD or _POSIX_SOURCE values are sufficient to distinguish between the three cases, and no compile-time setting is necessary. DECstation ULTRIX has all three, making it impossible to use symbols defined in sgtty.h, termio.h, and termios.h to select code fragments below. */ #if !(defined(HAVE_SGTTY_H)||defined(HAVE_TERMIO_H)||defined(HAVE_TERMIOS_H)) #if UNIX_BSD #define HAVE_SGTTY_H 1 #else /* NOT UNIX_BSD */ #if defined(_POSIX_SOURCE) #define HAVE_TERMIOS_H 1 #else /* NOT BSD or POSIX, perhaps its AT&T System V */ #define HAVE_TERMIO_H 1 #endif /* defined(_POSIX_SOURCE) */ #endif /* UNIX_BSD */ #endif /* !(defined(HAVE_SGTTY_H) || defined(HAVE_TERMIO_H) || defined(HAVE_TERMIOS_H)) */ static void reset_terminal ARGS((void)); static void set_terminal ARGS((void)); static FILE *fptty = (FILE*)NULL; /* for kbxxx() functions */ static YESorNO tty_init = NO; /* set to YES if tty_save set */ static void kbclose(VOID) { reset_terminal(); if (fptty != (FILE*)NULL) (void)fclose(fptty); } static keyboard_code_t kbcode(VOID) { int c = kbget(); if (c == EOF) return (KEYBOARD_EOF); else return (keymap[(unsigned)c]); } static int kbget(VOID) { if (fptty != (FILE*)NULL) { (void)fflush(fptty); return (getc(fptty)); } else return (EOF); } static void kbopen(VOID) { kbinitmap(); if ((fptty = tfopen("/dev/tty","r")) != (FILE*)NULL) { set_terminal(); screen_lines = get_screen_lines(); } } #if defined(HAVE_SGTTY_H) #undef HAVE_TERMIO_H #undef HAVE_TERMIOS_H #include #include static struct sgttyb tty_save; /* Berkeley style interface */ static void reset_terminal(VOID) /* restored saved terminal modes */ { if (tty_init == YES) (void)ioctl((int)(fileno(fptty)),(int)TIOCSETP,(char*)&tty_save); } static void set_terminal(VOID) /* set terminal for cbreak input mode */ { struct sgttyb tty; /* Try to put file into cbreak mode for character-at-a-time input */ if (ioctl((int)(fileno(fptty)),(int)TIOCGETP,(char*)&tty) != -1) { tty_save = tty; tty_init = YES; tty.sg_flags &= 
~(ECHO | LCASE);
        tty.sg_flags |= CBREAK;
        (void)ioctl((int)(fileno(fptty)),(int)TIOCSETP,(char*)&tty);
    }
}
#endif /* defined(HAVE_SGTTY_H) */

#if defined(HAVE_TERMIO_H)
#undef HAVE_SGTTY_H
#undef HAVE_TERMIOS_H
#include <termio.h>

static struct termio tty_save;          /* SVID2 and XPG2 interface */

static void
reset_terminal(VOID)                    /* restore saved modes */
{
    if (tty_init == YES)
        (void)ioctl((int)(fileno(fptty)),(int)TCSETAF,(char*)&tty_save);
}


static void
set_terminal(VOID)                      /* set to cbreak input mode */
{
    struct termio tty;                  /* SVID2, XPG2 interface */

    if (ioctl((int)(fileno(fptty)),(int)TCGETA,(char*)&tty) != -1)
    {
        tty_save = tty;
        tty_init = YES;
        tty.c_iflag &= ~(INLCR | ICRNL | ISTRIP | IXON | BRKINT);

#if defined(IUCLC)
        tty.c_iflag &= ~IUCLC;          /* absent from POSIX */
#endif /* defined(IUCLC) */

        tty.c_lflag &= ~(ECHO | ICANON);
        tty.c_cc[4] = 5;                /* MIN */
        tty.c_cc[5] = 2;                /* TIME */
        (void)ioctl((int)(fileno(fptty)),(int)TCSETAF,(char*)&tty);
    }
}
#endif /* HAVE_TERMIO_H */

#if defined(HAVE_TERMIOS_H)
#undef HAVE_SGTTY_H
#undef HAVE_TERMIO_H
#include <termios.h>

static struct termios tty_save;         /* XPG3, POSIX.1, FIPS 151-1 interface */

static void
reset_terminal(VOID)                    /* restore saved modes */
{
    if (tty_init == YES)
        (void)tcsetattr((int)(fileno(fptty)),TCSANOW,&tty_save);
}


static void
set_terminal(VOID)                      /* set to cbreak input mode */
{
    struct termios tty;                 /* XPG3, POSIX.1, FIPS 151-1 interface */

    if (tcgetattr((int)(fileno(fptty)),&tty) != -1)
    {
        tty_save = tty;
        tty_init = YES;
        tty.c_iflag &= ~(INLCR | ICRNL | ISTRIP | IXON | BRKINT);

#if defined(IUCLC)
        tty.c_iflag &= ~IUCLC;          /* absent from POSIX */
#endif /* defined(IUCLC) */

        tty.c_lflag &= ~(ECHO | ICANON);
        tty.c_cc[VMIN] = 5;             /* MIN */
        tty.c_cc[VTIME] = 2;            /* TIME */
        (void)tcsetattr((int)(fileno(fptty)),TCSANOW,&tty);
    }
}
#endif /* defined(HAVE_TERMIOS_H) */


static int
get_screen_lines(VOID)  /* this must come after terminal header includes!
*/ { #if defined(TIOCGWINSZ) struct winsize window_size; if (fptty != (FILE*)NULL) { (void)ioctl((int)(fileno(fptty)),(int)TIOCGWINSZ,&window_size); if (window_size.ws_row > 0) return ((int)window_size.ws_row); } #else /* defined(TIOCGWINSZ) */ /* some systems store screen lines in environment variables */ char *p; int n; if (((p = getenv("ROWS")) != (char*)NULL) || ((p = getenv("LINES")) != (char*)NULL)) { n = (int)atoi(p); if (n > 0) return (n); } #endif /* defined(TIOCGWINSZ) */ return (SCREEN_LINES); } #endif /* (OS_UNIX && (SCREEN_LINES > 0)) */ #if (OS_VAXVMS && (SCREEN_LINES > 0)) #include #include #include #include #include #define TTYOPENFLAGS "rb" #define TTYNAME ctermid((char*)NULL) static int status; /* system service status */ static int tt_channel = -1; /* terminal channel for image QIO's */ static int iomask; /* QIO flag mask */ static $DESCRIPTOR(sys_in,"TT:"); /* terminal descriptor */ static struct { unsigned char class; unsigned char type; unsigned short buffer_size; unsigned long tt; unsigned long tt2; } mode_buf,mode_save; #define FAILED(status) (~(status) & 1) /* failure if LSB is 0 */ static int get_screen_lines(VOID) { short flags; short dvtype; short ncols; short nrows = 0; #if defined(__ALPHA) /* I don't know what the OpenVMS replacement for lib$screen_info() is yet */ ncols = 80; nrows = 24; #else (void)lib$screen_info(&flags,&dvtype,&ncols,&nrows); #endif return ((int)((nrows > 0) ? nrows : SCREEN_LINES)); } static void kbclose(VOID) { #if !defined(__ALPHA) (void)sys$qiow(0,tt_channel,IO$_SETMODE,0,0,0, &mode_save,12,0,0,0,0); #endif } static keyboard_code_t kbcode(VOID) { int c = kbget(); return ((c == EOF) ? KEYBOARD_EOF : keymap[(unsigned)c]); } static int kbget(VOID) { int c; #if defined(__ALPHA) return (getchar()); #else status = sys$qiow(0,tt_channel,iomask,0,0,0,&c,1,0,0,0,0); return ((int)(FAILED(status) ? 
EOF : BYTE_VAL(c)));
#endif
}


static void
kbopen(VOID)
{
    kbinitmap();
#if defined(__ALPHA)
    /* assume stdin is open for now */
#else
    status = sys$assign(&sys_in,&tt_channel,0,0);
    if (!FAILED(status))
    {
        (void)sys$qiow(0,tt_channel,IO$_SENSEMODE,0,0,0,&mode_save,12,0,0,0,0);
        mode_buf = mode_save;
        mode_buf.tt &= ~TT$M_WRAP;
        (void)sys$qiow(0,tt_channel,IO$_SETMODE,0,0,0,&mode_buf,12,0,0,0,0);
        iomask = IO$_TTYREADALL | IO$M_NOECHO;
    }
#endif
}
#endif /* (OS_VAXVMS && (SCREEN_LINES > 0)) */


#if (SCREEN_LINES > 0)
static void
kbinitmap(VOID)
{
    (void)memset((void*)&keymap[0],(int)KEYBOARD_UNKNOWN,sizeof(keymap));

    keymap[(unsigned)'b'] = KEYBOARD_PGUP;
    keymap[(unsigned)'B'] = KEYBOARD_PGUP;
    keymap[(unsigned)META('V')] = KEYBOARD_PGUP;    /* Emacs scroll-down */

    keymap[(unsigned)'d'] = KEYBOARD_DOWN;
    keymap[(unsigned)'D'] = KEYBOARD_DOWN;
    keymap[(unsigned)CTL('N')] = KEYBOARD_DOWN;     /* Emacs next-line */

    keymap[(unsigned)'e'] = KEYBOARD_END;
    keymap[(unsigned)'E'] = KEYBOARD_END;
    keymap[(unsigned)META('>')] = KEYBOARD_END;     /* Emacs end-of-buffer */

    keymap[(unsigned)'f'] = KEYBOARD_PGDN;
    keymap[(unsigned)'F'] = KEYBOARD_PGDN;
    keymap[(unsigned)' '] = KEYBOARD_PGDN;
    keymap[(unsigned)'\r'] = KEYBOARD_PGDN;
    keymap[(unsigned)'\n'] = KEYBOARD_PGDN;
    keymap[(unsigned)CTL('V')] = KEYBOARD_PGDN;     /* Emacs scroll-up */

    keymap[(unsigned)'h'] = KEYBOARD_HELP;
    keymap[(unsigned)'H'] = KEYBOARD_HELP;
    keymap[(unsigned)'?'] = KEYBOARD_HELP;
    keymap[(unsigned)CTL('H')] = KEYBOARD_HELP;     /* Emacs help */

    keymap[(unsigned)'\033'] = KEYBOARD_QUIT;       /* ESCape gets out */
    keymap[(unsigned)'q'] = KEYBOARD_QUIT;
    keymap[(unsigned)'Q'] = KEYBOARD_QUIT;

    keymap[(unsigned)'.'] = KEYBOARD_AGAIN;
    keymap[(unsigned)'r'] = KEYBOARD_AGAIN;
    keymap[(unsigned)'R'] = KEYBOARD_AGAIN;
    keymap[(unsigned)CTL('L')] = KEYBOARD_AGAIN;    /* Emacs recenter */

    keymap[(unsigned)'t'] = KEYBOARD_HOME;
    keymap[(unsigned)'T'] = KEYBOARD_HOME;
    keymap[(unsigned)META('<')] = KEYBOARD_HOME;    /* Emacs beginning-of-buffer */

    keymap[(unsigned)'u'] =
KEYBOARD_UP; keymap[(unsigned)'U'] = KEYBOARD_UP; keymap[(unsigned)CTL('P')] = KEYBOARD_UP; /* Emacs previous-line */ } #endif /* (SCREEN_LINES > 0) */ #if NEW_STYLE int main(int argc, char *argv[]) #else /* K&R style */ int main(argc,argv) int argc; char *argv[]; #endif /* NEW_STYLE */ { char *initfile; #if defined(vms) extern char **cmd_lin(); argv = cmd_lin( "", &argc, argv ); #endif /* defined(vms) */ initfile = GETDEFAULT(BIBCLEAN_INI,INITFILE); max_width = 0L; /* reset later */ stdlog = stderr; /* cannot assign at compile time on some systems */ program_name = argv[0]; check_inodes(); #if defined(DEBUG) fpdebug = tfopen("bibclean.dbg", "w"); #endif /* defined(DEBUG) */ the_file.input.filename = ""; the_file.output.filename = "stdout"; do_preargs(argc,argv);/* some args must be handled BEFORE initializations */ if (read_initialization_files == YES) do_initfile(getenv(SYSPATH),initfile); if (read_initialization_files == YES) do_initfile(getenv(USERPATH),initfile); do_args(argc,argv); if (max_width == 0L) /* set default value */ max_width = (prettyprint == YES) ? MAX_WIDTH : LONG_MAX; do_files(argc,argv); #if OS_VAXVMS exit (error_count ? EXIT_FAILURE : EXIT_SUCCESS); #endif /* OS_VAXVMS */ return (error_count ? 
EXIT_FAILURE : EXIT_SUCCESS); } #if NEW_STYLE static void memmove(void *target, const void *source, size_t n) #else /* K&R style */ static void memmove(target, source, n) void *target; const void *source; size_t n; #endif /* NEW_STYLE */ { char *t; const char *s; t = (char *)target; s = (const char*)source; if ((s <= t) && (t < (s + n))) /* overlap: need right to left copy */ { for (t = ((char *)target) + n - 1, s = ((const char*)source) + n - 1; n > 0; --n) *t-- = *s--; } else /* left to right copy is okay */ { for ( ; n > 0; --n) *t++ = *s++; } } #if (defined(BSD) || defined(__SUNCC__)) && !defined(__NeXT__) #if !__alpha #if NEW_STYLE void* memset(void *target, int value, size_t n) #else /* K&R style */ void* memset(target, value, n) void *target; int value; size_t n; #endif /* NEW_STYLE */ { unsigned char *t = (unsigned char*)target; for ( ; n > 0; --n) *t++ = (unsigned char)value; return (target); } #endif /* !__alpha */ #endif /* (defined(BSD) || defined(__SUNCC__)) && !defined(__NeXT__) */ #if NEW_STYLE const char * month_token(const char *s, size_t *p_len) #else /* K&R style */ const char * month_token(s, p_len) const char *s; size_t *p_len; #endif /* NEW_STYLE */ { /* Return pointer to next token in s[], with its length in *p_len */ /* if s is NULL, the parsing continues from where it was last. */ /* A token is either a sequence of letters, possibly with a terminal */ /* period, or else a single character. Outside quoted strings, all */ /* characters are considered non-letters. This code is vaguely modelled */ /* on Standard C's strtok() function. */ static int b_level = 0; /* remembered across calls */ static YESorNO in_quoted_string = NO; /* remembered across calls */ static const char *last = (const char *)NULL; /* remembered across calls */ const char *token; /* pointer to returned token */ if (s != (const char*)NULL) /* do we have a new s[]? 
*/ { last = s; /* yes, remember it */ b_level = 0; /* and reinitialize state */ in_quoted_string = NO; /* variables */ } for (*p_len = 0, token = last; (last != (const char*)NULL) && *last ; ) { switch (*last) { case '"': if (b_level == 0) in_quoted_string = (in_quoted_string == YES) ? NO : YES; break; case '{': /* '}' for brace balance */ b_level++; break; /* '{' for brace balance */ case '}': b_level--; break; default: break; } if ((in_quoted_string == YES) && Isalpha(*last)) { (*p_len)++; /* count this token character */ ++last; /* collect multi-letter token */ } else if (*last == '.') /* then period ends a token */ { (*p_len)++; /* count this token character */ ++last; break; } else if (*p_len == 0) /* then have 1-char token */ { (*p_len)++; /* count this token character */ ++last; break; } else /* looked ahead too far */ break; } /* end for (*p_len ...) */ return ((*p_len == 0) ? (const char*)NULL : token); } static void new_entry(VOID) /* initialize for new BibTeX @name{...} */ { at_level = 0; brace_level = 0; is_parbreak = NO; rflag = NO; /* already synchronized */ current_entry_name[0] = '\0'; /* empty current_xxx[] strings */ current_field[0] = '\0'; current_key[0] = '\0'; current_value[0] = '\0'; } #if NEW_STYLE static void new_io_pair(IO_PAIR *pair) #else /* K&R style */ static void new_io_pair(pair) IO_PAIR *pair; #endif /* NEW_STYLE */ { new_position(&pair->input); new_position(&pair->output); } #if NEW_STYLE static void new_position(POSITION *position) #else /* K&R style */ static void new_position(position) POSITION *position; #endif /* NEW_STYLE */ { position->byte_position = 0L; position->last_column_position = 0L; position->column_position = 0L; position->line_number = 1L; } static void opt_author(VOID) { static CONST char *author[] = { "Author:\n", "\tNelson H. F. 
Beebe\n",
        "\tCenter for Scientific Computing\n",
        "\tDepartment of Mathematics\n",
        "\tUniversity of Utah\n",
        "\tSalt Lake City, UT 84112\n",
        "\tUSA\n",
        "\tTel: +1 801 581 5254\n",
        "\tFAX: +1 801 581 4148\n",
        "\tEmail: <beebe@math.utah.edu>\n",
        (const char*)NULL,
    };

    out_lines(stdlog, author, NO);
}


static void
opt_check_values(VOID)
{
    check_values = YESorNOarg();
}


static void
opt_delete_empty_values(VOID)
{
    delete_empty_values = YESorNOarg();
}


static void
opt_error_log(VOID)
{
    current_index++;
    if ((stdlog = tfopen(next_option,"w")) == (FILE*)NULL)
    {
        fprintf(stderr, "%s cannot open error log file [%s]",
                WARNING_PREFIX, next_option);
        fprintf(stderr, " -- using stderr instead\n");
        perror("perror() says");
        stdlog = stderr;
    }
    else
        check_inodes();                 /* stdlog changed */
}


static void
opt_file_position(VOID)
{
    show_file_position = YESorNOarg();
}


static void
opt_fix_font_changes(VOID)
{
    fix_font_changes = YESorNOarg();
}


static void
opt_fix_initials(VOID)
{
    fix_initials = YESorNOarg();
}


static void
opt_fix_names(VOID)
{
    fix_names = YESorNOarg();
}


static void
opt_help(VOID)
{
    static CONST char *help_lines[] =
    {
        "\nUsage: ",
        (const char*)NULL,
        " [ -author ] [ -error-log filename ] [ -help ] [ '-?' ]\n",
        "\t[ -init-file filename ] [ -max-width width ]\n",
        "\t[ -[no-]check-values ] [ -[no-]delete-empty-values ]\n",
        "\t[ -[no-]file-position ] [ -[no-]fix-font-changes ]\n",
        "\t[ -[no-]fix-initials ] [ -[no-]fix-names ]\n",
        "\t[ -[no-]par-breaks ] [ -[no-]prettyprint ]\n",
        "\t[ -[no-]print-patterns ] [ -[no-]read-init-files ]\n",
        "\t[ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ]\n",
        "\t[ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -version ]\n",
        "\t[ outfile\n",
        "\n",
#include "bibclean.h"
    };

    help_lines[1] = program_name;       /* cannot have this in initializer */
    out_lines(stdlog, help_lines, (screen_lines > 0) ?
YES : NO); exit(EXIT_SUCCESS); } static void opt_init_file(VOID) { current_index++; do_initfile((const char*)NULL,next_option); } static void opt_max_width(VOID) { current_index++; max_width = strtol(next_option,(char**)NULL,0); if (max_width <= 0L) /* width <= 0 means unlimited width */ max_width = LONG_MAX; } static void opt_parbreaks(VOID) { parbreaks = YESorNOarg(); } static void opt_prettyprint(VOID) { prettyprint = YESorNOarg(); } static void opt_print_patterns(VOID) { print_patterns = YESorNOarg(); } static void opt_read_init_files(VOID) { read_initialization_files = YESorNOarg(); } static void opt_remove_OPT_prefixes(VOID) { remove_OPT_prefixes = YESorNOarg(); } static void opt_scribe(VOID) { Scribe = YESorNOarg(); } static void opt_trace_file_opening(VOID) { trace_file_opening = YESorNOarg(); } static void opt_version(VOID) { version(); } static void opt_warnings(VOID) { warnings = YESorNOarg(); } static void out_at(VOID) { out_string(TOKEN_AT, "@"); } #if NEW_STYLE static void /* output c, but trim trailing blanks, */ out_c(int c) /* and output buffer if c == EOF */ #else /* K&R style */ static void out_c(c) /* output c, but trim trailing blanks, */ int c; /* and output buffer if c == EOF */ #endif /* NEW_STYLE */ { static int buf_length = 0; static char buf[MAX_BUFFER+1]; /* 1 extra slot for trailing NUL */ the_file.output.byte_position++; if ((c == EOF) || (buf_length >= MAX_BUFFER)) { buf[buf_length] = (char)'\0'; if (buf_length > 0) { (void)fputs(buf,stdout); (void)fflush(stdout); buf_length = 0; } if (c == EOF) return; } if ((prettyprint == NO) && (c != '\n')) { /* need to line wrap? 
*/
        if (the_file.output.column_position > (max_width - 2))
        {
            /* output backslash-newline pair, adding the backslash */
            /* manually to avoid an infinite loop from out_c('\\') */
            the_file.input.last_column_position =
                the_file.input.column_position;
            the_file.output.column_position++;
            buf[buf_length++] = (char)'\\';
            out_c('\n');                /* recursive call */
        }
    }

    switch (c)
    {
    case '\n':                          /* trim trailing spaces */
        the_file.output.line_number++;
        the_file.output.column_position = 0L;
        while ((buf_length > 0) && (buf[buf_length-1] == ' '))
        {
            the_file.output.byte_position--;
            buf_length--;
        }
        the_file.input.last_column_position =
            the_file.input.column_position - 1;
                                        /* inexact if we trimmed tabs. */
        break;

    case '\t':
        the_file.input.last_column_position = the_file.input.column_position;
        the_file.output.column_position =
            (the_file.output.column_position + 8L) & ~07L;
        break;

    case DELETE_CHAR:           /* delete a character from the output */
        if (buf_length <= 0)    /* this should NEVER happen! */
            fatal("Internal error: too many output characters deleted");
        /* the character being deleted is the last one stored, */
        /* buf[buf_length-1]; buf[buf_length] is not yet in use */
        if (buf[buf_length-1] == '\n')
            the_file.output.line_number--;
        the_file.output.column_position--; /* inexact if tab deleted */
        the_file.output.byte_position--;
        buf_length--;
        return;                 /* don't store this character! */

    case DELETE_LINE:           /* delete back to beginning of line */
        while ((buf_length > 0) && (buf[buf_length-1] != '\n'))
        {
            buf_length--;
            the_file.output.byte_position--;
        }
        the_file.output.column_position = 0;
        return;                 /* don't store this character!
*/

    default:
        the_file.input.last_column_position = the_file.input.column_position;
        the_file.output.column_position++;
        break;
    }                                   /* end switch (c) */

    buf[buf_length++] = (char)c;
}


static void
out_close_brace(VOID)
{
    out_string(TOKEN_RBRACE, "}");
}


static void
out_comma(VOID)
{
    YESorNO save_wrapping;

    save_wrapping = wrapping;
    wrapping = NO;
    out_string(TOKEN_COMMA, ",");
    wrapping = save_wrapping;
}


static void
out_complex_value(VOID)
{
    char *s;
    char *p;

    /* A complex value may contain concatenated simple strings with */
    /* intervening inline comments delimited by BIBTEX_HIDDEN_DELIMITER. */
    /* We split it apart and output separate tokens. */

    for (s = &current_value[0]; *s; )
    {
        p = strchr(s,BIBTEX_HIDDEN_DELIMITER);
        if (p == (char*)NULL)
        {
            out_string((*s == '"') ? TOKEN_VALUE : TOKEN_ABBREV,s);
            check_length(strlen(s));
            return;
        }
        *p = '\0';
        out_string((*s == '"') ? TOKEN_VALUE : TOKEN_ABBREV,s);
        check_length(strlen(s));
        s = p + 1;
        p = strchr(s,BIBTEX_HIDDEN_DELIMITER);
        if (p == (char*)NULL)   /* should never happen, but recover safely */
            p = strchr(s,'\0'); /* if it does */
        *p = '\0';
        out_string(TOKEN_INLINE,s);
        check_length(strlen(s));
        s = p + 1;
    }
}


static void
out_equals(VOID)
{
    if (prettyprint == YES)
    {
        out_c(' ');
        out_c('=');             /* standardize to = */
        out_c(' ');             /* always surround = by spaces */
    }
    else
        out_token(TOKEN_EQUALS,"=");
}


#if NEW_STYLE
static void
out_error(FILE *fpout, const char *s)
#else /* K&R style */
static void
out_error(fpout, s)
FILE *fpout;
const char *s;
#endif /* NEW_STYLE */
{
    if (fpout == stdout)        /* private handling of stdout so we */
        out_s(s);               /* can track positions */
    else
        (void)fputs(s,fpout);
}


static void
out_field(VOID)
{
    if (prettyprint == YES)
    {
        if (in_string == NO)
            out_spaces(FIELD_INDENTATION);
        out_s(current_field);
    }
    else
        out_token((in_string == YES) ?
TOKEN_ABBREV : TOKEN_FIELD, current_field); } static void out_flush(VOID) /* flush buffered output */ { out_c(EOF); /* magic value to flush buffers */ } #if NEW_STYLE static void out_input_position(IO_PAIR *pair) #else /* K&R style */ static void out_input_position(pair) IO_PAIR *pair; #endif /* NEW_STYLE */ { out_s("# line "); out_number(pair->input.line_number); out_s(" \""); out_s(pair->input.filename); out_s("\"\n"); } #if NEW_STYLE static void out_lines(FILE *fpout, const char *lines[], YESorNO pausing) #else /* K&R style */ static void out_lines(fpout, lines, pausing) FILE *fpout; const char *lines[]; YESorNO pausing; #endif /* NEW_STYLE */ { int k; #if (SCREEN_LINES > 0) int lines_on_screen; int nlines; if (pausing == YES) { kbopen(); for (nlines = 0; lines[nlines] != (const char*)NULL; ++nlines) NOOP; /* count number of lines */ for (k = 0, lines_on_screen = 0; ; ) { if (lines[k] != (const char*)NULL) { (void)fputs(lines[k], fpout); if (strchr(lines[k],'\n') != (char*)NULL) lines_on_screen++; /* some lines[k] are only partial */ } if ((lines_on_screen == (screen_lines - 2)) || (lines[k] == (const char*)NULL)) { /* pause for user action */ lines_on_screen = 0; screen_lines = get_screen_lines(); /* maybe window resize? */ k = do_more(fpout,k,screen_lines - 2); if (k == EOF) break; /* here's the loop exit */ else if (k == LAST_SCREEN_LINE) k = nlines - (screen_lines - 2); if (k < 0) /* ensure k stays in range */ k = 0; else if (k >= nlines) k = nlines - 1; } else /* still filling current screen */ k++; } /* end for (k...) 
*/ kbclose(); } else /* pausing == NO */ { for (k = 0; lines[k] != (const char*)NULL; k++) (void)fputs(lines[k], fpout); } #else /* NOT (SCREEN_LINES > 0) */ for (k = 0; lines[k] != (const char*)NULL; k++) (void)fputs(lines[k], fpout); #endif /* (SCREEN_LINES > 0) */ } static void out_newline(VOID) { out_string(TOKEN_NEWLINE, "\n"); } #if NEW_STYLE static void out_number(long n) #else /* K&R style */ static void out_number(n) long n; #endif /* NEW_STYLE */ { char number[22]; /* ceil(log10(2^64-1))+1, big enough */ /* for even 64-bit machines */ (void)sprintf(number,"%ld",n); out_s(number); } static void out_open_brace(VOID) { out_string(TOKEN_LBRACE, "{"); } #if NEW_STYLE static void out_other(const char *s) /* output a non-BibTeX string */ #else /* K&R style */ static void out_other(s) const char *s; #endif /* NEW_STYLE */ { if (prettyprint == YES) out_s(s); else { if (Isspace(s[0])) /* do_other() guarantees whole token is whitespace */ out_token(TOKEN_SPACE, s); else if (s[0] == BIBTEX_COMMENT_PREFIX) out_token(TOKEN_INLINE, s); else out_token(TOKEN_LITERAL, s); } } #if NEW_STYLE static void out_position(FILE* fpout, const char *msg, IO_PAIR *the_location) #else /* K&R style */ static void out_position(fpout,msg,the_location) FILE* fpout; const char *msg; IO_PAIR *the_location; #endif /* NEW_STYLE */ { char s[sizeof( " output byte=XXXXXXXXXX line=XXXXXXXXXX column=XXXXXXXXXX")+1]; out_error(fpout, msg); (void)sprintf(s," input byte=%ld line=%ld column=%2ld", the_location->input.byte_position, the_location->input.line_number, the_location->input.column_position); out_error(fpout, s); (void)sprintf(s, " output byte=%ld line=%ld column=%2ld\n", the_location->output.byte_position, the_location->output.line_number, the_location->output.column_position); out_error(fpout, s); } #if NEW_STYLE static void out_s(const char *s) /* output a string, wrapping long lines */ #else /* K&R style */ static void out_s(s) /* output a string, wrapping long lines */ const char *s; 
#endif /* NEW_STYLE */
{
    /* The string s[] has already had runs of whitespace of all kinds
       collapsed to single spaces.  The word_length() function returns 1
       more than the actual non-blank word length at end of string, so
       that we can automatically account for the comma that will be
       supplied after the string. */

    for (; *s; ++s)
    {
        switch (*s)
        {
        case ' ':       /* may change space to new line and indent */
            if ((wrapping == YES) &&
                (the_file.output.column_position + 1 + word_length(s+1))
                > max_width)
                wrap_line();
            else
                out_c((unsigned char)*s);
            break;

#if 0
        /* It appears that wrapping at these punctuation characters
           is a bad idea, because it can introduce line breaks in
           strings which should not be broken, such as file names,
           and WWW URL fields. */
        case '!':       /* may wrap after certain punctuation */
        case '&':
        case '+':
        case ',':
        case '.':
        case ':':
        case ';':
        case '=':
        case '?':
            out_c((unsigned char)*s);
            if ((wrapping == YES) &&
                (the_file.output.column_position + word_length(s+1))
                > max_width)
                wrap_line();
            break;
#endif

        default:        /* everything else is output verbatim */
            out_c((unsigned char)*s);
        }
    }
}


#if NEW_STYLE
static void
out_spaces(int n)
#else /* K&R style */
static void
out_spaces(n)
int n;
#endif /* NEW_STYLE */
{
    if (prettyprint == YES)
    {
        for (; n > 0; --n)
            out_c(' ');
    }
    /* If we are not prettyprinting, but lexically analyzing, we */
    /* cannot use n as a reliable count of spaces, because it is */
    /* based on column positions in prettyprinted output.  We must */
    /* therefore simply discard TOKEN_SPACE from the output stream.
*/ } #if NEW_STYLE static void out_status (FILE* fpout,const char *prefix) #else /* K&R style */ static void out_status(fpout,prefix) FILE* fpout; const char *prefix; #endif /* NEW_STYLE */ { if (show_file_position == YES) { out_error(fpout, prefix); out_error(fpout, " File positions: input ["); out_error(fpout, the_file.input.filename); out_error(fpout, "] output ["); out_error(fpout, the_file.output.filename); out_error(fpout, "]\n"); out_error(fpout, prefix); out_position(fpout, " Entry ", &the_entry); out_error(fpout, prefix); out_position(fpout, " Value ", &the_value); out_error(fpout, prefix); out_position(fpout, " Current", &the_file); } } #if NEW_STYLE static void out_string(token_t type, const char *token) #else /* K&R style */ static void out_string(type,token) token_t type; const char *token; #endif /* NEW_STYLE */ { if (prettyprint == YES) out_s(token); /* prettyprinted output */ else out_token(type,token); /* lexical analysis output */ } #if NEW_STYLE static void out_token(token_t type, const char *token) /* lexical analysis output */ #else /* K&R style */ static void out_token(type,token) token_t type; const char *token; #endif /* NEW_STYLE */ { char octal[4 + 1]; static long last_line_number = 0L; if (*token == (char)'\0') /* ignore empty tokens */ return; if (last_line_number < token_start.input.line_number) { out_input_position(&token_start); last_line_number = token_start.input.line_number; } out_number((long)type); out_c('\t'); out_s(type_name[(int)type]); out_c('\t'); out_c('"'); for (; *token; ++token) { switch (*token) { case '"': case '\\': out_c('\\'); out_c(*token); break; case '\b': out_c('\\'); out_c('b'); break; case '\f': out_c('\\'); out_c('f'); break; case '\n': out_c('\\'); out_c('n'); break; case '\r': out_c('\\'); out_c('r'); break; case '\t': out_c('\\'); out_c('t'); break; case '\v': out_c('\\'); out_c('v'); break; default: if (Isprint(*token)) out_c((unsigned char)*token); else { (void)sprintf(octal,"\\%03o",BYTE_VAL(*token)); 
                out_s(octal);
            }
            break;
        }
    }
    out_c('"');
    out_c('\n');
}


static void
out_value(VOID)
{
    static OPTION_FUNCTION_ENTRY checks[] =
    {
        {"author",  6, check_other},
        {"chapter", 7, check_chapter},
        {"ISBN",    4, check_ISBN},
        {"ISSN",    4, check_ISSN},
        {"month",   5, check_month},
        {"number",  6, check_number},
        {"pages",   5, check_pages},
        {"volume",  6, check_volume},
        {"year",    4, check_year},
        {(const char*)NULL, 0, (void (*)(VOID))NULL},
    };

    static OPTION_FUNCTION_ENTRY fixes[] =
    {
        {"author",  6, fix_namelist},
        {"editor",  6, fix_namelist},
        {"month",   5, fix_month},
        {"pages",   5, fix_pages},
        {"title",   5, fix_title},
        {(const char*)NULL, 0, (void (*)(VOID))NULL},
    };

    trim_value();

    if (in_preamble == NO)
    {
        (void)apply_function(current_field,fixes);
        if ((check_values == YES) && !STREQUAL(current_value,"\"\""))
        {
            if (apply_function(current_field,checks) == NO)
                check_other();
        }

        if ((remove_OPT_prefixes == YES) &&
            (strncmp(current_field,"OPT",3) == 0) &&
            (strlen(current_field) > (size_t)3) &&
            (strlen(current_value) > (size_t)2))
                                /* 2, not 0: quotes are included! */
        {
            out_c(DELETE_LINE);
            memmove(current_field,&current_field[3],
                    (size_t)(strlen(current_field)-3+1));
                                /* reduce "OPTname" to "name" */
            out_field();
            out_equals();
            out_spaces((int)(VALUE_INDENTATION -
                             the_file.output.column_position));
        }
        else if ((delete_empty_values == YES) &&
                 (STREQUAL(current_value,"\"\"")))
        {               /* 2, not 0, because quotes are included!
*/ out_c(DELETE_LINE); discard_next_comma = YES; return; } } out_complex_value(); } #if NEW_STYLE static void out_with_error(const char *s, const char *msg) #else /* K&R style */ static void out_with_error(s,msg) /* output string s, error message, and resynchronize */ const char *s; const char *msg; #endif /* NEW_STYLE */ { out_s(s); error(msg); resync(); } #if NEW_STYLE static void out_with_parbreak_error(char *s) #else /* K&R style */ static void out_with_parbreak_error(s) char *s; #endif /* NEW_STYLE */ { out_with_error(s, "Unexpected paragraph break for field ``%f''"); } #if NEW_STYLE static void prt_pattern(const char *fieldname, const char *pattern, const char *message) #else /* K&R style */ static void prt_pattern(fieldname,pattern,message) const char *fieldname; const char *pattern; const char *message; #endif /* NEW_STYLE */ { if (print_patterns == YES) { if ((pattern == (const char*)NULL) || (*pattern == '\0')) (void)fprintf(stdlog, "\nfile=[%s] field=[%-12s] existing patterns discarded\n\n", initialization_file_name, fieldname); else if (message == (char*)NULL) (void)fprintf(stdlog, "file=[%s] field=[%-12s] pattern=[%s]\n", initialization_file_name, fieldname, pattern); else (void)fprintf(stdlog, "file=[%s] field=[%-12s] pattern=[%s] message[%s]\n", initialization_file_name, fieldname, pattern, message); } } #if NEW_STYLE static void put_back(int c) /* put last get_char() value back onto input stream */ #else /* K&R style */ static void put_back(c) /* put last get_char() value back onto input stream */ int c; #endif /* NEW_STYLE */ { if (n_pushback >= MAX_PUSHBACK) { warning("Pushback buffer overflow: characters lost"); return; } pushback_buffer[n_pushback++] = c; the_file.input.byte_position--; /* Adjust status values that are set in get_char() */ if (!Isspace(c)) non_white_chars--; if (c == EOF) eofile = NO; else if (c == '\n') { the_file.input.column_position = the_file.input.last_column_position; the_file.input.line_number--; } else if (c == '\t') 
the_file.input.column_position = the_file.input.last_column_position; else the_file.input.column_position--; if (c == '{') brace_level--; else if (c == '}') brace_level++; } #if NEW_STYLE static void put_back_string(const char *s) /* put string value back onto input stream */ #else /* K&R style */ static void put_back_string(s) /* put string value back onto input stream */ const char *s; #endif /* NEW_STYLE */ { char *p; for (p = strchr(s,'\0') - 1; p >= s; p--) put_back(*p); } static void resync(VOID) /* copy input to output until new entry met */ { /* and set resynchronization flag */ rflag = YES; do_other(); /* copy text until new entry found */ brace_level = 0; /* might have been non-zero because of errors */ } #if NEW_STYLE char* strdup(const char *s) #else /* K&R style */ char* strdup(s) const char *s; #endif /* NEW_STYLE */ { char *p; p = (char*)malloc(strlen(s)+1); if (p == (char*)NULL) fatal("Out of string memory"); return (strcpy(p,s)); } #if NEW_STYLE int stricmp(const char *s1,const char *s2) #else /* K&R style */ int stricmp(s1, s2) const char *s1; const char *s2; #endif /* STDC */ { #define TOUPPER(c) (Islower(c) ? toupper(c) : (c)) while ((*s1) && (TOUPPER(*s1) == TOUPPER(*s2))) { s1++; s2++; } return((int)(TOUPPER(*s1) - TOUPPER(*s2))); #undef TOUPPER } #if NEW_STYLE int strnicmp(const char *s1, const char *s2, size_t n) #else /* K&R style */ int strnicmp(s1,s2,n) const char *s1; const char *s2; size_t n; #endif /* NEW_STYLE */ { int c1; int c2; /******************************************************************* Compare strings ignoring case, stopping after n characters, or at end-of-string, whichever comes first. *******************************************************************/ for (; (n > 0) && *s1 && *s2; ++s1, ++s2, --n) { c1 = 0xff & (int)(Islower(*s1) ? (int)*s1 : tolower(*s1)); c2 = 0xff & (int)(Islower(*s2) ? 
(int)*s2 : tolower(*s2));
        if (c1 < c2)
            return (-1);
        else if (c1 > c2)
            return (1);
    }
    if (n <= 0)                 /* first n characters match */
        return (0);
    else if (*s1 == '\0')
        return ((*s2 == '\0') ? 0 : -1);
    else                        /* (*s2 == '\0') */
        return (1);
}


#if NEW_STYLE
static FILE*
tfopen(const char *filename, const char *mode) /* traced file opening */
#else /* K&R style */
static FILE*
tfopen(filename,mode)
const char *filename;
const char *mode;
#endif /* NEW_STYLE */
{
    FILE *fp;

    fp = FOPEN(filename,mode);
    if (trace_file_opening == YES)
        (void)fprintf(stdlog,"%s open file [%s]%s\n",
                      WARNING_PREFIX, filename,
                      (fp == (FILE*)NULL) ? ": FAILED" : "");
    return (fp);
}


static void
trim_value(VOID)
{       /* trim leading and trailing space from current_value[] */
    size_t k;
    size_t n = strlen(current_value);

    if ((current_value[0] == '"') && Isspace(current_value[1]))
    {   /* then quoted string value with leading space */
        for (k = 1; (k < n) && Isspace(current_value[k]); ++k)
            NOOP;
        memmove(&current_value[1], &current_value[k],
                (size_t)(n + 1 - k));   /* copy includes trailing NUL */
        n = strlen(current_value);
    }
    /* guard against an empty value: n is unsigned, so n-1 would wrap */
    if ((n > 0) && (current_value[n-1] == '"'))
    {
        for (k = n; (k > 1) && Isspace(current_value[k-2]); --k)
            NOOP;
        current_value[k-1] = (char)'"';
        current_value[k] = (char)'\0';
    }
}


static void
unexpected(VOID)
{
    warning("Unexpected value in ``%f = %v''");
}


static void
usage(VOID)
{
    static CONST char *usage_lines[] =
    {
        "\nUsage: ",
        (const char*)NULL,
        " [ -author ] [ -error-log filename ] [ -help ] [ '-?'
]\n",
        "\t[ -init-file filename ] [ -max-width width ]\n",
        "\t[ -[no-]check-values ] [ -[no-]delete-empty-values ]\n",
        "\t[ -[no-]file-position ] [ -[no-]fix-font-changes ]\n",
        "\t[ -[no-]fix-initials ] [ -[no-]fix-names ]\n",
        "\t[ -[no-]par-breaks ] [ -[no-]prettyprint ]\n",
        "\t[ -[no-]print-patterns ] [ -[no-]read-init-files ]\n",
        "\t[ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ]\n",
        "\t[ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -version ]\n",
        "\t[ outfile\n",
        (const char*)NULL,
    };

    version();
    usage_lines[1] = program_name;      /* cannot have this in initializer */
    out_lines(stdlog, usage_lines, NO);
}


static void
version(VOID)
{
    static CONST char *version_string[] =
    {
        BIBCLEAN_VERSION,
        "\n",

#if defined(HOST) || defined(USER) || defined(__DATE__) || defined(__TIME__)
        "Compiled",

#if defined(USER)
        " by <", USER,

#if defined(HOST)
        "@", HOST,
#endif /* defined(HOST) */

        ">",
#endif /* defined(USER) */

#if defined(__DATE__)
        " on ", __DATE__,
#endif /* defined(__DATE__) */

#if defined(__TIME__)
        " ", __TIME__,
#endif /* defined(__TIME__) */

#if defined(HAVE_PATTERNS)
        "\nwith native pattern matching",
#endif /* defined(HAVE_PATTERNS) */

#if defined(HAVE_RECOMP) || defined(HAVE_REGEXP)
        "\nwith regular-expression pattern matching",
#endif /* defined(HAVE_RECOMP) || defined(HAVE_REGEXP) */

#if defined(HAVE_OLDCODE)
        "\nwith old matching code",
#endif /* defined(HAVE_OLDCODE) */

        "\n",
#endif /* defined(HOST)||defined(USER)||defined(__DATE__)||defined(__TIME__) */

        (const char*)NULL,
    };

    out_lines(stdlog, version_string, NO);
}


#if NEW_STYLE
static void
warning(const char *msg)        /* issue a warning message to stdlog */
#else /* K&R style */
static void
warning(msg)                    /* issue a warning message to stdlog */
const char *msg;
#endif /* NEW_STYLE */
{
    if (warnings == YES)
    {
        out_flush();            /* flush all buffered output */

        /* Because warnings are often issued in the middle of
           lines, we start a new line if stdlog and stdout are the
           same file.
        */

        (void)fprintf(stdlog,"%s%s \"%s\", line %ld: %s.\n",
                      (stdlog_on_stdout == YES) ? "\n" : "",
                      WARNING_PREFIX,
                      the_file.input.filename,
                      the_file.input.line_number,
                      format(msg));
        out_status(stdlog, WARNING_PREFIX);
        (void)fflush(stdlog);
    }
}


#if NEW_STYLE
static int
word_length(const char *s)      /* return length of leading non-blank prefix */
#else /* K&R style */
static int
word_length(s)                  /* return length of leading non-blank prefix */
const char *s;
#endif /* NEW_STYLE */
{
    size_t n;

    for (n = 0; s[n]; ++n)
    {
        if (Isspace(s[n]))
            break;
    }
    return ((int)((s[n] == '\0') ? n + 1 : n));
                                /* at end of string, return one more than */
                                /* true length to simplify line wrapping */
}


static void
wrap_line(VOID)                 /* insert a new line and leading indentation */
{
    out_newline();
    out_spaces(VALUE_INDENTATION);      /* supply leading indentation */
}


static YESorNO
YESorNOarg(VOID)
{
    return ((strnicmp(current_option+1,"no-",3) == 0) ? NO : YES);
}


/***********************************************************************
 We put this regular expression matching code last because
   (a) it is not universally available,
   (b) the 6 macros in the HAVE_REGEXP section can only be defined
       once, and
   (c) there are three variants: the old ugly regexp.h interface
       (HAVE_REGEXP), the new clean regex.h interface (HAVE_RECOMP),
       and the GNU version (not yet supported here)
***********************************************************************/

/**********************************************************************/

#if defined(HAVE_RECOMP)
#if (_AIX || ultrix)
/* AIX 370, AIX PS/2, and ULTRIX have these, but no regex.h, sigh...
*/
#if __cplusplus
extern "C" {
#endif /* __cplusplus */

char            *re_comp ARGS((const char *s_));
int             re_exec ARGS((const char *s_));

#if __cplusplus
};
#endif /* __cplusplus */

#else /* NOT (_AIX || ultrix) */
#include <regex.h>
#endif /* (_AIX || ultrix) */

#if NEW_STYLE
static int
match_regexp(const char *string,const char *pattern)
#else /* K&R style */
static int
match_regexp(string,pattern)
const char *string;
const char *pattern;
#endif /* NEW_STYLE */
{
    /* re_comp() returns NULL on success, else an error message string */
    if (re_comp(pattern) != (char*)NULL)
        fatal("Internal error: bad regular expression");
    switch (re_exec(string))
    {
    case 1:
        return (YES);
    case 0:
        return (NO);
    default:
        fatal("Internal error: bad regular expression");
    }
    return (YES);               /* keep optimizers happy */
}
#endif /* defined(HAVE_RECOMP) */

/**********************************************************************/

#if defined(HAVE_REGEXP)
const char              *sp_global;
#define ERROR(c)        regerr()
#define GETC()          (*sp++)
#define INIT            const char *sp = sp_global;
#define PEEKC()         (*sp)
#define RETURN(c)       return(c)
#define UNGETC(c)       (--sp)

void
regerr(VOID)
{
    fatal("Internal error: bad regular expression");
}

#include <regexp.h>

#if NEW_STYLE
static int
match_regexp(const char *string,const char *pattern)
#else /* K&R style */
static int
match_regexp(string,pattern)
const char *string;
const char *pattern;
#endif /* NEW_STYLE */
{
    char expbuf[MAX_TOKEN_SIZE];

    sp_global = string;
    (void)compile((char*)pattern, (char*)expbuf,
                  (char*)(expbuf + sizeof(expbuf)), '\0');
    return (step((char*)string,(char*)expbuf) ? YES : NO);
}
#endif /* defined(HAVE_REGEXP) */

/**********************************************************************/
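/***********************************************************************
 The two match_regexp() variants above use the re_comp()/re_exec() and
 compile()/step() interfaces.  For comparison only, here is a minimal
 sketch of the same yes/no match written against the POSIX
 regcomp()/regexec() interface.  It is an illustration, not part of
 bibclean: the name posix_match_regexp() and its -1 return value for a
 bad pattern are assumptions made for this example (the variants above
 call fatal() instead), and it assumes a system that provides POSIX
 <regex.h> and extended regular expression syntax.
***********************************************************************/

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not in bibclean): return 1 if string matches  */
/* pattern, 0 if it does not, and -1 if the pattern fails to compile. */
static int
posix_match_regexp(const char *string, const char *pattern)
{
    regex_t re;
    int status;

    /* REG_NOSUB: only a match/no-match answer is wanted */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return (-1);
    status = regexec(&re, string, (size_t)0, (regmatch_t*)NULL, 0);
    regfree(&re);               /* release the compiled pattern */
    return ((status == 0) ? 1 : 0);
}
```

/***********************************************************************
 Unlike re_comp(), which keeps a single hidden compiled pattern, this
 interface holds the compiled pattern in a caller-owned regex_t, so it
 must be released with regfree() after use.
***********************************************************************/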