Some useful awk functions
Search and replace
Documented here because the parameters to awk’s string functions are inconsistent.
S string
    indicates the string to search (this argument can often be omitted, in which case $0 is searched instead)
T replacement-text
    indicates the text to replace /regexp/ with
A array
    indicates an array that receives the results: the pieces of the string for split, the groups from the regexp for match
R /regexp/
    indicates a regular expression
    count = split("string", array[, /regexp/])                               # split   SAR
    pos = match("string", /regexp/[, array])                                 # match   SRA
    new_string = gensub(/regexp/, "replacement-text", "" or "G"[, string])   # gensub  RTgS
    # Replace the first occurrence in-place
    sub(/regexp/, "replacement-text"[, string])                              # sub     RTS
    # Replace all occurrences in-place
    gsub(/regexp/, "replacement-text"[, string])                             # gsub    RTS
    pos = index("haystack", "needle")                                        # index   HS
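The argument orders above are easier to remember after seeing each function called once. The following is a minimal sketch, not a reference: the sample string and the variable names (s, t, parts) are made up for illustration, and it sticks to the functions that POSIX awk also provides, so it should run under any awk on the PATH.

```shell
# Run a small awk program that calls each function once.
# The sample text and variable names are hypothetical.
out=$(awk 'BEGIN {
    s = "one two three two"

    # split: string, array, separator regexp (SAR); returns the piece count
    n = split(s, parts, / /)
    print "split:", n, parts[3]

    # match: string, regexp (SR); returns the position and sets RSTART, RLENGTH
    pos = match(s, /t[a-z]+/)
    print "match:", pos, RSTART, RLENGTH

    # index: haystack, needle; returns 0 when the needle is absent
    print "index:", index(s, "three")

    # sub: regexp, replacement, target variable (RTS); first match only
    t = s
    sub(/two/, "2", t)
    print "sub:", t

    # gsub: same order as sub; replaces every match and returns the count
    t = s
    c = gsub(/two/, "2", t)
    print "gsub:", c, t
}')
printf '%s\n' "$out"
```

Note that sub and gsub modify their third argument, so it has to be a variable or field rather than a string literal; the copy into t keeps s intact.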
Date and time
    int seconds_from_epoch = mktime("YYYY MM DD hh mm ss[ DST]")
    string = strftime(["format"[, seconds_from_epoch]])
    int seconds_from_epoch = systime()
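A short round trip through these three functions might look like the sketch below. It assumes the awk on the PATH provides the time functions (they are extensions found in gawk, not POSIX awk), and it pins TZ to UTC so the result does not depend on the local time zone. The timestamp is an arbitrary example.

```shell
# Convert a date to seconds since the epoch and back again.
# Assumes an awk with mktime/strftime/systime (for example gawk).
out=$(TZ=UTC awk 'BEGIN {
    # mktime takes one string: "YYYY MM DD hh mm ss"
    t = mktime("2001 09 09 01 46 40")
    print t

    # strftime takes the format first, then the timestamp
    print strftime("%Y-%m-%d %H:%M:%S", t)

    # Timestamps are plain numbers, so date arithmetic is addition
    print strftime("%Y-%m-%d", t + 86400)

    # systime takes no arguments and returns the current time
    print (systime() > t)
}')
printf '%s\n' "$out"
```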
Function to replace commas with NUL characters
    # Given a string with commas within and not within double-quote marks, returns
    # a string with commas that are not within double-quotes changed to NUL
    # (the quote marks themselves are dropped).
    # s2, q, i and z are extra parameters used as local variables.
    function commas_to_nulls(s1,    s2, q, i, z) {
        s2 = ""; q = 1
        i = index(s1, "\"")                      # Find first quote mark
        if (i==0) { i = length(s1) + 1 }         # Process whole string if no quote mark
        while (s1 != "") {                       # While there's text to process:
            q = !q                               #   Toggle within/not-within quotes
            z = substr(s1, 1, i-1)               #   Extract leading text to quote (or EOL)
            if (!q) { gsub(/,/, "\000", z) }     #   Not within quotes: change comma to NUL
            s2 = s2 z                            #   Append this piece to the new string
            s1 = substr(s1, i+1)                 #   Remove the part we just processed,
            i = index(s1, "\"")                  #   then find next quote mark,
            if (i==0) { i = length(s1) + 1 }     #   or EOL
        }
        return s2
    }
Calling example:
    mystring="42, \"This is text, and the comma is not significant\", \"<-- but this comma is\", 1,234,567, \"1,234,567\""
    c = split(commas_to_nulls(mystring), a, /\000 */)
    for (i=1; i <= c; i++) { print "a[" i "]=\"" a[i] "\"" }
Sorting arrays
    #!/usr/bin/awk -f
    BEGIN {
        a[7]=1667; a[13]=2064; a[27]=3028; a[8]=4035
        a[43]=5098; a[471]=6981; a[470]=7185; a[98]=8023
        a[100]=9392; a[71]=10035; a[65]=11025; a[163]=12075
        a[19]=13065; a[55]=14012; a[78]=15060
    }

    END {
        # Present 'a' sorted by value while preserving the index values
        # This is similar to Perl's 'print "$_\n" foreach sort @a', but this code
        # allows us to print the *index* as well as the value

        # Copy 'a' to 'x', swapping index and value. We add 1,000,000,000 to the
        # value because asorti() insists on doing an alphabetic sort.
        for (i in a) { x[a[i] + 1000000000] = i }

        # Duplicate 'x' into 'y' and sort 'y' by index
        count = asorti(x,y)

        # Print the results
        print "Sorted by value"
        print "Index a-index a[a-index]"
        for (i = 1; i <= count; i++) {
            idx = x[y[i]]
            printf("%3i. %7i %5i\n", i, idx, a[idx])
        }
        print ""

        # Present 'a' sorted by index
        # This essentially duplicates Perl's 'print "$_ $a{$_}\n" foreach sort keys %a'

        # Duplicate 'a' into 'x', then sort 'x' by index. This leaves us with
        # sequential index values in 'x' starting from 1, and the values in 'x'
        # being the index values from 'a'.
        count = asorti(a,x)

        # Convert the values in 'x' to numeric, then sort 'x' to arrange them
        # (recall that they're the index values from 'a') in ascending order
        for (i in x) { x[i] = x[i]+0 }; asort(x)

        # Now present 'a' ordered by index values
        print "Sorted by index"
        print "Index a-index a[a-index]"
        for (i=1; i<=count; i++) {
            printf("%3i. %7i %5i\n", i, x[i], a[x[i]])
        }
    }
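Where asort() and asorti() are not available (both are gawk extensions), a different technique gives a similar result: print "value index" pairs and let the external sort(1) utility do the ordering. This is a sketch of that alternative, not the method used in the script above; the three-element array is a cut-down, hypothetical version of 'a'.

```shell
# Sort an awk array by value by piping "value index" pairs to sort(1).
out=$(awk 'BEGIN {
    a[7]=1667; a[471]=6981; a[13]=2064

    # Each line is "value index"; sort -n orders the lines by the leading number
    for (i in a) print a[i], i | "sort -n"

    # Closing the pipe flushes the data and waits for sort to finish
    close("sort -n")
}')
printf '%s\n' "$out"
```

Replacing the command string with "sort -k2,2n" in both places would order the same lines by index instead of by value.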