mOleg wrote:
The question was misunderstood.
What is meant by the representation of strings is:
- how they are stored in memory (AsciiZ, Pascal, etc.)
- what the maximum length of a string can be
- how to store Unicode strings
and other similar questions.
You are not asking the correct question. The format of the strings can be abstracted away.
You should ask: "Where are strings held?"
I have written STRING-STACK.4TH that provides ANS-Forth with a string-stack.
This has copy-on-write. Internally, the elements of the stack contain an address and a count.
There are two kinds of strings on the string-stack:
unique --- a unique string is in the heap
derivative --- a derivative string's address is inside of a unique string or a constant string
The user doesn't have to know which strings are unique and which are derivative. The user can write his code as if all the strings were unique. Internally, most of the strings are derivative. For example, DUP$ makes a derivative. Unique strings are only created when necessary. If a derivative is modified, it is first converted into a unique and then modified. If a unique is modified, all derivatives of that unique are first converted into unique strings so they don't get modified when the original unique string gets modified.
When a string is consumed (for example, by .$ that types it out), it gets freed from the heap if it is a unique string (first any derivatives of that string are converted into unique strings).
The purpose of having derivative strings is to boost the speed. Working with derivatives is very fast compared to working with unique strings. We avoid allocating and freeing memory blocks on the heap, which is typically slow. We avoid copying blocks of memory, but instead only copy an address/count pair (in DUP$ etc.). I have a lot of support for pattern-matching of strings. Because of this, a lot of strings are derivatives --- this boosts the speed.
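To make the derivative/unique distinction concrete, here is a minimal ANS-Forth sketch. It is not STRING-STACK.4TH, and every name in it (>$-SKETCH, DUP$-SKETCH, MAKE-UNIQUE, and so on) is hypothetical. It shows only two things: DUP$ is cheap because it copies an address/count pair, and a derivative is copied to the heap the first time it is about to be mutated. It deliberately omits the bookkeeping that converts a unique string's derivatives before that unique is modified or freed.

```forth
\ Hypothetical copy-on-write string stack; NOT the STRING-STACK.4TH code.
\ Each entry is three cells: address, length, unique? flag.
16 constant max-strings
create sstack  max-strings 3 * cells allot
variable sdepth  0 sdepth !

: entry ( n -- a-addr )   \ address of the n-th entry from the top (0 = top)
  sdepth @ 1- swap -  3 * cells  sstack + ;
: push$ ( adr len unique? -- )
  sdepth @ max-strings = abort" string stack overflow"
  1 sdepth +!  0 entry >r  r@ 2 cells + !  r@ cell+ !  r> ! ;
: top$ ( -- adr len )  0 entry dup @ swap cell+ @ ;
: unique? ( -- flag )  0 entry 2 cells + @ ;

: >$-sketch ( adr len -- )  false push$ ;   \ derivative: nothing is copied
: dup$-sketch ( -- )  top$ false push$ ;    \ copies only the adr/len pair

: make-unique ( -- )                        \ copy the top string before mutating it
  unique? if exit then
  top$ dup allocate throw                   \ adr len new
  dup >r swap move                          \ copy the chars into the new block
  r> 0 entry !  true 0 entry 2 cells + ! ;

: drop$-sketch ( -- )                       \ only unique strings own heap memory
  unique? if 0 entry @ free throw then  -1 sdepth +! ;
```

In this sketch a DUP$ of a unique string yields a derivative that points into it. The real package must convert such derivatives before the unique is freed, and this sketch does not.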
STRING-STACK.4TH would primarily be useful if all string operations were done on the string-stack. The user would not have an address/count pair on the data-stack. The user would not use C@ to access strings. Words like TYPE would become obsolete. Our .$ would be used instead, which assumes the string to be on the string-stack.
Currently I have the following words that move strings to the string-stack:
>$ moves a constant string to the string-stack
MUT>$ moves a mutable string to the string-stack
HEAP>$ moves a string that is known to already be in the heap to the string-stack
The system would work a lot better if it were integrated into the Forth system (this would require the string-stack to become part of your Russian Forth Standard).
For example, WORD would create a string on the string-stack rather than in a static buffer.
<# #> would create a string on the string-stack rather than in a static buffer.
Strings can be removed from the string-stack and moved to the data-stack. This is only done so the string can be stored somewhere, such as in a data-structure. This would never be done for the purpose of doing any operation to the string --- all operations on strings are done on the string-stack.
I can provide a copy of STRING-STACK.4TH to you if you want to make it part of your Russian Forth Standard.
The Forth-200x committee is opposed to my STRING-STACK.4TH and will not accept it. So, forget about them! I'll give it to the Russians instead.
If you are interested, I can provide the source-code. It is currently written in ANS-Forth and requires NOVICE.4TH to already be loaded. It should be easy to convert it to run on your Forth system.
I was going to attach the documentation as a file, but I don't see any way to attach files to posts in your forum, so I just put the document file inline:
Code:
String-Stack documentation:
(written by Hugh Aguilar)
The string-stack code was inspired by Mark Wills' string-stack code in Turbo-Forth (also he has an ANS-Forth version). There are some differences though. I had these goals:
1.) I wanted to bring ANS-Forth up to the same level as QBASIC in regard to string handling. String-stack has MID$ for extracting substrings, and +$ for concatenating strings as well as a lot of other functions. Like in QBASIC, the user doesn't have to worry about allocating and freeing memory for strings, but this is done automatically.
2.) I wanted string-stack to be efficient. Assuming that ALLOC and DEALLOC are the speed-killers, these should be avoided as much as possible. Still though, everything should behave as if every item on the string-stack were a unique memory-block in the heap.
3.) I am thinking about later on writing a program that translates Ido into natural languages such as Spanish and English. I wanted string-stack to be usable for parsing the Esperanto text --- FAST-SPLITS$ and the prefix and suffix functions were added specifically for this purpose.
STRING-STACK.4TH requires that NOVICE.4TH already be included.
We still don't have regular expressions, which are typical of post-QBASIC languages (Perl etc.). I might write a regular-expression engine in the future, but I'm not enthusiastic about it. A regular expression is essentially a mini-interpreter of a hard-to-read language. I would rather write code in Forth that does the pattern matching. A regular expression is succinct, with a single line of meta-text describing the pattern compared to a dozen Forth functions, but it is also hard to read (for me, anyway). Regular expressions also have limitations compared to Forth: iteration is extremely primitive, and there are a lot of patterns that are impossible to describe with a regular expression but can easily be implemented with a short and simple Forth function.
This string-stack code is intended to obsolete the <CSTR stuff in NOVICE.4TH that I never liked. The <CSTR stuff is deprecated and may eventually be discarded. At this time however, the string-stack code uses <CSTR for the <SPLIT$> function and also relies on having S" and S| already available.
There are three chapters in this document:
1.) Basic Usage --- This chapter describes the functions that would be used the most. Reading this chapter is enough to get the user going, and many users will not have to read any further.
2.) Intermediate Usage --- This chapter describes the functions that would be used for more advanced usage. The user is discouraged from reading this chapter without first getting hands-on experience with the material in the first chapter --- the only way to learn how to program is to write programs, and reading ever more advanced material without putting to use what you've already read tends to clutter the mind.
3.) Maintainers' Guide --- This chapter should not be read by application programmers. This chapter is for anybody who is maintaining my code and needs to understand the internal workings in order to upgrade my code.
Chapter 1.) Basic Usage
Section 1.1.) stack manipulation
>$ ( adr len -- ) \ string: -- a
This is how strings get put on the string-stack. The strings must be constants that won't change --- they are typically S" strings inside of colon words. STRING-STACK.4TH is mostly useful for pattern-matching and concatenating strings, so most of the strings are S" strings inside of colon words. In many cases these never get converted into unique strings --- remaining derivatives throughout their useful lives makes them quite fast.
HEAP>$ ( adr len -- ) \ string: -- a
This is like >$ except that it is used when the string is known to already be on the heap. >$ makes a new copy of the string on the heap and so if the string was already on the heap then there would be a memory leak. For the most part, HEAP>$ is used when the string came from $> rather than from S" and hence is already on the heap.
MUT>$ ( adr len -- ) \ string: -- a
This is like >$ except that it is used for strings that are mutable, meaning that they might change --- they are typically <CSTR strings (when the <CSTR circular-buffer eventually wraps around, the old strings get clobbered by new strings).
Note that NOVICE.4TH provides an S" that works in interpretive mode (ANS-Forth doesn't guarantee that S" works in interpretive mode, and not all ANS-Forth compilers allow this). The NOVICE.4TH S" can also be used more than once (ANS-Forth doesn't guarantee this, and some ANS-Forth implementations have each S" string over-writing the last one). The NOVICE.4TH package also provides S|, which uses the | char as a delimiter rather than the " char; this is useful if you need the " char inside of your string (the word STRING allows any char to be used as the delimiter). S" and S| etc. use <CSTR internally, so strictly speaking MUT>$ should be used with them. In practice, interpretive mode is mostly used for testing, and >$ is fine there because <CSTR strings last long enough for testing purposes.
$> ( -- adr len ) \ string: a --
This is how strings get removed from the string-stack. These strings are in the heap so the address needs to be given to DEALLOC eventually or there will be a memory leak. This function is only used if the string needs to be stored in a data-structure of some kind. The user should not use $> and then consume the string with TYPE or whatever. The user should consume the string on the string-stack, with .$ instead of TYPE for example, so the string is automatically freed from the heap.
$>R ( -- ) \ string: a -- \ return: -- adr len
This moves a string from the string-stack to the return-stack for temporary storage. This macro only works inside of colon definitions but not in interpretive mode.
R>$ ( -- ) \ string: -- a \ return: adr len --
This moves a string from the return-stack to the string-stack. This assumes that the string on the return-stack is on the heap. This should only be used for strings that came from $>R, not for strings that came from S" and were then pushed onto the return-stack with 2>R, because those strings would not be on the heap. This macro only works inside of colon definitions, not in interpretive mode.
DUP$ ( -- ) \ string: a -- a a
OVER$ ( -- ) \ string: a b -- a b a
ROVER$ ( -- ) \ string: a b c -- a b c a
TUCK$ ( -- ) \ string: a b -- b a b
This is the same as: SWAP$ OVER$
RUCK$ ( -- ) \ string: a b c -- c a b c
This is the same as: -ROT$ ROVER$
DDUP$ ( -- ) \ string: a b -- a a b \ "deep dup"
This is the same as: OVER$ SWAP$
We don't want to use $>R DUP$ R>$ for this because $>R makes B unique.
2DUP$ ( -- ) \ string: a b -- a b a b
This is the same as: OVER$ OVER$
3DUP$ ( -- ) \ string: a b c -- a b c a b c
This is the same as: ROVER$ ROVER$ ROVER$
SWAP$ ( -- ) \ string: a b -- b a
ROT$ ( -- ) \ string: a b c -- b c a
-ROT$ ( -- ) \ string: a b c -- c a b
REV$ ( -- ) \ string: a b c -- c b a \ note that Mark Wills' package had REV$ doing what our REVERSE$ does
DROP$ ( -- ) \ string: a --
2DROP$ ( -- ) \ string: a b --
NIP$ ( -- ) \ string: a b -- b
This is the same as: SWAP$ DROP$
EMPTY$ ( -- ) \ string: x... --
This drops everything on the string-stack. This isn't very useful in programs --- it is somewhat useful when experimenting with the string-stack code in interpretive mode because you can get rid of all your experimentation results and start over.
.$ ( -- ) \ string: a --
This prints out the string similar to how dot prints out an integer.
:NAME$ ( wid -- ) \ string: a --
This is like colon except that it takes its name from the string-stack, and it puts the word in the wid word-list.
EVALUATE$ ( -- ) \ string: a --
This is like EVALUATE except that it takes its string from the string-stack.
CONST$ ( -- adr ) \ string: a --
This stores the string as a counted-string in the dictionary at HERE and returns the address. This aborts if the string is too big to become a counted-string.
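A minimal data-stack sketch of what CONST$ is described as doing follows. CONST-SKETCH is a hypothetical name. The sketch assumes 1-byte chars, so the length byte limits a counted string to 255 chars.

```forth
\ Hypothetical sketch: store adr/len as a counted string at HERE.
: const-sketch ( adr len -- c-adr )
  dup 255 > abort" string too big to become a counted-string"
  here >r                           \ the counted string starts here
  dup c,                            \ length byte
  here swap dup chars allot move    \ the characters follow the length byte
  align                             \ keep HERE aligned for whatever comes next
  r> ;
```

The returned address works with the standard COUNT, e.g. `s" hello" const-sketch count type`.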
VAL$ ( -- #invalid | n #single | d #double | #float ) \ float: -- f (if #FLOAT returned on data-stack) \ string: a --
This converts the string into a numeric value. If the string is not valid, the user gets #INVALID and can deal with the problem somehow.
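The sketch below covers only the single-cell integer part of VAL$. It is built on the standard >NUMBER and handles an optional leading minus sign in the current BASE. The name VAL-SKETCH and the numeric values given to #INVALID and #SINGLE are assumptions for illustration. The sketch does not handle doubles, floats, or overflow.

```forth
\ Hypothetical values; the real package's constants may differ.
0 constant #invalid
1 constant #single

: val-sketch ( adr len -- #invalid | n #single )
  dup 0= if 2drop #invalid exit then
  over c@ [char] - = dup >r if 1 /string then   \ remember and skip a sign
  dup 0= if 2drop r> drop #invalid exit then    \ a lone "-" is invalid
  0 0 2swap >number nip                         \ ud chars-left
  0<> if 2drop r> drop #invalid exit then       \ unconverted chars: invalid
  d>s r> if negate then #single ;
```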
.S$ ( -- ) \ string: x... -- x...
This displays what is on the string-stack similar to how .S displays what is on the data-stack. This does not remove anything from the string-stack. This is useful for debugging programs, but the end-user of the programs should never see this display.
FIX\$ ( -- ) \ string: a -- b
This converts a string with mark-up codes into a string with the equivalent single-byte characters. This is mostly useful for writing Spanish-language text. The codes have a \ followed by a case-sensitive character. For Spanish, any vowel that can get an accent mark can be used to get that vowel accented. The \u or \U is the 'u' or 'U' with an accent mark, but the \d or \D is the 'u' or 'U' with a diaeresis mark. The \n or \N is the 'n' or 'N' with a tilde. Also, \? is the upside-down ? mark. For other languages, \x## can be used, with ## being the hexadecimal code of the needed char. We also have the following:
code  value(s)  meaning
\@    7         bell
\b    8         backspace
\f    12        FF form-feed
\l    10        LF line-feed
\m    13 10     CR/LF
\"    34        double-quote
\r    13        CR carriage-return
\t    9         HT horizontal-tab
\v    11        VT vertical-tab
\z    0         null
\\    92        backslash
\!    124       vertical bar
\t    153       trademark
\c    169       copyright
\^    176       degree
\+    177       +-
\1    188       1/4
\2    189       1/2
\3    190       3/4
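The following is a rough sketch of how such codes can be decoded. It handles only a subset of the table (\l \r \m \z \! \x## and the self-escapes \\ and \"). It skips the accented letters and the ambiguous \t entries. FIX-SKETCH, FIX-CODE, HEX-DIGIT, and the static FIX-BUF are hypothetical. The sketch does no bounds checking and assumes well-formed input, with no lone \ at the end of the string.

```forth
\ Hypothetical decoder for a subset of the FIX\$ codes, into a static buffer.
create fix-buf 256 chars allot
variable fix-len

: fix-emit ( char -- )  fix-buf fix-len @ chars + c!  1 fix-len +! ;
: hex-digit ( char -- n )   \ assumes 0-9, a-f or A-F
  dup [char] 9 > if 32 or [char] a - 10 + else [char] 0 - then ;

: fix-code ( adr len -- adr' len' )  \ adr points at the char after the \
  over c@ >r 1 /string r>
  case
    [char] l of 10 fix-emit endof              \ LF
    [char] r of 13 fix-emit endof              \ CR
    [char] m of 13 fix-emit 10 fix-emit endof  \ CR/LF
    [char] z of 0 fix-emit endof               \ null
    [char] ! of 124 fix-emit endof             \ vertical bar
    [char] x of                                \ \x## hexadecimal code
      over c@ hex-digit 16 * >r 1 /string
      over c@ hex-digit r> + fix-emit 1 /string endof
    dup fix-emit                               \ \\ and \" : the char itself
  endcase ;

: fix-sketch ( adr len -- adr2 len2 )
  0 fix-len !
  begin dup 0> while
    over c@ [char] \ = if 1 /string fix-code
    else over c@ fix-emit 1 /string then
  repeat 2drop fix-buf fix-len @ ;
```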
Section 1.2.) string manipulation
LEN$ ( -- length ) \ string: a --
This returns the length of the string on the data-stack. This consumes the string on the string-stack (Forth functions traditionally consume their arguments), so if this is used and you still need the string, then DUP$ or OVER$ or whatever should be used to keep a copy on the string-stack.
MID$ ( start-index length -- ) \ string: a -- b
The B string is a substring in the middle of the A string.
ANTI-MID$ ( start-index length -- ) \ string: a -- b
Returns the string with the middle part removed (what MID$ would have returned is not returned; instead, the edge parts concatenated together are returned).
INNER$ ( start-index limit-index -- ) \ string: a -- b
This is like MID$ except that it uses a LIMIT-INDEX rather than a LENGTH (this is somewhat like Mark Wills' MID$; QBASIC's MID$, by contrast, takes a start position and a length, like our MID$). Note that the LIMIT-INDEX is 1 beyond the middle part that is kept (LIMIT-INDEX minus START-INDEX equals the length).
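At the data-stack level, the relationship between the two conventions fits in two lines. MID-SKETCH and INNER-SKETCH are hypothetical names, and neither checks bounds.

```forth
\ Hypothetical data-stack versions of MID$ and INNER$ (no bounds checking).
: mid-sketch ( adr len start length -- adr' len' )
  >r nip + r> ;               \ skip START chars, keep LENGTH chars
: inner-sketch ( adr len start limit -- adr' len' )
  over - mid-sketch ;         \ length = limit - start
```

For example, `s" hello world" 6 11 inner-sketch` and `s" hello world" 6 5 mid-sketch` both yield `world`.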
ANTI-INNER$ ( start-index limit-index -- ) \ string: a -- b
Returns the string with the middle part removed (what INNER$ would have returned is not returned; instead, the edge parts concatenated together are returned). Note that the LIMIT-INDEX is 1 beyond the middle part that is removed (LIMIT-INDEX minus START-INDEX equals the length).
LEFT$ ( length -- ) \ string: a -- b
This provides a substring of length LENGTH from the left side of the string.
RIGHT$ ( length -- ) \ string: a -- b
This provides a substring of length LENGTH from the right side of the string.
DISCARD-LEFT$ ( length -- ) \ string: a -- b
This discards a substring of length LENGTH from the left side of the string.
DISCARD-RIGHT$ ( length -- ) \ string: a -- b
This discards a substring of length LENGTH from the right side of the string.
FILL$ ( length char -- ) \ string: -- a
This produces a string filled with CHAR of length LENGTH.
BLANK$ ( length -- ) \ string: -- a
This produces a string filled with blanks of length LENGTH.
LPAD$ ( length -- ) \ string: a -- b
This pads the string with blanks on the left side so the total length is LENGTH --- if A is already LENGTH or longer, nothing is done.
RPAD$ ( length -- ) \ string: a -- b
This pads the string with blanks on the right side so the total length is LENGTH --- if A is already LENGTH or longer, nothing is done.
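Here is a data-stack sketch of left padding. It assumes that a string already LENGTH or longer is returned unchanged, and it builds the padded result in a hypothetical static PAD-BUF, so LENGTH is assumed to be at most 256. LPAD-SKETCH is not part of the package. Right padding is the same except that the original chars are copied to the start of the buffer.

```forth
\ Hypothetical left-pad into a static buffer (LENGTH assumed <= 256).
create pad-buf 256 chars allot

: lpad-sketch ( adr len length -- adr2 len2 )
  2dup < 0= if drop exit then               \ already long enough: unchanged
  >r  pad-buf r@ blank                      \ start with LENGTH blanks
  dup r@ swap - chars pad-buf + swap move   \ right-justify the original chars
  pad-buf r> ;
```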
LTRIM$ ( -- ) \ string: a -- b
This trims the whitespace from the left side of the string.
RTRIM$ ( -- ) \ string: a -- b
This trims the whitespace from the right side of the string.
TRIM$ ( -- ) \ string: a -- b
This trims the whitespace from the left and right sides of the string.
BLACKEN$ ( -- ) \ string: a -- b
This removes all the whitespace from the entire string.
Section 1.3.) searching and comparing
COMPARE$ ( -- -1|0|1 ) \ string: a b --
This is like COMPARE except for the string-stack.
ICOMPARE$ ( -- -1|0|1 ) \ string: a b --
This is like COMPARE$ except case-insensitive.
=$ ( -- equal? ) \ string: a b --
This compares the strings for equality. It is faster than COMPARE$ for when only an equality comparison is needed.
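One reason an equality test can be cheaper than an ordering test: strings of different lengths can never be equal, so they can be rejected without looking at any chars. COMPARE, by contrast, has to scan the common prefix before it can rank the strings. The data-stack sketch below (hypothetical name STR=-SKETCH) shows that short-circuit. It is not necessarily how =$ is implemented.

```forth
\ Hypothetical equality test with a length short-circuit.
: str=-sketch ( adrA lenA adrB lenB -- flag )
  rot over <> if 2drop drop false exit then   \ different lengths: unequal
  tuck compare 0= ;                           \ same length: compare the chars
```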
I=$ ( -- equal? ) \ string: a b --
This is like =$ except case-insensitive.
FINDC$ ( char -- index|-1 ) \ string: a --
This finds a char in the string, or returns -1 if not found.
IFINDC$ ( char -- index|-1 ) \ string: a --
Like FINDC$ except case-insensitive.
FIND$ ( -- index|-1 ) \ string: a b --
This finds the A string in the B string, or returns -1 if not found.
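At the data-stack level, the index can be obtained from the standard SEARCH word, which returns the address of the match inside the searched string. Following the description above, this hypothetical FIND-SKETCH looks for A inside B. It is only a sketch of the idea.

```forth
\ Hypothetical data-stack FIND: index of string A inside string B, or -1.
: find-sketch ( adrA lenA adrB lenB -- index | -1 )
  2swap 3 pick >r                 \ remember where B starts
  search if drop r> -             \ match address minus start of B
  else 2drop r> drop -1 then ;
```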
IFIND$ ( -- index|-1 ) \ string: a b --
Like FIND$ except case-insensitive.
Section 1.4.) prefixes and suffixes and infixes, oh my!
PREFIX$ ( -- found? ) \ string: a b -- a | c
This determines if B is a prefix of A. If PREFIX$ returns true, then it returns C which is the prefix inside of A (it is a derivative).
IPREFIX$ ( -- found? ) \ string: a b -- a | c
This is like PREFIX$ except case-insensitive.
SUFFIX$ ( -- found? ) \ string: a b -- a | c
This determines if B is a suffix of A. If SUFFIX$ returns true, then it returns C which is the suffix inside of A (it is a derivative).
ISUFFIX$ ( -- found? ) \ string: a b -- a | c
This is like SUFFIX$ except case-insensitive.
INFIX$ ( -- found? ) \ string: a b -- a | c
This determines if B is an infix of A. If INFIX$ returns true, then it returns C which is the infix inside of A (it is a derivative).
IINFIX$ ( -- found? ) \ string: a b -- a | c
This is like INFIX$ except case-insensitive.
EXTRACT$ ( -- ) \ string: a b -- c d
This requires B to be a derivative inside of A (it is also okay for B to be an empty unique). EXTRACT$ removes the B string from the A string and returns the prefix (the C string) and the suffix (the D string). This should only be used on the results returned by: INFIX$ PREFIX$ SUFFIX$ IINFIX$ IPREFIX$ or ISUFFIX$
Note that EXTRACT$ is the only function we have that requires the parameters to be derived one from the other. All of our other functions work on either unique or derivative strings. EXTRACT$ is context-sensitive, in that it is supposed to be used after certain other functions. See also ANTI-MID$, which uses EXTRACT$ internally.
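The following data-stack sketch shows why EXTRACT$ needs B to lie inside A. The prefix and suffix are computed purely from B's address and length relative to A's, so the result is meaningless if B points elsewhere. EXTRACT-SKETCH is a hypothetical name.

```forth
\ Hypothetical data-stack EXTRACT: B must point inside A.
: extract-sketch ( adrA lenA adrB lenB -- adrC lenC adrD lenD )
  2swap over + >r          \ adrB lenB adrA             R: endA
  2 pick over -            \ adrB lenB adrA lenC        (lenC = adrB - adrA)
  2swap +                  \ adrA lenC endB
  r> over - ;              \ adrA lenC endB lenD        (lenD = endA - endB)
```

For example, with A = `abcXYZdef` and B being the `XYZ` inside it, the result is C = `abc` and D = `def`.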
REPLACE$ ( -- change? ) \ string: a targ repl -- str
This replaces the first occurrence of the TARG substring in A with the REPL string. The STR returned may be A if no change was made.
The flag CHANGE? indicates if any change was made.
REPLACES$ ( -- change? ) \ string: a targ repl -- str
This replaces all of the TARG substrings in A with REPL strings. The STR returned may be A if no changes were made.
The flag CHANGE? indicates if any changes were made.
IREPLACE$ ( -- change? ) \ string: a targ repl -- str
This is like REPLACE$ except case-insensitive.
IREPLACES$ ( -- change? ) \ string: a targ repl -- str
This is like REPLACES$ except case-insensitive.
Chapter 2.) Intermediate Usage
Section 2.1.) these functions aren't very useful, but they are documented anyway just in case --- the reader should just skim over this section
DEPTH$ ( -- depth ) \ string: x... -- x...
This provides the depth of the string-stack. I can't think of any reason why anybody would need this.
REVERSE$ ( -- ) \ string: a -- b
This reverses the characters in the string. I can't think of any reason why anybody would need this. Note that in Mark Wills' package this was called REV$, but we are using the name REV$ for something else now.
UCASE$ ( -- ) \ string: a -- b
This upper-cases the characters in the string.
LCASE$ ( -- ) \ string: a -- b
This lower-cases the characters in the string.
WHITE? ( char -- flag? )
This checks if the char is white-space --- that is, if it is <= 32.
NONWHITE? ( char -- flag? )
This checks if the char is not white-space.
CHAR-UPPER ( charA -- charB )
This upper-cases the char.
CHAR-LOWER ( charA -- charB )
This lower-cases the char.
BLACKEN ( adr len -- adr new-len )
This removes all the whitespace from the string (not on the string-stack).
UPPER ( adr len -- )
This upper-cases a string (not on the string-stack).
LOWER ( adr len -- )
This lower-cases a string (not on the string-stack).
STR= ( adrA lenA adrB lenB -- flag )
This compares strings for equality.
ISTR= ( adrA lenA adrB lenB -- flag )
This is like STR= except case-insensitive.
ICOMPARE ( adrA lenA adrB lenB -- -1|0|1 )
This is like COMPARE except case-insensitive.
Section 2.2.) traversing strings
FORWARD$ ( xt -- index | -1 ) \ string: a -- a
Traverses the string from front to back, executing XT for every char in the string. The XT function should have a stack-picture: ( i*x char-adr -- j*x done? )
The XT function returns a flag indicating if the traversal is done or not. If the flag is true, then FORWARD$ stops traversing and returns the index of the char where the traversal stopped. If FORWARD$ traverses the entire string without being stopped, it returns -1. Note that FORWARD$ does not consume its argument on the string-stack as is traditionally done.
This is an example of FORWARD$ being used. The NIP gets rid of the CHAR that is still on the stack (the <FINDC$> left it there every time).
: <findc$> ( char adr -- char done? ) \ string: a --
c@ over = ;
: findc$ ( char -- index | -1 ) \ string: a --
['] <findc$> forward$ nip
drop$ ;
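The following is a sketch of how a FORWARD$-style traversal can work on an address/count pair while still letting the XT use items below the char address (the i*x in the stack picture). The loop keeps its own state on the return stack, so the data stack belongs to the XT. FORWARD-SKETCH and FINDC-SKETCH are hypothetical data-stack analogues, not the package's code. The usage mirrors the <FINDC$> example above.

```forth
\ Hypothetical data-stack traversal HOF; xt: ( i*x char-adr -- j*x done? )
: forward-sketch ( i*x adr len xt -- j*x index | -1 )
  >r over >r over + >r              \ i*x cur        R: xt start end
  begin dup r@ u< while             \ any chars left?
    r> r> r@ swap >r swap >r        \ i*x cur xt     R: xt start end
    over >r execute                 \ j*x done?      R: xt start end cur
    r> swap if                      \ j*x cur
      r> drop r> - r> drop exit     \ stopped: index = cur - start
    then char+
  repeat
  drop r> drop r> drop r> drop -1 ; \ ran off the end

: <findc-sketch> ( char char-adr -- char done? )  c@ over = ;
: findc-sketch ( char adr len -- index | -1 )
  ['] <findc-sketch> forward-sketch nip ;   \ NIP drops the CHAR
```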
BACKWARD$ ( xt -- index | -1 ) \ string: a -- a
Like FORWARD$ except that it traverses the string from back to front.
This is an example of BACKWARD$ being used. Note that -1 is not just a flag indicating that we didn't find a char past the white, but is also the index past the white that we found (we found white all the way to index 0). We add 1 to the index past the white to get the length of the good stuff below the white.
: <trim$> ( char-adr -- done? )
c@ nonwhite? ;
: rtrim$ ( -- ) \ string: a -- b
['] <trim$> backward$ \ -- index-past-white
1+ \ -- how-many-keepers
left$ ;
PREP-MUTATION ( -- )
If FORWARD$ or BACKWARD$ are used to mutate a string, PREP-MUTATION should first be called. If FORWARD$ or BACKWARD$ are just being used to examine the string, then PREP-MUTATION should not be called.
This is an example of PREP-MUTATION being used. Unlike the previous examples of FORWARD$ and BACKWARD$ that just examined the string, in this example we are mutating (modifying) the string, so we need PREP-MUTATION. The FALSE in <UCASE$> indicates that we aren't done, because we always go all the way through. The DROP in UCASE$ gets rid of the -1 that FORWARD$ returns. We could have used either FORWARD$ or BACKWARD$ in UCASE$.
: <ucase$> ( char-adr -- done? )
dup c@ char-upper swap c!
false ;
: ucase$ ( -- ) \ string: a -- b
prep-mutation
['] <ucase$> forward$ drop ;
The above example is how I originally wrote UCASE$, but I now have a more efficient version that uses a DO loop explicitly. However, the user should write code like the example above, with FORWARD$ or BACKWARD$, rather than use explicit DO loops, because this is the idiomatic way to use the string-stack package even if it is slightly less efficient. As a general rule, the use of a HOF (higher-order function) such as FORWARD$ reduces bugs, because explicit iteration is the primary source of bugs in any program. Also, I may later upgrade the string-stack package to be mostly assembly-language. If I do this, then FORWARD$ and BACKWARD$ will be in assembly-language and will be faster than the current Forth versions. In that case, the use of the HOF will be more efficient than the use of explicit DO loops. HOFs are all about information-hiding, which is always a good thing.
Section 2.3.) splitting strings around a delimiter
N$> ( count -- adr len ... ) \ string: z... --
This moves COUNT strings from the string-stack to the data-stack. It just calls $> as many times as COUNT specifies. This is primarily for use in conjunction with SPLITS$, which will be documented later.
<SPLIT$> ( delimiter left right -- split? ) \ string: a -- l r | l
This splits the string around the first DELIMITER char that it finds. The LEFT and RIGHT chars are for literal strings inside of the string. If the DELIMITER is inside of a literal string, it does not count as a delimiter. When the strings are split, the literal-string brackets LEFT and RIGHT are removed from the string when the L string is produced, and the delimiter DELIMITER is removed also. The R string is everything beyond the delimiter with nothing removed. The flag SPLIT? indicates if we found a delimiter and split the string, in which case both L and R strings are returned, or if we never found a delimiter in which case only the L string is returned.
This is an example of <SPLIT$> being used in interpretive mode. Here we are calling <SPLIT$> repeatedly until it returns a FALSE to indicate that it couldn't split the string.
s" programmer,<Aguilar,Hugh>,50" >$ ok
.s$
STRING STACK:
unique: |programmer,<Aguilar,Hugh>,50| ok
char , char < char > <split$> . -1 ok
.s$
STRING STACK:
unique: |<Aguilar,Hugh>,50|
unique: |programmer| ok
char , char < char > <split$> . -1 ok
.s$
STRING STACK:
unique: |50|
unique: |Aguilar,Hugh|
unique: |programmer| ok
char , char < char > <split$> . 0 ok
.s$
STRING STACK:
unique: |50|
unique: |Aguilar,Hugh|
unique: |programmer| ok
SPLIT$ ( -- split? ) \ string: a -- l r | l
This is just <SPLIT$> with the delimiter char being the comma and the left and right bracket chars both being the quotation mark. This is the most common format for database dumps into text files.
IS-SPLIT$ ( xt -- )
This sets what SPLIT$ does.
This is an example of IS-SPLIT$ being used. This is, in fact, how in STRING-STACK.4TH we set the default for what SPLIT$ does.
: comma-split$ ( -- split? ) \ string: a -- l r | a
[char] , [char] " [char] " <split$> ;
' comma-split$ is-split$
SPLITS$ ( -- count ) \ string: a -- x...
This calls SPLIT$ repeatedly, splitting the string into some number of strings. It returns COUNT to indicate how many strings were returned.
This is an example of SPLITS$ being used. We used S| rather than S" because we needed to have the " char inside of the string. SPLITS$ returns a 3 to indicate that it split the string into 3 pieces. The user should be aware that the top value of the string-stack is the rightmost piece. This is why, when we did .$ repeatedly, we got the strings printed from rightmost to leftmost. This may seem counter-intuitive to somebody who is not familiar with stacks.
s| programmer,"Aguilar,Hugh",50| >$ ok
.s$
STRING STACK:
unique: |programmer,"Aguilar,Hugh",50| ok
splits$ . 3 ok
.s$
STRING STACK:
unique: |50|
unique: |Aguilar,Hugh|
unique: |programmer| ok
.$ 50 ok
.$ Aguilar,Hugh ok
.$ programmer ok
This is an example (provided mostly for humor) of working around the supposedly counter-intuitive issue of the string-stack elements printing out backwards from their order in the original string.
s| programmer,"Aguilar,Hugh",50| >$ ok
reverse$ ok
.s$
STRING STACK:
unique: |05,"hguH,raliugA",remmargorp| ok
splits$ . 3 ok
.s$
STRING STACK:
unique: |remmargorp|
unique: |hguH,raliugA|
unique: |05| ok
reverse$ .$ programmer ok
reverse$ .$ Aguilar,Hugh ok
reverse$ .$ 50 ok
Getting serious again, this is an example of splitting a string and storing the fields in a struct.
0
d field .occupation
d field .emp-name
d field .age
constant employee
create me employee allot
s| programmer,"Aguilar,Hugh",50| >$
splits$ n$>
me .occupation 2!
me .emp-name 2!
me .age 2!
The user can also use SPLIT and COMBINE that are in LIST.4TH rather than have multiple strings on either the string-stack or the data-stack. That might be the easiest solution.
We also have these words:
<FAST-SPLIT$> ( delimiter -- split? ) \ string: a -- l r | l
This is like <SPLIT$> except that it doesn't use the left and right brackets, and it is much faster.
FAST-SPLIT$ ( -- split? ) \ string: a -- l r | l
This is <FAST-SPLIT$> with a BL delimiter. This is used primarily for splitting up words of text.
FAST-SPLITS$ ( -- count ) \ string: a -- x...
This calls FAST-SPLIT$ repeatedly, splitting the string into some number of strings. It returns COUNT to indicate how many strings were returned.
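Without bracket handling, a split reduces to one forward scan for the delimiter. The hypothetical data-stack sketch below splits at the first delimiter. The left part excludes the delimiter and the right part starts just after it, and the original pair comes back unchanged when there is no delimiter. Both parts point into the original string, which is the kind of derivative reuse that makes the fast split cheap. Whether the real FAST-SPLITS$ skips empty pieces between repeated delimiters is not stated above, so the sketch does not try to.

```forth
\ Hypothetical data-stack split at the first DELIM char (no brackets).
: fast-split-sketch ( adr len delim -- adrL lenL adrR lenR true | adr len false )
  >r 2dup                          \ adr len cur rem        R: delim
  begin dup 0> while
    over c@ r@ = if                \ delimiter found at cur
      r> drop rot drop             \ adr cur rem
      over 3 pick -                \ adr cur rem lenL
      rot rot 1 /string true exit  \ adr lenL cur+1 rem-1 true
    then 1 /string
  repeat 2drop r> drop false ;
```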
Chapter 3.) Maintainers' Guide
This chapter discusses the internal workings of the string-stack code for the benefit of anybody who wants to upgrade the package.
I haven't written this chapter yet and won't until the code has settled and isn't being upgraded anymore.