Author | Message

Subject: Re: string representation

[quote="Hishnik"][quote="HughAguilar"]I am primarily interested in my own Straight Forth design, but there seems to be no Russian interest in that.[/quote] 'Russian Forth Standard' is an idea that mimics 'adult' ANS-Forth. After years of blindly repeating existing approaches, people who are afraid to be truly independent simply copy whatever behavior they can see. If Forth Inc. has a Standard, they must have a Standard too. This is too easy and will not lead to any kind of 'commercial success' or 'hundreds of followers'. [/quote] Straight Forth is not mimicking ANS-Forth! ANS-Forth was a marketing gimmick from Forth Inc. --- it was an attempt by Elizabeth Rather to convince the world that Forth Inc. sets the standards for all Forth programmers, and that every Forth programmer must kneel for her.
It is not my goal with Straight Forth to force every Forth programmer to kneel for me. I don't want Forth programmers to kneel for anybody.
My goal is to allow Forth programmers to write Forth programs that are portable between different Forth systems from different vendors. My goal is to allow general-purpose code-libraries to be written that can be used by everybody. Certainly, having code-libraries available is necessary for programs to be written quickly. No employer has the time or the money to allow the employee to build every program from the ground up as if nothing similar had ever been done before. Most programs have a lot of similarity. A general-purpose code-library can be useful in many different programs, because they are all similar.
Forth programmers, both Russian and English-speaking, worry that I am trying to prevent innovation. I'm not trying to prevent innovation though (to see that crime being committed, go to the Forth-200x mailing-list). At least 90% of programs written for desktop-computers don't need to be written with innovation --- they need to be written with speed. Straight Forth is for these straight-forward programs that need to be written quickly. If you need to be innovative, then you can abandon Straight Forth and write non-standard Forth code specific to one particular compiler.
Straight Forth is for intermediate-level Forth programming. 90% of the time, an advanced Forth programmer can write his program using Straight Forth and using intermediate-level programming techniques. Just because you are advanced, doesn't mean that every program you write needs to be written with advanced-level programming techniques.
Maybe I will change the name. Instead of Straight Forth I will call it: Intermediate-Level Portable Forth. :wink:
Straight Forth is for 64-bit desktop computers. It is not for micro-controllers. Straight Forth is intended to support Forth cross-compilers that target micro-controllers, but Straight Forth itself does not run on micro-controllers. Programs on micro-controllers are not going to be portable because of I/O dependency, so there is no need for a Standard --- the purpose of a Standard is to allow code to be portable.
Most advanced-level programming is done on micro-controllers. If you want to be an advanced-level Forth programmer every day of the week, and you have too much pride to write an intermediate-level program, then Straight Forth is not for you --- focus on micro-controllers.

Posted: Wed Dec 19, 2018 00:09

Subject: Re: string representation

[quote="KPG"][quote="HughAguilar"]I am primarily interested in my own Straight Forth design, but there seems to be no Russian interest in that.[/quote] Old Straight Forth is here? [url=http://www.forth.org/novice.html]Novice Forth[/url] [/quote] No --- the novice package is written in ANS-Forth --- that has nothing to do with Straight Forth.
I have a new novice-package that includes more features: STRING-STACK.4TH, disambiguifiers, early-binding MACRO:, SYNONYM, <SWITCH, a merge-sort for lists, an OOP package, etc. It is significantly better than what I posted on forth.org in 2010. I haven't posted it publicly. Everybody hated the novice-package that I posted in 2010, and I was attacked on comp.lang.forth for it --- I have no further interest in supporting ANS-Forth.

Posted: Tue Dec 18, 2018 23:38

Subject: Re: string representation

[quote="HughAguilar"]I am primarily interested in my own Straight Forth design, but there seems to be no Russian interest in that.[/quote] Old Straight Forth is here? [url=http://www.forth.org/novice.html]Novice Forth[/url]
P.S. Attaching files to forum posts breaks from time to time. I have pointed this out more than once. :)

Posted: Thu Dec 06, 2018 13:28

Subject: Re: string representation

Almost every [b]Forther has their own Forth system.[/b] And each system has[b] its own ideology[/b] of construction and interaction. What does a standard lead to, then? What is the point of writing different systems if they are almost the same? Here we can draw a parallel with genetics. The greater the variety, the [b]greater the stability[/b] of the population. Conversely, reducing mutation makes the population weak. Loosely speaking, standards for Forth may reduce the applicability of Forth systems. In genetics, this would be the equivalent of [b]bad inbreeding[/b] (incest) and/or [b]genocide[/b].
About a stack of strings: it's useful, but the need for such a mechanism does not always arise. For example, a Forth server would benefit from a string stack and can do without floating point. The reverse is also possible.
In my Forth system (Nova) there are currently 2 "non-standard" useful words: SPLIT and LAST-CHAR. But [b]I need them[/b]; Forth systems can [b]live without them[/b]. And a standard is the imposition of "what is and what is not".

Posted: Thu Dec 06, 2018 10:48

Subject: Re: string representation

[quote="HughAguilar"]I am primarily interested in my own Straight Forth design, but there seems to be no Russian interest in that.[/quote] I'm interested in more Forths in the world... :) [quote="HughAguilar"]there is no group of Russians interested in a Russian Forth standard either[/quote] 'Russian Forth Standard' is an idea that mimics 'adult' ANS-Forth. After years of blindly repeating existing approaches, people who are afraid to be truly independent simply copy whatever behavior they can see. If Forth Inc. has a Standard, they must have a Standard too. This is too easy and will not lead to any kind of 'commercial success' or 'hundreds of followers'.
There is GOST 19.506, which specifies how a programming language description must be written. It is not a standard for any specific language but an official requirement for the documentation of any language. It is not complex and will not lead to acceptance by all. The bad idea behind a Russian Forth Standard is 'we need to issue it and everyone will be forced to use it'. That is no more than crap. Officially, a document issued in accordance with GOST 19.506 may be applied to software development, but a reference to ANS is useless in Russia and cannot count as a Standard reference. So there is no way to take over all Forthers :)
[quote="HughAguilar"]Are you telling me that it is easy to post files on this forum? I don't see a way to do it. Perhaps not all users are allowed to do this. [/quote]
Take a look at the "Выберите файл" (Choose file) and "Добавить файл" (Add file) buttons below the main text box where you type the message.

Posted: Thu Dec 06, 2018 09:50

Subject: Re: string representation

[quote="mOleg"] [quote="HughAguilar"]I can provide a copy of STRING-STACK.4TH to you if you want to make it part of your Russian Forth Standard.[/quote] Thank you very much for your detailed answer! Unfortunately, a Russian standard is not in sight (yet?): there is no interested group, and such things are not done single-handedly. But I would be very glad to include any interesting libraries in [url=http://fforum.winglion.ru/viewforum.php?f=25]my Forth system[/url], if, of course, you are interested 8) [/quote] If there is no group of Russians interested in a Russian Forth standard, then I won't bother to provide the source-code for my STRING-STACK.4TH package. In the past, I thought that donating code, such as my STRING-STACK.4TH package, to the ANS-Forth community was a good idea. Now I realize that I was just value-adding to a Forth Inc. product and getting nothing in return. The same would be true of donating my code to your personal Forth system, or anybody else's personal Forth system. I consider ANS-Forth to be a marketing gimmick of Forth Inc. --- it does not represent any significant portion of the Forth community --- most of the ANS-Forth promoters don't know how to program in Forth at all, so they aren't really a part of the Forth community.
I want to develop a Forth standard in competition with ANS-Forth and Forth-200x, both of which I consider to be a negative contribution to Forth. Truly, ANS-Forth is what killed widespread interest in Forth. In 1994 there was a mass exodus of Forth programmers who moved to C programming. I am primarily interested in my own Straight Forth design, but there seems to be no Russian interest in that. I would be okay with a Russian design if it were reasonably good (almost anything would be better than ANS-Forth), but you are telling me now that there is no group of Russians interested in a Russian Forth standard either. Victor__v seems to think that a Russian Forth Standard is "absolute evil" --- apparently, we can count him out...
[quote="mOleg"] [quote="HughAguilar"]I was going to attach the documentation as a file, but I don't see any way to attach files to posts in your forum, so I just put the document file inline:[/quote] тут так и делается, все правильно сделано. [/quote] This didn't translate into English very well --- your words didn't make any sense. Are you telling me that it is easy to post files on this forum? I don't see a way to do it. Perhaps not all users are allowed to do this.

Posted: Thu Dec 06, 2018 07:53

Subject: Re: string representation

[code]Russian Forth Standard[/code] Forth standard as a Russian post: absolutely evil.

Posted: Mon Dec 03, 2018 17:59

Subject: Re: string representation

[quote="mOleg"] [quote="HughAguilar"]The format of the strings can be abstracted away.[/quote] They can, but not in a standard (and this topic seems to be about a language standard), which means the question of storage cannot be sidestepped. [/quote] To some extent, the format of the strings can be abstracted away. The user doesn't directly access the characters using C@ or @ --- he uses FORWARD$ and BACKWARD$ to traverse the strings, obtaining one character at a time. That character can be defined as being UTF-32. This doesn't mean the string is UTF-32 though. The string is more likely UTF-8, which is more compact. The issue of storage is circumvented because the user has an API between him and the string --- that is information hiding!
I don't actually care if it is UTF-32 or one of the 8-bit extended-ASCII formats. My STRING-STACK.4TH provides a way to work with strings, but it is not dependent upon any particular format. UTF-32 is needed for languages such as Russian, Vietnamese, etc., that use alien alphabets. For English and Spanish, 8-bit extended-ASCII is adequate.
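A minimal sketch of the forward-traversal idea in standard Forth, assuming the stored text is well-formed UTF-8 and that one char occupies one byte. UTF8-LEN, LEAD-MASK, UTF8-NEXT and UTF8-COUNT are hypothetical names, not words from STRING-STACK.4TH: UTF8-NEXT decodes one code point into a full cell (so a UTF-32 sized value) and steps past its bytes, roughly the way FORWARD$ is said to hand out one character at a time while the storage stays UTF-8.

```forth
\ Sketch only: hypothetical words, not the STRING-STACK.4TH API.
\ Assumes well-formed UTF-8 and 1 char = 1 byte = 1 address unit.

: utf8-len ( lead -- n )        \ bytes in the sequence that starts with LEAD
  dup $80 < if drop 1 exit then
  dup $E0 < if drop 2 exit then
  $F0 < if 3 else 4 then ;

: lead-mask ( n -- mask )       \ payload bits kept from the lead byte
  dup 1 = if drop $7F else 1+ $FF swap rshift then ;

: utf8-next ( c-addr u -- c-addr' u' xc )
  \ Decode one code point from a non-empty string and step past it,
  \ in the spirit of the FORWARD$ traversal described above.
  over c@ utf8-len >r                   \ R: n = bytes in this sequence
  over c@ r@ lead-mask and              ( c-addr u xc )
  r@ 1 ?do                              \ fold in each continuation byte
    6 lshift  2 pick i + c@ $3F and or
  loop
  rot rot r> /string rot ;              ( c-addr' u' xc )

: utf8-count ( c-addr u -- n )  \ number of code points in the string
  0 >r
  begin dup 0> while utf8-next drop r> 1+ >r repeat
  2drop r> ;
```

A backward step would work the same way in reverse: move back over continuation bytes (bit pattern 10xxxxxx) until a lead byte is reached, then decode from there.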
[quote="mOleg"] [quote="HughAguilar"]I have written STRING-STACK.4TH that provides ANS-Forth with a string-stack.[/quote] Very interesting, but would your library suit a Forth kernel intended for a small microcontroller? [/quote] No, my STRING-STACK.4TH is not for a small micro-controller. It relies on having a heap, and small micro-controllers don't generally have a heap. By "small micro-controller" I'm thinking of an 8051 derivative with maybe 2KB of RAM at the most.
It might be useful on a large micro-controller that has enough memory to support a heap (maybe 8KB of RAM at a minimum, but more likely 64KB). An example would be a POS (point-of-sale) terminal or anything else with a keyboard and video display.
I wrote STRING-STACK.4TH for use in the Straight Forth standard, which is for 64-bit desktop-computers --- Straight Forth is not for micro-controllers at all --- my STRING-STACK.4TH is currently written in ANS-Forth and is just plain vanilla Forth without any tricky techniques.
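The thread only says that STRING-STACK.4TH keeps unique strings on the heap and frees them when they are consumed; that source was not posted. As a rough illustration of the basic mechanism, here is a small fixed-depth string stack in standard Forth where each entry is an address/count pair pointing at a private ALLOCATEd copy, and dropping an entry FREEs it. S-PUSH, S-TOP, S-DROP, S-CAT and the depth of 16 are hypothetical choices made for this sketch.

```forth
\ Sketch only: hypothetical words, not the STRING-STACK.4TH API.
\ Every entry is an (address, length) pair pointing at a private heap
\ copy, so dropping an entry frees it. No copy-on-write, no derivatives.

16 constant #strings
create sstack  #strings 2* cells allot     \ two cells per entry
variable sdepth  0 sdepth !

: s-slot ( i -- a-addr )  2* cells sstack + ;

: s-push ( c-addr u -- )            \ push a heap copy of the string
  sdepth @ #strings = abort" string stack overflow"
  dup 1 max allocate throw          ( c-addr u heap )
  dup >r swap dup >r move           \ copy chars; R: heap u
  r> r> swap  sdepth @ s-slot 2!  1 sdepth +! ;

: s-top ( -- c-addr u )             \ peek at the top string
  sdepth @ 0= abort" string stack underflow"
  sdepth @ 1- s-slot 2@ ;

: s-drop ( -- )                     \ pop the top string and free its copy
  sdepth @ 0= abort" string stack underflow"
  -1 sdepth +!  sdepth @ s-slot 2@ drop free throw ;

variable cat-buf
: s-cat ( -- )                      \ string stack: a b -- ab
  sdepth @ 2 < abort" string stack underflow"
  sdepth @ 2 - s-slot 2@  sdepth @ 1- s-slot 2@   ( a ua b ub )
  2 pick over + dup >r                          \ R: total length
  1 max allocate throw cat-buf !
  2swap dup >r cat-buf @ swap move              \ copy a; R: total ua
  cat-buf @ r> + swap move                      \ copy b right after a
  s-drop s-drop                                 \ free both inputs
  cat-buf @ r> sdepth @ s-slot 2!  1 sdepth +! ;
```

This version copies on every push, so it has none of the copy-on-write or derivative-string sharing described for STRING-STACK.4TH; it only shows how the caller can be spared explicit ALLOCATE/FREE calls.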

Posted: Mon Dec 03, 2018 05:25

Subject: Re: string representation

[quote="HughAguilar"]You are not asking the correct question[/quote] The question was about "basic string representation": how strings are held in memory.
Sorry, I have too little practice writing in English, so I will answer in Russian. (You can write in English.) [quote="HughAguilar"]The format of the strings can be abstracted away.[/quote] They can, but not in a standard (and this topic seems to be about a language standard), which means the question of storage cannot be sidestepped.
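The storage question can be made concrete with the layouts mOleg mentions elsewhere in this thread (AsciiZ and Pascal-style strings) plus the address/length pair that standard Forth words pass around. A small sketch in standard Forth, assuming one char per byte; GREETING, >ASCIIZ, ASCIIZ-LEN and ZBUF are hypothetical names. A counted string keeps its length in the first char, so with 8-bit chars it holds at most 255 characters, while an address/length pair carries a full-cell length.

```forth
\ Sketch: three in-memory string layouts, using standard Forth words.
\ Assumes 1 char = 1 byte = 1 address unit. Helper names are hypothetical.

\ 1. Counted string (Pascal style): a length char, then the chars.
\    With 8-bit chars the length is at most 255.
: greeting ( -- c-addr )  c" hello" ;

\ 2. Address/length pair: COUNT turns a counted string into one;
\    S" and TYPE use this form too, and the length is a full cell.

\ 3. Zero-terminated (AsciiZ), as C expects:
: >asciiz ( c-addr u buf -- buf )   \ copy U chars to BUF, append a 0 byte
  2dup + >r                         \ R: address just past the chars
  dup >r swap move                  \ R: end buf
  r> 0 r> c! ;

: asciiz-len ( addr -- u )          \ count chars up to the terminating 0
  0 begin 2dup + c@ while 1+ repeat nip ;

create zbuf 256 allot               \ 255 chars of a counted string + the 0
```

COUNT goes from a counted string to an address/length pair; the other direction, and the AsciiZ form, need a copy into a buffer with room for the extra byte.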
[quote="HughAguilar"]I have written STRING-STACK.4TH that provides ANS-Forth with a string-stack.[/quote] Very interesting, but would your library suit a Forth kernel intended for a small microcontroller? Still, I prefer to see such things as libraries that are loaded as needed. [quote="HughAguilar"]I can provide a copy of STRING-STACK.4TH to you if you want to make it part of your Russian Forth Standard.[/quote] Thank you very much for your detailed answer! Unfortunately, a Russian standard is not in sight (yet?): there is no interested group, and such things are not done single-handedly. But I would be very glad to include any interesting libraries in [url=http://fforum.winglion.ru/viewforum.php?f=25]my Forth system[/url], if, of course, you are interested 8)
[quote="HughAguilar"]I was going to attach the documentation as a file, but I don't see any way to attach files to posts in your forum, so I just put the document file inline:[/quote] That's how it's done here; you did everything right.
[quote="HughAguilar"]How many people can read my post in English?[/quote] I can read it freely, but writing an answer is too hard for me.

Posted: Sun Dec 02, 2018 19:52

Subject: Re: Re:

[quote="HughAguilar"] [quote="mOleg"]The question was misunderstood. By "string representation" I mean:
- how strings are stored in memory (AsciiZ, Pascal, etc.)
- what the maximum string length can be
- how to store Unicode strings
and other similar questions.[/quote] You are not asking the correct question. The format of the strings can be abstracted away. You should ask: "Where are strings held?"
I have written STRING-STACK.4TH that provides ANS-Forth with a string-stack. This has copy-on-write. Internally, the elements of the stack contain an address and a count. [/quote] An advantage of the string-stack is that unique strings are held in the heap. The user doesn't have to explicitly allocate memory for the string and later explicitly free that memory. Explicitly allocating and freeing memory, as done in C, is error-prone. My string package automatically handles the allocation and freeing of memory in the heap as necessary. The user doesn't have to bother with this.
My STRING-STACK.4TH package has been discussed several times on comp.lang.forth, each time getting a purely negative response. Here is the latest thread: https://groups.google.com/forum/#!topic/comp.lang.forth/EIsEEJ2SwFU Here is a quote from that thread:
On Friday, September 28, 2018 at 6:11:04 PM UTC-7, hughag...@gmail.com wrote:
> On Wednesday, September 26, 2018 at 1:17:58 PM UTC-7, Gerry Jackson wrote:
> > On 26/09/2018 19:16, hughaguilar96@gmail.com wrote:
> > > You have "a concatenation buffer" --- this is singular --- that means that you only have one place to store a string.
> >
> > Not so - see below - I probably should have worded that a bit better. I
> > get the feeling I need a lawyer to vet responses to you!
> >
> > > What you have is essentially the same as PAD except that it doesn't move around like PAD does --- you said earlier that moving around is the problem with PAD.
> > > Ultimately, your concatenation buffer has the same problems as PAD does. You are passing data between functions in a global variable, and you only have one.
> > > You can't hold a string in your concatenation buffer when calling other functions unless you are sure that they don't use the buffer for something else.
> > > This means that you can't have general-purpose code-libraries working with strings because the caller might also use the same buffer at the same time.
> >
> > Well if you'd read the documentation I provided a link to several months
> > ago you would have seen that the user can create new buffers of any size
> > as required. If a user is worried about something corrupting his buffer
> > he can create his own and switch that in and out when appropriate. So
> > other functions couldn't accidentally use it. Here's the link again:
> > http://www.qlikz.org/forth/regex.html
> >
> > And if switching buffers isn't good enough for you there are other
> > internal words available where a user can supply his own buffer e.g.
> > CB-CONCAT ( caddr u buf -- ). I haven't felt the need to publicise those
> > words - not that it matters as I've no evidence that anyone has
> > seriously used the package.
>
> This isn't really good enough for me.
>
> You said: "the user can create new buffers of any size as required."
> There are only two places to create these buffers: the dictionary or the heap.
> If they are in the dictionary, then the user has to know at compile-time what buffers will be needed and how big they need to be.
> If they are in the heap, then the user has to know when they are no longer needed in order to manually free them, lest he get a memory-leak.
> Both solutions are over-complicated, error-prone, and put an undue burden on the programmer (who should not need to worry about such low-level details).
> Also, in my novice-package, I have rewritten FREE so it will work on both memory-blocks in the heap and in the dictionary.
> It can tell the difference because there is a flag in front of each to indicate where it came from
> (in Straight Forth no flag will be required because the heap addresses are negative and the dictionary addresses are non-negative).
> This allows me to easily switch between storing structs in the heap at run-time or in the dictionary at compile-time.
> Code that uses the structs does not need to be modified because FREE and REALLOC etc. work either way.
> You don't have this capability. You would require changes all through the application program if the user makes a switch from the heap to the dictionary.
> One of the many failures of ANS-Forth was that it used a different mechanism for reserving memory in the heap and dictionary (ALLOCATE and ALLOT).
>
> I think my STRING-STACK.4TH is superior to your package. I have unique strings that are on the heap, but they are automatically freed when they are consumed.
> I also have derivative strings that are pointers into other strings (either unique strings or constant strings in the dictionary from S" in colon words).
> When a unique string gets freed, any derivatives it may have on the string-stack are first converted into uniques.
> All of this is taken care of in the background. The user can assume the same behavior as if all strings on the string-stack were unique.
>
> You said before that I: "lost the argument." I don't think this is true.
> You may have succeeded in answering my challenge for REPLACE$ and REPLACES$ but I think your design is going to fail in actual programs with a lot of strings.
> I think you are relying on the ANS-Forth cult, especially Alex McDonald, to support you without regard to the technical flaws of your design.
>
> I can add regexp to STRING-STACK.4TH but you can't add a string-stack to your package (except by copying my code).
>
> What exactly is your purpose in writing ANS-Forth code? Why are you value-adding to a Forth Inc. product when you don't work for Forth Inc.?
Above I mention REPLACE$ and REPLACES$, which were example functions I provided earlier. Here they are:
[code]
: replace$ ( -- change? )      \ string: a targ repl -- a | b
\ replace a TARG in A with REPL to return B
    -rot$ ddup$                \ string: -- repl a a targ
    infix$ if                  \ string: -- repl a infix
        extract$               \ string: -- repl prefix suffix
        rot$ swap$             \ string: -- prefix repl suffix
        +$ +$ true
    else                       \ string: -- repl a a
        drop$ nip$             \ string: -- a
        false then ;

: replaces$ { | change? -- change? }    \ string: a targ repl -- a | b
\ replace all TARGs in A with REPL to return B
    begin  swap$ ruck$ swap$ ruck$      \ string: targ repl str targ repl
        replace$ while                  \ string: targ repl str
        true to change?  -rot$
    repeat
    nip$ nip$  change? ;
[/code]
> s" Gerry is the um greatest um most loyal um ANS-Forth um programmer ever!" mut>$ s" um" mut>$ s" ahh" mut>$ ok
> replaces$ . -1 ok
> .$ Gerry is the ahh greatest ahh most loyal ahh ANS-Forth ahh programmer ever! ok
Posted: Fri Nov 30, 2018 22:13
Post subject: Re: Re:
HughAguilar wrote: The format of the strings can be abstracted away.
How many people can read my post in English? I used Google to translate the above sentence into Russian and then back into English, and I got this: "Format strings can be distracting." That was not what I said! lol It is possible that I'm wasting my time trying to show my string package to the Russians. You won't be able to read my documentation or my source-code. This discussion may never get past vague statements such as: "Strings are necessary." That is not worthwhile.
Posted: Fri Nov 30, 2018 02:19
Post subject: Re:
mOleg wrote: The question was misunderstood. By "the representation of strings" I mean:
- how they are stored in memory (AsciiZ, Pascal, etc.)
- what the maximum length of a string can be
- how to store Unicode strings
and other similar questions.
You are not asking the correct question. The format of the strings can be abstracted away. You should ask: "Where are strings held?"

I have written STRING-STACK.4TH, which provides ANS-Forth with a string-stack. This has copy-on-write. Internally, the elements of the stack contain an address and a count. There are two kinds of strings on the string-stack:
unique --- a unique string is in the heap
derivative --- a derivative string's address is inside of a unique string or a constant string

The user doesn't have to know which strings are unique and which are derivative. The user can write his code as if all the strings were unique. Internally, most of the strings are derivative. For example, DUP$ makes a derivative. Unique strings are only created when necessary. If a derivative is modified, it is first converted into a unique and then modified. If a unique is modified, all derivatives of that unique are first converted into unique strings so they don't get modified when the original unique string gets modified. When a string is consumed (for example, by .$, which types it out), it gets freed from the heap if it is a unique string (first any derivatives of that string are converted into unique strings).

The purpose of having derivative strings is to boost the speed. Working with derivatives is very fast compared to working with unique strings. We avoid allocating and freeing memory blocks on the heap, which is typically slow. We avoid copying blocks of memory, but instead only copy an address/count pair (in DUP$ etc.). I have a lot of support for pattern-matching of strings. Because of this, a lot of strings are derivatives --- this boosts the speed.

STRING-STACK.4TH would primarily be useful if all string operations were done on the string-stack. The user would not have an address/count pair on the data-stack. The user would not use C@ to access strings. Words like TYPE would become obsolete.
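The unique/derivative scheme described above is essentially shared views plus copy-on-write. As a rough illustration only, here is a small Python model of it; it is not the package's Forth code, and every name in it is hypothetical: push_const stands in for >$, dup for DUP$, upcase_top for a mutating word such as UCASE$, and type_top for .$. A bytes object plays the part of a constant S" string, a bytearray plays the part of a heap block, and the two counters merely count simulated ALLOCATE and FREE calls.

```python
# Hypothetical Python model of unique vs. derivative strings (not the package code).
class Item:
    """One string-stack entry: a view (buffer, start, length) plus a unique flag."""

    def __init__(self, buf, start, length, unique):
        self.buf = buf          # bytes = constant string, bytearray = simulated heap block
        self.start = start
        self.length = length
        self.unique = unique    # True if this entry owns its heap block

    def text(self):
        return bytes(self.buf[self.start:self.start + self.length]).decode("latin-1")


class StringStack:
    def __init__(self):
        self.items = []         # top of stack is the last item
        self.allocations = 0    # simulated ALLOCATE calls
        self.frees = 0          # simulated FREE calls

    def push_const(self, s):
        """Like >$ on a constant: the entry is a derivative, nothing is copied."""
        data = s.encode("latin-1")
        self.items.append(Item(data, 0, len(data), unique=False))

    def dup(self):
        """Like DUP$: only the (buffer, start, length) view is duplicated."""
        top = self.items[-1]
        self.items.append(Item(top.buf, top.start, top.length, unique=False))

    def swap(self):
        self.items[-1], self.items[-2] = self.items[-2], self.items[-1]

    def _make_unique(self, item):
        # Copy the viewed chars into a fresh simulated heap block owned by ITEM.
        item.buf = bytearray(item.buf[item.start:item.start + item.length])
        item.start = 0
        item.unique = True
        self.allocations += 1

    def _detach_derivatives(self, owner):
        # Every derivative that still views OWNER's block gets its own copy first.
        for item in self.items:
            if item is not owner and not item.unique and item.buf is owner.buf:
                self._make_unique(item)

    def upcase_top(self):
        """A mutating word (in the spirit of UCASE$) that must not affect other entries."""
        top = self.items[-1]
        if top.unique:
            self._detach_derivatives(top)
        else:
            self._make_unique(top)
        end = top.start + top.length
        top.buf[top.start:end] = top.buf[top.start:end].upper()

    def type_top(self):
        """Like .$: consume the top string, freeing it if it is unique."""
        top = self.items.pop()
        text = top.text()
        if top.unique:
            self._detach_derivatives(top)
            self.frees += 1
        return text


ss = StringStack()
ss.push_const("hello")
ss.dup()                       # two entries, still zero allocations
ss.upcase_top()                # the top derivative is copied, then modified
print(ss.type_top(), ss.type_top(), ss.allocations, ss.frees)  # HELLO hello 1 1
```

The push and the DUP cost no allocation at all; the single copy is made only when a derivative is about to be modified, which is the speed argument made in the post.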
Instead of TYPE, our .$ would be used, which assumes that the string is on the string-stack. Currently I have the following words that move strings to the string-stack:
>$ moves a constant string to the string-stack
MUT>$ moves a mutable string to the string-stack
HEAP>$ moves a string that is known to already be in the heap to the string-stack

The system would work a lot better if it were integrated into the Forth system (this would require the string-stack to become part of your Russian Forth Standard). For example, WORD would create a string on the string-stack rather than in a static buffer, and <# #> would create a string on the string-stack rather than in a static buffer.

Strings can be removed from the string-stack and moved to the data-stack. This is only done so the string can be stored somewhere, such as in a data-structure. This would never be done for the purpose of doing any operation to the string --- all operations on strings are done on the string-stack.

I can provide a copy of STRING-STACK.4TH to you if you want to make it part of your Russian Forth Standard. The Forth-200x committee is opposed to my STRING-STACK.4TH and will not accept it. So, forget about them! I'll give it to the Russians instead. If you are interested, I can provide the source-code. It is currently written in ANS-Forth and requires NOVICE.4TH to already be loaded. It should be easy to convert it to run on your Forth system.

I was going to attach the documentation as a file, but I don't see any way to attach files to posts in your forum, so I have put the document inline:
Code:
String-Stack documentation (written by Hugh Aguilar):
The string-stack code was inspired by Mark Wills' string-stack code in Turbo-Forth (he also has an ANS-Forth version). There are some differences, though. I had these goals:
1.) I wanted to bring ANS-Forth up to the same level as QBASIC in regard to string handling. String-stack has MID$ for extracting substrings and +$ for concatenating strings, as well as a lot of other functions. Like in QBASIC, the user doesn't have to worry about allocating and freeing memory for strings; this is done automatically.
2.) I wanted string-stack to be efficient. Assuming that ALLOC and DEALLOC are the speed-killers, these should be avoided as much as possible. Still, everything should behave as if every item on the string-stack were a unique memory-block in the heap.
3.) I am thinking about later on writing a program that translates Ido into natural languages such as Spanish and English. I wanted string-stack to be usable for parsing the Esperanto text --- FAST-SPLITS$ and the prefix and suffix functions were added specifically for this purpose.
STRING-STACK.4TH requires that NOVICE.4TH already be included.
We still don't have regular-expressions which are typical in post-QBASIC languages (PERL etc.). I might write a reg-exp in the future, but I'm not enthusiastic about it. A reg-exp is essentially a mini-interpreter of a hard-to-read language. I would rather write code in Forth that does the pattern matching. A regular-expression is succinct, with a single line of meta-text describing the pattern compared to a dozen Forth functions, but that regular expression is also hard to read (for me, anyway). Regular expressions also have limitations compared to Forth --- iteration is extremely primitive --- there are a lot of patterns that are impossible to describe with a regular expression that can easily be implemented with a short and simple Forth function.
This string-stack code is intended to obsolete the <CSTR stuff in NOVICE.4TH that I never liked. The <CSTR stuff is deprecated and may eventually be discarded. At this time however, the string-stack code uses <CSTR for the <SPLIT$> function and also relies on having S" and S| already available.
There are three chapters in this document:
1.) Basic Usage --- This chapter describes the functions that would be used the most. Reading this chapter is enough to get the user going, and many users will not have to read any further.
2.) Intermediate Usage --- This chapter describes the functions that would be used for more advanced usage. The user is discouraged from reading this chapter without first getting hands-on experience with the material in the first chapter --- the only way to learn how to program is to write programs, but reading ever yet more advanced material without putting to use what you've already read about just tends to clutter the mind.
3.) Maintainers' Guide --- This chapter should not be read by application programmers. This chapter is for anybody who is maintaining my code and needs to understand the internal workings in order to upgrade my code.
Chapter 1.) Basic Usage
Section 1.1.) stack manipulation
>$ ( adr len -- ) \ string: -- a This is how strings get put on the string-stack. The strings must be constants that won't change --- they are typically S" strings inside of colon words. STRING-STACK.4TH is mostly useful for pattern-matching and concatenating strings, so most of the strings are S" strings inside of colon words. In many cases these never get converted into unique strings --- remaining derivatives throughout their useful lives makes them quite fast.
HEAP>$ ( adr len -- ) \ string: -- a This is like >$ except that it is used when the string is known to already be on the heap. >$ makes a new copy of the string on the heap and so if the string was already on the heap then there would be a memory leak. For the most part, HEAP>$ is used when the string came from $> rather than from S" and hence is already on the heap.
MUT>$ ( adr len -- ) \ string: -- a This is like >$ except it is used for strings that are mutable, meaning that they might change --- they are typically <CSTR strings (when the <CSTR circular-buffer eventually wraps around the old strings get clobbered by new strings).
Note that NOVICE.4TH provides an S" that works in interpretive mode (ANS-Forth doesn't guarantee that S" works in interpretive mode and not all ANS-Forth compilers allow this). The NOVICE.4TH S" also can be used more than once (ANS-Forth doesn't guarantee this and some ANS-Forth implementations have each S" string over-writing the last one). The NOVICE.4TH package also provides S| that uses the | char as a delimiter rather than the " char, which is useful if you need the " char inside of your string (the word STRING allows any char to be used as the delimiter). S" and S| etc. use <CSTR internally, so MUT>$ should be used --- in practice, interpretive mode is mostly used for testing, so >$ is fine as <CSTR strings last long enough for testing purposes.
$> ( -- adr len ) \ string: a -- This is how strings get removed from the string-stack. These strings are in the heap so the address needs to be given to DEALLOC eventually or there will be a memory leak. This function is only used if the string needs to be stored in a data-structure of some kind. The user should not use $> and then consume the string with TYPE or whatever. The user should consume the string on the string-stack, with .$ instead of TYPE for example, so the string is automatically freed from the heap.
$>R ( -- ) \ string: a -- \ return: -- adr len This moves a string from the string-stack to the return-stack for temporary storage. This macro only works inside of colon definitions but not in interpretive mode.
R>$ ( -- ) \ string: -- a \ return: adr len -- This moves a string from the return-stack to the string-stack. This assumes that the string on the return-stack is on the heap. This should only be used for strings that came from $>r but should not be used for strings that came from S" and then got pushed onto the return-stack with 2>R because those strings would not be on the heap. This macro only works inside of colon definitions but not in interpretive mode.
DUP$ ( -- ) \ string: a -- a a
OVER$ ( -- ) \ string: a b -- a b a
ROVER$ ( -- ) \ string: a b c -- a b c a
TUCK$ ( -- ) \ string: a b -- b a b This is the same as: SWAP$ OVER$
RUCK$ ( -- ) \ string: a b c -- c a b c This is the same as: -ROT$ ROVER$
DDUP$ ( -- ) \ string: a b -- a a b \ "deep dup" This is the same as: OVER$ SWAP$ We don't want to use $>R DUP$ R>$ for this because $>R makes B unique.
2DUP$ ( -- ) \ string: a b -- a b a b This is the same as: OVER$ OVER$
3DUP$ ( -- ) \ string: a b c -- a b c a b c This is the same as: ROVER$ ROVER$ ROVER$
SWAP$ ( -- ) \ string: a b -- b a
ROT$ ( -- ) \ string: a b c -- b c a
-ROT$ ( -- ) \ string: a b c -- c a b
REV$ ( -- ) \ string: a b c -- c b a \ note that Mark Wills' package had REV$ doing what our REVERSE$ does
DROP$ ( -- ) \ string: a --
2DROP$ ( -- ) \ string: a b --
NIP$ ( -- ) \ string: a b -- b This is the same as: SWAP$ DROP$
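The documented equivalences for the stack words (TUCK$ = SWAP$ OVER$, RUCK$ = -ROT$ ROVER$, DDUP$ = OVER$ SWAP$, and so on) can be checked mechanically. This is a hypothetical Python sketch that treats the string-stack as a plain list with the top at the end; the helper names are stand-ins for the Forth words, and the unique/derivative bookkeeping is ignored.

```python
# Hypothetical model: the string-stack as a Python list; the top is the last item.
def over(s):  s.append(s[-2])              # OVER$   a b -- a b a
def rover(s): s.append(s[-3])              # ROVER$  a b c -- a b c a
def swap(s):  s[-1], s[-2] = s[-2], s[-1]  # SWAP$   a b -- b a
def rot(s):   s.append(s.pop(-3))          # ROT$    a b c -- b c a
def mrot(s):  s.insert(-2, s.pop())        # -ROT$   a b c -- c a b
def rev(s):   s[-1], s[-3] = s[-3], s[-1]  # REV$    a b c -- c b a
def drop(s):  s.pop()                      # DROP$   a --


def run(stack, *words):
    """Apply WORDS left to right to a copy of STACK, like a Forth phrase."""
    s = list(stack)
    for word in words:
        word(s)
    return s


# (before, phrase from the documentation, documented after-picture)
DOCUMENTED = [
    ("ab",  (swap, over),          "bab"),     # TUCK$  = SWAP$ OVER$
    ("abc", (mrot, rover),         "cabc"),    # RUCK$  = -ROT$ ROVER$
    ("ab",  (over, swap),          "aab"),     # DDUP$  = OVER$ SWAP$
    ("ab",  (over, over),          "abab"),    # 2DUP$  = OVER$ OVER$
    ("abc", (rover, rover, rover), "abcabc"),  # 3DUP$  = ROVER$ ROVER$ ROVER$
    ("ab",  (swap, drop),          "b"),       # NIP$   = SWAP$ DROP$
]

for before, phrase, after in DOCUMENTED:
    assert run(before, *phrase) == list(after), (before, after)
print("all documented equivalences hold")
```

The same list model also shows why DDUP$ is spelled OVER$ SWAP$ rather than through the return stack: the phrase only rearranges references, so nothing has to be copied.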
EMPTY$ ( -- ) \ string: x... -- This drops everything on the string-stack. This isn't very useful in programs --- it is somewhat useful when experimenting with the string-stack code in interpretive mode because you can get rid of all your experimentation results and start over.
.$ ( -- ) \ string: a -- This prints out the string similar to how dot prints out an integer.
:NAME$ ( wid -- ) \ string: a -- This is like colon except that it takes its name from the string-stack, and it puts the word in the wid word-list.
EVALUATE$ ( -- ) \ string: a -- This is like EVALUATE except that it takes its string from the string-stack.
CONST$ ( -- adr ) \ string: a -- This stores the string as a counted-string in the dictionary at HERE and returns the address. This aborts if the string is too big to become a counted-string.
VAL$ ( -- #invalid | n #single | d #double | #float ) \ float: -- f (if #FLOAT returned on data-stack) \ string: a -- This converts the string into a numeric value. If the string is not valid, the user gets #INVALID and can deal with the problem somehow.
.S$ ( -- ) \ string: x... -- x... This displays what is on the string-stack similar to how .S displays what is on the data-stack. This does not remove anything from the string-stack. This is useful for debugging programs, but the end-user of the programs should never see this display.
FIX\$ ( -- ) \ string: a -- b This converts a string with mark-up codes into a string with ascii equivalents. This is mostly useful for writing Spanish language text. The codes have a \ followed by a case-sensitive character. For Spanish, any vowel that can get an accent mark can be used to get that vowel accented. The \u or \U is the 'u' or 'U' with an accent mark, but the \d or \D is the 'u' or 'U' with a diaeresis mark. The \n or \N is the 'n' or 'N' with a tilde. Also, \? is the upside-down ? mark. For other languages, the \x## can be used, with ## being a hexadecimal number of the needed char. We also have the following:
    code   decimal value   meaning
    \@     7               bell
    \b     8               backspace
    \f     12              FF form-feed
    \l     10              LF line-feed
    \m     13 10           CR/LF
    \"     34              double-quote
    \r     13              CR carriage-return
    \t     9               HT horizontal-tab
    \v     11              VT vertical-tab
    \z     0               null
    \\     92              backslash
    \!     124             vertical bar
    \t     153             trademark
    \c     169             copyright
    \^     176             degree
    \+     177             +-
    \1     188             1/4
    \2     189             1/2
    \3     190             3/4
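As a rough sketch of the escape decoding described for FIX\$, here is a hypothetical Python model. It handles \x## and the listed codes whose values are given unambiguously, treating each value as a Latin-1 code point. The accented vowels, \d, \n, and \? are left out because the listing gives no code points for them, and \t is left out because the table assigns it two values. The function name fix_escapes and the choice to reject unknown codes with ValueError are assumptions of this sketch.

```python
# Hypothetical model of FIX\$: codes whose values are unambiguous, as Latin-1 code points.
FIX_CODES = {
    "@": "\x07",   # bell
    "b": "\x08",   # backspace
    "f": "\x0c",   # form-feed
    "l": "\n",     # line-feed
    "m": "\r\n",   # CR/LF
    '"': '"',      # double-quote
    "r": "\r",     # carriage-return
    "v": "\x0b",   # vertical-tab
    "z": "\x00",   # null
    "\\": "\\",    # backslash
    "!": "|",      # vertical bar
    "c": "\xa9",   # copyright (169)
    "^": "\xb0",   # degree (176)
    "+": "\xb1",   # plus-minus (177)
    "1": "\xbc",   # 1/4 (188)
    "2": "\xbd",   # 1/2 (189)
    "3": "\xbe",   # 3/4 (190)
}


def fix_escapes(s):
    """Decode backslash mark-up codes; unknown codes raise ValueError (an assumption)."""
    out = []
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(s[i])
            i += 1
            continue
        code = s[i + 1:i + 2]                  # "" if the backslash ends the string
        if code == "x":                        # \x## : two hexadecimal digits
            out.append(chr(int(s[i + 2:i + 4], 16)))
            i += 4
        elif code in FIX_CODES:
            out.append(FIX_CODES[code])
            i += 2
        else:
            raise ValueError("unsupported mark-up code: \\" + code)
    return "".join(out)


print(fix_escapes(r"a \! b \x41"))  # a | b A
```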
Section 1.2.) string manipulation
LEN$ ( -- length ) \ string: a -- This returns the length of the string on the data-stack. This consumes the string on the string-stack (Forth functions traditionally consume their arguments), so if this is used and you still need the string, then DUP$ or OVER$ or whatever should be used to keep a copy on the string-stack.
MID$ ( start-index length -- ) \ string: a -- b The B string is a substring in the middle of the A string.
ANTI-MID$ ( start-index length -- ) \ string: a -- b Returns the string with the middle part extracted (what MID$ would have returned is not returned, but instead the edge parts concatenated together are returned).
INNER$ ( start-index limit-index -- ) \ string: a -- b This is like MID$ except that it uses a LIMIT-INDEX rather than a LENGTH (this is somewhat like Mark Wills' MID$ and, to the best of my recollection, like the QBASIC MID$). Note that the LIMIT-INDEX is 1 beyond the middle-part that is kept (LIMIT-INDEX minus START-INDEX equals length).
ANTI-INNER$ ( start-index limit-index -- ) \ string: a -- b Returns the string with the middle part extracted (what INNER$ would have returned is not returned, but instead the edge parts concatenated together are returned). Note that the LIMIT-INDEX is 1 beyond the middle-part that is extracted (LIMIT-INDEX minus START-INDEX equals length).
LEFT$ ( length -- ) \ string: a -- b This provides a substring of length LENGTH from the left side of the string.
RIGHT$ ( length -- ) \ string: a -- b This provides a substring of length LENGTH from the right side of the string.
DISCARD-LEFT$ ( length -- ) \ string: a -- b This discards a substring of length LENGTH from the left side of the string.
DISCARD-RIGHT$ ( length -- ) \ string: a -- b This discards a substring of length LENGTH from the right side of the string.
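The eight substring words above map directly onto slicing. This hypothetical Python sketch assumes 0-based START-INDEX values (the BACKWARD$ example in Chapter 2 treats index 0 as the first char) and arguments that stay within the string; what the real words do with out-of-range arguments is not described in the documentation, and the function names are illustrative stand-ins.

```python
# Hypothetical model of the substring words, assuming 0-based indices.
def mid(a, start, length):        # MID$
    return a[start:start + length]

def anti_mid(a, start, length):   # ANTI-MID$: the two edges, glued together
    return a[:start] + a[start + length:]

def inner(a, start, limit):       # INNER$: LIMIT is one past the last char kept
    return a[start:limit]

def anti_inner(a, start, limit):  # ANTI-INNER$
    return a[:start] + a[limit:]

def left(a, n):                   # LEFT$
    return a[:n]

def right(a, n):                  # RIGHT$ (len(a) - n avoids the a[-0:] pitfall)
    return a[len(a) - n:]

def discard_left(a, n):           # DISCARD-LEFT$
    return a[n:]

def discard_right(a, n):          # DISCARD-RIGHT$
    return a[:len(a) - n]


s = "Hello, world"
# INNER$ with LIMIT is MID$ with LENGTH = LIMIT - START.
assert inner(s, 7, 12) == mid(s, 7, 12 - 7) == "world"
print(anti_mid(s, 5, 2), left(s, 5), right(s, 5))   # Helloworld Hello world
```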
FILL$ ( length char -- ) \ string: -- a This produces a string filled with CHAR of length LENGTH.
BLANK$ ( length -- ) \ string: -- a This produces a string filled with blanks of length LENGTH.
LPAD$ ( length -- ) \ string: a -- b This pads the string with blanks on the left side so the total length is LENGTH --- if the length of A is already LENGTH or more, nothing is done.
RPAD$ ( length -- ) \ string: a -- b This pads the string with blanks on the right side so the total length is LENGTH --- if the length of A is already LENGTH or more, nothing is done.
LTRIM$ ( -- ) \ string: a -- b This trims the whitespace from the left side of the string.
RTRIM$ ( -- ) \ string: a -- b This trims the whitespace from the right side of the string.
TRIM$ ( -- ) \ string: a -- b This trims the whitespace from the left and right sides of the string.
BLACKEN$ ( -- ) \ string: a -- b This removes all the whitespace from the entire string.
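A hypothetical Python sketch of the padding and trimming words, using the WHITE? rule from Section 2.1 (any char whose code is 32 or less counts as whitespace) rather than Python's own str.strip(), which uses a different notion of whitespace. LPAD$ and RPAD$ are modelled as leaving strings that are already LENGTH chars or longer unchanged; the names are illustrative stand-ins.

```python
# Hypothetical model of the padding and trimming words.
def is_white(c):        # WHITE?: a char whose code is 32 or less
    return ord(c) <= 32

def lpad(a, n):         # LPAD$: blanks on the left up to N; longer strings unchanged
    return " " * (n - len(a)) + a

def rpad(a, n):         # RPAD$: blanks on the right up to N
    return a + " " * (n - len(a))

def ltrim(a):           # LTRIM$
    i = 0
    while i < len(a) and is_white(a[i]):
        i += 1
    return a[i:]

def rtrim(a):           # RTRIM$
    j = len(a)
    while j > 0 and is_white(a[j - 1]):
        j -= 1
    return a[:j]

def trim(a):            # TRIM$
    return ltrim(rtrim(a))

def blacken(a):         # BLACKEN$: remove every whitespace char
    return "".join(c for c in a if not is_white(c))


print(repr(lpad("42", 5)), repr(rpad("42", 5)), repr(lpad("123456", 3)))  # '   42' '42   ' '123456'
```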
Section 1.3.) searching and comparing
COMPARE$ ( -- -1|0|1 ) \ string: a b -- This is like COMPARE except for the string-stack.
ICOMPARE$ ( -- -1|0|1 ) \ string: a b -- This is like COMPARE$ except case-insensitive.
=$ ( -- equal? ) \ string: a b -- This compares the strings for equality. It is faster than COMPARE$ when only an equality comparison is needed.
I=$ ( -- equal? ) \ string: a b -- This is like =$ except case-insensitive.
FINDC$ ( char -- index|-1 ) \ string: a -- This finds a char in the string, or returns -1 if not found.
IFINDC$ ( char -- index|-1 ) \ string: a -- Like FINDC$ except case-insensitive.
FIND$ ( -- index|-1 ) \ string: a b -- This finds the A string in the B string, or returns -1 if not found.
IFIND$ ( -- index|-1 ) \ string: a b -- Like FIND$ except case-insensitive.
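A hypothetical Python sketch of the comparison and search results. It assumes COMPARE$ orders strings char by char in code-point order, with a shorter prefix sorting first, the same as ANS COMPARE. ICOMPARE$ is modelled by lower-casing both strings first, which is an assumption because the documentation does not say which case mapping the package uses. The argument order for find follows the FIND$ description: it looks for A inside B.

```python
# Hypothetical model of the comparing and searching words.
def compare(a, b):      # COMPARE$: -1, 0 or 1, like ANS COMPARE
    return (a > b) - (a < b)

def icompare(a, b):     # ICOMPARE$ (assumed: fold both strings to lower case first)
    return compare(a.lower(), b.lower())

def findc(a, ch):       # FINDC$: index of CH in A, or -1
    return a.find(ch)

def find(a, b):         # FIND$: index of the A string inside the B string, or -1
    return b.find(a)


print(compare("abc", "abd"), compare("ab", "abc"), icompare("ABC", "abc"))  # -1 -1 0
print(find("lo", "hello"), find("xyz", "hello"), findc("hello", "l"))       # 3 -1 2
```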
Section 1.4.) prefixes and suffixes and infixes, oh my!
PREFIX$ ( -- found? ) \ string: a b -- a | c This determines if B is a prefix of A. If PREFIX$ returns true, then it returns C which is the prefix inside of A (it is a derivative).
IPREFIX$ ( -- found? ) \ string: a b -- a | c This is like PREFIX$ except case-insensitive.
SUFFIX$ ( -- found? ) \ string: a b -- a | c This determines if B is a suffix of A. If SUFFIX$ returns true, then it returns C which is the suffix inside of A (it is a derivative).
ISUFFIX$ ( -- found? ) \ string: a b -- a | c This is like SUFFIX$ except case-insensitive.
INFIX$ ( -- found? ) \ string: a b -- a | c This determines if B is an infix of A. If INFIX$ returns true, then it returns C which is the infix inside of A (it is a derivative).
IINFIX$ ( -- found? ) \ string: a b -- a | c This is like INFIX$ except case-insensitive.
EXTRACT$ ( -- ) \ string: a b -- c d This requires B to be a derivative inside of A (it is also okay for B to be an empty unique). EXTRACT$ removes the B string from the A string and returns the prefix (the C string) and the suffix (the D string). This should only be used on the results returned by: INFIX$ PREFIX$ SUFFIX$ IINFIX$ IPREFIX$ or ISUFFIX$
Note that EXTRACT$ is the only function we have that requires the parameters to be derived one from the other. All of our other functions work on either unique or derivative strings. EXTRACT$ is context-sensitive, in that it is supposed to be used after certain other functions. See also ANTI-MID$, which uses EXTRACT$ internally.
REPLACE$ ( -- change? ) \ string: a targ repl -- str This replaces the first occurrence of the TARG substring in A with the REPL string. The STR returned may be A if no change was made. The flag CHANGE? indicates if any change was made.
REPLACES$ ( -- change? ) \ string: a targ repl -- str This replaces all of the TARG substrings in A with REPL strings. The STR returned may be A if no changes were made. The flag CHANGE? indicates if any changes were made.
IREPLACE$ ( -- change? ) \ string: a targ repl -- str This is like REPLACE$ except case-insensitive.
IREPLACES$ ( -- change? ) \ string: a targ repl -- str This is like REPLACES$ except case-insensitive.
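The INFIX$ / EXTRACT$ / concatenate sequence that REPLACE$ performs, and the loop in REPLACES$, can be sketched in Python as follows. This is a hypothetical model, not the package code: a (start, end) pair stands in for the derivative string that INFIX$ returns. Because each pass searches the string from the beginning again, this loop never terminates if REPL contains TARG or if TARG is empty; the Forth REPLACES$ listed in the thread appears to share this behaviour, whereas Python's str.replace does not.

```python
# Hypothetical model of INFIX$, EXTRACT$, REPLACE$ and REPLACES$.
def infix(a, targ):
    """INFIX$: (start, end) of the first TARG inside A, or None if absent."""
    i = a.find(targ)
    return None if i < 0 else (i, i + len(targ))

def extract(a, span):
    """EXTRACT$: remove the found span from A, returning (prefix, suffix)."""
    start, end = span
    return a[:start], a[end:]

def replace_first(a, targ, repl):
    """REPLACE$: returns (new string, change?)."""
    span = infix(a, targ)
    if span is None:
        return a, False
    prefix, suffix = extract(a, span)
    return prefix + repl + suffix, True

def replace_all(a, targ, repl):
    """REPLACES$: repeat REPLACE$ on its own result until nothing changes."""
    changed = False
    while True:
        a, change = replace_first(a, targ, repl)
        if not change:
            return a, changed
        changed = True


print(replace_all("one um two um three", "um", "ahh"))  # ('one ahh two ahh three', True)
```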
Chapter 2.) Intermediate Usage
Section 2.1.) these functions aren't very useful, but they are documented anyway just in case --- the reader should just skim over this section
DEPTH$ ( -- depth ) \ string: x... -- x... This provides the depth of the string-stack. I can't think of any reason why anybody would need this.
REVERSE$ ( -- ) \ string: a -- b This reverses the characters in the string. I can't think of any reason why anybody would need this. Note that in Mark Wills' package this was called REV$, but we are using the name REV$ for something else now.
UCASE$ ( -- ) \ string: a -- b This upper-cases the characters in the string.
LCASE$ ( -- ) \ string: a -- b This lower-cases the characters in the string.
WHITE? ( char -- flag? ) This checks if the char is white-space --- that is, if it is <= 32.
NONWHITE? ( char -- flag? ) This checks if the char is not white-space.
CHAR-UPPER ( charA -- charB ) This upper-cases the char.
CHAR-LOWER ( charA -- charB ) This lower-cases the char.
BLACKEN ( adr len -- adr new-len ) This removes all the whitespace from the string (not on the string-stack).
UPPER ( adr len -- ) This upper-cases a string (not on the string-stack).
LOWER ( adr len -- ) This lower-cases a string (not on the string-stack).
STR= ( adrA lenA adrB lenB -- flag ) This compares strings for equality.
ISTR= ( adrA lenA adrB lenB -- flag ) This is like STR= except case-insensitive.
ICOMPARE ( adrA lenA adrB lenB -- -1|0|1 ) This is like COMPARE except case-insensitive.
Section 2.2) traversing strings
FORWARD$ ( xt -- index | -1 ) \ string: a -- a Traverses the string from front to back, executing XT for every char in the string. The XT function should have a stack-picture: ( i*x char-adr -- j*x done? ) The XT function returns a flag indicating if the traversal is done or not. If the flag is true, then FORWARD$ stops traversing and returns the index of the char where the traversal stopped. If FORWARD$ traverses the entire string without being stopped, it returns -1. Note that FORWARD$ does not consume its argument on the string-stack as is traditionally done.
This is an example of FORWARD$ being used. The NIP gets rid of the CHAR that is still on the stack (the <FINDC$> left it there every time).
    : <findc$> ( char adr -- char done? )   \ string: a --
        c@ over = ;

    : findc$ ( char -- index | -1 )          \ string: a --
        ['] <findc$> forward$ nip drop$ ;
BACKWARD$ ( xt -- index | -1 ) \ string: a -- a Like FORWARD$ except that it traverses the string from back to front.
This is an example of BACKWARD$ being used. Note that -1 is not just a flag indicating that we didn't find a char past the white, but is also the index past the white that we found (we found white all the way to index 0). We add 1 to the index past the white to get the length of the good stuff below the white.
    : <trim$> ( char-adr -- done? )
        c@ nonwhite? ;

    : rtrim$ ( -- )                  \ string: a -- b
        ['] <trim$> backward$        \ -- index-past-white
        1+                           \ -- how-many-keepers
        left$ ;
PREP-MUTATION ( -- ) If FORWARD$ or BACKWARD$ are used to mutate a string, PREP-MUTATION should first be called. If FORWARD$ or BACKWARD$ are just being used to examine the string, then PREP-MUTATION should not be called.
This is an example of PREP-MUTATION being used. Unlike the previous examples of FORWARD$ and BACKWARD$ that just examined the string, in this example we are mutating (modifying) the string, so we need PREP-MUTATION. The FALSE in <UCASE$> indicates that we aren't done, because we always go all the way through. The DROP in UCASE$ gets rid of the -1 that FORWARD$ returns. We could have used either FORWARD$ or BACKWARD$ in UCASE$.
    : <ucase$> ( char-adr -- done? )
        dup c@ char-upper swap c! false ;

    : ucase$ ( -- )                  \ string: a -- b
        prep-mutation
        ['] <ucase$> forward$ drop ;
The above example is how I originally wrote UCASE$, but I have a more efficient version now that uses a DO loop explicitly. The user should still write code like this, with FORWARD$ or BACKWARD$, rather than with explicit DO loops, because this is the idiomatic way to use the string-stack package, even if it is slightly less efficient. As a general rule, the use of a HOF (higher-order function) such as FORWARD$ reduces bugs because explicit iteration is the primary source of bugs in any program. Also, I may later upgrade the string-stack package to be mostly assembly-language. If I do this, then FORWARD$ and BACKWARD$ will be in assembly-language and will be faster than the current Forth versions. In that case, the use of the HOF will be more efficient than the use of explicit DO loops. HOFs are all about information-hiding, which is always a good thing.
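The higher-order-function pattern from this section can be sketched in Python: the loop is written once in forward and backward, and findc, rtrim and ucase supply only the per-char test. This is a hypothetical model rather than the package's code: an index stands in for the char address the Forth XT receives, and copying the string into a list stands in for PREP-MUTATION.

```python
# Hypothetical model of the FORWARD$/BACKWARD$ pattern (not the package code).
def forward(a, visit):
    """FORWARD$: call VISIT(index) front to back; return the index where it said done, else -1."""
    for i in range(len(a)):
        if visit(i):
            return i
    return -1


def backward(a, visit):
    """BACKWARD$: the same, but back to front."""
    for i in range(len(a) - 1, -1, -1):
        if visit(i):
            return i
    return -1


def findc(a, ch):
    return forward(a, lambda i: a[i] == ch)


def rtrim(a):
    # Index of the last non-white char (-1 if none), plus 1, is how many chars to keep.
    keep = backward(a, lambda i: ord(a[i]) > 32) + 1
    return a[:keep]


def ucase(a):
    buf = list(a)            # private copy, playing the role of PREP-MUTATION

    def visit(i):
        buf[i] = buf[i].upper()
        return False         # never done: every char is visited

    forward(a, visit)
    return "".join(buf)


print(findc("hello", "l"), repr(rtrim("ab  \t")), ucase("abc"))  # 2 'ab' ABC
```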
Section 2.3.) splitting strings around a delimiter
N$> ( count -- adr len ... ) \ string: z... --
This moves COUNT strings from the string-stack to the data-stack. It just calls $> COUNT times. This is primarily for use in conjunction with SPLITS$, which is documented below.
<SPLIT$> ( delimiter left right -- split? ) \ string: a -- l r | l
This splits the string around the first DELIMITER char that it finds. The LEFT and RIGHT chars bracket literal strings inside the string. A DELIMITER inside a literal string does not count as a delimiter. When the string is split, the L string is everything before the delimiter, with the LEFT and RIGHT brackets removed. The delimiter itself is also removed. The R string is everything after the delimiter, with nothing removed. The flag SPLIT? is true if we found a delimiter and split the string, in which case both L and R are returned. It is false if no delimiter was found, in which case only L is returned.
This is an example of <SPLIT$> being used in interpretive mode. Here we are calling <SPLIT$> repeatedly until it returns a FALSE to indicate that it couldn't split the string.

s" programmer,<Aguilar,Hugh>,50" >$ ok
.s$
STRING STACK:
unique: |programmer,<Aguilar,Hugh>,50|
ok
char , char < char > <split$> . -1 ok
.s$
STRING STACK:
unique: |<Aguilar,Hugh>,50|
unique: |programmer|
ok
char , char < char > <split$> . -1 ok
.s$
STRING STACK:
unique: |50|
unique: |Aguilar,Hugh|
unique: |programmer|
ok
char , char < char > <split$> . 0 ok
.s$
STRING STACK:
unique: |50|
unique: |Aguilar,Hugh|
unique: |programmer|
ok
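The bracket rules of <SPLIT$> can be modelled in a few lines of Python. This is a sketch of the documented behaviour, not the Forth implementation. It assumes brackets do not nest, which the documentation doesn't address. replay() repeats the interpretive session above.

```python
# Python model of <SPLIT$> as documented (not the package's code).
# Assumption: brackets do not nest; the docs don't say what happens if they do.

def split_bracketed(s, delimiter, left, right):
    """Return (l, r) if s was split at a delimiter outside left...right,
    else (l, None). Bracket chars are removed from l; r is left untouched."""
    kept = []
    inside = False
    for i, ch in enumerate(s):
        if inside:
            if ch == right:
                inside = False          # closing bracket: drop it
            else:
                kept.append(ch)         # a delimiter here is just data
        elif ch == left:
            inside = True               # opening bracket: drop it
        elif ch == delimiter:
            return "".join(kept), s[i + 1:]
        else:
            kept.append(ch)
    return "".join(kept), None

def replay(s):
    """Repeat the interpretive session: split the top string until it fails."""
    stack = [s]                         # bottom ... top
    while True:
        l, r = split_bracketed(stack.pop(), ",", "<", ">")
        stack.append(l)
        if r is None:
            return stack
        stack.append(r)

# .s$ lists the top first, so print the stack reversed.
print(list(reversed(replay("programmer,<Aguilar,Hugh>,50"))))
# ['50', 'Aguilar,Hugh', 'programmer']
```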
SPLIT$ ( -- split? ) \ string: a -- l r | l
By default, this is just <SPLIT$> with the comma as the delimiter char and the quotation mark as both the left and right bracket chars. This is the most common format for database dumps into text files (comma-separated values).
IS-SPLIT$ ( xt -- )
This sets the function that SPLIT$ executes.
This is an example of IS-SPLIT$ being used. This is, in fact, how STRING-STACK.4TH sets the default for SPLIT$.

: comma-split$ ( -- split? )    \ string: a -- l r | l
    [char] , [char] " [char] " <split$> ;

' comma-split$ is-split$
SPLITS$ ( -- count ) \ string: a -- x...
This calls SPLIT$ repeatedly until it fails, splitting the string into some number of strings. It returns COUNT to indicate how many strings were produced.
This is an example of SPLITS$ being used. We used S| rather than S" because we needed to have the " char inside of the string. SPLITS$ returns a 3 to indicate that it split the string into 3 pieces. The user should be aware that the top value of the string-stack is the rightmost piece. This is why, when we did .$ repeatedly, we got the strings printed from rightmost to leftmost. This may seem counter-intuitive to somebody who is not familiar with stacks.

s| programmer,"Aguilar,Hugh",50| >$ ok
.s$
STRING STACK:
unique: |programmer,"Aguilar,Hugh",50|
ok
splits$ . 3 ok
.s$
STRING STACK:
unique: |50|
unique: |Aguilar,Hugh|
unique: |programmer|
ok
.$ 50 ok
.$ Aguilar,Hugh ok
.$ programmer ok
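Here is a Python sketch of SPLITS$ driving whatever splitter IS-SPLIT$ installed. current_split, is_split and comma_split are hypothetical names standing in for the deferred SPLIT$, IS-SPLIT$ and COMMA-SPLIT$. The count starts at 1 because a string that cannot be split still leaves one piece.

```python
# Python model of SPLITS$ and the IS-SPLIT$ hook (an illustration only).
# current_split plays the role of the deferred SPLIT$; names are hypothetical.

def comma_split(s):
    """Like COMMA-SPLIT$: ',' delimiter, '"' as both brackets."""
    kept, inside = [], False
    for i, ch in enumerate(s):
        if ch == '"':
            inside = not inside         # quote chars are dropped from l
        elif ch == "," and not inside:
            return "".join(kept), s[i + 1:]
        else:
            kept.append(ch)
    return "".join(kept), None

current_split = comma_split

def is_split(fn):
    """Like IS-SPLIT$: choose what SPLIT$ does."""
    global current_split
    current_split = fn

def splits(stack):
    """Like SPLITS$: split the top string until SPLIT$ fails.
    The pieces stay on the stack, rightmost on top; return how many."""
    count = 1
    while True:
        l, r = current_split(stack.pop())
        stack.append(l)
        if r is None:
            return count
        stack.append(r)
        count += 1

stack = ['programmer,"Aguilar,Hugh",50']
print(splits(stack))                    # 3
while stack:
    print(stack.pop())                  # 50, then Aguilar,Hugh, then programmer
```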
This is an example (provided mostly for humor) of working around the supposedly counter-intuitive issue of the string-stack elements printing out backwards from their order in the original string.

s| programmer,"Aguilar,Hugh",50| >$ ok
reverse$ ok
.s$
STRING STACK:
unique: |05,"hguH,raliugA",remmargorp|
ok
splits$ . 3 ok
.s$
STRING STACK:
unique: |remmargorp|
unique: |hguH,raliugA|
unique: |05|
ok
reverse$ .$ programmer ok
reverse$ .$ Aguilar,Hugh ok
reverse$ .$ 50 ok
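Getting serious again, this is an example of splitting a string and storing the fields in a struct.

0
    d field .occupation
    d field .emp-name
    d field .age
constant employee

create me  employee allot

s| programmer,"Aguilar,Hugh",50| >$ splits$ n$>
me .occupation 2!
me .emp-name 2!
me .age 2!

N$> pops the top string first, so it reverses the order. The leftmost field (programmer) ends up on top of the data-stack, which is why the 2! stores go in left-to-right field order. These strings are now in the heap, so their addresses must eventually be given to DEALLOC.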
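The ordering trick in the struct example is easier to see with plain lists. This Python sketch is an illustration only. Each string stands in for an adr/len pair, and n_from is a hypothetical name for N$>. The string-stack starts out holding the three fields as SPLITS$ leaves them.

```python
# Python model of N$> followed by the three 2! stores (illustration only).
# Each Python string stands in for an adr/len pair.

def n_from(string_stack, data_stack, count):
    """Like N$>: do $> COUNT times, moving the top string to the data-stack."""
    for _ in range(count):
        data_stack.append(string_stack.pop())

string_stack = ["programmer", "Aguilar,Hugh", "50"]  # after SPLITS$, top is "50"
data_stack = []
n_from(string_stack, data_stack, 3)
print(data_stack[-1])                    # programmer: the leftmost field is on top

me = {}
for field in ("occupation", "emp_name", "age"):   # me .occupation 2! etc.
    me[field] = data_stack.pop()
print(me)
```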
The user can also use SPLIT and COMBINE that are in LIST.4TH rather than have multiple strings on either the string-stack or the data-stack. That might be the easiest solution.
We also have these words:
<FAST-SPLIT$> ( delimiter -- split? ) \ string: a -- l r | l
This is like <SPLIT$> except that it doesn't use the left and right brackets, and it is much faster.
FAST-SPLIT$ ( -- split? ) \ string: a -- l r | l
This is <FAST-SPLIT$> with BL as the delimiter. It is used primarily for splitting text into words.
FAST-SPLITS$ ( -- count ) \ string: a -- x...
This calls FAST-SPLIT$ repeatedly, splitting the string into some number of strings. It returns COUNT to indicate how many strings were produced.
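A Python sketch of the fast splitters follows. It assumes FAST-SPLITS$ works like SPLITS$, with FAST-SPLIT$ as the splitter. The documentation doesn't say how runs of several blanks are treated. This model would yield empty pieces for them, so it may differ from the real words in that case.

```python
# Python model of <FAST-SPLIT$>, FAST-SPLIT$ and FAST-SPLITS$ (illustration).
# The docs don't say how runs of blanks are handled; this model would
# produce empty pieces for them, which may differ from the real words.

def fast_split(s, delimiter=" "):
    """Split at the first delimiter, with no bracket handling."""
    i = s.find(delimiter)
    if i < 0:
        return s, None
    return s[:i], s[i + 1:]

def fast_splits(stack):
    """Split the top string on blanks until it fails; return the count."""
    count = 1
    while True:
        l, r = fast_split(stack.pop())
        stack.append(l)
        if r is None:
            return count
        stack.append(r)
        count += 1

words = ["la bona hundo"]
print(fast_splits(words), words)         # 3 ['la', 'bona', 'hundo']
print(fast_split('"x,y"', ","))          # ('"x', 'y"'): quotes are not special
```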
Chapter 3.) Maintainers' Guide
This chapter discusses the internal workings of the string-stack code for the benefit of anybody who wants to upgrade the package.
I haven't written this chapter yet and won't until the code has settled and isn't being upgraded anymore.
[quote="mOleg"]вопрос понят не правильно. под представлением строк понимается то: - как они хранятся в памяти, (AsciiZ, Pascal, др.) - какой может быть максимальная длина строки - как хранить unicode строки и прочие подобные вопросы. [/quote] You are not asking the correct question. The format of the strings can be abstracted away. You should ask: "Where are strings held?"
I have written STRING-STACK.4TH that provides ANS-Forth with a string-stack. This has copy-on-write. Internally, the elements of the stack contain an address and a count. There are two kinds of strings on the string-stack: unique --- a unique string is in the heap derivative --- a derivative string's address is inside of a unique string or a constant string
The user doesn't have to know which strings are unique and which are derivative. The user can write his code as if all the strings were unique. Internally, most of the strings are derivative. For example, DUP$ makes a derivative. Unique strings are only created when necessary. If a derivative is modified, it is first converted into a unique and then modified. If a unique is modified, all derivatives of that unique are first converted into unique strings so they don't get modified when the original unique string gets modified. When a string is consumed (for example, by .$ that types it out), it gets freed from the heap if it is a unique string (first any derivatives of that string are converted into unique strings).
The purpose of having derivative strings is to boost the speed. Working with derivatives is very fast compared to working with unique strings. We avoid allocating and freeing memory blocks on the heap, which is typically slow. We avoid copying blocks of memory, but instead only copy an address/count pair (in DUP$ etc.). I have a lot of support for pattern-matching of strings. Because of this, a lot of strings are derivatives --- this boosts the speed.
STRING-STACK.4TH would primarily be useful if all string operations were done on the string-stack. The user would not have an address/count pair on the data-stack. The user would not use C@ to access strings. Words like TYPE would become obsolete. Our .$ would be used instead, which assumes the string to be on the string-stack.
Currently I have the following that move strings to the string-stack. >$ moves a constant string to the string-stack MUT>$ moves a mutatable string to the string-stack HEAP>$ moves a string that is known to already be in the heap to the string-stack
The system would work a lot better if it were integrated into the Forth system (this would require the string-stack to become part of your Russian Forth Standard). For example, WORD would create a string on the string-stack rather than in a static buffer. <# #> would create a string on the string-stack rather than in a static buffer.
Strings can be removed from the string-stack and moved to the data-stack. This is only done so the string can be stored somewhere, such as in a data-structure. This would never be done for the purpose of doing any operation to the string --- all operations on strings are done on the string-stack.
I can provide a copy of STRING-STACK.4TH to you if you want to make it part of your Russian Forth Standard. The Forth-200x committee is opposed to my STRING-STACK.4TH and will not accept it. So, forget about them! I'll give it to the Russians instead.
If you are interested, I can provide the source-code. It is currently written in ANS-Forth and requires NOVICE.4TH to already be loaded. It should be easy to convert it to run on your Forth system.
I was going to attach the documentation as a file, but I don't see any way to attach files to posts in your forum, so I just put the document file inline: [code] String-Stack documentation: (written by Hugh Aguilar)
The string-stack code was inspired by Mark Wills' string-stack code in Turbo-Forth (also he has an ANS-Forth version). There are some differences though. I had these goals: 1.) I wanted to bring ANS-Forth up to the same level as QBASIC in regard to string handling. String-stack has MID$ for extracting substrings, and +$ for concatenating strings as well as a lot of other functions. Like in QBASIC, the user doesn't have to worry about allocating and freeing memory for strings, but this is done automatically. 2.) I wanted string-stack to be efficient. Assuming that ALLOC and DEALLOC are the speed-killers, these should be avoided as much as possible. Still though, everything should behave as if every item on the string-stack were a unique memory-block in the heap. 3.) I am thinking about later on writing a program that translates Ido into natural languages such as Spanish and English. I wanted string-stack to be usable for parsing the Esperanto text --- FAST-SPLITS$ and the prefix and suffix functions were added specifically for this purpose.
STRING-STACK.4TH requires that NOVICE.4TH already be included.
We still don't have regular-expressions which are typical in post-QBASIC languages (PERL etc.). I might write a reg-exp in the future, but I'm not enthusiastic about it. A reg-exp is essentially a mini-interpreter of a hard-to-read language. I would rather write code in Forth that does the pattern matching. A regular-expression is succinct, with a single line of meta-text describing the pattern compared to a dozen Forth functions, but that regular expression is also hard to read (for me, anyway). Regular expressions also have limitations compared to Forth --- iteration is extremely primitive --- there are a lot of patterns that are impossible to describe with a regular expression that can easily be implemented with a short and simple Forth function.
This string-stack code is intended to obsolete the <CSTR stuff in NOVICE.4TH that I never liked. The <CSTR stuff is deprecated and may eventually be discarded. At this time however, the string-stack code uses <CSTR for the <SPLIT$> function and also relies on having S" and S| already available.
There are three chapters in this document: 1.) Basic Usage --- This chapter describes the functions that would be used the most. Reading this chapter is enough to get the user going, and many users will not have to read any further. 2.) Intermediate Usage --- This chapter describes the functions that would be used for more advanced usage. The user is discouraged from reading this chapter without first getting hands-on experience with the material in the first chapter --- the only way to learn how to program is to write programs, but reading ever yet more advanced material without putting to use what you've already read about just tends to clutter the mind. 3.) Maintainers' Guide --- This chapter should not be read by application programmers. This chapter is for anybody who is maintaining my code and needs to understand the internal workings in order to upgrade my code.
Chapter 1.) Basic Usage
Section 1.1.) stack manipulation
>$ ( adr len -- ) \ string: -- a This is how strings get put on the string-stack. The strings must be constants that won't change --- they are typically S" strings inside of colon words. STRING-STACK.4TH is mostly useful for pattern-matching and concatenating strings, so most of the strings are S" strings inside of colon words. In many cases these never get converted into unique strings --- remaining derivatives throughout their useful lives makes them quite fast.
HEAP>$ ( adr len -- ) \ string: -- a This is like >$ except that it is used when the string is known to already be on the heap. >$ makes a new copy of the string on the heap and so if the string was already on the heap then there would be a memory leak. For the most part, HEAP>$ is used when the string came from $> rather than from S" and hence is already on the heap.
MUT>$ ( adr len -- ) \ string: -- a This is like >$ except it is used for strings that are mutable, meaning that they might change --- they are typically <CSTR strings (when the <CSTR circular-buffer eventually wraps around the old strings get clobbered by new strings).
Note that NOVICE.4TH provides an S" that works in interpretive mode (ANS-Forth doesn't guarantee that S" works in interpretive mode and not all ANS-Forth compilers allow this). The NOVICE.4TH S" also can be used more than once (ANS-Forth doesn't guarantee this and some ANS-Forth implementations have each S" string over-writing the last one). The NOVICE.4TH package also provides S| that uses the | char as a delimiter rather than the " char, which is useful if you need the " char inside of your string (the word STRING allows any char to be used as the delimiter). S" and S| etc. use <CSTR internally, so MUT>$ should be used --- in practice, interpretive mode is mostly used for testing, so >$ is fine as <CSTR strings last long enough for testing purposes.
$> ( -- adr len ) \ string: a -- This is how strings get removed from the string-stack. These strings are in the heap so the address needs to be given to DEALLOC eventually or there will be a memory leak. This function is only used if the string needs to be stored in a data-structure of some kind. The user should not use $> and then consume the string with TYPE or whatever. The user should consume the string on the string-stack, with .$ instead of TYPE for example, so the string is automatically freed from the heap.
$>R ( -- ) \ string: a -- \ return: -- adr len This moves a string from the string-stack to the return-stack for temporary storage. This macro only works inside of colon definitions but not in interpretive mode.
R>$ ( -- ) \ string: -- a \ return: adr len -- This moves a string from the return-stack to the string-stack. This assumes that the string on the return-stack is on the heap. This should only be used for strings that came from $>r but should not be used for strings that came from S" and then got pushed onto the return-stack with 2>R because those strings would not be on the heap. This macro only works inside of colon definitions but not in interpretive mode.
DUP$ ( -- ) \ string: a -- a a
OVER$ ( -- ) \ string: a b -- a b a
ROVER$ ( -- ) \ string: a b c -- a b c a
TUCK$ ( -- ) \ string: a b -- b a b This is the same as: SWAP$ OVER$
RUCK$ ( -- ) \ string: a b c -- c a b c This is the same as: -ROT$ ROVER$
DDUP$ ( -- ) \ string: a b -- a a b \ "deep dup" This is the same as: OVER$ SWAP$ We don't want to use $>R DUP$ R>$ for this because $>R makes B unique.
2DUP$ ( -- ) \ string: a b -- a b a b This is the same as: OVER$ OVER$
3DUP$ ( -- ) \ string: a b c -- a b c a b c This is the same as: ROVER$ ROVER$ ROVER$
SWAP$ ( -- ) \ string: a b -- b a
ROT$ ( -- ) \ string: a b c -- b c a
-ROT$ ( -- ) \ string: a b c -- c a b
REV$ ( -- ) \ string: a b c -- c b a \ note that Mark Wills' package had REV$ doing what our REVERSE$ does
DROP$ ( -- ) \ string: a --
2DROP$ ( -- ) \ string: a b --
NIP$ ( -- ) \ string: a b -- b This is the same as: SWAP$ DROP$
EMPTY$ ( -- ) \ string: x... -- This drops everything on the string-stack. This isn't very useful in programs --- it is somewhat useful when experimenting with the string-stack code in interpretive mode because you can get rid of all your experimentation results and start over.
.$ ( -- ) \ string: a -- This prints out the string similar to how dot prints out an integer.
:NAME$ ( wid -- ) \ string: a -- This is like colon except that it takes its name from the string-stack, and it puts the word in the wid word-list.
EVALUATE$ ( -- ) \ string: a -- This is like EVALUATE except that it takes it string from the string-stack.
CONST$ ( -- adr ) \ string: a -- This stores the string as a counted-string in the dictionary at HERE and returns the address. This aborts if the string is too big to become a counted-string.
VAL$ ( -- #invalid | n #single | d #double | #float ) \ float: -- f (if #FLOAT returned on data-stack) \ string: a -- This converts the string into a numeric value. If the string is not valid, the user gets #INVALID and can deal with the problem somehow.
.S$ ( -- ) \ string: x... -- x... This displays what is on the string-stack similar to how .S displays what is on the data-stack. This does not remove anything from the string-stack. This is useful for debugging programs, but the end-user of the programs should never see this display.
FIX\$ ( -- ) \ string: a -- b This converts a string with mark-up codes into a string with ascii equivalents. This is mostly useful for writing Spanish language text. The codes have a \ followed by a case-sensitive character. For Spanish, any vowel that can get an accent mark can be used to get that vowel accented. The \u or \U is the 'u' or 'U' with an accent mark, but the \d or \D is the 'u' or 'U' with a diaeresis mark. The \n or \N is the 'n' or 'N' with a tilde. Also, \? is the upside-down ? mark. For other languages, the \x## can be used, with ## being a hexadecimal number of the needed char. We also have the following: \@ 7 bell \b 8 backspace \f 12 FF form-feed \l 10 LF line-feed \m 13 10 CR/LF \" 34 double-quote \r 13 CR carriage-return \t 9 HT horizontal-tab \v 11 VT vertical-tab \z 0 null \\ 92 backslash \! 124 vertical bar \t 153 trademark \c 169 copyright \^ 176 degree \+ 177 +- \1 188 1/4 \2 189 1/2 \3 190 3/4
Section 1.2.) string manipulation
LEN$ ( -- length ) \ string: a -- This returns the length of the string on the data-stack. This consumes the string on the string-stack (Forth functions traditionally consume their arguments), so if this is used and you still need the string, then DUP$ or OVER$ or whatever should be used to keep a copy on the string-stack.
MID$ ( start-index length -- ) \ string: a -- b The B string is a substring in the middle of the A string.
ANTI-MID$ ( start-index length -- ) \ string: a -- b Returns the string with the middle part extracted (what MID$ would have returned is not returned, but instead the edge parts concatenated together are returned).
INNER$ ( start-index limit-index -- ) \ string: a -- b This is like MID$ except that it uses a LIMIT-INDEX rather than a LENGTH (this is somewhat like Mark Wills' MID$ and, to the best of my recollection, like the QBASIC MID$). Note that the LIMIT-INDEX is 1 beyond the middle-part that is kept (LIMIT-INDEX minus START-INDEX equals length).
ANTI-INNER$ ( start-index limit-index -- ) \ string: a -- b Returns the string with the middle part extracted (what INNER$ would have returned is not returned, but instead the edge parts concatenated together are returned). Note that the LIMIT-INDEX is 1 beyond the middle-part that is extracted (LIMIT-INDEX minus START-INDEX equals length).
LEFT$ ( length -- ) \ string: a -- b This provides a substring of length LENGTH from the left side of the string.
RIGHT$ ( length -- ) \ string: a -- b This provides a substring of length LENGTH from the right side of the string.
DISCARD-LEFT$ ( length -- ) \ string: a -- b This discards a substring of length LENGTH from the left side of the string.
DISCARD-RIGHT$ ( length -- ) \ string: a -- b This discards a substring of length LENGTH from the right side of the string.
FILL$ ( length char -- ) \ string: -- a This produces a string filled with CHAR of length LENGTH.
BLANK$ ( length -- ) \ string: -- a This produces a string filled with blanks of length LENGTH.
LPAD$ ( length -- ) \ string: a -- b This pads the string with blanks on the left side so the total length is LENGTH --- if the length of A is less than LENGTH nothing is done.
RPAD$ ( length -- ) \ string: a -- b This pads the string with blanks on the right side so the total length is LENGTH --- if the length of A is less than LENGTH nothing is done.
LTRIM$ ( -- ) \ string: a -- b This trims the whitespace from the left side of the string.
RTRIM$ ( -- ) \ string: a -- b This trims the whitespace from the right side of the string.
TRIM$ ( -- ) \ string: a -- b This trims the whitespace from the left and right sides of the string.
BLACKEN$ ( -- ) \ string: a -- b This removes all the whitespace from the entire string.
Section 1.3.) searching and comparing
COMPARE$ ( -- -1|0|1 ) \ string: a b -- This is like COMPARE except for the string-stack.
ICOMPARE$ ( -- -1|0|1 ) \ string: a b -- This is like COMPARE$ except case-insensitive.
=$ ( -- equal? ) \ string: a b -- This compares the strings for equality. It is faster than COMPARE$ for when only an equality comparison is needed.
I=$ ( -- equal? ) \ string: a b -- This is like =$ except case-insensitive.
FINDC$ ( char -- index|-1 ) \ string: a -- This finds a char in the string, or returns -1 if not found.
IFINDC$ ( char -- index|-1 ) \ string: a -- Like FINDC$ except case-insensitive.
FIND$ ( -- index|-1 ) \ string: a b -- This finds the A string in the B string, or returns -1 if not found.
IFIND$ ( -- index|-1 ) \ string: a b -- Like FIND$ except case-insensitive.
Section 1.4.) prefixes and suffixes and infixes, oh my!
PREFIX$ ( -- found? ) \ string: a b -- a | c This determines if B is a prefix of A. If PREFIX$ returns true, then it returns C which is the prefix inside of A (it is a derivative).
IPREFIX$ ( -- found? ) \ string: a b -- a | c This is like PREFIX$ except case-insensitive.
SUFFIX$ ( -- found? ) \ string: a b -- a | c This determines if B is a suffix of A. If SUFFIX$ returns true, then it returns C which is the suffix inside of A (it is a derivative).
ISUFFIX$ ( -- found? ) \ string: a b -- a | c This is like SUFFIX$ except case-insensitive.
INFIX$ ( -- found? ) \ string: a b -- a | c This determines if B is an infix of A. If INFIX$ returns true, then it returns C which is the infix inside of A (it is a derivative).
IINFIX$ ( -- found? ) \ string: a b -- a | c This is like INFIX$ except case-insensitive.
EXTRACT$ ( -- ) \ string: a b -- c d This requires B to be a derivative inside of A (it is also okay for B to be an empty unique). EXTRACT$ removes the B string from the A string and returns the prefix (the C string) and the suffix (the D string). This should only be used on the results returned by: INFIX$ PREFIX$ SUFFIX$ IINFIX$ IPREFIX$ or ISUFFIX$
Note that EXTRACT$ is the only function we have that requires the parameters to be derived one from the other. All of our other functions work on either unique or derivative strings. EXTRACT$ is context-sensitive, in that it is supposed to be used after certain other functions. See also ANTIMID$ that uses EXTRACT$ internally.
REPLACE$ ( -- change? ) \ string: a targ repl -- str The replaces the first occurence of the TARG substring in A with the REPL string. The STR returned may be A if no change was made. The flag CHANGE? indicates if any change was made.
REPLACES$ ( -- change? ) \ string: a targ repl -- str This replaces all of the TARG substrings in A with REPL strings. The STR returned may be A if no changes were made. The flag CHANGE? indicates if any changes were made.
IREPLACE$ ( -- change? ) \ string: a targ repl -- str This is like REPLACE$ except case-insensitive.
IREPLACES$ ( -- change? ) \ string: a targ repl -- str This is like REPLACES$ except case-insensitive.
Chapter 2.) Intermediate Usage
Section 2.1.) these functions aren't very useful, but they are documented anyway just in case --- the reader should just skim over this section
DEPTH$ ( -- depth ) \ string: x... -- x... This provides the depth of the string-stack. I can't think of any reason why anybody would need this.
REVERSE$ ( -- ) \ string: a -- b This reverses the characters in the string. I can't think of any reason why anybody would need this. Note that in Mark Wills' package this was called REV$, but we are using the name REV$ for something else now.
UCASE$ ( -- ) \ string: a -- b This upper-cases the characters in the string.
LCASE$ ( -- ) \ string: a -- b This lower-cases the characters in the string.
WHITE? ( char -- flag? ) This checks if the char is white-space --- that is, if it is <= 32.
NONWHITE? ( char -- flag? ) This checks if the char is not white-space.
CHAR-UPPER ( charA -- charB ) This upper-cases the char.
CHAR-LOWER ( charA -- charB ) This lower-cases the char.
BLACKEN ( adr len -- adr new-len ) This removes all the whitespace from the string (not on the string-stack).
UPPER ( adr len -- ) This upper-cases a string (not on the string-stack).
LOWER ( adr len -- ) This lower-cases a string (not on the string-stack).
STR= ( adrA lenA adrB lenB -- flag ) This compares strings for equality.
ISTR= ( adrA lenA adrB lenB -- flag ) This is like STR= except case-insensitive.
ICOMPARE ( adrA lenA adrB lenB -- -1|0|1 ) This is like COMPARE except case-insensitive
Section 2.2) traversing strings
FORWARD$ ( xt -- index | -1 ) \ string: a -- a Traverses the string from front to back, executing XT for every char in the string. The XT function should have a stack-picture: ( i*x char-adr -- j*x done? ) The XT function returns a flag indicating if the traversal is done or not. If the flag is true, then FORWARD$ stops traversing and returns the index of the char where the traversal stopped. If FORWARD$ traverses the entire string without being stopped, it returns -1. Note that FORWARD$ does not consume its argument on the string-stack as is traditionally done.
This is an example of FORWARD$ being used. The NIP gets rid of the CHAR that is still on the stack (the <FINDC$> left it there every time). : <findc$> ( char adr -- char done? ) \ string: a -- c@ over = ; : findc$ ( char -- index | -1 ) \ string: a -- ['] <findc$> forward$ nip drop$ ;
BACKWARD$ ( xt -- index | -1 ) \ string: a -- a Like FORWARD$ except that it traverses the string from back to front.
This is an example of BACKWARD$ being used. Note that -1 is not just a flag indicating that we didn't find a char past the white, but is also the index past the white that we found (we found white all the way to index 0). We add 1 to the index past the white to get the length of the good stuff below the white. : <trim$> ( char-adr -- done? ) c@ nonwhite? ; : rtrim$ ( -- ) \ string: a -- b ['] <trim$> backward$ \ -- index-past-white 1+ \ -- how-many-keepers left$ ;
PREP-MUTATION ( -- ) If FORWARD$ or BACKWARD$ are used to mutate a string, PREP-MUTATION should first be called. If FORWARD$ or BACKWARD$ are just being used to examine the string, then PREP-MUTATION should not be called.
This is an example of PREP-MUTATION being used. Unlike the previous examples of FORWARD$ and BACKWARD$ that just examined the string, in this example we are mutating (modifying) the string, so we need PREP-MUTATION. The FALSE in <UCASE$> indicates that we aren't done, because we always go all the way through. The DROP in UCASE$ gets rid of the -1 that FORWARD$ returns. We could have used either FORWARD$ or BACKWARD$ in UCASE$. : <ucase$> ( char-adr -- ) dup c@ char-upper swap c! false ; : ucase$ ( -- ) \ string: a -- b prep-mutation ['] <ucase$> forward$ drop ;
The above example is how I originally wrote UCASE$, but I have a more efficient version now that uses a DO loop explicitly. The user should write code like this, with BACKWARD$ or FORWARD$ however, rather than use DO loops explicitly because this is the idiomatic way to use the string-stack package even if it is slightly less efficient. As a general rule, the use of a HOF (higher-order function) such as FORWARD$ reduces bugs because explicit iteration is the primary source of bugs in any program. Also, I may later upgrade the string-stack package to be mostly assembly-language. If I do this, then FORWARD$ and BACKWARD$ will be in assembly-language and will be faster than the current Forth versions. In this case, the use of the HOF will be more efficient than the use of explicit DO loops. HOFs are all about information-hiding, which is always a good thing.
Section 2.3.) splitting strings around a delimiter
N$> ( count -- adr len ... ) \ string: z... -- This moves COUNT strings from the string-stack to the data-stack. It just calls $> for as many times a COUNT specified. This is primarily for use in conjunction with SPLITS$ that will be documented later.
<SPLIT$> ( delimiter left right -- split? ) \ string: a -- l r | l This splits the string around the first DELIMITER char that it finds. The LEFT and RIGHT chars are for literal strings inside of the string. If the DELIMITER is inside of a literal string, it does not count as a delimiter. When the strings are split, the literal-string brackets LEFT and RIGHT are removed from the string when the L string is produced, and the delimiter DELIMITER is removed also. The R string is everything beyond the delimiter with nothing removed. The flag SPLIT? indicates if we found a delimiter and split the string, in which case both L and R strings are returned, or if we never found a delimiter in which case only the L string is returned.
This is an example of <SPLIT$> being used in interpretive mode. Here we are calling <SPLIT$> repeatedly until it returns a FALSE to indicate that it couldn't split the string. s" programmer,<Aguilar,Hugh>,50" >$ ok .s$ STRING STACK: unique: |programmer,<Aguilar,Hugh>,50| ok char , char < char > <split$> . -1 ok .s$ STRING STACK: unique: |<Aguilar,Hugh>,50| unique: |programmer| ok char , char < char > <split$> . -1 ok .s$ STRING STACK: unique: |50| unique: |Aguilar,Hugh| unique: |programmer| ok char , char < char > <split$> . 0 ok .s$ STRING STACK: unique: |50| unique: |Aguilar,Hugh| unique: |programmer| ok
SPLIT$ ( -- split? ) \ string: a -- l r | l This is just <SPLIT$> with the delimiter char being the comma and the left and right bracket chars both being the quotation mark. This is the most common format for database dumps into text files.
IS-SPLIT$ ( xt -- ) This sets what SPLIT$ does.
This is an example of IS-SPLIT$ being used. This is, in fact, how in STRING-STACK.4TH we set the default for what SPLIT$ does. : comma-split$ ( -- split? ) \ string: a -- l r | a [char] , [char] " [char] " <split$> ; ' comma-split$ is-split$
SPLITS$ ( -- count ) \ string: a -- x... This cals SPLIT$ repeatedly, splitting the string into some number of strings. It returns COUNT to indicate how many strings were returned.
This is an example of SPLITS$ being used. We used S| rather than S" because we needed to have the " char inside of the string. SPLITS$ returns a 3 to indicate that it split the string into 3 pieces. The user should be aware that the top value of the string-stack is the rightmost piece. This is why, when we did .$ repeatedly, we got the strings printed from rightmost to leftmost. This may seem counter-intuitive to somebody who is not familiar with stacks. s| programmer,"Aguilar,Hugh",50| >$ ok .s$ STRING STACK: unique: |programmer,"Aguilar,Hugh",50| ok splits$ . 3 ok .s$ STRING STACK: unique: |50| unique: |Aguilar,Hugh| unique: |programmer| ok .$ 50 ok .$ Aguilar,Hugh ok .$ programmer ok
This is an example (provided mostly for humor) of working around the supposedly counter-intuitive issue of the string-stack elements printing out backwards from their order in the original string. s| programmer,"Aguilar,Hugh",50| >$ ok reverse$ ok .s$ STRING STACK: unique: |05,"hguH,raliugA",remmargorp| ok splits$ . 3 ok .s$ STRING STACK: unique: |remmargorp| unique: |hguH,raliugA| unique: |05| ok reverse$ .$ programmer ok reverse$ .$ Aguilar,Hugh ok reverse$ .$ 50 ok
Getting serious again, this is an example of splitting a string and storing the fields in a struct.

0
    d field .occupation
    d field .emp-name
    d field .age
constant employee

create me employee allot

s| programmer,"Aguilar,Hugh",50| >$ splits$
n$> me .occupation 2! me .emp-name 2! me .age 2!
The user can also use SPLIT and COMBINE that are in LIST.4TH rather than have multiple strings on either the string-stack or the data-stack. That might be the easiest solution.
We also have these words:
<FAST-SPLIT$> ( delimiter -- split? ) \ string: a -- l r | l
This is like <SPLIT$> except that it doesn't use the left and right brackets, and it is much faster.
FAST-SPLIT$ ( -- split? ) \ string: a -- l r | l
This is <FAST-SPLIT$> with a BL delimiter. This is used primarily for splitting up words of text.
FAST-SPLITS$ ( -- count ) \ string: a -- x...
This calls FAST-SPLIT$ repeatedly, splitting the string into some number of strings. It returns COUNT to indicate how many strings were returned.
Chapter 3.) Maintainers' Guide
This chapter discusses the internal workings of the string-stack code for the benefit of anybody who wants to upgrade the package.
I haven't written this chapter yet and won't until the code has settled and isn't being upgraded anymore. [/code]
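The STRING-STACK.4TH source isn't part of this post, so here is only a rough sketch, in standard Forth, of the idea behind <FAST-SPLIT$> and FAST-SPLITS$. It works on the data stack rather than the string-stack. FIND-CHAR, SPLIT-AT and SPLIT-ALL are hypothetical names made up for this example, not words from the package. Like <FAST-SPLIT$>, it ignores bracket chars. Like SPLITS$, it leaves the rightmost piece on top, just under the count.

```forth
\ Hypothetical data-stack sketch of <FAST-SPLIT$> / FAST-SPLITS$
\ (not the STRING-STACK.4TH code).  Pieces are address/length pairs
\ pointing into the original string; nothing is copied.

\ find-char: offset of the first CHAR in the string, if there is one
: find-char ( c-addr u char -- offset true | false )
  swap 0 ?do                           ( c-addr char )
    over i chars + c@ over = if  2drop i true unloop exit  then
  loop
  2drop false ;

\ split-at: split at the first CHAR (which is dropped), right piece on top;
\ if CHAR is absent, leave the string unchanged and return false
: split-at ( c-addr u char -- l-addr l-u r-addr r-u true | c-addr u false )
  >r 2dup r> find-char if              ( c-addr u offset )
    >r  over r@                        ( c-addr u c-addr offset )
    2swap  r> 1+ /string               ( l-addr l-u r-addr r-u )
    true
  else
    false
  then ;

\ split-all: split at every CHAR; N is the number of pieces (at least 1)
: split-all ( c-addr u char -- l1-addr l1-u ... ln-addr ln-u n )
  >r 1 >r                              \ R: char count
  begin
    r> r@ swap >r                      ( c-addr u char )
    split-at
  while
    r> 1+ >r                           \ one more piece
  repeat
  r> r> drop ;                         \ push count, discard char
```

For example, s" programmer Aguilar 50" bl split-all . should print 3 and leave the three pieces on the data stack with 50 on top. Because the pieces point into the original string, splitting is cheap. The trade-off is that the pieces are only valid as long as the original string is.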
Posted: Thu Nov 29, 2018 18:36
Post subject:
вопрос wrote: Quote: a scalable counter (variant 3 in the fork) with a maximum length of up to 2^28-1 bytes, which is enough even for a small movie (not HDTV, that is). How does that work? Isn't that the one that doesn't build cleanly on every system right away? (It didn't work for me.)
Don't use C@ or B@ on the string's count and all will be well :)
It could have failed to work for any number of reasons (it would need looking into).
Posted: Sat Jan 10, 2009 22:44
Post subject:
Quote: You can just make yourself any kind of string type (simply bolt it on as an external library, as is done, for example, in SPF's strN.f libraries)... One always wants a simple, fast, convenient, small abstraction that always works.
For yourself, as many as you like, but how will other people use them? Collaborative work is exactly the point: no modern project can be carried by one person alone. Quote: a scalable counter (variant 3 in the fork) with a maximum length of up to 2^28-1 bytes, which is enough even for a small movie (not HDTV, that is). How does that work? Isn't that the one that doesn't build cleanly on every system right away? (It didn't work for me.)
Posted: Sat Jan 10, 2009 22:37
Post subject:
Hmm, there's more talk than action here :)
You can just make yourself any kind of string type (simply bolt it on as an external library, as is done, for example, in SPF's strN.f libraries)...
One always wants a simple, fast, convenient, small abstraction that always works.
But in reality you have to sacrifice something.
Besides, it seems to me that strings shouldn't be treated as something typed, but rather as a container. That is why I personally prefer a scalable counter (variant 3 in the fork) with a maximum length of up to 2^28-1 bytes, which is enough even for a small movie (not HDTV, that is).
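The post doesn't say how "variant 3" actually encodes its count. One scheme that gives exactly the 2^28-1 limit (just under 256 MiB) stores the length in 7-bit groups, one group per byte. The high bit of each byte means "another byte follows", so four bytes carry 28 bits. The words below are a hypothetical standard-Forth sketch of that assumption, not SP-Forth's code. COUNT!, COUNT@ and VCOUNT are names made up for this example. The sketch also shows why the advice in this thread not to apply C@ to the count matters: the count may span several bytes.

```forth
\ Hypothetical variable-length count (an assumption, not SP-Forth's code):
\ 7 value bits per byte, high bit set = "more bytes follow".
\ Four bytes give 4*7 = 28 bits, i.e. counts up to 2^28-1.
\ Assumes a byte-addressed system (1 char = 1 address unit).

\ count!: store U (must be below 2^28) at C-ADDR, return its size (1..4)
: count! ( u c-addr -- n )
  0 >r                              \ R: bytes written so far
  begin  over 127 >  while          \ more than 7 bits left?
    over 127 and 128 or  over c!    \ low 7 bits + continuation bit
    1+  swap 7 rshift swap          \ advance address, drop 7 bits
    r> 1+ >r
  repeat
  c!  r> 1+ ;                       \ last byte has the high bit clear

\ count@: decode the count at C-ADDR, also return its size in bytes
: count@ ( c-addr -- u n )
  0 0                               ( c-addr shift u )
  begin
    2 pick c@ dup >r                ( c-addr shift u byte )  R: byte
    127 and  2 pick lshift  or      ( c-addr shift u' )
    rot 1+  rot 7 +  rot            ( c-addr+1 shift+7 u' )
    r> 128 and 0=                   \ stop at a byte without the high bit
  until
  swap 7 /  rot drop ;              ( u n )

\ vcount: like COUNT, but for the variable-length count
: vcount ( c-addr -- c-addr' u )
  dup count@  rot +  swap ;
```

A string stored this way would be read with VCOUNT instead of COUNT. Fetching the first byte with C@ gives the wrong length as soon as the count needs more than one byte, which under this scheme means any string longer than 127 chars.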
Posted: Sat Jan 10, 2009 22:14