Red []
{Practice-Split: Hone Your Skill and Whet Your Appetite
===Introduction: Part 1
This app is an experiment. Instructions are intentionally vague at
times, to see what you try, naturally, without looking at a detailed
spec. Why? Because people tend to ignore documentation and do just
that: try things until they get it right. That does *not* mean we'll
take what people try and make that the design. Their attempts are clues
in which we can look for patterns.
This leads to the question, "How do we know what people try?" The
answer is that this app collects data you can share, so we can learn
from it. People don't like sharing data, especially if they think it
makes them look bad (note: anything you try is valid, not wrong), but
they love a challenge, so we'll make it a game. Run through a series of
tasks and the app records not only what you tried, but how long it took
you. You can do it more than once, so we can also see how people learn
and retain that knowledge.
Have fun, and please share what you learn.
Game on!
===Introduction: Part 2
There's more than one way to split a cat, or an input. Sometimes you
need to split at every occurrence of a delimiter, or the first, or
last; other times at a fixed position, or into many parts. Your input
may be a string or a block of values. Maybe you need to match a parse
rule or partition values based on a predicate test.
How many of those methods and their permutations can a single `split`
function support cleanly? Will it get progressively harder to use and
understand, or could it actually make things clearer and easier? Not
just for the writer, but especially for the readers who come after. Is
it better to have single-purpose functions for each behavior, or a
single entry point that condenses everything? Are we willing to include
extra code in Red's runtime, though any one user may only need a subset
of features, because in-the-large we all win when this very common task
is normalized across all Red code? Where do we draw the line? Those are
some of the questions we hope to answer with this app.
Another open question is how we provide documentation, help, and ways
to explore more heavily loaded interfaces (dialects and dialected
functions). Programmers are used to reading docs for simple functions:
what parameters and types they take, and the result they return. Tools
can also provide auto-complete and Intellisense™, but the closest thing
to that feature for languages is low-code and block-programming
environments. Dialects fall somewhere in between parameterized functions
and languages that are large, general, and wide in scope.
===Introduction: Part 3
The industry also hasn't come up with a definitive answer about how best
to "learn programming". A CS degree? Coding bootcamp? Disciplined self-
training and <Coding Katas>(http://codekata.com/)? Many agree that
software development is an art and craft as much as it is a science. You
learn best by doing, starting with simple exercises, building skills,
and being able to verify your results and understanding. It's called
practice.
<Live Coding>(https://en.wikipedia.org/wiki/Live_coding) and REPLs play a
prominent role here. But just as dialects and DSLs are focused versions of
languages, what is the equivalent for live coding? Tools like
<Jupyter>(https://jupyter.org/) and <Wolfram Notebooks>(http://www.wolfram.com/notebooks/)
are one approach.
But the "experiment at a time" approach is different from exploration.
How do we explore a new area? Other Red \*-lab tools show a way. Give the
user a tool to poke around and see results, like live coding, but in a
more constrained, domain-specific way.
===Terminology
*Delimiter:* See Separator.
*Dialect:* A domain-specific sub-language in the context of a more general language.
*Keyword:* A word with special meaning in a language or dialect.
*Part:* A piece, chunk, segment, slice, or region of a Red series.
*Position:* An offset or index in a Red series. Indexes start at 1, offsets start at 0.
*Separator:* A value specifying the boundary between separate, independent regions.
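The index/offset distinction under *Position* can be seen with standard Red series functions. This is a small sketch in plain Red, separate from the *split* dialect; the word `s` is only an illustrative name.

```red
s: "abcdef"

at s 3            ; index 3 (1-based): the series positioned at "cdef"
skip s 2          ; offset 2 (0-based): the same position, "cdef"
index? skip s 2   ; == 3, so offset = index - 1
pick s 1          ; == #"a", the first item is at index 1
```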
===The SPLIT Interface
USAGE:
    SPLIT series dlm
DESCRIPTION:
    Split a series into parts, by delimiter, size, number, function, type,
    or advanced rules.
    SPLIT is a function! value.
ARGUMENTS:
    series [series!] "The series to split."
    dlm {Dialected rule (block), part size (integer), predicate (function),
    or separator.}
That's all you get right now. Why? Because this tool is about exploration,
seeing what people try when they /don't/ know what to do, what results they
expect, and how many things "just work".
It's important to note that you may need to use `reduce` or `compose`
when writing your rule.
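The reason is that a Red block is not evaluated until something asks for it, so a word inside a rule block stays a word rather than becoming its value. A minimal sketch using the standard `reduce` and `compose` functions; `n` and `size` are placeholder words made up for illustration, not *split* keywords.

```red
n: 3

[n]                  ; a block holding the word n, not the value 3
reduce [n n * 2]     ; == [3 6], every expression is evaluated
compose [size (n)]   ; == [size 3], only the paren is evaluated
```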
===Keywords
OK, you win, here are some hints when using a dialected block as your
delimiting rule. Note that *first* is listed twice, but only to show
that its context matters.
    every once into parts as-delim
    first last
    at before after
    times
    first by then
What happens if your rule isn't recognized as a *split* dialect input?
It's treated as a *parse* rule. Given that, does *split* still have
value? You bet it does. Just as *collect* hides details, so does *split*,
and it makes your intent clear, which is the ultimate goal.
===Types of Splitting
<By Delimiter>(#Split By Delimiter)
<At an Offset/Index/Position>(#Split at a Position)
<Into Equal Parts>(#Split Into Equal Parts)
<Into N Parts>(#Split Into N Parts)
<Into Uneven Parts>(#Split Into Uneven Parts)
<Up To N Times>(#Split Up To N Times)
<By Test Predicate(s)>(#Split By Test Predicates)
<Using Advanced Rules>(#Split Using Advanced Rules)
===Split By Delimiter
Delimited data is a well known format, using a <delimiter>(https://en.wikipedia.org/wiki/Delimiter)
to indicate where one thing stops and the next thing starts.
This is the most widely used form of splitting, and many languages have
a *split* function. Almost all of them follow a fixed pattern with
regard to their spec: *split \[string delimiter opt limit\]*. I have a
hard time believing that the *limit* feature is /so/ widely useful that
it's there on merit. Everybody just copied what came before. Or they
say to use regexes. Some langs also added a refinement to omit empty
results.
What this model lacks is /any other/ useful feature, like splitting at
only the first or last occurrence of the separator. Or /including/ the
separator in the result. For example, you want to split /after/ *:*,
keeping it with the left part; or split /before/ uppercase characters,
keeping them with the right part. Rust adds some variants, with the
result of having 13 different *split\** functions.
Most telling, for Red, is that other langs only consider strings as
the input and characters (sometimes strings) as the separator. This
makes sense, but is <short-sighted>(#FutureThought) for Red.
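To make the "keep the separator" cases concrete, here is how they might be written by hand in plain Red, without the *split* dialect. This is only a sketch of the intended results; the words `s`, `pos`, `upper`, and `chunk` are illustrative names.

```red
s: "name:value"

; Split after the first ":", keeping it with the left part.
; find/tail returns the position just past the match.
pos: find/tail s ":"
reduce [copy/part s pos  copy pos]    ; == ["name:" "value"]

; Split before each uppercase letter, keeping it with the right part.
; parse/case is used so the charset matches uppercase letters only.
upper: charset [#"A" - #"Z"]
parse/case "splitBeforeUpper" [
    collect [any [
        copy chunk [skip any [not upper skip]]
        keep (chunk)
    ]]
]
; == ["split" "Before" "Upper"]
```

Both work, but neither states the intent as directly as a single *split* call could.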
===Split Once
Split just once, giving two parts. The split point may be marked by a
delimiting value, or by the size of either the first or last
part.
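The four variations described above, sketched by hand with standard Red series functions rather than the *split* dialect. The input string and the words `s` and `pos` are made up for illustration.

```red
s: "key=a=b"

; By delimiter, first occurrence.
pos: find s "="
reduce [copy/part s pos  copy next pos]    ; == ["key" "a=b"]

; By delimiter, last occurrence.
pos: find/last s "="
reduce [copy/part s pos  copy next pos]    ; == ["key=a" "b"]

; By size of the first part (3 items).
reduce [copy/part s 3  copy skip s 3]      ; == ["key" "=a=b"]

; By size of the last part (3 items).
pos: skip tail s -3
reduce [copy/part s pos  copy pos]         ; == ["key=" "a=b"]
```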
===Split Into Equal Parts
Use cases:
*Splitting a large file:* Each part should be no more than N bytes, or
the file may be a dataset where each record is a fixed size. Of course
this use case has limits. For very large files, use a stream-based
approach that reads and writes each part in sequence.
*Breaking work up:* When data has the potential to cause processing
errors, it can be helpful to work on smaller parts of the data, both
for error handling and user interactions. This is especially true when
cumulative errors or resource constraints come into play.
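As a point of comparison, fixed-size chunking can be written by hand in plain Red. `chunks` is a hypothetical helper made up for this sketch, not part of Red or of the *split* dialect, and letting the last part come up short is an assumption about the desired behavior.

```red
chunks: function [
    "Illustrative helper: split a series into parts of SIZE items"
    series [series!] size [integer!]
][
    ; Guard against a loop that never advances.
    if size < 1 [do make error! "size must be positive"]
    collect [
        while [not tail? series] [
            keep/only copy/part series size
            series: skip series size
        ]
    ]
]

chunks "abcdefgh" 3      ; == ["abc" "def" "gh"]
chunks [1 2 3 4 5] 2     ; == [[1 2] [3 4] [5]]
```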
===Split Into N Parts
Use cases:
*Distributing work:* You have N workers and want to give each a part of
the data to work on.
*Secure Fragments:* Each part is useless, and can't be decrypted, without
all the others.
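Splitting into N parts raises a design question that fixed-size parts do not: what to do when the length isn't a multiple of N. The sketch below assumes one possible answer, giving the earlier parts one extra item each. `n-parts` is a hypothetical helper in plain Red, not the *split* dialect.

```red
n-parts: function [
    "Illustrative helper: split into N parts whose sizes differ by at most one"
    series [series!] n [integer!]
][
    if n < 1 [do make error! "n must be positive"]
    len: length? series
    base: to integer! len / n      ; minimum part size
    extra: len // n                ; the first EXTRA parts get one more item
    collect [
        repeat i n [
            size: base + either i <= extra [1] [0]
            keep/only copy/part series size
            series: skip series size
        ]
    ]
]

n-parts [1 2 3 4 5 6 7 8 9 10] 3    ; == [[1 2 3 4] [5 6 7] [8 9 10]]
```

Other choices, such as putting the remainder in the last part, are just as defensible, which is part of what this app hopes to learn.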
===Split Into Uneven Parts
*Small Fixed Values:* _YYYYMMDD_ into _\[YYYY MM DD\]_, or
_Mon, 24 Nov 1997_ into \["Mon" "24" "Nov" "1997"\].
*Tabular Data:* The much-maligned, but still useful, flat file.
*Schema Based Data:* A schema\/header tells you the size of each part in
a payload.
*Historical Data:* Legacy formats are often based on fixed size fields.
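The _YYYYMMDD_ case above, sketched two ways in plain Red: once with a fixed *parse* rule, and once driven by a block of sizes such as a schema or header might supply. `by-sizes` is a hypothetical helper, and the words `yyyy`, `mm`, and `dd` are illustrative names.

```red
; Fixed-width fields: 4 + 2 + 2 characters.
parse "19971124" [copy yyyy 4 skip  copy mm 2 skip  copy dd 2 skip]
reduce [yyyy mm dd]                 ; == ["1997" "11" "24"]

by-sizes: function [
    "Illustrative helper: split a series into parts of the given sizes"
    series [series!] sizes [block!]
][
    collect [
        foreach size sizes [
            keep/only copy/part series size
            series: skip series size
        ]
    ]
]

by-sizes "19971124" [4 2 2]         ; == ["1997" "11" "24"]
```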
===Split Up To N Times
Use this when you want to split more than once, but not at every
occurrence. It can be useful if you want to check individual parts, and
not split any more than necessary once you find what you want. It also
helps when there's more than one "header" part followed by a trailing
payload that may contain separators.
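The header-plus-payload case might look like this when written by hand in plain Red. `split-up-to` is a hypothetical helper made up for this sketch, not part of the *split* dialect, and the input is invented.

```red
split-up-to: function [
    "Illustrative helper: split at the first N occurrences of DLM only"
    series [string!] dlm [string!] n [integer!]
][
    collect [
        loop n [
            pos: find series dlm
            unless pos [break]
            keep copy/part series pos
            series: skip pos length? dlm
        ]
        keep copy series        ; whatever is left, separators and all
    ]
]

split-up-to "id:type:payload:with:colons" ":" 2
; == ["id" "type" "payload:with:colons"]
```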
===Split By Test Predicates
Partition data into groups based on one or more tests.
*Data Analysis:* How many of a particular type of data occur, relative
to others? What are the min\/max or average values for all the numeric
items (first you have to find them)?
*Specialize Processing:* Break data up into groups of small and large
values, or by type, so you can send each to a specific handler.
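A single-test partition, sketched in plain Red. `partition` is a hypothetical helper made up for illustration, and returning the matches first and everything else second is an assumption rather than the behavior of *split*.

```red
partition: function [
    "Illustrative helper: group values by whether TEST returns true"
    values [block!] test [any-function!]
][
    matched: copy []
    rest: copy []
    foreach value values [
        append/only either test :value [matched] [rest] :value
    ]
    reduce [matched rest]
]

set [numbers others] partition [1 "a" 2.5 b 40] :number?
numbers            ; == [1 2.5 40]
others             ; == ["a" b]
length? numbers    ; == 3
```

Once the numeric items are grouped, counting them or finding their min and max becomes a one-liner.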
===Split Using Advanced Rules
Think *parse+collect*.
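For readers who haven't used that combination, here is a small example in standard Red that splits wherever the input changes between digits and non-digits. It shows what *parse* plus *collect* looks like on its own, not how *split* handles such rules; `digit` and `chunk` are illustrative names.

```red
digit: charset "0123456789"

parse "ab12cd345" [
    collect [any [
        copy chunk some digit keep (chunk)
        | copy chunk some [not digit skip] keep (chunk)
    ]]
]
; == ["ab" "12" "cd" "345"]
```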
===UI: Goal, Input, Rule, Output
There are three parts to the UI. At the top is your goal and the input
for the task. Below that is a field where you write your *split* rule.
When you press *Enter\/F5* or click the Split button, the input is split
using your rule and the result is displayed below the field. You also
get an indicator showing whether you matched the goal. At the bottom
you can write notes, rate, and Like the task, to help us learn what
things look like to users. Once you get a green light, move on to the
next task.
Because you only have a field to work with, it may take a few tries
to understand that it's loading your rule, and doing some special
handling if you use *compose* or *reduce*. We'll improve this in
future versions I'm sure.
IMPORTANT NOTE: Do /not/ type "split" or "input" in the rule field.
Type only the rule itself. For example, if you want to use a comma as the
delimiting rule, just type *#","* or *comma*.
Save your work. We'll automate this in the future.
===Submitting Your Results
===Menus and Keys
*F1:* Help
*F5:* Split
*Ctrl+Up:* Like
*Ctrl+Down:* Don't Like
*F6 or PgUp:* Previous Task
*F8 or PgDn:* Next Task
*Enter:* Split, if in Rule field
===FutureThought
Splitting is easy. Parsing is hard. For people coming from other
languages, *parse* is too much to ask if they just want to process
some data.
Keyword oriented languages and data. Think of a dialect like *draw*
where keywords play a prominent role, for each item to draw, followed
by parameters for it. We can design data in a similar way, making it
variable length in content, but easy to understand by looking for
markers.
New ways to think about data and how to leverage Red's datatypes.
Again, *parse* may be more than people want to deal with, and more
than they need, but they still need something more flexible than flat
blocks or key-value structures.
===About
Practice-Split is an experimental tool for evaluating the *split*
function design and, more generally, training tools for the
<Red Language>(https://www.red-lang.org/).
}