New streaming backend #135

Open · wants to merge 132 commits into master
Conversation

@flashmob (Owner) commented Feb 21, 2019

Problem

The current 'backend' processes emails by buffering the entire message in memory, then applying each Processor using the decorator pattern. This approach works OK - most emails are under 1MB anyway, and the buffers are recycled after processing, keeping them allocated for the next message (being nice to the garbage collector). However, some things could be done more efficiently - for example, the message could be compressed on-the-fly while it's being saved. This is the main idea of 'streaming'.

Solution

The Go way of streaming is to use the io.Writer and io.Reader interfaces. What if we could take the current decorator pattern that we use for the backends and extend it by making the processors implement the io.Writer interface? 🤔

Yes, we can do just that. A little bit of background first: did you know that the io.Writer way is usually a decorator pattern? When we make an instance of a writer, we usually pass some underlying writer to it, allowing us to wire multiple writers together. Some people call this pattern 'chaining'.

Normally, when using io.Writer, if you would like to create a chain, you need to wire the writers manually with a few lines of code. This solution takes it further by allowing you to wire the Writers via configuration.
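For comparison, here is the manual wiring this refers to, using only the standard library: a zlib writer decorating a bytes.Buffer, compressed on the way in and decompressed on the way out. (This is an illustration of the general pattern, not code from this PR.)

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// roundTrip pushes s through a manually wired writer chain
// (zlib writer -> buffer), then decompresses it again to show
// that the chain worked.
func roundTrip(s string) string {
	var sink bytes.Buffer

	// Manual wiring: zw decorates sink; writes to zw are
	// compressed and then passed down to sink.
	zw := zlib.NewWriter(&sink)
	io.WriteString(zw, s)
	zw.Close() // flush the compressed stream

	zr, err := zlib.NewReader(&sink)
	if err != nil {
		panic(err)
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(roundTrip("Hello, stream!")) // Hello, stream!
}
```

Every extra stage in such a chain means another explicit wrapping call at the call site - which is exactly what wiring by configuration avoids.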

Technical Details

Each Writer is an instance of StreamDecorator, a struct that implements io.Writer. Additionally, the struct contains two callback functions, Open and Close; both can be set when the StreamDecorator is being initialized, and they are called back at the start and end of the stream. The Open callback is also used to pass down the *mail.Envelope, which can be used to keep state for the email currently being processed.

type StreamDecorator struct {
	p     func(StreamProcessor) StreamProcessor // wraps the next processor in the chain
	e     *mail.Envelope                        // the envelope currently being processed
	Close streamCloseWith                       // called at the end of the stream
	Open  streamOpenWith                        // called at the start of the stream
}

type streamOpenWith func(e *mail.Envelope) error

type streamCloseWith func() error

In gateway.go there's a new newStreamStack method that instantiates the StreamDecorator structs and wires them up.

A new method was added to the Backend interface

ProcessStream(r io.Reader, e *mail.Envelope) (Result, error)

A new configuration option was also added to the config file: stream_save_process.
The value is a string with the names of each StreamDecorator to chain, delimited by a pipe |.
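For illustration, the setting could look like this in the config file (the surrounding nesting is an assumption for this sketch; only the stream_save_process key and its pipe-delimited value come from this PR):

```json
{
  "backend_config": {
    "stream_save_process": "Header|compress|Decompress|debug"
  }
}
```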

This is how the io.Reader is passed from the DATA command down to the backend. The ProcessStream method calls all the Open methods on our writers and then begins streaming the data using io.Copy. At the end of the stream, it calls Close() on our decorators in the order they were wired.

Examples

Perhaps the best way to understand this is to look at some example code.

There are four examples of StreamDecorator implementations in the backends dir:

  • s_header.go - adds a 'delivery header' to the front of the stream
  • s_compress.go - uses zlib to compress the stream
  • s_decompress.go - uses zlib to decompress the stream
  • s_debug.go - logs the stream

You will notice that each of the files contains a function that looks just like the io.Writer interface,
without the Write keyword, i.e. StreamProcessWith(func(p []byte) (int, error)).
This is an anonymous function which is converted to an io.Writer when it is returned. Here is the code
of s_debug.go:

func StreamDebug() *StreamDecorator {
	sd := &StreamDecorator{}
	sd.p =
		func(sp StreamProcessor) StreamProcessor {
			sd.Open = func(e *mail.Envelope) error {
				return nil
			}
			return StreamProcessWith(func(p []byte) (int, error) {
				fmt.Println(string(p))
				Log().WithField("p", string(p)).Info("Debug stream")
				return sp.Write(p)
			})
		}
	return sd
}

The most important detail here is that the sp identifier refers to the next io.Writer in the chain.
In other words, sp holds a reference to the underlying writer.

(The sd.Open assignment does nothing; it's just there as an example / to be used as a template.)

In the api_test.go file, there is a test called TestStreamProcessor. The writers are chained with the
following config setting:

"stream_save_process": "Header|compress|Decompress|debug"

This means it will call the Write method on Header first, and then on each underlying writer down the chain.

Todo:

  • Configurable stream buffer size & ability to recycle it.
  • Write more advanced processors (parse headers, streaming version of the maildir processor, sql database, save in chunks & deduplicate on-the-fly)
  • Fuzz testing
  • Test in production

@flashmob flashmob mentioned this pull request Feb 21, 2019
…_parser.go)

- use a 4KB buffer to process the stream (io.CopyBuffer instead of io.Copy)
- Add a new 'process' stream decorator
flashmob added 4 commits July 7, 2020 01:11
stream processor decorators have a new Shutdown function, switched to use this instead of Svc
start developing background processing
…spatcher is started for each processor type

- new ValidatingProcessor type
- removed "gw" prefix from gateway config options
@flashmob flashmob mentioned this pull request Jul 12, 2020
flashmob added 2 commits July 14, 2020 11:43
background processing: borrow a new envelope and copy the existing protocol params to it.
@flashmob (Owner, Author)
4 done

@flashmob (Owner, Author)
Still have a lot of small problems, but once that's fixed then real testing can start!

@flashmob (Owner, Author) commented Aug 3, 2020

mysql driver taking shape.

envelope: added mime parse error to envelope
fixed tests
make ChunkPrefetchCount & ChunkMaxBytes configurable
update comments to parseInfo
chunkPrefetchCountDefault configurable (add chunkPrefetchMax)
@kushsharma

@flashmob Are you still planning to finish this PR? I see you have invested quite some time, but then the motivation died down? 😄
We need you back.
