CARFAC Design Doc
by "Richard F. Lyon" <[email protected]>
updated 24 May 2012 (v.237)

The CAR-FAC (cascade of asymmetric resonators with fast-acting
compression) is a cochlear model implemented as an efficient sound
processor, for mono, stereo, or multi-channel sound inputs. This file
describes aspects of the software design.

The implementation will include equivalent Matlab, C++, Python, and
perhaps other versions. They should all be based on the same set of
classes (or structs) and functions, with roughly equivalent
functionality and names where possible. The examples here are Matlab,
in which the structs are typeless, but in other languages each will be
a class or type.

The top-level class is CARFAC. A CARFAC object knows how to design
its details from a modest set of parameters, and knows how to process
sound signals to produce "neural activity patterns" (NAPs). It
includes various sub-objects representing the parameters, designs, and
states of different parts of the CAR-FAC model.

The CARFAC class includes a vector of "ear" objects -- one for mono,
two for stereo, or more.

The three main subsystems of each ear are the cascade of asymmetric
resonators (CAR), the inner hair cell (IHC), and the automatic gain
control (AGC). These are not intended to work independently, but
each part has three groups of data associated with it. The CARFAC
stores instances of the "params" that drive the design, and the
"coeffs" and "state" are stored per ear:

  CAR_params, CAR_coeffs, CAR_state
  IHC_params, IHC_coeffs, IHC_state
  AGC_params, AGC_coeffs, AGC_state

These names can be used both for the classes, and slightly modified
for the member variables, arguments, temps, etc.

The params are inputs that specify the system; the coeffs are things
like filter coefficients, things computed once and used at run time;
the state is whatever internal state is needed between running the
model on samples or segments of sound waveforms, such as the state
variables of the digital filters.

At construction ("design") time, params are provided, and ears with coeffs
are created. The created CARFAC stores the params that it was designed
from; there is no state yet:

  CF = CARFAC_Design(CAR_params, IHC_params, AGC_params)
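
The params/coeffs/state split described above can be sketched roughly
as follows. This is Python rather than Matlab so that it runs
standalone. Every name here (ToyParams, ToyCoeffs, ToyState, ToyEar,
ToyCarfac, toy_design, toy_init_state) is hypothetical, and the single
"smoothing" coefficient is a placeholder rather than any actual CARFAC
design equation. Passing n_ears to the design function is also an
assumption made for the sketch.

```python
from dataclasses import dataclass, field
import math


@dataclass(frozen=True)
class ToyParams:
    """Hypothetical design inputs (stand-in for CAR_params and friends)."""
    fs: float = 22050.0          # sample rate in Hz
    n_ch: int = 4                # fixed here; the real n_ch is derived from params
    time_constant: float = 0.01  # seconds, toy smoothing time constant


@dataclass(frozen=True)
class ToyCoeffs:
    """Values computed once at design time and only read at run time."""
    smoothing: float


@dataclass
class ToyState:
    """Whatever must persist between samples or segments."""
    memory: list


@dataclass
class ToyEar:
    coeffs: ToyCoeffs
    state: ToyState = None       # no state until it is initialized


@dataclass
class ToyCarfac:
    params: ToyParams            # the design remembers the params it came from
    ears: list = field(default_factory=list)


def toy_design(params, n_ears=1):
    """Design step: params in, ears with coeffs out, no state yet."""
    smoothing = 1.0 - math.exp(-1.0 / (params.fs * params.time_constant))
    ears = [ToyEar(coeffs=ToyCoeffs(smoothing)) for _ in range(n_ears)]
    return ToyCarfac(params=params, ears=ears)


def toy_init_state(cf):
    """Separate step that allocates zeroed state for every ear."""
    for ear in cf.ears:
        ear.state = ToyState(memory=[0.0] * cf.params.n_ch)
    return cf
```

Making the params and coeffs immutable while leaving the state mutable
mirrors the idea that only the state changes at run time.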

On "channels" and "ears":

Stages of the cascade are called "channels", and there are n_ch of
them (n_ch is determined from other parameters, but we could also make
a function to pick params to get a desired n_ch). Since we already
used "channels" for that, sound input channels (one for monaural, two
for binaural, or more) are called "ears", and there are n_ears of them.
The parameter n_ears is set when state is initialized, but
not earlier at design time; it is not in params since it doesn't
affect coefficient design.

Multi-ear designs usually have the same coeffs across several ears, but
always need separate state.
The coeffs are kept separate per ear in case someone wants to
modify one or both to simulate asymmetric hearing loss or such.

The only place the ears interact is in the AGC. The function that
closes the AGC loop can "mix" their states, making an inter-ear
coupling. Other than that, the functions that update the CAR and IHC
and AGC states are simple single-ear functions (methods of the ear
class?).
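
One possible form of the inter-ear "mix" is sketched below in Python.
The doc does not specify the mixing rule, so the rule used here
(pulling each ear's per-channel AGC state toward the across-ear mean by
a factor "mix") is only an assumption, and the name mix_agc_states is
hypothetical.

```python
def mix_agc_states(agc_states, mix):
    """Pull each ear's AGC state toward the across-ear mean.

    agc_states: one list of per-channel values per ear.
    mix: 0.0 leaves the ears independent, 1.0 makes them identical.
    Returns new lists; the inputs are not modified.
    """
    n_ears = len(agc_states)
    n_ch = len(agc_states[0])
    # Mean over ears, computed separately for every channel.
    means = [sum(ear[ch] for ear in agc_states) / n_ears for ch in range(n_ch)]
    mixed = []
    for ear in agc_states:
        mixed.append([x + mix * (m - x) for x, m in zip(ear, means)])
    return mixed


left = [1.0, 0.0]
right = [0.0, 0.0]
print(mix_agc_states([left, right], 0.5))
```

With a single ear the mean equals the ear's own state, so the function
leaves mono designs unchanged.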

Data size and performance:

The coeffs and states are several kilobytes each, since they store a
handful (10 or so) of floating-point values (at 4 or 8 bytes each) for
each channel (typically 60 to 100 channels); that's 240 to 800 bytes per
coefficient and per state variable. That's the entire data memory
footprint; most of it is accessed at every sample time (hopefully it
will all fit and have good hit rate in a typical 32 KB L1 d-cache).
In Matlab we use doubles (8-byte), but in C++ we intend to use floats
(4-byte) and SSE (via Eigen) for higher performance. Alternate
implementations are OK.
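
The size estimate above can be checked with a few lines of Python.
The figure of 10 values per channel is the doc's "10 or so", applied
here to both coeffs and state as an assumption, and the total is for a
single ear. The function names are hypothetical.

```python
def table_bytes(n_ch, bytes_per_value):
    """Bytes for one coefficient or state variable: one value per channel."""
    return n_ch * bytes_per_value


def footprint_bytes(n_ch, bytes_per_value, values_per_channel=10):
    """Rough size of one coeffs block (or one state block)."""
    return values_per_channel * table_bytes(n_ch, bytes_per_value)


L1_DCACHE_BYTES = 32 * 1024

# Smallest case (60 channels of floats) and largest case (100 channels of doubles).
for n_ch, width in [(60, 4), (100, 8)]:
    per_variable = table_bytes(n_ch, width)
    block = footprint_bytes(n_ch, width)
    total = 2 * block  # coeffs plus state, for one ear
    print(n_ch, width, per_variable, block, total, total <= L1_DCACHE_BYTES)
```

Under these assumptions one ear needs roughly 4.8 KB with floats and
16 KB with doubles, so both cases fit in a 32 KB cache.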

Run-time strategy:

To support real-time applications, sound can be processed in short
segments, producing segments of NAPs:

  [NAPs, CF] = CARFAC_Run_Segment(CF, input_waves)

Here the plurals ("NAPs" and "input_waves") suggest multiple ears.
These can be vectors, with length 1 for mono, or Matlab
multi-D arrays with a singleton last dimension.

The CF's states are updated such that there will be no glitch when a next
segment is processed. Segments are of arbitrary positive-integer
length, not necessarily equal. It is not inefficient to use a segment
length of 1 sample if results are needed with very low latency.
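
The "no glitch" property can be illustrated with a toy Python example.
The one-pole smoother below is only a stand-in for the real model (the
function toy_run_segment and its coeff argument are hypothetical); the
point is that carrying the state from one call to the next makes
segmented processing give the same output as a single pass, whatever
the segment lengths.

```python
def toy_run_segment(state, segment, coeff=0.25):
    """Toy stand-in for CARFAC_Run_Segment: a one-pole smoother.

    Returns (outputs, new_state) so the caller can feed the state
    into the next call, in the style of the Matlab signature.
    """
    y = state
    out = []
    for x in segment:
        y = y + coeff * (x - y)
        out.append(y)
    return out, y


signal = [1.0, 0.0, 0.0, 2.0, -1.0, 0.5, 0.0]

# Process everything in one call.
whole, _ = toy_run_segment(0.0, signal)

# Process the same signal as unequal segments of length 1, 3, and 3.
state = 0.0
pieces = []
for start, stop in [(0, 1), (1, 4), (4, 7)]:
    out, state = toy_run_segment(state, signal[start:stop])
    pieces.extend(out)

print(pieces == whole)
```

The two results match exactly because the per-sample arithmetic is
performed in the same order in both cases.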

Internally, CARFAC_Run updates each part of the state, one sample at a
time. First the CAR, IHC, and AGC are updated ("stepped") for all ears:

  [car_out, CAR_state] = CARFAC_CAR_Step(sample, CAR_coeffs, CAR_state)
  [ihc_out, IHC_state] = CARFAC_IHC_Step(car_out, IHC_coeffs, IHC_state)
  [AGC_state, updated] = CARFAC_AGC_Step(ihc_out, AGC_coeffs, AGC_state)

The AGC filter mostly runs at a lower sample rate (by a factor of 8 by
default). Usually it just accumulates its input and returns quickly. The
boolean "updated" indicates whether the AGC actually did some work and
has a new output that needs to be used to "close the loop" and modify
the CAR state. When it's true, there's one more step, involving both AGC
and CAR states, so it's a function on the CARFAC:

  CF = CARFAC_Close_AGC_Loop(CF)

In Matlab, these functions return a modified copy of the state or
CARFAC; that's why we have the multiple returned values. In languages
that do more than just call-by-value, a reference will be passed so
the state can be modified in place instead.
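
The per-sample control flow, including the decimated AGC and in-place
state updates, might look roughly like this in Python for a single ear.
The arithmetic in each step is a placeholder (a gain multiply, a
half-wave rectifier, a block average), not the real CAR, IHC, or AGC
computation, and the names ToyAgc, ToyEarState, and toy_run_segment
are hypothetical.

```python
class ToyAgc:
    """Stand-in AGC that only does real work every `decimation` samples."""

    def __init__(self, decimation=8):
        self.decimation = decimation
        self.accum = 0.0
        self.count = 0
        self.output = 0.0

    def step(self, ihc_out):
        """Accumulate the input; return True when a new output is ready."""
        self.accum += ihc_out
        self.count += 1
        if self.count < self.decimation:
            return False           # the common, cheap case
        self.output = self.accum / self.decimation
        self.accum = 0.0
        self.count = 0
        return True


class ToyEarState:
    """Mutable state, modified in place rather than returned as a copy."""

    def __init__(self):
        self.gain = 1.0            # toy CAR state that the AGC adjusts
        self.agc = ToyAgc()


def toy_run_segment(state, samples):
    """Step CAR, IHC, and AGC per sample; close the loop when AGC updates."""
    naps = []
    loop_closures = 0
    for sample in samples:
        car_out = state.gain * sample      # stand-in for CARFAC_CAR_Step
        ihc_out = max(car_out, 0.0)        # stand-in for CARFAC_IHC_Step
        naps.append(ihc_out)
        if state.agc.step(ihc_out):        # stand-in for CARFAC_AGC_Step
            # Stand-in for CARFAC_Close_AGC_Loop: AGC output modifies CAR state.
            state.gain = 1.0 / (1.0 + state.agc.output)
            loop_closures += 1
    return naps, loop_closures


state = ToyEarState()
naps, closures = toy_run_segment(state, [1.0] * 20)
print(closures, naps[7], naps[8])
```

Because the state object is mutated in place, only the NAPs need to be
returned, which is the by-reference style described above.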

C++ Eigen strategy:

The mapping from Matlab's "parallel" operations on vectors of values
into C++ code will be done by using Eigen
(http://eigen.tuxfamily.org/) arrays, which support similar
source-level parallel arithmetic, so we don't have to write loops over
channels. Eigen also allows fairly efficient compilation to SSE or
ARM NEON instructions, which can operate on four floats per instruction.
So we should be able to easily get to efficient code on various
platforms.

If there are similar strategies available in other languages, we should
use them. Python may have a route to using Eigen
(http://eigen.tuxfamily.org/index.php?title=PythonInterop).