-
Notifications
You must be signed in to change notification settings - Fork 0
/
slides_outline.txt
243 lines (192 loc) · 8.46 KB
/
slides_outline.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
# NOTES:
# o Maybe while you have a fully functioning CAIS, it's not
# necessarily a robust CAIS? Make this clear?
#+TITLE: Defense Slidedeck Outline
* Title Slide:
+ Show logos for:
+ RAIR
+ CISL
+ RPI
+ IBM
+ AFOSR
+ ONR
"Thank you to IBM for their support of CISL and ONR & AFOSR separately for
making this research possible."
# MATT: Technically, you did receive /some/support from ONR & AFOSSR,
# and need to mention that too. Thx. //S
* Roadmap
+ Introduction, Background, and Motivation
+ Technology for Cognitive and Immersive Systems
+ Reagent: Enabling Interaction with Arbitrary Web Content
+ MUIFOLD: Mobile UI Framework for Operating Large Displays
+ Formalizing the CAIS
+ Planning and Plan Recognition
+ Closing remarks
* Introduction
** What is a CAIS?
+ A CAIS should broadly be able to fulfill the following two requirements:
+ C Cognitive
+ I Immersive
** Motivating Example of Alvin, Betty, and Charlie
** Requirements for a CAIS implementation
Figure: =chapters/01_introduction/figures/planning_cais_framework.png=
+ The requirements of a CAIS must be able to support our earlier use-case of Alvin, Bill, and Charlie
+ Must be multi-user and multi-modal
+ Optimally be able to handle:
+ speech input (done here)
+ gestural input (done here)
+ perception input (done here, caveat placed on headpose vs just being "in the CAIS")
+ Should be able to operate on a generalizable standpoint, making adoption to others easier
+ Hammer that while it is fine to define things theoretically, we wish to provide here an actual
physical implementation to drive our theoretical underpinnings, to prevent us from giving
a theoretical system that is easily far into the future, and that we don't trap ourselves
there.
** Contributions
+ Create a definition for what constitutes a cognitive and immersive system
+ Define and provide implementation of a generalizable framework for a CAIS
+ As stated above, this implementation does not necessarily hit the absolute limits of robustness,
it can be used to accomplish all of the theoretical underpinnings with some slight caveats
+ Additionally, as part of this work, we provide this CAIS as open-source so as to help see
its adoption and secure its longevity. A lot of prior work appears to generate a paper or two,
get some buzz, and then disappear, all the while no technology is provided for future endeavors
+ Create formalization for CAIS such that it can operate at theory of mind level
+ Utilize items 2 and 3 for purposes of planning and plan recognition
** Remarks on Colocated vs Distributed
+ Why are they different?
+ Why can we not just pivot?
* Technology
** Prior Work on intelligent rooms
+ =bolt_put-that-there:_1980=
+ =coen_building_1997=
+ =brooks_intelligent_1997=
(Gap due to CALO project)
+ =farrell_symbiotic_2016=
+ =chakraborti_mr._2017=
+ =arai_cira:_2019=
** Architecture for a CAIS
Figure: =chapters/02_technology/figures/cais_high_level.png=
+ We have workers that are at a sensor level (e.g. the inputs of the "brain")
+ At the next several levels, the "raw" input is parsed into semantic information as well as can be
combined in any way
+ support for early vs late fusion common in ML multi-modal approaches
+ we aim for "semantic late fusion"
+ The results of the input are combined as necessary for resolution of the execution engine
** Implementing a CAIS
Figure: =chapters/02_technology/figures/cais_implementation.png=
+ Provide here open-source implementation of modules "Bishop CAIS" (https://github.com/bishopcais)
+ Briefly discuss these modules in minor detail:
+ transcript-worker
+ speaker-worker
+ executor
+ conversation-worker
+ occupancy-tracker
** Implementing a CAIS: display-worker
+ Electron based application which allows us to open webpages on grid system
+ Webpages can point to anywhere, and that through electron, we can reach into open webpages
+ Surfaces RabbitMQ / REST API to pull details out about open webpages (e.g. location on screen)
+ How do we make sense though of what's on the screen?
* Reagent
** Reagent
+ Motivation:
+ To make possible understanding pointing, we need capacity to understand what is on the screen
+ Prior work demanded that all content be created in-house and be specially instrumented
(e.g. chakraborti_visualizations_2018)
+ Doable, but of course that would mean our CAIS is probably not going to see real world usage
** Reagent Architecture
Figure: =chapters/03_reagent/figures/reagent.png=
+ Utilize pre-loading of small JS file on-top of any webview loaded by display-worker
+ Creates a bridge into the page to the Reagent server
+ Creates bindings into the page to detect interaction data, as well as hooks to allow us
to query information from the page
+ MutationObserver attached to the page alerting us of any changes to the DOM
** Reagent: Sample ontology use-case
+ Given page of ESPN, talk about how Reagent exposes the table on the page to users
+ Allow querying information about of the table using table names, as well as gestural information
+ Walk through building ontology
** Regaent: Experiment
+ Walkthrough experiment on:
+ Selecting pages
+ Composing our evaluation set
+ Metrics we measure against
** Reagent: Experiment Results
+ Show table
** Reagent: Summary
+ We can use this to understand screen content on the fly
+ Can build out a number of general extensions for a number of generic pages (e.g. tables)
+ Can build out specific extensions for custom content (such as we do below for our use-cases)
* MUIFOLD
** Motivation
+ While there exists prior work in pointing in large spaces, they almost always
utilize usage range of sensors that are costly to set-up and use
+ Additionally, these solutions had issues of allowed movement zones, Additional
hardware, etc.
** Prior Work
+ hutchison_pointerphone:_2013
+ pietroszek_smartcasting:_2014
+ abascal_ucanvas_2015
+ weisker_massive_2016
+ baldauf_your_2016
+ barsotti_web_2017
** Introduction MUIFOLD
+ Framework for building out UI for interacting with CAIS using cellphones
+ Almost everyone has a smart-phone that features powerful accelerometer and gyroscope
+ Does not utilize external sensors or landmarks
+ Allows building out also UIs that are more friendly to a use-case
** MUIFOLD: Sample Use-Case
+ Demonstrate MUIFOLD in intelligent analyst use-case
** MUIFOLD: User Study
+ Walk through experiment
+ Walk through results
** MUIFOLD: Summary
+ Technology works, provides general pointing device we can use in CAIS minimally
+ Maximally provides also alternative input schemes to content beyond just natural
language sentences and gestures
* Formalizing the CAIS
** Introduce CEC
+ Provide brief description
+ Intensional logic so allows for modeling not just B, K of a user, but also that
can be B of user of B of another user (or self).
** Formalizing our requirements
+ Walk through one at a time per slide (C1, C2, I1, I2, I3):
+ Informal textual description of principal
+ Formalization
** Formalizing the CAIS technology
+ Users within CAIS
+ Perception and Vicinity
+ Pointing
+ Spoken / Typed Natural Language and MUIFOLD Input
** Formalization of sample use-cases
+ Blocks world
+ Sticky-notes
** Solving the False-Belief Task with our Formalization
* Planning / Plan Recognition
** Why?
+ Given now that we have both a real-world implementation and formalization,
can now attack issues of planning and plan recognition in these spaces
** Define Planning (give example)
** Define Plan Recognition
** Define STRIPS-style Action Planners
** Discuss Spectra
** Intent Resolution (2+ slides)
+ Given sentence "Delete the green note"
+ Walk through scenario of resolving it
+ intent: delete, entities: ["green"]
+ match against possible action candidate
+ gain additional information what we're looking for (note)
+ combine with given entities
+ see if we have a matching note that is in agents' beliefs (from perception)
+ if not, launch into dialogue for resolution
** Plan Recognition
+ Given above false-belief scenario, have user continue to operate in it
+ Have user make an action consistent with an incorrect goal state
+ Provide course correction, utilizing the observations of users' action
sequence
* Conclusion
** Outline objectives of dissertation, met
** Identify future work
+ Extensions to CEC
+ Strength Factors
+ Emotional Theory
+ Handling privacy and ethical issues in the space
+ Extensions to technology
+ User studies