<h1>Unit Types for Avro Schema: Integrating Avro with Coulomb</h1>
<p>2019-05-23</p>
<p>In a
<a href="http://erikerlandson.github.io/blog/2019/05/09/preventing-configuration-errors-with-unit-types/">previous post</a>
I showed how software configuration errors could be prevented by supporting values with unit types.
Configuration systems are an important use case for unit types, but they are far from the only one.
In this post I will show a similar integration of the <a href="https://github.com/erikerlandson/coulomb">coulomb</a>
project with <a href="https://avro.apache.org/">Apache Avro</a> schema.</p>
<p>The Avro data serialization library is a useful integration point for
<a href="https://github.com/erikerlandson/coulomb#quantity-and-unit-expressions">coulomb unit types</a>.
Avro serialization is schema-driven and supports user-supplied metadata, which allows unit type information
to be added to a schema.
Since the schema is decoupled from the data, the unit type information does not add to the cost of the
actual data, only the schema.
Even more importantly, Avro itself is used in a variety of other ecosystem projects, for example
<a href="https://kafka.apache.org/">Apache Kafka</a>.</p>
<p>The following examples are based on the <code class="highlighter-rouge">coulomb-avro</code> package.
You can learn more about how to use this project
<a href="https://github.com/erikerlandson/coulomb#how-to-include-coulomb-in-your-project">here</a>
and
<a href="https://erikerlandson.github.io/coulomb/latest/api/coulomb/avro/package$$EnhanceGenericRecord.html">here</a>.</p>
<p>Consider this small Avro schema:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
"type": "record",
"name": "smol",
"fields": [
{ "name": "latency", "type": "double", "unit": "second" },
{ "name": "bandwidth", "type": "double", "unit": "gigabyte / second" }
]
}
</code></pre></div></div>
<p>As you can see, the fields in this schema have been augmented with a <code class="highlighter-rouge">"unit"</code> metadata field
that contains a unit expression.</p>
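<p>Because <code class="highlighter-rouge">"unit"</code> is ordinary user-supplied metadata, it survives a round trip through Avro’s standard APIs. As a quick sketch (using only the stock Avro Java API, no coulomb required), the property can be read back per field:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import org.apache.avro.Schema

val schema = new Schema.Parser().parse(new java.io.File("smol.avsc"))
// custom properties are retrievable on each field
schema.getField("latency").getProp("unit")    // "second"
schema.getField("bandwidth").getProp("unit")  // "gigabyte / second"
</code></pre></div></div>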
<p>What can we do with this additional metadata?
The following example begins to demonstrate how the <code class="highlighter-rouge">"unit"</code> information is used by <code class="highlighter-rouge">coulomb-avro</code>:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">schema</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">Schema</span><span class="o">.</span><span class="nc">Parser</span><span class="o">().</span><span class="n">parse</span><span class="o">(</span><span class="k">new</span> <span class="n">java</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="nc">File</span><span class="o">(</span><span class="s">"smol.avsc"</span><span class="o">))</span>
<span class="n">schema</span><span class="k">:</span> <span class="kt">org.apache.avro.Schema</span> <span class="o">=</span> <span class="o">{</span><span class="s">"type"</span><span class="k">:</span><span class="err">"</span><span class="kt">record</span><span class="err">"</span><span class="o">,</span><span class="s">"name"</span><span class="k">:</span><span class="err">"</span><span class="kt">smol</span><span class="err">"</span><span class="o">,</span><span class="s">"fields"</span><span class="k">:</span><span class="err">[</span><span class="o">{</span><span class="err">"</span><span class="kt">name</span><span class="err">"</span><span class="kt">:</span><span class="err">"</span><span class="kt">latency</span><span class="err">"</span><span class="o">,</span><span class="err">"</span><span class="k">type</span><span class="err">"</span><span class="kt">:</span><span class="err">"</span><span class="kt">double</span><span class="err">"</span><span class="o">,</span><span class="s">"unit"</span><span class="k">:</span><span class="err">"</span><span class="kt">second</span><span class="err">"</span><span class="o">},{</span><span class="s">"name"</span><span class="k">:</span><span class="err">"</span><span class="kt">bandwidth</span><span class="err">"</span><span class="o">,</span><span class="s">"type"</span><span class="k">:</span><span class="err">"</span><span class="kt">double</span><span class="err">"</span><span class="o">,</span><span class="s">"unit"</span><span class="k">:</span><span class="err">"</span><span class="kt">gigabyte</span> <span class="kt">/</span> <span class="kt">second</span><span class="err">"</span><span class="o">}</span><span class="err">]</span><span class="o">}</span>
<span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">rec</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">GenericData</span><span class="o">.</span><span class="nc">Record</span><span class="o">(</span><span class="n">schema</span><span class="o">)</span>
<span class="n">rec</span><span class="k">:</span> <span class="kt">org.apache.avro.generic.GenericData.Record</span> <span class="o">=</span> <span class="o">{</span><span class="s">"latency"</span><span class="k">:</span> <span class="kt">null</span><span class="o">,</span> <span class="s">"bandwidth"</span><span class="k">:</span> <span class="kt">null</span><span class="o">}</span>
<span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">qp</span> <span class="k">=</span> <span class="nc">QuantityParser</span><span class="o">[</span><span class="kt">Second</span> <span class="kt">::</span> <span class="kt">Byte</span> <span class="kt">::</span> <span class="kt">Hour</span> <span class="kt">::</span> <span class="kt">Giga</span> <span class="kt">::</span> <span class="kt">HNil</span><span class="o">]</span>
<span class="n">qp</span><span class="k">:</span> <span class="kt">coulomb.parser.QuantityParser</span> <span class="o">=</span> <span class="n">coulomb</span><span class="o">.</span><span class="n">parser</span><span class="o">.</span><span class="nc">QuantityParser</span><span class="k">@</span><span class="mf">79f</span><span class="mi">0045</span>
<span class="n">scala</span><span class="o">></span> <span class="n">rec</span><span class="o">.</span><span class="n">putQuantity</span><span class="o">(</span><span class="n">qp</span><span class="o">)(</span><span class="s">"latency"</span><span class="o">,</span> <span class="mf">100.</span><span class="n">withUnit</span><span class="o">[</span><span class="kt">Milli</span> <span class="kt">%*</span> <span class="kt">Second</span><span class="o">])</span>
<span class="n">scala</span><span class="o">></span> <span class="n">rec</span><span class="o">.</span><span class="n">putQuantity</span><span class="o">(</span><span class="n">qp</span><span class="o">)(</span><span class="s">"bandwidth"</span><span class="o">,</span> <span class="mf">1.</span><span class="n">withUnit</span><span class="o">[</span><span class="kt">Tera</span> <span class="kt">%*</span> <span class="kt">Bit</span> <span class="kt">%/</span> <span class="kt">Minute</span><span class="o">])</span>
<span class="n">scala</span><span class="o">></span> <span class="n">rec</span>
<span class="n">res8</span><span class="k">:</span> <span class="kt">org.apache.avro.generic.GenericData.Record</span> <span class="o">=</span> <span class="o">{</span><span class="s">"latency"</span><span class="k">:</span> <span class="err">0</span><span class="kt">.</span><span class="err">1</span><span class="o">,</span> <span class="s">"bandwidth"</span><span class="k">:</span> <span class="err">2</span><span class="kt">.</span><span class="err">083333</span><span class="o">}</span>
</code></pre></div></div>
<p>What is happening here?
First, loading an Avro schema and creating a record from it are standard Avro operations.
Notice that the custom <code class="highlighter-rouge">"unit"</code> metadata is preserved by Avro’s standard methods.
Next, I am declaring a
<a href="https://github.com/erikerlandson/coulomb#quantity-parsing"><code class="highlighter-rouge">QuantityParser</code></a>.
The quantity parser allows the unit expressions in the schema to be reconciled with the unit types
appearing in Scala.
You can see the quantity parser being used by the
<a href="https://erikerlandson.github.io/coulomb/latest/api/coulomb/avro/package$$EnhanceGenericRecord.html"><code class="highlighter-rouge">putQuantity</code></a>
method, which accepts a coulomb
<a href="https://github.com/erikerlandson/coulomb#quantity-and-unit-expressions"><code class="highlighter-rouge">Quantity</code></a>
instead of a raw data value of type Double, Int, etc.</p>
<p>What are these coulomb extensions buying us?
Notice that I can set the “latency” field with a value in <em>milliseconds</em> (<code class="highlighter-rouge">Milli %* Second</code>)
even though my schema denotes a unit of “seconds”.
Furthermore, the parser correctly determined that milliseconds are convertible to seconds,
and did this conversion automatically.
The coulomb library can perform these kinds of computations on unit expressions of
<a href="https://github.com/erikerlandson/coulomb#quantity-and-unit-expressions">arbitrary complexity</a>,
which you can see in operation while setting the “bandwidth” field,
which correctly converts terabits/minute into gigabytes/second.</p>
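<p>The “bandwidth” conversion is easy to verify by hand. The following plain-Scala arithmetic (not coulomb, just a sanity check) reproduces the value stored in the record:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// 1 terabit/minute expressed in gigabytes/second
val bits   = 1.0e12        // 1 terabit, in bits
val bytes  = bits / 8.0    // 1.25e11 bytes
val gbytes = bytes / 1.0e9 // 125 gigabytes
val gbps   = gbytes / 60.0 // per minute -> per second: 2.0833...
</code></pre></div></div>
<p>which matches the <code class="highlighter-rouge">2.083333</code> shown for “bandwidth” above.</p>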
<p>Equally important, this tool understands when units are <em>not</em> compatible.
The following attempt to set a field with units that are not convertible is also detected by the
parser and fails:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="n">rec</span><span class="o">.</span><span class="n">putQuantity</span><span class="o">(</span><span class="n">qp</span><span class="o">)(</span><span class="s">"latency"</span><span class="o">,</span> <span class="mf">100.</span><span class="n">withUnit</span><span class="o">[</span><span class="kt">Milli</span> <span class="kt">%*</span> <span class="kt">Meter</span><span class="o">])</span>
<span class="n">java</span><span class="o">.</span><span class="n">lang</span><span class="o">.</span><span class="nc">Exception</span><span class="k">:</span> <span class="kt">unit</span> <span class="kt">metadata</span> <span class="err">"</span><span class="kt">second</span><span class="err">"</span> <span class="kt">incompatible</span> <span class="kt">with</span> <span class="err">"</span><span class="kt">coulomb.%*</span><span class="err">[</span><span class="kt">coulomb.siprefix.Milli</span><span class="o">,</span> <span class="n">coulomb</span><span class="o">.</span><span class="n">si</span><span class="o">.</span><span class="nc">Meter</span><span class="err">]"</span>
</code></pre></div></div>
<p>Coulomb quantities are also supported on the field-reading side.
Here we use the
<a href="https://erikerlandson.github.io/coulomb/latest/api/coulomb/avro/package$$EnhanceGenericRecord.html"><code class="highlighter-rouge">getQuantity</code></a>
extension to extract field values into type-safe units:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="n">rec</span><span class="o">.</span><span class="n">getQuantity</span><span class="o">[</span><span class="kt">Double</span>, <span class="kt">Micro</span> <span class="kt">%*</span> <span class="kt">Second</span><span class="o">](</span><span class="n">qp</span><span class="o">)(</span><span class="s">"latency"</span><span class="o">)</span>
<span class="n">res12</span><span class="k">:</span> <span class="kt">coulomb.Quantity</span><span class="o">[</span><span class="kt">Double</span>,<span class="kt">coulomb.siprefix.Micro</span> <span class="kt">%*</span> <span class="kt">coulomb.si.Second</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Quantity</span><span class="o">(</span><span class="mf">100000.0</span><span class="o">)</span>
<span class="n">scala</span><span class="o">></span> <span class="n">rec</span><span class="o">.</span><span class="n">getQuantity</span><span class="o">[</span><span class="kt">Double</span>, <span class="kt">Giga</span> <span class="kt">%*</span> <span class="kt">Bit</span> <span class="kt">%/</span> <span class="kt">Minute</span><span class="o">](</span><span class="n">qp</span><span class="o">)(</span><span class="s">"bandwidth"</span><span class="o">)</span>
<span class="n">res13</span><span class="k">:</span> <span class="kt">coulomb.Quantity</span><span class="o">[</span><span class="kt">Double</span>,<span class="kt">coulomb.siprefix.Giga</span> <span class="kt">%*</span> <span class="kt">coulomb.info.Bit</span> <span class="kt">%/</span> <span class="kt">coulomb.time.Minute</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Quantity</span><span class="o">(</span><span class="mf">1000.0</span><span class="o">)</span>
</code></pre></div></div>
<p>As with <code class="highlighter-rouge">putQuantity</code>, unit types and expressions are reconciled and properly converted.
As before, unit incompatibilities result in a parse error:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="n">rec</span><span class="o">.</span><span class="n">getQuantity</span><span class="o">[</span><span class="kt">Double</span>, <span class="kt">Byte</span><span class="o">](</span><span class="n">qp</span><span class="o">)(</span><span class="s">"latency"</span><span class="o">)</span>
<span class="n">java</span><span class="o">.</span><span class="n">lang</span><span class="o">.</span><span class="nc">Exception</span><span class="k">:</span> <span class="kt">unit</span> <span class="kt">metadata</span> <span class="err">"</span><span class="kt">second</span><span class="err">"</span> <span class="kt">incompatible</span> <span class="kt">with</span> <span class="err">"</span><span class="kt">coulomb.info.Byte</span><span class="err">"</span>
</code></pre></div></div>
<p>Another important consequence of using
<a href="https://github.com/erikerlandson/coulomb">coulomb</a>
with Avro is that in your Scala code you can use coulomb
<a href="https://github.com/erikerlandson/coulomb#quantity-and-unit-expressions">Quantity</a>
values for compile-time unit type checking.</p>
<p>I hope this post has demonstrated how unit type expressions for Avro schema can make your
data schema safer and more expressive!</p>
<h1>Preventing Configuration Errors With Unit Types</h1>
<p>2019-05-09</p>
<p>Anyone who has worked with software
has almost certainly had the experience of tracking down a software or systems problem to
discover that it was caused by an incorrectly configured parameter.
Settings get misconfigured for a variety of reasons, but one recurring pattern of error is a
value that was set assuming a <em>unit</em> that wasn’t expected.</p>
<p>What do I mean by “unit”? Consider this snippet of an Apache Kafka configuration file:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>log.flush.interval.messages=1000
log.flush.interval.ms=10000
log.flush.scheduler.interval.ms=1000
log.retention.hours=24
log.segment.bytes=1000000
log.retention.check.interval.ms=30000
log.roll.hours=24
</code></pre></div></div>
<p>Just in these few Kafka configuration parameters, we can see four units in play:
two units of time (milliseconds and hours), a unit of information (bytes), and a count unit (messages).</p>
<p>A <em>unit</em>, in this context, is a <em>standard of measurement</em> to denote some particular kind of quantity.
We define units for quantities such as information (bytes, bits), time (seconds, hours, days),
length (feet, meters), and so on.
The physical sciences define seven such kinds of quantity, each with its own standard unit;
these are the International System of Units (SI)
<a href="https://simple.wikipedia.org/wiki/International_System_of_Units#Base_units">Base Units</a>.</p>
<p>Let’s return to our Kafka configuration.
We can see that information about the expected units is encoded in the parameter names,
<em>not</em> in the values themselves.
For example, the parameter <code class="highlighter-rouge">log.retention.hours</code> tells us that it is expecting a time in units of hours.</p>
<p>This is helpful information for a user, and yet it provides relatively little in the way of
<em>actively protecting</em> against accidents.
Suppose this value is somehow configured assuming <em>seconds</em>, and so set to 86400;
now the configured log retention time is off by a <em>factor of 3600</em> from its intended value!</p>
<p>Pause to note that this is all the same to the configuration system: 86400 is just another number, like 24.
It will happily set the log retention time 3600 times too large,
quite likely causing some disk volume to fill up a couple weeks later.
The result will be data loss, possibly a software crash, and almost certainly unplanned
overtime for an unlucky ops team.</p>
<p>I can hear some readers thinking:
“That would be bad, but what are the odds? It says hours right in the name!”
Nevertheless, Murphy’s Law rules our world.
Perhaps the person who set up the configuration was in a hurry and rushing the job.
They could have been awake for 36 hours, and not thinking clearly.
They might not speak English, and weren’t 100% clear on what the word “hour” even means.
Quite possibly, the configuration file was generated by some <em>other</em> piece of software,
and that software had a unit bug in it.
Furthermore, not all configuration naming systems are as thoughtfully composed as Kafka’s.
There are plenty of configuration parameters out in the wild that don’t have any helpful unit information baked
into the parameter names.</p>
<p>So far, I’ve been examining the configuration process from the point of view of setting parameter values.
There are some similar issues on the software side, where these values are read.
Here is some pseudo-code that loads a value from a configuration
(as you might guess, it looks pretty similar in most common languages):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>secondsPerHour = 3600
// my system calls are going to expect seconds
logRetentionSeconds = conf.getInt("log.retention.hours") * secondsPerHour
</code></pre></div></div>
<p>Firstly, note that the variable <code class="highlighter-rouge">logRetentionSeconds</code> is loaded as an integer value.
Its unit (seconds) is being baked into its name, the same way that configuration parameter
names have units baked into theirs.
As with configuration file values, there are a variety of things that might go wrong here.
The programmer might not be so conscientious, and just name it <code class="highlighter-rouge">logRetentionTime</code> or <code class="highlighter-rouge">logRetention</code>,
and elsewhere in the code no one will be quite sure what the units are.
Worse yet, they might compute the value incorrectly, and future maintainers will wonder
why they have a bug, not realizing that the variable <code class="highlighter-rouge">logRetentionSeconds</code> is lying to them.
Case in point: while writing this, I accidentally divided instead of multiplied in the example code above,
before I caught my error!
Lastly, doing the conversion from hours to seconds itself is tedious (and prone to error), requiring
either a magic number or referring to the right constant value.</p>
<p>We are all acquainted with the pitfalls of working this way, but what is to be done?</p>
<p>I’ll start by pointing out that units, like hours, bytes, milliseconds, etc, are <em>annotations</em> that convey
information about a numeric value; in particular they <em>constrain</em> the interpretation of that value.
A value of 10 seconds represents a quantity of time, in the same measure as a value of 30 seconds,
but is <em>different</em> from a value of 10 bytes, or even 10 minutes.</p>
<p>These kinds of constraints might sound familiar to programmers: they are acting like <em>data types</em>!
Programmers are intuitively used to working with types such as “string”, “int” or “boolean”.
We know, whether we have thought about it consciously or not, that values with the type “string” are
constrained in different ways than, for example, an “int”.
They support different kinds of operations.
They can’t be used interchangeably; if you try, either a compiler error or a run-time error will result.</p>
<p>What if <em>units</em> could be represented as data types?</p>
<p>In a world where units could be applied to numeric values as first-class data types, a mistake in unit
assignment would show up immediately as a compile error.
Units that <em>can</em> be converted (such as hours and seconds) might be automatically converted by the compiler,
eliminating the need for tedious and error-prone conversions in the code.</p>
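<p>To make this concrete, here is a deliberately tiny sketch (a few lines of plain Scala with a phantom type parameter, not coulomb’s actual implementation) of the idea:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// a value tagged with a unit type U that exists only at compile time
case class Qty[U](value: Double)
trait Hour
trait Second

// a known conversion can be applied automatically and safely
def toSeconds(q: Qty[Hour]): Qty[Second] = Qty[Second](q.value * 3600.0)

val retention = Qty[Hour](24.0)
toSeconds(retention)           // Qty[Second](86400.0)
// toSeconds(Qty[Second](1.0)) // does not compile: type mismatch
</code></pre></div></div>
<p>A real unit-analysis library must also handle compound units, prefixes, and derived conversions, which is what coulomb implements.</p>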
<p>What would programming in such a world look like?</p>
<p>To explore these possibilities, I have been working on an
<a href="http://erikerlandson.github.io/blog/2019/05/03/algorithmic-unit-analysis/">algorithmic unit analysis</a>
implemented as a type system for Scala.
The project itself is called
<a href="https://github.com/erikerlandson/coulomb#coulomb">coulomb</a>;
it supports many unit analysis
<a href="https://github.com/erikerlandson/coulomb#features">features</a>,
including compile-time unit checking, unit conversions, and easily-extensible unit definitions.</p>
<p>What happens when a tool such as coulomb is used to apply unit analysis to the task of configuration?
In the following demonstration I’ll show what configuration with units looks like,
and also how they appear when they are loaded in Scala code.</p>
<p>I’ll begin by spinning up a scala REPL from the coulomb repo, and importing some definitions.
Here you can see I’m importing coulomb “core” definitions plus SI units, time units and information units.
I’m also importing the coulomb
<a href="https://erikerlandson.github.io/coulomb/latest/api/coulomb/parser/QuantityParser.html">QuantityParser</a>
and its
<a href="https://erikerlandson.github.io/coulomb/latest/api/coulomb/typesafeconfig/index.html">integration package</a>
for the
<a href="https://github.com/lightbend/config">Typesafe</a>
configuration library.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cd /path/to/coulomb
$ sbt coulomb_tests/console
Welcome to Scala 2.13.0-M5 (OpenJDK 64-Bit Server VM, Java 1.8.0_201).
Type in expressions for evaluation. Or try :help.
scala> import coulomb._, coulomb.si._, coulomb.siprefix._, coulomb.time._, coulomb.info._, coulomb.typesafeconfig._, coulomb.parser._, com.typesafe.config._, shapeless._
</code></pre></div></div>
<p>Next I’ll construct a typesafe style configuration.
Here I’m creating it directly in the REPL, but this would typically reside in a separate file.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">confTS</span> <span class="k">=</span> <span class="nc">ConfigFactory</span><span class="o">.</span><span class="n">parseString</span><span class="o">(</span><span class="s">"""
| "log-retention-time" = "24 hour"
| "log-segment-size" = "1 megabyte"
| "log-flush-interval" = "10 second"
| "log-demo-bandwidth" = "10 megabyte / second"
| """</span><span class="o">)</span>
<span class="n">confTS</span><span class="k">:</span> <span class="kt">com.</span><span class="k">type</span><span class="kt">safe.config.Config</span> <span class="o">=</span> <span class="nc">Config</span><span class="o">(</span><span class="nc">SimpleConfigObject</span><span class="o">({</span><span class="s">"log-demo-bandwidth"</span><span class="k">:</span><span class="err">"10</span> <span class="kt">megabyte</span> <span class="kt">/</span> <span class="kt">second</span><span class="err">"</span><span class="o">,</span><span class="s">"log-flush-interval"</span><span class="k">:</span><span class="err">"10</span> <span class="kt">second</span><span class="err">"</span><span class="o">,</span><span class="s">"log-retention-time"</span><span class="k">:</span><span class="err">"24</span> <span class="kt">hour</span><span class="err">"</span><span class="o">,</span><span class="s">"log-segment-size"</span><span class="k">:</span><span class="err">"1</span> <span class="kt">megabyte</span><span class="err">"</span><span class="o">}))</span>
</code></pre></div></div>
<p>Let’s pause to compare this with our example configuration up above.
First, you can see that the unit annotation now resides in the actual configuration <em>values</em>.
Already, this offers some advantages.
The values are no longer just anonymous numbers; since the units are directly applied as annotations
(and constraints), the opportunities for unit errors are reduced.
If “log-retention-time” were configured by an admin using seconds instead of hours,
that would no longer be an error, as the value itself would be “86400 second”.</p>
<p>Some additional features of coulomb appear here.
Prefixes such as “mega” are supported as
<a href="https://github.com/erikerlandson/coulomb#unit-prefixes">first-class units</a>.
If you look at the configuration of “log-demo-bandwidth”, it is defined as a compound unit expression:
“megabyte / second”.
Arbitrary
<a href="https://github.com/erikerlandson/coulomb#quantity-and-unit-expressions">unit expressions</a>
are constructable in coulomb.</p>
<p>Now let’s examine what it looks like to load these values in Scala code.
Continuing in our REPL, I will define a unit quantity parser and associate it with our configuration.
Here I am creating a parser that knows exactly the units I need to read my example config.
An application parser might include additional units and prefixes, as necessary.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">qp</span> <span class="k">=</span> <span class="nc">QuantityParser</span><span class="o">[</span><span class="kt">Second</span> <span class="kt">::</span> <span class="kt">Byte</span> <span class="kt">::</span> <span class="kt">Hour</span> <span class="kt">::</span> <span class="kt">Mega</span> <span class="kt">::</span> <span class="kt">HNil</span><span class="o">]</span>
<span class="n">qp</span><span class="k">:</span> <span class="kt">coulomb.parser.QuantityParser</span> <span class="o">=</span> <span class="n">coulomb</span><span class="o">.</span><span class="n">parser</span><span class="o">.</span><span class="nc">QuantityParser</span><span class="k">@</span><span class="mf">741f</span><span class="mi">1957</span>
<span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="n">confTS</span><span class="o">.</span><span class="n">withQuantityParser</span><span class="o">(</span><span class="n">qp</span><span class="o">)</span>
<span class="n">conf</span><span class="k">:</span> <span class="kt">coulomb.</span><span class="k">type</span><span class="kt">safeconfig.CoulombConfig</span> <span class="o">=</span> <span class="nc">CoulombConfig</span><span class="o">(</span><span class="nc">Config</span><span class="o">(</span><span class="nc">SimpleConfigObject</span><span class="o">({</span><span class="s">"log-demo-bandwidth"</span><span class="k">:</span><span class="err">"10</span> <span class="kt">megabyte</span> <span class="kt">/</span> <span class="kt">second</span><span class="err">"</span><span class="o">,</span><span class="s">"log-flush-interval"</span><span class="k">:</span><span class="err">"10</span> <span class="kt">second</span><span class="err">"</span><span class="o">,</span><span class="s">"log-retention-time"</span><span class="k">:</span><span class="err">"24</span> <span class="kt">hour</span><span class="err">"</span><span class="o">,</span><span class="s">"log-segment-size"</span><span class="k">:</span><span class="err">"1</span> <span class="kt">megabyte</span><span class="err">"</span><span class="o">})),</span><span class="n">coulomb</span><span class="o">.</span><span class="n">parser</span><span class="o">.</span><span class="nc">QuantityParser</span><span class="k">@</span><span class="mf">741f</span><span class="mi">1957</span><span class="o">)</span>
</code></pre></div></div>
<p>With our unit parsing ready to go, we are now in a position to load some values:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">logRetentionTime</span> <span class="k">=</span> <span class="n">conf</span><span class="o">.</span><span class="n">getQuantity</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Second</span><span class="o">](</span><span class="s">"log-retention-time"</span><span class="o">)</span>
<span class="n">logRetentionTime</span><span class="k">:</span> <span class="kt">scala.util.Try</span><span class="o">[</span><span class="kt">coulomb.Quantity</span><span class="o">[</span><span class="kt">Int</span>,<span class="kt">coulomb.si.Second</span><span class="o">]]</span> <span class="k">=</span> <span class="nc">Success</span><span class="o">(</span><span class="nc">Quantity</span><span class="o">(</span><span class="mi">86400</span><span class="o">))</span>
<span class="n">scala</span><span class="o">></span> <span class="n">logRetentionTime</span><span class="o">.</span><span class="n">get</span><span class="o">.</span><span class="n">showFull</span>
<span class="n">res1</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="mi">86400</span> <span class="n">second</span>
</code></pre></div></div>
<p>As the above example shows, if we load this value in our code using seconds instead of hours, then the coulomb
type system automatically does the right thing.
It gives us the correct integer value, coupled with the unit <code class="highlighter-rouge">Second</code> instead of <code class="highlighter-rouge">Hour</code>.</p>
<p>Equally importantly, if we attempt to load log retention time using an <em>incompatible</em> unit, such as bytes,
it will not allow it!
That is a type error:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">unitMistake</span> <span class="k">=</span> <span class="n">conf</span><span class="o">.</span><span class="n">getQuantity</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Byte</span><span class="o">](</span><span class="s">"log-retention-time"</span><span class="o">)</span>
<span class="n">unitMistake</span><span class="k">:</span> <span class="kt">scala.util.Try</span><span class="o">[</span><span class="kt">coulomb.Quantity</span><span class="o">[</span><span class="kt">Int</span>,<span class="kt">coulomb.info.Byte</span><span class="o">]]</span> <span class="k">=</span>
<span class="nc">Failure</span><span class="o">(</span><span class="n">scala</span><span class="o">.</span><span class="n">tools</span><span class="o">.</span><span class="n">reflect</span><span class="o">.</span><span class="nc">ToolBoxError</span><span class="k">:</span> <span class="kt">reflective</span> <span class="kt">compilation</span> <span class="kt">has</span> <span class="kt">failed:</span>
</code></pre></div></div>
<p>The coulomb type system gives us the same unit type protection with our values <em>after</em> we load them.
Imagine a system call that supported values with units, for example this hypothetical
system function that expects milliseconds, a common time unit at the system level:</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="k">def</span> <span class="n">fakeSysCall</span><span class="o">(</span><span class="n">ms</span><span class="k">:</span> <span class="kt">Quantity</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Milli</span> <span class="kt">%*</span> <span class="kt">Second</span><span class="o">])</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="n">ms</span><span class="o">.</span><span class="n">showFull</span>
<span class="n">fakeSysCall</span><span class="k">:</span> <span class="o">(</span><span class="kt">ms:</span> <span class="kt">coulomb.Quantity</span><span class="o">[</span><span class="kt">Int</span>,<span class="kt">coulomb.siprefix.Milli</span> <span class="kt">%*</span> <span class="kt">coulomb.si.Second</span><span class="o">])</span><span class="nc">String</span>
<span class="n">scala</span><span class="o">></span> <span class="n">fakeSysCall</span><span class="o">(</span><span class="n">logRetentionTime</span><span class="o">.</span><span class="n">get</span><span class="o">)</span>
<span class="n">res2</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="mi">86400000</span> <span class="n">millisecond</span>
<span class="n">scala</span><span class="o">></span> <span class="n">fakeSysCall</span><span class="o">(</span><span class="mf">60.</span><span class="n">withUnit</span><span class="o">[</span><span class="kt">Byte</span><span class="o">])</span>
<span class="o">^</span>
<span class="n">error</span><span class="k">:</span> <span class="k">type</span> <span class="kt">mismatch</span><span class="o">;</span>
<span class="n">found</span> <span class="k">:</span> <span class="kt">coulomb.Quantity</span><span class="o">[</span><span class="kt">Int</span>,<span class="kt">coulomb.info.Byte</span><span class="o">]</span>
<span class="n">required</span><span class="k">:</span> <span class="kt">coulomb.Quantity</span><span class="o">[</span><span class="kt">Int</span>,<span class="kt">coulomb.siprefix.Milli</span> <span class="kt">%*</span> <span class="kt">coulomb.si.Second</span><span class="o">]</span>
</code></pre></div></div>
<p>As with configuration loading, the type system will automatically convert units that are convertible,
but it will fail to compile an attempt to use <em>incompatible</em> units.</p>
<p>The same conversion and unit checking capabilities are supported for arbitrary unit expressions.
Here I’ll load a bandwidth using gigabits per minute, when its configured value was set using
megabytes / second.
If I try to load it using gigabits per meter, that fails, since it is not a compatible unit.</p>
<div class="language-scala highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">bandwidth</span> <span class="k">=</span> <span class="n">conf</span><span class="o">.</span><span class="n">getQuantity</span><span class="o">[</span><span class="kt">Double</span>, <span class="kt">Giga</span> <span class="kt">%*</span> <span class="kt">Bit</span> <span class="kt">%/</span> <span class="kt">Minute</span><span class="o">](</span><span class="s">"log-demo-bandwidth"</span><span class="o">).</span><span class="n">get</span>
<span class="n">bandwidth</span><span class="k">:</span> <span class="kt">coulomb.Quantity</span><span class="o">[</span><span class="kt">Double</span>,<span class="kt">coulomb.siprefix.Giga</span> <span class="kt">%*</span> <span class="kt">coulomb.info.Bit</span> <span class="kt">%/</span> <span class="kt">coulomb.time.Minute</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Quantity</span><span class="o">(</span><span class="mf">4.8</span><span class="o">)</span>
<span class="n">scala</span><span class="o">></span> <span class="n">bandwidth</span><span class="o">.</span><span class="n">showFull</span>
<span class="n">res8</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="mf">4.8</span> <span class="n">gigabit</span><span class="o">/</span><span class="n">minute</span>
<span class="n">scala</span><span class="o">></span> <span class="k">val</span> <span class="n">oopsie</span> <span class="k">=</span> <span class="n">conf</span><span class="o">.</span><span class="n">getQuantity</span><span class="o">[</span><span class="kt">Double</span>, <span class="kt">Giga</span> <span class="kt">%*</span> <span class="kt">Bit</span> <span class="kt">%/</span> <span class="kt">Meter</span><span class="o">](</span><span class="s">"log-demo-bandwidth"</span><span class="o">).</span><span class="n">get</span>
<span class="n">scala</span><span class="o">.</span><span class="n">tools</span><span class="o">.</span><span class="n">reflect</span><span class="o">.</span><span class="nc">ToolBoxError</span><span class="k">:</span> <span class="kt">reflective</span> <span class="kt">compilation</span> <span class="kt">has</span> <span class="kt">failed:</span>
<span class="kt">could</span> <span class="kt">not</span> <span class="kt">find</span> <span class="kt">implicit</span> <span class="kt">value</span> <span class="kt">for</span> <span class="kt">parameter</span> <span class="kt">uc:</span> <span class="kt">coulomb.unitops.UnitConverter</span><span class="o">[</span><span class="kt">spire.math.Rational</span>,<span class="kt">coulomb.siprefix.Mega</span> <span class="kt">%*</span> <span class="kt">coulomb.info.Byte</span> <span class="kt">%/</span> <span class="kt">coulomb.si.Second</span>,<span class="kt">spire.math.Rational</span>,<span class="kt">coulomb.siprefix.Giga</span> <span class="kt">%*</span> <span class="kt">coulomb.info.Bit</span> <span class="kt">%/</span> <span class="kt">coulomb.si.Meter</span><span class="o">]</span>
</code></pre></div></div>
<p>I hope this discussion has made a case that supporting unit analysis as a programming language type system
can make it easier and safer to configure our software systems.
Several modern programming languages in addition to Scala have advanced type systems with the potential
to support the kinds of capabilities demonstrated by coulomb, for example Haskell and Rust.
If these ideas inspire the exploration of unit analysis type systems in other communities, that would be
very exciting!</p>Anyone who has worked with software has almost certainly had the experience of tracking down a software or systems problem to discover that it was caused by an incorrectly configured parameter. Settings get misconfigured for a variety of reasons, but one recurring pattern of error is a value that was set assuming a unit that wasn’t expected.Algorithmic Unit Analysis2019-05-03T15:50:00+00:002019-05-03T15:50:00+00:00http://erikerlandson.github.io/blog/2019/05/03/algorithmic-unit-analysis<p>In many computing situations, we might like to associate a unit expression with a number, for example</p>
<script type="math/tex; mode=display">9.8 \ meter / (second ^ 2)</script>
<p>As humans, we’d also like it if our computing tools could understand that many unit expressions are <em>equivalent</em>.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
& 9.8 \ meter / (second ^ 2) \\
= \ & 9.8 \ meter / second / second \\
= \ & 9.8 \ meter \times second ^ {-2}
\end{aligned} %]]></script>
<p>Better yet, it would be very useful for our tools to know that some unit expressions are <em>convertible</em>.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
9.8 \ meter / (second ^ 2) & = 32 \ foot / second / second \\
1 \ meter ^ 3 & = 1000 \ liter
\end{aligned} %]]></script>
<p>Equally important, we wish to define the idea that units may <em>not</em> be compatible.</p>
<script type="math/tex; mode=display">% <![CDATA[
1 \ meter / second ^ 2 \text{ <incompatible> } 1 \ foot / second \\
1 \ liter \text{ <incompatible> } 1 \ meter ^ 2 %]]></script>
<p>It turns out that there is a straightforward and efficient way to define such an algorithmic unit analysis.
Start with the following definition:</p>
<script type="math/tex; mode=display">\text{For some atom 'u', } base(u) \text{ declares 'u' to be a "base unit"}</script>
<p>For example, <script type="math/tex">base(meter)</script> declares the atom <em>meter</em> to be a base unit.
Every base unit also implicitly defines an
<a href="https://en.wikipedia.org/wiki/International_System_of_Quantities"><em>abstract quantity</em></a>.
The term <script type="math/tex">base(meter)</script> defines the abstract quantity of Length;
<script type="math/tex">base(second)</script> defines an abstract quantity of Duration, and so on.</p>
<p>Unit expressions may be constructed from these atomic base units, inductively:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
& \bullet \text{The atom } \emptyset \text{ (unitless) is a unit expression. } \\
& \bullet \text{An atom } u \text{ is a unit expression whenever } base(u) \text{ is declared. } \\
& \bullet \text{An atom } d \text{ is a unit expression whenever } derived(d, e, c) \text{ is declared and } \\
& \quad e \text{ is a unit expression and } c \text{ is a numeric value > 0.} \\
& \bullet \text{For unit expressions } u, v \text { and exponent } p \text{, the following} \\
& \quad \text{are all unit expressions: } \\
& \quad \quad uv \text{ or } u \times v \\
& \quad \quad u / v \\
& \quad \quad u ^ p
\end{aligned} %]]></script>
<p>In our constructions above, we introduced a special atom <script type="math/tex">\emptyset</script> that represents a <em>unitless</em> expression where all units have canceled.
For example <script type="math/tex">meter / meter</script> is equivalent to <script type="math/tex">\emptyset</script>.
We also introduced the idea of a <em>derived unit</em>, where a named unit <em>d</em> is declared as equivalent to <script type="math/tex">c e</script>.
For example we can use a derived unit to define a <em>liter</em> as a unit of volume:
<script type="math/tex">derived(liter, meter^3, 1/1000)</script>.</p>
<p>Next, I will define the notion of the <em>canonical form</em> of a unit expression.
Intuitively, an expression’s canonical form is an equivalent representation expressed purely as a product
of a numeric value with base units raised to non-zero powers.
For example, the canonical form of <script type="math/tex">meter / (second^2)</script> is <script type="math/tex">1 \ meter^1 second^{-2}</script>,
and the canonical form of <script type="math/tex">liter / second</script> is <script type="math/tex">(1/1000) \ meter^3 second^{-1}</script>.</p>
<p>The canonical form of a unit expression <script type="math/tex">e \triangleq canonical(e)</script> is recursively defined, as follows.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
canonical(\emptyset) & = 1 \times \emptyset \\
\text{given } base(u) \text{, } canonical(u) & = 1 \times u ^ 1 \\
\text{given } derived(d, e, c) \text{, } canonical(d) & = c \times canonical(e) \\
canonical(u \times v) & = canonical(u) \times canonical(v) \\
canonical(u / v) &= canonical(u) / canonical(v) \\
canonical(u ^ p) &= (canonical(u))^p
\end{aligned} %]]></script>
<p>By convention, a canonical form of <script type="math/tex">c \times \emptyset</script> represents the unitless state where all other
powers of unit atoms have canceled out to zero, and so for example
<script type="math/tex">canonical(meter / meter) = 1 \ \emptyset</script>, and <script type="math/tex">canonical(\emptyset \times second) = 1 \ second^1</script></p>
<p>We are now in a position to define some algorithmic unit analysis!
A fundamental question of unit analysis is whether two unit expressions are <em>convertible</em>,
and if so, what is the conversion factor between them.
Using the above definition of <em>canonical forms</em>, it is straightforward to capture this idea mathematically:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
& \text{Unit expressions } u \text{ and } v \text{ are convertible if and only if } \\
& \frac{canonical(u)}{canonical(v)} = c \times \emptyset \text{, and if so then: } \\
& 1 \ u = c \ v \ \ \text{ and } \ \ 1 \ v = u / c \\
\end{aligned} %]]></script>
<p>Consider this example: Is <script type="math/tex">foot / second / second</script> convertible to <script type="math/tex">meter / (second^2)</script> ?
Allowing that we’ve declared <script type="math/tex">base(meter)</script>, <script type="math/tex">base(second)</script> and <script type="math/tex">derived(foot, meter, 0.3048)</script>, then:</p>
<script type="math/tex; mode=display">\frac{canonical(foot / second / second)}{canonical(meter / second^2)}
= \frac{0.3048 \ meter^1 second^{-2}}{1 \ meter^1 second^{-2}} = 0.3048 \ \emptyset</script>
<p>and so the answer is yes!
These unit expressions <em>are</em> convertible, and 0.3048 is the coefficient of conversion.</p>
<p>Likewise, this algorithm can conclude when units are <em>not</em> compatible:</p>
<script type="math/tex; mode=display">\frac{canonical(foot / second)}{canonical(meter / second^2)}
= \frac{0.3048 \ meter^1 second^{-1}}{1 \ meter^1 second^{-2}} = 0.3048 \ second \text{ (incompatible)}</script>
<p>This approach to the question of unit convertibility is very amenable to use in computing.
The canonical form of a unit expression is easily representable in various data structures.
For example, the canonical form <script type="math/tex">1 \ meter^1 second^{-2}</script> can be represented as the sequence
<code class="highlighter-rouge">[(meter, 1), (second, -2)]</code>.
It might also be represented as a mapping, such that <code class="highlighter-rouge">map[meter] -> 1</code> and <code class="highlighter-rouge">map[second] -> -2</code>.
The inductive definition of unit expressions is straightforward to implement in most modern
computing languages, as is the recursive definition of the <em>canonical</em> function itself.</p>
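<p>To make that concrete, here is a minimal sketch in Java (a hypothetical <code class="highlighter-rouge">Units</code> class, not part of coulomb) that represents a canonical form as a coefficient plus a map from base-unit names to powers, and uses it to test convertibility:</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the canonical-form algorithm defined above.
// A canonical form is a coefficient times a product of base units
// raised to non-zero powers, stored here as a map.
public class Units {
    public static class Canonical {
        public final double coef;
        public final Map<String, Integer> powers;
        Canonical(double coef, Map<String, Integer> powers) {
            this.coef = coef;
            this.powers = powers;
        }
    }

    // canonical(u) = 1 * u^1 for a declared base unit u
    public static Canonical base(String u) {
        Map<String, Integer> p = new HashMap<>();
        p.put(u, 1);
        return new Canonical(1.0, p);
    }

    // canonical(d) = c * canonical(e) for derived(d, e, c)
    public static Canonical derived(Canonical e, double c) {
        return new Canonical(c * e.coef, e.powers);
    }

    // canonical(u * v): multiply coefficients, add powers, drop zeros
    public static Canonical mul(Canonical u, Canonical v) {
        Map<String, Integer> p = new HashMap<>(u.powers);
        for (Map.Entry<String, Integer> e : v.powers.entrySet()) {
            int q = p.getOrDefault(e.getKey(), 0) + e.getValue();
            if (q == 0) p.remove(e.getKey());
            else p.put(e.getKey(), q);
        }
        return new Canonical(u.coef * v.coef, p);
    }

    // canonical(u^exp): raise the coefficient and all powers to exp
    public static Canonical pow(Canonical u, int exp) {
        Map<String, Integer> p = new HashMap<>();
        if (exp != 0) {
            for (Map.Entry<String, Integer> e : u.powers.entrySet())
                p.put(e.getKey(), e.getValue() * exp);
        }
        return new Canonical(Math.pow(u.coef, exp), p);
    }

    // canonical(u / v) = canonical(u) * canonical(v)^-1
    public static Canonical div(Canonical u, Canonical v) {
        return mul(u, pow(v, -1));
    }

    // If u/v reduces to c times the empty product, u and v are
    // convertible with conversion factor c; otherwise return NaN.
    public static double conversionFactor(Canonical u, Canonical v) {
        Canonical r = div(u, v);
        return r.powers.isEmpty() ? r.coef : Double.NaN;
    }
}
```

<p>With <code class="highlighter-rouge">base("meter")</code>, <code class="highlighter-rouge">base("second")</code> and a derived foot of 0.3048 meter, this sketch reproduces the worked example above, reporting 0.3048 as the conversion factor.</p>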
<p>I have been experimenting with an implementation of these algorithmic definitions for defining
unit analysis as a <a href="https://github.com/erikerlandson/coulomb">static typing system for Scala</a>.
However, by defining these concepts mathematically, my hope is that computational unit analysis
becomes easier to apply in any programming language that can support it.</p>In many computing situations, we might like to associate a unit expression with a number, for exampleThe Smooth-Max Minimum Incident of December 20182019-01-02T13:25:00+00:002019-01-02T13:25:00+00:00http://erikerlandson.github.io/blog/2019/01/02/the-smooth-max-minimum-incident-of-december-2018<p>In what is becoming an ongoing series where I climb the convex optimization learning curve by making rookie mistakes,
I tripped over yet another <a href="https://github.com/erikerlandson/gibbous/issues/1">unexpected failure</a>
in my feasible point solver while testing a couple new inequality constraints for my
<a href="https://github.com/erikerlandson/snowball">monotonic splining project</a>.</p>
<p>The symptom was that when I added <a href="https://github.com/erikerlandson/snowball/pull/1">minimum and maximum constraints</a>,
the feasible point solver began reporting failure.
These failures made no sense to me, because they were actually constraining my problem very little, if at all.
For example, if I added constraints for <code class="highlighter-rouge">s(x) > 0</code> and <code class="highlighter-rouge">s(x) &lt; 1</code>, the solver began failing,
even though my function (designed to behave as a CDF) was already meeting these constraints to within machine epsilon tolerance.</p>
<p>When I inspected its behavior, I discovered that my solver found a point <code class="highlighter-rouge">x</code> where the
<a href="http://erikerlandson.github.io/blog/2018/06/03/solving-feasible-points-with-smooth-max/">smooth-max was minimized</a>,
and reported this answer as also being the minimum possible value for the true maximum.
As it happened, this value for <code class="highlighter-rouge">x</code> was positive (non-satisfying) for the true max, even though better locations <em>did</em> exist!</p>
<p>This time, my error turned out to be that I had assumed the smooth-max function is “minimum preserving.”
That is, I had assumed that the minimum of smooth-max is the same as the corresponding minimum for the true maximum.
I cooked up a quick jupyter notebook to see if I could prove I was wrong about this, and sure enough came up with a simple
visual counter-example:</p>
<p><img src="/assets/images/smooth-max-plot.png" alt="Figure-1" /></p>
<p>In this plot, the black dotted line identifies the minimum of the true maximum:
the left intersection of the blue parabola and red line.
The green dotted line shows the minimum of soft-max, and it’s easy to see that they are completely different!</p>
<p>I haven’t yet coded up a fix for this, but my basic plan is to allow the smooth-max alpha to increase whenever it
fails to find a feasible point.
Why? Increasing alpha causes the
<a href="http://erikerlandson.github.io/blog/2018/05/28/computing-smooth-max-and-its-gradients-without-over-and-underflow/">smooth-max</a>
to more closely approximate true max.
If the soft-max approximation becomes sufficiently close to the true maximum, and no solution is found,
then I can report an empty feasible region with more confidence.</p>
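<p>To illustrate why increasing alpha should help, here is a small standalone sketch, assuming the standard log-sum-exp form of smooth-max (shifted by the true max to avoid overflow, as described in my earlier post); the approximation always bounds the true maximum from above and tightens as alpha grows:</p>

```java
// Log-sum-exp smooth-max: (1/alpha) * log(sum_i exp(alpha * x_i)).
// Factoring out the true max m keeps the exponentials from overflowing.
public class SmoothMax {
    public static double smoothMax(double[] x, double alpha) {
        double m = Double.NEGATIVE_INFINITY;
        for (double v : x) m = Math.max(m, v);
        double s = 0.0;
        for (double v : x) s += Math.exp(alpha * (v - m));
        return m + Math.log(s) / alpha;
    }
}
```

<p>For example, on <code class="highlighter-rouge">{1, 2, 3}</code> the smooth-max is about 3.41 at alpha = 1, but within 1e-6 of the true max 3 at alpha = 100.</p>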
<p>Why did I make this blunder?
I suspect it is because I originally only visualized symmetric examples in my mind,
where the minimum of smooth-max and the true maximum is the same.
Visual intuitions are only as good as your imagination!</p>In what is becoming an ongoing series where I climb the convex optimization learning curve by making rookie mistakes, I tripped over yet another unexpected failure in my feasible point solver while testing a couple new inequality constraints for my monotonic splining project.The Backtracking ULP Incident of 20182018-09-11T07:01:00+00:002018-09-11T07:01:00+00:00http://erikerlandson.github.io/blog/2018/09/11/the-backtracking-ulp-incident-of-2018<p>This week I finally started applying my new <a href="https://github.com/erikerlandson/gibbous/">convex optimization</a> library to solve for interpolating splines with <a href="https://github.com/erikerlandson/snowball">monotonic constraints</a>. Things seemed to be going well. My convex optimization was passing unit tests. My monotone splines were passing their unit tests too. I cut an initial release, and announced it to the world.</p>
<p>Because Murphy rules my world, it was barely an hour later that I was playing around with my new toys in a REPL, and when I tried splining an example data set my library call went into an infinite loop:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// It looks mostly harmless:</span>
<span class="kt">double</span><span class="o">[]</span> <span class="n">x</span> <span class="o">=</span> <span class="o">{</span> <span class="mf">1.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">,</span> <span class="mf">4.0</span><span class="o">,</span> <span class="mf">5.0</span><span class="o">,</span> <span class="mf">6.0</span><span class="o">,</span> <span class="mf">7.0</span><span class="o">,</span> <span class="mf">8.0</span><span class="o">,</span> <span class="mf">9.0</span> <span class="o">};</span>
<span class="kt">double</span><span class="o">[]</span> <span class="n">y</span> <span class="o">=</span> <span class="o">{</span> <span class="mf">0.0</span><span class="o">,</span> <span class="mf">0.15</span><span class="o">,</span> <span class="mf">0.05</span><span class="o">,</span> <span class="mf">0.3</span><span class="o">,</span> <span class="mf">0.5</span><span class="o">,</span> <span class="mf">0.7</span><span class="o">,</span> <span class="mf">0.95</span><span class="o">,</span> <span class="mf">0.98</span><span class="o">,</span> <span class="mf">1.0</span> <span class="o">};</span>
<span class="n">MonotonicSplineInterpolator</span> <span class="n">interpolator</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MonotonicSplineInterpolator</span><span class="o">();</span>
<span class="n">PolynomialSplineFunction</span> <span class="n">s</span> <span class="o">=</span> <span class="n">interpolator</span><span class="o">.</span><span class="na">interpolate</span><span class="o">(</span><span class="n">x</span><span class="o">,</span> <span class="n">y</span><span class="o">);</span>
</code></pre></div></div>
<p>In addition to being a bit embarrassing, it was also a real head-scratcher. There was nothing odd about the data I had just given it. In fact it was a small variation of a problem it had just solved a few seconds prior.</p>
<p>There was nothing to do but put my code back up on blocks and break out the print statements. I ran my problem data set and watched it spin. Fast forward a half hour or so, and I localized the problem to a bit of code that does the <a href="https://en.wikipedia.org/wiki/Backtracking_line_search">“backtracking” phase</a> of a convex optimization:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="o">(</span><span class="kt">double</span> <span class="n">t</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">;</span> <span class="n">t</span> <span class="o">>=</span> <span class="n">epsilon</span> <span class="o">;</span> <span class="n">t</span> <span class="o">*=</span> <span class="n">beta</span><span class="o">)</span> <span class="o">{</span>
<span class="n">tx</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">xDelta</span><span class="o">.</span><span class="na">mapMultiply</span><span class="o">(</span><span class="n">t</span><span class="o">));</span>
<span class="n">tv</span> <span class="o">=</span> <span class="n">convexObjective</span><span class="o">.</span><span class="na">value</span><span class="o">(</span><span class="n">tx</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">tv</span> <span class="o">==</span> <span class="n">Double</span><span class="o">.</span><span class="na">POSITIVE_INFINITY</span><span class="o">)</span> <span class="k">continue</span><span class="o">;</span>
<span class="k">if</span> <span class="o">(</span><span class="n">tv</span> <span class="o">&lt;=</span> <span class="n">v</span> <span class="o">+</span> <span class="n">t</span><span class="o">*</span><span class="n">alpha</span><span class="o">*</span><span class="n">gdd</span><span class="o">)</span> <span class="o">{</span>
<span class="n">foundStep</span> <span class="o">=</span> <span class="kc">true</span><span class="o">;</span>
<span class="k">break</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>My infinite loop was happening because my backtracking loop above was “succeeding” – that is, reporting it had found a forward step – but not actually moving forward along its vector. And the reason turned out to be that my test <code class="highlighter-rouge">tv &lt;= v + t*alpha*gdd</code> was succeeding because <code class="highlighter-rouge">v + t*alpha*gdd</code> was evaluating to just <code class="highlighter-rouge">v</code>, and I effectively had <code class="highlighter-rouge">tv == v</code>.</p>
<p>I had been bitten by one of the oldest floating-point fallacies: forgetting that <code class="highlighter-rouge">x + y</code> can equal <code class="highlighter-rouge">x</code> if <code class="highlighter-rouge">y</code> gets smaller than the Unit in the Last Place (ULP) of <code class="highlighter-rouge">x</code>.</p>
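<p>The fallacy is easy to demonstrate directly (a tiny standalone example, not from my library):</p>

```java
// When delta is smaller than half the ULP of v, the rounded
// result of v + delta is just v again.
public class UlpDemo {
    public static boolean additionVanishes(double v, double delta) {
        return (v + delta) == v;
    }
}
```

<p>At <code class="highlighter-rouge">v = 1e9</code> the ULP is about 1.2e-7, so adding 1e-8 changes nothing at all, which is exactly the situation my backtracking test wandered into.</p>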
<p>This was an especially evil bug, as it very frequently <em>doesn’t</em> manifest. My unit testing in <em>two libraries</em> failed to trigger it. I have since added the offending data set to my splining unit tests, in case the code ever regresses somehow.</p>
<p>Now that I understood my problem, it turns out that I could use this to my advantage, as an effective test for local convergence. If I can’t find a step size that reduces my local objective function by an amount measurable to floating point resolution, then I am as good as converged at this stage of the algorithm. I re-wrote my code to reflect this insight, and added some annotations so I don’t forget what I learned:</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="o">(</span><span class="kt">double</span> <span class="n">t</span> <span class="o">=</span> <span class="mf">1.0</span><span class="o">;</span> <span class="n">t</span> <span class="o">></span> <span class="mf">0.0</span><span class="o">;</span> <span class="n">t</span> <span class="o">*=</span> <span class="n">beta</span><span class="o">)</span> <span class="o">{</span>
<span class="n">tx</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">xDelta</span><span class="o">.</span><span class="na">mapMultiply</span><span class="o">(</span><span class="n">t</span><span class="o">));</span>
<span class="n">tv</span> <span class="o">=</span> <span class="n">convexObjective</span><span class="o">.</span><span class="na">value</span><span class="o">(</span><span class="n">tx</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">Double</span><span class="o">.</span><span class="na">isInfinite</span><span class="o">(</span><span class="n">tv</span><span class="o">))</span> <span class="o">{</span>
<span class="c1">// this is barrier convention for "outside the feasible domain",</span>
<span class="c1">// so try a smaller step</span>
<span class="k">continue</span><span class="o">;</span>
<span class="o">}</span>
<span class="kt">double</span> <span class="n">vtt</span> <span class="o">=</span> <span class="n">v</span> <span class="o">+</span> <span class="o">(</span><span class="n">t</span> <span class="o">*</span> <span class="n">alpha</span> <span class="o">*</span> <span class="n">gdd</span><span class="o">);</span>
<span class="k">if</span> <span class="o">(</span><span class="n">vtt</span> <span class="o">==</span> <span class="n">v</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// (t)(alpha)(gdd) is less than ULP(v)</span>
<span class="c1">// Further tests for improvement are going to fail</span>
<span class="k">break</span><span class="o">;</span>
<span class="o">}</span>
<span class="k">if</span> <span class="o">(</span><span class="n">tv</span> <span class="o">&lt;=</span> <span class="n">vtt</span><span class="o">)</span> <span class="o">{</span>
<span class="c1">// This step resulted in an improvement, so halt with success</span>
<span class="n">foundStep</span> <span class="o">=</span> <span class="kc">true</span><span class="o">;</span>
<span class="k">break</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
<p>I tend to pride myself on being aware that floating point numerics are a leaky abstraction, and the various ways these leaks can show up in computations, but pride goeth before a fall, and after all these years I can still burn myself! It never hurts to be reminded that you can never let your guard down with floating point numbers, and unit testing can never <em>guarantee</em> correctness. That goes double for numeric methods!</p>This week I finally started applying my new convex optimization library to solve for interpolating splines with monotonic constraints. Things seemed to be going well. My convex optimization was passing unit tests. My monotone splines were passing their unit tests too. I cut an initial release, and announced it to the world.Equality Constraints for Cubic B-Splines2018-09-08T14:32:00+00:002018-09-08T14:32:00+00:00http://erikerlandson.github.io/blog/2018/09/08/equality-constraints-for-cubic-b-splines<p>In my <a href="http://erikerlandson.github.io/blog/2018/09/02/putting-cubic-b-splines-into-standard-polynomial-form/">previous post</a>
I derived the standard-form polynomial coefficients for cubic B-splines.
As part of the <a href="https://github.com/erikerlandson/snowball">same project</a>,
I also need to add a feature that allows the library user to declare equality constraints of the form <nobr>(x,y)</nobr>,
where <nobr>S(x) = y</nobr>. Under the hood, I am invoking a <a href="https://github.com/erikerlandson/gibbous">convex optimization</a> library, and so I need to convert these
user inputs to a linear equation form that is consumable by the optimizer.</p>
<p>I expected this to be tricky, but it turns out I did most of the work <a href="http://erikerlandson.github.io/blog/2018/09/02/putting-cubic-b-splines-into-standard-polynomial-form/">already</a>.
I can take one of my previously-derived expressions for S(x) and put it into a form that gives me coefficients for the four contributing knot points <nobr>K<sub>j-3</sub> ... K<sub>j</sub></nobr>:</p>
<p><img src="/assets/images/bspline/ybblhxfw.png" alt="eq" /></p>
<p>Recall that by the convention from my previous post, <nobr>K<sub>j</sub></nobr> is the largest knot point that is <nobr>&lt;= x</nobr>.</p>
<p>My linear constraint equation is with respect to the vector I am solving for, in particular vector (τ), and so the
equation above yields the following:</p>
<p><img src="/assets/images/bspline/y7jhvnmk.png" alt="eq" /></p>
<p>In this form, it is easy to add into a <a href="https://github.com/erikerlandson/gibbous">convex optimization</a> problem as a linear equality constraint.</p>
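<p>As an illustration of building such a constraint row (a hypothetical sketch, not the library’s actual code, assuming the standard uniform cubic B-spline basis segments), the four coefficients for a value constraint can be computed directly from t:</p>

```java
// Hypothetical sketch: the four coefficients multiplying the control
// values tau[j-3..j] in the linear equality constraint S(x) = y,
// assuming the standard uniform cubic B-spline basis segments,
// with t = alpha * (x - K_j) in [0, 1).
public class SplineConstraint {
    public static double[] valueRow(double t) {
        double t2 = t * t;
        double t3 = t2 * t;
        return new double[] {
            (1 - t) * (1 - t) * (1 - t) / 6.0,    // N_0(t)
            (3 * t3 - 6 * t2 + 4) / 6.0,          // N_1(t)
            (-3 * t3 + 3 * t2 + 3 * t + 1) / 6.0, // N_2(t)
            t3 / 6.0                              // N_3(t)
        };
    }
}
```

<p>A useful sanity check on any such row is that the cubic B-spline basis is a partition of unity: the four coefficients sum to 1 for every t in [0, 1).</p>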
<p>Gradient constraints are another common equality constraint in convex optimization, and so I can apply very similar logic to get coefficient values corresponding to the gradient of S:</p>
<p><img src="/assets/images/bspline/yd5fxmwk.png" alt="eq" /></p>
<p>And so my linear equality constraint with respect to (τ) in this case is:</p>
<p><img src="/assets/images/bspline/yalk3puu.png" alt="eq" /></p>
<p>And that gives me the tools I need to let my users supply additional equality constraints as simple <nobr>(x,y)</nobr> pairs, and translate them into a form that can be consumed by convex optimization routines. Happy Computing!</p>In my previous post I derived the standard-form polynomial coefficients for cubic B-splines. As part of the same project, I also need to add a feature that allows the library user to declare equality constraints of the form (x,y), where S(x) = y. Under the hood, I am invoking a convex optimization library, and so I need to convert these user inputs to a linear equation form that is consumable by the optimizer.Putting Cubic B-Splines into Standard Polynomial Form2018-09-02T11:07:00+00:002018-09-02T11:07:00+00:00http://erikerlandson.github.io/blog/2018/09/02/putting-cubic-b-splines-into-standard-polynomial-form<p>Lately I have been working on an <a href="https://github.com/erikerlandson/snowball">implementation</a> of monotone smoothing splines, based on <a href="#ref1">[1]</a>. As the title suggests, this technique is based on a univariate cubic <a href="https://en.wikipedia.org/wiki/B-splines">B-spline</a>. The form of the spline function used in the paper is as follows:</p>
<p><img src="/assets/images/bspline/yd2guhxt.png" alt="eq1" /></p>
<p>The knot points <nobr>K<sub>j</sub></nobr> are all equally spaced by 1/α, and so α normalizes knot intervals to 1. The function <nobr>B<sub>3</sub>(t)</nobr> and the four <nobr>N<sub>i</sub>(t)</nobr> are defined in this transformed space, t, of unit-separated knots.</p>
<p>I’m interested in providing interpolated splines using the Apache Commons Math API, in particular the <a href="https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/analysis/polynomials/PolynomialSplineFunction.html">PolynomialSplineFunction</a> class. In principle the above is clearly such a polynomial, but there are a few hitches.</p>
<ol>
<li><code class="highlighter-rouge">PolynomialSplineFunction</code> wants its knot intervals in closed standard polynomial form <nobr>ax<sup>3</sup> + bx<sup>2</sup> + cx + d</nobr></li>
<li>It wants each such polynomial expressed in the translated space <nobr>(x-K<sub>j</sub>)</nobr>, where <nobr>K<sub>j</sub></nobr> is the greatest knot point that is <= x.</li>
<li>The actual domain of S(x) is <nobr>K<sub>0</sub> ... K<sub>m-1</sub></nobr>. The first 3 “negative” knots are there to make the summation for S(x) cleaner. <code class="highlighter-rouge">PolynomialSplineFunction</code> needs its functions to be defined purely on the actual domain.</li>
</ol>
<p>Consider the arguments to <nobr>B<sub>3</sub></nobr>, for two adjacent knots <nobr>K<sub>j-1</sub></nobr> and <nobr>K<sub>j</sub></nobr>, where <nobr>K<sub>j</sub></nobr> is the greatest knot point that is <= x. Recalling that knot points are all equally spaced by 1/α, we have the following relationship in the transformed space t:</p>
<p><img src="/assets/images/bspline/ydcb2ao3.png" alt="eq" /></p>
<p>We can apply this same manipulation to show that the arguments to <nobr>B<sub>3</sub></nobr>, as centered around knot <nobr>K<sub>j</sub></nobr>, are simply <nobr>{... t+2, t+1, t, t-1, t-2 ...}</nobr>.</p>
<p>By the definition of <nobr>B<sub>3</sub></nobr> above, you can see that <nobr>B<sub>3</sub>(t)</nobr> is non-zero only for t in <nobr>[0,4)</nobr>, and so the four corresponding knot points <nobr>K<sub>j-3</sub> ... K<sub>j</sub></nobr> contribute to its value:</p>
<p><img src="/assets/images/bspline/y9tpgfqj.png" alt="eq2" /></p>
<p>This suggests a way to manipulate the equations into a standard form. In the transformed space t, the four nonzero terms are:</p>
<p><img src="/assets/images/bspline/ya6gsrjy.png" alt="eq4" /></p>
<p>and by plugging in the appropriate <nobr>N<sub>i</sub></nobr> for each term, we arrive at:</p>
<p><img src="/assets/images/bspline/yc6grwxe.png" alt="eq5" /></p>
<p>Now, <code class="highlighter-rouge">PolynomialSplineFunction</code> is going to automatically identify the appropriate <nobr>K<sub>j</sub></nobr> and subtract it, and so I can define <em>that</em> transform as <nobr>u = x - K<sub>j</sub></nobr>, which gives:</p>
<p><img src="/assets/images/bspline/y9p3vgqt.png" alt="eq6" /></p>
<p>I substitute the argument (αu) into the definitions of the four <nobr>N<sub>i</sub></nobr> to obtain:</p>
<p><img src="/assets/images/bspline/y8apdoqy.png" alt="eq7" /></p>
<p>Lastly, collecting like terms gives me the standard-form coefficients that I need for <code class="highlighter-rouge">PolynomialSplineFunction</code>:</p>
<p><img src="/assets/images/bspline/ya74mlsf.png" alt="eq8" /></p>
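<p>A hedged sketch of coefficients of this shape, using my own expansion of the standard uniform cubic B-spline basis (which may differ from the paper's <nobr>N<sub>i</sub></nobr> by indexing conventions): expand each basis polynomial at (αu), collect like powers of u, and check the result against direct basis evaluation:</p>

```python
def segment_coeffs(tau, alpha):
    # Standard-form coefficients [d, c, b, a] for d + c*u + b*u^2 + a*u^3
    # on one knot interval, from the four active control values tau.
    # Derived by expanding the uniform cubic B-spline basis at (alpha*u)
    # and collecting like powers of u; conventions here are illustrative.
    t0, t1, t2, t3 = tau
    return [(t0 + 4*t1 + t2) / 6.0,
            alpha * (-3*t0 + 3*t2) / 6.0,
            alpha**2 * (3*t0 - 6*t1 + 3*t2) / 6.0,
            alpha**3 * (-t0 + 3*t1 - 3*t2 + t3) / 6.0]

def poly_eval(coeffs, u):
    # Horner evaluation of the standard-form polynomial.
    r = 0.0
    for c in reversed(coeffs):
        r = r * u + c
    return r
```

<p>Coefficient arrays like this, one per knot interval, are exactly what <code class="highlighter-rouge">PolynomialSplineFunction</code> expects.</p>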
<p>Now I am equipped to return a <code class="highlighter-rouge">PolynomialSplineFunction</code> to my users, which implements the cubic B-spline that I fit to their data. Happy computing!</p>
<h4 id="references">References</h4>
<p><a name="anchor1" id="ref1">[1] </a>H. Fujioka and H. Kano: <a href="https://github.com/erikerlandson/snowball/blob/master/monotone-cubic-B-splines-2013.pdf">Monotone smoothing spline curves using normalized uniform cubic B-splines</a>, Trans. Institute of Systems, Control and Information Engineers, Vol. 26, No. 11, pp. 389–397, 2013</p>Lately I have been working on an implementation of monotone smoothing splines, based on [1]. As the title suggests, this technique is based on a univariate cubic B-spline. The form of the spline function used in the paper is as follows:Solving Feasible Points With Smooth-Max2018-06-03T14:21:00+00:002018-06-03T14:21:00+00:00http://erikerlandson.github.io/blog/2018/06/03/solving-feasible-points-with-smooth-max<h3 id="overture">Overture</h3>
<p>Lately I have been fooling around with an <a href="https://github.com/erikerlandson/gibbous">implementation</a> of the <a href="#cite1">Barrier Method</a> for convex optimization with constraints.
One of the characteristics of the Barrier Method is that it requires an initial-guess from inside the
<em>feasible region</em>: that is, a point which is known to satisfy all of the inequality constraints provided
by the user.
For some optimization problems, it is straightforward to find such a point by using knowledge about the problem
domain, but in many situations it is not at all obvious how to identify such a point, or even if a
feasible point exists. The feasible region might be empty!</p>
<p>Boyd and Vandenberghe discuss a couple approaches to finding feasible points in §11.4 of <a href="#cite1">[1]</a>.
These methods require you to set up an “augmented” minimization problem:
<img src="/assets/images/feasible/y9czf8u7.png" alt="eq1" /></p>
<p>As you can see from the above, you have to set up an “augmented” space x+s, where (s) represents an additional
dimension, and the constraint functions are augmented to f<sub>k</sub>-s.</p>
<h3 id="the-problem">The Problem</h3>
<p>I experimented a little with these, and while I am confident they work for most problems having multiple
inequality constraints, my unit testing tripped over an ironic deficiency:
when I attempted to solve a feasible point for a single planar constraint, the numerics went a bit haywire.
Specifically, a linear constraint function happens to have a singular Hessian of all zeroes.
The final Hessian, coming out of the log barrier function, could be consumed by SVD to get a search direction,
but the resulting gradients behaved poorly.</p>
<p>Part of the problem seems to be that the nature of this augmented minimization problem forces the algorithms
to push (s) ever downward, letting (s) transitively push the f<sub>k</sub> down through the augmented constraint
functions f<sub>k</sub>-s. When only a single linear constraint function is in play, the resulting gradient
caused the augmented dimension (s) to converge <em>against</em> the movement of the remaining (unaugmented) sub-space.
The minimization did not converge to a feasible point, even though literally half of the space on one side
of the planar surface is feasible!</p>
<h3 id="smooth-max">Smooth Max</h3>
<p>Thinking about these issues made me wonder if a more direct approach was possible.
Another way to think about this problem is to minimize the maximum f<sub>k</sub>:
if the maximum f<sub>k</sub> is < 0 at a point x, then x is a feasible point satisfying all f<sub>k</sub>.
If the smallest-possible maximum f<sub>k</sub> is > 0, then we have definitive proof that no
feasible point exists, and our constraints can’t be satisfied.</p>
<p>Taking a maximum preserves convexity, which is a good start, but maximum isn’t differentiable everywhere.
The boundaries between regions where different functions are the maximum are not smooth, and along
those boundaries there is no gradient, and therefore no Hessian either.</p>
<p>However, there is a variation on this idea, known as smooth-max, defined like so:</p>
<p><img src="/assets/images/feasible/y8cgykuc.png" alt="eq2" /></p>
<p>Smooth-max has a well defined <a href="http://erikerlandson.github.io/blog/2018/05/27/the-gradient-and-hessian-of-the-smooth-max-over-functions/">gradient and Hessian</a>, and furthermore can be computed in a <a href="http://erikerlandson.github.io/blog/2018/05/28/computing-smooth-max-and-its-gradients-without-over-and-underflow/">numerically stable</a> way.
The sum inside the logarithm above is a sum of exponentials of convex functions.
This is good news; exponentials of convex functions are log-convex, and a sum of log-convex functions is also
log-convex.</p>
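<p>A minimal sketch of the definition (over plain values rather than functions, for brevity): smooth-max always upper-bounds the true maximum, and the gap shrinks as α grows:</p>

```python
import math

def smooth_max(vals, alpha):
    # Naive smooth-max: (1/alpha) * log(sum_k exp(alpha * v_k)).
    # Fine for small values; a later post covers the overflow-safe form.
    return math.log(sum(math.exp(alpha * v) for v in vals)) / alpha
```

<p>For example, <code class="highlighter-rouge">smooth_max([1, 2, 3], 10)</code> is just slightly above 3, and increasing α tightens the approximation further.</p>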
<p>That means I have the necessary tools to set up my mini-max problem:
for a given set of convex constraint functions f<sub>k</sub>, I create a function which is the smooth-max of
these, and I minimize it.</p>
<h3 id="go-directly-to-jail">Go Directly to Jail</h3>
<p>I set about implementing my smooth-max idea, and immediately ran into almost the same problem as before.
If I try to solve for a single planar constraint, my Hessian degenerates to all-zeros!
When I unpacked the smooth-max formula for a single constraint f<sub>k</sub>, it is indeed just f<sub>k</sub>,
zero Hessian and all!</p>
<h3 id="more-is-more">More is More</h3>
<p>What to do?
Well you know what form of constraint <em>always</em> has a well behaved Hessian? A circle, that’s what.
More technically, an n-dimensional ball, or n-ball.
What if I add a new constraint of the form:</p>
<p><img src="/assets/images/feasible/yd8xg64k.png" alt="eq3" /></p>
<p>This constraint equation is quadratic, and its Hessian is I<sub>n</sub>.
If I include this in my set of constraints, my smooth-max Hessian will be non-singular!</p>
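<p>The image above carries the exact formula; as a hedged stand-in, one form of an n-ball constraint whose Hessian is exactly I<sub>n</sub> is <nobr>f(x) = ½‖x−c‖² − ½r²</nobr> (feasible where f ≤ 0), and the particular ½ scaling is my assumption:</p>

```python
def ball_constraint(x, center, r):
    # n-ball inequality constraint: <= 0 inside the ball of radius r
    # centered at `center`. The 1/2 scaling makes the gradient (x - c)
    # and the Hessian exactly the identity matrix.
    return 0.5 * sum((xi - ci)**2 for xi, ci in zip(x, center)) - 0.5 * r * r
```

<p>Because the function is quadratic, a finite-difference check recovers a second derivative of exactly 1 along every coordinate, confirming the identity Hessian.</p>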
<p>Since I do not know a priori where my feasible point might lie, I start with my n-ball centered at
my initial guess, and minimize. The result might look something like this:</p>
<p><img src="/assets/images/feasible/fig1.png" alt="fig1" /></p>
<p>Because the optimization is minimizing the maximum f<sub>k</sub>, the optimal point may not be feasible,
but if not it <em>will</em> end up closer to the feasible region than before.
This suggests an iterative algorithm, where I update the location of the n-ball at each iteration,
until the resulting optimized point lies on the intersection of my original constraints and my
additional n-ball constraint:</p>
<p><img src="/assets/images/feasible/fig2.png" alt="fig2" /></p>
<h3 id="caught-in-the-underflow">Caught in the Underflow</h3>
<p>I implemented the iterative algorithm above (you can see what this loop looks like <a href="https://github.com/erikerlandson/gibbous/blob/blog/feasible-points/src/main/java/com/manyangled/gibbous/optim/convex/ConvexOptimizer.java#L134">here</a>),
and it worked exactly as I hoped…
at least on my initial tests.
However, eventually I started playing with its convergence behavior by moving my constraint region farther
from the initial guess, to see how it would cope.
Suddenly the algorithm began failing again.
When I drilled down on why, I was taken aback to discover that my Hessian matrix was once again showing
up as all zeros!</p>
<p>The reason was interesting.
Recall that I used a <a href="http://erikerlandson.github.io/blog/2018/05/28/computing-smooth-max-and-its-gradients-without-over-and-underflow/">modified formula</a> to stabilize my smooth-max computations.
In particular, the “stabilized” formula for the Hessian looks like this:</p>
<p><img src="/assets/images/smoothmax/eq3b.png" alt="eq4" /></p>
<p>So, what was going on?
As I started moving my feasible region farther away, the corresponding constraint function started to
dominate the exponential terms in the equation above.
In other words, the distance to the feasible region became the (z) in these equations, and
this z value was large enough to drive the terms corresponding to my n-ball constraint to zero!</p>
<p>However, I have a lever to mitigate this problem.
If I make the α parameter <em>small</em> enough, it will compress these exponent ranges and prevent my
n-ball Hessian terms from washing out.
Decreasing α makes smooth-max more rounded-out, and decreases the sharpness of the approximation to the true max,
but minimizing smooth-max still yields the same minimum <em>location</em> as true maximum, and so playing this
trick does not undermine my results.</p>
<p>How small is small enough?
α is essentially a free parameter, but I found that if I set it at each iteration
so that my n-ball Hessian coefficient never drops below 1e-3 (but may be larger),
then my Hessian is always well behaved.
Note that as my iterations grow closer to the true feasible region, I can gradually allow α to
grow larger.
Currently, I don’t increase α larger than 1, to avoid creating curvatures too large, but I have not
experimented deeply with what actually happens if it were allowed to grow larger.
You can see what this looks like in my current implementation <a href="https://github.com/erikerlandson/gibbous/blob/blog/feasible-points/src/main/java/com/manyangled/gibbous/optim/convex/ConvexOptimizer.java#L153">here</a>.</p>
<h3 id="convergence">Convergence</h3>
<p>Tuning the smooth-max α parameter gave me numeric stability, but I noticed that as the feasible region
grew more distant from my initial guess, the algorithm’s time to converge grew larger fairly quickly.
When I studied its behavior, I saw that at large distances, the quadratic “cost” of my n-ball constraint
effectively pulled the optimal point fairly close to my n-ball center.
This doesn’t prevent the algorithm from finding a solution, but it does prevent it from going long distances
very fast.
To solve this adaptively, I added a scaling factor s to my n-ball constraint function.
The scaled version of the function looks like:</p>
<p><img src="/assets/images/feasible/y9gndl2f.png" alt="eq5" /></p>
<p>In my case, when my distances to a feasible region grow large, I want s to become small, so that it
causes the cost of the n-ball constraint to grow more slowly, and allow the optimization to move
farther, faster.
The following diagram illustrates this intuition:</p>
<p><img src="/assets/images/feasible/fig3.png" alt="fig3" /></p>
<p>In my algorithm, I set s = 1/σ, where σ represents the
“scale” of the current distance to the feasible region.
The n-ball function grows as the square of the distance to the ball center; therefore I
set σ = (k)sqrt(c<sub>max</sub>), where c<sub>max</sub> is the current largest user constraint
cost, so that σ grows proportionally to the square root of that cost.
Here, (k) is a proportionality constant.
It too is a somewhat magic free parameter, but I have found that k=1.5 yields fast convergences and
good results.
One last trick I play is that I prevent σ from becoming less than a minimum value, currently 10.
This ensures that my n-ball constraint never dominates the total constraint sum, even as the
optimization converges close to the feasible region.
I want my “true” user constraints to dominate the behavior near the optimum, since those are the
constraints that matter.
The code is shorter than the explanation: you can see it <a href="https://github.com/erikerlandson/gibbous/blob/blog/feasible-points/src/main/java/com/manyangled/gibbous/optim/convex/ConvexOptimizer.java#L143">here</a>.</p>
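<p>Pulling the last two ideas together, a hedged sketch of the adaptive scale update (the constants 1.5 and 10 echo the values quoted above; the function name and exact structure are my own, not copied from the linked Java source):</p>

```python
import math

def update_ball_scale(max_cost, k=1.5, sigma_min=10.0):
    # sigma tracks the "scale" of the current distance to feasibility:
    # it grows like the square root of the largest user constraint cost,
    # but is floored at sigma_min so the n-ball term never dominates
    # once the iteration is close to the feasible region.
    sigma = max(k * math.sqrt(max(max_cost, 0.0)), sigma_min)
    return 1.0 / sigma   # s, the n-ball scaling factor
```

<p>Far from the feasible region the returned s is small, flattening the n-ball cost so the optimizer can take long steps; near the region s saturates at 1/σ<sub>min</sub> and the true user constraints dominate.</p>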
<h3 id="conclusion">Conclusion</h3>
<p>After applying all these intuitions, the resulting algorithm appears to be numerically stable and also
converges pretty quickly even when the initial guess is very far from the true feasible region.
To review, you can look at the main loop of this algorithm starting <a href="https://github.com/erikerlandson/gibbous/blob/blog/feasible-points/src/main/java/com/manyangled/gibbous/optim/convex/ConvexOptimizer.java#L128">here</a>.</p>
<p>I’ve learned a lot about convex optimization and feasible point solving from working through practical
problems as I made mistakes and fixed them.
I’m fairly new to the whole arena of convex optimization, and I expect I’ll learn a lot more as I go.
Happy Computing!</p>
<h3 id="references">References</h3>
<p><a name="cite1"></a>
[1] §11.3 of <em>Convex Optimization</em>, Boyd and Vandenberghe, Cambridge University Press, 2008</p>Overture Lately I have been fooling around with an implementation of the Barrier Method for convex optimization with constraints. One of the characteristics of the Barrier Method is that it requires an initial-guess from inside the feasible region: that is, a point which is known to satisfy all of the inequality constraints provided by the user. For some optimization problems, it is straightforward to find such a point by using knowledge about the problem domain, but in many situations it is not at all obvious how to identify such a point, or even if a feasible point exists. The feasible region might be empty!Computing Smooth Max and its Gradients Without Over- and Underflow2018-05-28T08:13:00+00:002018-05-28T08:13:00+00:00http://erikerlandson.github.io/blog/2018/05/28/computing-smooth-max-and-its-gradients-without-over-and-underflow<p>In my <a href="http://erikerlandson.github.io/blog/2018/05/27/the-gradient-and-hessian-of-the-smooth-max-over-functions/">previous post</a> I derived the gradient and Hessian for the smooth max function.
The <a href="https://www.johndcook.com/blog/">Notorious JDC</a> wrote a helpful companion post that describes <a href="https://www.johndcook.com/blog/2010/01/20/how-to-compute-the-soft-maximum/">computational issues</a> of overflow and underflow with smooth max;
values of f<sub>k</sub> don’t have to grow very large (or small) before floating point limitations start to force their exponentials to +inf or zero.
In JDC’s post he discusses this topic in terms of a two-valued smooth max.
However it isn’t hard to generalize the idea to a collection of f<sub>k</sub>.
Start by taking the maximum value over our collection of functions, which I’ll define as (z):</p>
<p><img src="/assets/images/smoothmax/eq1b.png" alt="eq1" /></p>
<p>As JDC described in his post, this alternative expression for smooth max (m) is computationally stable.
Individual exponential terms may underflow to zero, but they are the ones which are dominated by the other terms, and so approximating them by zero is numerically accurate.
In the limit where one value dominates all others, it will be exactly the value given by (z).</p>
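<p>A minimal sketch of the stabilized computation, assuming the same α-scaled definition as above: factoring out z = max makes every exponent non-positive, so nothing can overflow:</p>

```python
import math

def smooth_max_stable(vals, alpha):
    # Overflow-safe smooth-max:
    #   m = z + (1/alpha) * log(sum_k exp(alpha * (v_k - z)))
    # with z = max(vals). Every exponent is <= 0, so exp cannot
    # overflow; any term that underflows to 0 is a dominated one.
    z = max(vals)
    return z + math.log(sum(math.exp(alpha * (v - z)) for v in vals)) / alpha
```

<p>A naive implementation would need <code class="highlighter-rouge">exp(1000)</code> for inputs near 1000 and overflow immediately; the stabilized form only ever exponentiates non-positive values.</p>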
<p>It turns out that we can play a similar trick with computing the gradient:</p>
<p><img src="/assets/images/smoothmax/eq2b.png" alt="eq2" /></p>
<p>Without showing the derivation, we can apply exactly the same manipulation to the terms of the Hessian:</p>
<p><img src="/assets/images/smoothmax/eq3b.png" alt="eq3" /></p>
<p>And so we now have a computationally stable form of the equations for smooth max, its gradient and its Hessian. Enjoy!</p>In my previous post I derived the gradient and Hessian for the smooth max function. The Notorious JDC wrote a helpful companion post that describes computational issues of overflow and underflow with smooth max; values of fk don’t have to grow very large (or small) before floating point limitations start to force their exponentials to +inf or zero. In JDC’s post he discusses this topic in terms of a two-valued smooth max. However it isn’t hard to generalize the idea to a collection of fk. Start by taking the maximum value over our collection of functions, which I’ll define as (z):The Gradient and Hessian of the Smooth Max Over Functions2018-05-27T09:36:00+00:002018-05-27T09:36:00+00:00http://erikerlandson.github.io/blog/2018/05/27/the-gradient-and-hessian-of-the-smooth-max-over-functions<p>Suppose you have a set of functions over a vector space, and you are interested in taking the smooth-maximum over those functions.
For example, maybe you are doing gradient descent, or convex optimization, etc, and you need a variant on “maximum” that has a defined gradient.
The smooth maximum function has both a defined gradient and Hessian, and in this post I derive them.</p>
<p>I am using the <a href="https://www.johndcook.com/blog/2010/01/13/soft-maximum/">logarithm-based</a> definition of smooth-max, shown here:</p>
<p><img src="/assets/images/smoothmax/eq1.png" alt="eq1" /></p>
<p>I will use the second variation above, ignoring function arguments, with the hope of increasing clarity.
Applying the chain rule gives the ith partial gradient of smooth-max:</p>
<p><img src="/assets/images/smoothmax/eq2.png" alt="eq2" /></p>
<p>Now that we have an ith partial gradient, we can take the jth partial gradient of <em>that</em> to obtain the (i,j)th element of a Hessian:</p>
<p><img src="/assets/images/smoothmax/eq3.png" alt="eq3" /></p>
<p>This last re-grouping of terms allows us to see that we can express the full gradient and Hessian in the following more compact way:</p>
<p><img src="/assets/images/smoothmax/eq4.png" alt="eq4" /></p>
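<p>Expressed in code, the compact form says the gradient is a softmax-weighted average of the individual gradients. This sketch also applies the max-subtraction stabilization described in the follow-up post:</p>

```python
import math

def smooth_max_grad(fvals, grads, alpha):
    # Gradient of smooth-max: sum_k w_k * grad(f_k), where the weights
    # w_k = exp(alpha*f_k) / sum_j exp(alpha*f_j) form a softmax.
    # Subtracting z = max(fvals) from every exponent leaves the
    # weights unchanged but keeps the exponentials from overflowing.
    z = max(fvals)
    ws = [math.exp(alpha * (f - z)) for f in fvals]
    s = sum(ws)
    dim = len(grads[0])
    return [sum(w * g[i] for w, g in zip(ws, grads)) / s
            for i in range(dim)]
```

<p>When one f<sub>k</sub> dominates, its weight approaches 1 and the smooth-max gradient approaches that function's gradient, matching the intuition that smooth-max behaves like the true max away from the boundaries.</p>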
<p>With a gradient and Hessian, we now have the tools we need to use smooth-max in algorithms such as gradient descent and convex optimization. Happy computing!</p>Suppose you have a set of functions over a vector space, and you are interested in taking the smooth-maximum over those functions. For example, maybe you are doing gradient descent, or convex optimization, etc, and you need a variant on “maximum” that has a defined gradient. The smooth maximum function has both a defined gradient and Hessian, and in this post I derive them.