Skip to main content

· 12 min read
Ozzie Gooen

I’ve spent a fair bit of time over the last several years iterating on a text-based probability distribution editor (the 5 to 10 input editor in Guesstimate and Foretold). Recently I’ve added some programming language functionality to it, and have decided to refocus it as a domain-specific language.

The language is currently called Squiggle. Squiggle is made for expressing distributions and functions that return distributions. I hope that it can be used one day for submitting complex predictions on Foretold and other platforms.

Right now Squiggle is very much a research endeavor. I’m making significant sacrifices for stability and deployment in order to test out exciting possible features. If it were being developed in a tech company, it would be in the “research” or “labs” division.

You can mess with the current version of Squiggle here . Consider it in pre-alpha stage. If you do try it out, please do contact me with questions and concerns. It is still fairly buggy and undocumented.

I expect to spend a lot of time on Squiggle in the next several months or years. I’m curious to get feedback from the community. In the short term I’d like to get high-level feedback, in the longer term I’d appreciate user testing. If you have thoughts or would care to just have a call and chat, please reach out! We ( The Quantified Uncertainty Research Institute ) have some funding now, so I’m also interested in contractors or hires if someone is a really great fit.

Squiggle was previously introduced in a short talk that was transcribed here , and Nuño Sempere wrote a post about using it here .

Note: the code for this has developed since my time on Guesstimate. With Guesstimate, I had one cofounder, Matthew McDermott. During the last two years, I’ve had a lot of help from a handful of programmers and enthusiasts. Many thanks to Sebastian Kosch and Nuño Sempere, who both contributed. I’ll refer to this vague collective as “we” throughout this post.


Video Demo

A Quick Tour

The syntax is forked from Guesstimate and Foretold.

A simple normal distribution

normal(5,2)

You may notice that unlike Guesstimate, the distribution is nearly perfectly smooth. It’s this way because it doesn’t use sampling for (many) functions where it doesn’t need to.

Lognormal shorthand

5 to 10

This results in a lognormal distribution with 5 to 10 being the 5th and 95th confidence intervals respectively. You can also write lognormal distributions as: ### lognormal(1,2) or ### lognormal({mean: 3, stdev: 8}) .

Mix distributions with the multimodal function

multimodal(normal(5,2), uniform(14,19), [.2, .8])

You can also use the shorthand mm(), and add an array at the end to represent the weights of each combined distribution. Note: Right now, in the demo, I believe “multimodal” is broken, but you can use “mm”.

Mix distributions with discrete data
Note: This is particularly buggy. .

multimodal(0, 10, normal(4,5), [.4,.1, .5])

Variables

expected_case = normal(5,2)
long_tail = 3 to 1000
multimodal(expected_case, long_tail, [.2,.8])

Simple calculations
When calculations are done on two distributions, and there is no trivial symbolic solution the system will use Monte Carlo sampling for these select combinations. This assumes they are perfectly independent.

multimodal(normal(5,2) + uniform(10,3), (5 to 10) + 10) * 100

Pointwise calculations
We have an infix for what can be described as pointwise distribution calculations. Calculations are done along the y-axis instead of the x-axis, so to speak. “Pointwise” multiplication is equivalent to an independent Bayesian update. After each calculation, the distributions are renormalized.

normal(10,4) .* normal(14,3)

First-Class Functions
When a function is written, we can display a plot of that function for many values of a single variable. The below plots treat the single variable input on the x-axis, and show various percentiles going from the median outwards.

myFunction(t) = normal(t,10)
myFunction

myFunction(t) = normal(t^3,t^3.1)
myFunction

Reasons to Focus on Functions

Up until recently, Squiggle didn’t have function support. Going forward this will be the primary feature.

Functions are useful for two distinct purposes. First, they allow composition of models. Second, they can be used directly to be submitted as predictions. For instance, in theory you could predict, “For any point in time T, and company N, from now until 2050, this function will predict the market cap of the company.”

At this point I’m convinced of a few things:

  • It’s possible to intuitively write distributions and functions that return distributions, with the right tooling.
  • Functions that return distributions are highly preferable to specific distributions, if possible.
  • It would also be great if existing forecasting models could be distilled into common formats.
  • There’s very little activity in this space now.
  • There’s a high amount of value of information to further exploring the space.
  • Writing a small DSL like this will be a fair bit of work, but can be feasible if the functionality is kept limited.
  • Also, there are several other useful aspects about having a simple language equivalent for Guesstimate style models.

I think that this is a highly neglected area and I’m surprised it hasn’t been explored more. It’s possible that doing a good job is too challenging for a small team, but I think it’s worth investigating further.

What Squiggle is Meant For

The first main purpose of Squiggle is to help facilitate the creation of judgementally estimated distributions and functions.

Existing solutions assume the use of either data analysis and models, or judgemental estimation for points, but not judgemental estimation to intuit models. Squiggle is meant to allow people to estimate functions in situations where there is very little data available, and it’s assumed all or most variables will be intuitively estimated.

A second possible use case is to embed the results of computational models. Functions in Squiggle are rather portable and composable. Squiggle (or better future tools) could help make the results of these models interoperable.

One thing that Squiggle is not meant for is heavy calculation. It’s not a probabilistic programming language, because it doesn’t specialize in inference. Squiggle is a high-level language and is not great for performance optimization. The idea is that if you need to do heavy computational modeling, you’d do so using separate tools, then convert the results to lookup tables or other simple functions that you could express in Squiggle.

One analogy is to think about the online estimation “calculators” and “model explorers”. See the microCOVID Project calculator and the COVID-19 Predictions . In both of these, I assume there was some data analysis and processing stage done on the local machines of the analysts. The results were translated into some processed format (like a set of CSV files), and then custom code was written for a front end to analyze and display that data.

If they were to use a hypothetical front end unified format, this would mean converting their results into a Javascript function that could be called using a standardized interface. This standardization would make it easier for these calculators to be called by third party wigets and UIs, or for them to be downloaded and called from other workflows. The priority here is that the calculators could be run quickly and that the necessary code and data is minimized in size. Heavy calculation and analysis would still happen separately.

Future “Comprehensive” Uses

On the more comprehensive end, it would be interesting to figure out how individuals or collectives could make large clusters of these functions, where many functions call other functions, and continuous data is pulled in. The latter would probably require some server/database setup that ingests Squiggle files.

I think the comprehensive end is significantly more exciting than simpler use cases but also significantly more challenging. It’s equivalent to going from Docker the core technology, to Docker hub, then making an attempt at Kubernetes. Here we barely have a prototype of the proverbial Docker, so there’s a lot of work to do.

Why doesn’t this exist already?

I will briefly pause here to flag that I believe the comprehensive end seems fairly obvious as a goal and I’m quite surprised it hasn’t really been attempted yet, from what I can tell. I imagine such work could be useful to many important actors, conditional on them understanding how to use it.

My best guess is this is due to some mix between:

  • It’s too technical for many people to be comfortable with.
  • There’s a fair amount of work to be done, and it’s difficult to monetize quickly.
  • There’s been an odd, long-standing cultural bias against clearly intuitive estimates.
  • The work is substantially harder than I realize.

Related Tools

Guesstimate
I previously made Guesstimate and take a lot of inspiration from it. Squiggle will be a language that uses pure text, not a spreadsheet. Perhaps Squiggle could one day be made available within Guesstimate cells.

Ergo
Ought has a Python library called Ergo with a lot of tooling for judgemental forecasting. It’s written in Python so works well with the Python ecosystem. My impression is that it’s made much more to do calculations of specific distributions than to represent functions. Maybe Ergo results could eventually be embedded into Squiggle functions.

Elicit
Elicit is also made by Ought . It does a few things, I recommend just checking it out. Perhaps Squiggle could one day be an option in Elicit as a forecasting format.

Causal
Causal is a startup that makes it simple to represent distributions over time. It seems fairly optimized for clever businesses. I imagine it probably is going to be the most polished and easy to use tool in its targeted use cases for quite a while. Causal has an innovative UI with HTML blocks for the different distributions; it’s not either a spreadsheet-like Guesstimate or a programming language, but something in between.

Spreadsheets
Spreadsheets are really good at organizing large tables of parameters for complex estimations. Regular text files aren’t. I could imagine ways Squiggle could have native support for something like Markdown Tables that get converted into small editable spreadsheets when being edited. Another solution would be to allow the use of JSON or TOML in the language, and auto-translate that into easier tools like tables in editors that allow for them.[2]

Probabilistic Programming Languages
There are a bunch of powerful Probabilistic Programming Languages out there. These typically specialize in doing inference on specific data sets. Hopefully, they could be complementary to Squiggle in the long term. As said earlier, Probabilistic Programming Languages are great for computationally intense operations, and Squiggle is not.

Prediction Markets and Prediction Tournaments
Most of these tools have fairly simple inputs or forecasting types. If Squiggle becomes polished, I plan to encourage its use for these platforms. I would like to see Squiggle as an open-source, standardized language, but it will be a while (if ever) for it to be stable enough.

Declarative Programming Languages
Many declarative programming languages seem relevant. There are several logical or ontological languages, but my impression is that most assume certainty, which seems vastly suboptimal. I think that there’s a lot of exploration for languages that allow users to basically state all of their beliefs probabilistically, including statements about the relationships between these beliefs. The purpose wouldn’t be to find one specific variable (as often true with probabilistic programming languages), but to more to express one’s beliefs to those interested, or do various kinds of resulting analyses.

Knowledge Graphs
Knowledge graphs seem like the best tool for describing semantic relationships in ways that anyone outside a small group could understand. I tried making my own small knowledge graph library called Ken , which we’ve been using a little in Foretold . If Squiggle winds up achieving the comprehensive vision mentioned, I imagine there will be a knowledge graph somewhere.

For example, someone could write a function that takes in a “standard location schema” and returns a calculation of the number of piano tuners at that location. Later when someone queries Wikipedia for a town, it will recognize that that town has data on Wikidata , which can be easily converted into the necessary schema.

Next Steps

Right now I’m the only active developer of Squiggle. My work is split between Squiggle, writing blog posts and content, and other administrative and organizational duties for QURI.

My first plan is to add some documentation, clean up the internals, and begin writing short programs for personal and group use. If things go well and we could find a good developer to hire, I would be excited to see what we could do after a year or two.

Ambitious versions of Squiggle would be a lot of work (as in, 50 to 5000+ engineer years work), so I want to take things one step at a time. I would hope that if progress is sufficiently exciting, it would be possible to either raise sufficient funding or encourage other startups and companies to attempt their own similar solutions.

Footnotes

[1] The main challenge comes from having a language that represents symbolic mathematics and programming statements. Both of these independently seem challenging, and I have yet to find a great way to combine them. If you read this and have suggestions for learning about making mathematical languages (like Wolfram), please do let me know.

[2] I have a distaste for JSON in cases that are primarily written and read by users. JSON was really optimized for simplicity for programming, not people. My guess is that it was a mistake to have so many modern configuration systems be in JSON instead of TOML or similar.

· 14 min read
Ozzie Gooen

This piece is meant to be read after Squiggle: An Overview . It includes technical information I thought best separated out for readers familiar with coding. As such, it’s a bit of a grab-bag. It explains the basic internals of Squiggle, outlines ways it could be used in other programming languages, and details some of the history behind it.

The Squiggle codebase is organized in this github repo . It’s open source. The code is quite messy now, but do ping me if you’re interested in running it or understanding it.

Project Subcomponents

I think of Squiggle in three distinct clusters.

  1. A high-level ReasonML library for probability distributions.
  2. A simple programming language.
  3. Custom visualizations and GUIs.

1. A high-level ReasonML library for probability distribution functions

Python has some great libraries for working with probabilities and symbolic mathematics. Javascript doesn’t. Squiggle is to be run in Javascript (for interactive editing and use), so the first step for this is to have good libraries to do the basic math.

The second step is to have-level types that could express various types of distributions and functions of distributions. For example, some distributions have symbolic representations, and others are rendered (stored as x-y coordinates). These two types have to be dealt with separately. Squiggle also has limited support for continuous and discrete mixtures, and the math for this adds more complexity.

When it comes to performing functions on expressions, there’s a lot of optimization necessary for this to go smoothly. Say you were to write the function,

multimodal(normal(5,2), normal(10,1) + uniform(1,10)) * 100

You’d want to apply a combination of symbolic, numeric, and sampling techniques in order to render this equation. In this case, Squiggle would perform sampling to compute the distribution of normal(10,1) + uniform(1,10) and then it would use numeric methods for the rest of the equation. In the future, it would be neat if Squiggle would also first symbolically modify the internal distributions to be multiplied by 100, rather than performing it as a separate numeric step.

This type-dependent function operations can be confusing to users, but hopefully less confusing than having to figure out how to do each of the three and doing them separately. I imagine there could be some debugging UI to better explain what operations are performed.

2. Simple programming language functionality

It can be useful to think of Squiggle as similar to SQL, Excel, or Probabilistic Programming Languages like WebPPL . There are simple ways to declare variables and write functions, but don’t expect to use classes, inheritance, or monads. There’s no for loops, though it will probably have some kinds of reduce() methods in the future.

So far the parsing is done with MathJS, meaning we can’t change the syntax. I’m looking forward to doing so and have been thinking about what it should be like. One idea I’m aiming for is to allow for simple dependent typing for the sake of expressing limited functions. For instance,

myFunction(t: [float from 0 to 30]) = normal(t,10)
myFunction

This function would return an error if called with a float less than 0 or greater than 30. I imagine that many prediction functions would only be estimated for limited domains.

With some introspection it should be possible to auto-generate calculator-like interfaces.

3. Visualizations and GUIs

The main visualizations need to be made from scratch because there’s little out there now in terms of quality open-source visualizations of probability distributions and similar. This is especially true for continuous and discrete mixtures. D3 seems like the main library here, and D3 can be gnarly to write and maintain.

Right now we’re using a basic Vega chart for the distribution over a variable, but this will be replaced later.

In the near term, I’m interested in making calculator-like user interfaces of various kinds. I imagine one prediction function could be used for many interfaces of calculators.

Deployment Story, or, Why Javascript?

Squiggle is written in ReasonML which compiles to Javascript. The obvious alternative is Python. Lesser obvious but interesting options are Mathematica or Rust via WebAssembly.

The plan for Squiggle is to prioritize small programs that could be embedded in other programs and run quickly. Perhaps there will be 30 submissions for a “Covid-19 over time per location” calculator, and we’d want to run them in parallel in order to find the average answer or to rank them. I could imagine many situations where it would be useful to run these functions for many different inputs; for example, for kinds of sensitivity analyses.

One nice-to-have feature would be functions that call other functions. Perhaps a model of your future income levels depends on some other aggregated function of the S&P 500, which further depends on models of potential tail risks to the economy. If this were the case you would want to have those model dependencies be easily accessible. This could be done via downloading or having a cloud API to quickly call them remotely.

Challenges like these require some programmatic architecture where functions can be fully isolated/sandboxed and downloaded and run on the fly. There are very few web application infrastructures aimed to do things like this, I assume in part because of the apparent difficulty.

Python is open source and has the most open-source tooling for probabilistic work. Ought’s Ergo is in Python, and their Elicit uses Ergo (I believe). Pyro and Edward , two of the most recent and advanced probabilistic programming languages, are accessible in Python. Generally, Python is the obvious choice.

Unfortunately, the current tooling to run small embedded Python programs, particularly in the browser, is quite mediocre. There are a few attempts to bring Python directly to the browser, like Pyrodide , but these are quite early and relatively poorly supported. If you want to run a bunch of Python jobs on demand, you could use Serverless platforms like AWS Lambda or something more specialized like PythonAnywhere . Even these are relatively young and raise challenges around speed, cost, and complexity.

I’ve looked a fair bit into various solutions. I think that for at least the next 5 to 15 years, the Python solutions will be challenging to run as conveniently as Javascript solutions would. For this time it’s expected that Python will have to run in separate servers, and this raises issues of speed, cost, and complexity.

At Guesstimate , we experimented with solutions that had sampling running on a server and found this to hurt the experience. We tested latency of around 40ms to 200ms. Being able to see the results of calculations as you type is a big deal and server computation prevented this. It’s possible that newer services with global/local server infrastructures could help here (as opposed to setups with only 10 servers spread around globally), but it would be tricky. Fly.io launched in the last year, maybe that would be a decent fit for near-user computation.

Basically, at this point, it seems important that Squiggle programs could be easily imported and embedded in the browser and servers, and for this, Javascript currently seems like the best bet. Javascript currently has poor support for probability, but writing our own probability libraries is more feasible than making Python portable. All of the options seem fairly mediocre, but Javascript a bit less so.

Javascript obviously runs well in the browser, but its versatility is greater than that. Observable and other in-browser Javascript coding platforms load in NPM libraries on the fly to run directly in the browser, which demonstrates that such functionality is possible. It’s possible (though I imagine a bit rough) to call Javascript programs from Python.

ReasonML compiles to OCaml before it compiles to Javascript. I’ve found it convenient for writing complicated code and now am hesitant to go back to a dynamic, non-functional language. There’s definitely a whole lot to do (the existing Javascript support for math is very limited), but at least there are decent approaches to doing it.

I imagine the landscape will change a lot in the next 3 to 10 years. I’m going to continue to keep an eye on the space. If things change I could very much imagine pursuing a rewrite, but I think it will be a while before any change seems obvious.

Using Squiggle with other languages

Once the basics of Squiggle are set up, it could be used to describe the results of models that come from other programs. Similar to how many programming languages have ORMs to generate custom SQL statements, similar tools could be made to generate Squiggle functions. The important thing to grok is that Squiggle functions are submitted information, not just internally useful tools. If there were an API to accept “predictions”, people would submit Squiggle code snippets directly to this API.

I’d note here that I find it somewhat interesting how few public APIs do accept code snippets. I could imagine a version of Facebook where you could submit a Javascript function that would take in information about a post and return a number that would be used for ranking it in your feed. This kind of functionality seems like it could be very powerful. My impression is that it’s currently thought to be too hard to do given existing technologies. This of course is not a good sign for the feasibility of my proposal here, but this coarse seems like a necessary one to do at some time.

Example #1:

Say you calculate a few parameters, but know they represent a multimodal combination of a normal distribution and a uniform distribution. You want to submit that as your prediction or estimate via the API of Metaculus or Foretold. You could write that as (in Javascript):

var squiggleValue = `mm(normal(${norm.mean}, ${norm.stdev}}), uniform(0, ${uni.max}))`

The alternative to this is that you send a bunch of X-Y coordinates representing the distribution, but this isn’t good. It would require you to load the necessary library, do all the math on your end, and then send (what is now a both approximated and much longer) form to the server.

With Squiggle, you don’t need to calculate the shape of the function in your code, you just need to express it symbolically and send that off.

Example #2:

Say you want to describe a distribution with a few or a bunch of calculated CDF points. You could do this by wrapping these points into a function that would convert them into a smooth distribution using one of several possible interpolation methods. Maybe in Javascript this would be something like,

var points = [[1, 30], [4, 40], [50,70]];
var squiggleValue = `interpolatePoints(${points}, metalog)`

I could imagine it is possible that the majority of distributions generated from other code would be sent this way. However, I can’t tell what the specifics of that now or what interpolation strategies may be favored. Doing it with many options would allow us to wait and learn what seems to be best. If there is one syntax used an overwhelming proportion of the time, perhaps that could be separated into its own simpler format.

Example #3:

Say you want to estimate Tesla stock at every point in the next 10 years. You decide to estimate this using a simple analytical equation, where you predict that the price of Tesla stock can be modeled as growing by a mean of -3 to 8 percent each year from the current price using a normal distribution (apologies to Nassim Taleb).

You have a script that fetches Tesla’s current stock, then uses that in the following string template:

var squiggleValue = `(t) => ${current_price} * (0.97 to 1.08)^t`

It may seem a bit silly to not just fetch Tesla’s price from within Squiggle, but it does help separate concerns. Data fetching within Squiggle would raise a bunch of issues, especially when trying to score Squiggle functions.It may seem a bit silly to not just fetch Tesla’s price from within Squiggle, but it does help separate concerns. Data fetching within Squiggle would raise a bunch of issues, especially when trying to score Squiggle functions.

History: From Guesstimate to Squiggle

The history of “Squiggle” goes back to early Guesstimate. It’s been quite a meandering journey. I was never really expecting things to go the particular way they did, but at least am relatively satisfied with how things are right now. I imagine these details won’t be interesting to most readers, but wanted to include it for those particularly close to the project, or for those curious on what I personally have been up to.

90% of the work on Squiggle has been on a probability distribution editor (“A high-level ReasonML library for probability distribution functions”). This has been a several year process, including my time with Guesstimate. The other 10% of the work, with the custom functions, is much more recent.

Things started with Guesstimate in around 2016. The Guesstimate editor used a simple sampling setup. It was built with Math.js plus a bit of tooling to support sampling and a few custom functions.[1] The editor produced histograms, as opposed to smooth shapes.

When I started working on Foretold , in 2018, I hoped we could directly take the editor from Guesstimate. It soon became clear the histograms it produced wouldn’t be adequate.

In Foretold we needed to score distributions. Scoring distributions requires finding the probability density function at different points, and that requires a continuous representation of the distribution. Converting random samples to continuous distributions requires kernel density estimation. I tried simple kernel density estimation, but couldn’t get this to work well. Randomness in distribution shape is quite poor for forecasting users. It brings randomness into scoring, it looks strange (confusing), and it’s terrible when there are long tails.

Limited distribution editors like those in Metaculus or Elicit don’t use sampling; they use numeric techniques. For example, to take the pointwise sum of three uniform distributions, they would take the pdfs at each point and add them vertically. Numeric techniques are well defined for a narrow subset of combinations of distributions. The main problem with these editors is that they are (so far) highly limited in flexibility; you can only make linear combinations of single kinds of distributions (logistic distributions in Metaculus and uniform ones with Elicit.)

It took a while, but we eventually created a simple editor that would use numeric techniques to combine a small subset of distributions and functions using a semi-flexible string representation. If users would request functionality not available in this editor (like multiplying two distributions together, which would require sampling), it would fall back to using the old editor. This was useful but suboptimal. It required us to keep two versions of the editor with slightly different syntaxes, which was not fun for users to keep track of.

The numeric solver could figure out syntaxes like,

multimodal(normal(5,2), uniform(10,13), [.2,.8])

But would break anytime you wanted to use any other function, like,

multimodal(normal(5,2) + lognormal(1,1.5), uniform(10,13), [.2,.8])*100

The next step was making a system that would more precisely use numeric methods and Monte Carlo sampling.

At this point we needed to replace most of Math.js. Careful control over the use of Monte Carlo techniques vs. numeric techniques required us to write our own interpreter. Sebastian Kosch did the first main stab at this. I then read a fair bit about how to write interpreted languages and fleshed out the functionality. If you’re interested, the book Crafting Interpreters is pretty great on this topic.{interpreters}

At this point we were 80% of the way there to having simple variables and functions, so those made sense to add as well. Once we had functions, it was simple to try out visualizations of single variable distributions, something I’ve been wanting to test out for a long time. This proved surprisingly fun, though of course it was limited (and still is.)

After messing with these functions, and spending a lot more time thinking about them, I decided to focus more on making this a formalized language in order to better explore a few areas. This is when I took this language out of its previous application (called WideDomain, it’s not important now), and renamed it Squiggle.

[1] It was great this worked at the time; writing my own version may have been too challenging, so it’s possible this hack was counterfactually responsible for Guesstimate.

· 11 min read
Ozzie Gooen

This post was originally published on Aug 2020, on LessWrong. The name of the project has since been changed from Suiggly to Squiggle

(Talk given at the LessWrong Lighting Talks in 2020. Ozzie Gooen is responsible for the talk, Jacob Lagerros and Justis Mills edited the transcript. an event on Sunday 16th of August)

Ozzie: This image is my TLDR on probability distributions: Basically, distributions are kind of old school. People are used to estimating and predicting them. We don't want that. We want functions that return distributions -- those are way cooler. The future is functions, not distributions.