Trifecta

Can we get strong guarantees from AI tools that are known to hallucinate? We discuss some strategies, and ways that Elm might be a great target for AI assistance.

https://cdn.simplecast.com/audio/6a206baa-9c8e-4c25-9037-2b674204ba84/episodes/d1c5f97c-9700-48b0-ab35-a039edbfd0d5/audio/16dc506d-5aa1-42c1-8838-9ffaa3e0e1e9/default_tc.mp3 (Elm Radio – 080: Elm and AI)

[00:56:59] So with this Trifecta, I think each of these pieces needs to do what it is best at. [00:57:06] Compilers are good at verifying things. [00:57:08] Humans are good at... do we even need humans anymore? [00:57:15] Humans are good at critical thinking, at guiding these tools. [00:57:20] Humans have goals. [00:57:24] Humans are good at gathering requirements. >> goals

[00:57:28] I'm not going to say they're good at it, but at the moment they're better than a machine. [00:57:33] Exactly. [00:57:34] And they have to because humans have goals. [00:57:36] The AI's job is not to have goals. >> goals

[00:57:39] Humans have goals for humans. [00:57:43] When a machine wants to make a program for a machine, then it can do it on its own. [00:57:48] This is absolutely not discrimination that I'm mentioning. [00:57:51] God.

[00:57:53] The human is the customer. [00:57:55] The human is the one that gets to say whether you solved the problem or not, that gets to [00:58:00] make the calls of what the problem you're solving is. [00:58:02] So that's like, the human needs to do that. [00:58:05] There's no substitute for that.

[00:58:08] Because as you said, if the customer is a machine or an API or something, then you can [00:58:15] automate it.

[00:58:16] So the human only asks, well, I need this, and then the machine can do the rest. [00:58:22] And you can have these feedback cycles with compilers and all kinds of test suites.

[00:58:32] So if that trifecta is what becomes really interesting to me, the human sets the goals [00:58:37] and can sort of validate these criteria and accept or not accept them. [00:58:42] The compiler is a verification tool. [00:58:46] It is a tool for giving information through static analysis that is guaranteed correct [00:58:52] information and checking that information.

[00:58:55] Elm Review and other static analysis tools can provide similar input and verifications. [00:59:01] And AI can piece things together using those guardrails and inputs and verifications provided [00:59:08] by those other external systems.

[00:59:10] So when those three things are sort of interacting, then I think it becomes really interesting, [00:59:15] especially, as I said, when we are using these things to create higher level building blocks [00:59:21] as humans.

[00:59:23] So we can say, give me a decoder. [00:59:25] And I know that it satisfies these things. [00:59:27] And I don't have to use brainpower to check that because I know it's an automated verification [00:59:31] of that.

[00:59:32] So I can trust it. [00:59:34] Give me a fake-it-till-you-make-it, simplest-thing-that-could-possibly-work green test [00:59:38] for this test case, and give it guardrails that allow me to trust that it's not going [00:59:44] beyond that and filling in implementation details. [00:59:48] Then you can actually trust these things.
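The "give me a decoder and let the tooling verify it" contract can be sketched in Elm. Everything below is invented for illustration (the `User` alias and its fields are not from the episode), but it shows what the compiler actually checks: whoever writes `userDecoder`, human or AI, the annotation guarantees it produces a `User`.

```elm
module Example exposing (User, userDecoder)

import Json.Decode as Decode exposing (Decoder)


-- Hypothetical record type: the target the human specifies.
type alias User =
    { id : Int
    , name : String
    , email : Maybe String
    }


-- Whether a human or an AI writes this body, the compiler checks
-- that it really is a `Decoder User`; running it against sample
-- JSON in a test verifies the field names and shapes.
userDecoder : Decoder User
userDecoder =
    Decode.map3 User
        (Decode.field "id" Decode.int)
        (Decode.field "name" Decode.string)
        (Decode.maybe (Decode.field "email" Decode.string))
```

A unit test that runs `Decode.decodeString userDecoder` on a known payload is the automated verification mentioned above: once it passes, you don't spend brainpower re-checking the decoder.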

[00:59:50] And yeah, well, there's one question of, do you even need a compiler or type checker [00:59:58] and the linter and the test suites? [01:00:01] Could you not just ask the AI to verify things?

[01:00:05] But then it comes again to the point of, well, who monitors who? [01:00:10] How do you trust the right checks?

[01:00:14] And at the end of the day, we do trust the compiler. [01:00:17] Now, that said, it is possible for the compiler to have bugs, and it does happen. [01:00:23] But for all intents and purposes, we fully trust the compiler. [01:00:27] We fully trust Elm Review.

[01:00:28] Of course, it's possible for these things to have bugs. [01:00:30] But I think that's a good assumption. [01:00:32] Whereas with AI, I don't fully trust it unless I verify it.

[01:00:37] The thing that is very important for me with regards to the compilers and linters and test [01:00:43] suites is that these are consistent. [01:00:47] Like if you run the same code, if you ask the compiler to verify the same code, it's [01:00:52] going to give you the same results. >> consistent

[01:00:54] If you run the same code in a test suite, it's going to give you the same results. [01:00:59] If you ask the AI to review your code, like, hey, are there any consistency issues that [01:01:06] the linter would tell me, for instance, then from one run to another, it could tell you [01:01:12] different things.

[01:01:14] It's kind of like asking a human, hey, can you please review this code and tell me how [01:01:20] you can improve it?

[01:01:21] Well, if I ask you today to do this seriously on my code base, you're going to find a lot [01:01:26] of problems.

[01:01:27] If I ask you tomorrow to do it again from scratch, you're going to give me a whole different [01:01:32] set of problems.

[…]

Consistency

[01:01:40] Linters, when they're dealing with consistency, they give you a certain minimum of consistency [01:01:50] of code that is written in a specific way. >> consistency

[01:01:53] And it could go higher, probably. [01:01:56] Like, you want all your functions to be named in a very similar way, for instance, but that's [01:02:03] probably a bit too hard for a linter. [01:02:06] An AI would always tell you different things, and we don't want that. [01:02:10] So we need these to be trustworthy and consistent, in the sense that they don't give you different [01:02:16] results every time.

[01:02:19] And the lower level the task, the more we can trust it. [01:02:22] Just like Elm types, because the type system is so simple, it's easy to trust it. [01:02:27] Whereas TypeScript, it's so permissive, it's hard to trust it.

[01:02:32] And there are so many caveats and exceptions that it's hard to trust such a complex and [01:02:39] permissive system.

[01:02:41] So I do think that this might be a superpower of Elm. [01:02:44] And honestly, I think that maybe this could be a really appealing thing about Elm that [01:02:50] makes it more mainstream. [01:02:53] That, wow, this language, it turns out it's really good for automating and tooling. [01:02:59] And you know what? [01:03:01] Automating and tooling is really hot these days because people are building all sorts [01:03:04] of AI automation. [01:03:05] And we can have trusted AI automation.

[01:03:11] So I think we're at this early stage where people are just sort of letting AI just write [01:03:16] their code, which is kind of crazy. [01:03:19] They're letting AI just execute shell commands for them. [01:03:23] I saw a recent thing where somebody like... [01:03:28] We all knew it was going to happen when you start letting AI just fill in commands in [01:03:33] your shell.

[01:03:42] It's kind of a crazy state of things, right? [01:03:44] But if we can have tools that we can really trust and not have to worry about it doing [01:03:50] anything that's going to put things in a bad state or go beyond the scope of what we're [01:03:55] trying to do, just like perfectly reliably solve a difficult problem that we can now [01:04:01] take for granted.

[01:04:03] That's awesome. [01:04:04] And I think Elm is a really good fit for that.

[01:04:07] I've also heard the opposite point of view where this could be pretty bad for Elm or [01:04:13] for smaller languages in the sense that the AI is trained on code that is available. [01:04:21] And there's not a lot of Elm code out there compared to more mainstream languages like [01:04:26] JavaScript.

[01:04:27] So this could make adoption of new languages harder or smaller languages in general. [01:04:35] But as you said, if there are guarantees like the ones that Elm provides, that can even [01:04:42] out the playing field.

[01:04:44] But if you're designing a language that doesn't have the same guarantees as Elm, and it's [01:04:49] just very new or very small, then you get kind of the worst of both worlds.

[01:04:56] And this all depends on writing the tooling, right? [01:05:00] And so I think we have an opportunity to build really cool stuff leveraging these techniques [01:05:07] right now. [01:05:08] So I'm definitely going to be playing around with that.

[01:05:10] Like I've got a lot of ideas. [01:05:12] I want to make this sort of automated type puzzle solver. [01:05:17] I think, you know, having it build JSON decoders starts to become really interesting where [01:05:25] like Mario and I were working on this Elm HTTP fusion thing, which is really cool for [01:05:31] like having a UI where you make an HTTP request, and then you can sort of click the JSON fields [01:05:39] you want and it generates a decoder. [01:05:41] It's like, that's great.

[01:05:42] But what if you can tell it the type you want and it can figure out what fields to get and [01:05:49] generate something that is provably correct because you actually ran it and verified it, [01:05:55] and then you can fully trust it, but it just solves your problem. [01:05:58] And it sort of can solve that last mile problem where like, there are so many things I've [01:06:03] been trying to automate where it's difficult to do that last little piece of the puzzle [01:06:09] and AI can do that missing piece.

[01:06:11] So I think this unlocks some really cool things.

[01:06:15] I've been thinking about like some other use cases I'm thinking about are like, so for [01:06:19] example, like with Elm GraphQL, you know, we've talked with Matt about Elm GQL, which [01:06:25] sort of tries to be a simpler way of just taking a raw GraphQL query as a string. [01:06:32] And it's very easy to interact with GraphQL APIs through this raw query string. [01:06:39] And then it can generate type aliases for you of the response you get, and you just [01:06:45] paste in your GraphQL query string and it spits out some Elm code to execute that query. [01:06:52] And the trade off with that approach in Elm GQL versus Elm GraphQL, as we talked about [01:06:57] in our Elm GQL episode is with Elm GraphQL, you have to explicitly write everything you're [01:07:05] decoding in Elm code.

[01:07:07] But you can maintain it in Elm code and you get more fine grained control over the types [01:07:12] you decode into. [01:07:14] So there's a trade off.

[01:07:16] But what if you had a tool that generated an Elm GraphQL query? You'd get complete control [01:07:24] over the fine-grained code and the types that you decode into. What if you could just tell an AI [01:07:31] tool: generate an Elm GraphQL query?

[01:07:33] And using this sort of type puzzle solver I built, I can say here are all the functions [01:07:39] for generating Elm GraphQL types, solve this problem. [01:07:44] And here's the raw GraphQL query. [01:07:46] And here is the resulting Elm type I want. [01:07:50] And it could, I think it could solve that pretty well. [01:07:54] So some of these tools become more interesting when you have that extra bit of glue from [01:08:00] AI.
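As a hedged sketch of that type-puzzle idea with dillonkearns/elm-graphql: the `Api.*` modules below stand in for code generated from some schema (they are assumptions, not a real API), but the shape shows what the AI would have to assemble and what the compiler would then check against the target type.

```elm
module UserQuery exposing (UserSummary, query)

-- `Api.*` are hypothetical modules generated from a GraphQL schema.
import Api.Object.User as User
import Api.Query as Query
import Graphql.Operation exposing (RootQuery)
import Graphql.SelectionSet as SelectionSet exposing (SelectionSet)


-- The target type the human specifies up front.
type alias UserSummary =
    { name : String
    , email : String
    }


-- Given the raw query `{ user { name email } }` and the target type,
-- an AI could assemble this selection set, and the compiler verifies
-- that it really produces a `UserSummary`.
query : SelectionSet UserSummary RootQuery
query =
    Query.user
        (SelectionSet.map2 UserSummary
            User.name
            User.email
        )
```

If the AI picks the wrong field or the wrong type, the result simply doesn't compile, so its guesswork is bounded by the same verification as hand-written code.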

Boilerplate

[01:08:01] And that would solve all of Elm's problems because all of Elm's problems are boilerplate. [01:08:07] Exactly.

[01:08:08] It's boilerplate that's really easy to maintain once you have it. [01:08:12] So if it's very easy to confidently write boilerplate, then yeah, Elm becomes a lot [01:08:17] more exciting.

[01:08:18] If we take your last example, it does mean that you redo the same logic every time, and [01:08:24] not necessarily in a framework- or library-oriented way. [01:08:31] So you would inline the creation of the GraphQL query and the decoding of it, instead [01:08:39] of using a pre-made library, which simplifies the API for that. [01:08:45] But it could be very interesting nonetheless.

[01:08:50] I think part of the challenge right now to using these tools effectively is like defining [01:08:56] the problems and the workflows to leverage these as much as possible.

Refactoring

[01:09:02] Another thing on my mind here is refactoring. [01:09:05] So we have, you know, built-in IntelliJ refactorings for, like, extracting a function [01:09:13] to a module. Like, what kinds of refactorings should we invest in building into IDEs [01:09:21] or language servers versus using AI? [01:09:24] I mean, we could also just ask an AI to write those things to be integrated into the IDE, [01:09:33] for instance. [01:09:34] So for instance, if you go back to the linter example, I don't want an AI to review my code [01:09:43] because it's going to be inconsistent.

[01:09:45] I can ask it to write a linter rule once and then I can run that linter rule multiple times. [01:09:51] But yeah, I definitely agree that there are cases where you will want to have a transformation [01:09:59] using AI rather than one that is hard-coded one way or another in an IDE. [01:10:05] That could be interesting to find.
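To make the "write a linter rule once, run it many times" idea concrete, here is a simplified module rule in the style of the elm-review documentation's introductory example (reporting top-level functions without type annotations). The rule name and wording are illustrative, but the `Review.Rule` calls are the library's actual API.

```elm
module NoMissingTypeAnnotation exposing (rule)

import Elm.Syntax.Declaration as Declaration exposing (Declaration)
import Elm.Syntax.Node as Node exposing (Node)
import Review.Rule as Rule exposing (Rule)


rule : Rule
rule =
    Rule.newModuleRuleSchema "NoMissingTypeAnnotation" ()
        |> Rule.withSimpleDeclarationVisitor declarationVisitor
        |> Rule.fromModuleRuleSchema


-- Report any top-level function that has no type annotation.
declarationVisitor : Node Declaration -> List (Rule.Error {})
declarationVisitor node =
    case Node.value node of
        Declaration.FunctionDeclaration function ->
            case function.signature of
                Nothing ->
                    [ Rule.error
                        { message = "Missing type annotation"
                        , details = [ "Top-level declarations should be annotated." ]
                        }
                        (Node.range node)
                    ]

                Just _ ->
                    []

        _ ->
            []
```

An AI might help draft a rule like this once; after that, every elm-review run applies it deterministically, which is exactly the consistency property an AI reviewer lacks.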

[01:10:12] I'm very bullish on what we can do with these AI tools. [01:10:17] But I'll have you ask yourself whether you should. [01:10:22] Well, that's another question.

[01:10:26] The thing I'm bearish on would be just saying, AI, build a plugin. [01:10:34] You know, there's a lot of hype around, like, it built a Chrome extension for [01:10:39] this thing. [01:10:40] It built a whole app from a sketch on a napkin. [01:10:42] And so it's like, okay, that's very impressive.

[01:10:46] It's very interesting, but I am skeptical of how useful that is actually going to prove [01:10:54] to be. [01:10:55] Like, I don't feel like that's what's going to take people's jobs away. [01:10:58] I don't feel like that's what's going to replace the work we're doing day to day. [01:11:01] I think it's these more mature things that we can really rely on, where we're choosing [01:11:07] more constrained problems to do higher level operations and verify them and put guardrails [01:11:13] on them.

[01:11:15] I think that's my personal bias and obsession and people will get over that and not worry [01:11:21] about that and be able to do cooler things than I can do. [01:11:24] That's very possible. [01:11:25] I admit that's a possibility, but that's where I'm putting my money.

Happy Path

[01:11:29] So like having it write the IDE refactorings for extracting functions and things like that. [01:11:38] The hard part isn't writing the happy path. [01:11:46] I can write the happy path of that. [01:11:47] I've actually done that in IntelliJ refactorings. [01:11:53] The hard part is everything else that it's not considering.

[01:11:56] And if I have to babysit it to make sure it solved each of those cases, I may as well [01:12:01] do it myself.

Pair with the AI

[01:12:02] Because the things it's going to miss, the things that I don't trust that it did [01:12:07] and have to go check myself, it's easier to do them myself, engage with the problem, [01:12:13] and solve it my way, knowing that I accounted for every corner case and wrote a test for [01:12:18] it, than to just trust the AI and be like, okay, now I have to go check everything that [01:12:23] it did in this crazy, impossible-to-understand code. [01:12:26] That's not the way I would have solved it. [01:12:29] But if you paired with the AI...

[01:12:33] That's the direction I think things are going. [01:12:35] Just like, tell it very high-level instructions. [01:12:39] But every time you give instructions, there's some bias, right? [01:12:44] So at least so far, the AIs are always very confident.

[01:13:02] I've seen a lot of people in the Elm Slack ask questions like, how do I do X, or how [01:13:06] do I solve this problem? [01:13:09] And there's often that XY problem. [01:13:12] Like, you ask for a solution.

[01:13:14] You ask for a solution to X, but you're actually trying to solve a different problem, Y. [01:13:19] And so even if I imagine that the AIs will become extremely, extremely good, like 100 times better than you and me combined, they're still only going to solve the problem [01:13:33] that you're asking them to.

[01:13:35] Just like, let's imagine it's the smartest person in the world that you have free access [01:13:40] to. [01:13:41] Well, if you ask them something and they don't think about whether it makes sense [01:13:46] to ask the question, then they're not going to tell you.

[01:13:53] So you need to prompt them, but you also need to think about how you ask the question, what [01:13:59] question you ask. [01:14:00] And I'm thinking maybe we should ask them as well. [01:14:04] Like, Hey, I have this feature that I want to do. [01:14:08] So can you tell me how I transform this HTML to a string?

[01:14:12] And maybe you should also ask, like, does this make sense, by the way? [01:14:15] Because then they start answering that question: [01:14:19] well, no, that doesn't make sense.

[01:14:22] So as I said, we're good at gathering requirements. Well, we're not very good at it, but that [01:14:29] is our job. [01:14:30] And I think it will increasingly become our job. [01:14:34] So we're going to become AI managers.

[01:14:40] AI product owners.

[01:14:44] I think, with what you're talking about, the word that's been coming to mind for me is that [01:14:50] these AI engines are very suggestible. [01:14:54] Like, if you say, I don't know, where might I have lost my keys? [01:15:03] Hint: I often forget them in my pants that I put in the hamper. [01:15:11] Then it's going to be like, are they in the hamper? [01:15:14] It's going to run with the hint. I've seen that when the Elm compiler says, hint, [01:15:23] maybe you need to do this. [01:15:25] And then it's like, okay, sure, let me try.

[01:15:28] And it gets fixated on this one path that the compiler sort of hinted at, and that's [01:15:34] not a good path.

[01:15:35] So that's why with this type puzzle, I was trying to give it a very clear set of rules [01:15:41] and say, this is the set of steps you're following. [01:15:46] And then even teach it, this is how you adjust if you are incorrect in one of your guesses. [01:15:51] And so you really have to prime it and prevent it from getting fixated and biased in one [01:15:56] direction.

[01:15:58] But you also set some guardrails, and if you were wrong in setting those guardrails, that's [01:16:04] going to be a big problem for you.

Ideation

[01:16:08] And I mean, these AI engines are also interesting for ideation. [01:16:13] So there, I mean, that's a whole other topic we could get into, but... [01:16:18] We mostly talked about using it for things that we know well and that we can validate and [01:16:23] verify, which I completely agree is probably the way to use it. [01:16:29] But it is also very good at helping you out when you don't know something. [01:16:33] And there it becomes a lot more dangerous, because it's overconfident and it's going [01:16:38] to lead you to wrong results, wrong paths, and you're not going to be able to figure [01:16:44] those out.

[01:16:45] But because it knows a lot more than you, it will, I think in a lot of cases, be used [01:16:52] in that way. [01:16:53] And there people have to weigh in the risks that are involved.

[01:17:00] So definitely in some cases, it's going to be amazing. [01:17:04] For instance, I am not a good drawer, but I can ask an AI to draw something for me. [01:17:12] I actually do have a whole website filled with drawings, but I probably shouldn't train [01:17:17] it on that.

[01:17:19] But yeah, if I ask the AI to do it, then that would probably give some better results. [01:17:26] But when it comes to code, if I can verify it, then it's better. [01:17:31] If I can't verify it because it's something new to me, [01:17:33] well, that is very interesting as well. [01:17:36] And the thing that I'm worried about here on that matter is that if I ask the tool to [01:17:42] do something for me, for something that I don't know, I will start over-relying on [01:17:47] it instead of learning properly and improving my own skill set. [01:17:53] I think that's going to happen a lot with a lot of people getting into development right [01:17:57] now.

[01:17:58] And yeah, I think being an experienced developer, it's a lot easier to know what to rely on [01:18:04] it for, or when it's maybe getting in the way of learning, like learning to write [01:18:14] a regex. [01:18:15] And you probably should sort of figure that out instead of just blindly trusting a thing. [01:18:19] Or maybe it's okay to just be like, if the test passes, I don't really care how it arrived [01:18:25] at that.

[01:18:26] Maybe that's okay too. [01:18:27] You know, but yeah, I can for instance, imagine a backend developer who knows a little bit [01:18:32] of Elm and they just ask the AI to generate the UI for their application or at least the [01:18:39] view parts of the application.

[01:18:43] And that's going to be very helpful to get started. [01:18:46] But how do you make sure that things are correct with accessibility and all those concerns [01:18:52] that you don't know about?

[01:18:56] Is it going to fit well with a design system you set up? [01:19:00] And there are all these assumptions that, yeah, so you have to know what to rely on [01:19:04] it for.

[01:19:05] And if it's like, if you can have it perform a high level task that you can fully verify [01:19:12] and trust it for, that's interesting. [01:19:14] If you can have it help you with ideation and generating a list of things to think about, [01:19:22] and that's input for you to consider some other things, that's also very good. [01:19:26] Because that, if something is helping you with ideation, you can filter out a little [01:19:31] bit of junk to get the diamond in the rough. [01:19:34] Oh, this one idea, I didn't consider that. [01:19:36] And that was really good.

[01:19:37] So that's another use case. [01:19:38] But the sort of in-between space where you just YOLO it and blindly incorporate it into [01:19:45] your code, I'm honestly pretty skeptical of the utility of that. [01:19:51] And I'm skeptical of how maintainable it's going to be working with systems like that [01:19:56] and maintaining code where there's a lot of that happening. [01:19:59] I think it's going to be okay for things that you're okay with throwing away. [01:20:04] Well, that you're okay with and that you can throw away.

[01:20:09] Yeah, if you can scope something down really narrowly. [01:20:12] I used it the other day for writing something to traverse a directory structure to find [01:20:21] the root Elm project by looking until it found an elm.json file. [01:20:26] For my elm-pages scripts, I changed it so you can do elm-pages run and then give a file [01:20:31] path, and it will find the nearest elm.json to the file you pass in. [01:20:36] And I wrote it with GPT-4, and I went through a few iterations and guided it very clearly [01:20:43] with what I wanted in the result.

[01:20:46] But I knew it was like, this is going to generate one function for me that if it works, I can [01:20:52] just let it do its thing. [01:20:54] Although I didn't like the style it used. [01:20:56] So I told it, instead of doing a bunch of for loops and while loops, can you do it using [01:21:03] functional style mapping and recursive functions? [01:21:06] And it modified it.

[01:21:08] And then I said, can you use ESM imports instead? [01:21:11] And with a few tweaks, I had it refactor it to the style I wanted. [01:21:14] And so yeah, it was like a constrained thing. [01:21:18] And the next time you do that, you will prime it with, oh, use a functional style and use [01:21:23] ESM, etc.

[01:21:26] And that was like a constrained enough thing that I know with my experience, that it's [01:21:33] like an independent, separable problem that if it writes a function that does this, I [01:21:39] can use that and it can be useful to my workflow. [01:21:41] So I think there's an art to knowing when to rely on it as well. [01:21:45] I feel like we have a lot more to talk about, a lot of interesting aspects to cover, but [01:21:52] this has already been a quite long episode.

[01:22:06] And tell us what you've been doing with Elm and AI or pure functional programming and AI. [01:22:12] We would love to hear from you. [01:22:13] We'd love to hear what clever things you come up with or just how you use it in your workflow, [01:22:17] and let us know what you want to hear about with Elm and AI in the future.

[01:22:22] Did you prompt the audience well enough so that they give you the answers that you're [01:22:27] looking for or do you need to rephrase it slightly? [01:22:30] Maybe let's give them some guardrails. [01:22:32] Give us your example use cases. [01:22:35] Give us an example of the problem you used with it. [01:22:38] There we go. [01:22:39] I think we're good.