Collect some example schemas to help test plugins #779
Very simple one for a single dog:
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"bio": {
"type": "string"
}
}
}
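These examples could double as plugin tests by validating whatever the model returns against the schema. A minimal sketch, assuming the jsonschema package (not part of llm itself) and llm 0.23+, where schema= accepts a plain dict:

import json

import llm
from jsonschema import validate

dog_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "bio": {"type": "string"},
    },
}

model = llm.get_model("gpt-4o-mini")  # any schema-capable model/plugin
response = model.prompt("Invent a dog", schema=dog_schema)
validate(instance=json.loads(response.text()), schema=dog_schema)  # raises on mismatch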
Here's an array of dogs with a bit more detail:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"dogs": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"minLength": 1
},
"bio": {
"type": "string",
"minLength": 1
}
},
"required": ["name", "bio"],
"additionalProperties": false
}
}
},
"required": ["dogs"],
"additionalProperties": false
}
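For comparison, here's a sketch (assuming Pydantic v2) of the same constraints written as nested Pydantic models, in the style the Python API examples further down use. The generated schema is equivalent apart from the missing $schema key and Dog being nested under $defs:

from pydantic import BaseModel, ConfigDict, Field

class Dog(BaseModel):
    model_config = ConfigDict(extra="forbid")  # additionalProperties: false
    name: str = Field(min_length=1)
    bio: str = Field(min_length=1)

class Dogs(BaseModel):
    model_config = ConfigDict(extra="forbid")
    dogs: list[Dog]

# Dogs.model_json_schema() reproduces the constraints above, and Dogs can be
# passed directly as schema= to model.prompt()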
This doesn't quite work right:
llm --schema '{
"type": "object",
"properties": {
"segments": {
"type": "array",
"items": {
"type": "object",
"properties": {
"speaker_name": {
"type": "string"
},
"spoken_text": {
"type": "string"
},
"timestamp_mm_ss": {
"type": "string"
}
}
}
}
}
}' -m gemini-2.0-flash \
  -a https://static.simonwillison.net/static/2025/ten-minutes-of-podcast.mp3 'transcribe this'
Output:
{
"segments": [
{
"speaker_name": "Host",
"spoken_text": "on its own and and it has this sort of like uh it's like a you know, old tree in the forest, you know, kind of thing that you've built. So."
},
{
"speaker_name": "Guest",
"spoken_text": "it's also like what I feel like with online writing, never ever like stick something online and just expect people to find it. You have to. So, one of the great things about having a blog is I can be in a conversation about something and somebody asked a question, I can say, Oh, I wrote about that two and a half years ago and give people a link. So I'm constantly giving people links to things I've talked about in the past."
},
{
"speaker_name": "Host",
"spoken_text": "Yeah. It's kind of like your projects uh on GitHub also."
},
{
"speaker_name": "Guest",
"spoken_text": "Exactly."
},
{
"speaker_name": "Host",
"spoken_text": "I know, I guess people were so that before where they're like start searching for something that and then find the result. Oh, oh, I wrote about that."
},
{
"speaker_name": "Guest",
"spoken_text": "Oh, completely. That happens to me so much. Honestly, I've got, like I said, I've got 850 projects, which means that sometimes I will forget a project exists and I will go looking for a solution to something there will be a library that I wrote that I had fallen out of my head. That's that's deeply entertaining when that happens."
},
{
"speaker_name": "Host",
"spoken_text": "Yeah, yeah. Do you think it's easier than ever to start a blog?"
},
{
"speaker_name": "Guest",
"spoken_text": "Well, I don't know. Like, one of the things I haven't quite got my head around is I keep on hearing rumors that Google doesn't credit new sites nearly as much as it used to. But I but these like it's SEO, there are rumors about everything. So it might be that because my blog has existed effectively for 20 years, I've got so much sort of built up like credibility with Google that I get. Yeah."
},
{
"speaker_name": "Host",
"spoken_text": "credentials."
},
{
"speaker_name": "Guest",
"spoken_text": "I get results and I if that is the case, like say, start your blog now and in 10 years time."
},
{
"speaker_name": "Host",
"spoken_text": "Yeah, yeah. Yeah, that that that investment starts paying off in terms of the the search ranking. Although who knows what search will look like in three years time at this point, you know?"
},
{
"speaker_name": "Host",
"spoken_text": "Yeah. I feel like the benefits that you outlined already are there too though. The the idea that it's this repository of you know, what you're doing, what you've created. Uh you do a lot of link logging, which is interesting. So it's kind of like this powerful notepad and tool and then you also social media, it's just who knows how stable it's going to be. You know, it's just crazy. And so like all the stuff that you've written in those places, if you've done any kind of work and made big link types of stuff that stuff should be on a blog or you know, links to it."
},
{
"speaker_name": "Guest",
"spoken_text": "Absolutely. Like, I love there's there's a philosophy. I forget the acronym but there's a a philosophy where you publish on your own site and oh Posy, p o s e, publish an own site syndicate elsewhere. So everything goes on your blog first and then you post a link to it on all of the other platforms. Because some of the platforms like the Twitter algorithm penalizes links these days, which is very frustrating. Blue Sky doesn't, right? So."
},
{
"speaker_name": "Host",
"spoken_text": "Yeah, I know. Yeah."
},
{
"speaker_name": "Guest",
"spoken_text": "and and that's gone down. So we've got our own ways around this now. And the other thing I think that's really important, I keep on focusing on the idea of credibility. Like building credibility is so important. Like, when I'm looking for sources of information, I look for people who have earned credibility with me and I want to earn credibility myself with other people. And having like 20 years of blog content gives you instant credibility. Like I can point you to my SQlite tag on my blog which goes back to 2003 when I first heard about SQlite, you know? And that so I love that."
},
{
"speaker_name": "Host",
"spoken_text": "Right, yeah."
},
{
"speaker_name": "Guest",
"spoken_text": "I feel like and that's the kind of thing where credibility is accumulated over time and it doesn't take much. Like a link blog about a subject, run that for six months and you will become one of the top not.1% people on earth for credibility on that subject just from publishing a few notes and linking to a bunch of things about it."
},
{
"speaker_name": "Host",
"spoken_text": "Yeah, got to start. That's the trick. Yeah."
},
{
"speaker_name": "Host",
"spoken_text": "As you started this process of writing about and researching LLMs, I guess maybe we can we can cover some one of these things that you've written about recently at least on social media how people have been saying, oh, you're a shill about LMs. Which I think is really kind of fascinating because in a way there's like these camps, you know, like people who are completely one side or the other and you know, either they're, you know, they're completely useless or, you know, or they're, you know, going to change every single thing that's out there. There's a lot of there's a real huge hype uh cycle that's happening. And my co-host, he's been in programming longer than me, you know, we're similar age, but I went off and did a whole career with music and other things and I kind of got back into programming over the last uh five, six years and so he talks about AI Winters and and says, oh, we're headed for another one and I don't know. Like I don't know kind of what to look at and so forth there, but I I kind of wonder like where do you feel like you fit on the AI hype meter? Gotcha. Between things. Yeah."
},
{
"speaker_name": "Guest",
"spoken_text": "So, I just I just realized that I kind of think of it as a grid where you've got two axes. You've got the useless to useful access and then you've got the the evil to to to opposite of evil access. And I think I am very far across on the use like the one argument I will the one thing I will not accept is people say no, these things are useless. Like that that I have a very strong opinion. They are useful if you understand how to use them, which is very nonobvious and unintuitive. Yeah. Okay."
},
{
"speaker_name": "Host",
"spoken_text": "I think we want to talk about that a lot today. As far as how it connects with Python. Yeah."
},
{
"speaker_name": "Guest",
"spoken_text": "So, I'm very high very far over on the no, these things are useful if you know what you're doing with them. And then in terms of the evil to to good, I think I am right in the middle. Like neutral. I think yeah, absolutely because almost all of the negative things that people say about about LLMs are true. Like almost all of them. Um they do make things up all the time. The environmental impact of them is is is is big. They are trained on a giant pile of unlicensed data. Like you can call them a plagiaris machine. That's not exactly. There a lot of truth to that."
},
{
"speaker_name": "Host",
"spoken_text": "Right. And the way people can use, there are many harmful things that you can do with them. Right."
},
{
"speaker_name": "Guest",
"spoken_text": "On the flip side, I think a lot of there are a lot of really positive like like uses that people don't, people who are completely anti LLM don't really consider. A couple of my favorite examples are firstly for translation, oh my goodness. Like every human language, we now have a translator that won't just do a straight up Google translate style translation. It'll answer follow-up questions. You can say, oh, that seems a little bit vague, which if this is um like Mexican Spanish as opposed to Guatemalan Spanish, could this word have a different meaning? Those kinds of things. Like that's phenomenal. It's terrible news if you're a professional translator. Like that's but on that point, I kind of feel there are 7 billion people on earth who need translation abilities presented to them, automating those does make a huge impact. And then the other one that I keep coming back to is our society is set up such that having like good formal writing skills is a incredible superpower and it's often used to discriminate against people. Like if you need to write to the local town council complaining about a street light outside of your house, you need to know how to compose a formal letter and all of that kind of junk. Yeah. So does it get filed away in the circular bin, yeah. That's solved as well. Like if if if I need to coach somebody with English as a second language into writing a formal letter complaining about a pot hole, these tools will do that for them really well. And I love that. I love that we've broken the relationship between ability to write in a certain sort of formal way and you know, ability to to actually have an impact on the world. There are plenty of ways that could go wrong as well. But honestly, like that's such a thing to be celebrated. Yeah."
},
{
"speaker_name": "Host",
"spoken_text": "Yeah. So I feel like that's kind of where like the writing part is directly right into code there. And what's sort of fascinating to me and and very hard for gosh, the layman, the person kind of coming into this or who I speak to a lot with the podcast, beginner and intermediate coders. Wow, there's so many tools. You know, there's like this power ranking chart that you link to from time to time. And then it's like, well, where do I begin? And, you know, what are tools that kind of work in Python code? And so I don't even know quite where to kind of dig into this topic. Do you have a suggestion of like where we might start to think of like, okay, well, I want to write Python code with an LLM."
},
{
"speaker_name": "Guest",
"spoken_text": "Absolutely. Yeah, where would you start? So the the good news is the two best languages for LLMs are Python and JavaScript because there's so much example code out there in the in the the training set. So actually genuinely you cannot go wrong with picking an LLM for Python. If you pick one of the the sort of more powerful, the sort of top tier ones, they all do an amazing job. Okay, they've all been trained on that stuff. They've all been trained on huge amounts of this stuff. So the top ones right now it's um uh GPT4O, that's the opening AI chat GPT1, Claude 3.5 Sonic from Anthropic, Gemini 1.5 Pro from Google. Tho and then the new entrance is Deep Seek V3. This one that was Yeah. released on Christmas day for free by a Chinese AI lab and they didn't even document it. They literally dumped this giant blob on Christmas day and didn't write about. It was amazing. It was the most cyberpunk thing ever. That's a weird Christmas gift. It really is. Yeah. So the the Lama models from Meta which are openly licensed, those are really good for Python as well. So given that they're all good at it, the one that I'd pick first is actually um chat GPT because of this feature it has called code interpreter. Okay. This is the thing where chat GPT can not just write Python, it can then run the Python in a little like Kubernetes sandbox and it can see what comes back. And if it gets error messages, it will rewrite the Python to fix them. So you can actually, you can literally give it a challenge and you can watch it write some code and then try it and it doesn't work and then it'll fix it and try it again in this little loop. It's amazing. Like you can literally watch it debug itself because people will tell you these things they hallucinate. They they come up with with in terms of programming they might invent a library or a method that doesn't work. That hallucination problem is mostly fixed if they can then test the code themselves because they'll go, oh, I didn't realize that that library can't do that thing. I'll try something else. So on that basis I can't remember if the free version of chat GPT has code interpreter. I I think it does. Okay. So this is like the basic paid one maybe the $20 a month because I know there's like a $200 one that's a little steep for like a basic."
}
]
}
It didn't even attempt the timestamp_mm_ss field.
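One variation worth testing against Gemini (a sketch only — no guarantee the model or plugin honors these hints): add a description to the timestamp field and mark all three properties as required, to see whether that nudges it into attempting timestamps at all.

import llm

schema = {
    "type": "object",
    "properties": {
        "segments": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "speaker_name": {"type": "string"},
                    "spoken_text": {"type": "string"},
                    "timestamp_mm_ss": {
                        "type": "string",
                        "description": "Start time of this segment as MM:SS",
                    },
                },
                "required": ["speaker_name", "spoken_text", "timestamp_mm_ss"],
            },
        }
    },
}

model = llm.get_model("gemini-2.0-flash")
response = model.prompt(
    "transcribe this",
    schema=schema,
    attachments=[
        llm.Attachment(url="https://static.simonwillison.net/static/2025/ten-minutes-of-podcast.mp3")
    ],
)
print(response.text())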
It's fun how this outputs the bio of a human:
llm --schema '{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"bio": {
"type": "string"
}
}
}'
But if you add 'a fancy pelican' on the end you get a pelican instead:
llm --schema '{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"bio": {
"type": "string"
}
}
}' 'a fancy pelican'
{
"name": "Sir Pelican von Beakington",
"bio": "Sir Pelican von Beakington is a distinguished and flamboyant pelican known for his impeccable style and charming demeanor. Often spotted around the glamorous coastal resorts, he is regularly adorned with a collection of vibrant bow ties and an exquisite monocle perched elegantly over his right eye. With a penchant for the finer things in life, he indulges in gourmet fish dishes and has a flair for theatrical storytelling, captivating both humans and fellow avian friends alike. Sir Pelican is not just a bird; he is a symbol of sophistication by the seaside."
}
I really need some examples that include descriptions as extra hints for the models. https://docs.pydantic.dev/latest/concepts/fields/#customizing-json-schema - "Some field parameters are used exclusively to customize the generated JSON schema".
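For example, with Pydantic v2 a Field(description=...) ends up as a "description" key on that property in the generated JSON schema, so hints like this travel with the schema (a minimal sketch):

from pydantic import BaseModel, Field

class Dog(BaseModel):
    name: str = Field(description="The dog's name")
    bio: str = Field(description="One colourful sentence about this dog")

print(Dog.model_json_schema())
# each property now carries a "description" alongside Pydantic's auto-added "title"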
Here's a fun demo:
>>> from pydantic import BaseModel, Field
>>>
>>> class Article(BaseModel):
... headline: str
... date: str = Field(title='YYYY-MM-DD')
... places: list[str]
... people: list[str]
... summary: str
...
>>> import httpx
>>> html = httpx.get("https://simonwillison.net/2025/Feb/27/typescript-types-can-run-doom/").text
>>> len(html)
7550
>>> model = llm.get_model("gpt-4o-mini")
>>> response = model.prompt(html, schema=Article)
>>> print(response.text())
{"headline":"TypeScript types can run DOOM","date":"2025-02-27","places":[],"people":["Dimitri Mitropoulos","Simon Willison"],"summary":"A fascinating project where Dimitri Mitropoulos spent a year creating a TypeScript compiler-based runtime to run DOOM solely through TypeScript types, involving complex implementations of a virtual machine and code management for WebAssembly.","content":"This YouTube video describes an outlandishly absurd project: Dimitri Mitropoulos spent a full year getting DOOM to run entirely via the TypeScript compiler (TSC). He implemented a full WASM virtual machine within the type system, which included the 116 WebAssembly instructions necessary to run DOOM. The effort resulted in 177TB of data representing 3.5 trillion lines of type definitions, and rendering the first frame of DOOM took 12 days at 20 million type instantiations per second. This project showcases a wide range of topics, including TypeScript, WebAssembly, virtual machine designs, and the DOOM architecture."}
>>> response.usage()
Usage(input=2282, output=218, details={})
Weird that it added a "content" key that wasn't in the schema. Here's the fix for that:
import llm, httpx
from pydantic import BaseModel, Field, ConfigDict
class Article(BaseModel):
headline: str
date: str = Field(title='YYYY-MM-DD')
tags: list[str]
people: list[str]
summary: str
model_config = ConfigDict(extra="forbid")
html = httpx.get("https://simonwillison.net/2025/Feb/27/typescript-types-can-run-doom/").text
model = llm.get_model("gpt-4o-mini")
response = model.prompt(html, schema=Article)
print(response.text())
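A quick way to confirm what that config change does (assuming Pydantic v2):

import json

# extra="forbid" adds "additionalProperties": false at the top level of the schema
print(json.dumps(Article.model_json_schema(), indent=2))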
I'd like to establish a good subset of JSON schema that all of the models should be expected to support (likely based on Gemini as I think that's the lowest common denominator right now), then collect some examples which I can use to test all the plugins and encourage other plugin authors to test against.
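One possible shape for that collection, purely as a sketch (EXAMPLE_SCHEMAS and the test below are illustrative, not an existing fixture): a list of (schema, prompt) pairs that each plugin's test suite loops over, checking the response parses and covers the requested properties.

import json

import llm
import pytest

EXAMPLE_SCHEMAS = [
    (
        {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "bio": {"type": "string"},
            },
        },
        "a fancy pelican",
    ),
    # ... more (schema, prompt) pairs collected in this issue ...
]

@pytest.mark.parametrize("schema,prompt", EXAMPLE_SCHEMAS)
def test_schema_support(schema, prompt):
    model = llm.get_model("gemini-2.0-flash")  # swap in the plugin's model ID
    response = model.prompt(prompt, schema=schema)
    data = json.loads(response.text())  # must be valid JSON
    assert set(schema["properties"]) <= set(data)  # every requested property attempted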