top | item 38210609

tmcneal | 2 years ago

For anyone looking to try this in an E2E testing context, we just released a library for Playwright called ZeroStep (https://zerostep.com/) that lets you script AI-based actions, assertions, and extractions.

This is a working example that tests the core "book a meeting" workflow in Calendly:

    import { test, expect } from '@playwright/test'
    import { ai } from '@zerostep/playwright'

    test.describe('Calendly', () => {
      test('book the next available timeslot', async ({ page }) => {
        await page.goto('https://calendly.com/zerostep-test/test-calendly')

        await ai('Verify that a calendar is displayed', { page, test })
        await ai('Dismiss the privacy modal', { page, test })
        await ai('Click on the first available day of the month', { page, test })
        await ai('Click on the first available time in the sidebar', { page, test })
        await ai('Click the Next button', { page, test })
        await ai('Fill out the form with realistic values', { page, test })
        await ai('Submit the form', { page, test })

        // A Locator is always defined, so assert that the text is actually visible
        await expect(page.getByText('You are scheduled')).toBeVisible()
      })
    })

jasonjmcghee|2 years ago

It would be much easier to consider this as a solution if it would _output_ the generated test steps, and/or cache them and only modify them if needed.

Your example above makes 7 ai() calls in one test. Let's say the average is closer to 5, and we have hundreds of tests. Every single PR runs the E2E tests, and we open a handful of PRs a day; let's call it 5. We're already looking at thousands of invocations a day. Based on your pricing, that would be incredibly expensive.

This is with 3 engineers.

jaggederest|2 years ago

What's the reliability and cost on something like this? I would need to see high-90s at <$0.10 before wanting to put it into a CI loop.

tmcneal|2 years ago

Pricing is listed on https://zerostep.com - you get 1,000 ai() calls per month for free, and then the cheapest paid plan is 2,000 ai() calls per month for $20, 4,000 for $40, etc. So basically you pay a penny per ai() call.
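To put that next to the CI volumes estimated elsewhere in this thread, here is a rough back-of-the-envelope sketch. The volume figures are illustrative assumptions (200 tests stands in for "hundreds", and one E2E run per PR is assumed), and the names `CiVolume`, `dailyCalls`, and `dailyCostUsd` are made up for this example; only the one-cent price comes from the paid plan above.

```typescript
// Rough daily-cost estimate for ai() calls in CI.
// All volume figures are illustrative assumptions, not measurements.
interface CiVolume {
  callsPerTest: number // average ai() calls in one test
  testsPerRun: number  // tests executed per E2E run
  runsPerDay: number   // E2E runs per day (assumes one run per PR)
}

// $20 for 2,000 calls on the cheapest paid plan
const PRICE_PER_CALL_USD = 0.01

function dailyCalls(v: CiVolume): number {
  return v.callsPerTest * v.testsPerRun * v.runsPerDay
}

function dailyCostUsd(v: CiVolume, pricePerCall: number = PRICE_PER_CALL_USD): number {
  // Round to whole cents to avoid floating-point drift
  const cents = Math.round(dailyCalls(v) * pricePerCall * 100)
  return cents / 100
}

// Assumed volumes: 5 calls per test, 200 tests, 5 PRs a day
const example: CiVolume = { callsPerTest: 5, testsPerRun: 200, runsPerDay: 5 }

console.log(dailyCalls(example))   // 5000
console.log(dailyCostUsd(example)) // 50
```

Under those assumptions it works out to about 5,000 calls and $50 per day, before counting the free tier or any re-runs on the same PR.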

In terms of reliability - we have a hard dependency on the OpenAI API, so that's what will affect reliability the most. We're using GPT-3.5 and GPT-4 models, which have been fairly reliable, but we'll bump to GPT-4-Turbo eventually. Right now GPT-4-Turbo is listed as "not suited for production use" in OpenAI's docs: https://platform.openai.com/docs/models

msoad|2 years ago

Nice! I'm going to try this out! Nit: For me, it would be nicer if `ai` was a fixture itself.

      test('book the next available timeslot', async ({ page, ai }) => {

tmcneal|2 years ago

Done! We added the ability to use it as a fixture. Documented here: https://github.com/zerostep-ai/zerostep#playwright-fixture

ushakov|2 years ago

Does it send the webpage contents to ZeroStep?

Cool demo btw.