Rendered at 17:38:02 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
watersb 20 hours ago [-]
Strangely reminiscent of an Electric Monk:
The Electric Monk was a labor-saving device, like a dishwasher or a video recorder. Dishwashers washed tedious dishes for you, thus saving you the bother of washing them yourself, video recorders watched tedious television for you, thus saving you the bother of looking at it yourself; Electric Monks believed things for you, thus saving you what was becoming an increasingly onerous task, that of believing all the things the world expected you to believe.
-- Douglas Adams, "Dirk Gently's Holistic Detective Agency"
docheinestages 23 hours ago [-]
Most writings about the spec-driven development I see start with a product requirements document that is assumed to be valid. But I doubt that's the case. If so, you would've written about it, and probably would've involved agents in the research that goes into it. My gut feeling tells me there's much more emphasis on implementing the feature than on questioning if it's relevant, feasible, and based on valid assumptions.
HPsquared 21 hours ago [-]
Yes, it's called "questioning attitude", one of the traits of a healthy nuclear safety culture (and a good thing to apply in other fields!)
A long time ago I dealt with the on-site radio systems at a major oil refinery. It was interesting over the ten years or so that I worked for the company that provided them to see how their safety policies changed, and how other companies with a similar risk profile (like distilleries and whisky bonds) just plain didn't.
For example, they drastically changed how they view Permits to Work. Now a rigidly-enforced PtW would have prevented the Piper A explosion - the permit was returned to the Permit Office, not actually looked at, and then someone assumed that since it had been handed back and the work was *supposed* to be complete, then the work *was* complete. Had they looked at the permit they'd have seen there was more to do and the isolations should remain in place.
Anyway, when I started doing stuff at that site then every permit required a rigorous Method Statement and Risk Assessment. Your RA would get rejected if you failed to mention every single of PPE that you were required to wear on site, and your Method Statement would be rejected if it didn't describe the function and use of every last tool you planned to bring on site.
This was, frankly, fucking *stupid*. It took longer to write the RAMS and apply for the permit than it did to carry out most jobs, and if there was any deviation no matter how small (often because of other work you'd no way of knowing about) the whole thing would have to be stopped and relogged, with a new RAMS taking into account whatever was in your way. Someone's put scaffolding up near the aerial you want to replace? Well tough, you're not getting on site today with that permit!
They changed this about halfway through my time with my previous employer, to a "Risk-Assessed Permit", where you'd describe the risks around the specific tasks you needed to carry out and how you'd mitigate them.
Now your RAP would get rejected if you *did* put on lists of PPE. You're expected to use the correct PPE, you're expected to use your tools correctly and the correct tool for the job. Don't tell me that, just do that.
Hour-long meeting to go over the RAMS? No, five-minute "Toolbox Talk" - look up "Take Five" for some helpful guidelines - and if there are no screaming blockers, crack on, get the job done, get off site, get home. Safely.
Oh now there's some scaffolding right near the aerial you want to work on? Okay, ask the operator of that job if you're going to affect their work. Oh, they're letting you use the scaff to access the roof instead of bringing a boom lift on? Excellent, cross that off the RAP, bring it up at the Toolbox Talk, far safer that way isn't it?
I still do Take Five at work even though I'm predominantly working from home managing network equipment. If there are big complex changes to make or a major piece of work, we'll discuss it together. Anyone got any question? Anyone see something that's going to blow it all up and cause a major outage? Okay, well, you know where I am if you need help. Crack on.
It's a great way to eliminate the kind of mistakes that lead to "I wish I was still bored" days.
burticlies 19 hours ago [-]
Biased cause I work there, but that’s where software like Tactiq shines. We just added an MCP, and now the agent has access to the meetings when writing the plan.
Last week I had three meetings with three stakeholders, and the agent was able to gather everyone’s ideas and make sure they are all working together in the feature.
Waterluvian 14 hours ago [-]
Add some cron jobs to run a prompt that reviews work being done against the plan and baby, you’ve got a middle manager going.
ben30 23 hours ago [-]
Most of my energy is refining a prd these days.
docheinestages 23 hours ago [-]
Then how come that process is not agentic and not well-described?
ben30 22 hours ago [-]
Personally it's well-defined and agentic - just not circulated.
/understand - agents interrogate the problem
/huddle - Thinking panel turns it into a PRD - attacks the premise, PRDs regularly die here
/tm - claude-task-master breaks the survivor into a dependency graph
Nobody writes this half up because "agent talked me out of building it" demos worse than "agent built it".
manmal 21 hours ago [-]
Sorry I have to ask. How senior are you? The notion that I‘d allow an agent to talk me out of something seems weird. 99% of cases, it’s the other way around. Architecture is just not where they shine.
srcreigh 21 hours ago [-]
What’s your process? My experience matches yours, but then again I usually just give a few lines to codex. I imagine if I tried harder to give detailed specs as input, the agent would have a lot more room to spot flaws and kill the plan.
manmal 20 hours ago [-]
Usually when they push back, it’s for obvious reasons, things I already know and actively decided to ignore. They are trained on mediocre software, and it shows.
I like using voice input a lot, I get way more info out of my brain and into the context that way.
Process wise, for bug fixes I usually just throw the ticket in and optionally some thoughts on how to fix. But if I don’t know the cause, I let it write instrumentation tests until the bug is reproed, and then the fix is easy.
For new features in brownfield projects, I usually need to align with team members because we‘re closely aligned between platforms. We iterate on what you could call a spec, which is just a mix of requirements, magic numbers we want, algorithms we‘ve picked (often by vibing prototypes), and sometimes going very specific on parts that must be done right. Eg for interfaces with other teams, and there’s not yet a document to describe that, we put that in the spec as well. We do use agents to shoot holes in those specs, and often they find inconsistencies. But architecturally, they seem to get caught up too much in what’s already in those specs, and personally I haven’t seen any worthwhile feedback that I‘d have taken up.
Sometimes we use this spec to vibe a first draft. Often the draft is so good that it can be bent to our liking. Sometimes, it just serves as a reservoir of ideas, and the feature must be implemented (with assistance) by re-assembling the pieces differently.
ben30 11 hours ago [-]
[flagged]
donaldjbiden 17 hours ago [-]
[dead]
Nicci-K26 12 hours ago [-]
[flagged]
m12k 20 hours ago [-]
I've stumbled on the same workflow. Except for one thing: If I just do as OP does, Claude Code will tend to overengineer. For example it'll build complex solutions to super rare race conditions that have trivial fallout. But I've found that all it takes is a "skeptical pass". Here's how it goes: After having a bunch of specialist subagents review the (plan/implementation), after doing the deduplication/synthesis of their findings, the main agent will bucket them into A) Trivial/obvious fix B) there's multiple possible resolutions, but the LLM had a strong lean, so it went with it on its own C) Genuine ambiguity, where it asks me what to do (and presents its lean) and D) Wontfix. Crucially, after doing this, I have it run a "skeptical pass" where it takes a hard look at these findings and see if maybe some of them deserve to be downgraded. Generally, a lot of things make their way into wontfix this way. I find, I don't need to push back against overengineering, I can have the LLM do so itself, and it'll actually do a decent job of it.
ErroneousBosh 20 hours ago [-]
This sounds harder than just writing the code.
m12k 5 hours ago [-]
If I was doing all this ad hoc, it might just be. But I’ve had Claude save this as three “skills” (standard workflows) that chain together (review-branch, triage-findings, apply-fixes) so all I need to do is say “review the branch”, make judgment calls on truly ambiguous decisions, and then “apply the fixes”. It’s effort-full but not for me
Vachyas 18 hours ago [-]
When you put it like that, it really does, lol.
yudidnmkating 19 hours ago [-]
[dead]
marcus_holmes 14 hours ago [-]
I have a similar skill that assesses project drift (how far the current project state is from the original spec and brief) and looks for artifacts of that - orphaned code or features that are no longer relevant, design decisions that made sense at the time but are now dubious, or parts of the design that no longer fit the current project state.
I found it really useful taming the spaghetti that claude tends to generate by itself.
aself101 23 hours ago [-]
This has been my attempt at wrangling the new A.I. assisted development that seems to be overtaking the software engineering profession. I jumped head first into LLM development after observing the trends from the last year and it appears this process might be a viable path forward.
ben30 23 hours ago [-]
I’ve had similar feelings how can I trust this if I no longer write the code directly.
I wrote an /assess tool. I designed it to be token light but assesses on everything I could do to regain trust and help AI to improve my code base not by add features but by adding discipline.
jnewton_dev 23 hours ago [-]
There's a selection bias here that nobody's mentioned. The people who had bad experiences with this approach probably aren't commenting.
HappySweeney 19 hours ago [-]
I think its common to develop an adversarial-collaborative approach to getting some semblance of quality out of AI. I personally favour using multiple models for different roles, having a bunch of continuity documentation maintained, and having the plan surface human-verifiable deliverables as soon as feasible. It does involve more attention than most people would tolerate probably.
tuo-lei 3 hours ago [-]
review agents have the same training biases as the one writing the code. you get 30 findings about error handling and edge cases, but wrong domain assumptions slip right through.
dbgrman 15 hours ago [-]
Enjoyed it until the first emdash (was half expecting it to arrive anyway). Sorry.
hexasquid 14 hours ago [-]
I'm coming around to liking them; they're like a sort of anti-shibboleth.
"Ah, an em-dash", I think: "now I know".
hlieberman 3 minutes ago [-]
“They inhabit the fulcrum of the process” is straight up AI psychosis talk.
The Electric Monk was a labor-saving device, like a dishwasher or a video recorder. Dishwashers washed tedious dishes for you, thus saving you the bother of washing them yourself, video recorders watched tedious television for you, thus saving you the bother of looking at it yourself; Electric Monks believed things for you, thus saving you what was becoming an increasingly onerous task, that of believing all the things the world expected you to believe.
-- Douglas Adams, "Dirk Gently's Holistic Detective Agency"
https://www.nrc.gov/docs/ML1433/ML14338A739.pdf
A long time ago I dealt with the on-site radio systems at a major oil refinery. It was interesting over the ten years or so that I worked for the company that provided them to see how their safety policies changed, and how other companies with a similar risk profile (like distilleries and whisky bonds) just plain didn't.
For example, they drastically changed how they view Permits to Work. Now a rigidly-enforced PtW would have prevented the Piper A explosion - the permit was returned to the Permit Office, not actually looked at, and then someone assumed that since it had been handed back and the work was *supposed* to be complete, then the work *was* complete. Had they looked at the permit they'd have seen there was more to do and the isolations should remain in place.
Anyway, when I started doing stuff at that site then every permit required a rigorous Method Statement and Risk Assessment. Your RA would get rejected if you failed to mention every single of PPE that you were required to wear on site, and your Method Statement would be rejected if it didn't describe the function and use of every last tool you planned to bring on site.
This was, frankly, fucking *stupid*. It took longer to write the RAMS and apply for the permit than it did to carry out most jobs, and if there was any deviation no matter how small (often because of other work you'd no way of knowing about) the whole thing would have to be stopped and relogged, with a new RAMS taking into account whatever was in your way. Someone's put scaffolding up near the aerial you want to replace? Well tough, you're not getting on site today with that permit!
They changed this about halfway through my time with my previous employer, to a "Risk-Assessed Permit", where you'd describe the risks around the specific tasks you needed to carry out and how you'd mitigate them.
Now your RAP would get rejected if you *did* put on lists of PPE. You're expected to use the correct PPE, you're expected to use your tools correctly and the correct tool for the job. Don't tell me that, just do that.
Hour-long meeting to go over the RAMS? No, five-minute "Toolbox Talk" - look up "Take Five" for some helpful guidelines - and if there are no screaming blockers, crack on, get the job done, get off site, get home. Safely.
Oh now there's some scaffolding right near the aerial you want to work on? Okay, ask the operator of that job if you're going to affect their work. Oh, they're letting you use the scaff to access the roof instead of bringing a boom lift on? Excellent, cross that off the RAP, bring it up at the Toolbox Talk, far safer that way isn't it?
I still do Take Five at work even though I'm predominantly working from home managing network equipment. If there are big complex changes to make or a major piece of work, we'll discuss it together. Anyone got any question? Anyone see something that's going to blow it all up and cause a major outage? Okay, well, you know where I am if you need help. Crack on.
It's a great way to eliminate the kind of mistakes that lead to "I wish I was still bored" days.
Last week I had three meetings with three stakeholders, and the agent was able to gather everyone’s ideas and make sure they are all working together in the feature.
/understand - agents interrogate the problem /huddle - Thinking panel turns it into a PRD - attacks the premise, PRDs regularly die here /tm - claude-task-master breaks the survivor into a dependency graph
Nobody writes this half up because "agent talked me out of building it" demos worse than "agent built it".
I like using voice input a lot, I get way more info out of my brain and into the context that way.
Process wise, for bug fixes I usually just throw the ticket in and optionally some thoughts on how to fix. But if I don’t know the cause, I let it write instrumentation tests until the bug is reproed, and then the fix is easy.
For new features in brownfield projects, I usually need to align with team members because we‘re closely aligned between platforms. We iterate on what you could call a spec, which is just a mix of requirements, magic numbers we want, algorithms we‘ve picked (often by vibing prototypes), and sometimes going very specific on parts that must be done right. Eg for interfaces with other teams, and there’s not yet a document to describe that, we put that in the spec as well. We do use agents to shoot holes in those specs, and often they find inconsistencies. But architecturally, they seem to get caught up too much in what’s already in those specs, and personally I haven’t seen any worthwhile feedback that I‘d have taken up.
Sometimes we use this spec to vibe a first draft. Often the draft is so good that it can be bent to our liking. Sometimes, it just serves as a reservoir of ideas, and the feature must be implemented (with assistance) by re-assembling the pieces differently.
I found it really useful taming the spaghetti that claude tends to generate by itself.
I wrote an /assess tool. I designed it to be token light but assesses on everything I could do to regain trust and help AI to improve my code base not by add features but by adding discipline.
"Ah, an em-dash", I think: "now I know".