
The conversations have changed. Here’s what large pharma is actually saying about AI.


Over the past few months, I’ve spoken with many VP-level and senior leaders in medical writing and regulatory affairs across large pharma organisations. A year ago, many of those conversations still centred on whether pharmaceutical organisations should invest in AI for medical writing at all. That debate is largely over. Most large pharma teams are already running pilots, contributing to enterprise-wide innovation programmes, and actively evaluating where different technologies fit. The question now is more nuanced - and far more important:


“What kind of AI actually fits the way regulated medical writing teams work?”


The conversation has moved beyond demos and first-draft speed. It is now about operational fit, trust, governance, and whether AI can reduce workload without creating new risk somewhere else in the process.



GenAI is useful. But GenAI alone is not solving the core problem.


There is genuine openness to generative AI across large pharma for formatting support, low-risk drafting tasks, summarisation, and selected repetitive work. But there is also growing realism from people who have tested these systems inside regulated environments.


The issue is not that GenAI outputs are unreadable. Often, they are fluent and well-structured. The problem is that someone still has to verify whether they are right. If AI shifts effort later in the process, onto more senior people, or into a higher-risk review stage, that is not efficiency. That is risk reallocated.


“I don’t want to QC content. I want to stop writing it.”


The goal is not to generate more content faster. The goal is to remove manual writing effort without introducing a new verification burden downstream.



Many AI programmes still misunderstand what medical writers actually do.


One recurring frustration is that some AI initiatives appear to misunderstand the real work of medical writing. Medical writers are not simply converting tables into prose or polishing language. They are applying judgement - deciding what matters, identifying clinically relevant signals, aligning stakeholders, managing review cycles, and deciding what not to over-interpret. That distinction becomes obvious when tools move from controlled demos into real workflows.


This is why “time to first draft” is becoming a less useful metric on its own. A first draft that requires extensive checking or reconciliation has not necessarily accelerated the process. The better measure is time to a trusted, reviewable draft - one the writer can take forward with confidence.



Table-to-text is still broken - and everyone knows it.


Generating text is relatively easy. Generating text that accurately reflects the data, applies consistent logic across every table and parameter, and holds up under regulatory scrutiny is a different problem entirely. Teams are increasingly distinguishing between three categories of automation:


  • Formatting and structure - relatively well handled by existing tools

  • Narrative flexibility - useful for drafting support, with appropriate controls

  • Data-to-text generation in regulated documents - a distinct and largely unresolved category, judged by a fundamentally different standard


That third category is where the real operational gap sits. The market is moving from “Can AI write?” to “Can AI reduce the work required to trust what was written?” That is a very different standard.



Deterministic output has become non-negotiable.


One nuance I had not fully anticipated is how sensitive teams have become to output instability. Many large pharma organisations use early prototypes and dummy data to lock down messaging and align reviewers well before final results are available. In that workflow, a system that produces different text every time it is refreshed is not a feature - it is a source of friction. Every rerun that produces new wording creates new review work.


Deterministic approaches - where the same input reliably produces the same output - are seen as better aligned with how these teams actually operate. They allow teams to lock down content earlier, regenerate predictably when data updates, isolate genuine changes from noise, and reduce unnecessary re-review. That is not a technical preference. It is a workflow requirement.
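To make that concrete, here is a minimal sketch of what “same input, same output” can look like in practice. It is an illustration only, not a description of any particular product: the template wording, the `render_sentence`, `render_section` and `changed_lines` helpers, and the dummy values are all invented for this example. A fixed template renders each row, and a plain diff then shows reviewers only the sentences affected by a data update.

```python
import difflib


def render_sentence(row: dict) -> str:
    """Render one parameter with a fixed template: same row in, same text out."""
    return (
        f"Mean change from baseline in {row['parameter']} was "
        f"{row['active']:.1f} {row['unit']} with active treatment and "
        f"{row['placebo']:.1f} {row['unit']} with placebo."
    )


def render_section(rows: list[dict]) -> list[str]:
    # Sort by parameter name so the output never depends on input order.
    return [render_sentence(r) for r in sorted(rows, key=lambda r: r["parameter"])]


def changed_lines(old: list[str], new: list[str]) -> list[str]:
    """Keep only removed ("- ") and added ("+ ") sentences from the diff."""
    return [line for line in difflib.ndiff(old, new) if line.startswith(("- ", "+ "))]


# Invented dummy data (prototype stage) and an invented update (final results).
dummy = [
    {"parameter": "systolic blood pressure", "unit": "mmHg", "active": -8.4, "placebo": -2.1},
    {"parameter": "heart rate", "unit": "bpm", "active": -1.3, "placebo": -0.9},
]
final = [dict(dummy[0], active=-9.6), dummy[1]]

for line in changed_lines(render_section(dummy), render_section(final)):
    print(line)
```

Rerunning on unchanged data yields an empty diff, so there is nothing to re-review. When one value changes, only the sentence built from that value appears in the diff.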



Verbose outputs are part of the problem.


Many teams are dissatisfied with the shape of AI-generated content - not just its accuracy. Too often, documents describe small numerical differences at length and only later tell the reader that none of those differences were clinically meaningful. Experienced reviewers want the interpretive point early. Lean writing is not about removing useful information. It is about putting judgement before description.
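As a rough sketch of “judgement before description”, the rule can be applied before any sentence is written. Everything here is an assumption made for illustration: the `summarise_differences` function, the single shared threshold, and the idea that all differences sit on one comparable scale. In a real document the threshold and its justification would come from the clinical team, not from code.

```python
def summarise_differences(diffs: dict[str, float], threshold: float) -> str:
    """Lead with the interpretive point; add detail only where it is needed.

    `diffs` maps a parameter name to a between-group difference, and
    `threshold` is a hypothetical pre-specified cut-off on the same scale.
    """
    if not diffs:
        raise ValueError("no differences supplied")

    notable = {name: d for name, d in diffs.items() if abs(d) >= threshold}

    if not notable:
        # One sentence replaces a parameter-by-parameter description.
        largest = max(abs(d) for d in diffs.values())
        return (
            "No between-group differences met the pre-specified threshold "
            f"of {threshold:.1f} (largest absolute difference: {largest:.1f})."
        )

    # Sorted so the wording is stable from run to run.
    details = "; ".join(f"{name} ({d:+.1f})" for name, d in sorted(notable.items()))
    return (
        f"Differences meeting the pre-specified threshold of {threshold:.1f} "
        f"were seen for: {details}."
    )
```

With invented values such as `{"ALT": 0.3, "AST": -0.4}` and a threshold of `5.0`, the reader gets a single interpretive sentence instead of a paragraph of small numbers.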



New tools have to fit into workflows that already exist.


Large pharma teams are not evaluating tools in a vacuum. Many have already built lean authoring frameworks, strategic review models, acceleration pathways, prototype-led workflows, and enterprise AI governance structures. New technology has to fit into that operating model - it cannot simply generate a draft and assume the rest of the workflow will adapt around it. The question is no longer “Can it produce text?” It is “Does it reduce steps, create stable outputs, support stakeholder alignment, and fit how we already work?” 



The workflow model itself may be starting to shift.


Right now, many large pharma teams invest heavily upfront in shells, mock data, and prototypes to align stakeholders before real results are available. That process has real value - it helps teams think ahead, align on likely messaging, and reduce late-stage disagreement. But as generation speed improves, some teams are beginning to ask whether the same level of upfront drafting effort will always be necessary.


If a complete, high-quality draft can be produced rapidly once real data lands, the logic begins to shift from prepare → align → wait → rewrite, to generate → review → refine. The more interesting question is whether rapid, deterministic generation from real data can preserve the alignment benefits of prototyping while reducing the amount of speculative pre-data drafting required. Early days - but worth watching.



Adoption is organisational, not just technical.


Even where teams are genuinely interested, the path from pilot to production is rarely straightforward. The questions I hear most are not about capability - they are about fit. What happens after the pilot? What does production look like at scale? How does this sit alongside existing enterprise programmes? How do we make it budgetable, governable, and explainable internally? In large organisations, a tool does not get adopted simply because it works. The conversation has moved from demos to operational viability.



So what does a viable solution actually need to do?


It is not enough to generate text. The systems gaining real traction are those that:

  • Construct outputs directly from structured source data

  • Apply consistent, predefined logic across all tables and parameters

  • Regenerate identically from unchanged inputs, and predictably when the data updates

  • Remove the need to manually QC numerical outputs

  • Separate high-risk data processing from lower-risk narrative support

  • Fit into existing workflows without introducing instability

Less drafting tool. More regulated infrastructure for evidence construction.
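One way to picture “constructing outputs directly from structured source data” is to have every sentence carry a record of the cells it was built from. The sketch below is an assumption-laden illustration, not a reference design: the `Cell` and `Statement` types, the `incidence_statement` and `verify` helpers, and the table identifier are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    table: str   # hypothetical table identifier
    row: str
    column: str
    value: float


@dataclass(frozen=True)
class Statement:
    text: str
    sources: tuple[Cell, ...]  # the exact cells the text was built from


def incidence_statement(event: str, active: Cell, placebo: Cell) -> Statement:
    """Build a sentence from source cells and keep the link back to them."""
    text = (
        f"{event} was reported in {active.value:.1f}% of participants "
        f"receiving active treatment and {placebo.value:.1f}% receiving placebo."
    )
    return Statement(text=text, sources=(active, placebo))


def verify(statement: Statement) -> bool:
    """Check that every sourced value appears in the text as formatted."""
    return all(f"{c.value:.1f}" in statement.text for c in statement.sources)


# Invented example values and table identifier.
active = Cell("Table 14.3.1", "Headache", "Active", 12.5)
placebo = Cell("Table 14.3.1", "Headache", "Placebo", 9.8)
statement = incidence_statement("Headache", active, placebo)
print(statement.text)
print(verify(statement))
```

Because the numbers are copied from the source cells rather than retyped or regenerated, the check on numerical accuracy becomes a mechanical step, and reviewers can spend their time on interpretation.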



Final thought.


The goal is not faster drafting for its own sake. It is faster progression from data to trusted review. That means stable outputs, a reduced QC burden, better table-to-text handling, a clear division of labour between AI and human expertise - and a model that actually fits how large pharma operates.


The goal is not speed alone. It is speed without QC debt - and without compromising trust in the output.




If this reflects what your team is experiencing, I’d be glad to compare notes. This is a conversation worth having properly.

 
 