#on/research | #on/consensus | #on/collaboration

**Consensus-Building Methods for Developing a Surgical Airway Rubric**

Developing a surgical airway performance rubric with international experts from EMS, emergency medicine, and anesthesia requires a structured consensus process. Below we outline several consensus-building methods – their processes, advantages, limitations, and variations – followed by guidance on choosing the appropriate method for a given panel and project.

**Delphi Method**

The **Delphi method** is an iterative, survey-based technique for building consensus among a panel of experts without requiring face-to-face meetings. It typically involves multiple rounds of anonymous questionnaires, with feedback provided between rounds to converge opinions.

• **Overview:** Experts respond to questionnaires in two or more rounds. After each round, a facilitator provides an anonymized summary of the group’s responses (often with statistics or compiled comments). Panelists then reconsider and re-rate their answers in light of the group feedback. This process continues until a predefined criterion for consensus is met or diminishing returns are reached. Classic Delphi studies have 2–4 rounds and include steps like defining the problem, conducting a literature review, generating an initial item list, and refining that list through the rounds.

• **Advantages:** The Delphi technique offers anonymity, structured feedback, and geographic flexibility. Anonymity prevents dominant personalities or senior figures from swaying others, reducing groupthink and allowing all experts to express opinions freely. Controlled feedback between rounds helps panelists see the range of opinions and rationales, prompting reflection and adjustment of their views. Delphi panels can also involve experts scattered across different regions, since interaction is asynchronous; this makes it convenient and cost-effective to gather input from a geographically diverse group. Additionally, researchers have flexibility in how to analyze responses and define consensus (e.g. percent agreement, rating scales), and the method can accommodate a relatively large number of participants compared to in-person techniques.

• **Limitations:** Delphi rounds are time-consuming – a full Delphi process can take weeks or months, which may delay decisions. The effort required over multiple rounds can lead to participant fatigue and drop-out in later rounds (retention is a known challenge). Because interaction occurs only through questionnaires, misunderstandings about survey items can go unclarified; careful piloting of the survey and clear definitions are needed to avoid confusion. There is also no universal standard for what defines “consensus” – studies variously use an agreement threshold (often 70%–80% agreement) or statistical measures, which must be decided in advance. Finally, while anonymity is a strength, it means the rich real-time discussion of ideas is sacrificed; Delphi is best suited when consensus can be achieved through reflection rather than debate.

• **Best Practices:** To get the most from a Delphi, use predefined inclusion criteria for your expert panel and ensure they represent all key disciplines (EMS, EM, anesthesia in this case). Define the consensus threshold and stopping criteria at the outset (for example, “If ≥80% of panelists rate an item as important, we consider it accepted”) and communicate this to participants. Keep survey questions clear and pilot-test them to minimize ambiguity. Providing both quantitative feedback (e.g. group median ratings) and qualitative feedback (anonymous comments) after each round can help participants understand others’ viewpoints and why there may be disagreement; a minimal sketch of such between-round analysis appears at the end of this section. Also, limit the number of rounds to what is necessary – many studies find 2 or 3 rounds sufficient to reach stability in responses. Researchers should report details of the process (panel selection, number of rounds, consensus definition) transparently, as variability in Delphi design is high.
• **Use in Medical Education:** The Delphi method is widely used in medical education research to develop curricula, assessment tools, and checklists where evidence is sparse and expert consensus is needed. _For example, a recent study used a 3-round modified Delphi with 33 international simulation educators to develop a standardized rubric for evaluating simulation-based training; starting from 32 candidate criteria, the panel achieved 70% agreement on 18 final rubric items after three rounds._ This illustrates Delphi’s suitability for defining consensus standards (like a skills rubric) across diverse experts in a structured way.
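To make the between-round analysis concrete, below is a minimal sketch in Python of how per-item agreement might be computed after a rating round. The 1–9 scale, the ≥80% threshold, the “rating of 7 or higher counts as agreement” rule, and the example items are all illustrative assumptions – Delphi studies vary widely in how they operationalize consensus.

```python
"""Minimal sketch of between-round Delphi feedback, assuming items are
rated on a 1-9 importance scale and consensus is predefined as >=80%
of panelists rating an item 7 or higher. Names, thresholds, and data
are illustrative, not prescribed by any Delphi standard."""
from statistics import median

# Hypothetical Round-1 ratings: item -> one rating per panelist (1-9)
ratings = {
    "Identifies cricothyroid membrane": [9, 8, 9, 7, 8, 9, 9, 8],
    "Verbalizes backup plan":           [6, 9, 4, 7, 5, 8, 6, 7],
}

CONSENSUS = 0.80  # predefined proportion of panelists rating >= 7

for item, scores in ratings.items():
    agree = sum(s >= 7 for s in scores) / len(scores)
    status = "ACCEPTED" if agree >= CONSENSUS else "RE-RATE NEXT ROUND"
    print(f"{item}: median={median(scores)}, agreement={agree:.0%} -> {status}")
```

In a real study the same analysis would also feed the group medians and anonymized comments back to panelists before the next round.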
**Nominal Group Technique (NGT)**

The **Nominal Group Technique** is a structured, face-to-face meeting format designed to generate ideas and reach consensus in a single session. It is called “nominal” because participants initially work independently (a group in name only) before sharing ideas. NGT is especially useful when the topic is not well defined and benefits from discussion and clarification among experts.

• **Structure:** NGT follows a stepwise meeting agenda to ensure balanced participation. The classic NGT consists of four key stages: **(1)** silent idea generation (each participant privately writes down ideas or answers to a posed question), **(2)** round-robin sharing of ideas (each person contributes one idea in turn, recorded for the group to see, until all ideas are listed), **(3)** group discussion for clarification (the group openly discusses each recorded idea to ensure understanding, but without criticism, often moderated to keep the discussion focused and egalitarian), and **(4)** voting or ranking (each participant privately ranks or votes on the ideas to prioritize them; a small sketch of one tallying scheme follows at the end of this section). The result is an ordered list of ideas or a set of top-rated items agreed upon by the group.

• **Benefits:** NGT has the major advantage of producing results quickly – the whole process is completed in one meeting, yielding an immediate consensus or prioritized output. It is efficient for busy experts who may not commit to a prolonged Delphi process. The structured format ensures **equal voice**: during idea generation and sharing, each participant has an opportunity to contribute without interruption, which prevents more vocal or senior members from dominating the discussion. Studies have found NGT sessions produce more unique ideas and more balanced participation than unstructured group discussions. The face-to-face interaction allows instant clarification of any confusion and rich discussion, which is valuable if the content (e.g. the steps of a surgical airway procedure) needs collective reasoning or if there are interdisciplinary differences in terminology that need reconciliation. NGT is also adaptable – for instance, it can be done virtually via video conference (“vNGT”) if an in-person meeting is not feasible, as long as a skilled facilitator guides the process.

• **Limitations:** The requirement of a live meeting is the main practical limitation. **Logistics:** getting a panel of international EMS, EM, and anesthesia experts together (physically or even virtually across time zones) can be difficult. In a global context, language and cultural differences might need accommodation (NGT has been conducted virtually and even in multiple languages, but this adds complexity). The optimal group size for NGT is relatively small; around 5–10 participants is typically ideal so that everyone can participate within a reasonable session time. This means NGT may not capture as broad a range of opinions as a larger Delphi panel could. Additionally, if consensus on complex issues is not reached after the initial ranking, there is limited opportunity for iterative refinement (though some NGT implementations allow a brief follow-up discussion and re-vote). In summary, NGT trades depth of iterative refinement for speed and interaction.

• **Comparison to Delphi:** NGT and Delphi often reach similar endpoints (a consensus list or ranking), but through different pathways. **Delphi** is better when participants are geographically dispersed and when you want the benefit of reflection over time; **NGT** is preferable when real-time discussion is needed to interpret ideas or when the group is small and available to meet. For instance, if developing a surgical airway rubric involves exploring differing practices among EMS vs. hospital providers, an NGT session could allow these experts to debate criteria in real time. On the other hand, if scheduling a meeting is impossible, a Delphi might be the only viable choice. Often, NGT might be used initially (to brainstorm rubric items or metrics) and Delphi used afterward to refine and rate them – or vice versa (a Delphi could gather a preliminary list, then an NGT meeting could finalize details). Combining methods can leverage the strengths of each.

• **Use in Medical Education:** NGT has been used to develop curricula and competency frameworks, especially when diverse stakeholders need to agree on content. _For example, a recent project convened 11 experts from around the world in a **virtual NGT** to develop a standardized science communication curriculum for health professionals. Despite being online, the structured NGT allowed each expert to contribute ideas equally and collaborate, resulting in consensus on 10 essential curriculum topics in one extended session._ This demonstrates NGT’s feasibility for interdisciplinary, international collaboration when a focused output (like a set of curriculum elements or rubric criteria) is needed and participants can coordinate a meeting time.
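As a concrete illustration of stage 4, here is a small sketch of one possible vote-tallying scheme, assuming each panelist privately ranks their top three ideas and points are awarded 3/2/1 by rank. NGT implementations use various scoring rules (top-5 rankings, dot voting, etc.), so treat this as one example with hypothetical idea labels.

```python
"""Sketch of NGT stage-4 vote tallying, assuming each panelist privately
ranks their top three ideas, best first, and points are awarded 3/2/1.
The scoring rule and idea labels are illustrative only."""
from collections import Counter

# Hypothetical ballots: each list is one panelist's top-3 ideas, best first
ballots = [
    ["landmark identification", "scalpel technique", "team communication"],
    ["scalpel technique", "landmark identification", "decision to cut"],
    ["landmark identification", "decision to cut", "team communication"],
]

scores = Counter()
for ballot in ballots:
    for rank, idea in enumerate(ballot):   # rank 0 = first choice
        scores[idea] += len(ballot) - rank  # award 3, 2, 1 points

# Print the prioritized list, highest total first
for idea, pts in scores.most_common():
    print(f"{pts:2d}  {idea}")
```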
**RAND/UCLA Appropriateness Method (RAM)**

The **RAND/UCLA Appropriateness Method (RAM)** is a specialized consensus approach originally developed to create clinical practice guidelines by combining scientific evidence with expert opinion. It is essentially a structured rating process, often described as a hybrid of Delphi and an in-person panel meeting. The goal is to determine how “appropriate” a given action or item is, under specific scenarios, by quantitatively capturing expert judgments and resolving disagreement.

• **How it Works:** The RAM process typically involves a **two-phase approach**. In Phase 1, experts individually review the relevant evidence (literature summaries, data) and rate a series of specific statements or clinical scenarios on a numerical scale (e.g. 1–9, where 1 = highly inappropriate and 9 = highly appropriate). This is similar to a first-round Delphi survey, done independently and anonymously. In Phase 2, the expert panel convenes (traditionally in person for 1–2 days) to discuss the results of the first round. They see areas of disagreement or wide variance and discuss the reasons in depth, often with facilitation. After thorough discussion and potential revision of scenario wording, the experts privately re-rate the scenarios. The final ratings are then analyzed to determine which items are “appropriate,” “uncertain,” or “inappropriate” based on a predefined algorithm using the median score and the level of dispersion in ratings. A common rule is that if the median rating falls in the high range and there is no major disagreement (e.g. no more than a certain number of outliers), the item is labeled “appropriate.” RAM also employs statistical measures (like the interpercentile range adjusted for symmetry, or IPRAS) to quantify disagreement more rigorously than a simple Delphi; a worked sketch of this classification rule appears at the end of this section.

• **Applicability to Medical Education:** RAM is most frequently used for clinical guideline development (e.g. determining when a procedure is justified) and health services research, rather than for educational tool development. It **synthesizes evidence and expert opinion** in a highly structured way to yield recommendations that carry a sense of formal validation. In the context of a surgical airway rubric, one might use RAM if there is substantial research evidence to inform each criterion of the rubric and one needs to judge the appropriateness of including certain items or techniques. For example, if the rubric includes steps or decisions (like when to perform a cricothyrotomy), and there is controversy or variability in practice, experts could rate the appropriateness of each step under various conditions. However, in many educational contexts the evidence base is limited, so Delphi or NGT (which tolerate more opinion-based generation of content) are more common. RAM could be applied to **clinical education guidelines** – for instance, a panel might use it to rate the appropriateness of including specific training scenarios in an EMS airway curriculum given the evidence on their educational impact. RAM’s strength is producing transparent, credible guidelines that stakeholders (e.g. accrediting bodies or guideline users) can trust, because the process explicitly combines best evidence with expert consensus.
• **Pros and Cons:** A big **advantage** of RAM is its emphasis on evidence. By design, it forces a panel to consider literature and data, not just personal experience, before finalizing a recommendation. The structured rating with defined outcome categories yields clear, actionable results (e.g. a list of appropriate indications for a procedure, or a set of essential elements for a guideline). Because it includes a face-to-face component, panelists can resolve misunderstandings about evidence and assumptions, leading to informed consensus. The **disadvantages** include the need for substantial preparation (a systematic review or evidence summary for the topic) and the logistics of an in-person meeting. It also usually involves a relatively small panel (commonly 7–15 experts), so like NGT it relies on a carefully chosen group. Unlike Delphi, RAM does not require consensus on every item – it is acceptable that some items end up labeled “uncertain” or even that the panel never fully agrees (the outcome can reflect dissent). In fact, RAM explicitly does _not_ force consensus through multiple rounds of re-rating; if opinions remain split after discussion and re-rating, that uncertainty is recorded rather than pursued through further rounds. This can be seen as a pro (honest about disagreement) or a con (it doesn’t always give a clear yes/no answer on each item). For use in education, another limitation is that RAM’s numeric scoring and focus on appropriateness might not capture qualitative nuances that educators value – it works best when you have fairly specific questions to pose to experts.

• **Use in Practice:** The RAM has been widely used to develop clinical appropriateness criteria and guidelines. _For example, researchers have used RAM to create evidence-based guidelines for imaging, such as determining when knee MRI is appropriate, and for procedures like Cesarean section indications. In such studies, a panel rates dozens of scenarios, meets to discuss, and establishes which scenarios meet the threshold for appropriate use._ Its track record in producing clinical consensus statements is strong. In medical education, formal reports of RAM usage are fewer, but one could imagine using it to establish the appropriateness of certain training interventions or competencies where evidence exists (e.g. appropriate settings for surgical airway training in the field vs. the operating room). If a surgical airway rubric needed to be tied to evidence (say, evidence-based steps that improve success rates), a RAM approach could rigorously validate each rubric item against both literature and expert agreement.
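For readers who want the classification rule spelled out, the sketch below implements the rules commonly cited from the RAM user’s manual (Fitch et al., 2001): median bands of 1–3 / 4–6 / 7–9, with disagreement detected when the 30th–70th interpercentile range (IPR) exceeds the IPRAS threshold, IPRAS = 2.35 + 1.5 × |5 − IPR central point|. The scenario ratings are hypothetical, and some panels use simpler outlier-count rules instead; treat this as a sketch, not a definitive implementation.

```python
"""Sketch of the RAND/UCLA appropriateness classification using the
IPRAS disagreement rule as commonly cited from the RAM user's manual
(Fitch et al., 2001). Ratings below are hypothetical examples."""
import statistics


def classify(ratings):
    """Classify one 1-9 rated scenario as appropriate/uncertain/inappropriate."""
    med = statistics.median(ratings)
    # 30th and 70th percentiles via deciles (cut points 3 and 7 of 9)
    q = statistics.quantiles(ratings, n=10, method="inclusive")
    p30, p70 = q[2], q[6]
    ipr = p70 - p30                      # interpercentile range
    iprcp = (p30 + p70) / 2              # IPR central point
    ipras = 2.35 + 1.5 * abs(5 - iprcp)  # disagreement threshold
    if ipr > ipras:                      # panel disagreement -> uncertain
        return "uncertain (disagreement)"
    if med >= 7:
        return "appropriate"
    if med >= 4:
        return "uncertain"
    return "inappropriate"


print(classify([8, 9, 7, 8, 9, 7, 8, 9, 8]))  # tight, high -> appropriate
print(classify([2, 9, 1, 8, 9, 2, 1, 9, 8]))  # polarized -> disagreement
```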
**Modified Delphi and Variations**

Many consensus projects use **modified Delphi** techniques, adapting the classic Delphi method to suit specific needs. In practice, _“modified Delphi” refers to any deviation from the original Delphi format_, so the term can mean many things – researchers must explicitly describe their modifications. The adaptations are often made to accommodate different professional groups, to meet time constraints, or to integrate other consensus approaches. Here are common variations and how they help in different contexts:

• **Starting with Established Statements:** A traditional Delphi might begin with an open-ended question in Round 1 to gather issues or ideas. A modified Delphi often skips this step and starts Round 1 with a prepared list of items (drawn from literature or a prior survey). This can be useful with heterogeneous expert groups – for example, gathering EMS, EM, and anesthesia experts might yield very divergent open-ended responses, so providing an initial list (e.g. draft rubric criteria from literature) gives a common ground. It speeds up the process and focuses the discussion, though it relies on the quality of the initial list.

• **Fewer or Additional Rounds:** While the classical Delphi had roughly four rounds, modified versions might use only two or three rounds total, especially if high consensus is reached early, to prevent participant fatigue. Conversely, some projects add a **face-to-face meeting or workshop after a couple of survey rounds**. This hybrid approach can be valuable with interdisciplinary groups: for instance, two Delphi rounds could narrow down rubric items, then a live (or virtual) meeting allows debate on contentious points, and a final round of Delphi voting confirms the outcome. This combination leverages both anonymous feedback and direct discussion. In a global panel, an interim video conference can humanize the process and resolve confusion that surveys alone might not.

• **Panel Composition and Grouping:** With different professional groups involved, one modification is to segment feedback by subgroup. For example, in a Delphi that includes paramedics, emergency physicians, and anesthesiologists, the facilitator might feed back the responses stratified by profession (so each group sees whether one profession rated an item differently; a small sketch of such stratified feedback appears at the end of this section). Alternatively, some modified Delphis run parallel panels – e.g. separate Delphi panels for each region or profession in Round 1 – then merge the results in later rounds. These tactics can ensure one subgroup doesn’t drown out another and can illuminate disagreements stemming from professional perspective. Best practices here include deliberately recruiting a diverse panel and possibly weighting representation so that each stakeholder group’s input is balanced. Researchers have emphasized considering geographic and professional diversity (equity, inclusion of various levels of experts, etc.) when assembling a modified Delphi panel for broad topics.

• **Feedback and Analysis Tweaks:** A modified Delphi might provide more than just a numerical summary between rounds. Some use qualitative feedback summaries (theming the comments) or provide detailed rationales to the group. Others enforce stricter rules, like dropping items that fall below a certain rating after each round to streamline the list (sometimes called a “snowballing Delphi”). There are also “real-time Delphi” methods, using software that lets experts see group feedback immediately as they input their ratings, which can accelerate reaching consensus. However, these require comfort with technology and can disadvantage those in different time zones if not managed well.

In short, **modified Delphi** approaches are flexible. They are often tailored to the professional groups at hand – for instance, if working with busy clinicians, one might limit the process to two rounds and use very clear, predefined items to respect their time. Or if working with mixed experts and patient representatives, one might hold an initial orientation session (a modification) to ensure all understand the topic before the survey. The key is to **justify each modification** and report it transparently. There is no one “right” way to modify Delphi; rather, the process should fit the context.

In our scenario (EMS, EM, and anesthesia experts globally), a reasonable modified Delphi might be: two survey rounds via email, with an optional virtual meeting after Round 2 to discuss any criteria lacking consensus, then a final third round to confirm the rubric. This would acknowledge the different perspectives and allow direct interdisciplinary dialogue if needed. Notably, strong disagreements in a Delphi are sometimes as informative as consensus – for example, if anesthesiology and EMS have divergent views on an airway technique, that finding can identify areas for future training or standardization rather than forcing an artificial agreement. Researchers should document such “dissensus” points as outcomes too.
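As a sketch of what profession-stratified feedback could look like in practice, the snippet below computes each subgroup’s median rating per item and flags items where subgroup medians diverge. The professions, items, ratings, and the divergence threshold of 2 points are all hypothetical choices for illustration.

```python
"""Sketch of profession-stratified Delphi feedback: for each rubric item,
report each subgroup's median rating so divergence between professions
is visible. Data, group labels, and the flag threshold are hypothetical."""
from collections import defaultdict
from statistics import median

# Hypothetical Round-1 records: (panelist profession, item, rating 1-9)
responses = [
    ("EMS", "Cuts skin vertically", 9), ("EMS", "Cuts skin vertically", 8),
    ("Anesthesia", "Cuts skin vertically", 5), ("Anesthesia", "Cuts skin vertically", 6),
    ("EM", "Cuts skin vertically", 7), ("EM", "Cuts skin vertically", 8),
]

# Group ratings first by item, then by profession
by_item_group = defaultdict(lambda: defaultdict(list))
for group, item, rating in responses:
    by_item_group[item][group].append(rating)

for item, groups in by_item_group.items():
    meds = {g: median(r) for g, r in groups.items()}
    spread = max(meds.values()) - min(meds.values())
    flag = "  <- divergent; raise in discussion" if spread >= 2 else ""
    print(item, meds, flag)
```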
**Consensus Development Conference (CDC)**

A **Consensus Development Conference (CDC)** is a formal, often public, meeting format for consensus-building, distinguished by having experts deliberate in person and in open forum. This approach was pioneered by the U.S. National Institutes of Health (NIH) in the 1970s for evaluating medical interventions and technologies. It gathers a panel of experts (typically ~8–15 people) to review evidence and discuss a set of predefined questions, usually culminating in a consensus statement or guideline.

• **Format:** A CDC usually spans one or more days of intensive meetings. First, **evidence is presented**: subject-matter experts (who are not part of the decision-making panel) give structured presentations on relevant research data, such as clinical trial results, registries, or case series related to the topic. For example, in a surgical airway rubric conference, one might hear from researchers about success rates of different airway techniques, simulation training outcomes, and so on. The panel of experts (the voting members) can question the presenters to clarify the evidence. After the public session, the panelists meet (often behind closed doors) to **deliberate** on the consensus questions. They discuss the evidence and their expert interpretations, aiming to reach agreement on answers or recommendations. A facilitator or chairperson guides this discussion and ensures each panelist contributes. The end product is typically a written consensus statement or a set of recommendations, which is then presented at the conference’s conclusion – sometimes with a press conference or public announcement to disseminate the findings.

• **Advantages:** The CDC method allows **rich dialogue and debate** in a way no survey can match. Panelists can hash out disagreements in real time, ask for clarifications, and build on each other’s ideas. The presence of systematically presented evidence grounds the consensus in science as much as possible, lending credibility to the outcome. This method is excellent for **interdisciplinary and international panels** because it provides space to understand different viewpoints deeply – for instance, the anesthesiology expert can explain why certain surgical airway techniques are preferred in the OR, while an EMS expert can share field constraints, and the panel can find common ground. The public nature of many consensus conferences also means the process is transparent and the results are more likely to be accepted by the wider community. Additionally, the immediate dissemination (often a conference statement is published or announced) helps spread the agreed guidelines quickly.
• **Limitations:** Feasibility is the biggest hurdle. Organizing a consensus conference is **resource-intensive** – it requires funding for travel, a venue, and time (often 2–3 days of meetings). For a global panel, coordinating such an event is challenging, though not impossible (one strategy is to hold it adjacent to an international scientific meeting to piggyback on travel). The cost and effort mean CDCs are typically reserved for high-stakes or broad topics (e.g. national policy, global health guidelines) rather than narrower educational tools. Another limitation is **time constraints during the conference**: even over a few days, the panel can only hear so much evidence. If the evidence base is huge or complex, presenters may not cover everything, and panelists have to digest a lot in a short time. This compressed format may overlook nuances that a longer Delphi process could capture through careful surveys. Also, because discussion is face-to-face, there is a risk that louder or more forceful personalities could sway the group (though a good moderator and the professionalism of expert panels usually keep this in check). In an international panel, differences in language proficiency or communication style could inadvertently give some members more influence; planning for an inclusive discussion (and perhaps providing facilitation or translation as needed) is important. **Summary:** CDCs are powerful for consensus when you need a **definitive, widely endorsed statement quickly**, and you have the means to bring people together.

• **Relevance for a Global Panel:** For developing a surgical airway rubric across continents, a consensus conference could be feasible if tied to a major conference (e.g. inviting the experts to a workshop at a world congress of emergency medicine or anesthesia). It would allow hands-on discussion, perhaps even demonstration of airway techniques, to build consensus on what “counts” in the rubric. The global scope demands careful representation (you would want voices from different healthcare systems). Increasingly, virtual consensus conferences are also considered – using video meetings over several days – which cuts cost and can include a broader audience, though scheduling across time zones is a barrier. A **modified CDC** might involve a series of shorter virtual meetings where evidence is presented and discussed in installments, mimicking the conference structure. Feasibility comes down to resources and the importance of real-time debate for the rubric content. If the rubric will set international training standards, a one-time consensus conference might be worth the effort to ensure buy-in from all fields. If not, a Delphi might achieve a similar result at far less cost.

• **Use Cases:** CDCs have been used to formulate guidelines and policy at national and international levels. _For example, NIH consensus development conferences have produced influential statements on topics like breast cancer screening and the management of hepatitis – the panel process ensured the recommendations had multi-disciplinary support._ In education, an example might be a **Consensus Conference on Simulation Education Standards** where educators from various countries convene to agree on best practices; such a model could be applied to procedural training standards as well. While less common in the literature than Delphi or NGT, the CDC model remains relevant for globally relevant, complex issues where deliberation is key.

**Best Practices for Selecting a Consensus Method**

Choosing the right consensus-building method for your project is crucial. The optimal technique depends on practical factors (like panel size and availability), the nature of the topic, and the desired output. Below are key considerations and best practices for selecting and implementing the appropriate method:
• **Panel Size and Composition:** Consider how many experts, and what mix of disciplines and stakeholders, need to be involved. Small panels (under ~10 people) work well with in-person techniques like NGT or RAM – everyone can meaningfully participate in a single discussion. Larger or more internationally dispersed panels might lean toward Delphi, since it can include dozens of participants responding remotely. If you require input from multiple professional groups (EMS, EM, anesthesia, possibly surgeons, nurses, etc.), ensure the method allows fair input from each group. A modified Delphi can explicitly account for subgroup inputs, whereas a free-for-all discussion might inadvertently favor the majority group. Always _select experts carefully_: use objective criteria (years of experience, role, geography) to assemble a credible and diverse panel, and be upfront about expectations (time commitment, number of rounds or meetings).

• **Geographical Distribution and Time Constraints:** If experts are spread across different cities or countries (as in an international rubric effort), logistical feasibility is a major factor. **Delphi or web-based RAM panels** are often the go-to for international work because they eliminate travel and scheduling barriers – participants can contribute on their own schedule. A **virtual NGT** is possible if the group is not too large and time zones can be managed (consider rotating meeting times or using asynchronous idea-generation tools if needed). If time is limited and a decision is needed quickly, a one-day NGT or a focused consensus meeting might be preferable to a 3-month Delphi. On the other hand, if the topic is complex, allocating that extra time for a Delphi can improve the quality of consensus by allowing people to reflect deeply. Match the method to your timeline: for example, don’t start a Delphi if you need a final rubric in two weeks; conversely, if you have months and global participants, a Delphi or other iterative process might be more thorough.

• **Need for Discussion vs. Anonymity:** Determine how important group discussion is for your topic. If **clarifying differences and building mutual understanding** among experts is paramount (often true for interdisciplinary topics or contentious issues), choose a method that includes live discussion – e.g. NGT, a CDC, or a modified Delphi that adds a workshop. For instance, developing performance descriptors in a rubric might benefit from real examples and debate, which an NGT or meeting provides. However, if there are **power dynamics or the potential for one viewpoint to overshadow others**, an anonymous Delphi can protect against that and ensure all opinions carry equal weight. In our scenario, if you worry that (for example) anesthesia physicians might dominate the conversation over prehospital providers, a Delphi or an anonymous rating phase could be wise. Some projects even use a combination: start with an anonymous Delphi to gather honest input, then hold a meeting once the range of opinions is known (so nobody can retroactively dominate the initial idea generation).

• **Interdisciplinary Collaboration:** When multiple specialties are involved, plan for **common understanding**. Early in the process, you might need a shared glossary or a brief orientation so that, say, EMS and anesthesiology experts are on the same page regarding terminology in the rubric. Methods like World Café or an initial open forum can serve to align perspectives before drilling down into decision-making. If using Delphi, ensure your survey language is inclusive of all fields (avoid jargon, or define it clearly). If using NGT or a conference, compose mixed groups intentionally (not all anesthesiologists at one table, for example) to stimulate cross-pollination of ideas. Representation matters: aim for a roughly balanced number of experts from each target specialty so that consensus truly reflects all groups. Also consider including educators alongside clinicians, since the rubric is for educational assessment – their input is valuable and can be integrated via the same consensus methods.
• **Decision-Making Structure and Output Type:** Align the method with the kind of output you need. For a **checklist or rubric**, Delphi studies are very common because they excel at refining lists of items and reaching agreement on each item’s inclusion or wording. If you need a **set of ranked priorities** or a top-five list, NGT inherently produces a rank ordering in one session. If you are formulating **guidelines with an evidence basis**, the RAM provides a validated structure to incorporate literature and quantify agreement on appropriateness. And if you need a **consensus statement** that stakeholders will widely accept, a consensus development conference can provide the process and visibility needed to legitimize it. Sometimes a combination is best: for instance, you could use a Delphi to develop a draft rubric, then convene a consensus conference of the international experts to discuss and formally endorse it (giving it more authority and buy-in). Be open to **hybrid approaches** – what matters is achieving a reliable consensus, not adhering strictly to one methodology.

• **Transparency and Documentation:** Whichever method you choose, follow published best practices for conducting and reporting it. Clearly document how you chose your panel, how the process was run, and how consensus was defined and measured. This not only strengthens the credibility of your rubric but also allows others to replicate or build on your work. If using Delphi, consider proposed reporting guidance such as the _CORE-Delphi_ or _e-Delphi_ checklists. If using NGT or World Café, describe the session setup, participant details, and how results were analyzed. In sum, _the quality of the implementation often matters more than the choice of method itself_. A well-run Delphi with engaged experts can be as effective as an in-person meeting – and vice versa. Focus on maintaining rigor (systematic feedback, fair participation, clear criteria for consensus) regardless of method.

**Conclusion:** For developing an international, interdisciplinary surgical airway rubric, there is no one-size-fits-all solution. Delphi, NGT, RAM, and consensus development conferences each offer distinct strengths, and adjuncts like a World Café session can help align perspectives early on. In practice, many projects blend methods to balance the need for broad input, thoughtful deliberation, and efficient decision-making. By considering your panel’s size and makeup, logistical realities, and the goals of the rubric, you can select a consensus-building strategy (or combination of strategies) that will yield a well-vetted, widely accepted airway performance rubric. The key is to remain flexible and panel-centered – use the method that best empowers your experts to share their knowledge and come to agreement. With careful planning and adherence to best practices, consensus methods will provide a solid foundation for your rubric, as demonstrated in numerous medical education studies and training guideline developments.