Most brands choose a Shopify agency the same way they choose a restaurant: they look at the pictures, read a couple of reviews, and go with whichever one feels right. The problem is that a Shopify build is not a dinner reservation. It is a five-figure investment that will shape your revenue for the next two to five years.

After twenty years building and scaling ecommerce brands, I have seen what happens when agency selection is driven by gut feeling rather than evidence. The agency with the best pitch deck wins. The agency with the slickest sales process wins. And six months later, the brand is stuck with a slow store, a bloated app stack, and a contract they cannot exit.

This guide gives you a structured, repeatable framework for evaluating Shopify agencies. Use it as a scorecard. Score each agency across six categories, weight the results, and make your decision based on data rather than charm.

Why you need a scorecard (and why gut feeling fails)

The agency selection process is inherently asymmetric. Agencies pitch for a living. They have polished presentations, carefully curated case studies, and sales teams trained to handle objections. You, on the other hand, might evaluate agencies once every three to five years. You are an amateur buyer in a market full of professional sellers.

This asymmetry creates predictable failure patterns. Brands over-index on presentation quality and under-index on technical competence. They focus on what an agency says rather than what it has demonstrably done. They confuse confidence with capability.

The best agencies I have worked with — as a client, as a partner, and now as a founder — were rarely the ones with the best pitch. They were the ones who asked the best questions and gave the most honest answers about what they could and could not do.

A scorecard forces objectivity. It ensures you evaluate every agency against the same criteria, with the same weighting, so the decision is based on comparable evidence rather than incomparable impressions.

For brands doing £250k to £5M in annual revenue, the stakes are particularly high. You are typically investing £10,000 to £40,000 in a build, plus ongoing retainer costs. A bad choice does not just waste that budget — it costs you months of lost revenue growth while you fix what went wrong and potentially start again.

The six scorecard categories

The scorecard is built around six categories, each scored from 1 to 10. The categories are not equally important — we will discuss weighting later — but every agency should be evaluated across all six.

Category               | What it measures                         | Suggested weight
-----------------------|------------------------------------------|-----------------
Technical competence   | Shopify-specific development skill       | 25%
Ecommerce expertise    | Commercial understanding of online retail | 25%
Process & transparency | How they work and communicate            | 15%
Portfolio & evidence   | Proven results with verifiable outcomes  | 15%
Support & partnership  | Post-launch relationship model           | 10%
Commercial fit         | Pricing, contracts, and value alignment  | 10%

Let us break down each category, including the specific questions to ask and what good answers look like.

Category 1: Technical competence (25%)

This is where most evaluations start, but few go deep enough. Technical competence is not just "can they build a Shopify theme." It is whether they understand the platform at a level that will protect your investment for years.

What to assess

Liquid and theme architecture. Shopify's templating language is Liquid, and how an agency structures its theme code determines everything from page speed to maintainability. Ask to see a sample theme file. Good agencies write clean, modular Liquid with minimal reliance on third-party JavaScript. Bad agencies paste together code snippets from Stack Overflow and hope for the best.

Performance engineering. This is non-negotiable. Ask every agency: "What is your target mobile PageSpeed score?" If they hesitate, that tells you everything. Good agencies target 90+ on mobile. They understand Largest Contentful Paint, Total Blocking Time, and Cumulative Layout Shift — not as abstract concepts, but as specific technical challenges they solve on every build.

// What good performance targets look like
const targets = {
  minMobilePageSpeed: 90,  // Minimum mobile PageSpeed score (0-100)
  maxLcpSeconds:      2.5, // Maximum Largest Contentful Paint (LCP)
  maxTbtMs:           200, // Maximum Total Blocking Time (TBT)
  maxCls:             0.1, // Maximum Cumulative Layout Shift (CLS)
  maxAppCount:        7,   // Fewer than eight apps; fewer apps = better performance
};
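
To make those targets checkable rather than aspirational, here is a minimal sketch of comparing a measured result against them. It assumes the response shape of the PageSpeed Insights v5 API (a `lighthouseResult` object with `categories` and `audits`). The names `performanceTargets`, `extractMetrics`, and `findFailures` are hypothetical, and the sample response is made up for illustration.

```javascript
// Hypothetical thresholds mirroring the targets above, with units made explicit.
const performanceTargets = {
  minMobileScore: 90, // PageSpeed performance score, 0-100
  maxLcpMs: 2500,     // Largest Contentful Paint, milliseconds
  maxTbtMs: 200,      // Total Blocking Time, milliseconds
  maxCls: 0.1,        // Cumulative Layout Shift, unitless
};

// Pull the four metrics out of a PageSpeed Insights v5 style response.
function extractMetrics(psiResponse) {
  const { categories, audits } = psiResponse.lighthouseResult;
  return {
    mobileScore: Math.round(categories.performance.score * 100),
    lcpMs: audits['largest-contentful-paint'].numericValue,
    tbtMs: audits['total-blocking-time'].numericValue,
    cls: audits['cumulative-layout-shift'].numericValue,
  };
}

// Return the names of the metrics that miss their target.
function findFailures(metrics, targets = performanceTargets) {
  const failures = [];
  if (metrics.mobileScore < targets.minMobileScore) failures.push('mobileScore');
  if (metrics.lcpMs > targets.maxLcpMs) failures.push('lcpMs');
  if (metrics.tbtMs > targets.maxTbtMs) failures.push('tbtMs');
  if (metrics.cls > targets.maxCls) failures.push('cls');
  return failures;
}

// Made-up response fragment for illustration only, not a real measurement.
const sampleResponse = {
  lighthouseResult: {
    categories: { performance: { score: 0.72 } },
    audits: {
      'largest-contentful-paint': { numericValue: 3100 },
      'total-blocking-time': { numericValue: 150 },
      'cumulative-layout-shift': { numericValue: 0.05 },
    },
  },
};

console.log(findFailures(extractMetrics(sampleResponse)));
```

An agency that takes performance seriously should be able to show you something equivalent in its own tooling: a list of which metrics failed, not just a headline score.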

App architecture. Every Shopify app adds JavaScript, HTTP requests, and potential points of failure. Good agencies build custom solutions for critical functionality — cart drawers, product filtering, upsells — rather than installing an app for every feature. Ask how many apps their typical builds use. If the answer is more than twelve, that is a warning sign. Our builds typically run with fewer than eight, as we discussed in our guide on the real cost of a slow Shopify store.

Shopify APIs and integrations. For brands doing £250k+, you likely need integrations with your ERP, warehouse management system, accounting software, or marketing platforms. Ask the agency about their experience with the Shopify Admin API, Storefront API, and common integration patterns. If they cannot discuss webhooks, metafields, and custom apps fluently, they are not technical enough for a growth-stage brand.
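
As one example of what "fluent" looks like, a technically strong agency should be able to explain how it verifies that an incoming webhook really came from Shopify. The sketch below assumes a Node.js backend and Shopify's documented scheme, in which the `X-Shopify-Hmac-Sha256` header carries a base64 HMAC-SHA256 of the raw request body signed with the app's secret. The function name and parameters are hypothetical.

```javascript
const crypto = require('node:crypto');

// Compare the HMAC Shopify sent with one computed from the raw request body.
// rawBody must be the unparsed body string; sharedSecret is the app's secret.
function isValidShopifyWebhook(rawBody, hmacHeader, sharedSecret) {
  const expected = crypto
    .createHmac('sha256', sharedSecret)
    .update(rawBody, 'utf8')
    .digest();
  const received = Buffer.from(hmacHeader || '', 'base64');
  // timingSafeEqual throws on a length mismatch, so check the length first.
  return (
    expected.length === received.length &&
    crypto.timingSafeEqual(expected, received)
  );
}
```

You do not need to read the code yourself. The point is that the agency's lead developer should be able to talk through a step like this without reaching for a search engine.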

Scoring guide

  • 9-10: Deep Shopify-specific expertise. Custom theme development. Strong performance engineering. Builds custom apps where needed.
  • 7-8: Solid Shopify experience. Uses premium themes as a starting point with significant customisation. Good performance awareness.
  • 5-6: General web development agency that also does Shopify. Basic theme customisation. Limited performance optimisation.
  • 1-4: No demonstrable Shopify-specific expertise. Relies entirely on off-the-shelf themes and apps.

Category 2: Ecommerce expertise (25%)

This is the category most brands underweight, and it is arguably the most important. Technical competence tells you whether an agency can build a store. Ecommerce expertise tells you whether they can build a store that makes money.

What to assess

Conversion rate understanding. Ask the agency how they approach product page design. A technically competent agency will talk about layout and code quality. An ecommerce-expert agency will talk about how the product page structure affects add-to-cart rate, why social proof placement matters, and how variant selection UI impacts conversion. The distinction is critical.

SEO knowledge. Shopify has specific SEO challenges: duplicate content from variant URLs, limited collection page customisation, and automatically generated canonical tags that are sometimes wrong. Does the agency understand these? Can they articulate their approach to collection page SEO and product schema markup?
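
For reference, product schema markup is structured data that describes a product to search engines. A minimal sketch of building it as a schema.org `Product` object follows. The input `product` shape and the function name are hypothetical and are not Shopify's own data model. In a real theme this would usually be rendered in Liquid rather than JavaScript.

```javascript
// Build a schema.org Product object suitable for embedding as JSON-LD.
// The shape of `product` is a hypothetical input, chosen for illustration.
function buildProductSchema(product) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Product',
    name: product.title,
    image: product.imageUrls,
    sku: product.sku,
    brand: { '@type': 'Brand', name: product.brand },
    offers: {
      '@type': 'Offer',
      url: product.url,
      priceCurrency: product.currency,
      price: product.price.toFixed(2),
      availability: product.inStock
        ? 'https://schema.org/InStock'
        : 'https://schema.org/OutOfStock',
    },
  };
}

// Made-up product for illustration only.
const exampleSchema = buildProductSchema({
  title: 'Example Candle',
  imageUrls: ['https://example.com/candle.jpg'],
  sku: 'CANDLE-001',
  brand: 'Example Brand',
  url: 'https://example.com/products/example-candle',
  currency: 'GBP',
  price: 24,
  inStock: true,
});

console.log(JSON.stringify(exampleSchema, null, 2));
```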

Average order value strategy. A store's revenue is sessions multiplied by conversion rate multiplied by average order value. Good agencies think about all three. Ask about their approach to cart upsells, cross-sells, bundle pricing, and threshold-based incentives. If they only talk about making the store "look nice," they are a design agency, not an ecommerce agency.
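
The revenue equation is simple enough to sketch in a few lines. The figures below are made up for illustration, and the function and field names are hypothetical.

```javascript
// Revenue = sessions x conversion rate x average order value.
// conversionRatePct is a percentage, so 2 means 2%.
function estimateRevenue({ sessions, conversionRatePct, averageOrderValue }) {
  return ((sessions * conversionRatePct) / 100) * averageOrderValue;
}

// Made-up baseline figures, for illustration only.
const baseline = { sessions: 50000, conversionRatePct: 2, averageOrderValue: 60 };

// Same traffic and conversion rate, but a 10% higher average order value.
const withHigherAov = { ...baseline, averageOrderValue: 66 };

console.log(estimateRevenue(baseline));      // 60000
console.log(estimateRevenue(withHigherAov)); // 66000
```

Because the three levers multiply, a 10% lift in average order value is worth exactly as much as 10% more traffic, which is why an agency that only talks about one of them is leaving revenue on the table.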

Customer lifetime value thinking. The best agencies understand that a store's long-term profitability depends on repeat purchase rates and customer retention. They will talk about post-purchase flows, loyalty integration, subscription models, and how email marketing fits into the customer lifecycle — not as add-on services, but as fundamental design considerations.

Industry-specific experience. An agency that has built stores in your sector will understand category-specific challenges. Fashion brands need robust size guides and returns experiences. Food and drink brands need subscription handling and compliance information. Health and wellness brands need specific regulatory considerations. Ask for examples in your vertical.

Scoring guide

  • 9-10: Deep ecommerce operators. Have built or run their own ecommerce brands. Think about revenue first, design second.
  • 7-8: Strong ecommerce knowledge. Understand conversion optimisation and can point to measurable results.
  • 5-6: Aware of ecommerce concepts but primarily design/development focused.
  • 1-4: Build websites. Do not understand the commercial mechanics of online retail.

Category 3: Process and transparency (15%)

How an agency works is as important as what it builds. A brilliant team with a chaotic process will deliver a chaotic project. The evaluation should cover their project management approach, communication cadence, and how they handle the inevitable complications.

What to assess

Project methodology. Ask the agency to walk you through a recent project from start to finish. What were the phases? What happened at each stage? How long did each phase take? Good agencies have a documented, repeatable process. They can tell you exactly what happens in week one, week four, and week eight. If the answer is vague — "we work in an agile way" without any specifics — they are winging it.

Communication structure. How often will you receive updates? Who is your primary point of contact? Can you speak directly to the developer working on your project? The answers reveal how the agency is structured. Sales-led agencies put an account manager between you and the technical team. Delivery-led agencies let you talk directly to the people building your store.

Scope management. Every project has scope changes. The question is how they are handled. Ask: "What happens when we want to add a feature that was not in the original scope?" Good agencies have a clear change request process with transparent pricing. Bad agencies either say yes to everything (and then miss deadlines) or charge unpredictable rates for changes.

Pricing transparency. Can the agency explain exactly what is included in their quote? Is there a written scope document? What is excluded? What assumptions have they made? If you are comparing a £12,000 quote with a £25,000 quote, you need to understand what accounts for the difference. Often, the cheaper quote excludes critical items like data migration, testing, or training. For more on choosing the right agency, see our comprehensive guide.

Scoring guide

  • 9-10: Fully documented process. Written scope. Regular communication schedule. Clear change request handling.
  • 7-8: Good process with some documentation. Responsive communication. Reasonable scope management.
  • 5-6: Basic process. Communication can be inconsistent. Scope management is ad hoc.
  • 1-4: No documented process. Unpredictable communication. Scope and pricing are unclear.

Category 4: Portfolio and evidence (15%)

Every agency has a portfolio page. Very few have verifiable evidence of outcomes. Your job during the evaluation is to look past the screenshots and find the substance.

What to assess

Live store quality. Do not just look at portfolio screenshots. Visit the actual live stores. Are they still running the theme the agency built, or has the client moved on? Run each store through Google PageSpeed Insights. Test the mobile experience. Add a product to cart and go through the checkout flow. This ten-minute exercise will tell you more than any pitch deck.

Case studies with metrics. Good agencies publish case studies with specific, measurable outcomes: "We improved mobile conversion rate by 34%," "We reduced page load time from 6.2 seconds to 1.8 seconds," "Revenue increased 28% in the first quarter post-launch." If the case studies only describe what was built without mentioning results, the agency either does not track outcomes or is not proud of them.

Client references. Ask for two or three client references and actually contact them. The questions to ask references are different from the questions you ask the agency: How did the project compare to the original timeline and budget? How responsive were they when issues arose? Would you use them again? The answers are gold.

Shopify Partner status and reviews. Check the agency's profile on the Shopify Partner directory. Look at their reviews on platforms like Clutch or Google. While reviews should not be the sole decision factor, consistent patterns, positive or negative, are meaningful. Our detailed guide to questions to ask before hiring will help you structure these conversations.

Scoring guide

  • 9-10: Multiple live stores with strong performance scores. Case studies with verifiable metrics. Willing to provide references.
  • 7-8: Good portfolio with some metrics. Live stores perform well. References available on request.
  • 5-6: Portfolio exists but metrics are vague. Some live stores but performance is mixed.
  • 1-4: No live portfolio. No case studies. No references.

Category 5: Support and partnership model (10%)

The launch is the beginning, not the end. How an agency supports you post-launch determines whether your store continues to improve or slowly degrades over time.

What to assess

Post-launch support options. What does ongoing support look like? Is it a retainer, pay-as-you-go, or something else? What is the response time for critical issues? Who handles support — the same team that built your store, or a separate support team that has never seen your codebase?

Proactive vs. reactive. A reactive agency waits for you to report problems. A proactive agency monitors your store's performance, flags issues before they affect revenue, and suggests improvements based on data. Ask which model they operate. The answer tells you whether they see the relationship as transactional or strategic.

Knowledge transfer. Will the agency train your team to manage day-to-day content updates, product uploads, and basic customisations? Or will you be dependent on them for every minor change? Good agencies want to empower your team because it frees them to focus on higher-value work. Bad agencies want to create dependency because it guarantees ongoing revenue.

Contract flexibility. What are the contract terms? Are you locked in for twelve months, or can you leave with reasonable notice? As we explain in our post on why we avoid lock-in contracts, the best agencies earn your business every month rather than trapping you in a commitment you cannot exit.

Scoring guide

  • 9-10: Proactive monitoring. Same team handles support. Training included. Flexible contracts.
  • 7-8: Good support with reasonable response times. Some training. Standard contract terms.
  • 5-6: Reactive support only. Limited training. Fixed-term contracts.
  • 1-4: No post-launch support. No training. Rigid long-term contracts.

Category 6: Commercial fit (10%)

Commercial fit is not just about price. It is about whether the agency's business model, pricing structure, and working style align with your needs and constraints.

What to assess

Pricing model. Does the agency charge fixed-price, time and materials, or a hybrid? Each model has trade-offs. Fixed-price gives you budget certainty but often leads to scope disputes. Time and materials gives you flexibility but requires trust and good project management. Hybrid models — fixed price for the build, time and materials for ongoing work — often work best for brands in the £250k to £5M range.

Value alignment. Is the agency focused on the same outcomes you care about? If you are obsessed with conversion rate and the agency only talks about design awards, there is a misalignment. If you need a partner who understands Shopify development as a revenue growth lever rather than a creative exercise, make sure that is reflected in how they talk about their work.

Team size and capacity. An agency that is too small may not be able to deliver on time. An agency that is too large may treat you as a small fish. For brands doing £250k to £5M, the sweet spot is usually an agency with five to thirty people — large enough to have specialists, small enough that the founder or technical director is involved in your project.

Cultural fit. This sounds soft, but it matters. You will work closely with this agency for weeks or months. If their communication style, values, or working hours do not align with yours, friction will build. A quick call with the team — not just the sales person — will reveal whether the working relationship will be productive or painful.

Scoring guide

  • 9-10: Pricing is fair and transparent. Strong value alignment. Right size for your project. Cultural fit is excellent.
  • 7-8: Good pricing with minor concerns. Mostly aligned on values. Reasonable fit.
  • 5-6: Pricing is unclear or seems high/low without justification. Some misalignment.
  • 1-4: Significant pricing concerns. Clear value misalignment. Poor cultural fit.

How to weight the scores

Not every category matters equally, and the weighting should reflect your specific situation. The default weighting I recommend — 25% technical, 25% ecommerce, 15% process, 15% portfolio, 10% support, 10% commercial — works for most brands in the £250k to £5M range.

However, you should adjust the weights based on your priorities:

  • If you are migrating from another platform: Increase technical competence to 30% and reduce commercial fit to 5%. Migration projects are technically complex and errors are costly.
  • If you are an established brand with a strong internal team: Increase ecommerce expertise to 30% and reduce support to 5%. You need strategic thinking, not hand-holding.
  • If this is your first Shopify store: Increase process and transparency to 20% and support to 15%, and take the extra ten points from the remaining categories so the weights still total 100%. You need an agency that will guide you through the process and be there when you need help.
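
Whenever you adjust the weights, check that they still add up to 100%. A minimal sketch of that check, using the default weights and the migration adjustment described above (the names `defaultWeights` and `adjustWeights` are hypothetical):

```javascript
// Default weights from the scorecard, as whole-number percentages.
const defaultWeights = {
  technical: 25,
  ecommerce: 25,
  process: 15,
  portfolio: 15,
  support: 10,
  commercial: 10,
};

// Apply overrides to a base weighting and refuse any result that
// does not total exactly 100%.
function adjustWeights(base, overrides) {
  const weights = { ...base, ...overrides };
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  if (total !== 100) {
    throw new Error(`Weights total ${total}%, expected 100%`);
  }
  return weights;
}

// Migration project: technical up to 30%, commercial fit down to 5%.
const migrationWeights = adjustWeights(defaultWeights, {
  technical: 30,
  commercial: 5,
});

console.log(migrationWeights);
```

Raising one category without lowering another makes the check fail, which is the point: every increase has to be paid for somewhere else.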

To calculate the final score, multiply each category score by its weight expressed as a decimal. For example, an agency scoring 8 in technical competence with a 25% weight contributes 2.0 to the final score (8 × 0.25). Sum all six weighted scores for a final number out of 10.
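
The calculation can be sketched as follows. The weights are the defaults from this guide. The example scores are made up, and the function and variable names are hypothetical.

```javascript
// Default weights from the scorecard, as whole-number percentages.
const scorecardWeights = {
  technical: 25,
  ecommerce: 25,
  process: 15,
  portfolio: 15,
  support: 10,
  commercial: 10,
};

// Multiply each 1-10 category score by its weight and sum the results.
// Working in whole percentages and dividing once avoids rounding noise.
function weightedScore(scores, weights) {
  let total = 0;
  for (const [category, weight] of Object.entries(weights)) {
    const score = scores[category];
    if (typeof score !== 'number' || score < 1 || score > 10) {
      throw new RangeError(`Score for "${category}" must be between 1 and 10`);
    }
    total += score * weight;
  }
  return total / 100;
}

// Made-up scores for one agency, for illustration only.
const exampleScores = {
  technical: 8,
  ecommerce: 7,
  process: 9,
  portfolio: 6,
  support: 7,
  commercial: 8,
};

console.log(weightedScore(exampleScores, scorecardWeights)); // 7.5
```

A spreadsheet does the same job. What matters is that every agency is run through the same formula with the same weights.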

Common evaluation mistakes

Even with a scorecard, brands make predictable mistakes during the evaluation process. Here are the ones I see most often:

Mistake 1: Evaluating the sales team, not the delivery team

Many agencies have charismatic founders or polished sales directors who lead the pitch. But these are not the people who will build your store. Insist on meeting the project manager and lead developer who will actually work on your project. If the agency will not arrange this, that is a significant red flag.

Mistake 2: Comparing quotes without comparing scope

A £12,000 quote and a £30,000 quote might cover completely different scopes. The cheaper quote might exclude data migration, redirect mapping, performance optimisation, training, and post-launch support. Before comparing prices, create a normalised scope document and ask each agency to quote against it.
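
A normalised scope can be as simple as a checklist that every quote is compared against. In the sketch below, the scope items come from the list above plus the theme build itself. The two quotes and everything they include are hypothetical.

```javascript
// The line items every agency is asked to quote against.
const normalisedScope = [
  'theme build',
  'data migration',
  'redirect mapping',
  'performance optimisation',
  'training',
  'post-launch support',
];

// List the scope items a quote does not cover.
function findScopeGaps(quote, scope = normalisedScope) {
  const included = new Set(quote.includes);
  return scope.filter((item) => !included.has(item));
}

// Hypothetical quotes, for illustration only.
const cheaperQuote = {
  agency: 'Agency A',
  priceGbp: 12000,
  includes: ['theme build', 'training'],
};
const dearerQuote = {
  agency: 'Agency B',
  priceGbp: 30000,
  includes: [...normalisedScope],
};

console.log(findScopeGaps(cheaperQuote));
console.log(findScopeGaps(dearerQuote));
```

Once the gaps are visible, ask the cheaper agency to price the missing items. The two quotes are only comparable after that.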

Mistake 3: Ignoring the agency's client profile

An agency that primarily serves enterprise brands with £50M+ turnover will not give a £500k brand the attention it deserves. Conversely, an agency that mostly works with startups may not have the sophistication to handle complex product catalogues, multi-warehouse logistics, or enterprise integrations. Make sure you fit within the agency's ideal client range.

Mistake 4: Over-indexing on Shopify Partner tier

Shopify's partner programme has tiers — Partner, Plus Partner, and so on. While higher tiers indicate more experience and revenue through Shopify, they do not guarantee quality on your specific project. Some of the best work I have seen comes from boutique agencies that have not yet reached the highest tiers but deliver exceptional results for every client they take on.

Mistake 5: Not checking performance of live stores

This takes ten minutes and most brands skip it. Open three to five live stores from the agency's portfolio. Run each through PageSpeed Insights. Test on mobile. If the scores are consistently below 50, the agency does not prioritise performance — regardless of what they say in their pitch. For context on why this matters, read about the real cost of a slow Shopify store.
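
If you would rather script the check than paste URLs by hand, the sketch below builds a request URL for the PageSpeed Insights v5 API and applies the "consistently below 50" rule. Treating "consistently" as "more than half of the stores tested" is my assumption for the sketch, and the function names are hypothetical. The scores in the example are made up.

```javascript
// Build a PageSpeed Insights v5 request URL for a mobile test of one store.
function buildPsiUrl(storeUrl) {
  const endpoint = new URL(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'
  );
  endpoint.searchParams.set('url', storeUrl);
  endpoint.searchParams.set('strategy', 'mobile');
  return endpoint.toString();
}

// Assumption: "consistently" means more than half of the tested stores
// score below the threshold.
function flagsPoorPerformance(mobileScores, threshold = 50) {
  if (mobileScores.length === 0) return false;
  const below = mobileScores.filter((score) => score < threshold).length;
  return below > mobileScores.length / 2;
}

console.log(buildPsiUrl('https://example.com/'));

// Made-up mobile scores for five portfolio stores.
console.log(flagsPoorPerformance([38, 45, 61, 42, 47])); // true
```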

Putting the scorecard into practice

Here is the practical process I recommend for UK ecommerce brands evaluating Shopify agencies:

Step 1: Create your shortlist (1 week)

Start with five to eight agencies. Source them from the Shopify Partner directory, recommendations from other brand founders, and your own research. Do a quick screen — visit their website, check their portfolio, and eliminate any that obviously do not fit.

Step 2: Send a brief (1 week)

Write a one to two page brief describing your project, your goals, your budget range, and your timeline. Send it to your shortlisted agencies. This serves two purposes: it gives agencies the information they need to respond intelligently, and it lets you compare responses against a consistent brief.

Step 3: Evaluate proposals against the scorecard (1-2 weeks)

As proposals come in, score each agency across the six categories. Be honest and specific. Do not inflate scores because you "liked the vibe." Use the scoring guides above and make notes to justify each score.

Step 4: Deep-dive with your top two (1 week)

Narrow to two agencies and conduct deeper evaluations. Ask for references. Visit live stores. Meet the delivery team. Ask the hard questions about scope, pricing, and process. This is where the scorecard becomes most valuable — it forces you to compare like-for-like rather than relying on which meeting felt better.

Step 5: Make your decision

Calculate weighted scores and make your decision. If the scores are very close, go with the agency where you had the strongest rapport with the delivery team. That relationship will sustain you through the inevitable challenges of a complex build.
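
The decision rule can be sketched like this. The 0.3-point margin for "very close" is an assumption for the sketch, not a figure from the scorecard, and the agencies and scores are made up.

```javascript
// Rank agencies by weighted score and flag when the top two are too
// close for the numbers alone to decide. closeMargin is a hypothetical cut-off.
function rankAgencies(results, closeMargin = 0.3) {
  const ranked = [...results].sort((a, b) => b.score - a.score);
  const tooCloseToCall =
    ranked.length > 1 && ranked[0].score - ranked[1].score <= closeMargin;
  return { ranked, tooCloseToCall };
}

// Made-up final scores, for illustration only.
const decision = rankAgencies([
  { agency: 'Agency A', score: 7.1 },
  { agency: 'Agency B', score: 7.9 },
  { agency: 'Agency C', score: 7.8 },
]);

console.log(decision.ranked.map((r) => r.agency));
console.log(decision.tooCloseToCall);
```

When the flag comes back true, the scorecard has done its job of narrowing the field, and rapport with the delivery team becomes the tie-breaker.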

The total evaluation process takes four to five weeks. That feels like a long time when you are eager to start building, but it is nothing compared to the months you will lose if you choose the wrong agency and have to start over.

A final thought on evaluation

The best agency for your brand is not the best agency in absolute terms. It is the one that best fits your specific needs, budget, timeline, and working style. A scorecard helps you find that fit by replacing subjective impressions with structured evidence.

We built our Shopify development practice around the belief that agencies should be evaluated on outcomes, not presentations. Every store we build has measurable performance targets. Every project has a documented process. Every client has direct access to the people doing the work.

If you are evaluating agencies and want to include us in your scorecard, start a conversation. We will answer every question on this scorecard honestly — including the ones where we are not the best fit.