# Service Migration

I recently migrated all my services to Cloudflare, and I'm starting this thread to record the pitfalls I encountered and the lessons I learned (random thoughts). The images show a rough before-and-after comparison of the migration.
First, there's the language choice. The legacy service was developed in Java, which wasn't a great choice for an indie product. For one, Java applications consume more resources and demand beefier servers, which drives up server costs. I'm currently on a 2-core, 4GB Alibaba Cloud ECS instance that costs roughly 280 yuan per month.
Secondly, in terms of development efficiency, Java is completely outmatched by languages like PHP, Node.js, and Python, and iteration speed is crucial for independent products. Thirdly, its serverless support is weaker than JavaScript's; for example, it can't run on CF Workers. Fourthly, Cursor doesn't pair well with Java, which would force me back to IntelliJ IDEA.
Fifth, the ecosystem: JavaScript has plenty of interesting packages to draw on, while Java is mostly geared toward enterprise applications, where an indie product has little chance of standing out. With so many drawbacks, why did I choose Java in the first place? Actually, I initially planned to use PHP or Node.js, and I even spent a lot of time learning them, but I either gave up halfway or my enthusiasm completely wore off.
I spent about two years repeatedly learning these "optimal" technology stacks: learning, forgetting, then learning again. One day I suddenly realized two years had passed, all my energy had gone into these superficial things, I hadn't shipped any product, and I'd even forgotten what I originally wanted to build. So I made a decision: use what you know and stop agonizing over it. I used my old Java skills for the backend and jQuery, HTML, and CSS for the frontend. The most important thing was to get it running.
Now that the business has basically stabilized (revenue is still unstable 🤡), I'm taking some time to switch the service to Node.js and embrace the JavaScript ecosystem. First and foremost, I'll definitely use TypeScript; it makes problems easier to spot, and coming from Java it feels familiar enough. Next is the backend framework. With Java I always used Spring Boot, so after switching to TypeScript my initial thought was NestJS, which feels very similar. However, I later abandoned that idea.
The main reasons: first, NestJS is powerful and feature-rich, much like Spring Boot, which also makes it heavy with a steep learning curve. Second, it took me a long time to understand that CF Workers and Node.js are different runtimes, and many Node.js packages are incompatible; since the goal is to migrate to CF, that ruled it out. Then I discovered a newer framework called Hono, and after researching it I ultimately chose it, for two reasons:
First, it's very lightweight and simple, with minimal mental overhead — more than enough for my business. Second, it runs easily on different runtimes, such as plain Node.js, Bun, CF Workers, AWS, Vercel, and so on, which minimizes platform lock-in (a minimal sketch of what that looks like is below).
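To make the portability point concrete, here's a minimal sketch (the route and the Node.js serving snippet are illustrative, not from my actual service): the same Hono app object is the fetch handler on CF Workers, and only the thin serving layer changes per runtime.

```ts
// app.ts — a minimal Hono app; it targets the Web-standard
// fetch/Request/Response APIs, so the app itself is runtime-agnostic.
import { Hono } from 'hono'

const app = new Hono()

app.get('/health', (c) => c.json({ ok: true }))

// On Cloudflare Workers, exporting the app as the module's default export
// makes it the fetch handler — no extra adapter code needed.
export default app

// On plain Node.js you would wrap the same app instead, e.g.:
//   import { serve } from '@hono/node-server'
//   serve({ fetch: app.fetch, port: 3000 })
```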
Then there's the ORM; I used MyBatis-Plus in Java and wanted something similar after the migration. There don't seem to be many ORM options for TypeScript. I mainly looked at Prisma and Drizzle. Prisma felt more powerful, Drizzle simpler, so I chose Drizzle for its lower learning curve — I don't currently need any particularly powerful features.
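For flavor, here's a minimal Drizzle + D1 sketch of the layer that replaced MyBatis-Plus for me. The table and column names are made up for illustration, and the `D1Database` type comes from Cloudflare's workers-types.

```ts
// schema.ts — Drizzle schema for a SQLite/D1 table (illustrative columns).
import { sqliteTable, integer, text } from 'drizzle-orm/sqlite-core'

export const users = sqliteTable('users', {
  id: integer('id').primaryKey({ autoIncrement: true }),
  email: text('email').notNull(),
  createdAt: integer('created_at').notNull(), // unix timestamp
})

// query.ts — build a Drizzle client from the Worker's D1 binding.
import { drizzle } from 'drizzle-orm/d1'
import { eq } from 'drizzle-orm'

export async function findUserByEmail(d1: D1Database, email: string) {
  const db = drizzle(d1)
  return db.select().from(users).where(eq(users.email, email)).get()
}
```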
As for common utility libraries: in Java I used Hutool, which is comprehensive and extensive. I've been looking for something similar in the JavaScript ecosystem but unfortunately haven't found one, so different packages have to be imported for different jobs. For basic utilities I use Radash, which is lighter and more modern than Lodash; Day.js handles dates and times; and ky is the request library, again because it's very lightweight.
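A quick taste of what those three cover day to day (the URL and sample data are placeholders):

```ts
import { unique } from 'radash' // lightweight lodash-style utilities
import dayjs from 'dayjs'       // date/time handling
import ky from 'ky'             // fetch-based HTTP client

// Dedupe a list of strings.
const emails = unique(['a@x.com', 'a@x.com', 'b@x.com'])

// Format the current date.
const today = dayjs().format('YYYY-MM-DD')

// ky wraps fetch, so it works in Workers as well as modern Node.
const stats = await ky.get('https://example.com/api/stats').json()
```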
The emphasis on lightweight packages has two reasons. First, CF Workers limit the size of the deployed bundle; an overly large bundle can fail to deploy. Second, heavier packages tend to depend more deeply on Node.js APIs and may simply not run in the Workers runtime. Importing a package into a Worker feels like opening a blind box: you have no idea whether it will work until you run it. Keep this in mind when using Workers.
A major reason for migrating to Cloudflare was its generous free plan: 100,000 requests per day, 100,000 database writes, and 5 million reads per day — more than enough for my business; I'd be thrilled to exceed that. However, I ultimately opted for the $5/month paid plan. The key reason is that the free plan only allows 10ms of CPU time per request, which can kill a complex API before it finishes executing.
Furthermore, even the paid plan has limits: a single API request can run for at most 15 seconds, and scheduled tasks or message queues are capped at 15 minutes. Carefully evaluate the runtime of anything you migrate to Workers, and don't even consider anything that's too time-consuming. I ran into this pitfall myself.
In JavaScript, asynchronous functions are marked with the `async` keyword. Adding `await` before a call waits for it to complete before executing subsequent code; without it, the call runs asynchronously without blocking the code that follows. For calls that don't affect subsequent logic, such as logging or push notifications, I usually don't use `await` — I don't want them to block the rest of the handler.
However, after deploying to the Worker, I found that these un-`await`ed calls were mostly never executed. The reason: since they didn't block the main flow, the subsequent code ran to completion, the request and response finished, and then the runtime tore the invocation down, discarding any outstanding tasks whether they were still running or not. The worst part is that these asynchronous operations work fine in local development but get discarded in production.
How to solve this? The simplest way is to add `await` to every asynchronous call. But if a handler has too many `await`s, its response becomes very slow. My Paddle callback hit exactly this problem: Paddle saw calls to my webhook endpoint keep timing out and kept retrying, and on timeout it actively closes the connection, so the remaining logic never runs.
In this case, you can use the Worker's waitUntil method instead. It guarantees the task will still run after the response has been sent (see developers.cloudflare.com/workers/runtim…), provided the CPU limit isn't exceeded. (It's getting late; I'll continue writing tomorrow.)
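Here's a sketch of that pattern with Hono on Workers: respond to the webhook quickly, and hand non-critical work to waitUntil so the runtime keeps it alive after the response. The route path and the helper functions are hypothetical stand-ins for my real logging and notification calls.

```ts
import { Hono } from 'hono'

const app = new Hono()

app.post('/webhook/paddle', async (c) => {
  const payload = await c.req.json()

  // Work the response depends on stays awaited.
  await handleOrder(payload)

  // Non-critical work: don't await it, but register it with waitUntil so
  // the Worker doesn't discard it once the response is returned.
  c.executionCtx.waitUntil(sendLog(payload))
  c.executionCtx.waitUntil(notifyMe(payload))

  return c.json({ received: true })
})

// Hypothetical helpers.
async function handleOrder(p: unknown) { /* ... */ }
async function sendLog(p: unknown) { /* ... */ }
async function notifyMe(p: unknown) { /* ... */ }

export default app
```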
I actually performed the migration twice. After the first attempt I hit numerous errors that clearly couldn't be resolved quickly, so I migrated back and worked through the problems slowly. The production error message was as follows:
Error: Cannot perform I/O on behalf of a different request. I/O objects (such as streams, request/response bodies, and others) created in the context of one request handler cannot be accessed from a different request's handler.
This is a limitation of Cloudflare Workers which allows us to improve overall performance. (I/O type: ReadableStreamSource)
This exception left me a bit confused. How could one request access content from another request? Aren't requests independent of each other? I searched online but found nothing useful. Then I pasted the exception into Cursor and had it scan the entire codebase for the offending code. Cursor pointed to two locations; the first was where I log the request at the entry point.
Originally, to avoid touching the request object, I used `c.req.raw.clone().json()` to read the request parameters, intending to work on the cloned object. That cloning apparently leaked across requests, so I switched to `c.req.json()`, which resolved it. The second location involved the context object, which is used all over the place in Hono — database, cache, response — and by default is obtained at the request's entry point.
In other words, the context has to be passed down from the entry point, which means every method needs a context parameter — cumbersome. In Java I would use ThreadLocal for this, so I wondered whether JavaScript had something similar, and I actually found it: globalThis. At the request entry point I assigned the context to globalThis.honoContext.
Later, wherever I needed the context, I simply read `globalThis.honoContext`. This achieved the desired effect; unfortunately, it may also have tripped the cross-request sharing restriction in Workers. Fortunately, Hono recently shipped a way to obtain the context globally (hono.dev/docs/middlewar…), and replacing my hack with it resolved the issue.
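The replacement is Hono's context-storage middleware, built on AsyncLocalStorage. Roughly like this — check the docs link above for the exact current API, and note that on Workers AsyncLocalStorage needs the Node.js compatibility flag; the binding and route names here are illustrative:

```ts
import { Hono } from 'hono'
import { contextStorage, getContext } from 'hono/context-storage'

type Env = { Bindings: { DB: D1Database } }

const app = new Hono<Env>()

// Register early so every request's context is captured per-invocation
// (instead of being smeared across requests via globalThis).
app.use(contextStorage())

// Deep inside service code, no context parameter needs to be threaded through.
export function currentDb() {
  return getContext<Env>().env.DB
}

app.get('/users/count', async (c) => {
  const row = await currentDb()
    .prepare('SELECT COUNT(*) AS n FROM users')
    .first<{ n: number }>()
  return c.json({ count: row?.n ?? 0 })
})

export default app
```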
This problem was especially tricky because I couldn't reproduce it, neither in local development nor in production. Locally everything ran perfectly, with no errors and every function working. It wasn't reliably reproducible in production either; even hammering it with concurrent requests didn't trigger it. All I could do was fix the two suspicious spots, deploy to production, and watch whether it came back. Fortunately it didn't, so I'm fairly sure that was the problem.
Another easily overlooked issue is the time zone. Workers run in UTC, while the old service used GMT+8 (China time), which is 8 hours ahead of UTC. Since the database stores timestamps, this has little impact on users, but it does affect SQL statistics and scheduled tasks. For example, I had a scheduled task that ran at 7 AM China time.
It queries for new users from yesterday — rows added to the user table between 00:00:00 and 23:59:59 the previous day. After migrating, I noticed discrepancies between the old and new services, and investigation revealed a timezone issue. Running the job at 7:00 AM China time was wrong: with the 8-hour offset it is still 11:00 PM the previous day in UTC, so the Worker's "yesterday" was actually the day before yesterday. I moved the task to 8:30 AM.
Also, because of the 8-hour difference, if you want statistics in China time, the query's start and end timestamps have to be shifted by 8 hours as well. Another thing to watch: the Java service stored timestamps in milliseconds by default (13 digits), while much of the JavaScript/SQLite tooling works in seconds (10 digits). The conversion must be consistent — all 13 digits or all 10 — otherwise you'll see dates from 1970 or 50,000 years in the future.
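A small sketch of how I now think about both issues — computing "yesterday in China time" from a Worker running in UTC, and keeping second/millisecond values straight. The helper names are my own, not from any library:

```ts
// All values below are unix timestamps; pick ONE unit and stick to it.
const MS_PER_HOUR = 3600 * 1000
const MS_PER_DAY = 24 * MS_PER_HOUR
const CST_OFFSET_MS = 8 * MS_PER_HOUR // China is UTC+8

// "Yesterday" as seen from China time, returned as a [start, end) window
// of real UTC millisecond timestamps, suitable for a BETWEEN-style query.
function yesterdayWindowCST(now: number = Date.now()): [number, number] {
  const nowCST = now + CST_OFFSET_MS
  const startOfTodayCST = Math.floor(nowCST / MS_PER_DAY) * MS_PER_DAY
  const startOfYesterdayCST = startOfTodayCST - MS_PER_DAY
  // Shift back by the offset to get genuine UTC timestamps.
  return [startOfYesterdayCST - CST_OFFSET_MS, startOfTodayCST - CST_OFFSET_MS]
}

// Converting between 13-digit (ms) values from the old Java service and
// 10-digit (s) values: multiply or divide by 1000, never mix the two.
const toSeconds = (ms: number) => Math.floor(ms / 1000)
const toMillis = (s: number) => s * 1000
```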
Next, the database. The old service ran MySQL in Docker. I researched several migration options: Cloudflare's D1, Supabase's PostgreSQL, and Turso. Their free quotas are all generous enough for my project. D1 and Turso are based on SQLite, while Supabase is based on PostgreSQL.
Ultimately I chose D1, mainly for two reasons. First, ease of use: D1 is a Cloudflare product, so it integrates easily with Workers. Second, network overhead: Workers and D1 both live on the CF network and can even be placed in the same region, reducing the network cost of database access and indirectly improving speed. Supabase and Turso, on the other hand, incur that overhead.
After the migration, some users reported being unable to log in. I tested my own account and it worked fine. The logs showed nothing unusual; the user's email and password were correct, yet the system kept returning "incorrect username or password." Completely baffled, I ran an SQL query for the user's email against D1 and, surprisingly, it wasn't found. At first I suspected invisible characters.
When I finally located the problem, I was stunned. Migrating from MySQL to D1 (SQLite), I was mentally prepared for differences: SQLite supports far fewer types than MySQL, so one-to-one mappings are impossible; even when you declare a column type, SQLite will happily store values of other types without complaint; and there are various syntax differences. But I had overlooked case sensitivity.
When I create a MySQL database, I always set it to be case-insensitive. It's standard practice — nearly every database I've used over the years has been case-insensitive, to the point of muscle memory — so it never occurred to me that SQLite would compare strings case-sensitively. Back to the user above: they registered with Abc@gmail.com and later tried to log in with abc@gmail.com.
On the old MySQL setup this works fine: comparisons are case-insensitive, so it's the same user. On D1, those are clearly two different users. So how to fix it? First, I couldn't find any D1 setting like MySQL's global case-insensitive collation; you have to specify `COLLATE NOCASE` on a column when creating the table to make it ignore case. So I figured I'd alter the table to add that.
Then I discovered that SQLite's ALTER TABLE only supports renaming tables and columns and adding or dropping columns — it cannot change the definition of an existing column (another pitfall). Since altering the table was out, I had to change the code, in three steps. First, extract all SQL that touches email into a common method and force lowercase with `lower(email)` in the query.
Second, at the request entry point, if a parameter includes an email, call `toLowerCase()` on it. Third, merge and deduplicate the accounts that had already been created with inconsistent capitalization. Writing this code was a real pain; it felt like piling onto a mountain of shit code.
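A sketch of the query-side part of the fix with Drizzle, reusing the illustrative schema from earlier (the cleaner DDL-time option is shown as a comment):

```ts
import { drizzle } from 'drizzle-orm/d1'
import { sql } from 'drizzle-orm'
import { users } from './schema'

// Single shared lookup: both sides of the comparison are lowered, so
// Abc@gmail.com and abc@gmail.com resolve to the same row.
export async function findUserByEmailInsensitive(d1: D1Database, email: string) {
  const db = drizzle(d1)
  return db
    .select()
    .from(users)
    .where(sql`lower(${users.email}) = ${email.trim().toLowerCase()}`)
    .get()
}

// For brand-new tables the cleaner fix is at DDL time, e.g.:
//   CREATE TABLE users (..., email TEXT COLLATE NOCASE, ...);
// but SQLite's ALTER TABLE can't retrofit this onto an existing column.
```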
The case-sensitivity issue is resolved for now. Watching the logs for a few days, I noticed the error "Error: D1_ERROR: Network connection lost." occasionally appears, at a rate of a few per thousand requests. I'm not sure whether it's my code or D1 itself. Searching around, I found plenty of people on the CF forum and Discord reporting the same thing, mostly without a resolution.
The only slightly useful suggestion was to retry on my side, which also implies this is just expected to happen 😕. A major reason I chose D1 was to reduce network overhead; if connections still drop even so, I have doubts about D1's stability. This isn't fully resolved yet; I'll post updates as I make progress.
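For now I wrap D1 calls in a small retry helper along these lines — entirely my own workaround, not an official recommendation, and be careful retrying writes that aren't idempotent:

```ts
// Retry a D1 operation a few times when the connection drops mid-flight.
// Only the transient "Network connection lost" failure is retried; anything
// else (SQL errors, constraint violations) is re-thrown immediately.
export async function withD1Retry<T>(
  op: () => Promise<T>,
  retries = 2,
  delayMs = 100,
): Promise<T> {
  let lastErr: unknown
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await op()
    } catch (err) {
      lastErr = err
      const msg = err instanceof Error ? err.message : String(err)
      if (!msg.includes('Network connection lost')) throw err
      await new Promise((r) => setTimeout(r, delayMs * (attempt + 1)))
    }
  }
  throw lastErr
}

// Usage:
//   const row = await withD1Retry(() => db.prepare('SELECT ...').first())
```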
Next, enterprise email. I previously used Alibaba Cloud's enterprise email, which generally worked fine, but the free edition's sign-up entry seems to have been hidden, it can't bind multiple domains, and the sending limits are opaque. I wanted to switch to something that supports multiple domains. After some research: Gmail combined with Cloudflare email forwarding can lose mail, and the Gmail address is still exposed when sending.
iCloud is an option and supports binding multiple domains, but its reliability is notoriously poor — replies to user emails often never arrive (for instance when the iCloud mailbox is full) — so it's risky. Lark mail is currently free, but its future is uncertain, and I wanted something more dependable. I finally chose Zoho because it's cheap: $1 per user per month, and usually only one user is needed.
It also supports unlimited domains, and the features are quite complete; if you don't need the advanced ones, the free tier is enough. One thing to note: Zoho's pricing plans differ by country. The Chinese edition costs 5 RMB per user per month, which looks cheap at first glance but requires a minimum of 5 users; I recommend the international edition.
In the Java application I sent email directly over SMTP. On Workers, however, sending via SMTP fails because the protocol isn't supported, so I have to use the HTTP API (which isn't available on Zoho's free tier).
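The Worker-side sending code then boils down to a plain fetch against whatever HTTP mail API you land on. The endpoint, auth header, and body shape below are placeholders, not Zoho's actual API — check your provider's docs for the real contract:

```ts
// Send a transactional email over HTTP from a Worker.
// MAIL_API_URL and MAIL_API_TOKEN are hypothetical environment bindings.
export async function sendMail(
  env: { MAIL_API_URL: string; MAIL_API_TOKEN: string },
  to: string,
  subject: string,
  html: string,
): Promise<void> {
  const res = await fetch(env.MAIL_API_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${env.MAIL_API_TOKEN}`,
    },
    body: JSON.stringify({ from: 'support@example.com', to, subject, html }),
  })
  if (!res.ok) {
    throw new Error(`Mail API returned ${res.status}: ${await res.text()}`)
  }
}
```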
Previously I used DataGrip to browse databases, but it doesn't support D1 (at least not in my version), and the D1 console's built-in interface is extremely basic. Currently I mainly use two tools. One is Drizzle's Drizzle Studio browser extension (chromewebstore.google.com/detail/drizzle…), which is very convenient.
The other is TablePlus. Drizzle Studio is convenient, but as a browser extension its functionality is limited — no SQL autocomplete, no query history — so it's best for quick, temporary use. TablePlus is more fully featured, and I already have it through Setapp so it costs me nothing extra; bought separately it's too expensive, and I wouldn't recommend that.
On log collection and analysis: the Java service previously used Alibaba Cloud's log service, similar to the ELK stack. By default, Cloudflare Workers don't persist logs (developers.cloudflare.com/workers/observ…); you can only watch them in real time. If you want to ship logs to another log service, you can use the Tail Workers feature. Alternatively, the paid Workers plan can use Baselime's service for free; Baselime has been acquired by Cloudflare and can be integrated with one click.
On customer support: unlike some other cloud providers, Cloudflare by default only lets you open tickets for billing, account, and registration issues, not technical ones. Online support requires upgrading to the Business plan at $250 per month, and that plan's benefits are mainly CDN-related products rather than Workers, so it's hardly worth buying just for support. This is understandable given the size of the user base and the focus on serving core customers.
So what do you do when you hit a real technical problem? There are two options. One is to post on the CF developer forum, but response time is hard to guarantee, and if the problem isn't very common, it's quite likely nobody can answer it. I've seen question threads sit for months without a single reply — or with replies that amount to "hey buddy, I hit the same problem, did you ever solve it?"
The second is to ask in the CF Discord community, which is relatively more responsive. But given the time difference, everyone else may be asleep when you're online. Also, the official line emphasizes that the people there are "not technical support staff, but ordinary developers and experts who volunteer to answer questions in their spare time." So when asking questions there, manage your expectations and your temper.






