# Refactor guide

This is a step-by-step overview of how to refactor an architecture module.

It can also be used to add a new architecture module, as long as the architecture is supported by LLVM or a fork of it.

Please always contact us in the [Auto-Sync tracking issue](https://github.com/capstone-engine/capstone/issues/2015)
before working on a module.
We can provide support and save you a lot of time.

Don't hesitate to ask any questions in our [Telegram Community channel](https://t.me/CapstoneEngine),
especially if you feel stuck or struggle to understand where an issue is coming from.
Although it has already been simplified, the update process is still relatively complex.

## Refactoring

Note:
- If we talk about C++ files in the steps below, we always refer to the files in the LLVM repo.
- `PrinterCapstone` is the class defined in `llvm-capstone/llvm/utils/TableGen/PrinterCapstone.cpp`.
- Always try to make the translated C file behave as closely as possible to the original C++ file! This greatly helps debugging and ensures that Capstone behaves almost exactly like the original LLVM.

- ### Prepare
	- Read `CONTRIBUTING.md`
	- Read `docs/ARCHITECTURE.md`
	- Read `suite/auto-sync/README.md`
	- Read `suite/auto-sync/ARCHITECTURE.md`
	- Read `suite/auto-sync/intro.md`
	- Delete all files in `arch/<ARCH>/` except `ARCHModule.*` and `ARCHMapping.*`.
	- `cd suite/auto-sync/`
- ### Generate `inc` files
	- `pip install -e .`
	- Clone and build `llvm-tblgen` (see docs)
	- Quickly check the updater's options: `ASUpdater -h`
	- Add the architecture name to `Target.py`.
	- In [llvm-capstone](https://github.com/capstone-engine/llvm-capstone), handle the architecture in `PrinterCapstone.cpp::decoderEmitterEmitDecodeInstruction()` (add its decoder function).
		- Note (architecture-specific code generation): Some architectures have oddities that require slightly different generated code.
			If you search through `PrinterCapstone.cpp` for architecture names like `AArch64`, `ARM`, or `Sparc`, you can see how these are handled.
	- Generate: `ASUpdater -s IncGen -a ARCH`
		- Errors? Check whether the error message tells you what to do. If there is no hint, ask us.
	- Check that the generated `inc` files in `build` look correct.
- ### Translation and Patching
	- Check for template functions in `<ARCH>InstPrinter.cpp` and `<ARCH>Disassembler.cpp`.
	- Add a config for the new architecture to `arch_config.json` by copying an existing one (LoongArch is a minimal example).
		- Don't forget to add `ARCHInstPrinter.cpp` to the list of the `AddCSDetail` tests!
	- At a minimum, add `<ARCH>InstPrinter.cpp`, `<ARCH>InstPrinter.h` and `<ARCH>Disassembler.cpp` to the translation list.
		- Tip: The variables used there are defined in `path_vars.json`.
	- Add architecture-specific includes to `Patches/Includes.py`. To begin with, copy the code of another architecture.
	- Prepare the API header (`<arch>.h`) for patching:
		- Check the generated `inc` files. Files named like `<ARCH>GenCS<something>Enum.inc` contain enumerations for the header. They get patched into the architecture's main header file.
		- Remove the old values and add `// generated content <...> begin` comments (and the matching `end` comments) where the generated content belongs. See `loongarch.h` as an example.
	- Commit all changes so far.
	- The next step writes to `arch/` and to the `include/capstone/<arch>.h` header!
	- Run generation, translation and copy/patch the files: `ASUpdater -a <ARCH> -w --copy-translated -s IncGen Translate PatchArchHeader`
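
As an illustration, a header prepared for patching might contain an excerpt like the one below. This is a minimal sketch that assumes an architecture called `ARCH` and a generated file named `ARCHGenCSRegEnum.inc` (following the `<ARCH>GenCS<something>Enum.inc` pattern); both names and all enum values are hypothetical placeholders, and `loongarch.h` remains the reference for the exact layout. The idea is that the `PatchArchHeader` step fills the region between the `begin` and `end` markers with the generated enumeration, so hand-written declarations belong outside of it.

```c
/* include/capstone/arch.h: hypothetical excerpt for an architecture "ARCH". */
typedef enum arch_reg {
	// generated content <ARCHGenCSRegEnum.inc> begin
	// Placeholder values: on patching, everything between the two
	// markers is replaced with the content of the generated file.
	ARCH_REG_INVALID = 0,
	ARCH_REG_R0 = 1,
	ARCH_REG_R1 = 2,
	// generated content <ARCHGenCSRegEnum.inc> end
} arch_reg;
```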
- ### Clean up
	- #### Check: All necessary files
		- Arch header:
			- Invalid characters in enum identifiers? Add a replacement for them in `PrinterCapstone::normalizedMnemonic`.
		- In `arch/<ARCH>/`:
			- Missing identifiers/symbols? Check whether they are defined somewhere in the generated files. If so, include those files and update `Includes.py`. If not, find the LLVM source file that defines them and add it to `arch_config.json` so it gets translated.
				- Or the symbols come from the `SystemOperands.inc` file, which can also be generated by adding the architecture to the list in `inc_gen.json`.
		- Note: Once you start the next step, you likely don't want to generate, translate, and copy the files again, because that would overwrite your hand-made fixes. So make sure you checked thoroughly that all necessary files got translated, and from now on don't pass the `-w` flag to `ASUpdater`!
    - Commit to save changes so far.
	- #### Remove and fix C++ syntax
		- Remove all **obviously irrelevant** C++ code from the translated files (e.g. class initializers).
		- Double-check whether non-obvious cases are important. Remember: removing something might lead to bugs later!
			- If in doubt, ask us.
		- If you fix the same syntax over and over again, consider adding a patch to the `CppTranslator`.
		- Common problems:
			- Missing namespace prefix: `unsigned GR32Regs[]` should be `unsigned ARCH_GR32Regs[]`. See the `namespace begin/end` comments in the code.
			- TODO: Add more.
		- If in doubt, check the original C++ file in the LLVM repo.
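
The namespace-prefix problem can be sketched as follows. This is a minimal, hypothetical example (the architecture `ARCH` and the registers `R0` to `R2` are placeholders), with the C++ original shown as a comment. C has a single global namespace, so the prefix that the C++ `namespace` provided has to become part of the identifier itself; without it, a global such as `GR32Regs` defined by two architectures could clash when both are linked into the same library.

```c
/*
 * C++ original (hypothetical):
 *
 *   namespace ARCH {
 *   unsigned GR32Regs[] = { ARCH::R0, ARCH::R1, ARCH::R2 };
 *   } // end namespace ARCH
 */

/* Hypothetical register enum, standing in for the generated one. */
enum { ARCH_R0 = 1, ARCH_R1, ARCH_R2 };

/* Translated C (sketch): the former namespace becomes an ARCH_ prefix on
 * the identifier, the same way ARCH::R0 became ARCH_R0 in this example. */
unsigned ARCH_GR32Regs[] = { ARCH_R0, ARCH_R1, ARCH_R2 };
```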
- ### Make it build
	- Add `ARCHLinkage.h` and implement the functions it declares in `ARCHInstPrinter.c` and `ARCHDisassembler.c`.
		- Explanation: The idea behind `ARCHLinkage.h` is to separate the Capstone and LLVM code, at least loosely, into different compilation units,
			so that the LLVM and Capstone code can eventually live in their own object files. This is not implemented yet, but
			we try to keep the two from becoming too entangled.
	- Add the essential code to `ARCHMapping.c`. Essential means everything **not** related to details.
	- If you are unsure how to connect Capstone and LLVM code, always check LoongArch first. If LoongArch doesn't handle your case, check Mips or SystemZ.
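
The separation that `ARCHLinkage.h` aims for can be sketched in a single file. Everything here is hypothetical: the names `ARCH_LLVM_printInst` and `ARCH_printer`, the stub `MCInst`, and the plain character buffer standing in for Capstone's output stream. In a real module the three parts would live in `ARCHLinkage.h`, `ARCHInstPrinter.c` and `ARCHMapping.c`. The point is only the direction of the calls: the Capstone side reaches the translated LLVM code through the functions declared in the linkage header and nothing else.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stub of an MCInst, reduced to what this sketch needs. */
typedef struct {
	unsigned Opcode;
	int Imm;
} MCInst;

/* ===== ARCHLinkage.h (hypothetical) =====
 * The only entry point of the translated LLVM code that Capstone code
 * calls in this sketch. */
void ARCH_LLVM_printInst(const MCInst *MI, char *Buf, size_t Len);

/* ===== ARCHInstPrinter.c (translated LLVM side) =====
 * Helpers are static, so they stay invisible outside this compilation unit. */
static const char *const ARCH_Mnemonics[] = { "nop", "addi" };

static const char *getMnemonic(unsigned Opcode)
{
	size_t Count = sizeof ARCH_Mnemonics / sizeof ARCH_Mnemonics[0];
	return Opcode < Count ? ARCH_Mnemonics[Opcode] : "<unknown>";
}

void ARCH_LLVM_printInst(const MCInst *MI, char *Buf, size_t Len)
{
	snprintf(Buf, Len, "%s %d", getMnemonic(MI->Opcode), MI->Imm);
}

/* ===== ARCHMapping.c (Capstone side) =====
 * Would include only ARCHLinkage.h, never the LLVM-side internals. */
void ARCH_printer(const MCInst *MI, char *Buf, size_t Len)
{
	/* Capstone-specific work (e.g. filling details) would go here. */
	ARCH_LLVM_printInst(MI, Buf, Len);
}
```

For example, printing an `MCInst` with opcode `1` and immediate `5` through `ARCH_printer` yields `addi 5`.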
- ### Run tests & fix bugs
	- Update the regression MC tests: If necessary, map the LLVM `mattr` and `mcpu` names to the Capstone identifiers by editing the `mcupdater.json` config file.
	- Update the tests: `ASUpdater -s MCUpdate -a <ARCH> -w`
		- It can happen that `MCUpdate` doesn't generate any tests. This means LLVM has no disassembly tests for this architecture.
			In that case, add your architecture to `use_assembly_tests` in `mcupdater.json` to generate the tests from LLVM's assembly tests instead.
			Keep in mind that some of these tests can later fail even though Capstone's output is correct:
			the assembly input can be encoded to bytes that disassemble to a semantically equivalent but syntactically different instruction (e.g. an alias).
			This syntactic mismatch then makes the test fail in Capstone.
	- Run the MC tests: `cstest tests/MC/<ARCH>`
- ### Add details
	- Largely copy the behavior of `LoongArchMapping.c` or `SystemZMapping.c`, adapting the values to your architecture.
	- Changes to the API (the structs in `<arch>.h`) are only allowed if the existing definition was wrong. Otherwise, only extend it.
	- Don't forget to update the Python bindings.
	- Run the detail tests to check the results.
	- Run the detail tests with coverage. `ARCHMapping.c` should be covered close to 100%.
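
The kind of helper the mapping code uses to fill an instruction's detail operands can be sketched as below. The structs only mimic the general shape of an architecture's detail struct (an operand counter plus a fixed-size operand array); every name here (`cs_arch`, `cs_arch_op`, `ARCH_add_reg_op`, `ARCH_add_imm_op`, `ARCH_MAX_OPS`) is hypothetical, so compare with `LoongArchMapping.c` for how an existing module does it. The bounds check matters: an instruction with more operands than the array holds must not write past its end.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARCH_MAX_OPS 4 /* hypothetical operand capacity */

typedef enum {
	ARCH_OP_INVALID = 0,
	ARCH_OP_REG,
	ARCH_OP_IMM,
} arch_op_type;

/* Hypothetical detail operand: a type tag plus an anonymous union (C11). */
typedef struct {
	arch_op_type type;
	union {
		unsigned reg;
		int64_t imm;
	};
} cs_arch_op;

/* Hypothetical per-instruction detail struct. */
typedef struct {
	uint8_t op_count;
	cs_arch_op operands[ARCH_MAX_OPS];
} cs_arch;

/* Returns the next free operand slot, or NULL if the array is full. */
static cs_arch_op *next_op(cs_arch *D)
{
	if (D->op_count >= ARCH_MAX_OPS)
		return NULL;
	return &D->operands[D->op_count++];
}

static bool ARCH_add_reg_op(cs_arch *D, unsigned Reg)
{
	cs_arch_op *Op = next_op(D);
	if (!Op)
		return false;
	Op->type = ARCH_OP_REG;
	Op->reg = Reg;
	return true;
}

static bool ARCH_add_imm_op(cs_arch *D, int64_t Imm)
{
	cs_arch_op *Op = next_op(D);
	if (!Op)
		return false;
	Op->type = ARCH_OP_IMM;
	Op->imm = Imm;
	return true;
}
```

Centralizing the capacity check in one function keeps every `add_*` helper safe; it also gives the coverage run a single branch to exercise for the "too many operands" case.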
