In literature, PIBMA, a linear-feedback-shift-register (LFSR) decoder, has been shown to be the most efficient high-speed decoder for Reed-Solomon (RS) codes. In this work, we follow the same design principles and present two high-speed LFSR decoder architectures for binary BCH codes, both achieving the critical path of one multiplier and one adder. We identify a key insight of the Berlekamp algorithm that iterative discrepancy computation involves only even-degree terms. The first decoder separates the even and odd-degree terms of the error-locator polynomial to iterate homogeneously with discrepancy computation. The resulting LFSR decoder architecture, dubbed PIBA, has $\lfloor {}\frac {3t}{2}\rfloor +1$ processing elements (PEs), each containing two registers, two multipliers, one adder, and two multiplexers (same as that of PIBMA), which compares favorably against the best existing architecture composed by $2t+1$ PEs. The second one, dubbed pPIBA, squeezes the entire error-locator polynomial into the even-term array of the first one to iterate along with discrepancy computation, which comes at the cost of a controlled defect rate. pPIBA employs $t+1+f$ systolic units with a defect probability of $2^{-q(f+1)}$ , where $q$ denotes the finite field dimension and $f$ is a design parameter, which significantly reduces the number of PEs for a large correcting power $t$ . The proposed architectures can be arbitrarily folded to trade off complexity with latency, due to the systolic nature. GII decoding has been notorious for the composition of many seemly irrelevant functional blocks. We are motivated by the unified framework UPIBA which can be reconfigured to carry out both error-only and error-and-erasure decoding of RS codes in the most efficient manner. We devise a unified LFSR decoder for GII-RS, GII-ERS (referring to erasure correction of GII-RS codes), and GII-BCH codes, respectively. Each LFSR decoder can be reconfigured (but not multiplexed) to execute different functional blocks, and moreover achieves the same critical path of one multiplier, one adder, and one multiplexer. The resulting GII-RS/BCH decoder contains only four functional blocks, which are literally the same as the decoder for single RS/BCH codes. For GII-RS and GII-BCH decoding, we also incorporate the original mechanism by Tang and $\text{K}\ddot {\text {o}}$ tter to minimize the miscorrection rate, which comes surprisingly at a negligible cost. Our proposed high-speed low-complexity GII-ERS decoder renders the multi-layer GII codes highly attractive against other locally recoverable codes.